Open Source Software in Libraries: A Workshop
by Eric Lease Morgan

This text/handout is part of a hands-on workshop for teaching people in libraries about open source software.

This text/handout is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. It is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this manual; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

Copyright Eric Lease Morgan, October.

For possibly more up-to-date information about the workshop, see http://infomotions.com/musings/ossnlibraries-workshop/.

Table of Contents

1. Introduction
   Purpose and scope of this text/workshop
2. Open Source Software in Libraries
   Introduction
   What is OSS
   Techniques for developing and implementing OSS
   OSS compared to librarianship
   Prominent OSS packages
   State of OSS in libraries
      National leadership
      Mainstreaming, workshops, and training
      Usability and packaging
      Economic viability
      Redefining the ILS
      Open source data
   Conclusion and next steps
   Notes
3. Gift Cultures, Librarianship, and Open Source Software Development
   Gift cultures, librarianship, and open source software development
   Acknowledgements
   Notes
4. Comparing Open Source Indexers
   Abstract
   Indexers
      freeWAIS-sf
      Harvest
      ht://Dig
      Isite/Isearch
      MPS
      SWISH
      WebGlimpse
      YAZ/Zebra
   Local examples
   Summary and information systems
   Links
5. Selected OSS
   Introduction
   Apache
   CVS
   DocBook stylesheets
   FOP
   GNU tools
   Hypermail
   Koha
   MARC::Record
   MyLibrary
   MySQL
   Perl
   SWISH-E
   xsltproc
   YAZ and Zebra
6. Hands-On Activities
   Introduction
   Installing and running Perl
   Installing MySQL
   Installing Apache
   CVS
   Hypermail
   MARC::Record
   SWISH-E
   YAZ
   Koha
   MyLibrary
   xsltproc
7. GNU General Public License
   Preamble
   GNU General Public License terms and conditions for copying, distribution, and modification
   No warranty

Chapter 1. Introduction

[./ossnlibraries-workshop.png]

Purpose and scope of this text/workshop

This text is part of a hands-on workshop intended to describe and illustrate open source software and its techniques to small groups of librarians. Given this text, the accompanying set of software, and reasonable access to a (Unix) computer, the student should be able to read the essays, work through the exercises, and become familiar with open source software, especially as it pertains to libraries.

I make no bones about it: this text is a combination of previous essays I have written about open source software as well as a couple of newer items. For example, the second chapter is the opening chapter I wrote for a LITA guide ("Open Source Software for Libraries," in Karen Coyle, ed., Open Source Software for Libraries, Chicago: American Library Association). The third chapter, comparing open source software, gift cultures, and librarianship, was originally formally published as a book review for Information Technology and Libraries. The chapter on open source software indexers is definitely getting old; it was presented at the O'Reilly Open Source Convention in San Diego, CA. The following section is built from the content of an American Library Association annual conference presentation. The new materials are embodied in the list of selected software and the hands-on activities.

I believe open source software is more about building communities and less about computer programs. It is more about making the world a better place and less about personal profit. Allow me to explain.
I have been giving away my software ever since Steve Cisler welcomed me into the Apple Library of Tomorrow (ALOT) fold in the very late 1980s. Through my associations with Steve and ALOT I came to write a book about Macintosh-based HTTP servers as well as an AppleScript-based CGI script called email.cgi. This simple little script was originally developed for two purposes. First and foremost, it was intended to demonstrate how to write an AppleScript common gateway interface (CGI) application. Second, it was intended to fill a gap in the web browsers of the time, namely the inability of MacWeb to support mailto URLs. Since then the script has evolved into an application that takes the contents of an HTML form, formats it, and sends the results to one or more email addresses. It works very much like a C program called cgiemail. As TCP utilities have evolved over the years, so has email.cgi, and to this date I still get requests for technical support from all over the world, but almost invariably the messages start out something like this: "Thank you so very much for email.cgi. It is a wonderful program, but..." That's okay. The program works, and it has helped many people in many ways -- more ways than I am able to count, because the vast majority of people never contacted me personally.

As I was bringing this workbook together I thought about Steve Cisler again, and I remembered a conference Apple Computer sponsored called Ties That Bind: Converging Communities. (A pretty bad travel log documenting my experiences at this conference is available at http://infomotions.com/travel/ties-that-bind-/.) At the conference we shared and discussed ideas about community and the ways technology can help make communities happen. Between sessions Cisler displayed the original piece of art that became the motif for the conference. He noted that he got the painting in Australia some time the previous year. He liked it for its simplicity and connectivity.
The painting is acrylic and is composed of many simple dots of color. The image at the top of the page is that piece of art, and it is significant today. It too is "a lot" (all puns intended) like open source software and "the Unix way." The value of open source software is measured in terms of its simplicity and connectivity. The simpler and more connective the software, the more it is valued.

The Unix way is a philosophy of computing. It posits that a computer program will take some input, do some processing, and provide some output. There is very little human interface to these sorts of programs because they get their input from a thing called standard input (STDIN) and send their output to a thing called standard output (STDOUT). If errors occur, they are sent to standard error (STDERR). Since the applications are expected to get their input from STDIN and send it to STDOUT, it is possible to string many of them together to create a working application. Connectivity. Such a design philosophy allows tiny programs to focus on one thing, and one thing only. Simplicity. This modular approach allows for the creation of new applications by adding or deleting modules from the string.

The motif brought to my attention by Cisler is a lot like stringing together open source software applications. Each individual dot does not do a whole lot on its own, but strung together the dots form a pattern. The pattern's whole is greater than the sum of its parts. This is true of communities as well. Individuals bring something to the community, and the community is made better for the contribution. The open source community exists because of individuals. These individuals have particular strengths (and weaknesses). As people add what they can to the community, the community is strengthened. The rewards for these contributions are rarely monetary. Instead, the contributions are paid for with respect.
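The Unix way described above can be sketched as a pipeline. Each command below is a tiny, single-purpose program reading STDIN and writing STDOUT; strung together they form a small word-frequency counter. The file name corpus.txt is only an illustration.

```shell
# Create a small sample input (a stand-in for any text file).
printf 'Apple apple banana apple Banana\n' > corpus.txt

# Five tiny programs strung together: split into words, lowercase,
# group, count, and rank. Simplicity in each stage, connectivity overall.
tr -cs '[:alpha:]' '\n' < corpus.txt \
  | tr '[:upper:]' '[:lower:]' \
  | sort \
  | uniq -c \
  | sort -rn
```

Adding or removing a stage (say, a final head -5 to keep only the five most frequent words) changes the application without touching any of the other parts -- exactly the modularity described above.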
People who give freely of themselves and their time are rewarded by the community as experts whose opinions are to be taken seriously. True, participation in open source software activities does not always put food on the table, but neither do other community-based activities our society values to one degree or another, such as participation in community theater, helping out at the local soup kitchen, being involved in church activities, picking up litter, giving directions to a stranger, supporting charities, participating in fund-raisers, etc. Open source software is about communities, communities that have become easier to create with the advent of globally networked computers. As described later, it is about "scratching an itch" to solve a problem, but it is also about giving "freely" to the community in the hopes that the community will be better off for it in the end.

A few years after writing email.cgi, I participated in another application called MyLibrary. This portal application grew out of a set of focus group interviews in which faculty of NC State University said they were suffering from information overload. When these interviews were taking place, services like My Yahoo, My Excite, My Netscape, and My DejaNews were making their initial appearance. In the Digital Library Initiatives Department, where I worked with Keith Morgan and Doris Sigl, we thought a similar application based on library content (bibliographic databases, electronic journals, and Internet resources) organized by subjects (disciplines) might prove to be a possible solution to the information overload problem. By prescribing sets of resources to specific groups of people, we (the libraries) could offer focused content as well as provide access to the complete world of available information.
Since I relinquished my copyrights to the university and the software has been distributed under the GNU Public License, the software has been downloaded many times, mostly by academic libraries. The specific number of active developers is unknown, but many institutions that have downloaded the software have used it as a model for their own purposes. In most cases these institutions have taken the system's database structure and experimented with various interfaces and alternative services. Such institutions include, but are not limited to, the University of Michigan, the California Digital Library, Wheaton College, Los Alamos Laboratory, Lund University (Sweden), the University of Cattaneo (Italy), and the University of New Brunswick. Numerous presentations have been given about MyLibrary at venues including Harvard University, Oxford University, the Alberta Library, the Canadian Library Association, the ACRL annual meeting, and ASIS.

As I see it, there are three impediments restricting greater success of the project: system I/O, database restructuring, and technical expertise.

MyLibrary is essentially a database application with a web front-end. In order to distribute content, data must be saved in the database. The question then is, "How will the data be entered?" Right now it must be done by content providers (librarians), but the effort is tedious, and as the number of bibliographic databases and electronic journals grows, so does the tedium. Lately I have been experimenting with the use of RDF as an import/export mechanism. By relying on some sort of XML standard, the system will be able to divorce itself from any particular database application such as an OPAC, and the system will be better able to share its data with other portal applications such as uPortal, My Netscape, or O'Reilly's Meerkat through something like RSS. Yet the problem still remains: "Who is going to do the work?" This is a staffing issue, not necessarily a technical one.
In order to facilitate the needs of a wider audience, the underlying database needs to be restructured. For example, the database contains tables for bibliographic databases, electronic journals, and "reference shelf" items. Each of the items in these tables is classified using a set of controlled vocabulary terms called disciplines. Many institutions want to create alternative data types such as images, associations, or Internet resources. Presently, to accomplish this task, oodles of code must be duplicated, bloating the underlying Perl module. Instead, a new table needs to be created to contain a new controlled vocabulary called "formats." Once this table is created, all the information resources could be collapsed into a single table and classified with the new controlled vocabulary as well as with the disciplines. Furthermore, a third controlled vocabulary -- intended audience -- could be created so the resources could be classified even further. Given such a structure, the system could be more exact when initially prescribing resources and when allowing users to customize their selections. Again, the real problem here is not necessarily technical but intellectual. Librarians make judgments about resources in terms of a resource's aboutness, intended audience, and format all the time, but rarely on such a large-scale, systematic basis. Our present cataloging methods do not accommodate this sort of analysis, and how will such analysis get institutionalized in our libraries?

The comparatively low level of technical expertise in libraries is also a barrier to wider acceptance of the system. MyLibrary runs. It doesn't crash or hang. It does not output garbage data. It works as advertised, but installing the program initially requires technical expertise beyond the scope of most libraries. It requires the installation of a database program. MySQL is the current favorite, but there are all sorts of things that can go wrong with a MySQL installation.
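The restructuring described above can be sketched in SQL. To be clear, this is a hypothetical sketch, not MyLibrary's actual table layout: one resources table replaces the separate bibliographic-database, e-journal, and reference-shelf tables, and three controlled vocabularies (disciplines, formats, intended audiences) classify it.

```sql
-- Hypothetical schema only; table and column names are illustrative.
CREATE TABLE disciplines (discipline_id INTEGER PRIMARY KEY, term VARCHAR(64));
CREATE TABLE formats     (format_id     INTEGER PRIMARY KEY, term VARCHAR(64));
CREATE TABLE audiences   (audience_id   INTEGER PRIMARY KEY, term VARCHAR(64));

-- All information resources collapse into a single table...
CREATE TABLE resources (
  resource_id INTEGER PRIMARY KEY,
  name        VARCHAR(255),
  url         VARCHAR(255),
  format_id   INTEGER REFERENCES formats   (format_id),   -- e-journal, image, etc.
  audience_id INTEGER REFERENCES audiences (audience_id)  -- undergraduate, faculty, etc.
);

-- ...and because a resource may carry several disciplines, that pairing
-- lives in its own join table instead of duplicated code per data type.
CREATE TABLE resource_disciplines (
  resource_id   INTEGER REFERENCES resources   (resource_id),
  discipline_id INTEGER REFERENCES disciplines (discipline_id)
);
```

With a structure like this, adding a new kind of resource means adding a row to the formats table, not duplicating Perl code.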
Similarly, MyLibrary is written in Perl. Installing Perl from source usually requires answering a host of questions about your computer's environment, and in all my nine or ten years of compiling Perl I still don't know what some of those questions mean; I simply go with the defaults. Then there are all the Perl modules MyLibrary requires. They are a real pain to install, and unless you have done these sorts of installs before, the process can be quite overwhelming. In short, getting MyLibrary installed is not like the Microsoft wizard process; you have to know a lot about your host computer before you can even get it up and running, and most libraries do not employ enough people with this sort of expertise to make the process comfortable.

This workbook brings together much of my experience with open source software. It describes sets of successful open source software projects and tries to enumerate the qualities of a successful project. The workbook has been written in the hopes that people will read it, give the exercises a whirl, learn from the experience, and share their newly acquired expertise with the world at large. Through this process I hope we can make the world we live in just a little bit better a place. Idealist? Maybe. A worthy goal? Definitely.

Chapter 2. Open Source Software in Libraries

Introduction

This guide is an introduction to open source software in libraries, with descriptions of a variety of software packages and successful library projects. But before we get to the software itself, I want to describe the principles and techniques of open source software (OSS) and explain why I advocate the adoption of OSS in the implementation of library services and collections. As you will see, there are many shared principles between OSS and librarianship, especially free and equal access to information.
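One way to take some of the sting out of the module gauntlet described above is to check which Perl modules are already present before installing anything. This is a hedged sketch: the module list here is illustrative, not MyLibrary's actual prerequisite list.

```shell
# For each module, `perl -MModule -e 1` loads the module and does nothing
# else; success means it is installed. File::Spec ships with Perl itself,
# while DBI and DBD::mysql usually need a CPAN install, e.g.:
#   perl -MCPAN -e 'install DBI'
for module in File::Spec DBI DBD::mysql; do
  if perl -M"$module" -e 1 2>/dev/null; then
    echo "$module: installed"
  else
    echo "$module: missing"
  fi
done
```

Running a check like this before the real installation turns one overwhelming failure into a short, concrete to-do list.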
Because of the freedom we gain with the use of OSS, it is possible to have greater control over the ways computers function and therefore greater control over how libraries operate. Anybody who works with computers on a daily basis can contribute to OSS, because things like information architecture, usability testing, documentation, and staffing are key skills required for successful projects, and these skills are inherent in the people who use computers as a primary tool in their work. The implementation of OSS in libraries represents a method for improving library services and collections.

What is OSS

OSS is both a philosophy and a process. As a philosophy it describes the intended use of software and methods for its distribution. Depending on your perspective, the concept of OSS is a relatively new idea, being only four or five years old. On the other hand, the GNU software project -- a project advocating the distribution of "free" software -- has been operational since the mid-1980s. Consequently, the ideas behind OSS have been around longer than you may think. It begins with a man named Richard Stallman, who worked for MIT in an environment where software was shared. In the mid-1980s Stallman resigned from MIT to begin developing GNU -- a software project intended to create an operating system much like Unix. (GNU is pronounced "guh-new" and is a recursive acronym for GNU's Not Unix.) His desire was to create "free" software, but the term "free" should be equated with freedom, and as such people who use "free" software should be:

  1. free to run the software for any purpose
  2. free to modify the software to suit their needs
  3. free to redistribute the software gratis or for a fee
  4. free to distribute modified versions of the software

Put another way, the term "free" should be equated with the Latin word "liberat," meaning to liberate, and not necessarily "gratis," meaning without return made or expected.
In the words of Stallman, we should "think of 'free' as in 'free speech,' not as in 'free beer.'"[ ]

Fast forward to the early 1990s, when Linus Torvalds successfully developed Linux, a "free" operating system on par with any commercial Unix distribution. Fast forward again to the late 1990s, when globally networked computers were an everyday reality and the dot-com boom was booming. There you will find the birth of the term "open source," used to describe how software is licensed:

  * The license shall not restrict any party from selling or giving away the software.
  * The program shall include source code and must allow distribution of the code.
  * The license shall allow modifications and derived works of the software.
  * The license may restrict redistribution only if patches (fixes) are included.
  * The license may not discriminate against any person or group of persons.
  * The license may not restrict how the software is used.
  * The rights attached to the program must apply to all to whom the software is redistributed.
  * The license must not be specific to a product.
  * The license must not contaminate other software by placing restrictions on it.[ ]

Techniques for developing and implementing OSS

OSS is also a process for the creation and maintenance of software. This is not a formalized process, but rather a process of convention with common characteristics across software projects. First and foremost, the developer of a software project is almost always trying to solve a specific computer problem, commonly called "scratching an itch." The developer realizes other people may have the same problem(s), and consequently the developer makes the project's source code available on the 'Net in the hopes other people can use it too. If there seems to be a common need for the software, a mailing list is usually created to facilitate communication, and the list is hopefully archived for future reference.
Since the software is almost always in a state of flux, developers need some sort of version control software to help manage the project's components. The most common version control software is called CVS (Concurrent Versions System). Co-developers then "hack away" at the project, adding features they desire and/or fixing bugs of previous releases. As these features and fixes are created, the source code's modifications, in the form of "diff" files -- specialized files explicitly listing the differences between two sets of programming code -- are sent back to the project's leader. The leader examines the diff files, assesses their value, and decides whether or not to integrate them into the master archive. The cycle then begins anew. Much of a project's success relies on the primary developer's ability to foster communication and a sense of community around a project. Once that is accomplished, the "two heads are better than one" philosophy takes effect and the project matures.

Writing computer programs is only one part of software development. Software development also requires things such as usability testing, documentation, beta-testing, and a knowledge of staff issues. Consequently, any environment where computers are used on a daily basis is a place where the techniques of OSS can be practiced. Knowledge of computer programming is not necessary; in fact, a lack of computer programming experience can even be desirable. You do not have to know how to write computer programs in order to participate in OSS development. Anybody who uses computers on a daily basis can help develop OSS. For example, you can be a beta-tester who tries to use the software and finds its faults. You can write documentation instructing people how to use the software. You can conduct usability tests against the software, discovering how easy the software is to use or not use, and how it meets people's expectations.
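The diff-and-review cycle described above can be made concrete with a toy example. Everything here is hypothetical -- a one-line "project" file and a co-developer's one-line fix -- but the shape of the exchange is the same one CVS-based projects use.

```shell
# The project leader's copy of a (toy) source file.
printf 'print "hello\\n";\n' > hello.pl

# A co-developer fixes a copy of it...
cp hello.pl hello-fixed.pl
printf 'print "goodbye\\n";\n' >> hello-fixed.pl

# ...and captures the change as a unified diff to send upstream.
# (diff exits non-zero when the files differ, hence the || true.)
diff -u hello.pl hello-fixed.pl > fix.diff || true

cat fix.diff   # lines beginning with + are the proposed addition
```

The leader reads fix.diff, judges the change, and folds it into the master copy -- in practice with the patch utility, or with CVS commands in a CVS-managed project.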
If computer software is intended to make our lives easier, you can evaluate the use of the software and see what sorts of things can be eliminated or how resources can be reallocated in order to run operations more efficiently. All of these things have nothing to do with computer programming, but rather with the use of computers in a workplace.

OSS compared to librarianship

One of the most definitive sets of writings describing OSS is Eric Raymond's The Cathedral and the Bazaar.[ ] These texts, available online as well as in book form, compare and contrast the software development processes of monolithic organizations (cathedrals) with the software processes of less structured, more organic collections of "hackers" (bazaars).[ ] The book describes the environment of free software and tries to explain why some programmers are willing to give away the products of their labors. It describes the "hacker milieu" as a "gift culture":

   Gift cultures are adaptations not to scarcity but to abundance. They arise in populations that do not have significant material-scarcity problems with survival goods. We can observe gift cultures in action among aboriginal cultures living in ecozones with mild climates and abundant food. We can also observe them in certain strata of our own society, especially in show business and among the very wealthy.[ ]

Raymond alludes to the definition of "gift cultures," but not enough to satisfy my curiosity. The literature, more often than not, refers to information about "gift exchange" and "gift economies" as opposed to "gift cultures." Probably one of the earliest and more comprehensive studies of gift exchange was written by Marcel Mauss.[ ] In his analysis he says gifts, with their three obligations of giving, receiving, and repaying, are present in aspects of almost all societies. The process of gift giving strengthens cooperation, competitiveness, and antagonism.
It reveals itself in religious, legal, moral, economic, aesthetic, morphological, and mythological aspects of life.[ ]

As Gregory states, for the industrial capitalist economies, gifts are nothing but presents or things given, and "that is all that needs to be said on the matter." Ironically for economists, gifts have value and consequently have implications for commodity exchange.[ ] He goes on to review studies about gift giving from an anthropological view -- studies focusing on tribal communities of various American Indians, cultures from New Guinea and Melanesia, and even ancient Roman, Hindu, and Germanic societies:

   The key to understanding gift giving is apprehension of the fact that things in tribal economics are produced by non-alienated labor. This creates a special bond between a producer and his/her product, a bond that is broken in a capitalistic society based on alienated wage-labor.[ ]

Ingold, in "Introduction to Social Life," echoes many of the things summarized by Gregory when he states that industrialization is concerned:

   exclusively with the dynamics of commodity production. ... Clearly in non-industrial societies, where these conditions do not obtain, the significance of work will be very different. For one thing, people retain control over their own capacity to work and over other productive means, and their activities are carried on in the context of their relationships with kin and community. Indeed their work may have the strengthening or regeneration of these relationships as its principal objective.[ ]

In short, the exchange of gifts forges relationships between partners and emphasizes qualitative as opposed to quantitative terms. The producer of the product (or service) takes a personal interest in production, and when the product is given away as a gift it is difficult to quantify the value of the item.
Therefore the items exchanged are of a less tangible nature, such as obligations, promises, respect, and interpersonal relationships.

As I read Raymond and others I continually saw similarities between librarianship and gift cultures, and therefore similarities between librarianship and OSS development. While the summaries outlined above do not necessarily mention the "abundance" alluded to by Raymond, the existence of abundance is more than mere speculation. Potlatch -- a ceremonial feast of the American Indians of the Northwest Coast marked by the host's lavish distribution of gifts, or sometimes destruction of property, to demonstrate wealth and generosity with the expectation of eventual reciprocation -- is an excellent example.

Libraries have an abundance of data and information. I won't go into whether or not they have an abundance of knowledge or the wisdom of the ages; that is another essay. Libraries do not exchange this data and information for money; you don't have to have your credit card ready as you leave the door. Libraries don't accept checks. Instead the exchange is much less tangible. First of all, based on my experience, most librarians simply take pride in their ability to collect, organize, and disseminate data and information in an effective manner. They are curious. They enjoy learning things for learning's own sake. It is a sort of Platonic end in itself. Librarians, generally speaking, just like what they do, and they certainly aren't in it for the money. You won't get rich by becoming a librarian.

Even free information is not without financial costs. Information requires time and energy to create, collect, and share, but when an information exchange does take place, it is usually intangible, not monetary, in nature. Information is intangible.
It is difficult to assign information a monetary value, especially in a digital environment where it can be duplicated effortlessly:

   An exchange process is a process whereby two or more individuals (or groups) exchange goods or services for items of value. In library land, one of these individuals is almost always a librarian. The other individuals include taxpayers, students, faculty, or, in the case of special libraries, fellow employees. The items of value are information and information services exchanged for a perception of worth -- a rating valuing the services rendered. This perception of worth, a highly intangible and difficult thing to measure, is something the user of library services "pays," not to libraries and librarians, but to administrators and decision-makers. Ultimately, these payments manifest themselves as tax dollars or other administrative support. As the perception of worth decreases, so do tax dollars and support.[ ]

Therefore, when information exchanges take place in libraries, librarians hope their clientele will support the goals of the library to administrators when issues of funding arise. Librarians believe that "free" information ("think free speech, not free beer") will improve society. It will allow people to grow spiritually and intellectually. It will improve humankind's situation in the world. Libraries are only perceived as beneficial when they give away this data and information. That is their purpose, and they, generally speaking, do this without regard to fees or tangible exchanges.

In many ways I believe OSS development, as articulated by Raymond, is very similar to the principles of librarianship, first and foremost in the idea of sharing information. Both camps put a premium on open access. Both camps are gift cultures and gain reputation by the amount of "stuff" they give away. What people do with the information, whether it be source code or journal articles, is up to them.
both camps hope the shared information will be used to improve our place in the world. just as jefferson's informed public is a necessity for democracy, oss is necessary for the improvement of computer applications.

second, human interactions are a necessary part of the mixture in both librarianship and open source development. open source development requires people skills on the part of source code maintainers. it requires an understanding of the problem the computer application is trying to solve, and the maintainer must assimilate patches into the application. similarly, librarians understand that information seeking behavior is a human process. while databases and many "digital libraries" house information, these collections are really "data stores" and are only manifested as information after values are assigned to the data and inter-relations among the data are created.

third, it has been stated that open source development will remove the necessity for programmers. yet raymond posits that no such thing will happen. if anything, there will be an increased need for programmers. similarly, many librarians feared the advent of the web because they believed their jobs would be in jeopardy. ironically, librarianship is flowering under new rubrics such as information architects and knowledge managers.

oss also works in a sort of peer review environment. as raymond states, "given enough eyeballs, all bugs are shallow." since the source code to oss is available for anybody to read, it is possible to examine exactly how the software works. when a program is written and a bug manifests itself, there are many people who can look at the program, see what it is doing, and offer suggestions or fixes.

instead of relying on marketing hype to promote an application, oss relies on its ability to satisfy particular itches to gain prominence.
the better a piece of software works, the more people are likely to use it. user endorsements are usually the way oss is promoted. the good pieces of software float to the top because they are used most often. the ones that are poorly written or do not satisfy enough itches sink to the bottom.

in a peer review process many people look at an article and evaluate its validity. during this evaluation process the reviewers point out deficiencies in the article and suggest improvements. the reviewers are usually anonymous but authoritative. the evaluation of oss often works in the same vein. software is evaluated by self-selected reviewers. these people examine all aspects of the application from the underlying data structures, to the way the data is manipulated, to the user interface and functionality, to the documentation. these people then offer suggestions and fixes to the application in an effort to enhance and improve it.

some people may remember the "homegrown" integrated library systems developed decades ago, and these same people may wonder how oss is different from those humble beginnings. there are two distinct differences. the first is the present-day existence of the internet. this global network of computers enables people to communicate over much greater distances and it is much less expensive than twenty-five years ago. consequently, developers are not as isolated as they once were, and the flow of ideas travels more easily between developers -- people who are trying to scratch that itch. yes, there were telephone lines and modems, but the processes for using them were not as seamlessly integrated into the computing environment (and there were always long-distance communications charges to contend with.[ ])

second, the state of computer technology and its availability has dramatically increased in the past twenty-five years.
twenty-five years ago computers, especially the sorts of computers used for large-scale library operations, were almost always physically large, extremely expensive, remote devices whose access was limited to a few specialized individuals. nowadays, the computers on most people's desktops have enough ram, cpu horsepower, and disk space to support the college campus of twenty-five years ago.[ ]

in short, the oss development process is not like the homegrown library systems of the past simply because there are more people with more computers who are able to examine and explore the possibilities of solving more computing problems. in the times of the homegrown systems people were more isolated in their development efforts and more limited in their choice of computing hardware and software resources.

prominent oss packages

there are quite a number of mainstream oss applications. many of these applications literally run the internet or are used for back-end support.

the apache project is one of the more notable (www.apache.org). apache is a world wide web (http) server. it started out its life in the mid-1990s as ncsa's httpd application, the web server beneath the first graphical web browser. the name for the application -- apache -- is a play on words. it has nothing to do with indians. instead, in an effort to write a more modular computer program, the original httpd application was rewritten as a set of parts, or patches, and consequently the application is called "a patchy server." few experts would doubt the popularity of the apache server. according to netcraft, more http servers run apache than any other kind. [ ]

mysql is a popular relational database application. it is very often used to support database-driven websites. it adheres to the sql standard while adding a number of features of its own (as do oracle and other database vendors). mysql is known for its speed and stability.
the canonical address for mysql is www.mysql.org.

sendmail is an email (smtp) server used on the vast majority of unix computers. this application, developed quite a number of years ago, is responsible for trafficking much of the email sent throughout the world. sendmail is a good example of an application supported by both a commercial institution as well as a non-profit organization. there is a free version of sendmail, complete with source code, as well as a commercial version that comes with formal support. see www.sendmail.org.

bind is an acronym for the berkeley internet name domain, a program converting human-readable names such as www.apple.com into internet protocol (ip) numbers. it is sort of like an old-fashioned switchboard operator associating telephone numbers with the telephones in people's homes. bind is supported by the internet software consortium at www.isc.org.

perl is a programming language written by larry wall in the late 1980s. it too runs much of the internet since it is used as the language of many common gateway interface (cgi) scripts of the internet. wall originally created perl to help him do systems administration tasks, but the language worked so well others adopted it and it has grown significantly. perl is supported at www.perl.com.

linux is the most familiar oss application. this program is really an operating system -- a program directly responsible for converting human-readable commands into computer (machine) language. it is the software that really makes computers run. linux was originally conceived by linus torvalds in the early 1990s because he wanted to run a unix sort of operating system on intel-based computers. linux is becoming increasingly popular with many information technology (it) professionals as an alternative to windows-based server applications or proprietary versions of unix. see www.linux.org.
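the kind of fielded storage and standard sql querying that mysql provides can be illustrated with a short sketch. since no live mysql server is assumed here, python's built-in sqlite3 module stands in for it; the table, column names, and sample rows are invented for illustration, and the sql itself is standard.

```python
import sqlite3

# an in-memory sqlite3 database stands in for a mysql server here;
# the sql statements are standard and would run largely unchanged
# under mysql (though mysql drivers use a different placeholder style)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books (title, author, year) VALUES (?, ?, ?)",
    [
        ("the cathedral and the bazaar", "eric raymond", 1999),
        ("the gift", "marcel mauss", 1954),
    ],
)

# a fielded search: only rows whose author column matches are returned
rows = conn.execute(
    "SELECT title FROM books WHERE author LIKE ?", ("%raymond%",)
).fetchall()
print(rows)  # [('the cathedral and the bazaar',)]
```

this is the essential service a relational database offers a library application: data broken into named fields, and queries restricted to those fields, rather than free-text matching against a whole record.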
state of oss in libraries

daniel chudnov has been the library profession's oss evangelist for the past three or four years. he is also the original author of the open source program jake (jointly administered knowledge environment). chudnov has done a lot to raise the awareness of oss in libraries. to that end he and others help maintain a website called oss4lib (www.oss4lib.org). the site lists library-related applications including applications for document delivery, z39.50 clients and servers, systems to manage collections, marc record readers and writers, integrated library systems, and systems to read and write bibliographies. for more information visit oss4lib and subscribe to the mailing list.

the state of oss in libraries is more than sets of computer programs. it also includes the environment where the software is intended to be used -- a socio-economic infrastructure. any computing problem can be roughly divided into technology issues and people issues, and it is the people issues that concern us here. given the current networked environment, the affinity of oss development to librarianship, and the sorts of projects enumerated above, what can the library profession do to best take advantage of the currently available oss? i posed this question to the oss4lib mailing list in april and may, and it generated a lively discussion. [ ] a number of themes presented themselves, each of which is elaborated upon below.

national leadership

one of the strongest themes was the need for a national leader. it was first articulated by david dorman as the osln (open source library network). karen coyle and aaron trehab elaborated on this idea by suggesting organizations such as ala/lita, the dlf, oclc, or rlg help fund and facilitate methods for providing credibility, publicity, stability, and coordination to library-based oss projects.
mainstreaming, workshops, and training

along these same lines was the expressed desire for the mainstreaming of oss articulated by carol erkens, rachel cheng, and peter schlumpf. this mainstreaming process would include presentations, workshops, and training sessions on local, regional, and national levels. these activities would describe and demonstrate open source software for libraries. they would enumerate the advantages and disadvantages of open source software. they would provide extensive instructions on the staffing, installation, and maintenance issues of oss.

usability and packaging

in its present state, open source software is much like microcomputer computing in its early days, as stated by blake carver. it is very much a build-it-yourself enterprise; the systems are not very usable when it comes to installation. this point was echoed by cheng who recently helped facilitate a nercomp workshop on oss. peter schlumpf points to the need for easier installation methods so maintainers of the system can focus on managing content and not software. using oss should not be like owning an automobile in the early days of motoring. "i shouldn't necessarily need to know how to fix it in order to make it go."

economic viability

oss needs to be demonstrated as an economically viable method of supporting software and systems. this was pointed out by eric schnell and david dorman. libraries have spent a lot of time, effort, and money on resource sharing. why not pool these same resources together to create software satisfying our professional needs? oss is not like the "homegrown" systems. spaghetti code and goto statements should be a thing of the past. more importantly, a globally networked computer environment provides a means of sharing expertise in a manner not feasible twenty-five years ago. we need to demonstrate to administrators and funding sources that money spent developing software empowers our collective whole. it is an investment in personnel and infrastructure.
oss is not a fad, yet it will not necessarily replace commercial software. on the other hand, oss offers opportunities not necessarily available from the commercial sector.

redefining the ils

there are many open source library applications available today. each satisfies a particular need. maybe each of these individual applications can be brought together into a collective, synergistic whole as described by jeremy frumkin, and we could redefine the integrated library system. presently our ils's manage things like books pretty well. with the addition of fields in marc records they are beginning to assist in the management of networked resources, but libraries are more than books and networked resources. libraries are about services too: reserves, reading lists, bibliographies, reader advisory services of many types, current awareness, reference, etc. maybe the existing oss can be glued together to form something more holistic resulting in a sum greater than its parts. this is also an opportunity, as described by schnell, for vendors to step in and provide such integration including installation, documentation, and training.

open source data

oss relates to data as well as systems, as described by krichel. the globally networked computer environment allows us to share data as well as software. why not selectively feed urls to internet spiders to create our own, subject-specific indexes? why not institutionalize services like the open directory project or build on the strength of infomine to share records in a manner similar to oclc's?

conclusion and next steps

this essay has described what oss is and compared oss to the principles of librarianship. the balance of the book details particular oss systems for libraries. after reading this book i hope you go away understanding at least one thing.
oss provides the means for the profession to take greater control over the ways computers are used in libraries. oss is free, but it is free in the same way freedom exists in a democracy. with freedom comes choice. with freedom comes the ability to manifest change. at the same time, freedom comes at a price, and that price is responsibility. oss puts its users in direct control of computer operations, and this control costs in terms of accountability. when the software breaks down, you will be responsible for fixing it. fortunately, there is a large network at your disposal, the internet, not to mention the creator of the software, who has the same problems you do and has most likely previously addressed the same problem.

open source provides the means to say, "we are not limited by our licensed software because we have the ability to modify the software to meet our own ends." instead of blaming vendors for supporting bad software, instead of confusing the issues with contractual agreements and spending tens of thousands of dollars a year for services poorly rendered, oss offers an alternative. be realistic. oss is free, but not without costs.

this being the case, what sorts of things need to happen for oss to become a more viable computing option in libraries? what are the next steps? the steps fall into two categories: 1) making people more aware of oss and 2) improving the characteristics of oss.

librarians need to become more aware of the options oss provides. this can be done in a number of ways. for example, a formal study analyzing the desirability and feasibility of libraries making a formal commitment to oss might demonstrate to other libraries the benefits of oss. library boards and directors need to feel comfortable committing funds to oss installation and development, but before doing so the boards and directors need to know what oss is and how its principles can be applied in libraries.
by mentoring existing librarians to become more computer literate the concepts of oss will become easier to understand. similarly, by mentoring librarians to be more aware of the ways of administration, these same librarians will have more authority to make decisions and direct energies to oss development. librarians should not be afraid of the idea of open source software because they think computer programming experience is necessary. there is much more to software development than writing computer programs. simple training exercises will also make more people aware of the potential of open source software. finally, communication -- testimonials -- will help disseminate the successes, as well as failures, of oss.

oss itself needs to be improved. the installation processes of oss are not as simple as the installation procedures of commercial software. this is an area that needs improvement, and if it were done, fewer people would be intimidated by the installation process. additionally, there are opportunities for commercial institutions to support oss. these institutions, like red hat or o'reilly & associates, could provide services installing, documenting, and troubleshooting oss. these institutions would not be selling the software itself, but services surrounding the software.

the principles of oss are very similar to the principles of librarianship. let's take advantage of these principles and use them to take more control over our computing environments.

notes

. the ideas behind gnu software and its definition as articulated by richard stallman can be found at http://www.gnu.org/philosophy/free-sw.html. accessed april , .

. much of the preceding section was derived from dave bretthauer's excellent article, "oss: a history" in information technology and libraries, march.

. the cathedral and the bazaar is also available online at http://www.tuxedo.org/~esr/writings/cathedral-bazaar/.
accessed april , .

. it is important to distinguish here the difference between a "hacker" and a "cracker". as defined by raymond, a hacker is a person who writes computer programs because they are "scratching an itch" -- trying to solve a particular computer problem. this definition is contrasted with the term "cracker", denoting a person who maliciously tries to break into computer systems. in raymond's eyes, hacking is a noble art; cracking is immoral. it is unfortunate that the distinction between hacking and cracking seems to have been lost on the general population.

. raymond, e.s., the cathedral and the bazaar: musings on linux and open source by an accidental revolutionary. 1st ed., [sebastopol, ca]: o'reilly.

. mauss, m., the gift; forms and functions of exchange in archaic societies. the norton library, new york: norton.

. lukes, s., "mauss, marcel", in international encyclopedia of the social sciences, d.l. sills, editor. macmillan: [new york].

. gregory, c.a., "gifts", in eatwell, j., et al., the new palgrave: a dictionary of economics. new york: stockton press.

. ibid.

. ingold, t., "introduction to social life", in companion encyclopedia of anthropology, t. ingold, editor. routledge: london; new york.

. morgan, e.l., "marketing future libraries", http://www.infomotions.com/musings/marketing/. accessed april , .

. as an interesting aside, read "stalking the wily hacker" by clifford stoll in the communications of the acm. the essay describes how stoll tracked a hacker via a small accounting discrepancy. it is on the web in many places. try http://eserver.org/cyber/stoll .txt. accessed april , . it is believed a past chairman of ibm, thomas watson, said, "i think there is a world market for maybe five computers."

. see http://www.netcraft.com for more information. accessed april , .
an archive of the oss4lib mailing list is available at this url: http://www.geocrawler.com/lists/ /sourceforge/ / /. accessed april , .

chapter . gift cultures, librarianship, and open source software development

gift cultures, librarianship, and open source software development

this short essay examines more closely the concept of a "gift culture" and how it may or may not be related to librarianship. after this examination and with a few qualifications, i still believe my judgments about open source software and librarianship are true. open source software development and librarianship have a number of similarities -- both are examples of gift cultures.

i have recently been reading a book about open source software development by eric raymond. [ ] the book describes the environment of free software and tries to explain why some programmers are willing to give away the products of their labors. it describes the "hacker milieu" as a "gift culture":

gift cultures are adaptations not to scarcity but to abundance. they arise in populations that do not have significant material scarcity problems with survival goods. we can observe gift cultures in action among aboriginal cultures living in ecozones with mild climates and abundant food. we can also observe them in certain strata of our own society, especially in show business and among the very wealthy. [ ]

raymond alludes to the definition of "gift cultures", but not enough to satisfy my curiosity. being the good librarian, i was off to the reference department for more specific answers. more often than not, i found information about "gift exchange" and "gift economies" as opposed to "gift cultures." (yes, i did look on the internet but found little.)

probably one of the earliest and more comprehensive studies of gift exchange was written by marcel mauss.
[ ] in his analysis he says gifts, with their three obligations of giving, receiving, and repaying, are present in almost all societies. the process of gift giving strengthens cooperation, competitiveness, and antagonism. it reveals itself in religious, legal, moral, economic, aesthetic, morphological, and mythological aspects of life. [ ]

as gregory states, for the industrial capitalist economies, gifts are nothing but presents or things given, and "that is all that needs to be said on the matter." ironically for economists, gifts have value and consequently have implications for commodity exchange. [ ] he goes on to review studies about gift giving from an anthropological view, studies focusing on tribal communities of various american indians, cultures from new guinea and melanesia, and even ancient roman, hindu, and germanic societies:

the key to understanding gift giving is apprehension of the fact that things in tribal economies are produced by non-alienated labor. this creates a special bond between a producer and his/her product, a bond that is broken in a capitalistic society based on alienated wage-labor.[ ]

ingold, in "introduction to social life", echoes many of the things summarized by gregory when he states that industrialization is concerned:

exclusively with the dynamics of commodity production. ... clearly in non-industrial societies, where these conditions do not obtain, the significance of work will be very different. for one thing, people retain control over their own capacity to work and over other productive means, and their activities are carried on in the context of their relationships with kin and community. indeed their work may have the strengthening or regeneration of these relationships as its principal objective. [ ]

in short, the exchange of gifts forges relationships between partners and emphasizes qualitative as opposed to quantitative terms.
the producer of the product (or service) takes a personal interest in production, and when the product is given away as a gift it is difficult to quantify the value of the item. therefore the items exchanged are of a less tangible nature such as obligations, promises, respect, and interpersonal relationships.

as i read raymond and others i continually saw similarities between librarianship and gift cultures, and therefore similarities between librarianship and open source software development. while the summaries outlined above do not necessarily mention the "abundance" alluded to by raymond, the existence of abundance is more than mere speculation. potlatch, "a ceremonial feast of the american indians of the northwest coast marked by the host's lavish distribution of gifts or sometimes destruction of property to demonstrate wealth and generosity with the expectation of eventual reciprocation", is an excellent example. [ ]

libraries have an abundance of data and information. (i won't go into whether or not they have an abundance of knowledge or wisdom of the ages. that is another essay.) libraries do not exchange this data and information for money; you don't have to have your credit card ready as you leave the door. libraries don't accept checks. instead the exchange is much less tangible. first of all, based on my experience, most librarians simply take pride in their ability to collect, organize, and disseminate data and information in an effective manner. they are curious. they enjoy learning for learning's sake. it is a sort of platonic end in itself. librarians, generally speaking, just like what they do and they certainly aren't in it for the money. you won't get rich by becoming a librarian.

information is not free. it requires time and energy to create, collect, and share, but when an information exchange does take place, it is usually intangible, not monetary, in nature. information is intangible.
it is difficult to assign it a monetary value, especially in a digital environment where it can be duplicated effortlessly:

an exchange process is a process whereby two or more individuals (or groups) exchange goods or services for items of value. in library land, one of these individuals is almost always a librarian. the other individuals include taxpayers, students, faculty, or in the case of special libraries, fellow employees. the items of value are information and information services exchanged for a perception of worth -- a rating valuing the services rendered. this perception of worth, a highly intangible and difficult thing to measure, is something the user of library services "pays", not to libraries and librarians, but to administrators and decision-makers. ultimately, these payments manifest themselves as tax dollars or other administrative support. as the perception of worth decreases so do tax dollars and support. [ ]

therefore when information exchanges take place in libraries librarians hope their clientele will support the goals of the library to administrators when issues of funding arise. librarians believe that "free" information ("think free speech, not free beer") will improve society. it will allow people to grow spiritually and intellectually. it will improve humankind's situation in the world. libraries are only perceived as beneficial when they give away this data and information. that is their purpose, and they, generally speaking, do this without regard to fees or tangible exchanges.

in many ways i believe open source software development, as articulated by raymond, is very similar to the principles of librarianship, first and foremost with the idea of sharing information. both camps put a premium on open access. both camps are gift cultures and gain reputation by the amount of "stuff" they give away. what people do with the information, whether it be source code or journal articles, is up to them. both camps hope the shared information will be used to improve our place in the world. just as jefferson's informed public is a necessity for democracy, open source software is necessary for the improvement of computer applications.

second, human interactions are a necessary part of the mixture in both librarianship and open source development. open source development requires people skills on the part of source code maintainers. it requires an understanding of the problem the computer application is trying to solve, and the maintainer must assimilate patches into the application. similarly, librarians understand that information seeking behavior is a human process. while databases and many "digital libraries" house information, these collections are really "data stores" and are only manifested as information after values are assigned to the data and inter-relations among the data are created.

third, it has been stated that open source development will remove the necessity for programmers. yet raymond posits that no such thing will happen. if anything, there will be an increased need for programmers. similarly, many librarians feared the advent of the web because they believed their jobs would be in jeopardy. ironically, librarianship is flowering under new rubrics such as information architects and knowledge managers.

it has also been brought to my attention by kevin clarke (kevin_clarke@unc.edu) that both institutions use peer review:

your cultural take (gift culture) on "open source" is interesting. i've been mostly thinking in material terms but you are right, i think, in your assessment. one thing you didn't mention is that, like academic librarians, open source folks participate in a peer-review type process.

all of this is happening because of an information economy.
it sure is an exciting time to be a librarian, especially a librarian who can build relational databases and program on a unix computer.

acknowledgements

thank you to art rhyno (arhyno@server.uwindsor.ca) who encouraged me to post the original version of this text.

notes

. raymond, e.s., the cathedral and the bazaar: musings on linux and open source by an accidental revolutionary. 1st ed., [sebastopol, ca]: o'reilly.

. ibid.

. mauss, m., the gift; forms and functions of exchange in archaic societies. the norton library, new york: norton.

. lukes, s., "mauss, marcel", in international encyclopedia of the social sciences, d.l. sills, editor. macmillan: [new york].

. gregory, c.a., "gifts", in eatwell, j., et al., the new palgrave: a dictionary of economics. new york: stockton press.

. ibid.

. ingold, t., "introduction to social life", in companion encyclopedia of anthropology, t. ingold, editor. routledge: london; new york.

. merriam-webster online dictionary, http://search.eb.com/cgi-bin/dictionary?va=potlatch

. morgan, e.l., "marketing future libraries", http://www.lib.ncsu.edu/staff/morgan/cil/marketing/

chapter . comparing open source indexers

abstract

this text compares and contrasts the features and functionality of various open source indexers: freewais-sf, harvest, ht://dig, isite/isearch, mps, swish, webglimpse, and yaz/zebra. as the size of information systems increases so does the necessity of providing searchable interfaces to the underlying data. indexing content and implementing an html form to search the index is one way to accomplish this goal, but all indexers are not created equal.
this case study enumerates the pluses and minuses of various open source indexers currently available and makes recommendations on which indexer to use for what purposes. finally, this case study will make readers aware that good search interfaces alone do not make for good information systems. good information systems also require consistently applied subject analysis and well-structured data.

indexers

below are a few paragraphs about each of the indexers reviewed here. they are listed in alphabetical order.

freewais-sf

of the indexers reviewed here, freewais-sf is by far the granddaddy of the crowd, and the predecessor of isite/isearch, swish, and mps. yet, freewais-sf is not really the oldest indexer because it owes its existence to wais, originally developed many years ago by brewster kahle of thinking machines, inc.

freewais-sf supports a bevy of indexing types. for example, it can easily index unix mbox files, text files where records are delimited by blank lines, html files, as well as others. sections of these text files can be associated with fields for field searching through the creation of "format files" -- configuration files made up of regular expressions. after data has been indexed it can be made accessible through a cgi interface called sfgate, but the interface relies on a perl module, wais.pm, which is very difficult to compile. the interface supports lots o' search features including field searching, nested queries, right-hand truncation, thesauri, multiple-database searching, and boolean logic.

this indexer represents aging code. not because it doesn't work, but because as new incarnations of operating systems evolve freewais-sf gets harder and harder to install. after many trials and tribulations, i have been able to get it to compile and install on redhat linux, and i have found it most useful for indexing two types of data: archived email and public domain electronic texts.
for example, by indexing my archived email i can do free-text searches against the archives and return names, subject lines, and ultimately the email messages (plus any attachments). this has been very helpful in my personal work. using the "para" indexing type i have been able to index a small collection of public domain literature and provide a mechanism to search one or more of these texts simultaneously for things like "slave" to identify paragraphs from the collection.

harvest

harvest was originally funded by a federal grant at the university of arizona. it is essentially made up of two components: gatherers and brokers. given sets of one or more urls, gatherers crawl local and/or remote file systems for content and create surrogate files in a format called soif. after one or more of the soif collections have been created they can be federated by a broker, an application that indexes them and makes them available through a web interface.

the harvest system assumes the data being indexed is ephemeral. consequently, index items become "stale", are automatically removed from retrieval, and need to be refreshed on a regular basis. this is considered a feature, but if your content does not change very often it is more a nuisance than a benefit.

harvest is not very difficult to compile and install. it comes with a decent shell script allowing you to set up rudimentary gatherers and brokers. configuration is done through the editing of various text files outlining how output is to be displayed. the system comes with a web interface for administrating the brokers. if your indexed content is consistently structured and includes meta tags, then it is possible to output very meaningful search results that include abstracts, subject headings, or just about any other fields defined in the meta tags of your html documents.

the real strength of the harvest system lies in its gathering functions. ideally, system administrators are intended to create multiple gatherers.
these gatherers are designed to be federated by one or more brokers. if everybody were to index their content and make it available via a gatherer, then a few brokers could be created collecting the content of the gatherers to produce subject- or population-specific indexes, but alas, this was a dream that never came to fruition.

ht://dig

this is a nice little indexer, but it just doesn't have the features of some of the other available distributions. configuring the application for compilation is not too tricky, but unless you set paths correctly you may create a few broken links. like swish, to index your data you feed the application a configuration file and it then creates gobs of data. many indexes can be created and they then have to be combined into a single database for searching. not too hard.

the indexer supports boolean queries, but not phrase searching. it can apply an automatic stemming algorithm, but upon doing so you might give the unsuspecting user information overload. the search engine does not support field searching, and a rather annoying thing is that the indexer does not remove duplicates. consequently, index.html files almost always appear twice in search results. on the other hand, one nice thing ht://dig does do that the other engines don't do (except webglimpse) is highlight query terms in a short blurb (a pseudo-abstract) of the search results. ht://dig is a simple tool. considering the complexity of some of the other tools reviewed here, i might rank this one as #2 after swish.

isite/isearch

isite/isearch is one of the very first implementations based on the wais code. like yaz/zebra, it is intended to support the z39.50 information retrieval protocol. like freewais (and unlike yaz/zebra) it supports a number of file formats for indexing. unfortunately, isite/isearch no longer seems to be supported and the documentation is weak.
while it comes with a cgi interface and is easily installed, the user interface is difficult to understand and needs a lot of tweaking before it can be called usable by today's standards. if you require z39.50 compliance and for some reason yaz/zebra does not work for you, then give isite/isearch a whirl.

mps

mps seems to be the zippiest of the indexers reviewed here. it can create more data in a shorter period of time than all of the other indexers. unlike the other indexers, mps divides the indexing process into two parts: parser and indexer. the indexer accepts what is called a "structured index stream", a specialized format for indexing. by structuring the input the indexer expects, it is possible to write output files from your favorite database application and have the content of your database indexed and searchable by mps. you are not limited to indexing the content of databases with mps. since it too was originally based on the wais code, it indexes many other data types such as mbox files, files where records are delimited by blank lines (paragraphs), as well as a number of mime types (rtf, tiff, pdf, html, soif, etc.). like many of the wais derivatives, it can search multiple indexes simultaneously, supports a variant of the z39.50 protocol, and a wide range of search syntax.

mps also comes with a perl api and an example cgi interface. the perl api comes with the barest of documentation, but the cgi script is quite extensive. one of the neatest features of the example cgi interface is its ability to allow users to save and delete searches against the indexes for processing later. for example, if this feature is turned on, then a user first logs into the system. as the user searches the system their queries are stored to the local file system. the user then has the option of deleting one or more of these queries.
later, when the user returns to the system they have the option of executing one or more of the saved searches. these searches can even be designed to run on a regular basis and the results sent via email to the user. this feature is good for data that changes regularly over time such as news feeds, mailing list archives, etc. mps has a lot going for it. if it were able to extract and index the meta tags of html documents, and if the structured index stream as well as the perl api were better documented, then this indexer/search engine would rank higher on the list.

swish

swish is currently my favorite indexer. originally written by kevin hughes (who is also the original author of hypermail), this software is a model of simplicity. to get it to work for you, all that needs to be done is to download, unpack, configure, compile, edit the configuration file, and feed the file to the application. a single binary and a single configuration file is used for both indexing and searching. the indexer supports web crawling. the resulting indexes are portable among hosts. the search engine supports phrase searching, relevance ranking, stemming, boolean logic, and field searches.

the hard part about swish is the cgi interface. many swish cgi implementations pipe the search query to the swish binary, capture the results, parse them, and return them accordingly. recently perl as well as php modules have been developed allowing the developer to avoid this problem, but the modules are considered beta software.

like harvest, swish can "automagically" extract the content of html meta tags and make this content field searchable. assume you have a meta tag in the header of your html document such as this:

<meta name="subject" content="adaptive technologies, cil (computers in libraries)">

the swish indexer would create a column in its underlying database named "subject" and insert into this column the values "adaptive technologies" and "cil (computers in libraries)".
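in swish-e, the later incarnation of swish described elsewhere in this text, this kind of meta tag extraction is switched on in the configuration file. the fragment below is a hedged sketch: the directives are real swish-e directives, but the file names and paths are hypothetical placeholders.

```
# hypothetical swish-e configuration file (swish.conf);
# IndexDir and IndexFile are placeholders for your own paths
IndexDir  ./htdocs
IndexFile ./index.swish-e

# tell the indexer which meta tags become searchable fields
MetaNames subject

# store the tag's content as a property for display in results
PropertyNames subject
```

feeding this file to the indexer (swish-e -c swish.conf) would make the subject field searchable as described in the next paragraph.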
you could then submit a query to swish such as this:

subject = "adaptive technologies"

this query would then find all the html documents in the index whose subject meta tag contained this value, resulting in a higher precision/recall ratio. this same technique works in harvest as well, but since the results of a swish query are more easily malleable before they are returned to the web browser, other things can be done with the swish results; swish results can easily be sorted by a specific field, or more importantly, swish results can be marked up before they are returned. for example, if your cgi interface supports the get http method, then the content of meta tags can be marked up as hyperlinks allowing the user to easily address the perennial problem of "find me more like this one."

webglimpse

webglimpse is a newer incarnation of the original harvest software. like harvest, webglimpse relies on glimpse to provide an indexing mechanism, but unlike harvest, webglimpse does not provide a means to federate indexes through a broker. compilation and installation is rather harmless, and the key to using this application effectively is the ability to edit a small configuration file that is used by the indexer (archive.cfg). once edited correctly, another binary reads this file, crawls a local or remote file system, and indexes the content. the index(es) are then available through a simple cgi interface. unfortunately, the output of the interface is not configurable unless the commercial version of the software is purchased. this is a real limitation, but on the other hand, the use of webglimpse does not require a separate pair of servers (a broker and/or a gatherer) running in order to operate. webglimpse reads glimpse indexes directly.

yaz/zebra

the yaz/zebra combination is probably the best indexer/search engine solution for librarians who want to implement an open source z39.50 interface. z39.50
is an ansi/niso standard for information retrieval based on the idea of client/server computing before client/server computing was popularized:

it specifies procedures and structures for a client to search a database provided by a server, retrieve database records identified by a search, scan a term list, and sort a result set. access control, resource control, extended services, and a "help" facility are also supported. the protocol addresses communication between corresponding information retrieval applications, the client and server (which may reside on different computers); it does not address interaction between the client and the end-user. -- http://lcweb.loc.gov/z3950/agency/

put another way, z39.50 tries to facilitate a "query once, search many" interface to indexes in a truly standard way, and the yaz/zebra combination is probably the best open source solution to this problem.

yaz is a toolkit allowing you to create z39.50 clients and servers. zebra is an indexer with a z39.50 front-end. to make these tools work for you, the first thing to be done is to download and compile the yaz toolkit. once installed you can feed documents to the zebra indexer (it requires a few yaz libraries) and make the documents available through the server. while the yaz/zebra combination does not come with a perl api, there are at least a couple of perl modules available from cpan providing z39.50 interfaces. there is also a module called zap! (http://www.indexdata.dk/zap/) allowing you to embed a z39.50 client into apache.

there is absolutely nothing wrong with the yaz/zebra combination. it is well documented, standards-based, as well as easy to compile and install. the difficulty with this solution is the protocol, z39.50, itself. it is considered overly complicated, and therefore the configuration files you must maintain and the formats of the files available for indexing are rather obtuse. if you require z39.50, then this is the tool for you.
if not, then something else might be better suited to your needs.

local examples

a number of local implementations of the various indexers reviewed here have been created. use these links to play and see how well they work:

• freewais-sf (plain text files where each "record" is delimited by a blank line)
• harvest (plain text and html files across the internet)
• ht://dig (html pages containing html meta tags)
• isite/isearch (html pages containing html meta tags)
• mps (plain text files on the local file system)
• swish (html pages containing html meta tags)
• webglimpse (html pages containing html meta tags)

summary and information systems

indexers provide one means for "finding a needle in a haystack", but don't rely on an indexer alone to satisfy people's information needs; information systems require well-structured data and consistently applied vocabularies in order to be truly useful.

information systems can be defined as organized collections of information. in order to be accessed they require elements of readability, browsability, searchability, and finally interactive assistance. readability is another word for usability. it connotes meaningful navigation, a sense of order, and a systematic layout. as the size of an information system increases it requires browsability -- an obvious organization of information that is usually embodied through the use of a controlled vocabulary. the browsable categories of yahoo! are a good example. searchability is necessary when a user seeks specific information and when the user can articulate their information need. searchability flattens browsable collections. finally, interactive assistance is necessary when an information system becomes very large or complex. even though a particular piece of information exists in a system, it is quite likely a person will not find that information and may need help. interactive assistance is that help mechanism.
by creating well-structured data you can supplement the searchability aspects of your information system. for example, if the data you have indexed is html, then insert meta tags into your documents and use a controlled vocabulary -- a thesaurus -- to describe those documents. if you do this then you can use swish or harvest to extract these tags and provide canned field searching access to your documents; free-text searches rely too much on statistical analysis and can not return as high precision/recall ratios as field searches. if your content is saved in a database, then it is an easy process to create your html and include meta tags. such a process is described in more detail in "creating 'smart' html pages with php" (http://www.infomotions.com/musings/smart-pages/).

the indexers reviewed here have different strengths and weaknesses. if your content is primarily html pages, then swish is most likely the application you would want to use. it is fast, easy to install, and since it comes with no user interface you can create your own with just about any scripting language. if your content is not necessarily html files, but structured text files such as database dumps, then mps or the yaz/zebra combination may be more of what you need. both of these applications support a wide variety of file formats for indexing as well as the incorporation of standards.

links

here is a list of urls pointing to the indexers reviewed in this text.

• freewais-sf - http://ls6-www.informatik.uni-dortmund.de/ir/projects/freewais-sf/
• harvest - http://harvest.sourceforge.net/
• ht://dig - http://www.htdig.org/
• isite/isearch - http://www.etymon.com/isearch/
• mps - http://www.fsconsult.com/products/mps-server.html
• swish - http://sunsite.berkeley.edu/swish-e/
• webglimpse - http://webglimpse.net/
• yaz/zebra - http://indexdata.dk/zebra/
chapter 5. selected oss

introduction

below is a list of open source software especially useful in libraries and open source software in general. this list is not intended to be comprehensive but selective instead. it is representative of the types of open source software available and the most used tools.

a more comprehensive list of open source software especially designed for libraries can be found at oss4lib (http://www.oss4lib.org/). there you will also find the archives of the oss4lib mailing list, a low-traffic but ongoing discussion surrounding the issues of open source software in libraries. for an even more comprehensive list of software, check out sourceforge (http://sourceforge.net/). there you will find just about any type of open source software you desire.

apache

link: http://httpd.apache.org/

apache is the most popular web (http) server on the internet and a standard open source piece of software. its name doesn't really have anything to do with american indians. instead, its name comes from the way it is built. it is "a patchy" server, meaning that it is made up of many modular parts to create a coherent whole. this design philosophy has made the application very extensible. for example, there are the core modules that make up the server's ability to listen for connections, retrieve files, and return them to the requesting client (the "user agent" in http parlance). there are other modules dealing with logging transactions and cgi (common gateway interface) scripting. other modules allow you to rewrite incoming requests, manage email, implement the little-used http put method, write other modules in perl, or transform xml files using xslt. apache is currently at version 2.0, but for some reason many people are still using the 1.3 series. i don't really know why. i have not upgraded my apache servers to version 2.0 because i do not want to lose the functionality of axkit, an xml transformation engine.
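apache's modularity shows up directly in its configuration file. the httpd.conf fragment below is a hedged sketch -- the directives are standard apache directives, but the module and directory paths are hypothetical placeholders for your own installation:

```
# hypothetical httpd.conf fragment; paths are placeholders

# load an optional module (mod_rewrite ships with apache)
LoadModule rewrite_module modules/mod_rewrite.so

# serve files from the document root
DocumentRoot "/usr/local/apache/htdocs"

# map a url path to a directory of executable cgi scripts
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"
```

adding or removing LoadModule lines is how the "patchy" server is grown or slimmed down to fit a particular site.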
apache is a part of lamp (linux, apache, mysql, perl/php), a term denoting the core open source applications used to build websites.

cvs

link: http://www.cvshome.org/

cvs is an acronym for concurrent versions system. it is the way open source software is shared by developers. it consists of a client and a server application. the server is set up and points to a directory where one or more projects are saved. usernames and passwords are created, and the server sits and waits for connections. for the most part, the cvs client is command-line driven. on the command line you specify the location of a cvs server, the protocol you are going to use to connect to the server, and your username/password. once logged in you give cvs various commands used to download remote projects. you then spend your time hacking away at the source code. when you think you have created the latest and greatest hack, you issue the cvs diff command to create a diff file. this file lists the changes you made to the original source. by sending this diff file to the project's maintainer, your hack can be incorporated into the next release. alternatively, you might be granted write access to the remote project, in which case you issue the cvs commit command, and your hacks are automatically incorporated. if you are going to do any open source software development, then you must get acquainted with cvs. luckily, it comes pre-installed with many unix variants, but it is just as easily compiled.

docbook stylesheets

link: http://docbook.sourceforge.net/projects/xsl/

given a set of xml/docbook files, the docbook stylesheets, and an xsl processor such as xsltproc or fop, you can transform your docbook files into pdf documents, html documents, xhtml documents, or a few other file types. when you download the stylesheets, be sure to download the xsl sheets and not the other types. you would need other processors to use the other types.
the stylesheets are configurable by setting a number of parameters. through this means you can specify a cascading stylesheet to be incorporated into your xhtml/html files. the stylesheets are thorough but do not allow you to change very much of the resulting output. if you don't like the way the stylesheets format your xml, you can always write your own stylesheets, but i'm willing to bet you have better things to do with your time. as a person who is interested in open source software, learning how to write docbook files is a skill that will come in handy in the future.

fop

link: http://xml.apache.org/fop/

fop is an implementation of the formatting objects standard for transforming xml documents into documents intended for printing. it is mentioned here not because it is a primary open source software application, but because it is a java application and represents a nice way to create pdf documents. for example, given a java virtual machine, a docbook file, the docbook stylesheets, and fop, you can create pdf versions of your docbook documents. i have only had success with an earlier version, but it has proven indispensable a number of times. writing fo stylesheets is not easy, and that is why i have relied on the docbook fo stylesheets. learning how to use fop will give you good experience with java as well as xml files.

gnu tools

link: http://www.gnu.org/directory/

the gnu family of tools is wide and varied. probably the most important one is gcc, a c compiler. ironically, you can not compile the compiler unless you have a compiler. crazy. consequently, beginning the process of software development is a sort of chicken-and-egg problem. for example, while you might be able to download the gcc distribution, you will need gunzip and tar to uncompress the distribution, and you can't build gunzip nor tar without the compiler. no worry, many operating systems now come with an "unzipper" and a "de-tarrer".
frequently, flavors of unix (including linux) come with a version of gcc pre-installed, allowing you to upgrade accordingly. besides gcc, gunzip, and tar, there are a number of other very useful gnu tools including berkeley db (database library), binutils (miscellaneous binary utilities, especially a linker and assembler), bison (alternative to yacc), curl (internet user agent), emacs (text editor), fileutils (miscellaneous file utilities such as cp, mv, and rm), less (alternative to more), make (a sort of scripting language used to build source files), openssh and openssl (implementations of secure socket transactions), patch (applies diff files to source files), procmail (mail filter), sendmail (mail transfer agent), and wget (internet user agent). by the way, an interesting discussion can be had by comparing the philosophy of "open source software" and gnu software.

hypermail

link: http://www.hypermail.org/

hypermail converts email messages into sets of html files browsable by author, subject, date, thread, and attachment for the purpose of creating a mailing list archive. as alluded to earlier, open source software is about communities. email mailing lists are one of the primary, if not the primary, communication channels in the open source software world. as you develop open source software and manage a mailing list to keep everybody up-to-date, don't let those valuable pieces of information go to big byte heaven. capture those "perls" of wisdom by maintaining a mailing list archive with hypermail. hypermail is a c program driven by a number of configuration files and/or command line switches. pass hypermail raw smtp messages (unix mbox files) and it will create sets of browsable html files. the look, feel, and some functionality of the archives can be changed through templates and the configuration files. the only thing hypermail does not support is searching the resulting archive.
for that functionality you need an indexer, preferably an indexer that can index mbox files, but you usually end up using an indexer that can index html files.

koha

link: http://www.koha.org/

koha is an integrated library system with a growing user community. written in perl and using mysql as the underlying database, koha makes it simple to create and manage a small integrated library system. equipped with acquisitions, cataloging, circulation, and searching modules, it provides much of the functionality of traditional online catalogs. with the recent implementation of its z39.50 interface, it is easy to enter isbn numbers into the system, locate marc records, and have those records added. the user and system interfaces are simple and unencumbered, but alas, not very customizable. for many libraries, the catalog is the centerpiece of the operation. koha represents a major step in providing a catalog that is functional and usable for small libraries. as long as support continues, i expect koha to become a more viable option for medium and possibly large library collections. the obstacle is not technology. the obstacle is time and effort.

marc::record

link: http://marcpm.sourceforge.net/

this perl module is the perl module to use when reading and writing marc records. it is very well supported on the perl4lib mailing list, and a testament to the module's abilities is its incorporation into things like koha and net::z3950. if you are not familiar with object-oriented programming techniques in perl, then marc::record might take a bit of getting used to. on the other hand, learning to use marc::record will not only improve your programming abilities but it will educate you on the intricacies of the marc record data structure, a structure that was designed in an era of scarce disk space, non-relational databases, and little or no network connectivity.
mylibrary

link: http://dewey.library.nd.edu/mylibrary/

mylibrary is a user-driven, customizable interface to sets of library resources -- a portal. technically, mylibrary is a database-driven website application written in perl. it requires a relational database application as its foundation, and it currently supports mysql and postgresql. mylibrary grew out of a number of focus group interviews where people said they were suffering from information overload. to address this problem, mylibrary takes three essential components of librarianship (resources, patrons, and librarians) and tries to create relationships between them through the use of common controlled vocabularies such as a list of subject terms. like a library catalog, mylibrary provides the means to create collections of resources and classify these resources with a controlled vocabulary. unlike a library catalog, the system also allows librarians as well as patrons to be classified in this manner. by sharing a common set of controlled vocabulary terms, relationships between resources, patrons, and librarians can be made, thus addressing things like, "if you are like this, then these resources may be of interest", or "if you have this interest, then your librarian is...", or "these people have expressed an interest in this, therefore your patrons are...", or potentially even doing amazon-like things such as "people like you also used...".

mysql

link: http://www.mysql.com/

mysql is a relational database application, pure and simple. billed as "the world's most popular open source database", mysql certainly has wide support in the internet community. many people think mysql can't be very good because it is free, especially oracle database administrators. true, it does not have all the features of oracle, but nor does it require a specially trained person to keep it up and running. a part of the lamp suite, mysql compiles easily on a multitude of platforms.
it comes as a pre-compiled binary for windows. it has been used to manage millions of records and gigabytes of data. fast and robust, it supports the majority of people's relational database needs. on its down side, it does not currently support triggers, transactions, nor rollbacks. nor does it have a gui interface. at the same time, a program called phpmyadmin, a set of php scripts, can be used to manage, manipulate, and query mysql databases through a web browser window. if there were one technical skill i could teach the library profession, it would be the creation and maintenance of relational databases, and i would teach them how to use mysql.

perl

link: http://www.perl.com/

perl is a programming language. originally written to handle various systems administration tasks, perl's strength lies in its ability to manipulate strings (text). perl matured through the era of gopher but really started becoming popular with the advent of cgi scripting. perl has been ported to just about any computer operating system, has one of the largest numbers of support forums, and has been written about in more books than you can count. perl can be compiled into apache, making it possible to run perl scripts as fast as c programs. it easily connects to database applications through a module called dbi. it can be run from the command line. it can listen and respond to networking connections. it can call many aspects of your computer's operating system. in short, perl is mature and very robust. other very good programming languages exist and can do much of what perl can do. examples include other "p" languages such as php and python. these languages are becoming increasingly popular, especially php, but at the risk of starting a religious war, i advocate perl because of its very large support base and its cross-platform functionality.

swish-e

link: http://www.swish-e.org/

swish-e is an uncomplicated indexer/search engine.
once built, you feed the swish-e binary a configuration file and/or a set of command line switches to index content. this content can be individual files on a file system, files retrieved by crawling a website, or a stream of content from another application such as a database. the indexing half of swish-e is able to index specifically marked-up text in xml and html as fields for searching later. the indexes created by swish-e are portable from file system to file system. the same binary that creates the indexes can be used to search the indexes. swish-e supports relevance ranking, boolean operations, right-hand truncation, field searching, and nested queries. later versions of swish-e come with c and perl apis allowing developers to create cgi interfaces to these indexes. swish-e is an unsung hero. its inherently open nature allows for the creation of some very smart search engines supporting things like spelling correction, thesaurus intervention, and "best bets" implementations. of all the different types of information services librarians provide, access to indexes is one of the biggest ones. with swish-e librarians could create their own indexes and rely on commercial bibliographic indexers less and less.

xsltproc

link: http://xmlsoft.org/xslt/

xsltproc and its companion program, xmllint, are very useful applications for processing xml files with xsl. both applications are built from a c library that is becoming increasingly popular for parsing and processing xml documents. by feeding xsltproc an xsl stylesheet and an xml data file, you can transform the xml data file into any one of a number of text files, whether they be sql, (x)html, tab-delimited files, or even plain text files intended for printing. xmllint is a syntax checker. given an xml file, xmllint will check the validity of your xml file against a dtd.
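the typical invocations of both tools are short. in the sketch below the file names (stylesheet.xsl, data.xml, out.html) are hypothetical placeholders; the command line options are the standard ones:

```
# transform data.xml with stylesheet.xsl, writing the result to out.html
xsltproc -o out.html stylesheet.xsl data.xml

# validate data.xml against the dtd declared in its doctype,
# printing nothing unless there are errors
xmllint --valid --noout data.xml
```

running xmllint before xsltproc is a good habit; a stylesheet applied to invalid xml produces confusing output.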
by first installing the c library and mod_perl, you will be able to incorporate axkit into your apache http server, allowing you to transform xml data on the fly and serve it accordingly. swish-e also makes use of the c library. it is easy to use the docbook stylesheets with xsltproc to create xhtml versions of your docbook files. with xsltproc and a plain o' text editor, you can learn a whole lot about xml.

yaz and zebra

link: http://www.indexdata.dk/yaz/ and http://www.indexdata.dk/zebra/

yaz is a c library and resulting binary application implementing a z39.50/srw client. zebra is an indexer and z39.50 server. the yaz-client is a straightforward terminal application. zebraidx is the indexer, and it requires bunches o' configuration files. it is not as straightforward as other indexers, but its data can be served by zebrasrv. since the client is built on a library, it can (and has) been compiled into other tools such as php and apache. the yaz api also has a perl interface. yaz/zebra are definitely worth your time exploring if you want to make your collections available through z39.50. yes, you will spend time learning the in's and out's of z39.50 in the process, but that experience can be taken forward and applied in other venues where z39.50 is needed.

chapter 6. hands-on activities

introduction

this part of the manual outlines the hands-on aspects of the workshop. the activities outlined below were selected based on the software's popularity, the installation techniques they represent, the length of time and expertise they require, and their applicability to a library setting. this is not a comprehensive list of activities. a glaring omission may be the installation of a number of gnu tools, specifically some sort of text editor, the compiler gcc, and make. consequently, these activities assume the hosting (your) computer is duly equipped, or the activities can be accomplished on top of windows or unix/linux operating systems without compilation.
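since the activities assume gcc, make, and friends are already installed, it is worth checking before you begin. the script below only reports what it finds and changes nothing; the particular list of tools is an assumption based on the exercises that follow:

```shell
#!/bin/sh
# report whether the build tools assumed by these exercises are on the path
for tool in gcc make tar gunzip wget; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: MISSING"
    fi
done
```

if anything is reported missing, install it from your operating system's package collection (or from the gnu directory listed above) before continuing.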
for the most part, the activities are listed in priority order; many times you must install a previous package before a subsequent package can be installed, but this is not always the case. all the packages to be installed in these exercises are included on the cd. thus acquiring the software is a matter of using the copy command (cp) from the cd to your home directory, or acquiring the software from the distribution site. the choice is yours.

the installation of open source and gnu software follows a pattern. you usually:

1. download the software
2. uncompress and un-tar the package
3. run some sort of configuration program prior to compilation
4. compile (make) the software
5. test it
6. install it

downloading the software is usually done through an ftp or http interface. i like to get the url of the remote file and feed it to a program called wget which then does all the work. uncompressing and un-tarring the package is the work of gunzip and tar, respectively.

to configure for compilation there is usually some sort of file called configure, or in the case of perl modules you run the command "perl makefile.pl". in either case the script examines the contents of your downloaded package to make sure it is complete, examines your computer's hardware and software to make sure you have the necessary tools installed, and finally builds some sort of a "make" file, which is a script used to actually make the software. the most often used configuration option is "--prefix". this option denotes where the software will eventually be installed. by default, most software gets installed in /usr/local. this is usually a good place, but circumstances are not always the same from person to person, so running a configuration like this, ./configure --prefix=/disk/local, might be just what you need. when in doubt, try ./configure --help for a complete list of configuration options.

in almost all cases the next step is to run make and the software is built.
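the steps above can be exercised end-to-end without touching the network. the sketch below is hypothetical from top to bottom: it fabricates a tiny "hello-pkg" source distribution on the spot (its stub configure script only records --prefix, where real configure scripts probe your whole toolchain) and then walks through the canonical gunzip/tar/configure/make/make install dance:

```shell
# a runnable sketch of the install pattern; "hello-pkg" and all paths are made up
set -e
workdir=$(mktemp -d)
cd "$workdir"

# fabricate a miniature source distribution (stands in for step 1, downloading)
mkdir hello-pkg-1.0
printf '%s\n' \
  '#!/bin/sh' \
  '# stub configure: real ones probe your toolchain; this one only records --prefix' \
  'prefix=/usr/local' \
  'for a in "$@"; do case $a in --prefix=*) prefix=${a#--prefix=};; esac; done' \
  'sed "s|@PREFIX@|$prefix|g" Makefile.in > Makefile' \
  > hello-pkg-1.0/configure
chmod +x hello-pkg-1.0/configure
printf 'all:\n\techo "hello from hello-pkg" > hello\n\ninstall: all\n\tmkdir -p @PREFIX@/bin\n\tcp hello @PREFIX@/bin/hello\n' \
  > hello-pkg-1.0/Makefile.in
tar cf hello-pkg-1.0.tar hello-pkg-1.0
gzip hello-pkg-1.0.tar
rm -r hello-pkg-1.0

# the canonical steps
gunzip hello-pkg-1.0.tar.gz            # step 2: uncompress ...
tar xf hello-pkg-1.0.tar               # ... and un-tar
cd hello-pkg-1.0
./configure --prefix="$workdir/local"  # step 3: note --prefix instead of /usr/local
make                                   # step 4: compile
make install                           # step 6: install (this toy has no "make test")
cat "$workdir/local/bin/hello"         # verify: the last line printed is "hello from hello-pkg"
```

because --prefix points inside a directory you own, no root privileges are ever needed; the same trick is what lets the later exercises install everything under your home directory.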
if there are problems, then you can usually run "make clean" to remove the mistakes, re-run the configuration script, and try make again. once the program is built, hopefully without errors, you might be able to run "make test" which will examine whether or not the program works. finally, you can run "make install" to put the program onto your file system. access to /usr/local/bin, /usr/local/man, /usr/local/lib, /usr/local/etc, and /usr/local/include is usually restricted to root-level users. consequently, you might need root privileges for this last step, but remember the --prefix configuration option. using this option allows you to save the installation in your home directory. (hint, hint!)

installing and running perl

in this exercise you will install perl.

1. acquire the perl distribution by copying it from the cd, another location on your file system, or from the internet. save the distribution in your home directory.
2. unzip the distribution with this command: gunzip perl-[version].tar.gz.
3. un-tar the distribution with this command: tar xvf perl-[version].tar.
4. change directories to the newly created distribution: cd perl-[version].
5. configure the build process with this command: ./configure.
6. the configuration script will ask you lots o' questions. accept the default answers to all of them except when it comes to where perl and its supporting files will be installed. when asked these questions, specify your home directory like this: /home/[username] where [username] is your... username.
7. the configuration process takes a few minutes to complete, but when it is done simply run make by typing make at the command line.
8. the make process takes a number of minutes as well. perl is very well written and will most likely make (compile) without any problems.
9. test the make process using "make test".
10. install the software by running "make install".
to verify that everything worked correctly, you should be able to type "perl -v" from the command line to see how things got built and where they got saved.

you can now run your first perl script.

1. copy the file named hello-world.pl from the extras directory of the cd to your home directory.
2. examine the contents of the file with this command: more hello-world.pl.
3. run the script like this: perl hello-world.pl.
4. open the script with pico, a text editor: pico hello-world.pl.
5. change the contents of the print command.
6. save your changes by pressing: ctrl-x.
7. go to step #3 until satisfied.

ta da! you have successfully installed perl and run a perl program.

installing mysql

installing mysql is the goal of this exercise. be patient.

1. acquire the distribution from the cd, internet, or local file system.
2. uncompress and untar it: gunzip mysql-[version].tar.gz; tar xvf mysql-[version].tar.
3. change to the newly created directory: cd mysql-[version].
4. configure the installation and use many configuration options: ./configure --prefix=/home/[username]/mysql --with-unix-socket-path=/home/[username]/mysql/var/mysql.sock --with-tcp-port=[portnumber] --with-mysqld-user=[username] where [username] is your username and [portnumber] is a tcp port number assigned to you.
5. compile the application: make.
6. install the application: make install.
7. initialize mysql with this command: ./scripts/mysql_install_db.
8. give the root user of mysql a password: /home/[username]/mysql/bin/mysqladmin -u root password [username] where [username] is your username.
9. change directories to the location of your mysql installation and start the server: cd ~/mysql; ./bin/mysqld_safe &. (to stop the server run: ~/mysql/bin/mysqladmin -uroot -p shutdown.)

in this exercise you will install some sample data into mysql.

1. make sure the mysql server is running: ps -u[username] where [username] is your username.
2. create a new database: mysqladmin -uroot -p create mylibrary.
you will be prompted for a password; use the password from the previous exercise.

3. change directories to the extras directory of the cd and take a look at some sample data: more mylibrary.sql.
4. import the sample data into the new database: mysql -uroot -p mylibrary < mylibrary.sql.
5. run the terminal-based mysql client to begin to see the fruits of your labors: mysql -uroot -p mylibrary.
6. once given the mysql client prompt (mysql>), you can issue any of the following commands:

• select * from librarians;
• select name, email_address from librarians order by name;
• explain disciplines;
• select discipline_name from disciplines;
• explain items_librarians;
• select name, discipline_name from librarians, items_librarians, disciplines where librarians.librarian_id = items_librarians.librarian_id and items_librarians.discipline_id = disciplines.discipline_id order by discipline_name;

installing apache

here the basics of installing apache are outlined.

1. acquire the apache software from the cd, local file system, or the internet.
2. unzip the distribution: gunzip apache_[version].tar.gz.
3. un-tar the archive: tar xvf apache_[version].tar.
4. change into the newly created directory: cd apache_[version].
5. run the configuration script making sure you specify your home directory as the prefix. in this case, it is also a good idea to put apache in its own, separate directory like this: ./configure --with-port=[portnumber] --prefix=/home/[username]/apache where [portnumber] is the port number assigned to you and [username] is your username.
6. after the configuration files are created, make the server: make.
7. finally, install: make install.
8. you should now be able to start the server: ~/apache/bin/apachectl start.
9. verify that the server is working by connecting to it with your web browser. the server's url will be a combination of the ip address of your hosting computer and the value of the port described in step #5, such as: http://[ipaddress]:[portnumber]/.
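the --with-port and --prefix values chosen during ./configure end up in apache's configuration file, conf/httpd.conf, under the installation directory. if you ever need to change them after the fact, edit directives like the following (the port and username here are hypothetical; apache 1.3 uses the port directive, while the later 2.x releases use listen instead):

```apache
# excerpt of ~/apache/conf/httpd.conf -- values are examples only
ServerRoot "/home/username/apache"
Port 8080
DocumentRoot "/home/username/apache/htdocs"
```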
now, create your own home page.

1. copy the file named home.html from the cd's extras directory to apache's htdocs directory. the command will look something like this: cp home.html ~/apache/htdocs.
2. take a look at the new file: more home.html.
3. view the home page in your web browser with a url looking something like this: http://[ipaddress]:[portnumber]/home.html.
4. open home.html with your text editor: pico home.html.
5. season (edit) it to taste and save the changes: ctrl-x.
6. reload the home page: http://[ipaddress]:[portnumber]/home.html.

cvs

in this exercise you will install cvs.

1. acquire the cvs "distro" from the internet, cd, or local file system.
2. use the usual technique for unpacking, making, and installing the application: gunzip cvs-[version].tar.gz; tar xvf cvs-[version].tar; cd cvs-[version]; ./configure --prefix=/home/[username] where [username] is your username; make; make install.

while the syntax is a bit confusing, retrieving a cvs repository is not too difficult.

1. log into a repository, such as the one for mylibrary: cvs -d :pserver:anonymous@dewey.library.nd.edu:/usr/local/cvsroot login
2. when prompted for a password, simply press return because the user anonymous does not have a password.
3. download the repository: cvs -d :pserver:anonymous@dewey.library.nd.edu:/usr/local/cvsroot checkout mylibrary.

in this exercise you will edit a file from the repository and create a "patch".

1. use your favorite editor to change the contents of any text file in the repository: pico changelog.
2. save your changes: ctrl-x.
3. create a "diff" file or patch: cvs diff -u changelog > patch.txt
4. take a look at the patch. it is the file you would send to the developer for inclusion into the repository: more patch.txt.

hypermail

in this exercise you will compile and install hypermail. the process is pretty standard.

1. download or copy the hypermail archive from the internet or cd to your home directory: cp hypermail-[version].tar.gz ~/.
2. unzip the archive: gunzip hypermail-[version].tar.gz.
3. untar the archive: tar xvf hypermail-[version].tar.
4. change to the newly created directory: cd hypermail-[version].
5. configure the make process making sure to specify your home directory as the prefix: ./configure --prefix=/home/[username] where [username] is your username.
6. compile the program: make.
7. install the program: make install.
8. you should now be able to run the program and read a simple help text: hypermail --help.

do this exercise to create a browsable archive from a standard mail box file.

1. make sure you have installed apache, and make sure it is running.
2. create a new directory under your apache file system as a place to save your archive: mkdir ~/apache/htdocs/colldev.
3. from the extras directory of the cd, copy the supplied mail box file to the colldev directory: cp colldev.mbox ~/apache/htdocs/colldev.
4. create a browsable index of the mail box with the following (long) command: hypermail -d ~/apache/htdocs/colldev -m ~/apache/htdocs/colldev/colldev.mbox.
5. change directories to and list the items in the newly created directory. you should see bunches o' files as well as an index.html file: cd ~/apache/htdocs/colldev/; ls.
6. finally, view the fruits of your labors in your web browser: http://[ipaddress]:[portnumber]/colldev/.

if you have previously installed swish-e, then you can do this exercise where you will create a searchable index of your browsable archive.

1. install swish-e.
2. copy a swish-e configuration file from the extras directory of the cd to the colldev directory: cp swish-colldev.cfg ~/apache/htdocs/colldev.
3. take a quickie look at the file. it contains extra instructions for the indexer (swish-e): more swish-colldev.cfg
4. create an index: swish-e -c ./swish-colldev.cfg -i /home/[username]/apache/htdocs/colldev/*.html where [username] is your username.
5. you should now be able to search the index with simple swish-e commands: swish-e -w books.
6. install a cgi script from the extras directory allowing you to search the index: cp swish-colldev.cgi ~/apache/cgi-bin.
7. edit the script making sure its very first line points to your perl binary: pico ~/apache/cgi-bin/swish-colldev.cgi. the very first line should look something like this: #!/home/[username]/bin/perl where [username] is your username.
8. edit the line in the script that defines where the index resides. the line should read something like this: my $index = '/home/[username]/apache/htdocs/colldev/index.swish-e'; where [username] is your username.
9. make the script executable: chmod +x swish-colldev.cgi.
10. finally, give the script a whirl: http://[ipaddress]:[portnumber]/cgi-bin/swish-colldev.cgi.

because hypermail created structured data with meta tags, and because swish-e was configured to extract the meta tags and save them to specific fields, it is possible to do field searching against this email archive using queries like "subject = book".

marc::record

in this exercise you will install marc::record.

1. obtain the marc::record distribution from the cd, local file system, or internet.
2. unzip it: gunzip marc-record-[version].tar.gz.
3. un-tar it: tar xvf marc-record-[version].tar
4. change into the newly created directory: cd marc-record-[version].
5. do the standard perl installation procedure: perl makefile.pl; make; make test; make install.
6. take a look at the perl documentation: perldoc marc::record.

i wish they were all this straightforward.

next, you will use marc::record to extract author and title information from a set of marc records.

1. copy the files named marc-read.pl and marc-records.mrc from the extras directory of the cd to your home directory: cp marc-read.pl ~/ and cp marc-records.mrc ~/.
2. take a peek at both of the files: more marc-read.pl and more marc-records.mrc. the first file is a simple perl script to read author, title, and subject data from a file such as the second file.
3. give the marc-read.pl script a whirl: perl ./marc-read.pl marc-records.mrc | more.

marc::record can also write marc records. here is an example demonstrating how:

1. copy the file named marc-write.pl from the extras directory of the cd to your home directory: cp marc-write.pl ~/.
2. take a look at the insides of the file: more marc-write.pl.
3. run the script: perl marc-write.pl. (while doing your data-entry, you might have to press ctrl-h to backspace and correct any mistakes you make.)
4. go to step #3 until you get tired.
5. examine the fruits of your labors by feeding your marc-write.pl output file to marc-read.pl.

if you have installed yaz, then you can do the following exercise to download marc records from the library of congress.

1. make sure you have installed the yaz tool kit.
2. install the perl modules named event and net::z3950; they are found on the cd. both of these modules install in the normal perl fashion: gunzip, untar, perl makefile.pl, make, make test, make install.
3. copy the file named marc-get.pl from the extras directory of the cd to your home directory: cp marc-get.pl ~/.
4. take a look at the file's insides: more marc-get.pl, and notice how the remote database, server, and port are defined.
5. equip yourself with a few isbn numbers and feed them to marc-get.pl: perl marc-get.pl > catalog.mrc.
6. browse your newly created catalog: perl marc-read.pl catalog.mrc.

swish-e

use this process to install swish-e.

1. acquire the swish-e distribution from the cd, file system, or internet.
2. uncompress and untar the distribution: gunzip swish-e-[version].tar.gz; tar xvf swish-e-[version].tar.
3. change to the newly created directory: cd swish-e-[version].
4. configure the make process being sure to specify your home directory as the prefix: ./configure --prefix=/home/[username] where [username] is your username.
5. compile, test, and install it: make; make test; make install.
6. verify that things worked by running the newly created executable: ~/bin/swish-e -h. you should see a bunch o' command line switches that swish-e can use.

now, let's index and search some data.

1. copy the file named alawon.tar.gz from the extras directory to your home directory: cp alawon.tar.gz ~/.
2. uncompress and untar the archive: gunzip alawon.tar.gz; tar xvf alawon.tar.
3. change to the newly created directory and examine any of the files using the more command. each of the files is a little newsletter regularly put out by the american library association.
4. make sure you are in the alawon directory and index the newsletters like this: swish-e -i *.txt.
5. swish-e will output some diagnostic information. when it is complete, list the contents of the alawon directory and notice the newly created files named index.swish-e and index.swish-e.prop. combined, these files are your index.
6. search the index like this: swish-e -w [term] where [term] is a word or quoted phrase such as books or "library of congress". swish-e should return a list of scores, file names, "titles", and sizes for each file that matches your query.

while swish-e can be run from the command line, its real power is demonstrated through one of its programming interfaces. in this exercise you will install swish-e's perl module and search the index with a perl script.

1. change into swish-e's distribution directory: cd ~/swish-e-[version].
2. change into the distribution's perl directory: cd perl.
3. use the standard perl installation technique: perl makefile.pl; make; make test; make install. when complete you should be able to read the perl documentation for swish-e: perldoc swish::api.
4. return to the alawon directory: cd ~/alawon.
5. copy the file named swish-alawon.pl from the extras directory of your cd to the alawon directory: cp swish-alawon.pl ~/alawon.
6. take a look at the script: more swish-alawon.pl.
7. run the script using queries you tried in the previous exercise: perl swish-alawon.pl. the resulting output should be a bit prettier.

yaz

here you will compile and install yaz.

1. acquire the "distro" from the cd, file system, or internet.
2. uncompress and untar the distribution: gunzip yaz-[version].tar.gz; tar xvf yaz-[version].tar.
3. change directories to the newly created directory: cd yaz-[version].
4. configure, making sure to specify your home directory as the prefix: ./configure --prefix=/home/[username] where [username] is your username.
5. make the application: make.
6. install it: make install.

in this exercise you will search a z39.50 target with the yaz client.

1. run the yaz client: yaz-client.
2. open a connection to the library of congress: open tcp:z3950.loc.gov:7090/voyager.
3. do a simple free text search: f origami.
4. display the first record: show 1.
5. do a simple phrase search: f "structures of experience"
6. show the first record: show 1.
7. do an isbn search: f @attr 1=7 [isbn] where [isbn] is an isbn number.

koha

in this exercise you will explore koha.

1. open your web browser to the patron url given to you in the workshop. simply explore and play with the interface searching for items, reading detailed records, and creating an account for yourself.
2. open your web browser to the librarian url given to you in the workshop. notice the components that are available. play with the librarian interface, specifically the acquisitions module, and try adding a few records via the z39.50 interface or batch marc records load process. (remember, you might have a set of marc records to play with from a previous exercise.)
3. return to the patron interface and search for the items you added to the collection in step #2.

mylibrary

in this exercise you will explore mylibrary.

1. open your web browser to the patron url given to you in the workshop. explore the interface noticing how the searching, browsing, and account creation/customization features operate.
2. in a second browser window, open the administrative interface with the url given to you in the workshop.
3. select the global message option from the administrative interface, and use the resulting form to edit/submit the content of a global message.
4. make the patron interface active by selecting your first browser window and reload the page. you should see the edits you made in the administrative interface.
5. return to the administrative interface and use the message from the librarian option to edit/submit the content of the message for the same discipline you chose when creating a mylibrary account in step #1.
6. return to the patron interface, reload the page, and notice how the content of your page changes.
7. return to the administrative interface and create a link to a new information resource by using the reference shelf, databases, or electronic journals menu options.
8. again, return to the patron interface, customize the content of your page, and notice how the resource you just added in the administrative interface is now an option in the patron interface.
9. make the administrative interface active, and use the create static pages option to create browsable lists of the information resources in the underlying mylibrary database.
10. make the patron interface active, and browse the newly created lists by using the all resources link.
11. make the administrative interface active, and use the discipline defaults menu option to create the defaults for a discipline of your choice.
12. make the patron interface active. log out. create a new account making sure you select the discipline you just modified, and notice how the defaults you created are manifested.

xsltproc

in this exercise you will install libxml2 and libxslt, the libraries necessary to run xsltproc. the process adheres pretty much to the standard gnu installation process: configure, make, make install.
1. acquire the libxml2 library from the cd, local file system, or internet and save it in your home directory.
2. uncompress the distribution: gunzip libxml2-[version].tar.gz.
3. un-tar the distribution: tar xvf libxml2-[version].tar
4. change directories accordingly: cd libxml2-[version].
5. configure the build process remembering to specify your home directory as the prefix: ./configure --prefix=/home/[username] where [username] is your... username.
6. build the library: make
7. install the library: make install. when you are finished with this step there ought to be a directory in your home directory named lib, and lib should contain files whose names begin with libxml2.

now you will make a binary application that uses the libxml2 library, xsltproc.

1. acquire the libxslt distribution from the cd, local file system, or the internet and save it in your home directory.
2. uncompress the distribution: gunzip libxslt-[version].tar.gz.
3. un-tar the distribution: tar xvf libxslt-[version].tar.
4. change into the newly created directory: cd libxslt-[version].
5. configure, making sure to specify your home directory as the prefix: ./configure --prefix=/home/[username] where [username] is your username.
6. compile like this: make.
7. install like this: make install.
8. when you are done you should have a binary named xsltproc in the bin directory of your home directory. you can run the command like this: xsltproc.

in this exercise you will transform an xml document into some other type of document using an xsl stylesheet and xsltproc.

1. copy the files named hello-world.xml and hello-world.xsl from the extras directory of the cd to your home directory.
2. take a look at the files like this: more hello-world.xml and more hello-world.xsl.
3. do an xml transformation like this: xsltproc hello-world.xsl hello-world.xml.
4. open hello-world.xml in a text editor: pico hello-world.xml.
5. add a new message to the file and exit the editor by pressing ctrl-x.
6. go to step #3 until satisfied.
you can get a lot of use out of xsltproc, but the fact that it is distributed as a library that can be compiled into other applications makes it even more powerful.

chapter . gnu general public license

version 2, june 1991

copyright (c) 1989, 1991 free software foundation, inc. 59 temple place, suite 330, boston, ma 02111-1307 usa

everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

preamble

the licenses for most software are designed to take away your freedom to share and change it. by contrast, the gnu general public license is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. this general public license applies to most of the free software foundation's software and to any other program whose authors commit to using it. (some other free software foundation software is covered by the gnu library general public license instead.) you can apply it to your programs, too.

when we speak of free software, we are referring to freedom, not price. our general public licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

to protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. these restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.

for example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. you must make sure that they, too, receive or can get the source code. and you must show them these terms so they know their rights.
we protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. if the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.

finally, any free program is threatened constantly by software patents. we wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. to prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

the precise terms and conditions for copying, distribution and modification follow.

gnu general public license terms and conditions for copying, distribution and modification

0. this license applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this general public license. the "program", below, refers to any such program or work, and a "work based on the program" means either the program or any derivative work under copyright law: that is to say, a work containing the program or a portion of it, either verbatim or with modifications and/or translated into another language. (hereinafter, translation is included without limitation in the term "modification".) each licensee is addressed as "you".

activities other than copying, distribution and modification are not covered by this license; they are outside its scope.
the act of running the program is not restricted, and the output from the program is covered only if its contents constitute a work based on the program (independent of having been made by running the program). whether that is true depends on what the program does.

1. you may copy and distribute verbatim copies of the program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this license and to the absence of any warranty; and give any other recipients of the program a copy of this license along with the program.

you may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

2. you may modify your copy or copies of the program or any portion of it, thus forming a work based on the program, and copy and distribute such modifications or work under the terms of section 1 above, provided that you also meet all of these conditions:

a) you must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

b) you must cause any work that you distribute or publish, that in whole or in part contains or is derived from the program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this license.

c) if the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this license.
(exception: if the program itself is interactive but does not normally print such an announcement, your work based on the program is not required to print an announcement.)

these requirements apply to the modified work as a whole. if identifiable sections of that work are not derived from the program, and can be reasonably considered independent and separate works in themselves, then this license, and its terms, do not apply to those sections when you distribute them as separate works. but when you distribute the same sections as part of a whole which is a work based on the program, the distribution of the whole must be on the terms of this license, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the program.

in addition, mere aggregation of another work not based on the program with the program (or with a work based on the program) on a volume of a storage or distribution medium does not bring the other work under the scope of this license.
3. you may copy and distribute the program (or a work based on it, under section 2) in object code or executable form under the terms of sections 1 and 2 above provided that you also do one of the following:

a) accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of sections 1 and 2 above on a medium customarily used for software interchange; or,

b) accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of sections 1 and 2 above on a medium customarily used for software interchange; or,

c) accompany it with the information you received as to the offer to distribute corresponding source code. (this alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with subsection b above.)

the source code for a work means the preferred form of the work for making modifications to it. for an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. however, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.
if distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.

4. you may not copy, modify, sublicense, or distribute the program except as expressly provided under this license. any attempt otherwise to copy, modify, sublicense or distribute the program is void, and will automatically terminate your rights under this license. however, parties who have received copies, or rights, from you under this license will not have their licenses terminated so long as such parties remain in full compliance.

5. you are not required to accept this license, since you have not signed it. however, nothing else grants you permission to modify or distribute the program or its derivative works. these actions are prohibited by law if you do not accept this license. therefore, by modifying or distributing the program (or any work based on the program), you indicate your acceptance of this license to do so, and all its terms and conditions for copying, distributing or modifying the program or works based on it.

6. each time you redistribute the program (or any work based on the program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the program subject to these terms and conditions. you may not impose any further restrictions on the recipients' exercise of the rights granted herein. you are not responsible for enforcing compliance by third parties to this license.
7. if, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this license, they do not excuse you from the conditions of this license. if you cannot distribute so as to satisfy simultaneously your obligations under this license and any other pertinent obligations, then as a consequence you may not distribute the program at all. for example, if a patent license would not permit royalty-free redistribution of the program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this license would be to refrain entirely from distribution of the program.

if any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.

it is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

this section is intended to make thoroughly clear what is believed to be a consequence of the rest of this license.
8. if the distribution and/or use of the program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the program under this license may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. in such case, this license incorporates the limitation as if written in the body of this license.

9. the free software foundation may publish revised and/or new versions of the general public license from time to time. such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

each version is given a distinguishing version number. if the program specifies a version number of this license which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the free software foundation. if the program does not specify a version number of this license, you may choose any version ever published by the free software foundation.

10. if you wish to incorporate parts of the program into other free programs whose distribution conditions are different, write to the author to ask for permission. for software which is copyrighted by the free software foundation, write to the free software foundation; we sometimes make exceptions for this. our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.

no warranty

11. because the program is licensed free of charge, there is no warranty for the program, to the extent permitted by applicable law. except when otherwise stated in writing the copyright holders and/or other parties provide the program "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. the entire risk as to the quality and performance of the program is with you. should the program prove defective, you assume the cost of all necessary servicing, repair or correction.

12. in no event unless required by applicable law or agreed to in writing will any copyright holder, or any other party who may modify and/or redistribute the program as permitted above, be liable to you for damages, including any general, special, incidental or consequential damages arising out of the use or inability to use the program (including but not limited to loss of data or data being rendered inaccurate or losses sustained by you or third parties or a failure of the program to operate with any other programs), even if such holder or other party has been advised of the possibility of such damages.

end of terms and conditions