COMPUTER BASED ACQUISITIONS SYSTEM AT TEXAS A&I UNIVERSITY

Ned C. MORRIS: Texas A&I University, Kingsville, Texas

Journal of Library Automation Vol. 1/1, March 1968

In September, 1966, a system was initiated at the University which provides for the use of automatically produced multiple orders and for the use of change cards to update order information on previously placed orders already on disk storage. The system is geared to an IBM 1620 Central Processing Unit (40K) which processed a total of 10,222 order transactions the first year. It is believed that the system will lend itself to further development within its existing framework and that it will be capable of handling future work loads.

In 1925, the library at Texas A&I University (first known as South Texas State Teachers College and later as Texas College of Arts and Industries) had an opening day collection of some 2,500 volumes. By the end of August, 1965, the library's collection had grown to 142,362 volumes, including 3,597 volumes purchased that year. The book budget doubled in September of 1965, and the acquisitions system was severely taxed as the library added by purchase a total of 6,562 volumes. After one full year under the mechanized system discussed below, a total of 9,062 volumes had been added by purchase. Counting gifts, transfers, and cancellations, the computer actually handled 10,222 order transactions the first year.

The computer-based acquisitions system now in operation was initiated in September of 1966, eleven months after the decision was made to mechanize the process. The library had already experienced successes in computerizing the circulation and serial systems and, because a rapidly expanding book budget had caused the old traditional type of acquisitions system to become unwieldy and seemingly obsolete, it seemed inevitable that the installation of a computerized acquisitions system would follow.
Furthermore, it was agreed that acquisitions could make use of the computer at no additional cost, since the library was already paying its share of the machine rental costs for circulation and serials.

Following the decision to go ahead with the project of computerizing the acquisitions system, a preliminary survey was made of the literature on the subject, and a plan for approaching the task was conceived. Briefly, the plan hinged upon the idea of an automatically produced multiple order form similar to that proposed by IBM (1). It also provided for use of the change card, reported by Becker to be "a unique and very important part of the Penn State System" (2). It further provided for the automatic production of a weekly books on order list or "Processing Information List" similar to that reported by Schultheiss to be in use at the University of Illinois Libraries (3). The plan was written in the form of a proposal which was then sent with an accompanying flow chart to the director of the campus computer center for consideration. The basic proposal for the new system was accepted, and work toward implementation of the system was begun immediately. As was expected, the plan and flow chart had to be altered in some areas as the project progressed.

As a first step, the book order request form was redesigned to serve as a work slip in the verification routine, as a source document for keypunching, and, in the end, as notification to the requester that a requested item had been cataloged. The redesigned request card consisted of a single record form printed on one side of an IBM tab card (Figure 1). The only objection to usage of this form appeared to be that the requester would have no record of his request unless he produced one for himself. However, this form was adopted because it was judged less expensive.

Fig. 1. Book Request Form
Fig. 7. Example of Computer Produced Financial Statement (a fund-by-fund table listing, for each department, the budgeted amount and related encumbrance, expenditure, and balance figures, with totals and the value of gifts and transfers; the figures are not reproduced here)

(Figure 8) for budgetary purposes. The computer also gives credit to the appropriate fund for items cancelled. This accounting is accomplished through the use of one of the change cards mentioned above. The "books on order" list mentioned above is necessarily cumulative to include all new orders processed, since all new requests are checked against this list for possible duplications.
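In modern terms, the cumulative list behaves as a keyed file, and the duplication check is a lookup on main entry before a new order is accepted. A minimal Python sketch, for illustration only (the names and structure are hypothetical, not the original 1620 program):

```python
# Illustrative sketch: the cumulative "books on order" list kept as a
# dictionary keyed by main entry, so each new request can be checked
# for possible duplication before it is ordered.

books_on_order = {}  # main entry -> order record

def place_order(order_no, main_entry, title):
    """Accept a new order unless the title already appears on the list."""
    if main_entry in books_on_order:
        return False  # possible duplication: refer the request back
    books_on_order[main_entry] = {
        "order_no": order_no,
        "title": title,
        "status": "ordered",
    }
    return True
```

Because the list is cumulative, the same lookup serves every later request for the title until the entry is finally purged after cataloging.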
This list always provides current information on the status of an order, enabling the user to find out to what stage in the total process a given order has progressed. Non-book materials are differentiated from book materials through use of Form Codes (Figure 9) which appear on the "books on order" print-out.

Code  Department
AED   Agricultural Education
AG    Agriculture
ART   Art
BIO   Biology
BA    Business Administration
CHM   Chemistry
ED    Education
EN    Engineering
ENG   English
GEO   Geography
GOV   Government
HST   History
HPE   Health and Physical Education
HE    Home Economics
IA    Industrial Arts
JRN   Journalism
MTH   Mathematics
MDL   Modern Language
MUS   Music
PHY   Physics
PSY   Psychology
SOC   Sociology
SPE   Speech
GEN   General
GFT   Gifts and Transfers

Fig. 8. Fund Codes Used in the Acquisitions System

Form Code
MICROFORMS   M
FILMS        C
FILMSTRIPS   S
RECORDS      D
TAPES        T
MAPS         A
MANUSCRIPTS  U
SERIALS      P

Fig. 9. Form Codes Used for Non-book Materials

USE OF CHANGE CARDS

If a dealer reports an item unavailable, cancellation data is noted on the first change card, which then is sent to the Computer Center. Here cancellation data is keypunched into the change card, and the change card is fed into the computer to remove all information pertaining to the order from disk storage and consequently from the "books on order" list. The second change card is then discarded. If a dealer supplies an item, actual cost and date received are indicated on the first change card, which is then returned to the Computer Center. Here cost and date received are keypunched into the change card, and the change card is processed through the computer to record receipt of the item and to adjust the corresponding account if necessary. The second change card then accompanies the newly acquired item through the various stages of cataloging.
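The two change-card actions just described reduce to a delete and an update against the stored order record. A hedged sketch in modern Python, not the 1620's actual code, with all names hypothetical:

```python
# Sketch of the change-card actions above (illustrative names; the real
# system stored these records in an IBM 1620 disk file).

on_order = {}  # order number -> {"fund": code, "est_price": dollars, ...}
funds = {}     # fund code -> {"encumbered": dollars, "expended": dollars}

def cancel_order(order_no):
    """Cancellation change card: remove the order from storage and
    credit the estimated price back to the appropriate fund."""
    rec = on_order.pop(order_no)
    funds[rec["fund"]]["encumbered"] -= rec["est_price"]

def receive_order(order_no, cost, date_received):
    """Receipt change card: record actual cost and date received, and
    adjust the account from the estimate to the invoiced cost."""
    rec = on_order[order_no]
    f = funds[rec["fund"]]
    f["encumbered"] -= rec["est_price"]
    f["expended"] += cost
    rec.update(cost=cost, received=date_received, status="received")
```

Either action leaves the fund balances and the "books on order" list in agreement, which is the point of routing every change card through the computer.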
At the appropriate time during the cataloging routine, the call number is written on the second change card. When the catalog cards are ready to be filed in the public catalog, the second change card is returned to the Computer Center, where the call number is keypunched into it. From here this change card, usually in a group of several hundred, is fed into the computer and a list of current acquisitions (Figure 10) is printed out. The second change card then is coded so as to make possible the deletion from disk storage of all information pertaining to an order which has appeared on an acquisitions list for as long as two months after the item has been cataloged. This allows the Catalog Department ample time to file cards in the public catalog, thus reducing the possibility of unintentional duplication. Once deleted, the item no longer appears on the "books on order" list.

USE OF FIVE-PART ORDER FORM

Part one (the original) of the order is sent to the dealer. Part two is sent to the catalog department for use as an order for cards from the Library of Congress. Part three differs from part two in color only and serves primarily as a record of the Library of Congress card order. Part four, with part five and corresponding change cards, is filed alphabetically, first by dealer and then by main entry. Part four serves as a report form on which to record dealer reports and other messages pertaining to the status of the item on order. In the event that an order is cancelled, part four is sent to the catalog department as a signal that Library of Congress cards may also be cancelled. Part four is discarded if a claim or cancel procedure is negated by receipt of an ordered item. Part five, with part four and corresponding change cards, is filed in the same manner as part four above. When an item is received and paid for, cost and date received are recorded on this copy of the order.
Part five, designated as the Control Copy, then is filed by order number in the library's "control" file for possible use in the identification of items already approved for payment which may no longer appear on the "books on order" list. It further provides official evidence that purchase was duly authorized.

Fig. 10. Example of Computer Produced Current Acquisitions List (each entry shows call number, author, short title, publisher, and date, e.g. "150.1943/B78B  BROADBENT D E  BEHAVIOR  BASIC BOOKS 1961"; the full sample page is not reproduced)

GIFTS AND TRANSFERS

A gift item is processed in the same manner as a purchase except that part one of the order is discarded.
An estimate of the value of each title is submitted so that the total value of gifts can be produced automatically for a given period. An item transferred from the Bookstore or any other department of the institution is processed in the same manner as a gift, except that the actual cost of the item is used rather than an estimate.

STANDING AND CONTINUATION ORDERS

A standing or continuation order for a series is keypunched with coded information which causes it to appear indefinitely on the "books on order" list. The two-fold purpose of this is to eliminate the possibility of unintentional duplication and to serve as evidence that the order was authorized. An item actually received on a standing or continuation order basis is processed as a confirmation order and is assigned an order number different from the one assigned the original order. In this way, the item received will appear on the "books on order" list next to the original entry only as long as it takes to catalog the item.

CLEARANCE OF INVOICES AND FINAL ROUTINES

Upon receipt of shipment and corresponding invoice, an item is accepted (if as ordered) and the date of acceptance and cost (as per invoice) are noted on the first change card. This change card is then returned (usually in a group of several hundred) to the Computer Center, where cost and receipt date are keypunched into it. This information is fed into the computer and accurate accounting results. The next print-out of the "books on order" list will indicate that the item was received on the date noted. Part four of the order is discarded. Part five of the order, bearing cost and date received, is filed by order number in the "control" file. The second change card and the original request card accompany the book to the catalog department.
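The standing-order convention described above — the original entry remains on the list indefinitely, while each part received is handled under its own confirmation order number and purged after cataloging — can be sketched as follows (hypothetical names, not the production program):

```python
# Illustrative sketch: standing orders carry a flag that exempts them
# from the normal purge, while each part received is entered as a
# separate confirmation order that disappears once the item is cataloged.

on_order = {}

def add_standing_order(order_no, entry):
    """A standing order stays on the list indefinitely."""
    on_order[order_no] = {"entry": entry, "standing": True}

def add_confirmation_order(order_no, standing_no):
    """Record a part received on a standing order under a new number,
    so it appears on the list next to the original entry."""
    entry = on_order[standing_no]["entry"]
    on_order[order_no] = {"entry": entry, "standing": False}

def purge_after_cataloging(order_no):
    """Confirmation orders are deleted after cataloging; the original
    standing order remains as evidence of authorization."""
    if not on_order[order_no]["standing"]:
        del on_order[order_no]
```

The standing flag is what lets one deletion routine serve both kinds of entry without ever removing the authorizing record.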
Book pockets are pasted in the books at this point to accommodate the second change card and, later, the IBM circulation card used by the library's circulation department. At the end of the cataloging routine, the original request card is sent to the requester as notification that the item is ready for use.

DISCUSSION

No attempt has been made to compare costs of the new system to the old. On the surface, however, there appears to be considerable saving in time and clerical personnel. Automatic accounting alone results in a net gain of approximately twenty hours per week in clerical time which can be applied to other necessary manual tasks. Manual typing of orders has been completely eliminated with the use of the computer produced order, resulting in further savings in clerical time.

Limitations of the new system are about the same as those encountered by other mechanized systems, the limiting factors of space in input and electronic storage being most obvious. The present disk storage equipment is capable of storing data on approximately thirteen thousand book orders, and this capacity could be doubled with the addition of another disk unit. The problem of disk storage space is not critical at present because removal of order information from storage at two-month intervals after the cataloging process creates additional space for new orders.

Although the new system has definite advantages, perfection was never expected nor does it exist. The human error factor in the book verification and keypunching processes shows up now and then. Experience bears out the fact that output is only as perfect as input. Nevertheless, there has been a noticeable gain in accuracy with the installation of the new system, mainly because the more exacting method of procedure helps in detecting an error before it is beyond correction. Even keypunching accuracy has been much greater than expected.
CONCLUSION

The new acquisitions system at Texas A&I University does the job that it was designed to do. It has resulted in faster clearance of orders, better control over unintentional duplication of orders, and automatic accounting. It is believed that the system will lend itself to further development within its existing framework and that it will be capable of handling future work loads.

ACKNOWLEDGEMENTS

Much of the credit for the success of the program goes to Dr. J. R. Guinn, Professor and Chairman of the Department of Electrical Engineering. His time in reviewing the original proposal and his subsequent efforts toward the implementation of the project resulted in a workable, practical system. Credit goes also to Mr. Patrick Barkey, former Librarian at Texas A&I University (then known as Texas College of Arts and Industries), for the encouragement he gave to the writer and for the support he gave to the project. Appreciation is extended also to Mr. R. C. Janeway, Librarian at Texas Technological College, for submitting some worthy ideas on design of order forms and on acquisitions procedures in general.

REFERENCES

1. International Business Machines: "Mechanized Library Procedures," IBM Data Processing Application Manual (White Plains: IBM, n.d.), p. 11.
2. Becker, Joseph: "System Analysis-Prelude to Library Data Processing," ALA Bulletin, 59 (March 1965), 296.
3. Schultheiss, Louis A.: "Data Processing Aids in Acquisitions Work," Library Resources and Technical Services, 9 (Winter 1965), 68.
4. Cox, Carl C.: "Mechanized Acquisitions Procedures at the University of Maryland," College and Research Libraries, 26 (May 1965), 232.

BROWN UNIVERSITY LIBRARY FUND ACCOUNTING SYSTEM

Robert WEDGEWORTH: Brown University Library, Providence, R. I.

The computer-based acquisitions procedures which have been developed at the Library provide more efficient and more effective control over fund accounting and the maintenance of an outstanding order file.
The system illustrates an economical, yet highly flexible, approach to automated acquisitions procedures in a university library.

The Fund Accounting System of the Brown University Library was initiated on the basis of a program developed in April, 1966. Subsequently, it was decided to implement the program in the fall of that year. The necessary in-house equipment, namely, an IBM 826 Typewriter Card Punch and an IBM 026 Keypunch, was placed on order along with new six-part order forms. About the same time an agreement was reached with the Administrative Data Processing Office of the University (Tabulating) which would provide for rental time on their IBM 1401, 12K system with three magnetic disks and four magnetic tape-storage units. The services of a part-time programmer were also secured through this office. The system became fully operational on December 1, 1966.

The primary objective of the project was to establish more efficient and more effective control over the approximately 150 fund accounts administered by the Order Department of the University Library. In addition, it seemed that a number of by-products were possible. Among these were statistical information for management and a file of bibliographical records from which a new accessions list could be drawn on a regular basis. The system was to accommodate the payment of all invoices to be posted against the aforementioned accounts. These include monographic and serial publications as well as supplies and equipment. However, records of outstanding orders were to be maintained for monographic publications only. Although the basic routines were to remain much the same, some minor adjustments were necessary to accommodate the new machine system. Also, several files of dubious value to the new system were to be maintained in order to gain empirical evidence as to their worth.
This report is presented as a record of an attempt to develop an economical, yet highly flexible approach to the automating of acquisitions procedures of a university library.

Perhaps the scope of the computer-based acquisitions procedures at Brown may be determined more easily relative to three recently reported systems of varying complexity. One of the best surveys of automated university library acquisitions systems appears in the project report of the University of Illinois, Chicago Campus (1). However, two of the systems summarized here are more recent. The University of Michigan was included in the Illinois literature survey, but the first full description to be published appeared just recently.

Automated acquisitions procedures have been in operation at the University of Michigan Library since June, 1965 (2). The system features a list of items produced by computer from punch cards in which order information has been recorded. This list is produced on a monthly basis with semi-weekly cumulative supplements. The computer also produces status report cards. These are punch cards, containing summarized order information, which travel with the book and at appropriate processing stages are coded and returned to the computer in order to up-date the status code in the processing list. Thus by checking the status code one can determine that a book has been received, received and paid, or cataloged. Claim notices are automatically produced for items which remain on order for longer than the predetermined period. In addition to creating and maintaining full financial records and compiling selected statistics, the system will produce specialized acquisitions lists on demand.

Yale University Library creates a machine readable record of a request before it is searched or ordered (3). As a result, the status-monitoring system is almost immediately effective.
An IBM 826 Typewriter Card Punch is used to type purchase orders, and the IBM 357 Data Collection System is used to monitor the progress of an item through the system. The process information list is produced weekly with daily supplements. Automatic claiming and financial record maintenance are also products of the system. Moreover, numerous statistics are planned for management purposes.

The fund control system reported by the University of Hawaii features financial accounting for book purchases based on pre-punched cards corresponding to purchase orders typed (4). The list price is keypunched into the appropriate card in a separate operation and used to encumber funds. Upon receipt of the book the invoice is matched with the appropriate punch card, and after actual cost is keypunched the card is used to up-date the account.

The Michigan and Yale systems incorporate all of the major features of operational university library automated acquisitions systems. Foremost among them are the list of items being processed and its coordinate monitoring system. The cost of creating and maintaining such a file was prohibitive for Brown. Brown, Michigan and Hawaii generate a machine record after searching. Unlike Michigan and Yale, Brown and Hawaii do not have "total" acquisitions systems plans. At Brown serials control is not included. At Hawaii fund accounting is the only task of the system. Also, Brown differs from Michigan and Yale in that the claiming procedure merely notifies the department that certain items are overdue. The Brown system is certainly not as economical as that of Hawaii, but the use of the Typewriter Card Punch creates a highly flexible and easily expanded system for the difference in cost.

MANUAL FILES AND PROCEDURES

The manual routines of the Order Department are based upon the maintenance of four basic files. The file documents are all parts of the six-part purchase order form.
The Outstanding Order Search File is an alphabetical card file representing unfilled orders, requests to search for items, and inquiries for bibliographical information. This file is virtually independent of other routines, thus making it feasible for it to be merged with the file of items waiting to be cataloged. The Processing File consists of outstanding orders filed first by book dealer, and second by order number. This file is used to check in shipments of books, to record reports on orders and to record claims. The Numerical Control File is an order number sequence file containing one copy of every order typed regardless of its ultimate disposition. It provides rapid access to information regarding retrospective orders. The Fund File is a file of completed or cancelled transactions filed first by fund name and second by order number. The latter two files were thought to be of dubious value to the new system. However, it was agreed to maintain both for the time being.

In order to accommodate the Fund Accounting system, the procedures developed feature two basic routines based on the presence or absence of a unique order number.

Unique Order (Figure 1)

Items acquired in this fashion include purchases and solicited gifts. Continuations, but not serials, are included. When a request is received in the Order Department, it is searched in the main catalog, the waiting catalog and the outstanding order file. If it is found to be neither in the Library nor on order, it is then given to an Order Assistant who completes the bibliographical work, if necessary, and assigns a fund and dealer.

Fig. 1. Unique Order Procedure (flowchart; abbreviations: KP = Key Punch, KV = Key Verify, F/M = File Maintenance, E* = proposed accessions listing program)

If the price is listed in a foreign currency, the Assistant converts it to U. S. dollars. The request then proceeds to the typist. All unique orders are typed on an 826 Typewriter Card Punch. As the typist fills in the six-part order form, pre-selected pieces of information are keypunched automatically. These fields are as follows:

Order number
Order date
Source type - D for domestic, etc.
Fund number
List price
Author
Title
Imprint
Series

Orders are proofread on the day after they are typed. The forms are separated and the outstanding order cards are filed immediately in order to detect duplicate orders. At this point the dealer slips are mailed and the numerical control slips filed. The processing file documents, each containing a fund slip, an L.C. order slip, and a cataloger's work slip on a separate perforation, are then filed pending the arrival of the books. Also, the deck of IBM cards which has been weeded of voided orders goes to Tabulating.

Although books may be processed without invoices, the normal practice is to process after the arrival of the invoice. The processing file document is obtained and the cost, invoice date and the number of volumes are noted on the fund slip. If the item is a continuation, a supplementary fund slip is made and the original returned to the processing file with the receipt noted. The invoices are cleared and sent to the Controller. The fund slips representing books received are sent to the keypuncher in order to up-date the accounts. In the meantime the books, along with the work slips and the L.C. order slips, are sent to the Catalog Department. As the books are cataloged, the work slips noting any major bibliographical changes and the call number are returned to the Order Department. From these slips are punched bibliographical adjustment cards and an up-date record card containing the call number and coded for subject and location.
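The by-product card punched automatically while the 826 types the order form can be pictured as a fixed record carrying exactly the fields listed earlier in this section. A sketch follows; the dataclass and its types are a modern illustration, not the actual punched-card format:

```python
# Hypothetical sketch of the by-product order card keypunched as the
# six-part form is typed; field names follow the article's list.

from dataclasses import dataclass

@dataclass
class OrderCard:
    order_number: int
    order_date: str
    source_type: str    # "D" for domestic, "F" for foreign
    fund_number: int
    list_price: float   # already converted to U.S. dollars
    author: str
    title: str
    imprint: str
    series: str = ""    # blank when the item belongs to no series
```

Treating the card as a fixed record is what lets the same deck serve both the outstanding order file and the fund-encumbering run at Tabulating.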
The resulting bibliographical record forms the data base for the new accessions listing.

No Unique Order (Figure 2)

Items acquired in this fashion include unsolicited gifts, exchanges, standing orders, etc. Some continuations and all serials invoices are included. Upon arrival, invoiced items without unique order numbers are searched. If they are duplicates they are returned for credit. If they are not duplicates, they are sent to the typist. Catalog file slips are typed and by-product bibliographical and accounting records are punched.

Fig. 2. No Unique Order Procedure (flowchart; N.B. no record card is made for gifts or exchanges)

On the record card for accounting, the order number field is filled with nines. This signals the program that this entry is a receipt for which there was no unique order number. The series of order numbers beginning with 900000 was originally reserved for assignment to our standing order agreements with presses, societies, etc. Eventually, each will have its own order number. However, the last number of the series, 999999, will continue to be used for miscellaneous receipts. Presently no accessions listing records are being generated for items without unique order numbers. However, all purchases without unique order numbers are processed with a series 9 order number.

Serials

All serial invoices are handled as series 9 transactions with no attempt to record bibliographical information or volume counts. Expenditures for serials are accumulated and entered as one transaction each time the accounts are up-dated. This decision was made in anticipation of the development of a separate serials control program.
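The series 9 convention amounts to a sentinel value in the order number field: an order number in the 900000s posts to the fund accounts but is never matched against the outstanding order file. A sketch under that reading, with hypothetical names (the original was 1401 Autocoder):

```python
# Illustrative sketch of the series 9 convention: receipts with no
# unique order number post to the accounts without touching the
# outstanding order file.

MISC_SERIES_9 = 999999  # reserved for miscellaneous receipts

def is_series_9(order_number):
    """Series 9 numbers (900000 and up) never match the order file."""
    return 900000 <= order_number <= 999999

def post_receipt(funds, outstanding, fund, order_number, cost, volumes=1):
    """Post a receipt: always expend; disencumber only on a match."""
    f = funds[fund]
    f["expended"] += cost
    f["volumes"] += volumes
    if not is_series_9(order_number):
        # normal receipt: remove the matching order and its encumbrance
        rec = outstanding.pop(order_number)
        f["encumbered"] -= rec["list_price"]
```

The sentinel keeps one posting routine for both cases, which matches the article's description of series 9 cards that "will not match the Outstanding Order File by definition."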
IBM 1401 FILES AND PROCEDURES

The basic function of the computer program for the Fund Accounting System is to maintain current balances on the various library fund accounts and to maintain a file of outstanding orders exclusive of standing orders. Although several correlative functions are distinct possibilities, the only additional function planned is a file of bibliographic records for the production of an accessions listing. Figures 3, 4, 5 and 6 illustrate the major tasks to be performed by the system. The programming language used is Autocoder.

Fund Balance Forward File

A card file created at the beginning of each fiscal year having two card types.

1. Fund Group Header Card
   a. Group Code
   b. Group Name

This card assigns a unique code and name to categories of funds such as endowed, special, etc.

2. Fund Balance Forward and Appropriation Card
   a. Fund Group Code
   b. Fund Code
   c. Fund Name
   d. Previous Year Balance Forward
   e. Current Income or Appropriation
   f. Balance Forward Code
   g. Remaining Previous Year Encumbrances

Fig. 3. Fund File Creation (flowchart: the Fund Balance Forward File and Fund Group Headers are used to create the Library Fund Accounting File and a Fund Listing)

This card contains information used to establish the individual funds at the beginning of each year. The Balance Forward Code directs the program to carry over excess funds to the next year, not to carry over excess funds to the next year, or to carry over a negative balance to the next year, thereby reducing Cash Balance resulting from the new income or appropriation. Encumbrances are carried over to the next year in order to maintain an accurate Net Available at all times.

Fig. 4. File Maintenance (flowchart)

Fig. 5. Fund Accounts Updating (flowchart: completed record cards and new orders posted against the fund file)
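The three Balance Forward Code behaviors described above can be stated compactly. In this sketch the code values and field names are illustrative (the article does not give the punched values), but the arithmetic follows the text: encumbrances always carry over so Net Available stays accurate.

```python
# Sketch of the Balance Forward Code rules at fiscal-year opening;
# code names "CARRY"/"NONE"/"NEG" are hypothetical labels for the
# three behaviors the article describes.

def open_fiscal_year(prev_balance, income, bf_code, prev_encumbrances):
    """Return (cash_balance, net_available) for the new year."""
    if bf_code == "CARRY":      # carry over excess funds
        carried = prev_balance
    elif bf_code == "NONE":     # do not carry over excess funds
        carried = 0.0
    else:                       # "NEG": only a deficit reduces cash
        carried = min(prev_balance, 0.0)
    cash_balance = carried + income
    # encumbrances are always carried over, keeping Net Available accurate
    net_available = cash_balance - prev_encumbrances
    return cash_balance, net_available
```

Note that under the third rule a positive balance is forfeited while a negative one still reduces the Cash Balance produced by the new income or appropriation.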
1/1 March, 1968 Library Fund File A magnetic tape file created from the Fund Balance Forward File and containing three record types,. 1. Fund Group Header 2. Fund Record a. Fund Group Code b. Fund Code c. Fund Name d. Previous Year Balance Forward e. Current Income or Appropriation f. Current Expenditures g. Cash Balance h. Amount Encumbered i. Net Available j. Volumes Purchased k. Balance Forward Code Fund Record fields a, b, c, d, e, h and k initially are taken from the corresponding fields in the Fund Balance Forward Card. Current Year Expenditures and Volumes Purchased are preset to zero each year. Cash Balance is determined by the sum of the Previous Year Balanc(l Forward_ and the Current Income or Appropriation. Amount Encumbered will be · preset to zero or taken from the fund card. Net Available is determined by the difference between Cash Balance and Amount Encumbered. 3. Fund Group Trailer This record is the last within each fund group and contains a summa- tion of the quantitative fields in that fund group. It is used primarily for control purposes. Figure 4 illustrates the file maintenance program for the Library Fund Files. This program permits the addition or deletion of a Fund Group Code, changes to a Fund Group Header, addition or deletion of a spe- cific fund or changes to a specific fund. However, changes to quantitative fields are limited to those fields which are contained in the Fund Balance Forward Card. Thus, Net Available may not be changed directly by file maintenance but may be changed by manipulating Current Income or Appropriation. . The Library Fund F ile is a serial file maintained in ascending algebraic sequence on Fund Group Code, Fund Code and Fund Record from major to minor respectively. Outstanding Order File A magnetic disk file created and up-dated by three card types. 1. Order Card a. Order Number b. Order Date Library Fund Accounting System/ WEDGEWORTH 63 c. Source Type - D is domestic, F is foreign d. Fund Number e. 
e. List Price

Figure 5 illustrates the program which processes new orders. This program validates Fund Code, rejects duplicate order numbers and encumbers List Price, thereby reducing Net Available.

2. Record Card
a. Order Number
b. Invoice Date
c. Fund Code
d. Cost
e. Continuation Order Code, if applicable
f. Number of Volumes

Standing orders, blanket orders, serials, etc. are purchased without placing an order. Consequently, a series 9 order number is assigned to these Record Cards. Such cards will not match the Outstanding Order File by definition but will increase Amount Expended, decrease Cash Balance and Net Available, and increase Volumes Purchased. All other Record Cards must match an existing order number on file. On continuations the Record Card for each part received produces a transaction as described above, except that the encumbrance remains unchanged until the final Record Card appears without the Continuation Order Code.

3. Adjustment Card

This card may be submitted for either an Order Card or a Record Card. It is differentiated by a special code. Its primary purpose is to correct a previous error or to effect a cancellation.

The Outstanding Order File is in ascending algebraic sequence by Fund Group, Fund Code and order number. All cards used in this program must be pre-sorted into this sequence.

Printout Products

The accumulated punch cards are processed on a bi-weekly schedule by the Tabulating Office. A file maintenance report (Figure 4) is the first product of each run. It lists in detail any adjustments, additions, or deletions to the fund listing plus the results of such operations. At the end of the detailed report is a summary of the status of each active fund. Copies of this latter report are distributed for desk use to all Order Assistants, the Chief Order Librarian, and the Librarian.

The transaction register of fund activity (Figure 5) lists each transaction posted to each fund for the inclusive period.
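The posting rules described above (validate the fund, reject duplicate order numbers, encumber the list price; let series 9 record cards bypass the order match) can be sketched in modern terms as follows. This is a hedged illustration only: the function names and data structures are invented, and the production program was written in 1401 Autocoder, not Python.

```python
# Hypothetical sketch of the Order Card and Record Card posting rules
# described above; the fund and outstanding-order structures are
# stand-ins, not the actual 1401 disk file layout.

def net_available(fund):
    """Net Available = Cash Balance - Amount Encumbered."""
    return fund["cash"] - fund["encumbered"]

def post_order(fund, outstanding, order_no, list_price):
    """New Order Card: reject duplicates, then encumber the list price."""
    if order_no in outstanding:
        raise ValueError("duplicate order number: %s" % order_no)
    outstanding[order_no] = list_price
    fund["encumbered"] += list_price   # encumbering reduces Net Available

def post_record(fund, outstanding, order_no, cost, volumes, continuation=False):
    """Record Card: expend the cost and count the volumes received.

    A series 9 order number (standing orders, blanket orders, serials)
    never matches the Outstanding Order File; every other Record Card
    must match, and the encumbrance is released only when the final
    card arrives without the Continuation Order Code.
    """
    if not str(order_no).startswith("9"):
        if order_no not in outstanding:
            raise KeyError("no matching order on file: %s" % order_no)
        if not continuation:
            fund["encumbered"] -= outstanding.pop(order_no)
    fund["expended"] += cost
    fund["cash"] -= cost
    fund["volumes"] += volumes
```

Under this sketch, a fund with $1,000.00 cash that orders an item at a $25.00 list price and later receives it at a $24.00 invoiced cost ends with $976.00 cash, nothing encumbered, and one volume purchased.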
The Assistant in charge of bookkeeping is the primary user of the transaction register and the detailed file maintenance report.

The delinquent orders report (Figure 6) lists all past due outstanding orders according to two cycles. Domestic orders are listed bi-monthly and foreign orders are listed quarterly. The listing is of the "tickler" variety, as it may not be necessary to ask reports on all of the items. An order will remain on the delinquent orders report until it is filled or cancelled.

Fig. 6. Delinquent Order Listing.

CONCLUSION

As of October, 1967, the Fund Accounting System has been in operation for ten months. Assessment of its effectiveness in terms of meeting the primary objective shows the System to be an immediate success. At this point costs are about the same for the manual system as for the present one. However, accounts which used to require from 25 to 30 man-hours per month are maintained with about 5 man-hours per month. Our current equipment and processing costs run about $325 per month.

On the other hand, we have become aware of some shortcomings of the system. The addition of a currency conversion sub-routine would greatly expedite the many requests for foreign publications received daily. Secondly, the addition of a dealer code would make the delinquent orders list much more useful. At present a user must search the numerical file for the order to ascertain the dealer. The processing file copies are then pulled to go to the typist who asks reports on delinquent orders. A revised program incorporating both of these features is being planned and will be operational early in 1968.

The proposed accessions listing has been rejected as a by-product of this system primarily because of the limited character set available on our IBM 1403 print chain and the excessive length of the average listing.
The time and expense of storing and up-dating the bibliographical record for each new acquisition should, in our estimation, result in a more palatable end-product. We have, therefore, temporarily discontinued producing punch cards for the bibliographical records. As a corollary, it should be added that we have turned to a consideration of the paper tape typewriters as input/output devices, focusing on their expanded character set and operating speed. The speed of the 826 leaves much to be desired.

The Numerical Control File has proven its usefulness as a rapid index to our files spanning several years. It is extremely helpful in identifying quotes on old order numbers which have long since been cancelled. The Fund File, however, has proven to be a duplicate of our machine file. It is thought that replacement of the slip in the numerical control file with the fund slips would at the same time reduce our files by one and up-date the information in the numerical file.

Finally, this modest beginning, occasioned by limited financial resources as well as the lack of personnel with experience in data processing, seems to have been justified. Moreover, although the increasing complexity of our involvement in library automation poses some serious planning and supervisory problems, we are encouraged by our initial success.

ACKNOWLEDGMENTS

The staff of the Order Department have all contributed to the production of this report. However, a special note of gratitude is acknowledged for the assistance of Dorothy Woods and Gloria Hagberg and for the technical advice and assistance of Al Hansen, library programmer, and David A. Jonah, Librarian.

REFERENCES

1. Kozlow, Robert D.: Report on a Library Project Conducted on the Chicago Campus of the University of Illinois (Washington: NSF, 1966), p. 50.
2. Dunlap, Connie: "Automated Acquisitions Procedures at the University of Michigan Library," Library Resources & Technical Services, 11 (Spring 1967), 192.
3. Alanen, Sally; Sparks, David E.; Kilgour, Frederick G.: "A computer-monitored library technical processing system," American Documentation Institute Proceedings, 3 (1966), 419.
4. Shaw, Ralph R.: "Control of Book Funds at the University of Hawaii Library," Library Resources & Technical Services, 11 (Summer 1967), 380.

COMPARATIVE COSTS OF CONVERTING SHELF LIST RECORDS TO MACHINE READABLE FORM

Richard E. CHAPIN and Dale H. PRETZER: Michigan State University Library, East Lansing, Michigan

A study at Michigan State University Library compared costs of three different methods of conversion: keypunching, paper-tape typewriting, and optical scanning by a service bureau. The record converted included call number, copy number, first 39 letters of the author's name, first 43 letters of the title, and date of publication. Source documents were all of the shelf list cards at the Library. The end products were a master book tape of the library collections and a machine readable book card for each volume to be used in an automated circulation system.

The problems of format, cost and techniques in converting bibliographic data to machine readable form have caused many libraries to defer the automation of certain routine operations. The literature offers little for the administrator facing the decisions of what to convert and how to convert it.

Automated circulation systems require at least partial conversion of the accumulated bibliographic record. The University of Missouri, like many libraries, has been converting the past record only for books as they are circulated (1). Southern Illinois University (2) and Johns Hopkins (3), on the other hand, have converted the record for their entire collections. The Southern Illinois program is based upon converting only the call number.
Johns Hopkins has converted the call number, main entry, title, pagination, size, and number of copies. And Missouri has recorded call number, accession number, and abbreviated author and title.

Several methods of converting the record have been described. Missouri employed keypunching; Southern Illinois marked code sheets which were scanned electronically and converted to magnetic tape; Johns Hopkins, working from microfilm copy of the shelf list, used special type font and typed the records for optical scanning. An IBM report on converting the National Union Catalog recommended an on-line terminal as the best method of conversion (4).

Studies at Michigan State University led to the conclusion that acquisition, serials, circulation, and card production contained certain routines that might well be automated. Once automation of circulation was decided upon as our initial effort, decisions were necessary as to the conversion. It was recommended that a portion of the bibliographic record for all items in the shelf list should be converted. Information other than the call number is being used for other programs (5).

Cost figures for converting library records are scarce. In only two instances are figures available. The IBM report on the National Union Catalog shows that the average entry in NUC contains 277 characters, with an estimated conversion cost ranging from $0.3531 to $0.417 per entry. The proposed conversion method employs an on-line terminal, a technique not available to most libraries.

The Johns Hopkins conversion of "about 300,000 cards" was accomplished by optical scanning and cost $18,170 (3, p.4). This figures out at about $.06 per record. Later in the report it is stated that the conversion "is at a rate of $.0038 per character converted" (3, p.25). At $.06 per card and $.0038 per character, the converted record would consist of 16 characters!
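The arithmetic behind that observation is easy to reproduce from the two quoted figures:

```python
# Checking the two Johns Hopkins figures quoted above against each other.
total_cost = 18170          # dollars, for "about 300,000 cards"
cards = 300_000
per_card = total_cost / cards        # about $0.0606, i.e. roughly 6 cents
per_char = 0.0038                    # quoted dollars per character converted
implied_chars = per_card / per_char  # about 16 characters per record
```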
In the study herewith reported every effort was made to arrive at comparative cost figures for the three methods of conversion that are readily available to most research libraries: keypunching, paper-tape typewriting, and optical scanning as accomplished through a service bureau.

METHODS OF STUDY

The shelf list records of the Michigan State University Library were divided into three sections by numbering catalog drawers in sequence: 1,2,3; then 2,3,1; then 3,1,2. All the drawers marked with number one became one sample group; those marked two and three made up the other groups. This method of numbering the drawers gave samples from each area of the classification schedule for each method of conversion.

The bibliographic data were taken directly from the shelf list without transferring information to worksheets. A sample of the shelf list shows that 74 per cent of the cards are Library of Congress cards or copies of Library of Congress proof slips. Of those cards produced in the library, only 12 per cent of the total were abbreviated records.

The keypunch operators, the typists, and the service bureau were instructed to extract information from the shelf list record. All differences in type (capitals, italics, etc.) were to be ignored; transliterated titles were to be used in those cases where entries were in non-Roman alphabet; accents and diacritical marks were ignored, except where it made a difference in filing, as with umlauts; all numbers in title and author fields were to be spelled as if written.

Fig. 1. Shelf List Cards.

Information that was transcribed is marked in the example, Figure 1. The complete call number 1) was included. Author 2) was typed through 39 spaces, including dates, if possible. In cases where author entry was lengthy the operators were instructed to stop at the end of 39 spaces. Title 3) was recorded as completely as possible through 43 spaces, but not to extend beyond the first major punctuation. Date 4) was included as shown. Only one copy 5) was shown on each entry. In the example of abbreviated form in Figure 1, five separate records were required, with change only in copy number.

The master book tape includes the call number, which occupies 32 spaces; 3 spaces are allowed for copy number, 39 for author, 43 for title, and 4 for date of publication. On the book card, Figure 2, which was generated by the computer from the master book tape, the format is as follows: 32 spaces for call numbers, 3 for copy number, 11 for author, 26 for title and 4 for the year published. The remainder of the card is for machine codes used in the circulation system.

Fig. 2. Book Pocket Card.

The book card alone can be created directly by the keypunch. However, if a library has equipment available for a more complete program, it is useful to prepare information in a format to create a master book tape. Programs have been written so that the master tape can be added to or deleted from at a later date.

Four operators worked on the project at Michigan State University. Two of them were average keypunch operators with little typing skill, one was an expert typist, and the other was an expert keypunch operator. The first two operators were trained to use both the keypunch and the Flexowriter. The purpose in using a variety of typists and operators for the job was to arrive at average figures for the conversion project. The data show great variance of output among operators.

The outline of the methods used is shown in Figure 3. The keypunch method recorded the bibliographic data by use of an IBM 026 keypunch. The punch cards were transferred to a magnetic tape and the book cards were generated by the computer. The paper-tape typewriter information was punched in paper tape by the use of a 2201 Flexowriter. A portion of the sample was converted directly to magnetic tape. Since some libraries will not have a paper-tape to magnetic-tape converter, the remainder of the paper-tape sample was converted to punch cards and then to magnetic tape.

Fig. 3. Flowchart of Shelf List Record.

The optical scanning method was handled by Farrington Corporation's service bureau, Input Services, in Dayton, Ohio. The service bureau assigned 10 to 15 employees to transcribe the shelf list. They used IBM Selectric typewriters, with special type font. Special symbols were used to designate end of field. The data were recorded on continuous-form paper. The typed record was then edited and scanned, producing a magnetic tape. After the tape was used for production of book cards, it was added to the master book tape.
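The fixed-field master record described above (32 + 3 + 39 + 43 + 4 = 121 characters) lends itself to a simple formatter. The sketch below is illustrative only: the field names and sample data are invented, and the original work was done on keypunches and Flexowriters, not in software. It truncates over-long authors and titles at the field boundary, just as the operators were instructed to do, and pads short fields with blanks.

```python
# Hypothetical formatter for the 121-character master book tape record:
# 32 chars call number, 3 copy number, 39 author, 43 title, 4 date.

FIELDS = [("call_no", 32), ("copy", 3), ("author", 39), ("title", 43), ("date", 4)]

def master_record(**values):
    # Truncate each field at its limit and pad short fields with blanks
    # so that every record is exactly 121 characters long.
    return "".join(values[name][:width].ljust(width) for name, width in FIELDS)

rec = master_record(call_no="Z699.C5", copy="1", author="Smith, John, 1920-",
                    title="An example title for illustration", date="1967")
len(rec)  # 121
```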
The first batch of cards sent to Dayton was gone from the Library for approximately four weeks. After the personnel at Dayton became accustomed to the format and to library terminology, the turnaround time was approximately two weeks. The 255,000 records which were converted by the service bureau were sent off campus in four separate batches.

Machine verification of the record was not required. Each operator was instructed to proofread her own copy. Machine verification was considered, but the idea was discarded because of the extra cost involved. Also, since book cards were to be inserted in all volumes, final verification would result when the books and cards were matched.

RESULTS

In the conversion keypunching cost 6.63 cents per record. Paper-tape ran slightly higher, at 7.07 cents; this higher cost was due to the added cost of machinery and the added cost of going from paper tape to magnetic tape. Optical scanning, through a service bureau, was exactly the same as keypunching, 6.63 cents, including the programming costs. Cost details are shown in Table 1.

Table 1. Average Cost Per Shelf List Record Converted

                                         Paper-tape    Scanning,
                            Keypunch     Typewriter    Service Bureau
Labor (1)                   $.04073      $.03960       $.00030
  Salary                     .03723       .03620
  Fringe Benefits            .00350       .00340
Equipment Rental (2)         .00322       .00888
  Computer                   .00280       .00840
  Supplies                   .00042       .00048
Contractual Services         .00003       .00052 (3)    .06600 (5)
Overhead (4)                 .02232       .02172
TOTAL                       $.06630      $.07072       $.06630

(1) Average costs for all operators, based upon salary of $2.10 per hour and fringe benefits of 9.4 per cent.
(2) Rental time to Library of IBM 1401 computer is $30.00 per hour, including personnel costs.
(3) Includes $.000089 for tape-to-tape conversion and $.000091 for tape to card to magnetic tape conversion.
(4) University charge of 54.87 per cent of salaries, for space, utilities, maintenance, etc. This figure does not include cost of training and supervision.
(5) $.057 per record plus $.009 per record for programming costs.

Late in the study we observed that a seemingly inordinate amount of the Flexowriter time was consumed by the automatic movement of the typewriter carriage to the pre-determined fixed fields. In order to circumvent this the operator was instructed to strike one key to indicate end of field, and then she no longer had to wait for the carriage movement. By using the manual field markers, as opposed to automatic fixed fields, the cost of the Flexowriter operation was reduced to 6.672 cents per record. The disadvantage of the manual field-marking system was the increased chance of operator error, which amounted to 3.13 per cent more than the fixed-field method. For this reason, and in spite of the economy of the manual method, the use of pre-determined fixed fields for Flexowriter conversion is to be preferred.

In the comparison of the salary costs for keypunching and for the use of the Flexowriter, great variations were shown among operators. Two participants were asked to use both the keypunch and the Flexowriter on varying days, with tallies of their output accounted for throughout the entire project. Operator 1 was essentially a skilled keypunch operator who had some background in typing. Her salary cost per record during keypunching was 3.98 cents; her salary cost for the paper-tape typewriter was 7.92 cents. Operator 2 was a skilled keypunch operator who was also sent to typing class for one term to raise her typing skill. Her salary cost was 3.92 cents per record on the keypunch and 3.79 cents per record on the paper-tape machine. Operator 3, who was a skilled keypunch operator, averaged 2.32 cents per record for salary cost. Operator 4, who was a typist and not a keypunch operator, produced records on the Flexowriter at a cost of 3.56 cents per record.
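As a quick check on Table 1, the cost components do sum to the reported per-record totals. The grouping of the components into labor, equipment rental, contractual services, and overhead follows my reading of the table; the figures themselves are taken from it.

```python
# Per-record cost components from Table 1, in dollars:
# labor, equipment rental, contractual services, overhead
# (scanning has labor and contractual services only).
keypunch   = [0.04073, 0.00322, 0.00003, 0.02232]
paper_tape = [0.03960, 0.00888, 0.00052, 0.02172]
scanning   = [0.00030, 0.06600]

totals = [round(sum(col), 5) for col in (keypunch, paper_tape, scanning)]
# totals == [0.0663, 0.07072, 0.0663], i.e. $.06630, $.07072, $.06630
```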
These operator figures indicate salaries only, and do not include overhead, fringe benefits, and other expenses, which are reflected in the total conversion costs shown.

A letter from Farrington Service Corporation stated the following information about the scanning operation: "1) Our typists produced an approximate total of 7,950 typing pages in the course of this conversion. 2) Each typist averaged from 3.6 to 3.8 pages per hour. 3) We processed an average of 800-1,000 (shelf list) cards, per girl, per day. 4) The total man hours expended in this project was 2,144. 5) The amount of error detected as a result of sight verification varies significantly from girl to girl. The average, however, ran approximately 2.8 per cent (of records to be corrected)."

Comparison was made of actual records converted per eight-hour day by each of the methods. The service bureau, with skilled typists, was able to convert approximately 100 records an hour for each typist. The most efficient keypunch operator averaged about 75 records per hour, which was noticeably more than the average. The paper-tape typist, using pre-programmed fixed fields, reached 65 records per hour, but was able to produce 73 records per hour by manually typing the field markers.

A short-run sample was stop-watch-timed to give an indication of the differences in results for each method when only minimum changes in certain fields, such as copy number or volume number, were required. On the keypunch machine an operator consumed 34.6 seconds in typing the initial record and 20.4 seconds in duplicating the basic information and changing data in one given field. The operator with the automatic program Flexowriter consumed 47.2 seconds typing the initial record, including 13.2 seconds in shifting fields and automatically firing the record marks, and 24 seconds duplicating the record.
When she manually indicated the field information, she was able to convert the initial record in slightly less time (30 seconds), and she took 22.8 seconds to duplicate the data with a change in one field.

Final verification will be completed only when all cards are matched with the proper books. For those books that do not circulate, this may never be accomplished. A sample of cards was selected to reflect the three methods of conversion. The service bureau cards contained fewer errors than those produced by keypunching and paper-tape typewriting. Production of records that were not acceptable to the computer in an edit program occurred in 1.75 per cent of the sample for keypunching, 0.93 per cent for paper-tape typewriting, and 0.16 per cent for service bureau. Operator errors, discovered while matching cards with books, showed a higher percentage: 4.62 per cent for keypunching, 3.60 per cent for Flexowriter, and 0.35 per cent for service bureau.

CONCLUSIONS AND RECOMMENDATIONS

1. The cost of converting a portion of the bibliographic record is relatively inexpensive when compared to the total cost of automated library programs. One reason for our delay in entering into the field of an automated circulation program was that of making the book cards. Now that this task has been completed, it is obvious that conversion is a one-time cost that can well be absorbed. If the library cannot afford the original conversion, at a cost of 6 or 7 cents a record, then the library cannot afford to proceed with automated programs.

2. There is no difference in cost between keypunching a machine readable record and having the project undertaken by a service bureau. The use of the paper-tape typewriter for conversion costs more than the other two methods.

3. Large scale conversion of records to machine readable form might well be done by an outside organization.
In order to get the task completed in a short period of time, a library would be required to hire a number of short-term clerical employees. In the case of Michigan State, situated in the small community of East Lansing, recruiting and training a large number of employees for short-term projects is most difficult. It is rather certain that the overhead for such a program would bring the cost beyond that of using a service bureau. On the basis of our experience it is recommended that the conversion be sent to a service bureau.

4. A library can get along without portions of a shelf list for short periods of time. One of the predicted problems of sending material off campus to be converted was that of losing the availability of the shelf list records. Although there were some inconveniences, it was found that the library could carry on its operations and function without the shelf list. Certainly, this could not be done if the shelf list cards were gone for any length of time.

ACKNOWLEDGMENT

A grant from the Council on Library Resources, Inc., made possible the study described in this paper.

REFERENCES

1. Parker, Ralph H.: "Development of Automatic Systems at the University of Missouri Library," in University of Illinois Graduate School of Library Science, Proceedings of the 1963 Clinic on Library Applications of Data Processing (Champaign, Ill.: Illini Union Bookstore, 1964), 43-55.
2. Southern Illinois University. Office of Systems and Procedures: An Automated Circulation Control System for the Delyte W. Morris Library; the System and Its Progress in Brief (Carbondale, Ill.: Southern Illinois University, 1963).
3. The Johns Hopkins University. The Milton S. Eisenhower Library: Progress Report on an Operations Research and Systems Engineering Study of a University Library (Baltimore: Johns Hopkins, 1965).
4. International Business Machines.
Federal Systems Division: Report on a Pilot Project for Converting the Pre-1952 National Union Catalog to a Machine Readable Record (Rockville, Maryland: IBM, 1965).
5. Chapin, Richard E.: "Administrative and Economic Considerations for Library Automation," in University of Illinois Graduate School of Library Science, Proceedings of the 1967 Clinic on Applications of Data Processing (in press).

A BOOK CATALOG AT STANFORD

Richard D. JOHNSON: Stanford University Libraries, Stanford, Calif.

Description of a system for the production of a book catalog for an undergraduate library, using an IBM 1401 Computer (12K storage, 4 tape drives), an expanded print chain on the 1403 Printer, and an 029 Card Punch for input. Described are the conversion of cataloging information into machine readable form, the machine record produced, the computer programs employed, and printing of the catalog. The catalog, issued annually, is in three parts: an author & title catalog, a subject catalog, and a shelf list. Cumulative supplements are issued quarterly. A central idea in the depiction of entries in the catalog is the abandonment of the main entry concept. The alphabetical arrangement of entries is discussed: sort keys employed, filing order observed, symbols employed to alter this order, and problems encountered. Cost factors involved in the preparation of the catalog are summarized.

In November, 1966, a new library opened at Stanford University. Designed primarily to serve undergraduates, the J. Henry Meyer Memorial Library is a major addition to the libraries on the University's campus. A four-story structure with 88,000 square feet of usable space, it has shelving for 140,000 volumes and seating for 1,900 readers. The new library has numerous distinctive features. One is the subject of this account: the catalog. There is no standard card catalog in the building.
Instead, copies of a book catalog are situated at eighteen locations throughout the library, easily accessible to all students and staff. In addition, copies of the catalog have been placed at other points on the campus: the main and departmental libraries, offices of academic departments, and student dormitories.

The literature now contains numerous accounts of the preparation of book catalogs in libraries (1,2). One may question the value of yet another narrative, but an account of the Stanford experience is valuable for several reasons. The genesis of the Stanford book catalog has been recorded, and a follow-up describing what happened subsequently is the next chapter in the story. The book catalog experience at Stanford is now sufficiently advanced that one may recount the undertaking both in depth and breadth, from its inception, through design, implementation, and first full year of operation. Such an account can give Stanford's approach to some still unsolved problems (for example, filing order) and the innovations it has made. The particular environment within which the book catalog was designed was conducive to innovation, because the entire University Library system was not itself committed. Finally, the approach here employed has been eclectic, and this report can record thanks to the many individuals and institutions whose ideas and plans have been examined for possible use in the Stanford undertaking. Of particular importance to this project were the example and experience of Florida Atlantic University, the Ontario New Universities Library Project at the University of Toronto, and the Columbia-Harvard-Yale Computerization Project.

ORIGINS

The Stanford book catalog had its origins in 1962. During planning for an undergraduate library it was felt a catalog in book form and available in many locations would have immeasurable educational benefits for the students.
Particularly was it felt that the subject portion of such a catalog would prove a valuable bibliography to students in the University (3). Somewhat later, when the size and proposed layout for the new library indicated the desirability of at least three complete card catalogs as an adequate guide to the collection, further emphasis was given to the possibility of a book catalog in multiple locations.

A grant in 1963 to Stanford University from the Council on Library Resources, Inc., permitted a study by Robert M. Hayes and Ralph M. Shoffner on the economics of book catalog production. This investigation compared the costs of the various ways in which a book catalog can be produced (4). Of the methods considered, Stanford selected the computer to study further. The computer was chosen not only because equipment was already available on campus but also because of the recent introduction of an expanded print chain with the capability of printing upper and lower case letters as well as necessary diacritical marks.

In the fall of 1964 Stanford undertook further study, employing the Hayes-Shoffner report as a basis but now comparing refined costs of a computer-produced book catalog with costs for three complete card catalogs in the new library, as well as costs for two shelf lists and main entries in the University Libraries' union catalog. This second study was completed in December, 1964, and University officials approved the preparation of a computer-based book catalog for the library when it was determined that such a catalog would prove more useful, and for a few years less expensive, than the three card catalogs (5).

While the autumn study was in progress, cataloging of the new library's collection began. Plans were made for three card catalogs.
Although the card catalogs were never prepared, the planning was of considerable value later in establishing field and record lengths for the machine record, as well as in securing general agreement on the kind of information to include and the format of the final catalog.

SYSTEMS DESIGN

Preliminary systems design began in January, 1965. A systems engineer from IBM guided a team of University staff composed of librarians and personnel from the Administrative Data Processing Center in the Controller's Office. At the outset it was recognized that the assignment to produce a book catalog for the new library did not call for consideration of the other aspects involved in the library's operations, such as acquisitions, circulation and reference. But as work proceeded, efforts were consciously made to design a system that could be integrated into a larger system at a later date.

The basic object of the preliminary systems design was to refine further the cost estimates from the study of the preceding autumn. The system as it was being designed, however, called for increased machine time and corresponding increases in cost for processing as well as for programming.

In retrospect, the major achievement of the preliminary systems design was to establish the environment for a meaningful dialogue between the librarian and systems and computer personnel. When the study began, the librarian requested a system that would have involved use of a large configuration of equipment with direct-access capability. The systems and computer staff approached the design with knowledge of the equipment that would be used for the project (an IBM 1401 Computer, 12K storage, 4 tape drives) and thought in terms of fixed-length records and fixed-length fields. Through a program of mutual education, the librarian learned of the computer and what it could do and what it could not do; and systems and computer personnel learned of the library's requirements and desires.
There evolved the basic design for a system capable of being implemented on the equipment at hand and acceptable to the library.

As preliminary systems design drew to a close, necessary equipment was ordered. The principal element was the expanded print chain for the IBM 1403 Printer, containing 100 different characters and developed earlier by Florida Atlantic University, Yale University, and the University of Toronto. In addition, appropriate modifications were made to the central processing unit of the 1401 Computer to be used in the project.

For the inputting of data the IBM 026 Card Punch was selected. It was available, and there was considerable local experience in its use. A modification made to it simplifies punching of one character, the word-separator character, used to designate an upper-case letter. Delivery time on the 026 Card Punch was four months. Although it was realized that the newly announced IBM 029 Card Punch would be superior for our project, delivery time on it was one year. Even before the 026 Card Punch was received in July, 1965, an order was placed for an 029. The 029 replaced the 026 in August, 1966. The 029 Card Punch, designed for use with System/360, was considered superior to the 026 because it is possible to punch each of the characters specified on the expanded print chain without resorting to the multi-punch key. Appropriate modifications were ordered for the 029 so that desired characters would print at the top of the punched cards.

Detailed systems design was completed by June, 1965, and the system may be described in the following manner.

OUTPUT

The design called for four basic outputs from the system:

1. An edit list to facilitate proofing of the items converted into machine readable form. This was considered essential because of print-out in upper and lower case.

2. An author & title catalog listing items under their author and title entries.

3.
An alphabetical subject catalog listing items under Library of Congress subject headings.

4. A shelf list entering all items in call number order (the Library of Congress classification was adopted in May, 1965), giving all tracings for a particular entry, as well as the number of volumes and copies and their location in the library.

A complete catalog was to be printed annually (author & title, subject, and shelf list), with cumulative monthly supplements to each. Output for the annual author & title catalog and subject catalog from the computer printer was to be photographically reduced, offset masters created, and fifty copies printed. The catalogs were then to be bound in reusable binders. Later it was decided to restrict use of the reusable binders to the shelf list, printed in four copies, and the supplements for the author & title and subject catalogs, to be printed in six copies, and to bind the basic annual catalog in standard book form. It was also decided to print ten copies of the author & title and subject supplements.

It was originally proposed to divide the catalog in a slightly different manner: names (as authors and as subjects) and titles in one section, and topical subjects in the other. Although this seemed to have considerable logical value, it proved impossible to implement during preliminary work with card files, given the time and staff available. Provision was also made to print the catalog in one section as a dictionary catalog if so desired, or on cards if the book catalog should be abandoned at a later date.

INPUT

To achieve the above output, the design called for four kinds of input into the system:

1. Entries for titles cataloged. A separate record was to be made for each volume or copy of a title cataloged so as to provide holdings information for the shelf list and for integration into a circulation system at a later date.

2.
Cross references to connect headings in the author & title catalog and in the subject catalog. In addition, the cross reference format would permit the introduction of information notes into any of the catalogs.

3. Changes to entries that are in the catalog.

4. Entries for items that are on order, with a view to integrating this form of input into a larger acquisitions system at a later date.

IMPLEMENTATION

The systems design called for the preparation of eight different computer programs to transform the input into the various documents as specified above. The basic programs were written during the six-month period of June-December, 1965. During the first part of 1966 the programs were debugged and the very important change procedure prepared that enables revision or deletion of a record. Coincident with the preparation of the programs, library staff began in July, 1965, the inputting of cataloging information. The expanded print chain was installed in June, 1965, and edit listings for proofing purposes were available in August. In order to test the programs and study the catalog's format, a first test catalog was prepared in January, 1966. A second test, incorporating the change procedure, was undertaken in April; and a third, partial, test was run in June.

THE MACHINE RECORD

When Stanford first considered the costs of a book catalog in 1962, it was quickly discovered that the most expensive element was reproduction of the individual pages. This factor influenced many decisions in design: the more entries per page, the fewer pages and the less overall expense. It became necessary then to consider which elements in a standard catalog entry could be omitted or abbreviated. Decisions were fairly simple to make. The collection duplicates almost entirely material in the main research library's collection, with full bibliographical information given in that library's union card catalog.
In addition, browsing is encouraged among the open shelves of the new library. The books are readily available should further information be required.

Along with the factor of cost another element appeared: the desire to make a book catalog that would be something more than reproductions of unit catalog cards. As this thought evolved, it was learned that more space could be saved in the catalog through abandonment of the unit card and main entry concept. Articles by Ralph H. Parker (6) and Wesley Simonton (7) were instrumental in developing this aspect of the system.

The Library was amenable to a short entry in the catalog, but the actual length was another matter. From a sampling of items cataloged, it was learned that more than 99 per cent of the entries would be less than 500 characters in length. There was considerably less certainty on maximum lengths for the individual units, or fields, composing each entry. Computer personnel argued in favor of a fixed-length machine record in order to simplify programming, and a successful compromise was made: there was to be a fixed-length record composed of one fixed-length field and six variable-length fields. Each record is 570 characters in length. For the few catalog entries that are extremely long it is possible to use two records for one catalog entry. The maximum length for any catalog entry is thus approximately 1,000 characters. It is possible to enter even longer units by dividing them into sections and entering each as an analytical entry. To speed input-output time and to conserve space on tape, the records are placed on magnetic tape in blocks of two records each. Each of the six variable-length fields in the record is individually tagged. It was learned during the preparation of a later program that it would be necessary to restrict the overall length of any one field, and it was agreed that the maximum length of any one of the variable-length fields would be 400 characters.
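In present-day notation, the record structure just described can be sketched as follows. The function names and the flat packing shown are our own illustration, not the original 1401 layout, which also carries an address-and-length directory for the variable fields.

```python
# Illustrative sketch of the 570-character record: one 71-character fixed
# field (Area 10) plus six variable-length fields of at most 400 characters,
# written to tape in blocks of two records each.

RECORD_LEN = 570
FIXED_LEN = 71        # Area 10
MAX_FIELD_LEN = 400   # cap on any one variable-length field

def pack_record(area10: str, variable_areas: list) -> str:
    """Build one 570-character record; pad the unused tail with blanks."""
    assert len(area10) == FIXED_LEN and len(variable_areas) == 6
    for field in variable_areas:
        if len(field) > MAX_FIELD_LEN:
            raise ValueError("field exceeds the 400-character limit")
    record = area10 + "".join(variable_areas)
    if len(record) > RECORD_LEN:
        raise ValueError("entry needs an overflow record")  # two records max
    return record.ljust(RECORD_LEN)

def block(records: list) -> list:
    """Blocks of two records each, as written to magnetic tape."""
    return [("".join(records[i:i + 2])).ljust(2 * RECORD_LEN)
            for i in range(0, len(records), 2)]
```

An entry longer than one record overflows into a second, giving the roughly 1,000-character ceiling noted above.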
Through a misunderstanding, the author did not realize that in tape storage an upper-case letter is equivalent to two characters, a factor not taken into account when record and field lengths were established. Fortunately, this minor error has occasioned no problem.

THE MASTER TAPE RECORD

The master tape record (Table 1) illustrates how all of the information appears on magnetic tape. (Figure 5 gives an example of the layout.)

Table 1. Map of Master Tape Record.

Position    Type of Information
1-30        Library of Congress classification
31-35       Size and/or format of publication (e.g., folio, Mfilm)
36-42       Volume number
43-44       Part number
45-46       Copy number
47          Type (blank: monograph, no anal.; 1: monograph, anals. made; 2: serial received in unbound form; 3: serial, unbound, anals. made; 4: serial received in bound form; 5: serial, bound, anals. made; 6: analytic; 7: author-title cross reference; 8: subject cross reference; 9: item on order)
48          Record indicator (program supplies "1" if there is an overflow record and "2" in second record)
49          Special location in library (code A-Z)
50          Change indicator (code C for revision; code D for delete)
51          Title indicator (code T if entry desired under title)
52          Shelf list indicator (code S if entry is to appear in shelf list only)
53-54       Year acquired (e.g., 67)
55-57       Month and year reported missing (e.g., 117 for Nov. 1967; it is assumed a book will be removed from the catalog if missing more than nine years)
58-71       Future codings
72-77       Address and length of main entry (Area 20) (three positions for address, three for length)
78-83       Address and length of conventional title (Area 30)
84-89       Address and length of title paragraph (Area 40)
90-95       Address and length of notes (Area 50)
96-101      Address and length of subject headings (Area 60)
102-107     Address and length of added authors and added titles (Area 70)
108-570     Variable-length fields

THE FIELDS

To simplify coding and keypunching, each field in the record is called an Area and numbered 10 through 70. As will be shown later, these numbers are not transferred to tape. A description of the seven fields in each record can give a good idea of the elements included in cataloging and how the unit card/main entry concept was abandoned.

Area 10 is the one fixed-length field in the record. It is 71 characters long and contains positions for call number, volume number, and copy number. In addition, it contains indicators for other elements: type of publication; record indicator (program supplied if there is overflow to a second record); special location in the library; change indicator; title indicator; shelf list indicator; year of acquisition; and date missing. Fourteen positions remain blank for future use.

Area 20 contains the main entry, Area 30 the conventional title, and Area 40 the title paragraph. The title paragraph includes: the title; author statement; edition statement; imprint, limited to publisher and date; and collation, limited to pagination. Area 50 contains notes. Subject headings are recorded in Area 60, entered one after another and separated one from another by a record mark, a symbol resembling a double dagger. Added authors and added titles are entered in Area 70, similarly separated by the record mark. Only added titles are entered in Area 70; if a catalog entry is desired under the title proper, the title indicator is marked in Area 10.
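The map in Table 1 amounts to a fixed layout that a program can decode positionally. The following sketch is ours (the original was written in SOPAT assembly, not Python); the table's positions are 1-based, while Python slices are 0-based.

```python
# Hypothetical decoder for the master tape record mapped in Table 1.

def parse_master_record(rec: str) -> dict:
    assert len(rec) == 570
    fields = {
        "call_number":    rec[0:30],   # positions 1-30
        "size_format":    rec[30:35],  # 31-35
        "volume":         rec[35:42],  # 36-42
        "part":           rec[42:44],  # 43-44
        "copy":           rec[44:46],  # 45-46
        "type":           rec[46],     # 47 (e.g., 9 = item on order)
        "record_ind":     rec[47],     # 48 (overflow indicator)
        "location":       rec[48],     # 49
        "change_ind":     rec[49],     # 50 (C = revise, D = delete)
        "title_ind":      rec[50],     # 51
        "shelf_list_ind": rec[51],     # 52
        "year_acquired":  rec[52:54],  # 53-54
        "missing":        rec[54:57],  # 55-57
    }
    # Positions 72-107: six (address, length) pairs of three digits each,
    # locating Areas 20-70 within the variable portion (positions 108-570).
    areas = {}
    for i, area in enumerate((20, 30, 40, 50, 60, 70)):
        start = 71 + 6 * i
        addr = int(rec[start:start + 3])
        length = int(rec[start + 3:start + 6])
        areas[area] = rec[addr - 1:addr - 1 + length] if length else ""
    fields["areas"] = areas
    return fields
```

The address-and-length directory is what lets six variable-length fields live inside a record of fixed overall length.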
PERSONAL NAMES

On the form of personal names in the catalog, it was decided to anticipate the Anglo-American Cataloging Rules, publication of which was imminent. In general, the title-page form of a personal author's name is used. On the one hand, this has meant a shorter record and greater simplicity in inputting data; on the other hand, it became necessary to maintain a name authority file when the form adopted for the book catalog differed from that established by the Library of Congress or earlier cataloging rules.

The relator, the element that describes the relationship of a person used as an entry to the work being cataloged (e.g., ed., tr., comp., illus.), is omitted in the heading to save space. The relationship is shown in the title paragraph. A heading in the book catalog, either author or subject, is printed once before a group of titles and repeated only if the titles associated with it are continued in another column.

In addition to not permitting use of the relator, the system does not permit in the author & title catalog "added" entries composed of an author and a title. In standard cataloging such a technique may be used instead of a separate analytical entry. In the author & title catalog, however, such a composite entry would establish a new "author" (name plus title of the work) and would file as a separate unit after all works by that author. In the subject catalog the author-title entry is permitted so that books about voluminous authors and their individual works may be better displayed.

THE CONVENTIONAL TITLE

The conventional title has been employed to assemble under an author's name editions of a work with variant titles. Collected writings of an author, or selections, are given the conventional title [Works] or [Selections]. Through a combination of coding and programming, they are entered first under an author's name before titles of individual works are listed.
(See in Figure 8 the entry under Karen Horney for an example.) The conventional title has meaning only as it is related to the main entry. For that reason it prints in the catalog only when preceded by the main entry.

THE TITLE PARAGRAPH AND THE UNIT RECORD

As summarized above, the title paragraph includes the title, author statement, edition statement, imprint, and collation. With one major exception this involves the copying, or truncation, of information present on a Library of Congress card. The exception is the author statement. As shown in a recent investigation (8), this element was present in but twenty-five per cent of the entries studied. Current cataloging rules permit in some cases the omission of the author statement when it is identical with the form used in the heading (9,10). These rules are based upon a cataloging system employing unit records on cards, the first element of which is the main entry. In unchanged form the author statement is used as the main entry; for added entries another heading, such as author, title, or subject, is superposed on the card.

In the Stanford system a new unit record was introduced. The first element of it is the title paragraph. All headings, main or added, are placed directly above it; and if entry under title is desired, a title entry is made in hanging-indention form. The Stanford book catalog thus does away with the main entry concept completely.

The necessity, or even wisdom, of setting apart one field in the machine record as main entry may be questioned. Why not group the main entry with the other added author entries in Area 70? There were two reasons: first, it is simpler to adapt the information from Library of Congress cataloging if the form can be followed relatively closely. Second, we wished to allow for the possibility of printing standard catalog cards if necessary, and this would allow for a reinstatement of the standard unit card concept.
A basic requirement of the system is that the author statement must be included in the title paragraph. If for any reason it cannot be listed there, then it is recorded in note position in Area 50. Although no formal study was undertaken, it was believed that works by single personal authors would constitute more than fifty per cent of the collection. The addition of the author statement in the title paragraph for each such book could add considerable bulk to the catalog. Accordingly, through the use of record marks as coding symbols, the author statement is set off in the title paragraph for those works by single personal authors. Through programming, the author statement is suppressed when the work is entered in the catalog under the name given in the main entry, whereas it appears under all "added" entries.

Fig. 1. The Coding Sheet.
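The suppression just described can be sketched as follows. The function, the rendering of the record mark as "+", and the sample data are all our own illustration under the assumption that a pair of record marks brackets the author statement; the original logic ran on the 1401.

```python
# Sketch: drop the set-off author statement when an entry prints under the
# name given in the main entry; keep it under all "added" entries.

REC_MARK = "+"  # stand-in for the record mark (a double-dagger-like symbol)

def print_title_paragraph(title_para: str, heading: str, main_entry: str) -> str:
    if title_para.count(REC_MARK) != 2:
        return title_para  # no set-off author statement present
    before, author_stmt, after = title_para.split(REC_MARK)
    if heading == main_entry:
        return before + after              # suppressed under the main entry
    return before + author_stmt + after    # shown under added entries
```

Under the author's own name the reader sees only the title; under a subject or added heading the ", by ..." statement reappears.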
The system as it has been established calls for works to be entered alphabetically by title under the heading in the catalog. In the subject catalog this means, too, that works are listed alphabetically by title under each subject heading and not by author. It was felt that this form of arrangement is quite satisfactory for a selective collection, such as the Meyer Library; and it offers the possibility of scanning a page of thirty or more entries. It has occasioned one problem: when a subject heading expresses form and not subject. For example, under "SYMPHONIES," works are arranged alphabetically by title and not by composer.

CONVERSION

The basic source document used is the Library of Congress catalog card. Although the card itself could be used by crossing out unnecessary information and adding other data, it was felt that a clearer document would result if the needed information were copied onto another catalog card. Examples of such cards are shown in Figure 1. (Subsequent figures depict the manipulation of the entry for the book by Thomas A. Bailey, Presidential Greatness.) As illustrated in Figure 1, an identification number is assigned to each catalog card, and four catalog cards are placed upon a coding board and a xerographic copy made. The original cards are filed in a manual shelf list with the identification number as an indication that the information has been coded. The coding sheet is given to the coder, who enters Area 10 information in the blocks at the right and indicates to the keypuncher where other areas begin and what special symbols should be used.

To simplify the inputting of data and the scanning of punched cards, a special data-processing card was designed for input (see Figure 2). Each title converted is represented by a decklet of punch cards averaging six. Each of the seven areas or fields begins on a separate card.

Fig. 2. The Data Processing Card.

Although this may be considered wasteful of cards and indicative of "80-column mentality," it does have its benefits. A mistake in punching in one area requires repunching of the material in that area only, the area being the smallest unit for editing purposes. So that the cards are kept in correct order for processing, the first ten columns of the card are used for identification numbers. The six-digit identification number assigned to the original card is punched in the first six columns.
The first digit is the month, 1-9 being January through September and zero being October-December, one "month" of 92 days; the second and third digits are the day of the month; and the fourth through sixth a consecutive number assigned each day. It is possible thus to code 999 entries each day. The year is omitted, because it was assumed data would be transferred to magnetic tape at least once a year and probably more often. The area number is punched in columns 7 and 8 and the sequence within the area in columns 9 and 10. Cataloging information begins in column 11. Figure 3 shows a decklet of cards for one title.

Information in Area 10 is formatted somewhat differently for books on order or for cross references. To enter a book on order, the word "ORDERED" is punched in columns 22 through 28, the date and order number in columns 29 through 40, and 9 in column 57, the type indicator. This information prints in the catalog as a call number, and books on order are listed first in the shelf list. To date, entering of books on order has been limited to a few sample cases only.

A cross reference in the subject catalog has a "call number" composed of the first nine characters of the entry to which reference is made (punched in columns 13-21) and an eight-digit identification number (in columns 22-29). An 8 is punched in column 57, the type indicator. A cross reference in the author & title catalog has a similar "call number" composed of the first ten characters of the entry to which reference is made (punched in columns 12-21) and an eight-digit identification number (in columns 22-29). A 7 is punched in column 57, the type indicator. Subject cross references are listed in the shelf list in "call number" order following books on order; and author & title cross references follow subject cross references.
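The numbering scheme above can be sketched as follows. The function names are ours, and the handling of the 92-day October-December "month" is an assumption: we take the day field then to count 1-92 from October 1, which the article does not spell out.

```python
# Sketch of the six-digit identification number and the ten-column card prefix.

def make_id(month: int, day_of_month: int, seq: int) -> str:
    """Month digit (1-9 = Jan.-Sep., 0 = Oct.-Dec.), two-digit day,
    three-digit daily sequence."""
    if not 1 <= seq <= 999:
        raise ValueError("only 999 entries can be coded per day")
    if month >= 10:
        # Assumption: for the 0 "month" the day counts 1-92 from October 1.
        day = day_of_month + {10: 0, 11: 31, 12: 61}[month]
        month_digit = 0
    else:
        day, month_digit = day_of_month, month
    return "%d%02d%03d" % (month_digit, day, seq)

def card_prefix(ident: str, area: int, card_in_area: int) -> str:
    """Columns 1-10 of each punched card: identification number (columns 1-6),
    area number (7-8), sequence within the area (9-10)."""
    return "%s%02d%02d" % (ident, area, card_in_area)
```

Sorting cards on these ten columns keeps each decklet, and the cards within each area, in processing order.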
The "call numbers" for cross references do not print in the catalog proper, and serve only as addresses to retrieve the cross reference from magnetic tape when a change or deletion is necessary.

Added copies and volumes are entered by preparing Area 10 information only and punching S in column 61, the shelf list indicator. The first copy entered is never coded explicitly as copy 1, even though a second copy is being simultaneously added. The program automatically identifies the first copy entered as copy 1, and the number prints in the shelf list when another copy is added or a volume or location is shown.

So long as cataloging information is on punch cards and not on magnetic tape, the 10-digit identification number is the device used to retrieve the information. When the information is transferred to magnetic tape, the identification number is lost and the call number becomes the identification device.

As shown in Figure 3, information in Area 40 (and in Areas 20, 30, 50, and 70 as well) continues to successive cards. It is not actually necessary to punch through column 80 before beginning a continuation card for an area, and experience indicates that corrections are simplified if blanks are left at the end of each card. The only requirement is that …
Fig. 4. The Edit List.

THE EDIT LIST

On a regular basis, generally once a week during activation, an edit list is run on the computer for cards punched since the last listing. An example of a page from this list is shown in Figure 4.
This list is proofed against original cataloging data as represented on the coding sheets. Information still remains on punched cards, and errors detected are corrected on the cards.

As an aid to proofing, the edit program generates a number of error messages: cards out of order; absence of Area 10 information; call number incorrectly formatted; an invalid character punched; information too long to fit into two machine records; information in one field more than 400 characters in length; an incorrect use of coding symbols (used in determining filing order and to set off author statements); and incorrect use of record marks in Areas 60 and 70.

A nagging problem encountered during proofing is the fact that it is done out of context. In the preparation of cards for a standard card catalog, a second form of proofing is possible when cards are filed and entries compared with headings already in the catalog to insure that they are compatible. With a machine performing this function, this further check is not practical with existing equipment. Similarly, the machine does not recognize human errors and will file a misspelled word as it was entered and not as the word it was meant to be. The discipline required for the accurate inputting of cataloging information intended for machine manipulation is at times frightening.

Differences between a book catalog and a card catalog became more obvious as work progressed. In a card catalog one can see from the typography, the stains on older cards, and the kind of card stock used, that information was entered at different times. One is more willing to tolerate the differences that appear. Because it is not generally possible to compare a number of entries at the same time in a card catalog, one misses many inconsistencies. On the other hand, in a machine-produced book catalog, one scans numerous entries with one glance.
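A few of the edit program's checks can be shown in miniature. This is our own toy rendering, not the original 1401 code; the message texts merely echo the list above.

```python
# Toy version of some checks attributed to the edit program (LB001).

MAX_FIELD = 400              # limit on any one variable-length field
MAX_ENTRY = 2 * (570 - 107)  # variable-field space in two machine records

def edit_checks(entry: dict) -> list:
    """entry maps area number (10, 20, ... 70) to its punched text."""
    errors = []
    if 10 not in entry:
        errors.append("absence of Area 10 information")
    for area, text in sorted(entry.items()):
        if len(text) > MAX_FIELD:
            errors.append("Area %d more than 400 characters in length" % area)
    if sum(len(t) for a, t in entry.items() if a != 10) > MAX_ENTRY:
        errors.append("information too long to fit into two machine records")
    return errors
```

The real program also checked card order, call number format, invalid characters, and record-mark usage, which require the raw card images rather than this simplified area map.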
Produced at the same printing, they appear of equal vintage even though they may have been entered at different times. The inconsistencies resulting from changing cataloging rules become very obvious. This is particularly annoying with respect to the matter of capitalization, and the effort to produce an internally consistent document is difficult.

COSTS FOR INPUTTING

The first year's experience (1965-66) has indicated a cost of $.40 per title for inputting of cataloging information. Indications are that this cost has remained constant for the second year. Included in inputting for each title is provision for all extra records needed for added volumes and copies and cross references. The cost does not include actual cataloging, preparation of the typed catalog card, overhead, or computer charges for the edit list. The cost may be broken down as follows:

Table 2. Inputting Costs, 25,000 Titles (1965-66)

Coding: 50 titles per hour @ $2.20 per hour                      $ 1,100.00
Keypunching: 12 titles per hour @ $2.20 per hour                   4,583.33
Proofing: 72 titles per hour @ $7.40 per hour (2 staff members)    2,569.43
Equipment: keypunch rental ($926.02); punch cards ($312.34);
  and coding sheets ($520.86)                                      1,759.22
Total                                                            $10,011.98

The Stanford experience indicates that over a period of time it is possible to input 100 titles per eight-hour day on each card punch, or approximately 2,000 per working month. This figure is based on a shortened catalog record as described, but includes provision for separate records for added volumes and copies and cross references. The staff employed at Stanford had no previous experience keypunching and were instructed either in a formal school for five days or, as is now done, on the job. Three or four staff members are trained for keypunching at all times and have regular schedules. With a staff of this size, punching can proceed on a steady basis in spite of vacations and illnesses.
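The figures in Table 2 follow directly from the stated rates for the 25,000 titles converted in 1965-66, and the arithmetic is easily checked:

```python
# Reproducing the Table 2 cost breakdown from the stated rates.

titles = 25000
coding      = titles / 50 * 2.20         # 50 titles per hour @ $2.20
keypunching = titles / 12 * 2.20         # 12 titles per hour @ $2.20
proofing    = titles / 72 * 7.40         # 72 titles per hour @ $7.40 (2 staff)
equipment   = 926.02 + 312.34 + 520.86   # rental, cards, coding sheets
total = coding + keypunching + proofing + equipment
# total comes to about $10,012 (Table 2 shows $10,011.98, each line having
# been rounded), which is very nearly the $.40 per title stated above.
```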
THE PROGRAMS

Systems design called for the preparation of eight computer programs. In addition, two package sort programs from IBM are employed for the filing of entries in the shelf list and in the author & title and subject catalogs. As experience increased, it was found that the updating of entries in the first annual catalog when merged with new data for the second annual catalog could be simplified if three utility programs were used; these also were prepared. A locally devised assembly language, SOPAT, similar to Autocoder, was used for the programs. The first program is the Edit Program (LB001), which processes the cards to prepare the edit list described above.

During the first two years of operation the basic pattern has been to prepare the weekly edit list and to transfer data to permanent storage on magnetic tape once every three months. (The punch cards are stored in another area on campus, as back-up.) The quarterly basis has coincided with the schedule for the program tests as well as the quarterly supplements and annual catalogs. In brief, the following happens:

1. Cataloging information is transferred from punch cards to magnetic tape through the Card to Tape Program (LB010).

2. Through a Call Number Sort (IBM Sort 7) the above records are arranged in call number order for a basic shelf list.

3. Through the Format and Update Program (LB020), all the necessary entries for the author & title catalog and subject catalog are generated from the above records and the shelf list is updated.

4. Through an Alphabetical Sort (IBM Sort 7 or IBSYS 7090 Sort, depending on the magnitude) the entries for the author & title and subject catalogs are arranged in alphabetical order. (Longer sorts have been run on an IBM 7090 Computer.)

5.
Through the Author-Title and Subject Update Program (LB050), the new entries created by LB020 are merged with existing entries, and existing records are deleted as specified by LB020.
6. The Author-Title and Subject Split Program (LB060) sets up the entries on magnetic tape (line length, indention) as they will appear in the catalog and establishes the two columns for each page.
7. The Author-Title and Subject Printout Program (LB070) prints the pages of the author & title and subject catalogs.
8. The Shelf List Split Program (LB030) performs the same function for the shelf list as undertaken by the Split Program for the author & title and subject catalogs (LB060).
9. The Shelf List Printout Program (LB040) prints the pages of the shelf list.

THE CHANGE PROCEDURE

It is possible to change information in a preceding supplement by following the change procedure. To change a call number, the entire entry is deleted, employing a record consisting of Area 10 and a delete symbol in the change indicator, and a new entry is inserted in a separate record. To change an area, a change record is prepared consisting of Area 10 information with the change indicator marked, plus a card (or cards) for any area(s) to be changed. For example, if there is an error in a subject heading, it is necessary to prepare only an Area 10 record plus an Area 60 change record showing the subject headings desired. The smallest unit for editing purposes between supplements is the Area.

CHANGES IN AN ANNUAL CATALOG

During systems design it was realized that machine time could be saved if a different procedure were followed to change information in the preceding year's catalog when merging it with fresh data to form a new annual catalog. The procedure used is to delete, through three utility programs (LB075, LB080, and LB090), entries that are to be changed and then enter them anew. The first of these programs (LB075) is a card-to-tape program.
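The quarterly run of steps 1 through 9 amounts to a chain of sorts and transformations over the tape records. The sketch below models the first four steps with simplified records; field names and sample data are illustrative assumptions, not the actual tape format or SOPAT code.

```python
# Illustrative sketch of the quarterly batch run (steps 1-4).

def card_to_tape(cards):                      # LB010
    return list(cards)

def call_number_sort(records):                # IBM Sort 7: basic shelf list
    return sorted(records, key=lambda r: r["call_no"])

def format_and_update(shelf_list):            # LB020: generate catalog entries
    entries = []
    for r in shelf_list:
        entries.append({"catalog": 1, "key": r["author"], "call_no": r["call_no"]})
        for subject in r["subjects"]:
            entries.append({"catalog": 2, "key": subject, "call_no": r["call_no"]})
    return entries

def alphabetical_sort(entries):               # IBM Sort 7 / IBSYS 7090 Sort
    return sorted(entries, key=lambda e: (e["catalog"], e["key"]))

cards = [
    {"call_no": "E176.1.B17", "author": "BAILEY, THOMAS A.",
     "subjects": ["PRESIDENTS--U.S."]},
    {"call_no": "DC335.B32", "author": "BAINVILLE, JACQUES", "subjects": []},
]
entries = alphabetical_sort(format_and_update(call_number_sort(card_to_tape(cards))))
```

The remaining steps (merge and delete, column split, printout) operate on `entries` in the same pipelined fashion.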
Special delete cards (essentially Area 10 information giving the call number) are prepared for each volume and copy to be deleted. They are transferred to magnetic tape and sorted into order by call number. In the second program, the Shelf List Delete Program (LB080), the last annual shelf list tape is read, entries specified by LB075 are deleted, and a new shelf list tape is written.

Since the author-title and subject files are not in call-number order, a table look-up technique is used in the third program, LB090, the Author-Title and Subject Delete Program. The table consists of call numbers (in proper sequence) for all records to be deleted. Each entry in the author-title and subject files is checked against the table and deleted if the call number for the entry is listed there. A revised author-title and subject tape is thus prepared.

RECORDS ON TAPE

The basic tape record follows the format of the master machine record described earlier. Figure 5 shows a tape dump of the master shelf list
as formatted in the Card to Tape Program (LB010) and after having been sorted into call-number order. As stated above, two records are placed in a block of 1,140 characters, each record with 570 characters. Printed with a limited print chain, some characters do not appear as they finally will. The record mark prints as a plus sign, and in the entry for the work by Kane the symbol indicating underscoring prints as a dollar sign. The word-separator characters and some other characters do not print at all; spaces are left to indicate their presence.

Fig. 5. Tape Dump of Master Shelf List.

Once the information has been processed through the Format and Update Program (LB020), however, the machine record is somewhat different. New records, one for each entry that will appear in the final catalog, are generated. A listing of the elements in each of these records is shown in Table 3. Figures 6 and 7 show tape dumps for author and subject entries.

Table 3. The Author-Title and Subject Tape Record.
Position     Type of Information
1            Catalog indicator (1: Author & Title Catalog; 2: Subject Catalog)
2-81         Major sort key
82-101       Minor sort key
102-131      Library of Congress classification number
132-136      Size and format
137          Record indicator (program supplied: "1" if there is an overflow record, "2" in the second record)
138          Delete indicator (program supplied, for use in the change procedure)
139          Address for main entry (Area 20) or added author/added title (Area 70)
142          Address for title paragraph (Area 40)
145          Address for conventional title (Area 30)
148          Address for notes (Area 50)
151          Address for subject heading (Area 60)
154-618      Variable-length fields

THE SORT KEY

The Format and Update Program (LB020) generates the sort key for each entry. The sort key determines the characters that will be considered when the entry is to be alphabetized. A succeeding sort program does the actual alphabetizing. After some study, experimentation, and conjecture a 100-character sort key was selected. It is in two parts: a major sort key of 80 characters and a minor sort key of 20 characters.

The major sort key is formed from the first 80 characters of the element that will serve as the entry in the catalog: the author, the added author, the title (for entry under title), the added title, or the subject heading.

Fig. 6. Tape Dump of Entries in Author & Title Catalog.

The minor sort key is formed from the first 20 characters of the element that follows the heading: the title or conventional title. The conventional title can never be a major sort key. The title, on the other hand, can be either a major sort key or a minor sort key.

During the course of the Split Program (LB060), major sort keys are compared. If two or more are identical, the entry words for the second and subsequent identical headings are suppressed and do not print. Under the heading that does print, entries are arranged in alphabetical order through the first twenty characters of the element generating the minor sort key.

In a sense, there is a third sort key. If both major and minor sort keys are identical, items will print in call-number order. In no case will the element generating the minor sort key be suppressed.
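The fixed-length key fields of Table 3 suggest a simple formation rule. This sketch assumes truncation and space-padding to the 80- and 20-character widths, which the article does not spell out.

```python
# Sketch of major/minor sort key formation (80 + 20 characters).
# Keys are truncated or space-padded to the fixed widths of Table 3.

def make_sort_key(entry_element, following_element):
    major = entry_element[:80].ljust(80)       # author, title, or subject heading
    minor = following_element[:20].ljust(20)   # title or conventional title
    return major + minor                       # 100-character sort key

key = make_sort_key("BAILEY, THOMAS A.", "PRESIDENTIAL GREATNESS")
```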
During the first test of the programs one mistake was discovered with respect to the generation of sort keys and the suppression of entry elements. In a few cases the library possessed multiple copies of the same book cataloged under different call numbers (for example, one as a separate and one as part of a series). The problem arose when there was to be an entry under title, with two major sort keys identical. A similar situation arose for periodicals. A periodical might be entered itself under its title (Area 40) and also appear in the catalog as an author (Area 20) for a book it might issue. To eliminate this problem, a minor change was made in the program: if a title generates a major sort key, it is never suppressed if identical with a preceding major sort key.

Fig. 7. Tape Dump of Entries in Subject Catalog.
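The suppression logic, including the exception added after the first test, can be sketched as follows; the `is_title` flag marking entries generated from a title is an illustrative assumption.

```python
# Sketch of heading suppression in the Split Program: a heading does not
# reprint when it equals the preceding major sort key, except that a
# heading generated from a title is never suppressed.

def print_forms(entries):
    printed, previous = [], None
    for major_key, is_title in entries:
        if major_key == previous and not is_title:
            printed.append("")          # suppressed: heading not repeated
        else:
            printed.append(major_key)
        previous = major_key
    return printed

forms = print_forms([
    ("BAILEY THOMAS A", False),
    ("BAILEY THOMAS A", False),                # same author: suppressed
    ("FACTS ABOUT THE PRESIDENTS", True),
    ("FACTS ABOUT THE PRESIDENTS", True),      # title entry: never suppressed
])
```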
FILING ORDER

Formation of sort keys leads immediately to a discussion of the unsolved problem of alphabetization. In the Meyer Library catalog the aim has been to duplicate as closely as possible the arrangement of entries found in the University Libraries' union card catalog. Basically, this means a word-by-word alphabetization. In addition, we have attempted to preserve as many of the currently used typing conventions as possible in preparing entries for the card catalog. For example, two or more initials separated one from another by periods have no spaces following internal periods. Thus, the abbreviation for the United States is typed as U.S. and not as U. S. Abbreviations filed as they are spelled, and Mc's and Mac's in separate sequences, are two of the major differences from standard manual library filing.

It was recognized that in generating the sort key the computer will scan the entry words character by character and space by space. Thus, it is important that each character and each space be positioned accurately. The computer checks a character and either interprets it as a blank, a letter, a numeral, or a symbol, or else ignores it. In alphabetizing, this basic rule is followed: a blank files before a letter (A through Z), and a letter files before a numeral (0 through 9).

Certain marks of punctuation are interpreted as a space. They are: period, comma, colon, semicolon, hyphen, and question mark. Some marks of punctuation are ignored. They are: parentheses, brackets, dollar sign, virgule or slash (/), equal sign, number or sharp (#), per cent, asterisk, apostrophe, flat sign, and ampersand. It was believed that the presence of a space on either side of an ampersand would place entries in correct order, but in some cases this did not happen.

Some diacritical marks change the value of the character with which they are associated. For example, an umlaut over an "a," "o," or "u" changes that character to "ae," "oe," or "ue" respectively.
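The classification rules above amount to a small normalizing scan; a sketch follows, with Unicode umlauts standing in for whatever card codes the original system used.

```python
# Sketch of sort-key character normalization: some punctuation becomes a
# space, some is ignored, and umlauts expand to two-letter equivalents.

SPACED  = set(".,:;-?")              # punctuation interpreted as a space
IGNORED = set("()[]$/=#%*'&")        # punctuation ignored (plus the flat sign)
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue"}

def normalize(text):
    out = []
    for ch in text.lower():
        if ch in SPACED:
            out.append(" ")
        elif ch in IGNORED:
            continue
        elif ch in UMLAUTS:
            out.append(UMLAUTS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

Since a blank files before a letter, the space generated by each period is what makes U.S.-HISTORY file before U.S. DEPT. OF STATE, as noted later in the text.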
NON-FILING SYMBOLS

If an alphabetical order is desired other than that explicitly given in the entry words, special symbols are employed at the time of coding. Since language of publication is not coded, it is necessary to place symbols around introductory articles for them to be ignored. The less-than (<) and greater-than (>) signs are the symbols used to set off a sequence of characters to be ignored. The placement of these symbols is important. For example, to eliminate the article from the title, The century of science, it is necessary to place the symbols in this manner: <The >century of science. In this way the sort key would be generated starting with the letter "C" in the word "century," and a space would be left between the words "the" and "century" in the printed heading.

Use of the non-filing symbols internally in an entry is limited and must be strictly controlled through recording of decisions in authority files. So that names filed under prefixes and written as two words will be filed in the same sequence as names written as one word, the non-filing symbols are employed internally. For example, in order that Van Buren and Vandenburg will file in the same sequence, Van Buren is coded as Van< >Buren. In this manner the computer is instructed to ignore the space when forming the sort key.

The use of non-filing symbols has proved quite useful in subject headings to arrange period subdivisions in chronological order when there is a word or words intervening between the heading and the date. Thus, with non-filing symbols employed as shown below, these particular subject headings are arranged chronologically:

GT. BRIT.-HISTORY-1042-1066
GT. BRIT.-HISTORY-1066-1154
GT. BRIT.-HISTORY-1066-1485
GT. BRIT.-HISTORY-1135-1154
GT. BRIT.-HISTORY-1154-1189
GT. BRIT.-HISTORY-1154-1216

THE FILING SYMBOL

The less-than and greater-than signs are provided on the expanded print chain purchased for the book catalog project.
To date it has not been found necessary to use these signs as symbols in titles, and so their use is restricted to their role in forming non-filing elements.

As work proceeded, need was felt for another symbol: one that would set off a field that would not print but which would be filed upon. For example, we wished to file the title, 1848: chapters of German history, as though it were written, Eighteen forty-eight: chapters of German history; yet we did not wish to violate the form of the title as given in the book. An examination of all characters in the print chain led us to sacrifice the symbol @ for use as a sign in its own right. It is used solely as a filing symbol. Thus, any characters or spaces placed between two @'s will generate a sort key as specified by those characters, but the information will not print. The title, 1848: chapters of German history, will be coded in this manner: @eighteen forty-eight@<1848>: chapters of German history. It will be filed as though it were written: Eighteen forty-eight: chapters of German history.

The use of the filing symbol has been especially useful in arranging period subdivisions in chronological order when treating years in the pre-Christian era or before the year 1000 A.D. For years in the pre-Christian era, coding permits chronological arrangement of years beginning with 9999 B.C. The following procedure is observed: the year in question is subtracted from 9999 and the resulting difference, preceded by the letter Z, is entered inside the filing symbols. The year will thus file after all letters but before years in the Christian era. For years before 1000 A.D. the leading 0 is simply placed inside the filing symbols, for example, @0@476. To illustrate further, here are three subject headings as manually filed:

ROME-HISTORY-REPUBLIC, 510-30 B.C.
ROME-HISTORY-REPUBLIC, 265-30 B.C.
ROME-HISTORY-AUGUSTUS, 30 B.C.-14 A.D.
They are coded in this manner:

ROME-HISTORY-@Z9489-Z9969@
ROME-HISTORY-@Z9734-Z9969@
ROME-HISTORY-@Z9969-0014@

The following sort keys are generated:

ROME-HISTORY-Z9489-Z9969
ROME-HISTORY-Z9734-Z9969
ROME-HISTORY-Z9969-0014

The headings will file in correct chronological order and print as originally shown above.

OBSERVATIONS ON FILING ORDER

With but few exceptions, the filing order as designed has proved a very satisfactory arrangement. It has been felt advisable to place notes at various points in the catalog to link together headings which are filed separately. For example, the abbreviation Mr. is filed as mr, and the word Mister is filed as mister. Here a note refers from one to the other. In the subject catalog it was discovered that if a country or local heading is abbreviated, two different alphabets are established. So far this has occurred for the United States (U.S.) and Great Britain (Gt. Brit.). The terminal period generates a space when the sort key is established. Thus subdivisions separated from the heading by a dash (two hyphens, equivalent to two spaces) are in fact separated by three spaces and file before jurisdictional or form subdivisions which do not require the dash. For example, U.S.-HISTORY files before U.S. DEPT. OF STATE. A note in the catalog gives instructions on the filing order in such a case.

Less fortunate is the situation of the author who chooses to use a name with a first initial and a complete middle name. Because of the period and space separating the first initial from the middle name, there are established two spaces. Thus, the following "incorrect" alphabetical order is established:

Smith, J. Russell
Smith, J.A.
Smith, J.C.

As may be expected, situations such as those described above do not occur often. It is hoped that through scanning of the open page before him the reader will find the correct heading.
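A minimal sketch of how the two symbol conventions interact when deriving the printed form and the filed (sort) form of a coded heading; the parsing details are assumptions of mine, since the original SOPAT routines are not shown.

```python
# Sketch of the two coding conventions: text in <...> prints but does not
# file; text in @...@ files but does not print. B.C. dates are coded as
# Z + (9999 - year) so they file after all letters, before A.D. years.
import re

def print_form(coded):
    coded = re.sub(r"@[^@]*@", "", coded)            # filing-only text dropped
    return coded.replace("<", "").replace(">", "")   # non-filing text kept

def sort_form(coded):
    coded = re.sub(r"<[^>]*>", "", coded)            # non-filing text dropped
    return coded.replace("@", "")                    # filing-only text kept

def bc_year(year):
    return "Z%04d" % (9999 - year)                   # e.g. 510 B.C. -> Z9489

coded = "@eighteen forty-eight@<1848>: chapters of German history"
```

The same pair of functions reproduces the Van Buren example: `print_form("Van< >Buren")` keeps the space while `sort_form` drops it.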
It may be argued that the coding required to achieve the alphabetical order in this catalog is too demanding for a project based upon use of a sophisticated electronic computer. Possibly, programming should have taken care of all of this work. It has been our belief that we have achieved, in terms of the present state of the art, a good balance between what the machine should do and what the human should do. In the process we have been able to keep the form of the information as it appears in the source. As examples, introductory articles have not been eliminated from titles, and Library of Congress subject headings have been retained (11). Most important, it has been possible to implement these rules consistently with a relatively inexperienced staff.

PAGE CREATION

With each entry created and alphabetized, the Author-Title and Subject Split Program (LB060) is called upon to generate the lines for the final catalog and create the two columns of each page on magnetic tape. The final program (LB070) prints the pages of the catalog from the tape.

The computer line printer permits the use of 132 print positions in each line. The type size is the same as pica type: ten characters to the horizontal inch, six lines to the vertical inch. It was decided that the completed page size for the book catalog should be 8½" x 11". With an allowance for an adequate margin on all four sides of the page, it was believed that the reduction necessary to employ the 132 characters in a line probably no longer than seven inches would be too great. Experimentation led us to accept a reduction to 68 per cent and use of 98 of the 132 print positions. This can, however, prove expensive, as the printer takes as long to print 98 characters as it does 132. The catalog page as designed calls for two columns, each 45 characters in length, with an eight-character margin between them.
The text is 80 lines, and the page is 84 lines in length because of the heading at the top and the page number centered at the bottom. Catalog entries are not split between columns, so the bottoms of the pages are rarely even. To simplify programming it was decided not to attempt programmed hyphenation of words or to require right and left justification of the lines in the catalog. The first words of a catalog entry are set flush left, and all successive lines are indented two spaces. The call number is set flush right on the last line of the entry if there are three spaces separating it from the last word of the entry; otherwise, it is set flush right on the following line. As stated earlier, entry words (authors, subject headings, added titles) are suppressed if they are the same as those found in a preceding entry and are repeated only at the head of a new column if the entries are continued there. Entry words are so clearly shown in the catalog that it was not considered necessary to use keys at the top of each page indicating which letters are included on that page.

Because an expanded print chain is employed, speed on the printer is considerably reduced, actually to 250 lines per minute. The printer requires eighteen seconds to print one page. The page image is approximately ten inches by fourteen inches. Through use of the Itek Platemaster, this image is reduced to 68 per cent and an offset master created for reproduction on offset equipment. Figures 8 and 9 show representative pages from the 1967 annual catalog.

The foregoing account has emphasized the preparation of a page in an annual catalog. Except for the size of the page and the kind of paper used, the identical process is followed for the preparation of the supplement. Through a switch setting, a forty-line page for the supplement is printed.
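The page geometry above can be checked with a little arithmetic: two 45-character columns plus the eight-character margin account for the 98 print positions used, and at ten characters to the inch a 98-character line reduced to 68 per cent comes out well under the seven-inch limit.

```python
# Checking the page-layout arithmetic from the text.
columns, margin = 45, 8
line_positions = 2 * columns + margin   # characters per printed line
line_inches = line_positions / 10       # pica: 10 characters to the inch
reduced_inches = line_inches * 0.68     # 68 per cent photographic reduction
```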
The supplement is printed on ten-ply paper (8½" x 11"), kept in unburst form, and bound at the top in post binders. Figures 10 and 11 show representative pages from the January, 1967, supplement and illustrate how some of the entries earlier depicted in Figures 6 and 7 appear in final form.

Through similar programs the shelf list is prepared and printed in essentially the same format as the supplement: a 98-character line and a page of forty lines. A key at the top of each page indicates the first call number on that page. The shelf list is printed on four-ply paper, and copies are distributed to important staff service points in the Main and Meyer Libraries. Figure 12 shows a copy of a page from the January, 1967, shelf list, depicting how the information from Figure 5 appears when finally printed. The lone call number, E176.1.B77, copy 2, is for an item of which copy one is represented in the 1966 annual shelf list.

THE FIRST ANNUAL CATALOG AND ITS SUPPLEMENTS

The first annual catalog was prepared during the summer of 1966, listing the 25,000 titles cataloged as of the end of June. The catalog was 2,804 pages long: 1,569 pages in the author & title catalog and 1,235 pages in the subject catalog.

Each page from the printer was first scanned by library staff and serious errors masked with white tape. In consequence, the user of the catalog will encounter an occasional blank on a page. The pages were then sent to the University's Photo Reproduction Service, where offset masters were created and fifty copies of each page reproduced. The Stanford University Press prepared the binding. Each set of the catalog was bound in red buckram in seven volumes, approximately 400 pages in each: four volumes for the author & title catalog, three volumes for the subject catalog. There is a title page in each volume and several pages of explanation on the use of the catalog.
Letters included in each volume are

Fig. 8. A Page in the Annual Author & Title Catalog.

Fig. 9. A Page in the Annual Subject Catalog.

Fig. 10. A Page from the January 1967 Author-Title Supplement.
imprinted on the spine. Fifty sets of the catalog were ready when the building opened in November, 1966. The shelf list, printed in four copies, contained 3,261 pages. Each set required seven binders.

Because activation of the Library continued through the first year of opening, the collection grew at a much greater rate than is anticipated for subsequent years. Hence when the building opened, there was available, besides the annual catalog, a first supplement, in ten copies, listing the 4,000 titles cataloged from July through September, 1966. Although it was proposed originally to prepare monthly supplements, factors of cost and staff time led to acceptance of quarterly supplements.
The second supplement, issued in January, 1967, included the 8,000 titles cataloged from July through December, 1966. A third supplement, issued in April, 1967, included the 12,000 titles cataloged from July, 1966, through March, 1967. The April supplement had 1,934 pages in its author & title section, 1,206 pages in the subject section, and 1,752 pages in the shelf list.

The major drawback to an off-line, batch-process book catalog is that it is an obsolete document when produced. This was especially true during the first year, when the library grew at the rate of 100 volumes per working day. As a partial remedy to this situation, a brief, dated catalog card accompanies each book cataloged for the Meyer Library. Information included consists of call number, author, and title. This card is placed in an alphabetical file at the reference desk and purged when a new supplement is issued. The Meyer Library staff considered the ten copies of the supplement inadequate for use in the building; during the second year twenty copies of each supplement are to be prepared by running the print program twice. Supplements in the second and succeeding years will, of course, be considerably shorter than those issued in the first year.

THE SECOND ANNUAL CATALOG

Preparation of the second annual catalog began in the spring of 1967. This catalog lists the 41,000 titles cataloged as of the end of June, 1967. The first procedure was to emend the 1966 tape by purging the entries to be changed or deleted; corrected entries come in with new data. In July the information for titles cataloged from April through June was transferred to magnetic tape and merged with the data in the April, 1967, supplement. All records were run through the Author-Title and Subject Update Program and at that point merged with the emended catalog from the preceding year; the Split Program was then run, and the pages for a new catalog were created.
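The tape-maintenance routine just described (purge changed and deleted entries from the prior master tape, then merge corrected and newly cataloged records) is the classic sequential master-file update. The sketch below illustrates that general pattern only, not the Stanford programs themselves; the record layout and the transaction codes ('D' delete, 'C' correct, 'A' add) are invented for illustration:

```python
# Sketch of a master-file update: apply a transaction file to a master
# file, producing a new master. Field layout and codes are hypothetical.

def update_master(master, transactions):
    """master: list of (key, record) sorted by key.
    transactions: list of (key, code, record);
    code is 'D' (delete), 'C' (correct/replace), or 'A' (add)."""
    tx = {key: (code, rec) for key, code, rec in transactions}
    new_master = []
    for key, rec in master:
        code, new_rec = tx.pop(key, (None, None))
        if code == 'D':
            continue                           # purge deleted entry
        elif code == 'C':
            new_master.append((key, new_rec))  # corrected entry replaces old
        else:
            new_master.append((key, rec))      # carry unchanged entry forward
    # remaining transactions are additions (newly cataloged titles)
    new_master.extend((k, r) for k, (c, r) in tx.items() if c == 'A')
    new_master.sort(key=lambda pair: pair[0])
    return new_master
```

On tape, both files would be kept in sort order and merged sequentially; the dictionary here simply stands in for that merge step.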
As in the preceding year, library staff scanned the completed pages and masked noticeable errors.

A Book Catalog at Stanford / JOHNSON 45

The Photo Reproduction Service printed 75 copies of each page during the first half of August, and the Stanford University Press bound the catalog during the following month. Completed sets of the catalog were delivered on September 20, 1967, a week before classes were to begin for the new academic year.

The 1967 catalog is 4,612 pages in length: 2,683 pages in the author & title catalog, divided into five volumes of 530 pages each, and 1,929 pages in the subject catalog, divided into four volumes of 480 pages each. As in 1966, there is a title page and an explanatory introduction in each volume. Floor plans of the library are on the end sheets, and imprinted in gold on the spine are the letters included in each volume, there being clean alphabetical breaks between volumes. The second annual shelf list is 5,634 pages long, and each of the four copies requires eleven binders. Some confusion resulted in 1966, when both author & title and subject catalogs were bound in the same color. In 1967 the author & title catalog was bound in tan bookcloth and the subject catalog in light green.

MACHINE TIMING

As the above figures demonstrate, the 1967 edition of the book catalog is no brief document. Similarly, the time required to process the information on the computer was not brief. As stated earlier, the addition of the expanded print chain considerably reduced the speed of the line printer: instead of printing in excess of 600 lines per minute, the printer was reduced to 250 lines per minute. This speed was determined by timings made of the print programs. To print each page in the annual author & title and subject catalogs, eighteen seconds were required. To print each page in the supplements or in the shelf list, ten seconds were required.
Thus, for example, to print the 4,612 pages in the 1967 annual catalog, twenty-three hours were required on the computer printer. In processing the supplements and annual catalogs, it has now become necessary to talk of time required for processing in terms of hours, not seconds or minutes. During the preparation of the 1967 annual catalog, timings were made of the various internal programs, whose output was magnetic tape and which were not tied to the mechanical limitations of the line printer. Sample times are shown in Table 4.

Table 4. Program Running Times

Format and Update Program (LB020)                 6.5 hours
Shelf List Split Program (LB030)                 11.2 hours
Author-Title and Subject Update Program (LB050)   3.7 hours
Author-Title and Subject Split Program (LB060)   28.5 hours

Throughout the year, time is required on the computer for the preparation of edit lists. Timing was conducted for this particular program as well: for each 100 records entered, four minutes of machine time are required to prepare an edit list.

The computer employed for the project is a university facility, and the Library was billed for its use at the rate of $32.00 per hour. The Library receives a monthly statement for various charges from the Administrative Data Processing Center, and these statements have served as one basic record in calculating the actual costs of the book catalog.
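The printing and billing figures above follow from simple arithmetic, which can be checked directly; all constants below are taken from the text (18 seconds per catalog page, $32.00 per hour, four minutes of machine time per 100 records on an edit list):

```python
# Verify the printing-time and edit-list arithmetic quoted in the text.

SECONDS_PER_CATALOG_PAGE = 18      # annual author & title / subject pages
PAGES_1967_CATALOG = 4_612

print_hours = PAGES_1967_CATALOG * SECONDS_PER_CATALOG_PAGE / 3600
print(f"printer time: {print_hours:.1f} hours")   # about 23 hours, as stated

RATE_PER_HOUR = 32.00              # university billing rate for computer time
EDIT_LIST_MIN_PER_100 = 4          # minutes of machine time per 100 records

def edit_list_cost(records):
    """Machine cost of preparing an edit list for `records` input records."""
    hours = (records / 100) * EDIT_LIST_MIN_PER_100 / 60
    return hours * RATE_PER_HOUR

print(f"edit list for 1,000 records: ${edit_list_cost(1000):.2f}")
```

At ten seconds per page, the same arithmetic puts the 1,752-page April shelf list at roughly five hours of printer time.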
The shelf list as finally designed and implemented is a far more sophisticated document than was then visualized. Second, no clear thought was given to inputting separate records for each added volume or copy of a given item in order to achieve an inventory control document as well as a classed listing of items in the library. Third, there was no clear determination as to the length of the sort field required and its effect on processing. Fourth, a principal study conducted to justify the book catalog compared its projected costs with the costs of three dictionary card catalogs in the new library. Although the book catalog was implemented, the three card catalogs were not, and we have no idea as to the accuracy of our calculations of their cost, even though our experience with the preparation of card catalogs is greater. Fifth, cost studies were based upon the preparation of a 40,000-title catalog as the first product. This was an unrealistic assumption, because the library was to open with only 25,000 titles in its collection.

Given such reservations and conditions, an effort has been made to summarize estimated costs and so attempt an understanding of how they compare to actual costs. Even the determination of actual costs is difficult. It must be borne in mind that the complete operation was performed "in house." Cost statements thus omit such necessary factors as overhead and considerable administrative supervision. For example, during the second year the Library was not charged for program maintenance, a significant contribution from the Administrative Data Processing Center.
Initial planning was based upon preparation of a 40,000-title (60,000-volume) catalog, and it is possible to present cost approximations in two sections: first, the costs required to prepare 50 copies of the 25,000-title catalog issued in 1966; and, second, the additional costs required to input the next 16,000 titles, issue three supplements, and prepare 75 copies of the 41,000-title (60,000-volume) catalog issued in 1967. They are shown in Table 5.

Table 5. Cost Approximations

                                              July 65-Aug. 66   Sept. 66-Aug. 67
Input (@ $.40 per title)                          $10,000            $ 6,400
Programming                                         5,945
Computer charges
  Edit lists                                        3,000              1,660
  Test catalogs                                     4,000
  Supplements                                                          4,460
  Annual catalog                                    2,500              4,950
Reproduction                                        4,570              5,270
Binding: 1966 (350 vols.); 1967 (675 vols.)           805              1,690
Binders for shelf list and supplements                 84                300
Totals                                            $30,904            $24,730

If we eliminate the costs directly related to the production of the 25,000-title catalog, we may be able to isolate the cost of the 41,000-title catalog issued in 1967. This calculation is subject to a certain amount of error, because some processing done in preparation of the 1966 catalog was used again in 1967. This may be compensated for, however, by the time required for the utility programs to emend and delete items from the 1966 tape. Test catalogs and their cost were not considered in early planning, and so their $4,000 cost is eliminated as well.

In Table 6 below, the actual costs for the 41,000-title catalog so derived are compared with the estimates prepared in the fall of 1964 and the estimates offered in April, 1965, at the conclusion of the preliminary systems design. Various adjustments have been made so that these figures are as comparable as possible. For example, the systems estimate did not include a cost for inputting, and this has been added.
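The line items in Table 5 can be cross-checked against its printed column totals. The assignment of individual figures to line items below reflects one reading of the table as printed, but both columns sum exactly to the stated totals:

```python
# Cross-check the column totals in Table 5 (figures as printed).
costs_1965_66 = {
    "Input": 10_000, "Programming": 5_945, "Edit lists": 3_000,
    "Test catalogs": 4_000, "Annual catalog": 2_500,
    "Reproduction": 4_570, "Binding": 805, "Binders": 84,
}
costs_1966_67 = {
    "Input": 6_400, "Edit lists": 1_660, "Supplements": 4_460,
    "Annual catalog": 4_950, "Reproduction": 5_270,
    "Binding": 1_690, "Binders": 300,
}
assert sum(costs_1965_66.values()) == 30_904   # total printed in Table 5
assert sum(costs_1966_67.values()) == 24_730   # total printed in Table 5
```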
The actual figures have been adjusted to include costs only for the printing and binding of fifty sets of the catalog instead of the seventy-five which actually were prepared. Although the December, 1964, estimate included under computer charges a factor for supplements, these are not included in the systems estimate or actual charges. The supplement, particularly, became so sophisticated in design and implementation, both in format and number of copies, that this discrepancy is minimal. These figures necessarily cannot be precise, but they give some idea of the magnitude of the work undertaken.

Although the cost figures indicate that the actual cost was more than fifty per cent greater than estimated in 1964, it does remain close to the estimate prepared in the systems design. The chief reasons for the discrepancy may be summarized as the underestimation of the amount of machine time needed for the various programs, the underestimation of the programming job involved, the underestimation of the charges for edit lists, and, most important, the design of a system very much more sophisticated than that originally foreseen in 1964.

Table 6. Comparison of Estimated and Actual Costs

                            Dec. 64      April 65     Actual
                            Estimate     Estimate     Costs
Input of 40,000 titles      $11,060      $16,647      $16,400
Computer charges              1,750        8,595        9,610
Reproduction (50 copies)      4,324        4,500        5,115
Binding                       3,750        2,385        1,600
Programming                   3,000        6,000        5,945
Totals                      $23,884      $38,127      $38,670

Even though costs were greater than expected, one estimate did hold up, namely the time required to complete the job. Delivery and installation of equipment, programming, program testing, inputting of data, reproduction, and binding: all were on schedule, with only minor slippage that did not affect the completion date of the overall job.

THE FUTURE

The publication of the second annual book catalog coincided with the completion of the library's activation project.
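The size of the discrepancy described in the text follows directly from the totals in Table 6:

```python
# How far actual costs exceeded each estimate (totals from Table 6).
dec_64_estimate = 23_884
april_65_estimate = 38_127
actual = 38_670

over_dec64 = (actual - dec_64_estimate) / dec_64_estimate * 100
over_apr65 = (actual - april_65_estimate) / april_65_estimate * 100
print(f"over Dec. 1964 estimate:  {over_dec64:.0f}%")   # about 62%
print(f"over April 1965 estimate: {over_apr65:.1f}%")   # about 1.4%
```

The actual total thus ran roughly 62 per cent over the December, 1964, figure ("more than fifty per cent greater") but within about 1.4 per cent of the April, 1965, systems design estimate.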
Continued work on the addition of materials to the new library has been assigned to existing divisions within the University Library. Inputting of cataloging information for the Meyer Library and preparation of supplements to the book catalog and of new annual catalogs are now functions of the Catalog Division. Growth of the library will henceforth proceed at a slower rate, with from 5,000 to 8,000 titles being added each year. The first by-product of the system has appeared: a listing of serial publications in the library for use in ordering and claiming operations.

Even as the first annual catalog was being prepared, the feeling was expressed that the equipment employed (the IBM 1401 computer) was not adequate in an economic sense to undertake this mission for increasingly larger masses of material. This feeling became clearer with the preparation of the second edition. Looking to the future, we see at present several paths we may follow.

First, studies are under way on the conversion of the book catalog operation to larger equipment, in this case an IBM System/360, probably linking it to the overall library program of automation. Not only might this change entail use of more powerful equipment for the off-line processing necessary to prepare a book catalog, but there may be possibilities as well of instituting on-line inquiry. The problem of supplements and time-lags could thus be eliminated.

Second, preliminary inquiries have also been made on the use of the existing tapes with computerized typesetting equipment. The hoped-for result would be the achievement of graphic arts quality on the book catalog page and less bulk in the completed catalog, through the greater legibility and greater density thus realized.
CONCLUSION

Such success as this project has achieved may be attributed to a number of factors:

The entire operation was performed "in house"; we were able to draw upon the skills of many staff members on the Stanford campus: in the Library, the Administrative Data Processing Center, the Photo Reproduction Service, the News and Publications Office, and the Stanford University Press.

IBM representatives, and particularly the systems engineer assigned to the project, gave considerable impetus and guidance to the undertaking.

Equipment was delivered on schedule and functioned well.

A particularly harmonious and understanding working relationship was achieved among the many participants in the project, and administrative support from Library and University officials was constant.

There never was any problem in gaining access to the computer, and the staff responsible for its operation gave devoted service in the preparation of the catalog.

Through a happy combination of circumstances, sufficient lead time was available for the project to be completed on schedule. When it became obvious that we should exceed the cost estimates originally prepared, Library funds were available to continue the work.

It is clear from student reactions that the book catalog is a useful tool in the new library, and it is hoped that the experience here recounted will prove valuable to the profession at large.

THE DEVELOPMENT AND ADMINISTRATION OF AUTOMATED SYSTEMS IN ACADEMIC LIBRARIES

Richard DE GENNARO: Harvard University Library, Cambridge, Mass.

The first part of this paper considers three general approaches to the development of an automation program in a large research library. The library may decide simply to wait for developments; it may attempt to develop a total or integrated system from the start; or it may adopt an evolutionary approach leading to an integrated system. Outside consultants, it is suggested, will become increasingly important.
The second part of the paper deals with important elements in any program regardless of the approach. These include the building of a capability to do automation work, staffing, equipment, organizational structure, selection of projects, and costs.

Since most computer-based systems in academic libraries are at present in the developmental or early operational stages, when improvements and modifications are frequent, it is difficult to make a meaningful separation between the developmental function and the administrative or management function. Development, administration, and operations are all bound up together and are in most cases carried on by the same staff. This situation will change in time, but it seems safe to assume that automated library systems will continue to be characterized by instability and change for the next several years. In any case, this paper will not attempt to distinguish between developmental and administrative functions but will instead discuss in an informal and non-technical way some of the factors to be considered by librarians and administrators when their thoughts turn, as they inevitably must, to introducing computer systems into their libraries or to expanding existing machine operations.

Alternative approaches to library automation will be explored first. There will follow a discussion of some of the important elements that go into a successful program, such as building a capability, a staff, and an organization. The selection of specific projects and the matter of costs will also be covered briefly.

APPROACHES TO LIBRARY AUTOMATION

Devising a plan for automating a library is not entirely unlike formulating a program for a new library building.
While there are general types of building best suited to the requirements of different types of library, each library is unique in some respects and requires a building especially designed for its own particular needs and situation. As there are no canned library building programs, so there are no canned library automation programs, at least not at this stage of development; therefore the first task of a library administration is to formulate an approach to automation based on a realistic assessment of the institution's needs and resources.

Certain newly founded university libraries, such as Florida Atlantic, which have small book collections and little existing bibliographical apparatus, have taken the seemingly logical course of attempting to design and install integrated computer-based systems for all library operations. Certain special libraries with limited collections and a flexible bibliographical apparatus are also following this course. Project INTREX at M.I.T. is setting up an experimental library operation parallel to the traditional one, with the hope that the former will eventually transform or even supersede the latter. Several older university libraries, including Chicago, Washington State, and Stanford, are attempting to design total systems based on on-line technology and to implement these systems in modules. Many other university libraries (British Columbia, Harvard, and Yale, to name only a few) approach automation in an evolutionary way and are designing separate, but related, batch-processing systems for various housekeeping functions such as circulation, ordering and accounting, catalog input, and card production. Still other libraries (Princeton is a notable example) expect to take little or no action until national standardized bibliographical formats have been promulgated and some order or pattern has begun to emerge from the experimental work that is in progress.
Only time will tell which of these courses will be most fruitful. Meanwhile the library administrator must decide what approach to take; and the approach to automation, like that to a building program, must be based on local requirements and available resources (1,2). For the sake of this discussion the principal approaches will be considered under three headings: 1) the wait-for-developments approach, 2) the direct approach to a total system, and 3) the evolutionary approach to a total system. The use of outside consultants will also be discussed.

The Wait-For-Developments Approach

This approach is based on the premise that practically all computer-based library systems are in an experimental or research-and-development stage with questionable economic justification, and that it is unnecessary and uneconomical for every library to undertake difficult and costly development work. The advocates of this approach suggest that library automation should not be a moon race, and say that it makes sense to wait until the pioneers have developed some standardized, workable, and economical systems which can be installed and operated in other libraries at a reasonable cost.

For many libraries, particularly the smaller ones, this is a reasonable position to take for the next few years. It is a cautious approach which minimizes costs and risks. For the larger libraries, however, it overlooks the fact that soon, in order to cope with increasing workloads, they will have to develop the capability to select, adapt, implement, operate, and maintain systems that were developed elsewhere. The development of this capability will take time and will be made more difficult by the absence of any prior interest and activity in automation within the adapting institution.
The costs will be postponed and perhaps reduced, because the late starters will be able to telescope much of the process, like countries which had their industrial revolution late. However, it will take some courage and political astuteness for a library administrator to hold firmly to this position in the face of the pressures to automate that are coming from all quarters, both inside and outside the institution (3).

A major error in the wait-for-developments approach is the assumption that a time will come when the library automation situation will have shaken down and stabilized, so that one can move into the field confidently. This probably will not happen for many years, if it happens at all, for with each new development there is another, more promising one just over the horizon. How long does one wait for the perfect system to be developed so that it can be easily "plugged in," and how does one recognize that system when one sees it? There is real danger of being left behind in this position, and a large library may then find it difficult indeed to catch up.

The Direct Approach To A Total System

This approach to library automation is based on the premise that, since a library is a total operating unit and all its varied operations are interrelated and interconnected, the logic of the situation demands that it be looked upon as a whole by the systems designers, and that a single integrated or total system be designed to include all machinable operations in the library. Such a system would make the most efficient and economical use of the capabilities of the computer. This does not require that the entire system be designed and implemented at the same time, but permits treating each task as one of a series of modules, each of which can be implemented separately, though designed as part of a whole.
Several large libraries have chosen this method and, while a good deal of progress is being made, these efforts are still in the early development stage. The University of Chicago system is the most advanced (4).

Unlike the evolutionary approach, which assumes that much can be done with local funds, home-grown staff, batch processing, and even second-generation computers, the total systems approach must be based on sophisticated on-line as well as batch-processing equipment. This equipment is expensive; it is also complex, requiring a trained and experienced staff of systems people and expert programmers to design, implement, and operate it effectively. Since the development costs involved in this approach are considerable, exceeding the available resources of even the larger libraries, those libraries that are attempting this method have sought and received sizable financial backing from the granting agencies.

The total systems approach has logic in its favor: it focuses on the right goal, and the goal will ultimately be attainable. The chief difficulty, however, is one of timing. The designers of these systems are trying to telescope the development process by skipping an intermediate stage in which the many old manual systems would have been converted to simple batch-processing or off-line computer systems, and the experience and knowledge thus acquired utilized in taking the design one step further into a sophisticated, total system using both on-line and batch-processing techniques. The problem is that we fully understand neither the present manual systems nor the implications of the new advanced ones. We are pushing forward the frontiers of both library automation and computer technology. It may well be that the gamble will pay off, but it is extremely doubtful that the first models of a total library system will be economically and technically viable.
The best that can be hoped for is that they will work well enough to serve as prototypes for later models. While bold attempts to make a total system will unquestionably advance the cause of library automation in general, the pioneering libraries may very well suffer serious setbacks in the process, and the prudent administrator should carefully weigh the risks and the gains of this approach for his own particular library.

The Evolutionary Approach To A Total System

This approach consists basically of taking a long-range, conservative view of the problem of automating a large, complex library. The ultimate goal is the same as that of the total systems approach described in the preceding section, but the method of reaching it is different. In the total systems approach, objectives are defined, missions for reaching those objectives are designed, and the missions are computerized, usually in a series of modules. In the evolutionary approach, the library moves from traditional manual systems to increasingly complex machine systems in successive stages to achieve a total system with the least expenditure of effort and money and with the least disruption of current operations and services (5). In the first stage the library undertakes to design and implement a series of basic systems to computerize various procedures using its own staff and available equipment. This is something of a bootstrap operation, the basic idea of which is to raise the level of operation - circulation, acquisitions, catalog input, etc. - from existing manual systems to simple and economical machine systems until major portions of the conventional systems have been computerized. In the process of doing this, the library will have built up a trained staff, a data processing department or unit with a regular budget, some equipment, and a space in which to work: in short, an in-house capability to carry on complex systems work.
During this first stage the library will have been working with tried and tested equipment and software packages - probably of the second generation variety - and meanwhile, third generation computers with on-line and time-sharing software are being debugged and made ready for use in actual operating situations. At some point the library itself, computer hardware and software, and the state of the library automation art will all have advanced to a point where it will be feasible to undertake the task of redesigning the simple stage-one systems into a new integrated stage-two system which builds upon the designs and operating experience obtained with the earlier systems. These stage-one systems will have been, for the most part, mechanized versions of the old manual systems; but the stage-two systems, since they are a step removed from the manual ones, can be designed to incorporate significant departures from the old way of doing things and take advantage of the capabilities of the advanced equipment and software that will be used. The design, programming, and implementation of these stage-two systems will be facilitated by the fact that the library is going from one logical machine system to another, rather than from primitive unformalized manual systems to highly complex machine systems in one step. Because existing manual systems in libraries produce no hard statistical data about the nature and number of transactions handled, stage-one machine systems have had to be designed without benefit of this essential data. However, even the simplest machine systems can be made to produce a wide variety of statistical data which can be used to great advantage by the designers of stage-two systems. The participation of non-library-oriented computer people in stage-two design will also be facilitated by the fact that they will be dealing with formalized machine systems and records in machine readable form with which they can easily cope. While the old stage one of library automation was one in which librarians almost exclusively did the design and programming, it is doubtful that stage-two systems can or should be done without the active aid of computer specialists. In stage one it was easier for librarians to learn computing and to do the job themselves than it was to teach computer people about the old manual systems and the job to be done to convert them. This may no longer be the case in dealing with redesign of old machine systems into very complex systems to run on third or fourth generation equipment in an on-line, time-sharing environment. There is now a generation of experienced computer-oriented librarians capable of specifying the job to be done and knowledgeable enough to judge the quality of the work that has been done by the experts. There is no reason why a team of librarians and computer experts should not be able to work effectively together to design and implement future library systems. As traditional library systems are replaced by machine systems, the specialized knowledge of them becomes superfluous, and it was this type of knowledge that used to distinguish the librarian from the computer expert. Just as there is a growing corps of librarians specializing in computer work, so there is a growing corps of computer people specializing in library work. It is with these two groups working together as a team that the hope of the future lies.
The question of who is to do library automation - librarians or computer experts - is no longer meaningful; library automation will be done by persons who are knowledgeable about it and who are deeply committed to it as a specialty; whether they have approached it through a background of librarianship or technology will be of little consequence. Experience has shown that computer people who have made a full-time commitment to the field of library automation have done some of the best work to date. Stage-two, or advanced integrated library systems, may be built by a team of library and computer people of various types working as staff members of the library, as has been suggested in the preceding discussion, but this approach also has its weaknesses. For example, let us assume that a large library has finally brought itself through stage one and is now planning to enter the second stage. It may have acquired a good deal of the capability to do advanced work, but its staff may be too small and too inexperienced in certain aspects of the work to undertake the major task of planning, designing, and implementing a new integrated system. Additional expert help may be needed, but only on a temporary basis during the planning and design stages. Such people will be hard to find, and also hard to hire within some library salary structures. They will be difficult to absorb into the library's existing staff, administrative, and physical framework. They may also be difficult to separate from the staff when they are no longer needed.

USE OF OUTSIDE CONSULTANTS

There are alternative approaches to creating advanced automated systems. The discussion that follows will deal with one of the most obvious: to contract much of the work out to private research and development firms specializing in library systems.
What comes to mind here is an analogy with the employment of specialized talents of architects, engineers, and construction companies in planning and building very large, complex and costly library buildings, which are then turned over to librarians to operate. When a decision has been made to build a new building, the university architect is not called in to do the job, nor is an architect added to the library staff, nor are librarians on the staff trained to become architects and engineers qualified to design and supervise the construction of the building. Most libraries have on their staffs one or two librarians who are experienced and knowledgeable enough to determine the over-all requirements of the new building, and together they develop a building program which outlines the general concept of the building and specifies various requirements. A qualified professional architect is commissioned to translate the program into preliminary drawings, and there follows a continuing dialogue between the architect and the librarians which eventually produces acceptable working drawings of a building based on the original program. For tasks outside his area of competence, the architect in turn engages the services of various specialists, such as structural and heating and ventilating engineers. Both the architect and the owners can also call on library consultants for help and advice if needed. The architect participates in the selection of a construction company to do the actual building and is responsible for supervising the work and making sure that the building is constructed according to plans and contracts. Upon completion, the building is turned over to the owners, and the librarians move in and operate it and see to its maintenance. In time, various changes and additions will have to be made.
Minor ones can be made by the regular buildings staff of the institution, but major ones will probably be made with the advice and assistance of the original architect or some other. In the analogous situation, the library would have its own experienced systems unit or group capable of formulating a concept and drawing up a written program specifying the goals and requirements of the automated system. A qualified "architect" for the system would be engaged in the form of a small firm of systems consultants specializing or experienced in library systems work. Their task, like the architect's, would be to turn the general program into a detailed system design with the full aid and participation of the local library systems group. This group would be experienced and competent enough to make sure that the consultants really understood the program and were working in harmony with it. After an acceptable design had emerged from this dialogue, the consultant would be asked to help select a systems development firm which would play a role similar to that of the construction company in the analog: to complete the very detailed design work and to do the programming and debugging and implementation of the system. The consultant would oversee this work, just as the architect oversees the construction of a building. The local library group will have actively participated in the development and implementation of the system and would thus be competent to accept, operate, maintain and improve it. Success or failure in this approach to advanced library automation will depend to a large extent on the competence of the "architect" or consultant who is engaged. Until recently this was not a very promising route to take for several reasons.
There were no firms or consultants with the requisite knowledge and experience in library systems, and the state of the library automation art was confused and lacking in clear trends or direction. It was generally felt that batch-processing systems on second and even third generation computing equipment could and should be designed and installed by local staff in order to give them necessary experience and to avoid the failures that could come from systems designed outside the library. Library automation has evolved to a point where there is a real need for advanced library systems competence that can be called upon in the way that has been suggested, and individuals and firms will appear to satisfy that need. It is very likely, however, that the knowledge and the experience that is now being obtained in on-line systems by pioneering libraries such as the University of Chicago, Washington State University and Stanford University, will have to be assimilated before we can expect competent consultants to emerge. The chief difficulty with the architect-and-building analog is that while the process of designing and constructing library buildings is widely understood, there being hundreds of examples of library buildings which can be observed and studied as precedents, the total on-line library system has yet to be designed and tested. There are no precedents and no examples; we are in the position of asking the "architect" to design a prototype system, and therein lies the risk. After this task has been done several times, librarians can begin to shop around for experienced and competent "architects" and successful operating systems which can be adapted to their needs. The key problem here, as always in library automation, is one of correct timing: to embark on a line of development only when the state of the art is sufficiently advanced and the time is ripe for a particular new development.
BUILDING THE CAPABILITY FOR AUTOMATION

Regardless of the approach that is selected, there are certain prerequisites to a successful automation effort, and these can be grouped under the rubric of "building the capability." To build this capability requires time and money. It consists of a staff, equipment, space, an organization with a regular budget, and a certain amount of know-how which is generally obtained by doing a series of projects. Success depends to a large extent on how well these resources are utilized, i.e. on the overall strategy and the nature and timing of the various moves that are made. Much has already been said about building the capability in the discussion on the approaches to automation, and what follows is an expansion of some points that have been made and a recapitulation of others.

Staff

Since nothing gets done without people, it follows that assembling, training, and holding a competent staff is the most important single element in a library's automation effort. The number of trained and experienced library systems people is still extremely small in relation to the ever-growing need and demand. To attract an experienced computer librarian and even to hold an inexperienced one with good potential, libraries will have to pay more than they pay members of the staff with comparable experience in other lines of library work. This is simply the law of supply and demand at work. To attract people from the computer field will by the same token require even higher salaries. In addition, library systems staff, because of the rate of development of the field and the way in which new information is communicated, will have to be given more time and funds for training courses and for travel and attendance at conferences than has been the case for other library staff.
The question of who will do library automation - librarians or computer experts - has already been touched upon in another context, but it is worth emphasizing the point that there is no unequivocal answer. There are many librarians who have acquired the necessary computer expertise and many computer people who have acquired the necessary knowledge of library functions. The real key to the problem is to get people who are totally committed to library automation whatever their background. Computer people on temporary loan from a computing center may be poor risks, since their professional commitment is to the computer world rather than that of the library. They are paid and promoted by the computing center and their primary loyalty is necessarily to that employer. Computer people, like the rest of us, give their best to tasks which they find interesting and challenging, and by and large, they tend to look upon the computerization of library housekeeping tasks as trivial and unworthy of their efforts. On the other hand, a first-rate computer person who has elected to specialize in library automation and who has accepted a position on a library staff may be a good risk, because he will quickly take on many of the characteristics of a librarian yet without becoming burdened by the full weight of the conventional wisdom that librarians are condemned to carry. The ideal situation is to have a staff large enough to include a mixture of both types, so that each will profit by the special knowledge and experience of the other. To bring in computer experts inexperienced in library matters to automate a large and complex library without the active participation of the library's own systems people is to invite almost certain failure.
Outsiders, no matter how competent, tend to underestimate the magnitude and complexity of library operations; this is true not only of computing center people but also of independent research and development firms. A library automation group can include several different types of persons with very different kinds and levels of qualifications. The project director or administrative head should preferably be an imaginative and experienced librarian who has acquired experience with electronic data processing equipment and techniques, and an over-all view of the general state of the library automation art, including its potential and direction of development. There are various levels of library systems analysts and programmers, and the number and type needed will depend on the approach and the stage of a particular library's automation effort. The critical factor is not numbers but quality. There are many cases where one or two inspired and energetic systems people have far surpassed the efforts of much larger groups in both quality and quantity of work. Some of the most effective library automation work has been done by the people who combine the abilities of the systems analyst with those of the expert programmer and are capable of doing a complete project themselves. A library that has one or two really gifted systems people of this type and permits them to work at their maximum is well on the way to a successful automation effort. As a library begins to move into development of on-line systems, it will need specialist programmers in addition to the systems analysts described above. These programmers need not be, and probably will not be, librarians. Other members of the team, again depending on the projects, will be librarians who are at home in the computer environment but who will be doing the more traditional types of work, such as tagging and editing machine catalog records.
In any consideration of library automation staff, it would be a mistake to underestimate the importance of the role of keypunchers, paper tape typists, and other machine operators; it is essential that these staff members be conscientious and motivated persons. They are responsible for the quality and quantity of the input, and therefore of the output, and they can frequently do much to make or break a system. A good deal of discussion and experimentation has gone into the question of the relative efficiency of various keyboarding devices for library input, but little consideration is given to the human operators of the equipment. Experience shows that there can be large variations in the speed and accuracy of different persons doing the same type of work on the same machine.

Equipment

One of the lessons of library automation learned during the last few years is that a library cannot risk putting its critical computer-based systems onto equipment over which it has no control. This does not necessarily mean that it needs its own in-house computer. However, if it plans to rely on equipment under the administrative control of others, such as the computer center or the administrative data processing unit, it must get firm and binding commitments for time, and must have a voice in the type and configuration of equipment to be made available. The importance of this point may be overlooked during an initial development period, when the library's need for time is minimal and flexible; it becomes extremely critical when systems such as acquisitions and circulation become totally dependent on computers.
People at university computing centers are generally oriented toward scientific and research users and in a tight situation will give the library's needs second priority; those in administrative data processing, because they are operations oriented, tend to have a somewhat better appreciation of the library's requirements. In any case, a library needs more than the expressed sympathy and goodwill of those who control the computing equipment - it needs firm commitments. For all but the largest libraries, the economics of present-day computer applications in libraries make it virtually impossible to justify an in-house machine of the capacity libraries will need, dedicated solely or largely to library uses. Even the larger libraries will find it extremely difficult to justify a high-discount second generation machine or a small third generation machine during the period when their systems are being developed and implemented a step or a module at a time. Eventually, library use may increase to a point where the in-house machine will pay for itself, but during the interim period the situation will be uneconomical unless other users can be found to share the cost. In the immediate future, most libraries will have to depend on equipment located in computing or data processing centers. The recent experience of the University of Chicago Library, which is pioneering on-line systems, suggests that this situation is inevitable, given the high core requirements and low computer usage of library systems. Experience at the University of Missouri (6) suggests that the future will see several libraries grouping to share a machine dedicated to library use; this may well be preferable to having to share with research and scientific users elsewhere within the university.
A clear trend is not yet evident, but it seems reasonable to suppose that in the next few years sharing of one kind or another will be more common than having machines wholly assigned to a single library; and that local situations will dictate a variety of arrangements. While it is clear that the future of library automation lies in third-generation computers, much of their promise is as yet unfulfilled, and it would be premature at this point to write off some of the old, reliable, second-generation batch-processing machines. The IBM 1401, for example, is extremely well suited for many library uses, particularly printing and formatting, and it is a machine easily mastered by the uninitiated. This old workhorse will be with us for several more years before it is retired to Majorca along with obsolete Paris taxis.

Organization

When automation activity in a library has progressed to a point where the systems group consists of several permanent professionals and several clericals, it may be advisable to make a permanent place for the group in the library's regular organizational structure. The best arrangement might be to form a separate unit or department on an equal footing with the traditional departments such as Acquisitions, Cataloging, and Public Services. This Systems Department would have a two-fold function: it would develop new systems and operate implemented systems; and it would bring together for maximum economy and efficiency most of the library's data processing equipment and systems staff. It will require adequate space of its own and - above all - a regular budget, so that permanent and long-term programs can be developed and sustained on something other than an ad hoc basis. There are other advantages to having an established systems department or unit.
It gives a sense of identity and esprit to the staff; and it enables them to work more effectively with other departments and to be accepted by them as a permanent fact of life in the library, thereby diminishing resistance to automation. Let there be no mistake about it - the systems group will be a permanent and growing part of the library staff, because there is no such thing as a finished, stable system. (There is a saying in the computer field which goes "If it works, it's obsolete.") The systems unit should be kept flexible and creative. It should not be allowed to become totally preoccupied with routine operations and submerged in its day-to-day workload, as is too frequently the case with the traditional departments, which consequently lose their capacity to see their operations clearly and to innovate. Part of the systems effort must be devoted to operational systems, but another part should be devoted to the formulation and development of new projects. The creative staff should not be wasted running routine operations. There has never been any tradition for research and development work in libraries - they were considered exclusively service and operational institutions. The advent of the new technology is forcing a change in this traditional attitude in some of the larger and more innovative libraries which are doing some research and a good deal of development. It is worth noting that a concomitant of research and development is a certain amount of risk but that, while there is no such thing as change without risk, standing pat is also a gamble. Not every idea will succeed and we must learn to accept failures, but the experiments must be conducted so as to minimize the effect of failure on actual library operations. Automated systems are never finished - they are open-ended.
They are always being changed, enlarged, and improved; and program and system maintenance will consequently be a permanent activity. This is one of the chief reasons why the equipment and the systems group should be concentrated in a separate department. The contrary case, namely dispersion of the operational aspects among the departments responsible for the work, may be feasible in the future as library automation becomes more sophisticated and peripheral equipment becomes less expensive, but the odds at this time appear to favor greater centralization. The Harvard University Library has created, with good results, a new major department along the lines suggested above, except that it also includes the photo-reproduction services. The combination of data processing and reprography in a single department is a natural and logical relationship and one which will have increasingly important implications as both technologies develop concurrently and with increasing interdependence in the future. Even at the present time, there is sufficient relationship between them so that the marriage is fruitful and in no way premature. While computers have had most of the glamour, photographic technology in general, and particularly the advent of the quick-copying machine, during the last seven years has so far had a more profound and widespread impact on library resources and services to readers than the entire field of computers and data processing. Within the next several years, computer and reprographic technology will be so closely intertwined in libraries as to be inseparable. It would be a mistake to sell reprography short in the coming revolution.

PROJECT SELECTION

No academic library should embark on any type of automation program without first acquiring a basic knowledge of the projects and plans of the Library of Congress, the National Library of Medicine, the National Library of Agriculture, and certain of their joint activities, such as the National Serials Data Program. As libraries with no previous experience with data processing systems move into the field of automation, they frequently select some relatively simple and productive projects to give experience to the systems staff and confidence in machine techniques to the rest of the library staff. Precise selection will depend on the local situation, but projects such as the production of lists of current journals (not serials check-in), lists of reserve books, lists of subject headings, circulation, and even acquisitions ordering and accounting systems are considered to be the safest and the most productive type of initial projects. Since failures in the initial stage will have serious psychological effects on the library administration and entire staff, it is best to begin with modest projects. Until recently it was fashionable to tackle the problem of automating the serials check-in system as a first project on the grounds that this was one of the most important, troublesome, and repetitive library operations and was therefore the best area in which to begin computerization. Fortunately, a more realistic view of the serials problem has begun to prevail - that serial receipts is an extremely complex and irregular library operation and one which will probably require some on-line updating capabilities, and complex file organization and maintenance programs. In any case, it is decidedly not an area for beginners. A major objection to all of the projects mentioned is that they do not directly involve the catalog, which is at the heart of library automation.
Now that the MARC II format has been developed by the Library of Congress and is being widely accepted as the standardized bibliographical and communications format, the most logical initial automation effort for many libraries will be to adapt to their own environments the input system for current cataloging which is now being developed by the Library of Congress. The logic of beginning an integrated system with the development of an input sub-system for current cataloging has always been compelling for this author - far more compelling than beginning in the ordering process, as so many advocate. The catalog is the central record, and the conversion of this record into machinable form is the heart of the matter of library automation. It seems self-evident that systems design should begin here with the basic bibliographical entry upon which the entire system is built. Having designed this central module, one can then turn to the acquisitions process and design this module around the central one. Circulation is a similar secondary problem. In other words, systems design should begin at the point where the permanent bibliographical record enters the system and not where the first tentative special-purpose record is created. Unfortunately, until the advent of the standardized MARC II format, it was not feasible, except in an experimental way, for libraries to begin with the catalog record, simply because the state of the art was not far enough advanced. The development and acceptance of the MARC II format in 1967 marks the end of one era in library automation and the beginning of another. In the pre-MARC II period every system was unique; all the programming and most of the systems work had to be done by a library's own staff.
In the post-MARC II period we will begin to benefit from systems and programs that will be developed at the Library of Congress and elsewhere, because they will be designed around the standard format and for at least one standard computer. As a result of this, automation in libraries will be greatly accelerated and will become far more widespread in the next few years (7). An input system for current cataloging in the MARC II format will be among the first packages available. It will be followed shortly by programs designed to sort and manipulate the data in various ways. A library will require a considerable amount of expertise on the part of its staff to adapt these procedures and programs to its own uses (we are not yet at the point of "plugging-in" systems), but the effort will be considerably reduced and the risks of going down blind alleys with homemade approaches and systems will be nearly eliminated for those libraries that are willing to adopt this strategy. The development and operation of a local MARC II input system with an efficient alteration and addition capability will be a prerequisite for any library that expects to learn to make effective use of the magnetic tapes containing the Library of Congress's current catalog data in the MARC II format, which will be available as a regular subscription in July, 1968. In addition to providing the experience essential for dealing with the Library of Congress MARC data, a local input system will enable the library to enter its own data both into the local systems and into the national systems which will begin to emerge in the near future. Since the design of the MARC II format is also hospitable to other kinds of library data, such as subject-headings lists and classification schedules, the experience gained with it in an input system will be transferable to other library automation projects.
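Much of this transferability rests on the general structure of the MARC communications record: a fixed-length leader, a directory of fixed-length entries, and a string of variable-length fields. A minimal sketch of that leader/directory layout follows, in a modern notation for clarity. It is illustrative only: the tags and field contents are invented for the example, indicators and subfield delimiters are omitted, and the leader positions not used here are simply padded with blanks, so it should not be read as the Library of Congress specification itself.

```python
FT = "\x1e"  # field terminator
RT = "\x1d"  # record terminator

def build_record(fields):
    """Pack (tag, data) pairs into a single leader/directory/fields record."""
    directory, body = "", ""
    for tag, data in fields:
        field = data + FT
        # Each directory entry: 3-char tag, 4-digit length, 5-digit start.
        directory += "%s%04d%05d" % (tag, len(field), len(body))
        body += field
    directory += FT
    base = 24 + len(directory)           # 24-character leader + directory
    length = base + len(body) + len(RT)  # total record length
    # Leader: positions 0-4 record length, 12-16 base address of data;
    # the remaining leader positions are padded with blanks in this sketch.
    leader = "%05d%7s%05d%7s" % (length, "", base, "")
    return leader + directory + body + RT

def parse_record(record):
    """Recover the (tag, data) pairs by walking the directory."""
    base = int(record[12:17])
    directory = record[24:base - 1]      # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        fields.append((tag, record[base + start:base + start + length - 1]))
    return fields
```

The point of the design is visible even in this toy: a program can locate any field from the directory alone, without scanning the variable-length data, which is what lets programs written elsewhere process records they have never seen.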
COSTS

The price of doing original development work in the library automation field comes extremely high - so high that in most cases such work cannot be undertaken without substantial assistance from outside sources. Even when grants are available, the institution has to contribute a considerable portion of the total cost of any development effort, and this cost is not a matter of money alone; it requires the commitment of the library's limited human resources. In the earlier days of library automation attention was focused on the high cost of hardware, computer and peripheral equipment. The cost of software, the systems work and programming, tended to be underestimated. Experience has shown, however, that software costs are as high as hardware costs or even higher. The development of new systems, i.e., those without precedents, is the most costly kind of library automation, and most libraries will have to select carefully the areas in which to do their original work. For those libraries that are content to adopt existing systems, the costs of the systems effort, while still high, are considerably less and the risks are also reduced. These costs, however, will probably have to be borne entirely by the institution, as it is unlikely that outside funding can be obtained for this type of work. The justification of computer-based library systems on the basis of costs alone will continue to be difficult because machine systems not only replace manual systems but generally do more and different things, and it is extremely difficult to compare them with the old manual systems, which frequently did not adequately do the job they were supposed to do and for which operating costs often were unknown. Generally speaking, and in the short run at least, computer-based systems will not save money for an institution if all development and implementation costs are included.
They will provide better and more dependable records and systems, which are essential to enable libraries simply to cope with increased intake and workloads, but they will cost at least as much as the inadequate and frequently unexpansible manual systems they replace. The picture may change in the long run, but even then it seems more reasonable to expect that automation, in addition to profoundly changing the way in which the library budget is spent, will increase the total cost of providing library service. However, that service will be at a much higher level than the service bought by today's library budget. Certain jobs will be eliminated, but others will be created to provide new services and services in greater depth; as a library becomes increasingly successful and responsive, more and more will be demanded of it.

CONCLUSION

The purpose of this paper has been to stress the importance of good strategy, correct timing, and intelligent systems staff as the essential ingredients for a successful automation program. It has also tried to make clear that no canned formulas for automating an academic library are waiting to be discovered and applied to any particular library. Each library is going to have to decide for itself which approach or strategy seems best suited to its own particular needs and situation. On the other hand, a good deal of experience with the development and administration of library systems has been acquired over the last few years and some of it may very well be useful to those who are about to take the plunge for the first time. This paper was written with the intention of passing along, for what they are worth, one man's ideas, opinions, and impressions based on an imperfect knowledge of the state of the library automation art and a modest amount of first-hand experience in library systems development and administration.

REFERENCES

1.
Wasserman, Paul: The Librarian and the Machine (Detroit: Gale, 1965). A thoughtful and thorough review of the state of the art of library automation, with some discussion of the various approaches to automation. Essential reading for library administrators.
2. Cox, N. S. M.; Dews, J. D.; Dolby, J. L.: The Computer and the Library (Newcastle upon Tyne: University of Newcastle upon Tyne, 1966). American edition published by Archon Books, Hamden, Conn. Extremely clear, well-written and essential book for anyone with an interest in library automation.
3. Dix, William S.: Annual Report of the Librarian for the Year Ending June 30, 1966 (Princeton: Princeton University Library, 1966). One of the best policy statements on library automation; a comprehensive review of the subject in the Princeton context, with particular emphasis on the "wait-for-developments" approach.
4. Fussler, Herman H.; Payne, Charles T.: Annual Report 1966/67 to the National Science Foundation from the University of Chicago Library; Development of an Integrated, Computer-Based, Bibliographical Data System for a Large University Library (Chicago: University of Chicago Library, 1967). Appended to the report is a paper given May 1, 1967, at the Clinic on Library Application of Data Processing conducted by the Graduate School of Library Science, University of Illinois. Mr. Payne is the author, and the paper is entitled "An Integrated Computer-Based Bibliographic Data System for a Large University Library: Progress and Problems at the University of Chicago."
5. Kilgour, Frederick G.: "Comprehensive Modern Library Systems," in The Brasenose Conference on the Automation of Libraries, Proceedings (London: Mansell, 1967), 46-56. An example of the evolutionary approach as employed at the Yale University Library.
6. Parker, Ralph H.: "Not a Shared System: an Account of a Computer Operation Designed Specifically and Solely for Library Use at the University of Missouri," Library Journal, 92 (Nov.
1, 1967), 3967-3970.
7. Annual Review of Information Science and Technology (New York: Interscience Publishers), 1 (1966)- . A useful tool for surveying the current state of the library automation art and for obtaining citations to current publications and reports is a chapter on automation in libraries which appears in each volume.

2927 ----

AUTOMATED BOOK ORDER AND CIRCULATION CONTROL PROCEDURES AT THE OAKLAND UNIVERSITY LIBRARY

Lawrence AULD: Oakland University, Rochester, Michigan

Automated systems of book order and circulation control using an IBM 1620 Computer are described as developed at Oakland University. Relative degrees of success and failure are discussed briefly.

INTRODUCTION

Oakland University, affiliated with Michigan State University and founded in 1957, offers degree programs at the bachelor's and master's levels. By September, 1967, 3,896 students were enrolled and continuing growth is anticipated in coming years. The library had holdings of 86,755 volumes and 17,908 units of microform materials on July 1, 1967. Although young, Oakland's library has already encountered a host of problems common to most academic libraries. In recognizing a need to automate or otherwise improve basic routines of handling book ordering and circulation control, Oakland is simply another member of a growing club. The book order system developed at Oakland is noteworthy because of certain features which may be unique: a title index to the on-order file, a computer prepared invoice-voucher form, and a computer prepared voucher card which serves as input to the computer for writing payment checks. In logic the system is related, through parallel invention, to the Machine Aided Technical Processing System developed at Yale University (1). The system developed with unit record equipment at the University of Maryland is perhaps more directly related, particularly in the use of the purchase order as a vendor's report form (2,3).

94 Journal of Library Automation Vol. 1/2 June, 1968

The Pennsylvania State University Library design for automated acquisitions, which uses a similar purchase order, includes the capacity for an elaborate and variable method for reporting the progress of each item from initial order to completion of cataloging (4,5). The IBM 357 circulation control system developed at Southern Illinois University, Carbondale, set the pattern followed by most subsequent systems (6,7). Oakland's circulation control system, a variation of the IBM 357 system, is more flexible than some because it uses trigger cards to control machine operations. This paper, originally distributed to a relatively small group of persons and redrafted for a more general reading, presents a case study of how one institution in modest circumstances set about solving certain problems. It describes not systems to be copied but rather a learning process which will continue for many years to come.

BACKGROUND

During the winter of 1964/65, Oakland University Library laid out the plans and began work on a program of automation of the University Library. An initial four-phase plan was conceived: 1) book order, 2) circulation control, 3) serials acquisitions, and 4) a printed book catalog. These housekeeping routines were felt to be the foundation for developing further automation in the library. Their automation would liberate the staff, clerical and professional, from such nonproductive and repetitive tasks as alphabetizing and re-copying of bibliographic information. An early decision to learn by doing rather than attempting to design the ultimate system in advance was supported by the University administration. Consensus being that a larger computer to replace the IBM 1620 would be delivered within two years, computer programs were planned to be useful for twenty-four to thirty-six months.
Work on developing the book order system was begun in March, 1965; perhaps an all-time speed record was achieved when the system was put into use on July 1 of the same year. Work on a circulation control system was begun in August and on February 21, 1966, it too was ready. Phases three and four, serials acquisitions and the printed book catalog, were by then being held in abeyance until larger computer equipment should become available to the library. At Oakland University all computer and related services are provided by the Computing and Data Processing Center. The computer system includes the following pieces of equipment:

IBM 1620 Computer, 40K with Monitor 1 and additional instructions feature (MF, TNF, TNS)
IBM 1622 card reader/punch (240 cpm / 125 cpm)
Two IBM 1311 disk drives with changeable disk packs
IBM 1443 line printer (240 lpm)

Only one of the two disk drives is available for production use because the other is committed to monitor, supervisor, and stored programs. A disk pack on the IBM 1620 can accommodate two million numeric or one million alphabetic characters. The computer language used for most of the library programs is 1620 SPS (Symbolic Programming System); Fortran is used for some computational work. Equipment within the Library consists of an IBM 026 printing keypunch which is used for the order system and an IBM 357 data collection device, including a time clock, with output via a second IBM 026 printing keypunch for the circulation system.

BOOK ORDER PROCEDURE

As may be inferred from a bird's-eye view of the order system (Figure 1), the initial input to the computer is decklets of punched cards.

Fig. 1. Flow Chart of Book Order System.

Output from the computer is a series of printouts: purchase orders, Library of Congress card orders, Oakland University invoice-vouchers, a complete
on-order listing with title and purchase order number indices, departmental listings, and budget summaries. Faculty and library staff submit requests for book purchases to the Acquisitions Department on a specially designed Library Book Request Form (Figure 2). The 5x8-inch size provides adequate room for notes, checking marks, etc., and makes for improved legibility, which in turn makes for easier, faster, and more accurate keypunching.

Fig. 2. Book Order Request Form.

The request form calls for the bibliographic data customarily required for book purchasing, plus date of ordering, code number for the department originating the order, and vendor number. Oakland University utilizes campus-wide a five-digit vendor code system; since the Library's vendor numbers are a part of the University's vendor code, this interface is one of several points where the book order system ties in with other University records and procedures. A tag number is assigned to each Library Book Request Form upon its arrival in the Acquisitions Department. After routine bibliographic identification is completed, decklet cards (Figure 3) are keypunched. The individual cards in each decklet are kept together by the tag number, punched into columns one through five. To keep the cards in order within decklets, column six is punched to identify the type of card as 1) author, 2) title, 3) place and publisher, or 4) miscellaneous information. Column seven indicates the card number within type of card. For example, code 11 in columns six and seven would be the first author card and code 12 the second.
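The decklet card layout just described can be sketched directly. Columns 1-7 follow the article; the 80-column card width is standard, but the use of the remaining columns for text is an assumption for illustration.

```python
# Decklet card layout per the article: columns 1-5 carry the tag number,
# column 6 the card type (1 author, 2 title, 3 place and publisher,
# 4 miscellaneous), column 7 the sequence within type. Using columns
# 8-80 for text content is an illustrative assumption.
CARD_TYPES = {"1": "author", "2": "title",
              "3": "place and publisher", "4": "miscellaneous"}

def punch_card(tag, type_code, seq, text):
    """Lay out one 80-column card as a string."""
    return f"{tag:05d}{type_code}{seq}{text:<73.73}"

def read_card(card):
    """Recover the fields from a punched card image."""
    return {
        "tag": int(card[0:5]),
        "type": CARD_TYPES[card[5]],
        "seq": int(card[6]),
        "text": card[7:].rstrip(),
    }

card = punch_card(42, "1", 1, "AULD, LAWRENCE")
print(len(card), read_card(card)["type"])   # -> 80 author
```

The tag number in columns 1-5 is what keeps a multi-card decklet together through sorting, exactly as the arbitrary stamped number does for a multi-card entry in the keypunching article later in this issue.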
Fig. 3. Decklet Cards.

Each book has a machine readable book card (Figure 7). The period for which the book normally circulates is indicated with a letter code punched into column one; column two identifies the collection within the library from which the material came; column three identifies the type of material. The call number and/or other identifying information is punched into columns four through forty-one. Column forty-two is punched with an end-of-transmission code.

Fig. 7. Book Card.

The IBM 357 data collection device will perform only one operation without special instructions. If it is to perform more than one operation, it must receive instructions for each variant operation and it must receive them each time the variant operation is performed. This limitation can be met in one of three ways: by not admitting variant operations, by using a cartridge as a carrier for some information, or by providing special instructions as they are needed via a "trigger" card. Denying the existence of a variant operation was not practical, because at Oakland the identification of a borrower constitutes a set of variant operations. The Library's clientele includes not only Oakland University students, faculty, and staff, but also residents from the surrounding communities, area high school students, and neighboring college students.
The heaviest users are Oakland's own students and faculty, who have machine readable plastic identification cards issued by the Registrar or the Personnel Office. It has been impractical for the Library to attempt to issue similar cards to guest borrowers. Thus, the identification of a borrower is a set of variant operations. Use of a cartridge to gain the borrower identification number would be possible but would leave the borrower identification badge unused. This badge card constitutes an official identification card and as such should be utilized throughout the University whenever practical. Trigger cards to instruct the 357 in the performance of variant operations were developed to control the recording of borrower identification and to identify discharging and certain charging functions. The use of trigger cards provides flexibility, in that machine instructions are carried in trigger cards and are not an integral part of the book cards. A change in machine configuration would probably not require repunching book cards for the book collection. At the same time a wide range of 357 machine functions are made possible through the use of different trigger cards. In short, the adoption of trigger cards provides the greatest degree of flexibility in operating the 357. In the customary borrowing procedure the student brings a book to the circulation desk and presents it, along with his machine readable student ID card, to the desk attendant. The attendant first inserts the book card into the IBM 357 data collection device, then retrieves the book card and inserts a "student badge trigger card", which activates the badge reader on the 357. Then the badge is inserted into the badge reader, completing the transaction.
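A trigger card, in effect, tells the 357 which variant operation the input that follows belongs to. The sketch below models that dispatch logic in miniature; the card kinds, payloads, and sample call numbers are hypothetical stand-ins, not the 357's actual signal codes.

```python
# Miniature model of trigger-card dispatch (assumed codes, not the IBM
# 357's actual signals): a book card only carries data; the trigger card
# that follows it selects the operation performed on that data.
def process(cards):
    """cards: sequence of (kind, payload) tuples in desk order."""
    log, pending_book = [], None
    for kind, payload in cards:
        if kind == "book":
            pending_book = payload                 # book card read first
        elif kind == "trigger" and payload == "discharge":
            log.append(("discharge", pending_book))
        elif kind == "trigger" and payload == "student badge":
            pass                                   # activates the badge reader
        elif kind == "badge":
            log.append(("charge", pending_book, payload))
    return log

tx = process([
    ("book", "JK0421.P4"), ("trigger", "student badge"), ("badge", "000006266"),
    ("book", "JC0251.L27"), ("trigger", "discharge"),
])
print(tx)   # -> [('charge', 'JK0421.P4', '000006266'), ('discharge', 'JC0251.L27')]
```

The design point the article makes survives in the model: changing what the machine does means changing trigger cards, not repunching every book card in the collection.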
By remote control this has created on an IBM 026 printing keypunch a card with the following information: typical loan period, collection from which the item came, type of material, call number, borrower type, borrower's identification number, the day of the year, and the time of day secured from an on-line clock. If the borrower does not have a machine readable badge card, an alternate method of charging a book is to use a "manual entry trigger card" which activates the manual entry unit, with which can be recorded numeric information identifying the borrower. With special trigger cards books can also be charged to reserve, bindery, or "missing". Books are discharged by passing the book card through the 357 and following it by a "discharge trigger card". Monday through Friday at closing the charge and discharge cards for the day are delivered to the Computing and Data Processing Center, where they are processed by the IBM 1620 computer system. The circulation file is maintained on a disk pack similar to that for the order system. Three reports are received from the Computing and Data Processing Center: a daily cumulative listing of all books and materials in circulation (Figure 8); a cumulative weekly list of all books on long-term loan; and a weekly fines-due report. In addition, overdue notices, computer printed on mailable postcard stock, are sent weekly to the Library where they are audited before being mailed. The fines-due report is arranged by borrower, bringing together in one place all of the borrower's delinquencies; the books which he has neglected to return are listed here, as are the overdue books which he returned through the outdoor book return chute. For the latter the number of days overdue at the time of return is listed.
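Because each charge card carries the day of the year, the overdue and fines-due computations reduce to simple day-of-year arithmetic. The loan-period table and fine rate below are assumptions for illustration; the article does not give Oakland's actual values.

```python
# Day-of-year arithmetic behind the fines-due report. The letter codes,
# loan periods, and fine rate are illustrative assumptions, not the
# article's actual tables.
LOAN_DAYS = {"S": 14, "L": 90}       # hypothetical column-1 loan codes
FINE_PER_DAY = 0.05                  # hypothetical fine rate, dollars

def days_overdue(loan_code, charged_day, today):
    """All arguments are day-of-year numbers within the same year."""
    due = charged_day + LOAN_DAYS[loan_code]
    return max(0, today - due)

od = days_overdue("S", 199, 220)     # charged on day 199, checked on day 220
print(od, round(od * FINE_PER_DAY, 2))   # -> 7 0.35
```

The same arithmetic serves both cases the report distinguishes: books still out (today is the processing day) and books returned late through the chute (today is the discharge day recorded on the card).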
Subsequent refinements introduced into this system include two additional reports: a pre-notice report in call number sequence produced two days in advance of the fines-due report and a listing of books discharged each day. The pre-notice report makes it possible to search the shelves for books which have been returned but, because of time lag, may still have overdue notices generated. Normal turn-around time for the system is 24 hours, but on weekends it goes to 63 hours and at certain holiday periods even higher. The daily list of discharges documents the return and discharge of each book and is used to answer the student who says, "But I returned the book."

Fig. 8. Example of Short Term Circulation Report (a dated listing headed "SHORT TERM BOOKS IN CIRCULATION," with columns for call number, borrower identification number, day of year due, and an overdue flag).
Maximum file capacity will permit up to about 9,000 charges at one time. Assuming an average life of four weeks for each charge, the maximum number of transactions which can be accommodated in one year is about 115,000. The circulation control system utilizes eight programs. All are written in 1620 SPS and utilize 40K storage. (An additional computational program not included in the production package is written in Fortran.) With only minor modification the programs could be made to work with 20K storage. The individual programs are described in Table 2.

Table 2. Circulation Control System Programs

LIB 201 - To update file and to print short- and long-term reports.
LIB 202 - To print overdue notices and fines-due report.
LIB 204 - Phase 1 routine for LIB 202.
LIB 205 - Cold start program to "seed" circulation file.
LIB 207 - To restart files from one term to the next.
LIB 209 - To print pre-notice report.
LIB 212 - To print daily discharges.
LIB 213 - To print circulation file or part thereof.

APPRAISAL

The book order system has been described as it was originally designed, and the circulation control system as designed and modified. A partial update together with a critical appraisal follows. Implicit in the planning of both systems was the assumption that the IBM 1620 would eventually be replaced by a larger and faster machine and that both systems would be redesigned and augmented. However, the IBM 1620 is continuing in use for a maximum rather than minimum projected time. In July, 1965, Oakland initiated an accelerated library development program. Overnight the book budget projection for several years was available and in less than three months the book order system was consequently overloaded. With the disk file filled and many orders waiting, drastic action was required.
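The yearly capacity figure quoted above follows from the turnover rate: with a four-week average life per charge, the 9,000-charge file turns over roughly thirteen times a year.

```python
# Checking the capacity estimate: 9,000 simultaneous charges, each
# living about four weeks, turn over about 52 / 4 = 13 times a year.
max_charges = 9_000
turnovers_per_year = 52 // 4
print(max_charges * turnovers_per_year)   # -> 117000, near the quoted "about 115,000"
```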
The most obvious solution seemed to be use of an additional changeable disk pack to expand the purchase order file, but this procedure would have been hopelessly unwieldy. To use a second pack would require either that all transactions be run against both disk packs, roughly doubling computer time and costs, or that each transaction be addressed to a particular disk pack, which would necessitate extensive systems redesign. Another proposed solution was to revert to a completely manual system, but the Order Section preferred, if at all possible, to retain the automated fiscal control and invoice-voucher preparation features of the order system. The alternative finally adopted required a basic philosophical change in the system. As originally designed, the system accounted for a book from the time it was placed on order to the time it was cataloged and placed on the shelf. The disk file was one-half occupied with items received and paid for but not yet cataloged. By purging the file of such items, an on-order file in the narrowest sense was created and a doubling of file capacity gained. Now a new problem was created. How was a book to be accounted for that had been received, paid, and purged from the on-order file, but not yet cataloged? The solution was to print a second (carbon) copy of the LC card order slip which would be hand-filed into the card catalog; there it would serve as an on-order/in-process slip until replaced by a catalog card. Hand-filed slips replacing a machine-filed list further altered the philosophical basis of the system. Discrepancies in entry do occur, but not so often that the expedient does not work. Four months later the system was again overloaded and a routine had to be devised whereby purchase orders could be issued either manually or through the computer. However, all items were still paid via the computer and all invoice-vouchers computer prepared. Fiscal control was retained even though the rationale of the system was violated.

During the summer of 1967 a change of a different nature was implemented. As originally designed the system provided constant communication between the Library and each faculty department through the departmental report. But, after the changes described above, the departmental report now included less than one-half of the items being purchased with the department's book fund allocation. It had ceased to serve any purpose and was omitted after July, 1967, with a consequent reduction of nearly two-fifths of line-printer time required for the book order system. To the question, "Would it be better to return to a completely manual system for ordering books?" the answer by the Order Section has always been "No, retention of the automated system for fiscal control and voucher preparation is preferable, even with the patched system at hand." Nor should it be forgotten that the book order system as originally designed worked well until the demand on it exceeded its production capacity. Also to be recognized is the gain in experience and insight by the library staff during these three years. Reading about or visiting someone else's work is enlightening but day-to-day work brings an understanding for which it is difficult to obtain a substitute.

ACKNOWLEDGMENTS

Four persons deserve special recognition for the roles they played in the foregoing: Dr. Floyd Cammack, former University Librarian, without whose imagination and courage library automation at Oakland would not have been attempted; Mr. Donald Mann, Assistant Director, Computing and Data Processing Center, an outstanding systems analyst and programmer; Mrs. Edith Pollock, Head of the Order Section, who likes computers; Mrs. Nancy Covert, Head of Circulation Department, who likes students.

REFERENCES

1.
Alanen, Sally; Sparks, David E.; Kilgour, Frederick G.: "A Computer-Monitored Library Technical Processing System," in American Documentation Institute: Proceedings of the Annual Meeting, v. 3, 1966 (Woodland Hills, Calif.: Adrianne Press, 1966), p. 419-26.
2. Cox, Carl R.: "The Mechanization of Acquisitions and Circulation Procedures at the University of Maryland Library," in International Business Machines Corporation: IBM Library Mechanization Symposium (Endicott, N.Y.: 1964), p. 205-35.
3. Cox, Carl R.: "Mechanized Acquisitions Procedures at the University of Maryland," College & Research Libraries, 24 (May 1965), 232-36.
4. Minder, Thomas L.: "Automation - The Acquisitions Program at the Pennsylvania State University Library," in International Business Machines Corporation: IBM Library Mechanization Symposium (Endicott, N.Y.: 1964), p. 145-56.
5. Minder, Thomas L.; Lazorick, Gerald: "Automation of the Penn State University Acquisitions Department," in International Business Machines Corporation: IBM Library Mechanization Symposium (Endicott, N.Y.: 1964), p. 157-63. (Reprinted from American Documentation Institute: Automation and Scientific Communication; Short Papers Contributed to the Theme Sessions of the 26th Annual Meeting ... (Washington: 1963), p. 455-59.)
6. DeJarnett, L. R.: "Library Circulation Control Using IBM 357's at Southern Illinois University," in International Business Machines Corporation: IBM Library Mechanization Symposium (Endicott, N.Y.: 1964), p. 77-94.
7. McCoy, Ralph E.: "Computerized Circulation Work: a Case Study of the 357 Data Collection System," Library Resources & Technical Services, 9 (Winter 1965), 59-65.

2928 ----

CREATION OF COMPUTER INPUT IN AN EXPANDED CHARACTER SET

Donald V.
BLACK: System Development Corporation, Santa Monica, California (Formerly, University of California, Santa Cruz, Calif.)

Keypunching of an expanded character set for library catalog data is described. The set included 101 different characters. Source documents were shelf list cards, the master record at the University of California Library, Santa Cruz. At the end of February, 1967, some 50 million characters, representing more than 110,000 separate titles, had been punched. Some of the considerations leading to the adoption of this method for the creation of machine readable input are given, and details on costs and production rates.

For manipulation by a computer, data must be converted to machine readable form. There are still only a few reasonably flexible means of creating machine readable records, especially if the data include an expanded character set. Five possible methods utilize one of the following: standard keypunch, paper tape-producing typewriter, optical character reader, keyboard device that encodes directly onto magnetic tape, or keyboard terminal that inputs directly into a computer. Descriptions of some of these methods are available in the literature. The Johns Hopkins University (1) used optical character recognition, which can handle a full alphanumeric representation, whereas Southern Illinois (2) used mark-sense scanning to convert only a limited amount of information. Cartwright (3) and IBM (4) discuss direct computer input from a keyboard terminal. Buckland (5) discusses the use of the paper tape-producing typewriter. Hammer (6) and Kilgour (7) discuss keypunching. Patrick (8) discusses several methods of conversion, but only in the abstract. Chapin (9) presents the results of a comparison test of the first three methods above. This paper does not discuss the relative merits of these methods, but presents the details of a system that has converted approximately 50 million characters of library catalog data on more than 110,000 titles, with a character set of 101 characters.
The University of California at Santa Cruz is one of three university campuses recently established by the State. It opened for business in the fall of 1965 with a core collection of some 55,000 titles in approximately 75,000 volumes. Early in the operation of the Library, it was decided to mechanize as much as possible; therefore the existing catalog records had to be converted if the original collection were to be a part of the future machine system. The creation of the core collection for the three new campuses of the University of California has been described in the literature (10).

METHODS

Bids were sought to convert the catalog records during the summer of 1965. The shelf list record produced by the new campuses' project was the master record and was to be the source for conversion. Unfortunately, the shelf list consisted of both printed Library of Congress cards and cards produced at the new campuses' project from typewritten multilith masters. No editing was to be done on the shelf list cards. The only addition was the stamping of an arbitrary number using a five-digit automatic numbering machine, the purpose of the number being to keep individual punch cards together for each entry. Weighing the responses to the request for bids was a disheartening experience. Only four responses were received from a total of 15 requests sent out. The bid request did not specify the method to be used to convert to machine readable form, but only the resulting machine readable record. Since the specifications had used punch cards as an example, perhaps this limited the thinking of some of the organizations involved, with the result that they did not choose to bid. Three bids were based on keypunching. One was from Florida, and the complexities of the task made the choice of such a distant company impossible. If problems had arisen during the course of the conversion, travel costs would have been excessive. Another response estimated the cost to be about $1.50 per record.
early, tllls was too costly, and since bids of this nature are apt to be ~::ervative in 'h. matter ~f ~Itimate tot~1 costs, we I~lt the choice 01 eth an mgam7.abon to do tne lob would, mdeed, result m a target figure at would be too high. I ?nly one bid used optical scanning as the method 01 conversion. Un· orufately, the bid was for me scanning only, and Library staff members wou d have had to retype the records for the scanner. Since the cost the scanJling alone was close to 301 a title, that bid was also lJaSed"'ll~.~being ultUnately more costly, choice 01 a kelY"nching service in San Francisc~ was made m 1'b" fill . 01 its pr<>xirillty to Santa Cruz, on the enthuSlas 01 the '" tbO b""J, task to be undertaken, and on a reasonable cost estimate. lJidDor I -"" .... N <: ' .... ..... dedco, col '" 0 because it was aVlUlable on an IBM/1401 computer at the Los.<: " >,. +J ... CO COIr\, P CO I I ..... >, I -::t -::t cbg'j':. caIDpus 01 the University (UCLA). At that time, it was the only >< tdp (V") I IS '"' $,,; ~;:1~ ~with sucb a printer on the West Coast. The character set had been ! g .& ~ .~ ~ ro...... ... z ..., Joe Joe 0) f CO CO I, ted by librarians at UCLA from characters offered by IBM in the ..; ~~ Pi~ '}J~~ co al IJ) Joe 0 I • ..-i .;.> ~~§IJ)'rlPi ... ~~ (J) ~er of 1964 for the 1403 printer.'" >r< A 'rle:;j~~g ~ ...... ~g lor the special characters is descnbed in the tables. There are'-" d ,....-! ~ -rl" (V") col col ~ 0 0 .;.> ~ Ul
Obtaining a centered minus requires a multiple punch (11). The underscore prints in a space by itself, just as do other characters; it requires special programming to overprint this character by suppressing paper spacing. The virgule overprint requires two columns to punch. Sharp-eyed readers will notice that the virgule appears twice in Figure 1, and it has been counted twice for the total of 101 characters. The blank has also been counted as a character, but the black square, which was not used at Santa Cruz, was not counted.

All data elements were encoded in fixed card fields; that is, the field for each type of information had a fixed length, generally 300 characters. It was not necessary, however, to use the entire field or to fill it with zeros or other codes. No terminating characters were used to separate the fields. Each type of information was included on one or more cards bearing a code which would tell the computer precisely what type of information the cards contained.
There are basically two ways that information can be encoded into cards; this is discussed especially in references (3) and (6). To use a completely variable format it is necessary to have field delimiting codes. Alternatively, a fixed sequence of data elements is established (e.g., author, title, publisher ...). If a number of individual codes are to be used to delimit fields, ...

Costs of Library Catalog Cards / KILGOUR

Table 1. Card Production Costs (left figure: dollars per card; right figure: time per card)

                                   Procedure 1      Procedure 2      Procedure 3      Procedure 4
                                   Cost    Time     Cost    Time     Cost    Time     Cost    Time
IBM 870-proof                      .0033   .0043    .0036   .0046    .0039   .0051
IBM 1401-proof                                                                        .0046   .0091
Proofreading (2 proofreaders)      .0115   .0044    .0113   .0043    .0118   .0045    .0116   .0044
Proofreading and correcting        .0120   .0055    .0122   .0055    .0119   .0054    .0091   .0041
IBM 1401                           .0149   .0085    .0313   .0156    .0231   .0116    .0245   .0112
IBM 870-card typing                .0104
Card stock                         .0149            .0149            .0125            .0125
TOTAL                              .0918            .0981            .0884            .0935
Number of cards                    15,149           9,343            27,210           28,129
Number of titles                   1,655            990              2,920            3,130
Cards per title                    9.2              9.4              9.3              9.0

... particularly among countries, time per card produced is also included in the Table to facilitate comparison with other systems. Of course, amounts of time calculated by dividing elapsed time by amount of product are not directly comparable with results of time and motion studies such as Henry Voos' helpful study (7). However, two different methods of comparing the input costs in Table 1 with those Johnson (8) published for the Stanford book catalog gave divergences of only 2 and 6 per cent.
Source of the increase in costs of six-tenths of a cent from the first procedure to the second is entirely the increase in computer charges when the 1401 replaced the 870 to print cards. When the two-up form was employed on the computer in variant three, charges then dropped to less than the combined 1401 and 870 costs in the first procedure. Costs rose again in procedure four. Here the principal cause of the increase was the substitution of computer-produced proof listings after the 870 Document Writer had been returned to the manufacturer.

Although there is no reason to think that preparation of cataloging copy on a worksheet is either more or less expensive than older techniques, coding a worksheet constitutes additional work for which there is no equivalent in classical procedures. Coding costs were examined between 9 March and 11 May 1965, when six individuals, ranging from professional catalogers to a student assistant, recorded the time required to code 725 worksheets. Time per final catalog card produced was three seconds; in other words, $.003 for a cataloger receiving $7,500 a year, or $.001 for a student assistant earning $1.50 an hour. If total coding cost, rather than a portion of it, were to be charged to card production, costs reported in Table 1 could rise one- to three-tenths of a cent.

DISCUSSION

The accurate comparison of costs would be with those of systems similar to the CHY system that produce more than one product. For instance, the CHY system also produced monthly accession lists from the same punch-card decklets that produced catalog cards. The accession list was produced mechanically at a cost far less than that for the previous manual preparation. The decklets also constituted machine readable information available for other purposes, most of which have not yet been realized. System costing would assign only a portion of keypunching and proofreading costs to card production.
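The per-card coding figures above can be checked with a little arithmetic. The 2,080-hour work year used below is an assumption, since the article does not state the divisor it applied to the annual salary.

```python
# Rough check of the reported coding costs: three seconds of coding time
# per final catalog card, priced at two different labor rates.
SECONDS_PER_CARD = 3
HOURS_PER_YEAR = 52 * 40          # assumed 2,080-hour work year

cataloger_rate = 7500 / HOURS_PER_YEAR / 3600     # dollars per second
student_rate = 1.50 / 3600                        # dollars per second

cataloger_cost = cataloger_rate * SECONDS_PER_CARD
student_cost = student_rate * SECONDS_PER_CARD

# Both agree with the article's $.003 and $.001 to the nearest tenth of a cent.
assert abs(cataloger_cost - 0.003) < 0.0005
assert abs(student_cost - 0.001) < 0.0005
```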
Another saving was the appreciable shortening of time required for catalog cards to appear in the catalog. In procedures one through three, usually three or four days elapsed from the day on which the cataloger completed cataloging to the day on which cards were filed into the catalog. However, in procedure four, the computer, which was then a mile distant from the Medical Library, was used on two separate occasions for each batch of decklets, so that elapsed time rose to at least a week.

Journal of Library Automation Vol. 1/2 June, 1968

Even though other benefits are not reflected in comparative costs, it is clear from Fasana's findings that the CHY computer-produced cards cost far less than do LC cards, and have a similar cost to those produced mechanically on which Fasana reported. Although there appears to be no published evidence that photocopying techniques can produce finished catalog cards at less expense than 9 cents, it is possible that some photo-reproduced cards may be less expensive than those described in this article. However, it must be pointed out that photo-reproduced cards are products of single-product procedures, whereas the CHY cards are one of several system products.

Increase in cost between procedure three and procedure four was due to increase in cost of prooflisting in upper and lower case on the 1401 computer as compared to prooflisting on the 870 Document Writer. This cost increase was not detected until calculations were done for this investigation, and therein lies a moral. It was the policy at the Yale Library for all programming to be done by library programmers, since various inefficiencies, and indeed catastrophes, had occasionally been observed when non-library personnel had prepared programs for library operations.
The single exception to this policy was the proof program, which this investigation reveals used an exorbitant amount of time: one-third of that required for subsequent card production. Since it had been felt that writing and coding a prooflisting program was perfectly straightforward, an outside programmer of recognized ability was employed to write and code the program. Because the program was simple, and because the programmer had high competence, efficiency of the program was never checked as it should have been.

This episode raises the question: if even the wary can be trapped, how can the unwary avoid pitfalls? There is no satisfactory answer, but it would appear that some difficulties could be avoided by review of new programs by experienced library programmers, of which there are unfortunately far too few. Comparison with data such as that in Table 1 will also be helpful, but not definitive, in evaluating new programs. Of course, when widely used library computer programs of recognized efficiency are generally available, the magnitude of the pitfalls will have been greatly reduced.

CONCLUSION

Computer-produced catalog cards, even when they are but one of several system products, can be prepared in finished form for a local catalog less expensively and with less delay than can Library of Congress printed cards. Computer card production at 8.8 to 9.8 cents per completed card appears to be competitive with other procedures for preparing catalog cards. However, undetected inefficiency in a minor program increased costs, thereby emphasizing the need to insure efficiency in programs used routinely.

ACKNOWLEDGEMENTS

The author is most grateful to Mrs. Sarah Boyd, keypuncher extraordinary, who maintained the record of the data used in this study. National Science Foundation Grant No. 179 supported the CHY Project in part.

REFERENCES

1. Kilgour, Frederick G.: "Mechanization of Cataloging Procedures," Bulletin of the Medical Library Association, 53 (April 1965), 152-162.
2. Koh, Hesung C.: "A Social Science Bibliographic System; Computer Adaptations," The American Behavioral Scientist, 10 (Jan. 1967), 2-5.
3. Summit, Roger K.: "DIALOG; An Operational On-Line Reference Retrieval System," Association for Computing Machinery, Proceedings of 22nd National Conference (1967), 51-56.
4. Fasana, P. J.: "Automating Cataloging Functions in Conventional Libraries," Library Resources & Technical Services, 7 (Fall 1963), 350-365.
5. Kilgour, Frederick G.: "Library Catalogue Production on Small Computers," American Documentation, 17 (July 1966), 124-131.
6. Weisbrod, David L.: "An Integrated, Computerized, Bibliographic System for Libraries," (in press).
7. Voos, Henry: Standard Times for Certain Clerical Activities in Technical Processing (Ann Arbor: University Microfilms, 1965).
8. Johnson, Richard D.: "A Book Catalog at Stanford," Journal of Library Automation, 1 (March 1968), 13-50.

BELL LABORATORIES' LIBRARY REAL-TIME LOAN SYSTEM (BELLREL)

R. A. KENNEDY: Bell Telephone Laboratories, Murray Hill, New Jersey

Bell Telephone Laboratories has established an on-line circulation system linking two terminals in each of its three largest libraries to a central computer. Objectives include improved service through computer pooling of collections; immediate reporting on publication availability or a borrower's record; automatic reserve follow-up; reduced labor; and increased management information. Loans, returns, reserves and many queries are handled in real time. Input may be keyboard only or combined with card reading, to handle all publications with borrower present or absent. BELLREL is now being used for some 1500 transactions per day.
INTRODUCTION

As part of a continuing program to exploit available technology to improve library service, the Technical Information Libraries system of the Bell Telephone Laboratories has established an on-line, real-time computer circulation network. The initial configuration links two terminals in each of the Holmdel, Murray Hill and Whippany, New Jersey, libraries to a central computer at Murray Hill. These are the three largest libraries in Bell Laboratories, handling 75% of a system total of more than 300,000 loans per year. The BELLREL system is designed to process loans, returns, reservations and queries with real-time speed and responsiveness; additionally, it provides a wide range of other products and information basic to the effective control and use of library resources.

The libraries of Bell Laboratories, like many other research libraries, have experienced unprecedented growth over the past decade in facilities, collections, services and traffic. New approaches have had to be found not only to supply information services of sufficient power and diversity to meet the needs of a communications research organization of over 15,000 people, but also to cope with the expanding volume of everyday work in its eighteen library units. As elsewhere, a large component of that work is circulation in all of its ramifications: direct service, record-keeping, follow-up, resource identification, inter-unit coordination, feedback for purchase and purge decisions, etc. The BELLREL system is addressed to these problems within the context of the Bell Laboratories.

The use of computers in circulation control is no longer novel. The studies done by George Fry and Associates for the Library Technology Project of the American Library Association emphasize the expense of implementing computer-aided circulation systems (1,2).
Despite these studies, which tend to focus more on the gross costs of substituting data processing for manual techniques than on the immediate and long-range gains for the library as an information system, a trend to the computer is clear. Southern Illinois (3), Lehigh (4), and Oakland (5) are among the many university and research libraries which have automated circulation operations using the IBM 357 Data Collection System and batch processing. Comparable systems are in use or planned by other libraries (6,7). Latterly there is increasing evidence of serious interest in real-time circulation control. The Queen's University of Belfast (8) and the State University of New York at Buffalo (9) are two institutions reporting studies. Redstone Arsenal has been demonstrating a two-terminal, on-line system for about a year as part of a comprehensive automation program (10).

The BELLREL system was put into regular service in March, 1968, after two months of dry-run testing at all six terminals. This paper describes the reasons for changing from a manual system; the objectives established for the new system; the alternatives evaluated; the principal elements, operations and services of the selected system; and problems and performance in the brief period of operations to date. The paper is essentially a summary description; it does not report in detail on all card, disk and tape formats, maintenance procedures, products, logical operations, etc., of the system and its fifty-plus programs. A further report on BELLREL will be published when significant experience has accrued.

THE DISPLACED MANUAL SYSTEM

The Newark self-charge-signature system has been used by Bell Laboratories' libraries for some forty years. In this well-known simple system, the borrower writes his name and address on a prepared book card pulled from the book pocket.
For the two out of three loans at Bell Labs where the borrower is not present, a circulation assistant fills out the card, which is then date stamped, tabbed for date due and filed by author. Minor variations on this practice are used for unbound journals and other items lacking book cards.

Reservations for individuals or other libraries in the network are hand posted on the charge card. Files are scrutinized for overdue dates every several days (latterly, less frequently as traffic has mounted) and notices prepared by Xerox copying of the charge card on an appropriate form. Although standard loan periods run from one to four weeks, depending upon the item and demand, about 30% of all loans result in overdue notices.

Each library in the network has maintained its own circulation records, including records for the local circulation of items borrowed on inter-unit loan. Inter-unit traffic is heavy, although substantial duplication of important publications exists in the various libraries.

The merits of the Newark self-charge system (simplicity, fast handling of borrowers, relatively low cost) are widely known. The system is a venerable one; it works. But all circulation systems have imperfections, and in the Bell Laboratories long-recognized deficiencies of the manual system became increasingly unacceptable when loan traffic began to approach, then exceed, 200,000 items per year. These deficiencies included:

1. An increasing number of hours spent on the tedious and uninspiring tasks of sorting, tagging, posting, slipping, checking and husbanding cards.
2. Labor, frequent delays and poor service associated with processing over 60,000 overdue notices per year.
3. Inability automatically to use the pooled resources of several libraries to meet demands.
4. Inability to determine quickly not merely the holdings of other copies of a title in the library system (union catalogs serve this purpose, after some steps and card handling) but the availability of loan copies at the moment of need.
5. Inefficiencies in tracking down missing publications, inventory items, etc.
6. Inability to identify all publications currently on loan to a borrower or used by him sometime previously.
7. Inadequate information on collection use for resource management.
8. Excessive service delays due to combinations of the preceding factors.

NEW SYSTEM OBJECTIVES

The deficiencies listed above suggest some of the characteristics defined for the new system. Library management concluded early in 1965 that any replacement for the existing system must:

1. Meet the long-range needs of each of the major libraries in Bell Laboratories and be extensible to other units in the library network as traffic, experience and costs warranted.
2. Provide not merely a more effective means for handling circulation operations within the walls of any one library but also, if possible, an instrument for knocking walls down, for bringing the combined resources of a number of libraries to bear on any information need.
3. Handle all types of materials, bound or unbound, and all types of requests whether in person, by mail, direct telephone or recorded message (i.e., Telereference) service.
4. Give immediate up-to-the-minute accounting for all items on loan or otherwise off the shelves and locate copies still available for loan.
5. Hold reservations against system resources (in line with objective 2) and direct the first copy returned, wherever returned, and as automatically as practical, to the first person on the reserve queue, whatever his base location.
6. Identify promptly all items currently charged to a borrower and, as required, previously borrowed by him.
7. Monitor circulation traffic and generate, as necessary, overdue notices, missing item lists, high-demand lists, zero-activity reports, statistics, use analyses and other feedback fundamental to effective control and management of the collection.
8. Lift the circulation staff from clerical tasks to more personal service to library users, in the interest of the "human use of human beings," to use Norbert Wiener's phrase.
9. Integrate the loan system with other computer-aided systems in use or planned in the libraries.
10. Improve the total response of the library to the user.

SYSTEMS EVALUATED

In view of these objectives it will be apparent that only a computer-aided system could be seriously considered. None of the several dozen noncomputer systems surveyed in the Fry report (1) could be considered a worthwhile alternative to the Libraries' manual system. The essential questions therefore became: Off-line or on-line access? Batch or real-time processing?

The demonstrated success of the IBM 357 batch processing circulation system compelled study and on-site investigation in several libraries. It was concluded, however, that while the 357 system would meet a number of the established goals, and at moderate cost, the important objectives of immediate accountability, automatic follow-up on reserves, full disclosure of copies available for loan, and automatic pooling of network resources would be seriously compromised. Further, the fact that two-thirds of all loans made in Bell Laboratories do not involve the presence of the borrower substantially detracted from one of the major virtues of the 357 system, i.e., the simplicity of input using a pre-punched man (identification) card submitted by the borrower. The various alternatives for coping with this situation in a 357 system, for 200,000 loans a year and a potential of over 15,000 people, were not attractive.
The feasibility of on-line access has been widely demonstrated in the research and business world. Remote, on-line computer processing is clearly a common course of the near future. Equally predictably, it will steadily give more favorable cost/value ratios as machine costs decrease and labor costs mount. In sum, the Technical Information Libraries concluded that an on-line system was worth the investment and that no other system was worth the price. Only an on-line approach would meet the overall objectives for a new system and offer advantages sufficient to justify conversion effort at this time. As Frederick Ruecking has observed, "A charging system should not be selected because it is 'cheaper' than others. If the selected system does not meet the present and future needs, the choice is poor." (11)

THE BELLREL SYSTEM

BELLREL is a joint development of the Technical Information Libraries and the Comptroller's Division of Bell Laboratories. The system was designed, programmed and implemented in a little over two years, beginning in late 1965. During this time, preparation of the bibliographic records, system design and programming took about seven man-years.

Basic Machine Elements

The initial network is illustrated in Figure 1.

[Fig. 1. BELLREL Circulation System Network: two IBM 1050 terminals (1051, 1052, 1056 units) in each of the Holmdel, Murray Hill and Whippany libraries, linked by W.E. Data-Phone lines through a selector channel to the central computer, its console typewriter and a monitor terminal.]

The two IBM 1050 terminals in each of the three libraries incorporate keyboard, printer and card reader facilities for maximum flexibility in handling all types of transactions and queries.
Each terminal is linked by telephone lines, using Western Electric 103A Data-Sets, to an IBM 360-40 computer in the Comptroller's Division at Murray Hill. The Murray Hill Library is only a building away from the computer. The Holmdel and Whippany Libraries are about thirty and twelve miles distant, respectively.

The computer, in heavy daily use along with other computers for regular operations of the Comptroller's Division, has a 262,000 byte (character) core memory. Core is partitioned, permitting effective simultaneous use of the computer for routine batch operations and the BELLREL system. In addition to core requirements for the 360 Operating System, core partitions include (a) the teleprocessing logic of the IBM Queued Teleprocessing Access Method (QTAM), (b) message editing logic and application logic packages, including library applications, and (c) batch processing programs and operations for all purposes. Figure 2, a flowchart of the real-time processing logic, illustrates core partitioning for (a) and (b).

[Fig. 2. General Flowchart of BELLREL Real-Time Programming Logic: input and output of an inquiry or transaction at a 1050 terminal; teleprocessing logic to receive and queue the message (with inter-terminal communication, if desired by the librarian) and to queue and send the response; message editing logic; and application logic, with common logic routines, which processes the message, refers to the disk files, updates the files and generates the response.]

In addition to the programs resident in core (portions of which can be overlaid as necessary by other real-time operations) certain programs for particular functions (e.g., loan, return, etc.) are called from disk as needed. In all, 32 real-time and 23 batch programs, together with the 360 Operating System, are used by BELLREL. The programs are written in COBOL level F and Basic Assembly Language.
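The receive-queue-process-respond cycle of Figure 2 can be mimicked in miniature. The queue discipline below stands in for QTAM, the in-memory dictionary stands in for the disk files, and all handler and field names are invented for illustration.

```python
# Miniature analogue of the Fig. 2 cycle: terminals place messages on an
# input queue; a processing loop edits each message, applies transaction
# logic against the (here, in-memory) files, and queues a response.
from collections import deque

files = {"127391": {"title": "SAMPLE TITLE", "on_loan_to": None}}
inbound, outbound = deque(), deque()

def edit(message):
    """Message editing logic: reject anything that cannot be parsed."""
    code, _, body = message["text"].partition(" ")
    if code not in ("LM", "LC"):
        return None
    return code, body.split()

def process(message):
    """Application logic: refer to the files, update them, build a response."""
    parsed = edit(message)
    if parsed is None:
        return "INPUT ERROR"
    code, args = parsed
    if code == "LM":                      # loan: man number, item number
        man, item = args
        files[item]["on_loan_to"] = man
        return f"LOANED {item} TO {man}"
    item, = args                          # LC: return, cancel the loan
    files[item]["on_loan_to"] = None
    return f"RETURNED {item}"

inbound.append({"terminal": "MH-1", "text": "LM 70213 127391"})
while inbound:
    msg = inbound.popleft()
    outbound.append((msg["terminal"], process(msg)))

assert files["127391"]["on_loan_to"] == "70213"
```

The essential property, as in BELLREL, is that the terminal never touches the files directly: every transaction passes through the edit step, so malformed input is rejected before any record is changed.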
Disk Records

Publication and man records are stored on an IBM 2314 disk pack with a capacity of some 29,000,000 characters. About two-thirds of this space is in use or dedicated. The man records, which are updated daily from tape used for telephone directory, payroll and other purposes, cover about 19,000 people, including BTL employees and Resident Visitors, i.e., contractual people who may also use library facilities. Each man record is 161 characters in length and contains such information as payroll account number, name, department number, telephone number, location, occupational class, space for three book loans, keys referring to overflow loan trailers elsewhere on disk, etc. The man file is organized by payroll account number, a five-digit number which is keyed in or read from a pre-punched card for all loans, reservations and other transactions requiring it. Access to man records on disk is by the IBM Indexed Sequential Access Method (ISAM).

Publication records vary in format, length and method of access depending upon the class of publication. Five classes of publications are currently in the system: books (Class 1), journals (Class 2), trade catalogs (Class 3), college catalogs (Class 4) and Dewey-classified continuations and multiple-volume titles cataloged as sets (Class 5). Other classes of information, e.g., documents, motion picture films, etc., will be added. Each title in each class is assigned a unique six-digit identification number, the first digit of which identifies the class. A typical number for a monograph title is 127391. The punched cards and book labels for each copy of this title also indicate the holding library and its copy number, e.g., 127391MH01, 127391WH05. A sample card and label, generated by the computer, are shown in Figure 3.

As noted above, books fall in two classes, 1 and 5. Each class provides a maximum of 100,000 title numbers, more than adequate for the predicted growth of the Technical Information Libraries, where weeding is heavy. The book collections for the three libraries now on disk total about 33,000 titles and 66,000 volumes.

The disk record for each Class 1 title is 188 characters in length and contains the book number, 43 characters of author-title, the call number, copies by location, the fields for file maintenance change information, three loans, two reserves, keys to loan trailers and reserve trailers, etc. Each loan field identifies borrower, date due, copy and status of the loan (e.g., overdue number, renewed, number of reserves, returned).

[Fig. 3. BELLREL Book Card and Label, showing the entry 102362MH01 / HOLTON, G./ SCIENCE AND THE MODERN MIND / 500/H75, Bell Telephone Laboratories Technical Information Libraries.]

The identification number for each new Class 1 book is assigned by the computer on update runs. Numbers are sequential. Disk access is direct.

Class 5 books, cataloged continuations and multiple-volume titles cataloged as sets, share a different kind of disk record. They could all have been entered as Class 1 items, in which case each volume of a set would have had a separate record on disk, a unique (not necessarily consecutive) identification number, and a separate listing in the author, call number and identification number printed catalogs. The Class 5 approach, however, permits grouping of volumes in sets and series. Ten volumes of one title are handled in one disk record, 288 characters in length, under the same identification number. Additional volumes, up to a total of 100, are handled in succeeding records.
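The composite numbers on cards and labels, such as 127391MH01, break apart positionally. The class-name table below is taken from the text; the parsing function itself is an illustrative sketch, not BELLREL code.

```python
# Decompose a BELLREL card/label number: six-digit title number (whose
# first digit gives the publication class), two-letter holding-library
# code, and two-digit copy number.
CLASSES = {"1": "book", "2": "journal", "3": "trade catalog",
           "4": "college catalog", "5": "continuation/set"}

def parse_label(label: str) -> dict:
    title_no, library, copy_no = label[:6], label[6:8], label[8:10]
    return {"title_no": title_no,
            "pub_class": CLASSES[title_no[0]],
            "library": library,            # e.g., MH = Murray Hill
            "copy": int(copy_no)}

rec = parse_label("127391MH01")
assert rec["pub_class"] == "book" and rec["library"] == "MH" and rec["copy"] == 1
```

Packing the class into the leading digit means a single ten-character key both identifies the physical copy and tells the system which record format and access method to use.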
All of the records of the set carry the same first five digits in their identification number. Disk access is by the Indexed Sequential Access Method (ISAM). In addition to grouping sets, Class 5 records effect a saving on disk space and permit use statistics to be derived for the set as a whole, as well as for each volume in the set. The principal disadvantage of the approach is that all keyed messages dealing with any volume in the set must cite both the basic access number and the specific data (e.g., volume number) pertinent to the volume in question.

The journal disk records cover all the 2700 journal titles held in the library system. Unlike books, however, records of all copies and volumes of each title are not permanently stored on disk. Instead, each 155-character journal title record contains the journal identification number and 48 characters of title, plus fields for file maintenance changes, two loans, one reserve, and keys to loan and reserve trailers. Specific bound volumes or unbound issues are recorded on this record only as long as they are current loan or reserve transactions. To expedite loans and returns, punch cards and computer-printed labels have been prepared for some 10,000 bound journal volumes. Additional volumes are similarly processed as circulated or bound.

Disk records for trade catalogs and college catalogs are also 155 characters long. Access to records is also by the Indexed Sequential Access Method. Unlike journal volumes, however, each separate catalog is specifically identified and recorded on disk. When conversion is complete, more than 5000 catalogs will be accessible on disk.
For example, 5000 31- character trailer records, each handling three reserves, are available for book reserves. For journals, 800 59-character records, each handling three reserves, are provided. The difference reflects the heavier book traffic and the particularly sharp peaking of reserves on new book titles. Apart from the normal safety back-up files (e.g., the nightly dump to tape of the current disk records), the only remaining machine record which requires mention is the history tape. This tape, up-dated daily, is a continuing record of all completed loans which provides information nec- essary for statistics and use analyses. On-Line Transactions Twenty-two different transaction codes are currently available to han- dle loans, returns, renewals, reservations and queries in real time. In addition, any terminal can call another terminal by a single digit code and one terminal in each library can call the other two libraries simultaneously by a 'broadcast' code. This inter-library, typewritten message facility is a highly useful component of the total system. Ten of the twenty-two transaction codes handle loans, returns, reserva- tions and renewals. These codes, their prime functions and associated data inputs are listed in Table 1. The eleven LQ (Library Query) codes for requesting information from BELLREL are listed in Table 2. One additional code causes the computer to print out at the query terminal a statistical log of all classes of transactions at each terminal and their totals. It also gives the number of input errors made at each terminal. The log aids in adjusting work loads and monitoring performance. Let us now consider several common transactions in more detail. Loans: If the borrower is present, he gives the desired book to the circulation clerk. He shows his badge or, alternatively, writes his surname and five- Table 1. On-Line Codes for Loans, Etc. Code Function 1. Loans LM LN 2. Returns LC LK 3. Reserves LA LB LD 4. Renewals, etc. 
Loan of 1-5 items for one man at one time. Overnight loan. Assigns overnight loan period automatically; does not pick-up reserves on return. Cancel loan. Charge out automatically to first person on reserve queue. Cancel loan. No automatic charge-out. Add to reserve queue. Give reserve no., copies held and available, etc. Bypass reserve queue. Put designated man first. Delete from reserve queu~. LP Change loan period assigned. LR Renew loan, once. LG Force renewal irrespective of reserves, overdue status, etc. Input Data & Method Man no., item no. (including location and copy). Usually card read. , Item no., with location and copy. Usually card read. " Man no., item no. (less location and copy). Keyed. , , New loan period, complete item no. Keyed or card read. , , t:x:j ~ ~ ~ ~ ~ ~ ........... ~ t'%:1 z z t'%:1 t1 ~ )-' ~ I ( ~ .. = ~ •• 142 Journal of Library Automation Vol. 1/ 2 June, 1968 similar messages are not accepted by the system. Recovery from errors may be done by aborting input, repeating it correctly or, if all elements are legitimate to the message edit program, by using the appropriate on- line code to correct the record. On the first day of full operations, 10% of the input transactions were incorrect. One week's experience reduced the error rate to 3%, and further improvement is expected. The .25% error rate estimated by Lazorick and Herling for a system planned to function without any pre- punched cards ( 9) appears unrealistic. Non-Personal Codes Some thirty special codes, which function like man numbers in the sys- tem, are available to handle real-time transactions involving branch li- braries, outside organizations and such internal library functions as charges to recataloging, repair, new book shelf, etc. All are three-digit codes, essentially mnemonic, e.g., AL9 - Allentown ( Pa.) Library; WI9- Withdrawn. Most of the codes generate overdue notices; the codes for binding, missing, repair and a few oth~rs do not. 
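The two-letter codes in Table 1 lend themselves to a simple dispatch table. A minimal sketch follows; the handler behavior shown is illustrative only, not BELLREL's actual program logic.

```python
# Sketch of dispatching BELLREL's two-letter transaction codes
# (Table 1). The handlers here are illustrative stand-ins.

def loan(man_no, item_nos):
    return f"LOAN {len(item_nos)} ITEM(S) TO {man_no}"

def cancel_with_pickup(item_no):
    return f"CANCEL {item_no}, CHARGE TO FIRST RESERVE"

HANDLERS = {
    "LM": loan,                 # loan of 1-5 items
    "LC": cancel_with_pickup,   # cancel loan, pick up reserve queue
    # LN, LK, LA, LB, LD, LP, LR, LG would be registered likewise
}

def dispatch(code, *args):
    handler = HANDLERS.get(code)
    if handler is None:
        return "INVALID TRANSACTION CODE"   # diagnostic, as in the text
    return handler(*args)

print(dispatch("LM", "43486", ["100-22-301"]))
print(dispatch("XX"))
```

Unknown codes fall through to the same kind of diagnostic the system itself returns.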
Several require back-up manual records, e.g., ALA interlibrary loan forms for charges to outside libraries.

Batch Processes and Products

Overdue notices and daily loan lists are produced in a nightly file maintenance run which also updates the history tape. The preprinted forms used for first and second overdue notices are address-sorted for direct mailing. The third notice, triggered three days after the second and ten days after the first, is a listing with telephone numbers and other data for telephone follow-up. The daily loan list is primarily a back-up record in the event of system down-time. Current loans, the number of reserves and other information are combined in one list for all three libraries. The BELLREL master book catalog is run quarterly from disk records. Main entry, Dewey number and access number catalogs are produced. All new-copy, new-title and other record changes made on disk in maintenance runs are reflected in cumulative weekly catalog supplements. These runs also produce all the new or changed cards and labels required. The BELLREL catalog is a precursor to a system-wide printed book catalog which will replace nearly one million catalog cards held in eighteen libraries. When completely developed, input to the circulation system will be a sub-system of the master catalog maintenance procedures. Maintenance of the disk journal records for BELLREL follows a comparable integrated approach: journal code numbers, title abbreviations, data changes and the like are derived from the computer routines used to prepare the serials catalog since 1962. Trade catalog files for the

BELLREL/KENNEDY 141

the book and the number of copies still available for loan at each library. Getting one copy into the hands of the requester is then very simple. The holding library nearest to the borrower is instructed, by telephone or terminal message, to send call number such-and-such "out." The requester's name and address are not relayed.
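The overdue-notice escalation just described can be sketched as a simple schedule. The text gives only two anchors (third notice ten days after the first and three days after the second); the implied seven-day gap between first and second notices is an inference, not a stated figure.

```python
# Sketch of the overdue-notice escalation from the nightly run:
# first and second notices are mailed preprinted forms; the third,
# ten days after the first (three days after the second), goes on
# a telephone follow-up list. The 7-day first-to-second interval
# is inferred from those two figures.

def notice_due(days_overdue):
    if days_overdue >= 10:
        return "third: telephone follow-up list"
    if days_overdue >= 7:
        return "second: mailed form"
    if days_overdue >= 1:
        return "first: mailed form"
    return None

print(notice_due(10))
```

Each nightly file-maintenance pass would apply a check like this to every outstanding loan.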
The holding library gets the book from the shelves and cancels it, using the LC command with the card reader. Although this copy was not on loan, the computer ignores this fact because someone is waiting for the book, i.e., the requester whose reserve triggered this sequence. As a consequence of the cancel operation, the requester is automatically charged with the book, the holding library is told his name and address, and mailing follows. The LC command is also used in the same way to get additional copies of a book, when purchased to meet high demands, into the hands of the requesters. The LA reserve transaction is put to particularly good use in handling the 600-plus requests received within a few days each month for new books announced in the Library Bulletin. Bulletin Request Forms supply both item numbers and man numbers. Mass input follows and the computer responds with all the signposts needed to put every copy in the system to work, with a dispatch speed hitherto impossible to achieve. As shown in Table 1, two transactions permit changes in reserve queues. LD deletes a requester. LB permits the queue to be bypassed and insertion of a new name at the top of the list.

Queries

This is a fact retrieval facility. The codes listed in Table 2 are reasonably self-explanatory, and take into account the realities of on-line circulation service. LQC, for example, tells the status of a title at the moment of asking, an up-to-dateness not available from the backup daily loan list generated each night. Typical responses to the LQC code are: COPIES AVAILABLE, MH02 WH01; TITLE REMOVED MY68; or ALL 03 COPIES LOANED, 14 RESERVES. Similarly, LQL provides a requester with an immediate, printed listing of all the items he has on loan. Two query codes cause display of the complete disk record for a publication (LQD) or a person (LQE), including current loans, reserves and trailer records.

Error Detection

In any keyboard operation mistakes will be made.
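The three reserve-queue operations (LA add, LB bypass, LD delete) can be sketched as queue manipulations. The deque-based structure below is an assumption for illustration, not BELLREL's actual disk trailer layout.

```python
# Sketch of the reserve-queue operations behind the LA (add),
# LB (bypass: put a designated man first) and LD (delete) codes.
from collections import deque

class ReserveQueue:
    def __init__(self):
        self.queue = deque()

    def add(self, man_no):            # LA
        self.queue.append(man_no)
        return len(self.queue)        # reserve number reported back

    def bypass(self, man_no):         # LB
        if man_no in self.queue:
            self.queue.remove(man_no)
        self.queue.appendleft(man_no)

    def delete(self, man_no):         # LD
        self.queue.remove(man_no)

    def first(self):
        return self.queue[0] if self.queue else None

q = ReserveQueue()
q.add("43486")
q.add("51210")
q.bypass("51210")
print(q.first())
```

When any copy is returned, the return routine would consult `first()` to decide who gets it, as the LOAN TO response below shows.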
BELLREL attempts to signal critical errors and prevent them from affecting records. As noted previously, input man numbers and item numbers are translated by the computer into alpha characters. Numerous diagnostics are also returned: e.g., INVALID TRANSACTION CODE; INVALID BOOK ID #; INVALID EMPL #; INVALID TRANSACTION BAD COPY #; VARIABLE DATA REQUIRED, etc. Incorrect inputs generating these and

Table 2. On-Line Library Query Codes. (All queries are keyed.)

1. Publications
LQC: What is the status of title ... ? Input: item no. (less location and copy).
LQS: What is the status of copy ... ? Input: item no. (with location and copy).
LQN: What overnight items are still out? Input: location symbol only.
LQD: Display the complete disk record for title .... Input: item no. (less location and copy).

2. People
LQM: How many items are on loan to ... ? Input: man no.
LQL: What items are charged to ... ? Input: man no.
LQQ: Who is first on the reserve queue for ... ? Input: man no., item no. (less location, copy).
LQR: Is man ... on reserve? Where? Input: man no., item no. (less location, copy).
LQW: Who are the borrowers of title ... ? Input: man no.
LQZ: Who is man number ... ? Input: man no.
LQE: Display the complete disk record for man .... Input: man no.

BELLREL/KENNEDY 139

digit number on a card. While he is doing this, the clerk hits the 'Request' button on the keyboard-printer, and inserts the book card in the card reader along with an End-of-Transmission card. With the keyboard 'Proceed' light on (2 seconds after 'Request'), the clerk returns the typewriter carriage and keys in LM (the loan code) and the man number obtained from the borrower: e.g., LM43486. Input is completed by activating the card reader. The card reader reads only to the End-of-Block punch (column 16 in book cards), ignoring the author-title data and call number in columns 17-78. As the book identification number is read, it is listed on the typewriter.
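The message-edit checks that produce these diagnostics can be sketched as below. The format rules (a two-letter code from Table 1, a five-digit man number) are assumptions drawn from examples in the text such as LM43486.

```python
# Sketch of input editing against the diagnostics quoted in the
# text (INVALID TRANSACTION CODE, INVALID EMPL #, etc.). The
# format rules are assumptions inferred from the examples given.
import re

VALID_CODES = {"LM", "LN", "LC", "LK", "LA", "LB", "LD", "LP", "LR", "LG"}

def edit_message(code, man_no=None):
    if code not in VALID_CODES:
        return "INVALID TRANSACTION CODE"
    if man_no is not None and not re.fullmatch(r"\d{5}", man_no):
        return "INVALID EMPL #"
    return "OK"

print(edit_message("LM", "43486"))
print(edit_message("ZZ"))
```

Rejecting malformed input before it touches the disk records is what keeps a 10% first-day error rate from corrupting the files.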
The loan period is not punched in the card, but is assigned by the computer on the basis of the first digit of the publication's identification number. The assigned loan may be altered from the keyboard, if desired. The computer responds to the loan transaction in 3 to 5 seconds, printing back in upper-case red the first three letters (trigram) of the borrower's surname and twenty characters of the book's author-title entry. These responses provide checks against errors in keying. Man numbers are usually keyed (although they may also be card read) and a book number is keyed when its punch card is not available, e.g., in posting a reserve. As noted below, a wide range of other computer responses are available to flag errors and aid diagnosis. The loan transaction is completed by inserting the punch card in the book and date stamping it. Total elapsed time from the borrower's presentation of the book to date stamping averages about 23 seconds for a single loan of the type described. This compares with about 20 seconds cited for one IBM 357 system (3) and 14 seconds in Bell Laboratories' manual system; however, in both these systems further processing is required. If a borrower wishes to charge out more than one book at a time (a common occurrence which ruled out punching the End-of-Transmission code on the book card), up to five books may be handled with one keyboarding. Total elapsed time for multiple loans averages about 15 seconds per book. Loans of bound journals, trade catalogs and other publications with pre-punched cards follow the routine described. For unbound journals and other items lacking cards, it is necessary to obtain the title number from the printed catalog and to key this in with the relevant issue information. Other transaction codes, as noted in Table 1, deal with loan period changes and renewals. Typical computer responses from the renew code (LR) include: RENEW; OVERDUE; RES WAITING; NO RENEW.
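The loan response can be sketched as follows: the loan period is derived from the first digit of the publication's identification number, and the confirmation echoes the surname trigram plus twenty characters of the author-title entry. The digit-to-period table below is an assumption; the paper does not publish the actual mapping.

```python
# Sketch of the loan confirmation: period from the first digit of
# the item number, trigram of the borrower's surname, 20 characters
# of author-title. The LOAN_PERIODS values are illustrative only.

LOAN_PERIODS = {"1": 14, "2": 28, "3": 7}   # days; assumed mapping

def loan_response(item_no, surname, author_title):
    period = LOAN_PERIODS.get(item_no[0], 14)
    trigram = surname[:3].upper()
    return f"{trigram} {author_title[:20]} ({period} DAYS)"

print(loan_response("10234", "Kennedy", "BELLREL system design notes"))
```

The trigram and truncated title serve exactly the checking purpose described: the clerk can see at a glance whether the keyed numbers matched the intended person and book.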
Returns

The two-character return code (LC) is used with card reading or typing. Five items may be discharged with one LC action. The computer responds with twenty characters of author-title, and one of the following messages, for each item:

RETURNED, i.e., the loan is complete and no one is waiting for a copy of this book.
LOAN TO ..., i.e., send the book to the man indicated by name and address. Since he was first on the reserve queue, the book is now charged out to him for the loan period shown.
MAIL TO ... LIBR, i.e., this book belongs to the library shown and should be returned there. No one is waiting for it.
NOT ON LOAN, i.e., this book was previously cancelled or somebody borrowed it without charging it out.

The LOAN TO ... response noted above is a particularly valuable service and time-saving feature. In effect, if any reserve exists anywhere in the system for the title, then the first copy returned is automatically charged to the first person in the queue and the next person moved up. The loan period assigned by the computer depends upon whether there is a waiting list for the book. The library does not need to take any charge-out action except to date stamp the book and address a mail envelope using the information provided. The MAIL TO ... LIBR response, calling attention to the fact that the book should be returned to its 'home' library, is coupled with automatic charging by the computer to IN TRANSIT TO .... Questions about the copy will receive this response during the time it takes to ship it to its home base. When the book is cancelled at the home library, any reservations made during the 'in transit' phase will cause automatic loan in the manner already described. Cancellation of a loan charge without automatic follow-up on the reserve queue is sometimes desirable.
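The four return responses amount to a small decision procedure, sketched below. The copy and queue structures are assumptions; BELLREL keeps these on disk records and trailers.

```python
# Sketch of the return (LC) decision logic described above. Note
# that the reserve check comes first: a copy that was never on
# loan is still charged out if someone is waiting for the title.

def process_return(copy, reserves, this_library):
    """copy: dict with 'on_loan' and 'home' keys;
    reserves: list of man numbers waiting for the title."""
    if not copy["on_loan"] and not reserves:
        return "NOT ON LOAN"
    if reserves:
        next_man = reserves.pop(0)      # charge to first in queue
        copy["on_loan"] = True
        return f"LOAN TO {next_man}"
    copy["on_loan"] = False
    if copy["home"] != this_library:
        return f"MAIL TO {copy['home']} LIBR"  # charged IN TRANSIT
    return "RETURNED"

print(process_return({"on_loan": True, "home": "MH"}, ["43486"], "WH"))
```

Ordering the reserve check ahead of the on-loan check is what makes the off-the-shelf cancel trick in the preceding paragraph work.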
For example, after a copy of a book has been charged to 'MISSING' and search has failed to locate it, a charge to 'LOST' may be desirable for record purposes. Use of the LK return code, instead of the normal LC, makes this possible without automatic pick-up of the reserve queue.

Reservations

Since reserves are posted in BELLREL in real time, any copy of a title returned, even seconds after the reserve is made, will be charged to the first man on reserve. Reserves are input using the keyboard sequence LA, Man Number, Item Number. The computer response includes the standard name trigram and publication data. If all copies of the title are on loan, the computer also responds with information to the requester on where he stands; as an example, "RES #03, COPIES HELD 05". If all copies are not on loan, the response includes the call number of

circulation system are similarly correlated with other existing machine processes and products. As stated earlier, much improved feedback on collection use, demand patterns, and other matters important to library management was a major goal of BELLREL. The history tapes serve this purpose both for special-purpose analyses and regular system reports. The latter include circulation statistics by subject class and library, laboratory location, user department and so on. Three other reports may be mentioned:

1. High-Demand List. This is a weekly list focusing attention on all titles with more than a specified number of reserves. Reserves and copies are shown by location. Previous loan totals are also given to aid in purchase decisions to meet demands.
2. Zero-Loan List. This is a semiannual listing of all titles in the collection with no recorded loan activity in one or more libraries for the period surveyed. A summary of previous loans is given, to help in decisions on weeding.
3. Missing Items List. This is a twice-monthly, Dewey-ordered list of all titles charged to 'MISSING.'
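The High-Demand List can be sketched as a simple pass over title records. The record shape and threshold are assumptions for illustration; BELLREL derives these figures from its history tape and disk records.

```python
# Sketch of the weekly High-Demand List: titles with more than a
# chosen number of reserves, with previous loan counts shown to
# aid purchase decisions. Record fields are assumed.

def high_demand(titles, threshold=3):
    report = []
    for t in sorted(titles, key=lambda t: -t["reserves"]):
        if t["reserves"] > threshold:
            report.append((t["title"], t["reserves"], t["prev_loans"]))
    return report

titles = [
    {"title": "Feedback Theory", "reserves": 6, "prev_loans": 41},
    {"title": "Wave Guides", "reserves": 1, "prev_loans": 3},
]
print(high_demand(titles))
```

The Zero-Loan and Missing Items lists are analogous filters over the same records, selecting on zero loan activity and on the 'MISSING' charge respectively.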
It is used to conduct scheduled searches in all libraries until the items are converted to 'LOST' and replaced or withdrawn.

OPERATING EXPERIENCE

This paper is being written after only one month's use of BELLREL in regular service. The following observations are therefore limited. Circulation Assistants have adapted very quickly to the input mechanics, familiarity with typewriter keyboards and the novelty of conversing with the computer being contributing factors. BELLREL appears to be regarded as a powerful and perceptive colleague with the occasional off moments accepted in a friend. Burdensome tasks, such as preparing overdue notices and maintaining card records, have been dropped with enthusiasm. Staff members are developing new perspectives as they understand the functioning of an information network. The total system concept, embracing the resources of all participating libraries and permitting one copy of a book to serve many readers without inter-library loan, is modifying many practices. Greater record accuracy, completeness and utility are also being realized, along with significant time-savings throughout the system. The query facility, which shows promise of being much used, provides immediate answers to certain questions which previously could not be asked and gives a glimpse of the eventual responsiveness of a complete on-line library catalog. Customer reaction has ranged from some technical interest (technical staff members were consulted in the development of the system and information about its purposes and functions has been widely disseminated) to more common approval and enthusiasm. The increase in time to charge
Whether this is due to initial tolerance of a new system, or less 'work' by the borrower in the charge operation, or an appreciation that service as a whole will be faster and more responsive, is not known. It is expected that charging time will be reduced with program modifications and experience. It should also be recalled that in two out of three loans the borrower is not present: far from experiencing additional delay, he gets what he wants faster. The usual bugs in a complex of programs have arisen; certain trailers had to be enlarged; the 360 Operating System and hardware have failed several times. Down-time, under initial loads of up to 1500 on-line transactions per day, has been less than anticipated for the first month and is expected to drop sharply. About two down-times per day were experienced in the first month, about half of these being deliberate, and most recoveries have taken less than fifteen minutes. Down-time logs are used to record transactions for immediate entry into the system when it becomes alive, a similar procedure being used for after-hours loans.

COSTS

The costs of operating the BELLREL system are, understandably, higher than those of the displaced manual system, the two systems, of course, not being comparable in services and functions. In the operations which can be fairly directly correlated, BELLREL permits very significant labor savings. Appreciable materials savings are also anticipated as a result of collection pooling (leading to reduced duplication of resources in the individual libraries), better inventory control, and other factors. Rental costs are the major component. Each of the six terminals, for example, with associated Data-Sets and telephone lines, costs $275 a month. Costs of the portion of the transmission control unit and disk facility used by the libraries total about $1100 a month.
In addition to a small amount for materials, other costs include a share of the central processing unit and core memory charges, depending upon usage. To execute 1500 real-time transactions per day appears to require less than 12 minutes of main-frame computer time, but a share of the real-time terminal polling and batch processing time must also be included. However, experience with the automated system has been far too brief to reach any precise cost figures for the whole system. In particular, although the dollar value of the largely intangible but very real benefits to library users and library staff can only, at least at this stage, be guessed at, BELLREL has been implemented on the premise that these benefits are major. It should be noted that the costs of the manual (Newark) system in Bell Laboratories differ greatly from the costs calculated by the Library Technology Project (LTP) for this system in an academic library (12). LTP cost estimates for both the Newark and the IBM 357 systems do not conform to our calculations for more reasons than can be discussed here. In the main, however, environmental conditions, strongly affecting labor costs, are too different. For example, in arriving at labor costs, LTP uses the figure of 44 overdues per 1000 circulations in academic libraries; in our library system, where there are no fines or long loans, overdues total about eight times this figure. Again, as a result of book announcement services, discipline concentration and other factors, reserves in the Bell Laboratories libraries are nearly twenty times the ratio used by the Library Technology Project for academic libraries. Still further, in Bell Laboratories some 200,000 loans per year are made without the borrower being present to fill in the loan card. These and other factors add heavily to the cost of labor.
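The quoted rental figures can be tallied directly: six terminals at $275 a month each, plus about $1100 a month for the libraries' share of the transmission control unit and disk facility. Usage-dependent CPU, core and materials charges are excluded here, as the text leaves them unquantified.

```python
# Tally of the fixed monthly rental figures quoted in the text.
# CPU, core memory and materials charges (usage-dependent and not
# quantified in the paper) are deliberately left out.

terminals = 6 * 275        # six terminals with Data-Sets and lines
tcu_and_disk = 1100        # libraries' share of TCU and disk facility
monthly_rental = terminals + tcu_and_disk
print(monthly_rental)
```

At roughly 1500 transactions a day, the fixed rental alone works out to only a few cents per transaction, which helps explain the authors' confidence despite the higher gross cost.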
Few industrial organizations can obtain labor at the cost of $2.00 per hour cited in Library Technology Reports when personnel benefits and other overhead are included.

CONCLUSION

Paul Fasana has observed: "Since cost is primarily a quantitative measure of a system, it is but one of several factors (and possibly not even the most important factor) to consider in evaluating an automated system. Other factors ... qualitative factors ... must also be considered. ... They include such items as operating efficiency, reliability, services rendered, and growth potential." (13). A full judgment on these factors in the BELLREL system must await further experience, but the following observations may be made: 1) BELLREL is not an experiment; it is addressed to practical problems in an industrial library network. 2) It is not a final system; software and hardware evolution will see to that. 3) It is not a model system, transportable in toto to another context; any system of comparable complexity and investment requires careful matching to local needs and objectives. BELLREL objectives, to reiterate, include improved service through computer pooling of dispersed library collections, up-to-date reporting on the status of any publication, immediate identification of all items on loan to a person and automatic follow-up on reserve queues; reduced clerical labor; better inventory control; much enriched feedback for library management; more effective realization of the information network philosophy; and experience in the new era of man-machine communication in a real-life environment. The evidence is strong that these objectives are being achieved.

ACKNOWLEDGMENTS

The Technical Information Libraries gratefully acknowledge the unstinting and imaginative aid given by the Comptroller's Division of Bell Tel-
ephone Laboratories in the design, development and operation of the BELLREL system.

BIBLIOGRAPHY

1. George Fry and Associates, Inc.: Study of Circulation Control Systems (Chicago: ALA, 1961).
2. American Library Association, Library Technology Project: The Use of Data-Processing Equipment in Circulation Control (Chicago: ALA, July 1965), Library Technology Reports.
3. McCoy, Ralph E.: "Computerized Circulation Work: A Case Study of the 357 Data Collection System," Library Resources & Technical Services, 9 (Winter 1965), 59-65.
4. Flannery, Anne; Mack, James D.: Mechanized Circulation System, Lehigh University Library (Center for the Information Sciences, Lehigh Univ.: Nov. 1966), Library Systems Analyses Report No. 4.
5. Cammack, Floyd; Mann, Donald: "Institutional Implications of an Automated Circulation Study," College & Research Libraries, 28 (March 1967), 129-32.
6. Cuadra, Carlos A., ed.: American Documentation Institute Annual Review of Information Science and Technology, Vol. 1 (New York: Interscience, 1966), pp. 201-4.
7. McCune, Lois C.; Salmon, Stephen R.: "Bibliography of Library Automation," ALA Bulletin, 61 (June 1967), 674-94.
8. Kimber, Richard T.: "Studies at the Queen's University of Belfast on Real-Time Computer Control of Book Circulation," Journal of Documentation, 22 (June 1966), 116-22.
9. Lazorick, Gerald J.; Herling, John P.: "A Real Time Library Circulation System without Pre-Punched Cards," Proceedings of the American Documentation Institute, v. 4 (Washington: ADI, 1967), 202-6.
10. Croxton, F. E.: On-Line Computer Applications in a Technical Library (Redstone Scientific Information Center, Redstone Arsenal, Alabama: November 1967), RSIC-723.
11. Ruecking, Frederick, Jr.: "Selecting a Circulation-Control System: A Mathematical Approach," College & Research Libraries, 25 (Sept. 1964), 385-90.
12. American Library Association, Library Technology Project: Three Systems of Circulation Control (Chicago: ALA, May 1967), Library Technology Reports.
13. Fasana, Paul J.: "Determining the Cost of Library Automation," ALA Bulletin, 61 (June 1967), 661.

AN INTEGRATED COMPUTER BASED TECHNICAL PROCESSING SYSTEM IN A SMALL COLLEGE LIBRARY

Jack W. SCOTT: Kent State University Library, Kent, Ohio (formerly Lorain County Community College, Lorain, Ohio)

A functioning technical processing system in a two-year community college library utilizes a Model 2201 Friden Flexowriter with punch card control and tab card reading units, an IBM 026 Key Punch, and an IBM 1440 computer, with two tape and two disc drives, to produce all acquisitions and catalog files based primarily on a single typing at the time of initiating an order. Records generated by the initial order, with slight updating of information, are used to produce, via computer, manual and mechanized order files and shelf lists, catalogs in both the traditional 3x5 card form and book form, mechanized claiming of unfilled orders, and subject bibliographies.

The Lorain County Community College, a two-year institution designed for 4000 students, opened in September 1964, with no librarian and no library collection. When the Librarian was hired in October 1964, lack of personnel, both professional and clerical, forced him to examine closely traditional ways of ordering and preparing materials, his main task being the controlled building of a collection as quickly as possible. No library having been established, there were no inflexible rules governing acquisitions or cataloging and no catalogs or other files enforcing their pattern on future plans. The Librarian was free to experiment and adapt as much as he desired; and adapt and experiment he did, remembering, at least most of the time, the primary reasons for designing the system.

150 Journal of Library Automation Vol. 1/3 September, 1968
These were: 1) to notify the vendor about what material was desired; 2) to have readily available information about when material had been ordered and when it might arrive; 3) to provide a record of encumbrances; 4) to make sure that material received was the material which had been ordered; 5) to initiate payment for material received; 6) to provide catalog copy for technical processes to use in producing card and book catalogs; 7) to provide inexpensive control cards for a circulation system; and 8) to provide whatever other statistics might be needed by the Librarian. The Librarian attended the Purdue conference on library automation (October 2-3, 1964) and an IBM conference on automation held in Cleveland (December 1964), and visited libraries with data processing installations, such as the Decatur Public Library. Then an extensive literature search was run on the subject of mechanization of libraries and the available material thoroughly reviewed. It was the consensus of the President, the Librarian, and the Manager of Data Processing that, as White said later, "The computer will play a major part in how libraries are organized and operated because libraries are a part of the fabric of society and computers are becoming a daily accepted part of life." (1) Moreover, it was agreed that the use of data processing equipment would be justified only if it made building a collection more efficient and more economical than manual methods could do.

METHOD

After careful consideration of the IBM 870 Document Writing System (2) and the system described by Kraft (3) as input techniques for the College Library, it was decided to use the Friden Flexowriter, recommended both at Purdue and, in European applications, by Bernstein (4). Its most attractive feature was the use of paper tapes to generate various secondary records without the necessity of proofreading each one.
The College, by mid-1965, had the following equipment available for library use: one Friden Flexowriter (Model 2201) with Card Punch Control Unit and Tab Card Reading Unit, one IBM 026 Key Punch with alternate programming, and guaranteed time on the college-owned IBM 1440 8K computer with two tape and two disc drives. To produce punched paper tape and tab cards with only one keyboarding, an electrical connection between the Flexowriter and the keypunch was especially designed and installed. It was fortunate for the Library that the College also had an excellent Data Processing Manager who was interested in seeing data processing machines and techniques utilized in as many ways as possible. With his enthusiastic support, aid in programming and preparation of flow charts, and patient cooperation, it was not surprising that the automation of library processes was completely successful.

Integrated Computer Based Processing/SCOTT 151

At this time it was decided that since the college was likely to remain a single-campus institution it would be uneconomical to rely solely on a book catalog, even though the portability of such a device was most attractive to Librarian and faculty alike. Therefore, it was planned to have the public catalog, as well as the official shelf list, in card form, permitting both to be kept current economically. These two files were to be supplemented with crude book catalogs which would be a by-product, among others, of the typing of the original book orders. These book catalogs were not to replace the card catalog but simply to extend and facilitate use of the collection. It was also decided to design a system which would duplicate as few as possible of the manual aspects of normal technical processing systems, but one which would, at the same time, permit the return to a manual system from a machine system with a minimum of trouble and tribulation if support for the Library's automated system should be withdrawn.
Concern about such withdrawal of support had originally been voiced by Durkin and White in 1961, when they said: "There have been a number of unfortunate examples of libraries that abandoned their home-grown catalogs for a machine retrieval program because there was some free computer time, only to lose their machine time to a higher priority project and to be left with information storage to which they no longer have access. Many of these librarians, and others who have heard about their plight, are determined not to burn their bridges behind them by abandoning their reliable, if old-fashioned, 3x5 card catalogs." (5) Although the necessity of returning to an inefficient manual system has not, to date, raised its ugly head, there were times when it was most comforting to know that routes of retreat and reformation were available. Under the present system there is only one manual keyboarding of descriptive catalog main entries for most titles. All other records are generated from these main entries. This integrated system was adopted on the assumption that cataloging information in some form (6) would be available for a high percentage of books. Experience showed that about 95 percent of acquisitions did have catalog copy readily available. Of 4029 titles processed in a 5-month period, catalog copy was available for 3824. After verification that a requested title is neither in the library nor on order, a copy of a catalog entry is located in a source such as the National Union Catalog, Library of Congress proofsheets, or Publisher's Weekly, etc. The catalog information is manually typed in its entirety (including subject headings) onto five-part multiple request forms, using the Friden Flexowriter.
Output from the Friden consists of the multiple order, a punched paper tape containing the full bibliographic entry but no order information, and tab cards, punched by the slave IBM Key Punch, which contain full order information but only abbreviated bibliographic data (Figure 1). The tab cards, containing full order information, are used as input to the 1440 computer to create an "on order" file arranged by order number and stored on magnetic tape, from which an "on order" printout is produced weekly (Figure 2). At any given time this magnetic tape order file can be used to total the dollar amount of outstanding orders to any given vendor, or the total amount outstanding to all vendors (Figure 3).

Fig. 1 On Order Creation Routine.

The punched paper tape and two copies of the Request Form are stored in a standard 3x5 card file arranged by main entry. One copy of the Request Form is to be used as a work slip when material is received.

Fig. 2 On Order Update.

The original and one copy of the Request Form are sent to the vendor, with instructions to return one copy with shipment. In the event the vendor does not comply, the main entry can be located readily by checking the order number or order date on the "on order" printout and using the abbreviated bibliographic information which appears there. If the material requested has not been shipped within three months, the magnetic tape order file is used to prepare tab cards containing all original order information and the cards are sent to the Library with a notice stating that shipment is overdue. These tab cards are used as input
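The "on order" file and its cost tallies can be sketched as follows: records keyed by order number, with per-vendor and overall encumbrance totals corresponding to the Figure 3 routine. The field names are assumptions for illustration.

```python
# Sketch of the magnetic-tape "on order" file: records keyed by
# order number, totalled by vendor or overall (the Figure 3 cost
# tally). Field names are assumed for illustration.

orders = {}   # order number -> record

def place_order(order_no, vendor, short_title, cost):
    orders[order_no] = {"vendor": vendor, "title": short_title,
                        "cost": cost}

def vendor_total(vendor):
    return sum(r["cost"] for r in orders.values()
               if r["vendor"] == vendor)

def total_outstanding():
    return sum(r["cost"] for r in orders.values())

place_order(1001, "Vendor A", "Lib automation", 6.50)
place_order(1002, "Vendor A", "Data processing", 8.00)
place_order(1003, "Vendor B", "Coll libraries", 5.25)
print(vendor_total("Vendor A"), total_outstanding())
```

The same keyed file supports the three-month claiming pass: any record older than three months is pulled to drive the "overdue, ship or cancel" notices.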
3 On Order Cost Tally. START CPU LIST OR TAB OF ON ORDER FILE BY COST #30000 ON ORDER COST TAB Integrated Computer Based Processing/ SCOTT 155 to the Flexowriter tab card reader unit which activates the Flexowriter itself and prepares "overdue, ship or cancel" notices to the vendor (Fig- Fig. 4 Late On Order Routine. ure 4). 156 Journal of Library Automation Vol. 1/ 3 September, 1968 PRODUCTS When material is received, the paper tape and one copy of the main entry work slip are pulled from the card order file and sent to the cata- loger who notes on the work slip the call number to be used as well as any changes. The work slip, punched paper tape and book then pass to the technician who does the shelf listing. At this point the original output paper tape containing full bibliographic information is used as input for the Flexowriter to create a standard 3x5 hard-copy shelf list card containing full bibliographic information, as well as inventory data such as vendor, date of receipt and cost. The last three items and the call number are added manually as "changes." Simultane- ously a new paper tape is produced as output which contains biblio- graphic information from the first tape and all revisions deemed neces- sary by the cataloger. The revised paper tape is used on the Flexowriter to prepare 3x5 card sets for the public catalog. At the same time the slave keypunch prepares a set of tab cards containing full acquisitions Fig. 5 Shelf List Creation Routine. Integrated Computer Based Processing/SCOTT 157 information: cost, vendor, date of receipt; and abbreviated bibliographic information: short author, short title, full call number (including copy, year, part and volume), accession number and short edition statement (Figure 5). The tab cards are used first to delete the item from the mag- netic tape "on order" file and second as input to create a magnetic tape shelf list of abbreviated information arranged by call number (Figure 6). 
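The two order-file operations just described — totaling outstanding dollars by vendor (Fig. 3) and flagging orders unshipped after three months (Fig. 4) — can be sketched as below. The field names and sample records are illustrative only, not the actual tape layout:

```python
from datetime import date, timedelta

# Illustrative on-order records; the real file is arranged by order number
# on magnetic tape and also carries abbreviated bibliographic data.
on_order = [
    {"order_no": 18116, "vendor": "McKay", "price": 4.95, "ordered": date(1968, 1, 5)},
    {"order_no": 18392, "vendor": "Arco",  "price": 6.50, "ordered": date(1968, 4, 2)},
    {"order_no": 18680, "vendor": "McKay", "price": 3.95, "ordered": date(1968, 4, 20)},
]

def outstanding_by_vendor(records):
    """Total dollar amount outstanding to each vendor (cf. Fig. 3)."""
    totals = {}
    for r in records:
        totals[r["vendor"]] = totals.get(r["vendor"], 0.0) + r["price"]
    return totals

def overdue(records, today, months=3):
    """Order numbers unshipped after roughly three months (cf. Fig. 4)."""
    cutoff = today - timedelta(days=30 * months)
    return [r["order_no"] for r in records if r["ordered"] < cutoff]

print(outstanding_by_vendor(on_order))
print(overdue(on_order, today=date(1968, 5, 1)))  # only the January order
```

The overdue list corresponds to the tab cards that drive the Flexowriter's "overdue, ship or cancel" notices.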
The magnetic tape shelf list is used to create 1) eight copies of author, title, and classified catalogs which are updated semi-annually; 2) printouts of weekly acquisitions; 3) subject printouts on demand; and 4) tab cards which serve as circulation cards for books, films, drawings, tape and disc recordings, filmstrips and any other materials. The tab cards can be used with the IBM 357 circulation system or any similar system.

Fig. 6. Weekly Shelf List Update.

DISCUSSION

The efficiency of this system is most dramatically demonstrated by the amount of work accomplished per person per year. One technician can process over one thousand orders per month. Over fifteen thousand fully cataloged volumes per year (approximately eleven thousand titles) are added to the collection by a technical processing department which consists solely of one full-time cataloger and two full-time technicians. One technician spends one half of her time typing orders and the other half preparing the shelf list. At present the limiting factor in processing material is not the personnel time available but rather time on the Flexowriter-keypunch combination, which runs continuously for sixty hours per week. The cataloger feels that if some thirty hours more per week were available for running the machines, or if a second Flexowriter were available to handle catalog card output, it would then be possible to order, receive, and fully process fifteen thousand titles per year (eighteen to twenty thousand volumes) with only the present technical processing staff.

REFERENCES
1. White, Herbert S.: "To the Barricades! The Computers are Coming!" Special Libraries, 57 (November 1966), 631.
2. General Information Manual: Mechanized Library Procedures (White Plains, N.Y.: IBM, n.d.).
3. Kraft, Donald H.: Library Automation with Data Processing Equipment (Chicago: IBM, 1964).
4. Bernstein, Hans H.: "Die Verwendung von Flexowritern in Dokumentation und Bibliothek," Nachrichten für Dokumentation, 12 (June 1961), 92.
5. Durkin, Robert E.; White, Herbert S.: "Simultaneous Preparation of Library Catalogs for Manual and Machine Applications," Special Libraries, 52 (May 1961), 231.
6. Kaiser, Walter H.: "New Face and Place for the Catalog Card," Library Journal, 88 (January 1963), 186.

COST COMPARISON OF COMPUTER VERSUS MANUAL CATALOG MAINTENANCE

John C. KOUNTZ: County of Orange Public Library, Orange, California

Is a computer assisted catalog system less expensive than its manual counterpart? A method for comparing the two was developed and applied to historical data from the Orange County Public Library. Comparative costs obtained were $0.89 per entry for computer assisted catalog maintenance versus $1.71 for manual maintenance.

INTRODUCTION

Since November 1965, the County of Orange Public Library has performed all acquisitions by means of a computer assisted system. As a by-product of this continuing operation, records for over 30,000 titles are now available in machine readable form on magnetic tape. The next logical step to realize the Library's goal of mechanizing a major portion of its many nonprofessional functions is the production of a comprehensive multi-access list of its holdings suitable for both Library and patron use; in short, a Book Catalog. The 30,000 captive entries, however, comprise only a quarter of the Library's total holdings of 120,000 titles. Before the envisioned Book Catalog can be produced, approximately 90,000 titles remain to be captured, and subsequent file handling and data printout operations must be developed.

An undertaking of this magnitude naturally prompted a review of the literature. Initially, Hayes and Shoffner's work for the Stanford University Undergraduate Library (1) would appear adequate.

Cost Comparison / KOUNTZ

Fig. 1. Manual Card Catalog System.
Fig. 2. Proposed Computer Assisted Book Catalog System.

On closer examination, however, their approach did not optimize the cycle for supplement production or catalog reprint; nor was particular attention given this problem in the Institute of Library Research Report to the California State Library (2). The Cartwright and Shoffner study for the California State Library (3) paid close attention to cycle length, but the system therein described differed extensively from the system proposed for Orange County. Further, though the costing of data capture has been well documented and continues to appear in the literature (4,5,6), there is little concerning the cost of maintaining data once on file. In brief, neither a method nor basic information was available which could be applied generally, although several specific approaches and results had been presented (1,7,8,9,10,11), and an approach to the analysis of manual operations established (12).

When it became apparent that more than article reading would be required, cost analysis of the existing manual operation and the proposed computer assisted Book Catalog program was performed. In addition, a method was designed to discern what cost benefit, if any, was implied in a computer maintained file before a massive keying effort and systems development should be undertaken. It is important to note that the analysis gives no consideration to increased level of service, esthetics, practicality, or the subsidiary products of a computerized system. Nor is the capital investment represented by existing card catalogs considered, as those units are assumed to have been paid for in the course of their creation.
MANUAL CARD CATALOG SYSTEM

The manual system to be replaced consists of individual card catalogs and shelf lists in each of the Library's service units, comprising 25 branches and a separate Bookmobile base. This system, depicted in Figure 1, consists of: 1) centralized card production, and 2) branch catalog maintenance. In the centralized operations, offset masters are created from worksheets prepared by the cataloging section and used for two-up card production. These cards are collated into sets, the sets merged with their corresponding books, and the completed packages sent to the ordering branches. When book and card packages are received by the branch, shelf list and catalog cards are sorted and merged with their respective files. Withdrawal of a book (discarded or lost) from a branch collection triggers a reversal of this process, and all cards for the withdrawn item are purged from the files.

PROPOSED COMPUTER ASSISTED BOOK CATALOG SYSTEM

The computer assisted system (Figure 2) consists of three phases of computer operation and catalog printing. In the first phase the computer receives as input magnetic tapes produced by the Library's ongoing book acquisition system and/or the output of a device providing a direct keyboard-to-tape capability, processes the input data into updated records, and merges the updated records with the Master File of Library holdings. The first phase will build the initial Master File through capture of the Library's remaining 90,000 titles via the keyboard-to-tape device indicated above, and will also form the main avenue for communicating revision (update) data to the Master File.
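The first-phase posting of acquisition and revision transactions against the Master File is, in essence, a keyed file update. A minimal in-memory sketch (record shapes and sample data are assumed, not the actual tape formats):

```python
def apply_updates(master, transactions):
    """Post add/change/delete transactions to a master file keyed by
    record number, as in the phase-one tape merge described above."""
    result = dict(master)  # record_no -> entry
    for t in transactions:
        if t["action"] in ("add", "change"):
            result[t["record_no"]] = t["entry"]
        elif t["action"] == "delete":
            result.pop(t["record_no"], None)
    return result

master = {521763: "Sandulescu, Jacques. Donbas. McKay, 1968."}
updates = [
    {"action": "add", "record_no": 600001, "entry": "Austen, Jane. Emma."},
    {"action": "delete", "record_no": 521763, "entry": None},
]
print(apply_updates(master, updates))
```

On the actual hardware this would be a sequential merge of sorted transaction and master tapes rather than an in-memory dictionary.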
In the second phase the computer extracts two print tapes from the Master File: the first is the Biblio File, consisting of all the bibliographic data and the record number for each Master File entry in alphabetical sequence (author-title mix); the second, or Locate File, contains location codes and copy counts for each record number in numeric sequence.

In the third and final phase, the Biblio and Locate Files are processed. From the Biblio File are produced keylines (camera-ready copy) for the Book Catalog and periodic cumulative supplements of new entries. From the Locate File are generated three numeric listings: 1) a Locate List containing all entries; 2) periodic cumulative Locate Supplements; 3) Branch Inventories. In production of the Book Catalog, the computer produced keylines are used to create offset masters for printing. The end product of the printing process is 400 bound copies of the Book Catalog.

FACTORS IN COST COMPARISON

Following is an examination of the principal factors which must be equal or identical to permit comparative analysis of the two systems.

Unit of Comparison (ENTRY)

To facilitate the cost analysis between manual and computer assisted file maintenance systems, a unit of comparison was established which would be compatible to both. This unit is called the ENTRY, and in the analysis which follows it is the basis for all cost comparisons. For the manual system, ENTRY means creation, distribution, filing and, ultimately, purging of the complete set of cards (Figure 3) pertaining to a specific book; for the computer assisted counterpart, an entry is a record (Figure 4) in machine readable form which has been captured, sorted, listed and updated.

Frequency of Transactions

Either system, in addition to creating and posting new records to a file, must periodically update both entire records and the elements of those records.
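The second-phase extraction described above amounts to two sorts over the same Master File: bibliographic data in author-title order, location data in record-number order. A sketch under assumed field names:

```python
# Illustrative Master File entries; field names are assumptions,
# not the actual Orange County record layout.
master_file = [
    {"record_no": 14127, "biblio": "Sandulescu, Jacques. Donbas.", "locations": "11,23", "copies": 2},
    {"record_no": 10042, "biblio": "Austen, Jane. Pride & Prejudice.", "locations": "04", "copies": 1},
]

# Biblio File: bibliographic data plus record number, alphabetical sequence.
biblio_file = sorted(
    ({"record_no": e["record_no"], "biblio": e["biblio"]} for e in master_file),
    key=lambda e: e["biblio"],
)

# Locate File: location codes and copy counts, numeric record-number sequence.
locate_file = sorted(
    ({"record_no": e["record_no"], "locations": e["locations"], "copies": e["copies"]}
     for e in master_file),
    key=lambda e: e["record_no"],
)
```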
The number of these updates can be determined for a given period of time, and for our purposes we call this figure the frequency of transactions. With regard to the systems under analysis, the frequency of transactions is identical, and includes two elements: titles added and withdrawn; and volumes added and withdrawn (including re-assignments), as shown in Table 1.

Fig. 3. Set of Catalog Cards.
Fig. 4. Multiple Layout Form for Electric Accounting Machine Cards.
Fig. 5. Sub Purchase Order.
Fig. 6. On Order Listing.

[…] is taken from historical costs for the operation of a report generator doing this in the acquisitions system.
Similarly, the reformatted entries in "printable" form must also be sequenced alphabetically (single author-title mix) before they can be printed, and again the $0.00034 cost is taken from historical data. Finally, the sorted, reformatted entries are printed (upper case) at a cost of $0.027 each (2 lines). The total cost for these operations becomes $0.036 per entry, as shown in Table 6.

Table 6. Keyline Production Cost

  Operation                                 Entry Cost
  Computer Reformat Master File Entries     $0.009
  Sort Reformatted Entries                   0.00034
  Print (Offline)                            0.027
  TOTAL                                     $0.036

For catalog printing, the computer generated keylines will be reduced photographically to 60 percent, and the reductions assembled for 16-up reproduction with approximately 100 entries per sheet (both sides). Initial book catalog production will be 400 copies of approximately 1,800 sheets. There are slightly more than 61,000 author entries which will receive full bibliographic data, call and LC order numbers, while the 120,000 title entries will present only author data and call number. The estimated total cost of printing is given in Table 7. The resultant cost per entry is $0.186, regardless of the number of lines required.

Table 7. Catalog Printing Cost

  Set-up                 $ 3,000
  Plates                   7,000
  Run Time                 6,000
  Gather/Collate           2,000
  Paper                    4,000
  Cover/Perfect Bind         350
  TOTAL                  $22,350

As the printed and bound Book Catalog will not present the locations of the materials it lists, an off-line Locate List will be produced concurrent with catalog creation. This list will contain 120,000 numeric entries (LC order number, coded locations and price), and will be generated for Library use only. The cost of offline printing of this list (25 copies) is based on a historical print cost of $0.014 per one-line entry extended to the number of entries, or $1,680.00.
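The per-entry figures in Tables 6 and 7 and the Locate List total can be recomputed from the unit costs given, rounding as the article does:

```python
# Table 6: keyline production cost per entry (reformat + sort + offline print).
keyline = 0.009 + 0.00034 + 0.027

# Table 7: total printing outlay, spread over the 120,000 title entries.
printing_total = 3000 + 7000 + 6000 + 2000 + 4000 + 350
per_entry_print = printing_total / 120_000

# Locate List: 120,000 one-line entries at $0.014 each.
locate_list_total = 120_000 * 0.014

print(round(keyline, 3), round(per_entry_print, 3), round(locate_list_total, 2))
```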
Summary of Computer Assisted Catalog Production Costs

The grand total cost per entry for all operations leading to the initial book catalog (based on initial data capture through file construction above) is given in Table 8. As shown in this table, the computer cost per entry for the first 'edition' of the book catalog is $0.52. This figure is comparable to the manual system figure of $0.99 per entry derived earlier. However, the cost per entry figure for computer assisted file maintenance must also be derived before comparison with the total manual figure of $1.71 per entry is possible.

Table 8. Composite Cost Per Entry of a Computer Produced Book Catalog

  Operation                  Cost
  File Construction          $0.286
  Keyline Production          0.036
  Book Production             0.186
  Locate List Production      0.014
  TOTAL                      $0.52

Table 9. Cost of Posting and Printing Catalog Update Data

  Operation                          Unit Cost     Total
  Locate Update
    Input                            $0.007
    Print Locate List (offline)       0.014
    Subtotal                          0.021
    TOTAL for 133,000 actions                      $2,793.00
  Biblio Update
    Reformat Master File Entry        0.009
    Sort Reformatted Entry            0.00034
    Print Biblio List (offline)       0.027
    Subtotal (rounded)                0.036
    TOTAL for 9,000 actions                           324.00
  GRAND TOTAL (142,000 actions)                    $3,117.00

Computer File Maintenance

The figures developed in Table 9 establish a cost per entry for file maintenance, from keyboarding corrected data to the production of an offline printout of Biblio and Locate supplements. To understand their derivation, let us review the frequency of transactions. In Table 1, it can be noted that the 15,000 titles added to the collection annually will necessitate Master File location update for the volumes they represent. The locations for withdrawals will also require update. In combination, additions and withdrawals mean a total of 129,000 actions, plus 4,000 last copy withdrawals, or 133,000 updates yearly to keep the Master Locate File current.
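Tables 8 and 9 likewise follow from the unit costs already given (note that the article rounds the Biblio subtotal to $0.036 before extending it):

```python
# Table 8: composite first-edition cost per entry.
first_edition = 0.286 + 0.036 + 0.186 + 0.014

# Table 9: annual update posting/printing cost.
locate_updates = 133_000 * (0.007 + 0.014)  # input + offline Locate print
biblio_updates = 9_000 * 0.036              # rounded subtotal, as in Table 9
grand_total = locate_updates + biblio_updates

print(round(first_edition, 2), round(locate_updates), round(biblio_updates), round(grand_total))
```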
In contrast, only the 9,000 new titles will require bibliographic listing. The Locate File update input cost is identical to that used for the entry of error correction data (Table 5). This is possible since approximately the same number of characters must be keyed to address an entry and enter updated location data (9 characters for record and card code, an action digit, and 2 numeric location characters, for a total of 12 characters, versus the 14 characters required for entry correction). Similarly, the offline print cost for Locate data remains the same as that indicated for the initial Locate List printout: $0.014 per entry. The Biblio File update costs are the computer keyline production figures presented in Table 6. To derive the cost per entry, both Locate and Biblio figures are extended to reflect the proportion of the final figure they represent, and reduced to a single cost per entry. In summation, $0.0242 is the cost per entry for file maintenance for one year. However, this figure is of limited value without reference to either the frequency of supplement production or the total catalog reprint period. Therefore, the optimum frequency of supplement production and the period of maintenance are discussed below to bring this raw $0.0242 per entry into perspective.

Optimum Frequency of Supplement Production and Catalog Reprints

The optimum frequency of bibliographic supplement production is based on the most timely reporting of new title disposition at the least cost; that is, a determination of the number of cumulative listings of new titles, in concert with all location changes, which can be produced before their production cost equals or exceeds the cost of total catalog reprint. The most economical approach to reporting revised, new, or deleted bibliographic and location entries would be through listing only those entries which have been changed.
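The article does not spell out the base used to reduce the Table 9 total to a single per-entry figure; dividing the $3,117 grand total by the 129,000 additions and withdrawals reported annually reproduces the published $0.0242 (an inference, offered as a check):

```python
annual_update_cost = 3117.00   # Table 9 grand total
actions = 129_000              # annual additions and withdrawals (assumed base)
per_entry_maintenance = annual_update_cost / actions
print(round(per_entry_maintenance, 4))  # 0.0242
```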
The summary figures presented in Table 10 reflect only the cost per entry developed in Table 9 for the production of cumulative exception listings, assuming an equal monthly distribution of transactions. In addition, the annual cost per year, excepting the twelfth month, is tabulated to reflect overall cost where total reprint would occur instead of the last cumulative supplement cycle. A quarterly supplement production cycle is selected, as it best meets the optimum defined earlier (i.e., most timely reporting for the least cost).

Table 10. Cumulative Supplement Costs for Various Cycles

  Computer Runs      Annual Cost @ $0.0242/Entry
  Per Year           12th Month Included    12th Month Excluded
  12                 $22,335.92             $18,899.63
  6                   12,027.04               8,590.74
  4 (selected)         8,590.74               5,154.44
  3                    6,872.26               3,436.30
  2                    5,154.44               1,718.15
  1                    3,436.30               (none)

By extending the quarterly supplement production costs shown in Table 10 to represent recurring annual expenses, and cumulating these annual expenses for comparison with the total cost of complete book catalog and Locate List production, the number of years between catalog reprints becomes obvious. This calculation is shown in Table 11, where 3 years is the optimum reprint cycle for the quarterly supplement costs selected.

Table 11. Catalog Reprint vs. Supplement Production Costs

  Years          Cumulative         Cumulative,            Catalog Reprint
                 Supplement Cost    12th Month Excluded    (Year's End)
  1              $ 8,591            $ 5,154                $23,000
  2               17,182             13,746                 24,250
  3 (selected)    25,773             22,337                 25,500
  4               34,364             30,928                 26,750

COMPARISON AND CONCLUSION

To return to the cost per entry for catalog maintenance alone for the optimum reprint cycle: there is a total outlay of $47,837 for 3 years of cumulative supplements and a catalog reprint to report an average of 129,000 titles. From this base can be derived a cost per entry of $0.37 for entry maintenance.
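The Table 10 figures follow from spreading the year's 142,000 transactions evenly across the supplement runs: with n cumulative runs per year, run k relists k/n of the year's transactions, so the annual cost is 142,000 x $0.0242 x (n + 1) / 2. The base and formula are inferred, but they reproduce the published values to within rounding:

```python
TRANSACTIONS = 142_000    # total yearly actions (Table 9 grand total line)
COST_PER_ENTRY = 0.0242   # file-maintenance cost per entry

def annual_supplement_cost(runs_per_year):
    """Cumulative exception listings: run k relists k/runs_per_year of the
    year's transactions, so the total is T * c * (runs + 1) / 2."""
    per_run = TRANSACTIONS / runs_per_year
    return sum(per_run * k * COST_PER_ENTRY for k in range(1, runs_per_year + 1))

for n in (12, 6, 4, 3, 2, 1):
    print(n, round(annual_supplement_cost(n), 2))
```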
This $0.37 can then be summed with the $0.52 cost per entry for the catalog "first edition," for a grand total of $0.89 as the cost per entry for a computer assisted catalog production and maintenance system. Further, this cost per entry is realized in a document equal to 400 card catalogs! In terms of the manual system, maintenance was $0.72 per entry, and some 26 files had to be maintained. Thus, it is possible to extend the single file maintenance cost to a systemwide average of $18.72, plus the $0.99 required for entry preparation, or a grand total of $19.72 per entry, rather than the $1.71 indicated earlier.

The lesson implied here is simple: manual cost per entry is dependent upon the number of manual files being maintained. This is of importance since it means a significant increase in outlay for file maintenance with the addition of each new branch; whereas costing for a computer produced and maintained catalog is relatively independent of the number of service units accommodated.

Finally, a word of caution. There is a potential danger lurking in these figures for the small public library which has a limited number of branches: the cost per entry, even for the single shelf list/card-catalog comparison, has been calculated for an operating system serving a relatively large number of branches. The cost-per-entry method used in this paper does not include amortization of the capital outlay for "computerization" which, in this specific case, amounts to almost $200,000 for design of system, procedures and forms, and for design, coding and debugging of programs. Although savings equal to this amount, or more, would be realized over a period of time because of reduced clerical operations and attendant burden, a large sum would still have to be earmarked for expenditure during a relatively short period with no immediate return.
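The concluding comparison above assembles these pieces; re-deriving it (the 26 files are the 25 branch catalogs plus the Bookmobile base, as described earlier; the manual systemwide sum comes to $19.71 against the article's rounded $19.72):

```python
# Computer assisted: three years of quarterly supplements (12th month of
# year 3 excluded) plus the year-3 catalog reprint, per Table 11.
supplements_3yr = 8591 + 8591 + 5154          # ~ $22,337 in Table 11
reprint_year3 = 25_500
maintenance_per_entry = (supplements_3yr + reprint_year3) / 129_000  # ~ $0.37
computer_total = 0.52 + round(maintenance_per_entry, 2)              # $0.89

# Manual: $0.72 maintenance per entry per file, across 26 files,
# plus $0.99 entry preparation.
manual_single_file = 0.72 + 0.99              # $1.71
manual_systemwide = 0.72 * 26 + 0.99          # $19.71 (article reports $19.72)

print(computer_total, manual_single_file, round(manual_systemwide, 2))
```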
Foreknowledge of this "one-shot" cost and its related cost-per-entry payoff should not be a deterrent. Rather, it should permit the administrator of a limited operation to deal effectively with increased clerical costs and to make meaningful decisions relative to service bureau overtures, library board interrogations, or the goals of a new library system.

REFERENCES
1. Hayes, Robert M.; Shoffner, Ralph M.; Weber, David C.: "The Economics of Book Catalog Production," Library Resources and Technical Services, 10 (Winter 1966), 65, 68-82, 87-88.
2. University of California, Institute of Library Research: Report to the California State Library: Preliminary Evaluation of the Feasibility of Mechanization (Institute of Library Research, University of California, 1966), p. 3-6.
3. Cartwright, Kelly L.; Shoffner, Ralph M.: Catalogs in Book Form: A Research Study of Their Implications for the California State Library and the California Union Catalog, with a Design for Their Implementation (Institute of Library Research, University of California, 1967), p. 58-68.
4. Bourne, Charles: Bibliographic Data Conversion Techniques (mimeographed tables presented at Oregon Library Mechanization Workshop, June 1968), Table II.
5. Chapin, Richard E.; Pretzer, Dale H.: "Comparative Costs of Converting Shelf List Records to Machine Readable Form," Journal of Library Automation, 1 (March 1968), 71.
6. Black, Donald V.: "Creation of Computer Input in an Expanded Character Set," Journal of Library Automation, 1 (June 1968), 117-118.
7. Fasana, Paul J.: "Automating Cataloging Functions in Conventional Libraries," Library Resources and Technical Services, 7 (Summer 1963), 358, 361-365.
8. Robinson, Charles W.: "The Book Catalog: Diving In," Wilson Library Bulletin, 40 (November 1965), 265-268.
9.
MacQuarrie, Catherine; Martin, Beryl L.: "The Book Catalog of the Los Angeles County Public Library: How It Is Being Made," Library Resources and Technical Services, 4 (Summer 1960), 225-226.
10. Heinritz, Fred: "Book versus Card Catalog Costs," Library Resources and Technical Services, 7 (Summer 1963), 231-236.
11. Smith, F. R.; Jones, S. O.: Card Versus Book-Form Printout in a Mechanized Library System (Douglas Aircraft Company, 1967; Clearinghouse Document #AD 653 697), p. 7-8.
12. Wynar, Don: "Cost Analysis in a Technical Services Division," Library Resources and Technical Services, 7 (Fall 1963), 320-326.

SUBJECT REFERENCE LISTS PRODUCED BY COMPUTER

Ching-chih CHEN: Massachusetts Institute of Technology, Boston, Massachusetts (formerly University of Waterloo), and E. Robert KINGHAM: University of Waterloo, Waterloo, Ontario, Canada

A system developed to produce fourteen subject reference lists by IBM 360/75 is described in detail. The computerized system has many advantages over conventional manual procedures. The feedback from students and other users is discussed, and some analysis of cost is included.

INTRODUCTION

The University of Waterloo, with the third largest enrollment in the province of Ontario, was the first in Canada to institute a "cooperative education plan". Undergraduate students enrolled in cooperative courses (all engineering and some science, mathematics and arts students) spend eight four-month terms at the University for academic studies, alternated with six four-month terms with industry or government for practical experience related to their academic programmes. An IBM 360/75 at the University of Waterloo is the heart of the largest university computer installation in Canada, and is an important tool for faculty, students and administration. Under multi-processing it can serve many departments through terminals around the campus.
One terminal serves the Data Processing Department of the Computer Centre, where all the maintenance and printing of various reports required for the project under discussion are handled for the Engineering, Mathematics & Science Library (E.M.S. Library). The E.M.S. Library contains approximately 75,000 volumes of monographs, periodicals, technical reports and government documents, and currently receives 1,650 periodical titles. It serves about 4,500 on-campus students and more than 300 faculty members in the fields of engineering, mathematics and science (in 1967/68), and provides assistance on request to business and industry in the area.

SYSTEM

Since E.M.S. Library users have frequently requested subject reference lists to guide them in the use of library materials, and library reference statistics have proved that there is a justified need for them (1), the reference staff began, in the Fall of 1966, to investigate means of compiling and producing these lists. It was planned that each subject list should first be prepared and edited by reference librarians, but that at that point conventional manual procedures should be abandoned in favor of using the computer available on campus. In this way, operations in revising and updating the lists and in adding new lists in other subject areas would be simplified, manual clerical work would be reduced significantly (no typing would be needed), and titles related to interdisciplinary areas of study could be easily coded to appear on more than one list.

Although library literature contains numerous accounts of library automation programmes (2), it is very obvious that the chief emphasis has been on technical services and circulation applications.
So far as "reference services" or "information services" go, many developments have been discussed in recent years in the areas of documentation, indexing, retrieval techniques and systems, selective dissemination of information, interlibrary communication, etc. Concise summaries can be found in many papers (3, 4, 5, 6, 7). However, in the initial stages of developing our system, we failed to locate any existing mechanized system for producing subject bibliographies for reference use. Such subject reference lists could be easily generated if the library catalogue were in machine readable form (6, 8), but since a computerized catalogue was not foreseen at Waterloo for some time to come, the library had to design and develop an independent system to fulfil reference needs.

Since December 1965, the University of Waterloo Libraries have achieved success in producing a Serials List by computer. The techniques used in the original Serials Project (using an IBM 1620 with card input), which started in Spring 1965 and was completed in December 1965, were not new; the fields and codes used were based on modifications of those used by the National Research Council Library (9) and Dalhousie University [the Dalhousie-AAU list] (10). These techniques have also been used with various modifications at several libraries in the United States, such as the M.I.T. Libraries (11). In 1966, the Waterloo Serials Project was greatly modified by conversion from the IBM 1620 to the IBM 360, and from a card system to a tape system, by re-writing the FORTRAN II programme in RPG (Report Programme Generator), and by expanding and adding certain data fields.

The reference project was initiated in November 1966. It was apparent that, after the revision of the Serials Project, the newly improved serials system could be adapted to maintain the Master File of the reference subject lists.
The project is unique in that it uses a separate code structure that makes it possible to provide information from the Master File by type of material within each subject area. It was decided that the existing Library Serials Maintenance Form could be used, with minor modifications, to produce the reference lists. The original form was modified to facilitate maintenance of the Master File by the library reference staff and easy transcription onto cards by keypunch operators. Through the use of these forms, the Master File was created and is kept up to date.

Reference Master File

There are four record types in the Master File, each 64 characters in length. They are stored on tape in a blocked length of 6,400 characters for faster processing on the computer, tape being a relatively slow input-output device. The fields in each of the record types are as follows:

1. Reference 1st Record
   1-7   Serial number
   8-10  Record type code (1, followed by two blanks)
   11    Form type
   12-21 Classification number
   22-32 Cutter number
   33-34 Agent number
   35    Country code
   36    Language code
   37-38 Department code
   39    Serial exclusion code (for future use)
   40-42 Sequence number
   43    Library location
   44-64 Filler (for future use)
2. Reference Title Record
   1-7   Serial number
   8-10  Record type code (2NN)
   11-63 Title information
   64    Filler (for future use)
3. Reference Holdings Record
   1-7   Serial number
   8-10  Record type code (3NN)
   11-63 Holdings information
   64    Filler (for future use)
4. Reference Notes Record
   1-7   Serial number
   8-10  Record type code (4NN)
   11-63 Notes information
   64    Filler (for future use)

Fig. 1. Flowchart of Maintenance Run. [Flow: maintenance forms from the Library pass to Data Processing for keypunching; additions, changes and deletions are applied to the Master File in the computer print run.]

Programmes were written in R.P.G. (12) to achieve operational status rapidly with a minimum of debugging. R.P.G.
is a problem-oriented language designed to provide users with an efficient, easy-to-use technique for generating programmes. A set of specification sheets is required, on which the user makes entries. The forms are simple and the headings on the sheets are largely self-explanatory.

Fig. 2. Flowchart of Listings Run. [Flow: source maintenance forms in the Library; computer transaction sort and public print run; the public list is placed on display in the Library.]

There are three phases to the E.M.S. computer runs: 1) Maintenance Run (weekly or as required) (Figure 1); 2) Listings Run (monthly) (Figure 2); 3) Masters Run (semi-annually) (Figure 3).

Fig. 3. Flowchart of Masters Run. [Flow: the computer produces the Masters; the Printshop prints the subject reference booklets; the booklets are available to students in the Library at 25¢ per copy.]

Library Maintenance Form

Most of the fields on this form (Figure 4) are self-explanatory; however, the following may need further definition.

Fig. 4. Library Maintenance Form. [The form carries the serial number; a check box for "A" (addition), "C" (change) or "D" (deletion); the form, classification, Cutter, agent, country, language, department, exclusion, sequence and location fields; and, for changes affecting title, holdings or notes, the change type (A, C or D), the line type code (T, H or N) and the sequence number within the line type. Serials forms are white, reference forms pink.]

Columns 1-2: Card Code

There are six possible codes:

1. A [blank]: New entry to the Master File.
2.
C [blank]: Change to Record Type 1 (see Cols. 10-12 as described below).
3. CA
4. CC
5. CD: Changes in lines for an existing entry on the Master File, which add, change or delete respectively title, holdings and/or notes.
6. D [blank]: Deletion of an entry from the Master File.

Columns 3-9: Serial Number (Major Sequence of Master File)

A serial number is assigned to every new entry to maintain the alphabetical order of the complete listing. It consists of one alphabetic character, taken from the first letter of the main entry, followed by six numerics which serve to make each entry unique within the letter.

Columns 10-12: Record Type Code (Minor Sequence of Master File)

There are four record type codes:
1. Record type 1: one record permitted per entry (information on call number, subject matter of the entry and other data). Cols. 11 & 12 are not used.
2. Record type T, 3. Record type H, 4. Record type N: Col. 10 contains the letter, and "Title", "Holdings" and "Notes" information respectively occupies Cols. 13-65 inclusive. Cols. 11 & 12 permit up to 99 lines per record type per entry.

Column 13: Form Code

This alphabetic code represents the form of publication; e.g., "A" stands for "Abstract", "P" stands for "Periodical", etc.

Columns 39-40: Department Code

This numeric code indicates the subject list or lists which the reference librarians assign to each entry. There are two code types:
1. Prime department numbers, of which there are fourteen; e.g., 20 Physics: to appear on the Physics list.
2. Implied department numbers: to appear on two or more of the prime department lists; e.g., 41 Math., Phys. & Chem.: to appear on the Mathematics, Physics and Chemistry lists; 60 General: to appear on all fourteen subject lists; etc.

Columns 42-44: Sequence Number

Col. 42 is always "R", which stands for "Reference List". Cols. 43 & 44 are a numeric code which indicates the type of reference material, e.g.
12 REFERENCE BOOKS - DICTIONARIES
14 REFERENCE BOOKS - HANDBOOKS AND TABLES
60 ABSTRACTS AND INDEXES

where "REFERENCE BOOKS" and "ABSTRACTS AND INDEXES" are section headings, and "DICTIONARIES" and "HANDBOOKS AND TABLES" are sub-section headings.

Pre-edit Report

The programme that produces the Pre-edit Report (Figure 5) checks the maintenance transactions for the following known error conditions:

1. Card code invalid.
2. Serial number invalid.
3. Sequence number invalid.
4. First record: card columns 46-80 should be blank.
5. Agent code invalid.
6. Country code invalid.
7. Language code invalid.
8. Department number invalid.
9. Exclude code should be "X" or blank.
10. Reference code invalid.
11. Library location invalid.
12. Deletion card: card columns 10-80 should be blank.
13. 1st record card missing on addition.
14. Title, holding or note card sequence error.
15. Title, holding or note delete card: card columns 13-80 should be blank.
16. Title, holding or note addition or change card: card columns 66-80 should be blank.

This step catches approximately 80% of the clerical and keypunching errors.

Fig. 5. Pre-edit Report. [Sample page not legibly reproduced.]
Fig. 6. Maintenance Report. [Sample page, March 1968: the report lists processed additions and changes (e.g., "CHANGE TITLE FROM ... TO ..."), prints rejected transactions with the offending card image (e.g., "ATTEMPT TO ADD NEW RECORD HAS BEEN UNSUCCESSFUL - SERIAL NUMBER EXISTS ALREADY"), and closes with statistics: Master File records read 6,292; number of records added 162; number of records deleted 55; Master File records written 6,399; number of invalid maintenance records not processed 8.]

3. Two types of error condition that fail to appear in the Pre-edit Report, owing to the absence of the Master File in the pre-edit programme:
   a. Additions where serial numbers and/or sequence numbers (Cols. 10-12) exist already.
   b. Changes/deletions where serial numbers and/or sequence numbers are non-existent.
4. Master File maintenance statistics on:
   a. Master File records read.
   b. Number of records added.
   c. Number of records deleted.
   d. Master File records written.
   e. Number of invalid maintenance records not processed.

Addition List

This list (Figure 7) is an alphabetical summary (in serial number sequence) containing only the entries added in the "maintenance" run. The call number, the complete bibliographical data of the entry, the department or subject code (Cols. 39-40) and the location are printed for each entry. This list augments the Internal Reference List between "listings" runs (see Figure 2).

Fig. 7. Addition List. [Sample page, week ending January 30, 1968: added entries with call number, title, holdings, department code and location.]
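The Master File layout given earlier (four 64-character record types, blocked 6,400 characters to a tape block) translates directly into fixed-column decoding. A sketch in Python for illustration only (the original programmes were written in RPG; the field positions follow the layout in the text, and the dictionary representation is an assumption of this sketch):

```python
# Decode 64-character Master File records (illustrative sketch; the
# original system ran in RPG on an IBM 360, not Python).

RECORD_LENGTH = 64
BLOCK_LENGTH = 6400  # 100 records per tape block

def split_block(block):
    """Split one 6,400-character tape block into 64-character records."""
    assert len(block) == BLOCK_LENGTH
    return [block[i:i + RECORD_LENGTH]
            for i in range(0, BLOCK_LENGTH, RECORD_LENGTH)]

def decode(record):
    """Decode the common fields, then the type-specific remainder."""
    fields = {
        "serial_number": record[0:7],   # cols 1-7
        "record_type": record[7:10],    # cols 8-10: 1, 2NN, 3NN or 4NN
    }
    if record[7] == "1":                # Reference 1st Record
        fields.update({
            "form_type": record[10],          # col 11
            "classification": record[11:21],  # cols 12-21
            "cutter": record[21:32],          # cols 22-32
            "agent": record[32:34],           # cols 33-34
            "country": record[34],            # col 35
            "language": record[35],           # col 36
            "department": record[36:38],      # cols 37-38
            "sequence": record[39:42],        # cols 40-42
            "location": record[42],           # col 43
        })
    else:                               # Title / Holdings / Notes record
        fields["text"] = record[10:63].rstrip()  # cols 11-63
    return fields
```

Blocking 100 records per tape block is what makes the slow input-output device tolerable: the computer reads 6,400 characters at a time instead of 64.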
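Several of the sixteen pre-edit checks are simple column predicates over the 80-column card image, which is why a pre-edit pass without the Master File can still catch about 80% of the errors. A sketch of a few of them (illustrative only; the valid card codes are those listed under "Columns 1-2: Card Code", and the remaining validity tables, such as agent and country codes, are not given in the article):

```python
# Sketch of a handful of the pre-edit checks on one 80-column card image.
# Only checks 1, 2, 4 and 12 from the article's list are shown.

VALID_CARD_CODES = {"A ", "C ", "CA", "CC", "CD", "D "}

def errors(card):
    """Return the pre-edit error messages for one card image."""
    card = card.ljust(80)
    errs = []
    if card[0:2] not in VALID_CARD_CODES:                    # check 1
        errs.append("card code invalid")
    serial = card[2:9]                                       # check 2:
    if not (serial[0].isalpha() and serial[1:].isdigit()):   # letter + 6 digits
        errs.append("serial number invalid")
    if card[0:2] == "A " and card[45:80].strip():            # check 4
        errs.append("first record: cols 46-80 should be blank")
    if card[0:2] == "D " and card[9:80].strip():             # check 12
        errs.append("deletion card: cols 10-80 should be blank")
    return errs
```

The two error types that this pass cannot catch, duplicate and non-existent serial/sequence numbers, need the Master File and therefore surface only in the maintenance run, as the list above notes.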
Internal Reference List

This is a complete alphabetical list (Figure 8) of all entries on the Master File, similar to the Addition List (Figure 7) in arrangement and format. The serial number sequence makes it easy for the reference staff to assign unused serial numbers to new entries and to locate the serial numbers of entries for updating. This document is the prime source of information for maintaining the Master File.

Public List

The main list (Figure 9) is first divided by subject, of which there are fourteen: Mathematics, Astronomy, Biology, Chemistry, Earth Sciences, Physics, Design, Management Sciences, Aero Engineering, Chemical Engineering, Civil Engineering, Electrical Engineering, Mechanical Engineering and Nuclear Engineering. Each subject list is further divided into the following sections and sub-sections:

1. REFERENCE BOOKS
   a. GUIDES TO THE LITERATURE AND BIBLIOGRAPHIES
   b. PERIODICAL LISTINGS
   c. DICTIONARIES
   d. ENCYCLOPEDIAS
   e. HANDBOOKS AND TABLES
   f. DIRECTORIES - INDIVIDUALS

Fig. 8. Internal Reference List. [Sample page, February 5, 1968: entries in serial number sequence, each with call number, department code, reference-list code, title, holdings, notes and location.]
Fig. 9. Public List. [Sample page: the "ENCYCLOPAEDIAS" sub-section of the Chemistry list, showing call numbers and full bibliographical entries.]

   g. DIRECTORIES - ORGANIZATIONS
   h. INTERNATIONAL CONFERENCES
2. STANDARDS AND PATENTS
3. IMPORTANT SERIES
4. THESES
5. ABSTRACTS AND INDEXES
6. PERIODICALS

Reference Booklets

It is planned that semiannually the E.M.S. Library will receive from the Computer Centre the computer-produced Masters, which are exact duplicates of the Public List except that they are printed on unlined paper with a special printer ribbon. The Masters are then sent to the University's Printshop, and the fourteen separate reference booklets are printed from offset masters photo-reduced to 75% of the original. This results in a publication of convenient size (8½" x 5½") with clearer typographical representation than the actual computer printout. Figure 10 shows a representative page from the Aero Engineering list of the first edition.
Fig. 10. Page (Actual Size) in the Aero Engineering List. [Sample page: "ENCYCLOPEDIAS" and "HANDBOOKS AND TABLES" entries from the Aero Engineering list, photo-reduced from the computer printout.]

Table 1.
Information on First Edition

                          No. of   Est. printing   Copies ordered       Copies sold to
Subject                   pages    cost/copy       1st ptg.  2nd ptg.   students & faculty
Astronomy                    9        14¢             30        40               7
Biology                     16        18¢             90         -              37
Chemistry                   16        18¢            150         -              64
Earth Sciences              22        21¢             50         -              19
Physics                     15        18¢            100         -              44
Design                      14        17¢             50         -              12
Management Sciences         11        16¢             30       100              88
Mathematics                 15        18¢            150         -              81
Aero Engineering            20        20¢             30        40              11
Chemical Engineering        28        24¢            100         -              46
Civil Engineering           23        22¢            100         -              57
Electrical Engineering      26        23¢            100         -              65
Mechanical Engineering      27        24¢            100         -              44
Nuclear Engineering         16        18¢             30        40               5

DISCUSSION

First Edition

The number of copies of each list, shown in Table 1, was estimated on the basis of student enrollment figures in the different departments of the faculties of engineering, mathematics and science, and on the subject matter of each list in relation to the academic programmes of the University. It was hoped that those copies could adequately meet the demands of students, faculty and interested people outside the University until the completion of the second edition, then tentatively set for September 1968.

The first edition of the reference lists was available for distribution at the end of November 1967. Experience having shown that free library materials are no sooner received than discarded, it was decided to give the lists some value by charging 25¢ per copy. From the start students responded so enthusiastically to the lists that one week after their availability the Library had to order a second printing of 100 additional copies of the "Management Sciences" list, and by the end of February 1968, 40 additional copies each of the "Aero Engineering," "Nuclear Engineering" and "Astronomy" lists. Table 1 gives information on quantities printed, costs, and sales to students and faculty of the first-edition lists. The estimated printing cost per copy is based on a printing of 100 copies.
After the announcement of the availability of the lists in several library professional journals, the E.M.S. Library received many letters of inquiry and requests for complimentary copies. Complimentary distribution was made of 12 sets and some 80 lists of various subjects. Purchase orders were received for 83 complete sets of lists: 21 from Canada, 58 from the United States, and two each from Australia and England. By the end of March 1968 the stock of the first edition was exhausted, and 44 purchase orders were still unfilled and 28 only partially filled.

Questionnaire

Instead of ordering more copies of the first edition from the University Printshop to meet the requests received thus far, the reference staff decided to work on a second edition, and the originally scheduled completion date of that edition was moved ahead to early June 1968.

Although the E.M.S. Library had already received many valuable suggestions and comments on the project from Waterloo faculty and interested librarians in Canada and the United States, including some very enthusiastic library school professors, there was at that time little feedback from the immediate users, the students, on their use of the lists. Since the addresses and department affiliations of most of those who purchased lists had been recorded, it was possible to send out questionnaires (Figure 11) to 210 undergraduates, 122 graduate students and 30 faculty members at the beginning of April 1968. By April 20, 65 returns (31%) had been received from the undergraduates, 41 (33.6%) from the graduates and 11 (36.6%) from the faculty. A summary of those returns, shown in Table 2, has been of great help in assessing the value of the first edition. From the replies it is certain that almost all who purchased lists found them useful and would be willing to buy the updated edition.
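The return rates above follow directly from the mailing counts. A quick arithmetic check (note that the faculty figure, 11 of 30, rounds to 36.7%; the article's 36.6% is a truncation):

```python
# Questionnaire return rates, April 1968 (counts from the text).
sent = {"undergraduate": 210, "graduate": 122, "faculty": 30}
returned = {"undergraduate": 65, "graduate": 41, "faculty": 11}

rates = {group: round(100 * returned[group] / sent[group], 1)
         for group in sent}
print(rates)  # {'undergraduate': 31.0, 'graduate': 33.6, 'faculty': 36.7}
```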
Most important, students used the lists for research purposes (including term papers and thesis work), thus fulfilling the original purpose of the project. Another fact emerging from the questionnaire was that the number of serial titles included should be greatly expanded.

Second Edition

At the end of April the reference librarians started to update the fourteen subject reference lists, incorporating the valuable feedback and comments received, and to compile the fifteenth list, "Optometry" (the University of Waterloo has had a new optometry school since September 1967). Many changes, additions and deletions have been made, and the

UNIVERSITY OF WATERLOO - E.M.S. LIBRARY

According to our records, you have purchased one (or more) of the reference booklets. In order to plan for a second edition, and to assess the value of the first edition, we would be most grateful if you would fill out this questionnaire as completely as possible and mail it to us before April 20, 1968. It is not necessary to sign your name.

1. Have you used your reference list? Yes / No
   a. If so, did you find it helpful? Yes / No
   b. For what purpose did you use the list? Regular studies / Research (including term papers, thesis work)
   c. Did you use the list in place of the serials list and card catalogue? Yes / No
2. Did the list save you time in your use of the library? Yes / No
3. Should the list include more or fewer titles? More / Fewer
   a. Which areas do you feel should be expanded or deleted?
      GUIDES TO THE LITERATURE & BIBLIOGRAPHIES
      PERIODICAL LISTINGS
      DICTIONARIES
      ENCYCLOPEDIAS
      HANDBOOKS AND TABLES
      DIRECTORIES - INDIVIDUALS
      DIRECTORIES - ORGANIZATIONS
      INTERNATIONAL CONFERENCES
      STANDARDS AND PATENTS
      IMPORTANT SERIES
      THESES
      ABSTRACTS AND INDEXES
      PERIODICALS
   b. Which specific titles do you feel should be added?
   c. Which specific titles do you feel should be deleted?
4. Would you be interested in buying an updated edition of the reference list? Yes / No
5. Additional comments
6. Undergraduate / Graduate / Faculty

Thank you for answering this questionnaire. If you would like to discuss further anything pertaining to the reference lists, please feel free to call us.

Fig. 11. Questionnaire on Use of Reference Subject Lists.

Table 2. Summary of Questionnaire Returns

Question                     Undergr.   Grad.   Fac.
1.  Yes                         39        30      5
    No                          26        11      5
1a. Yes                         31        26      5
    No                           8         3      1
1b. Studies                     16         9      1
    Research                    27        23      5
1c. Yes                         19        12      2
    No                          25        20      4
2.  Yes                         27        21      6
    No                          11        11      1
3.  More                        45        32      5
    Fewer                        2
3a. Handbooks, expand           16        18      7
    Handbooks, delete            2         1      2
    Series, expand              19         8      3
    Series, delete               1
    Theses, expand              15        14      1
    Theses, delete
    Abstracts, expand           15        13      3
    Abstracts, delete                      1
    Periodicals, expand         28        23      6
    Periodicals, delete          2         1
4.  Yes                         23        24      6
    No                          17         8
6.                              65        41     14

serial titles have been greatly expanded as requested by users. A new sub-section heading has been created under the section "REFERENCE BOOKS" for reference materials of a very general nature; thus materials such as Encyclopaedia Canadiana, Canada Yearbook, etc.,
are pulled out of sub-sections such as "REFERENCE - ENCYCLOPAEDIAS" and "REFERENCE - HANDBOOKS & TABLES" into the sub-section "REFERENCE - GENERAL" at the very beginning of each subject list. It is estimated that the second edition will be available at the beginning of June. A comparison of the two editions is shown in Table 3.

COST

Although up to this time the Computer Centre has made no internal charge for its services to the Library, it is estimated that with the University's present computer configuration the monthly cost of maintaining this project is approximately $40.00. This cost covers about 4 minutes per month of computer time, about 2 hours per month of keypunching and verifying, and the cost of punch cards, multipart paper, etc., but does not cover the initial cost of systems analysis or the charges for printing the booklets. By comparison, it would cost approximately $95.00 per month to produce the copy by hand, and that method would not provide the flexibility and other advantages of a computerized system.

Table 3. First and Second Editions Compared

Edition                                    I           II
Completion Date                         Nov./67     June/68
No. of Records on Master File           c.5,500      7,446
Up-dating (no. of entries):
  Addition                                             280
  Change                                               216
  Deletion                                               7
No. of Subject Lists                        14          15
Number of Pages of Each Subject List:
  Aero Engineering                          20          26
  Chemical Engineering                      28          37
  Civil Engineering                         23          31
  Electrical Engineering                    26          34
  Mechanical Engineering                    27          35
  Nuclear Engineering                       16          21
  Design                                    14          17
  Management Sciences                       11          15
  Mathematics                               15          21
  Astronomy                                  9          14
  Biology                                   16          23
  Chemistry                                 16          27
  Earth Sciences                            22          27
  Physics                                   15          22
  Optometry                                  -          15

REFERENCES

1. Chen, Ching-chih: "Computer-produced Subject Reference Lists," IPLO Newsletter, 9 (Feb. 1968), 38-40.
2. McCune, Lois C.; Salmon, Stephen R.: "Bibliography of Library Automation," ALA Bulletin, 61 (June 1967), 674-94.
3.
Black, Donald V.; Farley, Earl A.: "Library Automation." In Annual Review of Information Science and Technology, edited by Carlos A. Cuadra (New York: Interscience/Wiley), 1 (1966), 273-303.
4. Schultz, Claire K.: "Automation of Reference Work," Library Trends, 12 (Jan. 1964), 413-424.
5. Brownson, Helen L.: "New Developments in Scientific Documentation," CLA Occasional Paper, no. 32, 1961.
6. Hammer, Donald P.: "Automated Operations in a University Library; a Summary," College & Research Libraries, 26 (Jan. 1965), 19-29, 44.
7. Prodrick, R. G.: "Automation Can Transform Reference Services," Ontario Library Review, 51 (Sept. 1967), 145-50.
8. Cox, N. S. M.; Dews, J. D.; Dolby, J. L.: The Computer and the Library; the Role of the Computer in the Organization and Handling of Information in Libraries (Hamden, Conn.: Archon Books, 1967), 78-84.
9. Brown, J. E.; Walters, Peter: "Mechanized Listing of Serials at the National Research Council Library," Canadian Library, 19 (May 1963), 420-26.
10. Wilkinson, John P.: "A.A.U. Mechanized Union List of Serials," APLA Bulletin, 29 (May 1965), 54-59.
11. Nicholson, Natalie N.; Thurston, William: "Serials and Journals in the M.I.T. Library," American Documentation, 9 (1958), 304-7.
12. International Business Machines Corporation: "IBM System/360 Operating System - Report Programme Generator Specifications," IBM System Reference Library, File no. S360-20, Form C24-3337 (IBM Programming Publications Dept. 452, San Jose, Calif. 95114, 1965 + revisions).

PRODUCTION OF LIBRARY CATALOG CARDS AND BULLETIN USING AN IBM 1620 COMPUTER AND AN IBM 870 DOCUMENT WRITING SYSTEM

Donald P. MURRILL: Philip Morris, Incorporated, Richmond, Virginia

A program is presented which runs on an IBM 1620 Computer and produces punched cards that activate an IBM 870 Document Writing System to type catalog cards in upper- and lower-case characters.
Another program produces punched cards which instruct the 870 to type a library accessions bulletin. The programs are written in FORTRAN II and are described in detail. Estimates of costs and production times are included.

Producing library catalog cards and accessions bulletins with the aid of a computer is not a new idea. Since 1963 several published papers have described projects that have resulted in the production of such cards and bulletins, either as the principal end products or as two products under a total systems concept.

Kilgour (1,2,3), while with the Yale Medical Library, described a project which had been developed jointly with the Harvard and Columbia Medical Libraries under a grant from the National Science Foundation. Six programs were written to produce catalog cards by means of an IBM 1401 computer and either a 1403 line printer or an IBM 870 Document Writer. These programs were of interest to the Philip Morris, Incorporated, Research Center Library because of their upper- and lower-case printing capability.

During the period 1961-1963 the Technical Library of the Bureau of Ships, Department of the Navy, used a 1401 computer to automate the preparation of 3x5 catalog cards and the library accessions bulletin. In
In 1964 Buckland ( 9) prepared a report for the Council on Library Resources, Inc., in which he described the preparation of catalog data on a tape-punching typewriter. The perforated tape was processed by com- puter for phototypesetting, tape typewriter, or line printer output. At the 1967 meeting of the American Documentation Institute (now American Society for Information Science) Cariou ( 10) discussed the preparation of catalog cards by means of a computer and an IBM Docu- ment Writing System. She programmed her computer to count the num- ber of spaces between sentences and to use this count to determine the type of information it could expect next and, thus, what kind of processing it should give that information. The files set up by this program were used with another program to punch cards for the Document Writer. The computers used with the programs discussed in the foregoing were of the IBM 1400 series or equivalent. The Philip Morris Research Library has access to such computers but only on a limited basis. It has ready access to a 1620 computer. For this reason its programs were written for use with the 1620. The punched card output is processed by the 870 Document Writing System to produce printout in upper- and lower-case characters. The Philip Morris library contains 5000 books, subscribes to 600 journals, and serves approximately 300 people. It is growing at the rate of approxi- mately 1000 volumes per year. THE IBM 1620 COMPUTER The 1620 computer for which the programs described in this paper were written has 20,000 positions of core, and typewriter and punched card input. It has typewriter and punched card output and differs from the computers used with the ptograms mentioned earlier in not having magnetic tape or a line printer. The lack of these devices does not detract from the final output, although that output is not produced as fast with the 870 Document Writer as it would be with a printer. 
Also, data punched into cards cannot be stored on tape but must be saved, if de- sired, in the cards. There are many 1620 computers extant, however, and many libraries which might use this machine for the automation of their library functions. Programs which permit the use of the 1620 in the pro- 200 Journal of Library Automation Vol. 1/ 3 September, 1968 duction of catalog cards and accessions bulletins might be of help to these libraries. The programs are written in FORTRAN II, specifically for compilation with a PDQ FORTRAN processor deck, as developed by Maskiell ( 11 ) of McGraw-Edison Company. When the program for the Library Catalog Cards is compiled, columns 7-13 of card 25058 in the Fixed Format Sub- routine deck must be changed to 4903158. This eliminates the punching of sequence numbers in the computer output cards which are to be run through the 870 system, thus permitting use of the last eight columns for tum up control characters ( & ) and for the forms feed character (- ). THE IBM 870 DOCUMENT WRITING SYSTEM The 870 Document Writing System is a combination of a Control Unit, which is a card punch machine with a control panel, and a typewriter. A complete system could include an auxiliary keyboard, a paper-tape punch, an auxiliary card punch, and a second typewriter; but for an out- put of library catalog cards and bulletin only the Control Unit and one typewriter are needed. The punched cards whose contents are to be typed are placed in the hopper of the Control Unit and are passed in a continuous feed under the read head. The control panel interprets the punched characters in the cards and produces the desired alphameric symbols and punctuation marks on the typewriter. For the production of library catalog cards, continuous 3x5 card forms are put into the typewriter and the punched cards which are output from the computer, as explained in the next section, are passed through the Control Unit. 
Carriage turnup is controlled by a continuous chain of small beads in which four large beads are equally spaced three-and-a-half inches apart. One rotation of the chain corresponds to the turnup of four 3x5 cards. Before typing begins, one of the large beads is positioned to coin- cide with the top of a card. A special character in the last card of a unit of punched cards obtained from the computer activates the carriage con- trol, causing tumup to the next large bead and the top of the next card. Eleven special character exits on the control panel correspond to the special characters on the 836 card-punch keyboard. Any one of the exits can be wired to do a certain job when the special character is encountered in a punched card. It seems logical to have a punched period produce a typed period, a punched comma a typed comma, and a punched slash a typed slash. These three characters, along with the numbers, are lower case on the 866 typewriter. Other special characters on the typewriter, such as parentheses, brackets, colon and semicolon, are obtained by punch- ing the upper-case shift character in the card immediately before the appropriate lower-case character. The left parenthesis, for example, is produced from an upper-case 9, the right parenthesis from an upper-case 0. Reference to the typewriter keyboard gives ready knowledge of how Production of Library Catalog Gauls/ MURRILL 201 to obtain any desired character (See Table 1). Other special character exits on the control panel are wired to produce lower-case shift, single and multiple upper-case shift, typewriter carriage return, tab control, underlining (obtained from an asterisk punched immediately before the character to be underlined), and forms feed (See Table 2). Table 1. Special Character Production Card punch input @. Typewriter output @, @/ #1 #2 #3 #4 #5 #6 #7 #8 #9 #0 #. #, #I • ( @ = lower case shift, # = upper case shift) Table 2. Control Panel Wiring Typewriter 1 ON _ _ ..,.. 
Column zero single Card read ON ..,.. Column zero single Carriage return __ ..,.. Card drop out Special character exits - Common channel: . __ ..,.. Type-only entry . , __ ..,.. Type-only entry , / __ ..,.. Type-only entry / o __ ..,.. Type-only entry o & __ ..,.. Carriage return @ _ _ ..,.. Lower case shift # __ ..,.. Single upper case shift $ __ ..,.. Multiple upper case shift % __ ..,.. Typewriter tab control - __ ..,.. Forms feed start 1 ' I " l + [ ] ? & ( ) (underline) (Punched "&" prints as "+", punched "%" as "(", and punched ":If' as "-'' ) - . 202 Journal of Library Automation Vol. 1/ 3 September, 1968 To obtain panel control of the typewriter the star wheels of the Con- trol Unit must be engaged and a program card must be on the drum. For the library programs a blank program card is used. The same panel that is used to control the printout of the catalog cards is used for the library bulletin. The same manually punched cards are used as input data to both programs, with modifications in the case of the bulletin, as will be noted later. LIBRARY CATALOG CARDS The data cards containing the bibliographic information on the books being cataloged must be punched with care, but no worksheet need be filled out by the librarian. This saves him a great deal of time and trouble. He must designate the call number, tracings, and other details for the keypunch operator, but he can do this on a plain piece of paper, which also contains a transcription of the title page or of an LC proof slip. He does not need to remember or look up any codes, nor does he need to be concerned with where each letter will go in the punched cards. The keypuncher must know certain details, of course; e.g., that the author's name always starts in column nine, and that two blank spaces are inserted after a period, one after a comma. She must know the spe- cial characters which have been wired in the control panel to produce upper- and lower-case printout on the 866 typewriter. 
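The case-shift convention can be made concrete with a short simulation (Python is used here purely for illustration; in the 870 the translation is done by the wired control panel, not by a program). The shift characters follow the text: @ for lower case, # for a single upper-case character, $ for multiple upper case.

```python
def interpret(punched: str) -> str:
    """Simulate the 870 control panel turning an all-caps punched card
    into mixed-case typewriter output.
    @ = lower-case shift, # = single upper-case shift (next character
    only), $ = multiple upper-case shift (until another shift char)."""
    out = []
    mode = "lower"      # current case mode
    single = False      # next character only is upper case
    for ch in punched:
        if ch == "@":
            mode, single = "lower", False
        elif ch == "#":
            single = True
        elif ch == "$":
            mode, single = "upper", False
        elif single:
            out.append(ch.upper())
            single = False
        elif mode == "upper":
            out.append(ch.upper())
        else:
            out.append(ch.lower())
    return "".join(out)

print(interpret("#FRITZSCHE #BROTHERS #INC."))  # Fritzsche Brothers Inc.
```

Punctuation punches (@., #9, and so on) are handled by the wired exits described above and are not modeled here.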
She must remember that the character for multiple upper-case printing will produce capital letters until a different control character is encountered, and she must punch the appropriate control characters where needed. These are details which are quickly learned with use, but because of them only one keypunch operator should be selected to handle the library data and to be responsible for production of the catalog cards.

The 3x5 catalog card will hold seventeen typed lines. It was arbitrarily decided that the tracings, i.e., the subject entries and added entries, would always start on line thirteen. If there are more than twelve lines of bibliographic information before the tracings, the program described here will not punch the thirteenth card, and the computer processing will stop. If there are more than seventeen lines total, the program will not provide for automatic turn-up to the necessary continuation card. Catalog cards which fall into these two categories must be typed manually, but only a small percentage of cards need to be prepared in this way.

Computerizing the production of catalog cards enables one set of keypunched cards to produce several sets of computer-punched cards and, thus, several catalog cards for each book. The necessity for typing the cards manually is eliminated. In the program presented here each punched card represents one printout line on the catalog card. This makes it possible to count the number of lines before the tracings, by counting each punched card as it is read by the computer, and to skip as many lines as is necessary to always start the tracings on line thirteen.
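The line-budget rules just described reduce to a small calculation, sketched below in Python (the function name and the None convention are mine, not the program's; the minimum of seven body cards comes from the program's computed GO TO, discussed later).

```python
def tracing_skip(n_body: int, n_tracings: int):
    """Value of MAX = 13 - N for a card with n_body body lines: how far
    the last body line is from line 13, and hence which punch statement
    supplies the extra carriage returns.  None means the card falls
    outside the program's limits and must be typed manually."""
    if n_body > 12:          # tracings could no longer start on line 13
        return None
    if n_tracings > 5:       # lines 13-17 leave room for five tracing lines
        return None
    if n_body < 7:           # the computed GO TO has only six branches;
        return None          # blank cards must pad the body to seven first
    return 13 - n_body

# nine body lines: MAX = 4, the branch whose FORMAT punches 4H&&&&
assert tracing_skip(9, 2) == 4
assert tracing_skip(13, 2) is None   # typed manually
```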
To facilitate explanation in the following discussion certain terms and definitions have been assumed: a "unit" is all of the manually punched cards required for all of the catalog cards for a given book, including main entry cards and tracing cards, the latter being those with headings taken from the subject and added entries; a "set" is all of the manually punched cards required for the main entry card, not including the headings punched separately; the "body" cards are those required for the "body" of the catalog card, down to but not including the tracings. Figure 1 shows how the keypunched cards look.

Fig. 1. Keypunched Cards. [The figure shows one unit of punched cards for the sample book, labeled with the class code ($TX), subject code (@415), and Cutter number (F29); the body cards (#FRITZSCHE #BROTHERS #INC., the title card, imprint, and collation); the tracings card; and the heading cards repeating the subject and added entries.]

Certain controls must be punched into the data cards. A "1" is punched into a field called MON to indicate the last body card; a "1" is punched right-justified into a two-character field called KON to designate the last tracings card, before a repetition of the tracings to serve as headings; and "-1" is punched into the field KON to indicate the last card for a given book.

One rule must be followed: if there are no non-spacing characters in a card, e.g., upper- and lower-case shift characters, punching should not extend beyond column 57. For each non-spacing character punching can be extended one column, but must not extend beyond column 65. In the READ statement of the computer program discussed here sixty-five columns of alphameric data are read and stored in an array. The contents of MON and KON are saved until the next card is read.
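A sketch of the deck bookkeeping these sentinel fields allow (Python for illustration only; the tuples and sample text are invented, and the fixed column positions of MON and KON are ignored here):

```python
def split_units(cards):
    """Split a deck into 'units' (all the cards for one book) using the
    control fields described above: each card is (text, mon, kon), and
    kon == -1 marks the last card of a unit."""
    units, unit = [], []
    for card in cards:
        unit.append(card)
        if card[2] == -1:          # KON = -1: last card for this book
            units.append(unit)
            unit = []
    return units

# invented sample deck: body, tracings, then a heading card ending the unit
deck = [
    ("#SMITH, #JOHN.", 0, 0),
    ("#A SAMPLE TITLE.", 1, 0),            # MON = 1: last body card
    ("1. #SAMPLES. #I. #TITLE.", 0, 1),    # KON = 1: last tracings card
    ("$SAMPLES", 0, -1),                   # KON = -1: last card of unit
]
assert len(split_units(deck)) == 1
```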
10 READ 11, M1(I),M2(I),M3(I),          (Read statement for all lines
  1M4(I),M5(I),M6(I),M7(I),             on main entry cards and all
  2M8(I),M9(I),M10(I),M11(I),           lines except headings on trac-
  3M12(I),M13(I),MON,KON                ing cards.)
11 FORMAT(13A5,11X,2I2)

As long as MON and KON are zero, each card is punched as it is read, and the program returns to the READ statement:

15 PUNCH 8, M1(I),M2(I),M3(I),          (Punch statement for all lines
  1M4(I),M5(I),M6(I),M7(I),             except last of body of main
  2M8(I),M9(I),M10(I),M11(I),           entry cards.)
  3M12(I),M13(I)
 8 FORMAT(13A5)
14 GO TO (32,33),NBC                    (NBC is set to "1" at begin-
32 I=I+1                                ning of program.)
   GO TO 10

When the last body card is read and MON=1, the program branches to a statement which stores N-1, N being the number of cards read to that point. The program then calculates the difference between thirteen and N and branches to the appropriate statement for punching the last body card and the special characters needed to produce the number of blank lines which will begin the tracings on line thirteen in the printout:

23 MAX=13-N
   GO TO (15,19,20,21,22,34), MAX
21 PUNCH 30, M1(I),M2(I),               (Sample punch statement for
  1M3(I),M4(I),M5(I),M6(I),             last body line; includes special
  2M7(I),M8(I),M9(I),M10(I),            characters to produce skipped
  3M11(I),M12(I),M13(I)                 lines.)
30 FORMAT(13A5,11X,4H&&&&)

The computed GO TO contains only six statement numbers to which the program can branch because of the limited memory of the 1620 computer. This means that there must be at least seven cards before the tracings and, if necessary, blank cards must be added to reach this count. The subject entries and added entries, i.e., the tracings, are read and punched, card by card, after a branch back to the READ statement (10). With the reading of the last tracings card, KON=1.
The program branches to a statement which punches the last tracings data and a special character (-) which, during printout, will cause the typewriter to turn up to a new 3x5 card. This completes the preparation of punched cards for the first main entry card. Most libraries need more than one such card. Additional sets of punched cards are prepared by means of a DO loop and a return to an earlier part of the program:

36 DO 6 K=1,NB                          (NB has been previously set to
 6 PUNCH 8, M1(K),M2(K),                one less than the number of
  1M3(K),M4(K),M5(K),M6(K),             body cards.)
  2M7(K),M8(K),M9(K),M10(K),            (Punch statement for all except
  3M11(K),M12(K),M13(K)                 last body card for all main en-
                                        try cards except the first.)

MAX is again defined in statement 23 and the last body card is again punched. Statement 14 sends the program to statements which punch the tracings for the second and subsequent cards:

33 NIX=NB+2                             (NOT = one less than total
   IF(NOT-NIX)1,9,9                     number of cards in set.)
 9 DO 47 JO=NIX,NOT
47 PUNCH 8, M1(JO),M2(JO),              (Punch statement for all but
  1M3(JO),M4(JO),M5(JO),                last line of tracings.)
  2M6(JO),M7(JO),M8(JO),
  3M9(JO),M10(JO),M11(JO),
  4M12(JO),M13(JO)
 1 I=NOT+1
   PUNCH 26, M1(I),M2(I),               (Punch statement for last line
  1M3(I),M4(I),M5(I),M6(I),             of tracings; includes special
  2M7(I),M8(I),M9(I),M10(I),            character to produce type-
  3M11(I),M12(I),M13(I)                 writer turn-up.)
26 FORMAT(13A5,14X,1H-)

A count is kept, in a fixed-point variable, NCS, of the number of card sets which have been prepared. This variable is now used to determine the next step in the program:

   GO TO (36,36,53,53,53,53,53,53,etc.),NCS

The number of card sets punched for main entry cards is one more than the number of times "36" is inserted in the foregoing computed GO TO statement. The headings for the tracing cards, as shown in Figure 1, are placed after the card set, thus completing the card unit.
A "1" in the MON field controls the processing of one heading at a time. If a heading requires more than one card, the control is punched into the last card. Preparation of the remaining punched cards required for the tracing cards is provided for in the following statements:

59 M=1
 2 READ 11, N1,N2,N3,N4,N5,N6,          (Read statement for headings.)
  1N7,N8,N9,N10,N11,N12,N13,
  2MON,KON
   PUNCH 8, N1,N2,N3,N4,N5,N6,          (Punch statement for head-
  1N7,N8,N9,N10,N11,N12,N13             ings.)
   M=M+1
   IF(MON)2,2,51
51 IF(M-4)12,52,52
52 I=3
   PUNCH 13, M2(I),M3(I),               (Punch statement for main en-
  1M4(I),M5(I),M6(I),M7(I),             try on tracing card, drawn from
  2M8(I),M9(I),M10(I),M11(I),           computer memory; class code is
  3M12(I),M13(I)                        omitted.)

A header card is punched not only with the heading but also with the alphabetic class code of the book being cataloged. If there is a second card to the heading, it is punched with the subject code number. A third card, if there is one, contains the Cutter number. In the program statements cited above, provision is made for retrieving from the computer memory any part or parts of the call number that are not read in with the heading. Care is taken to assure that no part of the number that is not needed is retrieved. As headings are generally in capital letters, and the multiple upper-case shift character is used to produce them, it is necessary to precede the subject code number of a second card to a heading with the lower-case shift character. Otherwise, instead of the subject code number's being typed, the upper-case characters for the digits of this number will be typed.

Cards for the remainder of the body of the catalog card, except the last line, are punched by use of a DO loop:

   M=4
12 DO 7 J=M,NB                          (Punch statement for remain-
                                        der of tracing card through
                                        next-to-last body line.)
 7 PUNCH 8, M1(J),M2(J),M3(J),
  1M4(J),M5(J),M6(J),M7(J),
  2M8(J),M9(J),M10(J),M11(J),
  3M12(J),M13(J)

Before the last body card can be punched, the value of N must be set. Before this can be done, the value of I, subscript for the PUNCH statement, must be tested, which is accomplished in a series of IF statements. I is set to NB+1; then, if there are fewer than four cards in a heading:

54 IF(I-8)18,42,43
43 IF(I-10)44,45,46
46 IF(I-12)48,49,50
18 N=7
42 N=8
44 N=9
45 N=10
48 N=11
49 N=12

After each statement setting N to a specific value, a GO TO statement sends the program back to statement 23. If the number of cards in a heading is four or more, the value of N is set by the following statements, MAT having been equated earlier to one more than the number of cards in the heading:

41 IF(MAT-4)23,27,37                    (It is assumed that no head-
37 IF(MAT-6)38,39,40                    ing will be longer than seven
27 N=N+1                                lines.)
38 N=N+2
39 N=N+3
40 N=N+4

Again, after each statement setting N to a specific value, a GO TO statement sends the program back to statement 23.

The last card of the unit of manually punched cards contains -1 in the KON field. When this control number is read and tested, the program branches to a DO loop which erases the bibliographic data stored in the computer's memory, then branches to the beginning of the program to statements which set N=0, I=1, NBC=1, and NCS=0. The computer is ready to read the first card of the next unit. Figure 2 shows a printout of a main entry card.

Exact cost figures for the production of catalog cards are not available, but a good estimate would be approximately eighteen cents per card. This cost is high, perhaps, but when the cards issue from the 870 system they are complete, including call numbers and tracings, items that are missing from the LC cards and that must be typed onto those cards.
The saving is in time; the cards produced by the program described here can be ready for filing in the library within a week, while delivery of LC cards is, on the average, six months after the order date. The overall cost is reduced somewhat by the fact that the same cards that are punched for the catalog cards are used for the accessions bulletin. They can also be used for bibliographic listings under selected headings. A listing of the complete catalog card program may be obtained from the author.

TX
415
F29
    Fritzsche Brothers Inc.
        Guide to flavoring ingredients as classified
    under the Federal Food, Drug and Cosmetic Act.
    New York, 1966.
        84p.

        1. Flavoring essences.  I. Title.

Fig. 2. Printout of Main Entry Card.

LIBRARY BULLETIN

The data for the Library Bulletin program consist of the cards which were punched manually for the catalog card program. The information concerning each book which is to be included in the bulletin is the choice of the individual librarian. At the Philip Morris, Incorporated, Research Center Library none of the data after the publication date is included. Care must be taken that there are at least five cards remaining for each book after unwanted cards have been discarded. The reason for this will become apparent later.

The headings, consisting of the first subject entries in the tracings of the books to be listed, need to be punched. The books are grouped in the bulletin under these headings. Each of the headings is to be in upper-case letters, so the first column of each card must be punched with a dollar sign, the special character wired in the 870 control panel to produce multiple upper-case printout.

As with the catalog card program, certain controls are required for the bulletin program. A "1" must be punched in the field called MON in the last card of each book set. A "1" must be punched in a field called JON in each header card.
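The grouping of titles under headings can be sketched as follows (Python for illustration; the tuple form of the entries and the sample data are invented simplifications of the real punched-card input):

```python
def bulletin_listing(entries):
    """Group titles under their subject headings and number them
    sequentially, as the bulletin does.  Each entry is a
    (heading, author, main_entry) tuple."""
    lines, number, current = [], 1, None
    # sorting by (heading, author) mimics the arrangement of the deck:
    # data cards in author order behind alphabetized header cards
    for heading, author, entry in sorted(entries):
        if heading != current:
            lines.append(heading.upper())   # headings print in upper case
            current = heading
        lines.append(f"{number}. {entry}")
        number += 1
    return lines

sample = [
    ("Flavoring essences", "Fritzsche",
     "Fritzsche Brothers Inc.  Guide to flavoring ingredients..."),
    ("Flavoring essences", "Adams",
     "Adams, J.  An invented second title."),
]
```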
A code punched into the cards to facilitate keeping them in the proper order is not necessary for the computer program, but it is certainly desirable. Therefore, a sequence code consisting of eleven digits is in each card: the first five digits designate the subject heading, the next four the author, and the last two the card sequence for each book. Judicious selection of the codes makes it possible to put the cards into proper or near-proper order with a card sorter, an especially useful feature for preparing a listing of all the titles under a given heading acquired over a period of several months.

In the bulletin the titles are numbered sequentially beginning with "1", the numbering being controlled with a fixed-point variable, NO, which is set to one at the beginning of the computer program.

The data cards are arranged in alphabetic order by author and are placed behind the appropriate header cards, which have also been arranged in alphabetic order. The first card to be read and punched is a header card:

   READ 4, KA,KB,KC,KD,KE,KF,           (Read statement for first head-
  1KG,KH,KI,KJ,KK,KL,KM,JON             er card.)
 4 FORMAT(13A5,11X,I1)
15 PUNCH 3, KA,KB,KC,KD,KE,             (Punch statement for header
  1KF,KG,KH,KI,KJ,KK,KL,KM              cards; includes special charac-
 3 FORMAT(3H&&&,13A5)                   ters to produce skipped lines.)

The first three data cards are read in a DO loop. The three parts of the call number are stored in an array; then the main entry, in the third card, is punched, along with the sequence number and the first part of the call number, the alphabetic class code.

   DO 6 I=1,3
 6 READ 1, M(I),N(I),LB,LC,LD,          (Read statement for call num-
  1LE,LF,LG,LH,LI,LJ,LK,LL,             ber and main entry.)
  2LM,MON
 1 FORMAT(A5,A4,11A5,A1,12X,I1)
   PUNCH 12, NO,LB,LC,LD,LE,
  1LF,LG,LH,LI,LJ,LK,LL,LM,
  2M(1),N(1)
12 FORMAT(1H&,1H@,I4,1H.,2X,12A5,1H%,A5,A4)
                                        (Punch statement for sequence
                                        number, main entry, and class
                                        code; includes characters for
                                        skipped line, lower case, and
                                        typewriter tab control.)

The next two cards are read and new cards are punched in another DO loop. The remainder of the call number, the subject code number and the Cutter number, are punched into these two cards. If there are fewer than five cards to be processed in each book set, part of the call number will be lost. Blank cards are added, if necessary, to bring the count to five.

16 DO 7 I=2,3
   READ 17, KA,KB,KC,KD,KE,             (Read statement for fourth
  1KF,KG,KH,KI,KJ,KK,KL,MON             and fifth cards of set.)
17 FORMAT(8X,11A5,A2,12X,I1)
   PUNCH 5, KA,KB,KC,KD,KE,             (Punch statement for second
  1KF,KG,KH,KI,KJ,KK,KL,M(I),           and third lines of bibliographic
  2N(I)                                 data and remainder of call
 5 FORMAT(4X,1H%,12A5,5X,               number; includes typewriter
  11H%,A5,A4)                           tab controls.)
 7 CONTINUE

The MON field in the fifth card is tested in an IF statement. If this field is zero, indicating that there are more cards in the set, the additional cards are read and new ones are punched in another pair of READ and PUNCH statements. If the MON field is "1", indicating that the processing of the card set is complete, one is added to the sequence number variable, NO, and the next card is read. If this is a header card, as indicated by "1" in the field called JON, the program branches back to statement 15. If, on the other hand, it is the first card of another title under the same heading, the class code is stored in an array, and the second and third cards are read in a DO loop. The main entry, the content of the third card, is then punched, along with the class code and the sequence number.

 9 NO=NO+1
   READ 4, KA,KB,KC,KD,KE,KF,           (Read statement, for heading
  1KG,KH,KI,KJ,KK,KL,KM,MON             if JON=1, for main entry and
                                        class code of new card set if
                                        JON=0.)

 7. IBM Manual No. E20-8094: Mechanized Library Procedures, 14.
 8.
IBM Manual No. E20-0093: Library Catalog Production — 1401 and 870.
 9. Buckland, Lawrence F.: The Recording of Library of Congress Bibliographical Data in Machine Form. A report for the Council on Library Resources, Inc. (Washington: Council on Library Resources, Inc., November 1964; rev. February 1965).
10. Cariou, Mavis: "A Computerized Method of Preparing Catalogue Cards, Using a Simplified Form of Data Input," Proceedings, American Documentation Institute Annual Meeting, 4 (October 1967), 186-90.
11. Maskiell, Frank H.: "PDQ FORTRAN (An Interpretive Program for the FORTRAN Language)" (November 1963).

CONVERSION OF BIBLIOGRAPHIC INFORMATION TO MACHINE READABLE FORM USING ON-LINE COMPUTER TERMINALS

Frederick M. BALFOUR: Information Systems Engineer, Technical Information Dissemination Bureau, State University of New York, Buffalo, New York

A description of the first six months of a project to convert to machine readable form the entire shelf list of the Libraries of the State University of New York at Buffalo. IBM DATATEXT, the on-line computer service which was used for the conversion, provided an upper- and lower-case typewriter which transmitted data to disk storage of a digital computer. Output was a magnetic tape containing bibliographic information tagged in a modified MARC I format. Typists performed all tagging at the console. All information except diacriticals and non-Roman alphabets was converted. Direct costs for the first six months were $.55 per title.

Several recent articles have reported on methods and related costs to convert library bibliographic information to machine readable form. Chapin (1) compared keypunching, paper tape, and optical character recognition. Keypunching was also described by Hammer (2) and Black (3). Buckland (4) described paper tape conversion, and Johns Hopkins University (5) reported on optical character recognition.
On-line computer terminals have been proposed (6), but have hitherto not been tried in a large library. Without attempting to discuss the various techniques, this paper presents a detailed report of converting with on-line computer terminals. It is hoped that the experiences reported here and in the cited articles will provide suitable information to a library administration considering large-scale conversion.

Journal of Library Automation Vol. 1/4, December 1968

BACKGROUND

In 1965 a systematic program of automation was begun in the Libraries of the State University of New York at Buffalo. The general goals of the program were to improve services to patrons and streamline internal operations. There are three general areas usually considered for automation in a library: acquisitions and accounting, the card catalog, and circulation control. An analysis of the system indicated that conversion of the card catalog to machine readable form would provide the greatest improvement in library services and operations.

The reasons for the decision were as follows. First, the University Libraries are growing rapidly; in one year the shelf list will increase by 60,000 to 100,000 titles, or about 15 to 25 per cent. Second, SUNY Buffalo is currently planning a new campus which will be completed in five to ten years. In the interim, the University will be spread over three major campus locations, with many smaller offices and departments located throughout the city, and the Libraries must provide some form of bibliographic index for each location. The conversion of the shelf list to machine readable form will allow this distribution of the bibliographic information at a very low cost per title. Finally, the project will provide experience in using magnetic tape for the handling of bibliographic information, so that when the Library of Congress' MARC Project begins to produce magnetic tapes, SUNY Buffalo will be able to utilize them immediately.
SELECTING THE CONVERSION HARDWARE

In 1966, a proposal for converting the shelf list to machine readable form (7) was presented to the Library administration. It pointed out the many improvements in patron services, the advantages to the Library staff, both professional and clerical, and the monetary savings to be realized by such a conversion. It discussed the four methods of file conversion then feasible: punched cards, optical scanners, punched paper tape, and magnetic tape-keyed data converters (as exemplified by the Mohawk Data Sciences equipment) (8). The proposal recommended using the magnetic tape-keyed data converters because of their input speed, ease of entry, and elimination of handling cards or paper tape.

During the first quarter of 1967, a fifth method of conversion was considered, an IBM product called DATATEXT (9). It required the rental of an IBM 2741 communications terminal (essentially a typewriter), a Western Electric 103a Data-Set, and a voice-grade telephone line to the nearest IBM installation, which was Cleveland, Ohio. A customer may buy time in six-hour blocks called DATATEXT Agreements. An Agreement covered a time segment from 7:00 a.m. to 1:00 p.m., or from 1:30 p.m. to 7:30 p.m., five days a week.

Conversion of Bibliographic Information / BALFOUR

DATATEXT provided everything that the magnetic tape converters did, with some important additions. First, it had an upper- and lower-case alphabet using a shift character (the Library administration had seen only the Mohawk upper-case converter). Second, the typewriter gave a typed copy which was easy to proofread. Third, corrections were much easier because of the text-editing capabilities of the on-line computer.

Text-editing can best be illustrated by describing a typical DATATEXT job. A typist working from source material produces a typewritten page; at the same time, the IBM 2741 she is using transmits the data being typed to the computer in an area called "working storage".
When typing is completed, the clerk gives the appropriate command and the information is stored in an area called "permanent storage", a computer manipulation which can be compared to taking a page from the typewriter and placing it in a folder in a file cabinet. When the typist wishes to make changes to the information, she can give a command to recall it from permanent storage to working storage. She can then manipulate it in several ways. During original entry, the computer automatically assigned numbers to each line. Using these line numbers, a typist can move information within the text, can add or delete information, and can correct errors. Commands are very simple and concise; for example, it takes four keystrokes to move a new line into the text. In making a correction, the typist merely types the incorrect word and the correct word; the computer then types the complete line to show that the correction has been properly executed. (This instant replay, or on-line interaction, is a benefit unique to the on-line terminal.) After any change, the computer automatically renumbers lines and reformats the entire text. A sample of typed input is illustrated and discussed later in the article.

In April 1967, it was decided to test the DATATEXT service because of its powerful correction capability, and because it could be installed and working within three weeks. In May the console was delivered, the telephone equipment installed, and a long-distance line to Cleveland rented. A one-month test of DATATEXT proving successful, three more consoles, data sets and telephone lines were added, and the Conversion Project was fully underway.

TRAINING THE TYPISTS

The majority of the typing and proofreading staff were drawn from existing personnel in the cataloging department. Individuals chosen had a background in either catalog card typing or file maintenance, and consequently a good working knowledge of the information on a catalog card.
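The text-editing cycle described above — working storage, automatic line numbers, word-for-word correction — can be mimicked with a toy model (Python; DATATEXT itself was a proprietary IBM service, so the class and method names here are invented):

```python
class WorkingStorage:
    """Toy model of DATATEXT working-storage editing: numbered lines,
    word-for-word correction, and renumbering after an insertion."""
    def __init__(self):
        self.lines = []                 # line number = list index + 1

    def type_line(self, text):
        self.lines.append(text)

    def insert_line(self, number, text):
        # after an insertion the following lines are implicitly
        # renumbered, as DATATEXT renumbered the whole text
        self.lines.insert(number - 1, text)

    def correct(self, number, wrong, right):
        # the typist supplies the line number, the incorrect word, and
        # the correct word; the full corrected line is echoed back
        self.lines[number - 1] = self.lines[number - 1].replace(wrong, right, 1)
        return self.lines[number - 1]

ws = WorkingStorage()
ws.type_line("10t An invented sample entyr.")
print(ws.correct(1, "entyr", "entry"))
# 10t An invented sample entry.
```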
It was anticipated that with a minimum of further training the typists could identify and tag information as they were typing it at the console. This assumption was critical to the success of the project, since the Library could not afford the professional time necessary for complete pre-tagging of bibliographic information.

Typists involved in the one-month test were given several hour-long training sessions on tagging before the console arrived. When the project got underway, a list of all possible tags was posted near the console, and a librarian was nearby to answer questions. After three weeks of operation, it was obvious that the typists could tag at the console, thus making this part of the test run a success.

The tagging system used was developed from the MARC I pilot project (10). Most of the original tags were retained and several additional ones designed to meet specific local needs. Tape files created were formatted according to MARC I specifications, although fixed fields were left blank. The tagging system is outlined in a reference manual prepared for typists and proofreaders (11).

Operation of an on-line console requires special training. IBM sent a DATATEXT Instructor to Buffalo on several occasions to provide typist training. For the major training session, which occurred in June, the IBM representative came for a full week. Ten typists were trained; five specialized in entering information, and five specialized in retrieving, correcting, and transmitting information. By the end of the week both groups were skilled in their respective specialities, and many typists were able to perform well in both areas. Later, typists were trained in several sessions by one of the Library's typing staff.
During the first three months, the author was near the terminals at all times to answer questions on terminal operation, to collect data for measuring and controlling performance, and to act as supervisor. A librarian was on call for questions on complex library problems, and the Programmer-Analyst was available to help solve problems regarding input format and tagging. At the end of this period, appropriate clerical staff had been trained to supervise minute-to-minute operation.

CONVERSION PROCEDURES

The general method of conversion (Figure 1) was as follows. A typist typed into "working storage" for an hour, inputting 15 to 30 shelf list cards. She instructed the computer to store this "document" in a permanent storage location on disc. She then placed the typed copy and cards in a proofreading bin, cleared working storage, and started another document.

A proofreader compared typed copy with original cards and indicated any errors. The corrected document then went to a correction typist who "retrieved" the document from permanent storage to working storage, performed the corrections, and transmitted the corrected document to magnetic tape.

The original uncorrected document was left in permanent storage overnight and deleted the following day. Documents were transmitted to tape for about two weeks and the accumulation returned to the library via the mails. (IBM saved all permanent storage records for one week as a security measure. If a library typist inadvertently deleted a document, it could be retrieved by the computer operator.)

Fig. 1. Shelf List Conversion Information Flow. [Flow diagram: in Buffalo, 3x5 shelf list cards are typed at the consoles, the hard copy is proofread, and the cards are refiled; in Cleveland, the data reside in computer disc storage and are transmitted to magnetic tape, which is mailed to the Library.]

Figure 2 shows a sample of typed input and subsequent correction.
Line numbers, as they are stored on the disc, are included on the right margin for ease of explanation. Lines typed in capitals are computer responses to commands, the first entry being the command to clear working storage. The computer responds and then indicates that the console is in one of two general input modes. All cards are typed in "automatic" mode, for which the typist gives the appropriate command. When the computer responds the typist asks for the next line number, which is 3, and begins to input the card. In line 4, the typist makes an error and realizes it before throwing the carriage. She hits the "attention" key, producing the underscore, rolls the platen down, back spaces, and retypes the correct word. The computer then corrects the error. In line 6 the typist misspells "Cambridge", but does not realize it before throwing the carriage. The correction is shown at the bottom, although the input typist could not have performed it herself; it would have gone through proofreading and back to the correction typist. The correction is made by typing the line number, in this case "6", the incorrect word, "Dambridge", tab, and the correct word. The computer responds by typing out the complete line showing the correction.

c
CLEARED   UNCONTROLLED MODE
a
AUTOMATIC MODE
n
NEXT NUMBER -- 3
90t BS2575.3.A7
10t Bible. N.T. Matthew. English. 1963. New English.
20t The Gospel according to Matthew. Commentary by A.W. Argyle.
30a Cambridge 30b University Press 30c 1963
40t 227 p. maps. 20 cm.
50t The Dambridge Bible Commentary: New English Bible
70t Bible. N.T. Matthew -- Commentaries.
71t Argyle, Aubrey William, 1910-
73t Title.
60z
92t 226.207
94t 63-23728
n
NEXT NUMBER -- 10
6 Dambridge   Cambridge
50t The Cambridge Bible commentary: New English Bible

Fig. 2. Sample Input and Correction of One Shelf List Card.
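A minimal sketch of reading such tagged lines back into a record (Python for illustration; the tag meanings noted in the comment are inferred from the sample, and repeatable tags are not handled):

```python
def parse_tagged(lines):
    """Split DATATEXT-style input lines of the form '<tag> <value>'
    into a tag -> value mapping.  In the sample, 90t is the call
    number, 10t the main entry, 30a/30b/30c the imprint, and 94t the
    LC card number (meanings inferred from the figure)."""
    record = {}
    for line in lines:
        tag, _, value = line.partition(" ")
        record[tag] = value.strip()
    return record

card = parse_tagged([
    "90t BS2575.3.A7",
    "10t Bible. N.T. Matthew. English. 1963. New English.",
    "94t 63-23728",
])
```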
Except for a brief period, the shelf list was converted in alphabetic order, and by December 1 shelf list drawers through the E's were completed. Early in the project, some of the literature classification, P and PQ, was converted. Foreign languages in the PQ's gave no particular problems, and typing rates did not drop.

All cards were converted in shelf list order except for those having non-western alphabets. When possible, these were transliterated and entered. Otherwise their input was delayed. Since the 2741 console has no diacritical marks, these were left out; however each card having them was entered and given a special tag to permit retrieval at a later date when diacritical marks could be added by special coding such as used by MARC.

Conversion consoles and shelf list were in the same building. Each day, several inches of cards were removed from the drawer being processed and a marker inserted indicating where the cards had gone. In general operation, cards were returned and refiled in less than a day, so that inconvenience to staff was minimal. As a card was proofread, it was marked on the back with a "C" and the upper right hand corner received a very small notch with a McBee punch. Thus, newly cataloged cards filed with cards already converted are recognizable by the unnotched corner.

COSTS

Table 1 gives a statistical summary of the conversion project from July 31 through December 1, 1967. The term "L.C. card" refers to a complete bibliographic entry for a title and may include more than one physical card, or may include writing on the back of a card. Input and correction functions are reported separately and then totaled to give a realistic input rate per hour for corrected cards. Supervisor cost reflects wages of clerical supervisors only. Those of the Programmer-Analyst, the Librarian and the Systems Analyst assigned to the project are not included. A breakdown of monthly equipment costs per console is given in Table 2.
Installation costs were $150 for each terminal, and $50 for each leased telephone line. When the project operated four consoles, the monthly equipment cost was $4,472.

Table 1. Conversion Project Statistics (July 31 - Dec. 1, 1967)

  Input, Proofreading and Correction
    Total L.C. Cards                                    49,348
    Input Typist Hours                                   3,035
    Typist Hours Correcting                                492
    Total Typist Hours                                   3,527
    Proofreading Hours                                   1,235
    Number of Errors per L.C. Card                         .42
    L.C. Card Input Rate per Hour                         16.3
    L.C. Card Correction Rate per Hour                     100
    Overall Conversion Rate (Input & Correction),
      Cards per Hour                                        14
    Proofreading Rate, Cards per Hour                       40

  Costs
    Labor Cost @ $1.75 per Hour                     $ 8,078.00
    Equipment and Supervisors                        18,995.00
    Total Cost                                      $27,073.00
    Cost per Card Converted                              $0.55

  Utilization of Console Time
    Hours Typed                        3,381          81.4%
    Hours Consoles Down                  245           5.9%
    Hours Computer Down                   91           2.2%
    Hours Lost Time                      438          10.5%
    Total                              4,155         100.0%

Table 2. Monthly Operational Costs per Terminal

  IBM 2741 Communications Terminal                  $   85.00
  Western Electric 103A Data Set                        27.50
  24-hour voice-grade lease line to Cleveland,
    plus local telephone costs                         385.50
  2 DATATEXT Agreements @ $310                         620.00
  TOTAL                                             $1,118.00

"Hours Typed" is time that consoles were actually being used to input or correct cards. This is slightly less than "Typist Hours Worked" because some correction had been delayed, but it was included in hours worked to give a true representation of input rates. "Hours Consoles Down" reflects time lost due to console breakdown. During the early part of the period, two consoles were failing often. However, as operating problems were solved, console down-time dropped far below the average 5.9 per cent shown. "Hours Computer Down" was also greater during early weeks of the project. However, for each hour down, IBM credited the Library with $12.00 ($3.00 per terminal for four terminals).
"Hours Lost Time" reflects periods when a working console could not be manned because of personnel breaks or operator absence. All times are given in console-hours, four consoles operating for one hour being recorded as four hours. The error rate of .42 errors per card is very low. Allowing 350 char- acters per shelf list card, typists were making one error for every 830 keystrokes. This translates to about 3 errors per typewritten page of 50 characters per line, 50 lines per page. The Office of Secretarial Studies of SUNY at Buffalo indicates that this rate is well within the tolerance for "normal" typing, as in a typing pool. When it is considered that typists were tagging and inputting complicated bibliographic informa- tion, rate of accuracy was commendably high. Typists used in the project included the lowest salary grade of civil- service typists, part-time hourly workers, and students. An acceptable input rate for civil service typists was 18 cards per hour, which is equiva- lent to 21 5-character words per minute. The faster typists, at 26 cards per hour, were typing at 30 words per minute. Again, let it be mentioned that the material was complex and that typists were required to tag each piece of information. CONCLUSIONS Several points can be made about converting with DATATEXT. It was easy to implement and received excellent support from IBM. The IBM Information Marketing staff in Cleveland provided constant assistance during the early part of the installation and visited often once the project was successfully underway. IBM sent the DATATEXT instructor as often as needed and provided free computer time during teaching sessions. The four long-distance telephone lines and Data Sets proved reliable. There was only one instance during the period when a line was inoper- able and it was repaired in three hours. The liaison and support from New York Bell Telephone was very good. DATATEXT costs would have been lower had the IBM installation been nearer. 
Cleveland is 173 miles from Buffalo, giving a 24-hour lease-line cost of $342 per month. (DATATEXT service will soon include a uniform long-distance-lines cost.)

Verification or correction on DATATEXT does not require human retyping of each line of entry. Only the word in error and its replacement need be typed; the console then types the corrected line to show that the error was deleted and the replacement inserted. Consequently correction costs are low and corrections accurate.

Average rates and costs given in Table 1 reflect learning during the first six months of the project. Towards the end of the reported period, rates were improving and costs decreasing. Since December 1967, the project has added three more consoles and uses a DATATEXT service provided by a campus computer. Costs have dropped below $.45 per card, a figure which will increase somewhat when diacriticals are added. Potentially, cost per title for complete conversion is under $.50.
BIBLIOGRAPHIC RETRIEVAL FROM BIBLIOGRAPHIC INPUT; THE HYPOTHESIS AND CONSTRUCTION OF A TEST

Frederick H. RUECKING, Jr.: Head, Data Processing Division, The Fondren Library, Rice University, Houston, Texas

A study of problems associated with bibliographic retrieval using unverified input data supplied by requesters. A code derived from compression of title and author information to four, four-character abbreviations each was used for retrieval tests on an IBM 1401 computer. Retrieval accuracy was 98.67%.

Current acquisitions systems which utilize computer processing have been oriented toward handling the order request only after it has been manually verified. Systems, such as that of Texas A & I University (1), have proven useful in reducing certain clerical routines and in handling fund accounting (2).
Lack of a larger bibliographic data base and lack of adequate computer time have prevented many libraries from studying more sophisticated acquisitions systems.

At the time the MARC Pilot Project (3) was started, the Fondren Library at Rice University did not have operating computer applications in acquisitions, serials, or cataloging. The University administration and the Research Computation Center provided sufficient access to the IBM 7040 to permit the study of problems associated with bibliographic retrieval using input data of varying accuracy.

In 1966, Richmond expressed the concern of many librarians about the lack of specific statements describing the techniques by which on-line retrieval could be accomplished without complicating the problems presented by the current card catalog (4). She had previously described some of the problems created by the kind and quality of data being utilized as references by library users (5).

An examination of the pertinent literature indicates that most of the current work in retrieval, while related to problems of bibliographic retrieval, does not offer much assistance when the input data is suspect (6,7,8). Tainiter and Toyoda, for example, have described different techniques of addressing storage using known input data (9,10). One of the best-known retrieval systems is that of the Chemical Abstracts Service, which provides a fairly sophisticated title-scan of journal articles with a surprising degree of flexibility in the logic and term structure used as input. Comparable systems are used by the Defense Documentation Center, MEDLARS centers, and NASA technology centers. These systems have one specific feature in common: a high level of accuracy in the input data.

USER-SUPPLIED BIBLIOGRAPHIC DATA

The reliability of bibliographic data supplied to university libraries by faculty and students has long been questioned (5).
Any search system which accepts such data must be designed 1) to increase the level of confidence through machine-generated search structures and variable thresholds and 2) to reduce the dependence upon spelling accuracy, punctuation, spacing and word order.

The initial task in formulating an approach to this problem is to determine the type, quality, and quantity of data generally supplied by a user. To derive a controlled set of data for this purpose, the Acquisition Department of the Fondren Library provided Xerox copies of all English language requests dated 1965 or later, and a random sample of 295 requests was drawn from that file of 5000 items.

This random sample was compared to the manually-verified, original order-requests to determine 1) the frequency with which data was supplied by the requestor and 2) the accuracy of the provided information. Results of this study are given in Table 1.

Table 1. Level of Confidence in the Input Data

  Data        Times    Times               Level of
  Element     Given    Correct   Accuracy  Confidence
  Edition      295      294        99.6       99.6
  Title        295      292        99.0       99.0
  Author       290      264        91.0       82.7
  Publisher    268      218        81.3       73.9
  Date         265      215        81.1       72.8

The results suggest that edition can have great significance when specified and should be used as strong supporting evidence for retrieval. It should not necessarily be a restrictive element because of the low-order magnitude of actual specification, which was five times in the sample. (Unstated editions were considered as first editions, and correct.)

Title is the most significant and most reliable element. As Richmond indicates, use of the entire title for searching would present distinct problems for retrieval systems (4). Consequently, an abbreviated version of the title must be derived from the input data which will reduce the impact and significance of the problems described by Richmond (5).
THE HYPOTHESIS

It is hypothesized that retrieval of correct bibliographic entries can be obtained from unverified, user-supplied input data through the use of a code derived from the compression of author and title information supplied by the user. It is assumed that a similar code is provided for all entries of the data base, using the same compression rules for main and added entry, title and added title information. It is further hypothesized that use of weighting factors for individual segments of the code will provide accurate retrieval in those cases when exact matching does not occur. Before the retrieval methodology can be described, it is necessary to outline the compression technique to be used with author and title words.

TITLE COMPRESSION

To gain some understanding of the problems to be faced in compressing title information, a random sample of 500 titles was drawn from the first half of the initial MARC I reel (about 4800 titles). Each of these titles was analyzed for significant words, and tabulations were made on word strings and word frequencies. The following words were considered as non-significant: a, an, and, by, if, in, of, on, the, to. The tabulated data, shown in Table 2, contain some surprising attributes. Approximately 90% of the titles contain less than five significant words, which suggests that four significant words will be adequate to match on title.

Table 2. Significant Word Strings in Titles

  Length of Word String      1      2      3      4     5+    Total
  Number of Titles          42    151    179     76     52      500
  Percentage               8.4   30.2   35.8   15.2   10.4    100.0
  Cumulative Percentage    8.4   38.6   74.4   89.6  100.0

Letting n stand for the corpus of words available for title use, the random chance of duplicating any specific word in another title can be stated as 1/n. When a string of words is considered, the chance of randomly selecting the same word string may be considered as 1/n^a, where 'a' is the number of words in the string.
Certain words are used more frequently than others, and the occurrence of such words in a given string reduces the uniqueness of that string. The curve displayed in Figure 1 shows the frequency distribution of words in the sample. The mean frequency of words in the title-sample is 1.33.

[Fig. 1. Frequency Distribution of Words in Sample.]

Therefore, the chance of selecting an identical word string can be more accurately expressed as:

    1.33^a / n^a

An examination of word lengths, as shown in Table 3, shows that 95% of the significant title words contain less than ten characters. An examination of the word list revealed that some 70% of the title words contain inflections and/or suffixes. If these suffixes and inflections are removed, approximately 43% of the remaining word stems contain less than five characters and 59% contain less than six.

Table 3. Distribution of Character Length and Stem Length

  Length in    Total    Different
  Characters   Words      Words     Percent    Stems    Percent
       1          7          5         0.5        5        0.8
       2         25         14         1.3       14        2.3
       3         87         48         4.6       48        7.9
       4        172        117        11.1      196       32.3
       5        229        163        15.5       92       15.2
       6        198        153        14.5       94       15.5
       7        202        159        15.3       64       10.6
       8        158        122        11.6       45        7.4
       9        121        102         9.7       15        2.5
      10         84         69         6.6        8        1.3
      11         54         48         4.6        7        1.2
      12         38         28         2.7        2        0.3
      13         14         12         1.1        2        0.3
      14          6          4         0.4        0        0.0
      15          3          3         0.3        0        0.0
      16          2          2         0.2        0        0.0
  Summary     1,400      1,049                  592

The reduction of word length does affect the uniqueness of the individual word, merging distinct words into common word stems at a mean rate of 2.5 to 1.0. In Table 3 the difference between 1,049 words and 592 stems reflects the reduction of similar words into a common stem; for example: America, American, Americans, Americanism, etc., into Amer.
Thus, the uniqueness of a string of title words is reduced to the following chance of duplication:

    (2.5 x 1.33)^a / n^a,  or  3.3^a / n^a

An analysis of consonant strings made by Dolby and Resnikoff provides frequencies of initial and terminal consonant strings occurring in 7000 common English words (with suffixes and inflections removed) (11,12,13). These frequency lists clearly show that the terminal string of consonants has considerable information-carrying potential in terms of word identification. The starting string also carries information potential, but significantly less than the terminal string. By combining the initial and terminal strings, it is possible to generate an abbreviation which has adequate uniqueness and reduces the influence of spelling.

The high percentage of four-character word stems and the fact that the maximum terminal string contains four consonants suggest the use of a four-character abbreviation. To compress a title word into four characters, it is necessary to specify a set of rules. The first rule is to delete all suffixes and inflections which terminate a title word. The second rule is to delete vowels from the stem until a consonant is located or the four-character stem is produced. The suffixes and inflections deleted in this procedure are contained in Table 4. When the stem contains more than four characters, the third compression rule states that the four-character field is filled with the terminal-consonant string and remaining positions are filled from the initial-character string.

Table 4.
Deleted Suffixes and Inflections

  -ic, -ive, -in, -et, -ed, -ative, -ain, -est, -aged, -ize, -on, -ant,
  -oid, -ing, -ion, -ent, -ance, -og, -ation, -ient, -ence, -log, -ship,
  -ment, -ide, -olog, -er, -ist, -age, -ish, -or, -y, -able, -al, -s,
  -ency, -ible, -ial, -es, -ogy, -ite, -ful, -ies, -ology, -ine, -ism,
  -ives, -ly, -ure, -um, -ess, -ry, -ise, -ium, -us, -ary, -ose, -an,
  -ous, -ory, -ate, -ian, -ious, -ity

The relative uniqueness of the generated abbreviation can be calculated using the data supplied by Dolby and Resnikoff. For example, Carter and Bonk's Building Library Collections would be abbreviated BULD, LIBR, COCT. The random chance of duplicating any abbreviation can be stated as the product of the random chance of duplicating the initial string and the random chance of duplicating the terminal string:

    (f_i / n_i) x (f_t / n_t) x 3.3^2

The frequencies listed by Dolby and Resnikoff may be substituted in the above equation, producing the following chances for duplication:

    (324/6800) x (63/6800) x 10.89 = 1/208     for BULD
    (288/6800) x (1/6800)  x 10.89 = 1/14,745  for LIBR
    (277/6800) x (16/6800) x 10.89 = 1/1,041   for COCT

The random chance of duplicating this string of three abbreviations can be calculated by multiplying the individual calculations, which yields the random chance of 1 in 32 x 10^8. This high uniqueness declines rapidly when the title contains fewer than three significant words and contains high-frequency words; for the title Collected Works, the same uniqueness calculation produces the random chance of 1 in 44 x 10^4.

To increase the level of uniqueness on short titles, like Collected Works, it becomes necessary to provide supporting data to the title information. It is clear that the supporting data must come from supplied author text.

AUTHOR COMPRESSION

The same compression algorithms can be used for both personal and corporate names, with some modifications.
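The three compression rules can be sketched in code. This is an illustrative reconstruction, not Ruecking's program (which ran on an IBM 1401): the suffix list is a representative subset of Table 4, and the exact order of suffix stripping is an assumption, but the sketch reproduces the worked example above.

```python
# Representative subset of the Table 4 suffixes; longest matched first.
SUFFIXES = sorted(
    ["ic", "ive", "ed", "ative", "ize", "ing", "ion", "ation", "ship",
     "er", "or", "able", "ible", "ful", "ism", "ly", "ry", "ary", "ate",
     "ity", "s", "es", "ies", "ent", "ant", "ment", "ist", "age", "ish",
     "al", "ence", "ance", "ure", "ous", "ory", "ian", "ious", "y"],
    key=len, reverse=True)
VOWELS = set("aeiou")

def stem(word):
    """Rule 1: strip terminating suffixes/inflections (repeatedly).
    Rule 2: drop trailing vowels while the stem exceeds four characters."""
    w = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suf in SUFFIXES:
            if w.endswith(suf) and len(w) - len(suf) >= 3:
                w = w[:-len(suf)]
                stripped = True
                break
    while len(w) > 4 and w[-1] in VOWELS:
        w = w[:-1]
    return w

def compress(word):
    """Rule 3: if the stem still exceeds four characters, keep the terminal
    consonant string and fill the remaining positions from the front."""
    w = stem(word)
    if len(w) <= 4:
        return w.upper()
    i = len(w)
    while i > 0 and w[i - 1] not in VOWELS:
        i -= 1
    tail = w[i:][-4:]                  # terminal consonant string (max 4)
    return (w[:4 - len(tail)] + tail).upper()

print([compress(w) for w in ("Building", "Library", "Collections")])
# -> ['BULD', 'LIBR', 'COCT']
```

"Building" loses "-ing", leaving the five-character stem "build", whose terminal consonants "ld" are padded from the front to give BULD; "Library" loses "-ary" and is already four characters (LIBR); "Collections" loses "-s" and "-ion" to give "collect", hence COCT.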
The frequent substitution of "conference" for "congress" and "symposia" for "symposium" suggests that meeting names should be considered as a secondary sub-set of non-significant words. Names of organizational divisions, such as bureau, department, ministry, and office, can be considered as part of the same sub-set. The rules which govern the deletion of inflections, suffixes and vowels can be used for corporate names, but personal author names must be carried into the compression routine without modification. Only the last name of an author would be compressed into a code.

CONSTRUCTING THE TEST

Four four-character abbreviations are allowed for title compression and four for author. Rather than use a 32-character fixed field for these codes, the lengths of the input and main-base codes are variable, with leading control digits to specify the individual code sizes for the title and author segments. Provision is made for the inclusion of date, publisher and/or edition in the search-code structure, although these were not implemented in the test performed.

At the time the input data is read, the existence of title, author, edition, publisher and date is indicated by the setting of indicators which control the matching mask and which, in part, control the specification of the retrieve threshold. The title indicator specifies the number of compressed words in the supplied title which must be matched by the base code. A simple algorithm is used to calculate the threshold values given in columns two through four of Table 5. Columns five through seven are obtained by adding two to the calculated thresholds. Each agreement within the mask adds to a retrieve counter the values indicated in the last five columns of Table 5, the values of X and Y being the number of matching code words in the title and author segments respectively.
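The weighting scheme can be sketched as follows. The function name and boolean flags are illustrative inventions, but the agreement values (4 per matching title code word X, 2 per matching author code word Y, and 3/2/1 for edition, publisher and date agreement) follow the last five columns of Table 5:

```python
def retrieve_score(title_matches, author_matches,
                   edition_agrees=False, publisher_agrees=False,
                   date_agrees=False):
    """Accumulate the retrieve counter: 4X for title code-word matches,
    2Y for author code-word matches, plus 3, 2 and 1 for edition,
    publisher and date agreement respectively."""
    score = 4 * title_matches + 2 * author_matches
    score += 3 * edition_agrees + 2 * publisher_agrees + 1 * date_agrees
    return score

# Three matching title code words and one matching author code word,
# with edition agreement, clears a full-code threshold of 12:
score = retrieve_score(3, 1, edition_agrees=True)
print(score, score >= 12)   # -> 17 True
```

An entry is retrieved when this counter reaches the threshold selected from Table 5 for the particular combination of data elements present in the request.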
CONDUCTING THE TEST

As mentioned above, the initial tests of the retrieve were based upon title and author matching exclusively and required three runs on the Fondren Library's 1401 computer. The first loaded 2874 original order-requests, generated a search code utilizing the rules specified in this paper, and created an input tape. The second run extracted title and author data from the MARC I data base and created multiple search codes for title, main entry, added title and added entry. Both tapes were sorted into ascending search-code sequence. The final run was the search program, which attempted to match input codes with the MARC I base codes. When there was agreement based on the relationship of threshold and retrieve counter, the printer displayed threshold, short author and short title on one line, and retrieve value, input author and title on the next line, as illustrated in Figure 2. The printed results were compared to validate the accuracy of the retrieve. This comparison was cross-checked against the results of the acquisition department's manual procedures. The search program also provided for an attempt to match titles on the basis of a rearrangement of title words. In such attempts the retrieve threshold was raised.

ANALYSIS OF RESULTS

The raw data obtained from this experimental run are shown in Table 6. Of the 2874 items represented in the input file, 48.4%, or 1392, were actually found to exist in the data base. Of those actually present, 90.4%, or 1200, were extracted, with an overall accuracy of 98.67%. An examination of the sixteen false drops revealed several omissions in the compression routines for the input data and for the data base. One of the more significant omissions was failing to compensate for multi-character abbreviations, particularly 'ST.' and 'STE.' for 'Saint.' A subroutine for acceptance of such abbreviations added to the search-code generating program would increase the retrieve accuracy to 99%.

Table 5.
Values for Variable Threshold

                 Threshold Values
  Data Given     Full-Code Test         Individual Code Test    Agreement Values
  T A E P D      3 or 4    2     1      3 or 4     2      1     T   A   E  P  D
  X Y 1 1 1        12    8+2Y  4+2Y       14    10+2Y   6+2Y    4X  2Y  3  2  1
  X Y 1 1 0        12    8+2Y  4+2Y       14    10+2Y   6+2Y    4X  2Y  3  2  1
  X Y 1 0 1        12    8+2Y  4+2Y       14    10+2Y   6+2Y    4X  2Y  3  2  1
  X Y 1 0 0        12    8+2Y  4+2Y       14    10+2Y   6+2Y    4X  2Y  3  2  1
  X Y 0 1 1        12    8+2Y  4+2Y       14    10+2Y   6+2Y    4X  2Y  3  2  1
  X Y 0 1 0        12    8+2Y  4+2Y       14    10+2Y   6+2Y    4X  2Y  3  2  1
  X Y 0 0 1        12    8+2Y  4+2Y       14    10+2Y   6+2Y    4X  2Y  3  2  1
  X Y 0 0 0        12    8+2Y  4+2Y       14    10+2Y   6+2Y    4X  2Y  3  2  1
  X 0 1 1 1        12     11     7        13     12       7     4X  2Y  3  2  1
  X 0 1 1 0        12     11     7        13     12       7     4X  2Y  3  2  1
  X 0 1 0 1        12     11     7        13     12       7     4X  2Y  3  2  1
  X 0 1 0 0        12     11     7        13     11       7     4X  2Y  3  2  1
  X 0 0 1 1        12     10     6        13     11       7     4X  2Y  3  2  1
  X 0 0 1 0        12     10     6        13   Not permitted    4X  2Y  3  2  1
  X 0 0 0 1        12      9     5        13   Not permitted    4X  2Y  3  2  1
  X 0 0 0 0        12   Not permitted          Not permitted

[Fig. 2. Sample of Retrieved Citations — for each hit, one line gives the threshold with the short author and short title codes of the matched base entry, and the following line gives the retrieve value with the input author and title.]
Table 6. Table of Results

  Retrieve     Total    Correct    False    Percentage
  Value        Hits     Hits       Hits     Correct
     6           14       14         0         100
     8            0        0         0           0
    10          311      311         0         100
    12          264      248        16        93.3
    14          232      232         0         100
    16          118      118         0         100
    18          260      260         0         100
    20            1        1         0         100
  Totals      1,200    1,184        16        98.7

Table 7. Distribution of Errors
                    Title Errors                 Author Errors
  No. of Codes   Error  Spelling  Lacking    Error  Spelling  Other    Total
       1           2       3        10        12       27       4        58
       2           2       6        17        26       60      23       134
       3           0       0         0         0        0       0         0
       4           0       0         0         0        0       0         0
  Total            4       9        27        38       87      27       192

The occurrence of titles with words such as "selected" or "collected" produced additional false drops when the title word string exceeded two words. A modification to the search program to raise the threshold when the input data contain codes such as 'SECT' or 'COCT' would increase the retrieve accuracy to 99.17%. The presence of personal names in titles, such as 'Charles Evans Hughes' and 'Franklin Delano Roosevelt', caused seven additional false drops. At present it seems unlikely that a simple method to prevent them can be included.

CONCLUSION

The experimental results indicate that the hypothesis suggested is valid. Use of multiple codes for added entry and added title, in addition to the main entry and main title data, is clearly necessary. Approximately 10% of the correctly retrieved items were produced by the existence of an added entry code.

The influence of spelling accuracy was lessened by use of a compression technique. An inspection of extracted titles revealed the existence of 43 spelling errors which did not affect retrieval. Thus, the search code reduced the significance of spelling by some 30%.

Utilizing table search followed by table look-up and linking random-access addresses should enable the search-code approach to bibliographic retrieval to provide rapid, direct access to the title sought.

ACKNOWLEDGMENT

This study was supported in part by National Science Foundation grants GN-758 and GU-1153 and by the Regional Information and Communication Exchange. The assistance of the Acquisitions Department staff, the Research Computation Center staff and the staff of the Fondren Library's Data Processing Division is gratefully acknowledged.

REFERENCES

1.
Morris, Ned C.: "Computer Based Acquisitions System at Texas A & I University," Journal of Library Automation, 1 (March 1968), 1-12.
2. Wedgeworth, Robert: "Brown University Library Fund Accounting System," Journal of Library Automation, 1 (March 1968), 51-65.
3. U.S. Library of Congress: Project MARC, an Experiment in Automating Library of Congress Catalog Data (Washington: 1967).
4. Richmond, Phyllis A.: "Note on Updating and Searching Computerized Catalogs," Library Resources and Technical Services, 10 (Spring 1966), 155-160.
5. Richmond, Phyllis A.: "Source Retrieval," Physics Today, 18 (April 1965), 46-48.
6. Atherton, P.; Yorich, J. C.: Three Experiments with Citation Indexing and Bibliographic Coupling of Physics Literature (New York: American Institute of Physics, 1962).
7. International Business Machines Corporation: Reference Manual, Index Organization for Information Retrieval (IBM, 1961).
8. International Business Machines Corporation: A Unique Computable Name Code for Alphabetic Account Numbering (White Plains, N.Y.: IBM, 1960).
9. Tainiter, M.: "Addressing Random-Access Storage with Multiple Bucket Capacities," Association for Computing Machinery Journal, 10 (July 1963), 307-315.
10. Toyoda, Junichi; Tazuka, Yoshikazu; Kasahara, Yoshiro: "Analysis of the Address Assignment Problems for Clustered Keys," Association for Computing Machinery Journal, 13 (October 1966), 526-532.
11. Dolby, James L.; Resnikoff, Howard L.: "On the Structure of Written English Words," Language, 40 (Apr-June 1964), 167-196.
12. Resnikoff, Howard L.; Dolby, James L.: "The Nature of Affixing in Written English, Part I," Mechanical Translation, 8 (March 1965), 84-89.
13. Resnikoff, Howard L.; Dolby, James L.: "The Nature of Affixing in Written English, Part II," Mechanical Translation, 9 (June 1966), 23-33.

AUTOMATIC RETRIEVAL OF BIOGRAPHICAL REFERENCE BOOKS

Cherie B.
WEIL: Institute for Computer Research, Committee on Information Science, University of Chicago, Chicago, Illinois

A description of one of the first projected attempts to automate a reference service, that of advising which biographical reference book to use. Two hundred and thirty-four biographical books were categorized as to the type of subjects included and the contents of the uniform entries they contain. A computer program which selects up to five books most likely to contain answers to biographical questions is described and its test results presented. An evaluation of the system and a discussion of ways to extend the scheme to other forms of reference work are given.

Ideally the reference librarian is the "middleman between the reader and the right book" (1), and this is what the program described here is intended to be. In the past there has been very little interest in automating this service, probably because it is neither urgent nor practical in current reference departments. Many developments in automating other areas of libraries have indirectly benefited reference librarians, and the literature primarily emphasizes this aspect. For instance, where circulation systems have been automated, the location of a particular volume can be quickly ascertained and librarians need not waste time searching. Automation of the ordering phase provides them with information on the processing stage of a new volume. If the contents of the catalog have been put in machine-readable form, special bibliographies can be rapidly produced in response to a particular request or as a regular service of selective dissemination. The development of KWIC (Key Word In Context) indexes, which are compiled and printed by computer, has enabled publishers to provide indexes to their books much faster. Computers have also been programmed to make concordances and citation indexes (2).
The combination of paper-tape typewriters, a computer and a photocomposer has introduced automation into the compiling of Index Medicus (3).

Changes in reference services themselves, however, may make automation of question-answering practical. One trend is toward larger reference collections to be shared by several libraries; some areas have already set up regional reference services. There are also cooperative reference plans whereby several strong libraries agree to specialize in certain fields and cooperate in answering questions referred by the others (4). These trends will mean two things to reference librarians: greater concentration of resources, allowing more specialized books and mechanization; and screening of questions at the local level, letting reference centers concentrate on more complex questions that utilize their specialized books. Thus it seems likely that special reference centers may look increasingly toward mechanizing their services, and retrieval schemes of the type presented here will be important to consider.

BASIC ASSUMPTIONS

The categorizing system was based on two nearly universal generalizations about biographical reference books: 1) They are consistently confined to biographies of persons who have something in common: for example, being alive or dead; or having the same nationality, sex, occupation, religion, race, or memberships; or possessing some combination of those attributes. These common characteristics in the people covered by a given book are herein called "exclusive categories." 2) The books generally maintain uniform entries for each subject; that is, they give the same data for each biography. These facts are referred to herein as "specifics" or "specific categories."

Certain assumptions were made about reference work: 1) All biographical reference books fit into the scheme and can be categorized. 2) The more limited a book's scope, the more likely it is to contain the person a user wants to find.
In other words, if a user is interested in a Dutch economist, he is more likely to find information in a book limited to Dutch economists than in a general biographical dictionary. The user, however, does not want to miss any source that might be useful. Therefore a general biographical dictionary should be given to him as a last resort, after books on Dutch economists, Dutchmen of all occupations, and economists of all nationalities. 3) Certain requirements, the specifics, have no substitutes. For example, a book lists addresses or it does not, and if a user wants an address, books without addresses are useless.

There is merit in suggesting to a user which book to use, as opposed to giving him the direct answer to his question. Probably the best argument for this assumption is that the volume of names that would have to be compiled and stored for a direct inquiry system is staggering, only a small number would ever be looked up, and it is impossible to predict which ones would be searched for.

There are advantages to mechanizing this particular task of a reference librarian: good reference librarians should be freed to perform work less easily mechanized; there are not enough reference librarians who have perfect recall of their collections even to the point of knowing which exclusive categories all the books fit into; and no librarian could have complete recall as to the specifics contained in each biographical reference book in the collection.

THE COMPUTER PROGRAM

The program was written in the COMIT language, a non-numerical programming language developed for research in mechanical translation, information retrieval and artificial intelligence. It is a high-level, problem-oriented language for symbol manipulation, especially designed for dealing with strings of characters. The program could probably be converted to other list-processing languages (6) for operation at other installations.
The program was run at the University of Chicago Computation Center on an IBM 7094 having the COMIT system on a disk. Questions were submitted and run in large batches.

THE DATA

All biographical reference books in English, with alphabetical ordering of subjects, which are in the reference room of the University of Chicago's Harper Library were included in the data, and no other books were included. Since one assumption was that all biographical reference books could be categorized by the scheme, it seemed more useful to prove the system could handle any biographical reference tool than to compile a balanced list of biographical books. There was no difficulty in categorizing the books.

All books are categorized in the following way. First an arbitrary abbreviation for the book is chosen to be its entry in the file; it is referred to as a "constituent." Each book is then described by determining the values of nine subscripts each constituent carries, the subscripts being SEX, LIVING, NAT (nationality), OCCUP (occupation), MIN (minorities), DATE, INDEX, SPEC1 and SPEC2 (specifics).

Values of the first five subscripts, the exclusive categories, are determined first. That is, is the book limited to one sex? Are all the subjects living or dead? Do they all have a certain occupation? Does the book include only certain nationalities? Or is there another restriction, e.g., to alumni of a college, members of the nobility or a religious group? The exclusive categories for a book are determined and coded from a table of abbreviations. SEX, for example, allows three values: restricted to males (M), restricted to females (F), or no restriction (Z). Also a value X must occur with M or F, indicating there is a restriction. Therefore SEX can have the following combinations: SEX Z, SEX F X, or SEX M X; the values M X and F X are both the opposite of Z.
Next the book's date is determined by asking "At what date did the values on LIVING (yes or no) apply?" Or, if the subjects are not restricted to living or dead (LIVING Z), "When was the book up to date?"

Next any indexes to the biographies are noted. All the biographical books list subjects in alphabetical order by surname. Lists of subjects in any other order are considered indexes, even if the subjects are actually listed in some other order in the main body and the list that is alphabetic by surname is an index.

Finally, specific categories (SPEC1 and SPEC2) are coded for such facts as birthdate, birthplace, college attended, degrees held, hobbies, illustrations, social clubs, and marital status. When all categorizing is finished, a data item is punched in this form:

DICTPHILBIO/ INDEX FIELD X, LIVING N X, OCCUP Z, SEX Z, NAT PHILIP ASIAN X, SPEC1 DC DS FL BP L CL CM DG E I Z, DATE 50S X, SPEC2 P PL R MS PD Z, MIN Z +

This represents the Dictionary of Philippine Biography, a book limited to dead Filipinos and giving for each entry: dates, career, descendants, field, birthplace, long articles, class in college, degrees, education, picture, parents, publications, references, marital status and physical description. The book has a special index for finding subjects by their field of work.

One specific value, that for a long article, requires special mention. Though most biographical reference books provide the same facts about all the subjects in list form, a few provide different facts about different subjects in a narrative form. Such books carry the SPEC1 value L, and the other specifics these books are listed as providing are not always given for every subject. For example, a book with a list format may provide the birthplace for every subject when it can be ascertained, but in a book using the narrative form, where often different authors write the articles, birthplace is not necessarily given.
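The punched data item above is easy to mirror in a modern language. The sketch below (our illustration, not the article's COMIT representation) transcribes the same constituent as a dictionary mapping each of the nine subscripts to its set of values; the field names follow the article, while the Python representation itself is an assumption:

```python
# The Dictionary of Philippine Biography as a categorized "constituent".
# Multi-valued subscripts become sets; X marks a restriction, Z marks none.
DICTPHILBIO = {
    "INDEX":  {"FIELD", "X"},            # special index by field of work
    "LIVING": {"N", "X"},                # restricted to dead subjects
    "OCCUP":  {"Z"},                     # no occupation restriction
    "SEX":    {"Z"},                     # no sex restriction (allowed values: Z, M X, F X)
    "NAT":    {"PHILIP", "ASIAN", "X"},  # restricted to Filipinos
    "SPEC1":  {"DC", "DS", "FL", "BP", "L", "CL", "CM", "DG", "E", "I", "Z"},
    "DATE":   {"50S", "X"},
    "SPEC2":  {"P", "PL", "R", "MS", "PD", "Z"},
    "MIN":    {"Z"},
}
```

Set membership makes the later matching rules ("same value as the question, or the value Z") one-line tests.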
Books in narrative form are used less for quick reference; therefore the program provides a note, when a long article is requested, that the card catalog may provide more long articles on the subject.

Ease of file maintenance is one advantage of this system. As data are analyzed, if a new value for a category is required, such as an occupation which is not in the list, the new value is simply added under OCCUP for that particular book and in the list of abbreviations for future use. It is a little more complicated to make an existing value more specific. For example, to differentiate BOTANIST, CHEM, PHYSICS and ASTRON and still maintain SCIENTIST as a general category embracing them all, another short program is required to retrieve the data to be reclassified.

CODING THE QUESTION

A biographical question can be quickly coded. The nine required subscripts are the same as those for the data books, but only one value for each subscript is necessary. For example, "What are the publications of a living Dutch economist? A current book is desired." is coded as:

Q/ SEX Z (or M), LIVING Y, NAT DUTCH, OCCUP ECON, MIN Z, INDEX Z, DATE 60S, SPEC1 Z, SPEC2 PL +

OPERATION OF THE PROGRAM

Briefly, the program reads in the data and then the first question. It weeds out data items that can never be suitable, discarding all but those items that have the same values as the question on the subscripts INDEX, SPEC1 and SPEC2. It then weeds out data items that do not have either the same values as the question, or the value Z, on the subscripts OCCUP, NAT, MIN, SEX and LIVING. After each weeding the program checks to determine that there are data items left; if all the books have been weeded out, there are no answers. There is also a provision to allow the user to designate certain titles to be ignored on a particular question, in case he has already checked them, for example.
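The two weeding passes can be restated compactly. The following Python is a sketch of the logic only, not the original COMIT program; the function and variable names are our own, and data items are assumed to carry sets of values as in the earlier data-item illustration:

```python
EXACT = ("INDEX", "SPEC1", "SPEC2")                    # must match the question
EXACT_OR_Z = ("OCCUP", "NAT", "MIN", "SEX", "LIVING")  # may match, or carry Z

def weed(data, question):
    """Discard data items that can never answer the question."""
    # First pass: keep only items sharing the question's values on
    # INDEX, SPEC1 and SPEC2 (the specifics have no substitutes).
    survivors = [book for book in data
                 if all(question[s] in book[s] for s in EXACT)]
    # Second pass: keep items matching, or unrestricted (Z), on the rest.
    survivors = [book for book in survivors
                 if all(question[s] in book[s] or "Z" in book[s]
                        for s in EXACT_OR_Z)]
    return survivors  # an empty list means there are no answers
```

Everything that survives both passes is a potential answer; the ordering searches described next only rearrange this list.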
All data items left after weeding are potential answers and could simply be printed out. However, subsequent searches over the remaining items serve the purpose of rearranging them into an order in which they are more likely to produce answers. It was decided that five answers are enough to judge the types of titles chosen, yet few enough to avoid very long searches. A shorter list of answers would obviously be cheaper, and a longer list more likely to produce a book containing the desired subject.

Ordering proceeds as follows: first, the values of subscripts SEX, LIVING, MIN, OCCUP, NAT and DATE on the question as originally stated are matched to those of books in the data. The computer is at this stage searching for books that are limited in just those categories in which the question is limited. For example, the question

Q/ SEX Z, LIVING Y, MIN Z, NAT DUTCH, OCCUP ECON, INDEX Z, DATE 60S, SPEC1 Z, SPEC2 PL +

will match only those books published in the 1960's and restricted to living Dutch economists which give publications for all the subjects (or the majority), and the books cannot be restricted to a sex or to any "minority" group. The books found may or may not have additional values on the subscripts; that is, a book may also contain French economists. Such books found on the first search are most likely to contain the subject the questioner is looking for.

If there are fewer than five books found which are a perfect match with the question, the program begins to alter the question. To make the least significant possible change in the question, the program changes the value of the subscript judged to be the limiting factor on the fewest books in the data, namely sex. If SEX has a Z as its value (because the questioner did not know the sex or did not prefer a book limited to one sex), it is changed to X so that a book limited to one sex will not be overlooked.
If SEX does not have a Z value (which means it has either M X or F X), it is changed to Z. This means the questioner preferred books limited to one sex, but presumably his second choice is books not limited to any sex. Clearly if the question has SEX F X it can never be changed to SEX M X or SEX X, since SEX X will find books in the data classified SEX M X. Anything other than Z changes to Z, and Z only changes to X.

After this change is made, another search is conducted and the answers counted. Until there are five books or the data is exhausted, the original question is altered and the cycle continued. Alterations proceed by changing the values of one subscript at a time in the following order: SEX, LIVING, MIN, NAT and OCCUP. Then they are changed two at a time, three at a time, four at a time, and finally all five are changed, so there are thirty-one possible changes. If at the end of the thirty-second search there are still not five answers and there are more data items, the date restriction on the question is checked. If DATE has a value other than Z, it is changed to Z, which matches all the data items, and the computer prints a note if this is done; the program will then select any book regardless of date. Control returns to search and begins the cycle again, continuing until five answers are found or the data is exhausted.

After searching is finished, the writing routine commences. One at a time the computer takes each answer, writes out its code for possible further reference, and then writes out the complete author, title, copyright date and Library of Congress call number, all of which the computer finds in a list within the program.

RESULTS

To obtain some measure of the program's accuracy, fourteen textbook questions, probably more challenging than the average patron would ask, were submitted to the computer and to a professional librarian who was especially familiar with biographical reference books. (See Figure 1 for sample questions and results.
) The librarian spent a total of an hour and a half, and found answers to eleven of the fourteen questions. On the three she could not answer she felt she had exhausted the resources. In one of the eleven she answered ("How many Americans won the Nobel Prize in medicine between 1930 and 1950?") she found the answer in a source not specifically biographical (World Almanac) and therefore not in the computer's data.

No problems occurred in forming the questions for submission to the computer. The program found some reasonable sources in all cases. It found books containing the answer in ten of the fourteen cases, the four answers not found being the three the librarian missed and the one requiring an almanac. In all but one case there were more possibilities than the five books given per answer. Some questions were rerun ignoring the first five answers, and five more titles were retrieved; even then there were more possibilities.

Question: In one source find a list of at least twenty references to biographical information about Dmitri Mendeleef (or Mendelev), Russian chemist (1834-1907).
As submitted to computer:
  Q/ SEX M, LIVING N, OCCUP CHEM, NAT RUSSIAN, MIN Z, SPEC1 Z, SPEC2 R, INDEX Z, DATE Z +
Librarian's results (time: 10 minutes):
  B  Phillips, Dictionary of Biographical Reference (0 references)
  A  Encyclopedia Britannica (6 references)
  A  Encyclopedia Americana (1 reference)
  A  Biography Index, 1949-64 volumes (14 references)
Computer's results:
  A  Index to Scientists (27 references)
  A  Biography Index
  C  Drake, Dictionary of American Biography (sounds wrong but it is international)
  B  Phillips, Dictionary of Biographical Reference
  A  Encyclopedia Britannica

Question: What academic degrees have been earned by Professor Reuben L. Hill, Director of Family Study at the University of Minnesota?
As submitted to computer:
  (1) Q/ SEX M, LIVING Y, OCCUP EDUC, NAT AMER, MIN Z, SPEC1 DG, SPEC2 Z, INDEX Z, DATE Z +
  (2) IGNORE + AMECONASSN + IGNORE + AMERSCIENCE + IGNORE + AMPOLISCI + IGNORE + DAMERSCHOL + IGNORE + LEADEDUC + Q/ SEX M, LIVING Y, OCCUP EDUC, NAT AMER, MIN Z, SPEC1 DG, SPEC2 Z, INDEX Z, DATE Z +
Librarian's results (time: 3 minutes):
  B  Leaders in Education
  A  Who's Who in America (Answer: BS, PhM, PhD)
Computer's results:
  (1) D  Handbook of the American Economic Association
      D  American Men of Science
      D  Biographical Directory of the American Political Science Association
      D  Directory of American Scholars
      B  Leaders in Education
  (2) B  Who's Who in American Education
      C  Outstanding Young Men of America
      A  Who's Who in America
      B  Who's Who in various areas
      B  National Cyclopedia of American Biography

Question: Where might I find information about a New England ancestor named Jacob Billings who was born around 1753?
As submitted to the computer:
  Q/ SEX M, LIVING N, OCCUP Z, NAT AMER, MIN FF, INDEX Z, DATE Z, SPEC1 Z, SPEC2 Z +
Librarian's results (time: 8 minutes):
  D  Handbook of Genealogy (about genealogists, not families)
  A  Compendium of American Genealogy
Computer's results:
  A  Compendium of American Genealogy
  C  Dictionary of American Biography
  C  Who Was Who in America
  C  Lamb's Biographical Dictionary of the U.S.
  C  Concise Dictionary of American Biography

A = It has the answer or at least part of it
B = Good choice but it does not have the answer
C = Reasonable choice but there are better ones
D = Poor choice

Fig. 1. Sample Reference Questions.

In some cases the program did better than the librarian because she wasted time looking in sources that did not give the specifics sought.
For instance, when the question asked for the pronunciation of the surname of Paul and Lucian Hillemaker, French composers, she looked in dictionaries that do not give pronunciation. The computer found the only four possible sources immediately.

In other cases the program came up with rather far-fetched answers a human would have skipped. A question asking for biographies of Franz Rakoczy, an Hungarian hero, retrieved in its second five sources three Jewish encyclopedias and a book on composers! These were not wrong and, in cases where occupation or minority group affiliations were unknown, these might be good sources.

As an answer to the Nobel-prize-winner question the computer retrieved sources on American doctors, Nobel winners and scientists, which are the best choices from the data and would have the answers buried in them. However, what is really required is an index to award winners, and there were none in the data.

The test revealed the necessity of allowing questions to have dummy values; that is, ones not used in the data. For instance there are no books limited to botanists, so OCCUP BOTANIST is not allowed in a question, though OCCUP SCIENTIST is, and CHEM and PHYSICS are included as more specific values under SCIENTIST. Asking for OCCUP SCIENTIST when searching for a botanist avoids getting books devoted to non-scientific occupations but also gets books devoted to chemists and physicists. Since one would want these books if he did not know the scientist was a botanist, that should not be changed. If he asks for OCCUP BOTANIST he wants books devoted to botanists first, then scientists in general. A short-term solution is to have dummy values stand for all these other values. For example OCCUP OTHER-SCIENTIST could include all scientific occupations except those specifically listed, and it would retrieve books limited to all scientists but not to specific scientific occupations mentioned in the data.
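The long-term fix the article goes on to suggest, tree-structured data, can be sketched directly. The hypothetical table below (the occupation names come from the article; the representation is our assumption) lets a specific request fall back through progressively more general values, ending at Z:

```python
# Hypothetical occupation tree: each specific value points to its parent.
# Z stands for "no occupation restriction at all".
PARENT = {"BOTANIST": "SCIENTIST", "CHEM": "SCIENTIST",
          "PHYSICS": "SCIENTIST", "ASTRON": "SCIENTIST",
          "SCIENTIST": "Z"}

def occup_search_order(value):
    """Search order for OCCUP: most specific first, then each ancestor."""
    order = [value]
    while order[-1] in PARENT:
        order.append(PARENT[order[-1]])
    return order
```

A question coded OCCUP BOTANIST would then try books on botanists first, scientists in general next, and unrestricted books last, which is exactly the preference the text describes.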
A long-term solution is to use a computer language allowing tree-structured data. Presently this problem does no more than cause extraneous retrievals which the person using the list can easily skip.

DISCUSSION

The advantages of the scheme can be speculated upon. From the library's point of view its virtues are that it is simple and inexpensive. Original implementation would not require a major block of time to be spent in human indexing or abstracting. Operating costs would be low because it does not require such a large store of information in memory that several tapes must be searched, and because updating the file is simple. When a new book is added, an experienced person could categorize it in five minutes, punch a new data card and, if required, add to the list of values in the table of abbreviations.

The system could provide useful information to other departments. It could keep tallies for the acquisitions department of how often a book is given as an answer, indicating whether new editions of it or similar books would be good buys.

From the user's point of view the system avoids a major pitfall of some retrieval schemes which retrieve on the basis of ambiguous terms or association chains; that is, missing relevant items. If the user resubmits the same question, ignoring already retrieved books each time, he will eventually have a comprehensive list of possible sources in the data that have the index and specifics he requires. A user also wants his information as brief as possible, listed in order of importance and with no extraneous answers (7); this requirement could be met as the program stands by having a human simply cross out any unnecessary titles. Users like to know the reliability of the information (7); this detail could be provided along with the titles. Users also want speed and convenience.
As it stands, this system could be made available to users of the University of Chicago Library tomorrow with no more equipment than is presently in the Computation Center. Time delay in the present implementation could be remedied by using an on-line system. Users often prefer to be given facts themselves and not just citations (7). A program that gives biographical facts directly has no connection with this scheme or classification system, but the output of this program could be used as a tool by a librarian to find the answer for a patron.

BIBLIOGRAPHIES

The most obvious area to which the retrieval scheme could be extended is that of bibliographies. Like biographies, they are limited in their scope to certain exclusive categories, and they contain the same specific facts for each entry. Logical exclusive categories could be: NATIONALITY, FORM (with such values as drama, poetry, fiction, maps, etc.), SUBJECT (probably the most frequently used criterion on which to select books for a bibliography), and DATE. Since there is no LIVING with which to connect DATE, DATE here should probably have not just the most recent relevant date but as many values as necessary. For instance DATE 40S 50S 60S would apply to an index that began publication in the 1940's and is current. Then a request for any of those dates would find it. Possible SPECIFICS include the number of pages, the cost, or a facsimile of the title page. ARRANGEMENT would be needed, being different from INDEX in that bibliographies, unlike biographies, cannot be assumed to have the same order (alphabetic by subject's name) plus indexes in other orders. ARRANGEMENT would list as values all the ways the contents of the bibliography could be approached: by subject, author, title, chronology or a combination of these.
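The multi-valued DATE proposal amounts to a simple membership test. A minimal sketch, assuming the same set-valued representation used earlier (the function name is hypothetical):

```python
def date_matches(requested, item_dates):
    """A request for any decade the bibliography covers should find it;
    Z on the question matches any item regardless of date."""
    return requested == "Z" or requested in item_dates

# An index that began publication in the 1940's and is still current:
index_dates = {"40S", "50S", "60S"}
```

Under this rule a request coded DATE 50S finds the index above, while a request for DATE 30S does not.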
DICTIONARIES

Dictionaries also lend themselves well to this type of scheme; one exclusive category, SUBJECT, might even be adequate for dictionaries. Dictionaries' special subjects could be broken down into FIELD (such as chemistry or business) and TYPE (such as slang or geography), if necessary. LANGUAGE would be a specific category, since there are no substitutes for the language required. Other possible SPECIFICS are pronunciation, definition, etymology and illustration.

ATLASES

Atlases are also suited to the scheme. Exclusive categories that seem appropriate are the AREA covered, special SUBJECT atlases, and the size of the SCALE. SCALE should probably act as DATE does in the biographical program; that is, if a particular scale is requested, that would be searched for first and, if no answer is found, a note would be given and another search made for any scale. SPECIFICS for atlases could include items like topography, rainfall, winds, cities, highways and major products.

Factual books (those that give the highest mountain, the first four-minute mile, the January 10th price of U.S. Steel, etc.) do not lend themselves to the scheme. Because these books are not uniform as to entries and subject coverage, the list of possible specifics and exclusive categories would be extremely long and the number of searches consequently prohibitive. Also, since such books are far fewer in number than biographical or bibliographical works, the proper one is easier to find by browsing.

CONCLUSION

A scheme for categorizing biographical reference books by their exclusive and specific categories makes it possible to automatically retrieve the titles of those which would best answer reference questions. When tested it was found acceptable, with minor refinements, and it is easily adaptable to other reference book forms. Such a system seems a logical direction in which to go when automation of actual reference functions is undertaken.
ACKNOWLEDGMENT

The project under discussion was undertaken in partial fulfillment of the requirements for the M.A. degree at the University of Chicago's Graduate Library School. The computer program employed is detailed in the author's thesis (8). The work was partially completed under the auspices of AEC Contract No. AT(11-1)614.

REFERENCES

1. University of Illinois Library School: The Library as a Community Information Center. Papers presented at an Institute conducted by the University of Illinois Library School, September 29-October 2, 1957 (Champaign, Illinois: University of Illinois Library School, 1959), p. 2.
2. Shera, Jesse: "Automation and the Reference Librarian," RQ, III, 6 (July 1964), 3-4.
3. Austin, Charles J.: Medlars 1963-1967 (Bethesda: National Institutes of Health, 1968).
4. Haas, Warren J.: "Statewide and Regional Reference Service," Library Trends, XII, 3 (January 1964), 407-10.
5. Yngve, Victor: COMIT Programmers' Reference Manual (Cambridge, Mass.: M.I.T. Press, 1962).
6. Hsu, R. W.: Characteristics of Four List-Processing Languages (U.S. Department of Commerce, National Bureau of Standards, Sept. 1963).
7. Goodwin, Harry B.: "Some Thoughts on Improved Technical Information Service," Readings in Information Retrieval (New York: Scarecrow Press, 1964), p. 43.
8. Weil, Cherie B.: Classification and Automatic Retrieval of Biographical Reference Books (Chicago: University of Chicago Graduate Library School, 1967).

COMPRESSION WORD CODING TECHNIQUES FOR INFORMATION RETRIEVAL

William R. NUGENT: Vice President, Inforonics, Inc., Cambridge, Massachusetts

A description and comparison is presented of four compression techniques for word coding having application to information retrieval. The emphasis is on codes useful in creating directories to large data files.
It is further shown how differing application objectives lead to differing measures of optimality for codes, though compression may be a common quality.

INTRODUCTION

Cryptographic studies have documented much useful language data having application to retrieval coding. Because unclassified cryptographic studies are few, Fletcher Pratt's 1939 work (1) remains the classic in its field. Gaines (2) has the virtue of being in print, and the more recent cryptographic history of Kahn (3), while comprehensive, lacks the statistical data that made the earlier works valuable. The word coding problem for language processing, as opposed to cryptography, has been extensively studied by Nugent and Vegh (4). Information theorists have contributed the greatest volume of literature on coding and have added to its mathematical basis, largely from the standpoint of communications and error avoidance.

A brief discussion of compression codes and their objectives is presented here, followed by a description of a project encompassing four compression codes having application to retrieval directories. Two of the codings are newly devised. One is Transition Distance Coding, a randomizing code that results in short codes of high resolving power. The second is Alphacheck. It combines high readability with good resolution, and permits simple truncation to be used by means of applying a randomized check character that acts as a surrogate of the omitted portion. It appears to have the greatest potential, in directory applications, of the codes considered here. Recursive Decomposition is a selected-letter code devised by the author several years ago (4). It has been tested and has the advantages of simple derivation and high resolution. Soundex (5) is the only compression code that has achieved wide usage. It was devised at Remington Rand for name matching under conditions of uncertain spelling.
OBJECTIVES OF COMPRESSION CODING

It is desired to transform sets of variable-length words into fixed-length codes that will maximally preserve word-to-word discrimination. In the final directories to be used, the codes for several elements will be accessible to enable the matching of several factors before a file record is selected. The separate codes for differing factors need not be the same length, though each type of code will be of uniform length; nor need the codes for differing factors be derived by the same process.

What we loosely call codes must formally be designated ciphers. That is, they must be derivable from the data words themselves, and not require "code books" to determine equivalences. This is so because the file directories must be derivable from file items, entries in directory form must be derivable from an input query, and these two directory items must match when a record is to be extracted. The ciphers need not be decipherable for the application under consideration, and in general are not. Fixed-length codes, which provide the rough equivalent and simplicity of a margin entry in a paper directory, are generally desirable for machine directories.

The functions of the codes will determine their form, and a code or file key designed to meet one objective will generally not be satisfactory for any other objective. The following typical objectives serve as four examples:

(1) Create a file key for extraction of records in approximate file order, as is required for the common Sorting and Printout Problem. A typical code construction rule is to take the first six letters.

JOHNSEN   -> JOHNSE
JOHNSON   -> JOHNSO
JOHNSTON  -> JOHNST
JOHNSTONE -> JOHNST

(2) Create a file key for extraction of records under conditions of uncertainty of spelling (the airline reservation problem). A typical code construction rule is Vowel Elimination or Soundex. A typical matching rule is best match.
Vowel Elimination           Soundex
JOHNSEN → JHNSN             J525 → J52
JOHNSON → JHNSN             J525 → J52
JOHNSTON → JHNSTN           J5235 → J52
JOHNSTONE → JHNSTN          J5235 → J52

(3) Create a file key for extraction of records from accurate input, with the objective of maximum discrimination of similar entries (cataloging search problem). Typical code construction rules are Recursive Decomposition Coding or Transition Distance Coding.

Recursive Decomposition     Transition Distance
JOHNSEN → JHNSEN            BFTZ
JOHNSON → JHNSON            DNWU
JOHNSTON → JHSTON           ZIKY
JOHNSTONE → JHSONE          ECRC

For the file keys of primary concern, accurate input data is assumed and the objective is maximum discrimination. Desirably, a code would be as discriminating as Transition Distance Coding and as readable as truncation coding. This can be achieved to some degree by combining the two codes into one, with an initial portion truncated and a final check character representing the remainder via a compressed Transition Distance Code: Alphacheck.

(4) Create a file key for human readability and high word-to-word discrimination. Possible code construction rules are Alphacheck, and simple truncation plus a terminal check character.

JOHNSEN → JOHNSV
JOHNSON → JOHNSX
JOHNSTON → JOHNSD
JOHNSTONE → JOHNSS

METHODS

The algorithms for creating the preceding codes are described in the following sections. It is axiomatic that randomizing codes give the greatest possible discrimination for a given code space. The whole trick of creating a good compression code is to eliminate the natural redundancy of English orthography and preserve discrimination in a smaller word size.

Compression Word Coding/NUGENT 253

Letter-selection codes can only half accomplish this, due to the skewed distribution of letter usage. They can eliminate the higher-frequency components, but they cannot increase the use of the lower-frequency components.
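The two letter-selection rules just illustrated are easy to state precisely. A minimal Python sketch follows; the three-character truncation of the full Soundex code to form the file key follows the J52 examples above, and the Soundex variant here is deliberately simplified (it ignores the H/W separator rule of the full algorithm).

```python
# Letter-to-digit groups of the Soundex family (vowels, H, W, Y carry no digit).
DIGIT = {}
for letters, d in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                   ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        DIGIT[ch] = d

def vowel_elim(word):
    """Keep the first letter, then drop the vowels A E I O U."""
    w = word.upper()
    return w[0] + "".join(c for c in w[1:] if c not in "AEIOU")

def soundex_full(word):
    """Simplified Soundex: first letter plus consonant digits, collapsing
    immediate repeats; a vowel between like consonants re-enables the digit."""
    w = word.upper()
    out, prev = w[0], DIGIT.get(w[0], "")
    for c in w[1:]:
        d = DIGIT.get(c, "")
        if d and d != prev:
            out += d
        prev = d
    return out

def file_key(word):
    """Fixed-length key as in the J52 examples: first three characters."""
    return soundex_full(word)[:3]
```

Note how the fixed-length key deliberately merges JOHNSEN, JOHNSON, JOHNSTON and JOHNSTONE into one bucket, which is exactly the behavior wanted under uncertainty of spelling.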
Randomizing codes (often called "hash" codes, properly quasi-random codes) can equalize letter usage and hence make best use of the code space. Prime examples here are the variants of Godel coding devised by Vegh (4), in which the principle of obtaining uniqueness via the products of unrepeated primes is exploited, as it is in the randomizing codes considered here. The problem in the design of a randomizing code is that the results can be skewed rather than uniformly distributed, owing to the skewed nature of the letters and letter sequences that the codes operate on.

In Transition Distance Coding, the natural bias of letters and letter sequences is overcome by operating on a word parameter that is itself semi-random in nature. The following principle, not quite a theorem, applies: "Considering letters in their normal ordinal alphabetic position, and considering letter transitions to be unidirectional and cyclic, the distribution of transition distances in English words is essentially uniform."

In view of the fact that letter usage has an extremely skewed distribution, with a probability ratio in excess of 170 to one for the extremes, it is seen that the more uniform parameter of transition distances is a superior one for achieving randomized codes. The relative uniformity of transition distance needs further investigation, but one typical letter digram sample from Gaines (2), with 9999 transitions (mean number of occurrences of each distance = 385), yielded a mean deviation of 99, a standard deviation of 123, and an extreme probability ratio of 3.3 to one for the different transition distances from 0 to 25. The distribution can be made more uniform by letter permutation. Permutation is used in the algorithm for Transition Distance Coding but not in Alphacheck.

Algorithm

The method of Transition Distance Coding is used to operate on a variable-length word to achieve fixed-length alphabetic or alphanumeric codes that exhibit quasi-random properties.
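The transition-distance parameter itself is simple to compute: with letters in ordinal alphabetic position and transitions unidirectional and cyclic, the distance from one letter to the next is their difference modulo 26. A small Python sketch:

```python
def transition_distances(word):
    """Unidirectional, cyclic letter-to-letter distances, modulo 26."""
    w = word.upper()
    return [(ord(b) - ord(a)) % 26 for a, b in zip(w, w[1:])]
```

Because the transitions are cyclic and unidirectional, B to A is a distance of 25 (wrapping forward through Z), not 1; it is this wrap-around that helps flatten the distribution relative to raw letter frequencies.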
The code is formed from the modulo product of primes associated with transition distances of permuted letters. The method is intended strictly for computer operation, as it is a simple program but an extremely tedious manual operation. There are five steps:

(1) Permute the characters of the natural language word. This breaks the digram dependency that could make the transition distances less uniformly distributed. This step might be dispensed with if the resulting distributions prove satisfactory without it. The permutation process consists of taking the middle letter (or the letter right of middle for words with an even number of letters), then the first, the last, the second, the next-to-last, etc., until all letters have been used. That is, for a letter sequence

a(1), a(2), ..., a(i), ..., a(n)

the following permutation is taken:

a(int(n/2)+1), a(1), a(n), a(2), a(n-1), ...
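Step (1), together with the prime-product hashing the code is built from, can be sketched in Python as follows. The permutation order and the "modulo product of primes associated with transition distances" are taken from the description above; the choice of modulus (26 to the code length) and the rendering of the residue as letters are illustrative assumptions, since the remaining steps of the algorithm are not reproduced in this excerpt.

```python
# First 26 primes, indexed by transition distance 0..25.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
          59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def permute(word):
    """Middle letter (right of middle for even lengths), then first, last,
    second, next-to-last, ... until all letters are used."""
    n = len(word)
    mid = n // 2  # 0-based index of the middle letter
    idx = []
    lo, hi = 0, n - 1
    while lo <= hi:
        idx.append(lo)
        if hi != lo:
            idx.append(hi)
        lo += 1
        hi -= 1
    idx.remove(mid)
    return word[mid] + "".join(word[i] for i in idx)

def transition_distances(word):
    """Cyclic, unidirectional letter-to-letter distances, modulo 26."""
    return [(ord(b) - ord(a)) % 26 for a, b in zip(word, word[1:])]

def td_code(word, length=4):
    """Modulo product of primes keyed by the permuted word's transition
    distances, rendered as a fixed-length alphabetic code."""
    n = 1
    for d in transition_distances(permute(word.upper())):
        n = (n * PRIMES[d]) % (26 ** length)
    out = ""
    for _ in range(length):
        out += chr(ord("A") + n % 26)
        n //= 26
    return out
```

The four-letter output length matches the BFTZ/DNWU-style codes shown earlier; similar names such as JOHNSEN and JOHNSON hash to unrelated codes, which is the discrimination property the article is after.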
Fig. 3. Output of COBOL Language Program Using MARC II Data.

MARC II and COBOL/AVRAM AND DROZ 271

Table 8. Manpower Expenditure

Activity                      Man Weeks
Analysis and Programming      1
Debugging and Checkout        2
TOTAL                         3

Since the processing time of a print program is usually a function of the speed of the printer, no accurate internal processing times were recorded. However, there was no noticeable time difference between this program and other MARC print programs written at the Library of Congress in assembly language.
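The print program's field access hinges on the MARC record directory: fixed-length entries (tag, field length, starting position) that locate each variable field in the record. A minimal Python sketch of that lookup follows, using a synthetic record in the communications-format layout (24-character leader, 12-character directory entries). The sample tags and values, the toy record builder, and the placement of the base address at leader positions 12-16 are illustrative assumptions; the entry-count calculation is the (base address - 24)/12 arithmetic the article gives.

```python
def build_record(fields):
    """Build a toy MARC-like record: 24-char leader, 12-char directory
    entries (3-char tag, 4-digit length, 5-digit start), then the data."""
    data = "".join(value for _, value in fields)
    base = 24 + 12 * len(fields)
    directory, pos = "", 0
    for tag, value in fields:
        directory += "%3s%04d%05d" % (tag, len(value), pos)
        pos += len(value)
    # Only leader positions 12-16 (the base address) matter in this sketch.
    leader = "#" * 12 + "%05d" % base + "#" * 7
    return leader + directory + data

def get_field(record, want_tag):
    """Locate a field through the directory, using the article's
    entry-count calculation: (base address - 24) / 12."""
    base = int(record[12:17])
    count = (base - 24) // 12
    for i in range(count):
        entry = record[24 + 12 * i : 24 + 12 * (i + 1)]
        tag, length, start = entry[0:3], int(entry[3:7]), int(entry[7:12])
        if tag == want_tag:
            return record[base + start : base + start + length]
    return None
```

Real MARC records also carry field and record terminators, which this sketch omits for clarity; the directory walk itself is the same pattern the COBOL MOVE-DATA routine performs.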
COMMUNICATION FORMAT PROCESSING

The aforementioned techniques are equally adaptable for use with the MARC II communications format (3), with the following changes in format conventions:

1) The communication format has a 24-character leader rather than 92 characters of fixed-length items in the processing format. In the program, under the "WORKING-STORAGE SECTION", the group item labelled "FIXED-MARC" would have to be redefined to conform with the 24-character leader. The COBOL statements that are noted with an asterisk would require a change of their value from "92" to "24".

2) The communication format has no total count of entries in the record directory. A calculation would have to be made to arrive at the total count and that figure stored in a new hold area labelled "DIRECTORY-COUNT". The base address of the data in the communication format is not relative to the first position of the record as defined in the processing format, but to the first position of the first variable field. This base address is carried in the record leader, and is available for the calculation required for the Directory Entry Count: (base address - 24)/12.

In the program, after the record directory had been searched and the proper entry placed in the work area, the "MOVE-DATA" subroutine would move the appropriate field to the work area for processing, with the one alteration noted below with an asterisk.

MOVE-DATA.
    MOVE ZEROS TO TSUB.
    MOVE SPACES TO HOLD-DATA.
    MOVE D-ADDRESS TO DSUB.
*   ADD BASE-ADDRESS TO DSUB.
    PERFORM MOVE-A D-LENGTH TIMES.
MOVE-A.
    ADD 1 TO DSUB.
    ADD 1 TO TSUB.
    MOVE MARC-BYTE (DSUB) TO D-HOLD (TSUB).

Programming techniques naturally are dependent on the processing required and the format characteristics at the individual institution. If the MARC II communications format were to be manipulated in the form
in which it is received (each byte equal to a character, with a 24-character leader followed by 12-character directory entries), an alternate approach to that suggested above could be to work in the record area and not move data to a work area.

CONCLUSION

The only MARC II data available to users up to the writing of this article (October 1968) has been the MARC II test tape released by the Library of Congress in August 1968. Therefore, it is probable that most people expressing doubts about the use of COBOL with MARC records have done so without the experience of actually using the language. We now have this experience at the Library of Congress. COBOL was successfully used for the computer processing of MARC records. The complexity of the record did not detract from ease in programming. Although the programs written were for a report function, the data accessing modules of COBOL can nevertheless be used for many other functions. File maintenance and retrieval algorithms could be defined and programmed in COBOL with facility equal to that in programming the subject function.

REFERENCES

1. Griffin, Hillis: "Automation of Technical Processes in Libraries." In Annual Review of Information Science and Technology, edited by Carlos A. Cuadra (Chicago: Encyclopaedia Britannica) 3 (1968), 241-262.
2. U.S. Library of Congress, Information Systems Office: Subscriber's Guide to the MARC Distribution Service (Washington, D.C.: Library of Congress, 1968).
3. Avram, Henriette D.; Knapp, John F.; Rather, Lucia J.: The MARC II Format: A Communications Format for Bibliographic Data (Washington, D.C.: Library of Congress, 1968), pp. 1, 2, 10.

AN AUTOMATED MUSIC PROGRAMMER (MUSPROG)

David F. HARRISON, Music Director, WSUI-KSUI, and Randolph J.
HERBER, Applications Programmer, University Computer Center, The University of Iowa, Iowa City, Iowa

A system to compile programs of recorded music for broadcast by the University of Iowa's radio stations. The system also provides a permanent catalog of all recorded music holdings and an accurate inventory control. The program, which operates on an IBM 360/65, is available in FORTRAN IV, COBOL and PL/1, with assembly language subroutines and external maintenance programs.

2 Journal of Library Automation Vol. 2/1 March, 1969

The State University of Iowa (Iowa City) owns and operates two broadcasting stations, WSUI, at 910 KC, and KSUI, at 91.7 MC. WSUI was the first educational radio station in operation west of the Mississippi, and ranks among the oldest stations in the country; KSUI was among the earliest of the frequency modulation outlets in the area to offer programming in multiplex stereo.

In the spring of 1967, when it became necessary to completely reorganize their recorded music libraries, an investigation was simultaneously under way to determine the feasibility of utilizing automated data processing (A.D.P.) techniques in the discographic operations of the stations. At the time there were several working bibliographic applications (1), ranging from relatively simple record keeping (Where is ...?) to more ambitious cross-referencing and indexing operations, one of which uses the KWIC (keyword-in-context) computer program to classify musical recordings (1). On the basis of the awareness of these applications, and a belief that the intrinsic principles could be utilized and extended to cover somewhat different needs, it was proposed that the facilities of the University Computer Center be employed in the selection and updating of recorded music programs.
In designing a coded set of instructions to perform these tasks, it was deemed necessary that any attempt at the selection or compilation of a series of music programs should be made in accordance with certain criteria supplied to the system by the user, and that these selection specification parameters should closely parallel those which would be employed were such an extraction from the total libraries to be performed manually. Additional requisites were that provision be made for updating and enlarging the master file as new items were acquired, and that the coding of the programmed instructions should be sufficiently flexible to permit inclusion of supplemental criteria as they became desirable.

The above proposal met with a certain degree of opposition, the main bone of contention being that such an application would necessarily "dehumanize" music programming. There have been, and will continue to be, similar objections raised by those who are unaware of the advantages offered by A.D.P. and concomitantly unaware of the mental processes which result in what is commonly referred to as "artistic judgment." It is not the purpose of this article to attempt an exhaustive analysis of such processes, nor to castigate the objectors; it is rather simply to bring forth several basic observations dealing with the problem under discussion.

A contemporary composer-theorist interested in the applications of A.D.P. techniques to the process of musical composition has observed that no paradoxical "almighty force" exists in science, which, in actual fact, progresses by discrete steps which are at once limited but unpredictable (3, 4). The following conclusions, although relating specifically to the problems of machine-"created" music, find no less an application to the current problem: Creative human thought is an aggregate of restrictions or choices in all fields of human activity, including the arts.
Certain aspects of these judgments can be mechanized and simulated by certain physical mechanisms currently extant, including the computer. The rapidity of calculation or decision by computer frees human beings from the long and arduous task of manually selecting, compiling and checking programmed works. The time thus saved can be better spent on such amenities as scripting, with complete performance information and record data, and the always-too-necessary pronouncing aids. Moreover, the computer program can be "exported" to any place similarly equipped, to be used by other individuals or where other programmers are able to alter the algorithm to meet their specific needs.

The Automated Music Programmer (MUSPROG) was interpreted as being a series of steps, the first of which specifies that complete music programs are to be selected in accordance with a table of specifications introduced as data, each card containing information pertinent to a discrete program.

Automated Music Programmer/HARRISON and HERBER 3

The second step requires that each and every entry in the catalog be checked for availability by any program in the tables established in the preceding step, this status to be determined on the basis of a satisfactory comparison with the individual criteria supplied on the selection specification card. Among these are "tests" (note that a failure to meet the requirements in any step disqualifies the item) to determine when the item was last selected, as well as the number of times selected; a check for allowable time length; a check for duplication of composer and/or title; a statement that stereo recordings are to be used only for FM; a check for acceptable period, style and type of composition; and the decision to update the master file.

In the final operation of the program, each duplicate title of a work selected is also updated, simulating selection to prevent its selection during the next month.
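This simulated-selection update uses a weighted random date adjustment, built on the ten-position ITEM table described below. A minimal Python sketch (the representation of the date factor as a simple month count is an assumption; the table contents and probabilities are from the article):

```python
import random

# Ten-position weight table from the article:
# 40% chance of +0 months, 30% of +1, 20% of +2, 10% of +3.
ITEM = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]

def update_duplicate(selected_date_factor, rng=random):
    """Date factor to store on a duplicate title: the selected item's
    date factor pushed back by a randomly weighted number of months."""
    return selected_date_factor + ITEM[rng.randrange(10)]
```

Drawing a random index into a fixed table is a compact way to encode a discrete probability distribution without any floating-point arithmetic, a natural choice on the hardware of the period.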
If each duplicate were given the date factor of the item actually selected, the latter would tend to appear much more frequently than its companions, because the program would continue to select the longest available item, and it is reasonably safe to assume that the selected item is the longest version of the title in question. It was necessary, therefore, to devise a means by which each version of a given work (indicated by both title and composer) be given equal weight for "fair" selection. A unidimensional array called ITEM was constructed with ten positions as follows: ITEM (10) /'0', '0', '0', '0', '1', '1', '1', '2', '2', '3'/. The index of the array was then selected by referencing a routine which generates random, positive integers in the range one through ten. The contents of that position in ITEM are added to the date factor of the record selected, and the result placed in the corresponding field of the duplicate title under scrutiny. Thus there exists a 40% probability that the duplicate will have the same "weight" as the selected item, a 30% chance that the duplicate will be "pushed back" one month, 20% for two months, and a 10% probability that the date factor of the duplicate title will be increased by three months. When all the titles have been thus read or updated, the run concludes.

Figure 1 is the flowchart that is the basic design of MUSPROG and from which the computer program was coded. The program runs on an IBM 360/65 and is available in FORTRAN IV, COBOL, and PL/1 with assembly language subroutines and external maintenance programs. Copies of these programs may be obtained from the National Auxiliary Publications Service (NAPS #00278).

The machine readable catalog system currently employed by the University's radio stations is, on the whole, independent of the record's origin or manufacture. (The catalog number could be considered as nothing more than an indication of a discrete shelf space.
) The system was designed to facilitate maximally efficient use of the 80 columns available on a punch card. By utilizing two alphabetic and two decimal characters ranging from A00 through ZZ99, provision is made for identification of records and tapes in quantities somewhat in excess of seventy thousand individual discs or reels. The total of actual single titles possible to catalog in this manner is at least twice that number.

Fig. 1. Flowchart for MUSPROG.

The card catalog is made up along more or less standard, triple-reference lines on the familiar 3x5-inch card. These remain in the master card file, but are actually used only for reference purposes, rather than for actual selection. The "real" Master Library exists in the form of punched cards (later transferred to magnetic tape). Each card image contains the following information, with blank columns separating contiguous fields:

Columns 1-10   Composer, or first ten characters if abbreviation necessary.
        12-27  Title, abbreviations standardized.
        29-33  Duration of work in seconds.
        35-37  Period of composition.
        39-40  Type of composition.
        42-45  Catalog number.
        47-57  Physical location of item on cataloged disc or tape.
        59-64  Date fields, used for updating and usage factors.
        66-69  Seasonal key, a blank indicating general usefulness.
        71-80  Field used by MUSPROG for internal record-keeping.

OPERATION

Selection of music by the system is performed in accordance with a table of program specifications which includes information pertinent to the length of the desired program and the maximum permissible length of any single work within it, the type of music desired, and additional information, such as date, time and title of the program to be aired and an indication of the station for which the program is to be selected. All the selections for KSUI (FM) are required to be stereophonic. Classification into stereophonic and monophonic groups is a function of the catalog number, A00 through Z99 being stereophonic and AA00 through ZZ99 being monophonic. A program selection card contains the following data:

Columns 1      Station code: W for WSUI, blank for KSUI.
        2-6    Duration of program in seconds.
        7-11   Maximum duration of each item to be selected (0 or blank indicates program may consist of but a single work equivalent in length to program duration).
        12     Number of types being specified.
        13-27  Three three-plus-two character fields to specify period and type (MODIA equals "twentieth-century, orchestra"). If any field is blank, MUSPROG assumes anything acceptable.
        28-79  Title of program to be selected, day and time.

As an example, the following specifications were made for a program called "Aubade" which was aired at 10:00 a.m. on Tuesday, July 30, by WSUI. Program duration was to be 3400 seconds (56:40), allowing 3:20 for continuity. Maximum length of any single work within the program was to be 900 seconds (15:00). Music could be chosen from the contemporary orchestral repertoire, any instrumental work from the Classic period, or any type "3" work, i.e., soloist and piano, or chorus a cappella. Figure 2 shows a printout of selections for two programs.

Fig. 2. Printout of Selections.

An additional feature of MUSPROG is provision for a periodic summary of library usage, affording the Librarian a concise account of frequently played items, as well as an indication of those works which have been selected infrequently or ignored altogether. This report allows the programmer to assess more accurately the maximum number of times a selection may be programmed before it is declared unacceptable. The system also puts out printed lists of works extracted from the library in accordance with a user-specified table of reference fields: e.g., all symphonies, all works by Bach, all works of under ten minutes in length, all Christmas music; or conceivably, any symphonies by Bach which are suitable for Christmas and less than ten minutes long. This latter step could also include, with minor alterations in the computer program, provision for performances by one specific ensemble or artist only.
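The fixed-column card layouts above translate directly into a field map. A hedged Python sketch follows; the dictionary keys and the sample card are illustrative, but the column ranges come from the master-card layout given above, and the stereo test follows the stated A00-Z99 versus AA00-ZZ99 convention.

```python
# 1-based inclusive column ranges from the master-card layout above.
FIELDS = {
    "composer": (1, 10), "title": (12, 27), "duration": (29, 33),
    "period": (35, 37), "type": (39, 40), "catalog": (42, 45),
    "location": (47, 57), "dates": (59, 64), "season": (66, 69),
    "internal": (71, 80),
}

def parse_card(image):
    """Split an 80-column card image into named, stripped fields."""
    image = image.ljust(80)
    return {name: image[lo - 1:hi].strip() for name, (lo, hi) in FIELDS.items()}

def is_stereo(catalog):
    """A00-Z99 (second character a digit) is stereophonic;
    AA00-ZZ99 (second character a letter) is monophonic."""
    return len(catalog) >= 2 and catalog[1].isdigit()
```

For example, a card beginning with a composer in columns 1-10 and a title in columns 12-27 parses into the corresponding fields, and a catalog number such as FA41 classifies as monophonic, hence ineligible for KSUI (FM).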
An external program allows adding items to the master tape, deleting those no longer needed, and correcting any of the various fields within individual records; thus if mis-timings or other inaccuracies are noted, it becomes a relatively simple matter to correct them.

DISCUSSION

It can readily be seen that "the machine" neither possesses nor displays "taste" in any conventional sense of that word, since it can select only those types of music which the programmer has declared acceptable. It does not, indeed cannot, show any predilection toward certain types of music to the detriment or exclusion of others, save those which have been removed from the list of potential selections by the programmer. It performs no independent judgments. Without doubt, then, there is no logical basis for the cry of "dehumanization," since the program was originally designed by human minds and is, at each step of the process of selection, governed by the human-designed control parameters and program specifications; therefore it cannot select music willy-nilly, but must be told what to do and how to do it. It also has been found that specifications cannot be "plugged in" at random, for the programs thus selected would prove little more than a conglomerate of sundry works bearing no relation to one another. It is very much a necessity that organization and logic be designed into each program to have any coherent programming result. The machine does not "know" what to do unless told.

It should be brought out that, because of a built-in logic and the order of titles on the master file, the program will tend to select the longest works available to fill the specified program time, making up the difference, if any, with progressively shorter pieces until the time is filled, or until no work of acceptable type and sufficient brevity can be located. Since the longer works tend to occur among certain types and/or styles of music, there may be some tenuous grounds for a suspicion of bias.
It will be observed that MUSPROG does not include information pertinent to performer, conductor, etc. One of the several reasons for this apparent oversight is that such information would, at the outset, have required the use of one to four additional data cards per title. Since this information was not deemed absolutely essential to the immediate functions of the program, it was decided to postpone inclusion of such a refinement to some future date.

CONCLUSION

MUSPROG has been utilized by the State University of Iowa since March, 1968, and has resulted in considerable time-saving. For example, the July, 1968, programming required one hundred and two programs varying in length from thirty minutes to somewhat over four hours, consisting of a variety of musical styles and representing a diversity of programming difficult to achieve efficiently by ordinary means. In three minutes and twelve seconds, MUSPROG selected the programs, updated the catalog, checked for duplication of selections, timed each program, and printed out the resultant copy properly headed. At an approximate cost of $250.00 per hour, this comes to less than fifteen dollars per month to perform tasks which might normally require two persons, at perhaps two or three dollars per hour, to work an entire week or more. It is doubtful that even then each catalog entry could be examined and an accurate record of usage kept.

ACKNOWLEDGMENTS

A Staff Research Grant from the Graduate College, University of Iowa, partially supported development and operation of this system. Dean Duane C. Spriestersbach of the Graduate College, Professor Gerard P. Weeg, Chairman of the Department of Computer Science, and Program Supervisor Robert E. Irwin gave generous support and encouragement to the development of MUSPROG.

REFERENCES

1. Wilhoit, G.
Cleveland: "Computerized Indexing for Broadcast Music Libraries," Journal of Broadcasting, 11 (Fall, 1967), 325-337.
2. Brook, Barry S.: "RILM, Repertoire Internationale de la Litterature Musicale," Notes; the Quarterly Journal of the Music Library Association, 23 (March, 1967), 462-467.
3. Xenakis, Iannis: "In Search of a Stochastic Music," Gravesano Review, 11 (1958).

HIGH SCHOOL LIBRARY DATA PROCESSING

Betty FLORA: Librarian, Leavenworth High School, Leavenworth, Kansas, and John WILLHARDT: Data Processing Instructor, Central Missouri State College, Warrensburg, Missouri.

Planning and operation of an automated high school library system is described which utilizes an IBM 1401 data processing system installed for teaching purposes. Book ordering, shelf listing and circulation have been computerized.

This paper presents an example of a small automated high-school library system which works efficiently. A great deal of emphasis to date in library automation has been on large university and college libraries, but the relatively few schools that have pioneered in the field of school library automation have demonstrated its feasibility and its potential. Data processing is economically within the realm of large and medium-sized school districts. The Port Huron District, Port Huron, Michigan, has an accounting machine, keypunch and verifier; among the operations performed are printing purchase orders and book cards. The Port Huron staff consists of one professional librarian, two clerks and two part-time working students. Evanston Township High School, Evanston, Illinois, has an automated library system processed with an IBM 1401 computer. Other high schools using library data processing are the Oak Park-River Forest High School in Illinois; Beverly Hills, California; West Hartford, Connecticut; Weston, Massachusetts; and the Burnt Hills-Ballston Lake and Bedford-Mt. Kisco School Districts in New York State (1).
There are a small number of high schools and vocational schools in Kansas and Missouri that have data processing equipment which is used for teaching purposes. Names and addresses of these schools may be obtained from the Missouri Director of Vocational Education at Jefferson City, Missouri, and from the Kansas State Supervisor of Technical Training at Topeka, Kansas.

High School Library EDP/FLORA and WILLHARDT 11

INTRODUCTION

Leavenworth Senior High School, Leavenworth, Kansas, a campus-style school comprising six buildings, has approximately 1350 students. The Library, located in the main academic building, is presently being remodeled and enlarged. It contains approximately eighteen thousand volumes, including the professional collection, and fifteen hundred to two thousand new volumes are added each year. The Library staff consists of one qualified librarian, two full-time clerical assistants, and twenty student assistants, each of the latter working one class period a day.

The library is, in the true sense of the term, a media center. A mobile listening center is available, and there are large collections of recordings, cartridge and reel tapes, film strips, films, microfilms, reproductions of paintings, educational games, magazines and vertical file material. Fortunately, there is a consistently substantial budget of more than eight dollars per student, including some federal funds, which makes additions to the collection possible in stable development.

Data processing at Leavenworth High School was made possible by the Vocational Education Act of 1963, which provided for the Secretary of Health, Education, and Welfare to enter into agreements with the several State vocational education agencies to provide such occupational training as found to be necessary by the Secretary of Labor (2).
Under the provisions of the Act, federal money is allotted to the states, which in turn allot a portion of this money to various school districts; a school system receiving such money must lease or purchase data processing equipment and use it mainly for teaching purposes.

A data processing curriculum was initiated in the school year 1964-65 at Leavenworth High School, under conditions and regulations set up by the State Supervisor of Technical Training which gave first priority in the use of the data processing equipment to teaching. This has been adhered to strictly at Leavenworth High School; the equipment is used over half of the school day for teaching purposes, and adult education courses in data processing are offered at night. Class time consists of lecture and application, with students having opportunity to operate, wire, program and test problems. Data processing classes are scheduled first in the Computer Room; administrative and library operations are scheduled to be processed in the remaining hours during the school day and after school, each operation being assigned a specific time.

Journal of Library Automation Vol. 2/1 March, 1969

Although unit record equipment was initially leased, plans for a small computer were included in the original decision to offer data processing courses. Equipment, plus salaries to those conducting the program, constitute a major investment for a medium-sized public high school. Consequently, although the classes are a valuable addition to the vocational training area of the curriculum, as many applications as possible are made of school operations, such as enrollment, record keeping, grade reports and payroll, in order to further justify the cost. For this reason, the Superintendent of the Leavenworth school system suggested that the Library might, by using data processing in many of its procedures, both support the data processing instructional program and increase its own effectiveness.
METHODS AND MATERIALS

To develop a system requires systems analysis, which necessitates a clear formulation of purposes and requirements independent of any particular design for implementation (3); and the development of procedural applications to be processed on a computer system should be a joint responsibility of both the systems staff and line management (4). Furthermore, any conversion of library procedures to automation should be carefully planned in advance. Proceeding in the fullest cooperation with a view to mutual benefits, the Librarian at Leavenworth and the Head of Data Processing spent many hours working out the details of their joint effort. The Librarian explained her needs and suggested methods of achieving the desired objectives. For his part, the Head of Data Processing evaluated the possibilities from a technical point of view and suggested methods of achieving them. Together they worked out an initial plan, and the various phases were then programmed.

The Leavenworth Data Processing Library System was set up to 1) order all new library books; 2) complete shelf cards and book checkout cards; 3) run shelf card listings; 4) correct and file shelf cards; 5) reproduce book checkout cards for books checked out; 6) run first and second overdue notices; and 7) provide library inventory, book count lists and book catalogs. All the lists, notices, and reproduced cards are done on the 1401 computer; computer programs for these operations are written in Autocoder. The amount of computer time required for the processing of library data and reports is comparatively small in relation to other operations of the Data Processing Department and was set up to run partly in the daily schedule and partly after school. Time required for preparation of information for the computer is significant and must be scheduled more carefully.
Again, part of this time is fitted into the daily schedule and part of it is accomplished after classes.

The high school leases the following IBM data processing equipment: two 024 Card Punches, one 026 Printing Card Punch, one 082 Sorter, one 548 Interpreter, one 085 Collator, and one 1401 computer with 4K and one disk storage drive. The 1401 computer consists of the 1401 Central Processing Unit, a 1402 Card Reader Punch, a 1403 Printer and a 1311 disk storage drive.

The following cards were developed for the procedure: Shelf Card A is punched from lists of books to be ordered, and only the following information and columns are punched: author name (columns 14-35), title (columns 36-71), copyright date (columns 72-73), and purchase date (columns 79-80). When the book is received, this card is completed with the following information: shelf letter (column 1), Dewey decimal number (columns 2-7), author number (columns 8-13) and accession number (columns 74-78).

Shelf Card B is punched and filed behind Shelf Card A. Only the following information and columns are punched: price (columns 8-13), publisher (columns 36-65), and an X-punch in column 80.

The Book Checkout Card (Figure 1) is first reproduced from the completed Shelf Card A, and after that from Book Checkout Cards when books have been checked out of the Library. This card contains the shelf letter (column 1), Dewey decimal number (columns 2-7), author number (columns 8-13), author name (columns 14-30), title (columns 31-66), student number (columns 68-73), accession number (columns 74-78), and an X-punch in column 80.
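The fixed-column card layouts above translate directly into simple field maps. As a minimal illustrative sketch (written in modern Python rather than the 1401 Autocoder the system actually used; the dictionary and function names are my own), Shelf Card A's 80-column layout might be decoded like this:

```python
# Column layout of Shelf Card A as given in the article (1-based card columns).
# This is a modern reconstruction for illustration, not the original program.
SHELF_CARD_A_FIELDS = {
    "shelf_letter":     (1, 1),    # column 1
    "dewey_number":     (2, 7),    # columns 2-7
    "author_number":    (8, 13),   # columns 8-13
    "author_name":      (14, 35),  # columns 14-35
    "title":            (36, 71),  # columns 36-71
    "copyright_date":   (72, 73),  # columns 72-73
    "accession_number": (74, 78),  # columns 74-78
    "purchase_date":    (79, 80),  # columns 79-80
}

def parse_card(card: str, layout: dict) -> dict:
    """Extract named fields from an 80-character card image."""
    card = card.ljust(80)  # pad short images to full card width
    return {name: card[start - 1:end].strip()
            for name, (start, end) in layout.items()}
```

The same `parse_card` function would serve for Shelf Card B or the Book Checkout Card with a different layout table, which is the sense in which the article calls the card formats "very similar."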
Fig. 1. Book Checkout Card. [The card carries the book title, author, student and accession numbers, and the student's signed agreement to pay the replacement cost of a lost, destroyed or stolen book and overdue fines of 2¢ per day for days 1 to 5, 5¢ per day for days 6 to 10, and 10¢ per day beyond 10 days.]

A Student Finder Card locates the student's name and parent's name and address on the computer disk pack.

The biggest initial task was keypunching an IBM card for each book in the Library, which at that time comprised 13,000 books. It was done by data processing students in the high school, working occasionally during class, but mostly after class and on Saturdays on a voluntary basis. Toward the end of the second semester, many of the procedures had been reviewed and discussed with students in the data processing classes as part of the vocational program.

Fig. 2. Book Order Procedure. Fig. 3. New Book Processing. Fig. 4. Book Checkout Procedure. Fig. 5. Overdue Notice Procedure. [Flow charts of the four procedures described below.]

Book Order (Figure 2)

The Library furnishes the Data Processing Department with request cards or lists of books to be ordered, giving author name, title, copyright date, price, publisher, and purchase date (year). Data Processing punches two cards for each book according to Shelf Cards A and B.
These cards and batches must be kept in the order received from the Library. The cards are interpreted, checked for correct punching and listed by batch. The Library must check the number of copies ordered and the total amount of each group or batch. After verification and corrections, the cards are returned to Data Processing for rerunning of the number of copies necessary to send with the purchase order.

New Book Processing (Figure 3)

When new books are received, the Library staff discards Shelf Card B and writes the following information on Shelf Card A for punching in the columns indicated: shelf letter in column 1 (B for biography, K for Kansas, P for professional, R for reference, S for story collection, or a blank, which indicates fiction); Dewey decimal number in columns 2-7; author number in columns 8-13; and accession number in columns 74-78. These columns are interpreted on the 548 Interpreter. Shelf Card A is used to reproduce the Book Checkout Card. Shelf cards are block sorted on column 1; each group is then sorted by author number and Dewey decimal number. Individual cards must be hand filed into the shelf list. The shelf list can be used to provide classification listings, inventory listings, library book counts and book catalogs. Book Checkout Cards are interpreted, sorted by author name (columns 14-23, alpha), returned to the Library and filed in the respective books.

Book Checkout Card Reproduction (Figure 4)

As books are checked out of the Library, the Book Checkout Card (Figure 1) is signed by the student and his number is written on it. Once a week accumulated Book Checkout Cards are sent to Data Processing to be reproduced into new Book Checkout Cards, which are interpreted and merged behind the old Book Checkout Cards. Each week's cards are kept separately. The old cards are for books due in the library in two weeks.
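The filing sequence just described (a block sort on the shelf letter in column 1, then a sort by author number and Dewey number within each block) is in effect a multi-key sort, which the 082 Sorter performed one column at a time. A sketch under assumed field names, with the key order taken from the article's wording:

```python
# Illustrative sketch of the shelf-list filing order: shelf letter first,
# then author number and Dewey decimal number within each block.
# Field names are invented for illustration.
cards = [
    {"shelf": "R", "author_no": "SM1234", "dewey": "423.1"},
    {"shelf": "B", "author_no": "JO5678", "dewey": "921.2"},
    {"shelf": "B", "author_no": "AD1111", "dewey": "921.1"},
]

shelf_order = sorted(cards,
                     key=lambda c: (c["shelf"], c["author_no"], c["dewey"]))
```

On the actual equipment this was done in reverse key order, least significant column first, since a card sorter handles only one column per pass.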
The new cards are inserted in these books as they are returned, and the old ones placed in a separate file for library circulation statistics.

Overdue Notices (Figure 5)

The Library is provided with a deck of Student Finder Cards (one for each student), with student name, number and finder number on the card and in the address file on the disk. When books are overdue, Finder Cards are pulled by the Library staff and sent to Data Processing, where they are sorted by disk accession number. Address labels are run on the 1401 computer for those students with overdue books. These labels are presently attached to pre-printed envelope overdue notices (Figure 6), but it is planned to replace the envelope with a continuous-form post card. The first notice is addressed to the student at his home and the second to his parents.

LEAVENWORTH SENIOR HIGH SCHOOL LIBRARY
If you have returned your overdue library materials, disregard this notice. IF NOT, PLEASE COME TO THE LIBRARY AT YOUR EARLIEST CONVENIENCE.
1 to 5 days overdue ..................... 2¢ per day
6 to 10 days overdue .................... 5¢ per day
Over 10 days overdue .................... 10¢ per day

Fig. 6. Overdue Notice.

DISCUSSION

The book checkout and overdue notices procedures were the first concrete ones developed. These were initiated during the 1965-66 school year, and have proved to be quite successful in saving time and effort. One of the most useful features of the Leavenworth system is that any portion of the shelf list can be easily provided for an instructor who wishes to assign special readings. Also, the system has simplified and accelerated preparation of lists for inventory purposes. The ordering process gives the librarian the opportunity to check the order lists before forwarding them to the business manager; this improves the accuracy of the order.
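The fine schedule printed on the checkout card and the overdue notice is a tiered per-day rate. The sketch below assumes the rates apply day by day (2¢ for each of days 1 through 5, 5¢ for each of days 6 through 10, 10¢ for each day thereafter); the article does not state whether a single flat rate applies to the whole period instead, so this reading is an assumption:

```python
# Illustrative reconstruction of the overdue fine schedule from the notice:
# days 1-5 at 2 cents/day, days 6-10 at 5 cents/day, over 10 days at
# 10 cents/day. Assumes the tiers accumulate day by day.
def overdue_fine_cents(days_overdue: int) -> int:
    fine = 0
    for day in range(1, days_overdue + 1):
        if day <= 5:
            fine += 2
        elif day <= 10:
            fine += 5
        else:
            fine += 10
    return fine
```

Under this reading, a book five days late owes 10¢ and one twelve days late owes 55¢.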
Standardization of procedure and operation is essential for efficiency (5,6). Basically the Leavenworth procedure utilizes two types of cards, Shelf Card A and the Book Checkout Card, which are very similar in format. Shelf Card A is initiated when ordering books and is used to reproduce Book Checkout Cards and to make shelf listings, inventory and book count listings. Moreover, the system was designed on the basis of having a minimum of skilled clerical workers. Student help is used for correcting and filing shelf cards.

The ability to provide a book catalog in the future is an advantage. A book catalog need not be confined to one area and may be done in multiple copies. Different editions of a work may be more readily seen and compared on a printed page than in a card catalog, where only one entry can be examined at a time. Also, a book catalog may concentrate in a single easily handled volume entries which would occupy several heavy drawers in a card catalog (7).

One of the problems associated with developing a system like the one here described is that of communication. As in all technical and professional areas, a specialized terminology develops, a kind of esoteric jargon which confuses meanings and impedes understanding. This difficulty naturally diminishes as each party to the cooperative effort becomes more familiar with the terminology of the other, and a little plain talking and clear thinking will soon eliminate it.

The effectiveness of an automated library program depends, of course, upon unqualified cooperation between the library and the data processing department. The librarian must establish a reasonable and acceptable schedule of work upon which the data processing department can depend, and she must assure that library material essential to that work is delivered according to schedule. Conversely, the data processing department must undertake to complete the work promptly and accurately.
EVALUATION

Certainly one of the most significant benefits of automation is the great saving of time. Tedious and detailed tasks essential to the efficient operation of any library, tasks which formerly required many hours to complete and which had by their natures to be repeated periodically, are accomplished in a fraction of the time. Consequently, the librarian is freed for more professional work; most importantly, she has more time to give to the students and their problems, which should be, above all, her first concern.

The value of the Leavenworth High School Library System lies not only in greater accuracy and saving of time for the Librarian and her staff, but also in the opportunity it provides for student help to learn and operate a system.

It is apparent, finally, that automation, properly applied, can be an invaluable asset to the school library. Like all systems it depends, in the final analysis, upon the human factors involved. So long as interests are mutual, and so long as efforts are equal, the library and data processing departments can work effectively together for the benefit of both.

ACKNOWLEDGMENTS

Mr. Jack Spear, KSU, Manhattan, Kansas, advised on the initial planning of the system. The authors received cooperation and encouragement from Mr. Gordon Yeargan, Superintendent of Schools in Leavenworth, and Mr. Dino Spigarelli, Principal of Leavenworth High School. Mr. Fred Buis, data processing instructor at the high school, helped with the preparation of this paper and is continuing to develop the potential of the system.

REFERENCES

1. McCusker, Sister Mary Lauretta: "Implications of Automation for School Libraries - Part 2," School Libraries, (Fall, 1968), 15-22.
2. United States Department of Health, Education and Welfare: Vocational and Technical Education (Washington: Government Printing Office, 1964).
3.
Markuson, Barbara Evans, ed.: Libraries and Automation (Washington: Library of Congress, 1964).
4. Elliott, Orville C.; Wesley, Robert S.: Business Information Processing Systems (Homewood, Illinois: Richard D. Irwin, Inc., 1968).
5. Laden, H. N.; Gildersleeve, T. R.: System Design for Computer Applications (New York: John Wiley & Sons, Inc., 1963).
6. Dougherty, Richard M.: "Manpower Utilization in Technical Services," Library Resources and Technical Services, 12 (Winter, 1968), 79-80.
7. Kingery, Robert E.; Tauber, Maurice F., eds.: Book Catalogs (New York: The Scarecrow Press, Inc., 1963).

FILE ORGANIZATION OF LIBRARY RECORDS

I. A. WARHEIT: International Business Machines Corporation, San Jose, California

Library records and their utilization are described and the various types of file organization available are examined. The serial file with a series of inverted indexes is preferred to the simple serial file or a threaded list file. It is shown how various records should be stored, according to their utilization, in the available storage devices in order to achieve optimum cost-performance.

One of the problems data processing people are beginning to face is the organization of library files. These are some of the largest and most voluminous files that will have to be organized, maintained and searched. They range in size from the National Union Catalog of the Library of Congress, which has over sixteen million records with an average of three hundred characters each, down to the hundreds of small college catalogs of 100,000 records. There are more than fifty universities whose holdings range from one million to over eight million volumes. The average holdings of library systems serving cities of 500,000 or more exceed two million volumes, although the actual number of titles is less. Since the turn of the century the university libraries have been growing exponentially and at present are doubling, on the average, every fifteen years.
Also the abstracting-indexing services, whose records are very similar to library catalog records and are used in much the same way, have grown very large. Chemical Abstracts, which has been operating since 1907, now has over three and a half million citations. It provides data on some three million compounds and is today adding over a quarter of a million citations each year. If the present rate of growth continues, it will be adding 400,000 citations a year by 1971. Index Medicus and Biological Abstracts are very similar, and there are a number of other somewhat smaller bibliographic services in the fields of metals, engineering, physics, petroleum, urban renewal, atomic energy, meteorology, geology, aerospace, and so on. In addition, library-type file maintenance, organization and search are being applied to medical records, adverse drug reaction reports, intelligence files, engineering drawings, museum catalogs and the like, and these too represent very large information retrieval files. In other words, library files are very widespread and are beginning to become a problem for data processing.

CHARACTERISTICS OF FILES

The aforementioned library files have certain common characteristics. First, as already noted, they are large. In the next ten or fifteen years there will probably be several hundred libraries with holdings exceeding one million volumes each. Second, the records themselves are alphabetic and tend to be voluminous. They range from two hundred characters in an index journal, to three hundred characters for the standard catalog card, up to two thousand characters for the abstract journals.
In 1962 the Library of Congress, for example, estimated that it would need a file exceeding 9 × 10⁸ bits to do its normal library processing and to store the serial records; it would need a file of 1.3 × 10⁹ bits to store the circulation records and location directory and monitor the use of the collection; and it would need a file of 10¹² bits for the central catalog and the catalog authority files (1). On the basis of library experience since 1962, these figures are generally considered too low. Third, file records are variable in length. The librarian cannot control his inputs. The world's publications appear in every shape, form and identity, and they must be recorded the way they have appeared so that they can be properly identified. Artificial identification such as book numbers, call numbers, Coden numbers for journals and the like are simply parochial conveniences and do not replace the actual bibliographic record.

Records in a large catalog file are generally stable and not dynamic. If there is a new edition of a document, a new bibliographic record is made. If the old document is retained along with the new edition, the old catalog record is also retained. The record is discarded only if the document is discarded and, in the large research library, this occurs very infrequently. New indexing or cataloging is seldom applied to old records. In contrast, the smaller item record file used for acquisition and processing, the circulation file, and the serials records file, all ranging from 10,000 to 100,000 records, are dynamic records requiring many and frequent changes, additions and deletions.

Each record item must have a number of different access points, since a single class or access point which everyone will accept is an impossibility. At present, with conventional library cataloging, card catalogs and printed indexes provide about five or six access points or records per title.
However, computer systems, with their greater opportunity to do deeper indexing, are providing from ten to twenty keys or access points per title. Distribution of index terms is very uneven and not predictable. A few terms have a great many postings or addresses, while many terms, notably author entries, have only one or two postings.

File segmentation by subject class has been proposed by some data processing personnel, but inter-disciplinary needs are such that subject segmentation is not considered very seriously. File segmentation by date, especially for the abstract services, is increasing in popularity. It is generally thought that major activity, in the technologies especially, is concentrated in current records; this is less true, however, in the sciences and even less in the humanities. Public library and undergraduate library personnel may not object to segmenting their files, but those librarians responsible for major research collections that cover all disciplines do not look with favor on segmented files.

Although circulation records do provide some clues as to the activity of the various parts of a library's collection, no one really knows what the search activity in the catalog is, or how it is distributed across the various records used. Therefore, since every record is considered permanent in libraries, major effort has been expended on input processing, which has included the recording of much material whose utility is questionable.

A user wants to access files in open language, and wants to receive response in open language; he will not use codes and so-called machine language and will tolerate only a minimum of training on methods to interrogate the file. He prefers to engage in an actual dialogue with the file, and if he cannot do this will ask a reference librarian or reader's advisor to find the references for him. He also wants real-time response.
If he doesn't get fairly prompt answers, he will go elsewhere to satisfy his informational needs.

TYPES OF FILES

The librarian must work with a number of files: 1) The item record file is the record of an item, book, journal, report, etc., that is being ordered, is on order, is being received, or is being processed by the cataloger. 2) The catalog file is the permanent bibliographic and subject record of the item that has been processed by the cataloger. 3) The serials record file, which is in two parts, is the record of holdings of completed volumes, both bound and unbound, and the check-in record of currently received periodical issues. 4) The circulation control file keeps the record of all items loaned or otherwise charged out. 5) The catalog authority file is the thesaurus-like vocabulary control which indexers and catalogers use as their authority list and guide in assigning index terms. It is also used to "normalize" the inquiries of a searcher and convert them to legitimate index terms.

The librarian is also concerned with a number of indexed abstracts produced by various discipline-oriented institutions which are used in libraries. He also uses a number of special files: borrower or patron file, special collection files, location files, vendor files, and the like. Except for a few comments about the item record, this discussion is confined primarily to the catalog file, which is by far the largest file and, for the librarian and the general user, the most important. As already noted, in most respects it is very similar to the indexed abstract file and, in fact, in certain special libraries, these two files are combined.

IN PROCESS FILE

The in process, or item record, file consists of records of all items which the library is acquiring and processing. It is not a very large file, or, at least if properly policed, should not be.
Unfortunately, because in manual systems it is difficult continuously to follow up outstanding orders, a lot of deadwood accumulates and files become unnaturally large and difficult to handle. In a well controlled file, however, the number of records does not grow appreciably, for, although new items are added, processed titles are removed when they are added to the catalog file.

In addition to providing such normal bibliographic access points as personal author, corporate author, title, report number and the like, the item record may also be searched by a number of specialized keys: order number, vendor, publisher, journal code, contract number, fund, requester. The item record is very dynamic. Information available to the librarian when the order for an item is placed may be faulty. New information will be coming in about the item, such as price, shipping costs, invoice number, change in vendor, and change in title. Various funds have to be charged and obligations changed, payments authorized, funds decremented, receipt notices prepared and sent to requesters, flags in various files changed to prevent duplicate orders, and the bibliographic record transmitted to the cataloging staff. However, once an item has been received and cataloged, only the bibliographic information (author, title, place, publisher, date, pagination) is retained, and the rest of the information is retired to an historical file (2).

Because it would provide greater flexibility as new and unexpected demands are generated, the best way to handle this dynamic file would be with a generalized data management system rather than with a tailor-made acquisitions and processing program. Although present data management systems are really not suitable, because of variable length records in item record files and because terminals will be used, it appears that some could be adapted.

CATALOG FILE

The tendency today, however, is to build a single master file with various functional fields where bibliographic information, ordering and purchasing data, loan records, location information and other item control data are stored. How should this very large master catalog file be organized so that it will be easy and economical to maintain and provide all the desired search capabilities?

There are three basic file organization schemes in use today for information retrieval: the serial file, the inverted file and the list process file (3,4,5). Actually, from a technical point of view, the inverted file and the list process file represent two different classes of list structures and are, therefore, sometimes referred to as the inverted list system and the threaded list system.

Serial File Organization

Although the serial file is the easiest and cheapest to maintain, the librarian obviously cannot accept purely serial searching of his catalog. The file is much too big, and the real time requirements are such as to rule out any but the shortest, simplest serial or sequential search. As will be pointed out later, the librarian does need some serial searching capability, and of course he does need it if he wants to do any browsing. However, if he is to provide any kind of useful service, he must use direct-access storage devices and access his records individually.

Threaded List File Organization

For a while there was some interest in using a threaded list file organization for the catalog file. Here, the searcher is first directed through a dictionary or directory to the latest record associated with a term. This record also contains the chain address of the previous record having the same descriptor, so that a user can run through a "chain" or "list" until he reaches the oldest or last record, or comes back full circle to the starting record.
Each record belongs to a number of lists, one for each descriptor used to describe it, and there are as many lists as there are descriptors. Such a system seems economical of storage space in that a secondary or separate index does not have to be stored, but, since storage space for the chain or link address has to be provided, the actual savings are very small. There are several possible refinements of this list file organization which reduce storage costs. Some involve elimination of redundant information; a term, or any other searchable piece of information, is stored just once, sometimes in the form of a table. Each record that contains searchable information has a pointer to the term itself. There have to be, of course, pointers from every term back to the records as well. Insofar as the pointers may require fewer bits than the terms or addresses themselves, there is a saving in storage space. It does cost some additional processing time, and file maintenance is somewhat complicated (6).

Another economy measure is provided by what is generally called a multilist system, which groups several descriptors (usually three) into one super key with one chain address. A multilist not only saves space but also speeds both file posting and searching by processing multiple descriptors simultaneously (7,8,9,10,11). Such a system, to be workable, must permit grouping of various descriptors into mutually exclusive groups, and within each group there must be some equitable distribution of descriptors posted to records. In normal library information retrieval applications, a very large percentage of the descriptors are used for just one or two documents, and only a few descriptors are used to identify a large number of document records. In other words, most of the so-called super keys end up having just a single real descriptor, which is equivalent to establishing a separate list for each descriptor.
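The chain mechanism described above can be made concrete with a minimal sketch: the directory points each descriptor at the newest record posted under it, and each record stores, per descriptor, the address of the previous record in that descriptor's chain. All record addresses, titles and descriptors below are invented for illustration, and the in-memory dictionaries stand in for what would actually be disk storage:

```python
# Minimal threaded-list (chained-list) sketch. The directory holds the
# head of each descriptor's chain; each record holds one backward link
# per descriptor (None terminates the chain).
directory = {}   # descriptor -> address of newest record with that descriptor
records = {}     # address -> {"title": ..., "links": {descriptor: prev_address}}

def add_record(addr, title, descriptors):
    links = {}
    for d in descriptors:
        links[d] = directory.get(d)  # chain back to the previous posting
        directory[d] = addr          # this record becomes the list head
    records[addr] = {"title": title, "links": links}

def search(descriptor):
    """Walk one descriptor's chain, newest record to oldest."""
    hits, addr = [], directory.get(descriptor)
    while addr is not None:
        hits.append(addr)
        addr = records[addr]["links"][descriptor]
    return hits
```

Note what the sketch makes visible: the directory alone can say how long one chain is, but intersecting two descriptors requires walking a chain record by record, which is exactly the limitation the article raises against threaded lists below.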
In a test made with the Defense Document Center collection it turned out that about ninety percent of the super keys had only single descriptors (12,13). There are, in addition, special modifications of multilist files which essentially involve segmenting the multilist to fit the hardware, for example, the track length or cylinder size (14). A fragmented sub-list, sometimes referred to as a cellular multilist, may even contain all the link addresses in the directory, thus becoming indistinguishable from an inverted file.

Any list process file organization, however, does pose serious file maintenance problems, especially where individual records must be changed or deleted. Also, special precautions must be taken to avoid broken chains and provision made to repair breaks, although some advocates of list process files claim it is easier to maintain threaded lists than inverted lists. Of course, if multilists are used, a special effort must be made to build the super keys.

It must not be forgotten that a threaded list directory can only provide the search statistics for a single term and, unlike the inverted list, can only provide intersection statistics upon completion of a total search. The few librarians who have been exposed to threaded list file organization have not reacted favorably. A few have been interested in applying this technique to do hierarchical searches and other relationship connections in their authority lists or thesauri, but have not seriously considered using it for their catalog files.

Inverted File Organization

The traditional library file organization, as exemplified by the standard card catalog, has been based on a serial main file plus an inverted file. Here a normal serial file is "inverted" and the file sequenced by index entry or key. The record itself is duplicated under each of its keys, which librarians call tracings.
By strictly limiting the number of tracings or keys applied to each record, the librarian can keep the card catalog down to a reasonable size. However, as deeper indexing is applied to the documents, more keys or tracings are used and the file becomes very large.

Journal of Library Automation Vol. 2/1 March, 1969

Furthermore, storage costs in the mechanized file are appreciably higher than in an ordinary manual card file. The full record, therefore, in a mechanized system cannot be economically stored behind each term. Only the document or record number or file address of the master record is recorded after each term; in other words, the inverted file is just an index to the record file. The main record file itself is a simple serial file where each record is complete in itself, the tracings or keys in the record and the address of the record being duplicated on the inverted file. The catalog file, therefore, is made up of two parts: a serially organized main or master record file, and an inverted index to the main file (15).

Maintenance of an inverted index is expensive. Tracings and the addresses to which they refer have to be duplicated, requiring costly additional storage space. New terms and new addresses cannot simply be added to the end of a file but must be distributed and interfiled throughout the index, causing a number of file maintenance problems. The inverted index and main serial file must be kept in phase, with changes in one being reflected in the other. To maintain these files, separate inputs should not be prepared; instead, the inverted index should be generated from the main record file update by program control (16,17,18,19). Although the combined file organization of a serial record file and an inverted index does cost more to maintain than serial or list file organization, it provides such superior search capabilities that it has become the favored library catalog file organization.
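The two-part combined file can be sketched as follows. This is an illustrative Python rendering, with invented record addresses and field names; the one point it is meant to demonstrate is the text's rule that the inverted index is generated from the main record file by program, rather than maintained from separate inputs.

```python
# Sketch of the combined file: a serial main file of complete records, plus
# an inverted index (term -> list of record addresses) generated from it by
# program control. Addresses, titles, and tracings are illustrative only.

main_file = {
    101: {"title": "Cataloging rules", "tracings": ["cataloging", "rules"]},
    102: {"title": "Indexing depth",  "tracings": ["indexing", "cataloging"]},
    103: {"title": "Serials control", "tracings": ["serials"]},
}

def generate_inverted_index(main_file):
    """Derive the inverted index from the main file, keeping the two in phase."""
    index = {}
    for address, record in main_file.items():
        for term in record["tracings"]:
            index.setdefault(term, []).append(address)
    return index

index = generate_inverted_index(main_file)
print(sorted(index["cataloging"]))   # [101, 102]
```

Because the index is always regenerated from the master file, any change to a record's tracings is reflected in the index on the next update run, which is exactly the in-phase maintenance the text calls for.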
Since the inverted file is organized by subject headings or descriptors, and since a search request is specified by listing the desired descriptors and their logical relationships, the search programs need only examine the items filed behind each selected descriptor or subject heading. It is unnecessary to look at all the records, as it is with the serial file. The inverted file search, in its basic form, takes the request descriptors, obtains the list of record addresses or items under each relevant descriptor, makes the specified logical connections, and produces all items satisfying the request. The search procedure examines only potentially pertinent records, ignoring the rest of the file. In other words, the file is organized every time a search is made to suit the requirements of the search. Thus, the file and the request are compatible, and utilization of the file is essentially independent of its size.

An inverted index provides a very special capability to a searcher who is using a terminal, on-line system. He can test both individually and collectively the effectiveness of the terms of his search statement without having to make a complete search of the master record, simply by examining the inverted index. The system will tell him, for example, the number of entries under a term. It will tell him how many entries several terms share in common so that he can test the intersections, that is, the conjunction and disjunction of the terms. The count of addresses that results from the list intersection can be returned immediately to the terminal as an upper limit of the number of hits. In effect, decoding of the Boolean expression takes place in the inverted index, which is a very compact list, and hence the response time is fast. It is true that some additional calculations and comparisons in the record itself may reduce the number of hits, but they will never increase them.
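The decoding of a Boolean request against the inverted index alone can be sketched as follows (Python; the index contents are invented for the example). The address lists for the request terms are intersected (conjunction) or merged (disjunction), and the count of the result can be returned to the terminal as an upper limit on the number of hits before any master record is read.

```python
# Sketch of Boolean decoding in the inverted index: AND intersects the
# posting lists, OR merges them; no master record is touched. The count of
# the intersection is the upper limit on hits mentioned in the text.
# Index contents are illustrative only.

index = {
    "cataloging": {101, 102, 105},
    "automation": {102, 105, 109},
    "serials":    {103, 105},
}

def and_search(index, terms):
    """Conjunction: addresses filed under every requested term."""
    lists = [index.get(t, set()) for t in terms]
    return set.intersection(*lists) if lists else set()

def or_search(index, terms):
    """Disjunction: addresses filed under any requested term."""
    return set().union(*(index.get(t, set()) for t in terms))

hits = and_search(index, ["cataloging", "automation"])
print(len(hits), sorted(hits))   # 2 [102, 105]
```

Further qualification against the full records can only shrink this hit set, never enlarge it, which is why the count from the index is a safe upper bound to show the searcher immediately.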
Sitting at a terminal, a searcher can ask the system what will be the maximum number of hits he will get in response to a search statement. He can change the parameters of his search statement and see immediately what effect that will have on the response of the system. It is primarily because of this capability of the user to have a dialog with the machine that every terminal-oriented library information retrieval system, at least of which the author is aware, is adopting an inverted file organization.

In order to reduce storage costs, not every search term need be carried on an inverted index. Those search terms or index entries that are practically never searched alone, but used rather in conjunction with another term or tracing, are carried only in the main file and not on the inverted index. In a library catalog these terms are usually the place and date of publication, publisher, language of the book, level of the publication (i.e., adult, children, youth), number and type of illustrations, and so on. These terms appear on almost every record, and some of them are high density terms; that is, they are heavily posted. For example, in a typical U.S. library, some eighty percent of the books are identified as being in English. Form headings (bibliography, essay, poem, biography, map, etc.), geographic headings, and numerics that are used in conjunction with what are called main headings also do not appear on the inverted index, but can be searched in the main file. In the very unlikely event that a search is required to be made only for a term not on the inverted file, then, of course, a serial search can be made of the master file. In some systems, a very compact serial file of data may simplify serial searching of the master file.

PHYSICAL ORGANIZATION

A basic understanding of how a library's records are used is necessary to a proper plan for their physical organization.
In a manual system, logical organization and physical organization of a library's records are identical. Furthermore, all files are physically the same, usually on 3x5 catalog cards or, in a few cases, in printed book or sheaf catalogs. In a computer system, however, because of varying capacities, speeds and storage costs of different direct access devices, it is extremely important that the various records and segments of records be stored in those devices which will give the best cost-performance for the application. This means that the rate of utilization of the various records and parts of records, as well as the size of the records, will determine what types of devices will be used as physical files.

In a library operation there is very heavy use of index terms, or subject headings and author entries, to search the files; records for these entries can be very short. Borrower records and charge-out records in circulation control systems are also very actively used. There is less use made of the bibliographic record or journal citation. These records are somewhat longer than the subject and author tracings, and hence require more storage, but do not need such rapid access. Notes, abstracts and other explanatory material can require an enormous amount of storage space but, as a rule, are used only infrequently. Patron registration, as contrasted with borrower records, is used much less frequently, unless, of course, the two types of records are combined. Since serials holdings records do not change very frequently, printouts are quite satisfactory as finding tools and the records are usually kept off-line. Journal check-in, however, requires a great number of accesses every day.
In view of the requirements generated by the above uses, the present thinking for on-line library systems, in terms of current hardware, runs something like this: In a combined file system as described above, with the bibliographic record on the serially organized main file and the index in an inverted file arrangement, the inverted file, which must be accessed many more times than the main file, would best be carried on disk files. The bibliographic record itself, being much more voluminous and accessed less frequently, is stored in a larger, slower, more economical file like the IBM 2321, the Data Cell. Abstracts and other seldom used bulk records might well be on tape, off line. Actually, though, as libraries build up their record files to control their total collections, they will, of course, exceed the capacity of the present Data Cells and will have to go to future mass memory devices similar to the IBM Photo Digital Storage System. Then it may be economical to put even abstracts and notes of the bibliographic record on line.

If there is a separate item record file of in-process or acquisitions data, it can be handled in the same way as the catalog file, that is, all access points as an inverted file on disk with the record itself on the Data Cell strips. If, however, the total item record file is not too big, it might well be stored on disk. Circulation control records are carried on disk, but patron registration, if it is to be kept on line, would be more economically stored in the Data Cell.

The authority list or thesaurus really has two functions. It is heavily used to validate and convert all inputs and all search requests. It is also used to store all cataloging and indexing decisions and to provide guides to users as to the formulation of search queries. The necessary data makes for long records that are either infrequently used or available as printouts.
Therefore, a condensed form of the authority list or thesaurus, a form which carries only the terms and their equivalents, is best stored on disk, whereas the full-blown authority list, which is used primarily for printing the thesaurus and its supplements, can be carried off-line on tape, or in the cheapest, biggest and slowest direct access device which is available.

In order to achieve economical, compact storage, the subject headings, descriptors or index terms would not be stored in open language but in numeric codes. By using, for example, the decimal code as used in a Dewey decimal system, numeric codes would also make it possible economically to build hierarchies or class tables with the descriptors. It would be necessary, therefore, in every transaction, to translate from open language to code when interrogating the system and to translate from code to open language when outputting from the system. Translations would have to be very fast to accommodate the traffic of a large number of terminals. The translation job, using a stored table, might have to be done in an auxiliary, large core storage, which is very fast but more expensive than disk files.

As a general rule, what is being proposed is that for very large files the index and the bibliographic record are not to be stored in the same device. One might start this way until the file and the traffic into it are built up and the system becomes fully operational. However, the system should be so structured that indexes could be stored in files that are faster than the bulk storage devices used for the records. The translation files, that is, the tables that convert from open language to stored codes on input and the reverse on output, can be stored in the fastest available external storage (20).

It is extremely doubtful that hardware development in the immediate future will change these principles of library file organization very much.
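The open-language/code translation tables described above might be sketched as follows. This is an illustrative Python fragment; the terms and Dewey-like codes are invented, and the prefix test for hierarchy is only one simple way of realizing the class tables the text mentions.

```python
# Sketch of the two translation tables: open-language terms stored as
# numeric codes, with a Dewey-like decimal code making a hierarchy search
# a simple prefix test. Terms and codes are invented for illustration.

term_to_code = {
    "science": "500",
    "mathematics": "510",
    "arithmetic": "513",
    "astronomy": "520",
}
code_to_term = {c: t for t, c in term_to_code.items()}

def encode(term):
    return term_to_code[term]          # applied to inputs and search requests

def decode(code):
    return code_to_term[code]          # applied to outputs

def narrower(code, universe):
    """Codes filed under a class: decimal hierarchy reduces to a prefix match."""
    prefix = code.rstrip("0")
    return [c for c in universe if c.startswith(prefix) and c != code]

print(decode(encode("mathematics")))                    # mathematics
print(sorted(narrower("510", term_to_code.values())))   # ['513']
```

In an actual system both tables would live in the fastest available storage, since every transaction passes through them once on input and once on output.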
As storage costs drop, total capacities increase, and access times become shorter, more and more libraries will find it practical and economical to put their files on line in order to provide the improved services that users demand.

REFERENCES

1. U.S. Library of Congress: Automation and the Library of Congress (Washington: Government Printing Office, 1963), p. 74.
2. Batts, N. C.: "Data Analysis of Science Monograph Order/Cataloging Forms," Special Libraries, 57 (October 1966), 583-586.
3. "Corporate Data File Design," EDP Analyzer, 4 (December 1966).
4. Climenson, W. D.: "File Organization and Search Techniques," Annual Review of Information Science and Technology, 1 (New York: Interscience, 1966), p. 50.
5. Borko, H.: "Design of Information Systems and Services," Annual Review of Information Science and Technology, 2 (New York: Interscience, 1967), p. 50.
6. Castner, W. G., et al.: "The Mecca System - A Modified List Processing Application for Library Collections," Proceedings - A.C.M. National Meeting (1966), pp. 489-498.
7. Prywes, N. S., et al.: The Multi-List System (Philadelphia: Moore School of Electrical Engineering, University of Pennsylvania, Technical Status Report No. 1 under Contract NOnr 551(40), November 1961).
8. Prywes, N. S.; Gray, H. J.: "The Multi-List System for Real Time Storage and Retrieval," IFIP Congress Proceedings, 1962, pp. 112-116.
9. University of Pennsylvania, Moore School of Electrical Engineering: The Tree as a Stratagem for Automatic Information Handling (Report of Work under ... Contract NOnr 551(40) and ... AF 30(602)-2832, Moore School Report No. 63-15, 15 December 1962).
10. Lefkovitz, D.: Automatic Stratification of Descriptors (Philadelphia: Moore School of Electrical Engineering, University of Pennsylvania, Technical Report under Contract NOnr 551(40), Moore School Report No. 64-03, 15 September 1963).
11. Landauer, I.: "The Balanced Tree and its Utilization in Information Retrieval," IEEE Transactions on Electronic Computers (December 1963), pp. 863-871.
12. UNIVAC Division of Sperry Rand Corporation: Multi-List Systems: Preliminary Report of a Study into Automatic Attribute Group Assignment; Technical Status Report No. 1-2, 3 (AD 609 709), 4 (AD 609 710), 1963-1964.
13. UNIVAC Division of Sperry Rand Corporation: Optimization and Standardization of Information Retrieval Language and Systems; Final Report (AD 630-797, 1966).
14. Lefkovitz, D.: File Structures for On-Line Systems (class syllabus).
15. Curtice, R. M.: Magnetic Tape and Disc File Organizations for Retrieval (Lehigh University, Center for Information Sciences, July 1966).
16. Warheit, I. A.: "The Direct Access Search System," AFIPS Conference Proceedings, 24 (1963), pp. 167-172.
17. Warheit, I. A.: The Combined File Search System. A Case Study of System Design for Information Retrieval (paper presented at the F.I.D. Meeting in Washington, D.C., October 15, 1965; Abstract, 1965 Congress, International Federation for Documentation (FID), Washington, D.C., U.S.A., 10-15 October 1965), p. 92.
18. Prentice, D. D.: The Combined File Search System (San Jose, California: IBM, June 15, 1964).
19. 1401 Information Storage and Retrieval System - Version II; The Combined File Search System, No. 10.3.047 (Hawthorne, New York: IBM, May 1, 1966).
20. Warheit, I. A.: File Organization for Libraries; Report to Project Intrex, MIT, Cambridge, Massachusetts, March 14, 1968.

A FAST ALGORITHM FOR AUTOMATIC CLASSIFICATION

R. T. DATTOLA: Department of Computer Science, Cornell University, Ithaca, New York

An economical classification process of order n log n (for n elements), which does not employ n-square procedures. Convergence proofs are given, and possible information retrieval applications are discussed.

Many methods exist for ordering or classifying the elements of a file.
The elements are usually clustered into groups based on the similarities of the attributes of the elements. In information retrieval, the elements are frequently documents, and the attributes are words or concepts characterizing the documents. Classification of document files may be divided into two basic categories: 1) an a priori classification already exists, and each document is placed into the cluster whose centroid is most similar to that document; 2) no a priori classification is specified, and clusters are formed only on the basis of similarities among documents.

Classification schemes that fall into the first class are very common and often involve manual methods. For example, new acquisitions of a library are classified by placing them into the clusters of a standard a priori classification. Problems of the second type are usually more difficult to handle, and automatic or semi-automatic methods are often used. Methods of this type are widely used in statistical programs, but the number of elements in the file is limited to several hundred or, at most, a few thousand items. In information retrieval applications, the number of elements may approach several hundred thousand or even a million documents, as in the case of a large library. In the present study, a method is described which is suitable for classification of very large document collections.

THE N² PROBLEM

Current methods of automatic document classification usually require the calculation of a similarity matrix. This matrix specifies the correlation, or similarity, between every pair of documents in the collection. Thus, if the collection contains N documents, N² computations are required for calculation of the similarity matrix; however, the similarity matrix is often symmetric, so the number of computations is reduced to N²/2.
This immediately poses two serious problems: the storage space necessary to store the matrix increases as the square of the number of documents, and the time required to calculate the matrix also increases quadratically. Fortunately, document-document similarity matrixes are normally only about ten percent dense, and only the non-zero elements need be stored (1). However, as N increases, auxiliary storage must eventually be used, and although this solves the space problem, it also magnifies the time problem. To illustrate the magnitude of this problem, suppose that it takes one hour of computer time to classify a one-thousand document collection. Then for N = 10^4, the time is approximately one hundred hours, and for N = 10^6, the time needed is about 120 years! The classification scheme described in this paper is an adaptation of the one proposed by Doyle, and the time required is of the order of N log N (2). For example, assuming the logarithm has base 10, and the time required for a one-thousand document collection is again one hour, then for N = 10^4 the time is 13 hours, and for N = 10^6, the time required is about 83 days.

DOYLE'S ALGORITHM

The N² problem is avoided in this classification scheme, because a similarity matrix is never computed. Assume the document set is arbitrarily partitioned into m clusters, where S_j is the set of documents in cluster j. Associated with each set S_j are a corresponding concept vector C_j and frequency vector F_j. The concept vector consists of all the concepts occurring in the documents of S_j, and the frequency vector specifies the number of documents in S_j in which each concept occurs. Every concept in C_j is assigned a rank according to its frequency; i.e., concepts with the highest frequency have a rank of 1, concepts with the next highest frequency receive a rank of 2, etc. Given an integer b (base value), every concept in C_j is assigned a rank value equal to the base value minus the rank of that concept.
The vector of rank values is called the profile P_j of the set S_j. Tables 1 and 2 illustrate the concept and frequency vectors, and the corresponding profiles, for a sample document collection (base value = 6).

Starting from a partition of the document set into m clusters, the profiles are generated as described. Every document d_i in the document space is now scored against each of the m profiles by a scoring function g, where g(d_i, P_j) = the sum of the rank values of all the concepts from d_i which occur in C_j. Tables 3 and 4 show the results of scoring the documents in the sample collection against the profiles from Table 1 (cut-off = 10).

Table 1. Concept Vectors

Table 2. Initial Clusters, Profiles, and Frequencies

Table 3. Document Scoring

Table 4. Clusters Resulting from Document Scoring

Given a cut-off value T, a new partition of the document set into m+1 clusters is made by the following formula:

    S_j' = {d_i | g(d_i, P_j) ≥ g(d_i, P_k) for k = 1, ..., m, and g(d_i, P_j) ≥ T}.

Thus, S_j' consists of all the documents that score highest against profile P_j, provided that the score is at least as great as T. In cases where a document scores highest against two or more profiles, say P_r1, ..., P_rn, the following tie-breaking rule is used: if d_i ∈ S_j and, a) j = r_k for some k, 1 ≤ k ≤ n, then d_i is assigned to (S_rk)'; b) j ≠ r_k for 1 ≤ k ≤ n, then d_i is arbitrarily assigned to (S_r1)'. Those documents which do not fall into any of the m clusters S_j' are called loose documents, and they are assigned to a special class L.
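The profile generation, scoring, and partitioning just described can be sketched as follows. This is a minimal Python rendering of the text, not Dattola's program; the sample documents are invented, and where the text allows an arbitrary tie-break, this sketch simply takes the lowest-numbered profile.

```python
# Sketch of one Doyle-style step: build each cluster's profile (rank value =
# base value minus frequency rank), score every document with g, and
# partition with cut-off T; documents scoring below T become loose.

def make_profile(cluster_docs, base_value):
    """cluster_docs: list of concept sets. Returns {concept: rank value}."""
    freq = {}
    for concepts in cluster_docs:
        for c in concepts:
            freq[c] = freq.get(c, 0) + 1
    # concepts with equal frequency share the same rank (1 = most frequent)
    distinct = sorted(set(freq.values()), reverse=True)
    rank = {f: r + 1 for r, f in enumerate(distinct)}
    return {c: base_value - rank[f] for c, f in freq.items()}

def g(doc, profile):
    """Sum of rank values of the document's concepts found in the profile."""
    return sum(profile.get(c, 0) for c in doc)

def partition(docs, profiles, T):
    """Assign each document to its highest-scoring profile if that score >= T."""
    clusters = {j: [] for j in range(len(profiles))}
    loose = []
    for i, doc in enumerate(docs):
        scores = [g(doc, p) for p in profiles]
        best = max(scores)
        if best >= T:
            clusters[scores.index(best)].append(i)  # ties: lowest j wins here
        else:
            loose.append(i)
    return clusters, loose

docs = [{1, 2, 3}, {1, 2}, {4, 5}, {4, 5, 6}, {7}]
profiles = [make_profile(docs[:2], 6), make_profile(docs[2:4], 6)]
clusters, loose = partition(docs, profiles, T=8)
print(clusters, loose)   # {0: [0, 1], 1: [2, 3]} [4]
```

Note that no document-document similarity is ever computed: each of the N documents is scored against only m profiles per iteration, which is the source of the method's economy.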
The process is now repeated after replacing P_j by P_j'. The iteration continues until the termination condition is satisfied, which states that P_j' = P_j for j = 1, ..., m; i.e., the profiles are unchanged after two consecutive iterations.

SATISFACTION OF TERMINATION CONDITION

A) Non-Convergence of Doyle's Algorithm

Doyle's algorithm as described is not guaranteed to terminate. To illustrate this, consider a collection of nine documents d1-d9, each described by a list of concepts numbered from 1 to 13. Let S1 = [d1-d6] and S2 = [d7-d9], and let P1 = profile of S1 and P2 = profile of S2. The two profiles are as follows (base value = 7):

d1-d6:
Concept:    1  2  3  4  5  6  7  8  9  10  11  12  13
Frequency:  3  2  4  3  3  2  3  2  2   0   2   2   0
Profile:    5  4  6  5  5  4  5  4  4   -   4   4   -

d7-d9:
Concept:    1  2  3  4  5  6  7  8  9  10  11  12  13
Frequency:  1  3  1  3  2  2  3  2  1   3   0   0   1
Profile:    4  6  4  6  5  5  6  5  4   6   -   -   4

Now assume that T = 0, and partition the document set by the formulae S1' = {d_i | g(d_i,P1) ≥ g(d_i,P2)} and S2' = {d_i | g(d_i,P2) ≥ g(d_i,P1)}. The results are summarized in the following table:

      g(d_i,P1)  g(d_i,P2)
d1       29         25
d2       22         14
d3       23         19
d4       20         21
d5       19         20
d6       19         20
d7       28         37
d8       27         39
d9       28         42

Therefore, S1' = [d1-d3] and S2' = [d4-d9]. According to Doyle's algorithm, P1 is replaced by P1' and P2 by P2'. The new profiles are:

d1-d3:
Concept:    1  2  3  4  5  6  7  8  9  10  11  12  13
Frequency:  2  1  2  1  2  1  1  1  1   0   2   2   0
Profile:    6  5  6  5  6  5  5  5  5   -   6   6   -

d4-d9:
Concept:    1  2  3  4  5  6  7  8  9  10  11  12  13
Frequency:  2  4  3  5  3  3  5  3  2   3   0   0   1
Profile:    3  5  4  6  4  4  6  4  3   4   -   -   2

Now the document set is again partitioned and the results are:

      g(d_i,P1)  g(d_i,P2)
d1       34         22
d2       29         11
d3       27         17
d4       21         20
d5       22         17
d6       21         18
d7       31         32
d8       31         33
d9       32         34

Therefore, S1' = [d1-d6] and S2' = [d7-d9].
These are the original sets, so that the algorithm will never terminate for this example.

B) Termination of Modified Algorithm

Although Doyle's algorithm is not guaranteed to terminate, Needham proved that similar types of iterative methods are guaranteed to terminate in a finite number of steps (3). A small change in Doyle's method produces an algorithm that is guaranteed to terminate. The modification occurs after the calculation of the S_j'. Instead of automatically replacing the old P_j by P_j', the following condition must also be satisfied:

    Σ_{d_i ∈ S_j'} g(d_i, P_j') > Σ_{d_i ∈ S_j'} g(d_i, P_j)

If the above condition is not satisfied, P_j is left unchanged. Before proving that this new algorithm is guaranteed to terminate, it is desirable first to make the algorithm more general by allowing overlap between the clusters. The following theorem proves the termination of a method which allows overlapping clusters.

Theorem: Let the subscript n designate the nth iteration. Let D represent the document space and let P_{0,1}, ..., P_{0,m} represent m initial profiles corresponding to an arbitrary distribution S_{0,1}, ..., S_{0,m} of documents in D. Given a cut-off value T, the nth iteration is defined as follows:

1) Generate the sets S_{n,1}, ..., S_{n,m} and L_n by
       S_{n,j} = {d_i | g(d_i, P_{n-1,j}) ≥ T},
       L_n = (loose documents).
2) Let P'_{n,j} be the profile generated from S_{n,j}, and let
       P_{n,j} = P'_{n,j} if Σ_{d_i ∈ S_{n,j}} g(d_i, P'_{n,j}) > Σ_{d_i ∈ S_{n,j}} g(d_i, P_{n-1,j}),
       P_{n,j} = P_{n-1,j} otherwise.

This algorithm is guaranteed to terminate in a finite number of iterations, where termination occurs when P_{n,j} = P_{n-1,j} for all j.

Proof: Extend the document space D to a new document space D# containing m distinguishable copies of every document in D. Also, add the condition that S_{n,j} can never contain more than one copy of each document. Clearly, any S_{n,j} defined on D# in this manner can also be represented on D as defined in the theorem. Conversely, any S_{n,j} defined on D can also be represented on D# as defined above.
Thus, it suffices to prove the theorem on D# under the added condition. Define a function F_n, which will be shown to be monotone increasing in n, by the following:

    F_n = Σ_{j=1}^{m} F_{n,j} + T·Z_n, where
    F_{n,j} = Σ_{d_i ∈ S_{n,j}} g(d_i, P_{n-1,j}) and
    Z_n = number of documents in L_n.

After step 2 of the iteration, F_n is replaced by F_n', where

    (F_{n,j})' = Σ_{d_i ∈ S_{n,j}} g(d_i, P_{n,j}).

If for any j, P_{n,j} ≠ P_{n-1,j}, then (F_{n,j})' > F_{n,j} (this statement is not necessarily true in Doyle's algorithm), and therefore F_n' > F_n. If termination occurs, i.e., P_{n,j} = P_{n-1,j} for all j, then F_n' = F_n. For the (n+1)th iteration,

    F_{n+1} = Σ_{j=1}^{m} F_{n+1,j} + T·Z_{n+1}, where
    F_{n+1,j} = Σ_{d_i ∈ S_{n+1,j}} g(d_i, P_{n,j}).

Consider the relation between the contribution of d_i to F_n' and F_{n+1}, and note that each d_i (where copies of a document are distinct) contributes once and only once to both F_n' and F_{n+1}. This relation is summarized in the following table:

Document d_i                                   Contribution to F_{n+1} vs. F_n'
a) was assigned to S_{n,j} and now
   1) to S_{n+1,j} (did not change clusters)    equal: g(d_i, P_{n,j}) in both
   2) to S_{n+1,k}, k ≠ j (changed clusters)    greater: g(d_i, P_{n,k}) ≥ T > g(d_i, P_{n,j})
   3) to L_{n+1}                                greater: T > g(d_i, P_{n,j})
b) was assigned to L_n and now
   1) to L_{n+1}                                equal: T in both
   2) to S_{n+1,j}                              greater or equal: g(d_i, P_{n,j}) ≥ T

In every case the contribution of d_i to F_{n+1} is at least as great as its contribution to F_n', so that F_{n+1} ≥ F_n' ≥ F_n. The function F_n is therefore monotone increasing in n, and since it can assume only a finite number of values, the profiles must eventually remain unchanged; i.e., the algorithm terminates.

THE MODIFIED ALGORITHM

The algorithm actually implemented modifies the method of the theorem as follows:

1) the new profile P'_{n,j} always replaces P_{n,j}, without testing the sum condition of the theorem;
2) termination is declared when the sets S'_{n,j} of documents scoring highest are unchanged after two consecutive iterations;
3) a document d_i with highest score H_{n-1,i} = max_j g(d_i, P_{n-1,j}) is assigned to S_{n,j} if g(d_i, P_{n-1,j}) ≥ T_{n-1,i}, where
       T_{n-1,i} = H_{n-1,i} − α(H_{n-1,i} − T), with 0 ≤ α ≤ 1, if H_{n-1,i} ≥ T,
   and d_i is assigned to L_n otherwise;
4) if any S_{n,j} contains fewer than two documents, then S_{n,j} is eliminated, thereby reducing the number of clusters by one.

The advantages of this method over the one defined in the theorem are discussed in the present section; the disadvantage is, of course, that termination is not guaranteed. To show this, note that conditions 1) and 2) above are equivalent to the termination condition in Doyle's algorithm, since in Doyle's method P_{n,j} always corresponds to the new partition S_{n,j}, and S_{n,j} = S'_{n,j} (no overlap is allowed). Also, if α = 0 in condition 3, then T_{n-1,i} = H_{n-1,i}. Thus, only those documents d_i that score highest against P_{n-1,j}, where H_{n-1,i} ≥ T, are assigned to S_{n,j}.
Therefore, with α = 0 this method is equivalent to Doyle's algorithm.

The first two modifications are implemented to improve the efficiency of the program. Although convergence is no longer guaranteed, all the experiments tried so far have in fact always terminated. Programs without these two modifications run about twice as slow. Also, in cases where the overlap is not too high (S'_{n,j} ≈ S_{n,j}), the new termination condition is usually equivalent to the one used in the theorem. That is, when S'_{n,j} = S'_{n+1,j}, then very often S_{n,j} = S_{n+1,j}.

The third modification does not improve efficiency, but it allows a more flexible and, intuitively, a more desirable method for creating overlap. The algorithm described in the theorem assigns a document d_i to a cluster S_{n,j} if g(d_i, P_{n-1,j}) ≥ T. This has two major disadvantages: 1) the overlap cannot be increased independently of the number of loose documents; increasing the overlap by lowering T in general decreases the percentage of loose documents; 2) the difference between d_i's highest score and d_i's second highest score is ignored; e.g., if T = 50, g(d_i,P1) = 200, and g(d_i,P2) = 50, then d_i is assigned to both S1 and S2.

The first problem decreases the flexibility of the algorithm, since the amount of overlap and the percentage of loose documents cannot be varied independently. The example in the second part illustrates the other problem. It seems desirable that a document should be assignable to two or more clusters when it scores equally (or almost equally) as high against all of them. The previous method does not take this fact into account. In the new algorithm, documents are assigned to more than one cluster on the basis of how close the score is to the highest score, relative to the cut-off value T. The parameter α determines how close to the highest score the other scores must be. When α = 0, no overlap occurs, while α = 1 generates the maximum amount of overlap.
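The α-controlled overlap rule described above can be sketched as follows, assuming the threshold T_{n-1,i} = H − α(H − T) discussed in the text, where H is the document's highest score; the function name and sample scores are invented for illustration.

```python
# Sketch of the modified assignment rule: a document joins every cluster
# whose score falls within the alpha-controlled band below its highest
# score H, provided H >= T; otherwise it is loose. alpha=0 gives no
# overlap beyond ties at H; alpha=1 reduces the threshold to T itself.

def assign(scores, T, alpha):
    """scores: list of g(d, P_j) for one document. Returns cluster indices."""
    H = max(scores)
    if H < T:
        return []                      # loose document
    threshold = H - alpha * (H - T)    # T_{n-1,i} from the text
    return [j for j, s in enumerate(scores) if s >= threshold]

print(assign([200, 50, 180], T=50, alpha=0.0))   # [0]        only the top score
print(assign([200, 50, 180], T=50, alpha=0.2))   # [0, 2]     180 >= 170
print(assign([200, 50, 180], T=50, alpha=1.0))   # [0, 1, 2]  threshold = T
print(assign([40, 30, 20],  T=50, alpha=1.0))    # []         loose
```

The middle case shows the point of the modification: the score of 180 is close enough to the top score of 200 to justify overlap, while the score of 50 is admitted only at α = 1, where the rule degenerates to the cut-off test of the theorem.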
With α = 1, the formula reduces to T_{n-1,i} = T; hence, it gives the same definition of S_{n,j} as in the theorem. The last modification increases the efficiency of the program, and it also avoids forming clusters around documents which should be classified as loose. When S_{n,j} contains only one document, and that document is contained in no other clusters, then it has the same status as a loose document.

EXPERIMENTAL RESULTS

The algorithm described in the preceding section is used to cluster the 82-document ADI collection and the 200-document Cranfield word stem collection (4). The results of the classification indicate three important problems: 1) the scoring function g tends to give higher scores to documents containing a larger number of concepts; thus, many of the documents containing very few concepts are classified as loose; 2) the documents do not move freely enough from one profile to another; i.e., the final clusters are quite similar to the initial ones; 3) the initial clusters cannot be chosen arbitrarily.

The Scoring Function

The first problem is due to the fact that g scores a document d_i against a profile P_j by simply adding up the rank values of all the concepts in d_i which appear in P_j. If d_i contains a larger number of concepts than d_k, the chances are greater for d_i to receive a higher score. Figure 1 is a plot of the score of the document against its final profile vs. the number of concepts in the document for one of the ADI runs.

Fig. 1. Initial Scoring Function (score vs. number of concepts per document).

Fig. 2. Modified Scoring Function (score vs. number of concepts per document).

Although there are a few exceptions, the graph indicates that the documents with a larger number of concepts generally receive higher scores.
In fact, the average number of concepts in a loose document is eleven, while the average number of concepts per document for the entire collection is twenty. The solution to this problem is to weight the score inversely by the number of concepts in the document. The obvious answer is to divide the score by the number of concepts, but this overcompensates and gives many of the smaller documents the highest scores. Dividing by the square root of the number of concepts in the document does not solve the original problem; i.e., larger documents give higher scores. Satisfactory results are obtained when the score is divided by (number of concepts per document)^(7/8). Figure 2 represents the same ADI sample as Figure 1, except that the new scoring function h = g/(number of concepts per document)^(7/8) is used. Unlike the function g, h seems to be independent of the number of concepts in the document.

Movement of Documents

The second problem is clearly indicated by examination of the results of the classification. Table 5 shows the initial and final clusters for the ADI collection. The problem occurs because the documents tend to "stick" to the clusters that they are already in. This problem is solved by a method similar to that used by Doyle.

Table 5. Final Results of ADI Classification

Cluster  Initial Documents  Final Documents
1        1-12               1-11, 13, 21, 30, 33, 34, 40, 43, 51, 68
2        13-24              3, 10, 13-24, 26, 33, 34, 53, 69, 79
3        25-36              9, 11, 13, 20, 22, 23, 25-28, 30-34, 36, 47, 51, 55, 65, 75
4        37-48              4, 7-9, 14, 20, 30, 37-48, 51, 69
5        49-60              1, 5, 7, 20, 30, 32, 45, 47, 51-53, 55-59, 79, 80
6        61-71              2, 9, 27, 30, 47, 51, 61, 62, 64-71
7        72-82              10, 40, 51, 72-75, 77-81
Loose                       12, 29, 35, 49, 50, 54, 60, 63, 76, 82

During the first few iterations, documents should be allowed to move freely from cluster to cluster, until a nucleus is formed within each cluster.
The nucleus consists of those documents that are most highly correlated to one another. Once the nucleus is formed, these documents will probably not move from their present clusters. Clusters can be forced to contain only very highly correlated documents by raising the cut-off value T, assuming that documents with the highest scores are most similar to the other documents in the cluster. This assumption is investigated later. However, raising the cut-off value results in a larger number of loose documents. This is resolved by repeating the classification for a lower value of T, but using the clusters from the first classification as the initial clusters.

This creates the problem of how to determine the initial value of T, and how much to decrement it when the classification is repeated using as initial clusters the results of the first classification. The initial value of T should be high enough so that only those documents which score very highly against profile P_j are assigned to S_j. One method of achieving this is to pick T so that the clusters after the first iteration average q documents, where q is small compared to the total number of documents. In the experiments run so far, q is arbitrarily set at 4. After termination of the first classification, a nucleus is formed within each cluster. T is now chosen so that a certain percentage of the loose documents are assigned to clusters after the first iteration of the second classification.
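One way to realize the initial choice of T above (clusters averaging q documents after the first iteration) is an order statistic on the documents' best scores. This is a hedged sketch of that idea, not the paper's procedure: `best_scores` is assumed to hold each document's highest score against any profile, and T is set so that roughly m*q documents clear it.

```python
def initial_cutoff(best_scores, m, q=4):
    """Choose T so the m clusters average about q documents after the
    first iteration, i.e. roughly m*q documents score at or above T."""
    ranked = sorted(best_scores, reverse=True)
    k = min(m * q, len(ranked))
    return ranked[k - 1]

# With 82 documents, 7 clusters, and q = 4, T is the 28th-highest best score,
# so about 28 documents (4 per cluster on average) clear the cut-off.
T = initial_cutoff(list(range(1, 83)), 7)
```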
Assuming it is desirable to have approximately x percent of the documents loose after the final clusters are formed, two approaches are possible: 1) T is lowered far enough so that only x percent of the documents remain loose after the first iteration; thus, after termination of the second classification, the clusters represent the final results; 2) T is lowered just enough to allow a certain percentage of the loose documents to be assigned to clusters after the first iteration; thus, the classification is repeated until approximately x percent of the documents remain loose.

Experiments performed using both methods indicate that the second approach allows greater control of the loose documents, with only slightly greater execution times. After the first classification, a large proportion of the documents still remain loose. Therefore, if x is not too high, method 1) decreases T by a large amount. This injects many new documents into the clusters, and several iterations are necessary before termination occurs. Also, T is chosen so that the percentage of loose documents is x at the end of the first iteration, but it is impossible to know beforehand the percentage of loose documents after the final iteration. In general, the more iterations, the more the final percent varies from the percent after the first iteration. In method 2), T is lowered just enough to allow a fairly small percentage (20% in the present experiments) of the loose documents to be assigned to clusters. This normally results in only a few iterations before termination occurs; therefore, the final percent of loose documents does not vary much from the percent loose after the first iteration.

The ADI collection is reclassified using the procedures described above, where it is desired that about 25% of the documents remain loose.
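The threshold-lowering step of method 2) can be sketched as follows. This is an assumption-laden illustration, not the paper's code: `loose_scores` is taken to be each loose document's best score against any profile, and T is dropped just far enough that a fixed fraction (20%, as in the experiments described above) of the loose documents would clear it on the next pass.

```python
def next_cutoff(loose_scores, admit_fraction=0.20):
    """Lower T just enough that about admit_fraction of the currently
    loose documents score at or above the new cut-off."""
    ranked = sorted(loose_scores, reverse=True)
    k = max(1, int(len(ranked) * admit_fraction))
    return ranked[k - 1]

# Ten loose documents: the new T admits the top two (20%) of them.
scores = [30, 28, 25, 24, 22, 21, 20, 19, 18, 15]
T = next_cutoff(scores)
```

The classification would then be rerun with the previous final clusters as initial clusters and this lower T, repeating until roughly x percent of the documents remain loose.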
Once again seven initial clusters are used, and the initial value of T is calculated to be 28.2 so that the clusters after the first iteration average four documents. However, in this case cluster 3 is assigned ten documents, while clusters 1, 5, and 6 contain only one document. Thus, these three clusters are eliminated, and the documents within them become loose. After termination occurs, the final clusters are used as initial clusters for the next classification, where T is set to 19.1. The process is repeated again for T = 16.8, and after termination 17% of the documents remain loose. Table 6 shows the final results of this classification. Compared with Table 5, many more of the documents have moved from their initial clusters.

Table 6. Final Results of New ADI Classification

Initial Clusters
Cluster   Initial Documents
1         1-12
2         13-24
3         25-36
4         37-48
5         49-60
6         61-71
7         72-82

Final Clusters
Cluster   Final Documents
1         3, 5, 9, 10, 14-17, 20-28, 30, 34, 37, 43, 45, 48, 53, 57-59, 64, 68, 69, 72, 79, 80
2         1, 2, 5, 6, 8, 11, 13, 20, 21, 24, 27, 28, 30, 36, 39, 41, 43, 47, 51, 53, 55, 56, 58, 61, 62, 65-68, 70, 71, 79, 80
3         7, 31, 42, 44, 46
4         4, 9, 19, 32, 40, 51, 73-75, 78, 81
Loose     12, 18, 29, 35, 38, 49, 50, 52, 54, 60, 63, 76, 77, 82

In the present study, the initial clusters are determined by assigning the first p (or possibly p + 1) documents to cluster 1, the next p (p + 1) to

Table 7. Score vs. Average Correlation for ADI Classification
(columns: Document, Score, Avg. Corr.)
Cluster 1
Document  Score  Avg. Corr.
25        19.1   .08
5         19.6   .12
64        20.1   .10
23        20.2   .13
27        20.3   .10
34        20.6   .11
15        20.6   .11
37        20.7   .14
48        20.8   .12
58        20.9   .12
28        21.0   .12
53        21.0   .14
20        21.0   .14
68        21.1   .12
80        21.2   .14
57        21.2   .13
59        21.3   .15
14        21.4   .13
16        21.5   .13
79        21.5   .15
43        21.6   .16
24        21.6   .15
69        21.7   .14
26        21.7   .16
17        21.8   .15
72        21.8   .17
21        22.0   .17
3         22.0   .17
9         22.1   .17
30        22.2   .17
22        22.3   .17
45        22.4   .18
10        22.4   .17

Cluster 2
Document  Score  Avg. Corr.
8         18.7   .09
5         18.8   .12
20        18.8   .12
68        18.9   .10
2         19.2   .12
70        19.2   .13
39        19.2   .14
28        19.3   .11
58        19.4   .12
36        19.5   .14
61        19.6   .11
56        19.6   .14
66        19.7   .12
67        19.9   .12
80        20.0   .14
43        20.0   .14
33        20.0   .15
21        20.0   .15
11        20.0   .14
65        20.0   .15
27        20.0   .13
71        20.1   .14
41        20.2   .15
79        20.4   .16
24        20.4   .15
13        20.5   .15
51        20.6   .12
53        20.7   .16
6         21.0   .18
62        21.4   .18
55        21.5   .17
1         21.7   .20
30        21.8   .18

Cluster 3
Document  Score  Avg. Corr.
31        31.0   .21
46        31.2   .05
44        31.3   .14
7         31.8   .24
42        33.2   .28

Cluster 4
Document  Score  Avg. Corr.
32        24.5   .10
51        25.0   .15
74        26.0   .16
4         26.3   .18
75        26.4   .16
9         26.5   .19
19        26.9   .17
73        27.0   .19
78        27.1   .18
40        27.4   .24
81        27.6   .21

Table 8. Score vs. Average Correlation for Cranfield Classification

Cluster 1
Document  Score  Avg. Corr.
26        22.0   .12
6         22.3   .13
7         22.4   .13
117       23.0   .14
2         23.1   .14
121       23.2   .15
13        23.3   .15
19        23.4   .16
60        23.7   .15
23        23.8   .17
18        23.8   .17
44        24.1   .18
183       24.2   .17
116       24.3   .18
128       24.5   .18
61        24.6   .18
9         24.6   .18
197       24.7   .19
16        24.7   .20
198       24.7   .17
3         24.8   .18
25        24.9   .20
28        25.0   .21
115       25.1   .21
58        25.6   .21
181       25.6   .20

Cluster 2
Document  Score  Avg. Corr.
38        19.9   .12
97        20.2   .13
15        20.3   .11
1         20.3   .13
34        20.4   .13
145       20.4   .14
171       20.8   .12
172       20.9   .13
30        20.9   .14
4         21.1   .15
140       21.1   .15
72        21.2   .15
138       21.3   .15
143       21.3   .15
141       21.3   .14
27        21.6   .13
36        21.7   .13
157       21.7   .17
59        21.8   .16
156       21.8   .16
200       21.9   .16
32        22.0   .18
137       22.4   .15
29        22.5   .19
148       22.5   .17
Cluster 1 (continued)
56        25.9   .21
160       26.6   .23

Cluster 2 (continued)
57        22.8   .18
128       22.8   .15
44        23.1   .19
31        23.2   .19
56        23.4   .18
139       23.9   .19
160       24.3   .18
58        25.3   .21

Cluster 3
Document  Score  Avg. Corr.
179       31.4   .19
154       32.5   .24
79        32.8   .27
133       32.9   .29
134       33.4   .28
77        33.7   .27
132       33.8   .32
78        34.1   .30
76        34.3   .34
74        34.4   .34
75        34.5   .36

cluster 2, . . . , and the final p to cluster m, where p = (total number of documents) / m. Since the nucleus of each cluster depends quite strongly on the initial clusters, it is not surprising that different initial clusters lead to different results. If the initial clusters are chosen at random, it is unlikely that the documents within each cluster are very similar. Thus, the nucleus of each cluster might not be very tight. This problem is solved by insuring that the initial clusters contain at least a few documents that are highly correlated. In the ADI and Cranfield collections, the order of the documents is such that many adjacent documents are quite similar; therefore, most of the initial clusters contain a few highly correlated documents. In collections where the order of the documents is random, a simple, fast clustering scheme can be used to determine the initial clusters. This type of algorithm need only perform document-document correlations within a fraction of the document space, and therefore should not take up much time.

Evaluation of Results

The assumption was made earlier that those documents of a cluster S_j that score highest against the corresponding profile P_j are most similar to the other documents in the cluster. The phrase "most similar" is used to mean "correlate most highly", where a standard correlation function is used. Table 7 compares the score of each document to the average correlation (unweighted cosine function) of each document with every other document in the cluster.
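The unweighted cosine correlation used for the "Avg. Corr." columns above can be sketched as follows. This is a minimal illustration under an assumption the text does not spell out, namely that "unweighted" means binary concept vectors (every concept weight treated as 1); the function names are ours.

```python
import math

def cosine(doc_a, doc_b):
    """Unweighted cosine correlation of two documents, each represented
    as a set of concepts (all weights treated as 1)."""
    if not doc_a or not doc_b:
        return 0.0
    common = len(doc_a & doc_b)
    return common / math.sqrt(len(doc_a) * len(doc_b))

def avg_correlation(doc, others):
    """Average cosine of `doc` with every other document in its cluster."""
    return sum(cosine(doc, o) for o in others) / len(others)
```

For example, two documents sharing one of their two concepts correlate at 0.5, which is the scale on which the tabulated averages (mostly .05 to .36) should be read.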
The documents are arranged in ascending order by scores, and hopefully the correlations will also appear in ascending order. As the table indicates, there is a strong tendency for the higher scores to correspond to the higher correlations. Table 8 illustrates the same results for three out of seven final clusters from the Cranfield collection.

So far nothing has been said about how to choose the base value that is used to compute rank values. This integer has an important effect on the type of clusters produced. Recall that the rank value of a concept equals the base value b minus its rank. Suppose a cluster S_j contains four documents d_1-d_4, and a total of twenty different concepts. The lowest possible rank value for any concept = b - 4, since 4 is the lowest possible rank. If b = 20, then the lowest rank value is 16, while if b = 5, the lowest rank value is 1. Consider a document d_x which is the same as d_1 except for one concept, and assume this concept does not occur in P_j. With b = 20, g(d_x, P_j) is between 16 and 20 points less than g(d_1, P_j); with b = 5, g(d_x, P_j) is only 1 to 5 points less. Since large clusters have profiles containing many concepts, the chances of a random document d_x having concepts in the profile of a large cluster are greater than the chances of d_x having concepts in the profile of a small cluster. Therefore, if b is high, d_x will score much lower against the profile of the small cluster, and large clusters will tend to capture all the remaining loose documents at the expense of the smaller clusters. Experimental results support this hypothesis, i.e., a large base value produces a few clusters with many documents, and many clusters with only a few documents. If, on the other hand, b is set so that the lowest rank value in an average cluster is 1, then there is a tendency for small clusters to get larger and large clusters to get smaller.
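The base-value effect just described can be made concrete with a one-line helper. This is a sketch of the definition in the text (rank value = b - rank, redefined to 1 whenever the difference would fall below 1, as happens in larger-than-average clusters); the function name is ours.

```python
def rank_value(rank, b):
    """Rank value of a concept with the given rank under base value b,
    clipped to 1 when b - rank would be less than 1."""
    return max(b - rank, 1)

# Four ranks (1..4), as in the four-document cluster discussed above.
high_base = [rank_value(r, 20) for r in range(1, 5)]  # values 19 down to 16
low_base  = [rank_value(r, 5)  for r in range(1, 5)]  # values 4 down to 1
```

With b = 20 a single missing concept costs 16-19 points, a large fraction of nothing but a small fraction relative to the 16-19 earned by every matching concept; with b = 5 the same miss costs only 1-4 points, so small clusters are not swamped by large ones.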
In smaller than average clusters, all the rank values are high, since there are only a few different ranks. In larger than average clusters, the rank value as defined might become zero or even less than zero. In these cases it is redefined to be 1, but then it is possible for many concepts to have a rank value of 1. Thus, a document often scores higher against the profiles of smaller clusters.

The results of the Cranfield classification clearly indicate the ability of a document to score higher against profiles of smaller clusters. During the classification, nine clusters are generated, and cluster 9 starts to grow much larger than average (average = 22 documents). It keeps growing until it contains 27 documents, and then it starts to oscillate. The following numbers indicate the number of documents in cluster 9 on successive iterations: 27, 21, 34, 17, 56, 0. Thus, cluster 9 is eliminated. The same thing happens to cluster 8 on the next few iterations. Although this tends to keep the size of the clusters somewhat uniform, it is not desirable to throw away a cluster which might contain many highly correlated documents. One solution which might be implemented is to split up large clusters into several smaller ones, i.e., classify the documents within a single cluster. If the number of documents in the cluster is not too large, it might be practicable to use an N^2 algorithm to do this.

CONCLUSION

The classification algorithm that has been described in the preceding two sections requires the following parameters as input: 1) maximum number of clusters desired; 2) approximate percentage of loose documents desired; 3) decision on whether or not loose documents should be "blended" into the nearest cluster at the end of the classification; 4) amount of overlap desired.

The first parameter specifies the number of initial clusters that are formed. If no clusters are eliminated during the evaluation, then the maximum number are actually generated.
The experiments run so far indicate that the number of clusters produced is usually only about 60% of the maximum. The next two parameters determine the "tightness" of the final clusters; the higher the percentage of loose documents, the tighter the clusters. If no loose documents are desired, parameter 2 can be set to 0, but very low percentages increase the running time of the program. Almost identical results are obtained in less time by specifying about 15% loose, and then asking for all loose documents to be assigned to the cluster against which they score highest.

The last parameter determines the amount of overlap. This number corresponds to a in the formula

    T_{n,j} = H_{n-1,j} - a(H_{n-1,j} - T),  if H_{n-1,j} > T
    T_{n,j} = T,                             otherwise

which was mentioned under Implementation. When a = 0, no overlap is produced, and with a = 1 the maximum amount of overlap is produced. The actual percentage of overlap for a given value of a depends on the collection, but results indicate about 10% overlap for a = .4, and about 20% for a = .6.

Although the algorithm is not guaranteed to terminate, convergence has always been obtained in practice. In order to prevent the program from looping in cases of non-convergence, the algorithm can be modified to permit a maximum of n iterations, whether or not convergence is obtained. The results indicate that clusters change very little after about four or five iterations, so this modification would not make much difference in the final clusters.

The true evaluation of the final clusters can only be made by actually performing two-level searches on the clustered document space. However, the algorithm is sufficiently general to allow for the evaluation of many different types of clusters.

REFERENCES

1. Jones, S. K.; Jackson, D.: "Current Approaches to Classification and Clump-Finding at the Cambridge Language Research Unit," The Computer Journal, 10 (May 1967).
2. Doyle, L.
B.: Breaking the Cost Barrier in Automatic Classification, SDC Paper SP-2516 (July 1966).
3. Needham, R. N.: The Termination of Certain Iterative Processes, The Rand Corporation Memorandum RM-5188-PR (November 1966).
4. Salton, G.; Lesk, M. E.: Computer Evaluation of Indexing and Text Processing, Information Storage and Retrieval; Report ISR-12 to the National Science Foundation, Section III (Ithaca, N.Y.: Cornell University, Department of Computer Science, August, 1967).

BOOK REVIEWS

Writing for Technical and Professional Journals by John H. Mitchell. John Wiley & Sons, Inc., New York, London and Sydney, 1968, 405 pp.

This book reprints, describes, summarizes or refers to every item in what has to be the world's largest scrapbook of material relating to professional publication. The last 240 pages (three-fifths of the total) include "sample" style guides from the IEEE, Management Science, AIBS, ACS (including seven or eight pages of abbreviations used in Chemical Abstracts), AIP, the GPO, NASA, the Modern Language Association, the American Mathematical Society, the American Medical Association, the APA, the American Sociological Review, the American Economic Review, the Hispanic American Historical Review, the NEA, and sundry others. In almost every case, the excerpted or complete style guide is followed by an illustrative article. I would doubt that any other such compilation exists.

The chapters which precede this anthology discuss more general aspects of writing for professional journals: design and approach; the collection, correlation, selection, and arrangement of data; and the elements of journal articles. The text in these chapters is crowded with material of the most varied and unexpected kinds: disquisitions on logic, formal organization, outlining, interview techniques, information retrieval, the Dewey decimal system, the EJC Thesaurus, and much, much more.
There is only one problem in all of this, but it is a serious one, epitomized by the quotation from Robert Louis Stevenson which Mitchell uses as motto for his first chapter: "If a man can group his ideas, he is a good writer." This real treasury of reference material is all but inaccessible to the reader. Titles of the five chapters are not very descriptive, and the index is not organized as a retrieval device. If one knows where in the book to look, he can find very useful information, but just leafing through the pages is neither efficient nor easy. It is made particularly difficult, in fact, by the striking lack of editorial judgment exercised in the design of the book. There is no differentiation between the author's comments and the examples and illustrations which he reprints (unless, as in some cases, the typography of the original has been reproduced). Headings within chapters, where they exist at all, are confusing, and again, it is often difficult to determine whether they are part of Mitchell's organization or part of some quoted work.

As a result, it is hard to say who should buy this book and even harder to say how it might be used. Professor Mitchell, who "was elected Teacher of the Year by the students of the University of Massachusetts" in 1965, is presumably able to make selections from the contents and to present them effectively in a classroom. Perhaps the publishers might atone for their abnegations of responsibility in preparing this book for the press by prevailing upon its author to write a supplementary, and much-needed, User's Guide to its contents.

A. J. Goldwyn

Computer Peripherals & Typesetting by Arthur H. Phillips. London, Her Majesty's Stationery Office, 1968. 665 pp. $28.80.
The appearance of a comprehensive volume on computer composition is a boon to librarians, as it comes at a time when progress with MARC and other complex data bases calls for printing and other output capabilities which exceed those now commonly available with computers. Recent advances in photocomposition technology now make possible printing of graphic arts quality at acceptable costs for certain types of computer-produced library publications, such as book and periodical catalogs whose basic input includes upper- and lower-case and a full range of diacritical marks. With these advances librarians need no longer accept the limitations of character sets and image quality imposed by present line printers. A quality product is needed for outputs which are destined for publication. Some pioneers have already made good use of this advanced technology to produce quality catalogs and lists; this book will help others to travel the same road.

The volume is a comprehensive reference compendium of data on computer peripherals which is not otherwise available in convenient form. It gives special emphasis to the coding and keyboarding of alphanumeric texts and describes how the computer can be used for text processing with a typographic output. It also gives an appreciation of the problems involved and the techniques and equipment that are available to those who are preparing to enter this important field. The text is arranged in three sections. The first is an introduction to computer processing of alphanumeric data which is intended for printing in typographic quality. The second describes many types of computer peripherals and gives considerable attention to the various codes used for computer and printing equipment data input. The third section describes alphanumeric text composition and the available graphic arts composing equipment. The text is supplemented by many illustrations, diagrams, and tables, plus an index and a glossary of terms.
While much of the material in the volume will become outdated within a short time, a substantial portion of it is sufficiently basic to retain its value for a longer period. This handsome book is intelligently conceived and well-written by one of England's leading authorities on printing and computer typesetting. For anyone seriously interested in the subject the volume is essential and worth its price.

Richard De Gennaro

Coordinate Indexing, by John C. Costello, Jr. Rutgers Series on Systems for the Intellectual Organization of Information, Volume VII. Edited by Susan Artandi. The Rutgers University Press, New Brunswick, N.J., 1966. 218 pp.

This paperback book is the result of a seminar meeting on coordinate indexing held April 28 and 29, 1966, under the sponsorship of the Rutgers Graduate School of Library Service. The volume consists of a detailed presentation of the subject by John Costello of Battelle Memorial Institute, followed by a discussion of the presentation by four panelists.

The objectives of the book as given in the preface are: to offer a description, discussion, critique, and collection of facts and data on coordinate indexing as one of the systems which may be used to intellectually organize information contained in documents. Basically an introductory description of the subject is offered. However, the principles of coordinate indexing are included, so the material has value for anyone interested in the topic. With examples offered primarily from metallurgy and engineering, the emphasis is on the handling of technical documents. About half of the presentation is devoted to input, with storage, searching, and output comprising the other half. Discussion by the panel (Dr. Susan Artandi, Moderator; Dr. Charles L. Bernier; Dr. Vincent E. Giuliano; and Dr. I. A. Warheit) is not given verbatim, but summarized by the Editor. Although the Table of Contents is quite detailed, an index would make the book more useful.
The inclusion of a selective bibliography is valuable, but unfortunately it is almost never referred to in the text. The bibliography is of course now somewhat out-of-date.

Laura K. Osborn

Libraries of the Future, by J. C. R. Licklider. The M.I.T. Press, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1965. Third Printing, September 1966. 219 pp. $6.00.

This remarkable little book is rapidly becoming a classic in the field of information science. (Note that it is now in its third printing.) It analyzes the concepts and problems of libraries of the future, "future" being defined as the year 2000. The book is the culmination of a two-year research project on the future of libraries sponsored by the Council on Library Resources. The study was conducted by Bolt Beranek and Newman, Inc. between November, 1961, and November, 1963.

The first part of this book describes man's interaction with recorded knowledge in what Mr. Licklider calls "procognitive systems." The author assumes man will be reacting to segments of the entire body of recorded information within a vast hierarchical information network. He estimates the present world corpus of knowledge could be stored in 10^15 bits of computer memory. The rate of increase is 2 x 10^6 bits per second.

Part two explores the use of computers within the procognitive system. Subjects touched upon include syntactical analysis of natural languages, quantitative aspects of the representation of information, information retrieval effectiveness, and question-answer systems. Some time is spent with studies of current computer techniques. In general, part two is a trifle dated, as it deals with specific techniques in a field where technological obsolescence is precipitous.

Mr. Licklider's writing is both intellectually stimulating and delightful. In discussing the future computer console, "...
the concept of 'desk' may have changed from passive to active: a desk may be primarily a display-and-control station in a telecommunication-telecomputation system, and its most vital part may be the cable (umbilical cord) that connects it, via a wall socket, into the procognitive utility net." A footnote goes on to say, "If a man wishes to get away from it all and think in peace and quiet, he will have merely to turn off the power. However, it may not be economically feasible for his employer to pay him full rate for the time he thus spends in unamplified cerebration." Serious students of information or library science should consider this book required reading, if for no other reason than the jolt it provides one's imagination!

Gerry D. Guthrie

USA STANDARD FOR A FORMAT FOR BIBLIOGRAPHIC INFORMATION INTERCHANGE ON MAGNETIC TAPE

The Chairman of the United States of America Standards Institute, Sectional Committee Z39, Library Work and Documentation, has approved publication of the following draft "USA Standard for a Format for Bibliographic Information Interchange on Magnetic Tape" to hasten availability of this fundamental contribution to bibliographic standardization. Two important implementations follow the Standard. Part B of Appendix I is "Preliminary Guidelines for the Library of Congress, National Library of Medicine, and National Agricultural Library Implementation of the Proposed American Standard for a Format for Bibliographic Information Interchange on Magnetic Tape as Applied to Records Representing Monographic Materials in Textual Printed Form (Books)," more succinctly known as MARC II. Part C is a Committee working paper entitled "Preliminary Committee on Scientific and Technical Information (COSATI) Guidelines for Implementation of the USA Standard."

0. INTRODUCTION

0.1 This introduction is not part of the proposed standard but is included to facilitate its use.
0.2 This standard defines a format which is intended for the interchange of bibliographic records on magnetic tape. It has not been designed as a record format for retention within the files of any specific organization. Nor has it been the intent of the subcommittee to define the content of individual records. Rather, it has attempted to describe a generalized structure which can be used to transmit between systems records describing all forms of material capable of bibliographic description, as well as related records, such as authority records for authors and subject headings.

Journal of Library Automation Vol. 2/2 June, 1969

0.3 In designing the format the subcommittee has tried to achieve the goals listed below. It recognizes, however, that the goals were not completely compatible and that various trade-offs were required.
(a) Hospitality: the format should be hospitable to all kinds of bibliographic information;
(b) Hardware independence: a format which can be used with a variety of digital computers should be defined;
(c) Uniformity of structure: the structure of all machine records should be basically identical and include such control information as may be required to specify unique characteristics. For any given class of records the components of the format may have specific meanings and unique characteristics;
(d) Data manipulation: the methods of recording and identifying data should provide for maximum manipulability, leading to ease of conversion to other formats for various uses.

0.4 The standard includes the concept that the bibliographic unit may be described independently or in relation to other bibliographic units. Many relationships exist, including: the hierarchical, in which the bibliographic unit contains, or is contained in, another bibliographic unit, e.g., a monograph in a series; the equivalent, e.g., a work and its translation; and the sequential, e.g., a serial which appeared under a succession of titles.
The standard provides for bibliographic records which describe one or more related bibliographic units, and provides for coding the relationships among them. Appendix II describes a proposed method for implementing this concept.

0.5 Preliminary guidelines for implementing the standard by two different groups of users are provided in Appendix I. These guidelines are not part of the standard but are included to illustrate the use of the format.

0.6 Explanatory material which is not part of the standard but which will assist in its interpretation or implementation appears in brackets.

0.7 The appendices accompanying this standard are not part of the standard.

0.8 The development of this standard was made possible partially by support received from the National Science Foundation and the Council on Library Resources. Personnel of the USASI Committee Z39 at the time the Committee approved the standard were Dr. Jerrold Orne, Chairman; Mr. James Wood, Vice-Chairman; and Mr. Harold Oatfield, Secretary.

The Subcommittee on Machine Input Records, which is directly responsible for this standard, had the following personnel:

Mrs. Henriette D. Avram, Chairman, Assistant Coordinator of Information Systems, Information Systems Office, Library of Congress, Washington, D.C. 20540

Mrs. Pauline A. Atherton, School of Library Science, Syracuse University, 308 Carnegie Library, Syracuse, New York 13210

Mr. Arthur R. Blum, American Institute of Physics, 335 East 45th Street, New York, New York 10017

Mr. Lawrence F. Buckland, President, Inforonics, Inc., 806 Massachusetts Avenue, Cambridge, Massachusetts 02139

Miss Ann T. Curran, Inforonics, Inc., 806 Massachusetts Avenue, Cambridge, Massachusetts 02139

Mr. Kay D. Guiles, Information Systems Office, Library of Congress, Washington, D.C. 20540

Mr. Frederick G. Kilgour, Director, Ohio College Library Center, 1314 Kinnear Road, Columbus, Ohio 43212

Mr. Abraham I.
Lebowitz, Assistant to the Director, National Agricultural Library, U.S. Department of Agriculture, Washington, D.C. 20250

Mrs. Phyllis B. Steckler, R. R. Bowker Company, 1180 Avenue of the Americas, New York, New York 10036

1. GLOSSARY

It has been considered unnecessary to define terms in common use. Terms which have a special meaning in the standard or which might be ambiguous are defined below.

BASE ADDRESS OF DATA. A data element whose value is equal to the character position of the character following the field terminator of the directory, where the specified origin is the first character of the leader. [Example: If the directory contains two (2) entries, the first character position of data will be 49, and therefore the base address of data equals 49.]

BASIC CHARACTER. A character occurring in columns 2, 3, 6 or 7 of the Standard Code as defined in USAS X3.4-1967 Code for Information Interchange, p. 6. [The basic character set is included as part of the illustration on page 82 of Appendix I, columns 2, 3, 6 and 7.]

BIBLIOGRAPHIC INFORMATION INTERCHANGE FORMAT. A format for the exchange, rather than the local processing, of bibliographic records. (The terms "bibliographic information interchange format," "information interchange format," and "interchange format" are used interchangeably in this standard.)

BIBLIOGRAPHIC LEVEL. A data element which, in conjunction with the data element "type-of-record," specifies the characteristics and describes the components of the bibliographic record. [See Appendix I for an illustration of an application of this data element.]

BIBLIOGRAPHIC RECORD. A collection of fields, including a leader, directory, and bibliographic data, describing one or more bibliographic units treated as one logical entity.

BIBLIOGRAPHIC UNIT.
A defined body of recorded information and the artifact on which it is recorded, e.g., a book, chapter of a book, map, cuneiform tablet, digital magnetic tape file, song (sheet music), and song (phonograph record). A bibliographic unit may be part of a larger bibliographic unit (e.g., the chapter as part of a book, which in turn is part of a series). [It is assumed that the originators of bibliographic information and/or bibliographic descriptions follow a set of rules or guidelines which define, for the originating source, what is to be treated as a bibliographic unit.] A single author or subject heading authority record is also a bibliographic unit.

CHARACTER. See INTERNAL CHARACTER.

COMMUNICATIONS FORMAT. See BIBLIOGRAPHIC INFORMATION INTERCHANGE FORMAT.

CONTROL FIELD. A variable field which supplies parameters which may be required in the processing of the bibliographic record.

CONTROL NUMBER. An alphanumeric symbol uniquely associated with a bibliographic record, assigned by the organization creating the bibliographic record.

DATA ELEMENT. A defined unit of information within a system.

DATA ELEMENT IDENTIFIER. A code consisting of one or more basic characters used to identify individual data elements within a variable field. If and when data element identifiers are used, each occurrence must be immediately preceded by a delimiter, and each data element identifier must immediately precede the data element it identifies. The length (in characters) of the data element identifier must be uniform for each field of a given record. [In effect, a delimiter and data element identifier are combined to form a symbol used to initiate and identify data elements within a variable field.
The use of the concept of data element identifiers is optional and provides a means of explicitly identifying data elements, even though in some instances there may be a redundancy of identification (e.g., if a variable field consists of only one data element, presumably the tag alone would provide sufficient identification).]

DATA FIELD. A variable field containing bibliographic or other data not intended to supply parameters for the processing of the bibliographic record.

DELIMITER. A character which serves as an initiator, a separator, or a terminator of individual data elements within a variable field. [Whether a delimiter is used to initiate, to separate, or to terminate is dependent upon a specific system.]

DELIMITER (OR DELIMITER PLUS DATA ELEMENT IDENTIFIER) COUNT. A data element whose value is the length (in characters) of the delimiter (or, if data element identifiers are used, the length (in characters) of the delimiter plus data element identifier) used within the record.

DIRECTORY. An index to the location of the variable fields (control and data) within a bibliographic record. The directory consists of entries.

ENTRY. A fixed field within the directory which contains information about a variable field.

ENTRY MAP. A data element which is used to indicate the structure of the entries in the directory.

EXTERNAL CHARACTER. A graphic symbol which may be represented by one or a series of two or more internal characters. [The external character "space" is always represented by an internal character.]

FIELD. A defined character string which may contain one or more data elements. See also CONTROL FIELD; DELIMITER; ENTRY; FIXED FIELD; INDICATOR; VARIABLE FIELD.

FIELD TERMINATOR (FT). A character used to terminate a variable field within a bibliographic record. The last variable field is terminated by a record terminator and not a field terminator.

FILE.
A set of related records denoted by a single name.

FIXED FIELD. One in which every occurrence of the field has a length of the same fixed value regardless of changes in the contents of the field from occurrence to occurrence.

FORMAT. See STRUCTURE.

FT. See FIELD TERMINATOR.

INDICATOR. A data element associated with a data field which supplies additional information about the associated data field.

INDICATOR COUNT. A data element whose value is the length (in characters) of the indicator(s) which appears as the first data element in each variable data field. The length (in characters) of the indicator(s) must be uniform for each field of a given record. (A length of zero (0) is permitted.)

INFORMATION INTERCHANGE FORMAT. See BIBLIOGRAPHIC INFORMATION INTERCHANGE FORMAT.

INTERCHANGE FORMAT. See BIBLIOGRAPHIC INFORMATION INTERCHANGE FORMAT.

INTERNAL CHARACTER. A pattern of bits of a predetermined length (depending on the system) treated as a meaningful unit. (The terms "internal character" and "character" are used interchangeably in this standard.)

LEADER. A fixed field occurring at the beginning of each bibliographic record which provides parameters for the processing of the record.

PADDING CHARACTER. A character used to fill areas in fixed fields which contain no data. [See paragraph A.2.1.4 of Appendix I.]

PRIMARY BIBLIOGRAPHIC UNIT. That bibliographic unit whose physical and bibliographic characteristics determine the type-of-record and bibliographic level.

RECORD. See BIBLIOGRAPHIC RECORD.

RECORD LENGTH. A data element whose value is equal to the length (in characters) of the bibliographic record, including the record terminator.

RECORD TERMINATOR (RT). A character used to terminate each record.

RT. See RECORD TERMINATOR.

STATUS. A data element which indicates the relation of the bibliographic record to a file (e.g., new, updated, etc.).

STRUCTURE. The framework of fixed and variable fields within the bibliographic record.

SUBRECORD.
A group of fields within a bibliographic record which may be treated as a logical entity. [When a bibliographic record describes more than one bibliographic unit, the descriptions of individual bibliographic units may be treated as subrecords.]

TAG. A series of characters used to specify the name or label of an associated variable field.

TYPE-OF-RECORD. A data element which, in association with the data element "bibliographic level," indicates the form of the bibliographic description provided for the primary bibliographic unit. [It is assumed that the person providing the bibliographic description, on the basis of predefined criteria, will determine the treatment a given item is to receive; i.e., whether the item is to be treated as a book, a journal article, a map, a picture, an abstract, a bibliographical footnote, etc. If a given item consists of parts which, if they occurred independently, would be accorded different bibliographic descriptions, the choice of treatment selected is assumed to be the most appropriate. Frequently occurring combinations may be accorded their own treatments, e.g., collections of drawings with accompanying text. For each established form of bibliographic description, there will be a record format whose components are defined by the "type-of-record" data element. Among these components are the length of the fixed fields, the tagging scheme employed, and the definition of the data elements. If the interchange format is used for the interchange of records of a type for which "bibliographic description" is not a parameter, e.g., authority records, this data element may be redefined. See Appendix I for an illustration of an application of this data element.]

VARIABLE FIELD. One in which the length of an occurrence of the field is determined by the length (in characters) required to contain the data stored in that occurrence.
The length may vary from one occurrence to the next.

2. PURPOSE AND SCOPE

2.1 This standard defines a format for the interchange of bibliographic and related [authority files, subject heading lists, etc.] records.

2.2 This standard does not define a record format for retention within the files of any specific organization.

2.3 This standard does not necessarily define the content of individual records. It does describe a generalized structure which can be used for the interchange of records describing various forms of bibliographic material.

2.4 This standard assumes the utilization of the following USASI Standards and Proposed Standards:
    (a) USAS X3.22-1967 Recorded Magnetic Tape for Information Interchange (800 CPI, NRZI)
    (b) USAS X3.4-1967 Code for Information Interchange
    (c) Proposed Standard X3.2/552 Magnetic Tape Labels and File Structure

3. BIBLIOGRAPHIC INFORMATION INTERCHANGE FORMAT

3.1 Schematic Representation
The interchange format is schematically represented below:

    Leader | Directory FT | Control Number Field FT | Other Control Fields (if present) FT | Data Field FT | ... | Data Field RT

3.2 Leader

3.2.1 Schematic Representation
The leader is schematically represented below (character positions in parentheses):

    RECORD LENGTH (0-4) | STATUS (5) | TYPE OF RECORD (6) | BIBLIOGRAPHIC LEVEL (7) | RESERVED FOR FUTURE USE (8-9) | INDICATOR COUNT (10) | DELIMITER (OR DELIMITER PLUS DATA ELEMENT IDENTIFIER) COUNT (11) | BASE ADDRESS OF DATA (12-16) | RESERVED FOR USE BY USER SYSTEMS (17-19) | ENTRY MAP (20-23)

3.2.2 Record Length
The record length is a 5-digit decimal number equal to the bibliographic record length. This number will include its own five characters and the record terminator. The record length will always be present in character positions 0-4 of the record. In the interchange format the bibliographic record has a maximum length of 99,999 characters.
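As an illustration only (not part of the standard), the fixed character positions of the leader can be unpacked mechanically. The sketch below is a minimal Python reading; the dictionary keys are informal names and the sample leader value is invented:

```python
def parse_leader(leader: str) -> dict:
    # Character positions follow section 3.2 of the standard.
    assert len(leader) >= 24, "the leader occupies character positions 0-23"
    return {
        "record_length": int(leader[0:5]),           # 3.2.2, positions 0-4
        "status": leader[5],                         # 3.2.3
        "type_of_record": leader[6],                 # 3.2.4
        "bibliographic_level": leader[7],            # 3.2.5
        "indicator_count": int(leader[10]),          # 3.2.6
        "delimiter_count": int(leader[11]),          # 3.2.7
        "base_address_of_data": int(leader[12:17]),  # 3.2.8, positions 12-16
        "entry_map": leader[20:24],                  # 3.2.9, positions 20-23
    }

# Invented example.  Note that a base address of 49 matches the glossary
# example of a directory with two 12-character entries plus its field
# terminator following the 24-character leader (24 + 2*12 + 1 = 49).
fields = parse_leader("00123nam  2200049   4500")
print(fields["record_length"], fields["base_address_of_data"])  # 123 49
```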
3.2.3 Status
A data element in character position 5 consisting of 1 basic character.*

3.2.4 Type-of-Record
A data element in character position 6 consisting of 1 basic character.*

3.2.5 Bibliographic Level
A data element in character position 7 consisting of 1 basic character.*

3.2.6 Indicator Count
A data element in character position 10 consisting of 1 decimal digit* equal to the length (in characters) of the indicator(s) which appears as the first data element of each variable data field. If indicators are not used, this field is set to zero (0). (See 3.4.2.1)

* See Appendix I for an illustration of an application of this data element.

3.2.7 Delimiter (or Delimiter Plus Data Element Identifier) Count
A data element in character position 11 consisting of 1 decimal digit equal to the length (in characters) of the delimiter (or, if data element identifiers are used, the length (in characters) of the delimiter plus data element identifier) used within the record. If a delimiter is not used, this field is set to zero (0). If a delimiter alone (i.e., without data element identifiers) is used, this field is set to one (1).

3.2.8 Base Address of Data
A data element in character positions 12-16 consisting of 5 decimal digits and equal to the combined length (in characters) of the leader and directory (including the field terminator at the end of the directory).

3.2.9 Entry Map
(See 3.3.1 for the description of entries.)

Structure of each entry in the directory:

    Tag | Length of Field | Starting Character Position

Entry map:

    m | n | 0 | 0

    m = length (in characters) of the "length of field" portion of each entry in the directory
    n = length (in characters) of the "starting character position" portion of each entry in the directory
    0 = undefined; available for future use

The entry map is a data element in character positions 20-23 consisting of 4 decimal digits.
Each decimal digit recorded corresponds sequentially to each portion of the entry, except for the portion allotted to the tag. Character position 20 in the entry map indicates the length (in characters) of the "length of field" portion of each entry in the directory; character position 21 indicates the length (in characters) of the "starting character position" portion of each entry. If one of these does not occur, the relevant character position in the entry map is set to zero. Character positions 22 and 23 are undefined and are available for future use. [Since bibliographic data is usually variable in length, the structure of an entry in the directory will usually follow the pattern "tag, length of field, starting character position." The inclusion of an entry map provides flexibility for those users who wish to structure the entry in the directory differently, either by including (in addition to tag, length of field, and starting character position) other data elements not defined in this standard or by excluding those that have been defined. However, any restructuring of the entry by a user will have to be done within the general limitations imposed by the standard (see 3.3.1). The use of the entry map can be illustrated as follows: (1) An entry map set to 4500 would define the characteristics of a directory in which each entry consisted of a 3-digit tag (not expressed in the entry map), a 4-digit length of field, and a 5-digit starting character position. (2) An entry map set to 0500 would define the characteristics of a directory in which each entry consisted of a 3-digit tag, no length of field data element, and a 5-digit starting character position. See Appendix I for an illustration of an actual application of the concept of an entry map.]

3.3 Directory
The directory consists of a series of fixed fields (hereinafter referred to as entries). The directory ends with a field terminator.
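The two entry maps used as illustrations above can be restated as a short sketch. This is illustrative only; the function and key names are informal, not part of the standard:

```python
def entry_layout(entry_map: str) -> dict:
    # entry_map: the 4 characters from leader positions 20-23.
    # Position 0 gives the width of the "length of field" portion,
    # position 1 the width of "starting character position";
    # positions 2-3 are undefined.  The 3-character tag is implicit
    # (not expressed in the entry map).
    m, n = int(entry_map[0]), int(entry_map[1])
    layout = {"tag": 3}
    if m:
        layout["length_of_field"] = m
    if n:
        layout["starting_character_position"] = n
    return layout

print(entry_layout("4500"))  # {'tag': 3, 'length_of_field': 4, 'starting_character_position': 5}
print(entry_layout("0500"))  # {'tag': 3, 'starting_character_position': 5}
```

With an entry map of 4500 the implied widths sum to 12 characters per entry, which agrees with the fixed entry length given in 3.3.1.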
The directory must contain at least one entry for each subsequent variable field (control and data). [In the case of very long fields additional entries may be required. See 3.3.1.3.]

3.3.1 Entries
Each entry consists of 12 characters. Each entry must contain, at the very least, a tag and length of field, or a tag and starting character position, and must correspond, unambiguously, to a specific variable length data or control field. The tag, length of field, and starting character position must, whenever they occur, be in that sequence.

3.3.1.1 Tag
The tag is a data element consisting of 3 basic characters.

3.3.1.2 Tags for Control Fields
Tags 001-009 are reserved for control fields as shown:

    001      Control number
    002      Reserved for subrecord directory, if any*
    003      Reserved for subrecord relationship, if any*
    004-009  Reserved for use by user systems

3.3.1.3 Length of Field
The length of field in the entry is the length (in characters) of the variable field to which it corresponds. The length of field includes the indicator(s) and field terminator. It is expressed as a decimal number. If the length of a variable field exceeds the maximum length expressible as a decimal number in the length of field portion of the entry, two or more entries (called a "subset" for the purposes of this explanation) will be used to define the location and extent of such a field. Since all the entries in the subset of entries reference the same variable field, they will contain the same tag. The length of field in each entry of the subset, except the last entry in the subset, will be set to 0 to indicate that the length of field is equal to the maximum length expressible and that there is additional information for the same field in the next entry in the record directory. The length of field for all entries in the subset subsequent to the first will refer to the length (in characters) of the overflow data.
[This convention cannot be followed if the structure of the entry does not contain a length of field.]

3.3.1.4 Starting Character Position
The starting character position is the character position of the first character in the variable field (which may be an indicator or data; see 3.4.2) referenced by the entry. It is given relative to the base address of data (i.e., the first character of the first variable field following the directory is numbered 0).

3.3.2 Sequence of Entries
The entries in the directory may be recorded in any sequence (i.e., they need not be in the same sequence as the corresponding variable fields) except that the entries for tags 001-009 must always be first and in ascending numeric sequence. [Note that specific systems may use the sequence of entries in the directory to convey semantic information.]

* Appendix II illustrates a possible method of handling subrecords within a bibliographic record. This is not part of the Standard.

3.4 Variable Fields

3.4.1 General
Following the leader and directory, the bibliographic record consists of variable fields. (Although the directory is technically a variable field, the following paragraphs do not apply to it.)

3.4.2 Structure of Variable Fields
Each variable field consists of indicator(s) (if used), a delimiter (if used), a data element identifier (if used), data, and a field terminator, as shown. Control fields do not contain indicators, delimiters, or data element identifiers.

    INDICATOR(S)* | DELIMITER* | DATA ELEMENT IDENTIFIER* | DATA | FIELD TERMINATOR

    * Except control fields

3.4.2.1 Indicator
The indicator is the first data element in each variable field. The length (in characters) of the indicator(s), which may be 0 (i.e., no indicator is present), is recorded in the indicator count in the leader. All variable fields, except control fields, in the same record have the same length (in characters) for an indicator(s).
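The directory and starting-character-position conventions of 3.3 can be exercised end to end on a contrived record. The sketch below is illustrative only: the record contents are invented, the entry structure "tag, length of field, starting character position" (entry map 4500) is assumed, the terminator characters use the ASCII assignments given in Appendix I, and for simplicity the invented record carries no indicators:

```python
FT = "\x1e"  # field terminator ("record separator", per Appendix I A.2.1.2)
RT = "\x1d"  # record terminator ("group separator", per Appendix I A.2.1.3)

def parse_directory(directory: str):
    # Each 12-character entry: 3-character tag, 4-digit length of
    # field, 5-digit starting character position (entry map 4500).
    return [(directory[i:i + 3],
             int(directory[i + 3:i + 7]),
             int(directory[i + 7:i + 12]))
            for i in range(0, len(directory), 12)]

def field_at(record: str, base: int, start: int, length: int) -> str:
    # Starting character position is relative to the base address of
    # data; length of field includes indicator(s) and the terminator.
    return record[base + start : base + start + length]

# A contrived 66-character record: 24-character leader, two directory
# entries plus the directory's field terminator, a control number
# field, and one data field terminated by the record terminator.
leader    = "00066nam  0000049   4500"
directory = "001000900000245000800009"
record    = leader + directory + FT + "12345678" + FT + "A title" + RT

base = int(leader[12:17])  # base address of data = 24 + 24 + 1 = 49
for tag, length, start in parse_directory(directory):
    print(tag, repr(field_at(record, base, start, length)[:-1]))
# 001 '12345678'
# 245 'A title'
```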
3.4.3 Sequence of Variable Fields
The variable fields, except for the control fields associated with tags 001-009, need not occur in the same sequence as the corresponding directory entries. The control fields which occur must be first and in ascending numeric sequence.

3.4.4 Control Fields
The variable fields associated with tags 001-009 are control fields. Control fields do not contain indicators, delimiters, or data element identifiers.

3.4.4.1 Control Number Field
This field contains the control number, consisting of basic characters. This field must always occur once, and only once, in each bibliographic record, and must immediately follow the directory.

3.5 Variable Data Fields

3.5.1 General
The remainder of the bibliographic record consists of variable data fields. There are no restrictions on the number, length, or content of the variable data fields other than those already stated or implied (e.g., those based on the limitations of the total record length).

3.5.2 Multiple Data Elements
Multiple data elements within fields may be fixed or variable and may be identified by position, by the use of a delimiter alone, or by the use of a delimiter plus data element identifier(s), as the case may be.

APPENDIX I
PRELIMINARY GUIDELINES FOR THE IMPLEMENTATION OF THE PROPOSED AMERICAN STANDARD FOR BIBLIOGRAPHIC INFORMATION INTERCHANGE ON MAGNETIC TAPE

This Appendix is not part of the proposed Standard but is included to illustrate its application in one environment and recommended application in another. Part A of the Appendix contains general guidelines which apply to both Parts B and C. Part B contains the preliminary guidelines for the Library of Congress, National Library of Medicine, and National Agricultural Library implementation of the standard.
Part C contains the proposed preliminary Committee on Scientific and Technical Information (COSATI) guidelines for the implementation of the standard.

A. GENERAL

1. Labels
Volume header and file header labels are required and will conform to USAS Proposed Standard X3.2/552 Magnetic Tape Labels and File Structure.

2. Character Codes
A code for a diacritical will always be recorded before the code for the alphabetic character which it modifies.

2.1 Character Definitions
2.1.1 Delimiter. The delimiter will consist of the "unit separator" (ASCII character 1/15).
2.1.2 Field Terminator. The field terminator will consist of the "record separator" (ASCII character 1/14).
2.1.3 Record Terminator. The record terminator will consist of the "group separator" (ASCII character 1/13).
2.1.4 Padding Character. The padding character will consist of the "space" (ASCII character 2/0).

2.2 Basic Character Set
The basic character set will consist of the characters in columns 2, 3, 6 and 7 of the Standard Code as defined in USAS X3.4-1967 Code for Information Interchange, p. 6. This basic character set is included as part of the illustration on p. 82 of this Appendix, columns 2, 3, 6 and 7.

3. Type-of-Record Symbols
The following table indicates the type-of-record symbols that have been assigned at this time:

    Symbol   Meaning
    a        Printed text
    b        Manuscript text
    c        Printed music
    d        Manuscript music
    e        Printed maps
    f        Manuscript maps
    g        Motion pictures; films
    h        Microform publications
    i        Recorded sound (language)
    j        Recorded sound (music)
    k        Pictures
    l        Digital media
    x        Authority data - names
    y        Authority data - subjects

Appendix I 67

4.
Bibliographic Level Symbols
The following table indicates the bibliographic level symbols that have been assigned at this time:

    Symbol   Meaning
    a        Analytical (a bibliographic unit generally not published separately but part of a larger bibliographic entity)
    m        Monographic publication
    s        Serial publication (a bibliographic unit issued in successive parts, usually dated or numbered, intended to be continued indefinitely)
    c        Collective (a made-up collection which is gathered together and cataloged as a unit)
    ƀ        Indicates that this data element is not used.

5. Status Symbols
The following table indicates the status symbols that have been assigned at this time. The meaning of the symbols is relative to the transmitting source.

    Symbol   Meaning
    n        New record
    c        Changed or corrected record (complete record to be substituted for one previously transmitted)
    d        Deleted record

B. PRELIMINARY GUIDELINES FOR THE LIBRARY OF CONGRESS, NATIONAL LIBRARY OF MEDICINE, AND NATIONAL AGRICULTURAL LIBRARY IMPLEMENTATION OF THE PROPOSED AMERICAN STANDARD FOR A FORMAT FOR BIBLIOGRAPHIC INFORMATION INTERCHANGE ON MAGNETIC TAPE AS APPLIED TO RECORDS REPRESENTING MONOGRAPHIC MATERIALS IN TEXTUAL PRINTED FORM (BOOKS)

1. Labels

1.1 Header Labels
The following table indicates the data elements of the volume and file labels and their permissible values.

1.1.1 Volume Header

    Data Element Name      Length   Contents
    Label Identifier       3
"vol"
    Label Number           1        "1"
    Volume Serial Number   6        reserved for user
    Accessibility          1        ƀ
    Unused                 26       reserved for user
    Format Description     28       usaslbz39.2-1969bbiibfmtli~bl313
    Label Standard Level   1

1.1.2 File Header

    Data Element Name      Length   Contents
    Label Identifier       3        "hdr"
    Label Number           1        "1"
    File Identifier        17       mixedƀbiblioƀdata
    Set Identifier         6        marcƀ2
    File Section Number    4        "0001"
    File Sequence Number   4        "0001"
    Unused                 6        blanks
    Creation Date          6        "ƀyyddd"
    Expiration Date        6        "ƀyyddd" or blanks
    Accessibility          1        ƀ
    Block Count            6        "000000"
    System Code            13       reserved for user
    Unused                 7        blanks

1.2 End of File

    Data Element Name      Length   Contents
    Label Identifier       3        "eof"
    Label Number           1        "1"

    The next 50 characters correspond to those in the same positions in the header label.

    Block Count            6        nnnnnn

    The next 20 characters correspond to those in the same positions in the header label.

    ƀ   = blank
    n   = decimal digit
    yy  = last two digits of year
    ddd = day number in Julian calendar

2. Delimiter and Data Element Identifier
The delimiter will consist of the unit separator. The data element identifier will consist of one basic character. A delimiter will precede each data element identifier, which in turn precedes each data element that it identifies. The first data element in each variable field will always be preceded by a delimiter and a data element identifier (even though there is only one data element in the field).

3. Indicator
Two indicators will be used as the first two data elements in each variable data field. Each indicator will consist of one basic character. If an indicator is not used, it will be set to blank (ASCII character 2/0). No indicators are used in the control fields.

4. Leader
The following table indicates the data elements in the leader and their permissible values and formats.

    Record Length          Decimal digits, right justified, with leading zeros.
    Status                 As defined in paragraph A.5 of this Appendix.
    Type-of-Record         "a"
    Bibliographic Level    "m"
    Indicator Count        "2"
    Delimiter Count        "2"
    Base Address of Data   Decimal digits, right justified, leading zeros.
    Entry Map              "4500"

5. Directory
Each directory entry consists of the following data elements:

    Tag                          3 decimal digits
    Length of Field              4 decimal digits, right justified, leading zeros.
    Starting Character Position  5 decimal digits, right justified, leading zeros.

The directory ends with a field terminator.

6. Control Fields

    Tag   Name
    001   Control Number
    008   Fixed Length Data

          Character Positions
          0-5    Date entered on file
          6      Type of publication
          7-10   Date of publication 1
          11-14  Date of publication 2
          15-17  Country of publication code
          18-21  Illustration codes
          22     Intellectual level code
          23     Form of reproduction code
          24-27  Form of contents codes
          28     Government publication indicator
          29     Conference proceedings indicator
          30     Festschrift indicator
          31     Index indicator
          32     Main entry in body of entry indicator
          33     Fiction indicator
          34     Biography code
          35-37  Language code
          38     Modified record indicator
          39     Cataloging source code

7. Variable Field Data Elements
(Tags are followed by their first and second indicator values, where defined, and by their data element identifiers; each data element identifier is preceded by a "unit separator.")

010   Library of Congress card number
      a   Library of Congress card number
011   Linking Library of Congress card number
      a   Linking Library of Congress card number
015   National bibliography number
      a   National bibliography number
016   Linking national bibliography number
      a   Linking national bibliography number
020   Standard book number
      a   Standard book number
021   Linking standard book number
      a   Linking standard book number
025   Overseas acquisition number
      a   Overseas acquisition number
026   Linking OAN number
      a   Linking OAN number
035   Local system number
      a   Local system number
036   Linking local system number
      a   Linking local system number
040   Cataloging source
      a   Cataloging source
041   Language(s)
      0   Work contains more than one language
      1   Work is a translation
      a   Group of 3-character language codes needed to describe languages of the text or its translation
      b   Languages of summaries
042   Search code
      a   Search code
050   Library of Congress call number
      0   Book is in Library of Congress
      1   Book is not in Library of Congress
      a   Library of Congress classification number
      b   Book number
051   Copy, issue, offprint statement
      a   Library of Congress classification number
      b   Book number
      c   Copy information
060   National Library of Medicine call number
      0   Book is in National Library of Medicine
      1   Book is not in National Library of Medicine
      a   National Library of Medicine classification number
      b   Book number
070   National Agricultural Library call number
      0   Book is in National Agricultural Library
      1   Book is not in National Agricultural Library
      a   National Agricultural Library classification number
      b   Book number
071   National Agricultural Library subject category
      a   National Agricultural Library subject category
080   Universal Decimal classification number
      a   UDC number
081   British National Bibliography classification number
      a   BNB classification number
082   Dewey Decimal classification number
      a   DDC number
086   Supt. of Documents classification number
      a   Supt. of Documents classification number
090   Local call number
100   Personal name as main entry
      (Names may be established in conformity with the ALA or Anglo-American rules.)
      0   Forename only
      1   Single surname
      2   Multiple surname
      3   Name of family
      0   Main entry is not subject
      1   Main entry is subject
      a   Name
      b   Numeration
      c   Titles and other words associated with name
      d   Dates
      e   Relator
      k   Form subheading
      t   Title (of book)
110   Corporate name as main entry
      0   Surname (inverted)
      1   Place or place and name
      2   Name (direct order)
      0   Main entry is not subject
      1   Main entry is subject
      a   Name
      b   Each subordinate unit
      e   Relator
      k   Form subheading
      t   Title (of book)
111   Conference or meeting as main entry
      0   Surname (inverted)
      1   Place and name
      2   Name (direct order)
      0   Main entry is not subject
      1   Main entry is subject
      a   Name
      b   Number
      c   Place
      d   Date
      e   Subordinate unit in name
      g   Other information
      k   Form subheading
      t   Title (of book)
130   Uniform title heading as main entry
      ƀ   Null condition in first indicator
      0   Main entry is not subject
      1   Main entry is subject
      a   Uniform title heading
      t   Title (of a book)
240   Uniform title
      0   Not printed on LC card
      1   Printed on LC card
      a   Uniform title
241   Romanized title
      0   Does not receive title added entry
      1   Receives title added entry
      a   Romanized title
242   Translated title
      a   Translated title
245   Title statement
      0   No title added entry in this form
      1   Title added entry in this form
      a   Short title
      b   Remainder of title
      c   Transcription of remainder of title page up to next field
250   Edition statement
      a   Edition
      b   Additional information
260   Imprint
      0   Publisher is not main entry
      1   Publisher is main entry
      a   Place
      b   Publisher
      c   Date
300   Collation
      a   Pagination or volumes
      b   Illustration(s)
      c   Height
350   Bibliographic price
      a   Bibliographic price
360   Converted price
      a   Converted price
400*  Series note - personal name
      0   Forename only
      1   Single surname
      2   Multiple surname
      3   Name of family
      0   Author of series is not main entry
      1   Author of series is main entry
      a   Name
      b   Numeration
      c   Titles, other name-associated words
      d   Dates
      e   Relator
      k   Form subheading
      t   Title (of series)
      v   Volume or number
410*  Series note - corporate name
      0   Surname (inverted)
      1   Place or place and name
      2   Name (direct order)
      0   Author of series is not main entry
      1   Author of series is main entry
      a   Name
      b   Each subordinate unit
      e   Relator
      k   Form subheading
      t   Title (of series)
      v   Volume or number
411*  Series note - conference
      0   Surname (inverted)
      1   Place and name
      2   Name (direct order)
      a   Name
      b   Number
      c   Place
      d   Date
      e   Subordinate unit in name
      g   Other information
      k   Form subheading
      t   Title (of book)
      v   Volume or number
440*  Title
      a   Title
      v   Volume or number
490   Series untraced or traced differently
      0   Series not traced
      1   Series traced differently
      a   Series statement
500   General note
      a   General note
501   "Bound with" note
      a   "Bound with" note
502   Dissertation note
      a   Dissertation note
503   Bibliographic history note
      a   Bibliographic history note
504   Bibliography note
      a   Bibliography note
505   Formatted contents note
      0   "Complete" contents
      1   "Incomplete" contents

* Used only when series is traced in the same form.
      2   Partial contents
      a   Contents note
506   "Limited use" note
      a   "Limited use" note
520   Abstract or annotation
      a   Abstract or annotation
600   Personal name as subject added entry
      0   Forename only
      1   Single surname
      2   Multiple surname
      3   Name of family
      0   LC subject heading
      1   Subj. heading assigned for use in children's catalog
      2   NLM subject heading
      3   NAL subject heading
      a   Name
      b   Numeration
      c   Titles, other name-associated words
      d   Dates
      e   Relator
      k   Form subheading
      t   Title (of book)
      x   General subdivision
      y   Period subdivision
      z   Place subdivision
610   Corporate name as subject added entry
      0   Surname (inverted)
      1   Place or place and name
      2   Name (direct order)
      0   LC subject heading
      1   Subj. heading assigned for use in children's catalog
      2   NLM subject heading
      3   NAL subject heading
      a   Name
      b   Each subordinate unit
      e   Relator
      k   Form subheading
      t   Title (of book)
      x   General subdivision
      y   Period subdivision
      z   Place subdivision
611   Conference as subject added entry
      0   Surname (inverted)
      1   Place and name
      2   Name (direct order)
      0   LC subject heading
      1   Subj. heading assigned for use in children's catalog
      2   NLM subject heading
      3   NAL subject heading
      a   Name
      b   Number
      c   Place
      d   Date
      e   Subordinate unit in name
      g   Other information
      k   Form subheading
      t   Title (of book)
      x   General subdivision
      y   Period subdivision
      z   Place subdivision
630   Uniform title heading as subject added entry
      ƀ   Null condition in first indicator
      0   LC subject heading
      1   Subj. heading assigned for use in children's catalog
      2   NLM subject heading
      3   NAL subject heading
      a   Uniform title heading
      t   Title (of book)
      x   General subdivision
      y   Period subdivision
      z   Place subdivision
650   Topical subject added entry
      0   Not entered under place
      1   Entered under place
      0   LC subject heading
      1   Subj.
heading assigned for use in children's catalog
      2   NLM subject heading
      3   NAL subject heading
      a   Topical subject heading
      b   Name following place entry element
      x   General subdivision
      y   Period subdivision
      z   Place subdivision
651   Geographic name (not capable of authorship) as subject added entry
      0   Not entered under place
      1   Entered under place
      0   LC subject heading
      1   Subj. heading assigned for use in children's catalog
      2   NLM subject heading
      3   NAL subject heading
      a   Geographic name
      b   Geographic name following place entry element
      x   General subdivision
      y   Period subdivision
      z   Place subdivision
652   Political jurisdiction as subject added entry
      ƀ   Null condition in first indicator
      0   LC subject heading
      1   Subj. heading assigned for use in children's catalog
      2   NLM subject heading
      3   NAL subject heading
      a   Political jurisdiction
      x   General subdivision
      y   Period subdivision
      z   Place subdivision
690   Local subject headings
      ƀ   Reserved for user
      ƀ   Reserved for user
      a   Subject heading
      x   General subdivision
      y   Period subdivision
      z   Place subdivision
700   Personal name as added entry
      0   Forename only
      1   Single surname
      2   Multiple surname
      3   Name of family
      0   Alternative entry
      1   Secondary entry
      2   Analytical entry
      a   Name
      b   Numeration
      c   Titles, other name-associated words
      d   Dates
      e   Relator
      k   Form subheading
      t   Title (of book)
      u   Non-printing filing information
710   Corporate name as added entry
      0   Surname (inverted)
      1   Place or place and name
      2   Name (direct order)
      0   Alternative entry
      1   Secondary entry
      2   Analytical entry
      a   Name
      b   Each subordinate unit
      e   Relator
      k   Form subheading
      t   Title (of book)
      u   Non-printing filing information
711   Conference as added entry
      0   Surname (inverted)
      1   Place and name
      2   Name (direct order)
      0   Alternative entry
      1   Secondary entry
      2   Analytical entry
      a   Name
      b   Number
      c   Place
      d   Date
      e   Subordinate unit in name
      g   Other information
      k   Form subheading
      t   Title (of book)
      u   Non-printing filing information
730   Uniform title heading as added entry
      ƀ   Null
condition in first indicator 0 Alternative entry 1 Secondary entry 2 Analytical entry a Uniform title heading t Title u Non-printing filing information 140 Title traced differently from short title fJ Null condition in first indicator 80 Journal of Library Automation Vol. 2/ 2 June, 1969 0 Alternative entry 1 Secondary entry 2 Analytical entry a Title traced differently from short title 750 N arne not capable of authorship 0 Not entered under place 1 Entered under place 0 Alternative entry 1 Secondary entry 2 Analytical entry a Name or place entry element b Name following place entry element 800° Personal name-title series added entry 810• Corporate name-title series added entry 811° Conference-title series added entry 840° Title series added entry 900 Block of 100 numbers for local use 8. Extended ASCII Character Set for Roman Alphabet and Rcnnan- ized Non-Roman Alphabets 8.1 Scope A library character set for the roman alphabet and roman- ized non-roman alphabets necessitates a larger number of characters than are provided for in the 7 -bit American Standard Code for Information Interchange (ASCII) . In addition, many libraries only have a 6-bit capability. There- fore, it was necessary to develop a character set which would meet all of the following requirements: (a) leave the 7 -bit standard (ASCII) intact, (b) expand the 7-bit standard to include an 8th bit to provide additional chamcters, (c) provide a shift mechanism which would make it possible to use all of the characters defined in the 8-bit set in the 6-bit environment. This section describes such a character set. 8.2 Criteria Governing Selection of Characters 8.2.1 Frequency of occurrence of character 8.2.2 Degree of necessity in expressing character when it occurred 8.2.3 Possibility of substituting one character for another or of expressing a character by writing it out •TaKI in the 800'• aro used for series added entries traced differently from the series rtatement. 
With the exception that no aecond indicators are used in the SOO's, tho indicators and data element ldentiBen are tile same as those used with the 400' s. Appendix I 81 8.3 Digital Codes The correlation of the character set to digital form code is based upon the ASCII (American Standard Code for In- formation Interchange) Standard. In conformance with the design considerations of ASCII ( 7 -bit code), the char- acter set is also correlated to an 8-bit code and a 6-bit code. The basic digital form code for the character set is the 8-bit code (see Figure 1 ) . 8.3.1 The 8-bit code is an extended form of the standard 7-bit ASCII. Some of the standard ASCII charac- ters such as the braces or the backwards slash are not proposed for the character set. However, no characters will be substituted for these code posi- tions. Other characters such as diacritical marks will be left in their standard position (unused) and duplicated in another portion of the code set reserved for special characters and diacriticals. 8.3.2 The 7 -bit code will be derived from the 8-bit code by removing the 8th bit. Those characters which previously had a 0 in the 8th bit will be considered part of the standard 7 -bit ASCII set. Those with a 1 in the 8th bit will be considered part of the non- standard set. A SO (shift out) control character will be used to go from the standard set to the non-standard. The code will stay in the non- standard mode until a SI (shift in) control char- acter is reached. 7-bit 8-bit SI I USASCII I ~I'--------'-;8-th_b_i_t _-_0__JI SO (special characters ~I :'8th bit ,. 1 I and diacriticals) : ~-----------------~---------~ 8.3.3 The 6-bit code will be derived by removing the 6th bit and the 8th bit. The 8-bit code set will be divided into 4 sets as follows: Columns 2, 3, 6 & 7 = Standard Set (referred to as the "basic set" in the Proposed Standard for a Format for Bibliographic Information Interchange, p. 56 and in paragraph A.2 of this Appendix) Fig. 1. 
Proposed Extended ASCII Character Set --- Standard 6-bit set -----Non-standard set 1 •• •••·•·· Non-standard set 2 ~ ~ ~ 1 2 ~ ~ 1 ~ ~ ~ ¢ ~ ¢ INUL IDLE SP ~ ~ ¢ 1 1 ISOH IDC1 ~ ¢ 1 ¢ 2 ISTX IDC2 " ~ ~ 1 1 3 IETX IDC3 tl ¢ 1 ~ ~ 4 IEOT IDC4 $ ¢ 1 ¢ 1 5 IENQ. I NAK /. ~ 1 1 ~ 6 lACK ISYN & ~ 1 1 1 7 IBEL IETB 1 ¢ ¢ ¢ 8 IBS CAN . ( 1 ¢" ¢ 1 9 IHT El-i 1 ¢ 1 ¢ 10 I LF SUB * 1 ¢ 1 1 11 IVT ESC + 1 1 ¢ . ¢ 12 IFF FS J 1 ¢ 1 13 ICR IGS ~· - 1 i 1 ¢ 14 I SO I RS !l' • ~ I~ ~ - 1 1 ¢ 1 ~ 3 I 4 ¢ I @ A 2 B . 3 c 4 D 5 E 6 F 7 G 8 H 9 I J K ( L = H > N 5 p Q R s T u v w X y z [ \ J ¢ 1 ¢ 1 ¢ I~ 1 1 1 1 ¢ 1 6 I 7 ' 1 I P a. q b r c s d t e u f v g v h X i y J z k {3 1 m } 3 A 1 , n 1 8 1 ~ ¢ ¢ 9 1 ~ ~ 1 1 ~ 1 ¢ 10 !. H D p IE (E / ; ® ± ()" 11 }: fi!J d p re ce // .l £ '& 1 ~ 1 1 0" t.r I u- 1 1 ¢ ~ 12 1 1 ¢ 1 13 ' 1 1118B 1 1 7 I 1 1 6 T ¢ 1 5 s I 14 __ ./...!s.. .. '1 .) ... .. A ... ... v r .; ) ~ ..:. 1 1 1 1 15 ~_r_ ~~rs 2.l I I ? L~ .... l.,..._1 J o I nEt I I L .......... ~ .......... ~ 1 ~ ......... ~ ........ 1 4 3 2 1 BITS Key:(1)Redefined elsewhere in the set. (2)To be used as terminators or delimiters. (3)To be "used as shift codes for 6-blt set(non•1ocking). 1- 00 to I -.Q. t"'< .... ~ ~ '"'l ~ .... ~ -c;· ;:3 < 0 r- to ~ ._ § Sll ~ CD ffi Appendix I 83 Columns 0, 1, 4 & 5 = Non-standard set ( 1) Columns 10, 11, 14 & 15 =Non-standard set (2) Character 7B in the standard set will be used as a non-locking shift code to reach non-standard set ( 1); character 7/11 (column 7, row 11 of Figure 1) will shift to non-standard set ( 2). The presence of one of these codes will indicate that the next character is in one of the appropriate non-standard set. The code will then be automatically shifted back to the standard set. C. 
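The locking-shift conversion in paragraph 8.3.2 can be sketched in a few
lines.  This is an illustration only, assuming the usual ASCII values
for SO (0/14) and SI (0/15); the function names and sample code points
are ours, not part of the proposal.

```python
SO, SI = 0x0E, 0x0F  # ASCII shift-out / shift-in control characters

def to_7bit(codes_8bit):
    """Drop the 8th bit, bracketing each run of non-standard
    (8th bit = 1) characters with SO ... SI."""
    out, nonstandard = [], False
    for c in codes_8bit:
        high = bool(c & 0x80)
        if high and not nonstandard:
            out.append(SO)
            nonstandard = True
        elif not high and nonstandard:
            out.append(SI)
            nonstandard = False
        out.append(c & 0x7F)
    if nonstandard:
        out.append(SI)
    return out

def to_8bit(codes_7bit):
    """Reverse the transformation: restore the 8th bit for characters
    between SO and SI."""
    out, nonstandard = [], False
    for c in codes_7bit:
        if c == SO:
            nonstandard = True
        elif c == SI:
            nonstandard = False
        else:
            out.append(c | 0x80 if nonstandard else c)
    return out

original = [0x41, 0xE9, 0xEA, 0x42]   # standard, two non-standard, standard
seven = to_7bit(original)             # [0x41, SO, 0x69, 0x6A, SI, 0x42]
assert to_8bit(seven) == original     # the round trip is lossless
```

Because the shift is locking, a run of consecutive non-standard
characters costs only one SO/SI pair, which suits diacritic-heavy
romanized text.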
C.  PROPOSED PRELIMINARY COSATI GUIDELINES FOR THE IMPLEMENTATION OF
    THE PROPOSED AMERICAN STANDARD FOR A FORMAT FOR BIBLIOGRAPHIC
    INFORMATION INTERCHANGE ON MAGNETIC TAPE

(This implementation guide, prepared by a panel of the Committee on
Scientific and Technical Information, should be regarded as a Committee
working paper until approval by the Federal Council for Science and
Technology, to which it is in process of being presented.)

1.  LABELS
    1.1  HEADER LABELS
         1.1.1  VOLUME HEADER
                As specified in paragraph B.1.1.1 of this Appendix.
         1.1.2  FILE HEADER
                As specified in paragraph B.1.1.2 of this Appendix,
                except that the Set Identifier (Field 4) shall contain
                the characters "COSATI."

2.  DELIMITER
    The delimiter shall consist of the "unit separator" (ASCII
    character 1/15).

3.  INDICATOR
    The indicator is a two-character code consisting of basic
    characters specifying the origin or authority for the data in each
    variable field.  The codes as presently assigned are as follows:

    Federal Agency                                           Code

    Legislative Branch
      General Accounting Office                              GG
      Government Printing Office                             GP
      Library of Congress                                    LI

    Judicial Branch
      Administrative Office of the U.S. Courts               AO
      The Supreme Court of the U.S.                          SI

    Executive Branch
      American Battle Monuments Commission                   AC
      Appalachian Regional Commission                        AR
      Atomic Energy Commission                               AI
      Bureau of the Budget                                   BO
      Canal Zone Government                                  CV
      Central Intelligence Agency                            CL
      Civil Aeronautics Board                                CC
      Commission of Fine Arts                                CI
      Council of Economic Advisers                           CF
      Delaware River Basin Commission                        EE
      Department of Agriculture                              AL
      Department of Commerce                                 CO
      Department of Defense
        Office, Secretary of Defense (includes Defense
          Agencies not indicated below)                      DD
        Department of Army                                   DA
        Department of Navy                                   DN
        Department of Air Force                              DF
        Defense Supply Agency                                DS
        Defense Atomic Support Agency                        DH
        Defense Communications Agency                        DK
      Department of Health, Education, & Welfare             HH
      Department of Housing and Urban Development            HU
      Department of Interior                                 IN
      Department of Justice                                  JU
      Department of Labor                                    LA
      Department of State                                    SU
        Agency for International Development                 SV
        Peace Corps                                          SW
      Department of Transportation                           TO
      Department of Treasury                                 TR
      District of Columbia Government                        CZ
      Export-Import Bank of Washington                       EI
      Farm Credit Administration                             FC
      Federal Aviation Agency                                FA
      Federal Coal Mine Safety Board                         FG
      Federal Communications Commission                      FE
      Federal Deposit Insurance Corporation                  FK
      Federal Home Loan Bank Board                           FM
      Federal Maritime Commission                            FO
      Federal Mediation and Conciliation Service             FQ
      Federal Power Commission                               FS
      Federal Reserve System                                 FU
      Federal Trade Commission                               FW
      Foreign Claims Settlement Commission                   FI
      General Services Administration                        GS
      Indian Claims Commission                               IK
      Interstate Commerce Commission                         IC
      National Aeronautics and Space Administration          NC
      National Aeronautics and Space Council                 NF
      National Capital Housing Authority                     NH
      National Foundation on Arts and Humanities             AU
      National Labor Relations Board                         NI
      National Mediation Board                               NM
      National Security Council                              NO
      National Science Foundation                            NS
      Office of Economic Opportunity                         OE
      Office of Emergency Planning                           OH
      Office of Science and Technology                       OS
      Office of Special Representative for Trade
        Negotiations                                         TU
      Panama Canal Company                                   PC
      Post Office Department                                 PO
      Railroad Retirement Board                              RR
      Renegotiation Board                                    RE
      Saint Lawrence Seaway Development Corporation          SX
      Securities and Exchange Commission                     SL
      Selective Service System                               SR
      Small Business Administration                          SF
      Smithsonian Institution                                SO
      Subversive Activities Control Board                    SC
      Tax Court of the United States                         TC
      Tennessee Valley Authority                             TX
      U.S. Arms Control and Disarmament Agency               AF
      U.S. Civil Service Commission                          CR
      U.S. Information Agency                                US
      U.S. Tariff Commission                                 UW
      Veterans Administration                                VA
      Virgin Island Corporation                              VI

4.  LEADER
    The following table indicates the data elements in the leader and
    their permissible values and formats.

    Record Length          Decimal digits, right justified, with
                           leading zeros
    Status                 ∅, a
    Type-of-Record         As defined in paragraph A.5 of this Appendix
    Bibliographic Level    As defined in paragraph A.4 of this Appendix
    Indicator Count        "2"
    Delimiter Count        "1"
    Base Address of Data   Decimal digits, right justified, with
                           leading zeros
    Entry Map              "4500"

5.  DIRECTORY
    Each directory entry consists of the following data elements:

    Tag                          3 decimal digits
    Length                       4 decimal digits, right justified,
                                 leading zeros
    Starting Character Position  5 decimal digits, right justified,
                                 leading zeros

    The directory ends with a field terminator.  The entries in the
    directory shall be sequenced in ascending numeric order by the tag.

6.  CONTROL FIELD
    TAG          001
    DESIGNATION: Record Identification Number
    CONTENT:     An identification number is assigned for purposes of
                 file control by the specific organization which is
                 distributing the tape.  The number may be newly
                 assigned, may be the accession number assigned by a
                 documentation center, or may be the report number
                 assigned either by the originating organization or the
                 monitoring organization, depending on the practice of
                 the organization which writes the tape.  Examples
                 might be: AD-635 050, PB-170 275, UCRL-1376.
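The directory rules in sections 4 and 5 above lend themselves to a short
parsing sketch.  This is an illustration only: the function name is
ours, and the field-terminator value shown is an assumption, since the
guidelines name the terminator without giving its code at this point.

```python
FT = "\x1e"  # assumed field-terminator character; the guidelines do not give one here

def parse_directory(directory):
    """Split a directory string (terminator removed) into
    (tag, length, starting position) triples.  Each entry is 12
    characters: a 3-digit tag, a 4-digit length, and a 5-digit starting
    character position, i.e. the "4500" entry map."""
    assert len(directory) % 12 == 0, "entries are exactly 12 characters"
    entries = [(directory[i:i + 3],
                int(directory[i + 3:i + 7]),
                int(directory[i + 7:i + 12]))
               for i in range(0, len(directory), 12)]
    tags = [tag for tag, _, _ in entries]
    assert tags == sorted(tags), "entries must ascend by tag"
    return entries

raw = "001001200000" + "100002400012" + FT
print(parse_directory(raw.rstrip(FT)))   # [('001', 12, 0), ('100', 24, 12)]
```

The sequencing assertion enforces the rule that directory entries appear
in ascending numeric order by tag.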
7.  VARIABLE FIELD DATA ELEMENTS

    TAG          100
    DESIGNATION: Type of Item
    CONTENT:     This defines whether the item is cataloged as a
                 monograph, serial title, journal article, patent,
                 technical report, audio-visual material, etc.

    TAG          110
    DESIGNATION: Security Classification of Item
    CONTENT:     This is the alphabetic code which properly specifies
                 the security classification of the item.  The codes
                 available include:
                   U  unclassified
                   C  confidential
                   S  secret

    TAG          120
    DESIGNATION: Downgrading Authority Code
    CONTENT:     Codes should be taken from DoD Industrial Security
                 Manual, App. 2, Automatic Downgrading and
                 Declassification System.

    TAG          130
    DESIGNATION: Distribution Limitation Statements
    CONTENT:     These are limitations (other than security
                 classification) on the availability of the item to the
                 public.  These limitations might include: not
                 reproducible; only available on loan; only available
                 to certain recipients.

    TAG          140
    DESIGNATION: Distribution Limitation Codes
    CONTENT:     Codes corresponding to information in field 130.

    TAG          150
    DESIGNATION: Cataloging Organization
    CONTENT:     Acronym or name of cataloging organization.

    TAG          160
    DESIGNATION: Announcement Journal Reference
    CONTENT:     This designates the specific issue of the announcement
                 journal in which this record is published.

    TAG          170
    DESIGNATION: Source Report Numbers
    CONTENT:     These report numbers are the numbers assigned to the
                 report by the originating organization.  Examples
                 might be: UCRL-1035, RM-4244-PR, TM/ADC/820/03.

    TAG          180
    DESIGNATION: Monitoring Organization Report Numbers
    CONTENT:     These are the report numbers assigned by the
                 monitoring or sponsoring organizations.  Examples
                 might be: NASA-CR-263, ASD-TDR-63-24.

    TAG          190
    DESIGNATION: Other Report Identification Numbers
    CONTENT:     These are other identification numbers, such as other
                 organization identification numbers, which do not fall
                 into the other categories.

    TAG          200
    DESIGNATION: Project Numbers
    CONTENT:     Included here are the project numbers under which the
                 work was performed.  A project is a grouping of tasks
                 or efforts directed toward a single end result.  The
                 project is the basic building-block used in planning,
                 reviewing, and reporting of performance of research
                 and development programs.

    TAG          230
    DESIGNATION: Security Classification of Title
    CONTENT:     This is the classification of the content of data
                 element 240.  Use codes shown in data element 110.

    TAG          240
    DESIGNATION: Classified Title
    CONTENT:     This is the classified title in the vernacular,
                 transliterated if necessary.  This title is integral
                 to the work.

    TAG          250
    DESIGNATION: Unclassified Translated Title
    CONTENT:     This title is supplied by the cataloger if not on the
                 document.

    TAG          260
    DESIGNATION: Alternate Title Entry
    CONTENT:     This is an alternate title derived from the secondary
                 part of the title as given on the title pages, or a
                 catchword title or subtitle.

    TAG          270
    DESIGNATION: Index Annotation
    CONTENT:     This is an edited or supplied version of the title
                 that more accurately reflects the subject content of
                 the work than the original title.

    TAG          280
    DESIGNATION: Personal Names
    CONTENT:     These are the names of people associated with the
                 responsibility for the intellectual content of the
                 item.  This might include authors, compilers,
                 illustrators, translators, etc., but it excludes
                 personal names used as subjects.  Data will be entered
                 in the form Last Name, Initial. Initial.  Examples
                 might be: Smith, J. R.; Roberts, A. B.

    TAG          290
    DESIGNATION: Personal Name Affiliation
    CONTENT:     This is the affiliation of each name in data element
                 280 if that affiliation is different from the
                 Corporate Author.
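The "Last Name, Initial. Initial." entry form prescribed for data
element 280 can be illustrated with a small sketch; the helper name and
the assumption that a name arrives as "First Middle Last" are ours.

```python
def to_entry_form(full_name):
    """Reduce a 'First Middle Last' string to the data element 280
    form 'Last, F. M.'"""
    *forenames, surname = full_name.split()
    initials = " ".join(f"{name[0]}." for name in forenames)
    return f"{surname}, {initials}" if initials else surname

print(to_entry_form("John Robert Smith"))   # Smith, J. R.
```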
    TAG          300
    DESIGNATION: Corporate Authors
    CONTENT:     These are the names of the organizations associated
                 with the intellectual content of the work.  They do
                 not include the publisher or personal affiliation or
                 sponsor except where they are also the Corporate
                 Author.

    TAG          310
    DESIGNATION: Corporate Author Codes
    CONTENT:     These are the codes which correspond to the content of
                 data element 300 (if present).

    TAG          320
    DESIGNATION: Contract Numbers
    CONTENT:     The contract number is an alphanumeric identifier of
                 the contract cited in the report which designates the
                 financial support of the report.  Some examples might
                 be: AF 33(657)-8146, DA 36-039-sc-87274.

    TAG          330
    DESIGNATION: Grant Numbers
    CONTENT:     The grant number is an alphanumeric identifier of the
                 grant cited in the report which designates the
                 financial support of the report.  Some examples might
                 be: NIH-5R01-CA-03157-02, NSF-GP-2528.

    TAG          340
    DESIGNATION: Sponsoring Organizations
    CONTENT:     Sponsors include any and all of the following: true
                 sponsors, who furnished financial support and issued
                 the contracts; monitors, who supervised compliance
                 with the contract; and beneficiaries, for whose
                 benefit the work was done and the report written.

    TAG          350
    DESIGNATION: COSATI Subject Category Codes
    CONTENT:     These are alphanumeric codes used to group subject
                 terms according to broad subject areas established by
                 COSATI.

    TAG          360
    DESIGNATION: Other Subject Heading Codes
    CONTENT:     These are alphanumeric codes used to group subject
                 terms according to broad subject areas which have been
                 established by organizations other than COSATI.

    TAG          370
    DESIGNATION: Primary Subject Term Security Classification
    CONTENT:     This is the classification code of the subject term in
                 data element 380 which has the highest classification.
                 Use codes shown in data element 110.

    TAG          380
    DESIGNATION: Controlled Primary Subject Terms
    CONTENT:     This consists of vocabulary taken from the controlled
                 list of subject terms which describes the prime
                 subject content and appears as a heading in the
                 bibliography.

    TAG          390
    DESIGNATION: Secondary Subject Term Security Classification
    CONTENT:     This is the classification code of the subject term in
                 data element 400 which has the highest classification.
                 Use codes shown in data element 110.

    TAG          400
    DESIGNATION: Controlled Secondary Subject Terms
    CONTENT:     This consists of vocabulary taken from the controlled
                 list of subject terms which describes the subject
                 content and is available in the system of the
                 organization but does not appear in the bibliography.

    TAG          410
    DESIGNATION: Security Classification of Provisional Subject Terms
    CONTENT:     This is the classification code of the provisional
                 subject term in data element 420 which has the highest
                 classification.  Use codes shown in data element 110.

    TAG          420
    DESIGNATION: Provisional Subject Terms
    CONTENT:     These are terms which may be applied to
                 subject-classify the content of the work but are
                 usually taken from the work itself and not from any
                 authorized listing or thesaurus.  They may be terms
                 which will eventually be classified as controlled
                 vocabulary, depending on their frequency and
                 consistency of use or importance as a retrieval tag,
                 or they may stay in the uncontrolled vocabulary group
                 indefinitely.

    TAG          430
    DESIGNATION: Security Classification of Special Retrieval Terms
    CONTENT:     This is the classification code of the special
                 retrieval term in data element 440 which has the
                 highest classification.  Use codes shown in data
                 element 110.

    TAG          440
    DESIGNATION: Special Retrieval Terms
    CONTENT:     These are terms which designate project names,
                 equipment nomenclature, trade names, catch words.

    TAG          450
    DESIGNATION: Source Journal Citation
    CONTENT:     This contains the source journal title, the volume and
                 issue number, the pages on which the article appears,
                 and the date of the journal issue.  Definition is
                 provided as to whether the item is a reprint or if the
                 source journal is in another language.

    TAG          460
    DESIGNATION: Original Language of Item
    CONTENT:     This is the language (or languages) in which the item
                 originally appeared, if different from data element
                 number 470.

    TAG          470
    DESIGNATION: Present Language of Item
    CONTENT:     This is the language (or languages) in which the item
                 appears at present.  It may be the original language,
                 or it may be the result of translation.

    TAG          480
    DESIGNATION: Imprint Date of Item
    CONTENT:     This is the date of current publication of the item.
                 This would include new imprints, translation dates,
                 date of revision, etc.

    TAG          490
    DESIGNATION: Original Date of Item
    CONTENT:     This contains the date of the original completion or
                 publication of the item, only when different from the
                 date of imprint.

    TAG          500
    DESIGNATION: Place of Publication
    CONTENT:     This is the city of publication and includes state or
                 country when necessary for identification.

    TAG          510
    DESIGNATION: Country of Origin of the Intellectual Effort
    CONTENT:     This is the country where the original work was done
                 and is not necessarily the country where the
                 publication or translation occurred.

    TAG          520
    DESIGNATION: Number of Pages
    CONTENT:     This is the number of pages and/or volumes as
                 determined by current cataloging procedures.

    TAG          530
    DESIGNATION: Availability and Price
    CONTENT:     This is the acronym or name of the specific
                 organization, if any, from which the document is
                 available, the hardcopy price, and the microform
                 price.

    TAG          540
    DESIGNATION: Descriptive Note
    CONTENT:     This is a title without subject content which
                 describes the type of item, such as Final Report,
                 Progress Report for the Period . . . , Quarterly
                 Technical Status Report, etc.

    TAG          550
    DESIGNATION: Bibliography Note
    CONTENT:     This is a note which indicates the presence of
                 bibliographic information as part of the contents of
                 the work.
    TAG          560
    DESIGNATION: Dissertation Note
    CONTENT:     This is a note which identifies the work as an
                 academic dissertation presented in partial fulfillment
                 of requirements for a degree.  It usually names the
                 institution or faculty to which the dissertation was
                 presented and the degree for which the author was a
                 candidate.

    TAG          570
    DESIGNATION: Contents Note
    CONTENT:     This is a note which lists either all or part of the
                 contents of a work, such as authors and titles, in
                 order to bring out important parts of the work not
                 mentioned in the main title.

    TAG          580
    DESIGNATION: Notes, General
    CONTENT:     These are notes not covered elsewhere.

    TAG          590
    DESIGNATION: Owner or Assignee
    CONTENT:     These are the names of owners or assignees of the
                 patent.

    TAG          600
    DESIGNATION: Security Classification of Abstract
    CONTENT:     Use the codes shown in data element 110.

    TAG          610
    DESIGNATION: Languages of Abstracts or Summaries
    CONTENT:     If different from data element 470.

    TAG          620
    DESIGNATION: Abstract
    CONTENT:     This is free form as it occurs in the file of the
                 organization writing the tape, in which the source of
                 the abstract is identified as the author of the item
                 or the organization generating the abstract.  For
                 contents notes see data element 390.

APPENDIX II

A PROPOSED UTILIZATION OF THE SUBRECORD DIRECTORY AND SUBRECORD
RELATIONSHIP FIELDS IN THE PROPOSED AMERICAN STANDARD FOR A FORMAT FOR
BIBLIOGRAPHIC INFORMATION INTERCHANGE ON MAGNETIC TAPE

The following Appendix is not part of the proposed Standard but is
included to illustrate a method for handling subrecords within a
bibliographic record.

1.  Subrecord Directory
    A subrecord directory will be used when a bibliographic record
    consists of more than one subrecord.  The subrecord directory, when
    present, will contain entries comprising a directory of the
    directory entries.

    1.1  Entries in Subrecord Directory
         The format of each entry in the subrecord directory is as
         shown (character positions in parentheses):

         TAG, comprising type-of-record, bibliographic
           level, and tag ID                                 (0-2)
         LENGTH                                              (3-6)
         STARTING CHARACTER POSITION                         (7-11)

         1.1.1  Tag
                The tag is associated with a complete subrecord.
                1.1.1.1  Tag ID
                         The tag ID is a data element consisting of one
                         basic character used to differentiate tags for
                         multiple subrecords describing bibliographic
                         units which have the same type-of-record and
                         bibliographic level codes (see 3.2.4 and
                         3.2.5).
                1.1.1.2  Type-of-Record and Bibliographic Level
                         When the subrecord does not describe a primary
                         bibliographic unit, the type-of-record and
                         bibliographic level are assigned as though it
                         did.
         1.1.2  Length of Subrecord Directory
                The length of that portion of the directory associated
                with the subrecord.  (This value is always a multiple
                of twelve and, when divided by twelve, equals the
                number of entries associated with the subrecord.)
         1.1.3  Starting Character Position
                The starting character position is that of the first
                entry in the directory which pertains to the subrecord.
                It is a five-digit decimal number relative to the first
                character of the bibliographic record.

2.  Subrecord Relationship Field
    A subrecord relationship field will be present if, and only if, a
    subrecord directory is present.  The subrecord relationship field,
    when present, shall contain fixed fields which are used to indicate
    the relationships of subrecords to each other.

    2.1  Relationship Fields
         The format of each relationship field is as shown (character
         positions in parentheses):

         TAG                       (0-2)
         RELATIONSHIP INDICATOR    (3-5)
         TAG                       (6-8)
         TAG                       (9-11)

         If a subrecord bears the same relationship to two subrecords,
         they may both be shown; otherwise the second tag field is
         padded with the padding character.  There is no limit to the
         number of relationship fields or to the number of
         relationships which may involve a specific subfield.  The
         relationship fields may be used to develop, and define tags
         for, hierarchies of subfields.
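A minimal sketch of pulling apart one 12-character relationship field as
laid out in section 2.1.  The function name and the sample tag and
indicator values are invented, and the blank padding character is an
assumption; the proposal defines only the character positions.

```python
PAD3 = "   "  # assumed padding for an unused second tag field

def parse_relationship(field):
    """Return (tag, relationship indicator, related tags) for one
    relationship field: tag at positions 0-2, a three-character
    indicator at 3-5, and one or two related tags at 6-8 and 9-11."""
    assert len(field) == 12
    related = [t for t in (field[6:9], field[9:12]) if t != PAD3]
    return field[0:3], field[3:6], related

print(parse_relationship("A01CONB01   "))   # ('A01', 'CON', ['B01'])
```

Because every field is exactly twelve characters, a record's full set of
relationship fields can be scanned the same way the directory is, in
fixed-size steps.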
    One bibliographic record may contain no more than 64 subrecords
    with the same type-of-record and bibliographic level.

         2.1.1  Tag
                The tags in the subrecord relationship field have the
                same format as those in the subrecord directory.
         2.1.2  Relationship Indicator
                A relationship indicator is a data element consisting
                of three (3) basic characters used to indicate the
                relationship between subfields.

BOOK REVIEWS

The MARC Pilot Project; Final Report . . . prepared by Henriette D.
Avram.  Washington, Library of Congress, 1968.  183 pp.  $3.50.

MARC Manuals Used by the Library of Congress, prepared by the
Information Systems Office, Library of Congress.  Chicago, American
Library Association, 1969.  335 pp.  $7.50.

The first of these two important publications is a technical report of
high quality.  Its purpose is to describe in detail the history,
objectives, system design, operation, costs, and findings of the
experimental pilot project.  It attains its purpose admirably; this
report will long be the classic document on the first major experiment
in the use of a machine-readable cataloging record by a group of
libraries.

Mrs. Avram has included sufficient detail to enable the reader to
understand exactly how the project operated.  Procedures could be
reproduced from the information given.  For the many who will be using
MARC I or MARC II data for experiment or operations, complete
information on both formats is included.  Four calculations of input
costs yielded unit costs ranging from $2.26 to $1.31.  If the cost of
computer processing is subtracted from $1.31, the result is $.99, or
double the approximate average of conversion costs reported from
several other centers.

Reports from seventeen participants constitute an appendix.  Some
accomplished nothing, others experimented with the tapes, while a third
group used the data in routine operations.
Of the participants' reports, those from the University of Toronto
Library and the Washington State Library are the most detailed and
contain the most useful statistical data.

MARC Manuals is an indispensable publication for any library
contemplating use of, or using, MARC II tapes.  The manuals are four:
1) "Subscriber's Guide to the MARC Distribution Service," 76 pp.;
2) "Data Preparation Manual: MARC Editors," 218 pp.; 3) "Transcription
Manual," 22 pp.; and 4) "Computer and Magnetic Tape Unit Usability
Study," 18 pp.  This publication is the master guide to use of MARC II
records.

The Government Printing Office required three-quarters of a year to
produce The MARC Pilot Project while anxious users waited.  The
American Library Association needed hardly a month to produce the MARC
Manuals.  Admittedly this publication performance is new for ALA, but
it should receive long and loud applause.  Computerization has
introduced a factor of timeliness into publication, and it is
gratifying that ALA recognizes the fact.

Frederick G. Kilgour

Bibliography of Research Relating to the Communication of Scientific
and Technical Information.  Edited by Jay Hillary Kelley, Charles L.
Bernier and Judith C. Leondar.  Bureau of Information Sciences
Research, Graduate School of Library Service, Rutgers, The State
University.  Rutgers University Press, New Brunswick, N.J., 1967.
3510 pages.

Do we need a review of a bibliography already two years old?  The
Editor of JLA says yes.  More importantly, can we find good use for the
bibliography under review?  In this case, yes.  Its scope is both less
and more than the title indicates: less, because "communication" here
means documentation and excludes direct, immediate communication; more,
because it extends far beyond merely the documentation of science and
technology to information processes per se, though not to all of
information science.  Psycholinguistics and epistemology seem to be
ignored, and logic is given short shrift.
From the seven existing major bibliographies listed at the end of this
review, from twenty abstracting and indexing services, and from nearly
300 journals, the compilers have selected items published during the
years 1955-1965, in nine categories: 1) research results, 2) new
theories, 3) identifiable breakthroughs, 4) incremental gains in
information sciences, services, and systems, 5) developments identified
as new by the authors reporting them, 6) comprehensive reviews,
7) bibliographies, 8) evaluative articles, and 9) directories to
current research.  Excluded are items of purely historical,
biographical, speculative or entertainment value, as well as
bibliographies or literature surveys of fields outside Information
Science (IS).

These criteria, and the book's subject classification scheme, are
themselves useful, and they reflect considerable thought, even though
the user is sure to find instances where: 1) items included do not seem
to measure up to the criteria, or 2) he will disagree with the
structure of the classification scheme.  However, these faults are
inherent in the bibliographic activity, inexact science that it is.

The introduction offers as the project's rationale some interesting and
provocative hypotheses.  One relates to the epidemic nature of progress
in IS, i.e., that progress comes through a few identifiable
discoveries.  More basic is their assumption that "well-known
bibliographies, reviews, workers, and organizations were identifiable
and needed representation."  (It is possible to argue that if they are
identifiable through the literature, they are likely to be already
identified, at least by the people who really need them, and that a
general bibliography is not needed.  But the worker oriented to the
literature of IS, whether as documentalist, librarian, or as teacher,
student or researcher in IS, will probably be glad anyway to have so
much of it in one place.)

Selection is slanted to the most current work, on the assumption that
viable earlier contributions will be identified through citations.  The
editors postulated that "plagiarism, duplication, and repetition of
work were so rampant that many potential items for the bibliography
could be rejected on this basis."  By creating in advance a
classification scheme for IS, and then placing the items selected in
the classes, they predicted that it would be possible to identify gaps
in the field where more research is needed.  The means for doing so are
not discussed, and left unanswered is the question: How do we determine
the right amount of publication for each class?

The result is a bibliography, of some 3,700 items chosen from about
30,000 considered, intended as a guide rather than an exhaustive
compilation.  If the judgments of the editors stand the test of time,
having less is more.  The prospect of obsolescence, however, haunts
this bibliography as it does all others, and it highlights the need for
bibliographic tools that can be more easily updated by both addition
and purging, like the ill-fated Automation Reporter, a looseleaf
service in this field, discontinued for lack of support.  For a
profession which seeks to solve other people's information problems, IS
people are often slow to get the word.  But this is an indictment of
the whole field, not specifically the group at Rutgers, who have
provided a useful tool, if not the most useful one imaginable.

Efficient use of the book is likely to be impaired by its appearance.
Photo-offset reproduction of greatly reduced typescript is not ideal
for a reference book such as this.  Where economy dictates its use, a
little imagination and quality control, not evident here, can do a
great deal to overcome its faults.  Here, the printing is too light.
There is nothing done to set off such elements as author or title.
Item numbers appear at the right margin in all cases; hence they are half the time buried in the gutter. The ratio of pages of index to text is appropriately high, though of course no one knows what an optimum would be. There is about a page of author index to four pages of bibliography, and a slightly greater proportion of subject indexing. Shortcomings aside, this promises to be a useful bibliography. The editors do not make it clear if they intend it to be more than that, for example, the basis for a study of formal characteristics of IS literature. If not, they should consider doing so. The seven major bibliographies mentioned above were completely searched for this bibliography. They are:

Balz, C. F. and R. H. Stanwood, compilers. Literature on information retrieval and machine translation. IBM, 117 pp., 2965 ref., 1962.

Janaske, P. C., ed. Information handling and science information, a select bibliography 1957-1961. Washington, D. C., American Institute of Biological Sciences, 1121 ref., 1962.

National Bureau of Standards, Research Information Center and Advisory Service on Information Processing (RICASIP) [Computer printout of references and indexes] Washington, D. C., National Bureau of Standards, 11 parts, approximately 18,000 ref., June 16 and July 15, 1965.

Book Reviews 99

Neeland, F., ed. A bibliography on information science and technology for 1965. Santa Monica, Calif., Systems Development Corp., 3 parts, 1750 ref., 1965.

Snodey, S. R., compiler. Information retrieval: systems and technology, a literature survey. North American Aviation, Inc., Space and Systems Div., 272 pp., 1914 ref. (SID 63-199), Jan. 15, 1963.

Spangler, M., compiler & ed. General bibliography on information storage and retrieval. Phoenix, General Electric Co., Computer Dept., 1550 ref., 1962.

Zell, H. M. and R. J. Machesney, compilers & ed. An international bibliography of non-periodical literature on documentation and information. Oxford, Robert Maxwell & Co.
Ltd., 1555 ref., 1965.

Joseph C. Donohue

Evaluation of the Medlars Demand Search Service, by F. W. Lancaster. U. S. Department of Health, Education and Welfare, Washington, D. C., January 1968. 276 pp.

MEDLARS, a computer-based information storage and retrieval service of the medical literature, represents a very significant effort in the management of the information explosion in the health sciences. The MEDLARS system in itself is quite complex, and this study represents an attempt to evaluate the effectiveness of the storage and retrieval from the data base, which now numbers more than 800,000 citations from 2,300 journals from all over the world dating since January 1964. The study was designed to evaluate the factors related to the requirements of the user: coverage, recall power, precision, response time, format; and the effort that the user must expend to evoke a satisfactory response from the system. Emphasis in this report was upon recall and precision. The study was based on 25 to 30 retrieved citations, the effectiveness of which was evaluated by the users. Of 299 searches studied, the system was operating at 57.7% recall of the major relevant citations from the available data base, and 54.4% precision as judged relevant by the requesters. The more comprehensive the recall, the less precise is the output. In addition to a determination of effectiveness, equally important was analysis of the factors contributing to failure. The principal causes were related to the failure of the index language, the indexing subsystem, searching, and the interaction between the user and the system. The author concludes with a number of considerations for enhancement of the effectiveness of the MEDLARS system. The author and the advisory committee are to be commended upon the depth of their evaluation, the objectivity of their appraisal and their thoughtful suggestions for improvement.
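The recall and precision percentages quoted above are the two standard retrieval ratios. The sketch below shows how such figures are computed; the search counts used are illustrative inventions, not data from the Lancaster study.

```python
def recall(relevant_retrieved, relevant_in_data_base):
    """Proportion of the relevant citations available in the data base
    that a search actually retrieved."""
    return relevant_retrieved / relevant_in_data_base

def precision(relevant_retrieved, total_retrieved):
    """Proportion of the retrieved citations that the requester
    judged relevant."""
    return relevant_retrieved / total_retrieved

# Hypothetical search: 276 citations retrieved, of which 150 were judged
# relevant, out of 260 relevant citations available in the data base.
print(round(recall(150, 260), 3))     # 0.577
print(round(precision(150, 276), 3))  # 0.543
```

The trade-off the review notes falls directly out of these definitions: retrieving more citations can only raise the recall numerator at the cost of inflating the precision denominator.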
Such a complex information system should be under continuous self-appraisal if it is to meet the urgent needs of the scientist as he deals with the burgeoning health sciences information.

John A. Prior

Library Effectiveness: A Systems Approach, by Philip M. Morse. The M.I.T. Press, Cambridge, Massachusetts, 1969. 207 pp. $10.00.

As professor of theoretical physics at the Massachusetts Institute of Technology, as a director of M.I.T.'s Computing Center, Operations Research Center, and Project MAC, and as the first president of the Operations Research Society of America, Philip Morse has been a key figure in the many scientific developments which are now playing such an important role in the design of information systems. His abiding interest in the analysis and improvement of libraries is less well-known, and it is fortunate that he has made available a detailed account of his seminal work in this area. The present book had its origins in a series of student projects which used the M.I.T. Library as a laboratory for the application of operations research methods. Morse has selected several mathematical models for exposition with ample verbal explanation of their theoretical implications and their practical application in explaining and predicting user behavior in the M.I.T. Science Library. The number and kinds of tasks performed by library visitors is shown to follow a geometric-multinomial pattern, not unlike a game of craps. The essentially random demand for, and utilization of, library services is shown to give rise to a queuing or interference situation not unlike a telephone switchboard, where models are available to help predict the effect of providing duplicate services, usage restrictions, and reservations, and to help account for the possibilities of the user's balking or becoming discouraged.
Finally, the random usage of books is shown to have a mean bias with age, especially in the early years, which can be modelled by a Markov chain whereby book usage settles down in an exponential fashion to some residual or "steady state" level of usage in old age. The model is used to examine book retirement policies. In all of these models, approaches employing probability are emphasized, but the relationships are kept simple enough to allow for meaningful comparisons and combinations of different classes of users and library materials. Some of the observations Morse is able to make about the differences among biologists, chemists, mathematicians, and physicists as library users are among the most interesting results of his analysis. Unfortunately, the absence of statistical tests of significance makes it necessary to accept many of these results as useful hypotheses in need of further validation. On page 141, Morse says that he anticipates "comments that are sure to be made about the cavalier way we have handled the model and the data. . . . Our object was to arrive at a model simple enough so results could be obtained graphically or by slide rule. Accuracy is not often important in reaching policy decisions: order-of-magnitude figures are far better than none. . . . But, as the library becomes more 'mechanized' or 'computerized' these data will become enormously easier to collect, if the computer system is designed to gather the needed data" (author's italics). He goes on to say: "It is the author's belief, based on discouraging experience, that neither the computer experts nor the librarian (for different reasons) really know what data would be useful for the librarian to have collected, analyzed, and displayed, so he can make decisions with some knowledge of what the decision implies.
What is needed before the computer designs are frozen is for models of the sort developed in this book to be played with, to see which of them could be useful and to see what data are needed and in what form, in order that both models and computers can be used most effectively by the librarian." Morse has addressed this book to both librarians and system analysts as an experimental but much needed venture. To the analyst it represents a good first attempt at modelling the complexities of a library and points the way toward more sophisticated techniques and more experimental work. To the librarian it provides some alternative to blind automation and a glimmer of hope that the evaluative techniques will come forth that are so badly needed to judge and control the efficacy of the new computer-aided systems being proposed.

F. F. Leimkuhler

The Role of the Library in Relation to Other Information Activities: A State of the Art Review, by Anne F. Painter. U. S. Army, Office of Chief of Engineers, Washington, D.C. 1968. (CTISA Project, Rpt. No. 23.)

At one time the controversy over libraries and "information centers" was of interest to many of us. The "Weinberg Report" could draw a crowd at any professional meeting but, thank goodness, such issues lose their interest and, one hopes, we go on to more productive work. Differentiating between libraries and "information centers" does not seem to this reviewer to be art. Nor does it seem to be in such a state as to be worth reviewing. Nevertheless, Professor Painter has produced a large bibliography, arranged both alphabetically and by subject, preceded by some fifty rather wordy pages. The general conclusion, that libraries and "information centers" are and should be performing the same tasks to a greater and greater degree, speaks to an issue no longer of great interest.
A literature survey of any kind can get tedious, and one which reviews what has been written about a dead issue, as this publication does, becomes extremely dull. The ponderous style of official reports is present, and the effort required to wade through the jargon is rewarded by neither fresh insight nor perceptive evaluation. The publication is recommended to those who collect bibliographies on the subject and collectors of library science who exercise but little selectivity.

Hugh C. Atkinson

BNB MARC Documentation Service Publications Nos. 1 and 2. London, Council of the British National Bibliography, Ltd., 1968. Part 1, £2; Part 2, draft.

These admirable publications, presented by R. E. Coward, describe, explain, and discuss essential characteristics of BNB MARC records. They constitute a more comprehensive presentation of information about MARC than has heretofore appeared as an integrated exposition. They are particularly valuable for their explanations of details of MARC format and of cataloging practices. In addition, Part 1 contains useful and informative treatises on filing, subject and other added entry data. R. E. Coward prepared these documents for users of BNB MARC records, but users of any variety of MARC records will find stimulating and helpful discussions. Since it appears most probable that BNB MARC records will be used beyond the perimeters of the United Kingdom, the handbook areas of these two documents will receive wide use. The description and explanation of the communications format is fully and lucidly presented. BNB has introduced a few elaborations of LC MARC that are imaginative and effective. For example, Part 2 describes an attractive technique for elimination of an initial article in a title when sorting is on title. The number of characters in the article and the space following the article is determined, and this number is placed in the otherwise unused second indicator position.
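The non-filing technique just described can be sketched in a few lines: the indicator tells a sorting program how many leading characters to skip. The function name and sample values below are illustrative, not taken from the BNB documentation.

```python
def title_sort_key(title, second_indicator):
    """Build a filing key by skipping the leading non-filing characters
    (the initial article plus its following space), whose count is held
    in the otherwise unused second indicator position."""
    return title[int(second_indicator):]

# "The " is four characters (article plus space), so the indicator holds 4;
# "An " is three, so it holds 3.
print(title_sort_key("The Old Curiosity Shop", "4"))   # Old Curiosity Shop
print(title_sort_key("An Essay on Criticism", "3"))    # Essay on Criticism
```

Because the article is skipped rather than deleted, the full title is still available for display; only the sort key ignores it.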
This information is not on the LC tapes, and would certainly be a welcome and helpful addition. Part 1 contains discussions and solutions of filing problems that occupy two dozen pages. Since the British National Bibliography appears in bookform, its filing problems are numerous and severe. The techniques for solving their problems are effective and are presented with commendable clarity. Of course, not all problems of arrangement of entries in bookform catalogs are solved, but the procedures for solution will be useful in application to architecture of other filing orders. Little has been written about subject content of MARC records, and most of what has appeared is also in Part 1. Coward briefly describes subject-heading and classed subject content of MARC without pushing these two ancient jousters into the lists. However, it can confidently be predicted that MARC will become a new terrain for this heroic arena. The discussion of added entries, although brief, is also novel for a MARC document. However, the boundaries of a new battleground are discernible in the statement that "author and title have proved to be so cumbersome and prone to error that number systems have proliferated to take their place." Those librarians whose main objective is participation in the programs of the community of which their library is a segment will surely protest that the day is not in the foreseeable future when scholars and other users will substitute Standard Book Numbers for author-title citations. Part 2 of the publication supplements Part 1 with provision of detailed information on magnetic tape specifications. It also increases compatibility between BNB MARC and LC MARC so that no significant differences exist. Where BNB MARC does not include fields in LC MARC, the LC fields are nevertheless described, thereby aiding either British or American users in processing MARC records from either source.
These two publications contain much useful information about MARC records that is not available elsewhere. In addition, they contain effective emendations of MARC that will stimulate all MARC users to develop further improvements. Richard Coward and BNB are to be commended for a major contribution to MARC literature.

Frederick G. Kilgour

Library & Information Science Abstracts. 1 (Jan.-Feb. 1969). London, The Library Association. Annual subscription £6 6s.

Recently two authors described librarianship as "paralyzed by decades of philosophical and literary argumentation." It is correct to state that until the past few years library literature has contained little, if any, new knowledge. However, the literature of today is beginning to swell with reports of new investigations and applications, reports which the modern librarian must make part of his armamentarium, just as the modern physician must learn of new developments if he is to be increasingly successful in prevention and cure of disease. Indeed, worthwhile library literature has increased to a magnitude that requires regular perusal of abstracts to "keep up." Given this circumstance, it is a pleasure to welcome an excellent new abstract journal. Library & Information Science Abstracts (LISA) is not a mere rechristening of Library Science Abstracts. To be sure, LISA evolved from the latter, and must be thought of as a new generation. The Library Association publishes LISA, but ASLIB has joined forces with LA in cooperative sponsorship. LISA now boasts a fulltime editor with some staff at LA, where responsibility for abstracting in library science resides. ASLIB furnishes the information science abstracts under a contract with LA. It is the intent of the publishers to use author abstracts or to have staff do abstracts in English and to call on a panel of abstractors that can read foreign languages.
The goal for publication lag is six to fourteen weeks. If lag time can be kept within these limits, LISA will achieve at least one notable accomplishment. The main arrangement of abstracts is the British Research Groups' Classification of Library Science, which appears to be adequate. The subjects are much more narrow than those that Library Science Abstracts employed. Cross references are included in the form of the citation with a reference to the location of the abstract, a most helpful procedure. An author index and a subject index are in each issue. The first issue contains 358 abstracts, so it can be expected that some two thousand will appear annually. The abstracts are the usual indicative variety found in abstract journals and are well done. The LA Library will provide photocopies of the original at page rates varying from 4½d to 1s 0d, depending on size of page. LISA will cover proceedings, symposia and a few monographs as well as journals. The first issue lists 251 journal titles being covered, a twenty-five percent increase in numbers of titles over Library Science Abstracts. However, some titles in LSA have been dropped, so that LISA covers approximately a hundred new journals, including titles in computation and information science as well as librarianship. LISA is an excellent abstract journal which every librarian who wishes to grow with his profession must read and use effectively.

Frederick G. Kilgour

MARC INTERNATIONAL

Richard E. COWARD: Head of Research and Development, The British National Bibliography, London, England

The cooperative development of the Library of Congress MARC II Project and the British National Bibliography MARC II Project is described and presented as the forerunner of an international MARC network. Emphasis is placed on the necessity for a standard MARC record for international exchange and for acceptance of international standards of cataloging.
This paper is an examination of two major operational automation projects. These projects, the Library of Congress MARC II Project and the British National Bibliography (BNB) MARC II Project, are the result of sustained and successful Anglo-American cooperation over a period of three years during which there has been continuous evaluation and change. In 1969, for a brief period, the systems developed have been stabilised, partly to give time for library systems to examine ways and means of exploiting a new type of centralised service, and partly to give the Library of Congress and the British National Bibliography the opportunity to look outwards at other systems being developed in other countries. There has, of course, already been extensive contact and exchange of views between the agencies involved in the planning and developing of automated bibliographic systems, and the possibilities of cooperation and exchange have been informally discussed at many levels. The time has now come for the national libraries and cataloguing agencies concerned to look at what has been achieved and to lay the foundation for effective cooperation in the future.

182 Journal of Library Automation Vol. 2/4 December, 1969

The history of the Anglo-American MARC Project began at the Library of Congress with an experiment in a new way of distributing catalogue data. The traditional method of distributing Library of Congress bibliographic information is to provide catalogue cards or proof sheets. These techniques will undoubtedly continue indefinitely into the future, but the rapid spread of automation in libraries has created a new demand for bibliographic information in machine readable form. The original MARC project (1) was "an experiment to test the feasibility of distributing Library of Congress cataloguing in machine readable form".
The use of the word "cataloguing" underlines the essential nature of the MARC I project; its end product was a catalogue record on magnetic tape. There is a very significant difference between a catalogue record on magnetic tape and a bibliographic file in machine form. The latter does not necessarily hold anything resembling a catalogue entry, although MARC II still reflects, both in the LC implementation (2,3) and in the BNB implementation (4,5), a preoccupation with the visual organisation of a catalogue entry. Fortunately retention of the cataloguing "framework" does not hinder the utilisation of LC or BNB MARC data in systems designed to hold and exploit bibliographic information, as the whole project is designed as a method for communication between systems. The essence of the MARC II project is that it is a communications system, or a common exchange language between systems wishing to exchange bibliographic information. It is highly undesirable, in fact quite impossible, to plan in terms of direct compatibility between systems. Machines are different, programs are different, and local objectives are different. The exchange of bibliographic information in any medium implies some level of agreement on the best way to organise and present the data being exchanged. The need to use a fairly standard type of bibliographic structure on a catalogue card is obvious enough, and over the years a form of presentation, as best exemplified by a Library of Congress catalogue card, has been developed which holds all the essential data and also, by means of typographical distinctions and layout, conveys the information in a visually attractive style. When bibliographic information is transmitted in a machine readable form the question of visual layout does not arise, but the question of structure is vitally important. This structure is called the machine format, and the machine format holds the data.
It literally does not matter in what order the various bits and pieces that make up a catalogue record appear on a magnetic tape. What does matter very much is that the machine should be able to recognise each data element: author, title, series, subject heading, etc. In practice, either each data element must be given an identifying tag that the machine can recognise, or each data element must occupy a predetermined place in the record. In view of the unpredictable nature of bibliographic information, the former method, that of tag identification, is now widely used and is the technique adopted in the MARC system.

MARC International/COWARD 183

The LC and BNB MARC systems are two very closely related implementations of a communications format which in its generalised form has been carefully designed to hold any type of bibliographic information. The generalised format description is now being circulated by the British Standards Institute and the United States of America Standards Institute. It can be very briefly described as follows:

| LEADER | DIRECTORY | CONTROL FIELD(S) | DATA FIELDS |

The leader is a fixed field of 24 characters, giving the record length, the file status and details of the particular implementation. The directory is a series of entries, each containing a tag (which identifies a data field), the length of the data field in the record, and its starting character position. This directory is a variable field depending on the number of data elements in the record. The control fields consist of a special group of fields for holding the main control number and any subsidiary control data. The data fields are designated for holding bibliographic data. Each field may be of any length and may be divided into any number of subfields. A data field may begin with special characters, called indicators, which can be used to supply additional information about the field as a whole.
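The structure just described can be made concrete with a minimal parser. The sketch below assumes the directory-entry layout that later became standard (a 3-character tag, 4-digit field length, and 5-digit starting position, with a field-terminator character closing the directory and each field); details of any particular 1969 implementation may differ, and the sample record is invented for illustration.

```python
FT = "\x1e"  # field terminator character

def parse_directory(record):
    """Unpack a MARC-style record into its 24-character leader and a
    dict mapping each directory tag to its data field.  Each directory
    entry is 12 characters: tag (3), field length (4), start position (5),
    the last two relative to the start of the data area."""
    leader = record[:24]
    dir_end = record.index(FT)            # directory ends at the first terminator
    directory = record[24:dir_end]
    data_area = record[dir_end + 1:]
    fields = {}
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        fields[tag] = data_area[start:start + length].rstrip(FT)
    return leader, fields

# Build a toy two-field record: an author field (tag 100) and a title
# field (tag 245), each followed by a field terminator.
leader = "00082nam".ljust(24)             # dummy leader
directory = "100" + "0014" + "00000" + "245" + "0019" + "00014"
record = leader + directory + FT + "Coward, R. E." + FT + "MARC International" + FT
parsed_leader, fields = parse_directory(record)
print(fields["100"])   # Coward, R. E.
print(fields["245"])   # MARC International
```

Because every field is located through the directory rather than by position, the data fields themselves can appear in any order on the tape, which is exactly the flexibility the article emphasises.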
It can be seen that the basis of MARC II is a very flexible file structure designed to hold any type of bibliographic record. Once such a level of compatibility is established it is possible to prepare general file handling systems (6) which will convert any bibliographic record to a local file format. There is certainly much scope for agreement on local file formats as well, but such formats will necessarily be conditioned by the type of machine available and the use to be made of the file. The establishment of a generalised file structure is a great step forward but by itself means very little unless a wide measure of agreement can be reached on the data content of the record to be exchanged. Here the responsibility for cooperation and standardisation shifts from the automation specialist to the librarian, and particularly to those national libraries and cataloguing agencies who can by their practical actions assist libraries to implement the standards prepared for the profession. In order to appreciate the real importance of standardisation, particularly in the context of the MARC Project, it is necessary to look a few years into the future. It is inevitable that the rapid spread of automated systems in libraries will create a demand for machine readable bibliographic records, and that in turn will lead to the setting up of bibliographic data banks in machine readable form in national and local centres. These data banks will be international in scope and will contain many millions of items. In the long run the only feasible way to maintain them is for each country or group of countries to develop automated centralised cataloguing systems for handling their own national outputs and to receive from all other countries involved in the network machine readable records of the latter's national outputs.
Countries cooperating on this basis must agree on standards of cataloguing (and ultimately on standards of classification and subject indexing), so that the general data bank presents a consistently compiled set of bibliographic data. There is no doubt that national data banks will be set up. Libraries today are faced simultaneously with a rapid increase in book prices, a need to maintain ever-increasing book stocks to meet the basic requirements of their readers, and a persistent shortage of trained personnel to catalogue their purchases. These trends are already well established, and in the United States, where they are most advanced, the result has been the massive and highly successful Shared Cataloguing Program. Historically the Shared Cataloguing Program will probably be seen as the first and last attempt to provide a comprehensive bibliographic service by unilateral action. A large number of countries have cooperated in this attempt, but the Shared Cataloguing Program does not rest on the principle of exchange. It is doubtful if even the United States will be able to maintain and extend this programme in its present form. The Shared Cataloguing Program must ultimately be replaced with an international exchange system. National machine readable bibliographic systems will be established, but there is a grave danger that those agencies responsible will be primarily concerned only with the immediate problem of producing records suitable for use in their own national context or for their own national bibliography, regardless of the fact that the libraries and information centres they need to serve are acquiring ever-increasing quantities of foreign material. The exchange principle will be downgraded to an afterthought, a by-product of the fact that an automated system is being used. If this outcome is to be avoided, international standards must be prepared and national agencies must accept them instead of only paying lip service to them.
In the past librarians have tended to be more concerned with codification than standardisation, but in the field of cataloguing at least a great breakthrough was made sixteen years ago when Seymour Lubetzky produced his "Cataloguing Rules and Principles; a Critique of the A.L.A. Rules for Entry and a Proposed Design for Their Revision" (7). The work of Lubetzky led to the "Paris Principles" (8) published by IFLA in 1963 and in due course to the preparation of the "Anglo-American Cataloguing Rules", 1967 (9). These rules, though unfortunately departing from Lubetzky's principles in one or two areas, provide a solid basis for standardisation. We are fortunate to have them available at such a critical moment in the history of librarianship. They must form the basis of an international MARC project. Of all the great libraries of the world, the Library of Congress has done more than any other to promote international cataloguing standards. It is now in a uniquely favourable position to promote these standards through its own MARC II Project. The LC MARC II project, together with the BNB MARC II project, can provide the foundation of the international MARC network. These projects alone cover the total field of English language material, and yet already the basic requirement of standardisation is absent. The Library of Congress finds itself unable, for administrative reasons, to adopt fully the code of rules it worked so hard to produce and which British librarians virtually accepted as it stood in the interests of international standardisation. That a great library should be in this position is understandable. What is less understandable is that the Library of Congress should transfer the non-standard cataloguing rules established by an internal administrative decision to the prescription of cataloguing data in the machine readable record that it is now issuing on an international basis.
One of the great advantages of machine readable records is that they can simultaneously be both standard and non-standard. There is no reason that the Library of Congress, or any national agency, should not provide for international exchange a standard MARC record together with any local information the Library might want. If as a result other national agencies are encouraged to do the same, it will not be long before the absurdity and expense of examining each record received via the international network in order to change a standard heading to a local variant will become apparent. The British National Bibliography has already accepted the Anglo-American code and by this action has now done much to promote its acceptance in Great Britain. Incomplete acceptance of the code is really the only significant difference between the two MARC projects. At a detailed level there are differences in some of the subfield codes. These are chiefly due to the fact that the British MARC Committee was particularly concerned with the problems of filing bibliographic entries, and as no generally accepted filing code exists it was decided to provide a complete analysis of the fields in headings. This analysis will enable the BNB MARC data base to be arranged in different sequences to test the rules now being prepared. The other difference, or extension, in the British MARC format is the provision of cross references with each entry, on the assumption that in a MARC system a total package of cataloguing data should be provided. However these differences reflect the experimental nature of the British project, not fundamental differences of opinion. In this paper an attempt has been made to look at the British and American MARC Projects not as systems for distributing bibliographic information but as the forerunners of an international bibliographic network. Intensive efforts have been made to lay a foundation for this international network.
The Anglo-American code provides a sound cataloguing base, the generalised communications format provides a machine base, and the Standard Book Numbering System provides an international identification 186 Journal of Library Automation Vol. 2/ 4 December, 1969 system. These developments are all part of a general move towards real cooperation in the provision of bibliographic services. They must now be brought together in an international MARC network. REFERENCES I. Avram, Henriette D.: The MARC Pilot Profect (Washington, Library of Congress: 1968). 2. U. S. Library of Congress. Information Systems Office. The MARC II Format: A Communications Format for Bibliographic Data. Pre- pared by Henriette D. Avram, John F. Knapp and Lucia J. Rather. (Washington, D. C.: 1968). 3. "Preliminary Guidelines for the Library of Congress, National Library of Medicine, and National Agricultural Library Implementation of the Proposed American Standard for a Format for Bibliographic In- formation Interchange on Magnetic Tape as Applied to Records Rep- resenting Monographic Materials in Textual Printed Form (Books)," Journal of Library Automation, 2 (June 1969) . 68-83 4. BNB MARC Documentation Service Publications, Nos. 1 and 2 (Lon- don, Council of the British National Bibliography, Ltd., 1968 ). 5. Coward, R. E.: '~he United Kingdom MARC Record Service," In Cox Nigel S. J.; Grose, Michael W.: Organization and Handling of Bibliographic Records by Computer (Hamden, Conn., Archon Books, 1967). 6. Cox, Nigel S. M.; Dews, J. D.: "The Newcastle File Handling Sys- tem," In op. cit. (note 4). 7. Lubetzky, Seymour: Code of Cataloging Rules ... Prepared for the Catalog Code Revision Committee . .. With an Explanatory Commen- tary by Paul Dunkin. (Chicago : American Library Association, 1960). 8. International Federation of Library Associations. International Con- ference on Cataloguing Principles, Paris, 9th-18th October, 1961: Re- port; Edited by A. H. Chaplin. 9. 
MANAGEMENT PLANNING FOR LIBRARY SYSTEMS DEVELOPMENT

Fred L. BELLOMY: Head, Library Systems Office, University of California, Santa Barbara, California

This paper deals with the application to library systems development programs of planning techniques which long ago proved their usefulness in business, military, and aerospace developments. The significant features of PERT (Program Evaluation and Review Technique), WBS (Work Breakdown Structure), planning diagrams, statements of work, cost/time estimates, schedules, manpower loading, and cost phasing are related through an example to the management requirements of a major systems development program at a large university library. The practical aspects of planning are treated in preference to the more theoretical.

One seldom finds the sense of urgency characteristic of aerospace and military programs influencing the development of new library systems. This, of course, has both advantages and disadvantages. Compared to military programs, the level of risk demanded by the urgency of the requirements may be considerably lower. Development periods may be relatively longer, and resource allocations can be spread out over a longer period of time. Fewer people need to be involved in the development at any one time, but the problem of retaining individuals with a technical knowledge of the program throughout its life is greatly increased. The development of a total library system could require twenty to fifty man-years of effort and, depending on the number of people assigned to the program, it could span a period of a decade or more. Nevertheless, the requirements of a major library systems development program and those of a major aerospace or defense project are more similar than different.
It is appropriate, therefore, to expect that planning techniques perfected for aerospace programs might be useful in planning major library programs. It is the purpose of this article to show how these principles are even now being applied in some library systems development programs.

IS PLANNING NECESSARY?

The question is rhetorical, for every program manager uses some technique of planning in his work. As often as not, however, he attacks problems individually without an overriding concern about the effect a particular solution may have on other aspects of the library's operation. This approach to solving problems, while obviously not an optimum one from the long-range point of view, may be the only available alternative at times. Even the most ardent proponents of the total systems approach admit the possibility of critical problems requiring "quick and dirty" solutions (1). Many of the steps to be outlined here for planning and implementing a total library system would, undoubtedly, be omitted where a solution was urgently needed to satisfy a small set of relatively simple objectives and where few external constraints and resource limitations were imposed. Furthermore, not all systems designers agree that a library should even attempt to develop a "total system" in the beginning, arguing that man must crawl before he learns to walk (2). In practice, any library will find it necessary to apply a combination of approaches, but must plan from the very beginning for a total system. Even where the "fire fighting" approach must be adopted, it is helpful to have a knowledge of procedures to be followed were solutions approachable in an ideal manner. A planning technique, regardless of the degree of sophistication, is only a tool and can never be expected to serve as a substitute for effective management. Furthermore, such a tool must be viewed as an integral part of the entire management process.
The management process has been evolving as much through trial and error as through design for a long time now (3). Many knowledgeable people have written about the process, and not all of the descriptions agree (4,5,6). There does seem to be general agreement, however, on some of the fundamental operations which constitute a management cycle. These are diagrammed in Figure 1. Although phrased variously by writers, the management process is usually defined to include: 1) the determination of objectives for an organization; 2) the preparation of plans for achieving the objectives, including the development of compatible cost and time schedules based on the plans; 3) the authorization of the required work; 4) the monitoring and evaluation of progress towards the objectives; and 5) the identification of alternate corrective action as problems develop.

Fig. 1. The Generalized Management Cycle.

It is an unfortunate fact that too many major development programs in libraries are begun without prior establishment of objectives, prepared plans or developed schedules. Too often, discussion has been begun with the unwarranted assumption that everyone concerned has a clear and identical understanding of objectives that have not been explicitly stated.
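The five operations of the cycle can be sketched as a simple function. This is an illustrative sketch only; the article names the steps but prescribes no data model, so every structure and name below is our own assumption.

```python
# Hypothetical sketch of the five-step management cycle in Figure 1.
# objectives: {activity: units of work required}
# work_completed: {activity: units of work actually done}

def management_cycle(objectives, work_completed):
    plan = list(objectives.items())                    # 2) plans derived from objectives (1)
    authorized = [activity for activity, _ in plan]    # 3) authorize the required work
    progress = {activity: work_completed.get(activity, 0) / required
                for activity, required in plan}        # 4) monitor and evaluate progress
    corrective = [activity for activity in authorized
                  if progress[activity] < 1.0]         # 5) identify where corrective action is needed
    return progress, corrective
```

For example, `management_cycle({"cataloging": 100, "order processing": 50}, {"cataloging": 100, "order processing": 25})` reports cataloging complete and flags order processing for corrective action, which in turn would feed back into a revised plan.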
During the past three years the author has had occasion to study, first hand, library automation projects underway at a large number of institutions: University of California-San Diego, University of California-Irvine, University of California-Riverside, University of California-Los Angeles, University of California-Santa Barbara, University of California-Santa Cruz, University of California-San Francisco, University of California-Davis, University of California-Berkeley, Stanford University, IBM-Los Gatos, Washington State University, Texas A&M, Florida Atlantic University, Southern Illinois University, Massachusetts Institute of Technology, Yale University, University of Maryland, Harvard University, University of Missouri, Michigan State University, University of Chicago, University of Illinois-Chicago, University of Pittsburgh, Ohio State University, Rensselaer Polytechnic Institute, Johns Hopkins University, State University of New York-Albany, State University of New York-Buffalo, Honnold Library-Claremont, New York Public Library, National Library of Medicine, Library of Congress.

In some of the major systems programs studied, planning had progressed not much beyond the identification of the initial steps which were required in the program, with tentative discussions of the immediate resources which were needed to implement the first steps. Several of the managers reported that adequate funding for automated library systems development was hard to obtain before a technical capability had been demonstrated. Others were of the opinion that a greater degree of library automation was inevitable and that although everyone knew that the first steps would be costly and relatively ineffective, a start had to be made sometime.
In retrospect it is very clear that such arguments, while undoubtedly expedient in the short run, are not in an institution's best interest in the long run and, after all, as one associate put it, libraries are designed to last a millennium.

PREREQUISITES TO PLANNING

Resources

The total systems approach implies the deployment of a team of professional people possessing diverse capabilities and backgrounds. One library administrator maintains that the development of a total library system requires the skills of scientific managers, philosophers, all categories of analysts, systems engineers, many categories of design engineers, computer programmers, and others in addition to library scientists. It is improbable that any one library would have on its staff personnel possessing the full range of capabilities required to pursue a successful systems development program. In some cases dedicated, full-time staff members will be able to learn the new skills which are needed; however, not all of the jobs requiring special skills need to be performed by full-time staff members. In some cases it will be feasible, and perhaps even desirable, to employ on a consulting basis individuals from outside organizations (including equipment manufacturers). It may even be advantageous to contract with an experienced outside organization to perform an entire segment of a complex systems development program.

In addition to individuals with specialized skills, the systems development team should include key individuals from all of the existing library operations likely to be affected by the new systems. First, these people can provide the necessary insight into their organization's operations that only an insider can develop; second, these people will stand as strong evidence that their organization's special interests are being considered, so that the new systems will have a much better chance of being accepted once they are implemented.
Above all, the early identification of one individual responsible for directing the entire development program is essential. This individual must have great skill in eliciting cooperation among people with diverse backgrounds, for systems work, like management, is partially a "people art" (7,8).

While it is imperative that a library systems program be adequately staffed, it is equally important to insure adequate funding for the project. Serious funding difficulties may result from a library's attempt to develop a major new system out of its existing operating budget.

When a library administration commits itself to a comprehensive systems study, it must be willing to accept the risk that the results of the study may indicate that existing systems are adequate; that no new major systems development is required. If a library administration is dedicated to change for change's sake, or if it has decided to undertake a research project as distinguished from the development of operational systems, much of what is being said here must be viewed from a considerably different perspective.

The process of analyzing existing systems is itself valuable (9). Libraries which have subjected themselves to systems analysis know that problems or inconsistencies within existing systems discovered during the analysis ordinarily will be followed by some immediate corrective action. Few administrators consciously intend to maintain useless duplicate records or to prepare reports which serve no purpose.

Techniques applied by effective program managers vary widely from one individual to another and from one situation to another (10). Aside from personal preference, factors which affect the approach taken include the complexity or scope of the objectives, the urgency of the requirements, and the risks the individual manager is willing to take.
While objectives should be made explicit, they may be sketched out broadly or documented in great detail. Similarly, plans should include consideration of every major activity required to achieve the objectives, but the level of detail may vary widely here, also. Plans should, either implicitly or explicitly, specify the contingency relationships among all of the tasks identified in the plan. It should go without saying that schedules must be based on plans. However, there are undoubtedly countless instances where schedules have been conjured up out of thin air to meet artificial deadlines, or worse, where no schedules at all have been specified. The latter is more characteristic of dozens of small library programs now in progress, and it may be suspected that the former characterizes too many of the major library programs.

Objectives

The reasons for undertaking a program must be determined by management in advance. A library administration begins the process of developing objectives for a modernization program by reviewing existing library policies, both generally understood and documented. Because program objectives must be compatible with library policies, this is an essential first step. It will likely be necessary to develop a few new policies and to document many previously undocumented ones.

The preliminary decision to undertake a modernization program may have resulted from demands for change by higher governing bodies; requirements for new services in response to changing conditions; increasing backlogs; or inadequate budgets, staff or building space. In any case, program objectives will need to be established that reflect existing library policies and current or anticipated needs. If the library is a part of an institution that utilizes the Planning-Programming-Budgeting System (PPBS), this step already may have been taken.

Fig. 2. Library Objectives Hierarchy. [The figure shows a hierarchy running from the university objective ("serve as a center of knowledge, a diverse collaboration of academic and professional disciplines ...") through the library objective (efficiently provide the informational resources required by authorized instructional or research programs of the university) and its three supporting objectives (resources development, library management, patron services), down to facilities, systems, and administration objectives, with the systems objective divided into systems and procedures work and the systems development program.]

In the case of an essential support activity like a library, the process of identifying objectives is complicated by the fact that the operation tends to be self-justifying. That is, it is an integral part of the stated objectives of the higher-level organization of which it is a part. Thus, in order for a library to examine its full range of responsibilities it must first secure an approved statement of objectives for its parent organization. The purpose or objective of any organization depends on the perspective from which its functions are viewed. Thus, even at the highest levels of abstraction, concerned individuals arrive at widely varying statements of objectives. In a university this process is further complicated by the general lack of concurrence on any subject, a situation which seems to be peculiarly characteristic of an academic community.

In attempting to program the operations of a library, it is absolutely essential that the statement of objectives for the library, in some sense, be correlated with some reasonably authoritative and reasonably widely accepted statement of objectives for the parent organization. No statement of the library's objective will satisfy everyone concerned, but it must reflect the administration's official attitude.
Just as the library's objectives must contribute to the achievement of the objectives of the parent organization, so too must the objectives of the major library programs contribute to the achievement of the overall library objective. When objectives for program elements are identified, these in turn must contribute to the objectives of the programs, and so forth on down to the lowest level of activity in the program. In other words, there is a hierarchy of objectives, although they are seldom discussed in these terms. A portion of this hierarchy for a university library is shown in Figure 2.

The main purpose of Figure 2 is to show how the objectives of a systems development program contribute to achieving all of the objectives at successively higher levels in the hierarchy. The systems activity is divided, in this example, into two major areas of work: systems and procedures work, and major systems development projects. The systems and procedures work is directed at obtaining relatively short-term gains, while the major systems projects have comparatively long-range goals.

The systems and procedures work in the example is considered to be a continuing administrative function directed at improving the general efficiency of the existing operation. Much of this work is carried on by the individual supervisors themselves, with central coordination being provided. Systems and procedures tasks include: organization planning and analysis; systems analysis and design; management audits; policy, procedures, and bulletin maintenance; forms analysis and design; reports analysis; records management; work measurement; office equipment selection; office layout; systems implementation; and related research (11). Most libraries need to give this aspect of professional systems work greatly increased emphasis.
The main objective of the systems development program cited in the example is to "develop a total library system using the best of the presently available devices and technologies which will produce a more effective and/or less costly total library operation." Specific objectives could include such things as faster processing of new book orders, better control over technical processing routines, availability of more comprehensive statistics, better management information, reduction in routines performed by clerical staff, availability of better bibliographic descriptions of the collection, more effective utilization of professional staff, improved reference services, better control over the physical collection, reduction of patron involvement in the charging transaction, better circulation control, etc. Naturally, no two libraries' general or specific objectives are going to match these exactly.

Selecting the First Project

The steps which are usually taken when preparing a set of program plans will be discussed in terms of a relatively typical example. Let it be assumed that a systems analysis has shown that a total library system should be defined to consist of the following thirteen interrelated modules: materials selection, order processing, cataloging, materials preparation, library accounting, personnel control, systems and procedures, management information, inventory control, circulation, information retrieval, reference, and user education. Also, let it be assumed that every routine function performed by the library will support one of its stated objectives and will be subsumed within one of these thirteen modules.

The library system itself will be defined to be concerned only with the operational objectives, however. Special, single-end-item projects, like facilities development, objectives or policy formulation, major systems projects (i.e. the development of a new module), etc., are a part of the management apparatus of the system.
It will be convenient to isolate these aspects of the undertaking from the operational segments of the system. While it is reasonably clear that the formulation of a total system concept has to precede the development of any of the identified modules, it is much less clear in just what order the development of individual modules should be undertaken. A study of even some of the more obvious dependencies among thirteen such modules reveals a very complex set of contingency relationships. In a few cases these contingency factors will definitely constrain the order in which modules must be developed. Usually, however, these considerations will be much less demanding, and it will appear that the choice of implementation priorities will be, for all practical purposes, arbitrary.

An evaluation of factors influencing the choice of implementation priorities will include: the nature of the interfaces among the defined boundaries of all the modules, an evaluation of the relative value of the payoffs to be expected from developing each of the modules, an evaluation of imminent changes in the state of the art affecting the development of a new module, and the political advantages to be gained from the development of a particular module. Thus, the library's management must take these and other factors (including technical) into consideration when they make their initial selection of implementation priority.

For the example program a set of hypothetical contingency relationships have been evaluated. They are diagrammed in Figure 3, which shows how the sequence of implementation will be constrained by the various design contingencies which have been identified. The diagram says that the formulation of the total systems concept precedes development of any module. It must be the first major activity to be undertaken.
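Design contingencies of this kind form a dependency graph over the modules, and any admissible implementation priority is a topological ordering of that graph. A minimal sketch, with the caveat that the dependency edges below are hypothetical (the exact arrows of Figure 3 are not reproducible here); only the rule that the total systems concept comes first is taken from the text:

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical design contingencies: each module maps to the set of
# modules that must be developed before it.
contingencies = {
    "total systems concept": set(),
    "order processing":      {"total systems concept"},
    "circulation":           {"total systems concept"},
    "cataloging":            {"total systems concept", "order processing"},
    "materials preparation": {"cataloging"},
    "inventory control":     {"cataloging", "circulation"},
}

# Any admissible implementation priority is a topological order of this graph.
priority = list(TopologicalSorter(contingencies).static_order())
```

Within the freedom the graph leaves open, management would then apply the payoff, state-of-the-art, and political factors listed above to pick one particular order.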
Further, it says that once the total systems concept has been formulated, the development of any one of five modules can be initiated.

Fig. 3. Design Contingency Relationships.

The selection of a particular implementation priority is indicated by the letters associated with each block. That is, work implied by block "A" is completed first, then block "B", then block "C", then block "D", etc., each module being completed before the next is begun. Under these circumstances there would be little justification for identifying much more than the obvious contingency relationships already discussed. For more rapid development of the total library system, a much more complex planning effort would need to be undertaken. Several of the major efforts shown to occur sequentially in Figure 3 could, in fact, overlap significantly. Some of the tasks involved in developing the cataloging module, for example, can be undertaken while the development of the order processing module is still in progress. Where minimizing development time is an important program objective and where all necessary resources are made available as needed, a carefully formulated and detailed program plan is warranted; indeed, it is essential.

The Work Breakdown Structure

The work breakdown structure (WBS) displays two different kinds of information. First, it shows how the system itself is subdivided into successively smaller sub-components. Second, it shows how all program activities making demands on available resources are related to the achievement of program objectives (12).

The development of a work breakdown structure can be undertaken as soon as the system is conceptualized. Furthermore, it should be available before an attempt is made to identify specific program tasks and the sequence in which they should be done (PERT programming).
The work breakdown structure is a useful means of showing the components of a major program in successively greater detail. While there is, naturally, no limit to the number of levels of subdivision which can be used, four or five should satisfy the requirements of most library system development programs. The development of the work breakdown structure proceeds from the top to the bottom, showing how the total program is first subdivided into major program elements (or activities) and then how each of these in turn is successively subdivided into tasks and finally work packages. This relationship is shown generally in Figure 4.

Fig. 4. Work Breakdown Diagram. [The figure shows a program subdivided into program elements, each element into tasks, and each task into work packages.]

A well developed work breakdown structure provides a basis for effective program planning and insures that no major program activity is overlooked during the planning phase of the program. It provides an excellent graphic representation of the interrelationship of the various components of a complex program, and shows how all aspects are related to the achievement of stated program objectives. Finally, the work breakdown structure chart can be used as a convenient means for displaying progress towards achieving the objectives of a program.

The details of the work breakdown structure developed for a project are heavily dependent on a number of factors. These include: the complexity, cost, and time span of the project; the relationship among the organizational units directly involved in the project; the objectives of the project; and externally imposed program constraints. An example of an actual work breakdown structure is presented in Figure 5 and illustrates the important features of such a diagram. It shows how a typical major development program at a large research library might be dissected into its successively more detailed component parts.
In this example, the Systems Development Program is subdivided into four major subsystems developments and a general program activity. These are represented by the five blocks in the second level of the diagram (program elements). Each of these five program elements is then further subdivided into more detailed tasks. Tasks are divided into work packages so that the bottommost elements on the chart represent work assignments of a manageable size for program control. This is just an example, of course. In actual practice a similar structure would be developed for each project in the program. The order processing module, for example, would be divided into sub-modules, etc.

An integral part of the planning function involves the budgeting of available funds (or the estimation of required funding) among the various program activities. A common technique for accomplishing this makes use of the work breakdown structure. During later stages in the planning, all of the specific activities required to accomplish program objectives will fall under individual blocks in the work breakdown structure.

Fig. 5. Library Systems Development Program Work Breakdown Structure. [The figure decomposes the objective "develop a total library system" into subsystem tasks applied to all modules: 1) formulate objectives; 2) record and analyze; 3) formulate concept; 4) prepare specifications; 5) design and develop; 6) assemble components; 7) test designs; 8) install and follow up; 9) document; 10) miscellaneous.]

The lowest level blocks, it will be recalled, represent work packages. Each of these work packages may in turn be assigned a cost account number for which funds may be budgeted. The work breakdown structure may also be used to establish summary budgets. While fund numbers may be assigned arbitrarily, coding is helpful.
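Such coding can be generated mechanically from the WBS tree: one digit per level, concatenated down the path from program to work package. A sketch under that assumption (the tree below is a hypothetical fragment, not Figure 5 itself):

```python
# Hypothetical WBS fragment: nested dicts, one level per tier of the structure.
wbs = {
    "systems development program": {
        "order processing subsystem": {
            "formulate objectives": {},
            "design and develop": {},
        },
        "cataloging subsystem": {
            "formulate objectives": {},
        },
    }
}

def assign_codes(tree, prefix=""):
    """Assign each WBS element a cost-account code: one digit per level."""
    codes = {}
    for i, (name, children) in enumerate(tree.items(), start=1):
        code = prefix + str(i)          # e.g. "1", then "11", then "112"
        codes[code] = name
        codes.update(assign_codes(children, code))
    return codes

codes = assign_codes(wbs)
```

Here `"1"` maps to the program, `"11"` to the order processing subsystem, and `"112"` to its design-and-develop work package, so summary budgets for any branch can be built by matching code prefixes.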
One workable technique is illustrated in Figure 5. Blocks of numbers are established for activities at each level within a structure on the diagram. One digit usually suffices at any particular level within a structure.

Responsibility for Planning

While it is probably better to assign responsibility for the preliminary planning activity to a single individual, it is imperative that plans eventually reflect the intentions of those who will actually be responsible for doing the work. These individuals will require certain guidelines before they can complete detailed planning activities. First, they must understand the program objectives. Second, they must understand the basic organization of the program and the fundamental planning philosophies adopted by the program manager. Third, they must understand that no plan is ever final, and should, therefore, propose every task which they believe necessary for a high probability of success.

There are many advantages of drawing people responsible for major areas of work into the detailed planning activities of a program. A program plan developed in this way becomes their plan; it is one which reflects their intentions and which records their commitments. When schedules are finally developed from the plan, they are much more likely to understand the significance of the completion dates and the consequence of slipping schedules or over-running budgets. It is well known that when an individual commits himself to a particular task completion date, he is more likely to meet that date than when he is directed to do so.

PROGRAM EVALUATION AND REVIEW TECHNIQUE (PERT)

Planning Factors

While the work breakdown structure provides a logical means of displaying the interrelationship among the various system components and program activities, it does not necessarily show all of the essential jobs which must be undertaken during the program.
All such tasks are either implied or assumed during the preparation of the work breakdown structure, but they must be enumerated in greater detail before an attempt is made to prepare a comprehensive program plan. Examples of implied tasks might include: the selection of personnel to be assigned to the program, the procurement of funding, the survey and evaluation of manufacturers' equipment, program review conferences, system design evaluations, etc. A comprehensive list of such planning factors is another invaluable tool for use during the planning phase of the program.

Fig. 6. System Development Program (PERT) Planning Diagram. [The two-page diagram traces the program from authorization to start, through formulation of the total system concept and the order processing module design concept, measurement of current costs and processing parameters, cost/effectiveness trade-off studies, and management approval, to preparation of specifications, equipment selection, programming, testing, installation, and follow-up evaluation. The critical path is the sequence of activities estimated to require the most time.]

Sometimes good lists of planning factors can be developed by reviewing other programs of a similar nature. While no list of planning factors developed by other organizations or individuals will prove entirely satisfactory in a new undertaking, it seems wise to take advantage, where possible, of others' experiences.

Sequencing Activities

The axiom, "The best place to begin is at the beginning," is probably less true of program planning activities than any of life's other endeavors.
Planning should begin with the important program goals (the major program objectives as specified by the library's chief executive). This is an alien approach to many, for it seems more "natural" to assess one's present situation and then to ask "where do we go from here?" There are fewer unknowns associated with planning activity for the near future than for the far. Conditions can change radically during the course of a program. Assumptions may be discovered to be poor or false. After having been caught up in such situations a number of times, everyone finds it more natural to say "I'll cross that bridge when I come to it." But some people responsible for funding major library development programs are not "natural" thinkers. They often want assurances of specific accomplishments within specified periods of time in return for a specified amount of funds. It is not unusual for them to get very "unreasonable" when a request for funds is not accompanied by these kinds of "justifications." Thus, program managers must approach the initial planning activity in an unnatural way. Difficulties must be anticipated and contingencies identified. Above all, the plans must include recognition of every essential major activity. When plans are developed with reference to a carefully formulated work breakdown structure, the chance of inadvertently omitting an important activity is greatly reduced.

An example of a typical planning diagram is presented as Figure 6. Such a plan is developed by first selecting a primary project objective. Then, moving backwards in time, each task required to achieve the objective is determined in sequence (13, 14, 15, 16, 17, 18). The process of charting tasks in this manner to show their contingency relationships continues backward in time until a task contingent upon nothing other than authorization to proceed with the program is reached.
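The backward-charting procedure just described can be sketched in a few lines of code. The task names and prerequisite links below are hypothetical stand-ins, not taken from Figure 6; the point is only the mechanics of walking backward from the primary objective to the authorization event.

```python
# Hypothetical PERT fragment: each task maps to the tasks it is
# contingent upon. Names and links are illustrative only.
PREREQUISITES = {
    "general systems specifications prepared": ["management approval to implement"],
    "management approval to implement": ["trade-off studies complete"],
    "trade-off studies complete": ["design concepts formulated"],
    "design concepts formulated": ["authorization to start program"],
    "authorization to start program": [],  # contingent on nothing
}

def chart_backward(objective, prereqs):
    """List tasks in the order reached while moving backward in time
    from the primary objective toward the program's starting event."""
    ordered, seen, frontier = [], set(), [objective]
    while frontier:
        task = frontier.pop(0)
        if task not in seen:
            seen.add(task)
            ordered.append(task)
            frontier.extend(prereqs[task])
    return ordered

chain = chart_backward("general systems specifications prepared", PREREQUISITES)
```

Charting stops when a task with no prerequisites (authorization to proceed) is reached, exactly as the text prescribes.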
As a practical matter, when a task has been identified that is contingent upon the completion of several other tasks, it is probably advisable to enumerate all of these tasks before selecting one to trace on back to the beginning of the program. Naturally, all of the tasks will have to be traced back before the charting process is complete. Preparation of the initial charts is an iterative process and assumes that a number of reviews will be made by knowledgeable individuals and their comments reflected in subsequent drafts of the preliminary chart.

During this preliminary planning stage an effort should be made to have the diagram reflect all tasks that everyone thinks essential. Furthermore, wherever tasks ideally should be conducted sequentially, they should be shown as sequential on the chart. When this procedure ultimately reveals schedule conflicts, compromises can be made. The logic adopted initially will likely be modified a number of times before even the first preliminary draft of the chart has been completed. Arrangements that seemed logical initially will be discovered to be inconsistent as the plan develops, and new approaches and subdivisions of activities will be required.

Every good program planner knows that no amount of careful thought and foresight will result in the identification of all problem areas that will interfere with progress once the program is underway. Consequently, he will either explicitly or implicitly build contingency factors into the program plan. In some cases there will be sequences of activities that can be completed ahead of the time when contingent tasks begin. In these cases the waiting time and contingency factors can be identical and the problem is solved automatically, so to speak.
In the critical path (that sequence of activities which will take the most elapsed time to complete) there will be no waiting times, so contingency factors must be interjected into this sequence of activities. They may be explicitly identified as contingency time or they may be implicitly embedded in other tasks in the program. For example, management reviews or evaluations can be "padded" with the additional contingency time required for a viable program plan.

The best program plan will result when the final preliminary draft of the planning diagram reflects the understanding of all the individuals responsible for executing portions of the program plan. Their backgrounds and experiences will permit them to see discrepancies and inadequacies in the plan which no single man could possibly see. In particular, they will tend to view the plan from their own organization's point of view and can be expected to scrutinize critically those areas for which they will have some responsibility. Some of their comments will not be compatible with the overall program philosophy or with the requirements of other organizations involved in the planning process. Someone will need to arbitrate the special interests of individuals reviewing the plan. It is important, however, to attain a degree of concurrence among all individuals before the planning diagram is finalized. Each of the involved individuals should consider the plan to be his plan, reflecting his judgment of what must be done to achieve the stated program objectives. The program manager, who is responsible for the overall direction of the program, must naturally be a primary participant in these negotiations.

Not every program will require such detailed planning. The process of periodically reviewing and revising the detailed plans is time consuming and may be completely unwarranted where the pressures of time do not force the performance of many tasks simultaneously.
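The critical path and slack (waiting time) concepts described earlier can be made concrete with a small forward-pass/backward-pass calculation. The four-task network and its durations below are invented for illustration, not drawn from the example program.

```python
# Illustrative network: durations in weeks; SUCCESSORS maps each task
# to the tasks contingent upon it. All names and values are assumptions.
DURATION = {"A": 3, "B": 2, "C": 4, "D": 1}
SUCCESSORS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
PREDECESSORS = {t: [p for p, s in SUCCESSORS.items() if t in s] for t in DURATION}

def earliest_finish(task, memo):
    """Forward pass: cumulative elapsed time to complete a task."""
    if task not in memo:
        start = max((earliest_finish(p, memo) for p in PREDECESSORS[task]),
                    default=0)
        memo[task] = start + DURATION[task]
    return memo[task]

def latest_finish(task, memo, horizon):
    """Backward pass: latest a task may finish without delaying the end."""
    if task not in memo:
        memo[task] = min((latest_finish(s, memo, horizon) - DURATION[s]
                          for s in SUCCESSORS[task]), default=horizon)
    return memo[task]

ef, lf = {}, {}
for t in DURATION:
    earliest_finish(t, ef)
horizon = max(ef.values())
for t in DURATION:
    latest_finish(t, lf, horizon)

# Slack is zero along the critical path; elsewhere it can absorb
# contingency automatically, as the text observes.
slack = {t: lf[t] - ef[t] for t in DURATION}
critical_path = [t for t in DURATION if slack[t] == 0]
```

Here the critical path is A, C, D; task B carries two weeks of slack that can serve as built-in contingency.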
Where all major program activities can be scheduled for performance sequentially, the process of planning is greatly simplified. Referring again to Figure 3, it will be noted that the first major undertaking in the example is the formulation of a total system concept. The second major undertaking is the development of an order processing module. The third is the development of a systems and procedures module, and so on for the rest of the thirteen modules in the example. It is assumed that the development of each module is substantially completed before initiating the development of the next.

Using the less detailed planning approach, the interrelationship among the several major activities that could be undertaken in formulating a total system concept is summarized in Figure 7. It will be seen that the second, third, and fourth activities could be scheduled to occur simultaneously, if the necessary personnel to undertake them were available. However, there is no reason why they could not be performed sequentially. Taking the same gross planning approach, the interrelationship among the various activities that might be undertaken to develop one of the modules is summarized in Figure 8. This generalized planning network could apply equally well to any of the modules.

Statement of Work

Those responsible for estimating the magnitude of work to be performed in each activity will require some knowledge of the scope of each activity. A generalized statement of work for the development of any one of the modules (Figure 8) might look as follows:

1) Formulate Module Objectives

The objective of the module must contribute to achieving the objectives at all higher levels in the objectives hierarchy (Figure 2).
In addition to a generalized statement of objectives for the module, a comprehensive list of specific objectives needs to be formulated: in particular, what functions the module must perform; in other words, what products the module must produce. In performing this task attention needs to be paid both to the generalized objectives of the parent organization and to the present activities of the existing operations, which imply objectives themselves. The design concepts finally adopted will reflect these objectives.

2) Document Existing Operations

In the process of formulating a total systems concept a great deal of documentation will have been assembled for all operations of the library. However, the emphasis there was on interfaces among operating units of the library. In executing the present task the emphasis is on detailed inputs, outputs, external constraints, processing information needs, resource requirements, and detailed procedures. This task must be concerned not only with specific items, such as books or forms, but also with specific data elements utilized or generated within the operation.

[Fig. 7. Total System Concept Formulation Planning Network. Activities: document policies and objectives; document system; define system requirements; prepare implementation plans.]

[Fig. 8. Generalized Module Development Planning Network. Activities: formulate module objectives; prepare design specifications; design and develop module; conduct follow-up evaluation; refine module design; document module design.]

3) Analyze and Summarize

The previous task provided the data necessary for putting together a comprehensive picture of the existing operation.
The mass of data and materials which were collected need to be summarized in a way which presents a concise display of the significant characteristics of the operation. All significant measurable parameters need to be identified. Those capable of succinctly characterizing the operation must then be measured under carefully controlled, typical operating conditions to provide an accurate picture of current costs and effectiveness of the operation. This task should culminate with the informal publication of a module parameter summary.

4) Formulate Design Concepts

Once the module objectives have been formulated, discussion of the various alternative means for achieving these objectives can commence. One important objective of this particular task is the identification of as many alternative approaches to satisfying the objectives as can be conceived. In this regard "brainstorming" sessions are useful (19, 20). The fullest range of techniques and devices available should be explored for possible use in the implementation of the system module. During early stages in the development of a design concept little concern is paid to even the obvious design constraints. Eventually, of course, a system concept must be postulated which satisfies these constraints, but initially even impossible approaches may suggest others which are possible, so all alternatives should be considered in the beginning. Before a design concept is finalized, the results of the systems analysis of the existing operations should be evaluated. When a single set of concepts is finally selected, estimates of development and operating costs for a new module based on the concept, together with its projected effectiveness, should be made and compared with those of the existing operation. The design concept document should describe all functions to be performed by the module, as well as special techniques or items of equipment which will be used.
5) Prepare Design Specifications

Based on the generalized descriptions in the design concept document, detailed specifications for the module are prepared. These specifications include such considerations as: the numbers, kinds, output formats, accessibility, and frequency of various management reports; the number of processing stations of various kinds; a comprehensive list of record contents; a description of all files required by the module; descriptions of all forms required by the module; personnel requirements and organizational descriptions; office layout; data processing machine software; equipment to be procured; timing of processes; and special module interface features. The documented design specifications should be circulated widely among operating personnel for comment and possible modification based on this comment.

6) Design and Develop Module

This task includes the development of detailed procedures for transforming the module inputs into all of the required module outputs. Machine programs must be written, forms designs finalized, file structures and record formats optimized, detailed operating instructions and procedures written, equipment interfaces confirmed, and personnel training programs developed, to name most of the more important undertakings. While no attempt should have been made at this point in the development of a system module to prepare final formalized documentation, enough background material should have been assembled to permit the preparation of such documentation.

7) Assemble Module Components

Special equipment must be procured. Interfaces between the library and a remotely located electronic data processing system must be established. Existing personnel must be retrained and new personnel recruited. New communication links, if required, must be installed.

8) Test Module Design

Every segment of the module design should be tested prior to its installation.
New items of equipment or communication channels should be tested through many cycles to verify their operating characteristics, as well as to familiarize a few members of the staff with their operation. If a pilot operation of the module is possible, it should be undertaken. During the testing phase a continuing effort should be made to detect serious design deficiencies. The module should be exercised through several processing cycles, utilizing as many different variations of input data and output requirements as possible. Such a testing phase should evaluate the adequacy of the various forms and reports, as well as provide some preliminary information about the accuracy of the predicted operating costs for the new module.

9) Install Module

The first element of this task is the preparation of an installation plan. During the preparation of the installation plan early consideration needs to be given to the installation approach (phased, parallel, all at once) to be followed (21). During the changeover period special attention will need to be paid to operational problems as they develop. No system design is perfect, and during the installation period major design deficiencies may become apparent. The major file conversion efforts are included in this task. This task culminates with turning the new module over to the operating personnel.

10) Conduct Follow-up Evaluation

During the new system's shakedown period it will have been forced to operate as intended by the designers and the department supervisor. However, the real test of the workability of the system comes after this initial period, when the system is "released" to run without any special attention being paid to it. After the system has been in operation for a period of time an evaluation of its effectiveness and the actual operating costs should be undertaken.
Because no system is ever perfect, even a brand-new system may be significantly improved as a result of this follow-up evaluation.

11) Refine Module Design

If the follow-up evaluation has disclosed any design deficiencies, a modification of the original module design is undertaken where the cost of correcting the deficiency is not greater than the value of the improved operation.

12) Document Module Design

After warranted modifications to the module design have been made as the result of the follow-up evaluation, the module design and operating instructions should be formally documented and released. Until about this point in time the module design parameters may have been undergoing a process of gradual evolution, so that formal documentation of them may not have been justifiable. Full and careful documentation of the new module design completes the module development project.

Estimating

Once the preliminary plan has been completed and approved, estimates of manpower, equipment, and materials requirements can be made. Some people find it convenient to mark the various estimates on the PERT planning diagram itself, using different colors for each of the estimators. This has the advantage of displaying all previous estimates to each individual attempting to provide estimates for other activities in the program. However, this approach results in estimates being made on the spot without the careful deliberation and evaluation which they deserve and, therefore, probably is not advisable. The use of estimation worksheets can be effective. A worksheet that has been prepared for the example program is presented as Figure 9. (Note that a task breakdown has been included for illustrative purposes in the first two program elements only.) Each planned activity is entered on the form, where activities have been numbered in their general order of occurrence.
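Such a worksheet row can be modeled as a simple record. The field names and figures below are illustrative assumptions for the sketch, not values transcribed from Figure 9.

```python
from dataclasses import dataclass, field

# One row of a hypothetical estimation worksheet: elapsed time,
# equipment/material dollars, and man-hours by manpower category.
@dataclass
class ActivityEstimate:
    number: str          # order-of-occurrence label, e.g. "B.6" (assumed scheme)
    name: str
    elapsed_months: int
    equipment: int = 0   # dollars
    material: int = 0    # dollars
    man_hours: dict = field(default_factory=dict)  # category -> hours

worksheet = [
    ActivityEstimate("B.1", "Formulate Objectives", 3,
                     man_hours={"librarian": 100, "clerk-typist": 10}),
    ActivityEstimate("B.6", "Design and Develop", 12, material=18_000,
                     man_hours={"analyst": 600, "programmer": 1_500}),
]

# Totals of this kind feed the summaries by funding category.
total_hours = sum(sum(row.man_hours.values()) for row in worksheet)
```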
Enough copies of these forms are then made so that each organization can have its own full set to use for estimating. The responsible individual in each organization provides estimates of required manpower, elapsed time, materials costs, and special equipment or facilities based on his understanding of the job. Estimates of manpower requirements are made by category of manpower, except where a specific individual must be applied to a specific task. In such cases this individual is identified as a separate category of manpower and estimates are made separately for him.

[Fig. 9. Cost/Time Estimates. A worksheet listing each program element (the total system concept and the thirteen modules) with its elapsed months, services and equipment costs, material costs, and man-hours by category: (1) Librarian, (2) Clerk-Typist, (3) Analyst, (4) Programmer, (5) General Assistance.]

When all estimates have been completed, the costs are summarized by funding categories, as has been done for the example in Figure 10, and the elapsed time estimates are marked onto the planning diagram.

[Fig. 10. Costs by Budget Category. Program element costs summed under general assistance, academic salaries, non-academic salaries, supplies and expenses, and equipment and facilities, with a program total of $875,560.]

Scheduling

An elapsed time analysis is performed to determine the estimated time of completion for every event in the program. This is accomplished by adding together all the estimated elapsed times in a sequence of activities, and indicating at each event marker the cumulated elapsed time to that point. Where several sequences of activities converge on a single event marker, the sequence requiring the longest period of time determines the cumulative elapsed time to reach that event. Those sequences which require less time will have slack time (waiting time) built into them, and this can be used for adjusting schedules to minimize peak manpower, equipment, or facilities loading.
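Summarizing man-hour estimates into dollar totals by funding category, as is done for Figure 10, amounts to a simple roll-up. The hourly rates below follow the footnote to Figure 13; the mapping of skills to budget categories and the activity data are assumptions for the sketch.

```python
# Rates per the footnote to Figure 13; skills-to-budget mapping and
# activity hours are illustrative assumptions.
HOURLY_RATE = {"librarian": 5.80, "clerk-typist": 2.70, "analyst": 7.50}
BUDGET_CATEGORY = {"librarian": "academic salaries",
                   "clerk-typist": "non-academic salaries",
                   "analyst": "non-academic salaries"}

activities = [
    {"name": "Document System",
     "hours": {"librarian": 1000, "clerk-typist": 1000}},
    {"name": "Analyze and Summarize",
     "hours": {"analyst": 200}},
]

def costs_by_category(activities):
    """Roll man-hour estimates up into dollar totals per budget category."""
    totals = {}
    for activity in activities:
        for skill, hours in activity["hours"].items():
            category = BUDGET_CATEGORY[skill]
            totals[category] = totals.get(category, 0.0) + hours * HOURLY_RATE[skill]
    return totals

summary = costs_by_category(activities)
```

For the two illustrative activities this yields $5,800 of academic salaries and $4,200 of non-academic salaries, the kind of entries that would populate a Figure 10 column.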
When the cumulative elapsed times for each event have been determined for the entire program, the preliminary scheduling activity can commence. It is convenient to express elapsed times in tenths of forty-hour work weeks, because 1/10th of a week equals a half day, which often seems to be a good minimum unit of time for estimating purposes.

When the elapsed time analysis is complete it may be determined that the total elapsed time estimated for the program is incompatible with the required program completion date. If this happens, it will be necessary to reinspect the program logic in an effort to identify activities originally planned to occur in sequence which can, in fact, be performed in parallel. Such a change in the plan, however, will almost always imply increased risks. Sometimes it will be possible to compensate for the increased risk by additional backup efforts, or by assigning the same activity to two different groups for simultaneous parallel performance. Upon closer scrutiny it may be found that some of the activities originally thought essential are, in fact, merely desirable and can be eliminated from the plan entirely. Eventually, this strategy will force the planned program elapsed time to be compatible with the program completion date specified by the program manager. In establishing schedules for activities, it is always best to leave any available slack at the end of a sequence of activities rather than at an earlier time in the sequence.

Because the above approach to scheduling may produce undesirable manpower peaks or unreasonable work schedules for individuals, early drafts of the schedule likely will need to be modified significantly before the draft can be finalized. A preliminary schedule is prepared by charting on a graph the earliest beginning and ending time for each activity identified in the plan. In Figure 11 such a schedule has been graphed.
Tasks which are not contingent upon anything other than the start of the program (Tasks 1 and 2 in Figure 8) can be scheduled to commence on the first day of the program and be scheduled for completion in the estimated elapsed time for each one. For example, if Task 1 had been estimated to require an elapsed time of three months, the graph would show a bar starting at the beginning of the program and running out to the third month. Some of the tasks (Tasks 3 and 4) depend on the completion of earlier tasks. Thus, Task 4 (in Figure 8) could not commence until the third month. Then, if the elapsed time to complete that task had been estimated at one month, the bar for that task would begin at the third month and end on the fourth month. Similarly, all tasks are scheduled in this way for the entire program.

Utilizing the other estimations (see Figure 9) and making reasonable assumptions about how the man-hours are distributed in time for each task, the total number of man-hours by category can be calculated for any week in the program. If it were assumed, for example, that the expenditure of personnel time was evenly distributed throughout each of the tasks, and if two tasks were scheduled during the same week, with an average of 15 hours per week for one task and 25 hours per week for the second, a total of 40 man-hours of labor of that particular category would be estimated to be expended during the week in question. This sort of analysis is continued until the estimates of man-hour expenditures by category are available for each week of the program.

Now it is possible to analyze any period of activity in the planned program to determine what level of each category of manpower or special facilities will be required during that period.

[Fig. 11. Systems Development Program Schedule. A bar chart of each program element and its component tasks against elapsed time in months, extending to 72 months.]

During the first months of a typical program there will be heavy demands made on various categories of manpower. Furthermore, later in the program there will be periods when practically no demand is made for the same categories of manpower. It usually would be desirable to minimize the peaks by shifting some of the activities to later times when fewer demands were being made. It is almost always possible to accomplish some shifting of schedules in a typical program. After an evaluation of the manpower loading implications of various scheduling alternatives, a program schedule like that shown for the example in Figure 11 might be adopted.

Based on the cost/time estimates presented in Figures 9 and 10 and the program schedules presented in Figure 11, resource requirements by year can be developed for the life of the program.
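The weekly loading calculation described above, under the stated assumption that personnel time is spread evenly over each task's scheduled span, can be sketched as follows. The task spans and hour totals are illustrative; they are chosen to reproduce the 15 plus 25 equals 40 man-hours example from the text.

```python
def weekly_loading(tasks, week):
    """Sum the hours expended in a given week across all active tasks.

    tasks: list of (start_week, end_week, total_hours) tuples; hours are
    assumed to be evenly distributed over the task's scheduled weeks.
    """
    load = 0.0
    for start, end, hours in tasks:
        if start <= week < end:
            load += hours / (end - start)
    return load

# Two overlapping tasks averaging 15 and 25 hours/week, as in the text:
tasks = [(0, 4, 60),    # 60 hours over 4 weeks -> 15 hours/week
         (2, 6, 100)]   # 100 hours over 4 weeks -> 25 hours/week
print(weekly_loading(tasks, 3))  # both tasks active -> 40.0
```

Repeating this for every week and every manpower category yields the data behind a manpower loading chart such as Figure 12, and reveals the peaks that rescheduling can smooth.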
The manpower loading chart (Figure 12) shows manpower requirements for each of the four categories of skills specified. The cost phasing chart presented as Figure 13 shows funding requirements by category for each year of the program. It would be possible, of course, to further break down the costs into individual accounts as discussed in the section describing the work breakdown structure. A much tighter time phasing of all categories of costs is required for program control, but that subject is beyond the scope of the present article.

[Fig. 12. Manpower Loading Chart. Curves of required librarian, typist, analyst, and programmer hours over the life of the program.]

[Fig. 13. Systems Development Program Cost Phasing. Year-by-year man-hours and costs by category over the thirteen-year program. Footnotes give the rates used: (1) Librarian (Academic Salaries) @$5.80/hr., (2) Clerk-Typist (Non-Academic Salaries) @$2.70/hr., (3) Systems Analyst (Non-Academic Salaries) @$7.50/hr., (4) Programmer (Supplies & Equipment) @$5.35/hr., (5) General Assistance @$2.75/hr.; 1,800 working hours per year are assumed and costs are rounded to the nearest $1,000.]

THE "COMPLETED" PLAN

When the planned program is finally compatible with the externally imposed constraints and when there is a reasonable degree of concurrence among all the involved organizations, it is generally desirable to formalize the documentation. The temptation to consider the document unchangeable will be eliminated if it is pointed out that individual dates in the schedule reflect current "best estimate" targets, and that planning and rescheduling will be a continuing effort throughout the program.

The wide availability of program plans permits all involved individuals to assess the impact of their efforts on the overall program. Furthermore, the PERT planning diagram provides them with a convenient means for recording their performance against the program goals. Finally, the cost and benefits data contained in the plans are major inputs to any Planning-Programming-Budgeting System (PPBS), and this more rational approach to partitioning limited resources among the many competing activities of large institutions, like universities, is going to become an increasingly significant part of library operations in the future (22, 23, 24).

No plan is ever final. It must be periodically reevaluated, and warranted modifications reflecting newly identified requirements or changes in the operating environment must be made. It is an axiom of total systems design that the implementation of earlier parts of a system may so significantly modify the actual operating environment as to dictate major changes in the design specifications for other parts of the system to be implemented later. Thus, a total system is much more likely to evolve than to unfold according to some predetermined design.
We must continue to expect systems work to "evolutionize" rather than revolutionize library operations.

ACKNOWLEDGMENTS
The work on which this article is based was supported in part under a grant from the Council on Library Resources to the Institute of Library Research for the preparation of the forthcoming Handbook of Data Processing for Libraries. Eugene Graziano made a perceptive and critical evaluation of an earlier draft of this paper which led to its extensive revision; and Robert Hayes encouraged development of this article from material the author prepared for the Handbook of Data Processing for Libraries.

REFERENCES
1. Hayes, R. M.: "Concept of an On-Line, Total Library System," Library Technology Reports, (May 1965), 13.
2. De Gennaro, Richard: "The Development and Administration of Automated Systems in Academic Libraries," Journal of Library Automation, 1 (March 1968), 75-91.

216 Journal of Library Automation Vol. 2/4 December, 1969

3. George, C. S., Jr.: The History of Management Thought (Englewood Cliffs, N.J.: Prentice-Hall, 1968).
4. Roberts, Edward B.: "Industrial Dynamics and the Design of Management Control Systems," In Management Controls, edited by Charles P. Bonini (New York: McGraw-Hill, 1964), pp. 102-126.
5. "Feedback," The Systemation Letter, 166 (1966).
6. Wheeler, J. L.; Goldhor, H.: Practical Administration of Public Libraries (New York: Harper and Row, 1962).
7. Dearden, John: "How to Organize Information Systems," Harvard Business Review, 43 (March 1965), 67-73.
8. "How a System is Built," The Systemation Letter, 186 (1966).
9. "Analysis Check List," The Systemation Letter, 12 (1958).
10. Holtz, J. N.: An Analysis of Major Scheduling Techniques in the Defense Systems Environment (Santa Monica, California: The Rand Corporation, 1966).
11. Minnich, C. J.; Nelson, O. S.: Systems Management for Greater Profit and Growth (Englewood Cliffs, N.J.: Prentice-Hall, 1966), pp. 16-34.
12.
National Aeronautics and Space Administration: NASA PERT in Facilities Project Management (Washington, D.C.: U.S. Government Printing Office, March 1965), pp. 9-11.
13. Kadet, Jordan; Frank, Bruce H.: "PERT for the Engineer," IEEE Spectrum, 1 (November 1964), 131-137.
14. Management Systems Corporation: DOD and NASA Guide, PERT COST System Design (Cambridge, Massachusetts: June 1962), 145 pp.
15. National Aeronautics and Space Administration: NASA-PERT "C" Computer Systems Manual (Washington, D.C.: U.S. Government Printing Office, September 1964).
16. PERT Coordinating Group: PERT Guide for Management Use (Washington, D.C.: U.S. Government Printing Office, June 1963).
17. PERT ... a Dynamic Project Planning & Control Method. IBM General Information Manual (White Plains, New York: IBM Data Processing Division), 28 pp.
18. Navy Department, Special Projects Office: An Introduction to the PERT/COST System for Integrated Project Management (Washington, D.C.: U.S. Government Printing Office, 1962).
19. "Brainstorming: Cure or Curse?" Business Week, 1426 (December 1956).
20. Osborn, Alex: "Brainstorming, New Ways to Find New Ideas," Time, 99 (February 1957), 90.
21. Systems and Procedures Association: Business Systems (Cleveland: 1966).
22. "Planning-Programming-Budgeting," selected comment prepared by the Committee on Government Operations (Subcommittee on National Security and International Operations), United States Senate, July 26, 1967.
23. Hartley, Harry J.: Educational Planning-Programming-Budgeting: A Systems Approach (Englewood Cliffs, New Jersey: Prentice-Hall, 1968).
24. Fazar, Willard: "The Importance of PPB to Libraries," Presented to the Institute on PPBS for Libraries, Department of Library Science, Wayne State University, Detroit, Michigan, September 23, 1968.

AN ANALYSIS OF COST FACTORS IN MAINTAINING AND UPDATING CARD CATALOGS

J. L. DOLBY and V. J.
FORSYTH: R&D Consultants Company, Los Altos, California

This study enumerates and compares costs of manual and computerized catalogs. The difficulties of making comparative cost studies are examined. The report concentrates on the problems of cost element definition and on the reporting of as many comparable sources as possible. Results of cost studies are presented in the form of tables that show comparative costs of cataloging, card processing, conversion, and manual and computerized processing. There are also tables on card catalog costs. Conclusions are that the costs of manual and automated methods are essentially the same for short entries, and that there is a substantial economic advantage for automated methods in full entries.

A side benefit of the present interest in library automation is the amount of attention now being given to study of the traditional methods of librarianship. This phenomenon is hardly unique to librarianship; in almost every area of human endeavor where attempts have been made to introduce the use of computers, workers in the field have suddenly discovered that they did not understand some of their long-standing methods quite as fully as they had believed. The source of this seeming anomaly is easy to find: to program a computer, it is necessary to specify the work to be done in much greater detail than is necessary to explain the same problem to a human being; that curious human phenomenon known variously as "common sense" or "experience" makes up the difference. It has not been uncommon over the past decade to hear many survivors of the "automation experience" admit that a main benefit of use of the machine was acquisition of better procedures through a more detailed understanding of the process involved.

Catalog Cost Factors/DOLBY and FORSYTH 219

Improved knowledge of "processes about to be automated" extends to the cost of the process as well, and with added force.
In recommending the substitution of one procedure for another in a cost-conscious atmosphere, it behooves one to proffer sound financial reasons for doing so. Computers are expensive devices. They also represent expenditure of a different kind of money: capital or lease funds in place of labor expense. Thus, although one can still hear the occasional cry that it is difficult to obtain reasonable cost data on various parts of library operations, it is becoming increasingly difficult to pick up an issue of almost any library journal that does not include at least one piece of cost information.

This paper is concerned with the cost of maintaining and updating card catalogs. As the authors have observed elsewhere (1), the cost of computing is going down at a rather spectacular rate, while the cost of labor is increasing. If this trend continues, almost every library will be forced to automate certain aspects of the catalog operation at some point in time. The cited report provided some information about the cost of computerized library catalogs. By adding a summary of the cost factors in the use of card catalogs, this article should place in slightly better perspective the more difficult problem of deciding (in the context of a particular library) when the crossover point between manual and automated methods is to be reached.

The plan of attack remains essentially the same as in the previous report: selection, from among the growing number of papers on the subject, of those that provide comparable sets of cost information pertinent to the various cost elements of the card catalog operation. It is appropriate, therefore, to begin this study with a brief description of the difficulties of comparing cost statistics in such a way.

PROBLEMS OF COMPARATIVE COST STUDIES
Although comparative cost studies have much to recommend them, they are fraught with certain difficulties (2).
In the first place, few librarians would group elementary cost operations in precisely the same way. One library may consider a particular element of cost as part of the acquisitions operation and a second as a part of the cataloging operation; a third may ignore it altogether, or include it in the burden or overhead cost. Nor is this mere capriciousness on the part of members of the library community. Library operations not only differ from one another, but they also change with time.

Consider, for example, the problem of obtaining a set of catalog cards for a particular monograph. Any or all of the following alternatives might be in use at a given library: the cards may be 1) supplied with the book as a service provided by the bookseller at some extra cost; 2) ordered from the Library of Congress; 3) provided by a centralized cataloging operation serving several libraries (as in a county or state library system); 4) prepared by catalogers working in the library; or 5) generated by computer program from standard listings (e.g., from MARC tapes).

Comparing any two of these procedures within a given library does not present any overwhelming problems, although minor questions of definition do occur (for example, how much of the cost of ordering should be allocated to the acquisitions department and how much to the cataloging department when both the book and the catalog cards are obtained simultaneously from the same source?). However, to compare costs from two different libraries, it is essential to know what proportion of each card source was used by each library. Fortunately for the purposes of this study, most libraries are presently using a mix of method 2 (LC) and method 4 (own catalogers), and at least some provide sufficient information to enable determination of the appropriate mix for each.
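The effect of the source mix on a library's effective cost can be sketched as a simple weighted average. All figures below are hypothetical, chosen only to illustrate the calculation, not taken from any of the studies cited:

```python
# Illustrative sketch (all dollar figures and mix fractions are assumed):
# a library's effective per-title card cost is the usage-weighted mix of
# the card sources it draws on, e.g. LC cards vs. in-house cataloging.
def blended_cost(mix, unit_costs):
    """Weighted per-title cost; the `mix` fractions must sum to 1."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    return sum(mix[src] * unit_costs[src] for src in mix)

unit_costs = {"lc_cards": 0.50, "own_catalogers": 3.10}  # assumed $/title
mix = {"lc_cards": 0.70, "own_catalogers": 0.30}         # assumed usage mix
print(round(blended_cost(mix, unit_costs), 2))           # -> 1.28
```

Two libraries with identical unit costs but different mixes will therefore report quite different per-title figures, which is exactly why the mix must be known before costs can be compared.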
However, the problem is indicative of one essential difficulty in comparative cost analyses, and one that, although eased, would not be eliminated by having all libraries band together to adopt a standard costing procedure.

A second difficulty arises from temporal and geographic differences in the cost of manpower. On the surface, this problem can be eliminated, or substantially reduced, by basing all studies on man-hours spent rather than on dollars required per item, and a number of writers have suggested such a change in reporting procedure. However, the problem is not quite so simple. For example, determining the number of man-hours spent on cataloging adds cost to the study, which tends to reduce the number of libraries willing to report; those that do report may or may not be a representative sample of the total.

However, there is a more basic problem. In almost all libraries the real restraint on activities is financial: there are just so many funds available for cataloging, and these must be used to at least keep the backlog of uncataloged material down to the amount of space available to store it. Suppose, for instance, that the amount of material to be cataloged increases by ten percent from one year to the next and that the catalogers are fortunate enough to obtain ten percent salary increases over the same period. It is not impossible to consider that in some libraries the catalogers may be forced to "earn" this raise by absorbing at least a part of the increased load without extra help. Balancing salary increases by productivity increases is, of course, familiar in industry and may well exist in libraries. As evidence that such an effect is present, it is noted later in this report (see Table 4) that three studies made in three rather different libraries over a period of six years showed costs of from $0.228 to $0.235 per card for preparation, production, and filing.
The total range ($0.007) is only three percent of the average cost per card ($0.230). Such close agreement would be startling if it were found in three simultaneous studies of three nearly identical library operations. To set this agreement aside as pure coincidence seems unwarranted. It is more reasonable to assume that librarians are forced to operate under strong financial constraints and that they adjust their performance to those constraints through hiring of less well-trained personnel, increased time pressures on all personnel, etc. If this is the case, "standardized" reporting through time figures might be quite misleading unless cost figures were reported as well.

Finally, there is the question of allocating burden or overhead. Potentially, burden could present a severe problem, and occasionally it may. However, in most of the reports cited here, burden is either ignored or separately stated, and there is no reason to suspect that the results given in the summaries are noticeably biased by unseen burden differences. Nevertheless, it would be of interest to determine proper overhead figures for library operations, as the switch to automation (which seems inevitable) will entail the use of more machines and fewer people, which in turn may drastically alter the overhead structure.

THE USE OF COST INFORMATION
Having noted some of the difficulties that tend to cloud cost comparisons, it is perhaps useful to investigate how cost information is likely to be used. The nature of the problem can be illustrated by two rather different situations. One is exemplified by Library "A", a large public library of some years' standing. It is considering the possibility of changing from its present manual procedures to some form of automation, and wishes to determine a reasonable strategy for implementing such a change over the next five years.
Library "B", otherwise comparable to "A", has been keyboarding the catalog records of its current acquisitions for the last three years. It has now decided to convert its retrospective catalog and wishes to choose the most economic procedure for this step.

The differences in the problems facing two such libraries are basically the classic differences between strategy and tactics. Library "A" must lay out a long-term plan, taking into account the growth in its collection over the five-year period, likely changes in equipment and personnel available to it, increases in labor costs, decreases in equipment usage costs, etc. Library "B", on the other hand, is in the position of making a specific set of decisions as to whether the work should be done in-house or subcontracted; whether the library should use punched cards, punched paper tape, or optical character-recognition devices; and so forth.

In terms of cost, Library "B" has to prepare a specific budget request for its funding agency, and it is reasonable to assume that that funding agency will require assurance that the task is to be accomplished at the minimum cost consistent with the designated quality level. Cost differences of as little as five percent may be quite important to Library "B". General cost summaries can be of use only in enumeration of the possible alternatives. Even the accounting procedures in effect in the local system will have a bearing on the final decision.

Thus, the primary utility of a general cost summary to the library about to commit itself in a tactical situation is the information it can provide about the problem statement: which cost factors other libraries have been able to identify in similar situations; which of the various alternatives may be safely eliminated from consideration on the grounds that their present costs are considerably higher than those of other existing methods; and so forth.
The likelihood seems remote that any general study, or, for that matter, any particular study, will be sufficiently applicable to the library now undertaking the problem to enable it to take over cost structures unchanged.

Library "A", faced with establishing a long-range plan, has much more flexibility available to it. Its interest in specific costs will be established by some gross notion as to what quantity of funds is likely to be available over the period under plan. Some procedures may be seriously considered because they are relatively new and untried, and hence of potential interest to national funding agencies that would not consider funding further experiments with procedures that have been thoroughly tested. Access to good cost information on such well-tested procedures will help in establishing the likely costs for important aspects of the overall plan. Of even greater interest is the possibility that certain costs are likely to undergo substantial change over the planning period. For instance, in Reference 1 it was noted that optical character recognition may be a very attractive long-run option for catalog conversion problems precisely because it is so new, and hence has not had time to allow a sufficient number of service centers to spring up to provide truly competitive service capabilities. Computer typesetting with the new generation of hardware is in much the same category.

In both situations it is clear that what is most needed is the enumeration of cost elements on the one hand and operating cost experience on the other. Precise estimates of any one cost element are of relatively little importance, either because they are so likely to change over the long run, or because they are unlikely to be appropriate to a specific application even in the short run. Comparative cost information would therefore seem to provide a good basis for either application.
The comparison forces an enumeration of cost elements precisely because one must evaluate the cost structure of each source to be sure that a reasonable comparison is possible. Reporting the actual experience of several libraries provides a range of experience, not only over several libraries but also over time, so that the extremes reported give an indication of the variability that must be allowed for.

In what follows, therefore, concentration is on the problems of cost element definition and on the reporting of as many sources as are comparable in the broad sense. Because precise estimates are not only difficult to obtain, but also unlikely to be relevant to most users, no attempt has been made to provide formal estimates either of the average cost figures or of their underlying variability.

THE COST OF CATALOGING
The preparation of catalog information for a given monograph is perhaps the most sophisticated step in the entire catalog operation. As such it is probably the last to be considered a candidate for automation, although it is not unreasonable, even now, to consider the use of computers as aids to the cataloger. Consequently, in many operations the cost of cataloging will continue to be an invariant regardless of whether automation is introduced into other aspects of the catalog operation or not. Nevertheless, it is useful to study the cost of cataloging, both to establish the relative cost of cataloging and the subsequent processing steps, and to establish the line of demarcation between the catalog step and the subsequent steps.

Any enumeration of the detailed steps involved in a complex process must be tentative. This is nowhere more true than in the cataloging operation. Fortunately, the number of detailed descriptions is growing.
For the cataloging operation, three sources of information were used: 1) a detailed analysis made as part of an overall time and motion study of operations in the Lockheed Research Library (3); 2) a detailed study of the cataloging and processing activities of the New York Public Library as a preliminary to possible automation of some of these operations (4); and 3) a detailed study of the acquisitions, cataloging, and other processing operations for the Columbia University science libraries (5). A summary of these studies is given in Table 1. In addition to the eight items in Table 1, the Lockheed Library study included five other items that we have chosen to include in subsequent operations.

It is generally true that professionals do not like to have their jobs subjected to the minutiae of time and motion study. There is always the ugly feeling that the creative (and most important) aspects of the job cannot be subjected to simple measurement. Nevertheless, cataloging is a continuing effort in most libraries, and it is possible to establish some average production rates in terms of the number of books cataloged per month or the number of minutes needed per book. The problem, as with most statistical studies, is not with the establishment of objective measurements but rather in the manner in which they are interpreted. Use of comparative statistics does not eliminate the possibility of misinterpretation, but it does tend to minimize it.

The comparative studies selected for the cataloging operation, in addition to those already cited, were: a Colorado study based on average cataloging times for eleven librarians from six cooperating libraries (2), and a study of ordering, cataloging, and preparations in several Southern California libraries (6). The catalog cost information for these five studies is summarized in Table 2.

Table 1. Cataloging Cost Elements

Columbia University Science
(With LC information)
1. Assign class number
2. Compare book and card, check entries in general catalog, establish subjects, etc.
3. Make necessary changes in LC proof slip, or type temporary slip giving brief descriptive information and class number
4. Completed books revised and sent for shelf listing
(Without LC information)
1. Supply descriptive cataloging
2. Subject analysis, classification and authority work
3. Type workslip for processing section

New York Public
1. Review work done by searcher. Reconcile conflicts and approve new entry forms
2. Full descriptive cataloging
3. Assign subject entries
4. Assign divisional catalog designators
5. Check authority files and establish new authorities and cross references
6. Determine classmark

Lockheed Research Laboratory
1. Get book and analyze for subject. Obtain Dewey and Cutter numbers
2. Check shelf list for duplicates and copy number
3a. (With LC information) Insert and type copy slip and temporary catalog card, check LC subject headings and other references. Descriptive and subject catalog book. Pencil call number on title page
3b. (Without LC information) Insert and type descriptive part only on copy slip and temporary catalog card. Write subject data only on catalog card. Pencil call number on title page
4. Tear and separate copy slips and temporary cards. Proof and correct as necessary
5. Take report to reports cataloging
6. Travel to library, check national union catalog or other reference book
7. Count and tally titles cataloged

Table 2. Comparative Costs of Cataloging

Library Source   Date   Average Cataloging Time (min.)   Cost    Implied Avg. Salary (per hour)
Lockheed         1967   10.0                             -       -
Colorado         1969   28.6                             $2.07   $4.34
New York         1968   39.8                             6.30    5.25
So. Cal.         1961   44.8                             2.23    2.98
Columbia         1967   84.0                             5.85    4.17

In the Lockheed and Colorado studies, basic times for each operation were studied and then "standard" time factors added to allow for nonproductive time. The standard factors increased the Lockheed times by 13 percent and the Colorado times by 48 percent. (The times in the table include these allowances.) The figures for New York were derived from their reported statements that they processed 65,000 books using 49 catalogers at a total cost of $409,500 (not including fringe benefits). The Columbia figures have been reduced by 20 percent to eliminate fringe benefits. The implied average salary for each source was obtained by dividing the total cataloging cost by the average time and multiplying by 60 to convert to cost per hour.

The simplest conclusion to reach from a study of Table 2 is that cataloging costs vary widely from one library to another. Average times differ by more than 8 to 1 and total cost varies by more than 3 to 1. The low salary for the Southern California study is presumably explained by the fact that that study was done in 1961. Adjustment of this figure for average salary increases from 1961 to 1968 would undoubtedly bring their total cost more directly in line with the other studies (Bureau of Labor Statistics data show hourly wages increased approximately 30 percent over this period). It would be interesting to know whether the presumed increased salaries of the Southern California catalogers have led to a decrease in the average time they spend on cataloging. The more recent data on Colorado and New York suggest that this might be expected.

The Columbia and Lockheed time data represent, perhaps not unreasonably, the extremes in this table. The Lockheed research library is small compared to the others, and Lockheed is, of course, a private corporation, whereas the other sources represent public and university libraries.
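The implied-salary rule stated above (total cataloging cost divided by average time, times 60) can be checked directly against the rows of Table 2. A minimal sketch:

```python
# Implied average hourly salary, per the rule stated for Table 2:
# (cost per title / minutes per title) * 60 minutes per hour.
def implied_hourly_salary(cost_per_title, minutes_per_title):
    return round(cost_per_title / minutes_per_title * 60, 2)

# Check against the Colorado row of Table 2 ($2.07, 28.6 min. -> $4.34/hr.):
print(implied_hourly_salary(2.07, 28.6))   # -> 4.34
```

The So. Cal. and Columbia rows reproduce to within a cent under the same rule, consistent with the published figures having been truncated rather than rounded.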
Columbia, on the other hand, is a large university library; however, the figures given are from a study of cataloging of science monographs, which may be more time-consuming.

As these cataloging cost figures will be used only as a point of comparison with subsequent operations, it is not necessary to further resolve the apparent differences. The average time for the five sources is 41.4 minutes. Assuming that a cataloger currently earns $4.50 an hour, the average cost for the five sources would be $3.11 for the unit cost of cataloging.

Table 3. Processing Cost Elements

Columbia University
1. Card production
2. Card set completion
3. Sorting and preliminary filing
4. Shelf listing
5. Typing of book pockets
6. Filing

New York Public
1. Receive and distribute planning sheets
2. Type headings for added entries and subject entries
3. Mark designators and sort completed cards
4. Distribute cards to filing section
5. Paint edges of cards when required
6. Glue and separate batches
7. Type masters for offset printing
8. Prepare copy for Itek masters
9. Check format of entry on masters
10. Check letter for letter on planning sheet
11. Gather statistics and keep log of card preparation
12. Prepare Itek masters and print cards on offset
13. File

Sacramento State
1. Type master cards from handwritten slips
2. Produce subject cross reference cards
3. Maintain guide cards
4. Card production and purchase
5. Complete card sets
6. Proof
7. Alphabetize
8. File and revise
9. Card shifting
10. Update existing cards
11. Correction of problems
12. Withdrawals
13. Weed order slips
14. Assembly of statistics
15. File temporary slips
16. File permanent slips
17. Shelf list shifting
18. Blank catalog card stock

CARD PROCESSING COSTS
If cataloging is the least likely part of the library operation to be automated in the near future, the procedures that immediately follow cataloging are precisely opposite in character. Card preparation, production, and filing all involve time-consuming routine operations that can be done automatically, thus freeing a significant number of man-hours for problems of greater intellectual content. Cost factors must nonetheless be considered.

As with cataloging, the description of basic cost elements will vary from one library to another. For the detailed breakdown in Table 3, use is again made of the Columbia and New York Public studies previously cited. Added to them is data from an unpublished study made available by Neil Barron of Sacramento State College Library. Barron's cost elements are given in finer detail than those in the other studies reported in this section.

In Table 4, data from the New York Public Library and from the Sacramento State College Library have been grouped into three categories (preparation, production, and filing) to achieve maximum compatibility with data from other sources reported in the table. These sources are: a study (7) at the University of Toronto of manual costs made in conjunction with early machine methods; a comparative study (8) of manual methods and a special-purpose machine procedure at the Air Force Cambridge Research Laboratory Library; and results of three years of computerized card production at the Yale Medical Library (9).

Costs shown in Table 4 are on a "per-card" basis, rather than on a title basis, as differing library requirements show averages ranging from 4.6 cards per title at Sacramento to 9.8 cards per title at New York Public.
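Converting between per-title and per-card costs is a single division by the cards-per-title average. A sketch using the averages quoted above; the $1.07 per-title figure is assumed purely for illustration:

```python
# Convert a per-title processing cost to a per-card cost using the
# cards-per-title averages quoted in the text (4.6 at Sacramento,
# 9.8 at New York Public). The $1.07 per-title cost is hypothetical.
def per_card(cost_per_title, cards_per_title):
    return round(cost_per_title / cards_per_title, 3)

print(per_card(1.07, 4.6))   # Sacramento-style mix  -> 0.233
print(per_card(1.07, 9.8))   # New York-style mix    -> 0.109
```

The same per-title outlay thus yields very different per-card figures, which is why the per-card basis is the fairer one for comparing libraries with different card-set sizes.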
[Table 4. Comparative Costs of Card Processing: per-card costs, grouped into preparation, production, and filing, for NYPL (1968, locally produced cards), Sacramento State College (1969), ONULP (1965), AFCRL (1963, both LC-card and machine methods), and the Yale Medical Library (1968, machine method). Filing costs fall between 4.2c and 5.2c per card in every study; total per-card processing costs range from 11.8c under partial automation to 33.6c at NYPL, with the closely agreeing manual studies at 22.8c to 23.5c.]

Most significant in Table 4 is the extraordinary agreement between two of the studies: the total processing costs amount to 23.2c per card and 23.5c per card for these two sources, even though the reports were prepared over a six-year period and include significant changes in the cost of labor and materials. Furthermore, these costs are reasonably constant for the individual categories in all three sources: card preparation varies from 11.4c per card to 11.6c per card; card production varies from 6.4c per card to 7.9c per card; and card filing varies from 4.2c per card to 5.2c per card. In one sense this close agreement should not be surprising. If it is indeed true that cataloging involves relatively high intellectual content that is difficult to automate, and card processing involves straightforward operations that are relatively easy to automate, it is reasonable to argue that the latter should show much less variability from one operation to the next.

The fact that the New York Public operation has significantly higher costs can be partially explained by the following observations. The NYPL costs are based on the supposition that all cards are locally produced. The
Secondly NYPL is clearly the largest of the operations under consideration here, and it is not unreasonable to expect that the size of the file will have an effect on the cost of filing. In fact, assuming that the NYPL cost of preparation and production is the same as that for the AFCRL' s locally produced cards ( 27 .6c) and assigning the rest of the NYPL cost to filing, the latter figure becomes 10.3c per card, or a little more than twice the average for the other three sources ( 4.8c per card). If this is the case, it would be of interest to know whether the problem is one of sheer size of the catalog or rather one of increased den- sity that naturally occurs in larger files. E.g., is it more costly to file "Smith, Adrian J." in a file with 100 Smith's or 1000 Smith's? Finally, in the two cases of partial automation ( AFCRL and Yale) the cost of card preparation and production is significantly lower (7.5c and 8.8c) than that indicated for LC cards ( 16.6c), or the average for the three closely agreeing sources ( 23.2c). This observation alone should point the library community strongly towards automation of the card processing function. Nor is this observation new; both authors of the preliminary studies at AFCRL and Yale made the point more than adequately. Fur- thermore, as will be demonstrated shortly, the cost of filing is also reduced in an automated system. Several factors may be contributing to the slowness of the library com- munity to introduce changes to achieve such cost savings. First, there is inevitably a substantial initial cost involved in any automation project. Second, although the potential cost saving is a substantial proportion of the processing cost, it is still small when compared to the cost of catalog- ing; a librarian under pressure to reduce costs could gain more by cutting back on the time allowed for cataloging without the initial investment necessary for automation. 
Third, there is a persistent difficulty in finding trained personnel in the automation field. Finally, librarians are certainly aware of the rapid changeover in equipment in the computing field, with the concomitant costs of adapting programs to new equipment.

CASE AND SPACE

The preceding discussion has provided some notion as to the cost of obtaining the required cataloging information, encoding it on catalog cards, and entering those cards in a catalog file. These costs can be compared with other possible approaches to the problem, including those that involve some degree of automation. There are, of course, a number of associated costs that must be taken into account to obtain a full picture of the cost of card cataloging. They would include, at a minimum, the cost of the space occupied by the catalog, the purchase price of catalog filing cases, the cost to the user of consulting the catalog, and the cost to the library of maintaining the catalog in usable form.

230 Journal of Library Automation Vol. 2/4 December, 1969

The allocation of capital expenditure costs to a form comparable to the costs per title and the costs per card used in the earlier sections of this report raises certain difficulties. Accounting procedures vary from one institution to another. Further, there is the real but difficult-to-measure problem of comparing funds of various types in a particular situation. Nonetheless, it is useful to know whether under any reasonable accounting system the cost of space and cabinets is of sufficient magnitude to make it worthwhile to consider these costs in the overall evaluation. Assuming, therefore, that a filing case capable of storing 72,000 cards fully packed costs $800 and occupies approximately 30 square feet of space, including room for aisles and access area, and further assuming that land and construction costs are approximately $30 per square foot, the total cost of the cabinet and the space it occupies would be approximately $1,700.
Finally, if it is assumed that on the average a catalog is approximately 60 percent full, the initial cost of space and case is approximately 4c per card. Four cents a card is not negligible, but it is only about 15 to 20 percent of the cost of producing the cards and an even smaller fraction of the total cost when cataloging is included. Hence, it seems reasonable to put this cost for space and case in the category of a secondary cost item that will favor book catalogs, microfilm catalogs, and other high-density forms. It is unlikely to be a determining factor unless other cost factors are very closely balanced.

BOOK AND CARD CATALOGS: SOME RELATIVE ADVANTAGES

Among the various cost factors involved in cataloging, the most difficult to assess objectively is the cost to the user. The problem is that no one really knows what a user does in a library, nor what impact a given change will have on its utility to him. Whether they like a card catalog or not, library users do consult it, and it is thus a usable device for providing access to library materials. Equally, many libraries in times past, and again more recently, have had book catalogs; and they also are viable devices. But which is better?

A card catalog is updated by the simple expedient of entering recently obtained cards in the file. A book catalog is updated by periodically printed revisions. Hence any search for a particular item will in general require fewer specific searches in the card catalog than in the book catalog, if the proper information is available to the searcher. Card catalogs are large and costly, and there are few savings over the original cost in producing a second copy. Reproducing books after the first copy is relatively inexpensive. Libraries with many branches, or a decentralized set of users, will provide better service with book catalogs. The added cost of maintaining more than a few files is heavy with cards and light with books.
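The space-and-case estimate above reduces to a few lines of arithmetic. The sketch below simply restates the text's assumptions (an $800 case holding 72,000 cards, 30 square feet of floor space at $30 per square foot, roughly 60 percent occupancy); it is an illustration, not part of the original study.

```python
# Capital cost of one filing case and the floor space it occupies.
cabinet_cost = 800.00           # case holding 72,000 cards fully packed
floor_area_sqft = 30            # case plus aisle and access area
construction_per_sqft = 30.00   # land and construction cost

total_capital = cabinet_cost + floor_area_sqft * construction_per_sqft
print(f"capital cost: ${total_capital:,.0f}")            # $1,700

# At 60 percent occupancy only 43,200 of the 72,000 slots hold cards.
cards_stored = 72_000 * 0.60
cost_per_card = total_capital / cards_stored
print(f"space and case per card: ${cost_per_card:.3f}")  # $0.039, i.e. roughly 4c
```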
Whether card or book catalogs are used, the existence of a machine readable catalog provides much greater flexibility as time goes on. Revisions of cataloging practice become much simpler if the revisions can be programmed on a computer. In sum, machine readable book catalogs appear less advantageous than card catalogs only when immediate updating is the primary criterion for comparison.

COMPARATIVE COSTS OF CATALOG CONVERSION

Table 5 (an extension and revision of Table 7 appearing on page 42 of Reference 1) gives comparative conversion costs for three public libraries (Library of Congress (10), New York Public Library and Los Angeles Public Library), the Library of the University of California at Berkeley, the Stanford Undergraduate Library (11), the Ontario New Universities Library Project, and the Columbia-Harvard-Yale study.

Table 5. Comparative Conversion Costs Per Title (dollars; braces mark costs reported as a single combined figure)

                    LC        LACP      ONULP     NYPL      UC/B      CHY       SUL
Date                Mar. 68   1968      1964      1968      1966      1964      1966
Record length       446 char. 450 char. 400 char. 300 char. 317 char. 243 char. 180 char.
Coding/editing      0.169     -         -         -         0.080 (1) -         0.044
Keying              0.207     }         0.307     }         0.188     0.198     0.183
Re-keying           0.033     } 0.480   0.117     } 0.450   0.030     -         -
Proofing            0.125     }         0.259     }         0.085     0.127     0.103
Rental              0.156     0.084     0.650 (2) -         -         0.036     0.037
Conversion & list   }         0.020     }         0.046     0.020     0.024     }
Edit list           } 0.359   0.084     } 0.096   0.141     -         -         } 0.104 (3)
Sort & merge        }         -         -         0.165     -         -         0.121
Supplies            0.080     0.036     0.508 (4) -         -         -         0.033
Supervision         0.183     -         0.580     -         -         -         -

(1) Includes provision for keypunch rental and supplies.
(2) Full keypunch rental absorbed by pilot project.
(3) Includes use of automatic error-detection routines.
(4) Includes cost of magnetic tapes and other supplies.
Although the data was gathered for the most part independently over a four-year period, it is worth making a number of internal comparisons to test for consistency.

The most outstanding comparison is between the encoding costs for the Library of Congress and those for the Los Angeles Public Library. For records of essentially the same average length (446 characters versus 450 characters) the coding costs agree to the penny! Yet the methods of production are significantly different. The Library of Congress invested heavily in the coding and editing operation and used paper tape typewriters with their relatively high rental. As a result its costs in this area are significantly higher than those for LACP. On the other hand these procedural changes resulted in significantly lower keying costs, so that the overall cost for encoding was the same.

The encoding costs of UC/B, CHY, and SUL are all very close (within three cents per title) even though there is a fair range of record size (from 180 for SUL to 317 for UC/B). These three studies probably provide a more reasonable picture of the underlying variation in cost than the unusually close figures for LC and LACP.

As a further test of consistency, average cost is plotted against average record length (in characters per record) in Figure 1. The rightmost points are for LC and LACP, and the line is simply drawn through the origin (zero record length, zero cost) and those points. The points of UC/B, CHY and SUL cluster about the center of the line. Following is an interpretation of the other points charted.

The NYPL point of $.45 for a 300-character record is not based on actual NYPL experience, but rather on a study of information from other investigations. Its proximity to the line suggests that NYPL's analysis of existing information reaches a conclusion similar to that of this paper.
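The line in Figure 1 can be reproduced in a few lines of code. This is an illustrative sketch, not the authors' computation: it takes the LC point from Table 5 and uses the origin-through-LC slope to predict encoding costs at other record lengths.

```python
# LC encoding cost per title and average record length, from Table 5.
lc_cost, lc_chars = 0.690, 446
slope_per_char = lc_cost / lc_chars   # dollars per character

def predicted_encoding_cost(chars: float) -> float:
    """Encoding cost per title predicted by the line through the origin."""
    return slope_per_char * chars

# Roughly $0.15 per hundred characters, the figure the text settles on.
print(f"slope: ${slope_per_char * 100:.3f} per 100 characters")
for label, n in [("SUL", 180), ("CHY", 243), ("NYPL", 300), ("UC/B", 317)]:
    print(f"{label:4s} ({n} chars): ${predicted_encoding_cost(n):.2f}")
```

The predicted NYPL value ($0.46 for a 300-character record) sits close to the $0.45 estimate plotted in the figure.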
The average encoding cost used to plot the ONULP point does not contain the full rental charge reported in the ONULP study, because the entire cost of keyboard rental was charged against the project although the machines were only partially utilized. The point for Harvard University Library (HUL) is based on information received in a private communication.

[Figure 1 is a scatter plot of encoding cost per title, in cents, against average record length in characters, with plotted points for LC, LACP, ONULP, NYPL, UC/B, CHY, SUL, and HUL.]

Fig. 1. Encoding Costs per Title as a Function of Average Record Length.

Although there is a significant amount of variation from one study to another, it seems reasonable to conclude that the cost of encoding is approximately $.15 per title per hundred characters.

The cost of computation is not as well documented as the cost of conversion. Studies that reported computer costs all include the following three operational costs. The first is the cost of conversion and listing. This cost includes the cost of converting the original machine readable form (be it cards or paper tape) to magnetic tape form. In most cases a by-product of this operation was a listing (all-caps only) of the material on the tape. The second is the cost of an edit run, including a listing in upper- and lower-case. The latter was eschewed in a number of cases because of the added costs. However, many libraries would require a proper edit run, and many librarians would prefer to edit from an upper/lower-case printout than from an all-caps printout. The third is the cost of sorting and merging the tapes. Many of the early studies did not explicitly report on this cost because they were primarily concerned with the cost of converting the retrospective list.
However, in an on-going operation this would be a continuing cost of some magnitude. The available information points to a uniform cost of approximately $.02 per record for conversion and list, and approximately $.08 per record for editing. The two studies where both these costs are given indicate that a ratio of 4 to 1 is appropriate. The only study giving a ratio between the sort and merge operation and the edit operation is the NYPL study, and this is based on before-the-fact information only; the ratio is approximately 8 to 7. For convenience, one can assume that this ratio is unity, giving an overall ratio of 4-4-1. The most complete history of total computer cost is given by LC: a total of $.36 per record for 446-character records. Applying the above ratio to the LC total yields a breakdown of $.04 for conversion and list, $.16 for editing, and $.16 for sort and merge.

Extending the Stanford cost of $.12 for conversion and list and editing gives a total cost for SUL of $.22 for its 180-character records. This figure is considerably more than 180/446 parts of the LC cost.

One other pertinent piece of information is available from the SUL data. In the production of the annual catalog, Stanford estimates a cost of $.121 per title for what is roughly comparable to the cost of sort and merge. This cost is then roughly 1.2 times the SUL cost for conversion and list and editing, verifying the notion that the cost of "sort and merge" is of the same general magnitude as the cost of editing.

The ratios of SUL costs to LC are .367/.690 = .532 for encoding and .225/.359 = .625 for computer time. This suggests that the means of computing average record length may be different for the two institutions. Taking the LC figures as the standard and assuming that both computing and encoding costs are strictly a function of record length, the SUL record length should be between .532x446 = 238 and .625x446 = 279.
This discrepancy may be a result of one source (presumably LC) counting all delimiter and other non-printing characters while the other does not. NYPL indicates that the ratio of printed characters to total characters is approximately 3:4. If the SUL figure of 180 is expanded by one third, one obtains the figure of 240, which agrees well with the lower limit (based on encoding costs) given above.

The cost of sort and merge is a function of the size of the data base, not the amount of material being put into it. The Library of Congress points this out in its study (10) and reports on an average month (where the data base grows for a period and then is reduced to zero). Stanford Undergraduate Library figures are based on its second year of operation, in which 16,000 titles were added to form a total base of 41,000 titles. The actual cost of this step in the operation will therefore depend strongly on the operating strategy employed. Clearly, the number of times one has to sort and merge the entire data base should be minimized, particularly taking into account the fact that sorting costs go up faster than linearly. If the master file is arranged in n orders (author, subject, title, class number, etc.), it will generally be less expensive to sort the updating material into those n orders and make n merge runs with the sorted master files than to make a single merge with a single ordering of the master file and then sort the master file n times to obtain the required updated orderings of the master file.

MANUAL AND COMPUTER PROCESSING: COMPARATIVE COST

One objective of this paper is to define factors whose costs enter into calculations of relative costs of manual and computer processing of catalog information and to report these factor costs.
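The computer-cost reasoning of the preceding section can be sketched in a few lines. The block below is an illustration, not the authors' program: part 1 applies the 4-4-1 ratio to LC's $.36 total, and part 2 compares the two updating strategies under an assumed cost model in which sorting m records costs m*log2(m) (faster than linear growth) and merging a sorted batch into a sorted master file costs the sum of their sizes.

```python
import math

# 1. Breakdown of LC's total computer cost of $0.36 per record using the
#    edit : sort-and-merge : conversion-and-list ratio of 4 : 4 : 1.
ratio = {"conversion and list": 1, "edit": 4, "sort and merge": 4}
unit = 0.36 / sum(ratio.values())
breakdown = {step: round(parts * unit, 2) for step, parts in ratio.items()}
print(breakdown)  # {'conversion and list': 0.04, 'edit': 0.16, 'sort and merge': 0.16}

# 2. Updating a master file kept in n orders, under the assumed cost model.
def sort_cost(m):
    return m * math.log2(m) if m > 1 else 1.0

def merge_cost(master, batch):
    return master + batch

def n_merges(master, batch, n):
    # Sort the small update batch into each of the n orders, then merge
    # each sorted batch into the corresponding pre-sorted master file.
    return n * (sort_cost(batch) + merge_cost(master, batch))

def one_merge_n_sorts(master, batch, n):
    # Merge once, then re-sort the entire updated master file n times.
    return sort_cost(batch) + merge_cost(master, batch) + n * sort_cost(master + batch)

m, b, n = 41_000, 16_000, 4   # SUL-like file sizes, four orderings assumed
print(n_merges(m, b, n) < one_merge_n_sorts(m, b, n))  # True
```

Because sorting grows faster than linearly in file size, re-sorting the large master file repeatedly is the step the text says should be avoided.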
The following paragraphs present a simplified comparison of actual costs of manual and machine processing for a "typical" library characterized by average costs approximating those in the preceding tables.

Table 5 yields average figures for two cases: catalogs with approximately 425 characters per entry and catalogs with approximately 250 characters per entry; they may be called "full entries" and "short entries," respectively. From Table 4, it is possible to compute similar figures for "full catalogs" and "short catalogs" by clustering the three larger cases (those having 9.8, 9.0, and 7.0 cards per title) and the smaller cases (those having 3.0 and 4.6 cards per title). For the full catalogs the average cost of processing is 26.7c per card and 8.6 cards per title, or a total cost of $2.29 per title. For the short catalogs the average cost of processing is 20.3c per card and 3.8 cards per title, or $0.78 per title. Combining these two sets of figures gives the results in Table 6.

Table 6. Comparative Costs of Manual and Computerized Processing

            Short Entries   Full Entries
Manual      $0.78           $2.29
Computer    $0.84           $1.31

Table 6 shows that a hypothesized "typical" library would be slightly better off with manual methods if it chose the short form entries, and noticeably better off with the machine if it chose the full form of the entry.

In making this quick comparison, consideration has not been given to several factors that should obviously be taken into account even in this simple example. First, neither the initial cost of programming nor the initial cost of converting the retrospective records is included. Either or both of these costs could be substantial, but as they are one-time costs and as libraries are basically long-term institutions, such costs should be written off over a relatively long period, even though they must be financed out of a given year's budget.
Second, the cost of printing the catalog is not included (assuming a book catalog is in fact to be used in the computerized system). Thus the comparison in Table 6 is between a card catalog and a catalog in machine readable form. Such a comparison is complicated by the fact that a card once filed stays in the catalog indefinitely, subject only to long-term wear and tear and a certain rate of attrition due to unauthorized removal, misfiling, and so forth, whereas the machine readable catalog must be updated periodically and supplemented by interim publications. And, of course, the comparison is also complicated by the corresponding low cost of producing a number of copies of the book catalog where this is useful for a given system.

However, to put the printing cost in some degree of perspective, one may make a quick calculation based on the production of a single book catalog using a standard upper- and lower-case print chain. At present commercially available prices this would cost between 35c and 50c per 10,000 characters, or approximately 9c per entry for the full form entries and 5c per entry for the short form entries (assuming four complete listings for author, title, subject, and class number). This added cost would make the comparison between manual and computerized methods even less favorable for the short form, but still substantially better for the long form entries ($1.40 to $2.29).

CONCLUSION

It may be concluded that the card-processing operations in typical libraries can be automated economically in many situations today. Libraries using the short form of a catalog and having no immediate need for multiple copies of the catalog may find it desirable to wait a year or two, depending upon their local situation, the availability of trained personnel and, of course, the availability of capital to finance the initial cost of programming and retrospective conversion.
However, libraries using the full form in their catalogs, or those needing multiple copies of their catalogs, will almost certainly find that there is a substantial economic advantage to computerization at the present time. Even when allowance is made for substantial departures from the "typical" costs found in this study, it is difficult to visualize any library using full form information not finding significant economic gains in computerization.

Considering the further advantages of the greater flexibility available in machine readable records, the increased services that can be offered to the user, and the fact that machine costs are decreasing while labor costs are increasing, one is led to the conclusion that more and more libraries will move towards catalog automation.

Tables 7 to 11 appearing on the following pages are reference tables for calculating costs.

ACKNOWLEDGMENTS

The work reported in this paper was supported by the U.S. Office of Education under Contract Number OEC-9-8-00292-0107. Mrs. Henriette Avram (Library of Congress) and Mr. Neil Barron (Sacramento State College, Sacramento, California) made important contributions of cost figures and other technical data used in this report. Various State Libraries supplied detailed cost information.

BIBLIOGRAPHY

A 400-item bibliography on cost and automation is available from the National Auxiliary Publication Service of ASIS (NAPS 00696).

REFERENCES

1. Dolby, J. L.; Forsyth, V. J.; Resnikoff, H. L.: Computerized Library Catalogs: Their Growth, Cost and Utility (Cambridge, Mass.: M.I.T. Press, 1969).
2. Dougherty, Richard M.: "Cost Analysis Studies in Libraries: Is There a Basis for Comparison?" Library Resources and Technical Services, 13 (Winter 1969), 136-141.
3. Kozumplik, William A.: "Time and Motion Study of Library Operations," Special Libraries, 58 (October 1967), 585-588.
4. Henderson, J. W.; Rosenthal, J. A.: Library Catalogs: Their Preservation and Maintenance by Photographic and Automated Techniques (Cambridge, Mass.: M.I.T. Press, 1968).
5. Fasana, Paul J.; Fall, James E.: "Processing Costs for Science Monographs in the Columbia University Libraries," Library Resources and Technical Services, 11 (Winter 1967), 97-114.
6. MacQuarrie, Catherine: "Cost Survey: Cost of Ordering, Cataloging, and Preparations in Southern California Libraries," Library Resources and Technical Services, 6 (Fall 1962), 337-350.
7. Bregzis, Ritvars: "The ONULP Bibliographic Control System: An Evaluation," in University of Illinois Graduate School of Library Science: Proceedings of 1965 Clinic on Library Applications of Data Processing (Urbana: University of Illinois, 1966), pp. 112-140.
8. Fasana, Paul J.: "Automating Cataloging Functions in Conventional Libraries," Library Resources and Technical Services, 7 (Fall 1963), 350-365.
9. Kilgour, Frederick G.: "Costs of Library Catalog Cards Produced by Computer," Journal of Library Automation, 1 (June 1968), 121-127.
10. Avram, Henriette: The MARC Pilot Project (Final Report on a Project Sponsored by the Council on Library Resources), Chapter VIII: "Cost Models" (Washington, D.C.: Library of Congress, 1968).
11. Johnson, Richard D.: "A Book Catalog at Stanford," Journal of Library Automation, 1 (March 1968), 13-50.

Table 7. Cost/Card - Library of Congress Catalog Cards (July 1968)

LC Cards Ordered by/for:          1-2 cds    1st cd of 3     Add'l copies same cd,   Subsc, specific   Extra chg/title, all orders
                                  only       or more order   ordered same time       subject           lacking req. info
1) LC #                           $ .22      $ .10           $ .06                   -                 -
2) Author & Title                 .27        .15             .06                     -                 -
3) Series                         -          .10             .06                     -                 -
4) Subject                        -          .10             .06                     -                 -
5) Chinese/Japanese/Korean        .22-.27    .10-.15         .06                     .04               $ .04
6) Motion Pictures & Filmstrips   .22-.27    .10-.15         .06                     .10               .04
(") (1) 7) Phonorecords .22- .27 .10- .15 .06 .10 .04 3 0"' (1) 8) Revised & Cross Ref. .04 ~'"i - - - -- --- ...... co "' 9) Anonymous $ .04 co Source-LC cds, July 1968 Table 8. Catalog Card Costs Cards Cost/ Card Cost/ Hour Time LC Cards $.22-.27 (min order 1-2 cds) } $.04 extra chg all .10-.15 ( 1st cd-3 or more order) orders lacking .04-.06 ( add'l copies same cd-same order ) req. info. Blank Cards < 3-< 4 for $.01 (J ~ s- Original Card ..... c ()'Q Prepantion $.20-2.34 $2.40-4.70 5-30 min/ cd (J c «> .... Card Checking ~ Before Filing $.21 $4.20 3 min/ cd ~ C") 8' ~ Correcting «> .......... Detected $.12 $2.40 3 min/ cd tj 0 Errors t"'' t:P File $.024 $2.40 100 cds/ hr ~ ~ .03 3.00 100 cds!hr l:l p.. .047 4.71 100 cds/ hr "%j 0 Store $.01 ~ rJ:J ~ Reproduce $.0023-.00208 ( AB Dick Offset Press = $.125/bk( 54-60 cds ) ::I: .045 ( Xerox-1K-100K cds ) 1:-0 c.:> tD 240 Journal of Library Automation Vol. 2/4 December, 1969 Table 9. (Estimated) Annual Cost of 1000 Sq Ft of Storage Space 1)" Minnesota State Dept. of Education ( 1968 )-$520 "Source-Private communication 2) R&D Estimate 04 1968 Construction Cost $30 sq ft x 1000 sq ft - $30,000 100 yrs (life of bldg) +Maintenance Costs, clean up, etc. ($1 yr/sq ft) $50,000 197 4 Construction Cost $50 sq ft x 1000 sq ft - 100 yrs (life of bldg) +Maintenance Costs, clean up, etc. ($1 yr/sq ft) ""Source-E. Graziano, Univ. Calif. at Santa Barbara Table 10. Card Catalog Cost/ Year - $ 300/yr $ 1000 - $ 1300/ yr - $ 500/yr $ 1000 - $ 1500/yr Given the following variables, 1 card catalog case with a maximum card capacity of 72,000 cards (purchase price-$789) -the cost/ card to store would be $.01. Estimated Construction Cost Cost sq ft $30/sq ft Maintenance Rental@ --;- 100 yrs Est. 
$.42 sq ft/mo life bldg @ $1/sq ft Cost/ Yr Cabinet ( 6 sq ft) $30.24 $1.80 $ 6.00 $ 38.04 Room for Users ( 16 sq ft) 80.64 4.80 16.00 101.44 Aisles ( 3 sq ft) 15.12 .90 3.00 19.02 Catalog Table ( 5 sq ft) 25.20 1.50 5.00 31.70 $190.20 + 72,000 cards @ $ .01 (to store) 720.00 TOTAL COST /YR $910.20 Catalog Cost Factors/ DOLBY and FORSYTH 241 Table 11. Card Catalog Maintenance Costs Estimated Requirement Space Cost/Sq Ft Cost/Mo Cost/Year Card Catalog Cabinet - 6 sq ft $ .42 $ 2.52 $ 30.24 Room for Users -16 sq ft 6.72 80.64 Aisles - 3 sq ft 1.26 15.12 Catalog Table - 5 sq ft 2.10 25.20 30 sq ft $12.60 $151.20 Source-E. Graziano, Univ. Calif. at Santa Barbara and R&D Consultants Co. 4667 ---- 242 MARC PROGRAM RESEARCH AND DEVELOPMENT: A PROGRESS REPORT Henriette D. AVRAM, Alan S. CROSBY, Jerry G. PENNINGTON, John C. RATHER, Lucia J. RATHER, and Arlene WHITMER: Library of Congress, Washington, D. C. A description of some of the research and development activities at the Library of Congress to expand the capabilities of the MARC System. Gives details of the MARC processing format used by the Library and then describes programming work in three areas: 1) automatic tagging of data elements by format recognition programs; 2) file analysis by a statistical program called GENESIS; and 8) information retrieval using the MARC Retriever. The MARC System was designed as a generalized data management sys- tem that provides flexibility in converting bibliographic descriptions of all forms of material to machine readable form and ease in processing them. The foundation of the system is the MARC II format (hereinafter simply called MARC), which reached its present form after many months of planning, consultation, and testing. Implementation of the system itself has required development of a battery of programs to perform the input, storage, retrieval, and output functions necessary to create the data base , for the MARC Distribution Service. 
MARC Research and Development/AVRAM 243

These programs are essentially like those of the MARC interim system described in the report of the MARC pilot project (1). Briefly, they perform the following tasks:

1) A pre-edit program converts records prepared on an MT/ST to a magnetic tape file of EBCDIC encoded record segments.
2) A format edit program converts the pre-edited tape file to a modified form of the MARC processing format.
3) A content edit program generates records in the final processing format. At this stage, mnemonic tags are converted to numeric form, subfield codes may be supplied, implicit fixed fields are set, etc.
4) An IBM SORT program arranges validated content-edit output records by LC card number. This program is also used later in the processing cycle.
5) A generalized file maintenance program (Update 1) allows addition, deletion, replacement, or modification of data at the record, field, or subfield levels before the record is posted to the master file. A slightly different version (Update 2) is used to update the master file.
6) A print index program generates a list of control numbers for a given file. The list may also include status, date of entry, or date of last transaction for each record.
7) A general purpose print program produces a hardcopy to be used to proofread the machine data against the original input worksheet. Since the program is table controlled, it can be modified easily to yield a great variety of other formats, and it can be extended routinely to handle other data bases in the MARC processing format.
8) Two additional programs select new records from the MARC master file and convert them from the processing format to the communications format on both seven- and nine-track tapes for general distribution.

As the basic programs became operational, it was possible to investigate other aspects of the MARC System that would benefit from elaboration and refinement.
Reports of some of this activity have found their way into print, notably a description of the MARC Sort Program and preliminary findings on format recognition (2, 3), but much of the Library's research and development effort in programming is not well known. The purpose of this article is to give a progress report on work in three significant areas: 1) automatic tagging of data elements by format recognition programs; 2) file analysis by a statistical program called GENESIS; and 3) information retrieval using the MARC Retriever.

In the following descriptions, the reader should bear in mind that all of the programs are written to accommodate records in the MARC processing format. A full description of the format is given to point up differences between it and the communications format. All of the programs are written in assembly language for the IBM S360/40 functioning under the disk operating system (DOS). The machine file is stored on magnetic tape and the system is operated in the batch mode.

At present, the programs described here are not available for general distribution, but it is expected that documentation for some of them may be filed with the IBM Program Information Department in the near future. Meanwhile, the Library of Congress regrets that it will be unable to supply more detailed information. It is hoped that the information in this article will answer most of the questions that might be asked.

MARC PROCESSING FORMAT

The MARC data base at the Library of Congress is stored on a nine-channel magnetic tape at a density of 800 bpi. The file contains records in the undefined format; each record is recorded in the MARC processing format (sometimes called the internal format). Data in the processing format are recorded in binary, packed decimal, or EBCDIC notation depending on the characteristics of the data and the processing required.
The maximum length of a MARC processing record is 2,048 bytes. The magnetic tape labels follow the proposed standard developed by Sub- committee X3.2 of the United States of America Standards Institute. A MARC record in the processing format is composed of six parts: record leader ( 12 bytes), communications field ( 12 bytes), record control field ( 14 bytes), fixed fields (54 bytes), record directory (variable in length, with each directory entry containing 12 bytes) and variable data fields (variable length). All records are terminated by an end-of-record ( EOR) character. Record Leader 0 1 2 4 5 6 7 Record l ength Element number 1 2 Date YY : MM :nn Status Not Record used type I Number Character Name of position characters in record Record length 2 0-1 Date 3 2-4 8 9 11 Bibliographic Not level used Definition Total number of bytes in the logi- cal record including the number of bytes in the record length itself. It is given in binary nota- tion. Date of last transaction (i.e., the date the last action was taken upon the whole record or some part of the record). The date is recorded in the form of MARC Research and Development/ A VRAM 245 3 4 5 6 7 Status 1 Not used 1 Record type 1 Bibliographic 1 levels Not used 3 Communications Field 12 n 14 15 16 Record Directory Record directory entry source location COlUlt 17 YYMMDD, with each digit be- ing represented by a four-bit binary-coded decimal digit packed two to a byte. 5 A code in binary notation to indicate a new, deleted, changed, or replaced record. 6 Contains binary zeros. 7 An EBCDIC character to iden- tify the type of record that fol- lows (e.g., printed language material) . 8 An EBCDIC character used in conjunction with the record type character to describe the com- ponents of the · bibliographic record (e.g., monograph). 9-11 Contains binary zeros. 
18 19 20 2~ Record In- In- Not destination process process u sed type status Element number Number Character N arne of position Definition characters in record 1 Record directory 2 location 2 Directory entry 2 count 3 Record source 1 12-13 The binary address of the record directory relative to the first byte in the record (address zero). 14-15 The number of directory entries in the record, in binary notation. There is one directory entry for every variable field in the record. 16 An EBCDIC character to show the cataloging source of the record. 246 Journal of Library Automation Vol. 2/4 December, 1969 4 Record 1 17 An EBCDIC character to show destination the data bank to which the rec- ord is to be routed. 5 In-process 1 18 A binary code to indicate the type action to be performed on the data base. The in-process type may signify that a new record is to be merged into the existing file; a record currently in the file is to be replaced, deleted, modi- fied in some form; or that it is verified as being free of all error. 6 In-process 1 19 A binary code to show whether status the data content of the record has been verified. 7 Not used 4 20-23 Contains binary zeros. Record Control Field 24 I ! I I 'i'i I I Libr~ry of Con~ess cata~og card nymber 1 Supplement 1 number Not used Segment number Element number 1 Number Character Name of position Definition characters in record Library of 12 Congress catalog card number 24-35 On December 1, 1968, the Li- brary of Congress initiated a new card numbering system. Numbers assigned prior to this date are in the "old, system; those assigned after that date are in the "new, system( 4). The Li- brary of Congress catalog card number is always represented by 12 bytes in EBCDIC notation but the data elements depend upon the system. MARC Research and Development/ AVRAM 247 Old numbering system Prefix 3 24-26 An alphabetic prefix is left justi- fied with blank fill; if no prefix is present, the three bytes are blanks. 
Year (2 characters, positions 27-28).
Number (6 characters, positions 29-34).
Supplement number (1 character, position 35): A single byte in binary notation to identify supplements with the same LC card number as the original work.

New numbering system:
Not used (3 characters, positions 24-26): Contains three blanks.
Initial digit (1 character, position 27): Initial digit of the number.
Check digit (1 character, position 28): "Modulus 11" check digit.
Number (6 characters, positions 29-34).
Supplement number (1 character, position 35): See above.

2. Not used (1 character, position 36): Contains binary zeros.
3. Segment number (1 character, position 37): Used to sequentially number the physical records contained in one logical record. The number is in binary notation.

Fixed Fields

The fixed field area is always 54 bytes in length. Fixed fields that do not contain data are set to binary zeros. Data in the fixed fields may be recorded in binary or EBCDIC notation, but the notation remains constant for any given field.

Record Directory

1. Tag (3 characters, positions 92-94): An EBCDIC number that identifies a variable field. The tags in the directory are in ascending order.
2. Site number (1 character, position 95): A binary number used to distinguish variable fields that have identical tags.
3. Not used (3 characters, positions 96-98): Contains binary zeros.
4. Action code (1 character, position 99): A binary code used in file maintenance to specify the field-level action to be performed on a record (i.e., added, deleted, corrected, or modified).
5. Data length (2 characters, positions 100-101): Length (in binary notation) of the variable data field indicated by a given entry.
6. Relative address (2 characters, positions 102-103): The binary address of the first byte of the variable data field relative to the first byte of the record (address zero).
7. Directory end-of-field sentinel (1 character, position n): Since the number of entries in the directory varies, the character position of the end-of-field terminator (EOF) also varies.

The positions shown are those of the first directory entry; each subsequent entry occupies the next 12 bytes.
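Because every directory entry is a fixed 12 bytes, unpacking one is straightforward. The Python sketch below is illustrative only; it assumes ASCII rather than EBCDIC tags and big-endian binary fields.

```python
import struct

def parse_directory_entry(entry):
    """Unpack one 12-byte record-directory entry (layout as above):
    tag (3), site number (1), unused (3), action code (1),
    data length (2), relative address (2)."""
    assert len(entry) == 12
    tag = entry[0:3].decode("ascii")      # EBCDIC in the real format
    site = entry[3]
    action = entry[7]
    data_length, relative_address = struct.unpack(">HH", entry[8:12])
    return {"tag": tag, "site": site, "action": action,
            "data_length": data_length, "relative_address": relative_address}
```

An entry for tag 100 whose field is 40 bytes long and begins 104 bytes into the record would unpack to those three values.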
Variable Data Fields

Each variable data field has the following structure:

1. Indicator(s) (variable number of characters): A variable data field may be preceded by a variable number of EBCDIC characters which provide descriptive information about the associated field.
2. Delimiter (1 character): A one-byte binary code used to separate the indicator(s) from the subfield code(s). When there are no indicators for a variable field, the first character will be a delimiter.
3. Subfield code(s) (variable): Variable fields are made up of one or more data elements (5). Each data element is preceded by a delimiter; a lower-case alphabetic character is associated with each delimiter to identify the data element. These alpha characters are grouped. All variable fields will have at least one subfield code.
4. Delimiter (1 character): Each data element in a variable field is preceded by a delimiter.
5. Data (variable).
6. Terminator (1 character): All variable fields except the last in the record end with an end-of-field terminator (EOF); the last variable field ends with an end-of-record terminator (EOR).

FORMAT RECOGNITION

The preparation of bibliographic data in machine readable form involves the labeling of each data element so that it can be identified by the machine. The labels (called content designators) used in the MARC format are tags, indicators, and subfield codes; they are supplied by the MARC editors before the data are inscribed on a magnetic tape typewriter. In the current MARC System, this tape is then run through a computer program and a proofsheet is printed. In a proofing process, the editor compares the original edited data against the proofsheet, checking for errors in editing and keyboarding. Errors are marked and corrections are re-inscribed.
A new proofsheet is produced by the computer and again checked for errors. When a record has been declared error-free by an editor, it receives a final check by a high-level editor called a verifier. Verified records are then removed from the work tape and stored on the master tape. The editing process in which the tags, indicators, subfield codes, and fixed field information are assigned is detailed and somewhat tedious. It seems obvious that a method that would shift some of this editing to the machine would in the long run be of great advantage. This is especially true in any consideration of retrospective conversion of the 4.1 million Library of Congress catalog records. For this reason, the Library is now developing a technique called "format recognition." This technique will allow the computer to process unedited bibliographic data by examining the data string for certain keywords, significant punctuation, and other clues to determine the proper tags and other machine labels. It should be noted that this concept is not unique to the Library of Congress. Somewhat similar techniques are being developed at the University of California Institute of Library Research (6) and by the Bodleian Library at Oxford. A technique using typographic cues has been described by Jolliffe (7). The format recognition technique is not entirely new at the Library of Congress. The need was recognized during the development of the MARC II format, but pressure to implement the MARC Distribution Service prevented more than minimal development of format recognition procedures. In the current MARC System a few of the fields are identified by machine. For example, the machine scans the collation statement for keywords and sets the appropriate codes in the illustration fixed field. In general, however, machine identification has been limited to those places where the algorithm produces a correct result 100 percent of the time.
The new format recognition concept assumes that, after the unedited record has been machine processed, a proofsheet will be examined by a MARC editor for errors in the same way as is done in the current MARC System. Since each machine-processed record will be subject to human review, it will be possible to include algorithms in the format recognition program that do not produce correct tagging all of the time.

The format recognition algorithms are exceedingly complex, but a few examples will be given to indicate the nature of the logic. In all the examples, it is assumed that the record is typed from an untagged manuscript card (the work record used as a basis for the Library of Congress catalog card) on an input device such as a paper tape or a magnetic tape typewriter. The data will be typed from left to right on the card and from top to bottom. The data are input as fields, which are detectable by a program because each field ends with a double carriage return. Each field comprises a logical portion of a manuscript card; thus the call number would be input as a single field, as would the main entry, title paragraph, collation, each note, each added entry, etc. It is important to note that the title paragraph includes everything through the imprint.

Identification of Variable Fields

Call Number. This field is present in almost every case and it is the first field input. The call number usually consists of 1-3 capital letters followed by 1-4 numbers, followed by a period, a capital letter, and more numbers. There are several easily identifiable variations, such as a date before the period or a brief string of numbers without capital letters following the period. The delimiter separating the class number from the book number is inserted according to the following five-step algorithm:

1) If the call number is LAW, do not delimit.
2) If the call number consists simply of letters followed by numbers (possibly including a period), do not delimit. Example: HF5415.13. If this type of number is followed by a date, it is delimited before the blank preceding the date. Example: HA12‡ 1967.

3) If the call number begins with 'KF' followed by numbers, followed by a period, then: a) If there are one or two numbers before the period, do not delimit. Example: KF26.L354 1966a. b) If there are three or more numbers before the period, delimit before the last period in the call number. Example: KFN5225‡.Z9F3.

4) If the call number begins with 'CS71', do not delimit unless it contains a date. In this case, it is delimited before the blank preceding the date. Example: CS71.S889‡ 1968.

5) In all other cases, delimit before the last capital letter, except when the last capital letter is immediately preceded by a period. In this latter case, delimit before this preceding period. Examples: PS3553.E73‡W6; E595.F6‡K4 1968; PZ10.3.U36‡Sp; TX652.5‡.G63 1968.

(In these examples, ‡ marks the position of the inserted delimiter.)

Name Main Entry. The collation statement is the first field after the call number that can be easily identified by analyzing its contents. The field immediately preceding the collation statement must be the title paragraph. If there is only one field between the call number and the collation, the work is entered under title (tagged as 245) and there is no name main entry. If there are two or three fields, the first field after the call number is a name main entry (tagged in the 100 block). When three fields occur between the call number and collation, the second field is a uniform title (tagged as 240). Further analysis into the type of name main entry and the subfield code depends on such clues as location of open dates (1921-    ), date ranges covering 20 years or more (1921-1967), identification of phrases used only as personal name relators (ed., tr., comp.), etc.
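The five-step call-number algorithm above can be expressed compactly in code. The Python sketch below is a simplified illustration, not the Library's production logic: the regular expressions cover only the cases named in the five steps, and ‡ stands in for the delimiter mark.

```python
import re

DAGGER = "\u2021"  # stand-in for the delimiter mark ("‡")

def delimit_call_number(cn):
    """Simplified sketch of the five-step class/book-number
    delimiting algorithm for LC call numbers."""
    # 1) LAW class numbers are never delimited.
    if cn == "LAW":
        return cn
    # Trailing date such as " 1967" or " 1966a", if any.
    date = re.search(r" \d{4}[a-z]?$", cn)
    # 2) Letters followed by numbers (possibly with a period): no
    #    delimiter, unless a date follows; then delimit before the blank.
    if re.fullmatch(r"[A-Z]{1,3}\d+(\.\d+)?", cn):
        return cn
    if date and re.fullmatch(r"[A-Z]{1,3}\d+(\.\d+)?", cn[:date.start()]):
        return cn[:date.start()] + DAGGER + cn[date.start():]
    # 3) 'KF' numbers: count the digits before the first period.
    m = re.match(r"KF[A-Z]*(\d+)\.", cn)
    if m:
        if len(m.group(1)) <= 2:
            return cn                      # e.g. KF26.L354 1966a
        i = cn.rfind(".")
        return cn[:i] + DAGGER + cn[i:]    # delimit before last period
    # 4) 'CS71' numbers: delimit only before a trailing date.
    if cn.startswith("CS71"):
        if date:
            return cn[:date.start()] + DAGGER + cn[date.start():]
        return cn
    # 5) Otherwise delimit before the last capital letter, or before
    #    the period immediately preceding it.
    last_cap = max(i for i, c in enumerate(cn) if c.isupper())
    if last_cap and cn[last_cap - 1] == ".":
        last_cap -= 1
    return cn[:last_cap] + DAGGER + cn[last_cap:]
```

Run against the worked examples in the text, this sketch leaves HF5415.13 and KF26.L354 1966a undelimited and produces KFN5225‡.Z9F3, CS71.S889‡ 1968, and PS3553.E73‡W6.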
The above clues strongly indicate a personal name. Identification of an ordinal number preceded by punctuation and a blank, and followed by punctuation, is strongly indicative of a conference heading. In the course of processing, delimiters and the appropriate subfield codes are inserted. Subfield code "d" is used with dates in personal names; subfield code "e" with relators. Example: MEPS‡de Smith, John,‡1902-1967,‡ed.

Analysis for Fixed Fields

Publisher is Main Entry Indicator. This indicator is set when the publisher is omitted from the imprint because it appears as the main entry. The program will set this indicator whenever the main entry is a corporate or conference name and there is no publisher in the imprint statement. This test will fail in the case where there is more than one publisher, one of which is the main entry, but occurrences of this are fairly rare (less than 0.2 percent).

Biography Indicator. Four different codes are used with this indicator as follows: A = individual autobiography; B = individual biography; C = collected biography or autobiography; and D = partial collected biography. The "A" code is set when 1) "autobiographical", "autobiography", "memoirs", or "diaries" occurs in the title statement or notes, or 2) the surname portion of a personal name main entry occurs in the short title or the remainder of the title subfields. The "B" code is set when 1) "biography" occurs in the title statement, 2) the surname portion of a personal name subject entry occurs in the short title or the remainder of the title subfields, or 3) the Dewey number contains a "B" or a 920. The "C" code is set when 1) "biographies" occurs in the title statement or 2) a subject entry contains the subdivision "biography." There appears to be no way to identify a "D" code situation. Despite this fact, the biography indicator can be set correctly about 83 percent of the time.
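The biography-indicator rules read naturally as a decision procedure. In this hedged Python illustration the argument names and data shapes (a plain text string for the title statement and notes, surname lists for name entries) are assumptions made for the example, not the Library's actual record interface.

```python
def biography_code(title_and_notes, main_surname, subject_surnames,
                   dewey, subject_subdivisions):
    """Sketch of the A/B/C biography-indicator rules described above.
    Returns None where no code can be set (the undetectable "D" case)."""
    text = title_and_notes.lower()
    # "A": autobiography words, or the main-entry surname in the title.
    if (any(w in text for w in ("autobiographical", "autobiography",
                                "memoirs", "diaries"))
            or (main_surname and main_surname.lower() in text)):
        return "A"
    # "B": "biography" in the title, a subject surname in the title,
    #      or a Dewey number containing "B" or 920.
    if ("biography" in text
            or any(s.lower() in text for s in subject_surnames)
            or "B" in dewey or "920" in dewey):
        return "B"
    # "C": "biographies" in the title, or a subject subdivision "biography".
    if ("biographies" in text
            or "biography" in (d.lower() for d in subject_subdivisions)):
        return "C"
    return None  # "D" (partial collected biography) is not detectable
```

Because the "A" test runs first, a title containing "autobiography" is coded A even though it also contains the substring "biography".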
Implementation Schedule

Work on the format recognition project was begun early in 1969. The first two phases were feasibility studies based on English-language records with a certain amount of pretagging assumed. Since the results of these studies were quite encouraging, a full-scale project was begun in July 1969. This project is divided into five tasks. Task 1 consisted of a new examination of the data fields to see if the technique would work without any pretagging. New algorithms were designed and desk-checked against a sample of records. It now seems likely that format recognition programs might produce correctly tagged records 70 percent of the time under these conditions. It is possible that one or two fixed fields may have to be supplied in a pre-editing process. Tasks 2 through 5 remain to be done. Task 2 will provide overall format recognition design, including 1) development of definitive keyword lists, 2) typing specifications, 3) determination of the order of processing of fields within a record, and 4) description of the overall processing of a record. When the design is completed, a number of records will go through a manual simulation process to determine the general efficiency of the system design. Task 3 will investigate the extension of format recognition design to foreign-language titles in roman alphabets. Task 4 will provide the design for a format recognition program based on the results of Tasks 2 and 3, with detailed flowcharts at the coding level. The actual coding, checkout, and documentation will be performed as Task 5. According to current plans, the first four tasks are scheduled for completion early in 1970 and the programming will be finished later in the year.
Outlook

It is apparent that a great deal of intellectual work must be done to develop format recognition algorithms even for English-language records, and still greater ingenuity will be required to apply these techniques to foreign-language records. Nevertheless, on the basis of encouraging results of early studies, there is evidence that the human effort in converting bibliographic records to machine readable form can be materially reduced. Since reduction of human effort would in turn reduce costs, the success of these studies will have an important bearing on the rate at which current conversion activities can be expanded, as well as on the economic feasibility of converting large files of retrospective cataloging data.

GENESIS

Early in the planning and implementation of automation at the Library of Congress it became apparent that many tasks require information about the frequency of data elements. For example, it was helpful to know the frequency of individual data elements, their length in characters, and the occurrence of marks of punctuation, diacritics, and specified character strings in particular data elements. In the past, most of the counting was done manually. Once a sizable amount of data was available in machine readable form, it was worthwhile to have much of this counting done by computer. Therefore, the Generalized Statistical Program (GENESIS) was written as a general-purpose program to make such counts on all forms of material in the MARC Processing Format on magnetic tape files. Any of a variety of counts can be chosen at the time of program execution. There are three types of specifications required for a particular run of the program: selection criteria, statistical function specifications, and output specifications.

Selection Criteria

Record selection criteria are specified by statements about the various data fields that must be present in the records to be processed.
Field selection criteria specify the data elements that will actually be analyzed. Processing by these techniques operates logically in two distinct stages: 1) the record is selected from the input file, i.e., the program must determine if a particular record is to be included in the analysis; and 2) if the record is eligible, the specified function is performed on selected data fields. It should be noted that records may be selected for negative as well as positive reasons. The absence of a particular field may determine the eligibility of a record, and statistical processing can then be performed on other fields in the record. Record selection is optional; if no criteria are specified, all records on the input file will be considered for processing. Since both record selection and field selection reference the same elements, specifications are input in the same way. Selection of populations can be designated by tagging structure (numeric tags, indicators, subfield codes, or any combination of these three), specified character strings, and specified characters in the bibliographic data. The following queries are typical of those that can be processed by GENESIS. How many records with an indicator set to show that the volume contains biographic information also have an indicator set to show that the subject is the main entry? How many records with a field tagged to show that the main entry is the name of a meeting or conference actually have the words "meeting" or "conference" in the data itself? Table 1 shows the operators that can be used with record and field select statements.

Statistical Function Specification

The desired statistical function is specified via a function statement. Four functions have been implemented to date. They involve counts of occurrences of specified fields, unique data within specified fields given a range of data values, data within a specified range, and particular data characters.
In addition to counting the frequency of the specified element, GENESIS calculates its percentage in the total population.

Table 1. Operators of GENESIS

EQUALS: Count all occurrences where the data represented by tag 530 EQUALS "Bound with".
NOT EQUAL: Count all occurrences where the publication language code is NOT EQUAL to "eng".
GREATER THAN OR EQUAL TO: Count all occurrences of (and output) records that are GREATER THAN OR EQUAL TO 1,000 characters.
LESS THAN OR EQUAL TO: Count all occurrences of records entered on the MARC data base before June 1, 1968 (LESS THAN OR EQUAL TO 680601).
AND: Count all occurrences where the publication date code equals "s" AND the publication date is greater than or equal to 1960.
OR: Count all occurrences of a personal name main entry (tag 100) with a relator (subfield code "e") that equals "ed." OR "comp.".

The first function counts occurrences per record of specified field selection criteria. This answers queries concerning the presence of given conditions within the selected records; for example, a frequency distribution of personal name added entries (tag 700). This type of count results in a distribution table of the number of records with 0 occurrences, 1 occurrence, 2 occurrences, and so forth. The second function, which counts occurrences of unique data values within a specified range, answers queries when the user does not know the unique values occurring in a given field but can state an upper and lower value. For example, the specific occurrences of publishing dates between 1900 and 1960 might be requested. The output in response to this type of query consists of each unique value lying within the range specified, with its frequency count. In addition, separate counts are given of values less than the lower bound and of values greater than the upper bound.
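The second function's bounded counting can be sketched in a few lines. This is an illustration under assumptions, a Python dictionary and a simple cap on the number of in-memory entries in place of fixed core storage, not the GENESIS implementation itself; when the cap is exceeded, the highest value's count is folded into the above-upper-bound total.

```python
import bisect

def bounded_unique_counts(values, lower, upper, max_entries):
    """Count unique values within [lower, upper], keeping an ordered
    in-memory list capped at max_entries. On overflow, the highest
    retained value is discarded and its count added to the
    above-upper-bound total (a simplification of GENESIS behavior)."""
    entries, counts = [], {}
    below = above = 0
    for v in values:
        if v < lower:
            below += 1
        elif v > upper:
            above += 1
        elif v in counts:
            counts[v] += 1
        else:
            bisect.insort(entries, v)      # keep the list ordered
            counts[v] = 1
            if len(entries) > max_entries:
                dropped = entries.pop()    # discard the highest value
                above += counts.pop(dropped)
    return {v: counts[v] for v in entries}, below, above
```

A query for publishing dates between 1900 and 1960, with room for only three unique values, would report the three lowest unique dates with their counts, plus below- and above-bound totals.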
The function is performed by maintaining in computer memory an ordered list of unique values encountered, together with their respective counts. As selected fields are processed, each new value is compared against the entries in the list. If the new value already appears in the list, its corresponding count is incremented. Otherwise, the new value is inserted in the list in its proper place and the remainder of the list is pushed down by one entry. The amount of core storage used during a particular run is directly related to the number of unique occurrences appearing within the specified range. Since the length of each entry is determined by the length of the bounds specified, the number of entries which can be held in free storage can vary from run to run. Thus it is possible that the number of unique entries may fill memory before a run has been completed. When this happens, the value of the last entry in the list will be discarded and its count added to the "greater than upper bound" count. In this way, while the user may not obtain every unique value in the specified range, he will obtain all unique values, from the lower bound upward, that can be contained in memory. He is then in a position to make subsequent runs using, as a beginning lower-bound value, the highest unique value obtained from the preceding run. The third function processes queries concerning counts within specified ranges. When this function is used, unique values are not displayed. Instead, the occurrences are counted by specified ranges of values. More than one range can be processed during a single run. On output, the program provides a cumulative count of values encountered within each range, as well as counts of those less than and those greater than the ranges. Function four counts occurrences of particular data characters.
An individual character may be specified explicitly, or implicitly as a member of a group of characters. This allows the counting of occurrences of various alphabetic characters within specified fields. The current list of character classes that can be counted is: alpha characters, upper-case letters, lower-case letters, numbers, punctuation, diacritics, blanks, full (all characters included in the above classes), nonstandard special characters, and any particular character using hex notation. It should be noted that there are various ways of specifying particular characters. For example, an "A" might be designated, causing totals to accumulate for all alphabetics; or a "U" and an "L" might be specified, causing separate totals to be accumulated for upper- and lower-case characters. In addition to the total counts for each class, individual counts of characters occurring within any class can be obtained for display along with the total count.

Output Specifications

Formatted statistical information is output to the line printer. Optionally, the selected records can be output on magnetic tape for later processing.

Limitations

For the purpose of defining a query, more than one field may be specified for record and field selection, using as many statements as necessary. At present, however, the statistical processing for a particular run is performed on all of the run criteria collectively. For example, separate runs of the program are required to obtain each frequency distribution. It is important to note that GENESIS is essentially a means of making counts. The statistical analysis of data is a complex task that requires sophisticated techniques. GENESIS does not have the capability to analyze data in terms of standard deviation, correlation, etc., but its output does constitute raw data for those kinds of analyses.
Although the four functions of GENESIS implemented to date do not, in themselves, provide a complete statistical analysis, they greatly lessen the burden of counting, and the techniques for designating data elements to be counted suffice to describe extremely complex patterns. Continued use of the program will no doubt provide guidelines for expansion of its functions.

Use of the Program

GENESIS has already provided analyses that are helpful in the design of automated procedures at the Library of Congress, as the following instances indicate. A frequency distribution of characters was made to aid in specifying a print train. An analysis of certain data characteristics has determined some of the specifications for the format recognition program described in an earlier section. GENESIS is providing many of the basic counts for a thorough analysis of the material currently being converted for the MARC Distribution Service, to determine frequency patterns of data elements. The findings should be valuable for answering questions about storage capacity, file organization, and retrieval strategy. Although GENESIS is a new program in the MARC System, there is little doubt that it is a powerful tool that will have many uses.

MARC RETRIEVER

Since the MARC Distribution Service has been given the highest priority during the past two years, the emphasis in the implementation of the MARC System has been on input, file maintenance, and output, with only minimal work performed in the retrieval area. It was recognized, moreover, that as long as MARC is tape oriented, any retrieval system put into effect at the Library of Congress would be essentially a research tool that should be implemented as inexpensively as possible. It did seem worthwhile, however, to build retrieval capability into the MARC System to enable the LC staff to query the growing MARC data base.
Query capability would answer basic questions about the characteristics of the data that arise during the design phases of automation efforts. In addition, it seemed desirable to use the data base in an operational mode to provide some needed experience in file usage to assist in the file organization design of a large bibliographic data base. The specifications of the system desired were: 1) the ability to process the MARC processing format without modification; 2) the ability to query every data element in the MARC record, alone or in combination (fixed fields, variable fields, the directory, subfield codes, indicators); 3) the ability to count the number of times a particular element was queried, to accumulate this count, and to print it or make it available in punched card form for subsequent processing; and 4) the ability to format and output the results of a query on magnetic tape or printer hardcopy. To satisfy these requirements it was decided to adapt an operational generalized information system to the specifications of the Library of Congress. The system chosen was AEGIS, designed and implemented by Programmatics, Inc. The modification is known as the MARC Retriever.

General Description

The MARC Retriever comprises four parts: a control program, a parser, a retrieval program, and a utility program. Queries are input in the form of punched cards, stacked in the core of the IBM S/360, and operated on as though all queries were in fact one query. Thus a MARC record will be searched for the conditions described by all queries, rather than by handling each query individually and rewinding the input tape before the next query is processed. The control program is the executive module of the system. It loads the parser and reads the first query statement. The parser is then activated to process the query statement.
On return from the parser, the control program either outputs a diagnostic message for an erroneous query or assigns an identification number to a valid query. After the last query statement has been parsed, the control program loads the retrieval program and the MARC input tape is opened. As each record on the MARC tape is processed, the control program checks for a valid input query. If the query is valid, the control program branches to the retrieval program. On return from the retrieval program, the control program writes the record on an output tape if the record meets the specifications of the query. After the last MARC record has been read from the input tape, the control program branches to the retrieval program for final processing of any requested statistical function (HITS, RATIO, SUM, AVG) that might be a part of the query. The output tapes are closed and the job is ended. The parser examines each query to ensure that it conforms to the rules for query construction. If the query is not valid, an error message is returned to the control program giving an indication of the nature of the error. Valid query statements are parsed and converted to query strings in Polish notation, which permits expressions to be written without parentheses. The absence of embedded parentheses allows simpler interpretation, translation, and execution of the results. The retrieval program processes the query strings by comparing them with the MARC record data elements, and the results of the comparison are placed in a true/false stack table. If the comparison result is true, output is generated for further processing. If the result is false, no action takes place. If query expressions are linked together with "OR" or "AND" connectors, the results in the true/false stack table are ORed and ANDed together, resulting in a single true or false condition.
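The true/false stack evaluation of a parsed query string can be illustrated in a few lines. In this Python sketch, the token shapes (field/value pairs for comparisons, "AND"/"OR" markers for connectors) are assumptions made for the example, using postfix (reverse Polish) order, not the Retriever's internal representation.

```python
def eval_postfix(tokens, record):
    """Evaluate a parenthesis-free query string against one record
    using a true/false stack. Operand tokens are (field, value) pairs
    tested for equality; "AND"/"OR" combine the top two results."""
    stack = []
    for tok in tokens:
        if tok == "AND":
            b, a = stack.pop(), stack.pop()
            stack.append(a and b)
        elif tok == "OR":
            b, a = stack.pop(), stack.pop()
            stack.append(a or b)
        else:
            field, value = tok
            stack.append(record.get(field) == value)
    return stack.pop()
```

A query such as "language equals eng OR date equals 1960" becomes the token string [(lang, eng), (date, 1960), OR], evaluated left to right with no parentheses needed.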
The utility program counts every data element (fixed field, tag, indicator, subfield code, data in a variable field) that is used in a query statement. The elements in the search argument are counted separately from those in the output specifications. After each run of the MARC Retriever, the counts can be printed or punched for immediate use, or they can be accumulated over a longer period and processed on demand.

Query Language

General. Query statements for the MARC Retriever must be constructed according to a precisely defined set of rules, called the syntax of the language. The language permits the formation of queries that can address any portion of the MARC record (fixed fields, record directory, variable fields and associated indicators and subfields). Queries are constructed by combining a number of elements: MARC Retriever terms, operators, fixed field names, and strings of characters (hereafter called constants). The following sections describe the rules for constructing a query and the query elements, with examples of their use.

Query Formation. A query is made up of two basic parts or modes: the if mode, which specifies the criteria for selecting a record; and the list mode, which specifies which data elements in the records that satisfy the search criteria are to be selected for printing or further processing. In general, the rules that apply to constructing if-mode expressions apply to constructing list-mode expressions, except that the elements in the list mode must be separated by a comma. A generalized query has the following form:

IF if-mode expression LIST list-mode expression;

Where:
IF: Signals the beginning of the if mode.
if-mode expression: Specifies the search argument.
LIST: Signals the beginning of the list mode.
list-mode expression: Specifies the MARC record data element(s) that are to be listed when the search argument specified in the if-mode expression is satisfied.
The format of the query card is flexible. Columns 1 through 72 contain the query, which may be continued on subsequent cards. No continuation indicator is required. Columns 73 through 80 may be used to identify the query if desired. The punctuation rules are relatively simple: one or more blanks must be used to separate the elements of a query, and a query must be terminated by a semicolon. Queries that involve fixed fields take the following form:

IF fixed-field-name1 = constant LIST fixed-field-name2;

Where:
fixed-field-name1: The name of a fixed field.
=: Any operator appropriate for the query.
constant: The search argument.
fixed-field-name2: The fixed field to be output if a match occurs.

To query or specify the output of a variable field, the following general expression is used:

IF SCAN (tag = nnn) = constant LIST SCAN (tag = nnn);

Where:
SCAN: Indicates that a variable field is to be referenced.
tag: Indicates that the tag of a variable field is to follow.
=: The only valid operator.
nnn: Specifies the tag of the variable field that is to be searched or output.
constant: Specifies the character string of data that is the search argument.

The MARC Retriever processes each query in the following manner. Each record in the data base is read from tape into core, and the data elements in the MARC record specified in the if-mode expression are compared against the constant(s) in the if-mode expression. If there is a match, the data element(s) specified in the list-mode expression are output.

Key Terms. The terms used in a query statement fall into two classes. The first group instructs the program to perform specified functions: SCAN, HITS, AVG, RATIO, SUM. The second group relates to elements of the record structure. The most important key terms in this class are: INDIC (indicator), NTC (subfield code), RECORD (the entire bibliographic record), and TAG (variable field tag).
These terms are used to define a constant; e.g., TAG = 100.

Operators.
Operators are characters that have a specific meaning in the query language. They fall into two classes. The first contains relational operators, such as equal to and greater than, indicating that a numeric relationship must exist between the data element in the MARC record and the search argument. The second class comprises the logical operators "and" and "or". The operators of the MARC Retriever are shown in Table 2. In the definitions, C is the query constant and D is the contents of a MARC record data element.

Table 2. Operators of the MARC Retriever

Operator   Meaning
=          C equals D
>          C is greater than D
≥          C is greater than or equal to D
<          C is less than D
≤          C is less than or equal to D
≠          C is not equal to D
&          "and" (both conditions must be true)
|          "or" (at least one condition must be true)

Constants.
A constant is either a string of characters representing data itself (e.g., Poe, Edgar Allan) or a specific variable field tag, indicator(s), and subfield code(s). Constants may take the following forms:

CC  Where CC is an alphabetic or numeric character or the pound sign "#". When this form is used, the MARC Retriever will convert all lower-case alphabetic characters in the data element of the MARC record being searched to upper-case before a comparison is made with the search argument. This conversion feature permits the use of a standard keypunch that has no lower-case capability for preparation of queries.

'CC'  Where CC can be any one of the 256 characters represented by the hexadecimal numbers 00 to FF. This form allows nonalphabetic or nonnumeric characters not represented on the standard keyboard to be part of the search argument. When this form is used, the MARC Retriever will also convert all lower-case alphabetic characters in the data elements in the MARC record being searched to upper-case before a comparison is made.
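The comparison rules can be modeled directly. The sketch below follows Table 2's definitions of the relational operators and the case-folding behavior of the constant forms; the function names and the ASCII spellings of the operators are ours, not the Retriever's:

```python
# Relational operators per Table 2, where C is the query constant and
# D is the contents of a MARC record data element.
RELATIONAL = {
    "=":  lambda c, d: c == d,
    ">":  lambda c, d: c > d,     # C is greater than D
    ">=": lambda c, d: c >= d,
    "<":  lambda c, d: c < d,     # C is less than D
    "<=": lambda c, d: c <= d,
    "~=": lambda c, d: c != d,    # ASCII stand-in for the not-equal symbol
}

def compare(constant, op, data, fold_case=True):
    """Apply one relational operator.  With fold_case=True the record
    data is converted to upper case before comparison, as the CC and
    'CC' constant forms require; the @CC@ form would pass False."""
    if fold_case:
        data = data.upper()
    return RELATIONAL[op](constant, data)

assert compare("POE, EDGAR ALLAN", "=", "Poe, Edgar Allan")
assert compare("Poe, Edgar Allan", "=", "Poe, Edgar Allan", fold_case=False)
```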
@CC@  Where CC can be any one of the 256 characters represented by the hexadecimal numbers 00 to FF. When this form is used, characters in the data element of the MARC record being searched will be left intact, and the search argument must contain identical characters before a match can occur.

#  The pound sign indicates that the character in the position it occupies in the constant is not to take part in the comparison. For example, if the constant were #ANK, then TANK, RANK, and BANK would be considered matches. More than one pound sign can be used in a constant, and in any position.

Specimen Queries.
The following examples illustrate simple query statements involving fixed and variable fields.

IF MCPDATE1 = 1967 LIST MCRCNUMB;

The entire MARC data base would be searched a record at a time for records that contained 1967 in the first publication date field (MCPDATE1). The LC card number (MCRCNUMB) of the records that satisfied the search argument would be output.

IF SCAN(TAG = 100) = DESTOUCHES LIST SCAN(TAG = 245);

The personal name main entry field (tag 100) of each MARC record would be searched for the surname Destouches. If the record meets this search argument, the title statement (tag 245) would be output. In addition to specifying that a variable field is to be searched, the SCAN function also indicates that all characters of the variable field are to be compared, and a match will result at any point in the variable field where the search argument matches the variable field contents. For example, if the if-mode expression is SCAN(TAG = 100) = SMITH, a match would occur on the following examples of personal name main entries (tag 100): SMITH, JOHN; SMITHFIELD, JEROME; JONES-SMITH, ANTHONY.
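The pound-sign convention amounts to a positional mask over the compared characters. A minimal sketch (the function name is ours):

```python
def masked_equal(constant: str, data: str) -> bool:
    """Equality in which '#' in the constant matches any single
    character in the corresponding position of the data."""
    return len(constant) == len(data) and all(
        c == "#" or c == d for c, d in zip(constant, data)
    )

for word in ("TANK", "RANK", "BANK"):
    assert masked_equal("#ANK", word)
assert not masked_equal("#ANK", "TALK")
assert masked_equal("#A#K", "TALK")  # pound signs may appear in any position
```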
It is possible to include the indicators associated with a variable field in the search by augmenting the constant of the SCAN function as follows:

IF SCAN(TAG = 100&INDIC = 10) = DESTOUCHES LIST SCAN(TAG = 245);

Where:
INDIC: Specifies that indicators are to be included.
1: Specifies that the first indicator must be set to 1 (the name in the personal name main entry [tag 100] is a single surname).
0: Specifies that the second indicator must be set to zero (main entry is not the subject).

The personal name main entry field (tag 100) of each record would be searched, and a hit would occur if the indicators associated with the field were 1 and 0 and the contents of the field contained the characters "Destouches." If the record met these search criteria, the title statement (tag 245) would be output. It is also possible to restrict the search to the contents of one or more subfields of a variable field. For example:

IF SCAN(TAG = 100&INDIC = 10&NTC = A) = DESTOUCHES LIST SCAN(TAG = 245);

Where:
NTC: Indicates that a subfield code follows.
A: Specifies that only the contents of subfield A are to be included in the search. Note that in this form the actual subfield code "a" is converted to "A" by the program (see the section on Constants).

Special Rules.
So far the discussion has concerned rules of the query language that apply to either the if mode or the list mode. This section and the remaining sections will discuss those rules and functions that are unique to either the if mode or the list mode. In the if mode, fixed and variable field expressions can be ANDed or ORed together using the logical operators & and |. For example:

IF MCPDATE1 = 1967&SCAN(TAG = 100) = DESTOUCHES LIST SCAN(TAG = 245);

This query would search for records with a publication date field (MCPDATE1) containing 1967 and a personal name main entry field (tag 100) containing Destouches.
If both search criteria are met, the title statement field (tag 245) would be printed. In the list mode, more than one fixed or variable field can be listed by a query as long as the fixed field names or scan expressions are separated by commas. For example:

IF SCAN(TAG = 100) = DESTOUCHES LIST SCAN(TAG = 245), MCRCNUMB;

The list mode offers two options, LIST and LISTM, which result in different actions. LIST indicates that the data elements in the expressions are to be printed, and LISTM indicates that the data elements in the expression are to be written on magnetic tape in the MARC processing format. It is often desirable to list a complete record, either in the MARC processing format using LISTM or in printed form using LIST. In either case, the listing of a complete record is activated by the MARC Retriever key term RECORD. For example:

IF SCAN(TAG = 100) = DESTOUCHES LIST RECORD;

The complete record would be written on magnetic tape in the MARC processing format instead of being printed out if LISTM were substituted for LIST in the above query. Four functions can be specified by the LIST mode. HITS signals the MARC Retriever to count and print the number of records that meet the search criteria. For example:

IF SCAN(TAG = 650) = AUTOMATION LIST HITS;

RATIO signals the MARC Retriever to count both the number of records that meet the search criteria and the number of records in the data base and print both counts. The remaining two LIST functions permit the summing of the contents of fixed fields containing binary numbers. SUM causes the contents of all specified fields in the records meeting the search criteria to be summed and printed. For example:

IF MCRCNUMB = '   68######' LIST SUM(MCRLGTH);

The data base would be searched for records with the LC card number field (MCRCNUMB) containing three blanks and 68 in positions one through five.
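The LIST-mode functions, HITS, RATIO, SUM, and the AVG function described next, can be mimicked with a small driver routine. This is a sketch only; the record layout and the record-length values below are invented for illustration:

```python
def run(records, predicate, function, field=None):
    """Evaluate a LIST-mode function over a data base held as a list
    of dictionaries.  `field` names a fixed field holding an integer
    (needed for SUM and AVG only)."""
    hits = [r for r in records if predicate(r)]
    if function == "HITS":
        return len(hits)
    if function == "RATIO":
        return len(hits), len(records)
    total = sum(r[field] for r in hits)
    if function == "SUM":
        return total
    if function == "AVG":       # SUM plus a count of qualifying records
        return total, len(hits)
    raise ValueError(function)

base = [
    {"MCPDATE1": "1967", "MCRLGTH": 800},
    {"MCPDATE1": "1967", "MCRLGTH": 1000},
    {"MCPDATE1": "1968", "MCRLGTH": 900},
]
is_1967 = lambda r: r["MCPDATE1"] == "1967"
print(run(base, is_1967, "HITS"))             # 2
print(run(base, is_1967, "RATIO"))            # (2, 3)
print(run(base, is_1967, "SUM", "MCRLGTH"))   # 1800
print(run(base, is_1967, "AVG", "MCRLGTH"))   # (1800, 2)
```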
The remaining positions would not take part in the query process and could have any value. If a record satisfied this search argument, the contents of the record length field (MCRLGTH) would be added to a counter. When the complete data base had been searched, the count would be printed. AVG performs the same function as SUM and also accumulates and prints a count of the number of records meeting the search criteria.

Use of the Program

The MARC Retriever has been operational at the Library of Congress since May 1969, and selected staff members representing a cross-section of LC activities have been trained in the rules of query construction. The applications of the program to the MARC master file include: identification of records with unusual characteristics for the format recognition study; selection of titles for special reference collections; and verification of the consistency of the MARC editorial process. As the file grows, it is expected that the MARC Retriever will be useful in compiling various kinds of bibliographic listings, such as translations into English, topical bibliographies, etc., as well as in making complex subject searches. The MARC Retriever is not limited to use with the MARC master file; it can query any data base that contains records in the MARC processing format. Thus, the Legislative Reference Service is able to query its own data base of bibliographic citations to produce various outputs of use to its staff and members of Congress. Because the MARC Retriever is designed to conduct searches from magnetic tape, it will eventually become too costly in terms of machine processing time to operate. It is difficult to predict when the system will be outgrown, however, because its life span will be determined by the growth of the file and the complexity of the queries. Meanwhile, the MARC Retriever should provide the means for testing the flexibility of the MARC format for machine searching of a bibliographic file.

PERFORMANCE OF RUECKING'S WORD-COMPRESSION METHOD WHEN APPLIED TO MACHINE RETRIEVAL FROM A LIBRARY CATALOG

Ben-Ami LIPETZ, Peter STANGL, and Kathryn F. TAYLOR: Research Department, Yale University Library, New Haven, Connecticut

F. H. Ruecking's word-compression algorithm for retrieval of bibliographic data from computer stores was tested for performance in matching user-supplied, unedited bibliographic data to the bibliographic data contained in a library catalog. The algorithm was tested by manual simulation, using data derived from 126 case studies of successful manual searches of the card catalog at Sterling Memorial Library, Yale University. The algorithm achieved 70% recall in comparison to conventional searching. Its acceptability as a substitute for conventional catalog searching methods is questioned unless recall performance can be improved, either by use of the algorithm alone or in combination with other algorithms.

Frederick H.
Ruecking has published a report (1) of a method for improving bibliographic retrieval from computerized files when searching on unverified input data supplied by requestors. The method involves compression of author-and-title information before comparison. The rules for compression cause certain types of spelling errors and word discrepancies to be ignored by the computer. Ruecking reported 90.4% recall and 98.67% accuracy (precision) in a test of his method in which unverified book order requests were matched against a MARC I data base that contained 1392 of the references searched. This paper reports on a small-scale manual simulation test undertaken to assess the value of the method when applied to bibliographic retrieval from a library catalog.

Ruecking's Word-Compression/LIPETZ 267

The opportunity to test Ruecking's method when applied to retrieval from a library catalog was provided by the ready availability of data derived from a current study (2) of catalog use at Sterling Memorial Library (3.5 million books) at Yale University. This study collects, from a rigidly randomized sample of catalog users, precise information on the clues available to them at the moment of initiating a search. Search clues are recorded exactly as known to the catalog user, employing his own spelling, right or wrong. For each catalog user studied, the outcome of the search is ascertained; complete catalog information is recorded for documents identified as pertinent in successful searches. Search clues known to catalog users who seek specific documents correspond to the "unverified input data" which Ruecking's method would match against catalog holdings. Catalog information on those documents identified as pertinent corresponds to the portion of the data base that Ruecking's program seeks to match. It was possible, therefore, to apply Ruecking's method by manual simulation, and to test its recall performance in real catalog searches.
A test of its precision was not immediately feasible because such a test would require comparison of input data with the entire catalog (or a substantial portion of it). However, the determination of recall performance would at least indicate whether the method shows sufficient promise in catalog searching to warrant evaluation of its precision. An aside on precision is in order, however. It should be noted that precision of retrieval with a given method tends to vary inversely with the size of the file being searched. Although Ruecking did not specify the number of records included in his MARC I data base, it could not have exceeded 48,000. Had he run his test on a data base ten, or fifty, or one hundred times larger, the measured precision would certainly have been much lower than the figure reported. Any librarian who is contemplating the adoption of a retrieval technique which has been tested on a data base similar to, but smaller than, his own should realize that precision performance must inevitably drop as the data base is increased. The degree of lowered precision to be expected may be predicted theoretically or estimated from tests on files of several different sizes.

The data used in the evaluation of recall performance reported in this paper came from 126 searches in which the catalog users had been successful in locating the specific documents that they were seeking. The compression coding method described by Ruecking was applied in each instance to the author-title search clues supplied by the catalog user and to the author-title information available on the catalog card. Threshold values were computed for the catalog card data, and retrieval values were computed for the user data. When the retrieval value was at least as large as the threshold value, the document was considered "retrieved." Ruecking's method was designed for use with English-language titles only.
Of the 126 catalog searches in the study sample, 20 involved foreign-language titles. Recall was determined on both the full sample and the English-language subset of 106 searches. Surprisingly, there is not a great improvement in performance when foreign-language references are excluded. It should be noted that several difficulties were encountered in applying Ruecking's method because of ambiguities in the rules stated in his paper. In fact, in his Figure 2 (page 236), of the seventeen illustrations of compression-coded data retrieved by his program, at least eight appear to contain departures from the compression-coding rules as stated in the paper. His Table 5 (page 235) is scantily described: "Individual Code Test" and "Full-Code Test" are not defined; neither are column headings. And, contrary to the text (page 234), values in columns five through seven are obtained by adding two to the calculated thresholds in only the top half of Table 5; in the bottom half, no such regular correlation exists. In all cases of ambiguity, the alternative was selected that would tend to increase probability of retrieval. For example, Ruecking states (page 234) that the search program provided for matching of titles on the basis of rearrangement of title words, and that the threshold value required for retrieval is raised at the same time. Raising this value decreases the probability of retrieval, but it is not clear by how much the value is to be raised. For purposes of the test, the threshold value was not raised at all in cases where title words were out of correct sequence, thus retaining maximum probability of retrieval based on the number of matched words alone, regardless of their sequence.
Results of the test showed that, of the 126 documents in the full sample which were located successfully by manual search in the existing card catalog, only 88 were retrieved by the compression-code method, a recall rate of 70%. Considering only the 106 English-language references, 77 were retrieved by the compression-code method, a recall rate of 73%. The premise for the preceding calculation of recall rate should be clearly understood. The test considered real document searches that were concluded successfully in an actual library using a manual catalog; recall is defined here as the proportion of such searches that would be concluded successfully in a hypothetical, computerized library where the only means of searching the catalog would be by Ruecking's method. In a real library with a manual catalog, wanted documents can be located in many ways, not merely through a knowledge of author and title (e.g., through subject entries, series entries, cross references). The test did not disqualify any manual approaches from consideration; it compared the real world with a specific potential alternative. Obviously, the use of Ruecking's method in combination with other computer programs could result in a recall rate higher than 70% or 73% by the method of calculation employed, and conceivably higher than 100% (because some document searches of manual catalogs that now end in failure might become successful using new search methods).

Table 1 provides detailed information on the discrepancies between user data and catalog data in the test. With respect to the full sample (126 documents), there were 49 documents for which mismatches of data were observed. Of these, the compression-code method was able to "heal" mismatches in 11 instances to cause retrieval; on the other hand, manual searches had achieved retrieval in all 49 instances.
With respect to the English-language sample (106 documents), there were 37 documents for which mismatches of data were observed. Of these, the compression-code method was able to "heal" mismatches in 8 instances to cause retrieval; on the other hand, manual searches had achieved retrieval in all 37 instances. Contrary to expectations, the compression-code method performed somewhat worse, or at least no better, in "healing" actual mismatches in English references (8 out of 37) than it did with foreign-language references (3 out of 12). The higher overall recall percentage with the English-language subset is attributable entirely to the fact that users had complete and correct data more frequently for English references (69 out of 106) than they did for foreign-language references (8 out of 20).

Table 1. Results of Applying Ruecking's Method in Cases Where User Clues and Catalog Data Did Not Match Completely

For each type of mismatch in user data, the table reports the number of documents retrieved and not retrieved, both for the full sample (126 documents) and for the English subset (106 documents). The types of mismatch tallied are: had neither author nor title; had author's last name, no title; had title, no author; had wrong author; had misspelled author; had wrong words in title; had misspelled words in title; had words transposed in title; had incomplete title (first word correct; first word incorrect); had entire subtitle, no title; had part of subtitle (first word correct; first word incorrect). Total documents: full sample, 11 retrieved and 38 not retrieved; English subset, 8 retrieved and 29 not retrieved.

Notes to Table 1: (1) 1 case of correct word stems not matched because of wrong endings. (2) 2 cases of long or composite titles with maximum threshold values contained in input words but not among the first four significant words. (3) Figures shown are lower than totals of figures in columns because some documents had two or more types of mismatch.
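These counts determine the recall rates reported for the test. A quick arithmetic check (plain arithmetic, no assumptions beyond the numbers in the text):

```python
def recall(total, mismatched, healed):
    """Searches succeed when user data match catalog data completely
    (total - mismatched) or when the compression code 'heals' a
    mismatch (healed).  Returns (documents retrieved, recall %)."""
    retrieved = (total - mismatched) + healed
    return retrieved, round(100 * retrieved / total)

print(recall(126, 49, 11))   # full sample → (88, 70)
print(recall(106, 37, 8))    # English subset → (77, 73)
```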
Thus, regardless of original intent, the method works equally well (or equally poorly, depending on one's viewpoint) on foreign-language and English references. If foreign-language references had been systematically ignored in applying the test to catalog searches, some 16% (20 out of 126) of the searches would have been excluded, with no real gain in performance. The block of interviews from which the searches used in this test were drawn included 10 unsuccessful document searches in addition to the 126 successful searches. One could speculate on whether the compression-code method would have been able to "heal" these failures, resulting in a higher performance rating. The indications are, however, that the chances of such healing are close to zero. In a majority of these unsuccessful searches, the available data were incomplete or were not of the type that the method is intended to utilize. In the few remaining cases, it is very likely that the searches were unsuccessful simply because the desired documents were not in the library collection.

Recall performance as measured by the test could have been improved by modifying Ruecking's rules to some extent. For example, five more titles would have been retrieved had the assigned retrieval value been increased by two units in cases where the first title word matched correctly; this would have increased overall recall performance from 70% to 74%. A further increase to 76% would have resulted from matching the user's version of the title with the catalog's subtitle, or with portions of titles which follow a punctuation mark (in addition to matching with the actual title in the catalog). Extension of the compression code to include publisher and date as well as author and title would do little or nothing to improve the performance of this method.
The test data, although admittedly a small sample, indicate that users who do not have accurate author and title information when they begin a search very rarely have accurate information on any other descriptive data element. It is, of course, a matter for individual judgment as to whether the performance of the compression-code method, as indicated by the test reported here, is sufficiently good to make it attractive for use in some computerized alternative to the manual library catalog. In the authors' opinion, Ruecking's method does not in itself supply an adequate solution to the problem of searching a computerized catalog. However, further investigation seems warranted along two lines. First, the method might be modified to give better performance in this application. Second, it might be used in combination with some other computer methods to give searching performance approaching that which is attained today by the manual searching of card catalogs.

Book Reviews 271

ACKNOWLEDGMENT
The work reported in this paper was supported in part by a grant from the U.S. Office of Education, OEG-7-071140-4427.

REFERENCES
1. Ruecking, Frederick H., Jr.: "Bibliographic Retrieval from Bibliographic Input; The Hypothesis and Construction of a Test," Journal of Library Automation, 1 (December 1968), 227-38.
2. Lipetz, Ben-Ami; Stangl, Peter: "User Clues in Initiating Searches in a Large Library Catalog," in American Society for Information Science, Proceedings, 5. Annual Meeting, October 20-24, 1968, Columbus, Ohio, p. 137-139.

BOOK REVIEWS

Conceptual Design of an Automated National Library System, by Norman R. Meise. Metuchen, N.J.: Scarecrow Press, 1969. 234 pp. $5.00.

This is a very confusing book. And it is too bad, because this reviewer kept feeling that the author, Norman Meise, had something to present. The trouble is that he does not communicate. This, I think, is the result of two things.
First, the book reflects the naivete of engineers when they come to deal with what are basically social systems like libraries. This does not mean it can't be done, but such a task needs clarity and purpose, which this book does not have. The second springs from this failure. The masses of data, assumptions, and commentary in the book are poorly organized and interrelated. It is not enough to write strings of words; those strings must communicate and relate backward and forward in the text. Although never explicitly stated, the book evidently grew out of a study performed by the United Aircraft Corporate Systems Center in 1965-66 for the development and implementation of a Connecticut Library Research Center (see ERIC Document ED 0221512). The latest reference in the book is 1966. In a field, i.e. library networks, where a fair amount of work and discussion has taken place in the last three years (e.g. the EDUNET Conference in 1966), a book like this quickly loses its impact. The purpose of the book, according to the author, is "to show the feasibility of a system concept rather than provide a detailed engineering design." The system is "an automated national library system" using the State of Connecticut as a model. The author then adds (spoiling the whole introduction): "If these functions (bibliographic searching, acquisition, cataloging, circulation) can be economically automated, the major problems associated with our information explosion will be solved." As Anatole France once said: "It is in the ability to deceive oneself that the greatest talent is shown."

Basically the system is made up of three levels: local libraries, the regional center, and the "national library central." These are interconnected either by teleprinter, at 75 bits per second, or CRT consoles, at 1200 to 2400 bits per second. Mr. Meise develops extensive tables, using Connecticut as a model, for (a) estimated message traffic, real-time and batch; (b) allocation of communication traffic to segments of circuit route; (c) cumulative communications traffic; (d) number of circuits required versus circuit speed. He discusses bibliographic coupling (78-82), the Itek Memory Centered Processor, disc packs and file organization (100-118, 162-179). I cite these tables and data (there are many more) merely to show the approach. At one point he talks about packages such as books, at another about papers. The whole system is based on statistics for which there is no discussion. Item: "the local library should satisfy a large percentage of the user's needs (90-99%); however, some portion of these needs (1-10%) should be obtained from other libraries to keep system costs within reasonable range" (p. 32). Where does "90-99%" come from? How do we know that this level will "keep system costs within reasonable range"? Item: "The State of Connecticut is about the right size for a regional center from the point of view of expected user load" (p. 118). Whose hat did he pull this one out of? There is no discussion of right size, nor really any of what "size" means: population? geographic area? cultural makeup?
One suggested region (Arizona, Nevada, Utah) has about the same population as Connecticut, but is 62 times the size. Certainly the communications costs are entirely different and the two regions are not comparable. Figures suddenly appear in the text, e.g. 9,610,000 vols. (p. 98) and others, and the reader does not know where they came from. They may be right. They may even have been discussed somewhere in the text, but on page 98 one does not remember. And the index is of no value: two pages, hastily organized. This is all too bad, because Mr. Meise evidently put a good deal of effort into this. Instead of discussing the statistical assumptions necessary for network planning, we are presented with raw and unevaluated data. Instead of a thorough analysis of the "feasibility of a system concept", we are presented with a grandiose scheme. Buried in the pile, however, are data, which while poorly organized and presented, are necessary for practical network planning. What is needed is a coherent and basic statement of the kinds of data available, of the kinds of data that are unavailable or imprecise, of the conditions under which these kinds of data hold, and of the relative usefulness of such data at varying systems levels. Perhaps it is unfair to criticize Mr. Meise for not writing this kind of book. Yet my criticism is precisely that, because he writes as though these data already exist in organized form. They don't. He has built a house of cards on air.

Robert S. Taylor

Thesaurus of ERIC Descriptors, 2d Edition, Washington, D.C.: Educational Resources Information Center, Bureau of Research, Office of Education, 1969. 289 pp.

One of the principal problems associated with the review of a new thesaurus is that the thesaurus usually serves simultaneously to exemplify the use and misuse of the basics of thesaurus construction. The Thesaurus of ERIC Descriptors is no exception.
For the purposes of this review, it is necessary to distinguish between a thesaurus and an authority list. Both are designed to improve communication between the user and the information storage and retrieval system. A thesaurus is usually used in conjunction with free-vocabulary indexing (and retrieval), while the authority list must be used only with controlled-vocabulary indexing. Hence a thesaurus, in the words of the Engineers' Joint Council Guide to Indexing and Abstracting, "... is not meant to specify the words in which information is to be recorded, but rather to establish the semantic and generic interrelationships between such words". The indexer uses the thesaurus as a means of "enriching" his indexing, i.e., as a guideline for effective indexing. The searcher uses the thesaurus to aid in phrasing or clarifying his search question. In neither use is there demanded the use of a particular term in preference to any other. An authority list, on the other hand, must be composed entirely of system terminology (except for the USE-USE FOR relationships, although the non-preferred term cannot be used profitably as a search term) which the indexer and searcher are constrained to employ. The Thesaurus of ERIC Descriptors is, by its own admission, an authority list ("Only those descriptors actually used for indexing are placed in the Thesaurus ...", p. vii). A thesaurus may be used with either free or controlled vocabulary indexing/retrieval; an authority list may be used only with a controlled vocabulary. It is time we started using the correct terms for these two types of communication device. Apart from the confusion as to the exact nature of the document it introduces, the introduction to the Thesaurus of ERIC Descriptors does provide a good discussion of the problems of indexing and "thesaurus" development, especially concerning the need for multi-term entries.
The descriptor listing, to which are added a rotated descriptor display, a descriptor group display, and descriptor scope notes, is well constructed (especially commendable is the rotated descriptor display). However, I question the value of the descriptor groups, which serve to grossly classify the ERIC descriptors, since they tend to detract from the cross-concept nature of the authority list. Finally, the formats of the various listings in this document are well done and provide a very readable and usable authority list.

James E. Rush

Announced Reprints. Vol. 1, Feb. 1969. Microcard Editions. 52 pp. $30.00 per year.

This journal complements Guide to Reprints. Announced Reprints lists forthcoming reprints that have been announced but not yet produced. Published quarterly, its scope includes books, journals, and other materials originating both in the United States and abroad. Each issue will cumulate all previous issues, except that following the November issue all titles that have been published will be dropped. Books are entered by author. Entries include author, title and original date of publication. Journals and sets are entered by title and include volume numbers. Each entry includes in brackets the date of the first inclusion of an item in Announced Reprints. Titles preceded by an asterisk are those that have been published subsequent to being listed as a forthcoming title. A title that appears in the February issue, for example, as an announced title which is then published in March will appear in the May, August and November issues preceded by an asterisk. Following the November issue it is dropped. Prices are included, in some cases in the currency of the country. Prepublication prices may be listed, but the deadline is not.
There is an alphabetical listing of publishers known to be active in the reprinting business, but of the 218 publishers so listed, 124 did not supply Announced Reprints with titles. Among the non-respondents was Kraus Reprint Corporation, one of the larger houses. Exactly what need this journal answers is not completely clear. The Guide to Reprints provides an annual, cumulative list of books, journals and other materials that have been reprinted. As an acquisition tool it is self-evident. But since the period between the time a title is announced and the time it is actually reprinted is variable, one can only suppose that the publishers hope to fix their market by having their forthcoming titles listed in Announced Reprints. If they get expressions of interest from many libraries they may actually reprint. Since Announced Reprints gives the date of the first time a reprint title is listed, eventually librarians will learn which publishers are reliable and which are not in following through on the promise of publishing a reprint.

John Demo

Union List of Serials in the Libraries in the Miami Valley, Sue Brown, editor. 2d edition. Dayton, Ohio: Wright State University Library, 1969. $20.00.

It's hard to review a union list of serials because such a publication is obviously a very useful thing to have and to use. Being intimately connected with the production of a similar list for the Cincinnati area, I can only commend the librarians of the Dayton-Miami Valley Consortium for producing this second edition in as short a time as they did. (The first edition, containing 8880 titles held by 35 libraries, was published in the Spring of 1968.) This edition contains the holdings of three more libraries than did the first, and nearly 900 more titles are included.
There are a few minor points about which one might quibble, such as the listing of the computer output on the lined side of the paper, making the pages of the published list a bit lined and grey looking; the use of corporate entries for the titles, which is O.K. if the list is used only by librarians and others used to that form of entry, but confusing to the average patron who is, I am convinced, used to looking up holdings information by the running title that he picked up in a citation somewhere; listing holdings under the latest title of a periodical with notes as to the title variations over the years (although I can't complain too loudly about this, as it is the same way we are doing the Cincinnati area list, although with less information as to title changes than in this list); and the use of library name "codes" that are the same as, or similar to, those used in the Union List of Serials, which causes a great string of "ODA-" to run down each page. There are, naturally, a few missed cross references to the latest title as well as a few keypunching errors. These detract little from the usefulness of the volume, which should be great, especially in the area near western Ohio. The list is available for $20 from the Acquisitions Department, Library, Wright State University, Colonel Glenn Highway, Dayton, Ohio 45431.

Thomas H. Rees, Jr.

Current Contents: Education. 1 (June 17, 1969). Philadelphia: Institute for Scientific Information. Subscription price varies.

The rise in need for librarians to build their own offprint files has intensified searching for current, relevant references. Current Contents: Education facilitates that search, for it reproduces contents pages of some 350 journals in the field of education and related fields. This new publication includes over a dozen library journals, including the Journal of Library Automation. The various sections of Current Contents have established a well deserved reputation for timeliness.
Indeed, some librarians have complained that their users receive reprinted contents pages in Current Contents before libraries receive the journals. Since each issue contains an author index and address directory, it is easy to request an offprint, and thereby expend minimal effort in keeping up as well as in building a personal offprint collection. The subscription price can vary from $100 for a single non-educational subscription to less than $1.50 for multi-year subscriptions in groups of 200 or more.

Frederick G. Kilgour

Standardization for Documentation, Bernard Houghton, ed. Hamden, Conn.: Archon Books, 1969. 93 pp. $4.00.

The editor has brought together in this tight little book an illuminating collection of six useful papers prepared initially for a conference held in Liverpool, England, in November, 1968. The announced goal of this conference was to "isolate and consider some of the areas in which the adoption of universal standards is of immediate relevance." Inasmuch as the authors are all British, the volume will have greater interest abroad than in the U.S. Nonetheless, there is universal recognition that standards in various areas of documentation are desperately needed and that a great deal remains to be done. An especially clear exposition of the British Standards Institution's work in this field is the work of C. W. Paul-Jones. He relates the methodology and the work of BSI to that of the International Standards Organization (ISO) and touches briefly on each standards committee and its program of work. His concise outline of standards in being and in progress, and the place of each standards-involved organization in the framework of universal standards, is thoroughly competent. K. I. Porter, the editor of the British Union Catalogue, touches on a variety of problems encountered in his work and discusses the potential of standards in the area of serial publications.
A wryly humorous essay on standards for book production is the work of Peter Wright. He seems not very hopeful of changing the methods of book-trade production through standards, but believes in the usefulness of the effort to establish them. The essay of K. G. B. Bakewell takes up classification, cataloging, and other devices for organizing library material and providing access to it. He deplores the inchoate British development in these areas and cites considerably greater standardization elsewhere. His review of known systems is helpful. D. Martin's paper, "Standards for a Mechanized Information System," reviews the practical problems of one who has to subject information to the unthinking mind of machines. He too enumerates needed standards for coding, indexing, data elements, etc., and concludes (properly) that "It is too early to start talking in terms of solutions: standards activity is only now beginning to gather momentum." The final paper, that of John MacLachlan, is an ordinary how-we-do-it job, describing an abstracting service in one specific field and the local standards applied. These six papers taken as a whole constitute an informative and cogent source of information on the present status of standards work in the information field. Despite the British emphasis, the case for multi-national and international standards is clearly set forth. This book should be required reading for students and workers in the fields of information science and documentation.

Jerrold Orne

Cataloging U.S.A., by Paul S. Dunkin. Chicago: American Library Association, 1969. 159 pp.

This book is, quite simply, a survey of the development of cataloguing in America, and of the present situation of cataloguing in America. It deals with all aspects of author cataloguing, descriptive cataloguing and subject cataloguing (both subject headings and classification). The method used by the author is didactic and expository rather than critical. Mr. Dunkin seeks to analyse and to display the situation rather than to arrive at startling insights or to propose radical modification. The book is addressed to "the beginning student ... the experienced cataloger ... the public service librarian ... the library administrator". It is, Mr. Dunkin says, not a "how to do it book" but a "why do it book". It is certainly true that any member of Mr. Dunkin's readership will be enlightened by being shown the roots of modern cataloguing, and by having the perennial problems of cataloguing discussed in an admirably clear manner. Mr. Dunkin does not fail to illuminate each problem, and such illumination is, of course, half way to a solution. Where he does fail, I feel, is in not providing any firm answers to these problems. Perhaps in cataloguing there are no firm answers. This book seems to me, as an English cataloguer, to epitomise the "other directed" nature of American cataloguing. In reading this, as other American textbooks, I find a somewhat reverent attitude towards the great figures (principally Cutter), the great institutions (principally the Library of Congress) and the "sacred texts" (the various codes). The English tradition seems to me much more "inner directed", much more concerned with what is best for the individual catalogue, much less concerned with the necessity for standardisation and consistency between catalogues. This is not to say that either approach has a monopoly of virtue, or that one can fault Mr. Dunkin's book on this account. Mr. Dunkin has chosen his readership and his method, and within his self-imposed limits has produced a practical and useful book. Furthermore, the book is written with a clarity and ease unusual in cataloguing literature.

Michael Gorman

Systems Analysis in Libraries, by F. Robinson, C. R. Clough, D. S. Hamilton, and R. Winter. Newcastle upon Tyne: Oriel Press Limited, 1969. 55 pp. 15s.
(Symplegades, Number 1: A Series of Papers on Computers, Libraries and Information Processing).

If Symplegades once diligently guarded the entrance to the Bosphorus, it has now gratefully allowed this simple book to survive its peril. The authors explain that the title is somewhat misleading: the book has nothing to do with library systems and little in terms of systems analysis that does not relate to computerization. The two purposes of this work are the need for stressing clarity in defining objectives and for emphasizing the extent and depth of the work involved in systems analysis. A book that could achieve such simple but difficult objectives and do it intelligently would indeed be welcome in our discipline. This volume, however, does not attain its objectives. It does provide something just as important, in that it is readable (with dashes of humor), with a simple presentation of the basic tenets of systems analysis as it applies to libraries. It assumes that the reader knows nothing about systems analysis and its application to the computer. For the professional neophyte or the old graduate who has finally faced up to the realities of the future, this book should be a definite beginning point. The structure of the book and the presentation of the text have the same simplicity as the message the book conveys. The book does contain one point of view which seems invalid: it suggests that systems analysis is only undertaken in connection with computerization. There are other shortcomings, like the unexplained and unlabeled figures and the use of acronyms without explanation or definition. One also wonders about a technical book without source citations.
Irene Braden

On Research Libraries; Statement and Recommendations of the Committee on Research Libraries of the American Council of Learned Societies; Submitted to the National Advisory Commission on Libraries, November, 1967. Cambridge, Mass.: The M.I.T. Press, 1969. 104 pp.

This report presents the problems of research libraries and puts forth eleven major recommendations to solve these problems. In summary, the recommendations are for a national library structure presided over by a National Commission on Libraries to cope with various problems, including automation; financial support from federal, state and private sources; and study and revision of the copyright law. None of the recommendations is novel. Edwin E. Williams of the Harvard University Library contributed a skillful summary of problems related to "Bibliographic Control and Physical Dissemination." M. V. Mathews and W. S. Brown of the Bell Telephone Laboratories prepared a section entitled "Research Libraries and the New Technology," which discusses computers and microcopying. The discussion of library computer applications is less than helpful. The authors propose a catalog for a university library on 80 reels of magnetic tape and propose "complete resorting of the catalog." No one with any experience whatsoever in library computerization would dream, even in his worst nightmare, of such a monstrous arrangement. Yale's Ralph S. Brown, Jr., has furnished an appendix, "Copyright Problems of Research Libraries," that is most perceptive and informative. Brown concludes that although copyright revision must move on, "the costs of using copyright works [must be] bargained out" and that Congress "must for a while attempt the difficult feat of standing still on a tightrope."
The verso of the title page of On Research Libraries sharpens this point, for it carries the prohibition that "No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without permission in writing from the pub- lisher." Bargaining, if it can be called that, is surely here. Frederick G. Kilgour 5123 ---- 1 CONCEPT OF AN ON-LINE COMPUTERIZED LIBRARY CATALOG Frederick G. KILGOUR, Director, The Ohio College Library Center, Columbus, Ohio A concept for mechanized descriptive cataloging is presented, together with four areas of research programs to be undertaken. This paper will describe a concept of a catalog that is hospitable to mech- anized descriptive cataloging, and will delineate major areas of research for production of knowledge necessary to implement such a catalog. To avoid an unnecessarily complex presentation, the discussion will treat only of printed books. Nevertheless, the catalog described will function equally well as a store for serials, journal articles, reports, and any other materials that carry bibliographic-like descriptions of themselves. As used in this paper, a concept is an idea that combines experience with and observations of catalogs, and suggests further experimentation and observation. The merit of a concept is measured by its fruitfulness in production of new ideas and new experimentation and observation. The purpose of the concept proposed in this paper is to suggest avenues of investigation that will yield findings useful in development of mecha- nized, descriptive cataloging. The paper opens with a brief discussion of the objective and functions of a library catalog. Next there is an analysis of the principal contribution of information retrieval during the past quarter century and a proposal for applying this advance to cataloging of books. 
The third section describes a plan for a new style of catalog, and the fourth shows how it will be possible to prepare entries mechanically from title pages for inclusion in such a catalog. There follows an outline of major research investigations to be undertaken to produce knowledge necessary for activation of the new style catalog and mechanical cataloging. The paper concludes with a brief estimate of the success the system may be expected to attain in achieving the objectives that appear at the start of the paper.

2 Journal of Library Automation Vol. 3/1 March, 1970

OBJECTIVES

The principal objective of a future library will be active participation in the program of the community or institution of which it is a part by furnishing members of the community with bibliographic, textual, and other recorded information when and where they need such information. The passive service functions that librarianship has developed during the past century are no longer adequate to maintain a library as a viable organization within its environment. Effective special libraries have rid themselves of the passive service concept and aggressively participate in the programs of their companies. An extreme example of active participation in institutional programs is the library-like sections of intelligence agencies. Here collectors and processors of new information do not place that information on shelves or in files with the expectant hope that someone will use it. Rather, the collection and processing staff immediately brings new information to the attention of those making decisions that the new data may affect. A fruitful concept of a library is as an external human memory. Since Aristotle, it has been recognized that memory is necessary for creative thinking. The process of creative thinking requires raw materials from memory; but for centuries it has been impossible for man to retain within his memory all data that could fuel his creative thinking.
Indeed, one great triumph that sets human beings apart from all other animals is the ability to store data in an external memory such as a library. However, to support creative thinking, the external memory must transfer data to the human mind with as great speed as possible to prevent hindrance of thought that admits distraction. It is this lack of speed that generates frustration among library users. If a library is to participate effectively in programs of its institution or community, it must simulate human memory in furnishing a human mind with data when and where that mind needs data. Current development in computers, and particularly in their memories, holds out the hope of highly effective external memory operation at some point in time beyond the foreseeable future, but in the meantime it is entirely feasible to strive for simulation of human memory with speedy recall of bibliographic information. It has been pointed out elsewhere (1) that productivity of library workers is not continuously increasing as is that of workers in the general economy, so that library unit costs are rising at a much more rapid rate than are those in the economy as a whole. If libraries are to attain their objectives in the future, they must invoke a technology that will enable them to lower their excessively high rate of unit-cost rise. It appears that mechanization, or more specifically computerization, is the only avenue that extends toward the goal of economic viability.

INFORMATION RETRIEVAL

During the last quarter-century, there have been important developments in information retrieval that have yet to be applied to book collections. Such pioneers as W. E. Batten, G. Cordonnier, Calvin Mooers, and Mortimer Taube made a major innovation when they developed coordinate indexing. This technique coordinates index terms at time of searching, employing Boolean logic.
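Coordination of index terms at search time amounts to intersecting the lists of documents posted under each term. A minimal sketch follows; the term lists and document numbers are hypothetical illustrations, not drawn from any of the systems named above.

```python
# Minimal sketch of coordinate indexing: each uniterm points to the set of
# document numbers indexed under it, and a search ANDs (intersects) those
# sets at search time rather than relying on precoordinated headings.

index = {
    "retrieval": {1, 2, 5},
    "systems":   {2, 3, 5},
    "theory":    {1, 5},
}

def coordinate_search(index, *terms):
    """Return the document numbers indexed under every requested term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

# Coordinating "retrieval" and "systems" yields the documents common to both.
print(coordinate_search(index, "retrieval", "systems"))
```

The point of the technique is that any combination of terms can be formed at search time, whereas a precoordinated heading fixes its combinations when the catalog is built.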
Coordinate indexing has greatly increased flexibility of searching and number of accesses to documents, in contrast to the precoordinated headings in traditional catalogs, which are inflexible for searching and for up-to-date maintenance. Early information retrieval systems dealt with relatively small files of documents, such as patents and internal reports, that were not subject to traditional bibliographic control. Moreover, indexes to these files were housed in various manual devices. With the advent of the computer it became feasible to apply coordinate indexing techniques to large files of documents, including materials under classical bibliographic control. However, up to the present time, the techniques of information retrieval have been applied primarily to huge files of journal articles; an outstanding example of the application of coordinate indexing to journal articles is the pioneering Medlars project, in which the primary approach to an article is via coordination of subject indexing terms. Retrieval of books from a library collection is an information retrieval process irrespective of whether the borrower uses subject headings or author-title entries in the catalog. The user who obtains a book from a library by employing the book's author-title label is logically engaged in the same information retrieval process as is the user who searches out a book under a subject heading. At first reading of the previous paragraph it might appear that a reader who obtains a novel by use of an author-title entry in a catalog is not engaged in information retrieval in its customary narrow sense.
However, it is clear that the reader of a novel or poem is acquiring information in the same sense as is the reader of a book on computers, although the knowledge he gleans from a novel is not for immediate practical application, but rather to enable him to understand what it is like to be a human being, and more specifically what it is like to be a human being in some of the precise circumstances of life. For the library user, the words in an author's name, a title, and subject headings are index labels that he uses to find a book that contains information he needs. The traditional use of these index labels, at least since the Middle Ages, has been some variety of an author and title citation form. The user knows externally to the citation that the book so labeled has information he wants. Apparently three-quarters of the information retrieved from a library by use of a library catalog is via an author-title entry (2, 3) or a known document search (4). A librarian's use of a catalog (except for reference librarians, who represent users) is to discover whether or not the library has a given book. The librarian does not use the author-title entry as a label, but rather as information per se. Librarians include sufficient data in catalog entries to enable them to decide from the description whether or not a book at hand, or a book described by another citation, is the same book as that in the catalog records. In short, users employ a library catalog to direct them to information they require; librarians use the catalog for the actual information it contains.

COMPUTERIZED CATALOG CONCEPTS

Several libraries employ nonconventional design for computerized catalogs or lists of other than bibliographic items. The Stanford University Library's on-line system uses a sequential file of entries to which there is an index of words in the author and title elements of the entry, as well as other words in other elements (5). Index files for various Stanford data collections are "Author, Title Word, ID Number, Corporate Author, Conference Author, Keyword, Citation," and, for certain files, topical subject indexes. In general, this system is widely employed in the organization of computerized files, but the Stanford application has a unique feature in that it uses a derived key consisting of only the first three letters of index words. For example, the derived keys for author and title words in Figure 1 are VIC, ON, RET, SYS, and THE. The computer calculates the position of the word in the index from these trigrams, so that it is possible to locate the index word with great speed. This technique of employing a derived key to compute location takes full advantage of a computer's major characteristic, namely, the ability to compute rapidly. The Washington State University Library has developed a similar system for its on-line acquisitions file. Access to an entry in this linear file is by purchase order number. A random number generator uses the purchase order number to compute the position of the entry in the file. From early trials, this technique appears to make possible exceptionally efficient use of random access file space. Yale's Machine Aided Technical Processing System uses derived keys to locate entries for book funds in the system's commitment register and entries in a name and address file employed for addressing notices and claims. The Yale Technical Processing System also uses a derived key technique to detect duplication of purchase orders entering the master file. This system operates using the first four letters of the author's name, the first three letters of the first non-article word of the title, and the first letter of the second title word if there is a second word.
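The two key-derivation recipes just described can be sketched in a few lines. The function names and sample values below are my own illustrations; only the truncation recipes themselves (three-letter trigrams, and the Yale 4-3-1 scheme) come from the text.

```python
# Sketch of two derived-key schemes described in the text:
#   - Stanford-style trigram keys: the first three letters of each index word;
#   - Yale-style 4-3-1 duplicate-detection key: first four letters of the
#     author's name, first three letters of the first non-article title word,
#     and the first letter of the second title word, if there is one.

ARTICLES = {"a", "an", "the"}

def trigram_key(word):
    """Stanford-style derived key: first three letters, upper-cased."""
    return word[:3].upper()

def yale_key(author, title):
    """Yale-style 4-3-1 key used to detect duplicate purchase orders."""
    words = [w for w in title.split() if w.lower() not in ARTICLES]
    key = author[:4].upper() + words[0][:3].upper()
    if len(words) > 1:
        key += words[1][0].upper()
    return key

print(trigram_key("Retrieval"))                         # RET
print(yale_key("Vickery", "On Retrieval System Theory"))  # VICKONR
```

Because two distinct books can share a key, a match flags only a *possible* duplicate; the run reported below (199 candidate couplets, 115 real duplicates) shows exactly that behavior.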
A routine run on 23 June, 1969, compared 1,237 new entries against 63,641 entries already in the file. The comparison produced 199 couplets containing possible duplicates, of which 115 couplets actually were duplicates; of the 115, only forty would have been obtained if an equal compare had been made throughout the author and title fields. Several investigators are working on techniques for derivation of keys (6, 7, 8). Similar work on telephone directories (9) has yielded preliminary results indicating that an efficient formula for derived keys for personal listings is the first three letters of the surname and the first three letters of the street name; and for business listings, the first three letters of the first word and the first three letters of the second word. The Ohio College Library Center is working on development of a computerized catalog for traditional catalog entries wherein a computer will compute the position of an entry in a file organized in a two-dimensional array from a precoordination of truncated strings of letters from words in the author's name and in the title. OCLC plans to use this technique because, as already noted, three-quarters of the use of a library catalog seems to be use of author-title entries. Precoordination of derived keys from these two elements will speed average retrieval time. The present design calls for a microcatalog containing, on the average, perhaps fewer than five entries to be located at each computed position. Having computed a location, a computer will search the microcatalog for entries possessing derived keys matching the original request, and entries satisfying this requirement will be displayed as a minilist on a cathode ray tube terminal. It is hoped that algorithms can be constructed that will yield minilists containing fewer than twenty entries 95% of the time. Indexes to the proposed main entry file will be the equivalent of classical added entries.
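The OCLC retrieval path just described — derive an author-title key, compute a file position from it, then scan the few entries of the microcatalog stored there — can be sketched roughly as follows. The particular truncation recipe, hash function, file size, and sample records are illustrative assumptions, not the actual OCLC design.

```python
# Rough sketch: a derived author-title key selects one "microcatalog"
# (bucket) in a fixed array; the few entries stored there are compared
# against the key, and the matches form the minilist shown to the user.

NUM_POSITIONS = 8  # tiny for illustration; a real file would be far larger

def derived_key(author, title):
    # illustrative truncation: first four letters of author and of first title word
    return (author[:4] + title.split()[0][:4]).upper()

def position(key):
    # any reproducible function of the key will do for the sketch
    return sum(ord(c) for c in key) % NUM_POSITIONS

# Build the file: each computed position holds a small microcatalog.
catalog = [[] for _ in range(NUM_POSITIONS)]
for author, title in [("Vickery", "On Retrieval System Theory"),
                      ("Lancaster", "Information Retrieval Systems")]:
    key = derived_key(author, title)
    catalog[position(key)].append((key, author, title))

def minilist(author, title):
    """Entries in the computed microcatalog whose derived keys match the request."""
    key = derived_key(author, title)
    return [(a, t) for k, a, t in catalog[position(key)] if k == key]
```

One computed position can hold entries for unrelated books whose keys happen to collide; the final key comparison is what keeps the displayed minilist short.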
However, it is fruitful to view subject headings, title added entries, and author added entries as being continuous text from which uniterms can be mechanically extracted. Under each uniterm will be a list of addresses of the microcatalogs containing the corresponding entries, and each entry could be looked upon as analogous to the concept of a microtheme proposed by T. P. Loosjes (10). Coordination of indexing at search time by the user need not be limited within subject words, or title words, or author words. Rather, coordination among these elements will greatly increase accesses to entries and will make possible retrieval of entries with a relatively slight amount of bibliographic information, particularly if each word is truncated as described above. Although much research and new knowledge is necessary to achieve successful design of the type of catalog described above, there is no technical obstacle to its successful activation for experimentation. When practical implementation and routine operation are also successful, the user will be employing a minilist containing twenty or fewer entries most of the time. In other words, the reader will be using a catalog of twenty or fewer entries, and such a catalog makes it unnecessary to include the bibliographical embellishments required for entries in huge card or book-form catalogs. It would appear that for catalogs of twenty or fewer entries only information on title pages would be required; a scholar rarely, if ever, needs more. Hence, it seems feasible that mechanization of descriptive cataloging could be achieved in the foreseeable future.

MECHANIZED DESCRIPTIVE CATALOGING

The organization for a computerized catalog containing entries prepared mechanically from title pages would be somewhat different from that described in the previous section.
If it proves impossible, as seems likely, to devise an algorithm that would mechanically identify author, title, and other elements on a title page, it would be necessary to arrange entries in sequential order. A computer could then prepare a mechanical coordinate index of substantive words on the title page that would make possible at search time coordination of words in author, title, and other elements without having to identify those elements. Of course, catalogers would still do subject classifying and indexing, as well as assigning of call numbers, but a computer would mechanically convert these additions to the uniterm type of coordinate indexing described in the previous section.
This proposal to construct a bibliographic record in the form of a transcription of a title page is not new. An early code for production of catalog entries, which the French Government issued in 1791 (11), prescribed transcription of the title page, and underlining of the author's name as the filing term. If the book did not have an author, the key word in the title was to be underlined. The code also provided for the title-page transcription to be supplemented by a physical description of the book.
This proposed new concept for a computerized library catalog closely relates to the Stanford design and the planned OCLC design. However, in contrast to the new concept, the Stanford file organization requires identification of record elements from within which words are extracted for inclusion in indexes, and the indexes are so tagged. Similarly, the present OCLC plan also requires identification of author and title elements for calculation of location, and hence for retrieval, as well as flagging of other retrieval elements, such as record number and call number; but the OCLC system will not make necessary identification among types of added-entry elements. The proposed new concept expands this last device to the entire record.
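The mechanical coordinate index described above can be sketched as follows: substantive title-page words, truncated as described earlier, point to entry addresses, and a query coordinates (intersects) the posting lists without any identification of author or title elements. The null list and the truncation length here are invented for illustration:

```python
from collections import defaultdict

NULL_LIST = {"by", "the", "of", "on", "a", "an", "and"}  # hypothetical non-useful words

def index_title_pages(entries):
    """Build a uniterm coordinate index from title-page transcriptions.
    Keys are truncated words; values are sets of entry addresses."""
    index = defaultdict(set)
    for address, text in enumerate(entries):
        for word in text.lower().replace(",", " ").split():
            if word not in NULL_LIST:
                index[word[:6]].add(address)   # truncate, as described above
    return index

def search(index, *words):
    """Coordinate (AND together) the posting lists for each query word."""
    sets = [index.get(w.lower()[:6], set()) for w in words]
    return set.intersection(*sets) if sets else set()

entries = ["ON RETRIEVAL SYSTEM THEORY BY B. C. VICKERY"]
idx = index_title_pages(entries)
print(search(idx, "retrieval", "vickery"))  # {0}
```

Note that no field tagging is needed: "vickery" and "retrieval" are coordinated at search time even though the index never recorded which was an author word and which a title word.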
A computer simulation has been carried out of an on-line computerized catalog containing descriptive entries prepared mechanically. Access to the simulated catalog was by coordination of non-structure words in titles via single-level indexes. Simulation of user inquiries at a peak rate of five per second, processed on an economically feasible computer, revealed that utilization of the computer's central processing unit was only 19.87 percent. It is known from other simulation studies that library use of such a computerized catalog would raise utilization by only two percent at the most. Hence it follows that there is at least one (and probably several) existing, economical computer system that can be employed for such a catalog.
Mechanical descriptive cataloging of the title page depicted in Figure 1 would be efficient and effective. The only character strings on the title page that would not be useful in coordinate indexing are "BY" and "M.A., F.L.A.". However, the title page in Figure 2 contains at least seven, or perhaps eleven, words and symbols that would not be employed in coordinate indexing. If these eleven were to be included in the index and remain unused, they would approximately double the size of the index for this particular title page. Such inefficiency is too large to tolerate. Moreover, the title page in Figure 2 does not contain date of publication. Effect of absence of publication date from entry and index is not known, although a recent study suggests that date of publication may be of relatively little use as a retrieval element (12).

Fig. 1. Title Page: ON RETRIEVAL SYSTEM THEORY / BY / B. C. VICKERY, M.A., F.L.A. / SECOND EDITION / WASHINGTON / BUTTERWORTHS / 1965.
Fig. 2. Title Page (undated): Information Retrieval Systems / Characteristics, Testing, and Evaluation / F. Wilfrid Lancaster / National Library of Medicine / John Wiley & Sons, Inc. / New York · London · Sydney · Toronto.
The text of a title page would be displayed as a string of characters and not rearranged as is done in traditional catalog entries. No doubt sophisticated algorithms will be devised to format displays, but even a simple algorithm produces a useful representation of title-page information. For example, by employing the simplest of algorithms, one that would insert two spaces at the end of each title-page line, the title page in Figure 1 would appear as follows on a terminal:

ON RETRIEVAL SYSTEM THEORY BY B. C. VICKERY, M.A., F.L.A. SECOND EDITION WASHINGTON BUTTERWORTHS 1965

Readers have used title pages successfully for centuries and will surely experience no difficulty in using them displayed in this manner on terminals.
It is hoped that ultimately it will be possible to use optical character recognition techniques for mechanical transcription of most title pages. Until effective OCR techniques are available, it will be necessary for clerical staff to transcribe title pages, and such employment for human beings is undesirable. However, libraries now employ clerical staff to transcribe bibliographic information for entries in essentially the same manner, so that continuance of an existing practice in this instance cannot be looked upon as invocation of a machine to convert human beings to machine-like activity. Nevertheless, machines should replace such activity at the earliest opportunity.

RESEARCH

There are at least four major areas of unknown on which research must be carried out to produce knowledge needed for development of a computerized library catalog hospitable to descriptive cataloging entries produced mechanically: 1) use of library catalogs; 2) specification for derived keys; 3) identification of title-page words useful and not useful for coordinate indexing; and 4) extent and type of coordination necessary to ensure successful retrieval.
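The two-space display algorithm described above is trivial to express. In the sketch below the line division of the Figure 1 title page is an assumption, since the original line breaks are not given:

```python
def format_title_page(lines):
    """Join title-page lines into a single display string by inserting
    two spaces at the end of each line (the simple algorithm above)."""
    return "  ".join(line.strip() for line in lines)

# Assumed line division of the Figure 1 title page:
title_page = ["ON RETRIEVAL SYSTEM THEORY", "BY", "B. C. VICKERY, M.A., F.L.A.",
              "SECOND EDITION", "WASHINGTON", "BUTTERWORTHS", "1965"]
print(format_title_page(title_page))
```

The double space serves as a visible trace of the original line boundaries, so the reader can still perceive the title page's layout in a one-line terminal display.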
Extraordinarily little is known about users' employment of library catalogs to obtain information from books. Yet successful design of a catalog must be based on firm knowledge of catalog use. Some areas of the broad pattern of catalog usage are known, but much more must be discovered before an effective catalog can be designed.
Descriptive cataloging rules have long been derived from rationalized principles of title-page and catalog formats. As yet there has been no major effort to derive these rules from the bibliographic practices of library users. For example, there has been no general effort to construct rules for descriptive catalog entries that match scholarly bibliographic references in such a way that a scholar could always expect to find in a library catalog essentially the same entry presented to him in a bibliographic footnote. To design new catalogs based on the various scholarly traditions of citation will require a series of analyses of citation practices that will ultimately yield descriptions of minimum regularity.
The section of this paper on computerized catalog concepts has referred to research on specification for derived keys. Such specification is required to enable swift access to files and at the same time to diminish human error in search requests. Traditional designs of computer files are inadequate for management of huge files of millions of bibliographic entries. At the present time it appears that the truncation algorithms already referred to may be able to cope successfully with a majority of catalog entries. However, it is clear that such truncation techniques will not provide uniqueness of all keys adequate for efficient on-line catalogs.
Therefore, it will be necessary to carry out a series of investigations that will identify classes of entries for which a basic algorithm does not operate satisfactorily and to devise a supplementary algorithm to improve uniqueness of keys for those entries for which the basic algorithm essentially failed. Presumably this cycle will be repeated as long as inadequacy of key uniqueness persists. In other words, research in this area will continue as long as retrieval inefficiency exists for the user.
Uniqueness of key depends on uniqueness of the serial combination of words from which the key is derived. Hence analyses of frequency of word occurrence on classical catalog entries, title pages, and in subject indexes should be carried out with the aim of deriving a generalized description of such frequency distributions. Such findings will be necessary for sophisticated logical and physical file organization.
To organize an efficient, huge file of bibliographic entries it is necessary to develop a method for computing scatter storage addresses that provides a very high percentage of unique addresses, thereby avoiding a collision with an entry already in an address. Of course, it is necessary to furnish a hash-coding, or scatter-storage, algorithm with keys that possess high relative uniqueness; otherwise, the most efficient of scatter-storage algorithms would yield non-unique addresses in ratio to the degree of non-uniqueness of keys. P. C. Mitchell and T. K. Burgess (13) have introduced random-number generation for computing scatter-storage addresses and have shown their method to be more efficient than division hash coding. Other investigators are working on techniques for minimizing queues resulting from repeated collisions. There is need for continuing imaginative investigation that will yield results like that of Mitchell and Burgess before huge bibliographic files and their indexes can be accessed efficiently.
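Division hash coding, the baseline against which the Mitchell-Burgess randomization method was measured, can be sketched as follows. The character-folding step and the table size are illustrative assumptions; the article does not specify them:

```python
def division_hash(key: str, table_size: int = 101) -> int:
    """Division hash coding: fold the key's characters into an integer
    (an illustrative folding) and take the remainder modulo the table
    size to obtain a scatter-storage address."""
    n = 0
    for ch in key:
        n = n * 31 + ord(ch)   # fold each character into the running value
    return n % table_size

# Identical (non-unique) keys necessarily collide, whatever the hash:
assert division_hash("KILECO") == division_hash("KILECO")
# Distinct keys usually, but not always, scatter to distinct addresses.
print(division_hash("KILECO"), division_hash("VICON"))
```

This makes concrete the point above: no scatter-storage algorithm, however good, can separate entries whose derived keys are themselves non-unique, so key uniqueness and address computation must be improved together.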
Identification of useful and non-useful words for coordinate indexing on title pages, including those in foreign languages, is related to catalog usage. At the present time there is no information that gives a clue as to the size of a list of non-useful words. Much ingenuity and imagination will be required to identify non-useful words and to construct efficient null lists.
Finally, investigation will be needed to determine the amount and type of coordination necessary among author and title words. It will also be essential that a measure of retrieval success by author and title be developed. The need here is construction of a meaningful measure for retrieval of a single entry.

CONCLUSION

The proposed concept for an on-line computerized library catalog will make it possible for a user to obtain bibliographic information from a remote terminal rapidly. Use of derived keys would increase error tolerance well above that of present manual systems by diminishing the effect of misspellings and by making it unnecessary for the user to have knowledge of catalog organization. Moreover, the concept is a step toward full mechanization and can indeed be viewed as a partial simulation of text processing.
The proposed catalog will also make it possible for libraries to take the first major step toward their economic goal of development of a continuously increasing productivity for both library staff and library user. It is anticipated that successive steps to come after mechanical descriptive cataloging will be automatic subject classification and indexing, to be followed ultimately by full text processing. When it is possible to achieve full text processing mechanically, and a decade or more may be required for that achievement, libraries will have succeeded in attaining their objective of participation as well as their economic goal of a rate of cost rise equal to that in the general economy.

REFERENCES
1. Kilgour, Frederick G.: "The Economic Goal of Library Automation," College & Research Libraries, 30 (July 1969), 307-311.
2. Tagliacozzo, Renata; Kochen, Manfred; Rosenberg, Lawrence: "Orthographic Error Patterns of Author Names in Catalog Searches," Journal of Library Automation, in press.
3. Brooks, Benedict; Kilgour, Frederick G.: "Catalog Subject Searches in the Yale Medical Library," College & Research Libraries, 25 (November 1964), 483-487.
4. Lipetz, Ben-Ami: "A Quantitative Study of Catalog Use." In University of Illinois Graduate School of Library Science: Proceedings of the 1969 Clinic on Library Applications of Data Processing (Preprint).
5. Parker, Edwin B.: "Developing a Campus Information Retrieval System." In Proceedings of a Conference Held at Stanford University Libraries, October 4-5, 1968 (Stanford, California: Stanford University Libraries, 1969), pp. 213-230.
6. Ruecking, Frederick H., Jr.: "Bibliographic Retrieval from Bibliographic Input; The Hypothesis and Construction of a Test," Journal of Library Automation, 1 (Dec. 1968), 227-238.
7. Nugent, William R.: "Compression Word Coding Techniques for Information Retrieval," Journal of Library Automation, 1 (Dec. 1968), 250-260.
8. Kilgour, Frederick G.: "Retrieval of Single Entries from a Computerized Library Catalog File," Proceedings of the American Society for Information Science, 5 (1968), 133-136.
9. Rothrock, Hamilton Irving, Jr.: Computer-Assisted Directory Search; A Dissertation in Electrical Engineering (University of Pennsylvania, 1968).
10. Loosjes, T. P.: "Document Analysis," Proceedings of the Third International Congress on Medical Librarianship (1969), Preprint.
11. Instruction pour Proceder a la Confection du Catalogue de Chacune des Bibliotheques (Paris: Imprimerie Nationale, 1791), p. 6.
12. Vaughan, Delores K.: "Memorability of Book Characteristics: An Experimental Study."
In University of Chicago Graduate Library School: Requirements Study for Future Catalogs (Chicago: University of Chicago Graduate Library School, 1968), pp. 1-41.
13. Mitchell, Patrick D.; Burgess, Thomas K.: "Methods of Randomization of Large Files with High Volatility," Journal of Library Automation, 3 (March 1970), 79-86.

LIBRARY MECHANIZATION AT AUBURN COMMUNITY COLLEGE

Eloise F. HILBERT: Head Librarian, Auburn Community College, Auburn, New York

Use of an IBM 1401 computer and a single keypunch operation for changing a college book collection from Dewey decimal to Library of Congress classification; for acquisitions, accounting and circulation procedures; and for production of a list of periodical holdings. A mark-sense reproducer is used for the circulation system.

INTRODUCTION

Auburn Community College, a two-year college and one of the fifty-two units of the State University of New York, was founded in 1953 as one of the first community colleges in the state. Like other institutions of higher learning, it has experienced rapid growth. There are about 1500 students enrolled in the day division and about 1500 in the evening division. The college offers courses in data processing in addition to the usual curriculum offerings. The library possesses approximately 40,000 volumes and adds about 3,000 volumes a year.
The Librarian attended a one-week IBM customer executive conference for librarians on the subject of automation in Endicott, N.Y., in April 1967, and returned eager to consider ways of applying computer technology to some of the Library's procedures. Data processing equipment, located directly beneath the Library, was available for Library use. The limited Library staff, both professional and clerical, would benefit by automation of technical processes, since automation would eliminate much typing and reduce tedious tasks.
Clerical staff would have time to take over clerical operations that were being performed by professional staff, freeing the latter for more professional work.
Upon approval by the college administration of plans for automating, discussions took place between the librarians involved and the Chairman of the Data Processing Department. There was considerable interest and cooperation among the Library staff and Data Center personnel. Library literature describing computerization in academic libraries was reviewed, but there was a decided lack of information available concerning automation of libraries approximately the same size as Auburn's. A proposed use of mark-sense cards in the circulation system also appeared to be unique. IBM publications on library applications served as guides in developing the systems (1,2,3). Decisions were made to automate a projected reclassification, the acquisitions and circulation systems and accounting procedures, and to produce lists of the serial holdings (4).
The IBM processing installation used for the Library comprised the following: 1401 computer, 12K storage; 1402 Card Reader Punch; 1403 Printer; 1311 disk storage drive; 514 mark-sense reproducer; 083 sorter; 548 Interpreter; 026 keypunch.

RECLASSIFICATION

A decision had previously been made to change the classification of the Library's collection of 30,000 books from the Dewey decimal to the Library of Congress system. Since the tedious task of erasing or painting over and retyping catalog card numbers and entries, book pockets and book cards would be greatly reduced by automation, it was decided that reclassification of the book collection should be the first automated procedure. The aim was to complete conversion as quickly as possible in order to avoid confusion in the Library. The Data Center made its staff available for the summer months so that much of the conversion was completed at that time.
Conversion of the shelf list, presently in Dewey decimal classification, was to be the key step. The shelf list was sent to the Data Center, where one IBM shelf list card was punched for each volume. Accession number, call number, author's name and title, copyright date, and publisher were carefully abbreviated to fit into six fixed fields of the eighty columns on the card. These fields would appear repeatedly in later processes, so this bibliographic record, which would be used in later procedures, needed to be keypunched only once. Keypunch operators were instructed to use the Library of Congress number on the catalog card instead of punching the Dewey decimal number. All cards that did not have complete Library of Congress cataloging were returned from the Data Center to the Library for necessary additions and corrections. The decision was made to accept the Library of Congress classification call number exactly as it appeared on the card, in order to keep original cataloging to a minimum.
This abbreviated shelf list punch card was sufficient for the purpose presently planned for its use. It was considered unnecessary to provide complete bibliographical detail, which would be available from the main card catalog. Punched shelf list cards would produce accession lists, "new book" lists, and bibliographies of books in abbreviated form. There was no interest in producing catalog cards, which would have required much more work and more time than was available.
Once the shelf list cards had been punched, reclassification proceeded as follows: the punched shelf list card was used as the source card to create the circulation book card, duplicating accession number, call number, author and title. The punched shelf list card also produced the gummed labels: a label for the book pocket, the spine label, and labels for the catalog cards.
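The six fixed fields of the eighty-column shelf list card might be encoded and read back as in the following sketch. The column boundaries here are invented for illustration; the article specifies six fixed fields but not their widths:

```python
# Hypothetical column layout for the 80-column shelf list card; the
# article does not give the actual boundaries, so these are assumed.
FIELDS = [("accession", 0, 6), ("call_no", 6, 22), ("author", 22, 42),
          ("title", 42, 70), ("copyright", 70, 74), ("publisher", 74, 80)]

def punch(record: dict) -> str:
    """Encode a record into one 80-column card image, truncating each
    value to its fixed field, much as the operators abbreviated entries."""
    card = ""
    for name, start, end in FIELDS:
        card += str(record.get(name, ""))[: end - start].ljust(end - start)
    return card

def read(card: str) -> dict:
    """Decode an 80-column card image back into its named fields."""
    return {name: card[start:end].strip() for name, start, end in FIELDS}

card = punch({"accession": "40217", "call_no": "Z699.L3", "author": "LANCASTER F W",
              "title": "INFORMATION RETRIEVAL SYSTEMS", "copyright": "1968",
              "publisher": "WILEY"})
assert len(card) == 80 and read(card)["call_no"] == "Z699.L3"
```

The fixed layout is what lets the same card image drive every later procedure (labels, circulation cards, listings) without rekeying: any program can pick a field by column position alone.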
The punched cards had been produced in Dewey decimal order from the catalog trays. Labels, shelf list and circulation cards were placed with the Dewey decimal shelf list card to make a set. It was important to keep them in Dewey decimal order, since the books would come off the shelves in that order.
Each set of labels and cards was placed in its respective book, labels were attached to the book pockets already in the books, and call number labels were applied to the spines of the books. The remaining labels were filed with the Dewey shelf list card and later attached to the other cards in the card catalog. Circulation cards were inserted in the book pockets and the books were returned to the stacks, which had been relabeled with Library of Congress numbers. Student assistants were used to perform this work under the supervision of the librarians. Originally, it was estimated that the job would require three years, but by automated procedures reclassification took only about six months.
Attaching the labels to the cards in the card catalog did take another year. Labels were placed on the cards at the card catalog without removing cards from the trays. Instruction was given and examples were displayed to show how to locate the new Library of Congress number on the catalog card; thus the use of the card catalog was not hindered by the lack of labels on the cards. This method seemed to be the most efficient, since new cards would have been expensive, headings would have had to be retyped and all cards refiled again.

CIRCULATION (Figure 1)

A machine-readable shelf list made possible an automated circulation system, since producing a machine-readable circulation card (Figure 2) would be a simple computer operation. The old circulation system presented the usual problems, but manually preparing the overdues was costly and time consuming and would benefit most from automation. Total circulation for the Library for 1968/69 was 18,000 books.
The maximum number of books charged out per day was one hundred and fifty books and the minimum, twenty-five. Maximum size of the loan record file was about two thousand charges.

Fig. 1. Circulation Flow Chart.
Fig. 2. Circulation Card.

A mark-sense reproducer prepares the cards for the computer. This reproducer had been acquired for other college computer functions and the Library was able to make use of it (2). Under this plan the books are charged out by having the borrower write in his identification number, which serves as the borrower's number, and his name in the appropriate box on the IBM circulation card (3). The student assistant at the desk mark-senses the book card with the identification number; this is the one manual operation, but it has presented no problem. The marked circulation cards are sent three times a week to the Data Center, where the mark-senses are read and punched and the due date is gang-punched in. The 1401 computer generates a second circulation card, duplicating the accession number, call number, author and title. Old and new circulation cards are machine filed together by accession number and returned to the circulation file, which is arranged by date and accession number. It was found that the accession number is easier to read than the Library of Congress number and is the truly unique number.
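The 1401 step that completes a charged card and generates its duplicate can be sketched as follows. The field names and the date format are invented; the article does not give the card layout:

```python
from dataclasses import dataclass, replace

@dataclass
class CirculationCard:
    accession: str
    call_no: str
    author_title: str
    borrower_id: str = ""   # mark-sensed at the charge desk
    due_date: str = ""      # gang-punched at the Data Center

def process_charge(marked: CirculationCard, due_date: str):
    """Return the completed charge card and a fresh duplicate, mirroring
    the 1401 run: the due date is punched into the marked card, and a
    second card repeats only the bibliographic fields."""
    charged = replace(marked, due_date=due_date)
    fresh = CirculationCard(marked.accession, marked.call_no, marked.author_title)
    return charged, fresh

charged, fresh = process_charge(
    CirculationCard("40217", "Z699.L3", "LANCASTER INFORMATION RETRIEVAL",
                    borrower_id="12345"),
    "70-04-15")  # hypothetical due-date format
assert charged.due_date == "70-04-15" and fresh.borrower_id == ""
```

The two cards are then filed together by accession number: when the book comes back, the charged card is cancelled for statistics and the fresh card goes into the book pocket, ready for the next loan.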
A printed circulation listing, arranged by call number to facilitate use, is kept at the charge desk; it shows accession number, author and title, borrower's name, identification number and due date. It is also possible to prepare a daily circulation report by student identification number and name if required. The entire circulation is sent to the Data Center weekly to produce a cumulative print-out of all books in circulation. These print-outs provide daily and weekly totals of all outstanding circulated books.
No data processing equipment is required for reserve circulation. Charging out of books on reserve continues to be done by having the borrower write his identification and name on a blue reserve card to be kept at the desk.
When a book is returned, the pair of circulation cards is selected from the circulation file. The used charge card, which contains the borrower's identification number and due date, is marked "cancelled" with a rubber stamp. The new circulation card is inserted in the pocket of the book and the book is reshelved. Cancelled circulation cards are kept and sorted later to provide statistical analyses by date and class number for each semester. This system was developed because it was felt a small library could not justify expensive charging machinery.

ACQUISITIONS AND SHELF LIST

Once the reclassification operation was organized it was possible to set up automation procedures for processing current acquisitions. An IBM card was designed as a book request card (Figure 3) to be filled in by a staff or faculty member. Information on it includes author's name, title, publisher, price, purchase order number, academic department, and requestor's name. At the Data Center the foregoing information is keypunched into the card, which then becomes a purchase order card. The purchase order number identifies the vendor and is gang-punched into the cards.
A computer print-out produced from the purchase order cards is mailed directly to the dealer as a book order. Order cards are kept in an "on order" file by dealer or purchase order number and then by author until the books are received.

Fig. 3. Book Request Card.

When the book and its Library of Congress cards have been received, the corresponding order cards are pulled and the following additional information is added to the purchase order card: actual cost (taken from the invoice), accession number (stamped on), and the Library of Congress call number (taken from the Library of Congress printed card). Figure 4 is a flowchart of the acquisitions procedure. The books are then processed in the same manner as was used for reclassifying (1).

Fig. 4. Acquisitions Flow Chart.

Order cards are sent to the Data Center to reproduce the shelf list cards, automatically transferring the pertinent information already punched in the order card and keypunching the additional information into the shelf list cards (Figure 5). Currently, provision is being made for inclusion of the Library of Congress card order number in the shelf list card to enable easy subsequent selection of the corresponding MARC records.
Fig. 5. New Books Listing Procedure.

The shelf list cards are used to produce the new books list (Figure 5). The shelf list is kept in the IBM card form, and a book catalog could easily be made if so desired. To compile a bibliography it is only necessary to take the punched cards from the shelf list in the wanted classification. The Library's subject catalog and the Library of Congress subject headings are checked to determine the class numbers to be used. As depicted in the flowchart in Figure 5, these cards are put through the computer to produce the print-out, and then returned to the punched shelf list file. This system was designed to produce a bibliographical record of the books in the library and to automate the technical processing of the books in as simple a method as possible, so as not to defeat the purpose of automating.

ACCOUNTING (Figure 6)

The accounting system was designed to use the book request card after it has had department and cost punched into it. After the books are processed, accumulated request cards are sent periodically to the Data Center, where a computer print-out is produced by department, listing the books purchased and the cost of each, with a summary showing all expenditures. Copies are sent to department chairmen to keep them informed of their expenditures. These order cards are kept for a semester, then returned to the individual requesting faculty members after a cumulative accounting record has been made. By this means it is possible to keep track of each department's book budget and the Library's total book budget, with the computer doing all the work.

Fig. 6. Accounting Procedure.

OVERDUES (Figure 7)

Overdue notices are machine prepared from overdue circulation cards which are selected periodically from the charge-out file.
The cards are passed through the computer, which generates second and third overdue cards to be used for discharging purposes.

Fig. 7. Overdue Procedure.

Gummed address labels that include the student identification number are produced using the college log of names and addresses. The appropriate label is applied to the reverse side of the circulation card using the I.D. as a guide. Each notice card is stamped "Overdue book, please return as soon as possible," then sent through the postage meter and placed directly in the mail. If several overdues are sent to the same person, the cards are mailed in an envelope, using the gummed label. The second and third notice cards are filed at the circulation desk until needed or until the book is returned. There is another file for borrowers who are seriously delinquent in returning their books. Cards that have accumulated in this overdue file are processed as follows to generate further overdue notices: an overdue notice is sent to the borrower, the Dean's letter to the borrower or to his parents, and the list of names to the Dean and the student personnel office. At the end of each semester a list is prepared indicating all books held by individual faculty members for more than three months, and the latter are notified. The time-consuming operation of preparing overdues has been considerably reduced (4).

SERIALS

Serial holdings have been converted to machine-readable punched cards. The State University of New York, under the direction of Dr. Irwin Pizer of the Upstate Medical Center at Syracuse, has recently published a union list containing the titles of all periodicals received in all units of the State University (5). It includes the serial holdings of Auburn Community College (approximately 400 titles), and punched cards for these holdings are used by the Library, adapted for its use.
Information on the card comprises title, inclusive dates, years on microfilm, department for which the periodical was ordered and the indexes in which the periodical is listed. Each new serial title added to the holdings is keypunched with this information. The punched cards are used to print out an alphabetically arranged title listing and a departmental listing. Adding or withdrawing titles is a simple matter, and up-to-date lists of periodical holdings are easily produced by the computer. Copies of the lists are sent to each faculty member and several copies are available at the desk and in the periodical room.

COSTS

Since Library use of the Data Center was considered to be similar to other college uses (e.g., that of the Business Office), the cost of library automation was absorbed by the Data Center and not charged to the Library. An estimate of the cost, including rental time on the computer (about three hours per week), supplies, and Data Center staff time, is about $1500.00 a year for ongoing programs.

CONCLUSION

The automated systems herein described have now been completely operational for over a year. Converting data for a computer operation spotlighted inaccurate recording of information and afforded a good opportunity for correcting previous errors. Periodically, progress and results have been reviewed and changes made, as will continue to be the case. The automated circulation system is providing the Library with rapid, accurate, and efficient circulation control not possible for a manual system. Ease and speed of performing routine library operations by the use of automation more than compensate for the cost of data processing. Automated technical procedures provide for faster and more efficient processing of books, production of the Library's monthly new books list (which previously took hours to type) and subject bibliographies.
Other important results of the mechanization project are the serial listings and departmental accounts, all of which make possible better library service.

ACKNOWLEDGMENTS

The programming was done in AUTOCODER by, or under the supervision of, Mr. Richard Klinger, Chairman of the Data Processing Department at Auburn Community College; to him is due most of the credit for the mechanization of the Library. The Library is grateful to Mr. Klinger for his encouragement and enthusiastic support and his willingness to assume the technical responsibilities of programming and systems design.

REFERENCES

1. International Business Machines: Mechanized Library Procedures (White Plains, N.Y.: IBM, n.d.).
2. International Business Machines: Library Processing for the Albuquerque Public School System (White Plains, N.Y.: IBM, n.d.).
3. DeJarnett, L. R.: "Library Circulation." In International Business Machines Corporation: IBM Library Mechanization Symposium (Endicott, N.Y.: 1964), pp. 78-93.
4. Eyman, Eleanor G.: "Periodicals Automation at Miami-Dade Junior College," Library Resources and Technical Services, 10 (Summer, 1966), 341-61.
5. The Union List of Serials in the Libraries of the State University of New York (Syracuse, N.Y.: State University of New York Upstate Medical Center, 1966).

24 TEACHING WITH MARC TAPES

Pauline ATHERTON: Associate Professor, and Judith TESSIER: Research Associate, School of Library Science, Syracuse University, Syracuse, N.Y.

A computer based laboratory for library science students to use in class assignments and for independent projects has been developed and used for one year at Syracuse University. MARC Pilot Project tapes formed the data base. Different computer programs and various samples of the MARC file (48,000 records, approx.) were used for search and retrieval operations.
Data bases, programs, and seven different class assignments are described and evaluated for their impact on library education in general and on individual students and faculty in particular. A computer based laboratory for use in library science instruction, with the MARC Pilot Project tapes as the file of catalog records, has been the focus of LEEP (Library Education Experimental Project) at Syracuse University, School of Library Science, since August 1968. Work has been twofold: 1) development of the laboratory as an instructional tool and 2) exploration of applications of such a facility in library education. The instructional aspect of the project is really "learning with MARC tapes". The development of the laboratory has been reported elsewhere (1) and will not be emphasized in this report. Many of today's students in library schools will be tomorrow's workers in libraries that will be parts of library networks and cooperative technical processing centers. They will be involved in library automation projects and related developments. In anticipation of personnel needs for new modes of library service, LEEP designed activities in the laboratory to satisfy minimum requirements for tomorrow's professional and to encourage maximum use for students with serious interests.

Teaching with MARC Tapes/ATHERTON and TESSIER 25

The aim during the past year has been to develop a laboratory where computer programs and the MARC tapes could be used by library school faculty in class assignments and by students for independent research. The objective was to achieve a program of activities integrated throughout the library school curriculum, one in which computer applications would be seen as one more source of support for the functioning librarian.
Students were to be provided with a myriad of experiences that would help them to probe the potential usefulness of machine readable catalog data and to develop certain minimal skills needed for using computer based retrieval systems. Figure 1 shows the resulting place of LEEP in the library school. The approach has two stresses: 1) demonstrations of library automation and 2) activities where computers are used in librarianship for research and experimental applications. This orientation is in contradistinction to Hillis Griffin's use of the term "automation in technical processes" (as he defines it, it includes only acquisition, cataloging, and circulation processes) (2). After a short description of the facilities at Syracuse, this paper will deal with the various class assignments and student projects developed in the first academic year of use, the feedback from students and faculty concerning the usefulness of MARC records in instruction, and the authors' conclusions.

Fig. 1. LEEP's Role in the Library School. (L.S. classes: Reference, Cataloging, Bibliography, Technical Services, Information Systems, etc.; Faculty Research; Curriculum Development; Class Assignments; Independent Student Projects.)

TABLE 1: Data Bases and Computer Programs Available Through LEEP.

I. Data Base: MARC I, 48,000 records (the entire Pilot Project file)
Programs (all in Assembler):
BIBLOLST (3): Access by LC card number; prints each bibliographic record in LC diagnostic format.
FDR (4): A frequency distribution program for file study.
MARC I Double Column Lister: Prints the entire content of a file of MARC I records in a two-column page format.
MARC I Record Sort: Sorts a file of MARC I records on the content of any variable (tagged) field.

II. Data Bases: MARC/DPS, 1000 records (the first 1000 MARC I records); MARCS/DPS, 5200 records (a stratified sample of social science records, selected by LC class number, and the LC A's and Z's); MARCS/DPS(II), the 5200 records above, plus 3800 (a stratified sample in humanities, selected by LC class number)
Programs:
MARC Reformat (PL/I): Reformats MARC I records to meet DPS requirements and performs certain counts of characters per field, etc.
DPS, the IBM Document Processing System (1,5,6) (Assembler): Processes the entire text of a MARC record to produce a dictionary and search file of keywords. Retrieves records by any keyword or keyword combination specified by the searcher. Allows for root searches, weighting, phrase and field placement, etc.

III. Data Bases: MARCS/MOLDS, 5200 records (see MARCS/DPS above); MARCH/MOLDS, 3800 records (see MARCS/DPS(II) above)
Programs:
DBG (7) (PL/I): Data base generator; selects and formats records for MOLDS.
MOLDS, Management Online Data System (8) (Fortran): Retrieves fixed-field records by matching elements; includes sort capabilities and arithmetic operations.

IV. Data Base: LICOSH (LC Subject Headings, 7th ed.)
Programs:
SHOP (Subject Heading Output) (PL/I): Formats and prints subject heading records.
STAT (PL/I): A frequency distribution program for file study.

V. Data Base: LC/Z Class (LC schedule Z)
Program:
Z Text Processor (Assembler, FORTRAN): Selects certain lines of text from a sample of the LC Z-class schedule and transforms the lines into KWIC indexable data (an index to the Z class).

LEEP FACILITIES

The facilities at LEEP include MARC Pilot Project data bases, computer programs, and personnel. Students and faculty were fully informed of the accessibility of the staff of faculty, programmers, and graduate student liaison personnel for consultation and guidance. Further, the facility includes computer time; either the LEEP budget or the Library School's university-supported computer budget covers the time charges for class assignments and student projects. Table 1 lists and describes the LEEP programs (with explanation of acronyms) and data bases that are available at the present time at the University Computing Center for library instruction purposes. The LEEP facility uses the University's IBM 360/50 computer with the following characteristics:

Main storage: 512K bytes
Disc storage: 240M bytes (2314 disc unit)
3 tape units: 9 channel (800 bpi max)
1 tape unit: 7 channel (800 bpi max)
Printer: 1000 lpm (two print trains, std and TN)
Card read/punch: 1000 cpm in, 300 cpm out

Work on the implementation of the computer programs available from the Library of Congress and IBM was carried on through most of the first academic year. BIBLOLST was in use during the Fall term, but the first efficient retrieval system, DPS, became available only during the Spring 1969 term. For this reason, instructional development has been limited to one semester and one summer session. The experiences reported here are based upon those two terms. MARC II records became available only in late Spring 1969. No effort to utilize these records has been made to date, but future plans do include using such a data base.

Class Assignments and Student Projects Using LEEP

Because most assignments use the DPS retrieval system, learning that system once helps the student in consecutive assignments. LEEP staff arranged tours of the Computing Center for individual students, classes and faculty. Keypunching instruction and DPS explanations were distributed with first assignments, or as needed on an individual basis. During the summer there were instituted an all-school LEEP orientation seminar and a LEEP clinic, where a staff member was available for consultation one hour each day in the corridor outside the Library School classrooms.
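Though DPS itself was an IBM Assembler system, the dictionary-and-search-file idea it embodies (see the DPS entry in Table 1) is what is now called an inverted index: a keyword dictionary built from the full text of each record, consulted to answer searches. The fragment below is only a toy Python illustration of that idea; the record numbers and text are invented, and nothing here reflects DPS internals.

```python
# Toy analogue of the DPS dictionary/search-file idea from Table 1:
# build a keyword dictionary mapping each word to the records that
# contain it, then answer AND/OR searches from the dictionary alone.
from collections import defaultdict

def build_index(records):
    """Map each keyword to the set of record numbers containing it."""
    index = defaultdict(set)
    for number, text in records.items():
        for word in text.lower().split():
            index[word].add(number)
    return index

# Invented sample records keyed by (hypothetical) LC card numbers.
records = {
    "68-1001": "An economic history of Syracuse",
    "68-1002": "Cataloging practice and theory",
    "68-1003": "Economic theory for librarians",
}
index = build_index(records)

# AND search: records containing both keywords.
print(sorted(index["economic"] & index["theory"]))
# OR search: records containing either keyword.
print(sorted(index["cataloging"] | index["economic"]))
```

Set intersection and union give the AND and OR combinations directly, which is why an inverted file answers Boolean queries without rescanning the records.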
Materials related to MARC/DPS are always available near the students' study carrels and their reserved reading shelf. Seven different assignments were developed for classroom use by the Library School faculty working with the LEEP staff during the first year. Each assignment reflects the interests of the teacher and the purposes of the unit in which it was introduced. During the spring semester over 100 students had computer based assignments (over 200 searches total); during the summer session, seventy-six students had assignments (over 200 searches). Following are abstracts of the seven assignments:

L.S. 407 Reference Service: Bibliographic Linking
Purpose: a) Obtain a listing of titles containing bibliographies from MARC records; b) prepare for extension and interconnection of some of these bibliographic entries and the original titles within the MARC data base; c) practice bibliographic evaluation.
Procedure: a) Area of interest was selected by Dewey or L.C. class number (root search, AND, OR options) from the MARC file of 1000 records; records with class number and bibliographic note were retrieved using the DPS/MARC system. b) Bibliographic entries in these titles were examined and MARC I worksheets were made for three English monographs, with added data fields for source of reference. c) Evaluate the bibliographies in the books examined as reference tools for a scholar.

L.S. 427 Cat. & Class. (Richmond): Title Searches
Purpose: Contrast searching for titles to be ordered in BPR and in the MARC file, in order to obtain L.C. card number, established entry, and full cataloging record.
Procedure: a) Search for 12 titles in BPR (1966 and 1967). b) Search in the MARC file (1000 records) for 10 (AND searches of title words), and prepare unit cards for any 5.
L.S. 427 Cat. & Class. (Moore): Searching a Shelf-list
Purpose: a) Verify an assigned class number against library holdings in that number. b) Compare the subject headings for one class number.
Procedure: Assign Dewey class numbers to three titles; search the MARCS/DPS file for the assigned number; compare titles cataloged with titles retrieved by the search for consistency, and compare subject headings on the worksheet provided.

L.S. 621 Technical Services (Gration and Webster): Searching for Acquisitions
Purpose: Extract cataloging records from the MARC files (48,000 records) for titles selected from Choice or Library Journal (1967 issues).
Procedure: Cite the L.C. card number for selected titles (at least 10); keypunch the numbers; submit with job control deck to the dispatcher in the Computing Center and obtain a printout of full L.C. cataloging via the BIBLOLST program.

L.S. 621 Technical Services (Gration): Evaluation of Series
Purpose: a) For a given subject, examine catalog records for titles in a series. b) Determine the quantity of material on a subject published in series. c) Evaluate series notes and series tracings with a view to setting policy for series control.
Procedure: a) Search for the subject via the DPS/MARC system (5000 social science monographs). (AND, OR, and root searches of any descriptors are possible.) b) Examine a printout of 50 titles (or fewer) for series notes, publishers' series, etc. c) Write a procedural statement for handling series.

L.S. 628 Information Systems (Bottle): Preparation of Bibliographic Information for Machine Input
Purpose: a) Exercise keypunching. b) Simulate preparation of bibliographic information for machine input.
Procedure: For one MARC I input worksheet (done in L.S. 407), keypunch six data elements following a fixed format.

L.S. 628 Information Systems (Bottle): Use of Boolean Logic for Searching MARC File
Purpose: a) Practice in the use of Boolean operators. b) Practice in the use of a reference retrieval system, e.g., DPS/MARCS.
Procedure: Construct three searches: 1)
Do an OR search for references found earlier in the S.U. Library, with both the L.C. card number and in BNB; 2) do an AND search for two descriptors possibly in the same document, e.g., D.C. class number and L.C. class number, or two English-language words that describe a subject; 3) do an OR search looking for the same subject as in 2). Compare results and comment on the use of modifiers (root search; specification of field, sentence, or paragraph to be searched; etc.).

Aside from the structured approach developed for classroom assignments, students were encouraged to develop independent projects. One student group developed an index to abstracts of recent articles in the area of technical services. This involved analysis of three aspects of information in journal articles (type of library, function, and technique) as described in abstracts prepared by the class. The project group did the coding, keypunching, sorting, and listing needed to produce the index. Several decisions about abbreviations, format, content, and index order had to be made. LEEP provided keypunching and instructions for implementing the project. This student abstracting service, begun by one group in the Fall of 1968, was updated by another group in the Spring. The index was ready in a second edition for use by summer school students. It may become an ongoing service if there is enough student interest. Another group of students produced a computer printed bibliography on Negro history for an inner-city school library in Syracuse. PL/I, or Programming Language One, was offered twice as a non-credit, eight-session seminar for librarians and library school students. The teacher, a LEEP consultant, stressed the PL/I vocabulary subset for character manipulation. The students, on completion, could do simple programs to access, count and print MARC I records.
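The three searches in the Boolean-logic assignment above (an OR search, an AND search on two descriptors, and a root search) can be mimicked in a few lines. This is purely an illustrative Python sketch with invented records; the actual assignment ran against the MARC file under DPS, whose matching was far richer (weighting, field placement, and so on).

```python
# Illustrative sketch of the assigned OR / AND / root searches over a toy
# in-memory file of catalog records. Records and numbers are invented.

def matches(record, term):
    """True if `term` occurs as a word in the record's text.
    A trailing '*' requests a root (truncation) search."""
    words = record.lower().split()
    if term.endswith("*"):
        root = term[:-1].lower()
        return any(w.startswith(root) for w in words)
    return term.lower() in words

def search(records, terms, operator="OR"):
    """Return records matching all terms (AND) or any term (OR)."""
    test = all if operator == "AND" else any
    return [r for r in records if test(matches(r, t) for t in terms)]

records = [
    "68-12345 Economic history of the South HC101",
    "68-23456 Economics of education LB2825",
    "68-34567 Medieval European history D117",
]

# 1) OR search on two descriptors.
print(search(records, ["economics", "history"], "OR"))
# 2) AND search: two words that must co-occur in one record.
print(search(records, ["economic", "history"], "AND"))
# 3) Root search: econom* matches both "economic" and "economics".
print(search(records, ["econom*"]))
```

Comparing the three result lists shows the effect of the modifiers the assignment asks students to comment on: the root search widens a single term, while AND narrows the set that OR broadens.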
One student chose to continue his PL/I experience and, under an independent project, programmed an ordering procedure and reporting forms. Through term projects or independent research, the student can get academic credit, free computer time, and consultation from faculty and LEEP. During the spring and summer, a general invitation to make DPS searches was offered to the students, and a form for search evaluation was developed. During the summer session, such independent searches became more popular as instructors of Subject Reference, Bibliography of the Social Sciences, and Bibliography of the Humanities allowed DPS searching as one technique in term project development. These independent search results became a part of the students' subject bibliographies. Searches run by LEEP were used in two classes as instructional aids: in Advanced Cataloging and Classification as examples of precedents in cataloging practices, and in Bibliography of the Social Sciences as an example of bibliography building for area studies work, where information on the catalog card can be retrieved by searching any bibliographic field therein. During the Fall semester, 1969, students had the option of taking a three-unit research course on search strategy and retrieval evaluation. The basic tool of this course was MARC on DPS, and the objective was analysis and evaluation of reference retrieval via a computer based system, as well as evaluation of traditional cataloging in a new retrieval form. Work continues to prepare other computer based assignments in courses in the bibliography of social sciences and humanities, advanced cataloging, and subject reference.

FEEDBACK AND FINDINGS

This first year has not yet produced conclusive results about the best direction in which to continue, but the faculty has been encouraged to think that the above-described integrated activities for the student are promising.
Different students used the computer based laboratory in varying degrees, the depth of their investigations being their choice. For some students the only experience with LEEP was a class assignment or an orientation lecture on computers and cataloging or computers and reference service. Others met LEEP in class but also made efforts to explore the PL/I seminars, and at the informal coffee hours and orientations showed considerable interest in this field. The first year has been a blending of library automation and other concepts in librarianship which the student could explore through practical experience. Evaluation forms were distributed through student mailboxes at the close of each semester to get individual feedback. Of ninety responding (about 35% of the Library School population), sixty-four students had used LEEP in classes and seventeen had used it independently. Sixty students "picked up new ideas as a result of LEEP." Twenty-two reported no new ideas. Fifty-eight students would "take a job involving library automation," and fifty-five reported this was not the same response they would have given a year earlier. Some comments included: "the field has an exciting future," "automation is of value to libraries," "we need librarians in the field," and "now I can talk to experts and communicate our needs to them." The first assignment that the student encounters is the most important one. All the students in the school, not just those who have expressed interest in automation, are required to take three of the four courses with LEEP assignments. Students with a broad range of personal interests must be exposed to the potential of computers for libraries. The first assignment is designed to overcome students' fears of computers and related equipment. Many are reluctant to keypunch and hesitant about approaching any problem involving equipment.
The method is a simple one: start the student out with a simple assignment, with little computerese, that has a high chance of success. Every instruction is made clear and the steps to follow are stated. The reason for the assignment is spelled out exactly and its relation to everyday librarianship explained. Even though students tend to resist what looks foreign and complicated, they usually respond, upon completion of the first assignment, "that was easy". The assignments described above in Cataloging and in Technical Services best illustrate the simple, locked-in nature of a beginning assignment. In the classroom, MARC/DPS has been presented in terms of the assignment. In order to make the system understandable to the uninitiated, students have at times been given only a portion of its capabilities. This approach works well for locked-in assignments, as in Cataloging. However, it begins to seem that an in-depth introduction to DPS, with all its capabilities and flexibilities, is a better start for some students. An evaluation of retrieval systems and machine readable cataloging data is one instructional aim which may be best achieved in special seminars. Experience has also shown that at times the integration of LEEP assignments into the instructional objectives of a course vitiates both elements. The Reference assignment is an example. The final objective of the assignment was evaluation of bibliographies in books (indicated by a bibliography note or MARC I indicator). The first section of the assignment included structuring a subject search in MARC/DPS that illustrated access to bibliography indicators not accessible in the card catalog. The second and third parts dealt with retrieval of the books and evaluation of the bibliographies. The students expressed satisfaction with the citation retrieval, but they experienced frustrations in finding the books cited on the MARC tapes.
Finally, the evaluation of the bibliographies suffered from the student frustration, and students came to regard a "LEEP sponsored" assignment as "too complicated." The indications from this one assignment were that instruction in retrieval techniques and potential would be sufficient, and need not be tied directly to a larger problem which may rely on external resources. In other classes an integrated approach was used successfully, whenever the techniques and parts of the assignment were kept simple. The greatest impact of the PL/I non-credit seminar was not learning how to program, but understanding what is involved in using such precise techniques and how to specify steps with logic. This helped make the programmer's role more understandable to the librarians and students in the class.

SUMMARY

The stress during the first year of operation has been on implementing the LEEP facility for class use. Now that development of data bases and retrieval programs is somewhat stabilized, it is hoped to move on to more use of this tool for analysis and evaluation. With faculty support, it is planned to continue designing class assignments, increasing the "catalog of tested assignments." The intent is to encourage a serious study of the MARC record, and hence of traditional cataloging practices. It should also be possible to do some useful research into the nature of bibliographic description as a tool for reference retrieval. LEEP will continue through 1969-70, using the MARC tapes (MARC I and, soon, MARC II). Emphasis will not be to teach "how to automate a library better" but to learn "what difference does a machine readable catalog make" and "of what use and value is such a record to librarians and library users?" The MARC records will be used to ask questions about how to improve or change acquisitions, cataloging, reference, and other library functions. This is a departure from the use of computer based facilities to teach library data processing. The LEEP approach seems to have had an impact upon library students who are "straight librarians" and not very interested in library automation. It may also foster a greater interest in analysis and research in the Library School. For example, with machine readable catalog records it is possible to monitor what has been done in practice before and after the Anglo-American Rules, or the various additions in L.C. subject cataloging and classification. It is possible to check cataloging consistency more easily. Because the MARC tapes include both Library of Congress class numbers and Dewey class numbers, they can be compared as to their usefulness for subject searches, subject spread on a library's shelves, etc. With MARC II tapes, it should be possible to simulate a data base more like a national bibliography, and thus open new fields for efficient survey. As noted earlier, all the research, whether on retrieval evaluation or on the nature of cataloging, is student based. The Library School's objective is to provide the facility, the impetus, and the guidance which make up the intellectual environment where such investigation can be done. LEEP is a new part of the library school environment. It can serve to encourage librarians to consider, understand, and even use computers where applicable, in library schools today and in the library field tomorrow. The future use of computers in libraries will be decided by librarians and not by system programmers or automation technologists. To prepare such librarians there must be a time in their lives for experimentation, research and development. There must be a time when they can objectively question what of the old can blend with the new and what will have to be revised. We hope that LEEP has provided that opportunity to some, if not all, of the students and faculty at Syracuse University School of Library Science.
ACKNOWLEDGMENTS

Work reported here was partially supported by a grant from USOE, Bureau of Library Research, OEG-08-0664. This paper is based on a presentation before the Library Education Division at the American Library Association Annual meeting, Atlantic City, New Jersey, June 23, 1969.

PROGRAMS AND DESCRIPTIONS

Microfiches and photocopies of the following LEEP program descriptions and related materials may be obtained from the National Auxiliary Publications Service of ASIS as follows: 1) "LEEP Program Description for MARC I File: Distribution of Records" (NAPS 00878); 2) "LEEP Report 69-11: LEEP Program Description: MARC I Double Column Lister" (NAPS 00879); 3) "LEEP Report 69-12: LEEP Program Description: LEEP-BIBLOLST" (NAPS 00880); 4) "LEEP Report 69-13: LEEP Program Description: MARC I Record Sort" (NAPS 00881); 5) "LEEP Report 69-14: LEEP Program Description: Listing Machine-Readable Library of Congress Subject Heading File" (NAPS 00882); 6) "LEEP Report 69-15: The Conversion of the LC Classification Schedules to Machine-Readable Form" (NAPS 00883); 7) "Rome Project Program Description: MOLDS Support Package" (NAPS 00884). Copies in mimeographed form may also be had by writing to Library Education Experimental Project, School of Library Science, Syracuse University, Syracuse, New York 13210.

REFERENCES

1. Atherton, Pauline; Wyman, John: "Searching MARC Tapes with IBM/Document Processing System." In Proceedings of the American Society for Information Science, 6 (Westport, Connecticut: Greenwood Publishing Corporation, 1969), 83-88.
2. Griffin, Hillis: "Automation of Technical Processes in Libraries." In Annual Review of Information Science and Technology, Volume 3, Cuadra, Carlos A., ed. (Chicago: Encyclopedia Britannica, 1968), pp. 241-262.
3. Library of Congress, Information Systems Office: MARC Pilot Project Report, Appendix A (Washington, D.C., 31 January 1967), p. III, 3, 21.
4.
Martel, Frank; Stillwell, John: MARC Pilot Project File Analysis of Distribution of Records (Syracuse: LEEP Report 69-1).
5. Tessier, Judith: Index and Manual for IBM System/360 Document Processing System (Syracuse: LEEP Report 69-5).
6. Tessier, Judith: Searching MARC/DPS; a Users Manual (Syracuse, N.Y.: LEEP Report 69-3).
7. LEEP Report to be published December 1969.
8. Peterson, P. L.; Carnes, R.; Reid, I.; et al.: Large Scale Information Processing System, Vol. I. Compiler, Natural Language and Information Processing, Report RADC-TR-68-401, Volume I (April 1969).

36 PROCESSING OF MARC TAPES FOR COOPERATIVE USE

Kenneth John BIERMAN: Data Processing Coordinator, Oklahoma Department of Libraries, and Betty Jean BLUE: Programmer, Information and Management Services Division, State Board of Public Affairs, Oklahoma City, Oklahoma

A centralized data base of MARC II records distributed by the Library of Congress is discussed. The data base is operated by the Oklahoma Department of Libraries and is available to any library that can make use of it. The history, creation, operation, uses, advantages, disadvantages, cost and future plans of the data base are included, as well as flowcharts (both system and detail) and sample outputs.

BACKGROUND INFORMATION

Early in 1966, college, university and public librarians in Oklahoma began meeting irregularly to discuss library automation. The incentive for such meetings was clear: libraries in Oklahoma could not justify the financial expenditure necessary to "go their own road" in library automation. Secondarily, they realized that at some time in the future, cooperative automation projects begun now would pay big dividends. With the coming of the Library of Congress MARC II distribution service in April 1969, interest in library automation once again came to the forefront in Oklahoma library circles. After several general meetings, primarily to find out what others were doing, planned to do, had done, or had failed to do, a MARC planning meeting was called by the Oklahoma Department of Libraries. Representatives (both administrative and technical) of the three colleges, two public library systems, and two universities that were most likely to be doing anything with MARC in the immediate future were invited. The feeling expressed at the meeting was that if economic use of MARC were to be made in Oklahoma, there would have to be cooperative effort so that MARC data could be used at the least total cost. At the same time, libraries had planned different uses of MARC; therefore, allowance for local autonomy and creative use of MARC was important. Since libraries were planning varying applications of MARC, it was decided that the best place to begin a cooperative effort was in the centralized maintenance of a MARC data base. Four libraries in Oklahoma had placed subscriptions with the Library of Congress for the MARC II tapes when they became available: one public library, one college library, one university library, and the Department of Libraries. The cost of maintaining four complete data bases would be large compared to the cost of maintaining one complete data base in the state for everyone's use. The money saved could then be used for utilization of MARC records rather than for housekeeping maintenance of MARC records. Mr. Ralph Funk, Director of the Oklahoma Department of Libraries, committed the Department to obtaining and maintaining a complete file of all cataloging information sent out by the Library of Congress in MARC II machine readable format (both current and, when available, retrospective), which would always be available on demand (either in part or in whole) to any library in the State.
This report describes the cooperative system developed by the Department of Libraries to maintain and make available MARC II records to any library in the State that has the computer equipment to make use of them. Unlike NELINET (1) and the Washington State System (2), which are processing MARC tapes to produce final hard-copy products for the cooperating libraries, the Oklahoma system provides the machine readable MARC records (not final products) a library needs; that library can then process these records in any way it wishes on its own equipment. None of the MARC I participants was primarily concerned with the central distribution of selected machine readable records (3). Possible future state-wide cooperative ventures with MARC (including useful products) are also discussed.

OVERVIEW OF THE SYSTEM

The system can be thought of as two sub-systems: 1) merging and maintaining a MARC master file of all records sent out by the Library of Congress in MARC II format, and 2) retrieving, i.e., withdrawing, selected records by LC card number from the MARC master file for specific libraries on demand. The maintenance sub-system has four distinct programs: 1) ODL-01, which merges MARC tapes; 2) ODL-03, which drops or transfers to another tape any record or combination of records on a given MARC tape; 3) ODL-04, which prints a MARC tape in upper-case EBCDIC; and 4) ODL-06, which prints the LC numbers (300/page) from any given MARC tape. The retrieval sub-system has one program, ODL-05, which selects and copies specified LC card number records from the MARC master file to a blank magnetic tape to be sent to the requesting library for its use.

THE MAINTENANCE SUB-SYSTEM

The programs discussed in this section are used to merge and maintain the MARC data base (ODL-01 and ODL-03) and produce hard-copy by-products which are of occasional use for various purposes (ODL-04 and ODL-06).
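Before turning to the maintenance programs, the ODL-05 retrieval step described in the overview can be sketched in outline: copy the master records whose LC card numbers appear on a library's request list to an output file. The Python fragment below is only an illustration of that selection logic; the record layout and names are invented, and the real program worked against magnetic tape files on Oklahoma's equipment.

```python
# Hypothetical sketch of the ODL-05 retrieval idea: select the records
# whose LC card numbers appear on a request list from the master file.
# Layout is invented for illustration; the real program used tape files.

def retrieve(master, requested_numbers):
    """Select master records whose LC card number is on the request list.

    `master` is a list of (lc_number, marc_record) pairs in LC-number
    order; unmatched request numbers are reported so the requesting
    library knows which titles are not yet on the data base.
    """
    wanted = set(requested_numbers)
    hits = [(num, rec) for num, rec in master if num in wanted]
    missing = sorted(wanted - {num for num, _ in hits})
    return hits, missing

master = [
    ("68-1001", "record for title A"),
    ("68-1005", "record for title B"),
    ("68-1009", "record for title C"),
]
hits, missing = retrieve(master, ["68-1005", "68-2000"])
print(hits)     # the matched records, ready to copy to the output tape
print(missing)  # request numbers not found on the master file
```

Reporting the unmatched numbers mirrors the cooperative arrangement described here: a library learns immediately which of its requests the central data base cannot yet supply.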
System, input and output descriptions are included for each program.

Record Merge Program

This program takes tapes in MARC format and code and merges them in LC card number sequence. During processing, messages print if any unusual conditions occur, such as a new record with status other than "n" (new), or a matched record with a code other than "c" (corrected) or "d" (deleted). The messages also indicate the action taken. In general, any new record is merged onto the file regardless of code, a match with code "d" causes deletion, and a match with code "c" (and any other match) results in replacement by the new record. This occasionally causes "invalid" codes to be merged onto the file, but this approach was taken for three major reasons. First, it is usually easier, in cases of error, to remove a bad record from the master than to retrieve it from its source and then merge it onto the master. Secondly, as files become larger, it is feasible to make minor merges of a few tapes, then merge the result onto the actual master. During the minor merge, many apparently new records with codes "c" and "d" will appear, but as the final merge is made, appropriate action is taken. Thirdly, a library obtaining MARC records from the Department of Libraries may also want to use the ODL-01 merge for its own internal use. Some of the records requested by the local library may have been corrected at some time and are therefore coded "c". Although new to the individual library, these are perfectly valid records from that library's point of view. Since the ODL-01 merge always merges a new record onto the file regardless of the code, this program can be used by the individual library without modification.
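The merge rules just described — any unmatched record is merged on regardless of its status code, a match with "d" deletes the old record, and any other match replaces it — amount to a sequential two-file merge on LC card number. The following is an illustrative Python sketch of that logic, not the Department's program (which was written in COBOL); record tuples of (LC number, status, data) stand in for full MARC records:

```python
# Illustrative sketch (Python, not the original COBOL) of the ODL-01
# merge rules. Inputs are a master file and an "items" file, both in
# LC card number sequence; records are (lc_number, status, data).

def merge_marc(master, items):
    """Merge `items` into `master`; both must be sorted by LC number."""
    out, log = [], []
    i = j = 0
    while i < len(master) or j < len(items):
        # Pass master records through while their key is lower (no match).
        if j == len(items) or (i < len(master) and master[i][0] < items[j][0]):
            out.append(master[i]); i += 1
            continue
        new = items[j]; j += 1
        matched = i < len(master) and master[i][0] == new[0]
        if matched:
            i += 1                       # old record is consumed either way
            if new[1] == "d":
                log.append(f"{new[0]}: deleted")
                continue                 # match + "d" -> drop the record
            if new[1] != "c":
                log.append(f"{new[0]}: matched with status {new[1]!r}")
            out.append(new)              # match + "c" (or any other) -> replace
        else:
            if new[1] != "n":
                log.append(f"{new[0]}: new record with status {new[1]!r}")
            out.append(new)              # any unmatched record is merged on
    return out, log
```

Note that, exactly as the text observes, an unmatched "c" or "d" record is still merged onto the file (with a message), which is what makes the same program usable unmodified by an individual library merging records new to it but coded "c" at LC.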
Inputs are 1) a MARC master (a tape in MARC format and code containing all records merged to date), which is in LC card number sequence; and 2) MARC "items" (a tape or tapes in MARC format and code containing the new records to be merged). Processing halts if an items tape is out of sequence.

Fig. 1. Record Merge Program System Flowchart.

Outputs are 1) a MARC master, which is a tape containing records from all inputs, with appropriate corrections and deletions made; and 2) a merge listing, which contains notices of all corrections and deletions and notices of all unusual conditions. If desired, this listing can be expanded to print certain desirable information from all records merged and thus can be used as a valuable reference and check list. It will contain the LC card number, with prefixes and suffixes, and status code, and will indicate if a match was found on the master tape and the action taken. Figure 1 explains the overall flow of the program. Figure 2 gives the program details as of September 1969.

Fig. 2. Record Merge Program Detail Flowchart.

Fig. 3. Print Record Program Output.
Drop and Transfer Records Program

This is a utility program that enables any number of LC card numbers to be entered on cards, with the option in each case of dropping the record entirely or transferring it to another tape for future action. It has proven useful for removing out-of-sequence records, purging files, etc.

Inputs are two in number: 1) any tape in MARC code and format (sequence is not checked); and 2) detail cards, each of which contains a 12-position LC card number and a code indicating if this MARC record is to be dropped or transferred to another tape. These cards must be in sequence.

There are three outputs: 1) an updated tape containing all MARC records on which no action was taken; 2) a transferred tape containing, in sequence, all records transferred; and 3) a listing showing the LC number and the action taken, which is useful for verification of results.

Print Record Program

This program prints in readable form any tape in MARC code and format. The translation table, which produces a form of upper-case EBCDIC, is the same as that used for other Department of Libraries programs. It is a character-for-character translation, which, for the present, is useful for many and varied applications. Input is any tape in MARC code and format. Output is an upper-case EBCDIC translation of the tape. Figure 3 shows a sample output.

Figure 4 shows how the Oklahoma Department of Libraries is handling the MARC expanded character set with a small printer (IBM 1404, 48 characters). Simply stated, the problem is that there are many more characters coded in the MARC ASCII character set than are available on the particular printer that the Department of Libraries is using. (This is a local limitation of the printer that happens to be available; it is not a limitation of computer technology, as printers with expanded character sets are readily available.)
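The Department's fold-down (punctuation outside the print chain becomes "*", lower case folds to upper, diacriticals become "="") is a pure character-for-character mapping. The following Python sketch illustrates the idea only; the character sets here are hypothetical stand-ins, not the Department's actual 48-character chain or its COBOL translation table:

```python
# Sketch (Python) of a character-for-character fold-down for a small
# print chain. PRINTABLE and DIACRITICS are hypothetical stand-ins for
# the Department's actual table: printable characters pass through,
# lower case folds to upper, diacritics map to "=", all else to "*".

PRINTABLE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,()&$*-/'+=")
DIACRITICS = set("\u0300\u0301\u0302\u0303\u0308")  # a few combining marks

def fold_for_printer(text):
    out = []
    for ch in text:
        if ch in DIACRITICS:
            out.append("=")
        elif ch.upper() in PRINTABLE:
            out.append(ch.upper())
        else:
            out.append("*")
    return "".join(out)
```

Because the mapping is table-driven, swapping in the "production" table (closest-equivalent or blank, as described below for production lists) changes only the data, not the program.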
In general, rarely used punctuation and special punctuation marks not in the printer's character set print as an "*", the lower-case letters print as their upper-case equivalents, and diacriticals and foreign language symbols print as "=". This translation table is used for in-house lists (for checking purposes, etc.). For production purposes, a slightly different translation table is used: characters, particularly punctuation marks, not available on the printer are translated to their closest equivalent or left blank, whichever is more appropriate. At the Oklahoma Department of Libraries, all translations at this time are internal and do not affect the MARC tapes, which are being left in the original ASCII code. It seemed unreasonable to centrally translate the tapes to EBCDIC until agreement among all the users could be reached as to a mutually useful translation table.

There is a good possibility that in the near future the Information and Management Services Division will make available an off-line printer with an expanded character set (upper- and lower-case letters, additional punctuation, etc.). If this does happen, then print-outs in an expanded character set would be economically possible.

Fig. 4. Conversion Table. (MARC hex 1C = tape mark; 1D = end of record; 1E = field terminator; 1F = delimiter.)

Fig. 5. Library of Congress Number Listing.

Fig. 6. Print Card Numbers Program Detail Flowchart.

RETRIEVAL SUB-SYSTEM

Withdrawing Records Program

This program withdraws records selected by LC card number and copies the complete MARC II records onto another tape.
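Since both the MARC master and the finder file are sorted on the 12-position LC card number, the heart of such a withdrawal is a single sequential pass over the two files. As an illustrative Python sketch (the Department's ODL-05 is a COBOL tape program; this only shows the matching idea):

```python
# Illustrative Python sketch of the retrieval pass: the MARC master and
# the finder file are both sorted on the LC card number, so one
# sequential pass finds every match and every unmatched finder.

def withdraw(master, finders):
    """master: sorted list of (lc_number, record); finders: sorted LC numbers.
    Returns (items, unmatched): the matched records, and the finder
    numbers that hit nothing on the master."""
    items, unmatched = [], []
    i = 0
    for lc in finders:
        while i < len(master) and master[i][0] < lc:
            i += 1                      # skip master records nobody asked for
        if i < len(master) and master[i][0] == lc:
            items.append(master[i])     # copy to the library's item tape
        else:
            unmatched.append(lc)        # goes to the unmatched finders output
    return items, unmatched
```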
A library sends the Department of Libraries a magnetic tape containing the LC card numbers for the records it wants copied from the data base. The data base is searched and the requesting library is sent back three tapes and three hard copies. The tapes are: 1) the original finder tape; 2) an item tape containing the records which matched; and 3) a tape containing the LC card numbers of the records which did not match. The three hard copies are: 1) a list in LC card number order of the records which matched, containing on the first line information from the finder tape and on the second line information from the MARC tape; 2) a listing of the card numbers and other information on the finder tape which did not match any card number in the data base; and 3) a listing of card numbers and other information on the finder tape that were invalid.

There are three inputs to the system, the first being a MARC master, which is the latest merged master at the Department of Libraries; its records are in the original code and format. The second consists of finder records, which come from the individual library. Input is originally on cards in the format specified in Table 1, then put on tape, blocked 5, and sorted (no tape labels are used at this time) on all 12 positions of the LC number. The tapes are unlabeled upper-case EBCDIC, 1600 BPI. The third is a card that enters the appropriate date and library code into the system.

Table 1. Original Card Input Format to ODL-05

Card Columns   Field Contents and Special Instructions
1              Local Library Code (assigned by Dept. of Libraries)
2-4            LC card number prefix (upper case alpha or blank)
5-12           LC card number (numeric)
13             LC card Supplement Indicator (may be blank)
14-28          Local Use (may be blank)
29-48          Local Use or first 20 positions of Author (may be blank)
49-76          Local Use or first 28 positions of Title (may be blank)
77-80          Local Use or Publication Date (may be blank)

The system gives the following five outputs:

1) Matched records, a listing of records that matched and were transferred to the individual library's item tape. This listing shows all information from the finder record, and immediately below, the following information from the MARC record: LC card number, the first 20 characters of the author, the first 28 characters of the title, and the publication date. Information pulled is as follows: author (first tag beginning with 1), which will usually be 100 or 110; title, which will always be 245; and date, which will be positions 7-10 under tag 008. Figure 7 shows a sample of output. The first line is data from the finder tape and the second line data from the MARC master tape.
Fig. 7. Matched Records Listing.

2) Items tape, containing all records requested from the master tape. They are in MARC format and code, and the number of logical records should match the matched record count.

3) Unmatched finders listing, showing all valid finder records that did not match the MARC master tape. Figure 8 shows sample output.

4) Unmatched finders tape, containing all valid finder records that did not match the MARC master tape.

Fig. 8. Unmatched Records Listing.

5) Errors listing, showing all 80 columns of invalid finder records and the appropriate error message. Finder records are invalid if one of the following errors occurs: 1) blank or invalid library code; 2) prefix containing any characters except blanks or upper-case alphas; 3) LC card number not pure numeric. Invalid finder records are not processed but are placed on the error listing. Figure 9 shows sample output.

Fig. 9. Errors Listing.
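The three edit rules follow directly from the card layout in Table 1 and can be sketched as a short validation routine. This is an illustrative Python sketch (the original edits are in COBOL), and the set of valid library codes is hypothetical; actual codes are assigned by the Department:

```python
# Sketch (Python) of the finder-card edits: column 1 is the library
# code, columns 2-4 the LC prefix, columns 5-12 the LC card number.
# VALID_CODES is hypothetical; real codes are assigned by the Dept.

VALID_CODES = {"X"}

def edit_finder(card):
    """Return the list of error messages for one 80-column finder card."""
    card = card.ljust(80)
    errors = []
    code, prefix, number = card[0], card[1:4], card[4:12]
    if code not in VALID_CODES:
        errors.append("INVALID LIBRARY CODE")
    if not all(c == " " or (c.isalpha() and c.isupper()) for c in prefix):
        errors.append("INVALID LC PREFIX")
    if not number.isdigit():
        errors.append("INVALID LC NUMBER")
    return errors   # columns 14-80 are local use and are never edited
```

A card producing more than one message here corresponds to the "generated errors" counted on the matched records listing for control purposes.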
No edits will be made on columns 14-80, which are for local use entirely; all data from these fields will be transmitted to printed listings for any desired local use or for verification.

Record counts are included at numerous points to facilitate accurate record control. For the purposes of this particular program, counts should check as follows:

Matched Records + Errors + Unmatched Records - Generated Errors = Original Count

Matched records appear at the end of the listing of the same name, errors appear at the end of the listing of the same name, and unmatched records appear at the end of the listing of the same name. Generated errors appear at the end of the matched records listing. A generated error indicates more than one error in a single card, and this count is included only for control purposes. The original count is expected to be maintained by the submitting library for maximum accuracy. These counts are checked immediately, and any discrepancies cleared up as soon as possible.

Figure 10 gives the overall view of the program and Figure 11 a detailed flowchart.

The ODL-05 program was written to provide the greatest flexibility possible to the user libraries. The only information absolutely required for the finder tape is the local library code and the complete LC card number. However, the remaining 67 card columns are available to the local library for any use it may wish to make of them. If the local library would like a quick method of sight checking to make sure that the records copied were the records wanted, it can keypunch the first twenty characters of the author in columns 29-48, the first 28 characters of the title in columns 49-76, and the date of publication in columns 77-80. If this is done, the matched records listing will contain the author, title and date from the finder tape, immediately followed underneath in the same position on the page by the corresponding information from the MARC record. Figure 7 shows sample output.
Thus, the library can quickly sight check what it thought it was getting at the time of request against what it actually got from the MARC record. Of course, the local library is free to put no information, or other information, in columns 29-80; the operation of the system will not be affected, and whatever information is included in columns 29-80 will appear on the three output listings (matched records, unmatched finders, and errors).

Fig. 10. Withdrawing Records Program System Flowchart.

Fig. 11. Withdrawing Records Program Detail Flowchart.

Another convenience for the local library is that it has to do no original programming to use the system. All that is needed are standard sort, merge, and card-to-tape programs. Any of the programs written by the Department of Libraries is available to users on demand; they may find the merge or LC card number print programs useful. Another consideration for the user is the ease with which invalid finder records and unmatched finder records can be resubmitted into the system.
To correct finder records in error, the library simply repunches cards from the error listing, with necessary corrections, and resubmits them in the next cycle with new cards. Unmatched finder records can be merged with any new finder records in the next cycle and resubmitted, no repunching being necessary.

WHAT IS PRESENTLY BEING DONE

The variety of applications for MARC presently being worked on in Oklahoma libraries is most interesting. Central State College, Edmond, Oklahoma, is currently subscribing to the weekly MARC tapes and producing an index of available materials which cumulates for two months and then drops off the older entries. The library is receiving its own subscription to the MARC tapes for this purpose but does not plan to maintain a complete file of MARC records.

The Tulsa City-County Library System, Tulsa, Oklahoma, is currently using MARC records from the State data base for bibliographic information for its machine produced book catalog. It originally had a subscription to the MARC tape service but, with the operation of the state-wide data base, is dropping it.

The University of Oklahoma, Oklahoma State University, and Oklahoma County Libraries have no immediate plans for utilization of the MARC records as distributed by the Library of Congress; however, when they do move in this area it will probably be for use in their technical processing departments, and the State MARC Data Base will form a basis for their use.

COMPUTER AND LANGUAGE USED

The computer being used for the Department of Libraries MARC program is an IBM 360/30 located in the State Budget Bureau but under the administrative control and operation of the Information and Management Services Division of the Board of Affairs (the centralized state computer center for the Capitol complex).
The computer has 32K core size, one on-line card read/punch (Model 2540), four magnetic tape drives (Model 2415), two magnetic disk drives (Model 2311), and one on-line printer (Model 1404). The programs are written in COBOL for the 360/30, operating under DOS, with a COBOL compiler. Very little modification would be required to operate under OS. The merge program (ODL-01) requires three tape drives. The withdrawing program (ODL-05) requires four tape drives but could be modified to operate with only three. In agreement with Henriette Avram and Julius Droz (4), the Department of Libraries has found that COBOL can easily be used to process MARC records.

The Information and Management Services Division has assigned a programmer to the Department of Libraries who has done, and will do, all the MARC programming. She is actually employed by the IMSD, and the Department of Libraries contracts with them for her services. Presently, the Department is being charged about $7.00 an hour for programming time. The planning, system design, actual programming, and production are all closely supervised by the Data Processing Coordinator of the Library, who is on the Department of Libraries' staff.

The relationship between the IMSD and the ODL has been extremely beneficial for the Library. Thus far, the centralized computer center has provided fast and excellent service at a minimum cost. Having a full-time data processing coordinator on the staff of the library has negated the communication barrier which so often exists between a computer service center and a user library.

COST

Cost figures for use of MARC are very difficult to find. Few of the MARC I participants (3) give anything but a fleeting reference to cost. The reason is clear: cost figures are difficult to determine and even more difficult to evaluate meaningfully.
Table 2 is a breakdown of the charges to the Department of Libraries for programming and machine time; it does not include Department of Libraries' staff time or overhead costs. The figures are accurate through the end of February 1970.

Table 2. Costs

System design .................................................. $1,102.00
Programming ....................................................  2,467.00
Machine cost for program testing and debugging, and machine
and operator cost for merging through 2/28/70 ..................  2,026.00
Total .......................................................... $5,595.00

For the first year, the Department of Libraries is absorbing all the costs of merging and maintaining the MARC master file, as well as the costs of all programming, as a form of state aid to libraries. The machine costs of comparing a finder tape with the master file, copying the desired records, and printing the various hard-copy lists are being absorbed by the user library. The user also supplies the two blank tapes which are needed for each run. The machine time costs are based on the rate of $80.00 an hour of CPU time.

PLANS FOR THE FUTURE OF THE STATE-WIDE MARC MASTER FILE

Two major problems are apparent in the system as it is now set up. The system was initially created as a sequential tape system because this was the easiest and quickest way to establish a working system, and because it was felt that this would be practical for at least the first year of operation. One problem is that the sequential file will become expensive to maintain and does not allow direct access to a particular record without a sequential search. Another problem is that the present system allows entry into the file only by LC card number and does not allow entry directly with bibliographic information.
In accordance with present plans, in March 1970 work will begin on converting the storage medium from tape to a direct access device (disk or data cell), as the RECON Study suggests (5). At that time the file will cease to be maintained in LC card number order and will be maintained in the order in which the records are received from the Library of Congress. Various indices to the MARC data base will be produced; author and title indices will enable the data base to be searched by bibliographic information when the LC card number is not known. In this way, only the indices (which would be comparatively much smaller), and not the complete data base, would have to be merged and searched. In terms of the data base itself, this will be the next major change.

In the long run, it will be desirable for libraries that want access to the MARC data base to have such access directly via terminals. At the present time, the cost of this kind of access is not worth the increased speed of access, nor is the money presently available; however, in the future, the cost of such a system will surely be reduced by technological improvements and the increased importance of instantaneous access to the data base. When need balances with cost, such a set-up will be feasible.

The geographical expansion of the system is a possibility. Economically, this is most desirable, because the more ways the cost of maintaining the data base is split, the cheaper it is for all involved. Some preliminary investigation along these lines with bordering states is being made, and hopefully at some time in the future there will be a regional data base which many libraries can use.

PLANS FOR FUTURE COOPERATIVE USE OF MARC

The cooperative use of MARC thus far in Oklahoma affects only the larger libraries which have access to computers and automation personnel. Essentially, each library is autonomous and is free to use MARC in any manner it wishes.
It will remain true in Oklahoma that individual libraries will always be free to use the data base, and to retrieve part or all of it for any purpose. However, plans are under way for more cooperative use of MARC that would result in useful hard-copy products for libraries that do not have automation capabilities. Two such cooperative plans have been proposed for immediate implementation.

The first of these is a current awareness service. Selected subjects would be compared against the data base on a bi-weekly (or other periodic) basis, and complete bibliographic information for books representing the selected subjects would be printed as a personalized current awareness service. For example, all law titles on the MARC tapes for two weeks could be pulled and listed, and the listing distributed to the county and state law offices, attorney firms, the Law School Library, etc., for selection and order purposes. The same could be done with library science or any other subject. Subject lists of interest to various agencies of state government could be produced and sent to them. Another possibility is a profile of a legislative session by subject, followed by weekly or monthly lists of current materials available on these subjects, for ordering by the Department of Libraries and possibly for distribution to the legislative members. There are many possible uses for such a system, and they could be implemented fairly inexpensively. Work began on this project in October 1969, and the service became operational on a cost basis in February 1970.

Processing of MARC Tapes / BIERMAN and BLUE

A second possibility is catalog card and processing aids production. This would probably be done as a pilot project with several libraries throughout the state and then, if successful, expanded to any library in the state wanting to use the service.
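The current awareness service described above amounts to matching each subscriber's subject profile against the period's new MARC records. A minimal present-day sketch, with invented field names standing in for the actual MARC tag structure:

```python
def current_awareness(profile, new_records):
    """Select citations matching a subscriber's subject profile.

    profile: set of subject terms of interest to one subscriber.
    new_records: this period's records, as dicts with 'subjects'
    (a list of terms) and 'citation' (printable bibliographic line).
    Returns the citations to print and distribute.
    """
    hits = []
    for rec in new_records:
        # a record qualifies if it shares any subject with the profile
        if profile & set(rec["subjects"]):
            hits.append(rec["citation"])
    return hits
```

Running this once per tape shipment, with one profile per subscribing agency, is the whole of the service; the expense is clerical rather than computational, which is why the article can call it fairly inexpensive.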
Catalog card sets would be provided, with subject headings printed at the top and with call numbers printed if the library accepts LC or LC-Dewey classification (there would be several options available within the system), along with spine labels and book and circulation card labels. A by-product of such a state-wide operation would be the maintenance of book location information in machine readable form in a central place, for future use as the basis of a machine readable state-wide union catalog.

A project not in the immediate future, but certainly being considered, is cooperative retrospective conversion. That is, several libraries in the State would like to have bibliographic information in MARC format for all books in their collections. Whether the Department of Libraries would go ahead with such an ambitious project or wait for it to be done nationally (RECON Study) would depend on timeliness on the national scene, need on the local scene, and available financial resources. Eventually, Oklahoma would like to have in machine readable form a complete union catalog of the entire library resources of the State that could be used for cooperative acquisitions programs, for strengthening subjects which are weak within the State, and as a location tool for interlibrary loan. Such a data base could later be used for reference functions as well. Needless to say, such an ambitious project is not in the immediate future.

CONCLUSION

Early in the game, Oklahoma libraries learned that the most economical means to library automation was cooperative automation. The creation of a state-wide MARC data base is an important step toward cooperative library automation, while still allowing each local library to maintain its individuality in its uses of the data. Many areas of cooperation still remain untouched.
The future success of library automation in Oklahoma lies in the imaginative and creative projects that could be designed and implemented cooperatively, to the mutual cost savings and benefit of all.

PROGRAMS

Copies of the programs mentioned in this paper may be obtained from the National Auxiliary Publications Service of ASIS as follows: 1) "A Program to Merge All MARC II Tapes Received from the Library of Congress onto a Single Tape" (NAPS 00815); 2) "A Program to Drop Given Records or to Transfer Them to a Separate Tape" (NAPS 00816); 3) "A Program to Print MARC Tapes in Readable Form" (NAPS 00817); 4) "A Program to Pull Selected Records from the MARC Master Tape for a Single Library" (NAPS 00818); and 5) "A Program to Print a Listing of All Library of Congress Card Numbers on a Given MARC Tape" (NAPS 00819).

REFERENCES

1. Nugent, William R.: "NELINET: The New England Library Information Network." A paper presented at the International Federation for Information Processing, IFIP Congress 68, Edinburgh, Scotland, August 6, 1968. (Cambridge, Mass.: Inforonics, Inc., 1968), 4 pp.
2. Pulsifer, Josephine S.: "Washington State Library." In Avram, Henriette D.: The MARC Pilot Project; Final Report on a Project Sponsored by the Council on Library Resources, Inc. (Washington: Library of Congress, 1968), pp. 149-165.
3. Avram, Henriette D.: The MARC Pilot Project; Final Report on a Project Sponsored by the Council on Library Resources, Inc. (Washington: Library of Congress, 1968), pp. 89-183.
4. Avram, Henriette D.; Droz, Julius R.: "MARC II and COBOL," Journal of Library Automation, 1 (December 1968), 261-72.
5. RECON Working Task Force: Conversion of Retrospective Catalog Records to Machine-Readable Form; A Study of the Feasibility of a National Bibliographic Service. (Washington, D. C.: Library of Congress, 1969).

DESIGN OF LIBRARY SYSTEMS FOR IMPLEMENTATION WITH INTERACTIVE COMPUTERS

I. A. WARHEIT: Program Administrator, Information Systems Marketing, International Business Machines, San Jose, California

In the development of library systems, the movement today is toward the so-called "total" or integrated system. This raises certain design and implementation questions, such as: what functions should be on-line, real time, and what should be done off line in a batch mode; should one operate in a time-share environment or is a dedicated system preferred; is it practical to design and implement a total system, or is the selective implementation of a series of applications to be preferred? Although it may not be feasible in most cases to design and install a total system in a single operation, it is shown how a series of application programs can become the incremental development of such a system.

Currently library mechanization is entering a new phase. The first phase, extending from 1936 to the mid-fifties, saw the development of a number of small, scattered, and essentially experimental Automatic Data Processing (ADP) library applications. These were punch card systems for purchasing, serials holdings lists and circulation control. During the second phase, which has now been running about 15 years, a large number of library applications have been mechanized. These include the production of catalog cards, book catalogs, periodical check-in, serials holdings, circulation control systems, acquisitions programs and the searching of files, or information retrieval. Systems librarians have been busy designing individual programs, building special computer stored files, implementing conversion of records and developing operating procedures for these various applications. More importantly, they have been studying the library from a systems point of view in order to have a better understanding of the individual tasks performed and how they can be best accomplished with the available tools.
At first concern was limited to individual applications in the library. Gradually some of the more perceptive systems analysts began to be concerned about integrating these various applications. Some simple examples are the generation of book cards for process control and circulation control as a by-product of the order-receiving cycle; the combination of subscription renewal, claims, and binding control with the serials holdings program; the development of authority lists in book catalog programs; the simultaneous updating of accession files and circulation control files, etc. The purpose of many of these partially integrated programs was to reduce redundancy and make multiple use of single inputs.

The next step was to look at the library as a whole and consider it as a "total" or single, integrated system. Rather than building a series of independent application programs, a number of libraries began to plan total systems in which the individual applications would be integrated segments. In the past year or two such efforts have been undertaken by the University of Chicago, Stanford University, Redstone Arsenal, the National Library of Medicine, Washington State University, the University of Toronto, System Development Corporation, IBM and others (1, 2, 3, 4, 5, 6). It is this total systems concept which is the new and current development of library Electronic Data Processing (EDP).

At first, a total integrated system was conceived as a series of separate application programs utilizing separate files, but whose records have similar formats and field designators allowing for the multiple use of single inputs. A more advanced concept, however, calls for the construction of a single logical file, even though, physically, the individual record elements may be distributed over a number of tracks and storage devices.
Operating on this central file are a series of program modules performing functions involving file building, searching, computation, display and printing. As each application is called for (that is, as the librarian prepares an order, receives an invoice, checks in a periodical, adds a call number, does some cataloging, charges out a book, etc.) the appropriate program functions are called into use. Attached to the file are a number of indexes or access points. One such program, for example, provides some eighteen indexes: author, permuted title, subject heading, descriptor, call number, invoice number, publisher, serial ID, LC card number, borrower, etc.

It is not just coincidental that the total integrated library system emerged at the same time that computer hardware became available which made it practical, especially in an economic sense, to operate a total library system. One of the basic elements of this hardware was the development of real-time, on-line, terminal-oriented, time-shared systems. At present, orders for on-line systems are increasing at such a rate that the June 23, 1969, EDP Weekly estimates "that half of the computers installed by 1975 will be on-line systems." Although there are a number of reasons why on-line, time-sharing and terminal-oriented equipment made it feasible to build total library systems, the fundamental ones were that librarians could now interact with their system and records and could, essentially simultaneously, perform a great variety of tasks.

The scientific and business communities have been quick to take advantage of these new capabilities. A number of computer manufacturers, software firms and service companies soon started to provide terminal-oriented, commercial time-share services.
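The single-logical-file idea described above, one record store reached through many access points shared by every application, can be sketched in miniature. This is a present-day illustration; the class, the record fields, and the three indexes shown (a small subset of the eighteen mentioned) are all assumptions.

```python
class CentralFile:
    """One logical file with multiple access points (indexes).

    Every application (ordering, cataloging, circulation) reads and
    writes the same records through the same indexes, rather than
    keeping its own separate file.
    """

    def __init__(self):
        self.records = {}  # record id -> full bibliographic record
        self.indexes = {"author": {}, "call_number": {}, "borrower": {}}

    def add(self, rec_id, record):
        self.records[rec_id] = record
        # post the record under every access point it carries
        for field, idx in self.indexes.items():
            key = record.get(field)
            if key is not None:
                idx.setdefault(key, []).append(rec_id)

    def find(self, field, key):
        """Retrieve full records through any access point."""
        return [self.records[i] for i in self.indexes[field].get(key, [])]
```

Because each index returns record identifiers rather than copies of the data, the full record remains available to every application, which is what makes the added services discussed later (such as holds across all copies) practical.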
By the beginning of 1969 there were some 35 such services in existence, serving over 10,000 customers; by the end of 1969 it was estimated there would be over 30,000 users. Although these systems are often used essentially for remote job entry, their main attraction for users has been their on-line, conversational, real-time capabilities. The interactive, man-computer techniques made possible by commercial time-sharing services have been extremely valuable for problem solving applications, especially engineering and programming. However, the wide availability of text editing packages has also opened up these services for libraries. One of the first academic libraries to use such a service for preparing bibliographic records was the State University of New York at Buffalo (7, 8).

Many universities and industrial firms have developed their own time-sharing systems. A number of special libraries, notably those in IBM, were quick to take advantage of their in-house, time-share systems to implement acquisitions, catalog input and library bulletin programs (9). The Defense Documentation Center over three years ago began preparing its bibliographic inputs on line. The SUNY Biomedical Network based in Syracuse does the same (10). The Washington State University library was one of the first academic libraries to implement an on-line acquisitions program (11), and Midwestern University (12) and Bell Laboratories (13) now have on-line circulation control systems.

With the advent of time-shared, on-line capabilities and the potentiality of building total, integrated systems, librarians who are planning EDP systems today are faced with a number of design decisions: 1) Should the system be a real-time, on-line system or an off-line, batch mode operation, or a combination of both? 2) Is it desirable to operate in a time-share environment or is a dedicated system to be preferred?
3) Should one design a total, integrated system or should one selectively implement a number of individual applications? 4) If the decision is for an integrated system, how can it be incrementally implemented?

It is recognized that a program must be tailored to fit the available resources and that it is not always possible to build an ideal system. Nevertheless, design objectives must be established even though they cannot be immediately realized. If the ultimate objectives are understood, then the program development will be orderly and later reconversions will be kept to a minimum. Therefore, even though the design objectives may not be achieved for a number of years, they should be established so that current implementation can be carried out in a rational manner, with some assurance that the system will grow and develop.

REAL TIME OR BATCH

Library operations have always involved a variety of interactive real time and batch mode procedures. Most operations dealing directly with the library patrons are, of course, in real time; reference question handling and charging of books are typical examples. Some technical processes, such as cataloging and searching for acquisition, are also essentially interactive, real-time operations. This means that the librarian completely processes each item by creating or updating a record or servicing an inquiry, one at a time, with little or no attempt to batch the identical operations for a number of items or inquiries. Other processing, however, such as preparing and mailing orders to vendors, sorting and filing charge-out cards, sending overdues, filing into the catalog, checking in periodicals, labeling, preparing binding, etc., is essentially done in the batch mode. In other words, batch and real-time operations complement each other, for whereas it is more effective to do some operations in real time, batching is more effective for other operations.
Librarians, therefore, expect and need both modes of operation. The actual distinction between these two modes is often lost in certain mechanized systems where everything is done in a non-interactive batch mode while interactive, real-time services are provided from printouts. Many current library mechanized systems are really nothing more than processing techniques for producing the standard, hard-copy, bibliographic tools such as catalog cards, serials lists, book catalogs, orders, overdue notices and the like. Whenever the librarian wants to use the information generated by these programs, he consults the hard-copy files or lists. He does not interrogate a computer file directly.

This approach has been typical of many other computer-based information systems. When the first direct access devices (RAMAC) were made available for commercial and industrial inventory control, they were used primarily to update the records and to produce the inventory lists and card files which the user would consult for information. Later, as confidence developed in the machines, and terminals became available, the print-out lists and files were abandoned and the user began consulting the computer store directly.

Typically today in libraries using computer systems, inputs are processed in batches and outputs are produced in batches. Real-time services are provided from the print-outs: the catalogs, the on-order file, serials lists and so on. Even circulation control has been an off-line, batch operation. Although the charge-out may be made through a data entry unit, all that is actually accomplished at the time is that the transaction is recorded. It is only later that the transactions are batched and processed, the files set up for the loans, the discharges pulled from the file and the delinquencies handled.
Although librarians will not, in the immediate future at least, as readily give up their card catalogs and printed lists as business and industry are doing, and as some enthusiasts believe librarians will (14, 15) (the queuing problem alone where the public must use the files would be very severe), some hard-copy files could be dispensed with in an on-line system. Certainly hard-copy files of circulation records, periodical check-in records, authority lists, on-order records and the like need not be maintained when these files are available via terminals.

Until now, practically all library machine processing, with a few exceptions, has been batched, off line and not interactive. In a non-interactive system, records are created and modified by manual preparation of work sheets followed by keypunching for data entry. In a library environment, for example, this means that the acquisitions librarian fills out an order work sheet that is given to a keypunch operator, who either prepares a decklet of punch cards or punches a paper tape or makes a magnetic record on tape. The cards or tape are then fed into the computer, the input is edited and errors noted, and a proof copy is printed. The error messages and proof copy come back to the order librarian, who makes the necessary corrections. These are handed to the keypunch operator, who corrects and updates the record and inputs it again into the computer. If the operator has not introduced some new errors, the record is then processed. If she has, the record loops back again to the order librarian. The same story can, of course, be told about catalog records, journal and report records, and so on.

In an interactive on-line system, the originator of the information (in this example the order librarian) could key his data directly into the computer or could prepare a work sheet for operator input.
The editing would occur at once, the terminal responding to each entry, and verification or error messages would be returned immediately. The librarian or operator would enter the necessary corrections and, upon acceptance of the record by the system, would signal entry of the record into the file and the print queues as required. Also, during the preparation of the entry, the librarian would be using the terminal (presumably a display type terminal) to consult the files he needs, such as the shelf list, orders outstanding, authority lists, etc.

A simplified flow chart comparison of an off-line and an on-line cataloging process would look something like that shown in Figure 1.

[Figure 1. Cataloging Process: Off-Line and On-Line. The off-line path cycles through worksheet, keypunch, edit, proof output and correction; the on-line path runs from cataloger input through immediate edit and error correction to the catalog.]

Although only a few library applications and no total library system are as yet on-line operations, a number of analogous operations are being carried out in other industries, such as order entry, inventory control, production scheduling, insurance policy information, freight waybilling, etc., so that one can make a few tentative assessments (16, 17, 18). To begin with, in an on-line system a work sheet does not have to be prepared, and so the keypunch operator is eliminated. Because of the interaction of the originator and the system, all corrections and editing are accomplished at once, so that the turn-around time is very much less. Preparation of printed error messages and proof copy is eliminated and the total error rate is greatly reduced. Thus, although the reading-in of the individual records is slower in the on-line mode than in the batch mode, appreciably fewer messages need be read to complete a record in the on-line mode, making for more economical machine time.
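The on-line half of the comparison, in which each entry is edited the moment it is keyed and errors come back immediately rather than on a later proof listing, can be sketched as follows. The validation rules and field names here are invented for illustration only.

```python
# Illustrative field-edit rules; a real system would carry many more.
RULES = {
    "lccn":  lambda v: bool(v.strip()),                  # must be present
    "title": lambda v: bool(v.strip()),                  # must be present
    "price": lambda v: v.replace(".", "", 1).isdigit(),  # numeric amount
}

def enter_record(fields):
    """Edit a keyed record at once, as an on-line terminal would.

    fields: dict of field name -> entered value.
    Returns (accepted, errors); in a live session each error would
    prompt immediate re-entry of that field, with no batch proof cycle.
    """
    errors = {f: "invalid" for f, rule in RULES.items()
              if not rule(fields.get(f, ""))}
    return (not errors), errors
```

The contrast with the off-line flow in Figure 1 is that the correction loop closes in seconds at the terminal instead of circulating paper between the librarian and the keypunch operator.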
To this, however, must be added terminal and communication costs, as well as the terminal supervisor program and the fact that most on-line work is done during the prime shift, so that actual machine costs tend to be higher with the on-line system. Some, however, dispute this, claiming that, on balance, machine costs are equal. Labor costs, however, are very much lower with the on-line system. As a general rule, computer input costs are 85% labor and 15% machine.

Not only can a transcription clerk be eliminated, but the order librarian who prepares the original inputs at the terminal works very much more efficiently. Consulting hard-copy files and lists is more time consuming and less informative than interrogating machine files. In an on-line system, the librarian's necessary tools are brought directly to him and displayed rapidly and efficiently. He does not have to walk to the shelf list, the catalog or the on-order file and copy information. In a well developed, sophisticated system some of the heavily used tools, such as the subject heading authority lists and class tables, would also be available from the terminal. Not only does the librarian not have to spend time going to the physical files, but since the information is computer stored, it is brought to him in a greater variety of forms and sequences than is available in the hard-copy files. For example, titles are fully permuted so that incomplete title information can be searched. Some systems librarians are proposing the use of codes and ciphers to search for entries, especially those with garbled titles (19, 20). All entries, including added authors, editors, vendors, etc., are immediately available even for uncataloged on-order items, so that searching is not restricted to main entries. It is not surprising, therefore, that clerks preparing computer inputs prefer working on line rather than off line.
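The fully permuted title index mentioned above, in which every significant word of a title becomes an entry point so that incomplete title information can still be searched, can be sketched briefly. The stop-word list is an assumption made for illustration.

```python
# Words too common to serve as useful entry points (assumed list).
STOP = {"the", "a", "an", "of", "and", "in", "on"}

def permute_titles(titles):
    """Build a permuted title index: each significant word of each
    title points back to the full title it came from."""
    index = {}
    for title in titles:
        for word in title.lower().split():
            if word not in STOP:
                index.setdefault(word, []).append(title)
    return index
```

A searcher who remembers only "library" or only "systems" still reaches the full title, which is exactly the advantage over a hard-copy file ordered by first word of main entry.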
One interesting discovery is that since operators can do so much more with on-line systems, they tend to take more time to turn out a better product. Indications are "that significantly lower costs would have resulted if the time-sharing users had stopped work (i.e. gone to the next task) when they reached a performance level equal to that of batch users" (17).

Even with a circulation control system, there is higher system efficiency with an on-line operation. Every transaction, such as a charge-out or a discharge, is an actual inquiry into the file as to the status of the book and borrower, and the answer is immediately available; therefore controls and audit procedures can be simpler. Elaborate error correction routines do not have to be provided in the program to identify improper inputs, as has to be done with an off-line system. Incorrect loans are not made of restricted material, such as holds and reserves, or to delinquent borrowers. The system also acts as a locator tool for determining the location and availability of volumes. As a final note, on-line systems are necessary if effective networks are to be developed and decentralized services provided (21, 22).

The basic conclusion is that an on-line system can handle more work and provide more services at greater machine costs but lower labor costs than a manual or an off-line machine system. In view of the fact that machine costs are coming down rapidly, while labor costs and throughput demands are forever rising, the future of the on-line machine system in the library looks very promising.

TIME SHARE OR DEDICATED SYSTEM

A number of librarians have had very unhappy experiences with data processing departments over which they had no control. Machines have been changed, schedules dropped, library jobs delayed or dropped for "higher priority jobs," and so on. One tendency, therefore, has been to try to get a library's own computer facility.
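The on-line charge-out just described, in which every loan is itself an inquiry into the file so that holds and delinquent borrowers are caught at the desk rather than by a later batch run, can be sketched as follows. The status values and record fields are invented for illustration.

```python
def charge_out(book, borrower):
    """Attempt a loan as an immediate inquiry into the central file.

    book: record with 'status' (e.g. 'on shelf', 'hold', 'reserve').
    borrower: record with 'id' and optional 'delinquent' flag.
    Returns a result string; a refused loan never reaches the file.
    """
    if book["status"] in ("hold", "reserve"):
        return "refused: restricted material"
    if borrower.get("delinquent"):
        return "refused: delinquent borrower"
    # the charge is recorded at once, so the file always shows the
    # current location and availability of the volume
    book["status"] = "on loan"
    book["borrower"] = borrower["id"]
    return "charged"
```

Because the check and the update happen in one transaction, no separate error-correction pass over a batch of recorded charges is needed, which is the simplification of controls and audit procedures the text points to.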
But, as De Gennaro so succinctly summarizes it, "the economics of present day computer applications in libraries make it virtually impossible to justify an in-house machine of the capacity libraries will need dedicated solely or largely to library uses ... Eventually, library use may increase to a point where the in-house machine will pay for itself, but during the interim period the situation will be uneconomical unless other users can be found to share the cost. In the immediate future, most libraries will have to depend on equipment located in computing or data processing centers ... Experience at the University of Missouri suggests the future will see several libraries grouping to share a machine dedicated to library use ... it seems reasonable to suppose that in the next few years sharing of one kind or another will be more common than having machines wholly assigned to a single library ..." (23).

It is true that small computers are getting more powerful, and it is quite possible the day will come when small stand-alone computers will have the capacity to do all the jobs required by the library. For the time being, however, an on-line system supporting a number of terminals for a variety of tasks in the library requires a computer of a size which cannot be economically justified except for the very large libraries. Also, one thing that is often overlooked is that implementing a large library system requires data processing technical support that is very seldom available on the library's staff. One need only look at the Information Systems Office of the Library of Congress, or the System Analysis and Data Processing Office of the New York Public Library, to have some appreciation of the requirements for such technical support. Also, a large central system often has backup capabilities which provide insurance against breakdowns and interruptions.
The question really is not whether a library should time share or have a dedicated system, but rather whether or not the library has the necessary control over its segment of the total system. This segment is the library's property, and its services are available to the library as set forth in the agreement made when the library became part of the data processing services.

Again, it must be emphasized that all this applies to systems which have to perform all library functions. Most libraries, however, in order to get started and develop their programs, are beginning with small, stand-alone computers or are submitting batch jobs to a data processing center. Later, as their programs develop, they will have to upgrade their computer capabilities. In view of the ultimate needs of a system which will support most of the major processing functions of a library, most libraries will have to have access to computer facilities whose full support they cannot economically justify. Time sharing, certainly for the immediate future, will be required for any on-line library system.

TOTAL INTEGRATED SYSTEM OR INDIVIDUAL APPLICATION

It is more economical to handle a variety of library applications by using a single file and a standard set of functional programs than it is to provide a separate file and a separate set of application programs for each application. Not only is it more economical, but this total, integrated approach is, in its essential modularity, extremely flexible. Functions can be added, changed, or removed, and sequences can be re-ordered, so that the system can grow and change with changing needs and capabilities. Also, since the full record is available, if needed, for every application, added services, normally not feasible, become practical.
For example, a circulation control system that keeps charge records in its central bibliographic file, instead of having separate circulation files, can set a hold on all copies of a book, no matter where the copies are kept, as in the BELLREL system (13). Also, from a total record one can select various subsets and make different orderings to provide a variety of services.

The library systems currently being designed are essentially mechanized versions of existing manual systems. However, as experience is gained with these new systems, as more advanced equipment is made available, and as research and development provide new insights, these systems will evolve and change. For example, in some cases a major part of descriptive cataloging is becoming a part of acquisitions. The former compartmentalization in libraries is already breaking down. One should, therefore, be prudent and not lock up the system into tightly compartmentalized segments on the assumption that current file subsets will remain unchanged. It is advisable that each library activity have potential access to all system functions and to all records. In the present context, an activity may have no need for all functions, nor does it need the total record, but as the system develops it might very well need these added capabilities.

The problem, however, is that for a total, integrated system one must first build a complete structure, including the file and all the functions (file building, search, compute, compare, display, print, etc.), as well as set up all the access points, which are essentially indexes. In addition, all the overhead necessary for supervising the programs, managing the files, and monitoring the terminals must be provided for. To use an analogy, one must first build the foundation, walls and roof and install all plumbing and wiring before building any rooms.
Consequently, the start-up or initial investment is far higher than for implementing a single application program. Some who have undertaken the development of total systems did not fully appreciate this at first and have, as a result, had to replan their development programs.

Even if one could bring in a fully debugged program for a total system, there would still be the tasks of converting records, training staff, setting up operating manuals and working out procedures. Only as machinable records became available and the file grew and developed could various applications become operable. From a practical point of view, the implementation of a total system would have to be incremental; that is, once the basic system is installed, applications would have to be implemented one at a time and in some rational order. This is even more true where the programs for a total system have not yet been written, or where the library's resources are such that it can only undertake one job at a time. From a practical point of view, one can develop and implement only one application at a time. Furthermore, as is often the case, the available equipment is limited and cannot do everything the library will ultimately want. It is necessary, therefore, to develop single applications and to design them in such a way that they can become part of an integrated system. It is also necessary to have a strategy and a plan to move up through the various levels of mechanization.

Today there are many who, although accepting a total, on-line system as a desirable goal, feel that it is impractical to consider because of costs and the unavailability of equipment. A full analysis of economic change in terms of wage-cost rise and machine-cost decrease, of technologic improvement and of demand for added services goes far beyond the limits of this paper. There is developing, moreover, a literature on these subjects (24, 25, 26, 27, 28, 29).
Suffice it to say that an increasing number of librarians are becoming convinced that library mechanization is inevitable, that it will affect all operations of the library, that it will provide the highest level of service through direct, on-line, interactive systems and that, whatever today's limitations may be, these changes are coming so fast that plans must be made now. These individuals are also convinced that whatever is now undertaken in the way of mechanization will evolve into an integrated system with many basic functions operated in a real-time, on-line mode.

IMPLEMENTATION OF AN INTEGRATED LIBRARY SYSTEM

Typically a library mechanization project will start with a single, relatively uncomplicated application that will not impact library operations very much, will require only a small amount of systems design and programming, and will run in batch mode on a small equipment configuration. A typical example is the preparation of a serials holdings list. From this first job, the librarian and his staff will become acquainted with data processing, will introduce the data processing personnel to some library requirements and will, hopefully, begin to develop procedures for working with the computing center. Having passed this introductory stage, many librarians continued, as a rule, simply by developing the next application. Today, however, the more prescient ones first assess the total impact of mechanization and, having decided that their library will be mechanized, try to determine their foreseeable goals, then work out a plan to achieve them. Having decided that the ultimate goal is a total integrated system for the whole library, which will provide real-time services and therefore must operate on line, the library planner will set priorities and work out a strategy to reach these goals. In some instances he can start designing a total system.
In other situations, he does not have the resources to do so, but plans to make use of programs being developed for other libraries, of so-called standard commercial packages, or of programs which may be developed jointly with other libraries. He should realize that he cannot just sit and wait for D-day, when a total, complete program will be wheeled in and a turnkey operation installed overnight. The lead time necessary for planning, training, conversion and installation is too often grossly underestimated, so that these preliminary preparations are neglected to the detriment of orderly growth and development. Having established certain long-range goals, the librarian will tailor his current programs so that the library system will develop as smoothly as possible. He will try to keep the various subsystems and program segments as generalized and as modular as possible. He will structure his records so that they can ultimately be fitted together into a full bibliographic record. He will try to avoid using records so truncated that they will have to be discarded and recorded again later. He may, in fact, actually start with a full record that is comparable to his present shelf list or catalog card, even though there may be no need of the whole record for the current application. He will provide for a variety of print options, such as line width, number of lines, number of columns, etc., so that a separate print program will not have to be written for each product or to accommodate every change in style. He will try to organize his files so that the file structures and the record formats will not have to be radically changed when the system goes on line. He may store some of his records (his active on-order file, for example) on direct access storage devices. If he can, he will create access points to his large bibliographic file and store them on disk files too, even though he is currently operating off-line.
Such direct access storage of indexes makes economic sense when very large files must be searched or sorted, and library files are large and grow very fast. Aside from these immediate benefits, such a file organization requires little or no restructuring or record reformatting when the system ultimately goes on line and becomes terminal oriented. As early as possible, he will put his circulation control system on line. This is by far the cheapest and easiest on-line operation, requiring the least investment and yet producing the most immediate benefits. Again, beyond the immediate gains, this on-line operation represents an important building block for the ultimate total system. Aside from the current improved services, the experience of working on line and the opportunity to develop and refine processes and procedures will pay important dividends in the design and implementation of the total on-line system. With knowledge of how he wants his system to develop, the librarian is now able to establish priorities and allocate his resources. The emphasis will be on file building, on capturing the record. Acquisitions programs or circulation control systems will come first. Work on the display terminal and communication will come later, after searchable files have been built up. In other words, an attempt is made to have a controlled growth through several levels of mechanization. A start is made with a simple, off-line, batch job. Then a beginning is made on building what is to become the main, central bibliographic file, the catalog. As soon as possible, parts of it are stored on direct access devices, so that it can be used more effectively and so that its structure will conform to the requirements of an ultimate on-line system. A simple on-line process is adopted as soon as feasible. Each application program uses standard functional modules in macro form, and so on.
All this, of course, is highly oversimplified and may seem truistic to many. Nevertheless, there has been too much evidence of programs undertaken without adequate planning and of programs that have lacked continuity because adequate guidelines were not established. Such failures are too often ascribed to changes in personnel or hardware. A project should be designed so that inevitable changes in personnel and hardware can be tolerated without its being wrecked. Therefore, the establishment of long-range goals can have a profound effect on the shape and success of current operations. More and more librarians and systems personnel engaged in library projects are beginning to think in terms of total integrated systems. They are looking ahead and planning. They are designing and implementing their present applications not in a simple ad hoc way but as part of what is to become a total system.

REFERENCES
1. Alexander, R. W.: "Toward the Future Integrated Library System," 33rd Conference of FID and International Congress on Documentation (Tokyo: 1967).
2. Redstone Scientific Information Center: Automation in Libraries (First Atlis Workshop), 15-17 November 1966 (Huntsville, Ala.: Redstone Arsenal, June 1967). Report RSIC-625.
3. Black, Donald V.: "Library Information System Time Sharing: System Development Corporation's LISTS Project," California School Libraries (March 1969), 121-6.
4. Black, Donald V.: Library Information System Time-Sharing on a Large, General Purpose Computer (System Development Corporation Report SP-3135, 20 September 1968).
5. Bruette, Vernon R.; Cohen, Joseph; Kovacs, Helen: An On-Line Computer System for the Storage and Retrieval of Books and Monographs (Brooklyn, New York: State University of New York Downstate Medical Center, 1967).
6. Fussler, Herman H.; Payne, Charles T.
: Development of an Integrated Computer-Based Bibliographical Data System for a Large University Library (Chicago: Chicago University, 1968). Clearinghouse Report PB 179 426.
7. Balfour, Frederick M.: "Conversion of Bibliographic Information to Machine Readable Form Using On-Line Computer Terminals," Journal of Library Automation, 1 (December 1968), 217-26.
8. Lazorick, Gerald J.: "Computer/Communications System at SUNY Buffalo," EDUCOM, The Bulletin of the Interuniversity Communications Council, 4 (February 1969), 1-3.
9. Bateman, Betty B.; Farris, Eugene H.: "Operating a Multilibrary System Using Long-Distance Communications to an On-line Computer," Proceedings of ASIS, 5 (1968), 155-62.
10. Pizer, I. H.: "Regional Medical Library Network," Medical Library Association Bulletin, 57 (April 1969), 101-15.
11. Burgess, T.; Ames, L.: LOLA Library On-Line Acquisitions Sub-System (Pullman, Wash.: Washington State University Library, July 1968).
12. Reineke, Charles D.; Boyer, Calvin J.: "Automated Circulation System at Midwestern University," ALA Bulletin, 63 (October 1969), 1249-54.
13. Kennedy, R. A.: "Bell Laboratories' Library Real-Time Loan System (BELLREL)," Journal of Library Automation, 1 (June 1968), 128-46.
14. Licklider, J. C. R.: Libraries of the Future (Cambridge, Massachusetts: M.I.T. Press, 1965).
15. Swanson, Don R.: "Dialogues with a Catalog," Library Quarterly, 34 (January 1964), 113-25.
16. Brown, Robert R.: "Cost and Advantages of On-Line DP," Datamation, 14 (March 1968), 40-3.
17. Gold, Michael M.: "Time-Sharing and Batch-Processing; an Experimental Comparison of their Values in a Problem-Solving Situation," Communications of the ACM, 12 (May 1969), 249-59.
18. Sackman, H.: "Time Sharing versus Batch Processing: The Experimental Evidence," AFIPS Conference Proceedings, 32, 1968 Spring Joint Computer Conference, 1-10.
19.
Nugent, William R.: "Compression Word Coding Techniques for Information Retrieval," Journal of Library Automation, 1 (December 1968), 250-60.
20. Ruecking, Frederick H.: "Bibliographic Retrieval from Bibliographic Input; the Hypothesis and Construction of a Test," Journal of Library Automation, 1 (December 1968), 227-38.
21. Grosch, Audrey N.: "Implications of On-Line Systems Techniques for a Decentralized Research Library System," College & Research Libraries, 30 (March 1969), 112-18.
22. Rayward, W. Boyd: "Libraries as Organizations," College & Research Libraries, 30 (July 1969), 312-26.
23. De Gennaro, Richard: "The Development and Administration of Automated Systems in Academic Libraries," Journal of Library Automation, 1 (March 1968), 75-91.
24. "The Costs of Library and Informational Services." In Knight, Douglas M.; Nourse, E. Shepley, eds.: Libraries at Large (New York: R. R. Bowker, 1969), 168-227.
25. Cuadra, Carlos A.: "Libraries and Technological Forces Affecting Them," ALA Bulletin, 63 (June 1969), 759-68.
26. Culbertson, Don S.: "The Costs of Data Processing in University Libraries: in Book Acquisition and Cataloging," College & Research Libraries, 24 (November 1963), 487-89.
27. Dolby, J. L.; Forsyth, V.; Resnikoff, H. L.: An Evaluation of the Utility and Cost of Computerized Library Catalogs. Final Report, Project No. 7-1182, U.S. Department of Health, Education and Welfare, 10 July 1968. ERIC ED 022517.
28. Kilgour, Frederick G.: "The Economic Goal of Library Automation," College & Research Libraries, 30 (July 1969), 307-11.
29. Knight, Kenneth E.: "Evolving Computer Performance," Datamation, 14 (January 1968), 31-5.

METHODS OF RANDOMIZATION OF LARGE FILES WITH HIGH VOLATILITY

Patrick C. MITCHELL: Senior Programmer, Washington State University, Pullman, Washington, and Thomas K.
BURGESS: Project Manager, Institute of Library Research, University of California, Los Angeles, California

Key-to-address conversion algorithms which have been used for a large, direct access file are compared with respect to record density and access time. Cumulative distribution functions are plotted to demonstrate the distribution of addresses generated by each method. The long-standing practice of counting address collisions is shown to be less valuable in judging algorithm effectiveness than considering the maximum number of contiguously occupied file locations.

The random access disk file used by the Washington State University Library Acquisition sub-system is a large file with a sizable number of records being added and deleted daily. This file represents not only materials on order by the Acquisitions Section, but all materials which are in process within the Technical Services area of the Library. The size of the file currently varies from approximately 12,000 to 15,000 items, with a capacity of 18,000 items. Over 40,000 items are added and purged annually. Each record consists of both fixed-length fields and variable-length fields. Fixed fields primarily contain quantity and accounting information; the variable-length fields represent bibliographic data. Records are blocked at 1,000 characters for file structuring purposes; however, the variable-length information is treated as strings of characters with delimiters. The key to the file is a 16-character structure which is developed from the purchase order number. The structure of the key is as follows: six digits of the original purchase order number, two digits of partial order and credit information, and eight digits containing the computed relative record address. Proper development of this key turns out to be the most important factor in achieving efficiency in both file access time and record density within the file. The W.S.U.
purchase order numbering system, developed from a basic six-digit purchase order number, allows up to one million entries. Of these, the Library currently uses four blocks: one block for standing orders, one block for orders originating from the University after the system becomes operational, another block used by the systems people in prototype testing of the system, and a fourth block which was given to one vendor who operates an approval book program. In mapping a possible million numbers into eighteen thousand disk locations, there is a high probability that the disk addresses for more than one record will be the same. Disk location, also called disk address, home position, and relative record address (RRA) in this paper, refers to the computed offset address of a record in the file, relative to the starting address of the file. Currently, the file resides on an IBM 2316 disk pack which can store six 1000-character records per track. Thus if the starting address of the file is track 40, a record with RRA = 5 would have its home position on track 40, while a record with RRA = 6 would have its home position on track 41. It should be noted that routines in this system are required to calculate neither absolute track address nor relative track address, and therefore the file could be moved to any direct access device supported by OS/BDAM without program modification. When two records map into the same address, it is called a collision. For a WRITE statement under the IBM 360 Operating System, Basic Direct Access Method, the system locates the disk address generated and, if another record is found there, it sequentially searches from that point forward until a vacant space is found and then stores the new record in that space. The sequential search is done by a hardware program in the I/O channel and proceeds at the rotational speed of the device on which the file resides. The CPU is free during this period to service other users.
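The key layout, track mapping, and forward-search behaviour just described can be sketched in Python. This is an illustrative reconstruction, not the system's actual code (which the paper does not give); the helper names, and the in-memory list standing in for the disk file, are ours.

```python
def build_key(po_number: int, suffix: int, rra: int) -> str:
    """Assemble the 16-character key described in the text: six digits of
    purchase order number, two digits of partial-order/credit information,
    and eight digits of computed relative record address."""
    return f"{po_number:06d}{suffix:02d}{rra:08d}"

def track_of(rra: int, start_track: int = 40, per_track: int = 6) -> int:
    """Map a relative record address to a track, with six 1000-character
    records per track (IBM 2316), as in the text's example."""
    return start_track + rra // per_track

def bdam_write(slots: list, home: int, key: str, data) -> int:
    """Emulate the BDAM WRITE described: go to the home position and
    search forward sequentially until a vacant slot is found."""
    for i in range(len(slots)):
        pos = (home + i) % len(slots)
        if slots[pos] is None:
            slots[pos] = (key, data)
            return pos
    raise RuntimeError("file full")

def bdam_read(slots: list, home: int, key: str):
    """Emulate the matching search: compare keys from the home position
    forward; an empty slot ends an unsuccessful search."""
    for i in range(len(slots)):
        pos = (home + i) % len(slots)
        if slots[pos] is None:
            return None            # record not present
        if slots[pos][0] == key:
            return slots[pos][1]
    return None
```

Long forward searches of this kind are exactly what degrade terminal response time, which is why the choice of key-to-address algorithm matters so much.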
Similarly, when searching for a record, the system locates the disk address and matches keys; if they do not match, it sequentially searches forward from that point. Long sequential searches sharply degrade the operating efficiency of on-line systems. In initial experimentation with this file, it was discovered that some records were 2,500 disk positions away from their computed locations. This seriously reduced response time to the terminals which were operating against those records. The necessity to develop a method for placing each record close to its calculated location became quite obvious. However, the methodology for doing this was not as clear. The upper bound delay for a direct access read/write operation can be defined as the largest number of contiguously occupied record locations within the file. The problem of minimizing this upper bound for a particular file is equivalent to finding an algorithm which maps the keys in such a way that unoccupied locations are interspersed throughout the file space. One method for doing this is to triple the amount of space required for the file. This has been a traditional approach but is unsatisfactory in terms of its efficiency in space utilization. The method first used by the Library was motivated by the necessity to "get on the air." Its requirements were that it be easily implemented and perform to a reasonable degree. The prime modulo scheme seemed to qualify and was selected. With this algorithm, the largest prime number within the file size was divided into the purchase order number and the modulo remainder was used as an address; that is, RRA = [Po Modulo Pr], where RRA is the relative record address, Po is the purchase order number, and Pr is a prime number. During the initial period file size grew to about 8,000 records.
Because the Acquisitions Section was converting from its manual operation, the file continued to grow in size and the collision problem became pronounced. When the file reached about 70% capacity (that is, when 70% of the space allocated for the file was occupied by records), this method became unusable; records were then located so far from their original addresses that terminal response times became degraded and batch process routines began to show significant increases in run times. With no additional space available to expand the size of the file, it became necessary to increase the record density within the existing file bounds. Therefore an adaptation of the original algorithm was developed. In addition to generating the original number by dividing a prime number into the purchase order number and keeping the modulo remainder, the purchase order number was multiplied by 300 and divided by that same prime number to get an additional modulo remainder; the latter was added to the first modulo remainder and the sum then divided by 2:

RRA = ((Po Modulo Pr) + (300 x Po Modulo Pr)) / 2

Again this scheme brought some relief, but the file continued to grow as the system was implemented, and it became obvious that this procedure would also fail because of over-crowded areas in the file. A search of the literature, using W. B. Climenson's chapter on file structure (2) as a start, provided some other methods for reducing the collision problem (1, 3, 4, 5, 6). Several randomization or hashing schemes were examined. However, none of these methods appeared to be particularly pertinent to the set of conditions at Washington State. In order to bring relief from the continuing problem of file and program maintenance involved with changing the file-mapping algorithm, research was initiated to devise an algorithm which would, independent of the input data, map records uniformly across the available file space.
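The two division schemes can be written down directly. A minimal sketch (the paper does not state the prime actually used, so the values in the test below are illustrative):

```python
def rra_prime_modulo(po: int, prime: int) -> int:
    """First scheme: RRA = Po Modulo Pr."""
    return po % prime

def rra_adapted(po: int, prime: int) -> int:
    """Adapted scheme: RRA = ((Po mod Pr) + (300 * Po mod Pr)) / 2,
    with integer division since addresses are record offsets."""
    return ((po % prime) + ((300 * po) % prime)) // 2
```

Note that both schemes yield addresses in [0, Pr - 1], so choosing Pr as the largest prime within the file size keeps every home position inside the file.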
The algorithm which resulted utilizes a pseudo-random number generator, RAND (7), developed at the W.S.U. Computing Center (RANDL, Program 360L-13.5.004, Computing Center Library, Washington State University, Pullman, Washington). The normal use of RAND is to generate a sequence of uniformly distributed integers over the interval [1, M], where M is a specified upper bound in the interval [1, 2^31 - 1]. In addition to M, RAND has a second input parameter, N, which is the last number generated by RAND. Given M and N, RAND generates a result R. RAND is used by the algorithm to generate relative disk addresses by setting M to the size or capacity of the file, by setting N to the purchase order number of the record to be located, and by using R as the relative address of the record: RRA = RAND(Po, M). In order to test the effectiveness of this algorithm and others which might be devised, a file simulation program was written (BDAMSIM, Program 360L-06.7.008, Computing Center Library, Washington State University, Pullman, Washington). Inputs to this program are: a) an algorithm to generate relative record locations; b) a sequential file which contains the input data for "a"; c) various scalar values such as file capacity, approximate number of records in the file, title of output, etc. The program analyzes the numbers generated by "a" operating on "b" within the constraints of "c". The outputs of the program are some statistical results and a graphical plot showing the cumulative distribution function of the generated addresses. Figures 1, 2, and 3 show the plotted output of the three algorithms operating against the current acquisitions file. The abscissas of the plots show the relative record addresses (x 10^2).
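The evaluation BDAMSIM performs (place every key at its generated address, resolve collisions by forward search, then report statistics) can be approximated as below, together with the paper's figure of merit: the longest run of contiguously occupied locations. Python's built-in generator merely stands in for the W.S.U. RAND routine, whose internal arithmetic is not given here; only the idea of seeding with the purchase order number and drawing one value on [1, M] is preserved.

```python
import random

def rra_rand(po: int, capacity: int) -> int:
    """Stand-in for RRA = RAND(Po, M): seed a generator with the purchase
    order number and draw one integer uniform on [1, M].  Illustrative
    only; the original RAND's arithmetic differs."""
    return random.Random(po).randint(1, capacity) - 1   # 0-based offset

def longest_occupied_run(homes, capacity: int) -> int:
    """Place each record at its home address, searching forward on
    collision, then return the maximum number of contiguously occupied
    locations, the paper's figure of merit."""
    slots = [False] * capacity
    for home in homes:
        pos = home % capacity
        while slots[pos]:
            pos = (pos + 1) % capacity
        slots[pos] = True
    longest = run = 0
    for occupied in slots + [False]:    # sentinel closes the final run
        run = run + 1 if occupied else 0
        longest = max(longest, run)
    return longest
```

Running such a simulation over the actual key population is what lets one compare hashing schemes by worst-case forward-search length rather than by raw collision counts.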
[Figures 1-3: cumulative distribution plots of generated addresses; abscissas are relative record addresses (x 10^2).]
Fig. 1. RRA = Po Modulo Pr.
Fig. 2. RRA = ((Po Modulo Pr) + (300 x Po Modulo Pr)) / 2.
Fig. 3. RRA = RAND(Po, Pr).

implications of MARC, and the Library of Congress systems studies (this paper includes twenty-eight pages of appendices, mostly charts). Two additional papers include a discussion of the future of, and a tabulation of trends affecting, library automation. Much of the material in these non-survey papers is reported more completely elsewhere and some of it now seems dated. The material presented in this publication must have produced a highly effective educational institute in 1967. In 1969, its value is at best as a first reader in library automation but not as the state-of-the-art review the title proclaims. Charles T. Payne

Computers and Data Processing: Information Sources, by Chester Morrill, Jr. An annotated guide to the literature, associations, and institutions concerned with input, throughput, and output of data. Detroit: Gale Research Co., [1969]. 275 pp. $8.75. (Management Information Guide, 15)

This latest volume in the Management Information Guide Series should prove as useful as its predecessors, offering to those persons interested in or concerned with computers and data processing (and who now is not?) an organized and extensive survey of the basic and necessary sources of available information. Thus the text is for the most part an annotated bibliography of pertinent references arranged in broad categories, each category prefaced with a paragraph or two of comment. This is in the style of Mr.
Morrill's earlier contribution to the series, Systems and Procedures Including Office Management (1967), and, in general, that of all the volumes of the series. Section 7, "Operating," is the largest category, some forty pages of references subdivided into "Manuals," "Digital Computers," "Data Transmission," "FORTRAN," "Software" and the like. Section 9, entitled "Front Office References," is of particular interest to the reference librarian, since it serves as a guide to desirable dictionaries, handbooks and abstracting services in the fields of automation and data processing. Individual annotations are usually brief, informative and on occasion evaluative. They give evidence of considerable skill in the art of capsule characterization. The prefatory paragraphs and notes to each section characterize the particular topic as successfully and succinctly as do the individual annotations. The preface to Section 3, "Personnel," is particularly felicitous. Coverage is ample not only as to the subjects chosen but also as to numbers of references under individual subjects. An important thirty pages of appendices lists additional sources of information (associations, manufacturers, seminars, publishers, placement firms, etc.), particularly valuable to the businessman or government official as a desk or front-office reference book, although the librarian will also find it of value in providing specific information for his clientele. In all, this is a highly competent and very welcome addition to the Series as well as to the ranks of special reference sources so necessary to the proper practice of the reference librarian's art. I think of Crane's A Guide to the Literature of Chemistry and White's Sources of Information in the Social Sciences and consider the author quite comfortable in their company as well as in that of his colleagues in the series.
In addition, he evinces in his annotations and prefaces a wit, a turn of phrase and a capacity for direct statement that inform and delight the user. He displays an expertise in the fields of management and computer science, and one feels one can rely on his selection and judgment. Eleanor R. Devlin

Book Reviews 91

Centralized Book Processing: A Feasibility Study Based on Colorado Academic Libraries by Lawrence E. Leonard, Joan M. Maier and Richard M. Dougherty. Metuchen, N.J.: Scarecrow Press, 1969. 401 pp. $10.00.

In October 1966 the National Science Foundation awarded a grant to the University of Colorado Libraries and the Colorado Council of Librarians for research in the area of centralized processing. The project was in three phases. Phase I involved an examination of the feasibility of establishing a book-processing center to serve the needs of the nine state-supported college and university libraries in Colorado (which range in size from the University of Colorado, with 805,959 volumes as of June 30, 1967, to Metropolitan State College, a new institution with 8,310 volumes). Phase II involved a simulation study of the proposed center, while Phase III involved an operational book-processing center on a one-year experimental basis. This book summarizes the results of the first two phases of the study. Phase I involved a detailed time-and-cost analysis of the acquisition, cataloging, and bookkeeping procedures in the nine participating libraries, with resultant processing costs per volume which are both convincing and somewhat startling, ranging as they do from $2.67 to $7.71 per volume. The operating specifications of the proposed book-processing center are then set forth and a mathematical model for simulating its operations under a variety of alternative conditions is prepared.
The conclusions are less than surprising: "A centralized book processing center to serve the needs of the academic libraries in Colorado is a viable approach to book processing." Project benefits are enumerated in the areas of cost savings, time-lag reductions, and the more efficient utilization of personnel. Unfortunately, while many of the conclusions are buttressed by a dazzling array of tables and mathematical formulas (how can most librarians really argue with a regression analysis correlation coefficient matrix?), some of the most important savings cited are based on simple guesses, in some cases very simple guesses. To mention just two examples: 1) We are told that "a discount advantage expected through the use of combined ordering and a larger volume of ordering is conservatively estimated at 5% ..." (Perhaps, but what is this based on?) 2) In the area of time-lag reduction, "the greatest savings in time will accrue when the center is able to purchase materials from a vendor who has built up his book stock to reflect the needs of academic institutions. Up to now, vendors have been unwilling to do this because there is insufficient profit motive." Would nine libraries combining together change this profit picture? It is unfortunate that this report could not have waited on Phase III, the completion of the one-year trial of the operational center which was to have been ready in August 1969, so that we could see just how the predictions for the center worked out in practice. As it stands, however, the book is a valuable study in library systems analysis and design, and its identification and quantification of the various technical processing activities can yield real benefits to librarians everywhere, be they ever so decentralized.
Norman Dudley

A Guide to a Selection of Computer-Based Science and Technology Reference Services in the U.S.A., American Library Association, Chicago, Illinois, 1969, 29 pages. $1.50.

This Guide is an attempt to bring together those reference publications which are also available in machine-readable form. As a "selection" it is limited to eighteen sources from government, professional and private organizations. The Guide is the result of a survey undertaken in 1968 by the Science and Technology Reference Services Committee of the American Library Association Reference Services Division. The committee was composed of Elsie Bergland, John McGowan, William Page, Joseph Paulukonis, Margaret Simonds, George Caldwell, Robert Krupp and Richard Snyder. Each entry is broken down into three units: 1) the Characteristics of the Data Base, 2) the Equipment Configuration and 3) the Use of the File. Subject headings under Characteristics of the Data Base include subject matter, literature surveyed, types of material covered, etc. The Equipment Configuration section describes computer model, core, operating systems, and programming language. The Use of the File section covers potential uses of the data base by the producer and the subscriber. Unfortunately for publications of this sort, they become out of date rather quickly. The continuing series, The Directory of Computerized Information in Science and Technology, is updated periodically and is a very useful reference tool in this field. Gerry D. Guthrie
ORTHOGRAPHIC ERROR PATTERNS OF AUTHOR NAMES IN CATALOG SEARCHES

Renata TAGLIACOZZO, Manfred KOCHEN, and Lawrence ROSENBERG: Mental Health Research Institute, The University of Michigan, Ann Arbor, Michigan

An investigation of error patterns in author names based on data from a survey of library catalog searches. Position of spelling errors was noted and related to length of name. Probability of a name having a spelling error was found to increase with length of name. Nearly half of the spelling mistakes were replacement errors; following, in order of decreasing frequency, were omission, addition, and transposition errors.
Computer-based catalog searching may fail if a searcher provides an author or title which does not match with the required exactitude the corresponding computer-stored catalog entry (1). In designing computer aids to catalog searching, it is important to build in safety features that decrease sensitivity to minor errors. For example, compression coding techniques may be used to minimize the effects of spelling errors on retrieval (2, 3, 4). Preliminary to the design of good protection devices, the application of error-correction coding theory (5, 6, 7) and data on error patterns in actual catalog searches (8, 9) may be helpful.

A recent survey of catalog use at three university libraries yielded some data of the above-mentioned kind (10). The aim of this paper is to present and analyze those results of the survey which bear on questions of error control in searching a computer-stored catalog.

In the survey, users were interviewed at random as they approached the catalog. Of the 2167 users interviewed, 1489 were searching the catalog for a particular item ("known-item searches"). Of these, 67.9% first entered the catalog with an author's or editor's name, 26.2% with a title, and 5.9% with a subject heading. Approximately half the searchers had a written citation, while half relied on memory for the relevant information.

94 Journal of Library Automation Vol. 3/2 June, 1970

Paradoxically, though most known-item searchers tried to match primarily an author and only secondarily a title, there were in the sample of searches many more cases of exact title citation than of exact author citation.

IMPERFECT RECALL OF AUTHOR NAME

Of the 1489 "known-item" searches, 1356 could be verified against the actual item. From the total number of searches (1260) in which the catalog user had provided an author's (or editor's) name, those works were subtracted which did not have a personal authorship (208) or had multiple authors or multiple editors (127).
This left 925 searches, of which 470 had complete and correct author entries, while 455 contained various degrees of imperfection in the author citation. Table 1 gives the distribution of incorrect and/or incomplete author citations. In the study an author's name was defined as incomplete when the first name, or the two initials, or one out of two initials was missing.

Table 1. Incorrect and/or Incomplete Author Names

University of Michigan Libraries      I     II    III   Total
General Library                      144    25     6     175
Undergraduate Library                 94    35     4     133
Medical Library                      110    27    10     147
Total                                348    87    20     455

In Category I (the most numerous) the author's last name was correct, but the author citation as a whole was either incomplete or incorrect; i.e., there were mistakes and/or omissions in the first and middle name or initials. Most of the searches in Category I were incomplete rather than incorrect. Since in Category I there is nothing wrong with the author's last name, the searcher's ability to gain access to the right location in the catalog is presumably not impaired as long as the last name is not too common. Once the searcher has entered the catalog, he will make use of other clues, such as title or knowledge of the topic, to identify the right item. But if the name is Smith or Brown or Johnson, and the catalog is a large one, to have an incomplete author's name may be equivalent to having no name at all. (In the University of Michigan General Library catalog, which contains over four million cards, the entry "Smith" extends over eight drawers, and the entries "Brown" and "Johnson" over four drawers each.) In an automated catalog it is easy to limit the set of entries from which the right item has to be selected by intersecting the last name of the author with some other clues. Incompleteness of the author name may then not be a serious handicap.
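The intersection of an author's last name with another clue, as described above, can be expressed as a simple filter. A minimal sketch in Python; the record structure and sample entries are hypothetical, not drawn from the survey:

```python
def intersect_search(catalog, last_name, title_word):
    """Select entries matching an author's last name AND a word from the title.

    Even a very common surname narrows quickly once a second clue is added.
    """
    last_name = last_name.lower()
    title_word = title_word.lower()
    return [
        rec for rec in catalog
        if rec["author_last"].lower() == last_name
        and title_word in rec["title"].lower()
    ]

# Hypothetical entries illustrating a common surname:
catalog = [
    {"author_last": "Smith", "title": "Communication Systems"},
    {"author_last": "Smith", "title": "A History of Wales"},
    {"author_last": "Brown", "title": "Communication Systems"},
]

hits = intersect_search(catalog, "Smith", "communication")  # one entry remains
```

Even this crude conjunction reduces the eight drawers of "Smith" to the handful of entries sharing a title word.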
Category III includes all searches in which the searcher had an author that turned out to be wrong. The error in this case was not in incompleteness or misspelling of the author's name, but in the identity of the author. No further analysis of this group was conducted.

Category II is the one which forms the object of the present report. The analysis concerns mainly position and type of errors, and the incidence of errors as related to name length.

POSITION OF ERRORS IN AUTHOR NAMES

The location of errors in the author citation is important for manual systems, such as traditional library card catalogs, as well as for automated systems. Table 2 shows the distribution of E in the sample of incorrect author citations from all three libraries, where E is the position of the letter, counting from left to right, in which an error appeared. In the fourteen cases in which more than one error occurred in the same name, only the first error was considered. In a few cases the error involved a string of letters (e.g., Friedman for Friedberg). In such cases the position of the first letter of the string determined the location of the error.

Table 2. Position of Error in Last Name of Author

  E     No.      %     Cumulative %
  1      2      2.3        2.3
  2     11     12.6       14.9
  3     11     12.6       27.6
  4     19     21.8       49.4
  5     13     14.9       64.4
  6     12     13.8       78.2
  7      7      8.0       86.2
  8      6      6.9       93.1
  9      3      3.4       96.6
 10      2      2.3       98.9
 11      1      1.1      100.0
Total   87

Table 2 shows that about half the incorrect author names had errors in one of the first four letters, while the other half had errors in one of the following letters, from the fifth to the eleventh position. The most frequently misspelled is the fourth letter, which is responsible for 21.8% of the total number of errors occurring in the sample. The ordinal number indicating the position of the error is not, by itself, a sufficient indicator of the area where the error occurred.
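The position E tabulated above, i.e. the first letter, counting from the left, at which the cited spelling diverges from the correct one, can be computed mechanically. A minimal sketch; the function name is an illustrative assumption, though its handling of names of unequal length follows the convention in the text that a letter added at the end falls one position beyond the correct name:

```python
def error_position(correct, cited):
    """Return the 1-based position of the first differing letter,
    or None if the cited spelling matches the correct one.

    A letter added or dropped at the end of the name yields a position
    one beyond the shorter of the two spellings."""
    correct = correct.lower()
    cited = cited.lower()
    for i, (a, b) in enumerate(zip(correct, cited), start=1):
        if a != b:
            return i
    if len(correct) != len(cited):
        return min(len(correct), len(cited)) + 1
    return None

# Examples from the text:
error_position("Friedberg", "Friedman")  # string error beginning at position 6
error_position("Halle", "Haller")        # addition beyond the last letter: position 6
```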
An error in the third letter, for instance, is close to the beginning of the name if the name is 9 letters long, but close to the end if the name is 4 letters long. In Table 3, L indicates the length (the number of letters) of the author name and PE the location of the error, i.e., the position of the first letter, counting from left to right, where an error appears. The incorrect author names of the sample (87) have a length of between 3 and 12 letters. The column on the right of the table, EL, indicates the distribution of names of a given length. The row at the bottom of the table gives the distribution of errors occurring in a given position. Mistakes are shown to occur anywhere from the first letter to the eleventh letter. When the error consists in the addition of a letter to the end of the correct name, PE is beyond the name itself. The figures which appear next to the diagonal line, on the right, indicate mistakes of this sort.

Table 3. Position of Error vs. Length of Name

Length (L)    Frequency (EL)
     3              1
     4              5
     5              7
     6             21
     7             19
     8             16
     9              8
    10              7
    11              2
    12              1
Total              87

(Errors by position PE, totaled across all lengths, for positions 1 through 11: 2, 11, 11, 19, 13, 12, 7, 6, 3, 2, 1; total 87.)

A summary inspection of the table produces the impression that errors are clustered toward the end of the names, or at least that they are more prevalent in the second half of the name than in the first half. This seems to be a direct consequence of the fact that the first column of the table (errors in position 1) is almost empty. It is tempting to say that errors very rarely occur in the first letter of a proper name. But is this really so? It is true that English-speaking people place particular emphasis on initials, to the extent that initials are often sufficient for identifying well-known figures. The special attention given to the first letter of a name would certainly contribute to the scarcity of errors in such a letter. But it is also possible that when errors in the first letter occur, they so transform the name that it becomes unrecognizable. Several such authors may have ended up in the category of non-verified authors necessarily excluded from the analysis.

It would be interesting to verify whether the "serial-position effect" that some authors found in the spelling of common nouns is present also in the spelling of proper names. According to Jensen and to Kooi et al., the distribution of spelling errors in relation to letter position closely approximates the serial-position curve for errors found in serial rote learning (11, 12). To ascertain if this is the case for author names, a data base much larger than that used for this study would be needed.

DISTRIBUTION OF ERRORS AND LENGTH OF NAMES

Is the probability of a catalog searcher misspelling the name of an author dependent to any extent on the length of the name? Table 3 shows the frequency of occurrence of names of a given length in the 87 misspelled names (column EL). The next step was to calculate the distribution of the length of author names in the whole group of verified author citations provided by the catalog searchers. This group, it should be remembered, does not include multiple authors, multiple editors, or non-personal authors. The ratio of the corresponding figures in the two distributions will give the percentage of names of a given length having spelling mistakes (Table 4).
Table 4. Probability of Errors in Recall of Author Names of a Given Length

Length     Frequency of       Frequency of    Percentage of
of Name    Incorrect Names    All Names       Incorrect Names
  2              0                 1
  3              1                 9             11.1%
  4              5                87              5.7%     4.9% (short names)
  5              7               169              4.1%
  6             21               215              9.8%
  7             19               191              9.9%    10.5% (medium names)
  8             16               127             12.6%
  9              8                59             13.6%
 10              7                36             19.4%    14.3% (long names)
 11              2                26              7.7%
 12              1                 5             20.0%
Total           87               925

There is an observable trend toward an increase of mistakes with length of name. Of course, the two extremes of the length distribution are scarcely represented, and this is probably responsible for inconsistencies in the percentage distribution. Grouping names into three length categories (i.e., short names, middle-length names, and long names) makes more apparent the differences in percentages of incorrect names. The differences are significant at the .01 level of confidence.

TYPE OF ERROR IN AUTHOR NAMES

Errors which occurred in the spelling of the last names of authors were grouped into four broad categories: replacement errors, omission errors, addition errors, and transposition errors. While it is true, especially in badly mangled words, that an error can often be said to be of any of several types, it was generally easy to identify the simplest necessary transformation of the letters, and to assign the incorrect name to the type of error corresponding to that kind of transformation. In some cases this meant adding a string of letters or replacing one string by another.

Altogether the sample of 87 incorrect authors contained 104 errors. Eleven names exhibited two errors each, three had three errors, and the remaining just one error. Of the 104 errors, 50 were replacement errors; these are cases in which one letter or string of letters of the correct name has been replaced by a different letter or string of letters (e.g., Hoiser for Hoijer, Friedman for Friedberg). The most common replacement errors appear in Table 5, in order of decreasing frequency.
Table 5. Single-Letter Replacement Errors

No. of Errors    Correct Letter    Incorrect Letters
      6                o           a, a, a, a, p, r
      5                            a, e, y, y, y
      4                y           a, i, u, z
      3                a           i, o, o
      3                s           c, r, z
      3                v           b, f, w
      2                e           i, o
      2                g           c, r
     28

Not included in the table are the 10 letters which were each replaced just once and the 12 strings of letters. In four cases, the replaced letter was the second of a double letter.

There were 34 omission errors in all. Four of these involved a string of letters; all the rest were single-letter omissions. Eleven single-letter omissions occurred in the last letter of the name (e.g., Abbot instead of Abbott), and 19 in the middle of the name (e.g., Brent instead of Brendt). Table 6 gives the frequency distribution of the omitted letters. The asterisk indicates that the omitted letter was the second of a double letter.

Table 6. Single-Letter Omission Errors

No. of Errors   Error in Middle Position   Error in Final Position   Letter Omitted
      8                   5                          3                     e
      4                   4                                                a
      4                                              4*                    t
      3                   2                          1*                    n
      2                   2                                                h
      2                   2                                                i
      2                                              2*                    l
      2                   1                          1                     s
      1                   1                                                c
      1                   1                                                d
      1                   1                                                r
     30

Addition errors totaled 18. In one case the addition consisted of a string of letters, while in the others only one letter was added. Addition errors can occur in the middle of a name (e.g., Berelison for Berelson) or at the end of it (e.g., Haller for Halle). In the latter case, the added letter is found beyond the last letter of the correct name (these were the errors on the right of the diagonal in Table 3). The distribution of addition errors is shown in Table 7. The asterisk indicates that the added letter duplicated the previous letter.

Table 7. Single-Letter Addition Errors

No. of Errors   Error in Middle Position   Error in Final Position   Added Letter
      5                   1                          4*                    s
      2                   2                                                c
      2                   2                                                e
      2                   2                                                i
      1                   1                                                a
      1                   1                                                f
      1                   1                                                l
      1                                              1                     m
      1                                              1                     n
      1                                              1                     z
     17

There were two transposition errors: ie for ei and ai for ia.
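Assigning an error to one of the four categories amounts to finding the simplest single edit that turns the correct name into the cited one. A sketch for single-letter errors; the function is an illustration of the idea, not the authors' procedure, and it does not attempt the string edits and multiple errors that the study handled case by case:

```python
def classify_error(correct, cited):
    """Classify a single-edit spelling error as replacement, omission,
    addition, or transposition; return None for an exact match."""
    c, m = correct.lower(), cited.lower()
    if c == m:
        return None
    if len(c) == len(m):
        diffs = [i for i in range(len(c)) if c[i] != m[i]]
        # Two adjacent, mutually swapped letters: a transposition.
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and c[diffs[0]] == m[diffs[1]] and c[diffs[1]] == m[diffs[0]]):
            return "transposition"
        if len(diffs) == 1:
            return "replacement"
    # One letter of the correct name missing from the cited spelling.
    if len(c) == len(m) + 1:
        for i in range(len(m) + 1):
            if m[:i] + c[i] + m[i:] == c:
                return "omission"
    # One extra letter in the cited spelling.
    if len(m) == len(c) + 1:
        for i in range(len(c) + 1):
            if c[:i] + m[i] + c[i:] == m:
                return "addition"
    return "other"

# Examples from the text:
classify_error("Hoijer", "Hoiser")       # replacement
classify_error("Abbott", "Abbot")        # omission of a final double letter
classify_error("Berelson", "Berelison")  # addition in the middle of the name
```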
In cases of second and third errors in the name, there were five replacement errors, seven omission errors, and five addition errors. Table 8 summarizes the types of errors encountered in the sample of incorrect authors. Figures in this table include strings as well as single letters, and second and third errors as well as first errors.

Table 8. Distribution of Types of Errors

                        Middle Position   Final Position   Total
Replacement errors            44                 6           50
Omission errors               21                13           34
Addition errors               10                 8           18
Transposition errors           2                              2
                                                            104

CONCLUSION

Four trends could be observed:

1) Vowels usually replaced vowels, and consonants usually replaced consonants. Apparently the probability of misspelling a single letter was slightly higher for vowels than for consonants. With the latter, there is some indication that the substitution was guided by phonetic similarity (e.g., "v" is replaced by "b", or "f", or "w").

2) Most omissions in which the correct name had a double letter occurred at the end of the word.

3) Replacement errors tended to come earlier in words than did omissions and additions. (This is not due to the fact that addition and omission errors contained a disproportionately high number of final errors; even when these final errors are excluded, replacement errors still come earlier than other types.)

4) Second and third errors in a name have comparatively few replacement errors.

ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation, Grant GN 716.

REFERENCES

1. Kilgour, F. G.: "Retrieval of Single Entries from a Computerized Library Catalog File," Proceedings of the American Society for Information Science, 5 (1968), 133-136.
2. Nugent, William R.: "Compression Word Coding Techniques for Information Retrieval," Journal of Library Automation, 1 (December 1968), 250-260.
3. Ruecking, Frederick H., Jr.: "Bibliographic Retrieval from Bibliographic Input; the Hypothesis and Construction of a Test," Journal of Library Automation, 1 (December 1968), 227-238.
4. Dolby, James L.: "An Algorithm for Noisy Matches in Catalog Searching." In: A Study of the Organization and Search of Bibliographic Holdings Records in On-Line Computer Systems: Phase I (Berkeley, Cal.: Institute of Library Research, University of California, March 1969), 119-136.
5. Peterson, William W.: Error Correcting Codes (New York: Wiley, 1961).
6. Alberga, Cyril N.: "String Similarity and Misspellings," Communications of the ACM, 10 (1967), 302-313.
7. Galli, Enrico J.; Yamada, Hisao M.: "Experimental Studies in Computer-Assisted Correction of Unorthographic Text," IEEE Transactions on Engineering Writing and Speech, EWS-11 (August 1968), 75-84.
8. Tagliacozzo, R., et al.: "Patterns of Searching in Library Catalogs." In: Integrative Mechanisms in Literature Growth. Vol. IV (University of Michigan, Mental Health Research Institute, January 1970). Report to the National Science Foundation, GN 716.
9. University of Chicago Graduate Library School: Requirements Study for Future Catalogs (Chicago: University of Chicago Graduate Library School, 1968).
10. Tagliacozzo, Renata; Rosenberg, Lawrence; Kochen, Manfred: Access and Recognition: From Users' Data to Catalog Entries (Ann Arbor, Mich.: The University of Michigan, Mental Health Research Institute, October 1969, Communication No. 257).
11. Jensen, Arthur R.: "Spelling Errors and the Serial-Position Effect," Journal of Educational Psychology, 53 (June 1962), 105-109.
12. Kooi, Beverly Y.; Schutz, Richard E.; Baker, Robert L.: "Spelling Errors and the Serial-Position Effect," Journal of Educational Psychology, 56 (1965), 334-336.

THE RECON PILOT PROJECT: A PROGRESS REPORT

Henriette D. AVRAM: Project Director, Information Systems Office, Library of Congress, Washington, D.C.
A synthesis of the progress report submitted by the Library of Congress to the Council on Library Resources under an Officer's Grant to initiate the RECON Pilot Project, giving an overview of the project and the progress made from August to November 1969 in the following areas: training, selection of material to be converted, investigation of input devices, and format recognition.

INTRODUCTION

The RECON Pilot Project is an effort to analyze the problems of large-scale conversion of retrospective catalog records through the actual conversion of approximately 85,000 non-current records. This project has grown directly out of the implementation of the MARC Distribution Service. Libraries considering the use of machine readable records for their current materials have naturally begun to consider conversion of their older records as well. Some libraries have even begun such conversion projects.

Since the Library of Congress is also interested in the feasibility of converting its own retrospective records, it seemed appropriate to explore the possibility of centralized conversion of retrospective cataloging records and their distribution to the entire library community from a central source. After the Library of Congress submitted a proposal to the Council on Library Resources, Inc. (CLR), the Council granted funds for a study of this problem. An Advisory Committee was appointed to provide guidance, and direct responsibility for the study and report (1) was assigned to a Working Task Force.

A recommendation of the Working Task Force was the implementation of a pilot project to test the techniques suggested in the report in an operational environment. Since any feasibility report, no matter how detailed, refers to a theoretical model, the recommended techniques should be tested to determine a most efficient method for a large-scale conversion activity. The Advisory Committee concurred with this recommendation.
The Library of Congress submitted a proposal for a pilot project (hereinafter referred to as RECON) to CLR, and received an Officer's Grant in August 1969 to initiate RECON while the Council continued its evaluation of the full-scale pilot project.

A progress report was submitted to CLR by the Library covering the period from mid-August to November 1, 1969. So that CLR might have a clear understanding of the work in progress, the report addressed itself both to the areas of RECON supported by the Council and to those activities supported by the Library of Congress. In December 1969, CLR awarded the Library the funds requested for the entire pilot project. To make the library community cognizant of RECON as quickly as possible, CLR granted permission to modify the progress report for publication.

OVERVIEW OF THE RECON PILOT PROJECT

The pilot project is concerned with the conversion and distribution of an estimated 85,000 English language titles: 22,000 titles cataloged in 1969 and not included in the MARC Distribution Service, and 63,000 titles from 1968. The creation of this data base partially satisfies the conclusions and specific recommendations of the RECON Working Task Force as stated in the report (2):

1) there should be no conversion of any category (language or form of material) of retrospective records until that category is being currently converted;

2) the initial conversion effort should be limited to English language monograph records issued from 1960 to date and converted into machine readable form in reverse chronological order. (The MARC Distribution Service covers current English language monographs cataloged by the Library of Congress.)

In order to explore the problems encountered in encoding and converting cataloging records for older English language monographs, and monographs in other roman alphabet languages, 5,000 additional titles will be selected and converted.
The Library further intends to investigate, through the design and implementation of a format recognition program, the use of the computer to assist in the editing of cataloging records. This technique should significantly reduce the manpower needs of the present method of conversion and therefore have an impact on any future Library of Congress conversion activity, either of currently cataloged or retrospective titles. RECON will include experimentation with microfilming and producing hard copy from the LC record set.

The record set in the LC Card Division consists of a master copy of the latest version of every LC printed card, arranged by card series and, within each series, by card number. Although a specific time period can be selected for conversion, the primary disadvantage of the record set for this purpose is the fact that not all changes in cataloging made to the LC Official Catalog are reflected in the record set. After considering all the alternatives, the RECON Working Task Force recommended (3) that the record set be used for selection of titles, but that the titles be compared with the Official Catalog and updated to insure bibliographic accuracy and completeness. Since the record set is in constant use by Card Division personnel, the selected titles for conversion must be reproduced, and the original file reconstituted, as quickly as possible.

The state of the art of direct-read optical character recognition devices suitable for large-scale conversion will be monitored, and experimentation will be conducted with a variety of input devices.

RECON is closely related to the LC Card Division Mechanization Project, which is based upon the availability of records in machine readable form. RECON will be closely coordinated with the Card Division project, both in the design of specifications for implementation and in the investigation of a common hardware/software configuration.
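The idea behind format recognition, assigning content designation to an untagged record by examining its structure rather than relying on human pre-tagging, can be suggested with a deliberately simplified sketch. The patterns and tag assignments below are illustrative assumptions only, not the Library of Congress format recognition specification:

```python
import re

def recognize_fields(card_lines):
    """Assign rough MARC-style tags to the lines of a catalog record.

    Illustrative heuristics: a line of digits and hyphens is treated as
    an LC card number, a line ending in a year as an imprint, the first
    remaining line as the main entry, and the next as the title."""
    tags = {}
    remaining = []
    for line in card_lines:
        if re.fullmatch(r"[0-9-]+", line):
            tags["010 (LC card number)"] = line
        elif re.search(r"(18|19)\d\d\.?$", line):
            tags["260 (imprint)"] = line
        else:
            remaining.append(line)
    if remaining:
        tags["100 (main entry)"] = remaining[0]
    if len(remaining) > 1:
        tags["245 (title)"] = remaining[1]
    return tags

# A hypothetical untagged card:
card = [
    "Tagliacozzo, Renata.",
    "Access and recognition: from users' data to catalog entries.",
    "Ann Arbor, University of Michigan, 1969.",
    "79-12345",
]
fields = recognize_fields(card)
```

A production program would of course need far richer patterns and would still hand the result to editors for review, as the text describes.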
The project was organized during August 1969. The first group of records being edited are those cataloged by the Library of Congress in 1969. In June 1970, the editing of the 1968 records will begin. Since these records will have to be compared with the LC Official Catalog to record any changes, present thinking includes the design of a print program (referred to as a two-up print program) to cut printing time by providing a listing with records arranged in card number sequence (the order of input) and in alphabetic sequence by main entry on the same page. The records will be arranged by main entry to reduce the effort of checking them against the Official Catalog, and the changed records will be inserted in their proper place in sequence by LC card number.

The process of manual editing may be greatly reduced, or perhaps even eliminated, by October 1970, when the format recognition program is scheduled for completion. After this time, the records will be input with little or no prior tagging and further editing will be performed by the computer. The resulting records will be examined by the MARC editors both for accuracy in transcription and for correctness in the assignment of MARC tags, indicators, and subfield codes.

The duration of the pilot project will be twenty-four calendar months, August 1969 to August 1971. It is anticipated that by November 1970 enough data should be available to determine whether a full-scale conversion project should be undertaken. An early evaluation of the project is advantageous in order to explore the funding possibilities of a conversion effort if the results of the pilot are affirmative. Figure 1 is a calendar indicating the major milestones of RECON as postulated during August 1969.
Fig. 1. RECON Calendar. [A bar chart spanning August 1969 to August 1971 with these milestones: project begins; production staff hired; ISO staff organized; Card Division sends 1969 and 1968 cards; investigation of input devices and of RECON/Card Division hardware/software; training of editors; print index; reproduction methods for catalog records study; analysis and editing of research titles; organization of cards for RECON input; full editing of 1969 titles (16,000 records); analysis of system to convert 1968 titles; full editing of 1968 titles; order new MTST's; hire new MTST typists; design and implementation of format recognition; use of format recognition on remainder of 1968 titles; conversion of MARC I to MARC II and interim MARC II to MARC II; begin evaluation of pilot project; begin planning for continuation of project; begin writing final report.]

Essentially the same Advisory Committee and Working Task Force selected for the RECON feasibility study have agreed to serve in their respective capacities for RECON. The implementation of the Library of Congress' MARC Distribution Service and the initiation of RECON are providing the nucleus of a national bibliographic data base. Creation of this data base is not in itself a panacea for libraries but, in fact, amplifies the need to explore some of the larger issues at this time to provide the direction for future cohesive library systems. Certain aspects of the problems were discussed in general terms in the RECON report but time did not permit full analysis. During the two-year period of RECON, the Working Task Force will consider some of those issues (defined as four tasks listed below) under the grant from CLR. The ability to complete all of the tasks described will be dependent on additional funding, which, it is hoped, may be available early in 1970.

1) Any national data store should have a data base in which all records are consistent.
It is possible, and highly probable, that libraries may convert bibliographic records for local use, which may not require the detail of a MARC II record. It is imperative that before levels of completeness of MARC records are defined with respect to content and content designation, the implications of these definitions for future library networks be thoroughly explored.

2) Any consideration of a national bibliographic data store in machine readable form should include the possibility of recording titles and holdings from other libraries. Although the problems associated with a machine readable national union catalog are enormous, it is time to begin exploring them to provide guidance for future design efforts.

3) Several institutions have begun the conversion of their cataloging records into machine readable form. The possibility of utilizing these records in building a national bibliographic data store should be investigated. This will involve evaluating the difficulty and cost of converting and upgrading records converted by others to a MARC format as opposed to preparing original records.

4) The Library of Congress maintains, and is considering the conversion into machine readable form of, its name and subject authority files. Many libraries have expressed interest in receiving these records in the present MARC Distribution Service. Little thought has been given to the storage and maintenance of these large files in each library subscribing to the MARC Distribution Service. A library may not have in its collections a bibliographic record requiring either a name or subject cross reference record distributed by the Library of Congress. However, the library will keep the cross reference record because it cannot predict when a title will be added to the collection that does require the cross reference structure. The result will be the eventual storage and maintenance of the
entire LC name and subject reference files in each library. This problem should be explored to determine whether there is an efficient method by which libraries could access these files from either a centralized source or several regional sources.

PROGRESS: AUGUST 1969 TO NOVEMBER 1969

Organization

The RECON staff is divided into two sections: 1) the Production Section, responsible for the actual editing and keying of the records; and 2) the Research and Development Section, responsible for liaison with the Production Section, determination of the criteria for the selection of the 1968 and 1969 titles, actual selection of the 5,000 research titles, investigation of input devices and photocopying techniques, liaison with the Card Division Mechanization Project, and the design and coding of special computer programs unique to RECON. In addition, staff members of the MARC project team in the Information Systems Office (ISO) are working in areas of format recognition and MARC system programming that will affect RECON.

Training

The MARC experience at the Library of Congress has demonstrated that staff members assigned to the editorial process of preparing catalog records for conversion to machine readable form must be exposed to cataloging fundamentals.

Phase I of the training program for the RECON editors was a two-week cataloging class conducted by the supervisor of the Production Section, a professional librarian with experience in teaching cataloging principles at the Library of Congress. Each day was formally structured into reading, discussion, and practice. The editor-trainees applied the Anglo-American Cataloging Rules (4) to practice problems and to actual cataloging of books. Experience in using the LC subject heading list, filing rules, and classification schedules was provided to a lesser extent.
In order to insure that the editor-trainees would have a wider range of experience in examining cataloging copy, the mnemonic MARC tags and the simpler indicators and subfield codes were taught and used to identify explicitly the cataloging elements on LC proofslips. Phases II and III of the training, MARC editing and correction procedures, were also taught by professional librarians. The editing class, which lasted two weeks, was divided into lecture sessions and laboratory sessions. Each lecture period was from two to three hours; then, during the laboratory session, the instructions given in the lectures were applied to practice worksheets. The course covered input of variable and fixed fields, assignment of bibliographic codes for language and place of publication, and identification of diacritical marks included in the LC character set.

108 Journal of Library Automation Vol. 3/2 June, 1970

Phase III of the training program, on correction procedures, was a one-week class covering the addition, deletion, and correction of entire records or data elements at the field level. The training period was followed by an intensive practice period using MARC input worksheets, which were reviewed by the experienced editors.

Selection of Cards
The actual selection of the 1968 and 1969 titles is a joint effort by the Card Division staff and the RECON staff. The procedures for the selection of cards from the Card Division for RECON differ from those described in the original report. Since only cards for 1968 and 1969 titles are being selected, it is more expedient to draw the cards from the Card Division card stock than to microfilm the record set. These cards will include all titles cataloged by the Library of Congress during 1968 and 1969 regardless of language or form of material, which will yield approximately 250,000 cards.
The cards are forwarded from the Card Division to the Production Section, where each record is inspected to determine whether it meets the criteria established for RECON, i.e., all English language monographs with an LC catalog card number representing works cataloged by LC in 1968 and 1969 that are not already in machine readable form. The determination as to whether or not an item is in English is based upon the text, not the title page. An anthology of literature in Spanish with a title page in English would not be included in RECON; a book with text in English but title page in French would be included. If a book is multilingual (complete text in more than one language), the language of the first title determines inclusion or exclusion for RECON. Atlases are included, but not single maps or sets of maps. Music and music scores are excluded, but books about music are included. Records representing film strips, moving pictures, serials, and other kinds of materials not regarded as monographs are excluded. Once the cards eligible for RECON are selected and arranged in LC card number sequence, they are compared with the Print Index listing all records already in machine readable form. Those records not in machine readable form are photocopied onto the input worksheet for editing and keying. To date, 60,000 cards have been selected by Card Division staff and forwarded to the production staff for further processing.

Selection of Research Titles
An integral part of RECON is the conversion of 5,000 titles to machine readable form for research purposes. Ideally, these titles should serve not only the needs of RECON but also be useful for some other purpose at the Library of Congress. These titles would include English language monographs cataloged before 1950 and foreign language material using the roman alphabet, and would be used to test various methods of input and certain aspects of the format recognition program.
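The card-selection criteria applied by the Production Section under Selection of Cards above can be sketched as a simple filter. This is only an illustration: the record fields, category names, and the set standing in for the Print Index check are all hypothetical, not the actual procedure or data.

```python
# Hypothetical sketch of the RECON card-selection criteria described above.
# Field names and values are invented stand-ins for what the Production
# Section reads off each printed card.

INCLUDED_CATEGORIES = {"monograph", "atlas"}

def is_recon_candidate(record, already_converted):
    """Return True if a card meets the RECON selection criteria."""
    if record["year"] not in (1968, 1969):
        return False
    # Language of the text (of the first title, if multilingual), not the title page.
    if record["text_language"] != "English":
        return False
    if record["category"] not in INCLUDED_CATEGORIES:
        return False  # excludes maps, music scores, film strips, serials, etc.
    # Skip records already in machine readable form (the Print Index check).
    return record["lc_card_number"] not in already_converted

converted = {"68-12345"}  # invented stand-in for the Print Index
card = {"year": 1968, "text_language": "English",
        "category": "monograph", "lc_card_number": "68-54321"}
print(is_recon_candidate(card, converted))  # True: eligible for editing and keying
```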
The older material would represent records cataloged under earlier cataloging rules and would reveal problems in conversion in an area in which little information exists. Two sources were initially considered for the selection of research titles: 1) titles in the Main Reading Room collection, for conversion into machine readable form for the production of book catalogs; and 2) the popular titles (cards ordered most frequently) of the Card Division Mechanization Project. A decision was made to study the titles in both sources, with priority given to solution of conversion problems, and to determine: 1) if overlap existed in records for both projects that would also serve the needs of RECON; 2) if overlap did not exist, which titles (Main Reading Room collection or Card Division popular titles) best served the needs of RECON; and 3) if the titles in neither project were suitable, the method of selection to be used from the Card Division record set. The first task was a study of the characteristics of the Main Reading Room collection. The collection consists of approximately 14,000 titles, and printed cards have been collected to compile a complete shelflist catalog. These cards represent a wide range of material cataloged from 1900 to date. Approximately one-fourth to one-third represent serials. The collection includes material in most of the roman alphabet languages currently processed at the Library; the more common non-roman alphabet languages, such as Russian, Japanese, and Hebrew; and a number of "difficult" titles, such as encyclopedias and dictionaries, that would present a variety of cataloging and editing problems. The second task was a study of the popular titles from the Card Division. The Card Division provided a printout of card numbers for titles with 25 or more orders. There were 4,765 such card numbers listed with their corresponding number of orders. Only 210 of these were for pre-1950 cards, and 97 of the 210 cards were for serial titles.
Only 15 of the 210 cards were for "difficult" titles. Another list was produced which contained card numbers for titles with ten or more orders. This list (with 39,148 card numbers) did produce more titles that would meet the research needs of RECON. A sampling technique was designed by the Technical Processes Research Office to determine the percentage of overlap of this list with the titles in the Main Reading Room reference collection. The estimated percentage of matches (15.5%) indicated that not enough overlap existed to consider a selection of titles that would serve the needs of both projects (Main Reading Room collection and Card Division) and RECON. Therefore, the research titles are being selected from records for the reference collection. ISO is working closely with staff members of the Reference Department on this project. The Reference Department is providing local information (e.g., the local call number that locates an item in the reference collection, as opposed to the LC call number, which locates the item in the general collection) for all titles. As this process is completed, the responsible RECON staff member is selecting the research titles. To date, "local" information has been added to 2,000 records, and 400 RECON titles have been selected from this group of records.

Computer Programs
The only computer program implemented to date is the Print Index Program. This program was required to check the records meeting the manual selection criteria for inclusion in RECON against records in existing machine readable data bases to avoid duplicate input. Print Index lists by card number all records in machine readable form in either the MARC I or MARC II data bases. At a later date, the 1968 titles found on the MARC I data base will be processed by a subset of the format recognition program and converted to the MARC II processing format. The Print Index Program is made up of two routines.
The LC catalog card number routine reads each record, extracts the LC card number, and creates a magnetic tape file of numbers (called the Print Index Tape). The tape created contains a card number right justified for machine sorting, a card number in the same form (zeros deleted) as the number on the printed card, and a data base code indicating the file in which the record originally resided (e.g., MARC II Data Base, MARC II Practice Tape, MARC I Data Base). A parameter card is used to indicate which format and data base is to be processed. The IBM Sort is used to arrange the output of the LC catalog card number routine into the following order: all 6x-series numbers, all 6x-series numbers with alphabetic prefixes (by year of cataloging, i.e., 1968 followed by 1969), all 7-series numbers (disregarding the check digit, the second digit in the number). The LC card number print routine prints the card numbers, which are in numeric sequence as described in the preceding paragraphs, from the Print Index Tape. Each page of the listing contains a heading, a running index, a date, and a page number. The program prints 200 card numbers and data base codes per page. The numbers are in ascending order, top to bottom, in four columns of 50 numbers each.

Format Recognition
The experience of the Library in the creation of machine readable cataloging records during the MARC Pilot Project and the MARC Distribution Service has clearly demonstrated that the highest cost factor of conversion is the human editing and proofing. The editing presently consists of assigning tags and codes to the bibliographic record to explicitly identify the content of the record for machine manipulation. The Library has completed a format recognition feasibility study which concluded that the probability of success of automatically assigning tags and codes by computer is high.
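The two Print Index routines described under Computer Programs above can be sketched as follows. This is a simplification under stated assumptions: the card-number normalization, the handling of alphabetic prefixes and check digits, and the record fields are all invented for illustration, not the Library's actual tape formats.

```python
def right_justify(card_number, width=12):
    """Normalize an LC card number into a fixed-width, right-justified
    key so that plain string comparison sorts numerically.
    (Prefix and check-digit handling are omitted for brevity.)"""
    series, _, serial = card_number.partition("-")
    return series.rjust(4, "0") + serial.rjust(width - 4, "0")

def build_print_index(records):
    """Card-number routine plus print routine, much simplified:
    sort on the justified key, then lay out pages of 200 numbers
    in four columns of 50, top to bottom."""
    entries = sorted(records, key=lambda r: right_justify(r["lccn"]))
    pages = []
    for start in range(0, len(entries), 200):
        page = entries[start:start + 200]
        columns = [page[i:i + 50] for i in range(0, len(page), 50)]
        pages.append(columns)
    return pages

recs = [{"lccn": "68-101", "base": "MARC II"}, {"lccn": "68-7", "base": "MARC I"}]
pages = build_print_index(recs)
print(pages[0][0][0]["lccn"])  # 68-7 (sorts ahead of 68-101 on the justified key)
```

The fixed-width key is the point of the exercise: without it, a plain string sort would put "68-101" before "68-7".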
Since the format recognition feasibility study was concerned only with cataloging records for current English language monographs, the study must be extended to cover other roman alphabet languages and, as part of RECON, records which were created according to different rules and conventions. Although the progress report submitted to CLR included the definition and status of each of the tasks that make up the format recognition program, these have been omitted here to avoid duplication with an article recently published in the Journal of Library Automation (5) describing format recognition concepts in some detail and elaborating on the tasks completed and projected at that time.

Investigation of Input Devices
The investigation of input devices and the testing of several selected devices in an operational mode will continue throughout RECON. A study of the use of a mini-computer operating in an on-line mode for input, editing, and formatting of MARC records is in progress at the Library and will supplement the RECON effort and provide additional data. A preliminary investigation was begun of optical character readers commercially available and in the developmental phases. Only those readers capable of reading numerous characters on many lines (page readers), as opposed to a limited number of characters or lines per document (document readers), were included in the study. The machines evaluated were considered as possible candidates if they were capable of processing upper- and lower-case alphabetic characters, numerals, standard punctuation, and some special symbols. Each manufacturer has specifications for the type of paper required and the font style which can be recognized. Paper handling is a major drawback of optical character readers. Excessive handling of the paper or any type of smear, crease, or crinkle could cause rejection of a character or conversion of a character to some specified symbol indicating an invalid character.
Error rates for the devices considered range from one to 35 characters per 10,000 characters, and 80% of the errors are caused by paper handling. Typewriters used to prepare the source document must be constantly cleaned and ribbons changed to keep impact keys free of dirt. Frequent jamming appears to be a characteristic of most machines; unjamming these machines can be difficult and is highly dependent upon the skill of the operator. Ten companies that have various types of optical character recognition equipment commercially available were considered in the first study. Five were immediately rejected because their devices did not meet the criteria as specified above. The devices remaining had the following characteristics:

Control Data Corporation 915 Page Reader. Accepts 2.5x4 to 12x14-inch paper; OCR-A standard type font; recognizes upper-case alphas, numerals, and standard punctuation; through programming and use of special symbols, lower-case alphas can be coded.

Farrington Model 3030. Accepts 4.5x5.5 to 8.5x13.5-inch paper; OCR-A standard and 12L (Farrington) type fonts; recognizes upper-case alphas, numerals, standard punctuation, and special symbols; through programming and use of special symbols, lower-case alphas can be coded.

Scan-Data Models 100/300. Accepts 8.5x11-inch paper; multi-type fonts; recognizes upper- and lower-case alphas, numerals, standard punctuation, and special symbols; has programmable unit for formatting.

Philco-Ford General Purpose Reader. Accepts 5.7x8.5x11-inch paper; multi-type fonts; recognizes upper-case alphas, numerals, standard punctuation, and special symbols; through programming and use of special symbols, lower-case alphas can be coded.

Recognition Equipment Retina. Accepts 3.25x4.88 to 14.14-inch paper; multi-type fonts; recognizes upper- and lower-case alphas, numerals, standard punctuation, and special symbols; has a programmable unit for formatting.
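The capability screen that eliminated five of the ten readers can be sketched as a set comparison. The capability sets below are loose paraphrases of the characteristics listed above, not vendor specifications, and the rejected "document reader" entry is entirely hypothetical.

```python
# Hypothetical sketch of the candidate screen described above.
# A device qualifies if the required character classes are covered
# either natively or through programming and special-symbol coding.

REQUIRED = {"upper_case", "lower_case", "numerals", "punctuation"}

# name: (natively recognized, codable through programming/special symbols)
devices = {
    "CDC 915": ({"upper_case", "numerals", "punctuation"}, {"lower_case"}),
    "Scan-Data 100/300": ({"upper_case", "lower_case", "numerals",
                           "punctuation"}, set()),
    "Hypothetical document reader": ({"numerals"}, set()),
}

candidates = [name for name, (native, codable) in devices.items()
              if REQUIRED <= (native | codable)]
print(candidates)  # ['CDC 915', 'Scan-Data 100/300']
```

Note the asymmetry the article points out: devices like the CDC 915 pass the screen only by keying extraneous characters for lower case, a burden on the typist that a set test alone does not capture.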
The possibility exists of using any of these five machines for the input of English language material. The keying of an extraneous character is required with the Farrington and Control Data Corporation equipment for lower-case and some special symbols. This is not necessary with the Philco-Ford, Scan-Data, and Recognition Equipment machines. Since the number of special symbols varies by machine, each machine must be studied to determine a method of coding the entire library character set as developed by the Library of Congress, and this method must be evaluated in terms of the burden placed on the typist. With the added feature of lower-case recognition, the price of the machine increases substantially. Adequate information has not been obtained from these companies to give an accurate accounting of cost. It should be noted that the rental price for the majority of optical character readers is high, a factor which will have to be taken into consideration at the time of selection of an input device. The most economic route to conversion may be through a service bureau, depending on the volume of records to be converted.

OUTLOOK
It is too early in the life of the project to predict the outcome or to describe any factual conclusions. The Library of Congress is greatly encouraged by the interest expressed in the project and the assistance offered by the members of the Advisory Committee and the Working Task Force. The scope of the assignments and the fact that all members of the Working Task Force hold responsible positions in their own institutions are clear evidence of the spirit of cooperation that has been exhibited by the Working Task Force members and their parent organizations. Other members of the library community have been and will continue to be contacted throughout the project for their expertise in certain facets of the many problems under exploration.
Several developing regional networks were requested to describe their plans in the hope that smaller scale efforts would shed some light on the problems involved on a national level. Those organizations contacted have responded, and a continuing liaison will be maintained not only to avoid duplication of effort but, more important, to attain a better understanding of how to approach the requirements of future library systems in terms of what is possible today. The report submitted to CLR described progress made to November 1, 1969. Since that time, the RECON production staff has selected all the 1969 titles from the card stock to be included in RECON, 5,200 records have been edited, and the first 250 have been forwarded to a service bureau to test its procedures for keying. The staff has begun the selection of the 1968 titles, and of approximately 26,000 records received to date from the Card Division, 19,000 are RECON candidates. The production section continues its training by proofing MARC records until the RECON records are processed through the MARC system to provide the required diagnostics for the proofing process. Procedures were set up for typing records without any editing and in accordance with the requirements of the format recognition program. Sample records selected for testing the procedures were of above-average difficulty in order to include all types of data that might be encountered. The procedures will be continually evaluated until some optimal method is determined. The format recognition algorithms are being evaluated by having RECON staff simulate a computer and follow through the logic of the algorithms on actual data. Results of the simulation will provide the necessary feedback to adjust the algorithms prior to the coding of the computer programs. Detailed design work has begun on the expansion of the MARC system to include random access capability and on-line correction. This
effort is being coordinated with the Card Division Mechanization Project and is considering the requirements of a large-scale conversion activity. Although it has a long way to go, RECON is on schedule, and for any project concerned with automation, that is an encouraging note. For the moment the future looks bright.

ACKNOWLEDGMENT
The author wishes to thank the RECON staff members of the Library of Congress for their respective reports, which were incorporated into the progress report submitted to the Council on Library Resources, Inc., and as such are significant contributions to this paper. Without the aid of the Council on Library Resources the RECON Project would not have become a reality. Through three important grants the Council has made a major contribution to the Project: 1) the first was a grant in support of the RECON Feasibility Study and the Working Task Force that resulted in the RECON Report; 2) an Officer's Grant enabled the establishment of the RECON Production Unit to create additional machine readable records not included in the MARC Distribution Service; and 3), most importantly, a grant provided full funding for the two-year Pilot Project.

REFERENCES
1. Library of Congress; RECON Working Task Force: Conversion of Retrospective Catalog Records to Machine Readable Form (Washington: Library of Congress, 1969).
2. Ibid., pp. 10-11.
3. Ibid., pp. 20-38.
4. Anglo-American Cataloging Rules (Chicago: American Library Association, 1967).
5. Avram, Henriette D., et al.: "MARC Program Research and Development: A Progress Report," Journal of Library Automation, 2 (December 1969), 242-265.

COMPARISONS OF LC PROOFSLIP AND MARC TAPE ARRIVAL DATES AT THE UNIVERSITY OF CHICAGO LIBRARY
Charles T. PAYNE: Systems Development Librarian, and Robert S.
McGEE: Assistant Systems Development Librarian; University of Chicago Library, Chicago, Illinois

A comparison of arrival dates of 5020 LC proofslips and corresponding MARC magnetic tape records reveals that four-fifths of the MARC records were received the same week as, or earlier than, the proofslips.

The purpose of this study is to determine the timeliness of MARC II records' arrival dates in comparison to the arrival dates of matching LC proofslips. The Acquisitions Department of the University of Chicago Library receives a complete set of cut and punched LC proofsheets (or "LC proofslips") that is used primarily for selection and ordering. In examining potential uses of MARC records in acquisitions processing, the Library Systems Development Office felt that a critical determinant would be the timeliness of MARC records in comparison to the arrival dates of the matching LC proofslips. Accordingly, the study described below was designed to gather data upon which appropriate system design questions might be considered. It was decided that "arrival date" would be defined as the week in which an arrival occurred, since the initial processing and distribution of incoming LC proofslips is framed within weekly, rather than daily, periods. "Week" was defined as the Monday through Friday workweek. "Arrivals" were defined as deliveries of MARC tapes and LC proofslips by the Library mail service. No attempt was made to influence the normal delivery procedures, or to specialize or hasten identification of these materials for priority handling. Arrival weeks were numbered consecutively, the week of March 31 - April 4, 1969, being designated Week 1. MARC tape numbers correspond to arrival week numbers; i.e., MARC tape #4 arrived during Week 4. Table 1 presents these correspondences.

Table 1.
Week Numbers for 15 Weeks of Study

Week Number    Arrival Week Dates
 1             March 31 - April 4
 2             April 7 - April 11
 3             April 14 - April 18
 4             April 21 - April 25
 5             April 28 - May 2
 6             May 5 - May 9
 7             May 12 - May 16
 8             May 19 - May 23
 9             May 26 - May 30
10             June 2 - June 6
11             June 9 - June 13
12             June 16 - June 20
13             June 23 - June 27
14             June 30 - July 3
15             July 7 - July 11

DATA COLLECTION
Proofslip collection began in Week 2, but in that week only a partial collection was made. In subsequent weeks, complete collections of proofslips bearing the MARC acronym (MARC proofslips) were attempted, so that proofslip data beginning with Week 3 (April 14-18) are more complete. Proofslip collection was terminated in Week 15. Discrepancies between the counts of MARC records and the numbers of MARC proofslips collected have not been accounted for, but possible reasons are discussed in the following section. Data collection was based upon comparisons of: 1) the weekly printed indexes, in LC card number order, that came with MARC II tapes; and 2) weekly lists of MARC proofslip arrivals. In each incoming batch of LC proofslips, those with MARC notes were separated and their arrival date noted. The MARC proofslips for each week were put in primary order by the first two digits (series number) of the card number, and were secondarily ordered within each series by the serial number following the hyphen, thereby matching the order of LC card numbers in the MARC indexes. These numbers were transcribed to create a weekly list of proofslip arrivals. Two new lists of LC card numbers were derived each week: 1) a MARC index; and 2) a proofslip list. Weekly, each new list was compared with all lists of the other type to identify card number matches.

Proofslip and MARC Arrival Dates/PAYNE and McGEE 117

Thus, each of the two types of lists was cross-tabulated with all the lists of the other type, showing on all lists which card numbers had been matched, and the week numbers of these matches.
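The ordering and cross-tabulation just described can be sketched in a few lines. The card numbers and week numbers below are invented; the study itself was a manual comparison of printed lists.

```python
# Sketch, with invented data, of the procedure described above:
# proofslips ordered by series then serial, and each weekly list
# compared against every list of the other type.

def sort_key(card_number):
    """Primary order by the series digits before the hyphen,
    secondary order by the serial number following it."""
    series, _, serial = card_number.partition("-")
    return (int(series), int(serial))

def cross_tabulate(proofslip_weeks, marc_weeks):
    """Count card-number matches for every (proofslip week, tape week) pair."""
    return {(ps, tape): len(set(slips) & set(index))
            for ps, slips in proofslip_weeks.items()
            for tape, index in marc_weeks.items()}

proofslips = {4: ["68-1200", "65-33", "68-87"]}
marc_indexes = {3: ["65-33"], 4: ["68-87", "68-1200"]}
print(sorted(proofslips[4], key=sort_key))        # ['65-33', '68-87', '68-1200']
print(cross_tabulate(proofslips, marc_indexes))   # {(4, 3): 1, (4, 4): 2}
```

Sorting on the numeric (series, serial) pair reproduces the MARC index order; a plain string sort would misplace "68-1200" before "68-87".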
Counts were made of the matches tabulated on each list, and were entered into Table 2. Matches made during a given week are subcounted by series groups 65-68, 69, and the 7 series. The cumulative percentages of MARC record and proofslip matches were entered into Tables 3 and 4. Table 3 contains the percentages of matches for any week's proofslips with successive MARC tapes. For example, of the 340 proofslips received in Week 4, 71.2% matched MARC records received the same week, or earlier, i.e., tapes 4, 3, and 2. Table 4 contains the percentages of matches for any MARC index on successive proofslip lists. For example, of the 768 records on MARC tape number 5 (received in Week 5), 23% were matched by proofslips received the same week, or earlier, i.e., Weeks 5, 4, 3, and 2.

ANALYSIS OF RESULTS
Some patterns of MARC and proofslip arrivals are indicated by the tables. The results in Table 2 show that there is not a one-for-one weekly relationship between proofslip and MARC record arrivals. For example, the 340 MARC proofslips received in Week 4 matched tape records received from Week 2 through Week 10, although the highest number of matches was also in the tape received in Week 4. In later proofslip weeks, however, the highest number of proofslip matches was with tape records received at least one week earlier. A summary of Table 2 would show that of the 5020 MARC proofslips received during Weeks 3-10, 4004, or 79.8%, were matched to MARC records received the same week or earlier. In Table 3, the cumulative percentages of proofslip matches with successive MARC indexes indicate, for several of the weeks, more than a 90% match with tape records two weeks after proofslip arrivals. Table 3 shows that the percentage of matches for a set of proofslips received in one week with the MARC indexes received the same week or earlier ranges from 48.9% to 91.6%.
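The cumulative figures in Tables 3 and 4 are running sums of the weekly match counts expressed as percentages. A minimal sketch, using the Week 4 match counts against tapes 1-6 reported in Table 2:

```python
def cumulative_percentages(match_counts, total):
    """Running percentage of one week's proofslips matched by
    successive MARC tapes (one entry per tape)."""
    running, out = 0, []
    for count in match_counts:
        running += count
        out.append(round(100 * running / total, 1))
    return out

# Week 4: 340 proofslips received; matches with tapes 1-6 from Table 2.
print(cumulative_percentages([0, 14, 79, 149, 50, 24], 340))
# [0.0, 4.1, 27.4, 71.2, 85.9, 92.9] -- agreeing with Table 3's Week 4 row
```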
Table 4 shows that the percentage of matches for a MARC tape received in a given week with the proofslips received the same week or earlier ranges from 7.1% to 49.8%. For the period of weeks corresponding to tape numbers 3-10, 6335 tape records (from Table 4) and 5020 proofslips (from Table 2 or 3) were received. The reason for the discrepancy between the number of MARC records and the number of MARC proofslips is not clear, but it is possibly due to the combined effects of basic factors such as the limited period of the study, the difficulties of collecting proofslips in a working environment, and the nature of the manual effort required to list LC card numbers and compare proofslip lists and MARC indexes.

Table 2. Number of Proofslip Matches with MARC Indexes by Arrival Week and by LC Card Number Subseries
Table 3. Cumulative Percentages of Matches of Each Week's Proofslips Received with Each Additional MARC II Tape Index
Table 4. Cumulative Percentages of Matches of Each MARC II Tape Index with Each Additional Week's Proofslips Received

CONCLUSION
The data collected to date indicate that the arrivals of MARC records generally precede those of the corresponding proofslips. Thus, MARC records seem to be timely enough to be used in book selection and ordering processes, where proofslips are now used, as well as to supply bibliographic data for cataloging.

LEVELS OF MACHINE READABLE RECORDS
RECON Working Task Force: Henriette D. AVRAM, Chairman; Richard DE GENNARO; Josephine S. PULSIFER; John C. RATHER; Joseph A. ROSENTHAL; and Allen B. VEANER.
This study of the feasibility of determining levels or subsets of the established MARC II format concludes that only two levels are necessary and desirable for national purposes: 1) the full MARC II format for distribution purposes; and 2) a less complex subset to be used by libraries reporting holdings to the National Union Catalog.

INTRODUCTION
In March 1969, the Advisory Committee to the RECON Working Task Force, after approving publication of the initial RECON report (1), endorsed investigation of a number of questions raised in that report as well as consideration of certain issues not covered in the initial survey. The basic tasks to be undertaken have been described in another article in this issue (2). With further support for RECON from the Council on Library Resources, Inc., the Working Task Force has met several times to explore some of these problems. This article reports the conclusions reached with respect to one task: the feasibility of determining a level or subset of the established MARC II format that would still allow a library using it to be part of a future national network.

Levels of Machine Readable Records/RECON TASK FORCE 123

DEFINITION OF "LEVEL"
During the initial RECON study the Working Task Force, for discussion purposes, considered levels of encoding detail of machine readable catalog records in relation to the conditions under which conversion might occur. A level was distinguished by differences in 1) the bibliographic completeness of a record, and 2) the extent to which its contents were separately designated. With respect to the latter point, the RECON report stated:

"A machine format for recording of bibliographic data and the identification of these data for machine manipulation is composed of a basic structure (physical representation), content designators (tags, delimiters, subfield codes), and contents (data elements in fixed and variable fields).
Although the basic structure should remain constant, the contents and their designation are subject to variation. For example, a name entry could be designated merely as a name instead of being distinguished as a personal name or corporate name. When a distinction is made, a personal name entry can be further refined as a single surname, multiple surname, or forename. Likewise, if a personal name entry contains date of birth and/or death, relationship to the work (editor, compiler, etc.), or title, these data elements can be identified or can be treated as part of the name entry without any unique identification. Thus individual data elements can be identified at various levels of completeness." (3)

Appendix F of the RECON report tentatively defined three levels:

"Level 1 involves the encoding of bibliographic items according to the practices followed at the Library of Congress for currently cataloged items, i.e., the MARC II format. A distinguishing feature of level 1 is the inclusion of certain content designators and data elements which, in some instances, can be specified only with the physical item in hand.

"Level 2 supplies the same degree of detail as in level 1 insofar as it can be ascertained through an already supplied bibliographic record . . . .

"Level 3 would be distinguished by the fact that only part of the bibliographic data in the original catalog record would be transcribed. In addition, content designators might be restricted . . ." (4).

At the outset of the present study, however, it was recognized that incomplete bibliographic description is not acceptable in records for national use. In addition, it seemed that the question of having a level below level 2 really arose from a desire to define a machine readable record with a lesser degree of content designation rather than one with less complete bibliographic data.
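The collapsing of content designation that the report describes can be sketched in code. The following is a hypothetical illustration only: the function, the dictionary layout, and the sample data are invented for this sketch and are not the RECON Task Force's specification. A fully designated personal-name field, with its tag, indicator, and subfield codes, is reduced to a generic name entry in which the individual data elements lose their unique identification.

```python
# Hypothetical illustration only: reducing the content designation of a
# MARC-style name entry. The layout and sample data are invented; they are
# not the RECON specification.

def reduce_designation(field):
    """Collapse a fully designated name field into a generic name entry."""
    return {
        "tag": "NAME",        # personal/corporate distinction dropped
        "indicators": None,   # single/multiple surname refinement dropped
        # dates, relator, etc. remain in the text but lose their
        # individual subfield identification
        "value": " ".join(value for _code, value in field["subfields"]),
    }

full = {
    "tag": "100",             # personal name main entry
    "indicators": "1",        # single surname
    "subfields": [
        ("a", "Smith, John,"),
        ("d", "1900-1980."),  # dates of birth and death
        ("e", "ed."),         # relationship to the work
    ],
}

reduced = reduce_designation(full)
```

The reduced record still carries the same text, which is the point made above: a lower level simplifies the designation, not necessarily the bibliographic data.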
It was decided, therefore, to concentrate the study effort on this task, and the original formulation of level 3 was discarded. On further consideration, it was realized also that the distinguishing feature between levels 1 and 2 was not significant. Omission of data elements that cannot be determined unless the book is in hand may simplify an individual record but does not simplify the content designators in the format, because these elements are often present in other records. Thus, as far as content designation is concerned, levels 1 and 2 (as originally defined) were in fact the same. Once this similarity became apparent, it was recognized that the specification of levels really depended on the functions of machine readable catalog records from the standpoint of national use.

Journal of Library Automation Vol. 3/2 June, 1970

FUNCTIONS AND LEVELS
On the basis of present knowledge, it seems that machine readable records will serve two primary functions for national use. The first involves the distribution of cataloging information in machine readable form for use by library networks, library systems, and individual libraries; the second involves the recording of bibliographic data in a national union catalog to reflect the holdings of libraries in the United States and Canada. In this report, the first is called the distribution function; the second is called the national union catalog (NUC) function. Each of these functions can be related to a distinct level of machine readable record.

The Distribution Function
The distribution function can best be satisfied by a detailed record in a communications format from which an individual library can extract the subset of data useful in its application. At the present stage of library automation, it is impossible to define rigorously all of the potential uses of machine readable catalog records.
Thus, there is no way to predict which data elements may not be needed or to rank them according to their value to a wide variety of users under different circumstances. To confirm the wide variation in treatment of the MARC II format, an analysis was made of the use of MARC content designators by eight library systems and emerging networks. The data from this analysis were synthesized for presentation in two tables. Table 1 shows the acceptance of content designators in terms of the absolute number of libraries using them. It should be read as shown by the following examples: 1) 26 of the 63 MARC tags are used by all eight libraries; 2) 92 of the 126 indicators are used by only three libraries.

Table 1. Use of MARC Content Designators by 8 Library Systems or Networks

Number of           Number of items
libraries    Tags (63)   Indicators (126)   Subfield codes (181)
8               26              --                   1
7                6              --                  88
6                3               2                  45
5                5               7                  15
4                3               9                   9
3                2              92                  11
2                4              16                   9
1                7              --                   3
None             7              --                  --
Note: Only six libraries supplied information on fixed fields.

Table 2 shows the acceptance of content designators in relative terms. Thus, if only three libraries were using a particular tag and all used the associated subfield codes, the acceptance of those subfield codes was calculated as 100 percent. In both Tables 1 and 2, the columns on indicators and subfield codes include responses only from those libraries that were definitely using the tag with which a given indicator or subfield code was associated. The analysis excludes tags for which no immediate implementation is planned by the MARC Distribution Service.

Table 2.
Percentage of Acceptance of MARC Content Designators by 8 Library Systems or Networks

Percent of              Number of items
libraries    Fixed fields (19)   Tags (63)   Indicators (126)   Subfield codes (181)
100                  1              26              --                  10
75-99               13               9               2                 134
50-74                4               8              16                  32
25-49                1               6             108                   5
1-24                --               7              --                  --
0                   --               7              --                  --

The major findings of this analysis may be summarized as follows: 1) Of 19 fixed fields, 14 were used by at least half of the libraries and all were used by at least one library. 2) Of 63 tags, 43 were used by at least half of the libraries and 26 were used by all of them. Seven tags were not used by any of the libraries studied, but these tags cover items that will appear in machine records produced by the National Library of Medicine, the National Agricultural Library, and the British National Bibliography. 3) Of 126 indicators, only 18 were used by at least half of the libraries. The highest degree of acceptance was the use of the same two indicators by six libraries. On the other hand, each indicator was used by at least two libraries. 4) Of 181 subfield codes, 176 were used by at least half of the libraries that were using the related tags. Each subfield code was used by at least a quarter of the libraries that could express a relevant opinion.

The foregoing analysis confirmed the view that a nationally distributed record should be as rich in content designation as possible. Failure to provide this detail would result in many libraries having to enrich the record to satisfy local needs, a process more costly than deleting items selectively. Therefore, as of now, the present MARC II format constitutes the level required to satisfy the national distribution function.

The National Union Catalog Function
As noted above, the NUC function relates to the use of machine readable records to build a national union catalog. At first thought, it might appear that this function overlaps the distribution function.
As far as Library of Congress cataloging is concerned, this view is correct. It is valid also with respect to cooperative cataloging entries issued by the Library as part of the card service. However, the two functions are quite distinct as far as regular reports to NUC are concerned. The essential difference between the two categories of catalog records is that those issued as LC cards have been completely checked against the Library's authority files and edited for consistency, whereas only the main and added entries of NUC reports have been checked for compatibility. The impact of this difference can be judged from the fact that an attempt to distribute NUC reports as proof slips several years ago was abandoned because the response to this service did not justify its continuance. Distributing NUC reports in machine readable form would add another dimension to the problem of processing them, because, to be flexible enough for wide acceptance, NUC reports would have to be entirely compatible with those issued by the MARC Distribution Service. Since compatibility would involve more detailed content designation than many libraries might put into their records for local use, libraries would have to be willing to provide this detail in NUC reports, or the level of NUC reports would have to be upgraded centrally. As the certification of the bibliographic data and the content designators would entail a major workload for the Library of Congress, it does not seem practical to pursue this goal at present. It is possible, however, to define a subset of content designators to cover the eventuality that outside libraries may be able to report their holdings to NUC in machine readable form. A MARC subset can be determined for the NUC function because this function involves processing records in a multiplicity of places to be used centrally for specifically definable purposes.
The distribution function, on the other hand, involves the preparation of records at a central source to be used for a wide variety of purposes in a multiplicity of places. The difference is vital when it comes to stating the requirements for the two types of records. The specifications of a machine readable record to fulfill the NUC function depend on the nature and functions of the national union catalog itself. The content designators for such a record will be defined in a separate investigation now being conducted by the Working Task Force. The present study was considered to be completed once the feasibility of defining a level of machine readable record for that purpose was established.

CONCLUSION
The findings of this study of the feasibility of defining levels of machine readable bibliographic records are as follows: 1) The level of a record must be adequate for the purposes it will serve. 2) In terms of national use, a machine readable record may function as a means of distributing cataloging information and as a means of reporting holdings to a national union catalog. 3) To satisfy the needs of diverse installations and applications, records for general distribution should be in the full MARC II format. 4) Records that satisfy the NUC function are not necessarily identical with those that satisfy the distribution function. 5) It is feasible to define the characteristics of a machine readable NUC report at a lower level than the full MARC II format.

REFERENCES
1. RECON Working Task Force: Conversion of Retrospective Catalog Records to Machine Readable Form (Washington, D.C.: Library of Congress, 1969).
2. Avram, Henriette D.: "The RECON Pilot Project. A Progress Report," Journal of Library Automation, 3 (June 1970), 10-22.
3. RECON, op. cit., p. 43.
4. Ibid., p. 164.
ON-LINE SERIALS SYSTEM AT LAVAL UNIVERSITY LIBRARY
Rosario de VARENNES: Director, Library Analysis and Automation, Laval University Library, Cite Universitaire, Quebec, Canada

Description of a system, operational since June 1968, that provides control of all serials holdings in nine campus libraries, permits updating of the complete file every two or three days, and produces various outputs for library users and library staff from data in variable fields on disks (listings, statistics, etc.). The program, presently operating on an IBM 360/50 and utilizing an IBM 2314 disk-storage facility and three IBM 2260 CRT terminals, is written in IBM System/360 Operating System Assembler Language and in PL/I; it could encompass a file of no more than 10 million records of variable length limited to 127/255 characters and subdivided into 25 or fewer fields.

L'Universite Laval, the oldest French university in America, around 1950 began a move from the original location in historic Old Quebec to a new campus in suburban Sainte-Foy; the general plan calls for a total investment of $235,000,000. This private institution, subsidized by the Provincial Government at about 75%, had an operating budget in 1968/69 of $32,000,000 (research not included); of this sum, $2,300,000 was appropriated for the library system. The enrollment of full-time students was 10,145 and the total registration 22,726. The regular teaching staff amounted to 1,016 and the total figure was 1,691. The library serving this community constitutes a unified system under one administration, with centralized technical processing, but with nine physical locations (one of which is still in the Old City) and four auxiliary services: Documentation Center, Rare Books and Archives, Map Library and Film Library. The most recent addition to it is the Main Library Building, dedicated in June 1969, a $10,000,000 seven-story complex of 424,000 square feet (1).
The Library staff consists of 269 employees, of which 78 are professional librarians or specialists. The Serials Department totals fifteen employees, of which three are professional librarians. The collections as of August 1969 represent 815,966 physical units, or 433,407 cataloging units of books, periodicals, government publications, pamphlets and microtexts; and 88,734 physical units of special collections (maps, photos, films, filmstrips, music records, manuscripts, archives). The serials alone account for 189,440 bound volumes and 16,335 titles, of which 12,396 are received currently and 7,934 are subscriptions. The figures for serials titles will probably reach the 20,000 mark with the completion in 1970 of an inventory started in 1964.

The library automation venture at Laval goes back to the autumn of 1963, when an off-line serials system and a subject headings list program were begun. Along the way, the Documentation Center developed a special technique of information storage and retrieval utilizing the Recordak Miracode (Microfilm Retrieval Access Code) System and a program called ASYVOL 2 (Analyse Synthetique par Vocabulaire Libre/Synthetical Analysis by Free Vocabulary), by means of which various indexes and research projects are currently processed. Recently the first on-line real-time program, the new serials system, went into successful operation. Some literature, mostly in French, has been issued concerning these achievements and projects, but has been little publicized (2-11). It is also worth noting that the Library, except for some peripheral equipment, mostly input devices, does not own any machinery and is utilizing instead the programming staff and the computer facilities of Laval University's general-purpose Computing Center.
In the Library itself the author of this article is mainly responsible for preliminary analysis of projects, for coordination of activities between the Library and the Computing Center, for the supervision of work done in library automation units integrated into library services, and for the administration of the budget appropriated for library automation. This last item, research projects not included, is $170,000 for the year under discussion.

SYSTEM DESIGN
Contents of the File
In its present organization, the serials file is accessible only by an accession number limited to seven digits and ordered corresponding to the alphabetized entries of records. There is a distinct entry for every title and every reference and for each duplication of any one title or reference. All records fall within two main divisions: humanities, represented by H, and sciences, represented by S, and are further identified by subdivisions of these main classes to a limit of three letters (for example: HH, Main Library; HMU, Music Library; SA, Agricultural Library; SCC, Science Library, Department of Chemistry). There is a possibility of 25 fixed/variable fields for any record, but only 18 are currently used (Figure 1).

[Fig. 1. Input Sheet: field codes A TITRE; B SIGLE; C PERIODICITE; D REPERTOIRES DE DEPOUILLEMENT; E AB./DON/ECH.; F LIEU DE PUBL.; G PAYS/LANGUE; H COTE; I DATE DE PARUTION INITIALE; J EDITEUR ET SON ADRESSE; K PRIX; L NOTE HISTORIQUE; M TITRE DIRECT; N VEDETTES-MATIERE 1, 2; O RENOUV. COMM.; P ETAT DE COLLECTION DE L'ANNEE COUR.; Q ETAT RETROSPECTIF DE LA COLLECTION; R ETAT RETROSPECTIF DE CE QUI MANQUE; plus MATRICULE NO and six one-digit SIGLE columns (courant/non courant; publication officielle/non officielle; annuel ou continuation/les autres; voir/titre du periodique; collection non complete/complete; periodiques d'hopitaux).]

As of September 2, 1969, the statistical figures for the complete file were as follows: 22,530 entries, of which 16,335 were titles; 6,192 references; three entries were unspecified by error.

Hardware
The system is operational with an IBM 360/50, an IBM 2314 disk-storage facility and IBM 2316 disk packs, three IBM 2260 CRT display units, an IBM 2848 display control unit, an IBM 2401 tape transport and control unit and an IBM 1403 high-speed printer. The program system occupies a 56,320-byte region of core memory.

Software
The system developed at Laval provides essentially for two things: the record display on CRT terminals for questioning or modifying the records; and the updating of the serials file (Figure 2).

[Fig. 2. Serials File Updating.]

It is not affected by the bibliographic contents of records; the control of this part is the responsibility of the Serials Department. The system could encompass any file of no more than 10 million records of variable length limited to 127/255 characters and subdivided into 25 or fewer fields. The program is written in IBM System/360 Operating System Assembler Language, except for the output and printing routines written in PL/I, and it is conceived for an IBM 360/30 model or one of a higher number, matched at a minimum with one magnetic tape, one disk, one 2848 control unit and one 2260 CRT terminal; it operates under the control of operating system OS/360, version 14 or subsequent (12, 13).

[Fig. 3. Communication between Modules.]

The system is roughly subdivided into three subsystems, the first being the control routine for the system and CRT terminals, developed by the
Computing Center of Laval University and called RACINE (ROOT). The second is a subsystem that consists of display and updating routines, a group of 18 modules falling under two control sections (CSECT): LINKAGES and MARCHAND (family name of an analyst-programmer from B.I.R.O.). Each is again constituted of various subprograms, and MARCHAND includes also all literals of the program. All these modules communicate in various ways, as illustrated in Figure 3. Modules not within the large box constitute the CSECT MARCHAND; other CSECTs are within the large box. The third subsystem is a modification routine of records on disk called MODOSSIF (Modification des Dossiers du Fichier/Modification of Records on File). The IBM Linkage Editor links these routines, and they communicate 1) by way of specific registers; and 2) by way of working space areas, some common to all terminal stations and some restricted to one in particular. The main purpose of the system being to give the user up-to-date information, it is implied that information concerning modified records should be available as soon as the transaction is performed. The IBM Indexed Sequential System seems at first sight to answer this need ideally. Nevertheless, the Library was forced to elaborate a more complex system for the sake of security.

Data Sets (Figure 4)

[Fig. 4. Data Sets: the master file, the COMMUNE and UNITE working areas, MODOSSIF, and the "restart" procedure.]

There exists a master file in direct access on disk, with a backup file on tape. When a record is asked for, the accession number of the record is transmitted to the program and searched for in the file; if found, it is duplicated completely in the working space area on disk corresponding to a particular terminal, and is displayed on the CRT nine lines at a time.
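The retrieval path just described can be sketched as follows. This is an illustrative model only: the actual system was written in System/360 Assembler Language and PL/I, and the names and sample data below are invented. A record found by accession number in the master file is duplicated into the working space area of the requesting terminal and shown nine lines at a time.

```python
SCREEN_LINES = 9  # the CRT displays a record nine lines at a time

# Invented sample data: master file keyed by seven-digit accession number.
master_file = {
    "0001234": ["A SOME TITLE", "B 0000010 SC", "Q NO 50-67, 83, 97"],
}
working_area = {}  # one working copy per terminal

def fetch(accession, terminal):
    """Duplicate a master-file record into a terminal's working area."""
    record = master_file.get(accession)
    if record is None:
        return None
    working_area[terminal] = list(record)  # a copy; the master is untouched
    return working_area[terminal]

def screens(lines):
    """Group a record's display lines into screenfuls of nine lines."""
    return [lines[i:i + SCREEN_LINES]
            for i in range(0, len(lines), SCREEN_LINES)]
```

Working on a duplicate rather than on the master record itself is what allows the system, as the next paragraphs describe, to discard an unmodified copy or to accumulate amended copies for a later batch update.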
In case of a modification being asked for, it is on this copy in that particular area that the program MODOSSIF operates. In switching to another demand, the program checks to see if any modification occurred. If not, the copy is destroyed; otherwise the amended record is transferred to the temporary common working space area on disk called BPAM (Buffer Periodiques Amendes/Buffer Amended Serials), where all modifications accumulate from one updating of the master file to the other. If queried anew before updating, the same amended record will be retrieved from the BPAM file and duplicated as before. Moreover, any instruction concerning modifications is chronologically recorded on tape as given and constitutes the LOG (Figure 2). If any down time occurs, it is then possible to simulate all the transactions performed since the last updating. Updating, normally a daily process, is basically the merger of the master file with the BPAM file, resulting in the creation of a new master file on disk and a new backup on tape.

Record in the File
As mentioned before, any record in the file is identified by an accession number of seven digits. Number 0000000 identifies the system's messages and is always displayed first, and number 9999999, indicating that a working space area is not occupied, is not to be used. Otherwise all numbers are symmetrical and interchangeable. Any record may cover up to 25 fields or blocks of logical information. These fields are identified by letters A to Y and put into alphabetical order. They vary in length from three to many thousand characters. Each field is divided into three elements: identifying letter; information to display; and end-of-field or end-of-record control tag. This tag is FD (Fin du Dossier/End of Field) for all fields except the last one, which is tagged FE (Fin de l'Entree/End of Record). The information to be displayed is subject to various restrictions, exemplified in detail in the instructions manual (13).
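A minimal sketch of this field layout, under the stated conventions (identifying letter, information, FD after every field, FE after the last), might look like the following. The parser and sample record are invented for illustration, and no attempt is made to handle information that itself contains the letter pairs FD or FE.

```python
def parse_record(text):
    """Split a record of the form 'A ...FD B ...FD ... FE' into a
    dictionary keyed by the identifying letters A to Y.  Assumes a
    well-formed record whose data never contains 'FD' or 'FE'."""
    fields = {}
    pos = 0
    while pos < len(text):
        if text[pos] == " ":           # skip separators between fields
            pos += 1
            continue
        letter = text[pos]
        fd = text.find("FD", pos)
        fe = text.find("FE", pos)
        end = min(e for e in (fd, fe) if e != -1)
        fields[letter] = text[pos + 1:end].strip()
        pos = end + 2                  # step past the FD/FE control tag
    return fields

# Invented sample mirroring the record shown in Figure 5.
sample = "A TOKYO BUNRIKA DAIGURU, SCIENCE REPORTSFD B 000010 $SCFD W $VIDEFE"
parsed = parse_record(sample)
```

The single-letter identifiers make the variable-length fields self-describing, so a record need carry only the fields it actually uses.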
The manual, in fact, puts the main program MODOSSIF into action. Physically, any record on file is subdivided into many subrecords of fixed length (L EQU nnn), optimized at 239 bytes, to a maximum of 127 per record. Each subrecord is addressed in three sections as follows: 235 bytes of information, three bytes representing the accession number in binary code, and one byte giving the sequence number of the subrecord under this particular accession number. This way, the last four bytes give the key to the subrecord in the master file, and the last byte the key of access in the working-space area, making it possible to execute MODOSSIF and various print-out routines. To facilitate the retrieval of any particular field in a record, the first 26 bytes of the first subrecord are set aside for an index to the fields. The first 25 bytes represent fields A to Y; in each position, a binary zero points to a nonexistent field and a positive value indicates the sequence of the subrecord where the field starts. The 26th byte gives the total number of subrecords in the record. The 27th byte gives the name of the first existing field, etc. Figure 5 is an example of a complete record. Underlined sections indicate hexadecimal notation. Each row in the figure is a subrecord, here given an unreal length of 40. The remainder is in alphanumeric characters, except that space is compressed and indicated by a dollar sign. The information in Figure 5 appears as follows on the CRT screen:

A TOKYO BUNRIKA DAIGURU, SCIENCE REPORTS
B 000010 SC
D ANNUAL REPORTS OF SCIENCES
Q NO 50-67, 83, 97
T CHIMIE - CHIMIE INDUSTRIELLE
U A ET B C
V A-B C
W VIDE
X REVUE ANNUELLE DE CHIMIE
Y CE DOSSIER EST DRESSE A TITRE D'EXEMPLE SEULEMENT.

Varia
The program also provides the parameters for each of the lines displayed on the screen, that is, nine screen-parameters called PARMEC (Parametres-Ecran).
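The subrecord layout described above (235 bytes of information, a three-byte binary accession number, and a one-byte sequence number, 239 bytes in all) can be sketched roughly as below. This is an illustrative model, not the actual Assembler implementation, and the 26-byte field index of the first subrecord is omitted for brevity.

```python
INFO_BYTES = 235  # information portion of each 239-byte subrecord

def to_subrecords(accession, data):
    """Split a record's data into fixed-length subrecords; the last four
    bytes of each are the 3-byte binary accession number plus a 1-byte
    sequence number, the key to the subrecord in the master file."""
    subrecords = []
    for seq, start in enumerate(range(0, len(data), INFO_BYTES), start=1):
        chunk = data[start:start + INFO_BYTES].ljust(INFO_BYTES)  # pad last
        key = accession.to_bytes(3, "big") + bytes([seq])
        subrecords.append(chunk.encode("ascii") + key)
    return subrecords

# A seven-digit accession number fits comfortably in three binary bytes.
subs = to_subrecords(1234567, "X" * 500)  # 500 information bytes
```

Because the one-byte sequence number alone addresses a subrecord within the working-space area, the 127-subrecord ceiling mentioned above follows naturally from the layout.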
[Fig. 5. Example of Complete Record.]

The analyst-programmers at the Computing Center completed the program with various printing subprograms drawing on the data in variable fields on disks, with a statistics subprogram and with a control routine for the indexes to the file. Recently another addition occurred to the system when Les Presses de l'Universite Laval, the Library's subscription agent for serials, decided to utilize the file to initiate a computerized ordering process. The programming for this project was tested during October and the program was successfully run during the first week of December 1969.

IMPLEMENTATION
As soon as it was confirmed that the Computing Center would receive by Summer 1967 a third-generation computer (IBM 360/40), it was deemed advisable to contemplate an on-line system to replace the already saturated off-line serials system on the IBM 1410 inaugurated in 1964 (8). An optimistic target date having been set for January 1968, the author transmitted to the Computing Center in April 1967, for study, a working hypothesis concerning the automatic conversion of holdings data and the automatic claiming of missing issues (14). In answer to it, in August 1967, Mr.
Jean Lachance, analyst-programmer, proposed a first draft of an automatic control system for serials (15). In fact the draft envisaged only a semi-automatic conversion of data and the on-line system for current entries only, the non-current being managed off line. Then, on account of various restrictions befalling the Computing Center, it was decided to call upon an external firm, B.I.R.O., Inc., located in Quebec City. The contract, signed at the end of November 1967, provided basically for: 1) the conversion of the master file on magnetic tapes, containing records in fixed fields, to a random access file on disks, with records in variable fields; 2) the programming of record display on CRT terminals; 3) the updating of the file via coded input procedures; 4) the provision of transitory working space areas for current transactions; 5) the possibility of questioning and amending both the master file and the transitory file; 6) the writing of the appropriate technical documentation and the initial training of the operators of the terminals. The contract was to be in conformity with the standards of operating system OS for an IBM 360 computer and subject to the acceptance of the Computing Center of Laval University. At the same time a working schedule was established as follows: 1) beginning of work as soon as the contract was signed; 2) operational program sixty days after delivery to the contractor of terminals in good working condition according to manufacturer's specifications; and 3) termination of contract thirty days after acceptance of the finished product. The terminals were ready by January 25, 1968. The program was declared operational by April 11, 1968, and the technical report describing it deposited the week after. Meanwhile a last updating of the master
On April 29, 1968, the conversion of the file to IBM 2311 disks con- nected to an IBM 360/40 was realized. Everything was then ready for the final test. Unhappily, Mr. Lachance left Laval at the end of April at the most crucial moment, and it was not before June 12, 1968, that the first updating succeeded and the program became operational. From then on, apart from various technical problems, there were other diffi- culties: a moving of the main library during August, a moving of the Computing Center that precluded any activity from September 26 to October 25, and a switch to an IBM 360/50 and to an IBM 2314 disk- storage facility during October-not to mention a turnover of staff in the Serials Department. These prevailing conditions explain why the program was not officially accepted by the Library before December 16, 1968. OPERATION The first of a series of turning points in the refinement of the program occurred in November 1968 with a normalized run of updatings. In January 1969 the two first printing programs ran successfully; these were a daily checking list with calendar, and a statistics subprogram (Figure 6). In February 1969 the program produced almost error-free updatings (0.8% and 0.63%), and in June 1969 the system was finally debugged, eliminating a particular recurring anomaly accounting for most of the errors in the system (a display of preceding instruction bearing acces- sion number and code in some field o:i subsequent record). Some other technical difficulties were encountered along the road and TASlEAU DES SlAliSTI~U 1000 Serial document index terms (Lehigh U) inverted index 5. MULTILIST Varies Threaded list any chosen key term to fit aapli- Tree structure cation ( e! author, subject, ate, directory title wor , subject headings) 6. MARC/ 2000 MOLDS Cell-matrix any discrete data block 7. NASA/RECON 270,000 ? subject l { author corporate qualified date source by report# contract# 8. 
TIP (MIT): >25,000; list structure; author(s), location (where work done), citation identification (i.v-p.), article title (entire, keyword), citation index, bibliographic coupling
9. SUNY Biomedical Communication Network: >20,000; inverted index; author, title, subject, qualified by date, language
Note: Each command is a subroutine; commands are tailored to the application.
[Table 1, continued: further columns record each system's access to authority files (on-line UDC schedules; subject category list and index term file; index terms; optional; or none), whether related terms or cross references are given, the number of commands in the query language (e.g., 11; 14, of which 11 use the light pen; 0; 35; 16 function keys; 9 plus various MAC commands; 10), whether computer instruction in use of the language is offered, whether the computer aids query formulation (conversation), whether root-word search is provided, and the communication link (CRT, CRT with light pen, Teletype, or IBM 2740 console).]

LC MARC on MOLDS/ATHERTON and MILLER 145

BACKGROUND
A number of interactive retrieval systems have been designed and implemented within the last few years. The features and potential of LC/MARC ON MOLDS are best viewed in relation to what has been done in the field up to now. To gain some perspective, the major features of data base structures and query languages of other interactive systems are summarized in Table 1. This table presents those features of most interest to librarians who may wish to compare searching on a computer with searching in the card catalog or other bibliographic reference tools. References 3-12 document sources for the data in this table.

MOLDS DATA BASE STRUCTURE
The general structure of the data base with which MOLDS operates is, in comparison with the threaded lists and inverted indexes found in many retrieval systems, extremely simple and unsophisticated.
The data base can be composed of from one to ten distinct files of 1000 records each. A record is equal to the bibliographic description on a card in a library catalog. Each record may be up to 300 computer words (1200 characters) long and may be subdivided into 80 blocks. Originally there was a 200-word (800-character) limitation on record size, but this has now been expanded. The total file size (limit of 10,000 records) is adequate for testing purposes, but expansion beyond the present limitations is planned in order to make the system more practical for actual use.

The structure of a file is essentially a simple matrix. Each row contains all the elements of a single complete record; each column contains all like discrete items of all the records in the file. The columns are called blocks in the MOLDS system, block and field being used synonymously in this report. For example, a library catalog card for one publication would be a record in a file composed of library catalog records. The main entries in the file constitute a block and the dates of publication constitute another block. Figure 1 illustrates the data base structure, as of 1968. In this illustration the maximum number of files is 10 (1000 records each) and the maximum number of blocks 80.

Each file and each block in a file is given a name and/or number. A user can reference or call up any file or data block within a file by using its name or number in a MOLDS query language command. There are as many access points to a file as there are blocks in that file. This is in contrast to a conventional card catalog, for example, where the only access points are filing entries: main entry, title, subject(s), added entries, series, and analytics.

No specific provision is made within the MOLDS system for the storage of authority files, cross reference lists, or other intermediate keys to the records.
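The file-as-matrix idea can be sketched in a few lines of modern code. The dictionary representation and sample values below are illustrative assumptions, not the FORTRAN IV implementation described later: a file is a list of records, each record a fixed set of named blocks, and any block can be searched directly, which is why every block is an access point.

```python
# Sketch of the MOLDS file-as-matrix structure (illustrative only).
# Rows are records; the named columns ("blocks") are discrete data items.
catalog_file = [
    {"MAIN": "SMITH, JOHN", "TITL": "ANATOMY", "DATE": "1966"},
    {"MAIN": "JONES, MARY", "TITL": "SURGERY", "DATE": "1967"},
]

def block(file_, name):
    """Return one column of the matrix: the named block from every record."""
    return [record[name] for record in file_]

def find(file_, block_name, value):
    """Any block can serve as an access point: exact-match retrieval."""
    return [r for r in file_ if r[block_name] == value]
```

Here `find(catalog_file, "DATE", "1967")` retrieves by publication date just as easily as by main entry, since both are simply columns of the matrix.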
Such files are not absolutely necessary for effective operation of the system, since every block can be accessed and can serve as its own authority file. For more efficient system operation, however, it is intended to explore the possibility of creating authority files as part of the data base, beginning with portions of the Seventh Edition of the Library of Congress List of Subject Headings.

[Figure 1. Section of General MOLDS Data Base Structure: files A, B, and so on each consist of up to 1000 records (rows) divided into blocks (columns) A1 through A80, B1 through B80, etc. As of 1968 the maximum number of files is 10, each of 1000 records, with a maximum number of 80 blocks.]

Provision is made for temporary user storage areas in which the user places the results of his retrieval and processing operations. Data in the user area is retained only during the session in which it is created. Although it cannot be saved for use at a later date, all or part of it can be printed out on the on-line printer for the user's later reference.

While the general structure of the data base is formalized within the MOLDS system, the content and specific organization of a particular data base is determined by its originator. This feature, plus the simplicity of MOLDS' own structure, introduces a great deal of flexibility into the data base and the use that can be made of it. The originator of the data base may designate as a block any discrete data item he wishes. If the user population is dissatisfied with results using one content and arrangement of blocks, the base can be reformatted and restructured in a fairly simple maintenance run. No problems of linking records or modifying authority lists arise, as neither is part of the system.
The first version of the LC/MARC data base has in fact been modified by addition of three blocks and division of one block in half to form two blocks, giving access to smaller units of data.

The LC/MARC Data Base in MOLDS Format

Library of Congress MARC Pilot Project tapes containing some 40,000 records of English language books cataloged in 1966-67 became available for this project in the Fall of 1967. Because of the MOLDS data base limitations, a subset of these catalog records was selected for use with MOLDS. The original plan was to have each file in the data base consist of as complete a set as possible of all MARC Pilot Project records from a single Library of Congress classification schedule. The candidate for the first file was class R (Medicine), which contained just under 1000 records. Later MOLDS files were formed for two other LC classes: T (Technology) and Z (Bibliography and Library Science). In mid-1969 two stratified sample files of the MARC data base were created, one in the humanities, another in the social sciences. In all, Syracuse has a MARC/MOLDS data base of 10,000 records.

The record format of the MARC tape was first analyzed to determine which fields should be included in the data base, and which might be omitted. The criterion for selection was probable usefulness to searchers of the data base, a conception that should undoubtedly be modified as searches are monitored. Appropriate changes would not be difficult.

Toward the end of January 1969, a programming project was begun which entailed the design and implementation of a computer program to perform format conversion of the Library of Congress MARC I bibliographic file to satisfy MOLDS data base requirements. The project represented a three man-month effort and was completed by June 1969.
The data-base converter program represents an attempt to provide a user-oriented facility for creating a MOLDS data base from MARC information. Essentially, the user of the program describes each MOLDS file to be produced by specifying:

1) the number of (fixed) fields per MOLDS record;
2) the name and size (in characters) of each field in the MOLDS record;
3) the name of the MARC I field from which the data are to be taken;
4) selection criteria according to which MARC I records are to be chosen for conversion;
5) for any MARC I field, a data conversion procedure to be applied prior to transferring the information to the appropriate MOLDS field;
6) whether or not diacritical codes should be stripped from the MARC I field prior to transferring the information to the MOLDS field;
7) whether or not character translation from lower-case to upper-case codes should be performed on the data prior to transfer from the MARC I to the MOLDS field.

Although the program has not yet been refined to the extent originally intended, it nevertheless contains all the features indicated above and has been used to create ten MOLDS files since its completion. The program is written in PL/I and more fully documented in a report available from the National Auxiliary Publication Service of ASIS.

MOLDS requires fixed-field input for its data base, but many of the fields or data blocks on the MARC tape are variable in length. Therefore, the field lengths of 200 records in the class R (Medicine) subset were examined to determine the maximum size which would produce a MOLDS record within the original 200-computer-word (800-character) limitation and still retain all the desired data. This limitation was easily expanded to 300 words, allowing addition of new fields and expansion of existing fields as new MARC/MOLDS files were generated.
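The seven specifications amount to a declarative, per-field conversion table. The following is a minimal sketch of that idea under simplifying assumptions: the MARC record is represented as a dictionary, the source field names are invented, and only fixed sizing and case translation are shown (the actual converter was written in PL/I, read MARC I tapes, and also handled selection criteria and diacritic stripping).

```python
# Minimal sketch of the converter's per-field specification table
# (illustrative; field names and record layout are assumptions).
# Each MOLDS field is described by name, fixed size, source MARC
# field, and whether to translate the data to upper-case codes.
field_specs = [
    # (molds_name, size_in_chars, marc_source_field, to_upper)
    ("MAIN", 68, "main_entry", True),
    ("TITL", 80, "title", True),
    ("DATE", 4, "date", False),
]

def convert(marc_record):
    """Build one fixed-field MOLDS record from a variable-field MARC record."""
    molds = {}
    for name, size, source, to_upper in field_specs:
        data = marc_record.get(source, "")
        if to_upper:
            data = data.upper()                 # MOLDS requires upper-case codes
        molds[name] = data[:size].ljust(size)   # truncate or pad to fixed size
    return molds
```

Padding every field to its maximum size is exactly why a 500-character variable-length record grows to about 800 fixed-field characters, as noted below.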
A record whose original variable length was 500 characters or less expanded to about 800 characters when converted to fixed-field form. In the first data base only records of 500 characters or less were considered for inclusion, which gave a total of 620 records in the first MARC/MOLDS file. By mid-1969 this data base was greatly enlarged using the program described above. The names of the present MARC/MOLDS files are: SS01, SS02, SS03, SS04, SSOZ, and SSOH. The first files generated were called MARC and MARZ.

The MARC/MOLDS format now in use is given in Table 2. The additions made to the original format are noted. MARC/MOLDS block names can be used instead of block numbers; for ease of searching, both name and number are given in the table. The MOLDS block number corresponds to MARC Pilot Project field tags whenever possible. After this second revision had been completed, the MARC II (13) format with new field tags appeared. Interestingly, there were remarkably few differences.

Creating an information retrieval system from other data bases can present some major headaches. During the first test session with the MARC/MOLDS data base, it was discouraging to find that successful retrieval operations could not be performed on such vital items as subject or main entry (blocks SUBA and MAIN, respectively). The problem lay in the fact that the lower-case character codes employed on the MARC tape had not been converted to the all upper-case codes required by MOLDS. Once discovered, the problem was easily remedied. Other problems were not so easy to solve. The MARC data base had been received in a "raw form": there were typographical errors in the original tapes, irregular spacing, and incorrect punctuation, spelling, and abbreviations. There was no way to detect these errors, and the retrieval program would only work on direct matches of query and document information elements.
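The case-code failure is easy to reproduce. Since MOLDS matches query and record values character for character, a record carrying "Medicine" can never satisfy a query for "MEDICINE"; translating the records to upper case at load time restores the match. A small illustrative sketch (the block name and sample value are assumptions):

```python
# Why the un-translated lower-case codes broke retrieval: exact-match
# comparison fails on any case difference. Illustrative only.
def exact_match(records, blk, value):
    return [r for r in records if r[blk] == value]

raw = [{"SUBA": "Medicine"}]                       # as received on tape
assert exact_match(raw, "SUBA", "MEDICINE") == []  # the search fails

# Normalizing to upper case at conversion time restores the match.
fixed = [{k: v.upper() for k, v in r.items()} for r in raw]
assert exact_match(fixed, "SUBA", "MEDICINE") == [{"SUBA": "MEDICINE"}]
```

The same exact-match discipline is why the typographical errors and irregular spacing on the raw tapes were so damaging: every variant spelling is an unreachable value.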
The MOLDS language (to be discussed subsequently) required a good deal of standardization and regularity of the records to take full and effective advantage of its retrieval capabilities.

Table 2. MARC/MOLDS Data Base Format

Description                    MOLDS     Blk   Chars     MARC I Fixed     Field Values
                               Blk Name  No.   in Block  Field Position   or Explanation
                                                         or Tag No.
MARC Fixed Fields:
LC card no.                    LDN0      80    11        9-19
Type of main entry             TYPE      81    1         21               A-G
Form of work                   F0RM      82    1         22               M/S
Bibliographies indicator       BIB       83    1         23               X/b
Illustrations                  ILLU      84    1         24               "
Maps                           MAP       85    1         25               "
Conferences                    C0NF      86    1         26               "
Juvenile                       JUV       87    1         27               "
Languages                      LANG      88    4/4       29-36            Both languages
Language 1                     LAN1      1     4         29-32
Language 2                     LAN2      2     4         33-36
Publication dates              DATE      89    4/4       38-45            Both dates
Height in cm.                  HITE      90    2         59-60
Uniform tracing indicator      UNIF      91    1         66               X/b
Series tracing indicator       SERT      92    1         69               "
Place of publication code      PLCD      18    4         46-49
Publisher code                 PUCD      19    4         50-53
LC call no.                    LCN0      98    20        90
Dewey class no.                DEW1      99    20        92
Dewey class no. (edited)       DEW2      39    8         92               ooDDD.DD
LC class no. (edited)          LCCL      97    8         90               e.g. 00351.2352
Main Entry                     MAIN      10    68        10
Title Statement                TITL      20    80        20
Subtitle Statement             STIT      21    80        20
Edition Statement              EDIT      25    12        25
Place (imprint statement)      PLCE      30    28        30
Publisher (imprint statement)  PUBL      31    28        30
Collation                      C0LL      40    48        40
Series note                    SERS      50    44        50/51
Note                           N0TA      60    44        60
Note                           N0TB      61    44        60
Subject tracing                SUBA      68    48        70
Subject tracing                SUBB      69    48        70
Subject tracing                SUBC      70    48        70
Personal Author Tracing        PAUA      71    40        71
Personal Author Tracing        PAUB      72    40        71
Corporate Author Tracing       C0RP      73    1         72
LC card suffix                 LCFF      94    3         94

Total MARC/MOLDS characters: 848

THE MOLDS SYSTEM

Functionally, the MOLDS system consists of utility routines to store a data base, a well-defined query language, a language interpreter, and a set of logical procedures which allow the user to operate on a data base.
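Given the fixed positions in Table 2, unpacking a record reduces to character slicing. The sketch below uses three of the tabulated fields (TYPE at position 21, LANG at positions 29-36, HITE at positions 59-60), treating the positions as 1-based character offsets; the sample record is invented for illustration and padded only to position 60.

```python
# Sketch of unpacking a fixed-position record using a few Table 2
# positions, taken as 1-based character offsets (illustrative).
FIELDS = {"TYPE": (21, 21), "LANG": (29, 36), "HITE": (59, 60)}

def unpack(record, fields=FIELDS):
    """Slice the named fields out of a fixed-position character record."""
    return {name: record[start - 1:end].strip()
            for name, (start, end) in fields.items()}

# Build an invented 60-character sample record.
rec = [" "] * 60
rec[20] = "A"                # TYPE, position 21: personal-author main entry
rec[28:31] = list("ENG")     # LANG, positions 29-36 (first language only)
rec[58:60] = list("24")      # HITE, positions 59-60: height in cm
rec = "".join(rec)
```

Calling `unpack(rec)` yields `{"TYPE": "A", "LANG": "ENG", "HITE": "24"}`; block names and positions are exactly the access vocabulary the MOLDS query language works with.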
The MOLDS system is a set of FORTRAN IV subroutines which perform the maintenance functions, interpret the commands in the query language, and perform the desired logical procedures. The subroutines render the system modular and open. It is therefore relatively easy for a programmer skilled in FORTRAN IV to add, modify, and delete commands and functions as required. This feature of the system is quite desirable. User feedback invariably points up weaknesses in the language or suggests useful features which might be incorporated. MOLDS was continually modified in response to user requirements, and each modification was implemented within a short time without requiring major programming changes throughout the system. The system has already grown since it was first implemented with the MARC data base, and commands have been added or modified as required.

Hardware Configuration

MARC/MOLDS was run at Syracuse University Computing Center on an IBM 360/50 computer. Originally, the on-line mode required full dedication of the computer during execution. The MOLDS system requires some 150,000 bytes of main memory and a disk storage unit to hold the entire data base, as well as intermediate data generated by the user. The MOLDS system has been implemented on other computers (2).

Interaction with the system in the on-line version was carried on through an IBM 2260 Display Station consisting of a keyboard and CRT (cathode ray tube) display screen. Although two or more consoles have not as yet been operated simultaneously, the system is intended to be time-shared. An effort was made to alter the system to operate in a 50,000-byte (50K) upper partition, so that it could be accessible at all times rather than on a scheduled basis.
This involved reorganizing the program into an overlay structure in which the basic or root segments are resident in a fixed portion of memory throughout execution, while the remainder of the program is divided into a set of smaller segments which can overlay each other, being brought into memory only when needed. This task required a careful analysis of each subroutine for its dependence upon others, breaking the program into mutually exclusive segments while ensuring that any given set of segments which occupied memory simultaneously did not exceed 50K bytes of storage. Many of the larger segments which had to be further subdivided required considerable reprogramming.

The first attempt at executing the new overlay version failed. Due to a general lack of experience with the 2260 Display Units, it had not been anticipated that system software would not allow the console to be accessed from outside of the root segment, and the 2260 software package had been placed in an overlay area. As a result the original overlay configuration had to be altered. The console input/output (I/O) package was moved into the root segment, increasing its size by several hundred bytes and similarly decreasing the amount of storage available for the overlay portions. Therefore, it was necessary to develop yet another configuration to conform to these new storage limitations.

While the necessary changes were being made, the Computing Center began operating a limited time-sharing system which itself required full dedication of the 360/50 machine. Projected dates for returning to normal computer operations within a multi-partition environment were far enough in the future to suggest the efficacy of creating a new version of MOLDS which could function off line, with cards and printer instead of the 2260 consoles.
In this batch, or off-line, mode MOLDS jobs could be submitted through the regular queue and run by Computing Center staff during batch processing time. With the on-line source program as a starting point, all references to 2260's were replaced with card reader and printer statements, and the MOLDS language instructions which depended on the console for their use were deleted. After all changes had been made and compilation was completed successfully, the off-line MOLDS was exercised against a sample data base until it was satisfactorily debugged.

Since it was known that the Computing Center would eventually return to partitioned operation, it was next undertaken to overlay the off-line MOLDS into a 50K partition. This was accomplished with little difficulty, since the problems encountered in working with the on-line version were largely due to the consoles. The end result of the entire task, therefore, was an off-line MOLDS which could operate either in core or in overlay structure at the discretion of the user.

THE MOLDS QUERY LANGUAGE

The MOLDS query language includes some 34 distinct commands which must be entirely formulated by the user according to precise syntactical rules. The large number of commands is in part a reflection of the fact that this system provides the user with the ability to perform more operations of a greater variety on a data base than other interactive information retrieval systems. It provides for retrieval of records from the data base according to data value descriptors, processing of data values by arithmetic and logical operations, sorting of retrieved records, and display of retrieved records in full or in part.

Operationally, the MOLDS system regards a file of records as a set of parallel lists of blocks (Figure 1). With the MARC data base, these blocks were the 38 fields of catalog data (such as Dewey class number, title, author, etc.).
The commands in the MOLDS query language are geared to list processing operations. In general, most of the MOLDS commands will result in the formation of lists which are either identical in format to the original file, or are an independent list of alpha or numeric constants not subdivided into blocks.

Despite its surface complexity, the query language was designed specifically for users with absolutely no computer experience. The fixed-format commands are easy to learn and use, even for the novice in computer-based systems. They are mnemonic enough so that a little use soon brings an easy familiarity with them.

Commands in the MOLDS Query Language

There are six categories of commands in the language: retrieval, processing, display, storage, utility, and language augmentation. The commands are listed below with a brief explanation of each.

Retrieval Commands:

FIND: Forms a temporary subfile consisting of records from the data base for which the value in a specified block is equal, not equal, greater, greater or equal, less, or less or equal to an input value.

EXTRACT: Forms a temporary subfile consisting of records from an argument subfile for which the value in a specified block is equal, not equal, greater, greater or equal, less, or less or equal to an input value.

FETCH: Forms a temporary file which duplicates an existing file in the data base (added to original MOLDS commands during this project).

DEFINE: Forms a temporary subfile from two argument subfiles based on the logical relationships AND, OR, NOT.

CHAIN: Forms a temporary subfile consisting of records from an argument subfile for which the value in a specified block is equal to any of the values in a specified block from a second argument subfile.

SELECT: Forms a temporary subfile consisting of records from an argument subfile for which the value in a specified block is equal to any of the values in an argument list.

These six retrieval commands allow the user to extract selected data from the data base. Selection is based on 1) a simple algebraic relationship (e.g., equal, not equal, greater than, etc.) between block values and a value specified by the user in the command (the value may be alphanumeric or numeric), or 2) a simple logical relationship (e.g., and, or, not) between block values in two lists.

All retrievals from MOLDS files are based on exact-match correspondences between input descriptors and data values as they occur in records. Each file is treated as distinct, regardless of the fact that for the MARC/MOLDS data base the second file may simply be a continuation of the first, etc. Any block in a file may be used as an argument in a retrieval process. Thus, the usual range of access points (author, title, subject, classification number) is considerably extended to include such unorthodox access points as juvenile literature, language, illustrations, and bibliographies. For example, one can retrieve all documents on a given subject or subjects which are juvenile books with bibliographies and illustrations published by a given publisher in 1966. The user can define his search limits with a degree of specificity not found in most interactive systems.

However, the price he must pay is exactness in specifying the values used as retrieval criteria. The system will not retrieve on root words or key letter combinations, although such capability could be added. The block values must, therefore, be consistent, and the user must have a precise knowledge of what they may be. This knowledge can be gained by examining the values and having them printed out as needed. (MOLDS does have the capability of selecting unique values from a list, ordering them, and printing them out at any time during system operation.)

Processing Commands:

COUNT: Counts the number of records in an argument subfile or items in an argument list.
ORDER (REVERSE): Arranges the records of an argument subfile in ascending (descending) order according to the values in a specified block, or similarly sorts the values in an argument list. May be applied to alphabetic, numeric, and chronological data.

MAXIMUM (MINIMUM): Selects the record containing the maximum (minimum) value in a specified block from an argument subfile, or the maximum (minimum) value in an argument list. May be applied to numeric or chronological data.

TOTAL: Calculates the sum of the values in a specified block of an argument subfile or of a list of numbers.

AVERAGE: Calculates the average of the values in a specified block of an argument subfile or of a list of numbers.

MEDIAN: Calculates the median of the values in a specified block of an argument subfile or of a list of numbers.

VARIANCE: Calculates the variance (standard deviation squared) of the values in a specified block of an argument subfile or of a list of numbers.

SQUAREROOT: Calculates the square root of each value in a block of an argument subfile or of a list of numbers.

DIFFERENCE: Calculates successive differences in the values of a specified block in an argument subfile or of a list of numbers.

ADD (SUBTRACT, MULTIPLY, DIVIDE): Adds (subtracts, multiplies, divides) the values from a specified block from an argument file (or list) to the corresponding values from a specified block from a second argument file (or list).

FIRSTELEMENT: Selects the first record from an argument subfile or list.

REDUCE: Deletes the first record from an argument subfile or list.

COMPRESS: Forms a temporary list composed of all the unique values in a specified block of an argument subfile or in an argument list.

The eighteen Processing commands allow the user to manipulate the data in the lists he has retrieved.
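The retrieval and processing families map naturally onto list operations. The sketch below gives Python analogues of a few of the commands just described (FIND, DEFINE with AND, ORDER, COMPRESS, COUNT); it illustrates the behavior of those commands, not the FORTRAN IV code itself.

```python
# Python analogues of a few MOLDS commands (illustrative; the real
# system is FORTRAN IV operating on parallel lists of blocks).
# Each command takes lists of records and yields a new temporary list.

def find(file_, blk, value):
    """FIND: exact-match retrieval from a file on any block."""
    return [r for r in file_ if r[blk] == value]

def define_and(a, b):
    """DEFINE with AND: records common to two argument subfiles."""
    return [r for r in a if r in b]

def order(subfile, blk):
    """ORDER: arrange records in ascending order on a block."""
    return sorted(subfile, key=lambda r: r[blk])

def compress(subfile, blk):
    """COMPRESS: the unique values of a block, in order of appearance."""
    seen, out = set(), []
    for r in subfile:
        if r[blk] not in seen:
            seen.add(r[blk])
            out.append(r[blk])
    return out

def count(subfile):
    """COUNT: number of records in a subfile or items in a list."""
    return len(subfile)
```

As in MOLDS, each operation produces a fresh temporary list that can itself be the argument of the next command, which is what makes chained searches like Example 2 below possible.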
He may count the number of elements in a list; arrange them in ascending or descending order; form the sum, average, variance, median, and square root of a list of numbers; add, subtract, multiply, and divide one list by another; and select all unique elements from a list. The ability to process data as well as retrieve it may be unique to MOLDS as compared to other interactive systems, and gives the language a useful added power.

Display Commands:

DISPLAY: Outputs on the CRT (cathode ray tube) each complete record in an argument subfile (added to original MOLDS commands during this project).

SHOW: Outputs in columnar fashion on the CRT selected blocks from up to three argument subfiles or lists (deleted in batch or off-line mode).

PRINT: Outputs in columnar fashion on the printer selected blocks from up to three argument subfiles or lists (added to original MOLDS commands during this project).

The three Display commands allow the user to display entire documents, or display selected blocks of information or records in columnar format. In the on-line version of MOLDS this may be done on the CRT, or a printout made of selected blocks or lists of documents on the high speed printer. There is much flexibility and versatility in output format, which is completely determined by the user. The command SHOW is not used in the batch mode of MOLDS.

Storage Commands:

SET: Stores a single numeric value.

STORE: Stores an alphabetic, chronological, or numeric list of arbitrary length.

The two Storage commands allow the user to insert independent lists of constants into the storage area. Such lists do not become part of the data base, but are used in conjunction with retrieval and processing commands.

Utility Commands:

CLEAR: Deletes from storage a temporary subfile or list created during the session.

DELETE: Deletes from storage all temporary subfiles or lists created during the session.

DUMP: Displays on the CRT in tabular fashion the names, file origins, and number of items in each subfile and list created by the user during the session (deleted in batch or off-line mode).

RECALL: Displays on the CRT the command which resulted in the creation of a specified temporary subfile or list (added to original MOLDS commands during this project).

LIST: Produces printed copy of all commands issued during the session. May be used with STOP at end of search (added to original MOLDS commands during this project).

The five Utility commands allow the user to perform housekeeping operations, such as the clearing of storage areas, reinitialization of the system, and termination of execution. The command DUMP is not used in the batch mode of MOLDS.

Language Augmentation Command:

PROGRAM: Allows the user to create new commands consisting of a sequence of basic commands and to store them for future sessions.

The language augmentation command, PROGRAM, is one of the most important features of the language. It allows the user to create new commands tailor-made to his own needs. This is shown in the first MOLDS search query which follows.

SEARCH REQUEST FORMULATION IN MARC/MOLDS

MOLDS Search Query: Example 1 (Batch Mode)

PROGRAM TALLY A/
COUNT B A/
PRINT B//
END
FIND ZNY SSOZ/PLCD/E/NYNY/
TALLY ZNY/
PRINT ZNY/PLCE/PLCD/PUCD//
FIND P67 SSOZ/DATE/E/1967/
TALLY P67/
DEFINE NY67 ZNY/AND/P67/
TALLY NY67/
AVERAGE AVHT NY67/HITE/
PRINT AVHT/
STOP

The above example shows an off-line or batch-mode search. This sequence of commands would be keypunched and submitted as a job deck in the regular queue and run by the computer center staff, the searcher receiving the results as a printout from the high speed printer. SSOZ is the name of one of the MARC/MOLDS files. This particular interaction shows the use of the operator PROGRAM to augment the language in the subsequent search by adding TALLY to the list of commands.
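The effect of PROGRAM is essentially macro expansion: TALLY is stored as a parameterized sequence of basic commands and expanded wherever it is used. The sketch below illustrates that idea with an invented `$1` placeholder syntax standing in for the MOLDS dummy-argument convention (the original uses A); the parser is a deliberate simplification of the real command syntax.

```python
# Sketch of the PROGRAM command's macro-expansion idea (illustrative;
# the "$1" placeholder syntax is an invented simplification).
macros = {}

def program(name, body):
    """PROGRAM: store a new command as a list of template commands."""
    macros[name] = body

def expand(command_line):
    """Replace a user-defined command with its stored basic commands."""
    name, *args = command_line.split()
    if name not in macros:
        return [command_line]          # a basic command passes through
    expanded = []
    for template in macros[name]:
        for i, arg in enumerate(args, start=1):
            template = template.replace(f"${i}", arg)
        expanded.append(template)
    return expanded

# PROGRAM TALLY A/ COUNT B A/ PRINT B// END
program("TALLY", ["COUNT B $1", "PRINT B"])
```

With the macro stored, `expand("TALLY ZNY")` yields the two basic commands `COUNT B ZNY` and `PRINT B`, which is exactly how TALLY behaves in Example 1.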
The following example shows a search query which is a sequence of some typical MOLDS commands, along with an explanation of the effect of each. Each command has three parts. The first part (FIND, DEFINE, etc.) is the imperative, which tells what operation is to be performed. The second part (BIBL, ENGL, BOTH, etc.) is the label of the place in storage where the result of the operation is to be stored. This label is made up by the user when he gives a command. The third part of the command is the operand. In some cases the operand gives the criteria for retrieval (as in FIND, DEFINE). It always gives the name or label of the list to be operated on, and in some cases specifies a particular block of that list.

The request shown in this example was handled by MOLDS to retrieve, display, and process all English language books on printing, or type-setting, or type-founding which have bibliographies. The sequence illustrates the flexibility of MOLDS, the many types of processing which can be done, and the relatively easy-to-use command format. This particular sequence was performed in the on-line version, with a chance for user-system interaction after each command.

MOLDS Search Query: Example 2 (On-Line Mode)

FIND BIBL MARC/BIB/E/X/
  Find all records in the file named MARC for which the block named BIB contains a value equal to (E) X (X in the block indicates presence of bibliographies). The list of selected records is to be stored in a location called BIBL.

FIND ENGL MARC/LANG/E/ENG/
  Find all documents in the file named MARC for which the block named LANG contains a value equal to (E) ENG, i.e., English language books. The list of selected records is to be stored in a location called ENGL.
DEFINE BOTH BIBL/AND/ENGL/
  Define a new list called BOTH which consists of the documents common to both BIBL and ENGL, i.e., all English language books with bibliographies.

STORE SUBS 3/ALPHA/13/
  Inform the system that the user wishes to store, via the console, a list of values which will be called SUBS. The list will contain 3 elements, which will be alphanumeric (ALPHA) as opposed to strictly numeric. The longest element will not exceed 13 characters.

ELEMENT 1 =
  (System responds with these words.)
PRINTING/
  User inserts first value by typing it on the console.
ELEMENT 2 =
  (System responds with these words.)
TYPE-SETTING/
  User inserts second value.
ELEMENT 3 =
  (System responds with these words.)
TYPE-FOUNDING/
  User inserts third value. The user has now created an independent list of three distinct values (PRINTING, TYPE-SETTING, TYPE-FOUNDING) and stored them in a location called SUBS.

SELECT ALL BOTH/SUBJ/SUBS/
  Select all records from the list called BOTH for which the values in the block named SUBJ are equal to any of the values in the list called SUBS, i.e., those records for which the subject heading is PRINTING, TYPE-SETTING, or TYPE-FOUNDING. The selected records are stored in a location called ALL.

COUNT NO. ALL/
  Count the number of records in the list called ALL. The count is stored in a location called NO.

SHOW NO.//
  Display the contents of NO. on the CRT.

PRINT ALL/MAIN/TITL/LCNO// ALL/PUBL/PLCE//
  Produce a 5-column printed listing consisting of the values in the blocks named MAIN (main entry), TITL (title), LCNO (Library of Congress classification number), PUBL (publisher), and PLCE (place of publication) from each record of the list called ALL.

MAXIMUM BIG ALL/HITE/
  From the list called ALL, select the record containing the maximum value in the block named HITE (height). The record is stored in a location called BIG.

AVERAGE AVE ALL/HITE/
  Calculate the average of the values in the block named HITE (height) of the list called ALL. The value is stored in a location called AVE.

The following example records another interaction and the results in the off-line or batch mode. Notice the error message, which did not interrupt the search. This result also includes a report on the length of Central Processing Unit (CPU) time each operation takes, in hours, minutes, seconds, and tenths of seconds. Any line preceded by C indicates that the line was printed by the computer; any line without the C indicates that the information was typed in by the user.

MOLDS Retrieval: Example 3 (Batch Mode)

C PLEASE ENTER YOUR PROGRAM
C LINE 1
OOOOOOOOPAULINE ATHERTONOOOOOOOOO
C INVALID COMMAND NAME
C SET IN AT 185 DAY OF 1969 16-01-17.1
C LINE 1
PROGRAM TALLY A/
C LINE 1
COUNT B A/
C LINE 2
PRINT B//
C LINE 3
END
C SET IN 185 DAY OF 1969 16-01-17.5
C LINE 2
FIND D2 SSOZ/DEW2/NE/O?
C SET IN AT 185 DAY OF 1969 16-02-38.7
C LINE 3
FIND D1 SSOZ/DEW1/NE/ /
C SET IN AT 185 DAY OF 1969 16-03-56.7
C LINE 4
TALLY D2/
C 950.00
C SET IN AT 185 DAY OF 1969 16-03-57.3
C LINE 5
TALLY D1/
C 905.00
C LINE 6
STOP

COMMENTS ON MARC/MOLDS

Thus far this report has been confined to a more or less factual description of the components of the MARC/MOLDS system. No doubt the reader has asked himself many questions about the system, and made his own critical comparisons between this system and others. What follows are preliminary and necessarily subjective comments based on a few demonstrations given to students in the School of Library Science and on the authors' own observations and reflections.

System Design

Response Time

Response time (i.e.
the time between transmission of a command in the on-line version and its execution) has ranged from roughly 90 seconds for a search of 620 records down to 20 seconds for an arithmetic operation involving the same number of records. When one thinks of these times in comparison with the time required to perform the same operations manually, they seem rapid. However, 90 seconds appears to be an unreasonably long period of time in a computer-based interactive retrieval environment. Viewers of demonstrations often asked why it took the computer "so long" to perform a search. A user's tolerance for delay appears to vary a great deal with the type of retrieval system he is using. This has been observed on other occasions, but no determination has yet been made of tolerable limits in different environments, a determination that would be important in designing computer-based systems.

Man-System Interaction

A design goal of most other existing interactive retrieval systems seems to be to give the computer certain anthropomorphic qualities and make it into a teacher or a responsive friend. Such systems offer computer-aided query formulation and/or a friendly conversation with the computer. The MOLDS on-line system does not include either of these features. The user must first master a MARC/MOLDS manual, which is an explanation of the system and the data base. He then goes on line and gives his command. MOLDS responds by performing that command or by putting out a brief error message if the command format was improper. Apparently the objective of conversation with the computer as found in most systems is to make it easier for the user to achieve desired results or to make him feel more at ease with the system. The person who plays with an interactive system once or twice probably finds conversations with a computer amusing, novel, and helpful in his first attempts.
However, for a serious and steady user, carrying on the same conversation with the computer during each and every session can be tedious, repetitive, time consuming and sometimes circular. The optimum mix of computer-aided and independent user-formulated query is yet to be studied and found. Perhaps MOLDS, because it is a poor conversationalist, could aid in this search. At any rate, the automatic assumption of conversational features as a design goal for computer-based retrieval systems may not be based on sound knowledge of what suits the serious user.

MOLDS Repertory of Commands

The processing commands in the MOLDS query language are a welcome and valuable addition to the usual repertory of search and display commands common to most interactive systems. Although the MARC data base does not lend itself to a great deal of processing, we have found some commands useful, particularly COUNT, ORDER, MAXIMUM, MINIMUM, and COMPRESS.

Processing Times

When individual commands of a single search take seconds of CPU time, it is certain that a retrieval system will be expensive if it is employed by a great many users as a general purpose system. Some of the MOLDS commands operating on the MARC data base took whole minutes of CPU time! The authors have learned a great deal about interactive retrieval systems by using MOLDS experimentally, but because of the excessive cost of certain runs, may not be able to continue research with it. Modifications will have to be made to make it more efficient (i.e., cheaper to run) before it could be recommended for general use in the Syracuse University Library School or anywhere else. If the MOLDS system can be designed to yield good results for certain types of searches with a realistic file size, it will be a boon to the library or educational institution seeking to automate some part of its searching procedures.
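The effect of processing commands such as COUNT, MAXIMUM, and AVERAGE is easy to picture as simple operations over a list of records. The sketch below is a modern, purely illustrative rendering in Python, not MOLDS syntax; the record fields follow the earlier examples (SUBJ, HITE, MAIN), but the sample values are invented:

```python
# Hypothetical sketch (not MOLDS code): the kind of processing the
# MOLDS commands perform, expressed as operations on a record list.
# Field names follow the MARC/MOLDS examples; values are invented.

records = [
    {"MAIN": "Smith, J.", "TITL": "Early Printing", "SUBJ": "PRINTING",     "HITE": 24},
    {"MAIN": "Brown, A.", "TITL": "Type Design",    "SUBJ": "TYPE-SETTING", "HITE": 19},
    {"MAIN": "Jones, M.", "TITL": "Paper Making",   "SUBJ": "PAPER",        "HITE": 28},
]

# The stored list SUBS from the interactive example.
subs = ["PRINTING", "TYPE-SETTING", "TYPE-FOUNDING"]

# SELECT: keep records whose SUBJ block matches any stored value.
selected = [r for r in records if r["SUBJ"] in subs]

# COUNT, MAXIMUM, AVERAGE over the selected list.
no = len(selected)
big = max(selected, key=lambda r: r["HITE"])
ave = sum(r["HITE"] for r in selected) / no

print(no)           # 2
print(big["MAIN"])  # Smith, J.
print(ave)          # 21.5
```

The point of the sketch is only that such processing commands are inexpensive to express; as noted above, their cost in CPU time on the actual MOLDS implementation was another matter.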
Data Base

Noah Prywes (14) has commented, "The effectiveness in retrieving documents is highly dependent on the amount of labor and processing invested in the storage of documents." The minimum amount of processing done on the MARC tapes has, in fact, limited the effectiveness of retrieval. The extreme simplicity of the general MOLDS data base structure is worthy of study. The efficiency and cost of retrieval using this structure needs to be compared very carefully with more sophisticated threaded lists. One extremely important factor to consider will undoubtedly be the effect of increasing the size of the file. As pointed out before, the MOLDS system requires an exact match of punctuation and spelling between retrieval criteria and stored data items, a match difficult to achieve. To be sure, this is partially a limitation in the MOLDS system that may be relaxed by incorporating a capability to search for root words and key letter combinations. However, the many inconsistencies in abbreviations, punctuation, and spelling that appear in bibliographic records when information on title pages is transcribed, as on the MARC tapes, can enormously complicate effective retrieval. MARC or non-MARC bibliographic records will always contain some "author" variations that such a system as MOLDS may have to accommodate. This is a very knotty problem. These comments are not to be construed as a criticism of the fine work the Library of Congress has done in its MARC Pilot Project. The MARC Pilot Project record format, with sometimes indistinct data elements (special punctuation marks and symbols), was not specifically designed for computer-based interactive search systems. Hopefully, the use herein described to which the MARC data base has been put, and the experience derived from that use, will be of value as future modifications of the MARC format are made.
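The exact-match difficulty described above can be made concrete with a short sketch (hypothetical and not part of MOLDS; the author heading and its variants are invented): unless stored values and retrieval criteria are normalized for case, punctuation, and spacing, transcription variants of the same heading simply fail to match.

```python
# Hypothetical sketch of the exact-match problem: MOLDS requires the
# retrieval criterion to match the stored value character for
# character, so transcription variants defeat a search unless the
# values are normalized first.

import string

# Three transcription variants of the same author heading.
stored = ["Melville, Herman,", "MELVILLE, HERMAN", "Melville  Herman"]

def normalize(value):
    """Case-fold, drop punctuation, and collapse runs of spaces."""
    value = value.translate(str.maketrans("", "", string.punctuation))
    return " ".join(value.lower().split())

query = "melville, herman"

exact_hits = [v for v in stored if v == query]
normalized_hits = [v for v in stored if normalize(v) == normalize(query)]

print(len(exact_hits))       # 0  (no stored form matches exactly)
print(len(normalized_hits))  # 3  (all variants match once normalized)
```

Root-word and key-letter searching, mentioned above as a possible relaxation, attacks the same problem from another direction; either way, some normalization layer stands between the searcher and raw transcribed data.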
After all, reference retrieval, using bibliographic information, automated or manual, is natural to libraries and is, indeed, one of the purposes for which that information is recorded in the first place. Since one of the true values of a computer-based file lies in making multiple use of the records, it becomes imperative to test the various uses to which these records can be put.

THE FUTURE USE OF MARC/MOLDS AT SYRACUSE UNIVERSITY

The MARC/MOLDS system has undergone continual modification in data base structure and query language during the first year of work on it. A computer-based system must be capable of such flexibility, for changes should be accomplished easily and smoothly. No system is perfect, especially in its early days, least of all MOLDS. It is intended to continue investigation into information-seeking behavior, and to use MARC/MOLDS occasionally along with other retrieval systems. Another paper describes use of the MARC file with the IBM/Document Processing System (15).

SUMMARY

This report has tried to describe, not sell, MARC/MOLDS as fairly as possible in the belief that some of its features should be considered by persons designing interactive systems, and by those responsible for refinement of the MARC format. The searching capability is valuable as it increases the access points to the data. The arithmetic and logical operations provide an opportunity to perform certain studies of the MARC data base. The MARC files will eventually have many applications beyond technical processing functions in libraries. These applications would be more practically implemented if the MARC format were modified to accommodate them and if librarians would use systems such as MOLDS during their exploration of alternatives. MARC/MOLDS as a computer-based system has many weaknesses. Outnumbering and to some extent overshadowing the concrete statements about its faults is its great potential. Many questions have been raised which remain unanswered.
Questions dealing with the basic design of the system and data base are indicative of the development and experimentation which must be done before computer-based interactive retrieval in libraries is a practical reality.

ACKNOWLEDGMENTS

The work on this project has been supported by Rome Air Development Center (Contract S. U. No. AF30(602)-4283). Related work, supported by a grant from the U. S. Office of Education, provided an education in understanding of the MARC tapes. The authors gratefully acknowledge the comments made by Phyllis A. Richmond and Frank Martel on the original manuscript. Mrs. Sharon Stratakos, programmer most responsible for MOLDS, contributed a great deal to the authors' understanding of this retrieval program and its potential use with a bibliographic reference file such as MARC.

PROGRAM

Microfiches and photocopies of the following may be obtained from National Auxiliary Publications Service of ASIS: "Rome Project Program Description: MOLDS Support Package" (NAPS 00884).

REFERENCES

1. Avram, Henriette: The MARC Pilot Project, Final Report (Washington, D. C.: Library of Congress, 1968).
2. A User-Oriented On-Line Data System (Syracuse, N. Y.: Syracuse University Research Corp., 1966). 2 v.
3. Freeman, Robert R.; Atherton, Pauline: AUDACIOUS-An Experiment with an On-Line Interactive Reference Retrieval System Using the Universal Decimal Classification as the Index Language in the Field of Nuclear Science (New York: American Institute of Physics, April 25, 1968) (AIP/UDC-7).
4. Burnaugh, H. P.; et al.: The BOLD User's Manual (Revised) (Santa Monica, Cal.: Jan. 16, 1967) (TM-2306/004/01).
5. Cegala, L.; Waller, E.: COLEX User's Manual (Falls Church, Va.: System Development, Feb., 1969) (TM-WD-(L)-405/000/00).
6. Smith, J. L.: MICRO: A Strategy for Retrieving, Ranking and Qualifying Document References (Santa Monica, Cal.: Jan. 15, 1966) (SP 2289).
7.
Green, James Sproat: GRINS: An On-Line Structure for the Negotiation of Inquiries (Bethlehem, Pa.: Lehigh University, Center for the Information Sciences, September 1967).
8. Computer Command and Control Company: Description of the Multilist System (Philadelphia, Pa.: July 31, 1967).
9. National Aeronautics and Space Administration, Scientific and Technical Information Division: NASA/RECON User's Manual (Washington, D. C.: October 1966).
10. Kessler, M. M.: TIP User's Manual (Cambridge, Mass.: Massachusetts Institute of Technology, Dec. 1, 1965).
11. Biomedical Communication Network: User's Training Manual (Syracuse, New York: December 1968).
12. Welch, Noreen O.: A Survey of Five On-Line Retrieval Systems (Washington, D. C.: Mitre Corp., August 1968) (MTP-322).
13. Avram, Henriette D.; Knapp, John F.; Rather, Lucia J.: The MARC II Format (Washington, D. C.: Library of Congress, 1968).
14. Prywes, Noah S.: On-Line Information Storage and Retrieval (Philadelphia, Pa.: University of Pennsylvania, Moore School of Electrical Engineering, June 1968).
15. Atherton, P.; Wyman, J.: "Searching MARC Project Tapes Using IBM/Document Processing System," Proceedings of American Society for Information Science, 6 (1969), 83-88.

BOOK REVIEWS

Proceedings of the 1968 Clinic on Library Applications of Data Processing, edited by Dewey E. Carroll. Urbana: University of Illinois, 1969. 235 pp. $3.00.

For all except inveterate institute participants, it must be difficult to decide to spend yet another week listening to a widely mixed series of papers and discussions on data processing in libraries, in the hope of finding something new or useful. To attract a wide audience, the offerings tend to range from simple introductions to technical discussions of specific programs or projects. The value of gathering the papers of such institutes into volumes of proceedings is questionable.
Material from the introductory papers would certainly find greater use in a comprehensive monograph, while the papers which report new developments or technical problems would have a better chance of reaching their proper audiences if published in journals. The repetitive "how-we-did-it" reports might best be left unpublished. The Proceedings of the 1968 Illinois Clinic does have a number of articles which deserve wide readership. Frederick G. Kilgour's paper on initial system design for the Ohio College Library Center is excellent, not so much for solutions, but because he raises the questions on the purpose of college libraries and the nature of regional systems which need to be raised before embarking on design. Those who have had experience with automated operations will appreciate Lawrence Auld's listing of ten categories of library automation failure. (He omits one of the most common: lack of computer stability.) A technical article of considerable interest is Alan R. Benenfeld's paper on generation and encoding of the data base for INTREX. Those looking for reports of successful computer applications may find useful information in the papers by Robert Hamilton, of the Illinois State Library, on circulation; by James W. Thomson and Robert H. Muller, of the University of Michigan, on the U. of M. order system; by Michael M. Reynolds, of Indiana University, on centralized technical processing for the university's regional campus libraries; by John P. Kennedy, of Georgia Tech, on production of catalog cards; and by Robert K. Kozlow, of the University of Illinois, on a computer-produced serials list.

Melvin J. Voigt

Book Reviews 167

Planning Library Services. Proceedings of a Research Seminar held at the University of Lancaster, 9-11 July, 1969. Edited by A. Graham Mackenzie and Ian M. Stuart. Lancaster, England: University of Lancaster Library, 1969. 30 shillings.
This volume offers fifteen papers presented in six sessions; each session had one or more papers and some discussion. The papers range from very general mathematical models to local problems of British legal codes and re-organization of local governments. The first session introduces the problems and some theoretical notions of how to deal with them. The next three sessions deal with analysis techniques. Morely introduces some simple techniques of maximizing benefits for given resources. Brookes presents a good quick introduction to statistics and distributions which occur frequently in information science. Leimkuhler develops cost models for storage policies and Woodburn analyzes the costs in hierarchical library systems. The mathematics in these latter papers, although not difficult, will probably put off a good many librarians and administrators. Both are practitioners and impressed by results, not complex models; the equations developed by Leimkuhler or Woodburn are probably too complex to be successfully used by most librarians. This might reflect the state of the librarian and not of the art, however; to quote Cloote (from the paper by Duchesne): "With only a very few notable exceptions, successful models have been so simple that an operational research specialist would disown them." The fifth session covers data collection and evaluation. Duchesne comments on management information systems and operations research for librarians. Conventional techniques of data collection are reviewed by Ford, including sample forms and a note of warning about too many surveys. In the final session Leimkuhler presents an overview which includes several choice comments on progress (or lack of it) in libraries. During the discussion period, Mackenzie suggests that libraries should use up to five percent of their budgets for research. This reviewer feels that unless this suggestion is taken more seriously, most of the theory will never find an application.
These proceedings would make an excellent companion to Burkhalter's Case Studies in Library Systems Analysis as more theoretically oriented readings for a course in operations research or administration in librarianship. Some of the techniques presented could be adapted for immediate application in analyzing present systems. Thus this collection of papers can be useful to both student and practitioner interested in research and development of library systems.

Arvo Tars

Libraries at Large, edited by Douglas M. Knight and E. Shepley Nourse. New York: R. R. Bowker, 1969. 664 p. $14.95.

Libraries at Large is based on the materials which the National Advisory Commission on Libraries employed in its deliberations. The Commission appraised the adequacy of libraries and made recommendations designed "to ensure an effective, efficient library system for the nation." These materials are also useful to those engaged in the enrichment of present library programs and to those developing new library projects. The materials consist of papers and reports written for the Commission and include essays, original investigations, and literature reviews, as well as reprints of material that has appeared elsewhere. Some papers are of top quality; some are poor. Nevertheless, the appearance of these materials in one volume adds a convenient source of information that will be useful to librarians for years to come. Approximately half the book is devoted to problems related to the use of libraries and to the users of libraries. The second half contains discussions of government relationships of libraries and a series of useful appendixes. Perhaps the most novel section of the book is William J. Baumol's "The Cost of Library and Informational Services." This study investigates the economics of libraries in depth and the results are of great interest.
This chapter on economics contains new material and brings together that which existed heretofore, so that it constitutes the major resource on library economics. This chapter alone is so valuable as to justify the recommendation that all libraries and most librarians should acquire Libraries at Large. The section on copyright is equally important, for it brings together data on a topic possessing cataclysmic potentials for librarianship. Verner Clapp's "Copyright: A Librarian's View" is the best statement that has appeared on the subject, and it is hoped that Clapp's dissertation will awaken librarians to the peril that confronts them. On the other hand, the chapter entitled "Some Problems and Potentials of Technology as Applied to Library Informational Services" is somewhat less than satisfying. The section starts off with Mathews and Brown's "Research Libraries and the New Technology," which originally appeared in On Research Libraries. It is still an inadequate exposition. There follows a reprint of "The Impact of Technology on the Library Building," which Educational Facilities Laboratories published in 1967. The statement is adequate, but more useful information exists. The last section of the chapter is a study, "Technology in Libraries," which the System Development Corporation produced. This paper is a useful review of technologies employed by libraries and recommends five important network and systems projects to be undertaken. The chapters on government relationships include discussions of those with the federal government and those at local, state and regional levels. Germaine Krettek and Eileen D. Cooke have provided a worthwhile appendix listing and abstracting library-related legislation at the national level. Libraries at Large is indeed a resource book, and those papers containing original investigations and literature reviews are of such high quality as to insure usefulness of this work to all thoughtful librarians.
Frederick G. Kilgour

Computers and Their Potential Applications in Museums. A Conference Sponsored by the Metropolitan Museum of Art. New York: Arno Press, 1968. 402 pp. $12.50.

Computers and Their Potential Applications in Museums contains the published proceedings of a conference which was held in New York, 1968. Sponsored by The Metropolitan Museum of Art and supported by IBM, the conference was another attempt to involve art and related fields in computer technology. This book covers a broad range of issues and problems, from information retrieval to creativity. Experts from museums, educators, librarians and computer specialists discussed the possible uses and the implications of computers for the museum field. The diversity of the participants seems to represent the components of an exceedingly complex problem which is as monumental as the museum field itself. As an overall document it gives evidence of concern and insight into the many technical problems which some researchers have encountered. In many instances the non-technical experts were too global in their thinking, while the technologists were too local in their area of concern to communicate to anyone but technologists. This disparity between approaches, with the obvious difficulties presented, is a typical one whenever non-technical groups attempt to make use of computer technology. An ambitious conference in scope, it had excellent participants, and several of the papers were stimulating and provocative. The interaction among the people who attended the conference may have been useful and it may have generated important ideas. For a reader of the published proceedings one wishes there had been a final chapter which could have provided some guidelines for research and education in this field. There was an opportunity for the organizers of the conference or a small group of the participants to summarize the problems and to give some direction to solutions.
Several years and many conferences later we in the humanities have made little progress in use of the computer. It seems that we are still better at rhetoric than at problem solving.

Charles Csuri

Books for Junior College Libraries. Pirie, James W., comp. Chicago: American Library Assoc., 1969. 452 pp. $35.00.

During the recent period of rapid growth and development of junior and community colleges, a bibliographic guideline has been long awaited. James W. Pirie's Books for Junior College Libraries, with its healthy potential for developing many basic collections and extending and updating others, fills that void. Though it does not boast to be the single ideal bibliographic tool, it is a welcome addition to (and perhaps replacement for some of) its predecessors: Frank Bertalan's Books for Junior College Libraries; Charles L. Trinkner's Basic Books for Junior College Libraries; Hester Hoffman's Readers Adviser; Helen Wheeler's A Basic Book Collection for the Community College Library; Bro Dart Foundation's The Junior College Library Collection, edited by Dr. Bertalan; and the ever-present Subject Guide to Books in Print and Books in Print, from Bowker. Books for Junior College Libraries represents the cooperative efforts of some 300 expert consultants (subject specialists, faculty members and librarians) charged with the responsibility of producing a publication to serve as a book selection guide for new or established junior and community college libraries. Approximately 20,000 titles are arranged by subject, broadly interpreted, with entries consisting of author, title, subtitle, edition (if other than the first), publisher, place of publication, date, pagination, price and L.C. number. Easy access is provided by the inclusion of an author and subject index.
A comparative "Table of Subject Coverage" appearing in the preface, tabulating the percentage of subject distribution to total volume for the Lamont, Michigan, and the more recent Books for College Libraries lists, indicates that Books for Junior College Libraries maintains a comparable subject percentage distribution to total volume. Only book titles have been included; foreign entries have been limited to a few major works, and out-of-print titles have been passed over in favor of titles readily available. Paperbacks were listed in the absence of card copy. Though limited in its coverage of terminal and vocational courses, with emphasis toward the transfer or liberal arts program, Books for Junior College Libraries does embrace all fields of knowledge that tend to be challenging and useful for the general education programs. It has been endorsed by the Joint Committee on Junior Colleges of AAJC, ALA, and the Junior College Section of ACRL, and moves toward the recommendations of the ALA Standards for Junior College Libraries. This bibliographic guideline for junior college libraries should be welcomed by public schools as well as junior and community colleges for its assistance in developing new collections, as well as expanding and updating old collections, with quantity, quality, and economy working together.

James I. Richey

Agricultural Sciences Information Network Development Plan. EDUCOM Research Report, August 1969. 74 pp.

The National Agricultural Library wants to implement its old plan of an Agricultural Science Information Network "based on the assumption that the land-grant libraries in the States are the natural nodes to this network." EDUCOM undertook a study which was submitted to and discussed by a symposium held in Washington, D. C., on February 10-12, 1970, with the participation of all agricultural libraries interested in "new and improved ways of exchanging information in support of agricultural research and education."
The goal is "to develop a long-range plan for strengthening information, communication, and exchange among the libraries of land-grant institutions and the NAL." According to the report, the network concept would constitute a "network of networks," and three basic components are envisioned: 1) land-grant libraries, 2) information analysis centers, and 3) telecommunications. All these components have their own aims and objectives described in this report. "NAL's first course of action in the establishment of a system of information analysis centers is to develop a directory of existing analysis centers of interest to the agricultural community. The directory should be supported with a catalog detailing the services and products offered by these centers. NAL should then establish cooperative agreements with these centers which would make them responsive to the needs of the users of the Agricultural Sciences Information Network. This should be supported with the installation of communications equipment to encourage and facilitate the use of a center." No doubt the participants of the symposium will have thoroughly investigated and discussed this plan with serious consideration to its practical implementation. A new approach to and improvement of information exchange is not only a necessity, but also long overdue, for those in agriculture. This information development plan would provide service for research workers at the experiment stations, scientists and teachers at the colleges, agricultural extension people at the land-grant institutions, and, last but not least, for the farmers who provide us with food and fibers in order to bring a fuller and better life on the farm and in rural and city homes. A detailed analysis of the performance, an evaluation and revision of this gigantic scientific information system, can only be made after it has been in operation for a few years.
It is very promising that the National Agricultural Library, among its many objectives, has again taken the initiative.

John de Gara

Cornell University Libraries. Manual of Cataloging Procedures. 2d ed. Ithaca, N.Y.: Cornell University Libraries, 1969. $18.00.

Editor Robert B. Slocum and his associates have produced a valuable manual useful to catalogers and persons involved in the administration of policies and procedures in technical services. As stated in the preface, the manual is a supplement to, not a substitute for, the Anglo-American Cataloging Rules and its predecessors, the LC List of Subject Headings and the LC Classification Schedules. The following directive is basic: "The revisers are always open for consultation on particularly difficult problems, but it must be assumed that a professional cataloger will have a thorough knowledge of the basic tools of his profession. . . . If this knowledge is in any way lacking, the cataloger has the obvious responsibility of acquiring it through diligent study and experience. He should not come to the reviser with questions whose answers are available in the aforementioned tools and in this Manual." The format is loose-leaf, so that additions and revisions may be made easily to reflect new developments and techniques. The sections include Pre-Cataloging Procedures; General Cataloging and Classification Procedures; Recataloging and Reclassification; Cornell University College and Department Libraries-Special Collections and Special Catalogs; . . . Serials and Binding Department; Files and Filing; Typing, Card Production, Book Preparation; Statistics; Appendix (including abbreviations, romanization tables, etc.); and Index. The procedures and practices described are those adopted by a research library "conscious of the need for both quality and quantity in the work of its staff."
This publication, weighing five pounds, is a great achievement and, with its full index, an indispensable contribution to the collection of worthwhile cataloging manuals. Descriptions of local procedures may seem detailed, but basic principles and policies are well covered. The final touch is the inclusion of a catalog card for the manual!

Margaret Oldfather

STANDARDIZED COSTS FOR AUTOMATED LIBRARY SYSTEMS

Mary Ellen L. JACOB: Systems Officer, Fisher Library, University of Sydney

Costs of automated library systems as currently given in published reports tend to be misleading and confusing. It is necessary to have a clear understanding of how they were derived before any comparisons can be made. Clearly defined costs in terms of time units are more meaningful than straight dollar costs and can be used as one means of comparison among various system designs and as guidelines for the design of new systems.

There is a great lack of consistency in reporting the costs of automated library systems. Cost figures given in published reports tend to be misleading and confusing; rather than indicate the true cost of a system, they tend to obscure the entire issue. Without a clear understanding of how such figures were obtained, one cannot use them for comparison against any other system (1). While it is true that no two systems are identical, use of standardized methods can make cross comparisons meaningful and give a basis for estimating the costs of new systems (2). When all the variables affecting automated library systems are considered, it is very tempting to say that no realistic comparisons can be made. What is needed is some definite statement of just what criteria can be used to determine costs and how they are derived (3). While there have been numerous studies dealing with the cost aspects of specific functions, no real attempt has been made to define standardized cost criteria for automated library systems. The following discussion is an attempt to identify and define some of the more common cost aspects of such systems.

208 Journal of Library Automation Vol. 3/3 September, 1970

WHAT IS COST?

"Cost," as defined by accountants, is not the subject of this article. Primary interest here is in cost as a yardstick for measuring the efficiency and effectiveness of a system and for its comparison with other systems. It is important to note that cost is only one criterion and not necessarily the most important one. As costs are herein described, several factors need to be determined. Does cost include:

1) fixed overhead such as lighting, office space, administrative functions, etc.?
2) actual salaries or assessed salaries (some installations have a fixed figure for certain types of jobs regardless of the actual cost)?
3) equipment cost (each installation has its own methods for prorating equipment costs)?
4) material costs, paper supplies, etc.?

Cost figures in terms of dollars have little or no meaning unless their derivation is understood. More meaningful are costs in terms of time: man units for human work, and actual running times for equipment. Even for use of these units it is necessary to know something of the relative skill of the personnel involved and the equipment configuration used.

PERSONNEL COSTS

Before examination of the possible breakdown of personnel costs, several pertinent points should be considered. It is necessary to state the backgrounds, skills, and levels of experience of the personnel involved. These should include extent of familiarity and experience with the system environment. This environment consists of the equipment, computer or otherwise, and the particular library application involved. It would be advantageous if there were some objective ways of measuring systems analyst and programmer performance, rather than reliance on background and experience as measures. Unfortunately there are none.
This is a problem that has bothered service bureaus, software houses, and any data processing manager worth his salt. At present there is no clear-cut answer. Some try to measure efficiency by the length of time taken to code a number of program steps. This gives no measure of the efficiency of the program generated, only a guide to the translating ability of the individual involved. It is certainly no measure of the actual program performance on the computer system. In addition, it is extremely difficult to estimate accurately the actual running time that a given program should take, especially for a time-sharing system. How then can one measure the effectiveness of a given programmer in achieving such a goal?

Another problem to be considered is that of the best program versus the most efficient. It is important that a program be maintainable and capable of being changed easily by another programmer. Also, if equipment changes are contemplated, it is highly desirable that the program be written in a higher-level language, such as COBOL or FORTRAN, which can be used on another machine. Higher-level languages are not as efficient as assembler languages, but generally take less time to write and debug and are usually transferable from one machine to another with only minor modifications.

While it is not possible to measure analyst or programmer efficiency accurately, it does help to know the level of experience and the general background of personnel. While an inexperienced analyst or programmer may occasionally be more efficient than an experienced one, this is not generally true. Normally the more experienced man will know a variety of standardized methods or shortcuts that can be used effectively to shorten running time, coding time, or both. More important, he knows where to start. A sample personnel description might read:

Systems and Programming Staff:
1 systems analyst, B.A.
in Business Administration, five years' experience with various makes of computers, two years of which were spent as a programmer working with COBOL and FORTRAN, no library background, worked on a part-time basis.
2 programmers, both with high school diplomas, one with one year's experience with COBOL, one trainee with no experience but high aptitude, both with no library background.
1 library data processing coordinator, Masters in Library Science, manufacturer's course in systems analysis and programming, knowledge of COBOL.

Data Preparation Staff:
2 professional librarians with Masters in Library Science.
5 clerk-typists, high school diplomas, two with keypunch ability, three typists with 60 wpm.

A breakdown of personnel costs should include:

1) planning;
2) actual design (both systems and individual programs);
3) coding (writing of actual programs);
4) testing/debugging;
5) file conversion;
6) actual data preparation and correction (includes new file preparation and maintenance of existing files);
7) program maintenance.

In the planning, design, and coding phases both the total time actually spent and the elapsed project time should be given. Testing/debugging and conversion costs are normally one-time costs, but both can amount to a sizable portion of the system cost (4). Ideally, if conversion costs can also include file cleanup, it helps make that portion more valuable and easier to accept (5).

Once the system is in operation, data preparation and correction times become major cost factors. These are usually highest during the initial installation when personnel are learning the new system. Care should be taken not to let the size of initial costs bias the entire cost figure. Once initial training is over, these will reach a more realistic level and will be more indicative of system requirements and actual costs.
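The bookkeeping recommended above can be made concrete with a small sketch. All figures and role names below are hypothetical, invented purely to illustrate reporting costs in man-hours per phase and per personnel type rather than as opaque dollar totals; nothing here comes from the article's own system.

```python
# Hypothetical man-hour breakdown per phase (all numbers invented).
phases = {
    "planning":          {"analyst": 40, "librarian": 10},
    "design":            {"analyst": 100},
    "coding":            {"programmer": 90},
    "testing/debugging": {"programmer": 60},
    "file conversion":   {"clerk": 30},
}

def total_hours(breakdown):
    """Total one-time effort, in man-hours, across all phases."""
    return sum(h for phase in breakdown.values() for h in phase.values())

def hours_by_type(breakdown):
    """Man-hours aggregated by personnel type, for cross-system comparison."""
    totals = {}
    for phase in breakdown.values():
        for role, hours in phase.items():
            totals[role] = totals.get(role, 0) + hours
    return totals

print(total_hours(phases))    # 330
print(hours_by_type(phases))
```

Time units such as these can be compared across installations directly, whereas the equivalent dollar figures cannot without knowing each site's salary and prorating conventions.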
EQUIPMENT COSTS

Just as there are difficulties in determining analyst and programmer efficiency, there are similar, though less severe, problems in comparing the efficiency of various machine configurations. Even in comparison of two identical machine configurations, different run times are possible for the same job. The operating system or monitor must also be considered, as must the experience and efficiency of the computer operator. Systems are improving to the point where operator performance is less critical than it once was, but it can still be a significant factor.

Equipment costs are largely determined by the machine configuration used. A tape system may have a totally different running time from that of a disc or drum system. The configuration also affects what types of systems may be implemented. For comparison purposes it is necessary to state the make, model, and memory size of the computer used. Memory size should be given in either words or bytes. A byte is the amount of storage required for one alphabetic character, one special character, or one or two numeric characters. If the memory size is specified in words, the word size should also be given. Details of the computer peripherals, such as the general type (i.e., tape drives, disc, printer, card reader, punch, etc.), make, model, and number should be given. For printers, card readers, punches, paper tape units, etc. the speed should also be given (i.e., lines/minute, cards/minute, characters/second, etc.). For storage media, such as disc and drum, the storage capacity should be given. For tape units the tape density should be included. A sample description might read:

1 IBM 360/20 submodel 5, 24K bytes
4 IBM 2415 tape drives, model 3, 800 bpi
1 IBM 2560 card reader/punch, reader 500 cpm, punch 160 cpm
1 IBM 2203 printer, 450 lpm
2 IBM 2311 discs, model 12, 2.7 million bytes

Equipment costs can be subdivided in many ways.
A possible method includes:

1) Computer Costs
   a) Compile times (highly language and computer dependent)
   b) Test/debug time (should include the entire system as well as individual components)
   c) Actual run times
   d) Maintenance or debug after installation
2) Additional Equipment Costs
   a) Keypunch, paper tape, optical character, other input devices
   b) Interpreting punched card output
   c) Sorting/collating
   d) Listings
   e) Bursting, binding, etc.
3) Special Forms or Material Costs
   a) Input or work forms
   b) Punched cards (pre-printed or blank)
   c) Pre-printed forms
   d) Carbon sets or NCR forms
   e) Pre-punched badge or ID cards
   f) Masters for reproduction
   g) Special computer printer ribbons

While all of the above items may not be applicable to a particular system, all those that are should be included. Large or real-time systems might need additional categories. Compile and test/debug times are of interest to the systems designer, but are less important than actual run times. Compile times are a function of the computer, the language used, and the complexity of the program. They are more indicative of the compiler efficiency than the system performance. Test/debug times must be allowed for in any system, but data on them will be useful to those having had little experience with automated systems. Experienced designers will be aware of the problem and make adequate allowances for it.

Actual run times are a primary cost factor in any system, and representative samples should be given. Details concerning the type and volume of input, type of processing, and the type and volume of output should also be stated. Program maintenance costs usually do not appear until after the system has been up and running for some time, and are usually not included in reports of system costs, since most reports are written before, or soon after, system installation.
They are important, however, because they represent a part of the continuing cost of the system.

Conversion costs can represent a sizable portion of the installation cost of the system. This is especially true if the data must be converted to machine readable form. These costs are of great interest to others engaged in similar conversions, and care should be taken to ensure they are accurate. The most obvious type of file conversion is from a record such as a typed list or a catalog card into a machine readable record through keypunching or keytape conversion. File conversion may still exist for a file already in machine readable form if there are differences between it and the files used by the system. Normally such costs are considerably less than conversion from a non-machine-readable form. Exceptions may occur if the file lacks much of the necessary information or if extensive character manipulation is required before the information can be used. An example of a file having insufficient data to warrant conversion might be a card file with very abbreviated authors and titles used for quick listing purposes, when what is wanted is a full shelf list containing all added entries and subjects, full titles, and imprint information. Existing information may require too many corrections to expand the authors and titles to provide any really usable information. In other words, it might be more economical to repunch the file from scratch than to try to edit and punch corrections.

Non-computer equipment costs should not be neglected. While capital investments in such equipment may be small, the time spent in using the equipment can often be lengthy. This is particularly true of input devices. Non-computer equipment includes such items as keypunches; keytape units; and any unit record equipment, such as collators, sorters, interpreters, xerox machines, typewriters, guillotines, etc.
Just as for computer equipment, the type, make, model, quantity used, and special features should be given. A sample description of such equipment might be:

1 IBM Selectric typewriter, ASCII OCR type element
2 IBM 029 keypunches (no special features)
1 IBM 82 sorter
1 IBM 85 collator

In a system using a large number of punched cards or large volumes of paper for printing, these too can be significant cost factors. Again the volume of usage may be more helpful than actual dollar cost. Special or pre-printed forms are usually more expensive than plain forms, so it is important to state types as well as quantities. The actual dollar cost should be stated as well. An example of materials used to produce a small printed catalog with shortened entries on a six-month cycle is:

600 DiKote masters (for multilith reproduction) at $56.00/1000 masters
1000 pre-printed punched cards at $1.50/1000 cards
1 IBM 1403 computer printer ribbon, No. 413197
1200 pages, standard, lined, 14⅞ x 11-inch computer printer paper

PRESENTATION OF COSTS

The format for presenting cost data could be divided as follows:

Personnel: a brief paragraph describing the number, types, backgrounds, and skills of all personnel involved with the system.
Equipment:
  Computer equipment: a brief statement of the computer make, model, and memory size; type, model, and number of peripherals.
  Additional equipment: a brief statement of the types, makes, models, and numbers of any other equipment necessary for the successful operation of the system.
Materials: a brief statement of the types and quantities of forms used, and for special forms an indication of the actual dollar cost as well.

Table 1.
Cost Control Form for SDI System

The table's columns are: Function; Elapsed project time; Personnel (total hours and type); and Equipment and Material (number, time in hours, and type). The development functions and their figures are: Planning (1 month elapsed; 80 hours analyst, 20 hours librarian); Actual Design (3 months; 98 hours analyst, 20 hours librarian); Coding (4 months; 90 hours analyst); Compile (2 hours analyst); Testing/Debugging (3 months; 5 hours analyst); File Conversion (2 months; 15 hours clerk); and Data Preparation/Correction per run (.5 month; 1 hour clerk, 3 hours librarian). The per-run operating entries cover the individual job runs (Citations/External, Citations/Internal, Profile Update), with CDC 3600 run times ranging from .05 to 1.2 hours; decollating and bursting (.2 hour each, with .1 hour clerk time); and printing on standard 14⅞ x 11 computer paper, at 80 pages per run for Citations/External and 50 pages each for Citations/Internal and Profile Update.

Fig. 1. Profile Update. (Flow chart: profile updates are sorted into man number sequence, then alphabetic sequence; terms are selected and formatted, sorted into alphabetic term sequence, and matched against profile/citation terms.)

Fig. 2. Citation Run. (Flow chart: citations are sorted into citation sequence, summed and selected, sorted into mail sequence, and SDI notices are printed.)

Table 1 shows a simple presentation of system cost. The table is a suggested form only and is not exhaustive; it can be expanded as needed for more complex systems. The information and figures given in the table illustrate the system discussed in the following section. The purpose is to provide a sample, not to describe the system in detail, and consequently the system description is very brief.

SYSTEM DESCRIPTION

A Selective Dissemination of Information system was developed to serve a small group of engineers in a scientific laboratory. One source of input consisted of current accessions obtained as a by-product of regular weekly runs to create a master shelf-list file in machine readable form.
Another input file was obtained by subscription to a commercial tape service supplying journal, book, and report citations. While most of the programs developed for the system were new, it was possible to modify some existing programs for use in the new system. File conversion was required from an existing profile tape used in the previous SDI (IBM package 1401-CX-01) to the format used by the new system. The greater capabilities of the new system also resulted in numerous modifications and expansions of the profiles. The profile master containing a description of user interests had just under 100 profiles representing 40 separate groups. Most profiles were for groups rather than individuals; these were updated only as needed. The citation tape contained slightly over 8,000 journal and book citations per week. The internal citation tape contained 180 report citations per week. An average of 400 notices per weekly run for the external citations and 200 notices per weekly run for the internal citations were generated. System flow charts for the profile update and the citation runs are contained in Figures 1 and 2. The language used for the system was COBOL.

Development Personnel
1 analyst/programmer with three years COBOL, two years AUTOCODER, professional librarian with four years library experience, worked with IBM 1401, 1410, and CDC 3600
1 professional librarian with 15 years' experience in all phases of library work, knowledge of computers, but no programming or analysis experience
1 clerk-typist, BA in English, 60 wpm typist, self-taught keypunch operator, worked in library four years

Equipment Configuration
Computer Equipment
1 CDC 3600, 65K words, 8 bytes/word
8 CDC 604 tape drives, 200/500/800 bpi, 7 track, 37.5 inches per second
2 CDC 861 magnetic drums at 4.2 million characters, 17 ms access time, 2 million cps transfer rate
1 CDC 405 card reader, photoelectric, 1200 cpm
1 CDC 415 card punch, 250 cpm
2 CDC 501 printers, 1000 lpm, 64 char. print set, 136 char. line
1 CDC 3601 console

Non-computer Equipment
1 IBM 026 keypunch (no special features)
1 decollator
1 burster
1 hand perforator

Materials
Standard (14⅞ x 11), lined, computer printer paper
Blank punched cards
Magnetic tape subscription @ $5000/year

GENERAL CONSIDERATIONS

How well the system attains its intended goals within the desired limits of design, development, and operating costs is the most important consideration. Design and development costs are usually initial costs only, but operating costs continue as long as the system functions. Operating costs must include the cost of data preparation, computer run times, cost of program maintenance, additional equipment costs, and cost of special forms or materials needed. Careful consideration should be given to allowing sufficient money to be spent in design and development so that overall operating costs, especially those of data preparation and computer run times, can be reduced.

REFERENCES

1. Griffin, Hillis L.: "Estimating Data Processing Costs in Libraries," College and Research Libraries, 25 (Sept. 1964), 400-03, 431.
2. Fasana, Paul J.: "Determining the Cost of Library Automation," ALA Bulletin, 61 (June 1967), 656-61.
3. Landau, Herbert B.: "The Cost Analysis of Document Surrogation: A Literature Review," American Documentation, 20 (Oct. 1969), 302-310.
4. Gregory, Robert H.; Van Horn, Richard L.: Automatic Data-Processing Systems: Principles and Procedures (Belmont, Calif.: Wadsworth, 1963).
5. Hammer, Donald P.: "Problems in the Conversion of Bibliographic Data: A Keypunching Experiment," American Documentation, 19 (Jan. 1968), 12-17.

A SCATTER STORAGE SCHEME FOR DICTIONARY LOOKUPS

D. M.
MURRAY: Department of Computer Science, Cornell University, Ithaca, New York

Scatter storage schemes are examined with respect to their applicability to dictionary lookup procedures. Of particular interest are virtual scatter methods which combine the advantages of rapid search speed and reasonable storage requirements. The theoretical aspects of computing hash addresses are developed, and several algorithms are evaluated. Finally, experiments with an actual text lookup process are described, and a possible library application is discussed.

A document retrieval system must have some means of recording the subject matter of each document in its data base. Some systems store the actual text words, while others store keywords or similar content indicators. The SMART system (1) uses concept numbers for this purpose, each number indicating that a certain word appears in the document. Two advantages are apparent. First, a concept number can be held in a fixed-sized storage element. This produces faster processing than if variable-sized keywords were used. Second, the amount of storage required to hold a concept number is less than that needed for most text words. Hence, storage space is used more efficiently.

SMART must be able to find the concept numbers for the words in any document or query. This is done by a dictionary lookup. There are two reasons why the lookup must be rapid. For text lookups, a slow scheme is costly because of the large number of words to be processed. For handling user queries in an on-line system, a slow lookup adds to the user response time. Storage space is also an important consideration. Even for moderate-sized subject areas the dictionary can become quite large: too large for computer main memory, or so large that the operation of the rest of the retrieval system is penalized.
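As a minimal sketch of the concept-number idea (not the SMART implementation), each distinct word can be assigned the next free integer, so a document becomes a list of fixed-size numbers:

```python
# Sketch only: map each distinct word to a small fixed-size concept number.
concepts = {}  # word -> concept number

def concept_number(word):
    """Return the word's concept number, assigning the next free one on
    first encounter."""
    if word not in concepts:
        concepts[word] = len(concepts) + 1
    return concepts[word]

doc = "scatter storage schemes for scatter tables".split()
encoded = [concept_number(w) for w in doc]
print(encoded)  # [1, 2, 3, 4, 1, 5]
```

An integer occupies one machine word regardless of how long the text word is, which is exactly the storage and speed advantage described above.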
In most cases a certain amount of core storage is allotted to the dictionary, and the lookup scheme must do the best possible job within this allotment. This usually means keeping the overhead for the scheme as low as possible, so that a large portion of the allotted core is available to hold dictionary words. The rest of the dictionary is placed in auxiliary storage, and parts of it are brought in as needed. Obviously the number of accesses to auxiliary storage must be minimized.

This paper presents a study of scatter storage schemes for application to dictionary lookup, methods which appear to be fast and yet conservative with storage. The next two sections describe scatter storage schemes in general. They are followed by a section presenting the results of various experiments with hash coding algorithms and a section discussing the design and use of a practical lookup scheme. The final sections deal with extensions and conclusions.

BASIC SCATTER STORAGE

Method

A basic scatter storage scheme consists of a transformation algorithm and a table. The table serves as the dictionary and is constructed as follows: given a natural language word, the algorithm operates on its bit pattern to produce an address, and the concept number for the word is placed in the table slot indicated by this address. This process is repeated for every word to be placed in the dictionary. The generated addresses are called hash addresses; and the table, a hash table.

There are many possible algorithms for producing hash addresses (2,3,4). Some of the most common are:

1) choosing bits from the square of the integer represented by the input word;
2) cutting the bit pattern into pieces and adding these pieces;
3) dividing the integer represented by the input word by the length of the hash table and using the remainder.

Collisions

In an ideal situation every word placed in the dictionary would have a unique hash address. However, as soon as a few slots in the hash table have been filled, the possibility of a collision arises: two or more words producing the same hash address. To differentiate among collided entries, the characters of the dictionary words must be stored along with their concept numbers. During lookup, the input word can then be compared with the character string to verify that the correct table entry has been located.

The problem of where to store the collided items has several methods of solution (3,5). The linear scan method places a collided item in the first free table slot after the slot indicated by the hash address. The scan is
However, as soon as a few slots in the hash table have been filled, the possibility of a collision arises-two or more words producing the same hash address. To differentiate among collided entries, the characters of the dictionary words· must be stored along with their concept numb~rs. During lookup, the input word can then be com- pared with the character string to verify that the correct table entry has been located. · · The problem of where to store the collided items has several · methods of solution ( 3,5). The linear scan method places a collided item in the first free table slot after the slot indicated by the hash address. The scan· is Scatter Storage for Dictionary Lookups/MURRAY 175 circular over the end of the table. The random probe method uses a crude algorithm to generate random offsets R(i) in the interval [1,H] where H is the length of the hash table. If the colliding address is A, slot A+R( 1) mod H is examined. The process is repeated until an empty slot is found. Both of these methods work best when the hash table is lightly loaded; that is, when the ratio between the number of words entered and the number of table slots is small. In such cases the expected length of scan or average number of random probes is small. Chaining methods provide a satisfactory method of resolving collisions regardless of the load on the hash table. However, they require a second storage table-a bump table-for holding the collided items. When a collision occurs, both entries are linked together by a pointer and placed in the bump table. A pointer to this collision chain is placed in the hash table along with an identifying flag. Further colliding items are simply added to the end of the collision chain. Table Layout and Search Procedure In the virtual scatter storage system described later, the hash table has a high load factor. Hence the chained method (or rather a variation of it) is used to resolve collisions. 
Further discussion involves only scatter storage systems using collision chains. With this restriction, then, a scatter storage system consists of a hash table, a bump table, and the associated algorithm for producing hash addresses. A dictionary entry consists of a concept number and the character string for the word it represents. These entries are placed in the hash-bump table as described above. Consequently there are three types of slots in the hash table: slots that are empty, slots holding a single dictionary entry, and slots containing a pointer to a collision chain held in the bump table. Figure 1 is a typical table layout.

Fig. 1. Typical Table Layout. (Hash table slots are either empty, hold a single dictionary entry of concept number plus characters, or hold a pointer to a collision chain of entries in the bump table.)

One of the advantages of scatter storage systems is that the search strategy is the same as the strategy for constructing the hash-bump tables. A word being given, its hash address is computed and the tables searched to find the proper slot. During construction, dictionary information is placed in the slot; during lookup, information is extracted from the slot. The basic search procedure is illustrated by the flow diagram in Figure 2. The construction procedure is similar.

Fig. 2. Flow Diagram for the Lookup Procedure in Basic Scatter Storage Systems. (Input the text word; compute its hash address; follow any collision-chain pointer through successive bump table entries; return the concept number if found, otherwise the word was never entered in the dictionary.)

Theoretical Expectations

An ideal transformation algorithm produces a unique hash address for each dictionary word and thereby eliminates collisions. From a practical point of view, the best algorithms are those which spread their addresses uniformly over the table space. Producing a hash address is simply the process of generating a uniform random number from a given character string.
If the addresses are truly random, a probability model may be used to predict various facts about the storage system. Suppose a hash table has H slots and that N words are to be entered in the hash-bump tables. Let H_i be the expected number of hash table slots with i entries for i = 0,1,...,N. In other words, H_0 is the expected number of empty slots, H_1 is the expected number of single entries, and H_2, H_3, ..., H_N are the expected numbers of slots with various numbers of colliding items. Even though the items are physically located in the bump table, they may be considered to "belong" to the same slot in the hash table. It is expected that:

1) H = Σ_{i=0}^{N} H_i

2) N = Σ_{i=0}^{N} i H_i

Now let X_ij = 1 if exactly i items occur in the j-th slot, and 0 otherwise, for j = 1,2,...,H. Then

H_i = E[X_i1 + X_i2 + ... + X_iH] = Σ_{j=1}^{H} E[X_ij]

Assume that any chosen table slot is independent of the others, so that the probability of getting any single item in the slot is 1/H. Then the probability of getting exactly i items in that slot is

3) P_i = (N choose i) (1/H)^i (1 − 1/H)^{N−i}

Then E[X_ij] = 1·P_i + 0·(1 − P_i) = P_i. Substituting into the above,

4) H_i = H·P_i = H (N choose i) (1/H)^i (1 − 1/H)^{N−i} for i = 0,1,...,N

For the cases of interest H and N are large, and the Poisson approximation can be used in equation 3:

P_i = e^{−N/H} (N/H)^i / i!

The ratio N/H is the load factor mentioned previously. It is usually designated by α, so that

5) H_i = H e^{−α} α^i / i! for i = 0,1,...,N

Equation 5 is sufficient to describe the state of the scatter storage system after the entry of N items. Most of the statistics of interest can be predicted using this expression; a few of them are listed in Table 1.

The time required for a single lookup using a hash scheme depends on the number of probes into the table space, that is, how many slots must be examined. Suppose the word is actually found; if it is a single entry, only one probe is required.
If the word is located in a collision chain, the number of probes is one (for the hash table) plus one additional probe for each element of the collision chain that must be examined. Suppose that the word is not in the dictionary; if its hash address corresponds to an empty table slot, again only one probe is needed. However, if the address points to a collision chain, the number is one plus the length of the chain. For words found in the dictionary the average number of probes per lookup is:

6) P = 1 + (1/N)[(0)H_1 + (1+2)H_2 + (1+2+3)H_3 + ... + (1+2+...+N)H_N]
     = 1 + (1/N) Σ_{i=2}^{N} (i(i+1)/2) H_i   (probes)

Table 1. Expected Storage and Search Properties for Basic Scatter Storage Schemes

Load factor: α = N/H
Number of empty table slots: H_0 = H e^{−α}
Number of single entries: H_1 = N e^{−α}
Number of collision chains of length i: H_i = H e^{−α} α^i/i!, i = 2,3,...,N
Expected sums: H = Σ_{i=0}^{N} H_i; N = Σ_{i=0}^{N} i H_i
Fraction of hash table empty: F_0 = H_0/H = e^{−α}
Fraction of table filled with single entries: F_1 = H_1/H = α e^{−α}
Fraction of hash table slots with i entries: F_i = H_i/H = e^{−α} α^i/i!, i = 2,3,...,N
Expected sums: 1 = Σ_{i=0}^{N} F_i; α = Σ_{i=0}^{N} i F_i
Number of collisions: N_c = H_2 + H_3 + ... + H_N = H − H_0 − H_1
Number of entries in the bump table: B = N − H_1
Total table slots required: S = H + B
Average lookup time (probes): P, as given by equation 6
(H = number of hash table slots; N = number of words to be entered)

VIRTUAL SCATTER STORAGE

Method

From Table 1, the expected number of collisions is

N_c = H − H_0 − H_1 = H(1 − e^{−N/H} − (N/H) e^{−N/H})

For a fixed N, this number decreases as H increases. At the same time the number of empty hash table slots H_0 = H e^{−N/H} increases as H increases.
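These two opposing trends can be checked numerically. The sketch below (illustrative values only) evaluates the formulas N_c = H(1 − e^(−α) − α e^(−α)) and H_0 = H e^(−α) for a fixed N and a growing table:

```python
import math

def expected_collisions(H, N):
    """N_c = H(1 - e^-a - a e^-a), with load factor a = N/H."""
    a = N / H
    return H * (1 - math.exp(-a) - a * math.exp(-a))

def expected_empties(H, N):
    """H_0 = H e^-a."""
    return H * math.exp(-N / H)

N = 1000
for H in (1000, 2000, 4000, 8000):
    print(H, round(expected_collisions(H, N)), round(expected_empties(H, N)))
```

As H grows with N fixed, the printed collision counts fall while the empty-slot counts rise, which is the trade-off the virtual scheme is designed to balance.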
Both of these results are expected; as the hash addresses are spread over a larger and larger table space (H slots), the number of collisions should decrease and the number of empties increase for a fixed number of entries (N). A virtual scatter storage scheme tries to balance these opposing strains by combining hash coding with a sparse storage technique. Large or virtual hash addresses are used to obtain the collision properties associated with a very large hash table, and the storage technique is used to achieve the storage and search properties of a reasonably sized hash table.

If the virtual hash address is taken large enough, the expected number of collisions can be reduced to essentially zero. With no expected collisions, it is possible to dispense with verifying that a query word and the dictionary word are the same. It is enough to check that they produce the same virtual address. Hence, the character strings need not be stored in the hash-bump tables at all. To implement the virtual scheme, a large hash address is computed, say in the range (0,V), and the address is split into a major and a minor part. The major portion is used just as before, as an index on a hash table of size H. The minor portion is stored in the hash or bump table, in place of the character string. With this difference, the virtual scheme works just as the basic scheme does. The lookup procedure is identical, but the minor portions are used for comparison rather than character strings. All the results of the previous section apply as storage and timing estimates.

The advantage of virtual scatter storage systems is economy of storage space. The minor portion is much smaller in size than the character string it replaces. It is true that the virtual scheme assigns the same concept number to two different words if they have the same virtual address. This need not be disastrous for document retrieval applications.
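The major/minor split can be made concrete with a sketch under assumed sizes (a 26-bit virtual address whose top 13 bits index the physical table; the hash function is a stand-in, since any algorithm yielding uniform 26-bit values would serve):

```python
# Assumed sizes, for illustration only.
V_BITS = 26          # virtual address space: V = 2**26
MAJOR_BITS = 13      # physical hash table: H = 2**13 slots
MINOR_BITS = V_BITS - MAJOR_BITS

def virtual_address(word):
    """Stand-in hash producing a 26-bit virtual address."""
    return int.from_bytes(word.encode(), "big") % (1 << V_BITS)

def split(addr):
    """Cut a virtual address into a table index and a stored signature."""
    major = addr >> MINOR_BITS               # indexes the hash table
    minor = addr & ((1 << MINOR_BITS) - 1)   # stored in place of characters
    return major, minor

major, minor = split(virtual_address("dictionary"))
print(major, minor)
```

At lookup time two words are taken to be the same entry whenever both parts match, which is why a residual chance of assigning one concept number to two words remains.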
Presumably V is chosen large enough to keep the number of collisions small. On the one hand, errors could be neglected because of their low probability of occurrence and their small effect on the total performance of the retrieval system. On the other hand, it is always possible to resolve detected collisions even in a virtual scheme. Collisions may be detected during dictionary construction or updating, and the characters for the colliding words appended to the bump table. The hash or bump table entry must contain a pointer to these characters along with an identifying flag. Collisions occurring during actual lookups cannot be detected.

Collision Problem

In order to use a virtual hash scheme, the virtual table must be large enough to reduce the expected number of collisions to an acceptable level. From a practical point of view, a collision may be considered to involve only two words, rather than three, four, or more. It is assumed that the probability of these other types of collisions is negligible. Let V be the size of the virtual hash table. Then the expected number of collisions is simply

N_c = H_2 = V e^{−α} α²/2

where α = N/V. In this case V >> N, so that α is small and e^{−α} is approximately 1:

7) N_c = V α²/2 = N²/2V

Suppose, for example, the dictionary has N = 2^13 words. If the size of the virtual hash table is chosen to be V = 2^26, then the expected number of collisions is

N_c = (2^13)² / (2·2^26) = 1/2

Suppose further that this table size is adopted for the dictionary, and that the hash code algorithm produces three collisions. The question arises whether the algorithm is a good one, whether it produces uniform random addresses. The answer is found by extending the previous probability model. Consider a virtual scatter storage scheme in which the virtual table size is V, and N items are to be entered into the hash-bump tables. Again assume that collisions involve only two items.
182 Journal of Library Automation Vol. 3/3 September, 1970

Let

P(i) = Prob[i collisions] = Prob[i table slots have 2 items and N-2i slots have 1 item]

The number of ways of choosing the i pairs of colliding words (in an ordered way) is

(N choose 2)(N-2 choose 2) ... (N-2i+2 choose 2) = N! / (2^i (N-2i)!)

There are i! ways of ordering these pairs and

(V)_{N-i} = V! / (V-N+i)!

ways of placing the pairs in the hash table, so that

8)  P(i) = [N! / (2^i i! (N-2i)!)] (V)_{N-i} / V^N    for i = 0, 1, ..., ⌊N/2⌋

In a form for hand computation,

9)  P(0) = (1 - 1/V)(1 - 2/V) ... (1 - (N-1)/V)
    P(i) = P(i-1) (N-2i+2)(N-2i+1) / (2i(V-N+i))    for i = 1, 2, ..., ⌊N/2⌋

These results are exact, but the following approximation can be used with accuracy:

log P(0) = Σ_{j=1}^{N-1} log(1 - j/V) ≈ -Σ_{j=1}^{N-1} j/V ≈ -N^2/2V

Let β = N^2/2V. Terms linear in N may be neglected in equation 9, giving

P(0) = exp(-β)
P(i) = (β/i) P(i-1)

This is also a Poisson distribution:

10)  P(i) = exp(-β) β^i / i!    for i = 0, 1, 2, ..., ⌊N/2⌋

This equation gives the approximate probability of i collisions for a virtual scatter storage scheme. It may be used to form a confidence interval around the expected number of collisions N_c = β. For the previous example, in which V = 2^26, N = 2^13, and N_c = 1/2, the following table of values can be made:

i    P(i)    ΣP(i)
0    .607    .607
1    .303    .910
2    .076    .986
3    .012    .998

The probability is .986 that the number of collisions is less than or equal to 2. Since the algorithm gave 3 collisions, it appears to be a poor one. The results for the collision properties are summarized in Table 2.

Table 2. Expected Collision Properties for Virtual Scatter Storage Systems

Measure                                                     Formula
Collision factor                                            β = N^2/2V
Expected number of collisions                               N_c = β
Probability of i collisions                                 P(i) = exp(-β) β^i/i!,  i = 0, 1, ..., ⌊N/2⌋
Probability that the number of collisions C lies in [a,b]   Prob = Σ_{i=a}^{b} P(i)

V = virtual hash table size;  N = number of words to be entered
EXPERIMENTS WITH ALGORITHMS FOR GENERATING HASH ADDRESSES

Any scatter storage scheme depends on a good algorithm for producing hash addresses. This is especially true for virtual schemes, in which collisions are to be eliminated. In these experiments three basic algorithms are evaluated for use in virtual schemes. The words in two dictionaries, the ADI Wordform and the CRAN 1400 Wordform, are used. The hash-bump tables are filled using these words, and the resulting collision and storage statistics are compared with the expected values.

Dictionaries

The ADI Wordform contains 7822 words pertaining to the field of documentation. It contains 206 common words (previously judged) averaging 3.93 characters. The remaining 7616 noncommon words average 8.00 characters. In all there are 61,712 characters. The CRAN 1400 Wordform contains 8926 words dealing with aeronautics. The common word list consists of that of the ADI, plus four additional entries. The 8716 noncommon words average 8.40 characters. There is a total of 74,074 characters.

Figures 3 and 4 show the distribution of word length versus percentage of the collection. The abrupt end to the curves in Figure 3 is due to truncation of words to 18 characters. Both dictionaries have approximately the same size and proportions of words of various lengths. However, their vocabularies are considerably different. A good hash scheme should work equally well on both dictionaries.

Fig. 3. Distribution of Dictionary Words According to Their Lengths.

Fig. 4. Cumulative Distribution of Dictionary Words According to Their Lengths.
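The collision estimates of equations 7 and 10 are easy to check numerically; the short Python sketch below (illustrative only) reproduces the P(i) table for the earlier example of N = 2^13 words and V = 2^26.

```python
import math

def collision_probs(n_words, v_slots, imax=3):
    """Equation 10: P(i) = exp(-beta) * beta**i / i!, beta = N^2 / 2V,
    the approximate probability of exactly i collisions."""
    beta = n_words ** 2 / (2 * v_slots)
    return beta, [math.exp(-beta) * beta ** i / math.factorial(i)
                  for i in range(imax + 1)]

beta, p = collision_probs(2 ** 13, 2 ** 26)
# beta = 0.5, and p is approximately [.607, .303, .076, .012],
# so P(C <= 2) is about .986, matching the table in the text.
```

With three observed collisions falling outside this 98.6% band, the hand computation above and this sketch agree that the algorithm in the example is suspect.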
Hash Coding Algorithms

By their nature, hash coding algorithms are machine dependent. The computer representation of the alphabetic characters, the way in which arithmetic operations are done, and other factors all affect the randomness of the generated address. The algorithms described below are intended for use on the IBM S/360. Words are padded with some character to fill an integral number of S/360 fullwords. Then the fullwords are combined in some manner to form a single fullword key, and the final hash address is computed from this key.

In the experiments which follow, the blank is used as a fill character. This is an unfortunate choice because of the binary representation of the blank, 01000000. In some algorithms the zeroes may propagate or otherwise affect the randomness. A good fill character is one that 1) is not available on a keypunch or teletype, 2) will not propagate zeroes, 3) will generate a few carries during key formation, and 4) has the majority of its bits equal to 0, so their positions may be filled. A likely candidate for the S/360 is 01000101.

Three basic methods of generating virtual hash addresses are studied: addition, multiplication, and division. The first and second provide contrasting ways of forming the single fullword keys. The second and third differ in the way the hash address is computed from the key. Variations of each basic method are also tested to try to improve speed, programming ease, or collision-storage properties.

1. Addition Methods

AC (addition and center). The fullwords of characters are logically added to form the key. The key is squared and the centermost bits are selected as the major. The minor is obtained from bits on both sides of the major.

AS (addition with shifting). Same as AC, except the second, third, etc. fullwords are shifted two positions to the left before their addition in forming the key.
(An attempt to improve collision-storage properties.)

AM (addition with masking). Same as AC, except the second, third, etc. fullwords have certain nonsignificant bits altered by masks before their addition in forming the key. (An attempt to improve collision-storage properties.)

2. Multiplication Methods

MC (multiply and center). The fullwords of characters are multiplied together to form the key. The center bits of the previous product are saved as the multiplier for the next product. The key is squared and the centermost bits selected as the major. The minor is obtained from the bits on both sides of the major.

MSL (multiply and save left). Same as MC, but during formation of the key, the high-order bits of the products, rather than the center, are used as successive multipliers. (An attempt to improve speed.)

MLM (multiply with left major). Same as MC, but taking the major from the left half of the square of the key and the minor from the right half. (An attempt to improve speed.)

3. Division Methods

DP (divide by prime). The fullwords of characters are multiplied together to form the key. The center bits of the previous product are saved as the multiplier for the next product. The key is divided by the length of the virtual hash table (a prime number in this case) and the remainder used as the virtual hash address. The major is drawn from the left end of the virtual address and the minor from the right.

DO (divide by odd number). Same as DP, except using a hash table whose length is odd. (An attempt to provide more flexibility of hash table sizes.)

DT (divide twice). Same as DP, except two divisions are made. The major is produced by dividing the key by the actual hash table size. The minor results from a second division. Primes are used throughout as divisors.
(An attempt to improve storage-collision properties.)

Evaluation

In the experiments to evaluate each variation of the above hash schemes, the size of the virtual hash table varies from 2^20 to 2^28 slots. The actual hash table varies in size from 2^12 to 2^14 slots. Bump table space is used as needed. The tables are filled by the words from either the ADI or CRAN dictionaries and the collision and storage statistics taken. Because good collision properties are most important, they are examined first. The storage properties are dealt with later.

The number of collisions obtained from each scheme versus the virtual table length is plotted in Figures 5 to 8. The ADI dictionary is shown in Figures 5 and 7, and the CRAN in Figures 6 and 8. The circled lines correspond to curves generated from equations 7 and 10. The horizontal one shows the expected number of collisions, and the lines above and below it enclose a 95% confidence interval about the expected curve. In other words, if an algorithm is generating random addresses, the probability is 95% that the curve for that scheme lies between the heavy lines.

Fig. 5. Collisions in the ADI Dictionary for Addition and Multiplication Hash Schemes.

Consider Figures 5 and 6, showing the results for all the addition methods and the MC variation of the multiplication method. The AC and MC algorithms differ only in that addition is used in forming the key in the first one and multiplication in the second. Yet the curves are spectacularly different. The result seems to have the following explanation. The purpose of a hash address computation is to generate a random number from a string of characters. If the bits in the characters are as varied as possible, then the algorithm has a headstart in the right direction.
However, the S/360 bit patterns for the alphabet and numbers are:

A to I   1100 xxxx
J to R   1101 xxxx
S to Z   1110 xxxx
0 to 9   1111 xxxx

Fig. 6. Collisions in the CRAN Dictionary for Addition and Multiplication Hash Schemes.

In each case the two initial bits of a character are 1's, so that in any given word one-fourth of the bits are the same. In forming a key, the successive additions in the AC algorithm may obscure these nonrandom bits if a sufficient number of carries are generated. However, the number of additions performed is usually small (2 or 3), and it appears that the patterns are not broken sufficiently. The MC algorithm uses multiplication to form its keys, which involves many additions, certainly enough to make the resulting key random.

The multiplications in the MC algorithm are costly in terms of computation time. Therefore the AS and AM algorithms are tried. These addition variants try to hasten the breakup of the nonrandom bits by shifting and masking, respectively. Although these variants reduce the number of collisions somewhat, none of the addition schemes could be called random. Typically a few words are singled out at some point and continue to collide regardless of the length of the virtual address. Several collision pairs are listed below. Note the similarities between the words.

COUNT - SOUND
WORTH - FORTY
TOLERATED - TELEMETER
WHEEL - SHEET

Fig. 7. Collisions in the ADI Dictionary for Division and Multiplication Hash Schemes.
Fig. 8. Collisions in the CRAN Dictionary for Division and Multiplication Hash Schemes.

Consider the multiplication algorithms. During key formation, the process of saving the center of successive products adds to the computation time. The MSL variation attempts to remedy this by saving only the high-order bits between multiplications (on the S/360 this means saving the upper 32 bits of the 64-bit product). This method is so inferior that its collision graph could not be included with the others. The poor results stem from the fact that characters at the end of fullwords have little effect on the key and that the later multiplications swamp the effects of the earlier ones. Examples of collision pairs are given below. For convenience the fullwords are separated by blanks.

CERTAINTY - CERTAINLY
PREVENTED - PRESENTED
HEAVING - HEAT ING
EXPE NSE - EXPANSE
CHARTER - CHAPTER

The MC and MLM variants are identical with respect to collision properties. In general these algorithms produce good results, reducing the number of collisions to zero in both dictionaries. The collision curve is always beneath the expected one.

Consider Figures 7 and 8, showing the results for all division methods and the MC method. All of the division algorithms display a distinct rise in the number of collisions when the virtual table size is near 2^24, regardless of the dictionary. The majority of the colliding word pairs are 4-character words having the same two middle letters. This brings to light a curious fact about division algorithms. For virtual tables, the divisor of the key is large and the initial few bits determine the quotient, leaving the rest for the remainder.
For words of 4 characters or less (which require no multiplications during key formation), dividing by 2^24 is equivalent to selecting the last 3 characters of the word as the hash address. Because the divisors are not exactly equal to 2^24, only the two middle characters tend to be the same. Examples are:

DEAL - BEAR
TOOK - SOON
HELD - CELL
VERB - TERM

This phenomenon apparently continues for table sizes around 2^26 and 2^28, but there are few or no words of 4 characters or less which agree in 26 or 28 bits. For divisors smaller than 2^24, a larger part of the key determines the quotient and apparently breaks up the pattern. Because the above effect occurs only for V = 2^24, these points are passed over on the graphs.

In general, the DT algorithm is superior to the rest of the division methods, mostly because each of its two divisors is smaller than those used in other methods. Prime numbers seem to produce better results than other divisors. On the basis of collision properties, the MC, MLM, DT, and possibly AS algorithms are the best. Storage-search evaluations are included for these methods only.

The experiments with each hash coding method also include counting the frequency of various lengths of collision chains. Here a collision chain refers to a chain of words producing the same major. The frequency counts are compared with the expected counts given by equation 5. The comparison is in terms of a chi-square goodness-of-fit test with a 10% level of significance.

Fig. 9. Deviations of Storage-Search Properties from Expected Values for Selected Hash Schemes Using the ADI Dictionary.

Figures 9 and 10 show the results of this test for each dictionary.
Included in the graphs is the line corresponding to the 10% level of significance. If the major portions of the hash addresses are really random, there is a probability of 0.90 that the 10% line will lie above the curve for the algorithm tested.

Fig. 10. Deviations of Storage-Search Properties from Expected Values for Selected Hash Schemes Using the CRAN Dictionary.

Consider the MC and MLM algorithms, which differ only in that the major is selected from the center and from the left of the virtual address, respectively. From the graphs, it is clear that the multiplication methods produce their most random bits in the center of their product. This is somewhat as expected, because the center bits are involved in more additions than other bits.

The division algorithm, which had fairly good collision properties, seems to have rather mediocre storage properties. This is probably due to the same causes as the collision problems, but working at a lower level and not affecting the results as much. The AS curve is included simply for completeness. The scheme displays a well-behaved storage curve, but it has poor collision properties.

In summary, the MC scheme seems to be the best for both dictionaries in terms of collision and search properties. In terms of computing time, the method is more time consuming than the addition methods, but less expensive than the division methods. The difference in computation times is not an extremely big factor. All methods required from 35 to 55 microseconds for an 8-character word on the S/360/65. The routines are coded in assembly language and called from a Fortran executive. The times above include the necessary bookkeeping for linkage between the routines.
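The MC idea can be sketched in Python. This is an illustrative reconstruction, not the article's routine (which is S/360 assembly): the fill byte follows the text's suggestion, while the exact bit selections for the 29-bit virtual address and the 15/14-bit major/minor split are assumptions.

```python
# Illustrative sketch of the MC (multiply and center) hash scheme.
FILL = 0b01000101                 # fill character suggested in the text
MAJOR_BITS, MINOR_BITS = 15, 14

def mc_hash(word):
    data = word.encode("ascii")
    data += bytes([FILL]) * (-len(data) % 4)       # pad to whole fullwords
    fullwords = [int.from_bytes(data[i:i + 4], "big")
                 for i in range(0, len(data), 4)]
    key = fullwords[0]
    for fw in fullwords[1:]:
        # keep the center 32 bits of each product as the next multiplier
        key = ((key * fw) >> 16) & 0xFFFFFFFF
    square = key * key                              # up to 64 bits
    # take the centermost 29 bits of the square as the virtual address
    virtual = (square >> 17) & ((1 << (MAJOR_BITS + MINOR_BITS)) - 1)
    return virtual >> MINOR_BITS, virtual & ((1 << MINOR_BITS) - 1)
```

Note that a word of 4 characters or less involves no multiplication at all during key formation, the same case that troubled the division methods.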
A PRACTICAL LOOKUP SCHEME

General Description

The lookup scheme described below is designed for use with dictionaries of about 2^15 words. The virtual table size selected is 2^29 and the actual table size is 2^15. On the basis of the results presented in previous sections, when the dictionary is full, it is expected that 1) 36.8% of the hash table will be empty, 2) 36.8% of the hash table will be single entries, 3) the bump table will require (0.632)2^15 entries, 4) 1 collision is expected, 5) the probability of 5 or fewer collisions is 0.999, and 6) the average lookup will require 2.13 probes.

Table Layout

In all previous discussions a dictionary entry has included a minor and a concept number. A concept number is simply a unique number assigned to each word. The hash address of a word is also unique, and hence can be used; there is no need to store and use a previously assigned concept number. A dictionary entry contains a 14-bit minor and a single bit indicating whether the word is common or noncommon:

  bit 1: C    bits 2-15: Minor

C = 0 implies the word is common; C = 1 implies the word is noncommon. A hash table entry contains 16 bits arranged as:

  bit 0: Flag    bits 1-15: Information

Flag = 0 implies that the information is a dictionary entry; Flag = 1 implies that the information is a pointer to the bump table. Words that have the same major are stored in a block of consecutive locations in the bump table. This eliminates the need for pointers in the collision "chains." A bump table entry also has 16 bits, structured as:

  bit 0: End    bit 1: C    bits 2-15: Minor

End = 0 implies that the entry is not the last in the collision block; End = 1 implies that the entry is the last in the block. Some convention must be adopted to signify an empty hash table slot. A zero is most convenient in the above scheme. Unfortunately a zero is also a legitimate minor.
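The three entry formats just described can be shown with explicit bit packing; the following Python sketch is illustrative, and the function names are hypothetical.

```python
# Bit layouts from the text, packed into 16-bit integers.

def pack_dict_info(common, minor):
    """Dictionary entry (15 bits): C bit, then 14-bit minor.
    C = 0 for a common word, C = 1 for a noncommon word."""
    return ((0 if common else 1) << 14) | (minor & 0x3FFF)

def pack_hash_entry(flag, information):
    """Hash table entry (16 bits): Flag = 0 means the information is a
    dictionary entry; Flag = 1 means it points into the bump table."""
    return (flag << 15) | (information & 0x7FFF)

def pack_bump_entry(end, common, minor):
    """Bump table entry (16 bits): End bit, C bit, 14-bit minor."""
    return (int(end) << 15) | ((0 if common else 1) << 14) | (minor & 0x3FFF)

def unpack_hash_entry(entry):
    return entry >> 15, entry & 0x7FFF             # (flag, information)

entry = pack_hash_entry(0, pack_dict_info(common=True, minor=0x1ABC))
flag, information = unpack_hash_entry(entry)
# flag is 0 and the 14-bit minor 0x1ABC is recovered from the information
```

Packing the C bit above the minor keeps the whole entry in one halfword, which is what lets the scheme dispense with stored character strings.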
However, to cause trouble the word generating the zero minor would have to be a common word and a single table entry (zero minors in the bump table are no problem). Hopefully this occurs rarely, because of the size of the minor (14 bits) and the small number of common words. Even if this combination of circumstances occurs, the common word can be placed in the bump table anyway. In designing the tables, it is important to make the hash table entries large enough to accommodate the largest pointer anticipated for the bump table. For the above scheme, the expected bump table size is less than 2^15, so the 15 bits allocated for pointers are sufficient.

Search Considerations

The number of probes needed to locate any given word depends on the place that the word occupies in a collision block. The average search time is improved if the most common words occupy the initial slots in each block. A study of ADI text yields the statistics given in Tables 3 and 4.

Table 3. Division of Words by Category.

Number of Words                 Percent of Total
17270   Total words                  100.0
 8716   Common words                  50.5
 8554   Noncommon words               49.5

Table 4. Distribution of Lengths.

Number of    All                 Common              Noncommon
Characters   Words    Percent    Words    Percent    Words    Percent
1-4          10145     58.8       8057     92.5       2097     24.5
5-8           4630     26.8        627      7.2       4003     46.8
9-12          2249     13.0         32      0.3       2217     25.9
13-16          221      1.3          0      0.0        221      2.6
17-20           11      0.1          0      0.0         11      0.1
21-24            5      0.0          0      0.0          5      0.1
Totals       17270    100.0       8716    100.0       8554    100.0
Av. Length     6.3                  4.3                 8.3

Using the categorical information, it appears that in filling the hash-bump tables the common words should be entered first. Within each category, all words should be entered in frequency order if such information is known. If frequency information is not available, the distribution by lengths can be used as an approximation to it. For common words, this means entering the shorter words first.
For noncommon words, the words of 5 to 8 characters should be entered first. The greater the number of single entries, the greater the average search speed. Figure 11 shows the fraction of single entries (F1) and the fraction of empty slots (F0) for various load factors.

Fig. 11. Theoretical Hash Table Usage.

The fraction of single entries

F1 = α e^(-α)

reaches a maximum for α = 1, but since the slope of the curve is small around this point, the fraction is practically the same throughout the interval (0.8, 1.2). Table usage is better, however, for the larger values of α. These facts imply that scatter storage schemes make the most efficient use of space and time for α = 1.

Most text words can be assumed to be in the dictionary. Thus the order of comparisons during lookup should be:

Hash Table Scan
1) check minor assuming the text word is a common word
2) check minor assuming the word is noncommon
3) check if the entry is a pointer to the bump table
4) check if the entry is empty

First Bump Table Entry (there must be at least two)
5) check minor assuming the word is a common word
6) check minor assuming the word is noncommon

Other Bump Table Entries
7) check minor assuming the word is noncommon
8) check minor assuming the word is common
9) check if at the end of the collision block

The search pattern can be varied to take advantage of the storage conditions. For example, if all common words are either single entries or the first element of a collision block, then step 8 may be eliminated.

Performance

The lookup system described above has been implemented and tested on the IBM S/360/65. A modified form of the MC algorithm is used to compute a 29-bit virtual address and divide it into a 15-bit major and a 14-bit minor.
The modification is the inclusion of a single left shift of the fullwords of characters during key formation. This breaks up certain types of symmetries between words, such as WINGTAIL and TAILWING; without it, such words will always collide. The hash-bump tables were filled with entries from the ADI dictionary, common words first, followed by noncommon words. The shortest words were entered first. Table 5 gives a comparison of the expected and actual results.

Table 5. Lookup System Results. (α = .239)

                                      Expected    Actual
Number of empty table slots            25810       25762
Number of single entries                6161        6250
Number of collision blocks               797         756
Longest collision block                    4           4
Average length of collision blocks       2.1         2.1
Size of bump table                      1663        1572
Number of collisions                     .06           0
Average probes per lookup               1.33        1.33

To obtain the actual lookup times, 627 words were processed. The words were read from cards and all punctuation removed. Each word was passed to the lookup program as a continuous string of characters with the proper number of fill characters added. The resulting times are given in Table 6 (in microseconds); a larger sample of the category of "not-found" words, processed with less accurate timings, indicates that the average time for words in this category is about 62 microseconds (standard deviation 26).

Table 6. Lookup Times

Category     Number      Percent     Average    Standard     Average
             of Words    of Total    Time       Deviation    Probes
All            627        100.0       57.9        11.7        1.18
Common         288         45.9       49.9         6.7        1.12
Noncommon      338         53.9       64.7        10.7        1.24
Not found        1          0.2       53.1         0.0        1.00

The time to compute a hash address depends on the length of the word. Let n be the number of S/360 fullwords needed to hold these characters. The time to form the initial address is

I(n) = 34.5 + 10.2(n-1) microseconds.

The average total lookup time, then, is

T = I(n) + cP

where c is the average time per probe into the table space and P is the average number of probes.
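This timing model is easy to evaluate directly; in the Python sketch below, the per-probe cost c = 15 microseconds is an assumed value (the cost the experiment suggests), not a measured constant.

```python
# Numeric sketch of the lookup-time model T = I(n) + c*P from the text.

def initial_address_time(n):
    """I(n) = 34.5 + 10.2(n-1) microseconds for n S/360 fullwords."""
    return 34.5 + 10.2 * (n - 1)

def lookup_time(n, probes, c=15.0):
    """Total lookup time with an assumed per-probe cost c (microseconds)."""
    return initial_address_time(n) + c * probes

# An 8-character word occupies n = 2 fullwords; at the observed average
# of 1.18 probes per lookup, the model gives roughly 62 microseconds.
```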
For the words in the experiment, n = 2.32 (average), I(n) = 40.3, and T = 57.9, so that each probe required about 15 microseconds.

Comparisons

Timing information for other lookup schemes is difficult to obtain. A tree-structured dictionary is used for a similar purpose at Harvard. Published information indicates that 6pq microseconds are needed to process p words in a dictionary of q entries. This time is for the IBM 7094. Translating this time to the S/360/65, which is roughly four times faster, and using the ADI dictionary (q = 7822), it appears that each lookup averages 11,000 microseconds. Exactly how much computation and input-output this includes is unknown.

EXTENSIONS

Larger Dictionaries

As more words are added to the dictionary, the size of the virtual address must increase in order to prevent collisions. As a result, the number of bits per table slot must also increase in order to accommodate the larger minors and pointers that are used. For a fixed-size hash table, the number of entries in the bump table grows as new words are added. At some point the space required for tables will exceed the amount of core allotted for dictionary use. To salvage the scheme, it may be possible to split the bump table into parts: one part for more frequently used words and one for words in rather rare usage. During dictionary construction, common words are entered first, then noncommon, then rare. When a rare word must be placed in a collision block, a marker is stored instead, and the item is placed in the secondary bump table. Presumably the nature of the words in the second bump table will make its usage rather infrequent, thus saving accesses to auxiliary storage to fetch it.

Suffix Removal

Many dictionary schemes store only word stems; the lookup attempts to match only the stem, disregarding suffixes in the process. This is not easily done with scatter storage schemes.
One solution is to try to remove the suffix after an initial search has failed. Each of the various possible stems must be looked up independently until a match is found. Another solution is to use a table of correspondences between the various forms of a word and its stem. The concept number could be used as an index on this table, which contains pointers to information about the actual stem. A thesaurus lookup can be handled the same way.

Application to Library Files

Library files, characterized by a large number of entries, personal and corporate names, foreign language excerpts, etc., present special problems to lookups. With regard to size, there is no particular reason that scatter storage cannot be extended to such files. The only genuine requirement is the ability to compute a virtual address long enough to insure a reasonably low number of collisions. As mentioned previously, table space can become a problem. For really large files, a two-stage process looks most promising. A small hash table is used to address high-frequency items and a larger hash table is used for addressing all other data. Lookup starts with the small tables and continues to the larger ones if the initial search fails. The same virtual address can be used in both lookups by shifting a few bits from the high-frequency minor to the low-frequency major. This two-stage technique should keep the amount of table shuffling to a minimum and provide rapid lookup for all textual data in titles, abstracts, etc.

With respect to bibliographic information, personal and corporate names are bothersome because they can occur in several forms. Unfortunately, scatter storage schemes do not guarantee that dictionary entries for R. A. Jones and Robert A. Jones are near each other; if an initial lookup fails, the rest of the search cannot simply be confined to a local area of the file.
There are two approaches to the problem: 1) standardization of names before input, or 2) repeated lookups using variants of a name as it occurs in text. Standardization, along with delimiting and formatting bibliographic data, is probably the most effective and least expensive approach. In addition, it reduces the amount of redundant data in the file.

Phrases in foreign languages present a difficulty, since the character sets on most computing equipment are limited to English letters and symbols. However, if an encoding for such symbols is used, lookup can proceed normally. The problem of obtaining the dictionary entry for an English equivalent of a foreign word is a completely different matter and will not be dealt with here.

CONCLUSIONS

Virtual scatter storage schemes are well suited for dictionaries, having both rapid lookup and economy of storage. The rapid lookup is due to the fact that the initial table probe limits the search to only a few items. The space savings come from the fact that the actual character strings for words are not part of the dictionary. The schemes depend heavily on a good algorithm for producing random hash addresses. The theory developed in the first two sections of this paper gives a basis for judging the worth of proposed algorithms. For any particular application, the table organization may vary to suit different needs and to store different information. However, the advantages of scatter storage schemes are still present.

REFERENCES

1. Salton, G.: "A Document Retrieval System for Man-Machine Interaction." In Association for Computing Machinery, Proceedings of the 19th National Conference, Philadelphia, Pennsylvania, August 25-27, 1964, pp. L2.3-1-L2.3-20.
2. McIlroy, M. D.: Dynamic Storage Allocation (Bell Telephone Laboratories, Inc., 1965).
3. Morris, R.: "Scatter Storage Techniques," Communications of the ACM (January, 1968).
4. Maurer, W.
D.: "An Improved Hash Code for Scatter Storage," Communications of the ACM (January, 1968).
5. Johnson, L. R.: "Indirect Chaining Method for Addressing on Secondary Keys," Communications of the ACM (May, 1961).

HISTORY OF LIBRARY COMPUTERIZATION

Frederick G. KILGOUR: Director, Ohio College Library Center, Columbus, Ohio

The history of library computerization from its initiation in 1954 to 1970 is described. Approximately the first half of the period was devoted to computerization of user-oriented subject information retrieval and the second half to library-oriented procedures. At the end of the period, on-line systems were being designed and activated.

This historical scrutiny seeks the origins of library computerization and traces its development through innovative applications. The principal evolutionary steps following upon a major application are also depicted. The investigation is not confined to library-oriented computerization, for it examines mechanization of the use of library tools as well; indeed, the first half-dozen years of library computerization were devoted only to user applications.

The study reveals two major trends in library computerization. First, there are those applications designed primarily to benefit the user, although few, if any, applications have but one goal. The earliest such applications were machine searches of subject indexes employing post-coordination of Uniterms. Nearly a decade later, the first of the bookform catalogs appeared that made catalog information far more widely available to users than do card catalogs. Finally, networks are under development that have as their objective availability of regional resources to individual users. The second trend is employment of computers to perform repetitive, routine library tasks, such as catalog production, order and accounting procedures, serials control, and circulation control.
This type of mechanization is extremely important as a first step toward an increasingly productive library technology, which must be an ultimate goal if libraries are to be economically viable in the future (1,2).

Historical studies of library computerization have not yet appeared, although some reports beginning with that of L. R. Bunnow (3) in 1960 contain valuable literature reviews. Both editions of Literature on Information Retrieval and Machine Translation by C. F. Balz and R. H. Stanwood (4,5) are extremely useful. In addition, J. A. Speer's Libraries and Automation (6) is a valuable, retrospective bibliography of over three thousand entries.

ORIGINS

The origins of library computerization were in engineering libraries newly established in the 1950's and employing the Uniterm coordinate indexing techniques of Mortimer Taube on collections of report literature. The technique of post-coordination of simple index terms proved most suitable for computerization, particularly when the size of a file caused manual manipulation to become cumbersome.

Harley E. Tillitt presented the first report, albeit unpublished at the time, on library computerization at the U.S. Naval Ordnance Test Station (NOTS), now the Naval Weapons Center at China Lake, California. The report, entitled "An Experiment in Information Searching with the 701 Calculator" (7), was given at an IBM Computation Seminar at Endicott, New York, in May 1954. The system was extended and improved in 1956, and a published report appeared in 1957 (8). Tillitt subsequently published an evaluation (9). The NOTS system mimicked manual use of a Uniterm card file. This noteworthy system could add new information, delete information related to discarded documents, match search requests against the master file, and produce a printout of document numbers selected.
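In modern terms, the post-coordinate matching such a system performed can be sketched as set intersection over the document numbers posted under each Uniterm. The sketch below is illustrative only; the descriptors and document numbers are invented, not taken from the NOTS file.

```python
# A minimal sketch of Uniterm post-coordinate searching in the spirit
# of the NOTS system: each descriptor (Uniterm) is posted with the set
# of document numbers it indexes, and a query coordinates its terms by
# intersecting those sets (Boolean "and" logic).
# All descriptors and document numbers are invented examples.

index = {
    "propulsion": {101, 104, 230},
    "guidance":   {104, 198, 230, 301},
    "telemetry":  {104, 301},
}

def search(terms, index):
    """Return the document numbers posted under every term in the query."""
    sets = [index.get(term, set()) for term in terms]
    if not sets:
        return set()
    result = sets[0]
    for s in sets[1:]:
        result &= s  # coordinate: keep only numbers under all terms
    return result

# A two-term coordination, printed as the system's list of numbers.
print(sorted(search(["propulsion", "guidance"], index)))
```

As the article notes, the output is only a list of document numbers; the searcher still had to look each one up to obtain a title.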
Search requests were run in batches, thereby producing inevitable delays that caused user dissatisfaction. When the user did receive results of his search, he had a host of document numbers that he had to take to a shelf list file to obtain titles. Subsequent system designers also found that a computerized system could cause user dissatisfaction if it did not speed up and make more thorough practically all tasks. Because use of the system dwindled, it was not reprogrammed for an IBM 704 that replaced the 701 in 1957. However, a couple of years later, when an IBM 709 became available, the system was reprogrammed and improved so that the user received a list of document titles (10).

Journal of Library Automation Vol. 3/3 September, 1970

Tillitt, Bracken, and their colleagues deserve much credit for their pioneer computerization of a subject information retrieval system. The application required considerable ingenuity, for the IBM 701 did not have built-in character representation. Therefore it was necessary to develop subroutines that simulated character representation (11). Moreover, the 701 had an unreliable electrostatic core memory. On some machines the mean time between failures was less than twenty minutes (12).

In September 1958, General Electric's Aircraft Gas Turbine Division at Evendale, Ohio, initiated a system on an IBM 704 computer (13) that was similar to the NOTS application. Mortimer Taube and C. D. Gull had installed a Uniterm index system at Evendale in 1953 (14,15). The GE system was an improvement over the then-existing NOTS system because it printed out author and title information for a report selected, as well as an abstract of the report. Like the NOTS system, however, the GE application provided only for Boolean "and" search logic.

The celebrated Medlars system (16) encompassed the first major departure in machine citation searching.
The original Medlars had two principal products: 1) composition of Index Medicus; and 2) machine searching of a huge file of journal article citations for production of recurrent or on-demand bibliographies. The system became operational in 1964. The NOTS and GE systems coordinated document numbers as listed under descriptors. Medlars departed from this technique by searching a compressed citation file in which each citation had its descriptors or subject headings associated with it. The Medlars system also provides for Boolean "and," "or," and "not" search logic.

The next major development was DIALOG (17), an on-line system for machine subject searching of the NASA report file. Queries were entered from remote terminals. The SUNY Biomedical Communication Network constitutes an important development in operation of machine subject searching and production of subject bibliographies of traditional library materials. The SUNY network went into operation in the autumn of 1968 with nine participating libraries (18). Its principal innovation is on-line searches from remote terminals of the Medlars journal article file to which book references have been added. The SUNY network eliminates the two major dissatisfactions with the NOTS system and all subsequent batch systems, in that it provides the user with an immediate reply to his search query.

CATALOG PRODUCTION

In 1960, L. R. Bunnow prepared a report for the Douglas Aircraft Company (3) in which he recommended a computerized retrieval system like the NOTS and GE systems that would also include catalog card production. Bunnow's proposal was perhaps the first to contain the concept of production of a single machine readable record from which multiple products could be obtained, such as printed catalog cards and subject bibliographies produced by machine searching.
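Bunnow's single-record concept can be illustrated with a small sketch: one machine readable record drives both a catalog card image and an accession-list line. The record layout and the two output formats below are invented for illustration and are not those of the Douglas system; the sample data is taken loosely from the card reproduced in Figure 1.

```python
# A sketch of Bunnow's concept: a single machine readable record from
# which multiple products are derived. The field names and formats are
# invented assumptions, not the Douglas system's actual record layout.

record = {
    "number": "ML 13,750",
    "author": "G. W. Koriagin, L. R. Bunnow",
    "title": "Mechanized Information Retrieval System, Status Report",
    "date": "January 1962",
}

def catalog_card(rec):
    """Render the record as a simple all upper-case card image."""
    return "\n".join([rec["number"], rec["title"].upper(),
                      rec["author"].upper(), rec["date"].upper()])

def accession_line(rec):
    """Render the same record as one line of an accession list."""
    return f'{rec["number"]}  {rec["author"]}. {rec["title"]}. {rec["date"]}'

print(catalog_card(record))
print(accession_line(record))
```

The point of the design is that the record is keyed once; every product, including machine subject searching, reuses the same data.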
Catalog card production began in May 1961 (19), the cards having a somewhat unconventional format and being printed all in upper-case characters as shown in Figure 1. Cards were mechanically arranged in packs for individual catalogs, and alphabetized within packs, an early sophistication.

Fig. 1. Sample Catalog Card.

Accompanying the production of catalog cards was production of accession lists from the same machine readable data.

The next development in catalog card production occurred at the Air Force Cambridge Research Laboratory Library, which began to produce cards mechanically in upper- and lower-case in 1963 (20). A special computer-like device called a Crossfiler manipulated a single machine readable cataloging record on paper tape to produce a complete set of card images punched on paper tape. This paper-tape product drove a Friden Flexowriter that mechanically typed the cards in upper- and lower-case. Two years later, Yale began to produce catalog cards in upper- and lower-case directly on a high-speed computer printer (21). The Yale cards were also arranged in packs, as had been those at Douglas, but were not alphabetized within packs.

The New England Library Information Network, NELINET, demonstrated in a pilot operation in 1968 a batch processing technique servicing requests from New England state university libraries, via teletype terminals, for production of catalog card sets, book labels, and book pockets from a MARC I catalog data file (22). The NELINET system became operational in the spring of 1970 employing the MARC II data base.
Also in 1968 the University of Chicago Library brought into operation catalog card production with data being input remotely on terminals in the Library, and cards being printed in batches on a high-speed computer printer centrally (23).

Bookform catalogs began to appear in the early 1960's, and it appears that the Information Center of the Monsanto Company in St. Louis, Missouri, published the earliest report on a bookform catalog that it had produced by computer in 1962 (24,25). The Center discontinued its card catalog in the same year. Book catalogs can increase availability of cataloging information to users while reducing library work, and the Monsanto book catalog is an example of such an achievement, for it provides a union catalog of the holdings of seven Monsanto libraries, and is produced in over one hundred copies. As would be expected, the catalog appeared all in upper-case. However, in September 1964 the Library at Florida Atlantic University produced a bookform catalog in upper- and lower-case (26), and the University of Toronto Library put out the first edition of its upper- and lower-case ONULP catalog on 15 February 1965 (27,28). The Monsanto catalog format called for author and call number on one line, with title and imprint on a second, or second and third, line. Both Florida Atlantic and Toronto catalogs were essentially catalogs of catalog cards.

Under the leadership of Mortimer Taube, Documentation, Inc. was first to produce a bookform catalog in upper- and lower-case, with a format like that of bookform catalogs in the nineteenth century (29); Documentation, Inc., prepared the catalog for the Baltimore County Public Library. Entries were made once, with titles listed under an entry if there were more than one. The Stanford bookform catalog appeared late in 1966, introducing a new type of unit record, whose first element is the title paragraph.

H. P.
Luhn proposed selective dissemination of information (SDI) in 1958 (30), and perhaps the first library application of SDI was in the spring of 1962 at the IBM library at Owego (31), where special processing was given to new acquisitions for input into the SDI system. At about the same time, the library of the Douglas Missile & Space Systems Division instituted an SDI system that employed as input a single machine readable record from which catalog cards and accessions lists were also produced (32). The introduction of SDI into library operation is a major, historic innovation, for SDI is a routine but personalized service in contradistinction to the depersonalized library service characteristic of all but the smallest libraries. Selective dissemination of information is one of the few examples of library computerization that takes full advantage of the computer's ability to treat an individual as a person and not as one of a horde of users.

CIRCULATION

The Picatinny Arsenal reported the first computerized circulation system (33). The Picatinny application produced a computer printed loan record, lists of reserves, overdues, lists of books on loan to borrowers, and statistical analysis, in a system that began operation in April 1962. The charge card at Picatinny was an IBM punch card into which was punched the bibliographic data and data concerning the borrower each time the book was charged. In the fall of 1962, the Thomas J. Watson Research Center (34) activated a circulation system much like the Picatinny system, except that bibliographic data was punched into a book card by machine, but information about the borrower was manually punched.

The next step forward occurred at Southern Illinois University (35), where a circulation system like the two just described began limited operation in the spring of 1964 employing an IBM 357 data collection system.
By using the 357, it was possible to have a machine punched book card and a machine readable borrower's identification card that could be read by the 357, thereby eliminating manual punching. The Southern Illinois system became fully operational at the beginning of the fall term of 1964, as did a similar 357 system at Florida Atlantic University (26).

Batch processed circulation systems periodically producing a listing of books on loan have a built-in source of dissatisfaction, particularly in academic libraries, for current records are unavailable, on the average, for half of the printout interval. Such delay can be eliminated in an on-line system, wherein information about the loan is available immediately after recording the loan. However, not all circulation systems with remote terminals operate interactively. In an on-line system introduced at the Illinois State Library in December 1966 (36) the transactions were recorded on an IBM 1031 terminal located at the circulation desk, data transmitted from the terminal being accumulated daily and processed into the file nightly. As first activated, the system did not permit querying the file to determine books charged out, but this capability was added in 1969.

Also in December 1966, the Redstone Scientific Information Center brought into operation a pilot on-line book circulation system based on a converted machine readable catalog consisting of brief catalog entries. This pilot system remained in operation until October 1967, and was capable of recording loans, discharging loans, putting out overdues, maintaining reserves, and locating the record in the file (37).

The BELLREL real time loan system went into operation at Bell Laboratories Library in March 1968 (38). BELLREL has a data base consisting of converted catalog records, so that in effect it also is a remote catalog access system. BELLREL serves three libraries remotely from two IBM 1050 terminals in each library.
BELLREL is a sophisticated on-line, real time circulation system that not only records and discharges books, but also replies to inquiries as to the status of a title, and the status of a copy, and will display the full record for a title, as would be required for remote catalog access.

SERIALS

The Library of the University of California, San Diego, activated the first computerized serials control system (39). This system has as its objective production of a complete holdings list, lists of current receipts, binding lists, claims, nonreceipt lists, and expiration of subscription lists. Checking in was accomplished by manual removal from a file of a prepunched card for a specific title and issue. The check-in clerk sent this card to the computer center for processing and the journal issue to the shelves. This technique of prepunching receipt cards has generated new problems in some libraries, for professional advice is often needed as to action to be taken when the issue received does not match the prepunched card. Nevertheless, the San Diego system still operates, albeit with modifications.

The Washington University School of Medicine Library activated a serials control system in 1963 (40) that was essentially like that at San Diego. A series of symposia held at Washington University, with the first in the autumn of 1963, widely publicized the system and led to its adoption elsewhere. The University of Minnesota Biomedical Library introduced a technique of writing in receipts of individual journal issues on preprinted check-in lists (41). Check-in data was then keypunched from the lists. This system obviated the problem generated by prepunched cards that did not match received issues, but, of course, reintroduced manual procedures.
Difficulties with check-in procedures, and delays in receipt of printed lists of holdings, made it clear that an on-line, real time serials control system would be superior to the batch systems described in the previous paragraph. Laval University in Quebec introduced the first on-line, real time system in 1969 (42). In September 1969 the Laval on-line file held 16,335 titles. Access to the file from cathode ray tube terminals is by accession number, and the file, or sections thereof, can be listed. The system also produces operating statistics and contains the potential for automatic claiming.

The Kansas Union List of Serials (43), which appeared in 1965, was the first computerized union list to contain holdings of several institutions. The Kansas Union List recorded holdings for nearly 22,000 titles in eight colleges and universities. Reproduced photographically from computer printout and printed three columns on a page, this legible and easy-to-use List set the style for many subsequent union lists.

ACQUISITIONS

The National Reactor Testing Station Library was first to use a computer in ordering processes (44). A multiple-part form was produced for library records and for dealers. The Library of the Thomas J. Watson Research Center activated a more sophisticated system in 1964 that produced a processing information list containing titles of all items in process, a shelf list card, a book card, and a book pocket label (45).

The Pennsylvania State University Library put a computerized acquisition system into operation in 1964 (46). This system produced a compact, line-a-title listing of each item in process, together with an indication of the status of the item in processing. A small decklet of punch cards was produced for each item on a keypunch, and one of these cards was sent to the computer center for processing each time its associated item changed status. The Pennsylvania system also produced purchase orders.
In June 1964, the University of Michigan Library (47) introduced a computerized acquisitions procedure more sophisticated than its predecessors. The Michigan system produced a ten-part purchase order fanfold, an in-process listing, and computer produced transaction cards to update status of items in process; and carried out accounting for encumbrance and expenditure of book funds. In addition, the system produced periodic listings of "do-not-claim" orders, listings of requests for quotation, and of "third claims" for decision as to future action on such orders.

In 1966, the Yale Machine Aided Technical Processing System began operation (48). It produced daily and weekly in-process lists arranged by author, a weekly order number listing, weekly fund commitment registers, and notices to requesters of status of request. Subsequently, claims to dealers were added, as well as management information reports on activities within the system. Like the Pennsylvania and Michigan systems, its in-process list recorded the status of the item in processing.

The Washington State University Library brought the first on-line acquisition system into operation in April 1968 (49). Access to the system was by purchase order number, with records arranged in a random access file under addresses computed by a random number generator (50). The Stanford University Libraries on-line acquisition system began operation in 1969 (51), and employed a sequential file of entries having an index of words in author and title elements of the entry. The Stanford system calculated addresses of index words by employing a division hashing technique on the first three letters of the word.

STANDARDIZATION

By 1965, a dozen or more libraries had a dozen or more formats for machine readable bibliographic records, and an impenetrable thicket of such records was evolving.
Fortunately, the Library of Congress, with the help of the Council on Library Resources, took the initiative in standardization of format of bibliographic records and produced the now familiar MARC format (52). Just as standardization of catalog card sizes enabled interchange of catalog records, so has MARC made possible interchange of machine readable catalog records.

This standardization has encouraged developments of networks, such as the SUNY Biomedical Network, NELINET, the Washington State Libraries network, and that of the Ohio College Library Center. With each of these regional networks employing the MARC bibliographic record, it will be possible to integrate these regional nodes into a future national network.

SUBSTANCE AND SUM

The first half of the first decade and a half of library computerization was confined almost entirely to two major mechanizations of Mortimer Taube's Uniterm coordinate indexing. The computerization of single descriptors with attendant document numbers was a relatively easy task. The first breakaway from computerized subject searching came at the Douglas Aircraft Corporation, where the technique of producing one machine readable record from which multiple products could be obtained was introduced in 1961.

The last half of library automation's decade and a half has been largely consumed with efforts to automate existing library procedures. Although notable departures have occurred that take advantage of the computer's powerful qualities, on-line, real time techniques introduced at the very end of the historical period under review began again to use individual words as words, not unlike the logic in which the first applications employed Uniterms; and it seems likely that the immediate future will witness increasing degrees of computerization based on individual words in bibliographic descriptions rather than on the record as a whole.
ACKNOWLEDGMENTS

The author is grateful to Sheila Bertram for identifying, searching out, and gathering most of the references used in this paper. Cloyd Dake Gull furnished in correspondence invaluable information about events of the fifties and early sixties, and various librarians supplied photocopies of early documents.

REFERENCES

1. Kilgour, Frederick G.: "The Economic Goal of Library Automation," College & Research Libraries, 30 (July 1969), 307-311.
2. Baumol, William J.: "The Costs of Library and Informational Services." In Libraries at Large (New York: R. R. Bowker Co., 1969), pp. 168-227.
3. Bunnow, L. R.: Study of and Proposal for a Mechanized Information Retrieval System for the Missiles and Space Systems Engineering Library (Santa Monica, California: Douglas Aircraft Co., 1960).
4. Balz, Charles F.; Stanwood, Richard H.: Literature on Information Retrieval and Machine Translation (International Business Machines Corp., November 1962).
5. Balz, Charles F.; Stanwood, Richard H.: Literature on Information Retrieval and Machine Translation, 2d ed. (International Business Machines Corp., January 1966).
6. Speer, Jack A.: Libraries and Automation; a Bibliography with Index (Emporia, Kansas: Teachers College Press, 1967).
7. Tillitt, Harley E.: "An Experiment in Information Searching with the 701 Calculator," Journal of Library Automation, 3 (Sept. 1970), 202-206.
8. Bracken, R. H.; Tillitt, H. E.: "Information Searching with the 701 Calculator," Journal of the Association for Computing Machinery, 4 (April 1957), 131-136.
9. Tillitt, Harley E.: "An Application of an Electronic Computer to Information Retrieval." In Boaz, Martha: Modern Trends in Documentation (New York: Pergamon Press, 1959), pp. 67-69.
10. Zaharias, Jerome L.: LIZARDS; Library Information Search and Retrieval Data System (China Lake, California: U. S. Naval Ordnance Test Station, 1963).
11.
Bracken, Robert H.; Oldfield, Bruce G.: "A General System for Handling Alphameric Information on the IBM 701 Computer," Journal of the Association for Computing Machinery, 3 (July 1956), 175-180.
12. Rosen, Saul: "Electronic Computers: A Historical Survey," Computing Surveys, 1 (March 1969), 7-36.
13. Barton, A. R.; Schatz, V. L.; Caplan, L. N.: Information Retrieval on a High Speed Computer (Evendale, Ohio: General Electric Co., 1959), p. 8.
14. Gull, C. D.: Personal communication (22 August 1969).
15. Dennis, B. K.; Brady, J. J.; Dovel, J. A., Jr.: "Five Operational Years of Inverted Index Manipulation and Abstract Retrieval by an Electronic Computer," Journal of Chemical Documentation, 2 (October 1962), 234-242.
16. Austin, Charles J.: MEDLARS; 1963-1967 (Bethesda, Maryland: National Library of Medicine, 1968).
17. Summit, Roger K.: "DIALOG: an Operational On-Line Reference Retrieval System." In Association for Computing Machinery: Proceedings of 22nd National Conference (Washington, D. C.: Thomson, 1967), pp. 51-56.
18. Pizer, Irwin: "Regional Medical Library Network," Bulletin of the Medical Library Association, 51 (April 1969), 101-115.
19. Koriagin, Gretchen W.: "Library Information Retrieval Program," Journal of Chemical Documentation, 2 (October 1962), 242-248.
20. Fasana, Paul J.: "Automating Cataloging Functions in Conventional Libraries," 7 (Fall 1963), 350-365.
21. Kilgour, Frederick G.: "Library Catalogue Production on Small Computers," American Documentation, 17 (July 1966), 124-131.
22. Nugent, William R.: "NELINET-The New England Information Network." In Congress of the International Federation for Information Processing, 4th, Edinburgh, 5-10 August, 1968: Proceedings (Amsterdam: North-Holland Publishing Co., 1968), pp. G 28-G 32.
23. Payne, Charles T.: "The University of Chicago's Book Processing System."
In Proceedings of a Conference Held at Stanford University Libraries, October 4-5, 1968 (Stanford, California: Stanford University Libraries, 1969).
24. Wilkinson, W. A.: Personal communication (November 1969).
25. Wilkinson, W. A.: "The Computer-Produced Book Catalog: An Application of Data Processing at Monsanto's Information Center." In University of Illinois Graduate School of Library Science: Proceedings of the 1965 Clinic on Library Applications of Data Processing (Champaign, Illinois: Illini Union Bookstore, 1966), pp. 92-111.
26. Heiliger, Edward: "Florida Atlantic University Library." In University of Illinois Graduate School of Library Science: Proceedings of the 1965 Clinic on Library Applications of Data Processing (Champaign, Illinois: Illini Union Bookstore, 1966), pp. 92-111.
27. Bregzis, Ritvars: Personal communication (November 1969).
28. Bregzis, Ritvars: "The Ontario Universities Library Project-An Automated Bibliographic Data Control System," College & Research Libraries, 26 (November 1965), 495-508.
29. Robinson, Charles W.: "The Book Catalog: Diving In," Wilson Library Bulletin, 40 (November, 1965), 262-268.
30. Luhn, H. P.: "A Business Intelligence System," IBM Journal of Research and Development, 2 (October 1958), 315-319.
31. Stanwood, Richard H.: "The Merge System of Information Dissemination, Retrieval and Indexing Using the IBM 7090 DPS." In Association for Computing Machinery: Digest of Technical Papers (1962), pp. 38-39.
32. Young, E. J.; Williams, A. S.: Historical Development and Present Status-Douglas Aircraft Company Computerized Library Program (Santa Monica, California: Douglas Aircraft Co., 1965).
33. Haznedari, I.; Voos, H.: "Automated Circulation at a Government R & D Installation," Special Libraries, 55 (February 1964), 77-81.
34. Gibson, R. W., Jr.; Randall, G. E.: "Circulation Control by Computer," Special Libraries, 54 (July-August 1963), 333-338.
35. McCoy, Ralph E.: "Computerized Circulation Work: A Case Study of the 357 Data Collection System," Library Resources & Technical Services, 9 (Winter 1965), 59-65.
36. Hamilton, Robert E.: "The Illinois State Library 'On-Line' Circulation Control System." In University of Illinois Graduate School of Library Science: Proceedings of the 1968 Clinic on Library Applications of Data Processing (Urbana, Illinois: Graduate School of Library Science, 1969), pp. 11-28.
37. "Redstone Center Shows On-line Library Subsystems," Datamation, 14 (February 1968), 79, 81.
38. Kennedy, R. A.: "Bell Laboratories' Library Real-Time Loan System (BELLREL)," Journal of Library Automation, 1 (June 1968), 128-146.
39. University of California, San Diego, University Library: Report on Serials Computer Project; University Library and UCSD Computer Center (La Jolla, California: University Library, July 1962).
40. Pizer, Irwin H.; Franz, Donald R.; Brodman, Estelle: "Mechanization of Library Procedures in the Medium-Sized Medical Library: I. The Serial Record," Bulletin of the Medical Library Association, 51 (July 1963), 313-338.
41. Strom, Karen C.: "Software Design for Bio-medical Library Serials Control System." In American Society for Information Science, Annual Meeting, Columbus, O., 20-24 Oct. 1968: Proceedings, 5 (1968), 267-275.
42. Varennes, Rosario de: "On-line Serials System at Laval University Library," Journal of Library Automation, 3 (June 1970).
43. Kansas Union List of Serials (Lawrence, Kansas: University of Kansas Libraries, 1965), 357 pp.
44. Griffin, Hillis L.: "Electronic Data Processing Applications to Technical Processing and Circulation Activities in a Technical Library." In University of Illinois Graduate School of Library Science: Proceedings of the 1963 Clinic on Library Applications of Data Processing (Champaign, Illinois: Illini Union Bookstore, 1964), pp. 96-108.
45. Randall, G.
E.; Bristol, Roger P.: "PIL (Processing Information List) or a Computer-Controlled Processing Record," Special Libraries, 55 (Feb. 1964), 82-86.
46. Minder, Thomas L.: "Automation-the Acquisitions Program at the Pennsylvania State University Library." In International Business Machines Corporation: IBM Library Mechanization Symposium, Endicott, New York, May 25, 1964, pp. 145-156.
47. Dunlap, Connie: "Automated Acquisitions Procedures at the University of Michigan Library," Library Resources & Technical Services, 11 (Spring 1967), 192-206.
48. Alanen, Sally; Sparks, David E.; Kilgour, Frederick G.: "A Computer-Monitored Library Technical Processing System." In American Documentation Institute, 1966 Annual Meeting, October 3-7, 1966, Santa Monica, California: Proceedings, pp. 419-426.
49. Burgess, T.; Ames, L.: LOLA; Library On-Line Acquisitions Sub-System (Pullman, Wash.: Washington State University Library, July 1968).
50. Mitchell, Patrick C.; Burgess, Thomas K.: "Methods of Randomization of Large Files with High Volatility," Journal of Library Automation, 3 (March 1970).
51. Parker, Edwin B.: "Developing a Campus Information Retrieval System." In Proceedings of a Conference Held at Stanford University Libraries, October 4-5, 1968 (Stanford, California: Stanford University Libraries, 1969), pp. 213-230.
52. "Preliminary Guidelines for the Library of Congress, National Library of Medicine, and National Agricultural Library Implementation of the Proposed American Standard for a Format for Bibliographic Information Interchange on Magnetic Tape as Applied to Records Representing Monographic Materials in Textual Printed Form (Books)," Journal of Library Automation, 2 (June 1969), 68-83.

THE RECON PILOT PROJECT: A PROGRESS REPORT, NOVEMBER 1969 - APRIL 1970

Henriette D. AVRAM, Kay D. GUILES, Lenore S. MARUYAMA: MARC Development Office, Library of Congress, Washington, D. C.
A synthesis of the second progress report submitted by the Library of Congress to the Council on Library Resources under a grant for the RECON Pilot Project. An overview of the progress made from November 1969 to April 1970 in the following areas: production, Official Catalog comparison, format recognition, research titles, microfilming, investigation of input devices. In addition, the status of the tasks assigned to the RECON Working Task Force is briefly described.

INTRODUCTION

An article was published in the June 1970 issue of the Journal of Library Automation (1) describing the scope of the RECON Pilot Project (hereafter referred to as RECON) and summarizing the first progress report submitted by the Library of Congress (LC) to the Council on Library Resources (CLR). RECON is supported by the Council, the U.S. Office of Education, and the Library of Congress. In order that all aspects of the project might be brought together as a meaningful whole, the various segments, regardless of the source of support, were covered in the second progress report and have been included in this article. In some instances, it has been necessary to introduce a section by repeating some aspects already reported in the June 1970 article in order to add clarity to the content of that section.

PROGRESS-NOVEMBER 1969 TO APRIL 1970

RECON Production

The production operations of the RECON Pilot Project are being handled by the RECON Production Unit in the MARC Editorial Office of the LC Processing Department. Printed cards with 1968, 1969, and 7-series card numbers have been provided from the Card Division stock for RECON input, and approximately 99,550 cards in the 1969 and 7-series have been received. Using prescribed selection criteria the RECON editors have sorted these cards and obtained approximately 27,150 eligible for RECON input. Approximately 150,000 cards in the 1968 series have also been received.
The RECON editors have sorted 60,000 of these cards and obtained approximately 24,000 records eligible for RECON input. A large number of cards in these three series are already out of print, and replacement cards are being sent by the Card Division as soon as reprints are made. Each card eligible for RECON input from the above-mentioned selection process is also checked against a computer-produced index of card numbers for records in machine readable form. Each number in the print index has a corresponding code to show on which machine readable data base the record resides. The source codes are as follows:

M1 - MARC I data base
M2 - MARC II, 1st practice tape
M3 - MARC II, 2nd practice tape
M4 - MARC II data base
M5 - MARC II residual data base

(The two practice tapes contain records converted before the implementation of the MARC Distribution Service to test the programs and input techniques.) The print index used for the final selection of the 1969 and 7-series card numbers contained only the records from M2-M5 (the MARC I data base consists of the records converted during the MARC Pilot Project, which ended in June 1968). For the selection of the 1968 records, another print index had been produced which contains numbers for records on all five data bases. If the RECON editors find a match on the print index, the appropriate source code is added to the printed card; these printed cards are then maintained in a separate file. (Later in the project, the records in the data bases identified as M1 to M3 will be updated to conform with the current MARC II format and added to the RECON data base.) The remaining cards for RECON are reproduced on input worksheets and edited. To date, approximately 9,750 records in the 1969 and 7-series have been edited for RECON. RECON records in the 1969 and 7-series are being input by a service bureau.
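The final-selection step described above amounts to a lookup: each candidate card number is checked against the print index, and a match is set aside with its source code while the remainder go on to editing and input. A minimal sketch of that logic, with invented card numbers and index contents for illustration:

```python
# Sketch of the RECON final-selection check: card numbers already present
# in a machine readable data base (source codes M1-M5) are set aside in a
# separate file; the rest go on to worksheets for editing and input.
# The index contents and card numbers below are invented examples.
PRINT_INDEX = {
    "68-12345": "M4",  # already on the MARC II data base
    "79-00042": "M2",  # on the 1st MARC II practice tape
}

def select_for_input(card_numbers):
    """Split candidates into (cards to edit, cards already converted)."""
    to_edit, already_converted = [], []
    for number in card_numbers:
        source = PRINT_INDEX.get(number)
        if source is None:
            to_edit.append(number)  # reproduce on input worksheet and edit
        else:
            # note the source code and file the card separately
            already_converted.append((number, source))
    return to_edit, already_converted

to_edit, done = select_for_input(["68-12345", "68-99999", "79-00042"])
print(to_edit)  # the one number absent from the index goes on to editing
print(done)     # the matched numbers, each paired with its source code
```

The separate file of matched cards corresponds to the article's note that M1-M3 records will later be updated to the current MARC II format rather than rekeyed.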
Journal of Library Automation Vol. 3/3 September, 1970

The contractor uses IBM Selectric typewriters equipped with an OCR typing mechanism, and the hard-copy sheets are run through an optical scanner. The output from the scanner is a magnetic tape which is processed by the contractor's programs to produce a tape in the MARC Pre-Edit format. This tape is then sent to LC and processed by the MARC System programs to produce a full MARC record.

Since the input for the retrospective conversion effort will be printed cards (or copies of printed cards from the Card Division record set), it will be necessary to compare these with their counterparts in the LC Official Catalog. The printed card for each main entry in the Official Catalog will show whether any changes have been made that did not warrant reprinting the card to incorporate them. Items on a printed card that could be noted in this fashion include changed subject headings, added entries, and call numbers. Since these will be important access points in a machine readable catalog record, it was felt that such revisions should be reflected in the RECON records.

The RECON Report (2) contains a lengthy discussion of the various factors involved in the catalog comparison process, such as the percentage of change in relation to the age of the record, the difficulty in ascertaining changes because of language, interpretation of cataloging rules, etc. To determine the most efficient and least costly method of catalog comparison, two RECON editors were assigned to conduct an experiment to test eight different methods, as follows:

1) Print-out checked in alphabetic order - single group of 200 records.
2) Proofsheets (already proofed) checked in worksheet (card number) order - group of 200 records in batches of 20.
3) Proofsheets (not proofed) checked in worksheet (card number) order - group of 200 records in batches of 20.
4) Proofsheets (already proofed) checked by mental alphabetization - group of 200 records in batches of 20.
5) Proofsheets (not proofed) checked by mental alphabetization - group of 200 records in batches of 20.
6) Worksheets before editing (not input) checked by mental alphabetization - group of 200 records in batches of 20.
7) Worksheets before editing (not input) checked in alphabetical order - group of 200 records in batches of 20.
8) Worksheets before editing (not input) checked in worksheet (card number) order - group of 200 records in batches of 20.

Mental alphabetization means the searching of all the entries in a batch beginning with "A," then all the entries beginning with "B," etc., even though the batch is not in alphabetical order. Each editor used 200 records for each method, made the necessary corrections, and recorded the time required as well as the number of corrections made.

Figure 1 shows the average number of records checked in an hour using the eight different methods of catalog comparison. Tables 1 and 2 give the estimated cost per record for each of the methods. In determining ...

[Figure 1: average number of records checked per hour for each of the eight catalog comparison methods, labeled Method One through Method Eight as listed above.]

[Table 4, "Input Devices": for each device, the manufacturer, model, machine configuration, keyboard, display, record length in characters, purchase price, monthly rental, and remarks such as converter and pooler charges and station counts. Devices from Cybercom, Data Action, IBM, Sycor, Tycore, Viatron, Burroughs, Honeywell, Keymatic, MAI, Mohawk, Motorola, Potter, Sangamo, Vanguard, Computer Entry Systems, Computer Machinery, General Computer Systems, Inforex, Penta Associates, Logic Systems Engineering, and Logic Corporation were covered. The prices quoted and the characteristics given of each device reflect the best information that could be obtained by the RECON staff.]

Legend: K/T = key to magnetic tape system; K/D = key to disk system; K/C = key to cassette; K/M = key to computer-compatible magnetic tape; KP = key punch; T/KP = typewriter or key punch; T = typewriter. Backlight = a matrix consisting of all individual characters that can be keyed; each character, as keyed, is displayed one at a time in its particular position in the matrix. Projection and light-emitting diodes = a one-character-position dot matrix; each character, as keyed, is displayed one at a time in the same position. BCD (bit) = lights displaying the bit position (on, off) of individual characters; each character, as keyed, is displayed one at a time.

... could be assigned to single keys and translated to their proper value by software, thus reducing the amount of keystroking required. The Keymatic appears worth further investigation; therefore, the Library may rent a device for several months for testing and evaluation. A typist will be trained in current MARC/RECON procedures and assigned to the Keymatic as soon as her training period has been completed.
The first month will be spent training on the Keymatic prior to the actual input of RECON records, to obtain production and error rates and cost evaluation for comparison purposes.

Serious consideration was also given in the RECON Report to direct-read OCR equipment; however, at that time no equipment existed that offered the technical capability to perform the conversion of the LC record set. Since then, preliminary investigation of the Model 370 CompuScan Universal Optical Character Reader proved interesting enough to continue further exploration of the device. The Model 370 CompuScan is a computer-directed flying-spot scanner which matches the scanned portion of a character with a character described in the core memory of the computer. The manufacturer has examined a sample of LC printed cards selected at random over a period of twenty years and has concluded that although the hardware is sufficient to read the record set optically, significant software effort would be required.

The results of the sampling indicated that the record set is not constituted entirely of "mint" cards, i.e., cards printed from the metal of the original Linotype composition, but is composed of originals and reprints of the original. When the stock of the original printing is close to depletion, the card is reprinted by photographing the card, and duplicates are made by a photo-offset process. As this cycle is repeated, the card for any one title could be several generations removed from the original. In some instances, a microscopic examination of the cards seems to indicate that the matrices used in the Linotype composition were worn. Because of these factors, what might appear as the same character to the naked eye would represent different pattern configurations to the scanner's core memory.

The coarseness of the card surface may also cause variations in the same characters. LC cards have a high rag content in order to meet the archival standards required by libraries.
The roughness of the surface does not affect the readability for the human but may cause variations in a given character when read by an optical scanner. Another significant problem with LC cards concerns characters which touch, i.e., connections between what are intended to be distinct characters but are read by the scanner as one. For example, if a lower case "n" were next to a lower case "t" and the cross bar on the "t" touched the "n," the scanner would consider the combination of the "n" and the "t" as one character.

Software must be written to handle the variant character and the touching character problems. In the case of the touching characters, the machine must recognize some allowable limit of reading a single character, and when this limit is exceeded, the pattern read must be divided and matched against single-character patterns held in core. Programs can be written so that if either of the above conditions occurs, the output on magnetic tape will be flagged for later spot checking, permitting the scanner to continue to operate at throughput speeds without human intervention. The resultant magnetic tape would serve as input to the Library's format recognition programs to reformat the scanner's output into the MARC II format. It has been estimated that the throughput speed of CompuScan would be in the vicinity of 1800 cards per hour.

The LC record set will be microfilmed according to the specifications required by the scanner. Since the scanner operates with negative film, a very dark background with a very clear, white image is necessary. A tentative cost estimate of the microfilming and reading has been computed at approximately fifty cents per 1000 characters output on magnetic tape (approximately three LC cards). This price does not include the cost of the software.
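The touching-character rule above can be pictured as a width check: a scanned pattern wider than the allowable single-character limit is divided and each piece re-matched against the single-character patterns held in core, with failures flagged on the output tape for later spot checking. The following is a hypothetical sketch only; the width limit, the pattern library, and the splitting are invented stand-ins, since the article does not give the actual algorithm:

```python
# Hedged sketch of the touching-character logic described in the text.
# The width limit and pattern library are invented placeholders for the
# single-character patterns held in the scanner computer's core memory.
MAX_CHAR_WIDTH = 12                  # allowable scan-column width of one character
KNOWN_PATTERNS = {"n": 10, "t": 8}   # stand-in for core-resident bit patterns

def recognize(pattern_width, pieces):
    """pieces: candidate characters obtained by dividing an over-wide pattern."""
    if pattern_width <= MAX_CHAR_WIDTH:
        # Within the single-character limit: accept the match as read.
        return {"flagged": False, "chars": pieces}
    # Over-wide pattern: treat as touching characters and match each piece.
    if all(piece in KNOWN_PATTERNS for piece in pieces):
        return {"flagged": False, "chars": pieces}
    # Some piece failed to match: flag the record for later spot checking,
    # so the scanner can keep running at throughput speed.
    return {"flagged": True, "chars": pieces}

# An "n" touching a "t" arrives as one 18-column pattern:
print(recognize(18, ["n", "t"]))  # divided and matched, not flagged
print(recognize(18, ["n", "?"]))  # an unmatched piece is flagged on the tape
```

The key design point the article makes is that flagging rather than halting keeps human intervention out of the scanning loop.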
Original printed "mint" cards will be used to test the device without implementing the required software, and depending on the results, investigation may be continued.

The keying of the 1969 RECON records has been performed by a contractor using an IBM Selectric typewriter, with the resulting hard copy fed through a Farrington optical character reader. As part of the contractor's services to the Library, production rates were monitored and reported. This gave LC the basis to compare two devices: the key-to-cassette used at the Library of Congress for the MARC Distribution Service and the equipment used by the contractor for RECON records.

To make the comparison in Table 5, it was necessary to determine the costs for each method using the techniques developed in the RECON Report (9). Some modifications were made to the original RECON cost estimates because actual figures are now available. MARC costs were obtained by dividing the costs of the man-hours for typing and proofing in a given period by the number of records added to the MARC master file in the same period. The equipment cost per record was also based on the number of records added to the master file. Production rates associated with particular tasks were not used.

The manpower figures supplied by the contractor were limited to hourly production rates; therefore, to obtain the cost per record for OCR typing it was necessary to project the hourly rate to cover a man-year. The estimated annual production of a typist was then divided into the annual salary of a GS-4 (step 1) typist incremented by 8.5% for fringe benefits. The OCR equipment costs were computed on the basis of figures supplied by the contractor, assuming ownership of the OCR-font typewriter and service bureau rental of the scanner.

Table 5. Input Costs per Record

1. Manpower

Key-to-Cassette Method
    Typing      $ .45
    Proofing      .70
    Total       $1.15

OCR Method
    Typing rate of contractor: 1,000 records in 104 hours, or 9.6 records per hour.
    Typing cost at LC: ($5,522 + 8.5% of $5,522) / (9.6 x 1,338) = $.466
    Proofing rate of RECON editors at LC: 1,534 records proofed in 173 hours, or 8.9 records per hour, less 20% = 7.1 records per hour.
    Proofing cost at LC: ($6,882 + 8.5% of $6,882) / (7.1 x 1,338) = $.786
    Typing      $ .466
    Proofing      .786
    Total       $1.25

2. Equipment (costs do not include maintenance where applicable)

Key to Cassette
    Monthly rental                                                $100.00
    Converter (monthly rental prorated over 10 key-to-cassettes)    26.00
    Total                                                         $126.00
    Hourly cost (assumes 132 hours a month): $.955
    Effective production rate: average weekly MARC output of 1,005 records over 120 hours on 4 key-to-cassette units = 8.4 records/hour
    Record cost of key to cassette and converter: $.955 / 8.4 = $.114

OCR Method
    OCR-font typewriter: purchase price $500.00; 40-month amortization = $12.50/month; hourly cost (assumes 132 hours use) = $.095
    Effective production rate of OCR typewriter: (9.6 records/hour x 1,338 hours) / (132 hours x 12 months) = 8.1 records/hour
    Record cost of OCR typewriter: $.095 / 8.1 = $.012
    OCR scanner: service bureau hourly rental $50.00; 10,000 lines/hour at 18 lines per record = 555 records/hour
    Record cost of OCR scanner: $50.00 / 555 = $.09
    Total record cost for equipment: $.012 + $.09 = $.102

The cost of proofing in the OCR method was based on the RECON experience at LC modified by contractor experience. In actual practice, OCR records are proofed and corrected by the contractor before they are proofed by RECON editors. It was assumed that double proofing is unnecessary but that allowance should be made for the added difficulty of reading copy with a higher proportion of errors. (A preliminary study of errors on RECON proofsheets has shown that there are fewer typographical errors on RECON proofsheets than on current MARC proofsheets.)
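The per-record manpower figures in Table 5 follow directly from the loaded annual salary divided by annual output. A short script reproduces that arithmetic; the salaries, the 8.5% fringe rate, and the 1,338 productive hours per man-year are all taken from the table itself:

```python
# Reproduce the Table 5 cost-per-record arithmetic from the figures
# given in the text (GS-4 salaries, fringe rate, and annual hours).
FRINGE = 0.085        # fringe-benefit increment on salary
ANNUAL_HOURS = 1338   # productive typing hours in a man-year

def cost_per_record(annual_salary, records_per_hour):
    """Manpower cost per record: loaded annual salary over annual output."""
    loaded = annual_salary * (1 + FRINGE)
    return loaded / (records_per_hour * ANNUAL_HOURS)

ocr_typing = cost_per_record(5522, 9.6)    # typist salary, contractor rate
ocr_proofing = cost_per_record(6882, 7.1)  # 8.9/hour less 20% = 7.1/hour
print(round(ocr_typing, 3))    # 0.466
print(round(ocr_proofing, 3))  # 0.786

# Equipment, key to cassette: $126/month over 132 hours, at 8.4 records/hour
print(round((126 / 132) / 8.4, 3))  # 0.114
```

Note that holding the proofing rate at 8.9 records per hour instead of 7.1, as the article later suggests, drops the proofing figure to about $.63, which is what shifts the overall comparison in favor of OCR.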
For this reason, the number of RECON records proofed in an hour has been decreased by 20% in the calculations.

On the basis of the calculations in Table 5, the comparative input costs are summarized as follows:

Table 6. Estimated Input Cost per Record

                Key-to-Cassette     OCR
Manpower:
    Typing          $ .45          $ .47
    Proofing          .70            .78
Equipment             .11            .10
Totals              $1.26          $1.35

The final figures indicate that the two methods are very close in cost. As presently calculated, the key-to-cassette method is less expensive than the OCR method. It is easy to see that a slight change in any cost or production rate could make the OCR method less expensive. If the proofing rate of 8.9 records per hour were maintained instead of decreasing to 7.1 per hour, the OCR proofing cost would drop to $.63, and the total price for this proposed method would be $1.20. One way to test the assumption of the added difficulty of a single proofing would be to obtain uncorrected records from the contractor as a means of determining the actual proofing rate under that condition.

RECON Tasks

The four tasks that have been identified for study by the Working Task Force are: 1) levels of completeness of MARC records; 2) implications of a national union catalog in machine readable form; 3) conversion of existing data bases in machine readable form for use in a national bibliographic service; and 4) study of problems involved in any future distribution of name and subject cross-reference control files. Progress to date on the first three tasks is described in the following paragraphs.

Task 1 has been completed, and an article summarizing the results of a report submitted to CLR has been published in the Journal of Library Automation, June 1970 (10). The following conclusions reached by this study are quoted from the article:

1) The level of a record must be adequate for the purposes it will serve.
2) In terms of national use, a machine readable record may function as a means of distributing cataloging information and as a means of reporting holdings to a national union catalog.
3) To satisfy the needs of diverse installations and applications, records for general distribution should be in the full MARC II format.
4) Records that satisfy the NUC function are not necessarily identical with those that satisfy the distribution function.
5) It is feasible to define the characteristics of a machine readable NUC report at a lower level than the full MARC II format.

Task 2 consists of an investigation of the implications of a national union catalog in machine readable form. A design of such a system is needed, and although the implementation of such a project is beyond the purview of the Working Task Force, some of the technical and cost factors should be examined and defined for possible future research. As a framework for discussion purposes, a future reporting system for the National Union Catalog was postulated, based on the present reporting system, as follows:

Contributor          Present Report Form                    Future Report Form
LC                   Printed cards                          MARC data (for all records)
Outside libraries    Locally produced cards and LC cards    MARC data (for all records), or records submitted to NUC to be keyed as machine readable records

The problems of the control number and library location symbols were considered, but a tentative decision was made that recommendations should be forthcoming when the American National Standards Institute Sectional Committee Z39 has completed its work on library identification codes. The indicators and subfield codes to be included in the machine readable NUC records would depend on the optimum file arrangement of the suggested bibliographic listings. The Library of Congress is presently engaged in a filing rules study which should influence the inclusion or exclusion of particular content designators.
Task 2 is still in progress.

Task 3 is the investigation of the possible utilization of other machine readable data bases for use in a national bibliographic store. The task was divided into several subtasks as follows: 1) identification of useful data bases for the purposes described (content and bibliographic completeness); 2) cost of the conversion from a local format to a MARC II record; 3) cost of updating records not already in the LC data base for consistency and missing data by comparing the records with the Library of Congress Official Catalog; 4) cost of comparing the records with the existing LC machine readable records to eliminate duplicate records.

To satisfy the first subtask, a questionnaire was sent to 42 organizations. The information requested included:

1) Availability of data bases - maintained by library or service bureau, and permission to copy data base.
2) Use of the data base - for acquisitions, production of book catalog, circulation system, etc.
3) Composition of data base - monographs, serials, technical reports, etc.
4) Composition of data base - number of titles, imprint dates (primarily current, retrospective, etc.), language of records.
5) Source of catalog data - MARC Distribution Service, LC catalog card, local cataloging.
6) Data elements for monographs.
7) Format used in identifying data elements - MARC I format, MARC II format, etc.
8) Character set used.

The results from this survey were analyzed, and a follow-up letter was sent to 22 of the organizations, requesting further information as follows:

1) An estimate of the number of monographs added to the data base each year.
2) A representative group of twenty-five entries for monographs, including both fiction and non-fiction.
3) Details on the character set used in the machine readable data base.
4) Detailed specifications of the monographic record format.

Responses from this last letter have been received and analyzed.
This analysis should identify a limited number of machine readable data bases that will be subjected to further content and cost analysis.

OUTLOOK

The RECON Project continues to be on schedule. The Working Task Force has met several times for deliberations on the assigned tasks; in addition, members have been briefed on the progress of the pilot project and their advice has been sought. Thus, individuals interested in the problems of bibliographic conversion guide the project throughout its development. The Library of Congress RECON staff continues to maintain liaison with individuals and organizations working in any facet of the project's scope, hoping to bring all possible expertise to bear on the problems involved.

It is significant, although not fully recognized at the outset of the RECON Project, that the solution to many of the problems under exploration will have impact on current conversion as well as retrospective conversion. This is evident at the Library of Congress, where MARC and RECON, although staffed separately in the production area, share staff in the Information Systems Office, and the project is known as MARC/RECON.

Coordination continues between the RECON Project and the Card Division Mechanization Project. The RECON Project Director is the technical adviser for the Card Division Project, and under her general direction, a computer analyst in the Information Systems Office has been assigned full time to the project. The analyst has been given a detailed orientation to the procedures and computer programs for MARC/RECON and the specifications for the Card Division Project. This exposure is necessary to guarantee that there is no duplication of effort between the two projects and that the design work for the Card Division Project includes the possibility of a future national service for machine readable cataloging, both current and retrospective.
(The MARC Distribution Service is such a national service for English-language monograph cataloging data, but what is assumed here is a service of a much broader scope.)

Although progress has been made in many of the tasks included in RECON, several methods of input described in the RECON Report can only be fully evaluated when the format recognition programs are implemented. According to present estimates, this should take place toward the end of 1970. Much remains to be accomplished. The Library of Congress will continue to make its progress known as rapidly as possible, because the results of the pilot project will have great ramifications for the entire library community.

ACKNOWLEDGMENTS

The authors wish to thank the staff members associated with the RECON Pilot Project in the Technical Processes Research Office and the MARC Editorial Office in the Library of Congress Processing Department, and those in the Information Systems Office, for their respective reports, which were incorporated into the progress report submitted to the Council on Library Resources and which provided significant contributions to this paper.

REFERENCES

1. Avram, Henriette D.: "The RECON Pilot Project: A Progress Report," Journal of Library Automation, 3 (June 1970).
2. RECON Working Task Force: Conversion of Retrospective Records to Machine-Readable Form (Washington, D.C.: Library of Congress, 1969), pp. 32-33.
3. Avram, Henriette D., et al.: "MARC Program Research and Development: A Progress Report," Journal of Library Automation, 2 (December 1969), 250-253.
4. RECON Working Task Force: Op. cit., p. 31.
5. National Microfilm Association: Glossary of Terms for Microphotography and Reproductions Made from Micro-Images. 4th rev. ed. (Annapolis, Md.: National Microfilm Association, 1966), p. 8.
6. Ibid.
7. Ibid., p. 52.
8. Hawken, William R.: Copying Methods Manual (Chicago: Library Technology Program, American Library Association, 1966), p. 243.
9. RECON Working Task Force: Op. cit., pp. 58-59, 86, 93.
10. RECON Working Task Force: "Levels of Machine-Readable Records," Journal of Library Automation, 3 (June 1970).

BOOK REVIEWS

Systematic Analysis of University Libraries, by Jeffrey A. Raffel and Robert Shishko. Cambridge, Mass.: M.I.T. Press, 1969. 107 pp. $6.95.

Systematic Analysis of University Libraries is an exciting book, for it is the first report describing application of cost-benefit analysis to a library. Raffel and Shishko have applied the methodology of cost-benefit analysis to the M.I.T. Libraries and have produced an admirable description of this method of research, which examines policy making in a system as a choice among alternatives. This work is not a cookbook providing answers derived from principles; it is an exposition of a methodology that produces data used as a basis of decision making.

The book employs the case-study technique, with the M.I.T. Libraries furnishing the raw material for the cases. Findings cannot be extrapolated to all libraries, although they may be applicable in some. For example, Raffel and Shishko found that 75% of the M.I.T. Libraries budget is allocated to research activities in the institution. Such findings are inapplicable to small liberal arts colleges, where faculty members do little research.

The purpose of Systematic Analysis of University Libraries is to teach the application of cost-benefit analysis rather than to provide answers. It instructs in the methodology for obtaining answers. Case studies presented in the book include selection, acquisitions and cataloging, among library operations. Also examined are book storage, study facilities and reserve book procedures. A technique for measuring benefits by surveying users is also described.

The concluding chapter presents in outline form major findings, of which only two will be given here as examples of results of this type of analysis.
First, the authors found that the most effective alternate storage system, namely compact storage, saves only about one percent of annual library resources, but provokes a major loss of benefit, since compact storage limits browsing and increases retrieval time for books. A second finding of interest is that major cataloging expenses are for professional librarians doing original cataloging, and for proofreading and checking of catalog cards. That costs of original cataloging bulk largest will not be a surprise to librarians, but that the next largest cost should be proofreading and checking of catalog cards will come as a surprise to some.

The book concludes with a score of research questions to be explored in the future, and it is fervently to be hoped that Raffel and Shishko will continue their investigation along the avenues they have delineated.

Frederick G. Kilgour

The Undergraduate Library, by Irene A. Braden. Chicago: American Library Association, 1970. (ACRL Monograph, 31). 158 pp. $7.50.

The separate undergraduate library on the university campus is a phenomenon of the last two decades; Harvard's Lamont Library was the first, in 1949. More than twenty-five such libraries now exist or are in the planning or construction stage. The literature of librarianship contains descriptions of individual undergraduate libraries or philosophical essays concerning library services for undergraduates. Braden, however, was the first to study more extensively and impartially this attempt to provide better services for university students. For her dissertation at the University of Michigan, she collected data on six undergraduate libraries: Harvard, Michigan, South Carolina, Cornell, Indiana, and Texas. Each library was visited in 1965/66, interviews with librarians were conducted, and documents were consulted.

We here have published 25-35 page descriptions of these six pioneers.
The studies range from architectural design, through the gathering of the initial collections of books and other media, to the host of services offered in the completed library. Excellent statistical tables, organizational charts, and floor plans illustrate the text.

There are some errors. Michigan added more seats in 1965, not September, 1966 as stated on page 43. Also, referring to Michigan on page 47: "The reference collection began with about 2000 volumes, but it soon became evident that it would have to be enlarged. The collection now numbers about 3100 volumes." The footnote refers to page 18 of the 1957/58 Annual Report of the Michigan Undergraduate Library, but there is no mention of the number of reference volumes there. Instead, the 1957/58 Annual Report records on page 4 that there were 800 reference volumes on November 18, 1957, when the collections were moved into the new building.

After presentation of the case studies, the author summarizes her conclusions on the buildings, book collections, services, staffs, and use by students. Of particular value are fourteen brief guidelines formulated to assist librarians who may be contemplating an undergraduate library on their campus.

The reader should be forewarned that The Undergraduate Library, although a most welcome publication, is now an historical document. Only data through 1964/65 are presented. Major changes in services and facilities have occurred in the past five years. Those interested in automation would think that undergraduate libraries have done nothing. Michigan, however, began an automated circulation system for reserve material in 1967 and for the main collection in 1968.

Billy R. Wilkinson
Systematic Analysis of University Libraries is an exciting book, for it is the first report describing application of cost-benefit analysis to a library. Raffel and Shishko have applied the methodology of cost-benefit analysis to the M.I.T. Libraries and have produced an admirable description of this method of research that examines policy making in a system as a choice among alternatives. This work is not a cookbook providing answers derived from principles; it is an exposition of a methodology that produces data used as a basis of decision making.

The book employs the case-study technique, with the M.I.T. Libraries furnishing the raw material for the cases. Findings cannot be extrapolated to all libraries, although they may be applicable in some. For example, Raffel and Shishko found that 75% of the M.I.T. Libraries budget is allocated to research activities in the institution. Such findings are inapplicable to small liberal arts colleges, where faculty members do little research. The purpose of Systematic Analysis of University Libraries is to teach the application of cost-benefit analysis rather than to provide answers. It instructs in the methodology for obtaining answers.

Case studies presented in the book include selection, acquisitions and cataloging, among library operations. Also examined are book storage, study facilities and reserve book procedures. A technique for measuring benefits by surveying users is also described.

The concluding chapter presents in outline form major findings, of which only two will be given here as examples of results of this type of analysis. First, the authors found that the most effective alternate storage system, namely compact storage, saves only about one percent of annual library resources, but provokes a major loss of benefit, since compact storage limits browsing and increases retrieval time for books. A second finding of interest is that major cataloging expenses are for professional librarians doing original cataloging, and for proofreading and checking of catalog cards. That costs of original cataloging bulk largest will not be a surprise to librarians, but that the next largest cost should be proofreading and checking of catalog cards will come as a surprise to some.

The book concludes with a score of research questions to be explored in the future, and it is fervently to be hoped that Raffel and Shishko will continue their investigation along the avenues they have delineated.

Frederick G. Kilgour

The Undergraduate Library, by Irene A. Braden. Chicago: American Library Association, 1970. (ACRL Monograph, 31). 158 pp. $7.50.

The separate undergraduate library on the university campus is a phenomenon of the last two decades; Harvard's Lamont Library was the first, in 1949. More than twenty-five such libraries now exist or are in the planning or construction stage. The literature of librarianship contains descriptions of individual undergraduate libraries or philosophical essays concerning library services for undergraduates. Braden, however, was the first to study more extensively and impartially this attempt to provide better services for university students. For her dissertation at the University of Michigan, she collected data on six undergraduate libraries: Harvard, Michigan, South Carolina, Cornell, Indiana, and Texas. Each library was visited in 1965/66; interviews with librarians were conducted; documents were consulted.

We here have published 25-35 page descriptions of these six pioneers. The studies range from architectural design, through the gathering of the initial collections of books and other media, to the host of services offered in the completed library. Excellent statistical tables, organizational charts and floor plans illustrate the text. There are some errors. Michigan added more seats in 1965, not September, 1966 as stated on page 43.
Also referring to Michigan, on page 47: "The reference collection began with about 2000 volumes, but it soon became evident that it would have to be enlarged. The collection now numbers about 3100 volumes." The footnote refers to page 18 of the 1957/58 Annual Report of the Michigan Undergraduate Library, but there is no mention of the number of reference volumes there. Instead the 1957/58 Annual Report records on page 4 that there were 800 reference volumes on November 18, 1957, when the collections were moved into the new building.

After presentation of the case studies, the author summarizes her conclusions on the buildings, book collections, services, staffs, and use by students. Of particular value are fourteen brief guidelines formulated to assist librarians who may be contemplating an undergraduate library on their campus.

The reader should be forewarned that The Undergraduate Library, although a most welcome publication, is now an historical document. Only data through 1964/65 are presented. Major changes in services and facilities have occurred in the past five years. Those interested in automation would think that undergraduate libraries have done nothing. Michigan, however, began an automated circulation system for reserve material in 1967 and for the main collection in 1968.

Billy R. Wilkinson

Journal of Library Automation Vol. 3/3 September, 1970

Report on The Total System Computer Program for Medical Libraries, by Robert E. Divett and W. Wayne Jones. Albuquerque: University of New Mexico School of Medicine Library of the Medical Sciences, 1969. 424 pp.

The concept of "total system" is a fairly easy one to grasp until one attempts definition of the term. Then there creep in all sorts of unexpected, rather unfair practical considerations, usually related to environment. Under these constraints, one man's total system becomes a very personal conditioned statement.
The report is organized into three sections: a system description oriented toward the librarian; technical descriptions of the file organization and program structure for the programmer; and a set of appendices which include the source listings of all the programs. The source listings are more than three-quarters of the report, and are tiring to examine and decipher. Much more useful in a report of this nature would have been the decision tables which underlie the programs. A section on recommendations explores the future direction of the system. However, some matters of concern in the report are glossed over in a rather facile manner with little or no comment.

The system has been implemented at different levels. Acquisitions and cataloging are essentially translations to an on-line mode of a batch system. (It is interesting to note that a card catalog is maintained to back up this on-line operation.) On-line circulation is presented as if it is running, whereas the authors say that lack of funding prevented implementation. The really exciting work has been done with file organization, the incorporation of MeSH tree structures on the file and their use for upward (to most general, not most specific) searching, and the development of an on-line interrogation procedure both for update and search of the file.

One finds that hardware costs alone would be either $7728 per annum plus computer time for a batch system, or between $98,000 and $104,000 per annum plus computer time for a terminal system. But then one reads that "the terminal total computer system is the only effective, efficient way of meeting the demands of service and processing that are required by a technical library." When one is talking about a hardware cost of $100,000 per annum, what exactly do the words "only effective, efficient" mean?

Glyn Evans

How To Manage and Use Technical Information, by Freeman H. Dyke, Jr. Boston: Industrial Education Institute, 1968. $15.00.
Freeman Dyke is a veteran of the ups and downs of the information-retrieval industry and, through his association through the years with Jonker, Documentation, Inc. (Leasco), and the ACM lecture circuit, has developed a wide familiarity with hardware and software used in the handling of technical information. This book is a compendium of information about equipment and systems, ranging from catalog cards to computers. A useful feature, repeated many times throughout the book, is a double list of "advantages" and "disadvantages" for the hardware or the system that has been described. Thus, the advantages of uniterm cards and dual dictionaries (simplicity, low equipment and operating cost, flexibility of vocabulary, physical availability, fairly high output speed) are balanced against their disadvantages (variable search speed, low output flexibility, indirect access to information, difficulty in updating). In most cases, no bias is indicated in the descriptive sections, and the reader is more or less on his own in making a final choice of machine or technique. Numerous clear illustrations (photographs, cartoons, diagrams and other graphics) provide a helpful and interesting relief to the unjustified offset text. The lack of an index sets up serious retrieval problems.

The major market for this book would seem to be business and industry, particularly companies which are planning to set up or modernize their methods for the storage and retrieval of technical information. The book might well be purchased for the business or industrial users of a library. Because it is not at all oriented to the problems of library automation, it is not particularly recommended for use by the librarians themselves.

A. J. Goldwyn

An Introduction to Decision Logic Tables, by Herman McDaniel. New York: Wiley, 1968. 96 pp. $6.95.
The literature of decision tables is marked more by its absence than by its presence; before the appearance of this book, the reader was limited to brief journal articles or an infrequent technical report or two. Thus, even though the author warns that the present volume makes no pretext of being an exhaustive treatment, he has nonetheless added materially to the store of knowledge of this admittedly limited field.

McDaniel carefully leads the reader through the process of developing a decision table and the simple rules of logic utilized to prove relevancy of the table elements or for eliminating irrelevant tests. Of interest to all who are concerned with automation is the author's discussion of the conversion of a flow chart to a decision table. Another interesting section is the use of table processors to translate decision tables into portions of computer programs. At this juncture, the author offers some evidence to support his contention that considerable programming time will be saved if the programmer works from decision tables rather than flow charts. If he is right, librarians had better get with it and learn how to construct decision tables as well as flow charts.

One omission, a discussion of AND and OR condition statements, is unfortunate, since it appears that they merit space even in an introductory text. However, the author does provide a considerable number of exercises for the reader. These will help to sharpen the reader's understanding of decision tables.

John J. Miniter

Computer-Based Library and Information Systems, by J.P. Henley. Computer Monographs Series. New York: American Elsevier, 1970. 84 pp. $5.75.
Just when librarians and computer specialists were beginning to understand each other, there is published a slight monograph that effectively gaps the bridge. The book is based upon Mr. Henley's M.Sc. work at Trinity College, Dublin. It bears a 1970 imprint, but appears to be about seven years out of date. One is told briefly about the King Report, the information retrieval languages LISP and COMIT, and related ancient breakthroughs. The bibliography yields 32 dated citations with a mean date of 1963, the approximate time this work might have been considered timely.

In seven slim chapters and two gratuitous appendices, the author treats such topics as "Introduction to the Computer", "Library Systems Requirements", "The Philosophy of a Machine-Based System", and even "A Short Note on Backus Normal Form". Some of the author's urgent allusions to old events are pure High Camp, e.g., "The growing interest in mechanisation, borne out for example by . . . the recent initiation of discussions between a major publishing house and a large computer manufacturer, make it vital for the cross-fertilization of ideas between computer and library experts to proceed as quickly as possible." (p. 75.) Other pronouncements are patently absurd, such as: "One common use of such real-time 'on-line' computing is the writing of a program directly at the console, instruction by instruction, instead of having to write it all beforehand and read it in from cards or paper tape." (p. 10.)

It is all too easy to fault a short book for shortcomings, but other books in this same series, such as J. M. Foster's List Processing, have proven the excellence possible in a trim 50-shilling monograph (Mr. Foster's work is only 54 pages). Excellence in this format appears to require focus upon a narrow subject area, and discipline in the treatment of the core elements of the area.
In attempting in 84 pages to cover several subjects of encyclopedic scope (Library and Information Systems, as well as a basic computer tutorial), the author piles Pelion upon Ossa and then shows us sample pebbles from the pile instead of the view from the summit. There remains much important and exciting material to be presented to librarians and computer people about each other's work. Regrettably, Mr. Henley, in the words of his fellow Dubliner James Joyce, has "speared the rod and spoiled the lightning."

William R. Nugent

AN ALGORITHM FOR VARIABLE-LENGTH PROPER-NAME COMPRESSION

James L. DOLBY: R & D Consultants Company, Los Altos, California

Journal of Library Automation Vol. 3/4 December, 1970

Viable on-line search systems require reasonable capabilities to automatically detect (and hopefully correct) variations between request format and stored format. An important requirement is the solution of the problem of matching proper names, not only because both input specifications and storage specifications are subject to error, but also because various transliteration schemes exist and can provide variant proper name forms in the same data base. This paper reviews several proper name matching schemes and provides an updated version of these schemes which tests out nicely on the proper name equivalence classes of a suburban telephone book. An appendix lists the corpus of names used for algorithm test.

A viable on-line search system cannot reasonably assume that each user will invariably provide the proper input information without error. Human beings not only make errors, but also expect their correspondents, be they human or mechanical, to be able to cope with these errors, at least at some reasonable error-rate level. Many of the difficulties in implementing computer systems in many areas of human activity stem from failure to recognize, and plan for, routine acceptance of errors in the systems.
Indeed, computing did not become the widespread activity it is now until the so-called higher-level languages came into being. Although it is customary to think of higher-level languages as being "more English-like," the height of their level is better measured by the brevity with which various jobs can be expressed (for brevity tends to reduce errors) and the degree of sophistication of their automatic error detection and correction procedures.

The processing of catalog information for the purposes of exposing and retrieving information presents at least two major areas for research in automatic error detection and correction. At the first stage, the data bank must be created, updated and maintained. Methods for dealing with input errors at this level have been derived by a number of groups, and it seems reasonable to assert that something in the order of 60% of the input errors can be detected automatically (1,2,3). With the possibility of human proofreading and error detection through actual use, it is reasonable to expect a mature data base to have a very low over-all error rate.

At the second stage, however, when a user approaches the data base through a terminal or other on-line device, the errors will be of a recurring nature. Each user will generate his own error set and, though experience will tend to minimize the error rate for a particular user, there will be an essentially irreducible minimum error rate even for an experienced user. If the system is to attract users other than professional interrogators, it must respond intelligently at this minimal error level. This paper explores certain problems associated with making "noisy matches" in catalog searches.
Because preliminary information indicates that the most likely source of input errors is in the keyboarding of proper names, the main emphasis of the paper is on the problem of algorithmically compressing proper names in such a way as to identify similar names (and likely misspellings) without over-identifying the list of possible authors.

EXISTING NAME-COMPRESSION ALGORITHMS

The problem of providing equivalence classes of proper names is hardly new. Library catalogs, telephone directories and other major data bases have made use of "see-also"-type references for many years. Some years ago Remington-Rand derived an alphanumeric name compression algorithm, SOUNDEX, that could be applied either by hand or by machine for such purposes (4). Perhaps the most widely used on-line retrieval system presently in existence, the airline reservation system (such as SABRE), makes use of such an algorithm (5). The closely related problem of compressing English words (either to establish noisy matches, to eliminate misspelled words, or simply to achieve data bank compression) has also received some attention (6,7,8). Implementation of such algorithms has been described (9,10,11,12,13). Although English word structure differs from proper-name structure in some important respects (e.g., the existence of suffixes), three of the algorithms are constructed by giving varying degrees of attention to the following five areas of word structure:

1) The character in word-initial position;
2) The character set (A, E, I, O, U, Y, H, W);
3) Doubled characters (e.g., tt);
4) Transformation of consonants (i.e., all alphabetic characters other than those in 2 above) into equivalence classes;
5) Truncation of the residual character string.

The word-initial character receives varying attention. SOUNDEX places the initial consonant in the initial position of the compressed form and then transforms all other consonants into equivalence classes with numeric titles.
SABRE maintains the word-initial character even if it is a vowel. In the Armour Research Foundation scheme (ARF), the word-initial character is also retained as is.

Both SOUNDEX and SABRE eliminate all characters in the set 2) above. The ARF scheme retains all characters in shorter words and deletes vowels only to reduce the compressed form to four characters, deleting the "U" after "Q," the second vowel in a vowel string, and then all remaining vowels.

All three systems delete the second letter of a double-letter string. SABRE goes a step further and deletes the second letter of a double-letter string occurring after the vowels have been deleted. Thus, the second "R" of "BEARER" would be deleted.

SOUNDEX maps the eighteen consonants into six equivalence classes:

1) B, F, P, V
2) C, G, J, K, Q, S, X, Z
3) D, T
4) L
5) M, N
6) R

SABRE and ARF do not perform any transformations on these eighteen consonants. Finally, all three systems truncate the remaining string of characters to four characters. For shorter forms, padding in the form of zeros (SOUNDEX), blanks (SABRE), or hyphens (ARF) is added so that all codes are precisely four characters long.

Variable-length coding schemes have been considered but generally rejected for implementation on major systems because of the attendant difficulties of programming and the fact that code compression is enhanced by fixed-length codes where no interword space is necessary. Although fixed-length schemes of length greater than four have been considered, no definitive data appear to be available as to the enhanced ability of compressed codes to discriminate by introduction of more characters. The SABRE system does add a fifth character but makes use of the person's first initial for added discrimination.

Tukey (14) has constructed a personal author code for his citation indexing and permuted title studies on an extensive corpus of the statistical literature.
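The SOUNDEX scheme summarized above can be sketched in a few lines of modern code. The rendering below is illustrative only, not the Remington-Rand implementation: it keeps the initial letter, maps the remaining consonants into the six numeric classes, drops vowels and H and W, collapses adjacent letters of the same class, and zero-pads to four characters. Implementations differ on whether H and W separate same-class consonants; this sketch treats them, like vowels, as separators.

```python
# Illustrative sketch of SOUNDEX-style compression: initial letter kept,
# consonants mapped to six numeric classes, vowels/H/W dropped, adjacent
# same-class letters collapsed, result zero-padded to four characters.
CLASSES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

def soundex(name: str) -> str:
    name = name.upper()
    codes = [CLASSES.get(ch, "") for ch in name]  # "" for A,E,I,O,U,Y,H,W
    digits = []
    prev = codes[0]  # letters adjacent to the initial with its class collapse
    for code in codes[1:]:
        if code and code != prev:
            digits.append(code)
        prev = code
    return (name[0] + "".join(digits) + "000")[:4]

print(soundex("Stevens"), soundex("Stephens"))  # S315 S315
print(soundex("Lee"))                           # L000 (zero padding)
```

Note that because P, V, and F share class 1, SOUNDEX identifies Stevens with Stephens, a pairing the telephone-directory classes also make.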
In this situation the author code is a semi-mnemonic code in a tag form to assist the user in identification rather than to be used as a basic entry point. However, Tukey does note that in his corpus a three-character code of the surname, plus two initials, is superior to a five-character surname code for purposes of unique identification.

MEASURING ALGORITHMIC PERFORMANCE

One of the main problems in constructing linguistic algorithms is to decide on appropriate measures of performance and to obtain data bases for implementing such measures. In this case it is clear that certain improvements in existing algorithms can be made, particularly by using more sophisticated transformation rules for the consonants, and that the problems of implementing such changes are not so great in today's context as they were when the systems noted above were originally derived. Improvements in processing speeds and programming languages, however, do not remove the need for keeping "linguistic frills" to a minimum.

Ideally, it would be desirable to have a list of common errors in keyboarding names as a test basis for any proposed algorithms. Unfortunately, no such list of sufficient size appears to be available. Lacking this, one can speculate that certain formal properties of the predictability of language might be useful in deriving an algorithm. At the English word level, some effort has been made to exploit measures of entropy as developed by Shannon in this direction (6,7). However, there is good reason to question whether entropy, at least when measured in the usual way, is strongly correlated with actually occurring errors (15).

As an alternative, one can study existing lists of personal-name equivalence classes to derive such algorithms and then test the algorithm against such classes, measuring both the degree of over-identification and the degree of under-identification.
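The test procedure just described can be sketched as a small harness: a reference class is under-identified (split) when its names compress to more than one code, while over-identification shows up as different reference classes collapsing to the same code. The harness below is a sketch under stated assumptions: the toy classes and the deliberately crude placeholder compressor are invented for illustration; the paper's actual corpus is the directory list given in its appendix.

```python
# Sketch: score a compression function against reference equivalence classes.
# A class is "split" (under-identified) if its names yield >1 code; the count
# of distinct surviving codes measures how much discrimination remains.
def evaluate(classes, compress):
    split = 0
    codes_per_class = []
    for names in classes:
        codes = {compress(n) for n in names}
        if len(codes) > 1:
            split += 1  # under-identification: reference class broken apart
        codes_per_class.append(codes)
    # Fewer distinct codes than classes means some classes merged
    # (over-identification).
    distinct = len(set().union(*codes_per_class))
    return split, distinct

# Toy reference classes and a crude compressor (first letter plus length),
# purely to exercise the harness:
classes = [{"Smith", "Smyth"}, {"Brown", "Braun"}, {"Smead", "Smeed"}]
crude = lambda name: name[0] + str(len(name))
print(evaluate(classes, crude))  # (0, 2): no splits, but 3 classes -> 2 codes
```

On this toy data the crude compressor splits nothing but merges two reference classes, exactly the over-identification the paper argues is the less serious error.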
Clearly, such tests will carry more weight if they are conducted under economic forcing conditions, where weaknesses in the test set will lead to real and measurable expense to the organization publishing the list. The SABRE system operates under strong economic forcing conditions in the sense that airline passengers frequently have a number of competitive alternatives available to them, and lost reservations can cause sufficient inconvenience for them to consider these alternatives. However, the main application of the SABRE system is to rather small groups of persons (at least when compared to the number of personal authors in a typical library catalog), so that errors of over-identification are essentially trivial in cost to the airlines.

A readily available source of "see-also"-type equivalence classes of proper names is given in the telephone directory system. Here, the economic forcing system is not so strong as in the airline situation, but it is measurable in that failure to provide an adequate list will lead to increased user dependence on the Information Operator, with consequent increased cost to the telephone company. As a test of the feasibility of using such a set of equivalence classes, the 451 classes found in the Palo Alto-Los Altos (California) telephone directory were copied out by hand and used in deriving and testing the algorithm given in the next section and the SOUNDEX algorithm.

There remains the question of deciding what is to constitute proper agreement between any algorithm and the set of equivalence classes chosen as a data base. At the grossest level it seems reasonable to argue that over-identification is less serious than under-identification. False drops only tend to clog the line. Lost reference points, on the other hand, lead to lost information. Investigation of other applications of linguistic algorithms, such as algorithms to hyphenate words, identify semantically similar words through cutting off of suffixes, and so forth, indicates that it is usually possible to reduce crucial error (in this case under-identification) to something under 5%, while preserving something in the order of 80% of the original distinctions (or efficiency) of the system. Efforts to improve materially on the "five-and-eighty" rule generally lead to solutions involving larger context and/or extensive exception dictionaries. In this study efforts are directed at achieving a "five-and-eighty" solution.

A VARIABLE-LENGTH NAME-COMPRESSION SCHEME

In light of the fact that no definitive information is available on the problems of truncating errors in name-compression algorithms, it is convenient to break the problem into two pieces: first, derivation of a variable-length algorithm of the required accuracy and efficiency, and then determination of the errors induced by truncation.

A study of the set of equivalence classes given in the Palo Alto-Los Altos telephone directory made fairly clear that, with minor modifications of the basic five steps used in the other algorithms noted above, it would not be too difficult to provide a reasonably accurate match without requiring too much over-identification. The main modifications made consisted of maintaining the position of the first vowel and using local context to make transformations on the consonants. The algorithm is given below. (The rules given must be applied in the order given, both with respect to the rules themselves and to the order of the lists within the rules, as the precedence relations are important to the performance of the algorithm.)

A Spelling Equivalent Abbreviation Algorithm For Personal Names

1) Transform: "McG" to "Mk", "Mag" to "Mk", "Mac" to "Mk", "Mc" to "Mk".
2) Working from the right, recursively delete the second letter from the following letter pairs: "dt", "ld", "nd", "nt", "rc", "rd", "rt", "sc", "sk", "st".

3) Transform: "x" to "ks", "ce" to "se", "ci" to "si", "cy" to "sy", "consonant-ch" to "consonant-sh"; all other occurrences of "c" to "k"; "z" to "s", "wr" to "r", "dg" to "g", "qu" to "k", "t" to "d", "ph" to "f" (after the first letter).

4) Delete all consonants other than "l", "n", and "r" which precede the letter "k" (after the first letter).

5) Delete one letter from any doubled consonant.

6) Transform "pf#" to "p#", "#pf" to "#f", "vowel-gh#" to "vowel-f#", "consonant-gh#" to "consonant-g#", and delete all other occurrences of "gh". ("#" is the word-beginning and word-ending marker.)

7) Replace the first vowel in the name by the symbol "•".

8) Delete all remaining vowels.

9) Delete all occurrences of "w" or "h" after the first letter in the word.

The vowels are taken to be (A, E, I, O, U, Y). The remaining literal characters are treated as consonants.

262 Journal of Library Automation Vol. 3/4 December, 1970

The algorithm splits 22 (4.9%) of the 451 equivalence classes given by the phone directory. On the other hand, the algorithm provides 349 distinct classes (not counting those classes that were broken off in error), or 77.4% of the 451 classes in the telephone directory data base. This is a reasonable approximation to the "five-and-eighty" performance found in other linguistic problem areas.

To give a proper appreciation of the nature of these under-identification errors, they are discussed below individually.

1) The name Bryer is put in the same equivalence class with a variety of spellings of the name Bear. The algorithm fails to make this identification.

2) Blagburn is not equated to Blackburn.

3) The name Davison is equated to Davidson in its various forms.
The algorithm fails to make this identification, and this appears to be one of a modest class of difficulties that occur with the -son, -sen names.

4) The names Dickinson, Dickerson, Dickison, and Dickenson are all equated by the directory, but are kept separate, except for the two forms of Dickinson, by the algorithm.

5) The name Holm is not equated with the name Home.

6) The name Holmes is not equated with the name Homes.

7) The algorithm fails to equate Jaeger with two forms of Yaeger.

8) The algorithm fails to equate Lamb with Lamn.

9) The algorithm incorrectly assumes that the final "gh" of Leigh should be treated as an "f". Treating final "gh" either as a null sound or as an "f" leads to about the same number of errors in either direction.

10) The algorithm fails on the pairing of Leicester and Lester. The difficulty is an intervening vowel.

11) The algorithm fails to equate the various forms of Lindsay with the forms of Lindsley.

12) The algorithm fails to equate the various forms of McLaughlin with McLachlan.

13) The algorithm fails to equate McCullogh with McCullah. This is again the final "gh" problem.

14) The algorithm fails to equate McCue with McHugh (again the final "gh" problem).

15) The algorithm fails to equate Moretton with Morton. This is an intervening-vowel problem.

16) The algorithm fails to equate Rauch with Roush.

17) The algorithm fails to equate Robinson with Robison (another -son type problem).

18) The algorithm incorrectly assumes that the interior "ph" of Shepherd is an "f".

19) The algorithm fails to equate Speer with Speier.

20) The algorithm fails to equate Stevens with Stephens.

21) The algorithm fails to equate Stevenson with Stephenson.

22) The algorithm fails to equate the various forms of the word Thompson (a -son problem).

In several of the errors noted above it may be questioned whether the telephone directory is following its own procedures with complete rigor.
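The nine rules above can be sketched in code. The following Python rendering is one reading of the rules, not the authors' implementation: the exact scope of the "after the first letter" provisos and the recursion in rules 2 and 4 are interpretations, and "*" stands in for the article's vowel marker "•".

```python
import re

CONS = "BCDFGHJKLMNPQRSTVWXZ"   # vowels are A, E, I, O, U, Y

def dolby_code(name: str) -> str:
    """Variable-length spelling-equivalent abbreviation (one reading of the rules)."""
    s = re.sub("[^A-Z]", "", name.upper())
    if not s:
        return s
    # 1) Scottish prefixes.
    for p in ("MCG", "MAG", "MAC", "MC"):
        if s.startswith(p):
            s = "MK" + s[len(p):]
            break
    # 2) Working from the right, recursively delete the second letter of these pairs.
    pairs = ("DT", "LD", "ND", "NT", "RC", "RD", "RT", "SC", "SK", "ST")
    changed = True
    while changed:
        changed = False
        for i in range(len(s) - 2, -1, -1):
            if s[i:i + 2] in pairs:
                s = s[:i + 1] + s[i + 2:]
                changed = True
                break
    # 3) Consonant respellings, in the order given.
    s = s.replace("X", "KS")
    for a, b in (("CE", "SE"), ("CI", "SI"), ("CY", "SY")):
        s = s.replace(a, b)
    s = re.sub(f"(?<=[{CONS}])CH", "SH", s)          # consonant-ch -> consonant-sh
    s = s.replace("C", "K").replace("Z", "S")
    s = s.replace("WR", "R").replace("DG", "G").replace("QU", "K")
    s = s[0] + s[1:].replace("T", "D")               # "t" -> "d" after the first letter
    s = s.replace("PH", "F")
    # 4) Delete consonants other than l, n, r that precede "k" (after the first letter).
    prev = None
    while prev != s:
        prev = s
        s = re.sub(r"(?<=.)[BCDFGHJKMPQSTVWXZ](?=K)", "", s)
    # 5) Collapse doubled consonants.
    s = re.sub(f"([{CONS}])\\1+", r"\1", s)
    # 6) "pf" and "gh" at word boundaries.
    s = re.sub("PF$", "P", s)
    s = re.sub("^PF", "F", s)
    s = re.sub("(?<=[AEIOUY])GH$", "F", s)
    s = re.sub(f"(?<=[{CONS}])GH$", "G", s)
    s = s.replace("GH", "")
    # 7) + 8) Mark the first vowel, delete the rest.
    m = re.search("[AEIOUY]", s)
    if m:
        i = m.start()
        s = s[:i] + "*" + re.sub("[AEIOUY]", "", s[i + 1:])
    # 9) Drop "w" and "h" after the first letter.
    return s[:1] + s[1:].replace("W", "").replace("H", "")
```

Under this reading, Blackburn compresses to BL*KBRN and Schmidt, Smith, and Smythe all compress to SM*D, mirroring the corresponding appendix classes.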
Setting these aside, the primary errors occur with the final "gh," the words ending in "son," and the words with extraneous interior vowels. Each of these problems can be resolved to any desired degree of accuracy, but only at the expense of noticeable increases in the degree of complexity of the algorithm.

THE TRUNCATION PROBLEM

Simple truncation does not introduce errors of under-identification; it can only lead to further over-identification. Examination of the results of applying the algorithm to the telephone directory data base shows that no new over-identification is introduced if the compressed codes are all reduced to the leftmost seven characters. Further truncation leads to the following results:

    Code Length    Cumulative Over-Identification Losses
         7                          0
         6                          1
         5                          6
         4                         45

Thus there is a strong argument for maintaining at least five characters in the compressed code. However, there is no real need for restriction to simple truncation. Following the procedures used in the ARF system, further truncation can be obtained by selectively removing some of the remaining characters. The natural candidate for such removal is the vowel marker. If the vowel marker is removed from all the five-character codes, only six more over-identification errors are introduced. Removal of the vowel markers from all of the codes would have introduced 17 more errors of over-identification. The utility of the vowel marker is in the short codes.

This in turn suggests that introduction of a second vowel marker in the very short codes may have some utility, and this is indeed the case. If the conception of the vowel marker is generalized to marking the position of a vowel string (i.e., a string of consecutive vowels), where for these purposes a vowel is any of the characters (A, E, I, O, U, Y, H, W), and these markers are maintained as "padding" in the very short words, 18 errors of over-identification are eliminated at the cost of two new errors of under-identification.
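The truncation measurements above can be reproduced mechanically. The sketch below (hypothetical helper names, with "*" standing for the vowel marker "•") counts how many formerly distinct codes are merged away when every code is cut to its leftmost k characters, optionally dropping the vowel marker first:

```python
from collections import defaultdict

def truncate(code: str, k: int, drop_marker: bool = False) -> str:
    """Leftmost-k truncation; optionally remove the vowel marker '*' first."""
    if drop_marker:
        code = code.replace("*", "")
    return code[:k]

def over_identification(codes, k: int, drop_marker: bool = False) -> int:
    """Number of formerly distinct codes merged together by truncation."""
    buckets = defaultdict(set)
    for c in set(codes):
        buckets[truncate(c, k, drop_marker)].add(c)
    return sum(len(group) - 1 for group in buckets.values())
```

Run over the full 451-class corpus, such a count yields the cumulative losses tabulated above (0, 1, 6, and 45 at lengths seven through four).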
In this way the following modification to the variable-length algorithm is derived:

1) Mark the position of each of the first two vowel strings with a "•", if there is more than one vowel.

2) Truncate to six characters.

3) If the six-character code has two vowel markers, remove the right-hand vowel marker. Otherwise, truncate the sixth character.

4) If the resulting five-character code has a vowel marker, remove it. Otherwise remove the fifth character.

5) For all codes having less than four characters in the variable-length form, pad to four characters by adding blanks to the right.

Measured against the telephone directory data base, this fixed-length compression code provides 361 distinct classes (not counting improper class splits as separate classes), or 80% of the 451 given classes. Twenty-four (5.3%) of the classes are improperly split. By way of comparison, the SOUNDEX system improperly splits 135 classes (30%) and provides only 287 distinct classes (not counting improperly split classes), or 63.8% of the telephone directory data base.

ACKNOWLEDGMENTS

This research was carried out for the Institute of Library Research, University of California, under the sponsorship of the Office of Education, Research Grant No. OEG-1-7-071083-5068.

The author would like to thank Ralph M. Shoffner and Kelley L. Cartwright for suggesting the problem and for a number of useful comments on existing systems. Allan J. Humphrey was kind enough to program the variable-length version of the algorithm for test purposes.

APPENDIX: CORPUS OF NAMES USED FOR ALGORITHM TEST

A list of personal-name equivalence classes from the Palo Alto-Los Altos Telephone Directory is arranged according to the variable-length compression code (with the vowel marker "•" treated as an "A" for ordering).
Names whose compressed codes do not match the one given in the first column (and hence represent weaknesses in the algorithm and/or the directory groupings) are given in italics. A small number of directory entries that do not bear on the immediate problem have been deleted from the list: Bell's see also Bells; Co-op see also Co-operative; St. see also Saint; etc.

•BL  Abel, Abele, Abell, Able
•BRMS  Abrahams, Abrams
•BRMSN  Abrahamson, Abramson
•D  Eddy, Eddie
•DMNS  Edmonds, Edmunds
•DMNSN  Edmondson, Edmundson
•DMS  Adams, Addems
•GN  Eagen, Egan, Eggen
•GR  Jaeger, Yaeger, Yeager
•KN  Aiken, Aikin, Aitken
•KNS  Adkins, Akins
•KR  Acker, Aker
•KR  Eckard, Eckardt, Eckart, Eckert, Eckhardt
•KS  Oakes, Oaks, Ochs
•LBRD  Albright, Allbright
•LD  Elliot, Elliott
•LN  Allan, Allen, Allyn
•LSN  Ohlsen, Olesen, Olsen, Olson, Olsson
•LVR  Oliveira, Olivera, Olivero
•MS  Ames, Eames
•NGL  Engel, Engle, Ingle
•NL  O'Neal, O'Neil, O'Neill
•NRS  Andrews, Andrus
•NRSN  Andersen, Anderson, Andreasen
•NS  Ennis, Enos
•RKSN  Erichsen, Erickson, Ericson, Ericsson, Eriksen
•RL  Earley, Early
•RN  Erwin, Irwin
•RNS  Aarons, Ahrends, Ahrens, Arens, Arentz, Arons
•RS  Ayers, Ayres
•RVN  Ervin, Ervine, Irvin, Irvine
•RVNG  Erving, Irving
•SBRN  Osborn, Osborne, Osbourne, Osburn
B•D  Beatie, Beattie, Beatty, Beaty, Beedie
B•DS  Betts, Betz
B•KMN  Bachman, Bachmann, Backman
B•L  Bailey, Baillie, Bailly, Baily, Bayley
B•L  Beal, Beale, Beall, Biehl
B•L  Belew, Ballou, Bellew
B•L  Buhl, Buell
B•L  Belle, Bell
B•LN  Bolton, Boulton
B•M  Baum, Bohm, Bohme
B•MN  Bauman, Bowman
B•N  Bain, Bane, Bayne
B•ND  Bennet, Bennett
B•R  Baer, Bahr, Baier, Bair, Bare, Bear, Beare, Behr, Beier, Bier, Bryer
B•R  Barry, Beare, Beery, Berry
B•R  Bauer, Baur, Bower
B•R  Bird, Burd, Byrd
B•RBR  Barbour, Barber
B•RG  Berg, Bergh, Burge
B•RGR  Berger, Burger
B•RK  Boerke, Birk, Bourke, Burk, Burke
B•RN  Burn, Byrne
B•RNR  Bernard, Bernhard, Bernhardt, Bernhart
B•RNS  Berns, Birns, Burns, Byrns, Byrnes
B•RNSN  Bernstein, Bornstein
B•RS  Bertsch, Birch, Burch
BL•KBRN  Blackburn, Blagburn
BL•M  Blom, Bloom, Bluhm, Blum, Blume
BR•D  Brode, Brodie, Brody
BR•N  Braun, Brown, Browne
BR•N  Brand, Brandt, Brant
D•DS  Dietz, Ditz
D•F  Duffie, Duffy
D•GN  Dougan, Dugan, Duggan
D•K  Dickey, Dicke
D•KNSN  Dickenson, Dickerson, Dickinson, Dickison
D•KSN  Dickson, Dixon, Dixson
D•L  Dailey, Daily, Daley, Daly
D•L  Dahl, Dahle, Dall, Doll
D•L  Deahl, Deal, Diehl
D•MN  Diamond, Dimond, Dymond
D•N  Dean, Deane, Deen
D•N  Denney, Denny
D•N  Donahoo, Donahue, Donoho, Donohoe, Donohoo, Donohue, Dunnahoo
D•N  Downey, Downie
D•N  Dunn, Dunne
D•NL  Donley, Donnelley, Donnelly
D•R  Daugherty, Doherty, Dougherty
D•R  Dyar, Dyer
D•RM  Derham, Durham
D•VDSN  Davidsen, Davidson, Davison
D•VS  Davies, Davis
DR•SL  Driscoll, Driskell
F•  Fay, Fahay, Fahey
F•FR  Fifer, Pfeffer, Pfeiffer
F•GN  Fagan, Feigan, Fegan
F•L  Feil, Pfeil
F•L  Feld, Feldt, Felt
F•LKNR  Faulkner, Falconer
F•LPS  Philips, Phillips
F•NGN  Finnegan, Finnigan
F•NL  Finlay, Finley
F•RL  Farrell, Ferrell
F•RR  Ferrara, Ferreira, Ferriera
F•RR  Foerster, Forester, Forrester, Forster
F•RS  Forrest, Forest

F•RS F•RS F•SR FL•N FL•NGN FR• FR•DMN FR•DRKSN FR•K FR•NS FR•NS FR•S FR•SR G•D G•DS G•F G•L G•LMR G•LR G•MS G•NR G•NSLS G•NSLVS G•RD G•RD G•RN G•RNR G•RR G•S GR• GR•FD GR•N GR•S H•D H•F H•FMN H•G H•GN H•K H•KSN H•L H•L H•L H•L H•LD

Faris, Farriss, Ferris, Ferriss
First, Fuerst, Furst
Fischer, Fisher
Flinn, Flynn
Flanagan, Flanigan, Flannigan
Frei, Frey, Fry, Frye
Freedman, Friedman
Frederickson, Frederiksen, Fredickson, Fredriksson
Franck, Frank
France, Frantz, Franz
Frances, Francis
Freeze, Freese, Fries
Fraser, Frasier, Frazer, Frazier
Good, Goode
Getz, Goetz, Goetze
Goff, Gough
Gold, Goold, Gould
Gilmer, Gilmore, Gilmour
Gallagher,
Gallaher, Galleher
Gomes, Gomez
Guenther, Gunther
Gonzales, Gonzalez
Gonsalves, Gonzalves
Garratt, Garrett
Garrity, Geraghty, Geraty, Gerrity
Gorden, Gordohn, Gordon
Gardiner, Gardner, Gartner
Garrard, Gerard, Gerrard, Girard
Gauss, Goss
Gray, Grey
Griffeth, Griffith
Green, Greene
Gros, Grose, Gross
Hyde, Heidt
Hoff, Hough, Huff
Hoffman, Hoffmann, Hofman, Hofmann, Huffman
Hoag, Hoge, Hogue
Hagan, Hagen
Hauch, Hauck, Hauk, Hauke
Hutcheson, Hutchison
Holley, Holly
Holl, Hall
Halley, Haley
Haile, Hale
Holiday, Halliday, Holladay, Holliday

H•LG  Helwig, Hellwig
H•LM  Holm, Home
H•LMS  Holmes, Homes
H•LN  Highland, Hyland
H•M  Ham, Hamm
H•MR  Hammar, Hammer
H•N  Hanna, Hannah
H•N  Hahn, Hahne, Harm, Haun
H•NN  Hanan, Hannan, Hannon
H•NRKS  Hendricks, Hendrix, Henriques
H•NRKSN  Hendrickson, Henriksen, Henrikson
H•NS  Heintz, Heinz, Heinze, Hindes, Hinds, Hines, Hinze
H•NS  Haines, Haynes
H•NSN  Henson, Hansen, Hanson, Hanssen, Hansson, Hanszen
H•R  Herd, Heard, Hird, Hurd
H•R  Hart, Hardt, Harte, Heart
H•R  Hare, Hair
H•R  Hardey, Hardie, Hardy
H•RMN  Hartman, Hardmen, Hardman, Hartmann
H•RMN  Herman, Hermann, Herrmann
H•RMN  Harman, Harmon
H•RN  Heron, Herrin, Herron
H•RN  Hardin, Harden
H•RN  Horn, Horne
H•RNGDN  Herrington, Harrington
H•S  Haas, Haase, Hasse
H•S  Howes, House, Howse
H•S  Hays, Hayes
H•SN  Houston, Huston
H•VR  Hoover, Hover
J•  Jew, Jue
J•FR  Jeffery, Jeffrey
J•FRS  Jefferies, Jefferis, Jefferys, Jeffreys
J•KB  Jacobi, Jacoby
J•KBSN  Jacobsen, Jacobson, Jackobsen
J•KS  Jacques, Jacks, Jaques
J•L  Jewell, Juhl
J•MS  Jaimes, James
J•MSN  Jameson, Jamieson, Jamison
J•NSN  Jahnsen, Jansen, Jansohn, Janssen, Jansson, Janzen, Jensen, Jenson
J•S  Joice, Joyce
K•  Kay, Kaye
K•F  Coffee, Coffey
K•FMN  Coffman, Kauffman, Kaufman, Kaufmann

K•K K•L K•L K•LMN K•LR K•MBRLN K•MBS K•MP K•MPS K•N K•N K•N K•N K•N K•N K•N K•NL K•NR K•NS K•P K•PL K•R K•R K•R K•R K•R K•RD K•RLN K•RN
K•RSNR K•S K•S K•S K•SL K•SLR K•SR KL•N KL•RK KL•SN KR• KR•GR KR•MR KR•N KR•S KR•S

Cook, Cooke, Koch, Koche
Cole, Kohl, Koll
Kelley, Kelly
Coleman, Colman
Koehler, Koeller, Kohler, Koller
Chamberlain, Chamberlin
Combs, Coombes, Coombs
Camp, Kampe, Kampf
Campos, Campus
Cahn, Conn, Kahn
Cahen, Cain, Caine, Cane, Kain, Kane
Chin, Chinn
Chaney, Cheney
Coen, Cohan, Cohen, Cohn, Cone, Koehn, Kohn
Coon, Kuhn, Kuhne
Kenney, Kenny, Kinney
Conley, Conly, Connelly, Connolly
Conner, Connor
Coons, Koontz, Kuhns, Kuns, Kuntz, Kunz
Coop, Co-op, Coope, Coupe, Koop
Chapel, Chapell, Chappel, Chappell, Chappelle, Chapple
Carrie, Carey, Cary
Corey, Cory
Carr, Kar, Karr
Kurtz, Kurz
Kehr, Ker, Kerr
Cartwright, Cortright
Carleton, Carlton
Carney, Cerney, Kearney
Kirschner, Kirchner
Chace, Chase
Cass, Kass
Kees, Keyes, Keys
Cassel, Cassell, Castle
Kesler, Kessler, Kestler
Kaiser, Kayser, Keizer, Keyser, Kieser, Kiser, Kizer
Cline, Klein, Kleine, Kline
Clark, Clarke
Claussen, Clausen, Clawson, Closson
Crow, Crowe
Krieger, Kroeger, Krueger, Kruger
Creamer, Cramer, Kraemer, Kramer, Kremer
Craine, Crane
Christie, Christy, Kristee
Crouss, Kraus, Krausch, Krause, Krouse
KR•S  Cross, Krost
KR•S  Crews, Cruz, Kruse
KR•SNSN  Christensen, Christiansen, Christianson
L•  Loe, Loewe, Low, Lowe
L•  Lea, Lee, Leigh
L•D  Lloyd, Loyd
L•DL  Litle, Littell, Little, Lytle
L•DRMN  Ledterman, Letterman
L•K  Leach, Leech, Leitch
L•KS  Lucas, Lukas
L•LN  Laughlin, Loughlin
L•LR  Lawler, Lawlor
L•MB  Lamb, Lamm
L•MN  Lemen, Lemmon, Lemon
L•MN  Layman, Lehman, Lehmann
L•N  Lind, Lynd, Lynde
L•N  Lion, Lyon
L•N  Lin, Linn, Lynn, Lynne
L•N  Lain, Laine, Laing, Lane, Layne
L•NG  Lang, Lange
L•NN  London, Lundin
L•NS  Lindsay, Lindsey, Lindsley, Linsley
L•R  Lawry, Lowery, Lowrey, Lowry
L•RNS  Lawrence, Lowrance
L•RNS  Laurence, Lawrance, Lawrence, Lorence, Lorenz
L•RSN  Larsen, Larson
L•S  Lewis, Louis, Luis, Luiz
L•S  Lacey, Lacy
L•SR  Leicester, Lester
L•V  Levey, Levi, Levy
L•VD  Leavett, Leavitt, Levit
L•VL  Lavell, Lavelle, Leavelle, Loveall, Lovell
L•VN  Lavin, Levin, Levine
M•D  Mead, Meade
M•DN  Moretton, Morton
M•DS  Mathews, Matthews
M•DSN  Madison, Madsen, Matson, Matteson, Mattison, Mattson
M•KL  Michael, Michel
M•KM  Meacham, Mechem
M•KS  Marques, Marquez, Marquis, Marquiss
M•KS  Marcks, Marks, Marx
M•LN  Maloney, Moloney, Molony
M•LN  Mullan, Mullen, Mullin
M•LR  Mallery, Mallory
M•LR  Moeller, Moller, Mueller, Muller

M•LR M•LS M•N M•NR M•NR M•NSN M•R M•R M•R M•R M•R M•RF M•RL M•RN M•RS M•RS MK• MK• MK• MK• MK•L MK•LF MK•LM MK•N MK•NR MK•NS MK•NS MK•R MK•R MKD•NL MKF•RLN MKF•RSN MKL•D MKL•KLN MKL•LN MKL•N MKL•N MKL•S MKM•LN MKN•L MKR•D N•KL N•KLS N•KLS

Millar, Miller
Miles, Myles
Mahan, Mann
Miner, Minor
Monroe, Munro
Monson, Munson
Murray, Murrey
Maher, Maier, Mayer
Mohr, Moor, Moore
Meyers, Myers
Meier, Meyer, Mieir, Myhre
Murphey, Murphy
Merrell, Merrill
Marten, Martin, Martine, Martyn
Meyers, Myers
Maurice, Morris, Morse
McCoy, McCaughey
Magee, McGee, McGehee, McGhie
Mackey, MacKay, Mackie, McKay
McCue, McHugh
Magill, McGill
McCollough, McCullah, McCullough
McCallum, McCollum, McColm
McKenney, McKinney
Macintyre, McEntire, McIntire, McIntyre
MacKenzie, McKenzie
Maginnis, McGinnis, McGuinness, McInnes, McInnis
Maguire, McGuire
McCarthy, McCarty
MacDonald, McDonald, McDonnell
MacFarland, MacFarlane, McFarland, McFarlane
MacPherson, McPherson
MacLeod, McCloud, McLeod
MacLachlan, Maclachlin, McLachlan, McLaughlin, McLoughlin
McClellan, McClelland, McLellan
McClain, McClaine, McLain, McLane
MacLean, McClean, McLean
McCloskey, McClosky, McCluskey
MacMillan, McMillan, McMillin
MacNeal, McNeal, McNeil, McNeill
Magrath, McGrath
Nichol, Nicholl, Nickel, Nickle, Nicol, Nicoll
Nicholls, Nichols, Nickels, Nickles, Nicols
Nicholas, Nicolas

N•KLSN N•KSN N•L N•LSN N•MN N•RS N•SBD P•D P•DRSN P•G P•LK P•LSN P•N P•R P•R P•RK P•RKS P•RS P•RS P•RS P•RSN PR•KR PR•NS PR•R R• R• R•BNSN R•D R•D R•D R•DR R•DS R•GN R•GR R•K R•K R•KR R•L R•MNGTN R•MR R•MS R•N R•NR R•S

Nicholsen, Nicholson, Nicolaisen, Nicolson
Nickson, Nixon
Neal, Neale, Neall, Neel, Neil, Neill
Neilsen, Neilson, Nelsen, Nelson, Nielsen, Nielson, Nilson, Nilssen, Nilsson
Neumann, Newman
Norris, Nourse
Nesbit, Nesbitt, Nisbet
Pettee, Petty
Peterson, Pederson, Pedersen, Petersen, Petterson
Page, Paige
Polak, Pollack, Pollak, Pollock
Polson, Paulsen, Paulson, Poulsen, Poulsson
Paine, Payn, Payne
Parry, Perry
Parr, Paar
Park, Parke
Parks, Parkes
Pierce, Pearce, Peirce, Piers
Parish, Parrish
Paris, Parris
Pierson, Pearson, Pehrson, Peirson
Prichard, Pritchard
Prince, Prinz
Prior, Pryor
Roe, Rowe
Rae, Ray, Raye, Rea, Rey, Wray
Robinson, Robison
Rothe, Roth
Rudd, Rood, Rude
Reed, Read, Reade, Reid
Rider, Ryder
Rhoades, Rhoads, Rhodes
Regan, Ragon, Reagan
Rodgers, Rogers
Richey, Ritchey, Ritchie
Reich, Reiche
Reichardt, Richert, Rickard
Reilley, Reilly, Reilli, Riley
Remington, Rimington
Reamer, Reimer, Riemer, Rimmer
Ramsay, Ramsey
Rhein,
Rhine, Ryan
Reinhard, Reinhardt, Reinhart, Rhinehart, Rinehart
Reas, Reece, Rees, Reese, Reis, Reiss, Ries

R•S  Rauch, Rausch, Roach, Roche, Roush
R•S  Rush, Rusch
R•S  Russ, Rus
R•VS  Reaves, Reeves
S•BR  Seibert, Siebert
S•FL  Schofield, Scofield
S•FN  Stefan, Steffan, Steffen, Stephan, Stephen
S•FNS  Steffens, Stephens, Stevens
S•FNSN  Steffensen, Steffenson, Stephenson, Stevenson
S•FR  Schaefer, Schaeffer, Schafer, Schaffer, Shafer, Shaffer, Sheaffer
S•FR  Stauffer, Stouffer
S•GL  Siegal, Sigal
S•GLR  Sigler, Ziegler
S•K  Schuck, Shuck
S•KS  Sachs, Sacks, Saks, Sax, Saxe
S•L  Seeley, Seely, Seley
S•L  Schell, Shell
S•LR  Schuler, Schuller
S•LS  Schultz, Schultze, Schulz, Schulze, Shults, Shultz
S•LV  Silva, Sylva
S•LVR  Silveira, Silvera, Silveria
S•MKR  Schomaker, Schumacher, Schumaker, Shoemaker, Shumaker
S•MN  Simon, Symon
S•MN  Seaman, Seemann, Semon
S•MRS  Somers, Sommars, Sommers, Summers
S•MS  Simms, Sims
S•N  Stein, Stine
S•N  Sweeney, Sweeny, Sweney
S•NR  Senter, Center
S•NRS  Sanders, Saunders
S•PR  Shepard, Shephard, Shepheard, Shepherd, Sheppard
S•R  Stahr, Star, Starr
S•R  Stewart, Stuart
S•R  Storey, Story
S•R  Saier, Sayre
S•R  Schwartz, Schwarz, Schwarze, Swartz
S•RL  Schirle, Shirley
S•RLNG  Sterling, Stirling
S•RMN  Scheuermann, Schurman, Sherman
S•RN  Stearn, Stern
S•RR  Scherer, Shearer, Sharer, Sherer, Sheerer
S•S  Sousa, Souza
SM•D  Smith, Smyth, Smythe
SM•D  Schmid, Schmidt, Schmit, Schmitt, Smit
SN•DR  Schneider, Schnieder, Snaider, Snider, Snyder
SN•L  Schnell, Snell
SP•LNG  Spalding, Spaulding
SP•R  Spear, Speer, Speier
SP•RS  Spears, Speers
SR•DR  Schroder, Schroeder, Schroeter
SR•DR  Schrader, Shrader
T•D  Tait, Tate
T•MSN  Thomason, Thompson, Thomsen, Thomson, Tomson
T•RL  Terrel, Terrell, Terrill
TR•S  Tracey, Tracy
V•L  Vail, Vaile, Vale
V•L  Valley, Valle
V•R  Vieira, Vierra
W•D  White, Wight
W•DKR  Whitacre, Whitaker, Whiteaker, Whittaker
W•DL  Whiteley, Whitley
W•DMN  Whitman, Wittman
W•DR  Woodard, Woodward
W•DRS  Waters, Watters
W•GNR  Wagener, Waggener, Wagoner, Wagner, Wegner, Waggoner
W•L  Willey, Willi
W•L  Wiley, Wylie
W•L  Wahl, Wall
W•LBR  Wilber, Wilbur
W•LF  Wolf, Wolfe, Wolff, Woolf, Woulfe, Wulf, Wulff
W•LKNS  Wilkens, Wilkins
W•LKS  Wilkes, Wilks
W•LN  Whalen, Whelan
W•LR  Walter, Walther, Wolter
W•LRS  Walters, Walthers, Wolters
W•LS  Wallace, Wallis
W•LS  Welch, Welsh
W•LS  Welles, Wells
W•LSN  Willson, Wilson
W•N  Winn, Wynn, Wynne
W•R  Worth, Wirth
W•R  Ware, Wear, Weir, Wier
W•RL  Wehrle, Wehrlie, Werle, Worley
W•RNR  Warner, Werner
W•S  Weis, Weiss, Wiese, Wise, Wyss
W•SMN  Weismann, Weissman, Weseman, Wiseman, Wismonn, Wissman

REFERENCES

1. Cox, N. S. M.; Dolby, J. L.: "Structured Linguistic Data and the Automatic Detection of Errors." In Advances in Computer Typesetting (London: Institute of Printing, 1966), pp. 122-125.
2. Cox, N. S. M.; Dews, J. D.; Dolby, J. L.: The Computer and the Library (Hamden, Conn.: Archon Press, 1967).
3. Dolby, J. L.; Forsyth, V. J.; Resnikoff, H. L.: Computerized Library Catalogs: Their Growth, Cost and Utility (Cambridge, Massachusetts: The M.I.T. Press, 1969).
4. Becker, Joseph; Hayes, Robert M.: Information Storage and Retrieval (New York: Wiley, 1963), p. 143.
5.
Davidson, Leon: "Retrieval of Misspelled Names in Airlines Passenger Record System," Communications of the ACM, 5 (1962), 169-171.
6. Blair, C. R.: "A Program for Correcting Spelling Errors," Information & Control, 3 (1960), 60-67.
7. Schwartz, E. S.: An Adaptive Information Transmission System Employing Minimum Redundancy Word Codes (Armour Research Foundation Report, April 1962) (AD 274-135).
8. Bourne, C. P.; Ford, D.: "A Study of Methods for Systematically Abbreviating English Words and Names," Journal of the ACM, 8 (1961), 538-552.
9. Kessler, M. M.: "The 'On-Line' Technical Information System at M.I.T.," in 1967 IEEE International Convention Record (New York: Institute of Electrical and Electronic Engineers, 1967), pp. 40-43.
10. Kilgour, F. G.: "Retrieval of Single Entries from a Computerized Library Catalog File," American Society for Information Science, Proceedings, 5 (1968), 133-136.
11. Nugent, W. R.: "Compression Word Coding Techniques for Information Retrieval," Journal of Library Automation, 1 (December 1968), 250-260.
12. Rothrock, H. I.: Computer-Assisted Directory Search; A Dissertation in Electrical Engineering (Philadelphia: University of Pennsylvania, 1968).
13. Ruecking, F. H.: "Bibliographic Retrieval from Bibliographic Input; The Hypothesis and Construction of a Test," Journal of Library Automation, 1 (December 1968), 227-238.
14. Tukey, J. W.: A Tagging System for Journal Articles and Other Citable Items: A Status Report (Princeton, N.J.: Statistical Techniques Research Group, Princeton University, 1963).
15. Resnikoff, H. L.; Dolby, J. L.: A Proposal to Construct a Linguistic and Statistical Programming System (Los Altos, Cal.: R & D Consultants Company, 1967).

ON-LINE ACQUISITIONS BY LOLITA

Frances G. SPIGAI: former Information Analyst, Oregon State University Library; and Thomas MAHAN: Research Associate, Oregon State University Computer Center, Corvallis, Oregon.
The on-line acquisition program (LOLITA) in use at the Oregon State University Library is described in terms of development costs, equipment requirements, and overall design philosophy. In particular, the record format and content of records in the on-order file, and the on-line processing of these records (input, search, correction, output) using a cathode ray tube display terminal, are detailed.

The Oregon State University Library collection has grown by 15,000-20,000 new titles per year (corresponding to 30,000-35,000 volumes per year) for the past three years, to a total of approximately 275,000 titles (600,000 volumes); continuing serials account for a large percentage of annual "volume" growth. These figures indicate an average input of 60-80 new titles per day. On average, a corresponding number of records are removed each day upon completion of the processing cycle. A like number of records are updated when books and invoices are received. In addition, approximately 200 searches per day are made to determine whether an item is being ordered or to determine the status of an order.

Since the mid-1960's, and with the introduction of time-sharing, a handful of academic libraries (1, 2, 3) and several library networks (4, 5, 6) have introduced the advantages (7) of on-line computer systems to library routines. Most of the on-line library systems use teletypewriter terminals. Use of visual displays for library routines has been limited, although Stanford anticipates using visual displays with IBM 2741 typewriter terminals in a read-only mode (1), and the Library of the IBM Advanced Systems Development Division at Los Gatos, sharing an IBM 360/50, uses an IBM 2260 display for ordering and receiving (8).

On-Line Acquisitions / SPIGAI and MAHAN 277
In addition, an Institute of Library Research study, focusing on on-line maintenance and search of library catalog holdings records, has concluded that even with the limited number of characters available on all but the most expensive display terminals, "... the high volume of data output associated with bibliographic search makes it desirable to incorporate CRT's as soon as possible, in order to facilitate testing on a basis superior to that achievable with the mechanical devices." (9).

Many academic libraries, during shelflist conversion or input of acquisition data, use a series of tags for bibliographic information. Some of these tags are for in-house use, while others presumably are used to aid in the conversion of MARC tape input to the library's own input format. The number of full-time staff required to design and operate automated systems in individual academic libraries typically ranges from seven to fifteen. This does not seem an inordinate range, since most departments of a medium-large to large academic library require a similar-size staff for operational purposes alone.

LOLITA (Library On-Line Information and Text Access) is the automated acquisition system used by the Oregon State University Library. It operates in an on-line, time-shared, conversational mode, using a cathode ray tube (CDC-210) or a 35-KSR Teletype as a terminal, depending upon the operation required. Both types of equipment are in the Acquisitions Department of the Library; each interacts with the University's main computer (CDC-3300, 91K core, 24-bit words), which in turn accesses the mass storage disk (CDC-814, capable of storing almost 300 million characters) through the use of LOLITA's programs in conjunction with the executive program, OS-3 (10).
Under the OS-3 time-sharing system, LOLITA shares the use of the central computer memory and processor with up to 59 other concurrent users; the use of the mass storage disk is also shared with other users of the University's Computer Center. (LOLITA will require approximately 11 million characters of disk storage.) LOLITA's programs are written in FORTRAN and in the assembly language, COMPASS, and are composed of two sets: those which maintain the outstanding order file, and those which produce printed products and maintain the accounting and vendor files.

Several key factors have shaped the design of LOLITA. An on-line, time-sharing system has been operating at OSU since July 1968, and on-line capabilities have been available for test purposes since the summer of 1967. Programming efforts could be concentrated exclusively on the design of LOLITA and an earlier pilot project (11), for no time was needed to design, debug or redesign the operating system software, as was necessary at Washington State U. and the U. of Chicago (2, 12). Heavy reliance was put on assembly language coding for the usual reasons, plus the knowledge that the Computer Center's next computer is to be a CDC-3500, with an instruction set identical to that which the Library now uses. In short, neither the OS-3 operating system nor the assembly language will change for the next few years. An added motivation influencing program design was the desire to minimize response time for the user.

In view of the transient nature of a university library's student and civil service staff, the need for an easily learned and maintained system is paramount. The flexible display format of the CRT allows a machine-readable worksheet, with a built-in, automatic tagging scheme; it obviates the need for a paper worksheet, and thus eliminates a time-consuming, tedious, and error-prone conversion process.
The book request slip contains the source information for input. Proofreading and correction are done on-line at time of input. Alterations can be made at any later time as well. LOLITA has used from 1.5 to 3.0 FTE through the period of design to operation. After an initial testing and data base buildup period, anticipated to last about six months, during which LOLITA will be run in parallel with the manual system, it is expected that the on-order/in-process, vendor, and accounting files will be maintained automatically and that reports and forms currently output by the Acquisitions Department staff will be generated automatically. Specifically, records comprising three files will be kept on-line: 1) the outstanding order file (a slight misnomer, since it includes and will include three types of book request data: outstanding orders, desiderata of high priority, and in-process material); 2) name and address for those vendors of high use (approximately 200 of 2500, or about 8%), and codes and use-frequency counts for all vendors; and 3) accounting data for all educational resource materials purchased by the Oregon State University Library. It should be kept in mind that, although LOLITA is designed for book order functions, the final edited record, after the item has been cataloged, will be captured on magnetic tape as a complete catalog record. Thus, all statistics and information, except circulation data, will be available for future book acquisitions.

This project is being undertaken for two reasons: 1) the Oregon State University Library is concerned that librarians achieve their potential as productive professionals through the use of data processing equipment for routine procedures, and that cost savings may be realized as the Library approaches a total system encompassing all of the technical services routines; and 2) a uniquely receptive Computer Center and a successful on-line time-sharing facility are available.
RECORD FORMAT AND CONTENT

Each book request is described by 27 data elements which are grouped into three logical categories and are displayed in three logical "pages" of a CRT screen. The categories are: 1) bibliographic information, 2) accounting information, and 3) inventory information. Figures 1, 2, and 3 list the data elements in the same sequence as they appear on the CRT screen. Though most data elements listed are self-explanatory, eight require some description.

ORDER NUMBER
FLAG WORD
AUTHOR
TITLE
EDITION
ID NUMBER
PUBLISHER
YEAR PUBLISHED
NOTES

Fig. 1. Bibliographic Information.

ORDER NUMBER
DATE REQUESTED
DATE ORDERED
ESTIMATED PRICE
NUMBER OF COPIES
ACCOUNT NUMBER
VENDOR CODE
VENDOR INVOICE NUMBER
INVOICE DATE
ACTUAL PRICE
DATE RECEIVED
DATE 1ST CLAIM SENT
DATE 2ND CLAIM SENT

Fig. 2. Accounting Information.

ORDER NUMBER
BIB CIT
DATE CATALOGED
VOLUME
ISSUE
LOCATION CODE
LC CLASS NUMBER

Fig. 3. Inventory Information.

Flag Word

This data element indicates the status of a request. The normal order procedure needs no flag word. Exceptions are dealt with automatically by entering an appropriate flag word. As more requests are added to the system, and as more exceptional instances are uncovered, more flag words will undoubtedly be added. To date there are twelve flag words, plus one data element which serves both as a data element and as a status signal. Flag words and the procedures they activate are described below.

CONF: Confirming orders for materials ordered by phone or letter, and for unsolicited items which are to be added to the collection. The order form is not mailed, but used for processing internal to the Library only. Accounting routines are activated.

GIFT: For gift or exchange items, a special series number prefixed by a "G" is assigned and the printed purchase order is used internally only.
This flag word also acts as a signal so that accounting routines will not encumber any money. The primary reason for assigning a purchase order number is to provide a record indexing mechanism (this is also true for HELD orders).

HELD: Selected second-priority orders being held up for additional book budget funds. These order records are kept on line, and are assigned a special series of purchase order numbers, prefixed by an "H." No accounting procedures accompany these orders, although a purchase order is generated and manually filed by purchase order number.

LIVE: HELD orders which have been activated. This word causes a reassignment of the purchase order number to the next number in the main sequence (instead of the "H"-prefixed number) and sets up the natural chain of accounting events. The new purchase order number is then written or typed on the order form, the order date added, and the order mailed.

CASH: Orders for books from vendors who require advance payment. An expenditure, instead of an encumbrance, is recorded.

RUSH: Used for books which are to be rush ordered and/or rush cataloged. RUSH will also be rubber-stamped on the purchase order for emphasis. No special procedures are activated within the computer programs; RUSH is an instruction for people.

DOCS: Used when ordering items from vendors with whom the OSU Library maintains deposit accounts (e.g. Government Printing Office). This causes a zero encumbrance in the accounting scheme; CASH is used to put additional money into deposit accounts.

CANC: Cancelled orders. Unencumbers monies and credits accounts for CASH orders.

REIS: Used to reissue an order for an item which has been cancelled. A new purchase order containing a new order number, vendor, etc. will automatically be issued. Re-input is not necessary; however, changes in vendor number, etc., can be made.

PART: Denotes a partial shipment for one purchase order.
No catalog date can be entered while PART appears as the flag word. INVO will replace PART when the final shipment has been received; CANC will replace PART if the final shipment is not received, and the order is reissued for the portion received.

INVO: When invoice information is entered into the file, INVO is typed in as the flag word. This causes accounting information (purchase order number, vendor code, invoice number, actual price, invoice date, account number) to be duplicated in the accounting file.

KILL: Used to remove an inactive record from the file (cf. DATE CATALOGED).

DATE CATALOGED: A value entered for this data element signals the end of processing. The record is removed from the main file and transferred to magnetic tape. Changes and additions to inventory and bibliographic data elements are anticipated at this final point, to bring the record into line with those of the Catalog Dept.

Author(s)
All authors are to be included in this data element: corporate authors, joint authors, etc. The entry form is last name first (e.g. Smith, John A.). For compound authors, a slash is used as the delimiter separating names (e.g. Smith, John A. / Jones, John Paul).

ID Number
Standard book number, vendor catalog number, etc.

Order Number
The order number is automatically assigned to one of three series depending on the flag word: the main number series with the fiscal year as prefix; the HELD order series with an "H" prefix (stored in the order number index as 101, the "H" being what is printed on the order forms); and the GIFT series with a "G" prefix (likewise stored in the order number index as 102).

Vendor Code
A sample of 18 months of invoice data (obtained from the Comptroller's Office) for the Library resource account number indicates the use of 2200 vendors during that period of time.
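Taken together, the flag words amount to a small dispatch over two decisions: which order-number series a request draws from, and what the accounting routines do with its estimated price. The following is a hypothetical Python sketch of those conventions (function and label names are invented here; LOLITA itself was, of course, not written in Python):

```python
# Sketch of the flag-word conventions: each flag word selects an
# order-number series ("main", "G", or "H") and an accounting action
# applied to the estimated price.

def process_flag(flag, estimated_price):
    """Return (number_series, (accounting_action, amount)) for a request."""
    if flag == "GIFT":                 # "G"-series number; no money involved
        return ("G", ("none", 0))
    if flag == "HELD":                 # "H"-series number; held for funds
        return ("H", ("none", 0))
    if flag == "CASH":                 # advance payment: expend, don't encumber
        return ("main", ("expend", estimated_price))
    if flag == "DOCS":                 # deposit-account vendor: zero encumbrance
        return ("main", ("encumber", 0))
    if flag == "CANC":                 # cancellation releases the encumbrance
        return ("main", ("unencumber", estimated_price))
    # normal orders, CONF., LIVE, RUSH, etc.: encumber the estimated price
    return ("main", ("encumber", estimated_price))
```

Note that RUSH deliberately falls through to the normal case, matching the remark that RUSH "is an instruction for people" rather than for the programs.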
By sorting by invoice frequency and dollar amount, about 200 vendors were identified who either invoiced the Library more than 12 times during this time period (since the invoices tended to contain more than one item for frequently used vendors, the number of purchase orders issued could easily be several times this amount), or whose invoices totalled over $110.00. Of these, 171 have been selected for on-line storage. They will be assigned code numbers 1 to 171, and the names and addresses of these vendors will be included on the computer generated purchase orders. Authority files for all vendors are kept on Rolodex units; one set is arranged alphabetically by vendor name, the other by vendor code.

Account Number
The Library account to which the book is charged. The number is divided into four sections: 1) a two-digit prefix identification for OSU, 2) a four-digit identification for OSU Library resource expenditures, 3) a one- or two-digit identification of the particular Library resource fund account to be charged (e.g. Science, Humanities, Serials, Binding, etc.), and 4) a one- or two-digit code identifying the subject which most closely describes the request. From this data, statistics will be derived which describe expenditures by subject as well as by fund allocation. This will provide a powerful tool for collection building and may also be a political aid in governing departmental participation in book selection.

BIBCIT
A bibliographic citation code which records where Acquisitions Dept. personnel located bibliographic data (L.C. copy, etc.). This information is included on the catalog work slip (4th copy of the purchase order) so that duplicate searching by the Catalog Dept. can be avoided.

LC Classification Number
Refers to the call number as it is assigned by the OSU Catalog Dept.
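Because each section of the account number is positional, the promised expenditure statistics reduce to grouping charges on one section. A small Python sketch, assuming dash-separated sections as in the sample account number 30-1061-6-20 shown in Figure 7 (the helper names are invented for illustration):

```python
# Split the four-section Library account number and tally spending by
# the subject section; section order follows the description above.

def parse_account(number):
    osu_prefix, resource_id, fund_account, subject_code = number.split("-")
    return {"osu_prefix": osu_prefix, "resource_id": resource_id,
            "fund_account": fund_account, "subject_code": subject_code}

def spend_by_subject(orders):
    """orders: iterable of (account_number, price) pairs."""
    totals = {}
    for account, price in orders:
        subject = parse_account(account)["subject_code"]
        totals[subject] = totals.get(subject, 0.0) + price
    return totals
```

Grouping on the third section instead of the fourth would yield the companion report by fund allocation.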
FILE ORGANIZATION
On-Order Record
The operating system for Oregon State University's on-line, time-sharing system reads into memory a quarter page (or file block) of 510 computer words at a time. To best use this system, each on-order (outstanding order) record is composed of a block of 51 computer words (204 6-bit characters), or of linked lists of such blocks. Thus, each quarter page is divided into ten physical records of 51 computer words apiece. For records requiring more than one block, the nearest available block of 51 words within the same 510-word file-block is used; but if none is vacant within the same file-block, the first available 51-word block in the file is used. If none is free, the file is lengthened to provide more blocks.

A bit array is used to keep track of the status (in use, vacant) of records in the main file. In the bit array, each of 20 bits of each 24-bit computer word corresponds to a 51-word block in the main file. As in Figure 4, the 13th bit has a zero value, indicating a vacancy in the 13th 51-word block of the main file; the 14th bit has a value of 1, indicating the 14th 51-word block in the on-order file is in use. A total of 10,120 block locations can be monitored by each file block of the bit array. Records in this file are logically ordered by purchase order number, the arrangement effected by pointers which string the blocks together.

Fig. 4. Bit Array Monitor of Record Block Use in the On-Order File. [Diagram: a 510-word file-block mapped against a one-word bit array; 4 bits of each word are unused.]

Access Points
Order Number
The order number index is arranged by the main portion of the order number, and within that, it is in prefix number sequence. The sequence in Figure 5 illustrates order number index arrangement (as well as the logical arrangement of the on-order file). The order number index allows quick access to selected points within the main file.
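The bit-array bookkeeping can be illustrated in a few lines of Python (a reconstruction for illustration only, not the original implementation, which ran on a 24-bit machine). Bit i of the array corresponds to the i-th 51-word block, with 0 marking a vacancy:

```python
# Track in-use/vacant status of 51-word record blocks: 20 usable bits
# per 24-bit word, the leftmost usable bit corresponding to the first
# block covered by that word (block numbers here are 0-based).

BITS_PER_WORD = 20

def first_vacant(bit_words):
    """Index of the first vacant 51-word block, or -1 if none is free."""
    for w, word in enumerate(bit_words):
        for b in range(BITS_PER_WORD):
            if not (word >> (BITS_PER_WORD - 1 - b)) & 1:   # 0 = vacant
                return w * BITS_PER_WORD + b
    return -1

def set_in_use(bit_words, block):
    """Mark a block in use by setting its bit."""
    w, b = divmod(block, BITS_PER_WORD)
    bit_words[w] |= 1 << (BITS_PER_WORD - 1 - b)
```

When `first_vacant` returns -1, the file is lengthened and the bit array extended, as the text describes.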
Conceptually, the ordered main file is segmented into strings of records whose order numbers fall into certain ranges. More specifically, items whose sequence numbers range from 0 to 4 (ignoring the prefix of the order number) comprise the first segment, 5 to 9 the second, etc. The index itself merely contains pointers to the leading record in each (conceptual) segment. Thus, in the records whose purchase order numbers are shown in Figure 5, there would be pointers to the second (69-124) and sixth (70-125), but not to the others. To reach the fourth (101-124) one follows the index to the second, and then follows the block pointers through the third to the fourth.

Fig. 5. Order Number Index Sequence:
102-118
69-124 (Fiscal Year 1969, Order Number 124)
70-124 (Fiscal Year 1970, Order Number 124)
101-124 (HELD Order Number 124 for the Current Year)
102-124 (Gift Order Number 124 for the Current Year)
70-125
102-125
70-126
(Note: the prefix "H," which is printed on the purchase orders, is represented as the number 101 for internal computer processing; likewise 102 represents the prefix "G.")

Fig. 6. "On Order" Record Organization. [Fields, in order: P.O. Number Forward Pointer; P.O. Number Backward Pointer; Time of Last Update; P.O. Number; Title Forward Pointer; Title Backward Pointer; Pointers to Author(s); Title; Date of Request; Date Ordered; Encumbered Price; Number of Copies; Account Number (2 words); Vendor Number; FLAG Word; Publisher; Date of Publication; Notes; Edition; ID Number; BIBCIT; LC Classification Number; Volume Number; Issue; Location Code; Vendor's Invoice Number; Invoice Date; Actual Price; Date Received; Date First Claim Sent; Date Second Claim Sent.]

Author(s)
The author index is in the form of a multi-tiered inverted tree.
The lowest tier is an inverted index containing the only representation of the authors' names (a name is not stored in the on-order record, Figure 6) and, for each author, pointers to the records of each of his books (Figure 7). The entries for several authors may be packed into a single 51-word block, if space permits. Each higher tier serves to direct the indexing mechanism to the proper block in the next tier below, and to this end as much as needed of an author's name is filed upwards into higher tiers; this method is described in more detail by Lefkovitz (13) as "the unique truncation variable length key-word key."

Fig. 7. Author Index Organization and Access to On Order File. [Diagram: an author index directory (levels 0 and 1) points into the inverted author index (level 0), whose entries carry a control word (number of characters in the record; number of characters in the author's full name; number of titles in the on-order file) and record pointers, e.g. JONES, J → 927; JONES, JOHN PAUL → 928. Record 927 in the on-order file shows dates 10/20/69 and 10/29/69, price $4.95, and account number 30-1061-6-20.]

Title
Not yet programmed.

ON-LINE RECORD PROCESSING
Record Creation
After a number of new book requests have been searched to determine their absence from OSU's collection and after they have been bibliographically identified, they are batched for vendor assignment and readied for entry into the on-line file of book requests via the CRT (Figure 8).

Fig. 8. Book Request Processing. [Flowchart not reproduced.]

LOLITA's starting page is obtained by typing the word LOLITA on the CRT screen. The text illustrated in Figure 9 is then displayed on the screen of the CRT. When "1" is typed in, indicating a wish to create a record, the first data element of the first page of input appears (Figure 10).
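The tiered author lookup described above can be approximated in miniature: an upper tier of truncated boundary keys chooses a lower-tier block, and that block maps full names to record numbers. The sketch below compresses the multi-tier scheme to two tiers (the boundary keys and block layout are invented; record numbers 927, 928, and 1282 are taken from Figure 7):

```python
import bisect

# Tier 1: truncated keys marking the lowest name filed in each tier-0 block.
directory = ["A", "JONES, JOHN PAUL", "K"]

# Tier 0: inverted-index blocks mapping author names to record numbers.
blocks = [
    {"JONES, J": [927]},
    {"JONES, JOHN PAUL": [928], "JOP, K": [1282]},
    {},
]

def lookup(author):
    """Return the on-order record numbers filed under an author name."""
    i = bisect.bisect_right(directory, author) - 1   # choose the tier-0 block
    return blocks[i].get(author, [])
```

Only as much of each name as is needed to separate adjacent blocks would be "filed upwards" into the directory, which is the point of the unique-truncation scheme.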
(Since the majority of records do not need a flag word upon input, the flag word fill-in line appears only on a redisplay of this page, and the flag word may be inserted at that time.)

Fig. 9. "Starting" Page of Function Choices:
MAIN FILE
PLEASE INDICATE A CHOICE
1. CREATE A NEW ENTRY
2. LOCATE AN EXISTING ENTRY
9. TERMINATE ALL PROCESSING

Fig. 10. First Data Element Displayed in New Record Creation Process:
AUTHOR(S): EXAMPLES: JONES; DEQUINCEY, THOMAS; WASHINGTON, BOOKER T.; ADAMS, JOHN QUINCY / DOE, JOHN; AMERICAN MEDICAL ASSOCIATION

At this point the user can go in one of two directions. The first page of input information may be entered one data element at a time, each element being requested in a tutorial fashion by LOLITA. Alternately, all of the first page data may be input at once, with data elements separated by delimiters. The user can switch from one method to the other at any point.

A control key (RETURN) is the delimiter used to signal the end of each data element, and, at the same time, RETURN repositions the cursor (which indicates the position of the next character to be typed on the CRT screen) to the location of the next data element to be filled in. Another control key (SEND): 1) serves as a terminal delimiter, and 2) transmits data on the screen to the computer, thereby 3) triggering the continuation of processing until the next screen display is generated. Thus, with page one, data elements are displayed, filled in, and sent one at a time in the tutorial approach, or all seven data elements are typed in at once, a RETURN mark following items 1-6, then sent after the last data element. RETURN or SEND must be used with each data element, even with those for which there is no information. This secures the sequence of element input, thus providing an easy (for the user) and automatic way of tagging elements for any future tape searches to provide statistics or analytical reports.
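Because every element is closed with RETURN even when empty, the element sequence is fixed and each value can be tagged by position alone. A Python sketch of that tagging (a newline stands in for the RETURN key; the sample values are invented):

```python
# Split a page-one screen on the RETURN delimiter and tag each value by
# its position in the fixed element sequence of Figure 1.

PAGE1_ELEMENTS = ["AUTHOR", "TITLE", "EDITION", "ID NUMBER",
                  "PUBLISHER", "YEAR PUBLISHED", "NOTES"]
RETURN = "\n"   # stand-in for the terminal's RETURN control key

def tag_page(screen_text):
    return dict(zip(PAGE1_ELEMENTS, screen_text.split(RETURN)))

# Empty elements still occupy their slot, so PUBLISHER lands correctly
# even when EDITION and ID NUMBER are blank:
page = tag_page("JONES, JOHN PAUL\nSAILING\n\n\nSCRIBNER\n1969\n")
```

Dropping the mandatory delimiter for empty elements would save keystrokes but lose the positional tagging that makes later tape searches trivial.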
In particular, this process obviates all content restrictions on variable (i.e., free-form) items. Each of the pages is redisplayed after input, and corrections can be made at this time. The CRT is used for all input, and its write-over capabilities are utilized for corrections, as compared with the "read-only" use planned for the CRT displays in Stanford's BALLOTS (1). Except for the flag word, all the data elements on the first page are variable in length and unrestricted as to content. Data elements on pages 2 and 3 (Figures 2 and 3) are more nearly fixed in length; thus with these pages, a whole page at a time is always filled in and sent: the tutorial function is inherent in the display. The concluding display is shown in Figure 11.

Fig. 11. Review Option: SEND IF ALL DONE, TYPE 1-3 TO REVIEW PAGES.

Because batched searching and input are assumed, when one search or input is finished, the program recycles to continue searching or inputting without going back to the starting page (Figure 9) each time.

Record Search
Searching programs have been completed which will search by order number and by author. Title searching will be implemented within the next few months, although a satisfactory scheme for title searching (improving on manual methods, yet economical) has not been uncovered. Methods suggested or used by Ames, Kilgour, Ruecking, and SPIRES have been noted (14, 15, 16, 17).

The procedure for searching within the outstanding order file begins with the display of choices shown in Figure 9. One types a "2," indicating a desire to locate an existing entry, and the text shown in Figure 12 is displayed on the CRT screen. At this point one chooses to search either by order number or by author. If one selects a valid order number representing a request record, the first page of that record, containing bibliographic information, is displayed.
This is followed by the display shown in Figure 11, so that accounting and inventory information may also be reviewed. For the user's convenience the order number is displayed in the upper right-hand corner of each of the three pages, both upon record input and search redisplay.

To search by author, one types the author's name on the second line of Figure 12, using the same format as that used in record creation.

Fig. 12. Display of Search Options:
________ : ORDER NUMBER
________ : AUTHOR
SUPPLY ONE OF THE ABOVE (START ON THE APPROPRIATE LINE)

If the author has only one entry in the outstanding order file, the first page of the entry will appear, etc. (as in the order number search above). If the author entered has more than one entry in the on-line file, the information depicted in Figure 13 will be displayed on the screen of the CRT.

Fig. 13. Display of Multiple Titles on File for One Author:
________ : ENTER NUMBER OR 'NF' (NOT FOUND)
1. NIGHT OF THE IGUANA
2. THE MILK-TRAIN DOESN'T STOP HERE ANYMORE
3. CAT ON A HOT TIN ROOF
n. THE GLASS MENAGERIE

If the requested title is one of the titles displayed, one types its number and the record for that title will be displayed. If the title isn't among those displayed, typing NF results in a redisplay of the text in Figure 12 so that searching can continue.

For personal authors, variant forms of the name may be located using the following procedure. The word OTHERS is entered at the top of the screen after an unsuccessful author search, so that a search for author J. P. Jones would find all documents by John Paul Jones, Joseph P. Jones, J. Peter Jones, etc., as well as J. P. Jones. A search for John P. Jones would find all documents by J. P. Jones, John Jones and J. Peter Jones as well as John P. Jones.
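The OTHERS behavior amounts to a compatibility rule on names: surnames must match exactly, and each given name must agree with its counterpart at least to the initial, with a missing given name matching anything. A Python sketch of such a rule (a reconstruction of the behavior described, not OSU's code):

```python
# Match variant forms of a personal name: "JONES, J. P." is compatible
# with "JONES, JOHN PAUL", "JONES, J. PETER", etc.

def _part_ok(a, b):
    a, b = a.rstrip("."), b.rstrip(".")
    if len(a) == 1 or len(b) == 1:      # an initial matches any expansion
        return a[0] == b[0]
    return a == b                       # two full names must agree exactly

def variants_match(name1, name2):
    """Names in 'LAST, GIVEN GIVEN...' form."""
    last1, _, given1 = name1.partition(",")
    last2, _, given2 = name2.partition(",")
    if last1.strip() != last2.strip():
        return False
    # zip stops at the shorter list, so a missing given name matches
    return all(_part_ok(a, b) for a, b in zip(given1.split(), given2.split()))
```

Under this rule John P. Jones matches John Jones (missing middle name) and J. Peter Jones (initials agree), but not Paul Jones.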
Record Changes
Additions and corrections to the original record are made by first locating the record (by order number, author, or eventually, title), adding to the data elements or writing over them (for corrections), and transmitting the information. Examples of this procedure include: 1) entering the date received, 2) recording the vendor invoice number, invoice date, and actual price, and 3) inserting or changing a flag word. In addition, after an item has been cataloged, the record is revised to include catalog data, as well as to exclude extraneous order notes.

Output
Aside from the CRT displays, output is in three forms: off-line tape, printed forms, and on-line files (Figure 14). Examples of output are library purchase orders, accounting reports, vendor data, and records of cataloged items. The number of potential reporting uses is limited only by money and imagination.

Fig. 14. Output from On-Line On Order File Input. [Flowchart not reproduced.]

Fig. 15. Purchase Order. [Form fields include: ORDER NUMBER, DATE, ID NUMBER, AUTHOR, TITLE, PUBLISHER, VENDOR NAME, VENDOR ADDRESS, VOLUMES, EDITION, ESTIMATED PRICE, NO. OF COPIES, VENDOR CODE, ACCOUNT, DATE OF PUB., FLAG, GIFT OR HELD ORDER NO., BIBCIT.]

The purchase order, shown in Figure 15, is composed of four copies: 1) the vendor's copy, to be retained by him, 2) a vendor "report" copy, 3) the copy which is kept as a record in the OSU Library, and 4) a catalog work slip to be forwarded to the Catalog Department with the book. Purchase orders are printed on the Library's Teletype, which is equipped with a sprocket feed. Orders can also be printed on the line printer in the Computer Center.
While this is a slightly cheaper data processing procedure, since no terminal costs are incurred, convenience and security have produced a victory of "economics over economies" (18), and the librarian's time has been considered in the total scheme.

For gift items, purchase orders are produced as the cheapest means of preparing a catalog work slip. HELD purchase orders are produced and manually filed in purchase order number sequence, but when their status is changed to LIVE, the old numbers are automatically replaced by purchase order numbers in the main series. These new numbers are written onto the purchase orders, along with any other changes, and the orders are mailed. The flag word LIVE also activates accounting procedures.

There are two sets of accounting reports. The first is generated when the purchase orders are issued and contains tabulated information for the Library's Bookkeeper, the Head of Business Records in the Acquisitions Dept., and the Comptroller of the Oregon State System of Higher Education. The second, summary report is issued after the book and invoice have been received and contains additional information pertinent to the invoicing procedure; this report has the same distribution as the first. Periodic reports are planned for the Library's subject divisions summarizing expenditures by account number, reference area, and subject. Programming for this has not yet been done. A frequency count will be stored with each vendor code, and periodic listings will be printed for use in retaining vendors.

After an item has been cataloged, the catalog work slip and a slip equivalent to a main-entry catalog card are sent to Acquisitions, and all remaining information and changes are recorded in the on-line record. This record is then transferred to a file from which it is dumped onto magnetic tape. This off-line file will be used for statistical analyses and will be the start of a machine readable data base.
Future plans will, of course, depend on funding; however, two logical steps which could follow immediately and require no additional conversion are: 1) additional computer generated paper products (charge cards, catalog cards, book spine labels, new book lists, etc.), and 2) a management information system using acquisition and cataloging data. The construction of a central serial record in machine readable form would produce many valuable by-products. A program for the translation of the MARC II test tape has been written which causes these records to be printed out on the Computer Center's line printer; and since a subscription to the MARC tapes is now available to OSU for test purposes, its advantages and compatibility with LOLITA will be investigated as time permits.

Unsolved problems, aside from those which everyone working in a data processing environment faces (e.g. system and hardware breakdown, continued project funding, and lengthy delivery times for hardware), include: 1) the widely varying system response times (commonly from a fraction of a second up to 60 seconds; usually 2-15 seconds); 2) the lack of personnel skilled in both data processing and library techniques; 3) the limited print train currently available on the line printer (62 character set); and 4) bureaucratic policy, which can render the most sophisticated plans for automation unfeasible if properly applied. It is recognized that all these problems can be solved by money, time, and priorities. Meanwhile, the period of in-parallel operation will be valued as a time to educate, to test, to gather statistics, and to further refine the programs and procedures which comprise LOLITA.

EVALUATION
Preliminary input samples indicate that a daily average of from 8 hours, 20 minutes, to 10 hours and 45 minutes will be necessary for input, searches, updating, and corrections using the CRT.
An additional 3 hours per day of terminal time using the Teletype will be required to produce the purchase orders, answer rush search questions if the CRT is busy, and activate the daily batch programs (accounting reports, etc.).

The sad economic plight of most libraries causes librarians to cast an especially suspicious eye on the costs of automation; a few words on OSU's data processing costs may be of interest. The cost of the total development effort to produce LOLITA is under $90,000 (though considerably less was actually expended), or an average annual cost of $30,000 over a three-year period. This compares favorably with average annual incomes of from $50,000 to over $300,000 in Federal funds alone for other on-line library acquisition projects in universities (19, 20, 21, 22). A total of 6.75 man-years was required to design LOLITA. The 6.75 man-years comprises 2.5 years of programming, 3.25 years of systems analysis, coordination and documentation, and 1.0 year of clerical work, and represents the efforts of four students and six professional workers. This total does not include the time spent by Acquisitions Department personnel in reviewing LOLITA's abilities or in learning to use the terminals.

Current data processing rates charged by the Computer Center include the following: CRT rental, $100/mo.; CPU time, $300/hr.; terminal time, $2.00/hr.; on-line storage, 15c/2040 characters/mo. The Teletype has been purchased, thus only local phone line charges are incurred. The on-line system is available for use from 7:30 A.M. to 11:00 P.M. each weekday, and from 7:30 A.M. to 5:00 P.M. on Saturday, which more than covers the 8-5 schedule of the Acquisitions Department.

ACKNOWLEDGMENTS
The work on which this paper is based was supported by the Administration, the Computer Center and the Library of Oregon State University. Special mention is due Robert S. Baker, Systems Analyst, OSU Library, and Lawrence W. S.
Auld, Head, Technical Services, OSU Library, for their extensive participation in the LOLITA Project and for their many suggestions which benefitted the final version of this paper. Hans Weber, Head, Business Records, OSU Library, also contributed much to LOLITA's design.

REFERENCES
1. Veaner, Allen B.: Project BALLOTS: Bibliographic Automation of Large Library Operations Using a Time-Sharing System. Progress Report, March 27, 1969-June 26, 1969 (Stanford, California: Stanford University Libraries, 29 July 1969). ED-030 777.
2. Burgess, Thomas K.; Ames, L.: LOLA: Library On-Line Acquisition Sub-System (Pullman, Washington: Washington State University, Systems Office, July 1968). PB-179 892.
3. Payne, Charles: "The University of Chicago's Book Processing System." In Stanford Conference on Collaborative Library Systems Development: Proceedings, Stanford, California, October 4-5, 1968 (Stanford, California: Stanford University Libraries, 1969). ED-031 281, 119-139.
4. Pearson, Karl M.: MARC and the Library Service Center: Automation at Bargain Rates (Santa Monica, California: System Development Corporation, 12 September 1969). SP-3410.
5. Nugent, William R.: "NELINET - the New England Library Information Network." In Congress of the International Federation for Information Processing (IFIP), 4th: Proceedings, Edinburgh, August 5-10, 1968 (Amsterdam: North-Holland Publishing Co., 1968), G28-G32.
6. Blair, John R.; Snyder, Ruby: "An Automated Library System: Project LEEDS," American Libraries, 1 (February 1970), 172-173.
7. Warheit, I. A.: "Design of Library Systems for Implementation with Interactive Computers," Journal of Library Automation, 3 (March 1970), 68-72.
8. Overmyer, LaVahn: Library Automation: A Critical Review (Cleveland, Ohio: Case Western Reserve University, School of Library Science, December 1969). ED-034 107.
9.
Cunningham, Jay L.; Schieber, William D.; Shoffner, Ralph M.: A Study of the Organization and Search of Bibliographic Holdings Records in On-Line Computer Systems: Phase I (Berkeley, California: University of California, Institute of Library Research, March 1969). ED-029 679, pp. 13-14.
10. Meeker, James W.; Crandall, N. Ronald; Dayton, Fred A.; Rose, G.: "OS-3: The Oregon State Open Shop Operating System." In American Federation of Information Processing Societies: Proceedings of the 1969 Spring Joint Computer Conference, Boston, Mass., May 14-16, 1969 (Montvale, New Jersey: AFIPS Press, 1969), 241-248.
11. Spigai, Frances; Taylor, Mary: A Pilot - An On-Line Library Acquisition System (Corvallis, Oregon: Oregon State University, Computer Center, January 1968). cc-68-40, ED-024 410.
12. University of Chicago Library: Development of an Integrated, Computer-Based, Bibliographical Data System for a Large University Library (Chicago, Illinois: University of Chicago Library, 1968). PB-179 426.
13. Lefkovitz, David: File Structures for On-Line Systems (New York: Spartan Books, 1969), pp. 98-104.
14. Ames, James Lawrence: An Algorithm for Title Searching in a Computer Based File (Pullman, Washington: Washington State University Library, Systems Division, 1968).
15. Kilgour, Frederick G.: "Retrieval of Single Entries from a Computerized Library Catalog File," Proceedings of the American Society for Information Science, 5 (New York: Greenwood Publishing Corp., 1968), 133-136.
16. Ruecking, Frederick H., Jr.: "Bibliographic Retrieval from Bibliographic Input; the Hypothesis and Construction of a Test," Journal of Library Automation, 1 (December 1968), 227-238.
17. Parker, Edwin B.: SPIRES (Stanford Physical Information REtrieval System). 1967 Annual Report (Stanford, California: Stanford University, Institute for Communication Research, December 1967), 33-39.
18.
Kilgour, Frederick G.: "Effect of Computerization on Acquisitions," Program, 3 (November 1969), 100-101.
19. "University Library Systems Development Projects Undertaken at Columbia, Chicago and Stanford with Funds from National Science Foundation and Office of Education," Scientific Information Notes, 10 (April-May 1968), 1-2.
20. "Grants and Contracts," Scientific Information Notes, 10 (October-December 1968), 14.
21. "University of Chicago to Set Up Total Integrated Library System Utilizing Computer-Based Data-Handling Processes," Scientific Information Notes, 9 (June-July 1967), 1.
22. "Washington State University to Make Preliminary Library Systems Study," Scientific Information Notes, 9 (April-May 1967), 6.

LISTINGS OF UNCATALOGED COLLECTIONS
Fred L. BELLOMY: Head, and Lies N. JACCARINO: Systems Analyst, Library Systems Staff, University of California, Santa Barbara, California.

An operational computerized system used by the UCSB Libraries produces listings of bibliographic data about items in collections where full cataloging treatment is not considered justified. The system produces listings of the brief bibliographic records sorted by any of the data elements in the record, including up to twenty-five subject terms. Of special interest are the authority listings of descriptors and the coordinate indexes to the full records.

INTRODUCTION
This short report was extracted from the more comprehensive document, Listings of Uncataloged Collections - Systems Documentation, Santa Barbara: University of California, December 1969, Library Systems Document LS 69-11.

The Library Staff at the University of California at Santa Barbara is using computerized procedures to produce a variety of listings of bibliographic information about items in uncataloged collections.
Although many similar systems undoubtedly have been developed to do similar jobs, this one is noteworthy in two respects: first, in being well documented, and second, because its versatility has been tested on three totally different collections. The machine programs, written in PL/1, were first used to list the UCSB Art Exhibition Catalogs Collection, but they were designed to be versatile so that they could be applied easily to other similar collections as well. At present these programs are also being used at UCSB to list the documentation of marine pollution due to major oil spills (The Oil Spill Information Center). The programs have also been successfully tested on about one hundred items of the UCSB collection of Early American Trade Catalogs. Application to other collections (such as the phono record collection or video tape file) has been studied and is feasible.

Although it is usually difficult to use programs that were not specifically tailored for a particular user, these programs represent at least one instance where attention to versatility and the probable broad scope of possible applications has resulted in a system capable of producing listings for different collections at any location where there is access to an IBM System 360 computer and a staff capable of adapting about a half dozen Job Control Language (JCL) statements.

The machine written listings of catalogs provide a limited amount of bibliographic data about each item in the collection. The advantage of such listings is the expedition with which a new, not-yet-cataloged collection can be made accessible.

DESCRIPTION
As a first step in obtaining a listing, library staff members examine each item in the collection to be listed and transcribe the necessary bibliographic data to an input work sheet (Figure 1). Information on the work sheet is keypunched into one or more punched cards.
These records, once in the computer, can be sorted in various ways to provide a variety of listings. Master listings can be produced at desired intervals (e.g., monthly). Multiple copies of each list can be produced, and the sheets of computer printout are a convenient form of access to the material when individual copies of the list are separated and placed in hard-board binders for distribution to the Library Service Desks. Program "packages" (i.e., JCL decks) contain many comment cards, so that each package is self-explanatory after very little instruction. To keep the system simple for the librarians who use it, separate "packages" have been prepared for each different listing (or combination of listings) decided on. Listings of the full records (see Figure 2) have been prepared now by 1) classification letter, 2) accession number, 3) year of "exhibit," 11) main and secondary subjects, 12) agency name, 13) agency city, and 17) author. Obviously, others are possible. Listings of subjects (Figure 3) and agencies, with the number of times each was used, accompany full record listings by subject and agency. These are used as authority lists for future term assignments. Another package, ARTINDX, is used to produce coordinate indexes by subject, agency, author and others. An example of the subject index is shown in Figure 4. Such indexes are used with a master listing of the full bibliographic records in accession number order. This method reduces the amount of printout required to provide many different description approaches to the collection.

CATALOG COLLECTIONS Input Worksheet

                                        Column
1. Classification letter                2-3
2. Accession Number                     4-8
3. Year of Exhibit                      9-12
4. B&W Illustration No.                 13-15
5. Color Illustration No.               16-18
6. Chronology (Y=yes, N=no)             19
7. Bibliography No. Pages               20-21
8. Bib. Ft. Notes (Y=yes, N=no)         22
9. Pages No.                            23-25
10. Spare                               26-30
11. Subject(s) (separate with ";")      Var
12. Agency name                         Var
13. Agency City                         Var
14. Agency State                        Var
15. Agency Country                      Var
16. Title                               Var
17. Author                              Var
18. Spare                               Var

Note: Data elements 1-10 are fixed field and are to be keyed into the card columns indicated. The card sequence number is always keyed into column 1. Data elements 11-18 are variable field and each is to be terminated with a quote mark ("). Every record must contain exactly eight of these end-of-variable-field marks (").

Fig. 1. Input Worksheet for Catalog Collections.
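The card layout above lends itself to a compact sketch. The following Python fragment is offered purely for illustration (the production programs were written in PL/1, and the field names here are our own, not UCSB's): it parses one record keyed from the worksheet, reading data elements 1-10 from their fixed columns and splitting elements 11-18 on the quote-mark field terminators.

```python
# Illustrative parse of one catalog-collection record keyed from the
# input worksheet (Python sketch; field names are ours, not UCSB's).

FIXED_FIELDS = [              # (name, first col, last col), 1-based inclusive
    ("seq",        1,  1),    # card sequence number (always column 1)
    ("class",      2,  3),    # 1. classification letter
    ("accession",  4,  8),    # 2. accession number
    ("year",       9, 12),    # 3. year of exhibit
    ("bw_illus",  13, 15),    # 4. B&W illustration number
    ("col_illus", 16, 18),    # 5. color illustration number
    ("chron",     19, 19),    # 6. chronology (Y/N)
    ("bib_pp",    20, 21),    # 7. bibliography, number of pages
    ("bib_ftn",   22, 22),    # 8. bibliographic footnotes (Y/N)
    ("pages",     23, 25),    # 9. number of pages
    ("spare",     26, 30),    # 10. spare
]
VAR_FIELDS = ["subjects", "agency", "agency_city", "agency_state",
              "agency_country", "title", "author", "spare2"]   # elements 11-18

def parse_record(card: str) -> dict:
    rec = {name: card[lo - 1:hi].strip() for name, lo, hi in FIXED_FIELDS}
    # Elements 11-18 follow the fixed fields; each ends with a quote mark,
    # and every record must contain exactly eight of these field marks.
    parts = card[30:].split('"')
    if len(parts) != 9:
        raise ValueError("record must contain exactly eight field marks")
    rec.update(zip(VAR_FIELDS, (p.strip() for p in parts[:8])))
    # Multiple subject terms are separated with ";" on the worksheet.
    rec["subjects"] = [s.strip() for s in rec["subjects"].split(";") if s.strip()]
    return rec
```

A record keyed across several punched cards would first be reassembled in card-sequence order (column 1) before being parsed this way.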
Fig. 2. Sample Listing of Full Record (art exhibition catalogs in date sequence, showing agency, accession number, subjects, and notes/author/title for each record).

Fig. 3. Subject Listing (subject terms with the number of times each was used).

Fig. 4. Subject Index (Oil Spill Information Center subject index, October 1970).

BOOK REVIEWS

A Computer Based System for Reserve Activities in a University Library, by Paul J. Fasana (and others). New York: Systems Office, The Libraries, Columbia University, 1969. (Final Report, Project No. 7-1129, U.S. Office of Education, Bureau of Research) iii, 50, (53) pp.

One opens this report wondering whether circulation of reserve books to readers is included in the computer-based system, and assuming that such circulation would have to be handled on-line, because the short duration of reserve loans, often on the order of one hour, would not seem to fit well with batch processing. It is soon made clear that on-line circulation was set as a goal of the second phase of the system; only the first phase is described here, though somewhat tantalizingly it is stated that one of the aspects of phase two already developed or experimented with is "a fully operational off-line circulation system." What is reported here, however, in commendable fullness, is a system, called Reserves Processing, which greatly facilitates the processes of putting books on reserve, taking them off, and producing reference lists.
Emphasis has been placed on developing a generalized system that can be used in different units of the Columbia University Libraries and, with necessary modifications, in other academic libraries. The preferred form of data entry is on-line with an IBM 2741 terminal. Other functions (and backup systems for data entry) are off-line; the master reserve file is stored on an IBM 2311 disc pack. One section of the report describes the system for those who are not computer specialists; this includes copies of forms and form letters. Other sections give technical documentation, including a flow chart, details of format, and actual listings of four programs written in F level COBOL for OS/360. The report will be valuable to anyone considering the problem of reserve books; its successor covering phase two will be eagerly awaited by all those interested in circulation as well.

Foster M. Palmer

Involvement of Computers in Medical Sciences, compiled by K. M. Shahid, H. J. Vander Aa, and L. M. C. J. Sicking. Amsterdam: Swets and Zeitlinger, 1969. 227 pp.

The compilers of this volume have brought together the significant abstracts of the literature that pertains to the use of the computer in present-day medicine. This volume will serve a valuable purpose for those interested in the computer and its applications in the medical sciences, as it will give a broad overview of computer usage in medicine and many closely allied fields. As computer uses grow in frequency and diversity, a review of this type becomes increasingly valuable to those interested in the field.

John A. Prior

Translations Journals; List of Periodicals Translated Cover-to-Cover, Abstracted Publications and Periodicals Containing Selected Articles, compiled by Mrs. A. S. de Groot-de Rook. Delft: European Translations Centre, 1970. 44 pp. $2.00.

This book is an updated bibliographical list and union catalog intended as a guide to scientific and technical journals in translation.
Entries, arranged alphabetically by original title, contain bibliographical details, publisher and price. The list includes both current and terminated periodicals (about 400 entries). There are cross references from the translated title to the original title. At the end of each entry selected locations and their holdings are listed. The holdings of National Translation Centres and/or libraries adhering to the European Translations Centre are also included. A "List of Publishing Houses," the agents from which to order, is included along with mailing addresses. There is also a "List of Holding Libraries" with addresses. Only non-Western language periodicals for which there are Western language versions are included. No non-Western journals that contain Western language articles or journals originally published in Western languages are included.

Irene Braden Hoadley

Proceedings of the 1969 Clinic on Library Applications of Data Processing, edited by Dewey E. Carroll. Urbana: University of Illinois Graduate School of Library Science, 1970. 144 pp. $5.00.

The volume contains eleven invited papers presented at the seventh annual Clinic on Library Applications of Data Processing held April 27-30, 1969, at Urbana, Illinois. As in the preceding volumes in this series, the purpose is to report actual experience, in case history form, of applications of data processing technology to areas of library operations. The book is a source of information on how particular problems were handled within a particular environment. Library operations which receive particular attention are the usual ones: acquisitions, cataloging and circulation. "Library Networks: Cataloging and Bibliographic Aspects," by Ann Curran, presents actual problems encountered in the development of an operating network as well as many thought-provoking questions. Stephen Salmon's article on automation of the Library of Congress Card Division is very informative.
Also of interest are two articles dealing with PL/I as a programming language for library applications. Several articles are beginning to describe on-line applications of data processing for libraries as well as batch processing and the optimal mixes of both. Unlike some of the preceding volumes, this volume has a very fine overall index. There is an error in the name of one of the authors (James B. Corbin should be John B. Corbin). No participant discussion is included.

Kenneth J. Bierman

Techniques of Information Retrieval, by B. C. Vickery. Hamden, Conn.: Archon Books, 1970. 262 pp. $11.00.

This book is a lucidly written text dealing primarily with manual indexing and the manual construction of document profiles. There is a wealth of information about classification systems and their use for indexing purposes, and two particularly interesting chapters that give illustrations of some of the work going on at information centers, and of some of the basic concepts arising in systems evaluation, respectively. The present reviewer finds this book difficult to deal with, since the temptation continuously arises to substitute one's own aims for those of the author. To my mind, this book does not deal with the "techniques of information retrieval" as commonly understood. The latter would surely include a thorough description of automatic indexing procedures, automatic classification, on-line search systems, modern storage allocation methods, fast search systems, and so on; and while some of these concepts are mentioned in passing, the reader surely cannot obtain an accurate picture in these areas. Rather, the book deals with conventional indexing procedures, and will likely be of value for the conventional training of librarians and documentalists. The text is easy to read, and includes plenty of examples, as well as some examination questions and exercises.
Still, this reviewer wonders whether a more modern book might not have been published in 1970, particularly if the title includes the phrase "information retrieval." To this question, the author would likely answer (as on page 17) that the: "... analysis and synthesis of information, though it may be aided by the machine, can only be carried out effectively by skilled human labor;" or again (as on page 43): "... if we cannot say for certain what is the optimum human selection of index terms in a particular situation, then one cannot evaluate a machine selection." Statements such as these are easy to generate, particularly if one is not obliged to furnish any proof for one's assertions. In any case, they serve to illustrate the author's viewpoint and his particular choice of subject matter. To summarize, this text appears to be an excellent introduction to conventional documentation work, with emphasis on manual document analysis and indexing. It does not, unfortunately, give a reasonable preview of the fundamental changes which will inevitably occur in the information and documentation fields over the next ten or twenty years.

G. Salton

A MARC BASED SDI SERVICE

Kenneth John BIERMAN: Data Processing Coordinator, Oklahoma Department of Libraries; and Betty Jean BLUE: Programmer, Information and Management Services Division, State Board of Public Affairs, Oklahoma City, Oklahoma.

An operating SDI service utilizes the weekly MARC II tapes distributed by the Library of Congress. The history, creation, operation, uses, advantages, disadvantages, cost and future plans for the SDI service are discussed, and flow charts (system and detail) and sample output given.

INTRODUCTION

SDI (Selective Dissemination of Information) is the distribution of new information to individuals or groups according to their expressed interests.
SDI as a service of libraries is not a new concept, for libraries have been providing such specialized current awareness services for years, both formally and informally, in such ways as routing proof slips to interested persons, departments, etc. Such services have been provided most commonly in special libraries, but are not uncommon in public and academic libraries as well (1). "Although the practice of SDI is not new, its application in libraries has been generally irregular, informal, and very limited, depending variously on the memory, willingness and free time of the librarian and contingent on the desire and ability of the patron to make his interest known" (2). With the interest in library applications of data processing has come an interest in automated SDI services. "All computer based SDI systems work on the same principle and include two basic elements: subject interest profiles for the users and a machine readable file of indexed bibliographic records of current materials." (3) The Annual Review of Information Science, Volume 4, presents a summary of many different types of SDI systems (4) as well as an excellent bibliography (5). Additional recent automated SDI services are described in the literature (6-12). The purpose of this article is to describe an operating MARC-based SDI system, the environment within which it operates, and some of the thinking which led to its creation.

BACKGROUND INFORMATION

The Oklahoma Department of Libraries is the designated State Library Agency in Oklahoma. As such it has two primary statutory responsibilities: 1) provision of library services to State Government, including the Executive, Legislative and Judicial Branches of Government and the agencies of State Government, and 2) state-wide responsibility for total library development, including the development of multi-county library systems.
The Department provides a great variety of library services to fulfill these functions, one of which is the maintenance of a collection of materials with three subject specialty areas: 1) law (primarily of use to the Judicial and Legislative Branches), 2) political science (primarily of use to the Executive and Legislative Branches), and 3) library science (primarily of use to the Department's own staff and the librarians throughout the State). In addition, a general collection and a general reference collection are maintained, primarily for use by the Executive and Legislative Branches of State Government and as back-up and supportive collections to the libraries throughout the State. With the beginning of the MARC Distribution Service March 29, 1969, the Department implemented a service for other libraries around the State by creating and maintaining a MARC data base for the use of all libraries within the State (13). After the data base had been created and was working satisfactorily, the Department considered what it could do with MARC to help its own operation. The five following paragraphs discuss projects suggested and considered.

The first was design and implementation of the original input of a selected portion of the collection (the law collection, for example) in MARC format. The project would be a beginning of putting the entire collection in MARC format and would yield interim useful products as well (a book catalog, for example). However, it was decided that this project would be premature for two reasons. If retrospective conversion were going to be done nationally (14), it would be foolish for the Department to duplicate the work at the local level; and before the Department should expend money putting material into MARC format, it should demonstrate the usefulness of MARC with the already existing records being distributed by the Library of Congress.
A second project considered was design and implementation of conversion of the storage of the MARC data base from the sequential (tape) system (13) to a direct-access system (disk, for example). Certainly from the standpoint of economic use of the data base such conversion is desirable, and for providing multiple access points to the data base (author, title, subject, etc.) it is essential. It was decided, however, that before additional funds and energy should be expended to improve the storage and retrieval of MARC records, the usefulness of the presently available individual records themselves should be demonstrated. Direct access storage and retrieval was deferred until the completion of the SDI system. Work has recently begun on the direct access project (15).

Design and implementation of an acquisitions module for the Department was considered because the Department was preparing to re-design its acquisition system. However, to have a meaningful automated MARC-based acquisitions system it would be necessary to search the data base by a number of entry points (author, title, etc.), which would require the direct access system described above.

A fourth project considered was design and implementation of a catalog card set and processing aids (label, etc.) production module for the Department. Because the Department does centralized processing for the library systems throughout the State, the Processing Center is a critical area within the library; therefore, this alternative had the most immediate appeal. It was decided that the Department was not financially prepared to undertake so ambitious a project yet. It was felt that a less ambitious project should be undertaken first to gain knowledge and experience which would be essential in a successful catalog card and processing aids production system.
The service would be immediately useful, both to the Department and its clientele; was not dependent upon the maintenance of a large data base; and could be set up and operated quickly and economically. Further, the experience gained in manipulating the MARC records in the print portion of an SDI system would be valuable experience for manipulating the MARC records for printing catalog cards and processing aids at a later date.

OVERVIEW OF THE SDI SYSTEM

First, the subject interests of a particular user (perhaps an individual, but more likely a State agency) are profiled in Dewey and/or L.C. classification numbers by a reference librarian from the Department. For example, the Library School could use a listing of MARC records on each weekly tape dealing with library science. Table 1 is a library science profile.
ODL-07 Program Inputs are 1) a control card giVmg program identification and date; 2) header cards containing list codes and headers. (The list code is a one-character code that uniquely identifies the list, e.g., "Z" for library science; the header will appear on each page of output, e.g., "LIBRARY SCIENCE" for the library science list.); 3) classification number cards, which contain the proper list code, a selector code ( "D" for Dewey and "C" for Library of Congress), and the LC or Dewey classification number or range of numbers to be selected; and 4) the MARC tape to be searched. Outputs are: 1) a header tape containing all the information from the header cards and the date, and 2) a detail tape containing all selected records with a list code for each record. 308 Journal of Library Automation Vol. 3/4 December, 1970 Print SDI Listings Fig. 1. SDI System Flow Chart. Figure 2 is a detail flow chart for ODL-07. The control and header cards are the first to be read. A header table is constructed for editing and records are written on the header tape. The classification number cards are then read. These cards are edited first for such errors as invalid list code (for each list code on a classification number card, there must be a corresponding header record), invalid selector code (must be "C" for LC or "D" for Dewey), and invalid characters in the LC or Dewey numbers (Dewey may not contain any alphabetic characters and the only valid special characters for Dewey are the period and dash; the first MARC Based SDI ServicejBIERMAN and BLUE 309 ( Start ) HSKP no onve rt to com ..., __ -lpore fom10t on construct LC & Dewey Tabl es Fig. 2. ODL-(/)7 Detail Flow Chart. Check tables for matches· HSKP character of LC must be alphabetic). If the classification cards pass edits, they are used to construct LC and/or Dewey entries. Each table entry consists of three items: the lowest acceptable value, the highest acceptable value, and the list code. 
Dewey classification numbers can be input into the system without reformatting and are converted by the program to table entries. Table 2 presents some Dewey numbers as they might be keypunched and input into the system and the corresponding table entries which would be created. Dewey numbers are converted from the free form to a fixed-length .10-position all-numeric form. LC classification numbers are more difficult. These cannot be entered into the system without reformatting as can the Dewey numbers; rather, Table 2. Dewey Classification Number Table Keypunched Classification Number Cards Corresponding Table Entry List Code Selector Code Z D Classification Number 174.902 List Code z Lowest Value 1749020000 0200000000 3317610200 3400000000 3311100000 Highest Value 1749029999 0299999999 3317610299 3499999999 3318989999 Z D 020-029 z Z D 331.76102 z L D 34 z L D 331.11-331.898 z KEY: Z = Library Science; L = Law; D = Dewey Classification Table 3. LC Classification Number Table Keypunched Classification Number Cards List Code Selector Code Lowest Value HV7231 Highest Value Explanation p c p c L C z c JOOOOO KOOOOO ZOOOOl HV9920 Records with LC classification number be- tween HV7231 and HV9920 will be hits. JKZZZZ Records with LC classification number be- ginning with J and JA-JK will be hits; JL-JZ will not be hits. KZZZZZ Records with LC classification number be- ginning with K (including KA-KZ) will be hits. Z01000 Records with LC classification number be- tween Z1 and Z1000 will be hits. KEY: Z = Library Science; L = Law; P = Political Science; C = LC Classification c.o ..... 0 'c' 3 ~ ...... c -~ ... ~ ~ "'!! r.:: > ~ .... c ~ .... c;· ~ < ~ c.o ......... ~ t) (!) @ o- (!) v'"l ..... :s 0 MARC Based SDI Service j BIERMAN and BLUE 311 low and high values are entered into the system and put directly into the LC table for searching of the MARC tape. 
Table 3 presents some LC classification numbers as they might be keypunched and entered into the system and a brief explanation of what records will be pulled as hits (matches). LC table entries are in the form of AANN,NN where AA stands for the two possible initial letters and NNNN stands for the four numbers following the initial letter( s) and immediately preceding the first decimal point or next alphabetic character. Zero is the lowest value and Z is the highest. Mter all classification number cards have been converted to table entries, the MARC tape is read, the LC and Dewey numbers are pulled from each record, and both tables are searched for hits (matches). The Dewey classification number from the MARC record is read and con- verted into a fixed-length 10-position numeric field. For example, the classification number 020/.6234/5456 from the MARC tape would be converted to 0206234545 and the number 025.3/02 would be converted to 0253020000 before Dewey table searching. If a classification number card had been 020-029 (see Table 2), both of these records would have been a hit. The LC classification number read from the MARC record is first converted to the form AANNNN and then searched against the LC table. For example, the classification number Z665.H45 from the MARC tape would be converted to Z00665 and Z678.3.K39 would be converted to Z00678 and then searched against the LC table. If the last entry in Table 3 had been input into the system, these records would both be hits, as their LC numbers lie between ZOOOOl and ZOlOOO. If a match is found in either table, the MARC record is transferred in the original MARC format to the output tape with the list code. Mter ODL-07 is completed, control passes to ODL-07X. ODL-fP7X Program Inputs are the header tape from the previous run and the detail tape containing the selected records from the previous ( ODL-07) run. Outputs are the SDI listings by subject areas (list code). 
Figure 3 is a detail flow chart for ODL-07X. The first record is read from the header tape, and the detail tape is then searched for matching list codes. When a match is found, the MARC record is formatted and printed. When the entire tape has been searched, the next header is read, the detail tape is rewound, and the process is repeated. This continues until all header and detail records have been matched and printed. The result is a series of SDI lists, each in LC card number sequence. See Figure 4 for a sample of two printed records from a library science list. Presently, the weekly lists are being printed on two-up, three-part, perforated teletype size (8½" x 5½") paper, one record (SDI notice) to each separable form.

312 Journal of Library Automation Vol. 3/4 December, 1970

[Fig. 3. ODL-07X Detail Flow Chart.]

DISCUSSION

The SDI system was written with flexibility as one of the main considerations. Dewey classification number cards in almost any format can be machine converted to the intended table entry. Both ranges and individual classification numbers are allowed. Any number of Dewey and LC entries and any number of lists can be handled simultaneously, the only limit being core size. The selection tables, not being built into the programs, can be changed at any time, weekly if desired. The print format generally follows traditional catalog card arrangement, the major difference being that each subject heading and added entry appears on a new line and is not numbered. The print program can be easily adapted to any conversion table desired; delimiters, field terminators, etc. are referred to symbolically. There is an optional feature which allows any character or characters to be deleted and the resulting gap closed; this is desirable for diacriticals until better techniques for handling them are devised. Both line and page length are referred to symbolically and can be easily changed to fit any form desired. Line spacing and indentation are built into the present program, but even these can be changed.

[Fig. 4. Sample SDI Notices: two records from a library science list dated 09/03/70 - Stevens, Mary Elizabeth, "Automatic Indexing, a State-of-the-Art Report" (U.S. National Bureau of Standards Monograph 91, 1970), and "Librarianship and Literature, Essays in Honour of Jack Pafford," edited by A. T. Milne (London, Athlone P., 1970).]
The major disadvantage of the SDI system as it now exists is that it allows selection by classification number only. The MARC I experimental SDI system at Indiana University (16), by contrast, allowed selection by weighted terms (both classification numbers and subject headings). Programming difficulties, expense, and the necessity for additional processing time inhibit searching on subject headings. For selection of detailed subjects, subject heading searching is essential; however, for searches in broad subject areas, classification number searching seems more expedient, as it would be difficult to determine, and expensive to input, all of the subject headings for the field of law, for example. Ideally, a MARC-based SDI system would be able to provide selection based on classification numbers and/or subject descriptors.

COMPUTER, LANGUAGE AND COST

The computer for which the programs were written was an IBM 360/30 with 32K of core, one card read/punch, four tape drives, two disk drives, and one printer. The programs have also been successfully run on an IBM 360/25 with one card read/punch, two tape drives, and one printer. In the latter case, the first program was modified slightly because only two tape drives were available, whereas the SDI system normally requires three. The modification was easily accomplished by having the header records punched rather than written in ODL-07. The programs are written in COBOL for the 360, operating under DOS. Very little modification would be required to operate under OS. Being written in COBOL, the programs are easily adapted from one machine to another; they have been successfully run on an RCA Spectra, for example. They are also easily adapted and changed, the symbolic names and procedure division paragraph headings having been carefully selected to build in as much documentation as possible.
Following is a breakdown of the charges to the Department of Libraries for programming and machine time for development; Department of Libraries' staff time, overhead costs, and operating costs are not included.

  Programming and debugging ................... $2,941.00
  Machine & operator costs for testing ........    452.00

Operating costs are more difficult to determine and nearly impossible to evaluate meaningfully. The total amount of computer time required (and therefore the major cost) is primarily a function of the number of records on the MARC tape being searched and the number of selected and printed records. If the MARC tape contains 1,200 records, it takes about twelve minutes (clock time) of computer time (IBM 360/30, 32K) to select the desired records (ODL-07). As the total of classification numbers being searched increases (that is, as the Dewey and LC tables grow), the computer time for selection does not appear to increase significantly. The run time of the print program (ODL-07X) is directly a function of the number of lists being produced (the number of times the detail tape must be rewound and re-read) and the total number of records being printed. As an example, if six different lists are being produced and a total of 375 records are being printed out, the computer time is 25 minutes. Therefore, producing six weekly lists with an average of 62 records for each list takes approximately 37 minutes (clock time) each week. At the rate of $60.00 an hour, this is $37.00, or approximately 10¢ per record selected and SDI notice printed. Table 4 presents a detailed analysis of five weekly runs. The total computer time is the number of minutes which were charged to the Department of Libraries by the computer center. Since the Department is charged one dollar per minute, this is also the dollar cost to the Department for computer and operator costs for that weekly run.
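The weekly cost arithmetic above amounts to the following sketch; the figures are the article's, and the function name is illustrative.

```python
def weekly_cost(select_minutes=12, print_minutes=25,
                notices=375, rate_per_hour=60.00):
    """Weekly computer cost and cost per SDI notice at the quoted rates
    ($60.00 per hour, i.e. $1.00 per charged minute)."""
    total_minutes = select_minutes + print_minutes   # 37 minutes clock time
    dollars = total_minutes * rate_per_hour / 60     # $37.00 per week
    return dollars, dollars / notices                # about $0.10 per notice

dollars, per_notice = weekly_cost()
print(f"${dollars:.2f} per week, {100 * per_notice:.1f} cents per notice")
```

At the article's volumes this works out to $37.00 per week and just under a dime per notice printed, consistent with the "approximately 10¢ per record" figure.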
Unfortunately, the total time given includes time for set-up and other factors. Therefore, meaningful patterns are difficult to discern, as one week it may take several minutes longer to get the forms inserted and lined up in the printer, forms may break another week, etc. The remainder of Table 4 is exactly accurate. It is interesting to note how much variance there is from week to week in the number of SDI notices for each subject list. For example, of the 889 MARC records on the MARC tape run on July 23, 16 were library science titles. However, the MARC tape run on August 6 contained 1,201 records but only 12 were library science titles. In addition, notice that the library science list was reprinted seven times, and for the last two weeks reprinted five times, to get the total number of copies needed for the 25 subscribers to the list.

[Table 4. Sample Run Times and List Lengths: the number of print runs, MARC records selected, and SDI notices printed for each subject list over five weekly runs.]

CURRENT USES

The uses to which the system is presently being put fall into three general areas: 1) SDI lists for internal use of the Department, 2) SDI lists for State Government, and 3) SDI lists for other libraries.

The Department currently produces subject lists primarily for its own use in the areas of law and political science. Since the Department maintains specialty collections in these two subject areas, it is anxious to obtain the most current information on materials published in them for selection purposes. Because the MARC record comes out before the corresponding proof slip is distributed (17), use of the MARC file has been a most successful means of obtaining complete and verified bibliographic information for the purpose of ordering new books. In addition, complete LC cataloging information is available should the proof not have arrived at the time the book is received. Because the lists are currently being printed on three-part teletype paper, one record per sheet, it is easy to separate the records to be ordered, send one copy to acquisitions, retain one copy for the files, and send one to the interested individual in State government with a note that the book is on order.

The Department also produces, for the Legislative Reference Division of the Governmental Services Branch, a special list covering the many different subjects which are of interest to the Legislature. The Legislative Reference Division can then order particularly useful materials quickly and route a copy of the SDI printout to the interested legislator or legislative committee.

The Department has prepared profiles of the State agencies having a large planning and research role. Lists are prepared weekly for the Department of Education, Department of Corrections, Department of Vocational-Technical Education, Department of Welfare, Industrial Development Commission, Department of Highways, and several small agencies, and are sent to the person responsible for planning and research within the department.
He can then request books from the lists by returning one copy of the SDI notice to the Department of Libraries with a note to order, retaining the other copy for his files or routing it to a researcher particularly interested in the subject.

Certain lists are being produced and shared with libraries around the State. The law and political science lists are being sent to two law schools in Oklahoma. The library science and bibliography lists are being sent to the Library School and the two largest public library systems, as well as the two State universities. Over 25 libraries outside Oklahoma are receiving weekly library science, political science, or law lists (18). A cooperative acquisitions program is evolving whereby certain libraries agree to specialize in certain subject areas, so that every subject area will be covered by one library for specialized materials not needed by all libraries. Currently, the program involves the two major public libraries and the Department of Libraries, with the State teletype network (OTIS) used to rapidly transmit information on expensive materials for cooperative acquisitions. Selected lists in the specialized subject areas can be produced each week for each of the cooperating libraries to aid them in their selection, acquisition, and cataloging of the materials.

The uses currently being made have excited the imagination of many people, both within and without the Department of Libraries. A great deal has been accomplished since the system became operational early in February 1970; however, the possibilities have barely been identified. As mentioned above, one can envision this being the foundation of a cooperative acquisitions program. Such a system could also form a node of library service to business and industry; currently, some thought is being given to producing weekly lists of materials in automation and computer science (systems analysis, etc.)
both for the many State agencies which have automated equipment and for businesses and industries around the State which utilize computer technology.

CONCLUSION

MARC is an exciting and potentially valuable new tool available to the library community, useful for improving both its own internal operations and, more importantly, its service to others. Nonetheless, before extensive meaningful use of MARC can occur, its potential uses must be identified and explored. This article has attempted to give a picture of one such experimental project to improve library service for others within the framework of a particular institution's resources and functions. Much more research is needed on potential and operating uses of MARC, and the results of this research need to be disseminated to the library community. In addition, it is the opinion of the authors that, for reasons both of available financial resources and of expertise, much of the research and development with MARC must be a cooperative venture among many different libraries. Some work has been done with MARC cooperatively throughout the country (NELINET (19), OCLC (20), and CLSD (21), for example), but much more needs to be done. The future of meaningful uses of MARC is bright; however, much research and development remains, and it can best be done as a cooperative effort.

PROGRAMS AND ADDITIONAL INFORMATION

SDI computer programs and services available from the Department of Libraries to other libraries are described in a publication called "SDI Services and Costs," available from the Oklahoma Department of Libraries, 109 State Capitol, Oklahoma City, Oklahoma 73105. Additional progress reports on the SDI project, as well as other automation projects in Oklahoma, are reported in the bi-monthly Oklahoma Department of Libraries Automation Newsletter, which is available on request.

REFERENCES

1.
Cuadra, Carlos A., ed.: Annual Review of Information Science and Technology, 4 (Chicago: Encyclopedia Britannica, 1969), 249-258.
2. Studer, William Joseph: Computer-Based Selective Dissemination of Information (SDI) Service for Faculty Using Library of Congress Machine-Readable Catalog (MARC) Records (Ph.D. dissertation, Graduate Library School, Indiana University, September 1968), 1.
3. Studer, William J.: "Book-Oriented SDI Service Provided for 40 Faculty." In Avram, Henriette D.: The MARC Pilot Project; Final Report on a Project Sponsored by the Council on Library Resources, Inc. (Washington: Library of Congress, 1968), 180.
4. Cuadra: op. cit., 243-258.
5. Ibid., 263-270.
6. Bloomfield, Masse: "Current Awareness Publications; An Evaluation," Special Libraries, 60 (October 1969), 514-520.
7. Bottle, Robert T.: "Title Indexes as Alerting Services in the Chemical and Life Sciences," Journal of the American Society for Information Science, 21 (January-February 1970), 16-21.
8. Brannon, Pam Barney; et al.: "Automated Literature Alerting System," American Documentation, 20 (January 1969), 16-20.
9. Brown, Jack E.: "The CAN/SDI Project; The SDI Program of Canada's National Science Library," Special Libraries, 60 (October 1969), 501-509.
10. Davis, Charles H.; Hiatt, Peter: "An Automated Current-Awareness Service for Public Libraries," Journal of the American Society for Information Science, 21 (January-February 1970), 29-33.
11. Housman, Edward M.: "Survey of Current Systems for Selective Dissemination of Information (SDI)." In Proceedings of the American Society for Information Science, 6 (Westport, Connecticut: Greenwood Publishing Corporation, 1969), 57-61.
12. Martin, Dohn H.: "MARC Tape as a Selection Tool in the Medical Library," Special Libraries, 61 (April 1970), 190-193.
13.
Bierman, Kenneth John; Blue, Betty Jean: "Processing of MARC Tapes for Cooperative Use," Journal of Library Automation, 3 (March 1970), 36-64.
14. RECON Working Task Force: Conversion of Retrospective Catalog Records to Machine-Readable Form; A Study of the Feasibility of a National Bibliographic Service (Washington, D.C.: Library of Congress, 1969).
15. Bierman, Kenneth John: "MARC-Oklahoma Data Base Maintenance Project," Oklahoma Department of Libraries Automation Newsletter, 2 (October 1970).
16. Studer, William J.: (op. cit., note 2), 23-37.
17. Payne, Charles T.; McGee, Robert S.: "Comparisons of LC Proofslip and MARC Tape Arrival Dates at the University of Chicago Library," Journal of Library Automation, 3 (June 1970), 115-121.
18. Bierman, Kenneth John: "MARC-Oklahoma Cooperative SDI Project Report No. 1," Oklahoma Department of Libraries Automation Newsletter, 2 (June & August 1970), 10-14.
19. Nugent, William R.: NELINET: The New England Library Information Network. Paper presented at the International Federation for Information Processing, IFIP Congress 68, Edinburgh, Scotland, August 6, 1968. (Cambridge, Mass.: Inforonics, Inc., 1968).
20. Kilgour, Frederick G.: "A Regional Network: Ohio College Library Center," Datamation, 16 (February 1970), 87-89.
21. The Collaborative Library Systems Development Project (CLSD): Chicago-Columbia-Stanford. Unpublished paper presented at the MARC II Special Institute, San Francisco, September 29-30, 1969.