38 Journal of Library Automation Vol. 4/1 March, 1971

RECON PILOT PROJECT: A PROGRESS REPORT, APRIL-SEPTEMBER 1970

Henriette D. AVRAM and Lenore S. MARUYAMA: MARC Development Office, Library of Congress, Washington, D. C.

A synopsis of the third progress report on the RECON Pilot Project submitted by the Library of Congress to the Council on Library Resources. An overview is given of the progress made from April through September 1970 in the following areas: RECON production, format recognition, research titles, microfilming, and investigation of input devices. In addition, the status of the tasks assigned to the RECON Working Task Force is briefly described.

INTRODUCTION

The RECON Pilot Project was established in August 1969 to test various techniques for retrospective conversion in an operational environment and to convert a useful body of records into machine readable form. It is being supported with funds from the Council on Library Resources, the U.S. Office of Education, and the Library of Congress. This article summarizes the third progress report of the pilot project submitted by the Library of Congress to the Council; the report addressed itself to all aspects of the project, regardless of the source of funding, in order to present a meaningful document. Two previous articles in the Journal of Library Automation summarized the first and second progress reports, respectively (1), (2). This article describes the activities occurring April through September 1970.

PROGRESS-APRIL THROUGH SEPTEMBER 1970

RECON Production

At the present time, the RECON data base contains approximately 20,000 records. It appears that the original estimates on the number of titles to be input during the RECON Pilot Project were considerably higher than the actual number found to be eligible.
This situation occurred because of the following circumstances: 1) The original estimates were derived from the number of English language monographs cataloged during 1968 and 1969. Since the MARC Distribution Service began in March 1969, it was felt that the number of titles eligible for RECON in the 1969 and 7-series of card numbers would be equal to the number cataloged during January-March 1969. In actuality, the titles cataloged during this period were primarily records with 1968 card numbers. 2) The estimate of records with 1968 card numbers was higher because it was thought that many more of these titles had been through the cataloging system than were actually processed prior to the beginning of the MARC Distribution Service. Instead of being included in RECON, these records have been input into the MARC Distribution Service. In order to obtain 85,000 records for conversion, several alternatives, including the conversion of English language monographs in the 1967 card series, are being studied.

Format Recognition

Format recognition is a technique that will allow the computer to process unedited catalog records by examining data strings for certain keywords, significant punctuation, and other clues to determine the proper content designators. This technique should eliminate substantial portions of the manual editing process and, if successful, should represent a considerable savings in the cost of creating machine readable records. The logical design for format recognition has been completed, and the manual simulation to test the efficiency of the algorithms was described in an earlier article (3). Completion date for the programs is expected in February 1971. The programs were designed in several modules so that they could be adapted for different input procedures without disturbing the logic.
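In present-day terms, the keyword-and-punctuation approach described above can be sketched as follows. This is an illustrative assumption only, not the Library's actual algorithm: the keyword list, the field patterns, and the tag values are invented for the example.

```python
# Illustrative sketch of format recognition: assign MARC-style tags to
# untagged catalog-record fields by inspecting keywords and punctuation.
# The keyword list and patterns below are assumptions for illustration.
import re

PLACE_KEYWORDS = {"london", "new york", "paris", "koln"}  # hypothetical keyword list

def guess_tag(field: str) -> str:
    text = field.strip()
    # A collation usually begins with paging ("xii, 418 p.") and ends with size.
    if re.search(r"\d+ p\.", text) and re.search(r"\d+ cm\.?$", text):
        return "300"  # collation
    # An imprint typically opens with a place keyword and contains a year.
    first_word = text.split(",")[0].lower()
    if first_word in PLACE_KEYWORDS and re.search(r"\b1[89]\d\d\b", text):
        return "260"  # imprint
    # Otherwise fall back to treating the field as a title statement.
    return "245"

print(guess_tag("London, Oxford University Press, 1968."))  # -> 260
print(guess_tag("xii, 418 p. illus. 26 cm."))               # -> 300
```

The real programs consulted some sixty keyword lists and many more clues; the point of the sketch is only that content designators can often be inferred from surface features of the data string.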
Once the programs have been implemented, tests may show that certain fields should be pretagged because the error rate is too high or the occurrence of the field is too low to justify the processing time. The complete logical design for format recognition has been published as a separate report by the American Library Association (4). As part of a manual simulation to test the format recognition algorithms, one hundred fifty records for English language monographs were typed on an MT/ST, a typewriter-to-magnetic tape device. The MT/ST hard-copy output was used as the raw data for the simulation. The results of the test were analyzed for possible changes to the algorithms, keyword lists, or input specifications. Then the records with the content designators assigned by the format recognition algorithms were retyped and processed by the existing MARC system programs. Proofsheets were produced and given to the RECON editors for proofing, a process to verify content designators and bibliographic information. Each editor proofed all of the format recognition records; their hourly numbers of records proofed were as follows: highest, 9.3; lowest, 5.3; average, 6.8. The average number of current MARC records edited and proofed in an hour is 4.8. When format recognition is implemented, the present workflow (editing, typing, computer processing, proofing) will be replaced by a new one (typing, format recognition, proofing). In comparing production rates in the two systems, the time needed to proof format recognition records must be compared against the time needed to edit and proof in the current system. Several factors should be considered when evaluating this portion of the simulation experiment. Although all the records chosen for the test were of English language monographs, they were generally more difficult than those encountered in a normal day's work for both editors and typists.
In addition, numerous errors were made by the human simulators, such as omission of subfield codes, delimiters, or fixed field codes. Format recognition does appear to have reduced the amount of time spent in the combined editing and proofing process, but the success of the program depends heavily on the following factors: 1) extensive training for the input typists with greater emphasis placed on their role in this project; and 2) extensive training for the editors to alert them to kinds of errors the format recognition programs might make. Proofing time for the test was greater than anticipated. With fewer errors from the typing input and the elimination of human errors from the simulation, it is possible that the proofing rate will be higher under actual work conditions. Editors might reach an average of 9.3 records proofed, or double the number presently done in a combined editing/proofing process. Two programs are being written to support the format recognition project. Format Recognition Test Data Generation (FORTGEN) will provide test data for format recognition by stripping MARC records of delimiters, indicators, and subfield codes, and reformatting the data to be identical with the product from the initial input program. Thus, a large quantity of high quality test data can be provided without additional keystroking. The Keyword List Maintenance Program (KLMP) maintains approximately sixty keyword lists used by the format recognition program in processing bibliographic data. These lists are maintained as a separate data set on a 2314 disk pack. The actual lists themselves, along with associated control data, are referred to as "keyword list structures." The general function of KLMP is to read the entire set of keyword list structures from the file on disk, modify them as specified by parameter cards to KLMP, and write a new file on disk.
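The read/modify/write cycle just described can be sketched in modern terms. The parameter-card syntax (ACTION LISTNAME [KEYWORD]) and the in-memory representation below are invented for illustration; the report does not specify KLMP's actual card format.

```python
# Minimal sketch of a KLMP-style update cycle: take the full set of
# keyword list structures, apply parameter-card actions, and return the
# new set. The card syntax and data layout are assumptions.
def run_klmp(lists: dict[str, set[str]], cards: list[str]) -> dict[str, set[str]]:
    """Apply parameter cards to keyword list structures."""
    for card in cards:
        action, name, *rest = card.split()
        if action == "CREATE":
            lists.setdefault(name, set())       # create a list
        elif action == "REMOVE":
            lists.pop(name, None)               # remove a list
        elif action == "ADD":
            lists[name].add(rest[0])            # add a keyword
        elif action == "DELETE":
            lists[name].discard(rest[0])        # delete a keyword
    return lists

lists = {"PUBLISHER": {"PRESS", "VERLAG"}}
cards = ["CREATE PLACE", "ADD PLACE LONDON", "ADD PUBLISHER EDITIONS"]
print(run_klmp(lists, cards))
```

Because the lists live in a data set separate from the program, this kind of update requires no change to the format recognition program itself.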
The individual actions performed by KLMP are as follows: 1) create a list; 2) remove a list; 3) add a keyword; 4) delete a keyword; 5) augment a table (translation tables to generate codes such as Geographic Area Code, Language, Place of Publication); and 6) list structures (printout of all or selected portions of a list). Since the keyword lists will be dynamic in nature, this program provides the flexibility required to change or update them without modifying the format recognition program itself. New lists will be added as format recognition is extended to other languages, and keywords will be added to or deleted from existing lists as experience is gained in the use of format recognition.

Research Titles

Since the production operations of the RECON Pilot Project have been limited to English language monographs in the 1968, 1969, or 7-series of card numbers, it was recognized that many problems concerning retrospective records would not be revealed in the conversion of relatively current titles. For this reason, a project to identify and analyze 5,000 research titles was included as part of the pilot project. These research titles would consist of records for older English language monographs and foreign language monographs in roman alphabets and would be studied for problems in the following areas: 1) earlier cataloging rules which caused certain elements to be omitted from the record or transcribed in a different style; 2) different printed card formats which placed elements in different locations; 3) difficulty in working with foreign languages when converting records to machine readable form; 4) problems arising from shared cataloging records; and 5) problems arising when expanding the format recognition algorithms to cover these kinds of records. The selection of these records was described in an earlier article (5).
The initial analysis of the research titles has been completed, and a few of the problems encountered are listed as follows: 1) Ellipses at the beginning of a title field (... Dictionnaire-manuel-illustre des ecrivains et des litteratures) were used frequently on older cataloging records. Since they are no longer prescribed by the present cataloging rules unless they appear on the title page at the beginning of a title, it was recommended that such ellipses be deleted from the machine record because they would affect the format recognition algorithms. 2) Card numbers without digits representing the year (F-3144) were assigned during 1901. Generally, these numbers appear with an alphabetic prefix representing the language of the publication or the classification number. It has been recommended that such numbers be revised to read "f01-3144" for the machine record. 3) Records cataloged under the 1908 A.L.A. Catalog Rules included in the series statement such information as the editor of the series or the location of the series statement (Half-title: Everyman's library, ed. by Ernest Rhys. Reference). It has been recommended that such information be deleted from the machine record. 4) An asterisk preceding personal name added entries (I. *Spence, Lewis, 1874- joint author.) indicated that the name had appeared in a fuller form at an earlier date; if this name were used as the main entry, there would have been a corresponding full name note at the bottom of the catalog card. It has been decided that this asterisk will be deleted from the machine record. 5) The national bibliographies from which shared cataloging copy is derived use punctuation conventions which differ from the AA Rules.
For example, the West German bibliography uses parentheses to indicate that the data are not on the title page, brackets to indicate the data are not in the publication, and angled brackets to indicate that the data are enclosed in parentheses on the title page (<22.-27. Mai 1967>. Koln ([-Ehrenfeld] Bundesinstitut fur Ostwissenschaftliche und Internationale Studien) 1967). Such conventions would affect the expansion of the format recognition algorithms to foreign languages. This is an area in which the Standard Bibliographic Description would be of great value. 6) In the MARC II format, each place of publication is a separate subfield so that when each place is connected by hyphens (Milano-Roma-Napoli ... ,), there would be a problem in inputting the data and having the data printed out in the same fashion. It has been recommended that each place of publication be separated with a comma instead of a hyphen (and the ellipsis deleted from the imprint statement). 7) Conjunctions have been used between places of publication on records cataloged according to the 1908 rules and on some shared cataloging copy (London, Glasgow and Bombay) (Neuwied a. Rh. u. Berlin). In the machine record, each place is a separate subfield, and the presence of a conjunction means that one subfield contains non-essential data. It has been recommended that conjunctions be omitted from the machine record and that places of publication be separated by commas. 8) The A.L.A. Cataloging Rules for Author and Title Entries states that with certain well-known persons, dates of birth and death can be omitted when the heading is followed by a subject subdivision (1. Shakespeare, William-Language-Glossaries, etc.). Since the rules provide a list of such persons, it has been recommended that when such names are used as subject headings, they should include dates of birth and death in the machine record. 9) A collation statement like the following (25 p., 27-204 p.
of illus., 205-232 p., 233-236 p. of illus., 237-247 p. 28 cm.) would cause the format recognition algorithms some difficulty in identifying the proper subfields. This is another area in which the adoption of a Standard Bibliographic Description would aid format recognition programs. 10) Both East and West German bibliographies give information about illustrations in the title paragraph rather than in the collation (Title paragraph: [Mit] 147 Abbildungen und 71 Tabellen. Collation: xii, 418 p. 26 cm.). The cataloging policy at the Library has been revised so that on current cataloging records information about illustrations is also repeated in the collation. It has been recommended that for retrospective records the data should be input as it appears on the catalog card. In this example, the machine record would not contain illustration information in the collation. 11) The method of transcribing non-LC subject headings has been changed in recent years, and the MARC II format reflects this change. In previous years, the following conventions were used: subscript brackets enclosed headings or portions of headings that were not the same as the LC form; subscript parentheses enclosed portions of headings that were the LC form but not the contributing library's; if two headings had the same number, the LC form was listed first; if both forms of the heading were the same, there would be only one number, and the heading itself would not have the subscript brackets or parentheses. It has been recommended that either the non-LC forms be deleted from the machine record or the transcription of such subject headings be revised to follow the current practice. 12) NLM subject headings have different capitalization conventions from those used by LC, and the geographic subject subdivisions are often in a form different from that which the Library of Congress uses ([DNLM: 1. Public Health Administration-U.S.S.R. W6 P3]).
In analyzing these research titles in terms of possible problems with format recognition, it was discovered that NLM subject headings would be incorrectly identified for the above reasons. Format recognition depends heavily on capitalization and keyword lists; in this example, the heading "Public Health Administration" would be identified as a corporate name because of the capitalization. Examination of the research titles showed the similarity of the cataloging of the older records (pre-1949) and the current foreign language records based on shared cataloging copy. Certain stylistic conventions, such as the use of ellipses or the transcription of imprint statements, were similar for both kinds of material. It would be necessary to have a thorough knowledge of the ALA Catalog Rules (published in 1908) in order to interpret the data on the older printed cards correctly during a conversion project. The experience of the editors in the RECON Production Unit has been that retrospective records, even those cataloged during the last two years, require a considerable amount of interpretation in order to assign the correct content designators in the fixed fields. For pre-1949 records, the problem becomes more acute when one attempts to apply the procedures and techniques for current material to older records. It is very likely that a higher level of personnel would be required to process these records because in many instances the changes would be similar to recataloging the entire record. The expansion of format recognition to foreign languages would be extremely difficult without a greater degree of consistency in shared cataloging copy.
Each national bibliography, from which the cataloging copy is derived, has its own rules and style of cataloging, so that although the language of the works may be the same, e.g., German, the entries from the West German, East German, Austrian, and Swiss bibliographies may differ in terms of punctuation or style of cataloging. These problems have been compounded by printers' errors on the printed cards as the result of conventions that differ from the AA Rules. The adoption of the Standard Bibliographic Description (6) would be a tremendous aid in interpreting cataloging data by both humans and format recognition programs.

Microfilming Techniques

The Library's Photoduplication Service is supporting the RECON Pilot Project by providing the cost estimates for the various alternatives of microfilming techniques and providing technical guidance as required. Several discussions with them confirmed that the method of filming a portion of the record set containing the subset of records to be converted first and selecting the appropriate records afterward would be more advantageous than selection prior to microfilming (7). It was considered unrealistic to attempt to project microfilming costs for the entire RECON effort. Because of the paper handling problems involved in the management of input worksheets, the microfilming rate should be in reasonable proportion to the actual conversion rate. There is no point in providing a huge supply of input worksheets which will not be used in actual conversion for a long time. The data may become "dated," and there may be storage and handling problems. In addition, cost estimates provided by the Photoduplication Service can only be expected to prevail over the next twelve months. Beyond that period, any quotation given is likely to be higher because of the general trend of rising costs. Any projection of costs should be based on a manageable portion of the whole.
Just what this portion should consist of has yet to be determined. Assuming a modus operandi as described above, a determination is needed of the "rate floor," which is defined as the minimum number of records that must be microfilmed to achieve the maximum cost benefits resulting from a relatively high volume job. Once the rate floor is determined, it should probably be translated into year equivalents, i.e., if the rate floor is 100,000 and the catalog card production is 50,000, then two years' worth of cards would be microfilmed. Estimates would be obtained for the following alternatives: microfilming for OCR device specifications; microfilming for reader-printer specifications; microfilming for reader specifications; and microfilming for Xerox Copyflo printouts of the LC printed cards onto RECON worksheets. Certain ground rules were assumed for the actual microfilming process. The selected drawers of the record set would be "frozen" for a day or two prior to being filmed, i.e., the file would be complete and no one would remove cards from the file while filming was in process. The filming would take place during the day. Assuming that 100,000 cards for the year 1965 would be used as a base figure and that approximately 5,000 cards per day can be filmed with a planetary camera, it would take twenty working days to film the collection of cards for one year in the record set (rate floor as defined above). All cost estimates will include quality control; i.e., quotations would indicate degree of inspection of film for technical quality and degree of preparation of the file before filming.

Input Devices

During 1969 the Library of Congress conducted an investigation to determine the feasibility and desirability of using a mini-computer for MARC/RECON input functions (original input and corrections).
This study was performed with contractual support and consisted of three basic tasks: 1) analysis of present operations to determine functional requirements, to measure workloads, and to identify problem areas; 2) survey and analysis of mini-computers that are potentially capable of meeting the requirements of the present operations; 3) evaluation of available hardware and software capabilities relative to MARC data preparation requirements and determination of economic feasibility based on present and projected workloads. The intent of this study was to provide a basis for future planning and procurement activities by the Library of Congress relative to improvement of the MARC/RECON man-machine interface. The survey of hardware was not intended to be all-inclusive. There were time and funding limitations, and in addition it was recognized that the mini-computer field was a rapidly expanding one; therefore, it was not possible at any cut-off point to have surveyed the totality. Six firms were included in the survey, and the machines considered were the Burroughs TC-500, the Digital Equipment Corporation PDP-8/I, the Honeywell DDP-516, the IBM 1800, the Interdata Model 4, and the XDS Sigma 3. Of these, the DEC PDP-8/I and the Honeywell DDP-516 were determined to have the highest potential for meeting MARC/RECON requirements. Additional analysis revealed that software availability for mini-computers is minimal. Manufacturers covered in this investigation supplied an assembler as well as testing and editing routines. Some provided a FORTRAN, ALGOL, or BASIC compiler and an operating system with foreground/background processing. Systems that support FORTRAN and the operating system are quite substantial, generally requiring 16,000 words of core, memory protect, disc, etc. The cost of this kind of system is generally a minimum of $10,000. Few low-cost peripheral devices are available for use with mini-computers. High-speed tape readers, punches, and punched card readers are the most inexpensive input/output devices available. The addition of a magnetic tape unit to most systems significantly increases the overall cost. The conclusion reached as a result of this investigation was that there is no gain, either technically or economically (considering the hardware configuration of the Library of Congress), in using a mini-computer in performing present MARC/RECON functions. Another input device investigated during this reporting period was the Keymatic Data System Model 1093, which was selected for a two-month test and evaluation period because it appeared to have the following advantages for the recording of bibliographic data: 1) this device has 256 unique codes; 2) data is recorded directly on computer compatible magnetic tape; 3) through manufacturer supplied software, the user may assign to certain keys, called expandables, the value of whole strings of characters; thus a single key would equate to a MARC tag; 4) correction procedures are built into the device, i.e., the ability to delete a character, word, sentence, or entire record; and 5) the single character display screen obviates the necessity for hard copy. It is often claimed that hard-copy output is scanned by the typist unintentionally to the detriment of typing rates. The machine tested was specifically set for the Library's requirements. Four separate keyboards contained 184 keys, of which 103 had upper- and lower-case capability, and the remaining 81 had only a single case. The 256 possible codes were divided into the following categories: 1) 94 were used as expandables and assigned to those MARC tags and data strings (correction and modification symbols) that appear most frequently; 2) 10 were used as machine function codes; 3) 150 were assigned unique values in the MARC character set; and 4) 2 were left unused.
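An expandable key is, in essence, a single code that emits a whole character string. The following sketch models that behavior in present-day terms; the key codes and the tag strings assigned to them are invented for illustration and are not the codes actually used on the Keymatic.

```python
# Sketch of Keymatic-style "expandable" keys: one key code expands to a
# whole string (here, a MARC-style tag), while ordinary codes pass
# through as single characters. Codes and expansions are invented.
EXPANDABLES = {
    0x81: "245$a",  # hypothetical expandable key for a title tag
    0x82: "260$a",  # hypothetical expandable key for an imprint tag
}

def decode(keystrokes: list[int]) -> str:
    """Turn raw key codes into the recorded character stream."""
    out = []
    for code in keystrokes:
        out.append(EXPANDABLES.get(code, chr(code)))  # expand or pass through
    return "".join(out)

print(decode([0x81, ord("T"), ord("i"), ord("t"), ord("l"), ord("e")]))  # -> 245$aTitle
```

A single keystroke thus replaces five or six character-by-character strokes, which is where the hoped-for production gain was expected to come from.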
The keys on the four keyboards were assigned values such that the most frequently used keys were located in a strong stroke area. The main character keyboard was designed to be closely compatible to the device currently in use at the Library to lessen the training requirements for the typist. Therefore, the typist had only to learn the expandable keys and some lesser used special characters. The program supplied by the manufacturer was modified for code conversion and output format acceptable to the MARC system and to conform to the Library's computer system assignments. The two typists selected to participate in the test were both experienced MARC production typists. Both typists were given individual instruction on the machine and spent three weeks practicing; at the same time, their performance was being analyzed and discussed with them. During the official evaluation period, the typists spent two weeks working full time on the machine. When the typists began their practice period, their speeds were relatively slow, 6-7 records per hour. As time progressed, their speed increased, leveling off to approximately 11-12 records per hour by the end of the test period. Each typist reported problem areas during the official evaluation. One problem was the hesitation which resulted when the typist had to determine whether to use an expandable key or actually type the data, character by character. If she chose the former, the expandable key had to be found. The number and different combination of tags caused some confusion. The opinion of both typists concerning the keyboard arrangement was that they would rather type the tags character by character than search for the expandable key. More experience on this device might eliminate this problem. The absence of hard copy was felt to cause another problem.
When a typist intuitively feels that she has made an error in current MARC/RECON typing operations, she uses the hard copy to verify that a mistake has actually been made prior to taking corrective action. The lack of hard copy did not allow for this verification, and the typists reported that this detracted from their efficiency. The following table lists the results of the official evaluation period. The average production rate of these two typists on the MT/ST is also listed. The figures for MT/ST production have been calculated for a particular three-week period.

                                   Typist A   Typist B     Total    MT/ST
New records                             505        540      1045     1995
Correction records                      323        278       601
Verified records                         58        537       595
Average records/hour (new)             10.1       14.0      12.1     14.6
Average records/hour (corrected)       21.3       27.7      24.5
Keystrokes, total                   238,435    259,630   498,065
Expandables used                     12,280     14,646    26,926

The Keymatic model used for the test rents for $768.25 per month (July 1970 pricelist). It is a fully equipped model with several options not required for the MARC system. Without these options, a less expensive model could be used. Keymatic does have a 24-month lease plan in which the basic machine could be rented for $368.00 per month. This is an increase of $258.00 per month per machine over the current method of input. Costs per record were computed for the Keymatic device and for the MT/ST based on the average record statistics of both typists. Although the same records were not actually typed on the MT/ST, extensive experience with production and error rates on that device made it valid to use average production rates for purposes of comparison. For purposes of computing the cost per record, the hourly cost per machine was calculated by dividing the cost per machine by 160 working hours. The 24-month leasing price of $368.00 per month was used for the Keymatic, resulting in a machine cost per hour of $2.30. The MT/ST rental cost is $110.00 per month, resulting in an hourly cost of $.69.
(The cost of the MT/ST listed in a previous article (8) as being $100.00 was in error.) On the basis of 12.1 records per hour on each device, the cost per record for the Keymatic is $.19 and $.06 for the MT/ST. In the context of the Library of Congress MARC/RECON Project, the addition of a Digi-Data to translate MT/ST output to computer compatible tape adds an incremental cost to each input device. For the purposes of this report, it was assumed that the project required five input devices. On this basis, the prorated Digi-Data cost per hour is $.33, which makes the total machine cost per hour for the MT/ST $1.02. Thus, the cost per record for the MT/ST becomes $.08. The results of the test indicated that the Keymatic used in the Library of Congress environment did not substantially increase production rates or decrease error rates. Thus, no savings in cost were demonstrated. The complex data to be typed and the construction and quality of the worksheets at the Library of Congress impose severe constraints on all machines. (The manuscript card reproduced on the MARC/RECON worksheet results in a source document that is difficult to work with for the following reasons: 1) loss of legibility during the copying process; 2) position of tags in relation to content; and 3) combination of typed and handwritten data as recorded by the catalogers.) In order to make a fair comparison between the Keymatic and the MT/ST, the manuscript card was used for the test rather than the printed card. If, on evaluation, the Keymatic proved to be more efficient than the MT/ST using the manuscript card, it would be even more effective if the printed card were used, since the latter is a far more legible source document.
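The cost comparison above reduces to simple arithmetic: monthly rental divided by 160 working hours gives an hourly machine cost, which divided by the 12.1 records-per-hour rate gives a cost per record, with the prorated Digi-Data converter added to the MT/ST. The sketch below reproduces the report's figures.

```python
# Reproduce the report's cost-per-record arithmetic.
HOURS_PER_MONTH = 160      # working hours assumed in the report
RECORDS_PER_HOUR = 12.1    # average production rate used for both devices

keymatic_hourly = 368.00 / HOURS_PER_MONTH   # 24-month lease -> $2.30/hour
mtst_hourly = 110.00 / HOURS_PER_MONTH       # MT/ST rental -> about $.69/hour
mtst_with_converter = mtst_hourly + 0.33     # prorated Digi-Data cost per hour

print(f"Keymatic:        ${keymatic_hourly / RECORDS_PER_HOUR:.2f} per record")
print(f"MT/ST:           ${mtst_hourly / RECORDS_PER_HOUR:.2f} per record")
print(f"MT/ST+converter: ${mtst_with_converter / RECORDS_PER_HOUR:.2f} per record")
```

Run as written, this yields $.19, $.06, and $.08 per record, matching the figures in the text.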
Keymatic does have a new machine, Model K-103, which has an 80-character visual display option which might correct one of the objections raised by the typists, i.e., lack of hard copy; however, this model requires the use of a converter as does the MT/ST. This device is less expensive than the machine used in the test and may be evaluated during the RECON Project at a later date. An investigation of the Model 370 CompuScan was continued following the initial findings reported in a previous article (9). Twenty-five letterpress Library of Congress printed cards representing English language titles and containing no diacritical marks in the content were sent to the firm for input. This allowed the machine to be evaluated and problems noted within an "ideal" test environment. Depending on these results, further testing could be performed. Since existing CompuScan software was used to conduct the Library of Congress test, the entire LC card could not be read but only that portion that contained fonts already built into the existing configuration. The printed cards were blocked out, except for the area covering the body of the entry, i.e., title through imprint, prior to microfilming for subsequent scanning. Operator intervention was required on approximately 1%-25% of the characters on each card. In addition to the problems offered by variant and touching characters, fine lines in certain characters caused a misreading by the machine. This was particularly true with the letter "e" being interpreted as the letter "c". CompuScan felt this problem might be resolved by increasing the size of the comparison matrix of the hardware. In some instances, a period was generated in the middle of a word due to the coarseness of the card stock that was microfilmed. Initial discussions have begun on the possibility of testing a retyped version of the printed card.
The only rationale behind this test would be to investigate whether typing for a scanner that could read upper- and lower-case and special characters made any significant difference in speed and/or error rate compared to costs and production rates of typing for a scanner that could read only upper-case characters. The latter was described in an earlier article on RECON (10).

RECON Working Task Force

The Working Task Force continued the discussion on the implications of a national union catalog in machine-readable form. From the postulated reporting system for a future NUC described in an earlier article (11), several items were isolated for further consideration. These included: 1) grouping of records in a register (by language, alphabet, etc.) to allow for a segmented approach to computer-produced book catalogs (a register is defined as a printed document containing the full bibliographic descriptions of works sequenced by unique identification numbers; as each record is added to the register, it is added at the end and assigned the next sequential identification number); 2) the need for additional indexes to the register by LC card number and classification number (the class number was not included in the list of data elements required for the machine-readable NUC); 3) the requirement to include the author statement in the title index versus using the main entry in all cases; and 4) clarification of subject index to mean only topical or geographic subjects.
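The register described in item 1 is essentially an append-only file keyed by sequential identification numbers, with the separate indexes of item 2 providing additional access points. A minimal sketch; the record shape and field names are illustrative assumptions, not taken from the report:

```python
# Minimal sketch of the NUC "register" described above: records are only
# ever appended, and each new record receives the next sequential
# identification number. The indexes by LC card number and classification
# number correspond to item 2; field names here are illustrative.

class Register:
    def __init__(self):
        self.records = []             # position i holds register number i + 1
        self.by_card_number = {}      # LC card number -> register number
        self.by_class_number = {}     # class number -> list of register numbers

    def add(self, record):
        self.records.append(record)
        reg_no = len(self.records)    # next sequential identification number
        self.by_card_number[record["card_number"]] = reg_no
        self.by_class_number.setdefault(record["class_number"], []).append(reg_no)
        return reg_no

reg = Register()
first = reg.add({"card_number": "68-12345", "class_number": "Z699.A1", "title": "..."})
second = reg.add({"card_number": "70-54321", "class_number": "Z699.A1", "title": "..."})
print(first, second)                   # 1 2
print(reg.by_class_number["Z699.A1"])  # [1, 2]
```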
The following tasks were outlined for further consideration: 1) Format of the printed NUC (graphic design and printing, size, style, typographic variation, etc.); 2) Physical size of the volume depending on pattern of distribution (monthly, bimonthly, etc.); 3) Input (relationship to MARC input, use of format recognition, problems of languages in terms of selection for input); 4) Output (cost of production for register and indexes, cost of sorting, costs of selection, etc.); 5) Cumulation patterns in terms of cost and utility (number of characters in an average entry, number of items on a page, rate of increase, etc.); 6) The use of COM (Computer Output Microfilm) as an alternative to photocomposition for printed output.

Work on Task 3, the investigation of the possible use of existing data bases in machine readable form for a national bibliographic service, has been continued. Phase 1 of this task consisted of a survey of existing machine readable data bases. Selection of data bases for analysis was based on the following criteria: 1) The data base had to include monograph records. 2) Any data base known to have predominantly LC MARC records was excluded. 3) The data base had to be potentially available to RECON (security organizations or commercial vendors might not be willing to give their files to a RECON effort). 4) Data bases of less than 15,000 records were excluded.

A data analysis worksheet was prepared to reduce the documentation to a standardized form for each system studied in the survey. It was initially anticipated that once documentation was received from the various institutions, additional contact would be made via telephone or on-site visits. This proved to be unnecessary, as the submitted documentation was generally sufficient.
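The four Phase 1 selection criteria above amount to a simple filter. A sketch, applied to a data base described by a small dictionary whose keys are our own shorthand (the survey itself worked from submitted documentation, not structured data):

```python
# Sketch of the four Phase 1 selection criteria listed above.

MINIMUM_RECORDS = 15_000

def eligible_for_survey(db):
    return (db["includes_monographs"]            # 1) must include monograph records
            and not db["predominantly_lc_marc"]  # 2) mostly-MARC files excluded
            and db["available_to_recon"]         # 3) must be potentially available
            and db["record_count"] >= MINIMUM_RECORDS)  # 4) size floor

candidate = {"includes_monographs": True, "predominantly_lc_marc": False,
             "available_to_recon": True, "record_count": 60_000}
print(eligible_for_survey(candidate))  # True
```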
Since many of the formats submitted were complicated, errors could have been made in interpretation; however, this possibility was not considered important enough to affect the findings of this task. If necessary, additional information can be requested from the library systems at a later date. The analysis of the submitted documentation was difficult for the following reasons: 1) The amount of documentation ranged from extremely detailed to very sparse; 2) Neither the technical nor the bibliographic terminology was consistent for all organizations; 3) In some instances, the format descriptions were more detailed with respect to control and housekeeping data fields than bibliographic data fields.

The formats were ranked according to three broad categories: low potential, medium potential, and high potential. To arrive at a ranking, the data fields of each format were compared to the MARC II format. Comparison was made on the following basis: 1) present in both formats; 2) not present in local format and not capable of generation by format recognition algorithms; or 3) not present in local format but capable of generation by format recognition. The result of this analysis distributed the twenty-two institutions into the following ranked order: 1) Low potential-S; 2) Medium potential-S; 3) High potential-H. The figure for the number of low potential data bases is in addition to the eight out of the eleven originally rejected due to a small data base or very limited content in the record.

It is significant to note that although no attempt was made at an all-inclusive survey of machine readable data bases, the total number of records in machine readable form reported by the respondents amounted to approximately 3.7 million of all types. Of this figure, about 2.5 million represented monograph records.
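The field-by-field comparison basis can be sketched as a three-way classification. The sample field names are illustrative; the report does not state how the three counts were combined into the low/medium/high ranking:

```python
# Sketch of the comparison basis described above: each MARC II field is
# classed as (1) present in the local format, (2) absent and not generable
# by format recognition, or (3) absent but generable.

def classify_fields(local_fields, marc_fields, generable_fields):
    counts = {"present": 0, "not_generable": 0, "generable": 0}
    for field in marc_fields:
        if field in local_fields:
            counts["present"] += 1
        elif field in generable_fields:
            counts["generable"] += 1
        else:
            counts["not_generable"] += 1
    return counts

marc_ii = {"main entry", "title", "imprint", "collation", "subjects", "card number"}
local = {"title", "imprint", "card number"}
generable = {"collation", "subjects"}
print(classify_fields(local, marc_ii, generable))
```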
The Phase 1 study included procedures required to transform a record into a certified RECON record, thus outlining the areas requiring cost analysis to compare the economics of using existing files for a national bibliographic store, as opposed to original input. (Certification in this context means comparing the record of the local institution to the record in the LC Official Catalog and, if required, making the record consistent with the LC cataloging as well as upgrading it to the bibliographic completeness of the LC record. Input in this sense includes the editing of the record as well as the keying.) The results of the study, prior to any further analysis, seem to indicate that the next phases of Task 3 will concentrate on a very large data base with a high degree of compatibility with MARC II (high potential) and another data base with a format differing from MARC II both in level of explicit identification and in bibliographic completeness (medium potential). The first data base tests the most favorable situation; the latter a much less favorable situation.

The carry-on phases of Task 3 will include: 1) a determination of a cut-off point at which a particular data base would not be included in future studies (although the composition and the format of the records in the data base might fit the selection criteria, the number of records in the file might be insufficient to warrant the costs of the hardware/software for the conversion effort); 2) investigation of the hardware and software effort involved; and 3) determination of the costs of comparing the records with the LC Official Catalog and the resultant updating costs to bring the records up to the level of the records in the LC machine readable MARC/RECON data base.
ACKNOWLEDGMENTS

The authors wish to thank the staff members associated with the RECON Pilot Project in the MARC Development Office, the MARC Editorial Office, and the Technical Processes Research Office in the Library of Congress for their contributions to this report. The LC Photoduplication Service provided valuable assistance in certain phases of this project. Work on the RECON Pilot Project has continued to be supported by the Council on Library Resources and the U.S. Office of Education.

REFERENCES

1. Avram, Henriette D.: "The RECON Pilot Project: A Progress Report," Journal of Library Automation, 3 (June 1970), 102-114.
2. Avram, Henriette D.; Guiles, Kay D.; Maruyama, Lenore S.: "The RECON Pilot Project: A Progress Report, November 1969-April 1970," Journal of Library Automation, 3 (September 1970), 230-251.
3. Ibid., p. 235.
4. U.S. Library of Congress. Information Systems Office: Format Recognition Process for MARC Records: A Logical Design. (Chicago: ALA, 1970).
5. Avram, Henriette D.; Guiles, Kay D.; Maruyama, Lenore S.: Op. cit., p. 236.
6. Ibid.
7. Ibid., p. 237.
8. Ibid., p. 246.
9. Ibid., pp. 244-245.
10. Ibid., pp. 245-248.
11. Ibid., p. 248.

PERSONNEL ASPECTS OF LIBRARY AUTOMATION

David C. WEBER: Director of Libraries, Stanford University, Stanford, California

Personnel of an automation project is discussed in terms of talents needed in the design team, their qualifications and organization, the attitudes to be fostered, and the communication and documentation that is important for effective teamwork. Discussion is based on Stanford University's experience with Project BALLOTS and includes comments on some specific problems which have personnel importance and may be faced in major design efforts.

No operation is any better than its personnel.
The selection, encouragement, motivation and advancement of the individuals who operate libraries or library automation programs are the critical elements in the success of automation.

The following observations are based upon experience at Stanford University over the past eight years in applying data processing to libraries, and particularly in the large scale on-line experience of Project BALLOTS (an acronym standing for Bibliographic Automation of Large Library Operations using a Time Sharing System) supported by the U.S. Office of Education Bureau of Research during the past three years. The first part of the paper treats of five key personnel aspects: the automation team, their qualifications, their organization, the climate for effort, and documentation.

THE TEAM

Experts are required for the design of any computer system or system based on other sophisticated equipment, and they must emphatically form a "team" to be effective. The group may include a statistician and/or financial expert, a systems analyst, a systems designer, a systems programmer, a computer applications programmer, and a librarian. There may be several persons of each type, or one person may assume more than one responsibility. A few universities have librarians who have received training in systems analysis or in programming. The computer related professions are, however, demanding in themselves, and especially so when the programming language may change with each generation of computers. It is therefore usual for the head librarian to work with experts located in a systems office, an administrative data processing center, or a computation center. Except for the librarians, few if any of the experts may be on the library payroll, although in a very large project all may be financed from one or two accounts in the library. The team must cover the variety of functions encompassed in a formal system development process.
These functions are enumerated in detail in Stanford's project documentation (1), but a brief summary of typical functions performed by the team may indicate its diversity. There is the analysis of existing library operations, conceptual design of what is desired under an automated system, form and other output design, review of published literature and on-site analysis of selected efforts of a related nature; determination of machine configuration to support the system design, study of machine efficiency, and reliability of main frame plus peripheral equipment; choice of programming language, checkout and debugging of programs; cost effectiveness study, study of present manpower conversion, analysis of space requirements and equipment changes; staff training programs with manuals or computer aided instruction, system documentation and publicity; systems programming and applications programming, and project management. The total effort is collaborative; the system is designed by and with the users of it (i.e., library staff), not for them, and a tremendous contribution of local staff time is essential to success.

In many instances an institution will have some, but not all, of these resources and capabilities in adequate amount. If the amount is insufficient, the project director must determine how, through consultants or change of project course, a needed talent can be obtained or bypassed. The consequences of each mix of talent and change of strategy need assessment at frequent intervals; reassessment must be done with the full participation of the most senior library officers, including the Director of Libraries, as well as certain other key university officers. At Stanford, the group has for three years comprised diverse talent and worked reasonably well as a team.
The Library has recently delegated to the Director of the Computation Center the immediate project management of BALLOTS and SPIRES (2) (Stanford Public Information Retrieval System). Thus the current combined staff of twenty-three, which should reach a peak of twenty-five during 1971, reports to the BALLOTS-SPIRES Project Director. He in turn reports both to the Director of the Computation Center in a direct relationship and, under his second hat as Chief of the Library Automation Department, to the Assistant Director of Libraries for Bibliographic Operations in a dotted-line relationship. See Table 1 for Stanford's diversity of staff.

Table 1. Staff of Project BALLOTS-1970

Title or Classification           Age   Degree        Years of     Years on
                                                      Experience   Project
Project Director                  36    BS, CE        15           1
Special Assistant                 40    BS            12           2
Senior System Programmer          37    BA            8            1
System Programmer                 36    BS            14           3
Manager, Technical Development    29    BS            5            2
System Services Manager           30    BA            8            2
Librarian II/System Analyst       28    BA, MLS       3            3
Librarian/System Analyst          27    BA, MLS       2            <1
Project Documentation Editor      35    BA, MLS       3            1
Assistant                         26    MA            3            <1
System Analyst                    27    BA, MA        5            1
Junior System Analyst             25    BA            2            2
Programmer Trainee                26                  1            1
Programmer                        30    AA            7            3
Programmer                        26    BA            4            1
Programmer                        32    BS            11           <1
Programmer                        28    BS            7            <1
Research Assistant                27    BS, MS, PhD   4            3
Research Assistant                28    BA, LLB       8            2
Research Assistant                22    BA            3            2
Research Assistant                24    BA            4            2
Senior Secretary                  27                  8            1
Secretary                         19                  1            1

In development of library automation or of any sophisticated data processing system, it is essential to utilize librarians and other system users to the utmost in constructing the design. There is evidence that an effective program of library automation results from on-campus development: that is, using a local staff with librarians working on a daily basis with system analysts, programmers, and information scientists. Librarians most definitely should not try to do it all themselves; that would be sheer folly and would reveal a lamentable lack of appreciation of the highly complex skills of the other professionals working in the information sciences.

Team Qualifications

A qualified and enthusiastic team with strong backing from the library administration is the most important single element in a library's automation efforts. This requires that the library administrator have a grasp of the intricacies, although he himself will probably not understand all details involved in the system design. It also requires consideration of the desire for advancement of those in computer related professions and the various characteristics of their career patterns, including training, experience, job market, salary potentials, and mobility. The team will need to be selected with care and joint effort by computer staff and library staff management. People are needed who can teach and learn from one another. They must be tolerant, and interested in problems and details, for they will be changing traditional systems, altering people's work habits, and probably shaking their self-confidence. Security comes from knowing the facts and being able to work on the new system-to be in part responsible for one's own future. Team harmony of effort can be promoted by the so-called "bridge professional", or what the sociologists call a "marginal professional", meaning one who is able to assist those in one profession to converse and work effectively with those in another. At Stanford the librarian/analysts and the project editor have been effective in such a capacity. Those in the computer related professions, along with all on the library staff, need a sense of purpose, a sense of achievement, and recognition of their contributions by superiors as well as peers. The automation team needs a competent, experienced, technically knowledgeable, and tactful captain.
He must manage with an appreciation for communication, a knack for touching base with various groups having interests in the effort, the judgment to assign reasonable tasks, and the realism to set and achieve feasible time schedules-all within budget limitations. If the leader is less than this paragon, others in the organization must provide these qualities, all of which are required.

For at least another decade it is likely that the expert analyst and programmer will receive as high a salary as a librarian division head or assistant department chief, and a highly qualified systems designer may well earn more than any chief and perhaps as much as the assistant director of libraries. The scale is not irrational or unjust; it merely recognizes the scarcity of particular talents and their importance to major library automation programs. Designing an on-line library system requires a person of proven competence in on-line systems. A salary offer shaved here may well lead to regret.

Experience in Project BALLOTS points up problems with the selection of personnel who are not library trained. Some persons may be excellent in theoretical development but poor as managers, or some may play a "campus politics" game in order to move into senior positions in the computation center. Computer specialists have different career goals than do librarians, and rarely see the library as a permanent career commitment by which to promote library automation; rather their commitment is toward automation and computer applications, not a particular section of the university. A project manager also needs to take great care that research does not become an end in itself, a particular tendency of graduate students doing system development. Implementation must be the goal of library automation; automated operations must be sound, efficient, dependable, and economical.
Some of the special needs and working conditions for personnel in an automated program are outlined by Allen B. Veaner (3).

Team Organization

The organizational unit of an automation program may be first an office, then later a division when the group is larger and the function more permanent. The staff of a major project should have a departmental status equal to that of the acquisition or cataloging department. These latter two departments may be combined with an automation department under an assistant or associate director for technical processing. However, it is a rare individual who can give adequate attention to both the complexities of a major traditional library function and the direction of a major research and development program. Thus the initial organizational pattern may be one of separate but equal status, and at some point in time the units may be combined under one administrator. See Figure 1 for Stanford's new organization, adopted after three years of effort as it entered the production-engineered phase.

Units may best be combined when a research and development project begins to take on a significant amount of operational work. The reason is that the person in charge of the system development may need to oversee its implementation in order to assure that standards are followed for data preparation, coding, and the details of forms; and that feedback of experience for system improvement is secured. This combination of units should not be achieved when the project is still in the development stage, but it should also not wait until operations are well under way. Some anticipation is desirable. In the medium-scale program such combination of units may be possible after a year of operation, or the continuing production may be assumed by a traditional department and the systems office left free for further experimentation and development work.
Production is normally the responsibility of traditional departments from the day of implementation; the automation department responsibility is for instructing in system use, debugging of programs, and fine tuning of the system. In a large project striving toward an integrated system for all technical processes and public services, the transfer of responsibilities to traditional departments may come in no less than three years and perhaps as many as five years from the origin of the project because of constant developments in software and hardware, developments which library users cannot control but to which they must be responsive. An automation division or systems office would remain to take care of the refinements, maintenance, and development of further applications which are a result of the open-ended nature of a major automation program.

[Fig. 1. BALLOTS/SPIRES Organization-1970. The original organization chart shows the BALLOTS Principal Investigator (the Assistant Director of Libraries for Bibliographic Operations) reporting to the Director, Stanford University Libraries, and the SPIRES Principal Investigator (a Professor of the Department of Communication) under the Director, Stanford Computation Center, and the Vice President for Research. The SPIRES/BALLOTS Project Director, who is also Chief of the Library's Automation Department, oversees a System Services Manager with four library systems analysts (including two librarians) and two system programmers, six full-time applications programmers with four part-time graduate students, and two project documentation editors.]

THE CLIMATE FOR EFFORT

If the librarian is to work effectively with all of the previously mentioned experts, he must become more than superficially familiar with the equipment and with the software which instructs it. The librarian who carries the responsibility for major mechanized data processing programs will probably have taken at least half a dozen courses in various aspects of data processing in order to be able to state reasonable requirements, to comprehend economic and technical limitations, discuss file organization problems with the systems designer, and be sufficiently informed to help explain the new system to the library staff that will operate or make use of it. This type of specialized training will also be necessary for other team members who will work with different parts of the system. A number of librarians will need to take several short courses selected for their early relevance to the work at hand. Staff may take courses offered in the university computer science department, by the computation center, or by a local computer firm. Various clerical personnel will need briefing sessions, and it will be necessary to train some typists to serve as skilled terminal operators. Indeed, training will be needed on a continuing basis as more staff use the system; manuals are important unless self-instruction is built in.
These efforts are desirable because the employee needs assurance that his talents will not be outdated and that he will not be laid off as a consequence; rather that he will be retrained to the new system, shown that its function is not totally different from the previous one, and shown that it can actually serve him and lead to enhanced satisfaction and improved salary in his library employment. Computer based systems are far more likely to upgrade librarianship than to make it obsolete. They will enhance the profession by eliminating its routine drudgery, and thus more sharply identify its really professional nature. Don R. Swanson has commented on this point:

"Those librarians who have some kind of irrational antipathy toward mechanization per se (not just toward some engineers who have inappropriately oversold mechanization) I regard with some suspicion because I think they do not have sufficient respect for their profession. They may be afraid that librarianship is going to be exposed as being intellectually vacuous, which I don't think is so. Even in a completely mechanized library there would still be need for skilled reference librarians, bibliographers, catalogers, acquisitions specialists, administrators, and others. Those librarians in the future who regard mechanization, not with suspicion, but as a subject to be mastered will be those who will plan our future libraries and who will plan the things that machines are going to do. There will be no doubt of their professional status." (4)

Persons who have inhibitions about machine based systems will not be effective members of the design and development group. Those receptive to the change will benefit by having their job horizons enlarged and their prospects for improved salary and personnel classification enhanced. They will also share in the enthusiasm inspired by a bold new enterprise.
This is not to say that all library staff members will enjoy the exacting refinements of a machine system, just as not everyone has the talent to be a first-rate cataloger. It is not suited to everyone, and therefore the nature and purpose of the system must be clearly explained or demonstrated to anyone interested in such an assignment lest he accept it and then become disenchanted with the work.

The importance cannot be overstated of telling the entire library staff what is being done in regard to automation-and why. Disquieting rumors will abound in the absence of full and candid communication. Staff meetings should be held to review progress and outline next steps. Staff bulletins should publish summaries of the program and reports on its current status, information that can also be useful for faculty and staff outside the library. It must not be forgotten that the card catalog, the manual circulation system, and common order forms are familiar to all students and faculty. Most students will have seen these in their high school or public libraries, yet few will have seen a sophisticated machine system, and will often be skeptical about its efficiency and dependability. Faculty members may well wonder whether it is worth the cost. The effort to explain a program concisely but clearly to the library staff, students, faculty, and other university staff can be highly rewarding in understanding, and in moral and financial support.

Columbia University's experience with library automation has led them to state that "though the hardware and software programs associated with computer technology are formidable, they are not the only (and possibly not even the most important) problems in an automation effort. Two areas often overlooked or grossly underestimated are: 1) Creating an environment hospitable to change [and] especially important in this area is staff training and organization.
2) Describing and analyzing existing manual procedures sufficiently before attempting to design automated systems." (5)

DOCUMENTATION

The documentation of any new system is of singular importance. There is an oral tradition in most libraries; techniques of filing or searching are passed on by the supervisor, although libraries use staff manuals to formalize some of the techniques. However, in a system where absolute exactitude is demanded and where costs of system development are high, methodical recording of principles and procedures is obviously necessary. Especially vital are details of design and programming, for purposes of debugging, maintenance, and transfer to others.

CRITICAL PERSONNEL ISSUES

In an important statement from Massachusetts Institute of Technology's Project MAC in 1968, Professor F. J. Corbató outlines fifteen critical issues ranging from technical to managerial that affect the complexity and difficulty of constructing computer systems to serve multiple users (6). Seven of the fifteen have substantial personnel aspects; experience with Project BALLOTS provides the basis for the following comments on them.

1) "The first danger signal is when the designers of the system won't document. They don't want to be bothered trying to write out in words what they intend to do." Stanford's experience might not put this as a first critical issue, yet it is evident that without adequate and clear documentation the advancement of any research or development project is jeopardized. One expert, an invaluable member of the BALLOTS team, has full responsibility for this very important task. The position requires adequate clerical support; there are one-and-a-half assistants on the BALLOTS team.

2) "The second danger signal is when designers won't or can't implement.
What is referred to here is the lofty designer who sketches out on a blackboard one day his great ideas and then turns the job over to coders to finish many months later." Stanford has experienced some of the seductiveness of design innovations, especially on the part of graduate student research assistants. (Yet these assistants have done excellent work, and it is wished they were all full time on the project.) Without constant review and the use of PERT charts or other scheduling, shying away from implementation can be a real hazard. There will be dark days when the design team cannot surmount some intractable but crucial obstacle, and the project manager and staff librarians working with the team must be sympathetic, encouraging and patient.

3) "The next danger signal is when the design needs more than ten people. This doesn't mean that all the support people . . . must add up to no more than ten. But when the crucial kernel of the design team is more than ten people, a larger scale project is coming into being. This is the point where communication problems begin to develop." Stanford has flirted with that particular danger point. With acquisition and cataloging staff included, the BALLOTS design group is over ten and there is a communication problem, but one due not so much to size as to different backgrounds, vocabulary and scheduling of effort. The need for communication has been intensified because the Main Library is over half a mile from the Computation Center. It has required monthly staff meetings at early stages of design, and late stages of development, and at other times weekly staff meetings of the design group with the librarians who are setting the design criteria. Failure of constant and accurate communication in a research and development effort is a threat to its effective progress.
4) "If a project cannot be finished or made use of in one year, there is potential trouble, because the chances of underestimation are strong (and) a personnel turnover of roughly 20% per year must be assumed." Stanford's experience would bear this out. There was some time and cost underestimation. Turnover during 1969-70 was 17%; the year before it was 50%. Obviously documentation then becomes a more critical element in progress, and turnover may lead the librarian to feel that it is sometimes one step backwards for every two steps forward. Turnover may be minimized by generous salary increases, not only once a year but perhaps at other times also when merit deserves reward and as responsibilities increase. In contrast to customary operations, an automation design effort is constantly changing in nature and emphasis; this fact requires flexibility in personnel management and frequently deserves immediate response in salary and classification administration. To keep a qualified research team in an area of specialization in demand, one must pay the price. Let there be no misunderstanding: a good system of library automation cannot be finished in one year-nor in three; and it is costly.

5) "Another danger signal is when a system is not a line-of-sight system. This means that all of the terminals, consoles, or what-have-you are not in the same room within shouting distance of the operator." Any on-line system like BALLOTS cannot be line-of-sight. Terminals are brought to the users, not users to the terminals.
Since an on-line system requires total file recovery through use of log tapes, a facility not available on the prototype system, Stanford has experienced problems when the machine goes down; it takes time to rerun a program or mount a different disk pack; a file was once wiped out; and there are many other users of the central facility, which puts a premium on scheduling, advance notice, backup, and the like. If a design team is not housed in adjacent space, it will take more personnel or time than in a line-of-sight arrangement to achieve the same accomplishment. BALLOTS systems analysts were in the Main Library throughout the early design phases and the systems designers were near the Computation Center. Lack of line-of-sight was a sufficiently severe problem that all of the BALLOTS staff were collocated near the Computation Center last winter as the production engineered phase began. 6) "A somewhat related danger signal is when there are over ten system maintainers. Here I am talking about an on-line system that is actually being maintained on-line." At Stanford no more than one person has worked at one time on the program maintenance of Stanford's four-year-old computer produced Undergraduate Library book catalog. There have been some complexities due to staff changes, changes in the operating system, and an off-campus contract for reprogramming to third-generation equipment, but the problems have not resulted from the scale of the project. BALLOTS, on the other hand, is twenty to fifty times as large a system, and it is expected that two or three programmers will be needed to maintain the systems software and a similar number to maintain and make minor revisions to the applications software. 7) "The last danger signal is when the system requires the ability to permit combinations of sharing, privacy and control."
At Stanford, assignment of authority for file access has become a problem: who is permitted to update an acquisition record or authorize payment? The requirement for security also enters in any system which has salary data or other personnel information in files. A whole order of complexity is added. As in many of the above problems, complexity is accentuated when one is developing an on-line interactive system which serves multiple users. Security must be designed to the file level and, later, to the record or even data element level. Security requires control of access to a file, of writing in a file, and of updating data through three types of checks: access allowable from a given terminal, from the file password, or from an individual password. Such problems do not exist in off-line systems.

CONCLUSION

For successful automation of library operations, it is of fundamental importance to choose a task that is appropriate in timing, magnitude of effort, funding, and personnel. The BALLOTS experience demonstrates that one must devote great thought, care, and analysis to choosing the right automation project at the right time, and base it on having well qualified people to direct and accomplish the task. Given suitable conditions it will be a most exciting and fruitful endeavor. The system that works well is a thing of beauty, and people make it so.

REFERENCES

1. Stanford University: SPIRES/BALLOTS Project: Project Control Notebook, May 1970. Section 1.4, "System Development Process."
2. Parker, Edwin B.: SPIRES (Stanford Physics Information Retrieval System) 1969-70 Annual Report to the National Science Foundation (Stanford University: Institute for Communication Research, June 1970).
3. Veaner, Allen B.: "Major Decision Points in Library Automation," College & Research Libraries, 31 (September 1970), 299-312.
4. Swanson, Don R.: "Design Requirements for a Future Library."
In Markuson, Barbara Evans, ed.: Libraries and Automation (Washington: Library of Congress, 1964), p. 21.
5. Columbia University Libraries: Progress Report [to the National Science Foundation on Library Automation] for Jan. 1968-Dec. 1969 (NSF-GN-694), p. 14.
6. Corbató, Fernando J.: Sensitive Issues in the Design of Multi-Use Systems (Waltham, Massachusetts: Honeywell EDP Technology Center, Technical Symposium on Advances in Software Technology, February 1968). 17 pp. Project MAC Internal Memo MAC-M-383.

FILE SIZE AND THE COST OF PROCESSING MARC RECORDS

John P. KENNEDY: Data Processing Librarian, Georgia Institute of Technology, Atlanta, Georgia

Many systems being developed for utilizing MARC records in acquisitions and cataloging operations depend on the selection of records from a cumulative tape file. Analysis of cost data accumulated during two years' experience in using MARC records for the production of catalog cards at the Georgia Tech Library indicates that the ratio of titles selected to titles read from the cumulative file is the most significant determinant of cost. This implies that the number of passes of the file must be minimized and an effective formula for limiting the growth of the file must be developed in the design of an economical system.

Since 1963 several articles on computerized production of catalog cards have reported cost figures for card production. Fasana reported a cost per card of 9.9 cents at the Air Force Cambridge Research Laboratory (AFCRL) (1). Costs at the Yale Medical Library under the Columbia-Harvard-Yale computerized card production system varied from 8.8 cents to 9.8 cents per card (2). Under the Yale Bibliographic System, costs for card production at the Yale Medical Library have been 13.9 cents per card.
When the MARC MATE program is used to introduce MARC records into the Yale Bibliographic System, the cost of cards produced from the MARC records is 24.9 cents (3). Costs for computer assisted card production at the Philip Morris Research Library have been estimated at 18 cents per card (4). The cost per card for cards produced from MARC records at the Georgia Institute of Technology Library has been reported as 10 cents (5).

The focus of interest in these cost reports has been on a comparison of the costs of computer produced cards and manually produced cards. There is agreement in these reports that computer production can compete favorably in terms of cost with other methods of production. Less attention has been given to variations in the costs of computer produced cards. Since the systems for which costs have been reported vary in scope and objectives, equipment used, nature of input, rates for labor, and charges for computer time, it is not very useful to compare the costs from system to system. Variations in cost within one system are of greater interest, since it is easier to isolate the factors that result in the altered costs. The report on the Yale Bibliographic System shows that the introduction of MARC records into a system that was not designed for processing MARC records may produce substantially higher costs. Fasana reported that when a PDP-1 computer was used rather than the specially built Crossfiler in the AFCRL system, the cost per card was quadrupled. Kilgour discusses briefly the effects of three changes in the Columbia-Harvard-Yale system on the cost of cards produced. The 10-cent-per-card cost reported for Georgia Tech was the average cost during the preceding three-month period, January through March 1968.
During the three years in which catalog cards have been produced on the computer at Georgia Tech, costs have varied widely as procedures, personnel, file sizes and work loads have changed. The greatest variation has occurred in the cost of the manual steps in the system, mainly proofreading and making corrections. The greatly improved accuracy of the MARC II records has resulted in a reduction in the time required for proofreading and making corrections. The costs of supplies and equipment have been small and shown little variation. The cost of computer time has varied from 18 cents per title (just over 2 cents per card) to a high of 47 cents (6 cents per card), excluding the cost of the merge runs to maintain a cumulative file of MARC records. An analysis has been made to determine the factors responsible for this variation in computer costs, and techniques for reducing computer costs have been developed.

MATERIALS AND METHODS

The Price Gilbert Memorial Library at the Georgia Institute of Technology is a centralized scientific, technical and management collection of 612,000 volumes plus 500,000 microtext and other bibliographic units. In 1968/69 almost 20,000 titles representing about 35,000 volumes were cataloged for addition to the collection. The Library makes use of the UNIVAC 1108 and the Burroughs B5500 computing systems of the Institute's Rich Electronic Computing Center for its data processing needs. The work described here was performed on the B5500. The Georgia Tech B5500 configuration includes two central processing units, 32,000 forty-eight-bit words of core storage, 29 million characters of disc storage and 10 magnetic tape drives. Library programs are written in COBOL and are multi-processed with other programs in the standard work stream. The Library is billed $140 per hour for central processor time and $47 per hour for IO channel time.
The system for production of catalog cards from MARC I records which was in operation for over two years has been described previously (6). Statistics were recorded for all computer runs in the processing of 73 batches of MARC I titles. These statistics include number of records processed, file sizes, processor time, IO channel time, and cost for each run. The time and cost remained fairly constant for some runs. The cost of runs to produce the sorted catalog cards from edited MARC records ranged from 6 to 9 cents per title and averaged a little over 7 cents. The cost of runs to make changes and additions to the MARC records ranged from 1 to 5 cents per title and averaged 2 cents. The cost was usually about 1 cent per title for each time the correction program was run. It often had to be rerun several times before all records in the batch were correct. The Library's improved MARC II system avoids the cost of correction reruns by permitting independent corrections to any record in a direct access file rather than requiring records to be processed as a batch.

Most of the variation in the cost of computer time occurred in the run in which records were selected from the cumulative MARC file and the selected records were then converted to the B5500 character codes, reformatted and prooflisted. The cost of this run varied from a low of 10 cents per title selected to a high of 36 cents per title; the variation is primarily an effect of the increasing size of the cumulative MARC file and of variation in the number of titles selected in the run. As the MARC file increased in size, the cost of selecting a small number of titles increased dramatically. The precise relationship of file size and batch size to cost per title is not apparent, however, because the cost of character conversion, reformatting, and printing the prooflist was combined with the cost of selection in a single run.
An additional complication results from the effects of the other jobs being processed by the computer concurrently. For example, one batch which had to be rerun because the output tape was defective cost 23 cents per title the first time and 28 cents per title when rerun with a different job mix. Although the part of the run cost which can be attributed to passing the MARC file and the part attributable to code conversion, formatting and printing cannot be determined for a single run, this can be calculated from a number of runs with varying file sizes and batch sizes. It is assumed that variations in the time required for processing individual records of varying lengths and variations due to the mix of jobs run concurrently will average out and may be disregarded. Statistics for the selection runs include the number of records read from the cumulative MARC file, the number of records selected and processed, the processor time and IO channel time required for the run, and the cost of the run. Using the method of least squares, these statistics were used to calculate the average time and cost for each record read from the cumulative MARC file. Once these constants are calculated, it is possible to predict the cost per item or the total cost of a select run with any given file size and batch size. In order to determine the average cost for processing a selected record and the average cost for reading a record from the cumulative MARC file, it was postulated that

CT = (FS/BS) CR + CP

where
CT is the total cost per title
FS (File Size) is the number of records read from the cumulative MARC file
BS (Batch Size) is the number of records selected in the run
CR is the cost of reading a record from the cumulative MARC file
CP is the cost of processing a selected record

The method of least squares yields the following equations:

[Σ(FS/BS)²] CR + [Σ(FS/BS)] CP = Σ[(FS/BS) CT]

and

[Σ(FS/BS)] CR + N CP = Σ CT
Solving these equations for the data from the 73-batch sample gives the following values:

CP = $.073
CR = $.00068

Since charges for computer time are determined differently at other installations, the figures for processor time and IO channel time may be more useful to others than the cost figures. Using the same technique but substituting processor time for cost gives the following values:

Processor time per record read = .00646 seconds
Processor time for selected records = 1.339 seconds

Again, using the same technique but substituting IO channel time for cost gives the following values:

IO channel time per record read = .02048 seconds
IO channel time for selected records = .456 seconds

These values may be substituted in the formula CT = (FS/BS) CR + CP to find the cost or time per title for any batch and file size. For example, the per-title cost for selecting and processing a batch of 200 records from a MARC file of 40,000 records:

CT = (FS/BS) CR + CP
CT = (40000/200) ($.00068) + $.073
CT = $.21

It will cost about twenty-one cents per title. The total cost of the run can be predicted as follows:

C = (FS - BS) (CR) + (BS) (CP)
C = (40000 - 200) ($.00068) + (200) ($.073)
C = $41.66

RESULTS

Table 1 shows the predicted cost per title for various file sizes and batch sizes; it is based on the cost of the select run at Georgia Tech and ignores the cost of maintaining the MARC file. Since the Library of Congress cumulated MARC I records until a reel of tape was filled and provided a cumulative card number listing of the records on the reel, it was not essential to update the cumulative MARC file each week. The MARC II tapes issued from the MARC Distribution Service are not cumulative. Most libraries maintaining a cumulative file of MARC records will find it necessary to update this file each week. Weekly updating of the MARC file requires that all records on the file be not only read but also written on a new tape each week.
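The fitted constants and the worked example above can be sketched briefly in Python. This is an illustration only, not the Library's COBOL programs: the run statistics below are synthetic points generated from the fitted constants rather than the actual 73-batch sample, and the function name is ours.

```python
# Least-squares fit of CT = (FS/BS)*CR + CP via the normal equations,
# then the article's worked example. Run data here are SYNTHETIC,
# generated from the fitted constants, not the real 73-batch sample.
runs = [
    (20000, 200, 0.141),    # (file size FS, batch size BS, observed cost/title CT)
    (40000, 200, 0.209),
    (80000, 200, 0.345),
    (100000, 200, 0.413),
]
x = [fs / bs for fs, bs, _ in runs]     # FS/BS for each run
y = [ct for *_, ct in runs]             # observed cost per title
n = len(runs)
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(v * w for v, w in zip(x, y))
# Normal equations: [Σx²]CR + [Σx]CP = Σ(x·CT)  and  [Σx]CR + N·CP = ΣCT
cr = (n * sxy - sx * sy) / (n * sxx - sx * sx)
cp = (sy - cr * sx) / n
print(f"CR = ${cr:.5f}, CP = ${cp:.3f}")        # → CR = $0.00068, CP = $0.073

def cost_per_title(fs, bs):
    """Predicted select-run cost per title: CT = (FS/BS)*CR + CP."""
    return (fs / bs) * cr + cp

# The worked example: a batch of 200 from a 40,000-record file.
print(round(cost_per_title(40000, 200), 2))     # → 0.21
```

With real run statistics the fit would of course not be exact; the synthetic points here recover the article's constants precisely, which makes the arithmetic easy to verify.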
For most systems this will rapidly become the most expensive machine procedure in the entire system. Combining the selection function and any index production with the file update means that no additional passes of the file will be required, but the cost of writing the file each week must be added to the figures in Table 1. Statistics from the merge runs at Tech show that if the number of old MARC file records read, the number of records read from the weekly update tape, and the number of records written on the new MARC file are totaled, the average cost per IO operation for the merge runs ranged between $.00062 and $.00073 and averaged $.00068 for all merge runs. Since this is the same cost as that obtained for each record read from the cumulative file in the select runs, it seems reasonable to use this figure as the cost for reading or writing a MARC record in calculating the cost of combined merge-select runs.

Table 1. Relationship of File Size and Batch Size to Cost per Title

File                                    BATCH SIZE
Size      50     100     150     200     250     300     400     500     750    1000
10K    $ .209  $ .141  $ .118  $ .107  $ .100  $ .095  $ .090  $ .087  $ .082  $ .080
20K      .345    .209    .164    .141    .127    .118    .107    .100    .091    .087
30K      .481    .277    .209    .175    .155    .141    .124    .114    .100    .093
40K      .617    .345    .254    .209    .182    .164    .141    .127    .109    .100
50K      .753    .413    .300    .243    .209    .186    .158    .141    .118    .107
60K      .889    .481    .345    .277    .236    .209    .175    .155    .127    .114
70K     1.025    .549    .390    .311    .263    .232    .192    .168    .137    .121
80K     1.161    .617    .436    .345    .291    .254    .209    .182    .146    .127
90K     1.297    .685    .481    .379    .318    .277    .226    .194    .155    .134
100K    1.433    .753    .526    .413    .345    .300    .243    .209    .164    .141
110K    1.569    .821    .572    .447    .372    .322    .260    .223    .173    .148
120K    1.705    .889    .617    .481    .399    .345    .277    .236    .182    .155

Table 2. Relationship of File Size and Batch Size to Cost per Title - File Update and Record Selection Functions Combined in Same Program

Old                                     BATCH SIZE
File
Size      50     100     150     200     250     300     400     500     750    1000
10K    $ .378  $ .225  $ .175  $ .149  $ .134  $ .124  $ .111  $ .104  $ .093  $ .088
20K      .650    .361    .265    .217    .188    .169    .145    .131    .111    .102
30K      .922    .497    .356    .285    .243    .214    .179    .158    .130    .115
40K     1.194    .633    .447    .353    .297    .260    .213    .185    .148    .129
50K     1.466    .769    .537    .421    .352    .305    .247    .212    .166    .143
60K     1.738    .905    .628    .489    .406    .350    .281    .240    .184    .156
70K     2.010   1.041    .719    .557    .461    .396    .315    .267    .202    .170
80K     2.282   1.177    .809    .625    .515    .441    .349    .294    .220    .183
90K     2.554   1.313    .900    .693    .569    .486    .383    .321    .238    .197
100K    2.826   1.449    .991    .761    .624    .532    .417    .348    .257    .211
110K    3.098   1.585   1.081    .829    .678    .577    .451    .376    .275    .224
120K    3.370   1.721   1.172    .897    .732    .622    .485    .403    .293    .238

Table 2 shows the predicted costs per title for combined merge-select runs with varying file and batch sizes. The costs shown are based on the following equation:

CT = ((FSo + FSA + FSD + FSN) / BS) CIO + CP

where
CT is the cost per title
FSo is the file size for the old MARC file
FSA is the file size for the add records (1200)
FSD is the file size for the delete records (1200)
FSN is the file size for the new MARC file
BS (Batch Size) is the number of records selected in the run
CIO is the cost of reading or writing a record ($.00068)
CP is the cost of processing a selected record ($.073)

Calculations for this table are based on several assumptions: it is assumed that the file has reached a state of equilibrium in which the weekly additions and deletions are equal; it is also assumed that delete records have the same average length as other records and therefore take as long to read.
While it is unlikely that these assumptions will hold perfectly, the variations are not great enough to destroy the usefulness of the resulting figures as a guide.

DISCUSSION

The figures presented in the two tables have several implications for the design of systems based on the maintenance of a cumulative MARC file and the selection of records from that file. First, they show the importance of assuring that no unnecessary passes of the cumulative MARC file are made. Updating of the MARC file, production of indexes to it and selection of records from it should be accomplished in a single pass of the file. If it is desired to select records from the file more often than once a week, Table 1 provides a means of estimating the cost of the improved response time. If, for example, the file size is 100,000 and the weekly volume is 500, twice-a-week runs would increase the cost by 14 cents per title, or by $68.00 a week, for the select runs.

The figures presented in the two tables also show the critical importance of controlling the growth of the cumulative MARC file, especially for libraries with a relatively small volume of titles to be processed. Three characteristics of the acquisitions program of the library largely determine the possibilities for controlling the growth of this file. The number of titles acquired by the library determines the batch sizes for records to be selected from the file each week. The acquisition rate is also an important determinant of the growth rate of the cumulative file, provided that records which have been selected and used are then purged from the file. If the Library of Congress issues an average of 1200 titles per week and a library uses an average of 1000 titles a week from the file, the net annual growth of the cumulative file will be only slightly over 10,000 records.
On the other hand, a smaller library selecting an average of only 100 titles a week would have a net annual growth rate of about 57,000. If unused records were purged after one year, the file size would remain stable at these levels. Table 2 indicates that the cost per title for file maintenance and selection at these two libraries would be about 9 cents and 86 cents respectively.

A second characteristic of the acquisitions program of the library that is important in controlling the growth of the cumulative MARC file is the scope of the subject coverage attempted. If most of the monographs acquired fall within well defined subject classes, the probability of utilizing MARC records in many other subject classes may be low enough that these records need not be added to the cumulative MARC file at all. For a special library that attempts to collect everything published in a few well defined subject areas, it may be economical to maintain and utilize a limited MARC file even though the number of records selected is small. On the other hand, a small or medium-sized public library acquiring the same number of titles would probably find a much larger percentage of its records on the MARC file but still not be able to use the MARC tapes economically. Since the public library is likely to collect titles in most subject fields, the probabilities of utilizing records in different classes would not vary as widely and it would not be possible to limit the file to records in a few classes having a high probability of utility. Consequently, the per-item cost of MARC records would likely be too high for consideration. If it is determined that the probabilities of using MARC records vary widely for other characteristics, such as publisher, these characteristics may be used for restricting the records to be added to the cumulative file, thus limiting its size, but subject class seems to be the most promising characteristic for this purpose.
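The file-growth and cost figures for the two example libraries above can be checked with a short sketch. The function names are ours; the constants are the fitted values reported earlier, and the sketch follows the article's assumptions (about 1,200 LC records issued per week, selected records purged, unselected records purged after one year).

```python
# File growth and combined merge-select cost for the article's two
# example libraries. Sketch under the article's stated assumptions;
# function names are ours, not the Library's programs.
ISSUED_PER_WEEK = 1200
CIO, CP = 0.00068, 0.073    # fitted cost per IO operation and per selected record

def annual_growth(selected_per_week, weeks=52):
    """Net records added per year once selected records are purged."""
    return (ISSUED_PER_WEEK - selected_per_week) * weeks

def merge_select_cost(old_file, batch,
                      adds=ISSUED_PER_WEEK, deletes=ISSUED_PER_WEEK):
    """Per-title cost with update and selection combined (Table 2 formula)."""
    io_ops = old_file + adds + deletes + old_file   # read old+adds+deletes, write new
    return (io_ops / batch) * CIO + CP

large, small = annual_growth(1000), annual_growth(100)
print(large, small)                                # → 10400 57200
print(round(merge_select_cost(large, 1000), 2))    # → 0.09
print(round(merge_select_cost(small, 100), 2))     # → 0.87 (the article's "about 86 cents")
```

The stable file sizes come out at 10,400 and 57,200 records ("slightly over 10,000" and "about 57,000" in the text), and the per-title costs land at roughly 9 cents and 87 cents, matching the Table 2 figures cited.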
An analysis by subject class of all non-juvenile records in the MARC I file and of those records selected from it for use by the Georgia Tech Library has been used as the basis for restricting the growth of the cumulative file of MARC II records. Overall, 8,953 out of 46,486 records were utilized, 19.3% of the file. The percentage selected varied from more than 50% in some engineering classes to less than 1% in a few classes such as CS (Genealogy) and BV (Practical theology). Elimination of thirty classes in which fewer than 4% of the records were eventually used would have reduced the file by 7,710 records, or 16.6%. Only 184 of these records (2.4%) were eventually selected for use. Records for these thirty subject classes are not being added to the Georgia Tech file of MARC II records.

A third characteristic of the acquisitions program important in controlling the growth of the cumulative MARC file is the speed with which newly published monographs are acquired. If most monographs are acquired soon after publication, the probability of using a MARC record that has not been selected in the first few months after its receipt may be low. Unselected records may therefore be purged after a relatively short time and the file size thereby controlled. Use of the MARC tapes for book selection will help to increase the probability of records being selected during the first few months on the file. A system that uses the weekly MARC tapes for book selection and does not retain on the cumulative MARC file those records not selected for purchase might be quite economical. The frequency with which decisions are later made to acquire titles that were initially passed over, and the added cost for manual input of those records, would have to be considered in deciding on this policy.
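The selection percentages in the subject-class analysis above follow directly from the counts reported, and can be reproduced with a few lines:

```python
# Subject-class pruning figures from the Georgia Tech analysis of the
# MARC I file (the counts are those reported in the article).
total_records, records_used = 46486, 8953
pruned_records, pruned_later_used = 7710, 184   # the thirty classes under 4%

print(f"{records_used / total_records:.1%}")        # → 19.3% of the file used
print(f"{pruned_records / total_records:.1%}")      # → 16.6% of the file prunable
print(f"{pruned_later_used / pruned_records:.1%}")  # → 2.4% of pruned records later wanted
```

The trade-off is the one the article describes: dropping the thirty low-yield classes shrinks the file by about a sixth while forfeiting only 2.4% of the records that would eventually have been selected.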
An analysis has been made of the interval between the date records were added to the MARC file and the date on which they were selected for use by the Georgia Tech Library. Distributions by time intervals for each Library of Congress subject class were prepared. The distributions varied significantly for reasons that are not yet clear. Generally, it appeared that in those subject classes for which a smaller percentage of the titles available on the MARC file were acquired, they were acquired more rapidly. This seems to be advantageous for keeping the MARC file small. For those classes in which a large percentage of titles are selected, unselected records will be retained on the file for a long period, such as eighteen months. Use of a large percentage will mean that the number of unused records remaining on the file will be relatively small and they will have a high probability of selection over the extended period. For those classes in which a smaller percentage of titles are acquired, the unselected records will be retained on the file for a shorter period, such as six months. Since titles in these fields tend to be acquired more promptly, few potentially useful records will be lost by purging unselected records after a shorter interval.

Over the past year major changes have been made in acquisitions procedures in the Georgia Tech Library. A much larger proportion of monographs are now received on approval plans. The MARC distribution service now provides about twice as many records each week as were provided during the pilot project phase. The effects of these changes on the proportion of titles selected and the time required for acquiring titles in the various subject classes have not yet been determined. Continuous monitoring of the operation of the system for changes in these characteristics will be required for efficient operation.
The improved program for maintenance of the MARC II file and selection of records from it provides for designating subject classes which are not to be added to the file and for designating how long unselected records in other classes are to be retained on the file.

This study of variations in the computer costs of card production lends support to the decision to continue using COBOL as the primary language for the MARC II system being implemented on the UNIVAC 1108 rather than using assembly language. The inefficiency of COBOL for character-by-character code conversion and for manipulating variable length data had been a source of some concern. The cost of all processing of selected records, including code conversion, reformatting, prooflisting, making corrections, generating and formatting added entry records, and sorting and printing catalog cards, averaged only about 16 cents per title. A reduction of even 50% through the use of assembly language and increased effort directed to program efficiency would reduce costs by only about 8 cents per title, or 1 cent per card. These savings do not seem to justify the increased original programming costs and the likelihood of eventual costly reprogramming. On the other hand, the cost of selecting records from the MARC file varied from 3 cents per title to 29 cents per title. With the added cost of weekly maintenance of the MARC file and with more than twice as many MARC records being received, the costs of processing the cumulative MARC file might easily go much higher. By careful attention to controlling the growth of this file, significant savings in the cost of the system may be achieved.

CONCLUSION

Some librarians have assumed that as the scope of the MARC distribution service expands to include other languages and other types of materials, their problems of inputting current records will be solved. This analysis shows that the situation is not so simple.
Probably only a few of the largest general research libraries will be able to maintain complete MARC files for their individual use during the next few years, though reductions in computing costs may eventually change this prediction. Even medium-sized libraries such as Georgia Tech will not be able to use economically the foreign language materials when they are included in the MARC program. Some libraries which do not use a large enough proportion of the MARC records to make it economically practical to maintain a complete MARC file may be able to make economical use of MARC records by carefully controlling the retention of records on the cumulative file. Continuing analysis of the probabilities for selecting records of varying age and subject classes may be utilized in developing a formula for maintaining the file at near optimum size if the system provides for collection of the required statistics.

For libraries which cannot profitably use the MARC tapes, there is another prospect. Cooperative centers that do the processing for large library systems or for several systems will have the volume to justify maintenance of complete files. Certainly, a processing center serving all libraries of the University System of Georgia could economically maintain a more complete MARC file than Georgia Tech alone can justify. The development of cooperative processing programs in Ohio, New England, Oklahoma (7, 8, 9), and elsewhere indicates that some librarians are coming to this realization.

ACKNOWLEDGMENTS

Mrs. Julie Gwynn wrote most of the computer programs referred to in this paper. Her husband, Professor John Gwynn, gave valuable advice on the statistical techniques employed in analyzing the data.
The University of Toronto Library generously provided a copy of its MARC file, which included the date each record was added to the file, for use in analysis of the time lag between availability of the record and selection of it.

REFERENCES

1. Fasana, Paul J.: "Automating Cataloging Functions in Conventional Libraries," Library Resources and Technical Services, 7 (Fall 1963), 350-365.
2. Kilgour, Frederick G.: "Costs of Library Catalog Cards Produced by Computer," Journal of Library Automation, 1 (June 1968), 121-127.
3. Stone, Sandra F.: Yale Bibliographic System; Time and Cost Analysis at the Yale Medical Library (Unpublished document, New Haven: Yale University Library, 1969).
4. Murrill, Donald P.: "Production of Library Catalog Cards and Bulletin Using an IBM 1620 Computer and an IBM 870 Document Writing System," Journal of Library Automation, 1 (September 1968), 198-212.
5. Kennedy, John P.: "A Local MARC Project: The Georgia Tech Library." In University of Illinois, Graduate School of Library Science: Proceedings of the 1968 Clinic on Library Applications of Data Processing (Urbana: University of Illinois, 1969), pp. 199-215.
6. Ibid.
7. Kilgour, Frederick G.: "A Regional Network - Ohio College Library Center," Datamation, 16 (February 1970), 87-89.
8. Agenbroad, James E.; et al.: Systems Design and Pilot Operations of the New England State Universities. NELINET, New England Library Information Network. Progress Report, July 1, 1967-March 30, 1968 (Cambridge, Mass.: Inforonics, Inc., 1968). ED 026 078.
9. Bierman, Kenneth John; Blue, Betty Jean: "Processing of MARC Tapes for Cooperative Use," Journal of Library Automation, 3 (March 1970), 36-64.

SHAWNEE MISSION'S ON-LINE CATALOGING SYSTEM

Ellen Wasby MILLER: Library Systems Analyst, and B. J. HODGES: Senior Systems Analyst, Shawnee Mission Public Schools, Shawnee Mission, Kansas

An on-line cataloging pilot project for two elementary schools is discussed.
The system components are 2740 terminals, upper- and lower-case input, IBM's FASTER generalized software package, and the usual cards/labels output. Reasons for choosing FASTER, software and hardware features, operating procedures, system performance, and costs are detailed. Future expansion to cataloging 100,000 annual K-12 acquisitions, on-line circulation, retrospective conversion, and union book catalogs is set forth.

INTRODUCTION

The Shawnee Mission Public Schools, serving several affluent suburbs of the Kansas City metropolitan area, began automated library operations in 1968. As the school district's Computer Center was then equipped with a 1401 computer and tape/disk storage, the first library system was designed for batch ordering and cataloging. Later, a batch circulation system for three of the school district's fourteen secondary libraries was started. Library automation in that period was similar to that described by Scott (1) and Auld (2).

Two years saw a profound change in the Shawnee Mission School District. By unification, it had added 50 elementary schools and a new high school, making a total of 65 schools, all of which had libraries. At the school district's Computer Center, the configuration had passed through the 360/30 stage to a 360/40; 2314 disk packs were on order; and 2741 terminals, using IBM's Remote Access Computational System (RAX), had been installed at all five high schools for computer science courses. While the batch library system could handle the 28,000 items ordered and cataloged annually up to that point, it was impossible to justify using it for an estimated 100,000 annual acquisitions needed by 65 libraries. Computer time to process AUTOCODER programs on a mod/40 would be excessive; the librarians desired many improvements (upper- and lower-case I/O; longer fields; shortened time to process items; and more accurate data on cards and labels).
The need for data processing and library improvements resulted in rethinking of the approach to ordering, cataloging, and circulation. Naturally, on-line processing came to mind. IBM 2741s for computer science courses had given management and data processing staff some experience with a dedicated on-line system; the 360/40 and 2314 disks would support large files, indexed sequential file organization, and multiprogramming (simultaneous use of the CPU for terminal and batch jobs). The experiences of Stanford and the University of Chicago (3) and IBM (4) pointed out that on-line systems could be built for larger and more complex organizations than Shawnee Mission, where the collections are 95% English language and the system covers only books and audio-visual items. Cataloging is based on title-page information; tools used are Children's Catalog, Sears, N.U.C., A.A. Rules, and other standard works. Also very important was the fact that the Computer Center management wanted experience in multiprogramming prior to considering it for student scheduling, student records, payroll, and business functions.

A proposal was made to library and data processing management by the Library Systems Analyst in mid-December 1969 that on-line cataloging in multiprogramming mode be begun by mid-March 1970 for two elementary schools on a test basis. An improved batch order system using COBOL programs was also proposed. Finally, it was suggested that a carefully designed cataloging system could include fields to be used later for circulation control. The specific purposes of the on-line cataloging pilot project were 1) to test whether direct access to master disk files is an efficient, accurate, and economical method of creating and updating bibliographic and holdings data; and 2) to allow data processing management to ascertain if multiprogramming is feasible and practical at this time locally.
A search of library literature revealed no on-line systems for cataloging and circulation functions; rather, either circulation or order functions were real time. Moreover, truly on-line systems were rare; Columbia had designed a circulation system that could be used in that mode, but as of October 1968 was operating batch (5). Chicago's Book Processing System does input data on line, although ordering and cataloging functions are performed off-line (6). BELLREL is an on-line circulation system (7). Comparing the circumstances of the above institutions with that of the Shawnee Mission School District brought out one sterling difference: the latter had no grant money nor huge parent institution upon which to rely. Rather, it had a modest hardware-software configuration, a need to be operational within three months if the two test librarians were to see any output by the end of the school year, and a small team of data processors and librarians devoted to redesign and implementation.

METHODS AND MATERIALS

Having earlier seen demonstrations of the Kansas City, Missouri, Police Department's FASTER system, with its on-line access to constantly updated alphanumeric files, the senior systems analysts turned to IBM for further information. The police department's system was based on a software package developed in Alameda, California, for law enforcement. It was also available in a general form called FASTER (Filing and Source Data Entry Techniques for Easier Retrieval). The proven ability of this system to accept on-line data via 2740 terminals and to display it on 2260 CRTs, its ease of adaptation to user requirements, the quickness with which analysts and programmers had learned to use it at the police department, and a local, positive experience decided the issue. In mid-January 1970, FASTER was chosen as the software framework for on-line cataloging.
Software

FASTER has been programmed in modular form, with each module performing a particular task (8). Modules supporting functions that vary because of hardware must be coded by the user. This coding is done in macro form (brief program statements in a higher-level language which generate many machine instructions) and therefore is not a tedious task. One of the hardest, most complicated portions of implementing a teleprocessing system is programming the support from the CPU to the terminal; with FASTER, this took about a day. The macros use Basic Telecommunications Access Method (BTAM) support. With line support taking little time, the user may spend more effort on his own processing needs.

The user may have only a query or an update requirement; Shawnee Mission needed both. Because FASTER is a modular system, the user is permitted to describe each of his needs as a transaction. This transaction must be programmed as a TPD (Transaction Processing Description) using macros. Coding and listing time for a TPD will vary with the processing description. Those familiar with detailed programming will note that the programmer does not have to concern himself with I/O. The TPD will prepare the data for output, and the FASTER interface module will handle I/O. Some of the major functions supported by the macro language include: 1) retrieval of records from indexed sequential (ISAM) files, which are accessed only through hierarchic indexes; 2) modifications and additions to ISAM files; 3) data manipulation; 4) formatting of responses to selected terminals; 5) message switching; and 6) recording audit data on a logging file. FASTER under DOS requires fixed-length records; this has been modified under the OS version.
Retrieval from the ISAM files required for processing a given transaction may be performed in one of three ways: 1) retrieval of a unique record; 2) sequential retrieval of a specified number of records from a logical grouping; or 3) retrieval of a specified number of records from a logical grouping, in which the retrieved records represent the best qualified from the group based upon the user's selection criteria.

Hardware

Hardware supported by the FASTER system is as follows:

Machine configuration: IBM 360 mod/30, 40, or 50
Storage requirements: minimum DOS, 65K; minimum OS, 128K
Disks supported: 2311 or 2314
Logging file: disk or tape
Line control: BTAM with 2701, 2702, or 2703
Terminals: 2740, 1050, 2260 CRT

The system at the Shawnee Mission Computer Center:

IBM 360/40 DOS, 256K
Eight 2314 disks
Three 2401 tape drives
One 2702 line control
2740 and 2741 terminals

The system operates in three partitions. Partition F1 houses APL (A Programming Language) for student use with 2741 terminals. Partition F2 houses FASTER. Partition BG is used for batch jobs (both COBOL and AUTOCODER under CS monitor).

File organization and access

FASTER supports ISAM files only (as data sets), with the exception of the logging file; the logging device must be sequential. FASTER's support of disk files is accomplished by using the same software modules that AL and COBOL use in maintaining ISAM files. Therefore, standard programming languages may be used for creating files and data retrieval. Shawnee Mission chose COBOL as its main language and found it to complement the FASTER system.

Files

The batch library system was based on a 400-character record, repeated in its entirety for every copy in each library. This space consumption for redundant information was undesirable in a system with 65 collections, and therefore two basic files were designed.
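The three retrieval modes can be illustrated with a small model. FASTER itself is IBM system software driven by assembler macros; the Python sketch below (all names hypothetical) only mirrors the behaviour of the three modes against an indexed-sequential file, using a sorted key list in place of a real ISAM index.

```python
import bisect

class IsamFile:
    """Toy indexed-sequential file: records kept in key order with an index.
    Illustrative only; FASTER's actual ISAM support is IBM system code."""

    def __init__(self, records):
        # records: dict mapping key -> record; sorted keys act as the index
        self.keys = sorted(records)
        self.records = records

    def get_unique(self, key):
        """Mode 1: retrieve a unique record by key."""
        return self.records.get(key)

    def get_sequential(self, prefix, n):
        """Mode 2: retrieve up to n records from a logical grouping,
        modelled here as all keys sharing a prefix."""
        start = bisect.bisect_left(self.keys, prefix)
        out = []
        for k in self.keys[start:]:
            if not k.startswith(prefix) or len(out) == n:
                break
            out.append(self.records[k])
        return out

    def get_best_qualified(self, prefix, n, score):
        """Mode 3: retrieve the n best-qualified records from a grouping,
        ranked by a caller-supplied selection criterion."""
        group = self.get_sequential(prefix, len(self.keys))
        return sorted(group, key=score, reverse=True)[:n]
```

A caller might, for example, fetch one title by title number, scan a range of title numbers, or pull the few records in a range that best satisfy some criterion.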
The first is the disk title file, containing one record with bibliographic information for each unique title in the school district. Its fields include author, title, dates, subject headings, annotation, etc. (Table 1). Each record is 562 characters long.

Table 1. Main Fields Input by Operators

Record  Field                Length  Comments
Title   Form code            2       Distinguishes physical format
        Publication date     4
        Copyright date       4
        Author               35
        Title                50      May be continued in Annotation
        Annotation           105
        Publisher            30
        Edition              3
        Price                5
        Dewey number         8
        Cutter number        10
        Grade level          4
        Collation            40
        Series               35
        Language code        3       Use MARC language codes
        LC card number       14
        Subject heading 1    24      Use Sears headings
        Subject heading 2    46      Use Sears headings
        Subject heading 3    60      Use Sears headings
        Added entry          30      For name or title
Copy    Title number         7
        Number of copies     4
        Building code        5
        Funding code         2       If other than general funds
        Volume number        3       For volume or other sequence number
        Print instructions   16      Kept only until labels and cards printed

In distinction, the disk copy file contains a 56-character record for each copy of a title, comprised of fixed-length fields for building number, special funding, volume information, and circulation control. Copy and title records are linked through the title number.

The third file is the ISAM title index, comprised of records with a phonetic code and key for each title record. This file is called up by a terminal transaction containing a title; the incoming phonetic code for the title is matched with any equal ones on the index. For matches, bibliographic data is pulled from the title file and typed on the terminal. The main function of the title index is to determine duplicates. Tests on 45,711 title records showed that a 16-character phonetic code resulted in a maximum of 36 different titles having the same phonetic code.
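The three-file design can be sketched in miniature. The Python below is illustrative only: the production system used fixed-length ISAM records, and the exact consonant-to-digit mapping of the 16-character phonetic code is not given in the article, so the mapping used here is invented.

```python
# Hypothetical consonant-to-digit mapping; the article states only that the
# code is the first character of the title followed by numeric values for
# most consonants.
CONSONANT_VALUES = {
    'B': '1', 'P': '1', 'F': '2', 'V': '2', 'C': '3', 'K': '3', 'G': '4',
    'J': '4', 'D': '5', 'T': '5', 'L': '6', 'M': '7', 'N': '7', 'R': '8',
    'S': '9', 'Z': '9',
}

def phonetic_code(title, length=16):
    """First character of the title, then digit values for consonants."""
    title = title.upper()
    code = title[0]
    for ch in title[1:]:
        if ch in CONSONANT_VALUES:
            code += CONSONANT_VALUES[ch]
        if len(code) == length:
            break
    return code

title_file = {}   # title number -> bibliographic record (one per unique title)
copy_file = []    # one short record per physical copy, linked by title number
title_index = {}  # phonetic code -> title numbers, used for duplicate detection

def add_title(title_no, title, author):
    """Store a title record; return candidate duplicates for operator review."""
    code = phonetic_code(title)
    duplicates = list(title_index.get(code, []))
    title_file[title_no] = {"title": title, "author": author}
    title_index.setdefault(code, []).append(title_no)
    return duplicates

def add_copy(title_no, building):
    """Store one holdings record, linked to its title by title number."""
    copy_file.append({"title_no": title_no, "building": building})
```

Keying a second copy of an existing title then adds a 56-character-style copy record rather than repeating the full bibliographic record, which is the point of splitting the 400-character batch record into two files.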
The 16-character code chosen consists of the first character of the title followed by numeric values for most consonants.

The On-Line Cataloging System

Input

Recognizing that the pilot project might be expanded into a full-scale operation, librarians drew up procedures for entering data from either shelf list cards or new arrivals. Conversion from shelf lists required that cards be edited to eliminate confusing information and to add implicit data. For new acquisitions, most information needed by the terminal operator is annotated on the title page and its verso. A grid sheet to be slipped into the book contains subject headings, added entry, annotation, and some other fields. All of these practices were set forth in a user's manual (9), along with instructions on how to enter transactions into the disk files.

Limits on the input buffer permit a maximum of 120 characters to be transmitted by any transaction, which means that several transactions are required to add all cataloging and location data necessary to describe a new title. There are two sets of transactions. The LT series adds to and updates records in the title file; the LC series does the same for the copy file. For instance, entering a new title requires an LT01 transaction to start the record and assign a title number, one or more LT02's to complete cataloging data, and an LC01 for building assignment. Operators find the transactions easy to key and understand.

By category, LC and LT transactions set up new records, add on fields, update fields, delete or activate records, and query the contents of a specified record. These transactions are a simple, understandable, and powerful method of maintaining library files. Several transactions also add data to fields, thus saving the operator keying time. For instance, the Cutter number is automatically derived from the first three letters of the author's last name, unless specifically superseded by the operator.
Also, "F" is assigned to Dewey for all items unless replaced by another classification. Finally, a standard set of output, consisting of 1) two author cards, overprinted cards, and a copy card, and 2) one three-up label, is assumed when an LC01 transaction is input to show location. If other outputs are needed, the operator uses an LC05 transaction to specify them. There are also several instances of data being input in lower case (to save time and buffer space for a shift) and being edited on output to upper case. The result of all these program aids is that the operator knows she is keying only important data; highly invariable fields are input and edited by the FASTER programs, saving operator time.

Output

Two basic card formats were chosen. The unit card contains all cataloging information; the copy card shows a library's holdings of a given title (Figures 1 and 2). A unit card and copy card (giving all cataloging and holdings information) go into the school's shelf list; the usual set of main and added entry unit cards goes into the public catalog.

[Figures 1 and 2: sample unit and copy cards for an entry under "Gunther, John".]

Predicting Need for Multiple Copies / GRANT

[Fig. 1. Programme Logic: a flowchart. For each returned loan, add one to copies circulating, calculate the number of days on loan, and store the information in a table; when all records are processed, calculate the average length of loan and the standard deviation, and print the report.]

are copies in the Library. By providing an analysis of the present circulation profile of each book, the formula attempts to predict the number of copies of each title the Library would need to have in order to more adequately accommodate unsatisfied demand. The programme for performing the calculations is written in PL/1 and is run on an IBM 360/50 (Figure 1). The execution time for 140,000 circulation records (each time a book circulates, the data on its circulation is considered a single record) is 15 minutes.
The Historical Record File, the source of data for the programme, is incremented each time a book in circulation is returned. Figure 2 shows the format of this file. The file itself is a sequential file stored on magnetic tape, updated daily to include the previous day's circulation data. Entries are arranged in LC call number-accession number order.

Fig. 2. Format of Historical Record File

FIELD                               LENGTH   ACCUMULATIVE LENGTH
Card Type                           1        1
LC Call Number                      29       30
Author                              15       45
Accession Number                    6        51
Spare                               1        52
Card Sequence Number                6        58
Spare                               2        60
Borrower's ID Code                  1        61
Borrower's ID Number                6        67
Spare                               3        70
Action Code                         1        71
Due Date (MMDDYY) (Mo.-Day-Yr.)     6        77
Spare                               3        80
Indicator                           1        81
Date Charged Out (YYDDD) (Yr.-Day)  5        86
Date Returned (YYDDD) (Yr.-Day)     5        91

RESULTS

After the calculations described above have been performed for every title circulated during the academic year, a print-out of the results is produced (Figure 3). In order to limit paperwork, only those results under "Projected Need" which were ≥ 1.00 appear on the print-out; any results less than 1.00 were suppressed. The column labelled "Transactions" is simply the number of times the book was checked out and checked back in again. The column "Average Loan Period" is the a described in the formula above. And the column "Copies Circulated" is the number of books with the same classification number as listed in the left-hand column, but with different accession numbers, checked out during the year. This figure is not the number of copies of the book that the Library owns, which could, in some instances, be more copies than were actually circulated.

The column labelled "Projected Need" should, according to the calculations, indicate the number of copies of a title which could accommodate the demand for that title with 95% certainty.
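The fixed-length layout of Figure 2 makes each 91-character record easy to slice into named fields. The original programme was written in PL/1; the Python below is an illustrative sketch of reading that layout, with field names invented for readability.

```python
# (name, length) pairs taken from Figure 2; the lengths sum to 91.
FIELDS = [
    ("card_type", 1), ("lc_call_number", 29), ("author", 15),
    ("accession_number", 6), ("spare1", 1), ("card_sequence_number", 6),
    ("spare2", 2), ("borrower_id_code", 1), ("borrower_id_number", 6),
    ("spare3", 3), ("action_code", 1), ("due_date_mmddyy", 6),
    ("spare4", 3), ("indicator", 1), ("date_charged_yyddd", 5),
    ("date_returned_yyddd", 5),
]

def parse_record(line):
    """Slice one fixed-length Historical Record File entry into fields."""
    assert len(line) == 91, "records are 91 characters long"
    record, pos = {}, 0
    for name, length in FIELDS:
        record[name] = line[pos:pos + length].strip()
        pos += length
    return record
```

Because the accumulative lengths printed in Figure 2 agree with the running sum of the field lengths, the layout above can be checked field by field against the published format.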
In order to find out whether or not the Library should purchase more copies of a particular title, the number listed in this column is simply checked against the number of copies listed for this classification number in the official shelf list.

Fig. 3. Circulation History Analysis Report

CLASSIFICATION   AUTHOR           PROJECTED NEED   TRANS.   AVG LOAN PERIOD   COPIES CIRCUL.
AM101.C3488      CANADA-NATIONAL  3.61             37       10.45             17
B56.C6           COLLINS-JAMES-D  1.14             21        8.00              2
B65.B6           BODENHEIMER,E.   1.21             12       11.50              3
B65.R6           ROMMEN-HEINRICH  1.34              5       20.60              2
B67.B58          BLAKE-RALPH-M    2.00              4       36.75              2
B67.N22          NAGEL-ERNEST     2.34             23       11.39              3
B72.C63          COPLESTON-F.C.   2.39             27        9.18             10
B72.H5           GILSON-E.H.      1.64             26        9.03             14
B72.J6           JOAD-CYRIL-EDWI  2.84              8       21.75              2
B72.P3           PARKER-F.H.      2.48              4       41.00              2
B358.C57         PLATO            5.68             21       15.61              3
B358.J8          PLATO            2.00             38        8.07             10
B358.W7          PLATO            2.72              5       39.80              3
B377.A285        PLATO            3.65              8       35.37              2
B378.A2C6        PLATO            1.58              2       73.00              2
B381.A5T35       PLATO            1.04              3       36.33              3
B385.A6          ANDERSON-F-H     2.92             16       13.43              2
B395.B77         BRUMBAUGH-R-S    2.05             12       13.33              1
B395.C6          CROMBIE-I        3.02             17       12.41              2
B395.G67         GRUBE-GEORGE-M   5.13             30       10.30              5
B395.G78         GUARDINI,R.      2.04             17       12.23              4
B395.K6          KOYRE-ALEXANDRE  1.13              4       21.75              1
B395.L6          LODGE-RUPERT-C   1.88              3       51.33              1
B395.S53         SHOREY,PAUL      4.69             23       11.91              4
B398.T25         TAYLOR,A.E.      1.31             28        7.75              5
B398.E8H17       HALL,ROBERT-W.   2.99             11       16.72              2
B407.L8L9        LUTOSLAWSKI,W.   3.10              4       59.25              1
B505.M2          ARISTOTELES      2.88             17       12.00              7
B505.O3          OATES-W.J.       3.86              9       27.33              7
B528.Z413        ZELLER-EDUARD    1.39              6       33.00              2
B528.P751        POHLENZ-MAX      1.35              5       34.60              2
B667.S25         SAMBURSKY-SAM    1.36              5       42.40              1
B701.D4D6        DONDAINE,H.F.    1.03              2       69.00              1
B701.A4E5        PROCLUS-DIADOCH  1.11              2       72.50              1

For example, the book classified as B.72.J6 shows a "Projected Need" of 2.84. Therefore if the Library had three copies of this book, and the book's circulation pattern did not change significantly in the immediate future, then the Library would be able to fill 95% of the requests. The official shelf list, however, indicates that the Library only owns two copies of this title, suggesting that at least one more copy should be purchased to meet present demand. These calculations do not anticipate future demand on the book. Also, doubling the number of copies can never succeed in doubling circulation, a fact demonstrated by Leimkuhler (2). This print-out, therefore, can only serve as one guide to multiple-copy purchase.

PRECAUTIONS AND PITFALLS

In using the results of these computations as a guide to the purchase of multiple copies, the Librarian should be aware of several factors which may have distorted the results. One is that the student who checks out the only copy of a book and keeps checking it out all year, in lieu of buying his own copy, creates a false "demand" for the book. It may be that he is the only person in the University interested in it, and when he graduates this book may sit out its life on the shelves completely unused. However, since the Historical Record File contains the Borrower's ID number, it is possible to distinguish between an original loan and a renewal. The first time the Borrower's ID number appears on the book's circulation record indicates the original loan.
Each additional and consecutive time the same Borrower's ID number appears on the same circulation record indicates a renewal. Although the pilot project did not contain provisions for obviating this problem, it would have been simple enough to build into the programme a mechanism for suppressing the unwanted data.

A faculty member who assigns parts of books for students to read, but does not place the books on reserve, forces competition for them on the open shelves. This too creates a demand which may not exist after the professor leaves the University or stops teaching a particular course. The librarian should be aware of such possible short-lived demands that may never recur.

The circulation analysis programme was executed at the end of one academic year in order to provide the University of Windsor librarians with guidelines for purchase of multiple copies of books to be used in the next academic year. If it were known that a particular book receiving heavy use one year would not receive equally heavy use in the next (because, for example, the particular course requiring that book would no longer be taught; or the book would be placed on a "two-hour reserve" for the coming academic year; or the book circulated frequently in one year only because it was on the "best-seller list"), then it would be folly to purchase three or four additional copies of the book just because the computer print-out indicated that a number of additional copies were needed. Other factors, therefore, although not included in the input data, are certainly relevant in determining the need for multiple copies.

At the University of Windsor Library, a book that needs to be re-bound because of heavy use or mutilation is charged out to the Bindery Department. It then shows up on the Historical Record File, just as though it had been charged out.
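The renewal test described above, consecutive charges of the same book by the same borrower count as renewals of a single loan, is simple to express. This is an illustrative sketch, not the original PL/1 programme:

```python
def split_loans_and_renewals(circulation):
    """Separate original loans from renewals.

    circulation: list of (accession_number, borrower_id) pairs in date
    order, as they would be read from the Historical Record File.
    A charge is a renewal when the same borrower made the immediately
    preceding charge of the same book.
    """
    last_borrower = {}   # accession number -> borrower on the previous charge
    originals, renewals = [], []
    for accession, borrower in circulation:
        if last_borrower.get(accession) == borrower:
            renewals.append((accession, borrower))
        else:
            originals.append((accession, borrower))
        last_borrower[accession] = borrower
    return originals, renewals
```

Feeding only the original loans into the demand calculation would suppress the false "demand" created by one student renewing the same copy all year.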
But since the "Borrower's ID number" for books charged to the Bindery Department consists of all zeroes, it would be simple enough to identify and suppress these particular records as unwanted data.

BY-PRODUCTS

In addition to providing a list of books to be considered for duplication, the Historical Record File upon analysis revealed several other interesting facts about the University Library's circulation. Most noteworthy is the fact that, although there were more than 200,000 circulating books sitting on the open shelves at the time of this pilot project, only 40,205 different titles circulated, for a total of 134,276 times. Assuming there were only 100,000 different titles among the 200,000 books, this would mean that nearly 60% of the collection was probably not used by the students. Of the 40,205 different works which did circulate, the calculations indicated that only 3,257 titles required one or more copies in order to fill 95% of the requests. Of this latter number, only 570 titles were in need of duplication. (That is to say, the number of copies listed under projected need exceeded the number of copies actually owned by the Library as indicated by the shelf list.) A random sample comprising one-third of these 570 titles was checked to see whether or not the books were in print. Indications were that 38% of the titles in need of duplication were no longer in print.

CONCLUSIONS

A close examination of the 570 titles apparently in need of duplication reveals that, with very few exceptions, students are apparently checking out only books that are curriculum oriented in the most narrow sense, i.e., books which they need to use in writing term papers. Nevertheless, one can appreciate the fact that these books are in demand by the student, and if the Library is to be responsive to users' demands on its facilities, it will need to spend part of the book budget each year purchasing multiple copies of the most heavily used books.
Unfortunately, even with these good intentions and the sophisticated assistance of the computer, students' demands for books will still be frustrated (at least one out of three times) because books which need to be duplicated are no longer in print.

PROGRAMME

A print-out copy of the circulation analysis programme described above is available from Mrs. Jean Griffiths, Computer Centre, University of Windsor, Windsor, Ontario, Canada.

ACKNOWLEDGMENTS

The initial impetus and continuous guidance for this project was provided by Albert V. Mate, Assistant Librarian for Public Services at the University of Windsor. Dr. Martin Basic, Faculty of Business Administration, acted as consultant. Systems analyst was Mrs. Jean Griffiths, and programmer was Mrs. Lillian Jin, both at the University Computer Centre.

REFERENCES

1. Leffler, William L.: "A Statistical Method for Circulation Analysis," College and Research Libraries, 25 (1964), 488-490.
2. Leimkuhler, Ferdinand F.: "Systems Analysis in University Libraries," College and Research Libraries, 27 (1966), 13-18.

ENTRY/TITLE COMPRESSION CODE ACCESS TO MACHINE READABLE BIBLIOGRAPHIC FILES

William L. NEWMAN: Systems Analyst and Programmer, and Edwin J. BUCHINSKI: Assistant to the Librarian, Systems and Planning, University of Saskatchewan, Saskatoon.

An entry/title compression code is proposed which will fulfill the following requirements at the Library, University of Saskatchewan: 1) entry/title access to MARC tapes; 2) entry/title access to the acquisitions and cataloguing in-process file; and 3) entry/title duplicate order edit within the acquisitions and cataloguing in-process file. The study which produced the code and applications for the code are discussed.
INTRODUCTION

The determination and design of access points, or keys, to machine readable bibliographic files is a major problem faced by libraries planning computer assisted processing. Alphabetic keys, i.e., truncations of title and/or author variable fields, are inadequate, since minor differences in spelling, punctuation, or spacing between master key and request key cause difficulties in accessing records. Numeric keys, such as Library of Congress card numbers, ISBN, purchase order numbers, etc., are therefore usually employed for searching machine readable library files. More sophisticated means must be developed in order to maximize the usefulness of these files, since a searcher, even with book in hand, may not be able to provide the numeric key necessary to obtain the book's machine readable data.

This problem may be solved through the use of compression codes generated from author/title or other bibliographic information. Studies of compression codes and their performance have been reported by Ruecking (1), Kilgour (2), and the University of Chicago (3). This approach has been endorsed by the Library of Congress (4) in the RECON study. Studies at the Library, University of Saskatchewan, were initiated with the hope of producing a compression code that would provide machine duplicate order edit in the acquisitions and cataloguing in-process file, and retrieve entries, using unverified or verified bibliographic information as input, either from a partially unverified file, such as the acquisitions and cataloguing in-process file, or from an authoritative machine readable data base, such as MARC II. In addition, the desired code would have to minimize the effect of errors in punctuation and spelling in order to achieve a high retrieval percentage, yet produce a low volume of duplicate codes for dissimilar works.
CONSTRUCTION OF THE DATA BASE

Since June 1969, the MARC data base has been used at the Library to generate unit cards, which have been used as source data for unit card masters in the cataloguing department. Approximately 300 of these were drawn at random. At the same time the original order request forms for these items were searched. Of the 300 items, 254 requisition forms were found in the manually maintained acquisitions in-process file. The LC card numbers were used to retrieve the corresponding MARC records from the Library's history MARC tape. An additional 4,128 MARC records were placed on the same tape as the 254 MARC records for which order request information existed. The LC card numbers, order entry, title, and, if present, date of publication were keypunched from the 254 requisition forms. This bibliographic information formed the data base from which search codes were produced for the acquisitions department records.

CODE GENERATION

A computer program performed the following modifications on all input data prior to generating the actual compression codes. First, all the lower-case alphabetics were converted to upper-case alphabetics. Then all punctuation was eliminated from the title field except for periods and apostrophes within a word. A word compaction routine then eliminated periods from within abbreviations and apostrophes from within words. The entries from the 4,382 MARC records and the 254 requisition forms were categorized according to personal name, and corporate or conference name. The first comma delimited the portion of the personal name to be used in the compression routine. Spaces, diacritics, periods, apostrophes, and hyphens were all eliminated from the personal name.

The first two codes used in the project were labelled, imaginatively, Code Type 1 and Code Type 2, where Code Type 1 was a slight modification of the code developed by Frederick H. Ruecking (1). Code Type 2 was
based on a modified University of Chicago Experimental Search Code (3), incorporating ideas from some of Ruecking's studies.

CODE TYPE 1

Title Compression (16 characters): See Ruecking (1) for the rules which were used to construct the four-character compressions.

Entry Compression (12 characters): Three four-character compressions were used for corporate or conference names instead of Ruecking's four. One four-character compression was produced for personal names.

Date of Publication (3 characters): If the year of publication was available, the last three digits were used; otherwise, the date was left blank.

The total length of Code Type 1 is 31 characters.

CODE TYPE 2

Title Compression (6 characters):
1) "A", "an", "the", "and", "by", "if", "in", "of", "on", "to" were deleted from the title.
2) The first word containing two consonants was located and the first two consonants appearing in the word were used for the search code.
3) Step 2 was repeated with a second and third word of the short title, whenever these were available.
4) If three words with two consonants were not available, the balance of the six characters needed for the code was supplied by those characters immediately after the last character used (except for blanks).

Entry Compression (6 characters):
a) Personal name.
1) Only the surname was used, or the forename if there was no surname.
2) If the name had six or fewer characters, the entire name was used. Otherwise, vowels were deleted from the name (working backwards on the name) until the six-character compression was formed, or the second consonant was located.
3) If the six-character compression was not formed by step 2, then the first four characters and the last two characters were used for the six-character compression.
b) Corporate and conference entries. The rules for title compression to form the six-character code were followed.
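The Code Type 2 rules are concrete enough to sketch directly. The original code-generation programs were written in PL/I; the Python below is an illustrative reimplementation, with rule 4 of the title compression and the vowel-deletion stopping condition simplified (the padding source and the "second consonant located" test are read loosely here).

```python
import re

STOPWORDS = {"A", "AN", "THE", "AND", "BY", "IF", "IN", "OF", "ON", "TO"}
VOWELS = set("AEIOU")

def _normalize(text):
    """Upper-case and strip punctuation, as the code-generation program did."""
    return re.sub(r"[^A-Z0-9 ]", "", text.upper())

def _first_two_consonants(word):
    cons = [c for c in word if c.isalpha() and c not in VOWELS]
    return "".join(cons[:2])

def title_compression(title):
    """Two consonants from each of the first three significant words."""
    words = [w for w in _normalize(title).split() if w not in STOPWORDS]
    code = ""
    for w in words:
        pair = _first_two_consonants(w)
        if len(pair) == 2:
            code += pair
        if len(code) == 6:
            break
    if len(code) < 6:                      # rule 4, simplified reading
        code += "".join(words)[:6 - len(code)]
    return code[:6]

def personal_name_compression(name):
    """Entry compression for personal names (surname before the comma)."""
    surname = _normalize(name.split(",")[0]).replace(" ", "")
    if len(surname) <= 6:
        return surname
    chars = list(surname)
    for i in range(len(chars) - 1, -1, -1):  # delete vowels from the right
        if len(chars) == 6:
            break
        if chars[i] in VOWELS:
            del chars[i]
    if len(chars) > 6:                     # fall back: first four + last two
        chars = chars[:4] + chars[-2:]
    return "".join(chars)

def code_type_2(title, entry, pub_date=None, personal=True):
    """Full code: 6-char title + 6-char entry + last 3 digits of the date."""
    entry_code = (personal_name_compression(entry) if personal
                  else title_compression(entry))
    date_code = str(pub_date)[-3:] if pub_date else "   "
    return title_compression(title) + entry_code.ljust(6) + date_code
```

Run against the article's worked example, this sketch reproduces the published compressions, e.g. title_compression("Factors in the Transfer of Technology") gives "FCTRTC" and personal_name_compression("Marquis, Donald George") gives "MARQUS".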
Entry/Title Compression Code / NEWMAN and BUCHINSKI 75

Date of Publication (3 characters)
The last three digits of the date of publication, as in Code Type 1, were used.

In either of the codes, if the title was the main entry, a code was generated with the entry field blank.

Examples of Code Generation
Title: Factors in the Transfer of Technology.
Entries: 1) M.I.T. Conference on the Human Factor in the Transfer of Technology, Endicott House, 1966. 2) Gruber, William H. 3) Marquis, Donald George 4) Massachusetts Institute of Technology
Date of publication: 1969

Code Type 1 compressions (␢ = blank; the second line of each pair shows the code with the date field blank):
1) FACTTRSFTCHN␢␢␢␢MIT␢COFRHUM␢969
   FACTTRSFTCHN␢␢␢␢MIT␢COFRHUM␢␢␢␢
2) FACTTRSFTCHN␢␢␢␢GRBR␢␢␢␢␢␢␢␢969
   FACTTRSFTCHN␢␢␢␢GRBR␢␢␢␢␢␢␢␢␢␢␢
3) FACTTRSFTCHN␢␢␢␢MAQS␢␢␢␢␢␢␢␢969
   FACTTRSFTCHN␢␢␢␢MAQS␢␢␢␢␢␢␢␢␢␢␢
4) FACTTRSFTCHN␢␢␢␢MATTINTTTCHN969
   FACTTRSFTCHN␢␢␢␢MATTINTTTCHN␢␢␢

Code Type 2 compressions:
1) FCTRTCMTCNHM969
   FCTRTCMTCNHM␢␢␢
2) FCTRTCGRUBER969
   FCTRTCGRUBER␢␢␢
3) FCTRTCMARQUS969
   FCTRTCMARQUS␢␢␢
4) FCTRTCMSNSTC969
   FCTRTCMSNSTC␢␢␢

PROCEDURE AND RESULTS
The two types of codes were generated from the 4,382 MARC records using publication date, short title, main entry, and added entries. Another program was written to generate codes from the acquisitions department data on cards and to write them on a separate tape using publication date if available, entry, and the first four significant words of the title and/or the words of the title up to the first punctuation mark. The two tapes containing codes were sorted in ascending code sequence, then compared. If the code generated from the acquisitions data, hereafter called the unverified code, was exactly the same as the code generated from the MARC tape, hereafter called the verified code, the codes and corresponding LC card numbers were printed as a hit. The program then checked the LC card
numbers corresponding to the identical codes. If the LC card numbers were the same, a retrieval was recorded; otherwise, the matching codes were considered a false drop. The program also checked and printed duplicates existing within the verified codes and within the unverified codes. Since the code formation programs involved string manipulation, they were written in PL/I, while the comparison program was written in COBOL. The programs were run on the IBM S/360 model 50 installed at the University of Saskatchewan Computation Centre. Table 1 and Table 2 give the results. The following is a description of the error types used in evaluating non-retrieval:

A-The unverified entry had only a remote relationship to the verified entry. No retrieval technique would have produced a match.
B-The unverified entry was misspelled.
C-The unverified title had only a remote relationship to the verified title.
D-The unverified title contained misspelled word(s).
E-Only the unverified date of publication was incorrect.

As an immediate consequence of the analysis of Tables 1 and 2, the publication date was eliminated from the codes and the comparison program rerun, producing the results given in Table 3.

Table 1. Code Performance

              Retrievals   False Drops   Percent Retrieval
Code Type 1      200            0             78.74
Code Type 2      206            0             81.10

Table 2. Non-Retrieval Analysis

              Number of Non-Retrievals
Error Type    Code Type 1    Code Type 2
A                  9               9
B                 10               7
C                  8               8
D                  7               4
E                 20              20

Table 3. Code Performance

               Retrievals   False Drops   Percent Retrieval
Code Type 1A      220            0             86.61
Code Type 2A      226            0             88.98

No duplicate codes existed within the unverified code tape. From the 4,382 MARC records, 6,828 codes were produced for each of Code Type 1A and Code Type 2A. Works having the same author and title, but different imprint, were not considered duplicates even though the program listed them as such.
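The tape-comparison step lends itself to a short sketch. The following is our illustration of the hit/retrieval/false-drop bookkeeping just described, not the original COBOL program; it assumes each tape record is a (code, LC card number) pair and that both lists arrive already sorted in ascending code sequence.

```python
def compare_tapes(verified, unverified):
    """Merge two code-sorted tapes; matching codes are hits, and a hit is
    a retrieval when the LC card numbers agree, otherwise a false drop."""
    retrievals = false_drops = 0
    i = j = 0
    while i < len(verified) and j < len(unverified):
        (v_code, v_lc), (u_code, u_lc) = verified[i], unverified[j]
        if v_code < u_code:
            i += 1
        elif v_code > u_code:
            j += 1
        else:
            if v_lc == u_lc:
                retrievals += 1
            else:
                false_drops += 1
            j += 1   # one unverified request, one verdict
    return retrievals, false_drops
```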
Seven duplicates, one triplicate and one quadruplicate occurred in Code Type 1A; and eight duplicates, two triplicates and two quadruplicates in Code Type 2A. Government publications were responsible for all but one of the duplicate codes.

CODE TYPE 2B
A graph of the number of duplicate codes vs. the number of source records was drawn for Code Type 1A and Code Type 2A (Fig. 1). As a result of this graph Code Type 2B was proposed. This code employed the same rules for construction as Code Type 2A, except that four significant words from the title and four significant words from corporate or conference entries were used to generate the compression. The total length of Code Type 2B is thus sixteen characters. Six duplicates, one triplicate and one quadruplicate appeared when the comparison program was run using Code Type 2B. Figure 1 is a graph of the result.

Fig. 1. Number of Duplicates vs Number of Source Records for Code Types 1A, 2A and 2B.

The performance of Code Type 2B is summarized in Tables 4 and 5.

Table 4. Code Performance

               Retrievals   False Drops   Percent Retrieval
Code Type 2B      223            0             87.80

Table 5. Non-Retrieval Analysis

              Number of Non-Retrievals
Error Type     Code Type 2B
A                   9
B                  10
C                   8
D                   4

APPLICATIONS
MARC Tapes
A MARC code tape was recently created and is being maintained at the University of Saskatchewan, as flowcharted in Figure 2. Each record on the tape consists of a compression code and an LC card number. Approximately 100,000 entry/title keys, plus series statement and SBN keys, have been created from the 65,000 records on the current MARC history tape. Figure 3 illustrates how these access points are used to provide unit card printouts. Figure 4 shows a sample output from the matching step in Figure 3.
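The duplicate, triplicate and quadruplicate counts reported above amount to tallying how many codes occur at each multiplicity. A minimal sketch of that bookkeeping (our illustration, not the original program):

```python
from collections import Counter

def duplicate_report(codes):
    """Return {multiplicity: number_of_codes}, e.g. {2: 7, 3: 1, 4: 1}
    for seven duplicates, one triplicate and one quadruplicate."""
    occurrences = Counter(codes)                   # code -> times produced
    by_multiplicity = Counter(occurrences.values())
    return {m: n for m, n in by_multiplicity.items() if m > 1}
```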
This printout indicates the results of the search, and serves as a link between the request and the catalog card printed from the MARC tape. In the printout, entry/title requests that have found more than one LC card number do not necessarily indicate a false drop. So far, these multiple finds have resulted from the same publications appearing on MARC with different imprints. It is a simple matter to select the catalog card with the appropriate imprint. The discrepancy in Table 6 between MARC records found and titles verified is due to the above, and to multiple hits on a single record when requests for that record were submitted in more than one form, i.e. SBN and author/title.

Table 6 presents a summary of the results of submitting unverified requests over a four-week period against the MARC code tape. During that time, 563 English language monographs with potential 1969 and 1970 imprints were searched. Desired MARC records were found for 184 titles, or 32.7% of these requests. The source data for the requests was supplied from title-pages and order recommendations. This data was not verified because the compression code access technique partially solves the problem of non-retrieval due to human errors in the submission.

Fig. 2. MARC Tape Processing.

Fig. 3. MARC Access Programs.

Table 6.
MARC Retrieval

                      Author/   Corporate      Title    SBN   Series   Total
                       Title   Author/Title
Number of Requests      546         29          130     139     11      855
MARC Records Found      173          2           11      36     18      240
Titles Verified         148          2            6      20      8      184
False Drops               0          0            2       0      6        8

Fig. 4. Results of Entry/Title Search for MARC Unit Card Printouts.

Manual searching of the NUC catalogs was employed to verify titles that could not be located on the MARC tape. Ten titles were found with MARC notations after failing to be retrieved by compression code matching. Type A and type C errors were primarily responsible for this non-retrieval.
However, two of these titles could not be retrieved from the MARC tape following manual verification, since the verified entries in the NUC preceded their counterparts on the MARC tape. Thus the performance of the compression codes can be evaluated as 184 of a possible 192 hits, or a 95.8% retrieval rate.

During the four-week period the keypunchers formulated 52% more entry/title requests than there were titles for verification. This is due mainly to the need for submitting more than one author/title request whenever the portion of the title which comprises the short title is in doubt, since the code is formulated from the short title only. Additional experience should decrease the number of redundant requests.

Only 8 false drops have been received in the above submissions. Retrieval of series entries is likely to engender the greatest number of false drops because series statements are treated as titles in the code generation procedure.

Acquisitions and Cataloguing
During the past two years, the Technical Services Department at the Library and the Computation Centre have designed, and are currently testing, TESA I (Technical Services Automation-Phase I), an automated acquisition and cataloguing system (5), the primary objectives of which were to pursue a total library system concept and to provide for conversion from a batch system to an on-line operation when sufficient computer facilities become available. At the same time that work proceeded towards these objectives, status codes and receiving reports were employed as used in Washington State University's LOLA system (6), (7). However, MARC tapes and compression codes comprise an integral part of the system. If a MARC record can be located before an order is entered, a tremendous amount of keying is saved.
One 64-character in-process transaction will supply the ordering information and transfer the bibliographic data from the current MARC history tape to the direct access acquisitions and cataloguing in-process file (IBM's Basic Direct Access Method). Minimal cataloguing updates are necessary before catalog card sets can be produced. Entry/title access ensures that only a small percentage of needed MARC records will slip through TESA I's fingers at order initiation time.

Another code application, as illustrated in Figure 5, will exploit the fact that the same code construction rules are used in the MARC system as in TESA I. Items requiring bibliographic information will be flagged in the in-process file. When a new MARC tape arrives, the in-process code file (IBM's Indexed Sequential Access Method) will be automatically matched with the MARC codes created from the new weekly tape. A sample printout from these matches is provided in Figure 6. After verifying which MARC records are needed, MARC bibliographic information will be transferred to the appropriate in-process records.

Each record in the direct-access ISAM compression code file consists of a compression code (or SBN or LC card number) and the key (purchase order number) to the corresponding in-process record. A threaded list structure exists within the in-process file to handle the possibility of one code accessing several items. Thus an in-process record may be directly accessed by entry/title, series statement, SBN, LC card number or purchase order number.

A fast edit routine built into the direct-access write detects whether or not the compression code about to be written is a duplicate of a code already in the file. If the code is unique the code record is written on disc and a single item list is created within the corresponding in-process record. If the code is not unique, the code record cannot be written.
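The fast-edit duplicate check and the threaded list can be pictured with an in-memory stand-in for the ISAM code file. This is our sketch of the behaviour described, not TESA I code; the class name and dict-based storage are illustrative only.

```python
class CodeFile:
    """Toy stand-in for the direct-access compression-code file: each code
    maps to the list of in-process keys (purchase order numbers) it can
    access, mimicking the single-item and threaded lists described."""

    def __init__(self):
        self.codes = {}

    def write(self, code, po_number):
        """Add a code record; return the purchase order numbers of earlier
        items that may be duplicates (empty if the code is unique)."""
        if code not in self.codes:
            self.codes[code] = [po_number]   # unique: single-item list
            return []
        possible_duplicates = list(self.codes[code])
        self.codes[code].append(po_number)   # not unique: extend the list
        return possible_duplicates           # printed as a staff warning
```

As in the system described, a non-empty return would be printed so that traditional duplicate checking of in-process items becomes an exception rather than the rule.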
In this case the list structure for the code is updated to include the key of the in-process record being added. A message, together with the purchase order numbers of items which may be duplicates, is printed to warn the acquisitions staff that a potential duplicate is being added to the in-process file. Traditional duplicate checking of in-process items thus becomes an exception.

Fig. 5. Search of Weekly MARC Tape for Records Needed in the In-Process File.

Fig. 6. In-Process Items for Which Bibliographic Information May Exist on the Newly Arrived MARC Tape.

Remote Access to MARC
An experiment was conducted in which entry/title requests were submitted from the IBM S360/40 computer at the University of Saskatchewan, Regina Campus, Computer Centre, over a communication link to the Saskatoon Campus IBM S360/50 computer. The MARC access program was read into the Regina computer, sent to Saskatoon's computer, spooled in the Saskatoon job queue and executed; then the results of the search were sent to Regina to be printed.
The entire process took approximately the same time as if the program had actually been executed in Regina. No data transmission errors were encountered in transmitting either the requests or the retrieved MARC unit cards over this 150-mile communication link.

CONCLUSION
There is an inverse relationship between retrieval performance and number of duplicate codes produced. A high retrieval code such as Code Type 2A results in more duplicates than a code such as Ruecking's, which has a slightly lower retrieval performance. Code Type 2B fulfills the requirements for a code short in length and easy to construct that produces a low number of duplicates and has high retrieval capability. For an index to a library holdings file, or to a national data base, a code such as Ruecking's, with four or more significant words from title and corporate or conference entries, and with different rules for personal author compression, would perhaps be suitable.

ACKNOWLEDGMENTS
The authors thank the Library staff for their assistance in the study. They are also grateful to the Library and Computation Centre administrations, in particular, D. C. Appelt, G. C. Burgis, and N. E. Glassel, for the allotment of computer time and their encouragement.

REFERENCES
1. Ruecking, Frederick H. Jr.: "Bibliographic Retrieval from Bibliographic Input; The Hypothesis and Construction of a Test," Journal of Library Automation, 1 (December 1968), 227-238.
2. Kilgour, Frederick G.: "Retrieval of Single Entries from a Computerized Library Catalog." In American Society for Information Science, Annual Meeting, Columbus, O., 20-24 Oct. 1968: Proceedings, 5 (1968), 133-136.
3. "University of Chicago Experimental Search Code." In Avram, Henriette D.; Knapp, John F.; Rather, Lucia J.: The MARC II Format: A Communications Format for Bibliographic Data (Washington, D.C.: Library of Congress, 1968), pp. 129-131.
4. "Computer Requirements for a National Bibliographic Service."
In RECON Working Task Force: Conversion of Retrospective Records to Machine-Readable Form (Washington, D.C.: Library of Congress, 1969), pp. 183-226.
5. Newman, W. L.: Technical Services Automation-Phase I Acquisitions and Cataloguing (Saskatoon: Computation Centre, University of Saskatchewan, November, 1969), mimeographed.
6. Burgess, T.; Ames, L.: LOLA; Library On-Line Acquisitions Sub-System (Pullman, Wash.: Washington State University Library, July, 1968).
7. Mitchell, Patrick C.: LOLA, Library On-Line Acquisitions Sub-System (Washington State University, June, 1969), unpublished.

BOOTH LIBRARY ON-LINE CIRCULATION SYSTEM (BLOC)

Paladugu V. RAO: Automation and Systems Librarian, and B. Joseph SZERENYI: Director of Library Services, Eastern Illinois University, Charleston, Illinois

An on-line circulation system developed at a relatively small university library demonstrates that academic libraries with limited funds can develop automated systems utilizing the parent institution's computer facilities in a time-sharing mode. In operation since September 1968, using an IBM 360/50 computer and associated peripheral equipment, it provides control over all stack books.

This article describes the history, analysis and design, and operational experience of the Booth Library On-Line Circulation System (BLOC). Since September 1968, when it went into operation, it has constantly been evaluated and modified to make it as perfect a system as possible. Articles in library literature describing on-line circulation systems in operation at various libraries include Hamilton (1), Heineke (2), Kennedy (3), and Bearman and Harris (4). BLOC differs considerably from those reported systems and has some unique characteristics that deserve the attention of the library profession.
It is one of the pioneering circulation systems in which on-line real-time inquiries are being made into the computer files by use of a cathode ray tube display terminal. It is not a prototype or model system to be interpreted as the optimum circulation system, but rather it is a dynamic system which will and should be modified to achieve the best possible system in accordance with the latest developments in computer hardware and software. Its analysis and design were influenced by the needs of an academic library. However, with little or no modification this system can be adopted by public and school libraries.

ENVIRONMENT
Eastern Illinois University, a state-supported institution located in Charleston, has developed a comprehensive curriculum that offers programs in liberal arts, teacher education and other professional fields, and a graduate school. The enrollment for the academic year 1970-71 is 8,600 students, and the number of the faculty is 711. The goal of the University is to provide an excellent education in an atmosphere of high faculty-student ratio and generally small classes, characterized by intellectual dialog and daily contact among students, faculty and administrators. Instructors require heavy use of library materials.

Booth Library, the main library of the University, contains 235,000 volumes in its collection at present. It has just finalized a five-year development plan to keep pace with the growth of the institution and will increase the collection to over 400,000 volumes by the end of 1975. BLOC was designed to satisfy the Library's present and future requirements.

PLANNING
Analysis Phase
In order to improve services to its patrons through the utilization of modern technology, Booth Library started planning for library automation as early as 1965.
Early experiments used unit record equipment in such areas as the ordering of Library of Congress printed catalog cards, acquisitions and serials control. Initial difficulties prevented these projects achieving full operational status, however, and subsequently all were abandoned. The primary benefit Library staff gained from these early experiments was education in planning carefully for subsequent automation projects, one of which is the BLOC system. Initial planning for the latter began in 1966; however, the original plan, which was for closed stacks, had to be modified considerably to make BLOC compatible with more recent developments in Booth Library operations. The Library switched to open stack operation in 1967.

While the BLOC planning was going on, there were also plans underway to expand Booth Library's physical facilities and its resources to meet the needs of an expanding campus. The volume of circulation had already been increasing at the rate of 15% per year. The circulation staff had to be increased to cope with the situation, and even then quality of service had to be sacrificed to quantity demands. Furthermore, it was determined that the proposed growth in enrollment and the anticipated increase in library materials would increase the volume of circulation even more and impose additional work on the already overburdened circulation staff. The call-slip circulation system in use at that time no longer seemed adequate, and the file maintenance associated with the call-slip system had turned into a time-consuming and cumbersome task. Thus the need for an improved and simplified circulation system became evident to the administration of the Library.

The professional librarians held several informal discussions to identify and develop a circulation system that would adequately meet both present and future requirements of the Library.
Several existing types of circulation systems were considered and comprehensively reviewed, but the librarians did not agree upon any of them. However, the review did result in the formation of a task force, consisting of representatives from the administration, the Data Processing Center, and the Library. After thorough investigation, this task force recommended a computerized on-line circulation system as a possible solution to the Library's problem, and the administration authorized the task force to prepare a detailed analysis and design proposal.

Design Phase
In developing its detailed proposal, members of the task force took into consideration the fact that the new circulation system would use the existing computer facilities on the campus. They aimed at a system that would provide the best possible service at least cost in the long run, and one that would allow for incorporation of future developments in computer technology. Main design objectives were to:
1) eliminate borrower participation in the check-out process,
2) speed and simplify circulation procedure,
3) eliminate manual file maintenance,
4) permit identification of the status of any book within the system,
5) provide accurate and up-to-date statistics concerning use of library materials, including the number of times a given book is used,
6) provide guidance from the system in case of human error in conducting a transaction, and
7) relieve professional librarians from clerical chores.

DEVELOPMENT
Hardware
The computer system on the campus operates in a time-sharing mode, concurrently performing several on-line and batch processing jobs for the Registrar's Office, Business Office, Textbook Library and Booth Library. At the present time it is an IBM S/360 model 50 with 262K bytes central core and related peripheral equipment. It functions under the supervision of operating system OS.
Figure 1 shows a schematic of the system's IBM equipment and data flow among the various components, as applicable to BLOC. Among the components shown in the schematic, two 029 keypunches, one 059 verifier, two 1031 terminals, one 1033 printer and one 2260 cathode ray tube display terminal are exclusively used by BLOC and are located in the Library.

Fig. 1. Booth Library On-Line Circulation (BLOC).

The other components, located in the Computer Center, are shared by BLOC along with the other systems in operation on the campus. It should be pointed out also that this schematic represents only the equipment used exclusively or in a shared mode by BLOC and does not represent the University's total computing system configuration.

Software
There are 25 different applications programs written in PL/I (F level) to support the BLOC system. These do not include the system programs written in Assembler Language to perform certain basic machine functions. There are two main data files that are required for the operation of BLOC. These two files are stored on a 2314 disk storage facility and are available to the system on an on-line, random-access basis. The programs required to process the BLOC transactions are also stored on the same disk storage facility, and these programs are loaded as needed by the operating system.

Of the two data files the first one is the Patron File, which contains identification data of persons eligible to borrow books from Booth Library. The second one is the Booth Master File, containing identification information for each physical volume located in Booth Library.

The Patron File is a combination of employee and student files that were created to serve the usual business needs of a university. This file is arranged in the indexed-sequential method (5) by the patron's Social Security number.
Each student record in this file is 408 bytes long and each employee record in this file 304 bytes long. At present this file contains over 19,000 records, including some inactive student records. To process the transactions BLOC borrows such information as name, address and telephone number of the patron from this file as needed. Updating of this file is done by the Computer Center with the aid of the University administrative offices.

The Booth Master File was created exclusively for the operation of BLOC. Creation of this file, which took one and one-half years, was done by converting the Booth Library shelf list into punched cards and then transferring the information from the punched cards to a disk file. One master card was punched for each physical volume in the library. After verification, information from these cards was loaded onto the disk file through the S/360. Layout of the master cards is given below:

Field                              Card Columns   Explanation
Transaction code                        1         A=new record, C=change record, D=delete record
Accession number                       2-7
Format code                             8         Oversize, etc.
Call number                            9-28
Edition, year, series                 29-31
Volume number                         32-35
Part, index, supplement number        36-38
Copy number                           39-40
Location code                         41-42       Reference library, etc.
Author                                43-52
Title                                 53-79
End of card code                       80         12-4-8 punches

Any blank space in the above fields is filled in by a slash (/). Creation of book cards (those used in circulation transactions) from the master cards is explained in the file updating procedure. The Booth Master File on the disk is arranged in the indexed-sequential method (5) by the first ten characters of the call number and by the accession number, which has a fixed length of six characters. Each record in this file is 124 bytes long.
The layout for a record is given below:

Field                              Byte Positions   Explanation
OS control                               1
Call number                             2-11        First ten characters
Accession number                       12-17
Call number                            18-27        Remainder of call number
Edition, year, series                  28-30
Volume number                          31-34
Part, index, supplement number         35-37
Copy number                            38-39
Location code                          40-41
Author                                 42-51
Title                                  52-78
Control byte                            79
Number of check outs                   80-82        Cumulative number of check outs
Status of book                          83          In or out
Borrower Social Security #             84-92
Borrower status                         93          1=student, 3=faculty, etc.
Due date                               94-99
Format code                            100          Oversize, etc.
Save Social Security #                101-109       SS# of the patron that requested save
Save type                              110          1=student, 3=faculty, etc.
Save status                            111          Is there a save or not?
Unused bytes                          112-124       For future use

Average access time of a record from this file is 75 milliseconds. As a security measure, a copy of the Booth Master File is kept separately on a magnetic tape, from which the disk file can always be recreated.

OPERATION
Updating
The Booth Master File is updated nightly with records of the new books acquired by Booth Library. After processing in the catalog department, the new books go to the keypunch section, where a master card is punched and verified for each new book. The master cards are then sent to the Computer Center, where each master card is reproduced and interpreted into two book cards. The layout of the book card is identical to that of the master card with two exceptions. A "T" is punched in column one as a transaction code for the system, and the end-of-card code is moved to column 19 to expedite transaction processing, accession number and class number being adequate for locating a book record from the Booth Master File. The identification data appearing on the book card is similar to that on the master card.
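Since every field of the 124-byte Booth Master File record sits at fixed byte positions, a record can be unpacked by simple slicing. The sketch below is illustrative only; the snake_case field names are ours, while the 1-based byte positions are taken from the layout given earlier.

```python
# (field, first byte, last byte) -- 1-based positions from the record layout;
# the field names are invented for this illustration.
LAYOUT = [
    ("os_control", 1, 1), ("call_number_key", 2, 11),
    ("accession_number", 12, 17), ("call_number_rest", 18, 27),
    ("edition_year_series", 28, 30), ("volume_number", 31, 34),
    ("part_index_supplement", 35, 37), ("copy_number", 38, 39),
    ("location_code", 40, 41), ("author", 42, 51), ("title", 52, 78),
    ("control_byte", 79, 79), ("checkout_count", 80, 82),
    ("book_status", 83, 83), ("borrower_ssn", 84, 92),
    ("borrower_status", 93, 93), ("due_date", 94, 99),
    ("format_code", 100, 100), ("save_ssn", 101, 109),
    ("save_type", 110, 110), ("save_status", 111, 111),
    ("unused", 112, 124),
]

def parse_record(record):
    """Slice a 124-character Booth Master File record into its fields."""
    assert len(record) == 124, "records are 124 bytes long"
    return {name: record[first - 1:last] for name, first, last in LAYOUT}
```

Note that the fields tile the record exactly: the field lengths sum to 124 bytes.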
However, during the interpretation process printing on the book card is rearranged in a format more suited to visual verification (Figure 2).

Fig. 2. BLOC Book Card.

After interpretation the book cards are run through a stamping process. In this process the machine reads the accession number from each card and stamps the number on the back of the same card, across the 3¼-inch dimension of the card and near the top. This allows the circulation staff to compare the accession number with the number stamped on the book pocket, to insure that the card is put in the right book. After being stamped the book cards are sorted into two identical decks and sent back to the Library's keypunch section. One deck of the cards is put into book pockets and the books are shelved in the stack area ready for circulation. The second deck goes to the circulation department for interfiling in call number sequence into a duplicate book card file. Cards from this file are used as replacements for the original book cards in the book pockets as needed. Whenever a card is removed from this file, the information is noted on a special card, so that another duplicate can be punched and placed in the file for future use.

Late at night, after the Library is closed, the master cards received on that day in the Computer Center are used to update the Booth Master File before the new books are put into actual circulation.

Transaction Processing
Circulation transactions are processed through the IBM 1030 Data Collection System, whose configuration consists of two 1031 card badge readers, one 1033 printer and one 1034 card punch. The entire 1030 system is controlled by a 2701 data adapter unit (Figure 1). If the computer is not in the on-line mode, the 2701 routes the transaction information to the 1034 card punch to be punched into cards that will be used later to update the disk files.
The 1030 system transactions are monitored by a special program called the "1030 Analyzer". This program is written in Basic Assembler Language (BAL) and has its own partition of about 50K in memory. The 1030 Analyzer controls five overlay programs which actually process the transactions and make the necessary file modifications. Each overlay is a segment of the transaction processing program and performs a specific routine, such as determining the type of patron and loan period, calculating the due date, etc. When the information for a transaction is transmitted to the 1030 partition, the 1030 Analyzer determines which overlays are needed to process the particular transaction and calls those overlays. The overlays access the required records from the Patron and Booth Master Files and do the necessary processing; then the master records representing the latest transaction information are written back in their storage locations. The 1030 Analyzer program and the associated overlays were written locally for BLOC.

Check-Out

Each person associated with the University who is eligible to borrow books from Booth Library is issued a badge by the appropriate administrative office. A patron is expected to present this badge, along with the books he wishes to check out, at the circulation desk. Though transactions can be processed without the badge, this is done only in exceptional cases. Into each badge are punched the person's Social Security number and a one-digit status code to indicate student, faculty, etc. The badge reader in the terminal reads the Social Security number and transmits it to the system, which interprets it as the record address for the particular person in the patron file and takes the necessary information from that address. The status code enables the system to determine the loan period.
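The analyzer-and-overlays design is essentially a dispatch table: the resident program inspects each transaction and invokes only the routines it needs, including loan-period determination from the badge status code. A sketch of the pattern in modern terms; the transaction codes, routine names, and loan periods are invented for illustration:

```python
# Sketch of the analyzer/overlay pattern: a small resident dispatcher
# selects the processing routines ("overlays") a transaction needs.
# Transaction codes, routine names, and loan periods are invented.

def determine_loan_period(txn):
    # Status code 3 = faculty (long loan); others treated as students here.
    txn["loan_days"] = 270 if txn["patron_status"] == 3 else 14
    return txn

def calculate_due_date(txn):
    txn["due_date"] = txn["date"] + txn["loan_days"]  # dates as plain ints for brevity
    return txn

OVERLAYS = {
    "check_out": [determine_loan_period, calculate_due_date],
    "check_in": [],
}

def analyze(txn):
    """Run each overlay the transaction type requires, in order."""
    for overlay in OVERLAYS[txn["type"]]:
        txn = overlay(txn)
    return txn

txn = analyze({"type": "check_out", "patron_status": 1, "date": 100})
print(txn["due_date"])  # -> 114
```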
After receiving the books to be checked out and the badge from the patron, the circulation attendant first compares the accession number stamped on the book pocket against the accession number stamped on the back of the book card. If the numbers match, she proceeds with the processing; otherwise she first pulls the right book card from the duplicate book card file before proceeding. After comparison of accession numbers, the badge is inserted into the badge slot on the 1031 terminal. The reset switch is set to "non-reset" to charge out more than one book to the same patron. The book card is then fed into the card input slot, face down, notch edge first. If the terminal is not able to read the card the first time, a "repeat" light comes on, in which case the card is taken from the exit slot and fed into the input slot again. If the "repeat" light comes on more than twice for the same card, it is assumed that there is a punching error in the card, and the transaction is completed by manually recording the necessary information on a special card that is later punched and used to update the disk files. If the terminal reads the card without any problem, the "card" light comes on, indicating that the terminal is ready to process the next card. The attendant takes the book card from the exit slot and puts it in the book pocket, then stamps the due date on the date-due slip for a student check-out, or inserts a pre-stamped date-due card for a faculty check-out. When all book cards for one patron have been run through the terminal, the badge control switch is set to "reset," which releases the badge to be returned to the patron.
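The read-retry rule just described amounts to: feed a card up to three times, then assume a punching error and fall back to manual recording. A sketch of that control flow; the reader function is a hypothetical stand-in for the 1031 hardware:

```python
# Sketch of the card-read retry rule described above: a card is fed
# up to three times; after the third failure a punching error is
# assumed and the transaction falls back to manual recording.

MAX_ATTEMPTS = 3

def process_book_card(read_card) -> str:
    """read_card() returns card data, or raises IOError on a misread."""
    for _attempt in range(MAX_ATTEMPTS):
        try:
            return read_card()
        except IOError:
            continue  # "repeat" light: re-feed the same card
    return "MANUAL"    # record on a special card, to be punched later

# A reader that fails twice, then succeeds on the third feed.
attempts = {"n": 0}
def flaky_reader():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("repeat light")
    return "T DA428-H21 001253"

print(process_book_card(flaky_reader))  # -> T DA428-H21 001253
```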
If the transaction is not normal, the deviation is communicated to the attendant by the system on the 1033 printer with one of the following messages:

1) "#TERMINAL-S NO MASTER RECORD 137335 9197 3233815871"
   (terminal address, message, accession number, class number, patron code)

2) "#DA428-H21 001253 YOU JUST TRIED TO CHECK OUT -IS ALREADY OUT-CHECK IT IN AND TRY AGAIN"
   (class number, accession number, message)

The first message is given for a book that got into circulation before its master card was loaded onto the disk file by the Computer Center; in this case the transaction is completed with a special card through manual recording. The second message is given for a book that has not gone through the check-in process upon arrival in the library from a previous check-out; here the attendant simply checks in the book, then checks it out again.

Check-In

At frequent intervals books deposited in return bins are placed on a truck and taken to a terminal to be checked in. The check-in badge is inserted into the badge slot and the reset switch is set to non-reset mode. When a book is taken from the truck the accession number on the back of the book card is compared with the accession number on the book pocket. If the card is the right one, it is then run through the terminal and replaced in the book pocket, which completes the check-in process for a book. If the book card is not the right one or is missing from the pocket, the transaction is completed using the book card from the duplicate file.

Circulation List

Each night after the library is closed a cumulative circulation list is printed giving all books checked out up to the closing hour of 11 p.m. Two copies of this list are delivered to the circulation department the next morning. One copy is placed at the card catalog to enable patrons to find out whether books they want are in or out. The second copy is kept at the circulation desk for staff use.
This list, printed in call-number order, shows the identification data of the book, its due date, the patron's Social Security number and his status. For faculty and special badge (mending, etc.) check-outs, the transaction date is printed instead of the due date. At present a faculty member can check out a book for a whole academic year. However, the circulation librarian may recall a book from a faculty member after thirty days if it is needed by another patron. The transaction date helps the circulation librarian to recall books in accordance with this policy. Since the loan period for special badge check-outs cannot be predetermined, the transaction date is printed for these check-outs. The circulation list acts as a back-up to permit circulation staff to answer questions when the system is not in the on-line mode. When the system is in the on-line mode the circulation list reduces the demand on the 2260 terminal inquiries during peak periods. The circulation list printing will be eliminated when two more 2260 terminals become available for the use of the system.

Transaction File

Each check-in and check-out transaction is recorded on a tape in addition to being on the Booth Master File on disk. This transaction tape is used to generate the circulation list. It also enables restoration of the disk file should something happen to the latter. The transaction tape is cumulated into weekly, monthly and annual tapes to generate a variety of statistical reports and also overdue lists.

Batch Processing

Most of the time this system operates in an on-line, real-time mode. Occasionally it has to operate in batch mode because of some mechanical malfunctioning. In this mode the computer system is not able to service the circulation system. Consequently the 2701 Data Adapter Unit routes the transaction information to the 1034 card punch (Figure 1), which receives the data and punches it into cards in a pre-designated format.
For each transaction conducted in this mode the 1034 card punch punches one card with the appropriate data; this is later used to update the transaction file. In this mode the circulation staff cannot make on-line inquiries and cannot get guidance from the system in case of errors. To alert them, there is a light on the terminal that flashes whenever the system operates in this mode. In case of a complete breakdown of the system, transactions are processed manually using special cards. Later, information from these cards is punched in 1034 card format and the files are updated. During the two years of its operation the system went into this mode only once, for two hours, because of engineering difficulties.

Overdues

A cumulative overdue list is printed once a week listing books overdue on that date. It shows the identification data of a book and the address of its borrower. For each overdue book a mail notification card is also printed, addressed to the borrower and containing identification data for the book. When an overdue book is checked in, the system prints out a message for the attendant. For example, the message "LB1051-S62-131313 CHECKED IN BY 320-46-0785 1 WAS 20 DAYS OVERDUE" means that the book identified as "LB1051-S62-131313", brought back by a borrower whose number is "320-46-0785", was 20 days overdue. After check-in the overdue book is turned over to a clerk for the necessary action.

Personal Reserves

A patron wishing to obtain a book that is checked out places a reserve on the book at the circulation desk. Reserves are placed in on-line, real-time mode in the BLOC system. The circulation attendant merely keys in the identification data of the book and the requestor, along with the reserve code, using the 2260 display terminal. This information is sent to the system in the following order:

1) Start symbol indicating the beginning of an inquiry.
2) Inquiry code "BR" (for Booth Reserve).
3) Identification data of the book (call and accession numbers).
4) Identification data of the requestor (Social Security number).
5) Requestor's status code (No. 3 for faculty members, etc.).
6) End-of-message code ("_", underscore).

When this information is entered, the system places the book on reserve for the requestor and displays the necessary information on the screen for the attendant's visual verification. Whenever a book on reserve is checked in, the system prints a message such as "QA76-5-F34-182929 IS SAVED FOR 138-32-0044 3." Alerted by this message, the attendant places the book on the reserve shelf and notifies the requestor, usually by telephone. Meanwhile, if another person inadvertently tries to check out the book, the system prints the message "QA76-5-F34-182929 IS SAVED FOR 138-32-0044 3 DO NOT CHECK OUT." If the requestor cancels his reserve on the book, it can be taken off reserve status by sending the appropriate code and identification data of the book via the 2260 terminal.

On-Line Inquiries

One of the main advantages of BLOC is that it enables the Library to obtain answers to a variety of questions in seconds. The circulation staff can easily tell the status of any book, and can obtain the list of books borrowed by a patron. On-line, real-time inquiries can be made on this system using the 2260 display terminal. The 2260 inquiry processing is controlled by a special program called the 2260 Analyzer. This program is written locally in PL/I and has its own partition (about 95K) in the memory of the computer. Altogether it services thirteen terminals located at various places on the campus. Only two of these terminals accept circulation inquiries: the master terminal in the Computer Center and the terminal at the circulation desk. The rest are used in connection with the other computer applications on the campus.
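The six-part reserve message enumerated above behaves like a small fixed grammar. The following sketch assembles and splits such a message; the "@" start symbol and the blanks between fields are assumptions for illustration, since the source fixes only the "BR" code and the "_" terminator:

```python
# Sketch of the reserve ("BR") inquiry format enumerated above.
# The "@" start symbol and the blank field separator are assumptions;
# the source specifies only the "BR" inquiry code and the "_"
# end-of-message code.

START, END = "@", "_"

def build_reserve(call_acc: str, ssn: str, status: str) -> str:
    """Assemble the six-part message in the order given in the text."""
    return f"{START}BR {call_acc} {ssn} {status}{END}"

def parse_reserve(msg: str) -> dict:
    """Split a reserve message back into its fields."""
    assert msg.startswith(START) and msg.endswith(END)
    code, call_acc, ssn, status = msg[1:-1].split()
    assert code == "BR"
    return {"book": call_acc, "requestor": ssn, "status": status}

msg = build_reserve("QA76-5-F34-182929", "138-32-0044", "3")
print(parse_reserve(msg))
```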
When a circulation inquiry is transmitted to its partition, the 2260 Analyzer determines the type of inquiry and calls in the appropriate overlays (at present there are 20) to access the needed records from the files, process the inquiry, and send the response back to the inquiry-originating terminal. After processing, the records representing the latest modifications, if any, are written back in their previous storage locations. Inquiry response time is less than a second. To learn how to make a certain type of inquiry, all one has to do is key the letters "IN" onto the screen and enter them into the system; the system then displays the formats for the various types of inquiries on the screen. This feature enables new operators to make inquiries on the terminal with minimum training. The reserve and clear inquiries have already been explained. The other circulation inquiries include: name, student or employee master file, book display, book scan, and unclear.

Name Inquiry

The Social Security number of a patron may be obtained by keying in his last name preceded by the code letters "NA"; if unsure of the spelling of the desired last name, the operator merely keys in part of the last name. When either a name or a segment thereof is entered, the screen displays twelve names in alphabetical order (beginning with the last name or part of the last name entered) along with corresponding Social Security numbers. If the desired name is not among these twelve, the operator can get the next twelve by pressing the "next" key. This procedure may be repeated until the desired name and corresponding Social Security number are located. She can then select that Social Security number and enter it into the system to get the address of that person.

Student and Employee Master File Inquiries

These inquiries are made to find the addresses and telephone numbers of patrons as needed by the circulation department.
Whenever a person's Social Security number, preceded by the code letters "SM" (for students) or "EM" (for employees), is entered into the system, it displays his campus and home addresses and telephone numbers.

Book Display Inquiry

This inquiry enables the circulation staff to know the status of any book within the system. When the call number and accession number (the latter usually obtained from the duplicate book card file at the circulation desk), preceded by the code letters "BD", are entered through the terminal, the system displays the following information on the book: call number; accession number; copy number; author and title; status, as checked in or checked out; if checked out, when; how many times it has been checked out so far; if checked out, the name and address of the person who has it; if on reserve, the name and address of the reserve requestor.

Book Scan Inquiry

Through this inquiry the books in a given class can be scanned, one after another. Whenever a class number, or part of one, preceded by the code letters "BS" (for book scan), is entered into the system, it displays the information about the first book in that class; then, by pressing the "next" button on the terminal keyboard, the operator can have displayed the information about the next book in that class. This procedure may be repeated as many times as necessary. This class access method is a very important feature of the BLOC system; through it, one may discover rather quickly what books are available in the Library on a given subject, and simultaneously find out whether a book is in the Library or checked out. In this inquiry mode the system also keeps track of how many books have been scanned for a given class and displays this information on the screen.
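The "BS" scan is, in effect, sequential access by class-number prefix over the call-number-ordered file, with a running tally. A generator-style sketch, with invented records and titles:

```python
# Sketch of the "BS" book-scan inquiry: iterate over books whose call
# number begins with a given class prefix, one at a time (the "next"
# button advances the iteration), keeping the running count that the
# system displays. Records and titles are invented.

def book_scan(records, class_prefix):
    """records: (call_number, title) pairs in call-number order."""
    count = 0
    for call_number, title in records:
        if call_number.startswith(class_prefix):
            count += 1
            yield count, call_number, title

shelf = [("LB1051-S62", "Educational Psychology"),
         ("QA76-5-F34", "Computer Organization"),
         ("QA76-6-K64", "Programming Fundamentals"),
         ("Z699-R36",   "Library Automation")]

for n, call, title in book_scan(shelf, "QA76"):
    print(n, call, title)   # two QA76 books, tallied 1 and 2
```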
The difference between the "BD" and the "BS" inquiries is that the "BD" inquiry is made when information about a specific book is needed and its unique record address (call and accession numbers) is known. The "BS" inquiry is made when only part of the record address (such as the class number portion of the call number) is known and when browsing through a given class is desirable.

Unclear Inquiry

The University Library has to clear withdrawing or graduating students, and leaving employees. This is very easily accomplished through this system, by the operator's merely entering the patron's Social Security number, preceded by the code letters "BU" (for book unclear), into the system. If the patron has no books out as of that minute, the system displays "PATRON XXX-XX-XXXX HAS NO BOOKS CHECKED OUT." Otherwise the system displays the call and accession numbers for books checked out and not yet returned by the patron. The system can display up to ten titles at a time; if the patron has more than ten books out, a "continue message" appears at the end of the top line on the screen; otherwise a "final message" appears.

DISCUSSION

Benefits gained by the circulation department have already been discussed. Following are benefits gained by Booth Library as a whole:

1) Booth Library can now provide subject listings, arranged in call-number order (sorting and printing takes only a few hours), as required by various academic departments. The listings have been extremely helpful in pointing out the Library's resources to various accreditation committees.

2) Physical book inventory taking was greatly facilitated by printing the Booth Master File in segments and with an indication whether a book was in or out at that time.

3) Periodic listings of books charged out to special badges, such as Binding, Lost, etc., have been printed to facilitate follow-up activities by the respective departments.
4) The Booth Master File acts as a security back-up to the shelf list. Should something happen to the shelf list, it could be recreated from the Booth Master File within two days. Similarly, if the need arises, the departmental shelf lists can be created overnight.

5) The Library Committee can now make book budget allocations on a more scientific basis by reviewing the annual statistical reports, which give more accurately than before the volume of circulation in various subject fields.

In addition to the above benefits there are interesting possibilities for doing a variety of things, of which only a few are mentioned below:

1) Periodic listings of new books received, on the basis of area of interest, can be printed to provide a selective dissemination of information service.

2) Since both student and circulation records are in machine readable form, a variety of research tasks could be undertaken with readily available data to find out the reading habits of students by academic level and by age level. These reading habits could be correlated to academic achievement with the aid of data in the student records.

3) When the Booth Library MARC implementation project is completed, most book cards can be generated directly from the MARC tapes. Punching of master cards will be necessary only for those books not entered on MARC tapes.

4) The Booth Master File can be used to create data bases (partially or completely, depending on the size and characteristics of a library) at other libraries for similar applications.

Costs

Booth Library does not differ from other libraries when an attempt is made to collect data on costs. There are no figures available on planning cost, system design cost, or on writing and testing the programs. BLOC was developed through the collaborative efforts of Library and Computer Center staff. Many people have devoted time to the planning and development effort, working for BLOC on an additional-duty basis.
Only two people were hired to work full time for the project: one keypunch operator and one IBM machine operator; their combined annual salary was $9,000.00 in 1968 and in 1969. The machine operator position was terminated at the completion of the basic file conversion in 1969. Present operating costs of the system are not yet available. Since all programmers' and operators' time is devoted to maintaining and operating a number of systems, it is difficult to determine how much the operation and maintenance of BLOC costs. So far staff members have been busy improving the performance of BLOC and have not had time to do an in-depth cost study. However, there are some costs which can be directly charged to BLOC. At present the full time of one keypunch operator and 270 hours of student help, at a total cost of $745.00 per month, are exclusively devoted to BLOC. The breakdown listed in Table 1 gives an estimate of the percentage of use of each item of equipment used for Library purposes and the monthly cost calculated from the percentage of use. Terminal cable and magnetic tape costs are not included in the total.

Table 1. Equipment Use and Cost.

Qty.  Item No.  Item Description          % Use by BLOC   Monthly Rental Charged to BLOC
2     029       Card Punch                100             $  117
1     059       Card Verifier             100                 58
1     083       Sorter                      1                  1
1     088       Collator                    1                  4
1     519       Reproducer                  1                  2
1     557       Interpreter                 1                  1
2     1031      Input Station             100                159
1     1033      Printer                   100                 76
1     1034      Card Punch                 67                206
1     1052      Printer Keyboard            5                  3
1     1403      Printer                     2                 16
1     2050      Central Processing Unit    10              1,188
1     2260      Visual Display            100                 41
1     2314      Disk Storage Facility      13                689
1     2316      Disk Cartridge            100                 17
1     2401      Magnetic Tape Unit         50                138
1     2540      Card Read/Punch             5                 28
1     2701      Data Adapter Unit          67                157
1     2821      Control Unit                5                 46
1     2848      Display Control Unit       13                 96

TOTAL Monthly Cost of Equipment: $3,043
TOTAL Yearly Cost of Equipment: $36,516

It should be pointed out that the costs shown in Table 1 are averaged on the basis of the total number of units rented and the amount paid in connection with all computer applications, and not on the basis of equipment used by BLOC alone. If the above-listed equipment had been rented for utilization by BLOC alone the rental costs would have been much higher. Moreover, the costs in Table 1 do not include salaries of computer personnel.

Utilization of the BLOC system has not produced any payroll savings. No library position was eliminated by installing it, but it is a certainty that more personnel would have been needed to discharge all duties at the circulation desk in the future without this system. Using the computer allows a 20% increase in loans to be processed without increase in personnel cost.

Expansion

The capacity of BLOC has by no means been exhausted; its flexibility allows for more innovations, so that every possible circulation need can be met. The utilization of the BLOC system is limited only by the ingenuity of its users. Two new features are to be added to it in the near future.
One of these is the installation of an IBM 2741 communications terminal to generate date-due slips, so that the present method of stamping the due date in a book can be eliminated. The 2741 terminal was decided on because it can be operated in either on-line or off-line mode, enabling the circulation staff to type date-due slips manually when the system is in off-line mode. The second new feature will be installation of an additional 2260 terminal for public use near the card catalog. This terminal will accept only "BD" and "BS" inquiries, and the "BD" inquiry on this terminal can be made by call number alone, which is readily available in the card catalog. Privileged inquiries, such as placing books on reserve, will continue to be the prerogative of the terminal at the circulation desk. This new feature will provide patrons with up-to-the-minute information concerning the availability of library materials. The design phase for these new features has been completed and the programming effort is underway. It is expected that these new features will be added to BLOC by the fall of 1971.

CONCLUSION

It can be said that a relatively small university library with limited funds can start and develop automated systems if the parent institution obtains a computer for instructional and administrative purposes. This was the case with Booth Library's circulation system. To keep pace with Eastern Illinois University's anticipated growth, it was decided in 1964 to develop a data processing center. It has grown rapidly in terms of services rendered to the University. Its main purpose initially was to serve the academic departments, but its services have spread to several administrative functions, such as admissions, student records, registration and personnel services, to name a few. It was not difficult for the librarians to convince the University's administration of the necessity and usefulness of the computer for library purposes.
Relatively little extra expenditure for hardware was needed. Understanding and cooperation from the staff of the reorganized Computer Center helped to develop the Library's circulation system. What was the dream of the librarians a few years ago is now an actual operation, working well and giving better service to the Library's patrons. The major advantage is the saving of time on all necessary operations. The system also freed the staff from routine manual work. It eliminated the large call-slip files and the inevitable human errors in those files. Patrons were freed from filling out call slips, and the circulation staff was freed from the tiresome task of decoding the unreadable "scribbling" of many patrons. Check-out and check-in of books was speeded. There is no longer a line of waiting students at the circulation desk and, on the average, it takes less than five seconds to check out a book. A variety of reports containing computer analysis of circulation records are available at regular intervals. They are an aid to ordering additional copies of heavily used titles and to surveying the collection for weak spots. After more than two years of operational experience, it can be said with confidence that the BLOC system has fully satisfied all its design objectives and even exceeded them by providing some additional benefits that were not in the original planning.

ON THE RECURSIVE DEFINITION OF A FORMAT FOR COMMUNICATION

Leonid N. SUMAROKOV: Head, Research Department, International Center for Scientific Information, Moscow, USSR

A recursive presentation of a communication format is discussed and a form of pertinent notation proposed. Recursive notation permits presentation of an interchange format in more general terms than heretofore published, and expands application possibilities.

The development of the forms of exchange of information among documentation systems, and particularly the development of the technique of recording machine readable bibliographic data on magnetic tape, has led to the requirement for the adoption of an agreement on a standard for a format for communication. Thus, the problem of a format for communication reflects the existing tendency toward ensuring compatibility among formats. At the present time the greatest impact on world information practice has been made by the American National Standards Institute (ANSI) Standard for Bibliographic Information Interchange on Magnetic Tape (1) and the several implementations of that standard: MARC, INIS, COSATI and others. It should be noted that, despite numerous existing peculiarities, in principle there is no difference in structure among the formats. One of the most important requisites for a communication format is universality. The practice of processing large quantities of information has demonstrated the flexibility of the above-mentioned formats; their use has permitted identification of huge numbers of documentary materials in various forms, thereby creating the impression that the structure of the format has been developed to such an extent that it can be canonized for any application.
It must be said that support or rejection of this impression can be based only upon future experience in the application of a communication format. Nevertheless, it appears expedient to generalize about the structure of a communication format by making a few preliminary remarks, thereby contributing toward expanding the sphere of its application. The remarks deal with the following. In the existing systems for interchanging information on magnetic tape, the document is the object of identification. With the development of data banks the characteristics of the objects to be identified may prove to be so varied, even though presented in the proper documentary form, that their uniform presentation will cause difficulties. (Actually, examples can be given of data banks in which data appear in the capacity of objects: information regarding firms, rivers, products of the electrical engineering industry, etc.) Furthermore, even if it is possible to identify in principle a certain object with the aid of the format, one must distinguish between the question of possible identification in principle and that of the optimal (or rational) form of identification in view of the limitations of a certain system. The recursive notation of a communication format is presented below. Certain definitions and ideas are used as source material for such a notation, following the American Standard for Bibliographic Information Interchange on Magnetic Tape (1). It must be conceded that the use of one term or another for defining individual elements of a notation, as well as the general structure of the entire notation, are not the principal subject of discussion here; this means that any change, either in definition or, to a certain extent, in the structure of the notation, will not affect the proposed form of the notation. Consequently, this article does not pretend to describe a certain universal structure for a communication format.
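Before the formal notation is developed below, the flavor of a recursive format definition can be previewed in executable form: a generalized tag is either a tag terminator alone or a simple tag followed by a smaller generalized tag, so a tag of any hierarchical depth can be built. A sketch; the string representation and the three-character tags are assumptions of this illustration, not the article's:

```python
# Sketch of a recursive generalized tag: TG is either the tag
# terminator TT alone, or a simple tag T followed by a smaller TG,
# giving T1 T2 ... Tp TT for any depth p. The string encoding and
# the 3-character tags (as in MARC) are illustrative assumptions.

TT = ":"  # tag terminator

def make_tg(*tags):
    """Build T1 T2 ... Tp TT recursively."""
    if not tags:
        return TT                        # base case: terminator alone
    return tags[0] + make_tg(*tags[1:])  # T followed by a smaller TG

def depth(tg):
    """p, the hierarchical depth = number of tags before TT."""
    return len(tg.rstrip(TT)) // 3       # assumes 3-character tags

tg = make_tg("100", "245", "650")
print(tg, depth(tg))  # -> 100245650: 3
```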
It has a different purpose, viz., to point out wider perspectives that will unfold by applying the recursive presentation of notations in formats, at the expense of an object with any hierarchical depth. For the following symbols explanations can be found in the ANSI Standard (1):

R = record
L = leader
Dr = directory
T = tag
D = data, or data elements
FT = field terminator, or field separator
RT = record terminator, or record separator

The concept TT used below, standing for tag terminator, is analogous to FT and RT. So also is the concept SF, meaning specific fields for defining contents, which did not appear in the proposed notation although utilized in actual formats. The following symbols are also used:

TG = tag generalized
F = field
DF = data field
BF = bibliographic fields

Utilization of special notation in brackets (analogous to the form used in algorithmic languages) enables R to be defined in the form of the following consecutive structure:

1) R = [L] [Dr] [SF] [BF]

The symbols written in brackets after the equal sign maintain the relationship of priority. Further, the recursive universal tag TG is defined as follows:

2) TG = [T; TT]

Such a notation indicates that the expression in brackets is T or TT. The recursiveness of the notation indicates that it is possible that TG is T1 T2 ... Tp : TT, where p is any whole number greater than or equal to one. (Obviously p defines the depth of the hierarchic description in accordance with the given characteristic.) Finally:

3) F = [TG] [D];
4) DF = [F; FT];
5) BF = [DF; RT].

Thus, the general notation of the format is expressed by 1), in which the element BF, which constitutes the basic part of the so-called alternate fields, is expressed recursively with the aid of the system 2)-5). As is evident, the quantity of F in DF, and of DF in BF, as well as in the case of the subscripts of TG, can arbitrarily be a whole number, changing from notation to notation.

REFERENCE

1.
"USA Standard for a Format for Bibliographic Information Interchange on Magnetic Tape," Journal of Library Automation, 2 (June 1969), 53-65.

BOOK REVIEWS

Libraries in New York City, edited by Molly Herman. New York: Columbia University School of Library Service, 1971. 214 pp. $3.50.

This guide to libraries in New York is comprehensive, and the description of each library is thorough. Pages 184 and 185 list libraries in which there are active and significant automation projects.

Frederick G. Kilgour

COBOL Logic and Programming, by Fritz A. McCameron. Homewood, Ill.: Richard D. Irwin, Inc., 1970. 254 pp. $6.00.

This book provides a good introduction to COBOL, although the author implies that COBOL logic is different from other computer language logics. However, many examples are included in the text to illustrate new commands, and there are numerous review questions, exercises and problems in each chapter. The problems of later chapters build on the logical designs presented earlier. Thus, the reader can follow a problem from analysis through solution. The book would be a more useful self-instruction guide as well as textbook if the answers to recall questions and exercises were given. A sound understanding of COBOL should be gained from solving the fairly sophisticated problems at the end of the book. One unique and useful idea is the inclusion of coding sheet, punch card, printout, test data and output facsimiles. The most serious drawback of this book in regard to library automation is its obvious slant toward business applications. While the COBOL commands presented are sufficient for most applications, there is no mention of character manipulation commands such as EXAMINE, with TALLYING and REPLACING options. In addition, problems are oriented toward bookkeeping and inventory controls.

Valerie J. Ryder

Die Universitätsbibliothek auf der Industrieausstellung: 1. Wissen auf Abruf. 16 pp. 2.
Dokumentation-Information. 16 pp. Berlin: Universitätsbibliothek der Technischen Universität Berlin, 1970. No price.

This constitutes a report (in two parts) of the contribution of the Library of the Technical University of West Berlin to the official German Industrial Exposition held September 27 to October 6, 1968. The Library's special exhibit was part of a section labeled "Quality through Research and Development." It attempted to give a synoptic view of modern library procedures and their value for improving science library services. The examples demonstrated emphasized document acquisition procedures and the various readers' services. A total area of approximately 600 square feet was divided into two rooms, one showing technical equipment and the other, besides housing a TWX terminal, furnished as a reference reading room. The terminal connected the exhibit area with the reference department of the Library of the Technical University. Graphic charts on the walls explained functions of the typical science library in Germany and the kinds of services offered. No fundamental differences from the situation in other Western countries, especially the USA, can be pointed out. It may be mentioned here that West Germany has an efficient organization of Union Catalogs, one for almost every State (Bavaria, Württemberg-Baden including the Palatinate, Hessen, Nordrhein-Westphalia, Hamburg, and West Berlin). Interlibrary loan requests go first to a region's Union Catalog and from there, when the item is traced within the region, to the appropriate lending library, which forwards the item or copy to the requesting library. Non-traceable titles are automatically sent on to a neighboring State's Union Catalog, and so on, until the item is found and sent to the requesting library. Reader/copier machines for different systems of microtext material were displayed and could be operated by the visitors.
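The Union Catalog forwarding rule just described amounts to a simple routing loop; the sketch below is a hypothetical illustration of that procedure (the function name and data layout are assumptions, not drawn from the report).

```python
# Hypothetical sketch of the West German interlibrary loan routing rule:
# a request tries the home region's Union Catalog first, then neighboring
# regions in turn, until some catalog can trace the item.

def route_request(title, catalogs, home):
    """Return the first region whose Union Catalog traces the title.

    catalogs -- mapping of region name -> set of traceable titles
    home     -- region where the request originates
    Returns None when the title is not traceable anywhere.
    """
    order = [home] + [r for r in catalogs if r != home]
    for region in order:
        if title in catalogs[region]:
            return region   # a lending library in this region supplies it
    return None             # exhausted every Union Catalog
```

The actual system forwards the physical request form from catalog to catalog; the loop above only captures the order in which the catalogs are consulted.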
Under the title "document circulation" the application of EDP methods was shown, using machine readable paper tape for borrowing records. The system described was an off-line one, using (presumably daily) lists of the updated circulation master file. Other graphic charts described the automated document retrieval system installed at the library of the Technical Institute of Delft, Netherlands, and the integrated library system of Euratom in Ispra, Italy, which includes a Selective Dissemination of Information service. Computer generated bookform catalogs of monographic and serials records of other West German science libraries were on display, together with information dealing with the European Translations Center in Delft, which records all scientific translations and publishes the "World Index of Scientific Translations." A film showing the operation of the National Lending Library of Great Britain was demonstrated. Literature analysis, recording, storing, and retrieval are the topics of the second part of the report. Electromechanical documentation methods using punch cards, and more often punched paper tape, with their corresponding machinery for selecting and writing back records, were shown under operating conditions. A computer based automatic information retrieval system, developed by Siemens on the hardware of the current RCA Spectra 70 computer series, was also exhibited. The system, named "Golem," claims to have some advantages over the MEDLARS I system of the National Library of Medicine. It is operational at Siemens/EDP Headquarters in Munich.

Richard A. Polacsek

MARC Manuals used by the Library of Congress, prepared by the Information Systems Office, Library of Congress. 2d ed. Chicago: Information Science and Automation Division, American Library Association, 1970. 70, 318, 26, 18 p.
This second edition contains the same four manuals as did the first, issued in 1969, although the titles of some of the individual manuals have been changed. The manuals are:

1. Books: A MARC Format. 4th ed., April 1970 (formerly the Subscriber's Guide to the MARC Distribution Service. 3d ed.)
2. Data Preparation Manual: MARC Editors. 3d ed., April 1970.
3. Transcription Manual: MARC Typists. 2d ed., April 1970.
4. Computer and Magnetic Tape Unit Usability Study.

The fourth manual has been reproduced unchanged from the 1969 edition. The third, which contains the keyboarding procedures designed to convert bibliographic data into machine readable form, has been given a subtitle and completely revised to apply to a different keying device, the IBM MT/ST, Model V. It is the first two manuals, however, which will attract the widest continuing study outside of the Library of Congress. Both manuals have been updated. Significant changes from the previous edition of each are indicated in the margin by a double asterisk at the point where the revision was made. No indication is made of deletions, however. Thus, users who look for field 652, which was described in the earlier edition, will not find it; nor will they find any instructions directing them to fields 651 and 610, which contain the material formerly placed in that discontinued field, although both 651 and 610 are provided with double asterisks to indicate that they contain new material. Among the additions to the first manual are provisions for Greek, subscript, and superscript characters, and a revision of the 001 field to take into account both the old and the new L.C. card numbering systems. Among the deletions is the table showing the ASCII 8-bit HEX and 6-bit OCTAL in EBCDIC HEX sequence. The editors' manual contains the procedures followed by the MARC editors in preparing data for conversion to machine readable form.
While the first edition of the MARC Manuals contained the first edition of this particular manual, a second edition was issued in July 1969 for internal use within the Library of Congress. This third edition is essentially the same as the second edition with minor revisions such as the addition of examples and clarifying statements, a few new instructions, and corrections of typographical errors. The double asterisks in this manual refer to changes from the second edition, not from the first, so that owners of the first edition will have to make their own comparisons to see where the third edition differs from the first. Among the new, non-asterisked, materials included that did not appear in the first edition are a discussion of other (non-LC) subject headings on pp. 111-114 and of romanized titles on pp. 131-132. The third edition also contains several new appendices covering diacritics and special characters, sequence numbers, and correction procedures. While the editors' manual is designed chiefly for use by the editors at L.C., it has great value for MARC users. In many places it provides an expansion and explanation of material treated much more briefly in the first manual, Books: A MARC Format. Examples of this clarification are the discussion of fixed fields in the editors' manual and its explanation of the alternative entry indicator in the 700 fields, which is merely listed in the first manual. The editors' manual also contains material that does not appear in the first manual, such as the alphabetic alternatives for the numeric tags (which I find more confusing and less memorable than the numeric ones). While only a year intervened between the appearance of the first and second editions of the MARC Manuals, enough changes have been made to make the new edition a necessary purchase for all those actively involved in the use of MARC records.
Provision of an index would, however, have facilitated its use.

Judith Hopkins

Computers in Knowledge Based Fields, by Charles A. Myers. Cambridge, Mass.: The MIT Press, 1970. 136 pp. $6.95. A Joint Project of the Industrial Relations Section, Sloan School of Management, MIT, and the Inter-University Study of Labor Problems in Economic Development.

The author has written previously on the impact of computers on management. In the current study on the implications of technological change and automation he has selected five areas: Formal Education and Educational Administration; Library Systems; Legal Services; Medical and Hospital Service; and National and Centralized Local Data Banks. In this book he is trying to answer such questions as what needs prompted the use of computers, what the initial applications are and what problems were encountered, what effect the use of computers has on the work performed, and what resistance was encountered to their introduction. He also posed the question: Can anything be said about the comparative costs of computer based programs and other programs? The answer appears to be "no" or "not yet." The chapter on libraries deals primarily with Project INTREX and thus fails to give an overview of developments in library systems which are operational. The other chapters offer a review of planned and operational projects as of 1968-69.

Stephen E. Furth

Libraries and Cultural Change, by Ronald C. Benge. Hamden, Connecticut and London: Archon Books & Clive Bingley, 1970. 278 pp. $9.00.

This work is intended primarily to serve library students as an introduction to a consideration of the place of the library in society, with suggestions for further reading. The author is hopeful that it may be of interest to a wider audience, and it is. Mr. Benge has taught in library schools in the Caribbean, West Africa, England and Wales.
This experience is reflected in his approach to a discussion of the social background of library work. Although, as he points out, it is possible to establish connections of many kinds, and libraries might be convincingly connected with witchcraft or the illegitimacy rate or prehistoric man, yet more meaningful connections must be sought, and he has selected not only culture, but cultural change, as the basis. Further, in his several discussions he has tried to commence with the cultural background and then to note the possible implications for librarianship, rather than to follow the more usual method of commencing with libraries and showing the relevance to them of social forces and other institutions. A listing of a few of Mr. Benge's fourteen chapter-headings will suggest his development of his theme: "The Clash of Cultures", "Mass Communications", "Censorship", "The Impact of Technology", "Philosophies of Librarianship". Each chapter is an urbane essay in the editor's easy-chair manner, a monolog in which the author introduces the reader to that part of the universe that can be viewed through the arch over which the particular chapter-title is inscribed, and relates it to the work of the library. Mr. Benge is informative (he is up-to-date on all manner of matters; e.g., he has been reading Library College and he knows about High John), he is occasionally witty and often convincing. As the basis for class-room discussion his work is perhaps also as stimulating as a propaedeutic should be, but lacking such discussion I doubt this attribute. I find that to stimulate, a book must organize the field of discussion. For me Mr. Benge fails to do so. I find his essays agreeable, with occasional bons mots ("Young people, like books, must be preserved for the future"; "Guinea pigs are happy creatures") but, like other conversational literature, they leave me with a general euphoria but unsatisfied logic.
For example, the final chapter ("Philosophies of Librarianship") starts out bravely by questioning the relevance of theory but concludes feebly that what is needed to explain librarianship is perhaps a new integration of traditional custodial principles, the missionary approach, and the rationale of a personal reference service. References from other than the Anglo-American culture-sphere are few; the book would have gained greatly from more. We here at JOLA are naturally interested to hear what Mr. Benge has to say on "The Impact of Technology". In this chapter, regrettably, he abandons his method of social background first and relevance for libraries afterward, and simply notes the direct impact of technology on libraries, mainly in the UK. He concludes that "There can be no doubt that the information crisis does exist and that traditional reference or retrieval methods have not solved it. There is chaos, duplication and waste. What I have tried to suggest here is that on the evidence to date, we cannot yet be sure that machine retrieval is the answer" (p. 175). There are misprints, to be sure, neither unusually numerous nor serious, with one exception: Dr. Vannevar Bush's name (p. 182) has been mangled, and is, moreover, omitted from the Name Index.

Verner W. Clapp

Serial Publications in Large Libraries, edited by Walter C. Allen. Urbana, Ill.: Graduate School of Library Science, University of Illinois, 1970. 194 pp. $4.50.

Handling of serial publications was the topic of the sixteenth Allerton Park Institute held in November 1969; the papers are published in this slim volume. Almost every paper offers a number of controversial and provocative ideas which must have evoked interested and interesting reactions. The subsequent discussions are not reported.
Problems of serials, the librarian's basket of snakes, are identified and analyzed from selection and acquisition through check-in, cataloging, binding, shelf arrangement, abstracting and indexing, to machine applications. The papers cover this gamut well and in most cases provide a good view of the state of the art. Recurrent themes are the significant role of serials in today's information flow, the urgency of the problems (though the content is long on agony and short on therapy), and the necessity for bearing in mind the user's rather than the librarian's convenience where both cannot be accommodated when reaching for solutions. Donald Hammer's paper on computer aided operations provides a good introduction and overview of automated serials systems, with some helpful hints to beginners in the field. Microfilm technology and machine readable commercial abstracting and indexing services are touched on by Warren Kuhn and Bill Woods, but each topic deserves more thorough treatment in separate papers. Too few of the speakers proposed specific research in their areas; where such long-standing problems exist, some well-directed suggestions might elicit useful studies. The book should be useful to library schools as good coverage of a seldom detailed problem operation, to librarians entering the challenging maelstrom of serials handling, and to those already overinvolved who might be refreshed by the longer view. The poor proofreading is a minor flaw.

Mary Jane Reed

Training in Indexing: A Course of the Society of Indexers, edited by G. Norman Knight. Cambridge, Massachusetts: The M.I.T. Press, 1969. 219 pp. $7.95.

To this reviewer, who had struggled through the compilation of one annual index to the Journal of Library Automation with the aid of scarce, unrelated, and out-of-date books and periodicals on the subject of indexing, this thorough, well-written volume, aimed at the neophyte indexer, came as a godsend.
It comprises a series of lectures, by master practitioners of the craft, sponsored by the Society of Indexers. That authors and audience were chiefly British detracts not a whit from the book's usefulness to Americans. Two introductory chapters by Robert L. Collison on the elements of book indexing are followed by twelve on specific treatment of those elements and of different types of material. Chapters on indexing periodicals and scientific and technical material will particularly interest readers of JOLA. Exercises, a selected bibliography, and an index that also serves as an illustration of points in the text enhance the usefulness of this book to the beginner. It should be equally useful to an indexer of no matter how much experience, for, as Collison emphasizes in his opening statement, indexing is still in an elementary stage, there are no common rules on which all indexers agree, and everyone considers himself his own authority on how an index should be arranged and what should go into it. In treating a subject that might seem to the layman to lend itself all too readily to the cut-and-dried approach, the authors have brought a delightful measure of flexibility, wit and imagination. At no point do they lose sight of the fact that the indexing of books, like the writing of them, is a very human endeavor.

Eleanor M. Kilgour

Reader in Library Services and the Computer, edited by Louis Kaplan. Washington, D. C.: NCR Microcard Editions, 1971. 239 pp. $9.95.

This volume contains a couple of dozen reprints, mostly of articles. The Reader is not intended for those doing research and development in library automation, but rather for librarians and library students who wish to familiarize themselves with the subject. The quality of the articles is high. In general, they present a conservative position, which is not to say that they oppose library automation. Rather, they inform the reader of positive action to be taken and in so doing impart understanding.
Within this conservative framework, however, various viewpoints are expressed. Seven subjects group the articles: The Challenge (three articles); Varieties of Response (six); Theory of Management (one); New Services (six); Catalogs and the Computer (two); Copyright (one); and Information Retrieval Testing (six). The Reader is not a book in the sense that a book contains a central theme. It is likely that the Reader will be used for its sections rather than in its entirety, but that is the manner in which one expects to use a reader. Anyone who so uses it will be enlightened. The Reader has but one serious shortcoming: it is devoid of an index. This deficit will seriously hamper consultation of the book.

Frederick G. Kilgour

Automation Management: The Social Perspective, edited by Ellis L. Scott and Roger W. Bolz. Athens, Ga.: Center for the Study of Automation and Society, University of Georgia, 1970. (Second Annual Georgia-Reliance Symposium) $5.75.

Sixteen papers were presented at this symposium by a variety of authors from labor, management, academe, etc. As in all collections of papers, they are uneven in quality. The preface of the symposium states that the "1970 Symposium focused on the problem of automation management, from a social perspective, as it relates to industry, education, labor and government." The papers reflect ideas concerning the need for training and retraining, and for preparing people for automation by having them participate in the decision-making process. Three papers on the effects of automation use economic analysis based upon the Gross National Product and other labor and business indicators and find that the changes predicted for automation in terms of joblessness and increased productivity are unfounded, although some questions are asked about the validity of the figures used to make these assumptions.
There are interesting formulations on the nature of change and innovation and the time lag between basic research and industrial application. Gordon Carson's paper expressly attacks the issue of automation in libraries and in education. Dr. Carson sees one of the problems as the library's print media orientation when the other senses, such as hearing, could also be used. Libraries are also attacked on the basis of how they measure effectiveness, i.e., the number of volumes on the shelf, rather than "the speed with which information can be retrieved from that library and placed in the hands of him who needs to use it." This methodology for measuring effectiveness is changing presently, so that the need expressed by Dr. Carson may be met. In conclusion, Dr. Carson states that there are "three essential areas in which automation can be exceptionally helpful in higher education. These are as follows: 1) Improved teaching techniques including autodidactic learning systems; 2) registration, fee payment and curriculum planning ...; 3) libraries-information retrieval." Although in a way many papers in this volume skirt the periphery of the effects of change and how to create it, it is worthwhile reading on the whole.

Henry Voos

Interlibrary Loan Involving Academic Libraries, by Sarah Katharine Thomson. Chicago: American Library Association, 1970. (ACRL Monograph, 32). viii/127 pp. $5.00.

Interlibrary Loan Procedure Manual, by Sarah Katharine Thomson. Chicago: American Library Association, 1970. xi/116 pp. $4.50.

Interlibrary Loan Involving Academic Libraries is a summary version of "a normative survey of current interlibrary loan practices in academic libraries in the United States." It makes surprisingly compulsive reading for anyone who has worked much with interlibrary loans, and might be an eye-opener for those who haven't. (The original, complete version appeared in 1967 as a Columbia University DLS dissertation.)
Much of it documents or corroborates the feelings (or suspicions) of busy, experienced interlibrary loan staff; some of it is new and surprising; and doubtless many of the same patterns and trends hold true today. Dr. Thomson, working primarily with data reported by academic libraries to the U.S. Office of Education in 1963-64, results of intensive analysis of a sample of 5895 interlibrary loan requests (drawn from a total of 60,000 received by eight major university libraries in 1963-64 and 1964-65), and information from several questionnaires, presents a clear picture of who borrowed what from whom, how often; staffing and time required; distribution patterns of requests by size and location of library and type of reader; sources of difficulty, delay and failure; factors predictive of fast and efficient service; and a number of other variables. Her results and conclusions are presented clearly, with supportive or illustrative statistics, graphs, correlations, and other tables. Chapter 14 offers recommendations of librarians for increasing the proportion of interlibrary loan requests filled. Suggestions and recommendations resulting from Dr. Thomson's study were incorporated in, or influenced the drafters of, the 1968 National Interlibrary Loan Code, the model regional or state code, and the 1968 interlibrary loan request form. Dr. Thomson estimates that interlibrary loan requests involving academic libraries are well over the million mark by now, and refers to a 1965 study which reports large libraries estimating they are unable to fill about one-third of the requests they receive. It is to be hoped that some of the worst faults in interlibrary loan requests have been mitigated by the revised codes, revised forms, and better education of interlibrary loan assistants. The new procedures manual should help, too. Perusing this monograph should foster greater awareness and understanding of the dimensions and problems of interlibrary loan service.
Now, if only we had an up-to-date cost study.... Who profits from the appearance of the Interlibrary Loan Procedure Manual? Not merely ILL novices, whether new clerical assistants or young librarians faced with setting up, reorganizing, or streamlining interlibrary loan routines. It has value for the old ILL hand, checking up on established routines to be sure no sloppiness has crept in; for the library school student, as an early exposure to good library cooperation manners, as well as a basic step-by-step indoctrination in "how to do it"; for recipient libraries, whose time and patience would be much less strained were all requestors to follow these elementary, commonsense, too often ignored recommendations; and last, not least, the library's patron, whose needs will be filled faster, more economically, with fewer false starts. A wealth of practical detail has been packed into these pages (a plethora of detail, some might complain, confusing the beginner and boring the experienced). But a procedure manual by definition tries to incorporate every stroke and serif of A to Z. Simple solutions to that complaint are re-reading and/or judicious scanning. The manual includes annotated texts of the 1968 National Interlibrary Loan Code and the model regional or special code; primer-type instructions for borrowing and requesting libraries (including concise sections on special puzzlers such as academic theses, government publications, technical reports, and materials in non-roman alphabets); and consideration of related, often problematical areas such as photocopy, copyright and reprinting, location requests, teletype requests, purchase of dissertations, and international loans. Useful appendices (e.g., sample forms, some library policy statements, the text of the IFLA International Loan Code), a bibliography and a detailed index complete the work. Chapter levels vary of necessity.
For the novice, the teletype request chapter may seem too brief or confusing, yet several appendices (for instance) will be of interest even to the seasoned ILL assistant. Throughout, the effort has been for clarity, coverage, explicitness. The cost of an interlibrary loan transaction is too great to indulge sloppy, inefficient, or idiosyncratic procedures; this manual is therefore required reading for all involved in interlibrary loans, and a copy should be at the elbow of every new clerical assistant.

Elizabeth Rumics

THE RECON PILOT PROJECT: A PROGRESS REPORT, OCTOBER 1970-MAY 1971

Henriette D. AVRAM and Lenore S. MARUYAMA: MARC Development Office, Library of Congress, Washington, D. C.

Synopsis of three progress reports on the RECON Pilot Project submitted by the Library of Congress to the Council on Library Resources covering the period October 1970-May 1971. Progress is reported in the following areas: RECON production, foreign language editing test, format recognition, microfilming, input devices, and tasks assigned to the RECON Working Task Force.

INTRODUCTION

With the implementation of the MARC Distribution Service in March 1969, the Library of Congress and the library community have had available in machine readable form the catalog records for English language monographs cataloged since 1969. Most libraries, however, also need to convert their older cataloging records, and the Library of Congress attempted to meet these needs by establishing the RECON Pilot Project in August 1969. During the two-year period of the pilot project, various techniques for conversion of retrospective bibliographic records have been tested, and a useful body of catalog records is being converted to machine readable form. The pilot project is being supported with funds from the Library of Congress, the Council on Library Resources, and the U.S. Office of Education.
Earlier articles in the Journal of Library Automation have described the progress through September 1970 (1, 2, 3). This article covers the period October 1970 through May 1971.

PROGRESS-OCTOBER 1970 THROUGH MAY 1971

RECON Production

The conversion of 8476 records in the 1969 and 7-series of card numbers that had not been included in the MARC Distribution Service was completed, and these records were sent to 47 subscribers of the MARC Distribution Service. The subscribers were not charged for these records but were asked to send a tape reel to the Library for the duplication process. At present, the RECON data base consists of 25,206 records in the 7, 1969, and 1968 series of card numbers. Records in the 1968 series that were part of the data base for the MARC Pilot Project are being converted by program from the MARC I format to the MARC II format, proofed, and updated. To date, 7551 out of 7583 MARC I records have been processed. Prior to the implementation of the MARC Distribution Service, records were input for test purposes, and the resulting practice tapes contain data requiring correction or updating to correspond with the present specifications of the MARC II format. Of the 8340 titles on the practice tapes, 3460 have been updated and reside on the RECON master file. These updated machine readable records will be distributed with the RECON titles in the 1968 card series.

Foreign Language Editing Experiment

A foreign language editing experiment was conducted to test the accuracy of MARC/RECON editors in editing French and German language records. Records used for this test included 1180 of the 5000 RECON research titles. At least 50 percent accuracy was expected, since half of the task of editing a MARC record involves being able to read the language of the record. The other half involves identifying the data elements by their location in the record.
The three editors used in the experiment had studied French in high school, one having had an additional year in college; none had studied German. Each editor was required to edit approximately 200 records in each language. Statistics on the number of records edited per hour and the number of errors made, when compared with the same editors' statistics for editing English language records, showed that each editor maintained approximately the same rate of speed in editing foreign language records as in editing English. The error rate for each editor, however, more than tripled on foreign records, and each made approximately as many errors in French (the language studied) as in German. Each editor averaged more than 12 errors per batch in French and 12 in German. Since the MARC Editorial Office has established a standard of 2.5 errors per batch (20 records comprising a batch) as being acceptable for trained MARC editors, this error rate would have to be lowered in a production environment. The majority of errors occurred in the title field, which is a portion of the record that must be read for content in order to be edited correctly. The second largest number of errors occurred in the fixed fields, which are also dependent upon a reading knowledge of the language of the record for accurate coding. The number of errors made in each batch of records by each editor was tabulated to determine if any improvement was made during the course of the experiment. In no case was improvement noted. Statistics were also kept on the number of times an editor consulted various sources for help: e.g., dictionaries, the editing manual, the LC Official Catalog, the reviser, or a language specialist. Dictionaries were consulted frequently, the reviser and language specialists rarely. Typing statistics (number of errors) were also recorded for 181 French and 185 German records.
The error rate for typing foreign language material was lower than for typing English. The English language statistics, however, were combined for several typists, and the foreign language statistics were for one typist only. Charts showed that there was no improvement in the number of typing errors made at the end of the test. The primary conclusion drawn from the results of the experiment is that in order to edit foreign language records with an acceptable degree of accuracy, it would be necessary for the editor to have a good knowledge of the language as well as the editing procedures.

Format Recognition

Format recognition is a technique that allows the computer to process unedited bibliographic records by analyzing data strings for certain keywords, significant punctuation, and other clues to determine proper identification of data fields. The Library of Congress has been developing this technique since early 1969 in order to eliminate substantial portions of the manual editing process, which in turn should represent a considerable savings in the cost of creating machine readable records. The RECON report, which was written prior to the completion of the first format recognition feasibility study, concluded that "partial editing combined with format recognition processing is a promising alternative to full editing." (4) Since that time, the emphasis in the development of the programs has been shifted to no editing prior to format recognition processing. The programs are in the final stages of acceptance testing, and it is expected that 75% of the records can be processed without errors created by the format recognition programs. Preliminary estimates show that it takes approximately half a second of machine time to process one record by format recognition; the manual editing process, on the other hand, takes approximately six minutes per record.
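The flavor of the technique can be suggested with a toy sketch. This is not the Library's actual algorithm (the real programs were far more elaborate and written in Assembler Language); the patterns and tag choices below are simplified assumptions, applied to the Ewart record shown in Figures 1-3.

```python
import re

def tag_line(line: str) -> str:
    """Guess a MARC-like field tag for one line of an unedited catalog
    record, using only punctuation and keyword clues -- a toy version
    of the format-recognition idea."""
    # Collation: pagination ("287 p.") plus size/illustration keywords.
    if re.search(r"\b\d+\s*p\b", line) and ("cm" in line or "illus" in line):
        return "300"
    # Imprint: place, publisher, and a date introduced by a comma.
    if re.search(r",\s*1[89]\d\d\b", line):
        return "260"
    # Personal name main entry: "Surname, Forename" shape.
    if re.fullmatch(r"[A-Z][A-Za-z'\-]+,\s+[A-Z][A-Za-z .\-]+", line.strip()):
        return "100"
    # Default: treat remaining text as the title paragraph.
    return "245"

record = [
    "Ewart, Andrew.",
    "The world's greatest love affairs.",
    "London, Odhams, 1967 [i.e. 1968].",
    "287 p. 8 plates, illus., ports. 22 cm.",
]
for line in record:
    print(tag_line(line), line)
```

A real format-recognition pass must of course also supply indicators and subfield codes and resolve ambiguous lines, which is why the editors still proof the output.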
The total amount of core storage required is approximately 120K: 80K for the programs and 40K for the keyword lists. Although the keyword lists are maintained as a separate data set on a 2314 disk pack, they are loaded into memory during processing. The format recognition programs have been written in Assembler Language for the Library's IBM 360/40 under DOS. The logical design of the format recognition process, with detailed flow charts needed for implementation of computer programming, has been published as a working document by the American Library Association so that the technical content would be available to assist librarians in their automation projects (5). Workflow for format recognition begins with the input of unedited catalog records via the MT/ST following the typing specifications created for format recognition. After being processed by the format recognition programs, these records are proofed by the editors (the first instance in which they see the records), and the necessary corrections or verifications made. Correction procedures for format recognition records are the same as those used for regular MARC records. Figures 1, 2, and 3 are examples of the printed card used for input, the MT/ST hard copy, and the proofsheet of the record created by format recognition. Initial use of the format recognition programs is for input of approximately 16,000 RECON records in the 1968 card series. Input of current MARC records via format recognition will begin at a later date. RECON records were chosen for large-scale testing because they are not required for an actual production operation such as the MARC Distribution Service. In addition, work has begun on the expansion of format recognition to foreign languages. Analysis is being done on German and French monograph records, and eventually Spanish, for new or expanded keyword lists and some changes to the algorithms.
Fig. 1. Input for Format Recognition. (Facsimile of the LC printed card: Ewart, Andrew. The world's greatest love affairs. London, Odhams, 1967 [i.e. 1968]. 287 p. 8 plates, illus., ports. 22 cm. Card number 68-97457.)

Fig. 2. MT/ST Hard Copy.

Fig. 3. Proofsheet of Format Recognition Record.

Microfilming

For a full-scale retrospective conversion project at the Library of Congress, it is likely that records for input would be microfilmed from the Card Division record set and updated from the corresponding records in the Library's Official Catalog. A subset of the record set, such as the catalog cards for a given year, would be microfilmed and then the appropriate records, i.e., English language monographs, German monographs, etc., would be selected after filming.
Costs were calculated for a base figure of 100,000 records for the year 1965, and four different methods of microfilming have been estimated as follows by the Library's Photoduplication Service: 1) microfilming for a direct-read optical character reader ($2000); 2) microfilming for reader/printer specifications ($2350); 3) microfilming for reader specifications ($400); and 4) microfilming for a Xerox Copyflo printout of a card overlaid on an 8 x 10½ worksheet ($7000). The differences in cost are primarily attributable to the type of camera used (rotary or planetary) and the kind of feed mechanism (manual or automatic). Other factors need to be considered, such as the fact that film suitable for OCR requirements could not be used on Xerox Copyflo or even for contact printing to positive film. Since a readable copy of the original printed card is necessary for updating and proofing, microfilming for direct-read OCR would not be a viable alternative.

Input Devices

The monitoring of existent input devices was continued with an investigation of Dissly Systems' Scan Data optical character reader. Scan Data has been modified, via software, to read 55 different type fonts which are recognized by a "best compare" technique using six stored fonts to match against the remaining 49. According to the manufacturer, direct reading is accomplished with approximately a 95% level of accuracy. Errors are recorded during a proofing cycle and corrected in the machine readable data base. The Scan Data equipment does not have a transport for a 3 x 5 document, so that a number of 3 x 5 cards must be attached to an 8 x 14 document for scanning, and therefore these cards would not be returned to the Library by the manufacturer. Under these conditions, cards to be read by Scan Data equipment would have to be obtained from stock rather than from the Card Division record set.
Unfortunately, many cards are out of stock; and of those that are in stock, many may be cards reprinted several times by photo-offset methods and consequently have a poor image. Therefore the use of this device would be severely hampered. Fifty good quality cards were submitted to Dissly Systems for an experiment that was run without any modifications to the existing machine and software. Five of the 50 cards were returned to the Library with a matching printout. The results were not encouraging because many lines of text were missed and many characters misread.

RECON Working Task Force

The RECON Working Task Force has compiled work statements for contractual support for two of its research projects. These projects involve investigations on the implications of a national union catalog in machine readable form and the possible utilization of machine readable data bases other than that of the Library of Congress for use in a national bibliographic store. Preliminary tasks related to these projects have been described in earlier progress reports (6, 7). The first part of the work statement deals with the products that could be derived from the machine readable national union catalog: a bibliographic register; indexes by name, title, and subject; and a register of locations. These indexes would provide multiple access points to the records in the National Union Catalog. The bibliographic register will contain a full bibliographic record on each title covered. The indexes will contain partial records which are associated with the full records in the register, and a given index file will carry one or more partial records for every record in the register. For each title in the register, the register of locations lists those libraries where copies of the title are held.
The assumption is made that the indexes under consideration will contain the following data elements (the numeric designations and subfield codes are those used in the MARC format fields):

Name Index
  Name (100, 110, 111, 400, 410, 411, 600, 610, 611, 700, 710, 711, 800, 810, 811)
  Short title (245)
  Main entry in abbreviated form
  Date (fixed field Date 1)
  Language (fixed field language code)
  LC card number
  Register number

Title Index
  Short title (130, 240, 241, 245, 440, 630, 730, 740, 840)
  Main entry in abbreviated form
  Date (fixed field Date 1, or may be omitted if in heading)
  Language (fixed field language code, or may be omitted if in heading)
  LC card number
  Register number

Subject Index
  Subject heading (650, 651)
  Main entry (100, 110, or 111)
  Short title (245)
  Date (fixed field Date 1)
  Language (fixed field language code)
  LC card number
  Register number

The abbreviated form of main entry noted above is to be included in the record of the name or title index unless the name itself is carried in the main entry of that record. It is defined as follows: 1) for a personal name, a conference, or a uniform title heading, subfield "$a" is appended in brackets after the title; and 2) for a corporate name, subfield "$a" plus the first "$b" subfield are appended, within a single set of brackets, after the title.

The specific objective of this project is to define and investigate alternative processing schemes associated with an automated National Union Catalog. This study will explore and examine these processing schemes and the following components:

1) Techniques for introducing the necessary input into the automated NUC system. The considerations to be covered include the relationship to MARC input, use of the format recognition programs, and the problems of language in terms of selection of input.
2) Techniques for structuring or organizing the data contained in the register and the various indexes to establish and maintain the relationships among the records contained in these data bases.

3) Techniques and procedures connected with the production of the products listed above. This investigation will also cover any selection and sorting procedures necessary.

4) Analysis of the format, i.e., graphic design and printing, size, style, typographic variation, condensation, etc.

5) Examination of alternative cumulation patterns associated with the products of the system. In this connection, items such as number of characters in an average entry, average number of entries on a page, expected rate of increase of number of entries in catalog, and segmentation of catalog are to be taken into consideration.

6) Feasibility of producing a register through automation techniques. If this can be accomplished, further investigation will be directed toward the feasibility and cost of segmenting the register into three sections: one produced from machine readable records (English and whatever roman alphabet language records are in machine readable form); one produced from roman alphabet language records which are only in printed form; and one produced from non-roman alphabet language records which are only in printed form.

The costs associated with the various techniques and procedures enumerated above as well as with their components will be calculated. From these figures an average total cost per title cataloged is to be determined for each alternative processing scheme. These cost values (one per alternative scheme) are to be compared with those associated with a purely manual processing scheme. Included in this cost analysis will be the associated costs for different forms of hard copy as well as for the use of COM (Computer Output Microfilm).
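The relationship between a register record and its partial index records, including the abbreviated main entry rule, can be sketched briefly. The dictionary layout and function names below are hypothetical illustrations, not the actual NUC design:

```python
# A register record, keyed by hypothetical MARC-like field labels.
record = {
    "100a": "Ewart, Andrew",                     # main entry, personal name
    "245": "The world's greatest love affairs",  # short title
    "650": ["Love", "Biography"],                # subject headings
    "date1": "1968",                             # fixed field Date 1
    "lang": "eng",                               # fixed field language code
    "lccn": "68-97457",                          # LC card number
    "register": 12345,                           # register number
}

def abbreviated_main_entry(rec):
    # Rule 1) above: for a personal name, subfield $a is appended
    # in brackets after the title.
    return f"{rec['245']} [{rec['100a']}]"

def name_entry(rec):
    # Name index: name, short title, date, language, card and register numbers.
    return (rec["100a"], rec["245"], rec["date1"], rec["lang"],
            rec["lccn"], rec["register"])

def title_entry(rec):
    # Title index: the name is not in the heading, so the abbreviated
    # main entry is carried with the title.
    return (abbreviated_main_entry(rec), rec["date1"], rec["lang"],
            rec["lccn"], rec["register"])

def subject_entries(rec):
    # Subject index: one partial record per subject heading.
    return [(h, rec["100a"], rec["245"], rec["date1"], rec["lang"],
             rec["lccn"], rec["register"]) for h in rec["650"]]
```

Each partial record carries the register number, which is what ties every index back to the full bibliographic record in the register.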
From any one index and the register of locations, the maximum number of alphabetic and numeric lists (registers of location ordered by register number) will be determined, taking into account ease of usage and technical and economic feasibility. The intent is to have as few lists as possible and still keep the cost within reasonable bounds. Supplements to the indexes should be issued monthly; supplements to the register of locations may be issued monthly or quarterly.

The second project is a continuation of a previous investigation on the possible utilization of machine readable data bases other than that produced by the Library of Congress for use in a national bibliographic store. The results of this project should determine if the use of other data bases is economically and technically feasible. Using three or four data bases selected by the RECON Working Task Force, the study will determine the following:

1) Method and cost of acquiring these other data bases in machine readable form.

2) Analysis of the kinds of programs capable of converting records from a number of these data bases into the MARC format. Different level data bases might require different kinds of programs. If such an effort is deemed feasible, a cost estimate for such a program or array of programs will be calculated.

3) Method and cost of printing the records for examination, corrections, etc.

4) Method and cost of eliminating records already in the MARC data base.

5) Method and cost of comparing these records against the LC Official Catalog and making the necessary changes in the data or content designators.

6) Cost for input of additions and corrections.

7) Method and cost of incorporating the additions and corrections in the machine readable file.

8) Cost of providing means by which these records would not be input again by any future LC retrospective conversion effort.
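Item 4 in the list above, eliminating records already in the MARC data base, could in principle be approached by matching on LC card numbers. A minimal sketch, with invented sample data:

```python
# LC card numbers of records already converted (hypothetical sample).
marc_base = {"68-97457", "70-123456"}

# Candidate records acquired from an external data base (hypothetical).
candidates = [
    {"lccn": "68-97457", "title": "The world's greatest love affairs"},
    {"lccn": "69-55555", "title": "Some other title"},
]

# Keep only records whose card number is not already in the MARC base.
new_records = [r for r in candidates if r["lccn"] not in marc_base]
print(len(new_records))  # 1: only the record not already converted
```

In practice the matching problem is harder than this, since external records may lack LC card numbers entirely, which is part of what the study was to determine.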
A result of this project should be a determination as to whether high potential or medium potential files, or both, are suitable for conversion. A determination will be made of the minimum yield or the minimum number of titles needed to justify writing the programs to convert these data bases. A factor to be considered is that the number of unique titles will decrease as more data bases are converted for this pool of records. It was decided that the research tasks to study the problems in distributing name and subject cross reference control files would be dropped because of limitations of time and funds. An additional task, however, has been added that can be performed within the time limits of the pilot project. During the past year, the Library of Congress Card Division has recorded information about card orders in machine readable form. This information will be analyzed as to the year and language of the most frequent orders because it is assumed that the most popular card orders bear a relationship to the potential use of a data base in machine readable form by libraries in the field. This study involves the following:

1) Analysis of a frequency count of LC card orders for a one-year period and preparation of a distribution curve for card series.

2) Analysis of a sample of frequently ordered cards to determine with fair reliability the proportion of English language titles in this group. The sample will be large enough to give an indication of other language groups that might be significant for any RECON effort.

3) Preparation of distribution curves for English language and non-English titles by card series.

4) Mathematical analysis of the results of 1)-3) above to arrive at a table to show the anticipated utility of converting specified subsets of the LC card set.
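Steps 1) through 3) amount to frequency counts over the card-order file. A miniature of the idea, with invented order data (each order reduced to a card series and a language code):

```python
from collections import Counter

# Hypothetical card orders: (card series, language of the title).
orders = [("68", "eng"), ("68", "ger"), ("67", "eng"), ("68", "eng"),
          ("65", "fre"), ("67", "eng")]

# Step 1: frequency count by card series -- the raw data for a
# distribution curve.
by_series = Counter(series for series, _ in orders)

# Step 2: proportion of English language titles in the sample.
english = sum(1 for _, lang in orders if lang == "eng")

print(by_series.most_common())  # series counts, most ordered first
print(english / len(orders))    # proportion of English titles
```

Tabulating such counts for each series and language group is what would reveal whether a selective (popular-title) conversion beats strict chronological order.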
OUTLOOK

Research in input devices has not uncovered any equipment that offers a significant technical and cost improvement over the MT/ST currently used in the Library of Congress. On-line correction and verification of MARC/RECON records will, however, speed conversion and will offer relief in the flow of documents and paper work required in a purely batch operation. Since MARC/RECON records will be corrected and verified in one operation rather than by the cyclic process of the present system, cost savings should be realized. The Library of Congress will have this on-line capability through the Multiple Use MARC System. This new system is still in the design phase, and a projected date for implementation has not yet been set. To date, investigations in the use of direct-read optical character readers have demonstrated that there are no devices currently available capable of scanning the LC printed card. The format recognition programs are operational, and RECON titles in the 1968 card series are being converted without any prior editing of the records. Procedures are being implemented to gather the necessary data to compare costs of the format recognition technique with costs of conversion with human editing. Production statistics have shown that retrospective records are more costly to convert than current records. This higher cost is attributed to the additional tasks in RECON of selecting the subset for input from the LC record set and comparing the records with the LC Official Catalog for updating. Since cards in the LC record set do not necessarily reflect the latest changes made to the cards in the LC Official Catalog, the Official Catalog comparison is necessary to ensure that RECON records are as up to date as the cards in the Official Catalog.
Although the RECON report (8) recommended conversion in reverse chronological order with highest priority given to the last ten years of English language monograph cataloging, the Working Task Force study on the Card Division popular titles may reveal that selective conversion is a more practical approach. The orderliness of chronological conversion by language does mean that records in machine readable form can be ascertained easily. It is interesting, however, to speculate on the use of these records compared with popular titles which may cross many years and languages. The MARC/RECON titles constitute the data base for the Phase II Card Division Mechanization Project, and close liaison continues to be maintained between both projects. It is recognized that the distribution of cards and MARC records requires the same computer based bibliographic files and has similar hardware and software requirements. Plans are presently underway to transfer the duplication of tapes for MARC subscribers from the Library's IBM 360/40 to the Card Division's Spectra 70 when the Phase II system is operational. The RECON Pilot Project does not officially end until August 1971. In an attempt to make information available as rapidly as possible, the preparation of the final report will begin this summer, since several aspects of the project are complete enough to be documented. The final report will be published by the Library of Congress, and its availability will be announced in the LC Information Bulletin and in professional journals.

ACKNOWLEDGMENTS

The authors wish to thank the staff members associated with the RECON Pilot Project in the MARC Development Office, the MARC Editorial Office, the Technical Processes Research Office, and the Photoduplication Service of the Library of Congress for their contributions to the project and, therefore, to this report. Special thanks are due to Patricia E.
Parker of the MARC Development Office for her work on the foreign language editing experiment and for writing that section of this article.

REFERENCES

1. Avram, Henriette D.: "The RECON Pilot Project: A Progress Report," Journal of Library Automation, 3 (June 1970), 102-114.
2. Avram, Henriette D.; Guiles, Kay D.; Maruyama, Lenore S.: "The RECON Pilot Project: A Progress Report, November 1969-April 1970," Journal of Library Automation, 3 (September 1970), 230-251.
3. Avram, Henriette D.; Maruyama, Lenore S.: "RECON Pilot Project: A Progress Report, April-September 1970," Journal of Library Automation, 4 (March 1971), 38-51.
4. RECON Working Task Force: Conversion of Retrospective Catalog Records to Machine-Readable Form: A Study of the Feasibility of a National Bibliographic Service (Washington, D.C.: Library of Congress, 1969), 179.
5. U.S. Library of Congress. Information Systems Office. Format Recognition Process for MARC Records: A Logical Design (Chicago: American Library Association, 1970).
6. Avram, Guiles, Maruyama, op. cit., 248-249.
7. Avram, Maruyama, op. cit., 49-51.
8. RECON Working Task Force, op. cit., 11.

MONOCLE

Marc CHAUVEINC: Conservator, University Library of Grenoble, Saint-Martin-d'Hères, France

A new processing format, based on MARC II and some of BNB's elaborations of MARC II. It further enlarges MARC II to encompass French cataloging practices and filing arrangements in French catalogs.

When the Bibliotheque Universitaire de Grenoble, Section Sciences, wished to transform its card catalog into a book catalog and later into an on-line catalog, the first necessity was to build up a format fitted for the handling of complex records and the filing of non-alphabetical headings.
After several personal attempts at a format, the Librarian at Grenoble had translated the MARC II and BNB formats into French (1, 2), to give French librarians the opportunity to become acquainted with them; finding these two formats the most flexible and complete of those reviewed, he also began the work of adapting them to French cataloging rules. The MARC format is a standard format designed purely for communication of bibliographic records on magnetic tape; MARC II is a MARC format containing Library of Congress cataloging data disseminated by the MARC Distribution Service of the Library of Congress. The MARC II format is not intended as a local processing format; indeed, even the Library of Congress uses its own internal processing format and not MARC II. Most centers using MARC II records have designed their own processing formats and file structures from which, if the center is to participate in a network, it must be possible to regenerate records in a communications format. The BNB format, one of the derivatives of MARC, contains British National Bibliography cataloging data. Translation of the two formats was done in January 1969. Subsequently a first French adaptation of them was discussed by a group of experts from the Bibliotheque Nationale and the Direction des Bibliotheques and was judged not good enough; deeper work was necessary to analyze the MARC format and test its compatibility with French cataloging practices. The resultant new processing format, called MONOCLE (Projet de Mise en Ordinateur d'une Notice Catalographique de Livre), was published in June 1970 (3).

PROGRAMS

Meanwhile, in order to test the format and to prepare the operational work as soon as possible, programmers attached to the Institute of Applied Mathematics at the University began to write several programs in COBOL.
COBOL was chosen because the Institute had good practice in that language, having worked with it for several years; because it can be easily modified if there is a change in format; and because it can be used with several types of computer, enabling other libraries to use it. The programs are still in the process of being written, but since the beginning of January 1970 all books cataloged by the Library according to current practice have also been cataloged according to the new system and their records entered into the computer, so that both systems are now working simultaneously. The author catalog program, which is the most difficult and sophisticated, is not yet ready, but most of the following that were foreseen as necessary are actually working:

1) A test program (TSTANALY) that checks the logical structure of the records at the input stage and displays on the printout any errors (fields missing, length of tags, of indicators, subfield codes, logical links between fields and information codes, etc.);

2) A program (EXPCREAT) that creates the files, computes the directory and puts the records at their places on the disks;

3) A program (TSTNOTAB) for producing an alphabetical printed index containing author plus abridged title plus the address of the record on the disk;

4) A program for sorting records according to UDC numbers and for printing them on a two-column weekly list;

5) A program to correct and update the created files;

6) A program for sorting records alphabetically in an annual catalog;

7) A program giving a list of UDC numbers with the corresponding subject headings and vice versa;

8) Several small modular programs for supplying statistics on the number of books and volumes, and expenditure in total and by subjects.
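The kind of structural checking that TSTANALY performs can be suggested in miniature. This is a hypothetical sketch in Python (the actual program was in COBOL), with an invented record shape; it checks tag length, indicator count, and subfield well-formedness, and reports each error for the printout:

```python
def check_record(fields):
    """Check the logical structure of a record, TSTANALY-style.
    fields: list of (tag, indicators, subfields) tuples, where
    subfields is a list of (code, value) pairs. Returns error messages."""
    errors = []
    for tag, indicators, subfields in fields:
        if not (len(tag) == 3 and tag.isdigit()):
            errors.append(f"bad tag: {tag!r}")
        if len(indicators) != 2:
            errors.append(f"tag {tag}: expected 2 indicators")
        for code, value in subfields:
            if len(code) != 1 or not value:
                errors.append(f"tag {tag}: bad subfield {code!r}")
    return errors

rec = [("100", "00", [("a", "Ewart, Andrew")]),   # well formed
       ("24", "0", [("a", "")])]                  # three deliberate errors
for err in check_record(rec):
    print(err)
```

A real input check would also verify the logical links between fields and information codes, which requires cross-field rules rather than the per-field tests shown here.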
INPUT AND OUTPUT

The Institute of Applied Mathematics has two computers: an IBM 360/40 and an IBM 360/67 that work together in a conversational mode during the day and in batch processing during the night. The Library uses both of these modes. The conversational mode is controlled by a system called CP/CMS (Cambridge Monitoring System) for the input of data through an IBM 1050 terminal with a paper-tape puncher and a reader, and the batch-processing mode by OS (Operating System) for the production of lists and statistics. On-line input through the terminal is very convenient for corrections, because of quick access to non-created provisory files of 100 records and the printed list that can be proofread. It has some inconveniences, however, the first of which is that it is a slow system. A typist punches the paper tape at an average rate of twenty records a day. Taking into account the time of reply, errors of transmission, and breakdowns of the system, it is not possible to read more than fifty records in a morning, although the theoretical speed of reading is forty records an hour. Then the files have to be read through the TSTANALY program, printed on the line printer, then controlled by the librarians, recalled and corrected on the 1050 terminal, and then again listed, controlled, and so on until they are correct. It can take several days before a file of fifty records is ready. Though paper tape is a convenient means of storing data in security in case of destruction of the files, it is a slow means of transmitting data and, because it may cause errors in transmission, is not very reliable. The 1050 terminal, although a typewriter, does not have a character set sufficient for library work. It was necessary to create multipunch codes for diacritical marks. Because the foregoing is also an expensive means of input, the Library is experimenting with a new one.
Using an IBM 72 tape typewriter already in the Library, the corrections will be made off line with the two tape boxes existing on the machine, and when several tapes are correct they will be sent to an IBM service bureau to be translated into a computer magnetic tape. The translation program, which will be written by IBM staff, is not very expensive. Output is on an IBM 1403 N1 line printer on which is used a special print train SN with upper- and lower-case roman alphabet and to which diacritical marks have been added. Products are:

1) weekly lists of accessions according to the Universal Decimal Classification;
2) weekly lists of books according to acquisition number;
3) weekly lists of books according to call number;
4) a monthly catalog by authors;
5) an annual catalog by authors;
6) an irregular catalog of periodicals;
7) an irregular catalog of serials;
8) an irregular catalog of theses; and
9) regular statistics on the work of the Library.

It was felt that for several years catalogs in book form would be less expensive and more useful than a system of on-line inquiry that would require display terminals to be used by untrained people.

FORMAT

Although it will be possible later on to transform MONOCLE's internal format into one suitable for information retrieval, the system in use at Grenoble is mainly conceived for printing of the lists enumerated above. This goal led to the consideration of the major problems of filing records and building an internal format to allow easy programming of correct filing, even if this correct filing is rather complicated for the computer.
There were two possible ways to achieve this aim: one was to build a simple format and provide complex programming to introduce lists of dead words, tables of transcodification and translation (as "Mc" to "Mac," "Van Nostrand" to "Vannostrand"); the other was to build a more complex format to make programming more simple and generalized and computer processing less expensive. The latter way was followed by the Library of Congress and the British National Bibliography in their communications formats, so a start was made from these two projects, keeping most of their structure, tags and subfield codes. The system to be built, however, required a working format, not a communications format, which led to the first modifications. Two files were created, each containing leader, directory and variable fields. The two parts of each record can be reassembled into one MARC record for a communications format.

Record Files

The first file, called the Index (Figure 1), contains the leader slightly modified; field 008 of the MARC format, put in fixed positions and having 69 characters; and the directory, built in a different way from the MARC directory. Since there will never be a field length of 9999 characters and a starting character position of 9999, length was reduced to 999 characters and the starting character position to 999. Since twelve characters are too much for a normal field, these two numbers are only used for computation and are put in binary and both reduced to two bytes. This permits the insertion of three pieces of information between the tag and the field length: the subrecord indicator (two characters), the repeat indicator (one character) and the indicators (two characters). The directory entry takes the following form (character positions 1-12):

Tag (1-3) | Subrecord indicator (4-5) | Repeat indicator (6) | Indicators (7-8) | Length (9-10, binary) | Starting character position (11-12, binary)

BNB MARC allows one digit for the subrecord indicator, which makes possible nine codes for nine subrecords.
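The 12-byte directory entry just described can be sketched with Python's struct packing; this is an illustration of the layout, not MONOCLE's actual code (which was COBOL on System/360):

```python
import struct

def pack_entry(tag, subrec, repeat, indicators, length, start):
    """Pack one 12-byte MONOCLE-style directory entry: tag (3 chars),
    subrecord indicator (2), repeat indicator (1), indicators (2),
    then field length and starting character position as two-byte
    big-endian binary integers (each limited to 999)."""
    assert length <= 999 and start <= 999
    return (tag.encode() + subrec.encode() + repeat.encode() +
            indicators.encode() + struct.pack(">HH", length, start))

entry = pack_entry("100", "00", "1", "00", 27, 0)
print(len(entry))  # 12 bytes per directory entry
```

Packing the length and starting position in binary is what frees the four character positions used for the subrecord, repeat, and field indicators, while keeping the entry the same width as a conventional MARC directory entry.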
Since MONOCLE will require more than nine subrecords, two digits are used, thereby permitting 99 subrecords. The repeat indicator of one digit is necessary if several identical fields are repeated in one record (e.g., in the case of several editors).

Fig. 1. Map of Index and Cataloging Data Records. (Diagram of the Index record, with leader, fixed information codes, heading, dates, and directory, and of the main-file record with its variable data fields; figure reproduced in the original.)

A cross reference can be directed towards one of these fields, and to prepare the sort field it is easier for the programs to look only for the tags than to test every "$a" in a field, which requires testing every character in the field. The repeat indicator has another function, that of linking several fields to be associated in the processing. On the worksheet (Figure 2), tags and indicators are written in the
following order: tags, indicators, subrecord indicators, repeat indicators (e.g., 100 00 001).

Fig. 2. Worksheet. (Facsimile of the cataloging worksheet, "Bordereau de catalogage," not reproduced.)

On the magnetic disk, however, the order is as follows: tag, subrecord indicator, repeat indicator, indicators (e.g., 100 001 00). The second file is the main file. Records in this file have the same general design as the MARC II communications records, and MONOCLE has retained all the fields designed by the Library of Congress. Each field begins with a two-character subfield code. Grenoble does not use fields 001 to 009, but since the Bibliotheque Nationale will use these fields, MONOCLE retains them. Another characteristic of the second file is that records are input in random order and are given identification numbers that are their physical addresses on the disk. The address, which is put in the leader, is made up of ten digits, of which one is the number of the disk, four the number of the track and five the number of the record. Access to every record is simple, since the identification number is also the physical address. A printed abridged alphabetical list giving author, title and this number indexes a printout of the main file. Additions and corrections are made on this printout and then added to the computer file through a correction tape. The identification number is the access point. No supplementary internal index is needed, nor is any sequential search. There is direct access to every record in the file. Some fields have been added for MONOCLE, some deleted, and some modified. The main field deleted is field 130 (main entry uniform title heading), because its place was considered to be in the group of title fields. Accordingly fields 630, 730 and 930 are deleted.
That is to say, they are kept in the format, but not used, as is the case with many other fields. Field 008 contains codes different from those of the MARC format. These 69 codes (see Figure 1) are put in fixed positions just after the leader and before the directory. This permits various studies and manipulations (statistics, sorts, etc.) without going to the main file, which is in a variable-length form and whose contents are therefore less easily accessible than those of fixed fields. Field 080 for Universal Decimal Classification was not developed by the Library of Congress or BNB. For MONOCLE it has been given a structure that permits differentiation of the call number (when the book is classified on the shelves according to the UDC) from the UDC number, which is only used for the card catalogs. In this structure "$a" represents the call number and "$b" represents the continuation of the UDC number, as shown in the following example:

080 00 $a DUR 539.143 $b (083) : 547.1

The colon instructs the computer to make a cross reference from the second number to the first. In field 100, main entry author personal name, the general layout was retained, but the subfield codes were changed for filing purposes. As a matter of fact, the filing rules for personal names at the Bibliotheque Nationale differ in many respects from American Library Association rules. In designing MONOCLE, the Library tried all along to give filing value to subfield codes in order to simplify programming. For instance, the filing order for the same name is:

Saint
Pope
Emperor
Kings of France
Kings (other countries)
Forename single
Surname plus forename

This gives:

John, Saint
John, King of England
John
John, Bishop of Chartres
John, Peter
John, Peter, Ed.
John, Peter, Advocate

Therefore the following subfield codes have been adopted:

$a Name
$b Saint
$c Pope
$d Emperor
$e King of France
$f Other Kings (alphabetized by name of kingdom)
$g Relator
$h Date
$i Numeration
$k Precedent epithet
$l Filing epithet
$m Forename

This structure is closer to that of the BNB than to MARC's, but an important change has been made in the indicators. MARC and BNB indicators for this field were chosen for communications purposes and are therefore not necessarily convenient for internal processing. In fact, the program had to test every character and take action on some of them (delete a blank, transform a hyphen into a blank, etc.), which takes a lot of computer time. To facilitate construction of sort keys, a change of indicators was made that assigned to each of them a specific action. For first indicator 1 no action is assigned. That is to say that a name is filed exactly as it is, whether it is a single surname or a compound surname:

100 10 $a DURAND $m Charles
100 10 $a SMITH $m John
100 10 $a CASTRO CALVO $m Frederico
100 10 $a HOA TIEN SU
100 10 $a SANTA CRUZ $m Alonso de

Eighty percent of names are put under this indicator and put in the sorting field without any test, which saves much computer time. First indicator 2 changes a hyphen into a blank in a compound name. The internal hyphen becomes a blank because it is filed as a blank:

MARTIN-CHAUFFIER (filed as MARTIN CHAUFFIER)
PASTEUR VALLERY-RADOT (filed as PASTEUR VALLERY RADOT)

First indicator 3 is used for the compound names in which a character (blank, hyphen, apostrophe) is deleted:

LA FONTAINE (filed as LAFONTAINE)
MAC INNIS (filed as MACINNIS)
O'NEIL (filed as ONEIL)
VAN NOSTRAND (filed as VANNOSTRAND)

There seems to be no clear explanation anywhere of the reasons for creating a special field for family names (the use of this indicator in MARC II). For French libraries it is useless for filing purposes, the family name being filed as a surname.
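A minimal Python sketch of the sort-key actions selected by first indicators 1, 2 and 3, assuming the transformations are exactly as described in the text (the vertical-bar technique of indicator 4, discussed next, is omitted here):

```python
def sort_key(name, first_indicator):
    """Build the filing form of a personal name according to the
    first-indicator actions described in the text (sketch only)."""
    if first_indicator == "1":
        # No action: the name files exactly as written.
        return name
    if first_indicator == "2":
        # An internal hyphen in a compound name files as a blank.
        return name.replace("-", " ")
    if first_indicator == "3":
        # A blank, hyphen or apostrophe in the name is deleted.
        return name.replace(" ", "").replace("-", "").replace("'", "")
    raise ValueError("indicator not handled in this sketch")

assert sort_key("SANTA CRUZ", "1") == "SANTA CRUZ"
assert sort_key("MARTIN-CHAUFFIER", "2") == "MARTIN CHAUFFIER"
assert sort_key("LA FONTAINE", "3") == "LAFONTAINE"
assert sort_key("O'NEIL", "3") == "ONEIL"
```

The point of putting the action in the indicator, as the article argues, is that the program tests one directory character per name instead of every character of every name.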
First indicator 4 is used when a complex filing is necessary, that is to say, when the technique of inserting vertical bars (or any other characters) is used in the way proposed by R. Coward. The use of this specific indicator for these bars enables the program to test for them only when this indicator is present. This means that there is just one test per name instead of ten or twenty on each character of every name. As this indicator is in the directory, the processing of the names before the sorting itself is hastened.

MARTIN | DU GARD | DUGARD
DUPON | de LA GUERIVIERE | LAGUERIVIERE
Mc ALESTER | MACALESTER |
Mc GRAW-HILL | MACGRAW HILL |
MULLER | MUELLER |

First indicator 0 also has a filing function. As names of saints and kings will be a small part of the files, and in order to file them correctly, three bars are inserted to mark omissions for alphabetization.

100 00 $a THERESE | d' || AVILA $b Sainte
100 00 $a THERESE | de l' || ENFANT JESUS $b Sainte $k Marie Francoise Therese Martin

In field 110 the subfield codes of the communications format were not sufficient for good filing. First, there seemed no reason to separate name (inverted) and name (direct order), because there is no difference in the filing of these names, which is strictly alphabetical. There is also no logical difference between them. So MONOCLE retains only two of these indicators: 10, for the name of a corporate body entered under the name of a place, and 20, for other corporate bodies. This will be useful either for research purposes or for giving priority in filing to the name of the place over the other name. As there are the same filing problems as in the author field, the indicator 40 has been added, which means that the three vertical lines are used.
110 40 $c Martin | von || Wagner Universitat

The subfield coding is rather succinct in the MARC format, and a change was made from the BNB coding because French practice does not use form subheading and "treaty" subheading. Moreover, under the name of a corporate body there can be a subheading such as "conference." This subheading has to be interfiled with a subheading of subordinate department and then should have a different code.

Library Association. Londres. Conference.
Library Association. Londres. Cataloging Group

The subfield codes are:

$a French name of the corporate body } uniform title used by
$b Place                             } the Bibliotheque Nationale
$c Name
$g Relator
$h Name of congress or conference
$i Subordinate department
$j Additional designation (number of the congress)
$k Date of the congress
$m Place of the congress
$n Remainder of the title
$o Type of jurisdiction
$p Name of larger geographic entity
$q Inverted element

MONOCLE does not use the "$t" proposed in MARC, and the same is true with many other fields (410, 610, 710, 910). MONOCLE makes important changes in the title fields, following British MARC but going a little further. Tags have been assigned to titles in the following order:

240 Collective filing title (complete works)
241 Uniform title (Bible)
242 Original title
243 Translated title (used only for the filing of Russian or Greek words according to the roman alphabet)
244 Romanized title
245 Title

A book may have several titles, in which case they are filed under the name of the author in the numerical sequence of the tags. A collective title (the complete works) is filed before a uniform title (if it exists), and the latter before an original title, which is in turn filed before an actual title. Classical works of which there are many translations have to be regrouped under the original title, but this may not be true of scientific works or of popular novels, which are filed under the actual title.
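The numeric ordering of a book's several titles can be shown in a one-line sketch; the title strings here are hypothetical, and the rule, sorting by ascending tag, is the one stated above.

```python
# A book's several titles file under the author in the numeric order of
# their tags: 240 collective, 241 uniform, 242 original, 243 translated,
# 244 romanized, 245 actual title (sketch with made-up title strings).
titles = [
    ("245", "Actual title of the edition in hand"),
    ("242", "Original title"),
    ("240", "Collective filing title"),
]
filed = sorted(titles, key=lambda t: int(t[0]))
assert [tag for tag, _ in filed] == ["240", "242", "245"]
```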
Moreover, filing of titles can be different in different libraries and for different books in the same library, which is why the filing order will not be determined on the worksheet, but by the program. This problem in filing order was raised by the Bibliotheque Nationale, which does not want the record itself to determine which of several titles will be the filing title; titles will be put under their respective tags according to their nature, and the program will, according to certain tests, choose the filing title. However, a completely satisfying solution achieving both flexibility and unambiguity in filing has not been arrived at. MONOCLE now uses only sequences 240, 241 and 245, with about the same indicators as the MARC format but with a slightly different meaning. The first indicators in field 241 have also been changed in order to achieve proper filing whether or not a conventional title contains a personal name. For example, "Exposition Chagall" will be filed before "Exposition Bibliotheque Nationale." The second indicator set to "1" shows that there should be a cross reference from this title to the title used for filing (actual title to original title, alternative title to main title). The second indicator set to "9" shows that the title is not significant and will not be used in a title catalog; field 900 is thus not used and repetition of the cross reference is avoided. MONOCLE also employs in title fields the indicator "4" used in field 100 for complex names and an added indicator "5" for titles without personal names. Subfield codes have also been modified in such a way as to use their alphabetical value as filing value as well as to identify data elements within a field. The following codes are used in fields 240, 241, 242, 243, 244 (and in the corresponding fields 440-444, 740-744, 940-944):

$a Title
$b Filing number for a logical order of the Bible, Koran, etc.
$c Adaptation or extract
$d Remainder of the title
$e Filing number for languages
$f Language
$g Filing number for dates
$h Dates
$k Name of person
$l Epithet
$m Forename
$p Place
$q Corporate body

The following are examples of this subfield code use:

241 50 $a Bible $b 03 $d A. T. Pentateuque, Genese $c Extraits $e 7 $f francais $h 1967
241 50 $a Exposition $p Paris $q Bibliotheque Nationale $h 1967
241 10 $a Exposition $k Chagall $m Marc $h 1963

For field 245 MARC indicators have been retained and "40" added for title with complex filing. These titles use the three vertical lines.

245 40 $a | Le XXeme | VINGTIEME | Siecle

For more simple filing the virgule, or slash, is used to eliminate articles at the beginning of titles. This is more flexible than the use of one indicator to determine the number of characters to avoid in filing, especially as there can be more than nine characters to avoid.

245 00 $a The / Chemistry of Life

The foregoing two techniques are used in all the fields x4y of MONOCLE (445, 945, etc.). There are slight modifications in other fields. For example, in the "collation" field the American and British formats do not make any mention of volumes. As it comes first in the MONOCLE collation, the subfield codes of 260 are modified as follows:

$a Volumes
$b Height
$c Pagination
$d Illustration

This situation may change if an international standardized catalog description is agreed upon. In fields 400, 600, 700 and 900 the MARC and BNB MARC projects have foreseen only one subfield "$t" to put the title after the name, and only one field, 740 or 940, for titles alone. To permit filing author-title series or an author-title added entry with titles of works of the same author, the following title fields were constructed in exactly the same way as fields 240-245: 440, 640, 740, 940.
The following fields were added, with the same indicators and subfield codes as 240-245: 441, 442, 443, 444, 741, 742, etc. The repeat indicator is used to link the author to the title in order to make one entry, since author entry and title entry may be quite independent.

410 20 001 $c NATIONAL RESEARCH COUNCIL
445 00 001 $a | Publications $y 1708

100 00 $a MEYNELL $m Esther
241 00 $a The / Little Chronicle of Anna Magdalena Bach $f Francais $h 1957
245 01 $a La / Petite chronique d'Anna Magdalena Bach $c trad. par M. E. Buchet
700 11 $a BUCHET $m M. E. $g Trad.
900 10 001 $a BACH $m Anna Magdalena $g Auteur suppose
945 00 001 $a La Petite Chronique $r voir $z 241 000
945 00 002 $a La / Petite Chronique d'Anna Magdalena Bach $r voir $z 241 000

This is a very useful tool, which permits generalization of the program to interfile records of books published by an institution with records of series published by the same institution, something not possible if one is under "$t" and the other under 245. The technique is not used, however, when the name is part of the title, as in "Holden Day Series in Mathematics." It is also useful because MONOCLE treats large handbooks as series, which is more simple than using "$d" and "$e" in the 245 field and repeating the name of the treatise in every record, or using the subrecord technique. Field 502 has also been modified to permit filing dissertations by subject, town, date and number. The details of the indicators and subfield codes can be found in MONOCLE (3). One of the main problems encountered was the processing of multivolume sets. It was thought necessary to develop a provision to permit interfiling volumes of a multivolume set. There are three cases, the most simple being that in which volumes are simply numbered 1, 2, 3 ... with or without a title and a date by volume.
Field 505 is used in this case, with subfield codes slightly modified:

$y Volume number
$a Title
$b Subtitle
$e Remainder (date, pagination)

Following is an example:

505 00 $y 1 $a The Practice of Kinetics $e 1969, 450 p. $y 2 $a The Theory of Kinetics $e 1969, 436 p.

In the second case, when each volume has its own authors, title, and date, the subrecord technique can be used, each volume having its own subrecord. This is possible only for treatises with few volumes, since the complete record cannot be too long. For very complicated handbooks the series technique is employed. A record is made for the main title as a guide record, and other records are made for each volume, the name of the main treatise being repeated in fields 400-445. This case could be treated by the subrecord technique, but this would give very long and complicated records, too long to be processed by computer and difficult to correct each time a new volume comes in. Although the technique used is not very logical, the guide record is made only once, and a record is made for a volume only when it comes in, without any modification to the records already in the computer. When the records are sorted in alphabetical order, one entry will be made for the individual volume, which by means of the "series note" will find its place under the guide record (3). There is of course no logical link internal to the file between records of different books of the same series, nor between them and their guide record. If there is a multivolume work as part of a series, in which each volume bears a different number in the series, there are two possibilities: either to use fields 505 and 445 for each volume, linking them by the repeat indicator, or to use the subrecord technique. MONOCLE makes a choice according to the complexity of the records.
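The modified 505 subfield codes above lend themselves to a small formatting sketch. The helper function is my own illustration, not part of MONOCLE; the expected string matches the example in the text.

```python
def field_505(volumes):
    """Format a 505 contents note from (volume number, title, remainder)
    triples, using the modified codes described in the text:
    $y volume number, $a title, $e remainder (date, pagination)."""
    return " ".join(f"$y {n} $a {title} $e {rest}" for n, title, rest in volumes)

note = field_505([
    (1, "The Practice of Kinetics", "1969, 450 p."),
    (2, "The Theory of Kinetics", "1969, 436 p."),
])
assert note == ("$y 1 $a The Practice of Kinetics $e 1969, 450 p. "
                "$y 2 $a The Theory of Kinetics $e 1969, 436 p.")
```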
At the request of the Bibliotheque Nationale and of some documentalists wishing to use the format for bibliographies of articles, some fields were added. Field 270 contains the name of the printer, the place and the date of printing.

Indicators: 00
Subfield codes:
$a Place
$b Printer's name
$c Date

Field 545 is the title of the periodical from which the article in the main entry is extracted. This tag was chosen because 500 is the note number (the title of the periodical is not an entry) and 45 is the title number, so the field can be constructed as a title field.

Indicators: 00
Subfield codes:
$a Title
$b Subtitle
$c Year
$d Month
$e Day
$y Volume
$f Issue
$g Pagination
$h Bibliographical references

"$y" was kept for volume for the sake of consistency throughout the format. Since it was undesirable to alter MARC fields 660 and 670, MONOCLE employs 680-682 for French subject headings. However, name subject heading tags were retained as 600, 610 and 611, but with modified subfield coding. As in French filing geographical names are filed before topical names, the following tags were assigned:

680 Geographical names
681 Topical names
682 Topical names for indexes only

The last tag was created in order to differentiate between subject headings for information retrieval and headings for printed indexes only. If there is a relation between two headings, the slash is used between them to tell the computer to make an inverted entry. For example,

680 04 $a Chemistry / Physics

gives two entries, one under chemistry and the other under physics. To allow each library to have its own subject heading system, the second indicator is used to indicate this system: for example, 04 is for the Bibliotheque de Grenoble. Codes for MONOCLE are partially taken from the British codes instead of the American ones because they are given a filing value. They are, however, slightly different, in that there is no form subdivision.
Subfield codes are as follows:

$a Heading
$t Chronological subdivision
$u Geographic subdivision
$w General subdivision, 1st level
$x General subdivision, 2nd level
$y General subdivision, 3rd level
$z General subdivision, 4th level

The levels have been requested for some information retrieval systems that have multilevel thesauri. As a general rule, the attempt was to give a filing value to most of the subfield codes in order to simplify and hasten processing without any table of translation. The latter is always possible, but burdens the program. The Library of Congress has published a special format for serials. Thinking it not very useful, and feeling that serials could be processed by the MARC format for books, the librarians at Grenoble simply added to the MONOCLE format some fields specifically for serials, as follows:

030 Coden
210 Abbreviated title
515, 525, 555 Not used

In MONOCLE, 503, bibliographic history, is used for the "followed by" and "following" notes of a periodical, because they are simply notes and not added entries. Fields 780 and 785 are not necessary, since in a catalog an entry is usually not made for these titles. Most periodicals are processed by the format without any trouble. The holdings of the Library are put under 090 $b, as shown in the following example:

090 00 $a CbP. 185 $b 1, 1967- $c 5732s.

$a Call number
$b Holdings
$c Location

SUMMARY

As stated at the beginning, the Library of Congress in its MARC II communications format has published the most comprehensive and the most detailed analysis of a bibliographic record. Some, mostly documentalists, do not agree with the complexity of MARC II coding, but their aims are not the same as those of librarians, who want first of all to catalog books according to the rules required for a catalog of a large stock of books. A simple alphabetical sort on author names is not adequate and is quite unusable by a reader.
However, an arrangement that is good for a weekly bibliography may not be sufficient for a complete catalog. The British National Bibliography made a thorough study of catalog entries and produced a better filing structure in accordance with the Anglo-American rules. MONOCLE translated the MARC format with slight modifications, but subsequent trials led to more modifications. The MONOCLE format has been made from a librarian's point of view, but sometimes a programmer's view of the system has brought about an improvement in it. MONOCLE is working, but not without difficulties. These difficulties come not from the format itself but from the on-line system, which is not working as well as expected. The system organization may not be the best and perhaps needs a thorough study before being put into operation. The format is not completely satisfactory and needs improvement. Documentalists are right when they say it is too complex and expensive. A synthesis between the documentalist format, which is too simple, and the MONOCLE format will be undertaken to simplify the worksheet and speed up input time. From the librarian's point of view there are still problems to be solved. Processing of complex titles is not easy, elegant or clear. The analysis should go deeper to determine more logical relations between data, to avoid duplication of information in the record, and to speed up processing at every stage. The technique of links between fields and records is not developed in MONOCLE as it is in other systems. It may be helpful to connect data by use of pointers and to do away with repetition of series notes that are already input elsewhere. Hierarchical links between records should be useful.
Hence, there is much work still to be done, but the most immediate goal is to make the MONOCLE format operational not only for the Library of Grenoble University but also for the Bibliotheque Nationale, which has adopted it for the automation of the Bibliographie de la France. The philosophy behind the modifications introduced in converting the MARC communications format to the MONOCLE processing format can and should be discussed, but they have all been made in order to improve the structure of the record not only for internal processing but also for the interfiling of records, which is much more complicated. Until now work has been done only on descriptive cataloging and on author-title filing. Subject indexing and information retrieval are quite another job.

REFERENCES

1. Avram, Henriette D.; Knapp, John F.; Rather, Lucia J.: The MARC II Format: A Communications Format for Bibliographic Data (Washington, D.C.: Library of Congress, 1968).
2. BNB MARC Documentation Service Publication No. 1 (London: Council of the British National Bibliography, Ltd., 1968).
3. Chauveinc, Marc: MONOCLE; Projet de Mise en Ordinateur d'une Notice Catalographique de Livre (Grenoble: Universitaire de Grenoble, 1970).

SCOPE: A COST ANALYSIS OF AN AUTOMATED SERIALS RECORD SYSTEM

Michael E. D. KOENIG, Alexander C. FINLAY, Joann G. CUSHMAN: Technical Information Department, Pfizer Inc., Groton, Conn., and James M. DETMER: Detmer Systems Co., New Canaan, Conn.

A computerized serials record and control system developed in 1968/69 for the Technical Information Department of Pfizer Inc. is described and subjected to a cost analysis. This cost analysis is conducted in the context of an investment decision, using the concept of net present value, a method not previously used in library literature. The cost analysis reveals a positive net present value and a system life break-even requirement of seven years at a 10% cost of capital.
This demonstrates that such an automated system can be economically justifiable in a library of relatively modest size (approximately 1,100 serial and periodical titles). It may be that the break-even point, in terms of the collection size required for successful automation of serial records, is smaller than has been assumed to date.

INTRODUCTION

The field of librarianship has in general not been characterized by an abundance of cost analysis articles. This is by no means a novel observation (1,2,3). Library automation has been no exception, despite its more quantitative aura. In particular there has been an almost complete lack of any analysis of the cost of an automated system as an investment decision. The bulk of material that has been written regarding costs and cost analysis has concentrated upon costs per unit of productivity of a functioning system, or upon comparison of such costs among various systems (4,5,6). Though still perhaps underrepresented, there is a growing core of such articles. Indeed, Jacob's article on standardized costs (7) indicates that a certain level of maturity has been reached. The analysis of library automation in terms of its justifiability as an investment decision is not an appropriate area for benign neglect. Librarians, whether they be special, academic, or public, typically must justify their budgets to some higher authority, and the decision to automate must almost invariably be an investment decision, requiring an expenditure of funds above the normal operating budget. If librarians hope to be successful in justifying their pleas for an investment in automation, an "investment in the library's future," they should be prepared to justify their requests in terms of what they represent: investment decisions. The cost analysis described below is an example of such an analysis. It is an after-the-fact analysis, but the principle remains the same.
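The net-present-value criterion used in this analysis discounts each year's cash flow back to the present at the cost of capital. A sketch, with purely hypothetical numbers (the article's actual cost figures are not reproduced here):

```python
def net_present_value(rate, cashflows):
    """NPV of yearly cash flows; cashflows[0] is the year-0 amount
    (an initial outlay is negative)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# Hypothetical system: an up-front development cost followed by equal
# yearly net savings, evaluated at a 10% cost of capital. The system
# "breaks even" in the year its cumulative NPV first turns positive.
flows = [-20000.0] + [3500.0] * 10
npv = net_present_value(0.10, flows)

# Sanity checks on the formula itself.
assert net_present_value(0.0, [-5.0, 2.0, 3.0]) == 0.0
assert abs(net_present_value(0.10, [-100.0, 110.0])) < 1e-9
```

Framing the automation decision this way, rather than as a per-unit cost comparison, is exactly the article's methodological point.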
METHODS AND MATERIALS

The SCOPE (Systematic Control Of PEriodicals) system was implemented in 1968 by the Technical Information Department of Pfizer, Inc., at the Medical Research Laboratories in Groton, Connecticut. The system is not radically different from others described in the literature (8,9,10). It is reasonably sophisticated in its handling of such features as claiming, binding, and budgeting. The basic design element of the system is the computer generation each month of a deck of IBM cards corresponding to anticipated receipts for that month. As an item is received, the corresponding card is pulled from the anticipated deck and is used to inform the system of the receipt of the item. This "tub file" feature, first used by the University of California at San Diego (11), is the major design difference between SCOPE and the University of Minnesota Bio-Medical Library system described by Grosch (12) and Strom (13), with which SCOPE seems most comparable in terms of system sophistication and capability.

SYSTEM DESCRIPTION

The system was originally written in Fortran IV for an IBM 1800 computer with two tape drives. A total of twelve programs were written. Two of these programs are quite large (the weekly update and the monthly generation program), comprising about 600 statements each; the remainder average 200 statements. Since that time the programs have been revised to operate on an IBM 360/30 computer using two 2400 tape drives and two 2311 disk drives. Several more programs have also been written. Fortran IV was chosen as the programming language to render the system relatively immune to hardware changes and has fully justified itself. A listing of programs follows.
Program Number / Function / Core Requirements (Bytes):

EPC01  Weekly Update                                  17060
EPC02  Monthly Card Deck Generation                   15992
EPC03  Vendor Listing                                  5648
EPC04  Periodical Title Evaluation & Budget Listing    6916
EPC05  Holdings Listing                                6992
EPC06  SCOPE File Print                                6852
EPC07  PSN File Swap (to reassign PSN & realphabetize) 7638
EPC08  Daily Receipt Listing                           1768
EPC09  Binding Listing                                 2480
EPC10  Short Title vs. Full Title Thesaurus            3876
EPC11  Skeleton Binding Punch                          3024
EPC12  Copy Tape File                                  2008
EPC13  General Skeleton Punch                          2920
EPC14  Cross Index Punch                               3300
EPC15  Receipt Edit                                    1444
EPC16  Purchase Order Analysis                         2796
EPC17  Discipline Analysis                             3024

File Design

SCOPE maintains a magnetic tape file in which each periodical is recorded in sequence by its Periodical Sequence Number (PSN). Appearing once in the file for every PSN are records giving title, cross-reference, holdings, and journal control information, including, for instance, "separate index." Records for one or more copies then follow this basic information. Each copy within a PSN consists of records for all current expected receipts (XRs) and binding units (BUs) not yet complete, as well as a trailer (TL) summary. A File Print program is provided which enables the library staff to inspect every item of data in the file.

"Anticipated" Deck

SCOPE generates monthly a deck of approximately 2,500 80-column Hollerith cards to be used for posting periodicals as received. A card is made for each receipt expected within the succeeding five weeks. For all regular known publication schedules, these cards are complete as to volume, issue (including separate index) and publication date. For irregular or unknown publication schedules, one or more incomplete cards are provided in the deck. Upon receipt of an issue, the proper card is pulled from the "anticipated" deck, the actual date of receipt is punched, and the card is used to prepare the Daily Receipts listing. The card is also used to update the tape file on a weekly cycle.
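The monthly deck-generation rule (one card per receipt expected in the next five weeks) can be sketched as follows; the record layout is a hypothetical simplification, not the actual SCOPE card format.

```python
from datetime import date, timedelta

def anticipated_deck(expected_receipts, run_date):
    """Select the expected receipts falling within five weeks of the
    run date; each would become one punched card in the real system.
    A receipt here is a (title, volume, issue, expected_date) tuple."""
    horizon = run_date + timedelta(weeks=5)
    return [r for r in expected_receipts if run_date <= r[3] <= horizon]

receipts = [
    ("Journal A", 12, 3, date(1971, 9, 10)),
    ("Journal B", 8, 1, date(1971, 12, 1)),   # beyond the five-week horizon
]
deck = anticipated_deck(receipts, date(1971, 9, 1))
assert deck == [("Journal A", 12, 3, date(1971, 9, 10))]
```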
Unexpected issues require that a card be prepared manually by the library staff. Issues which are omitted by the publisher require that the card be returned to the system as a "throwback." If an issue is unexpectedly divided into two or more parts, separate cards are manually prepared and the original card deleted.

Claims

In order to issue claims on a current basis, the tape file is updated weekly with receipts. Every receipt will find a copy of itself on the SCOPE tape (generated when the "anticipated" deck was produced), and a received code (R) and the current date will be posted to the record. Consequently, any item not marked received becomes a claim as soon as the "claim delay" period is exceeded. A card to be used for claiming will be punched on the weekly cycle first exceeding "lag" and "claim delay," and once again every four weeks thereafter until resolved either by receipt or transfer to the Missing Issue File. The "lag" is the period in weeks elapsing between the formal date of publication and the earliest anticipated date of receipt. The "claim delay" period is calculated as the weeks elapsing between the earliest anticipated date of receipt and the latest normal date of receipt. "Lag" and "claim delay" may be modified for each publication based on experience.

Binding

Binding units are created within the SCOPE file during the monthly generation run. A unit is punched when all the issues comprising it are received or claimed (that is, when none of them is yet to be anticipated). If the unit is complete (no claims) it will be dropped from the tape file at the time it is punched and will not be punched again. Binding units are formed whenever a volume changes or whenever the "issues per bind" factor is satisfied. Receipts, having been accumulated in the file from week to week, are dropped at the time of the monthly generation after being counted for binding.
From the Binding Unit cards a listing is prepared that is used by the library staff to make up bundles of periodicals for the binders. The binding unit card accompanies the shipment and is used by the binder. It includes information on issues included, indexes, color of binding, etc.

File Maintenance

In addition to receipts and "throwbacks," the weekly update procedure allows add, change, and delete transactions to affect the SCOPE file on a record-for-record basis. Such transactions are needed to handle new periodicals, additional copies, closed series, discontinued copies, name changes, publication schedule changes, revised costs, vendor changes, and the like. The update operation is ordered by PSN, copy number, record type, and (for XRs) volume and issue, in that order. An entire publication schedule may be added to the file in such cases as when the schedule is known but highly irregular (Frequency Code 99).

SCOPE: A Cost Analysis/KOENIG, et al.

After the receipt cards are processed by the update each week, they are filed in the "manual receipt file" together with copies of claims sent to vendors. As binding units are created, copies of binding cards are filed in the same file, and receipt cards representing binding cards are discarded, as are earlier binding cards. This manual file corresponding to 1,000 journals requires about 5,000 cards and occupies three card file drawers. It is filed by PSN and is therefore in order alphabetically by journal title. Discards and additions to the manual file are about equal, and hence it does not increase substantially in size. It permits rapid manual examination of the current status of each periodical.

Holdings List

A program is provided that lists the complete SCOPE file showing full title and abbreviated holdings statement for each PSN. In addition, any cross-reference/history data and any desired holdings detail will be printed.
Since the file maintenance process insures an accurately updated file, this listing may be run at any time to provide an accurate reflection of library holdings.

Periodical Title Evaluation (Scrutiny)

A program is provided that lists all copies in the SCOPE file requiring annual review prior to renewal. This procedure is controlled by the "value code" assigned individually to each copy within a PSN. In addition to full title and abbreviated holdings statement, the listing shows by whom abstracted, the discipline codes associated with the periodical, and the annual cost. Given this information, library users are requested to vote for retention of items for the next year. Those not receiving sufficient votes are not renewed. Separate programs not part of the SCOPE system are used to prepare vote cards and tabulate results.

Budget List

The program that prepares the Periodical Title Evaluation List can be used to prepare lists by "Department Charged," a convenient budgetary tool used each Fall to plan purchases for the following year. The lists may, of course, be run at any time.

Vendor Order List

A program is provided to prepare from the SCOPE file a listing of all non-terminated copies associated with each requested vendor. A three-character vendor abbreviation is used to control this process and is coded into each copy control record. In addition to the short title, the list gives vendor reference (his identifier for the periodical), Pfizer purchase order number and date, and the estimated annual cost. Each different condition (form of publication, such as periodical, microfilm) is listed with the number of copies ordered. Although prices are not firm at the time of ordering, this listing nevertheless provides the detail needed for purchasing documents. As price change information is made available and updated into the file, the listing may be rerun for checking out final billings from the vendor.
Similar lists can be produced by purchase order number, a convenient tool for resolving those financial complexities which inevitably occur.

Discipline List

This program is used to prepare lists by discipline/subject, such as microbiology, immunology, etc., a useful tool for maintaining collection balance, and for assuaging patrons' fears that their disciplines may not be adequately represented.

System Capacity

Present counts indicate approximately 9,000 tape records in the system, representing approximately 1,100 journals. About 200 issues are posted weekly. There are no restrictions on future expansion of the system as presently implemented.

METHOD

The method of cost analysis used was the "net present value method." Perhaps the clearest, most readily available description of this concept is to be found in chapters 19 and 20 of Shillinglaw's Cost Accounting: Analysis and Control (14). Briefly, the idea is that of comparing a given investment decision with what might reasonably be expected from an alternative use of that same money for another investment. An investment is typically defined as "an expenditure of cash or its equivalent in one time period or periods in order to obtain a net inflow of cash or its equivalent in some other time period or periods" (14, p. 564). The librarian typically thinks of investing in automation now in order to make possible a lessened expenditure in the future, at least a lessened expenditure in comparison to what would be necessary to accomplish the same level of operations in a non-automated fashion. Conceptually these are the same: investment now in order to reap some future benefit. Future savings can be treated as a future cash inflow. The concept of net present value is rather simple; it consists of converting all present and expected future cash flows (or their equivalents) to a present value and examining that value in comparison to alternative uses for the resources invested.
The process of conversion is that of relating time and money. Time does of course influence the worth of money. A dollar a year from now is worth less than a dollar today, for the dollar today can be invested, and a year from now it will be worth more than a dollar, or at least the mathematical expectation of its worth is more than a dollar. The question is at what rate future cash flows should be discounted. Business firms typically use their "cost of capital" (the cost which the business must pay to obtain capital) as the discount rate. A business decision should yield a positive net present value when the appropriate future cash flows are discounted at the cost of capital. If not, the investment is a losing proposition, and the business would have been better off by not obtaining the capital, or by investing it elsewhere. The calculation of an appropriate cost of capital is a complicated exercise involving such things as debt capital, equity capital, etc. The figure of 10% is often cited as a good rule of thumb; happily it is appropriate in the case at hand and is the one used here.

To the obvious question "is there any relevance in this net-present-value/cost-of-capital idea to an academic or a public library which does not obtain its funds in the same way, or have any explicit cost of capital?" the response is "yes." If a decision to automate, when analyzed in this fashion in comparison with alternative methods, should result in a negative net present value, then that decision is demonstrably poor. For if the money invested in automation were instead invested in the market, it could supply the alternative system's future greater operating costs with money left over to utilize elsewhere. This latter course might not be an option in fact, but the mere presence of its theoretical preferability would cast doubt on the desirability of any decision to automate.
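The discounting arithmetic above can be made concrete. The sketch below uses continuous discounting, which reproduces the article's published factors (1.1052 for one year of compounding, 7.7688 for a 15-year annuity at 10%); the function names are ours, and the realized yearly saving of $5,550 is the figure implied by the published totals ($11,300 - $5,750).

```python
import math

def pv_annuity_factor(rate, years):
    """Present value of a continuous annuity of one unit per year:
    the integral of e^(-rate*t) from 0 to `years`."""
    return (1.0 - math.exp(-rate * years)) / rate

def net_present_value(setup_cost, yearly_saving, rate, years):
    """Set-up costs are assumed paid in a lump one year before the valuation
    date (hence compounded forward by e^rate, about 1.1052 at 10%); savings
    flow for `years` years."""
    return (-setup_cost * math.exp(rate)
            + yearly_saving * pv_annuity_factor(rate, years))

# SCOPE's figures: $26,950 set-up, $5,550 realized yearly saving,
# 10% cost of capital, 15-year life expectancy.
npv = net_present_value(26950, 5550, 0.10, 15)
payback_years = 26950 / 5550   # crude payback, about 4.9 years
```

At 10% this gives an NPV of about $13,332, matching the article's result; at 10 and 20 years the annuity factor is about 6.32 and 8.65 respectively, illustrating the insensitivity to life span noted in the Discussion.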
Conversely, a positive net present value would argue for the desirability of automation, regardless of the source of the funds.

The cost analysis that follows is expressed in terms of set-up cost outlays (investment) and projected savings (cash inflow). The investment expenses are of course reasonably well documented. The operational savings are based on 18 months' successful experience with the system.

Set-up Costs (including 1968 and 1969 parallel running costs)
  Systems analysis and programming:                        $10,450
    (fees paid to consultant)
  Keypunching:                                               2,000
  Conversion reprogramming (IBM 1800 to IBM 360/30):           500
  Computer time:                                             4,000
  Personnel, opportunity costs:                             10,000
    (Asst. Librarian $4,000; Tech. Info. Mgr. $6,000)
  Total Set-up Costs:                                      $26,950

Yearly Running Costs
  System maintenance (retainer to Detmer Systems):         $   500
  Computer time (full costing):                              5,000
  Allowance for machine conversion:                            250
    (based on an expectation of conversion at 3-yr.
    intervals at a cost of $750 each time)
  Total:                                                   $ 5,750

Operational Savings, 1970, per year (in comparison with continued running of the previous manual system)
  Posting:                                                 $ 1,400
    (based on a saving of 8 hours per week of clerical work)
  Claiming:                                                  1,050
    (based on a saving of 10% of an assistant librarian's time)
  Binding:                                                   2,700
    (based on elimination of approximately 450 hours of overtime,
    clerk and assistant librarian, and 150 hours regular time per year)
  Replacement costs:                                           400
    (represents decreased replacement costs due to rapid binding
    and consequent lower loss rate)
  Production of holdings list:                                 250
    (based on a saving of 50 hours per year of assistant librarian's time)
  Ordering/Bookkeeping:                                      1,250
    (based on a saving of 250 hours per year of assistant librarian's time)
  Total:                                                   $ 7,050

Savings Resulting from Control of the Collection not Previously Practicable (see discussion below)
  Space saving per year:                                   $   750
  Subscription saving per year:                              2,000
  Incremental overhead saving per year:                      1,500
  Total:                                                   $ 4,250

  Total Yearly Savings:                                    $11,300
  Yearly Running Costs:                                    $ 5,750
  Difference (Realized Savings):                           $ 5,550

RESULTS

The net present value at the end of 1970, based on 10% cost of capital and 15-year life expectancy, follows. The present value of one unit one year ago is 1.1052 at 10% cost of capital (assuming for simplicity that the 1968-70 set-up prices were paid in a lump one year prior to the end of 1970); 7.7688 is the present value of an annuity of one unit per year for 15 years at 10% cost of capital.

                                    Net Present Value Factor    Net Present Value
  1968-1970 set-up costs:           ($26,950) × (1.1052)        (-$29,785)
  Yearly savings, commencing 1970:  ($ 5,550) × (7.7688)        (+$43,117)
                                            Net Present Value =  $13,332

These findings indicate that the crude payback period ≈ 4.9 years (commencing January 1971). The system life required to break even at 10% cost of capital = 7 years. Another way of looking at the matter is to calculate the discounted rate of return.
That is, at what rate of discount is the sum of the positive present values equal to the sum of the negative present values? In this case, the discounted rate of return = 17%. In other words, since the discounted rate of return (17%) is significantly above that available for alternative uses of the resources (say 10%), this is a reasonable candidate for investment.

DISCUSSION

The net present value method has two inputs in addition to the raw data. The first one, already discussed, is the cost of capital. Most large businesses can supply such a figure, or at least inform the librarian or information manager what approximation is used by that company (though surprisingly many otherwise sophisticated businesses do not use this method). In an academic environment, advice can usually be obtained from someone in the economics department or in the business school. In any case, 10% is a good rule of thumb. The second input is the expected life span. This is not as crucial as one might suppose, for the farther distant the cash flow, the less its net present value. The net present value factor in this case for 15 years' life expectancy was 7.7688; for ten years it would have been 6.3213, for 20 years 8.6466, not a great difference.

As is invariably the case, many of the effects of SCOPE were difficult to quantify. The most difficult were those in the section "Savings Resulting from Control of the Collection not Previously Practicable." Since the collection can now be easily analyzed and scrutinized with only a minimum expenditure of research staff time, the rate of growth of the collection has been considerably tamed, while maintaining customer satisfaction. Prior to SCOPE, new subscriptions had been added at the rate of about 90 a year. When SCOPE was implemented, this fell to 10, and has now risen to approximately 30. During its first year of operation, SCOPE apparently resulted in 80 fewer periodical subscriptions; the second year, 60 fewer.
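The discounted rate of return can be reproduced numerically as the rate at which net present value crosses zero. This is a sketch under the same continuous-discounting convention as the published factors, using the realized yearly saving implied by the totals ($11,300 - $5,750 = $5,550); the bisection routine is ours, and it yields a rate in the mid-teens, of the same order as the article's rounded 17% (the exact figure depends on the compounding convention assumed).

```python
import math

def npv(rate, setup_cost=26950, yearly_saving=5550, years=15):
    """NPV with set-up costs compounded one year forward and savings
    treated as a continuous 15-year annuity."""
    annuity = (1.0 - math.exp(-rate * years)) / rate
    return -setup_cost * math.exp(rate) + yearly_saving * annuity

def discounted_rate_of_return(lo=0.01, hi=0.50, tol=1e-6):
    """Bisection for the discount rate at which NPV = 0. Assumes NPV is
    positive at `lo` and negative at `hi`, which holds for these figures."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if npv(mid) > 0:
            lo = mid   # still profitable at this rate; break-even is higher
        else:
            hi = mid
    return (lo + hi) / 2.0

rate = discounted_rate_of_return()   # roughly 16%
```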
Continuing this progression (80, 60, 40, 20, 0), one would arrive at the conclusion that a long-range reduction in collection size of 200 subscriptions was achievable. To be conservative, the calculation has been based on an estimate of a reduction of 100 subscriptions/year. Even this estimate represents a saving of over $4,000 per year. The resulting space savings were based on a cost of $10 per square foot per year (standard occupancy charges adjusted for stack use) and a ten-year cycle in stack space enlargement. This scrutiny might have been done manually at a justifiable cost, but it had not been done, and more importantly probably would not have been done.

The operational savings may be open to some criticism because, as is probably obvious to an experienced serials record librarian, the previous manual system was not strikingly efficient. It can well be argued that the most efficient possible manual system, rather than the previous system, should have been the alternative against which SCOPE was evaluated. From the point of view of the organization, however, the relevant comparison is to actuality, not to what is theoretically possible, but in generalizing the results this specificity must be borne in mind. Somewhat mitigating this circumstance, however, is the fact that the running costs of SCOPE are probably overestimated. The computer cost is based on full costing, inappropriately high for the following reasons: 1) it includes programming overhead, but since SCOPE was programmed externally, the SCOPE project is being doubly charged for its programming; 2) the same double charging applies to program maintenance; 3) the costing makes no distinction between high-priority jobs and relatively low-priority jobs such as SCOPE, and presumably low priority is less expensive.
Since the distortions in the two paragraphs above are difficult to estimate, and since they are to a degree counterbalancing, they are simply noted rather than quantified.

The yearly operational savings ($7,050) still intuitively appear surprisingly high. One's initial reaction is that even with overhead included, this is not a great deal less than the yearly cost of one library assistant. In point of fact, one library assistant has been transferred from the Library to the rapidly expanding Computer Based Information Section (computer-based SDI and retrospective searching), with no apparent deterioration of library services. The Library is in fact handling a greater work load than previously, with one less person. This cannot be entirely attributed to SCOPE, as some other rationalization of library operations has been introduced, but it does indicate that the calculated savings are not a grossly distorted reflection of reality.

CONCLUSION

As pointed out in the introduction, almost any significant attempt at library automation will require an investment decision. Librarians should be prepared to make analyses of their proposals in terms of their justifiability as investment decisions, both for reasons of politics and for their own satisfaction and confidence. The net present value method is a powerful, convenient, and useful tool for such analyses. It is hoped that this article will serve as a reasonable case study for the application of this technique to the problems of library automation.

An automated serial records system for a relatively modest (1,100 serial and periodical titles) special library has run successfully and achieved its objectives for more than a year and a half. One of the major objectives was to produce a system that allowed clerical help to be substituted for a librarian's scarce and costly time, thus allowing more effective utilization of the professional librarian's skills.
This objective has been met. Furthermore, a complete turnover of the personnel interfacing with the system has been accomplished easily and painlessly. No small part of the credit goes to the originators who designed and documented the system for such turnover. It is an old chestnut, but well worth repeating: "design the systems not for yourself, but for the person who will be chosen to replace you."

The cost analysis of the operations of the system indicates that its design, implementation, and operation are economically justified, and that the capital investment will be paid off in approximately seven years. (The crude payback period was less than five years.) The major implication of this economic justification lies in the relatively modest size of the Library's operation. It may well be that the break-even point in terms of collection size required for successful and cost-effective automation of serial records is smaller than has heretofore been assumed.

REFERENCES

1. Dougherty, Richard M.: "Cost Analysis Studies in Libraries: Is There a Basis for Comparison," Library Resources & Technical Services, 13 (Winter 1969), 136-141.
2. Fasana, Paul J.: "Determining the Cost of Library Automation," A.L.A. Bulletin, 61 (June 1967), 656-661.
3. Griffin, Hillis L.: "Estimating Data Processing Costs in Libraries," College and Research Libraries, 25 (Sept. 1964), 400-403, 431.
4. Kilgour, Frederick G.: "Costs of Library Catalog Cards Produced by Computer," Journal of Library Automation, 1 (June 1968), 121-127.
5. Chapin, Richard E.; Pretzer, Dale H.: "Comparative Costs of Converting Shelf List Records to Machine Readable Form," Journal of Library Automation, 1 (March 1968), 66-74.
6. Black, Donald V.: "Creation of Computer Input in an Expanded Character Set," Journal of Library Automation, 1 (June 1968), 110-120.
7. Jacob, M. E. L.: "Standardized Costs for Automated Library Systems," Journal of Library Automation, 3 (September 1970), 207-217.
8.
Lebowitz, Abraham I.: "The AEC Library Serial Record: A Study in Library Mechanization," Special Libraries, 53 (March 1967), 149-153.
9. Scoones, M.: "The Mechanization of Serial Records with Particular Reference to Subscription Control," Aslib Proceedings, 19 (February 1967), 45-62.
10. Pizer, Irwin H.; Franz, Donald R.; Brodman, Estelle: "Mechanization of Library Procedures in the Medium-Sized Medical Library: The Serial Record," Medical Library Association Bulletin, 51 (July 1963), 313-338.
11. University of California, San Diego, University Library: Report on Serials Computer Project (La Jolla, Cal.: University Library, 1962).
12. Grosch, Audrey N.: University of Minnesota Bio-Medical Library Serials Control System. Comprehensive Report (Minneapolis: University of Minnesota Libraries, 1968), 91 p.
13. Strom, Karen D.: "Software Design for Bio-Medical Library Serials Control System." In American Society for Information Science, Annual Meeting, 20-24 Oct. 1968, Proceedings, Vol. 5 (New York: Greenwood Publishing Corp., 1968), 267-275.
14. Shillinglaw, Gordon: Cost Accounting: Analysis and Control (Homewood, Illinois: Richard D. Irwin Inc., 1967), 913 p.

A MARC II-BASED PROGRAM FOR RETRIEVAL AND DISSEMINATION

Georg R. MAUERHOFF: Head, Tape Services, National Science Library, and Richard G. SMITH: Analyst/Programmer, Research and Planning Branch, National Library, Ottawa, Canada (formerly with Library and Computation Center, University of Saskatchewan)

Subscriptions to the Library of Congress' MARC tapes number approximately sixty. The uses to which the weekly tapes have been put have been minimal in the area of Selective Dissemination of Information (SDI) and current awareness.
This paper reviews work that has been performed on batched retrieval/dissemination and provides a description of a highly flexible cooperative SDI system developed by the Library, University of Saskatchewan, and the National Science Library. The system will permit searching over all subject areas represented by the English-language monographic literature on MARC.

INTRODUCTION

With subscriptions to the Library of Congress' MARC II tapes numbering approximately sixty (1), the utilization of standardized bibliographic information in machine readable form has reached an all-time high. Numerous subscribers have written programs to access the tapes in order to produce acquisitions and cataloging products, but, unfortunately, the search techniques in these programs have been limited to searching of fixed-length information, such as LC card numbers, Standard Book Numbers (SBNs) and compression codes. Accelerated developments of searching mechanisms have been made by those involved with on-line bibliographic systems, but work on MARC information retrieval in the batch mode has been evolving very slowly. That is, the proffering of an assortment of remedies for one of the oldest library problems, that of current awareness and Selective Dissemination of Information (SDI) using MARC, has not received the emphasis it should. The Library of the University of Saskatchewan has been utilizing the MARC tapes since their weekly distribution on 1 April 1969, with areas of usefulness so far having been restricted by the kinds of searching methods available. Concern has therefore been shown for a far greater exploitation of the MARC records.
Since no algorithms other than time decay have been established locally for limiting the size of the file to items which have a high degree of usefulness, and since the cost of updating and storing the weekly files has to be incurred, it is only fitting that as many bibliographic records as possible be monitored and disseminated to those sections of the University where they can be most effectively used. A program package for current awareness/SDI is the most likely method for achieving this.

Collaborative efforts are now the only realistic means of exploiting MARC. Costs can be spread over a large user group, and at the same time personalized services are assured to those taking part. It is for this reason that the Office of Technical Services (OTS), Library, University of Saskatchewan, has been cooperating with the National Science Library (NSL), National Research Council of Canada, on the development of such a current awareness/dissemination system. Known by the acronym SELDOM (Selective Dissemination of MARC), the program represents cooperation in the true sense of the word, in that the OTS's experiences with MARC are being coupled with NSL's expertise in nation-wide SDI. This paper will describe in detail the evolution of SELDOM, with a future paper to document user reaction to the SELDOM program.

HISTORY

The University of Saskatchewan is not alone in the investigation of MARC-based retrieval/dissemination programs. The Oklahoma Department of Libraries, under the coordination of K. J. Bierman (2, 3, 4, 5), has been operating a weekly MARC SDI service since February of 1970 and found its reception overwhelming. Over twenty user groups in the United States and Canada are presently experimenting with this current awareness service in various subject fields, using the Dewey Decimal and the Library of Congress classification numbers as search keys. Oklahoma's efforts followed the study by William J.
Studer (6) and the Aerospace Research Applications Center (ARAC) at Indiana University. Studer's hypothesis was "that an SDI system concerned with book-type material would be of significant benefit to faculty in keeping them alerted to what is being published in their fields of interest - especially faculty in the non-technical areas where books are probably still as vital, if not more important, a medium of information and ideas as periodical and report literature (7)".

In his experiment, Studer translated participants' interests into profiles consisting of weighted Library of Congress subject headings and classification numbers. Henriette Avram (8) of the Library of Congress' MARC Development Office reported on information retrieval using the MARC Retriever, a modification of Programmatics Inc.'s system known as AEGIS. Regarded as "essentially a research tool that should be implemented as inexpensively as possible," the MARC Retriever is tape based and able to accept almost any kind of bibliographic query. Unfortunately, it is only operational at the Library of Congress. Along similar lines are Syracuse University's L.C. MARC on MOLDS and LEEP projects (9, 10, 11, 12). The interactive retrieval capabilities, which are used in both batch and on-line modes, permit a variety of queries over their MARC data bases.

Additional projects reporting on the subject approach to MARC tapes in a batch environment are not numerous. Dohn Martin (13) at the Washington University School of Medicine describes a searching method by L.C. classification numbers, in which a PL/1 program is used to produce selection lists for the medical library. This is along the same lines as the work reported by J. G. Veenstra (14) of the University of Florida, D. L. Weisbrod (15) of the Yale University Library, and F. M. Palmer (16) of Harvard University Library.
In Sweden, Bjorn Tell (17) has run a MARC II test tape in his integrated information retrieval system called ABACUS, while in Edmonton, Canada, Doreen Heaps (18) reports on author and title searches of MARC tapes in a Chemical Titles format. In England, related research is being contemplated by F. H. Ayres (19) for BNB MARC tapes. In Ireland (20), also, plans are in the offing for SDI services based on BNB MARC tapes, while in the United States, the first commercial venture is underway by Richard Abel and Company (21), which is contemplating selective dissemination of announcements.

BACKGROUND

The National Science Library has been providing an SDI service for Canada's Scientific and Technical Information (STI) community since April 1969, spinning a variety of machine readable indexing and abstracting services on a regular basis. A questionnaire (22) was sent out by CAN/SDI Project officials in May 1970, asking its subscribership to suggest where subject expansion should take place in the future. Although the responses emphasized the life sciences, e.g. Biological Abstracts' BA Previews and Medlars, the NSL was nevertheless quite enthusiastic about adding the Library of Congress' MARC II tapes to their present SDI service, especially if the project programming could be accomplished elsewhere. Twenty-one subscribers responded to the MARC II tapes, indicating the existence of a good user group, although not one of top priority. The University of Saskatchewan Library expressed a willingness to perform the systems work and project programming, which was estimated to require less than four man-months, making SELDOM operational by February 1971.

THE SELDOM PROGRAM

Facilities and Programming Languages

In order for the OTS to make use of the PL/1 and Assembler programs, an IBM S360 computer configuration consisting of at least 100K memory and a PL/1 compiler was deemed necessary.
This presented no problem because the Library had at its disposal an IBM S360/50 with 256K bytes of memory. Additional hardware specifications include four tape drives, a 2314 disk and two 1403 printers, one with a TN option. The latter is soon to be replaced with the ALA-approved library print train. Now, however, because of the addition of Large Core Storage (LCS), large bibliographic files such as MARC will be processed much more easily. Release 19 of OS MFT was also implemented in order to effectively utilize this additional million bytes of LCS memory. This more than modest memory has great utility, although serious investigations of automated library systems such as this one can take place even with small memories. As can be imagined, the switchover to Release 19 came at an inopportune time as far as the SELDOM programs were concerned. Implementation of the new release affected the scheduling and turn-around times.

The SELDOM Record Format

Several years ago, the National Science Library decided to adopt a standard MARC II-like format and design programs to convert suppliers' tapes to this standard format. When a decision is made to add a new tape service, such as Biological Abstracts' BA-Previews, to the present inventory of CAN/SDI tapes, the NSL personnel select those bibliographic items which will find use in an SDI environment. Selected items are then pulled from the input tape by the conversion program, and structured into an NSL format. This then was the first of many tasks facing the OTS: determining which fields should be utilized from the LC MARC tape for searching and printing. Of approximately fifty MARC tags, fixed and variable, only 32 contain information that might be of interest to users of the system for searching. These tags, however, can be grouped into analytical units, i.e. units of like information. Arranged in six term types, they are: personal name, corporate name, classification, title, geographic area code, and date.
The abbreviations for the term types are P, B, K, T, G, and D respectively. Users then will be able to request information from the system in many ways, whether it be for a title term, or a combination of categories such as classification number and geographic area code. The twenty-three fields and five subfields chosen, along with their respective analytics, are shown in Table 1, where [ ] marks fields that are not searched and ° marks OTS calculations. Percentages of occurrence, which was the criterion used for selection of the tags, are also indicated in the table. All the 500 tags were omitted because NSL and the OTS do not wish to search abstracts, annotations, or bibliographic notes at this time. Where frequencies were not available from the Library of Congress' publication entitled Format Recognition Process for MARC Records: a Logical Design, the OTS conducted its own counts over a tape selected at random. The tape chosen (Volume 2, Number 23) for the counts contained 881 records.

Table 1. Search Field Definitions

Search Key           Field/Subfield    % of Occurrence Per Record
Personal Name (P)    100               84.7
                     [400]             <0.1
                     600               12.1
                     700               22.4
                     [800]             0
Corporate Name (B)   110               11.7
                     260$B             97.9
                     410$A             2.4
                     610               4.8
                     710               11.1
                     810$A             4.6
Title (T)            111               1.5
                     130               0.2
                     240               4.3 °
                     [241]             0.1 °
                     245               100.0
                     410$T             2.4
                     [411]             <0.1
                     440               6.0
                     [611]             0.1
                     630               0.9
                     650               95.9
                     651               17.5
                     711               0.2
                     730               0.8
                     740               4.1 °
                     810$T             4.6
                     [811]             0.1
                     840               0.5
Classification (K)   050               105.1
                     051               0.9 °
                     082               95.8
Geographic Code (G)  043               34.0 °
Date (D)             009 (i.e. 008)    100.0

The fact that only 28 data elements were chosen for searching purposes proved highly useful, since the National Science Library's search module was designed, for the sake of efficiency, to accommodate a maximum of 32 search field definitions. The program can handle this many fields, but on the average it makes use of approximately twelve fields per record.
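The grouping of MARC tags into term types can be represented as a simple lookup table. The sketch below transcribes part of Table 1; the dictionary name and function are ours, subfielded entries are keyed as "tag$subfield," and only a subset of the title-group tags is shown.

```python
# Term-type grouping from Table 1: MARC tag (or tag$subfield) -> SELDOM term type.
TERM_TYPE = {
    "100": "P", "600": "P", "700": "P",                  # personal names
    "110": "B", "260$B": "B", "410$A": "B", "610": "B",
    "710": "B", "810$A": "B",                            # corporate names
    "245": "T", "440": "T", "650": "T", "651": "T",      # title group (partial)
    "050": "K", "051": "K", "082": "K",                  # classification
    "043": "G",                                          # geographic area code
    "009": "D",                                          # date (derived from 008)
}

def term_type(tag, subfield=None):
    """Return the SELDOM term type for a tag, or None if the tag is not searched
    (e.g. the 500-series notes fields)."""
    key = f"{tag}${subfield}" if subfield else tag
    return TERM_TYPE.get(key)
```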
There may be occasions, however, when as few as seven or as many as twenty-two directory entries will be handled, not counting subfields. Table 2 is a distribution of directory entries for the sample MARC II file tape. The mean of the distribution of entries is 13, and the median 12.

Table 2. Distribution of Directory Entries

# of dir. entries    # Records    %
<=6                  0            0
7                    2            .23
8                    0            0
9                    28           3.18
10                   74           8.40
11                   116          13.17
12                   160          18.16
13                   149          16.91
14                   123          13.96
15                   106          12.03
16                   69           7.83
17                   25           2.84
18                   19           2.16
19                   5            .57
20                   3            .34
21                   1            .11
22                   1            .11
>=23                 0            0
Total                881          100.00

At the same time that decisions were being made regarding the inclusion of certain search fields, print field definitions were structured. Although the programs can accommodate any number of directory items, only 31 are required for satisfactory and meaningful output. The analytics for these definitions make up Table 3, where ° are OTS calculations. Frequency statistics are again included.

Description of Programs

The SELDOM software is comprised of four modules. These modules (A, B, C, D) are easily identified in the system flowchart (Figure 1) and are: "A" the translation and conversion of MARC; "B" the searching of files; "C" the outputting of the search results; and "D" the compiling of profiles. Two IBM utility programs are also used.

Table 3. Print Field Definitions

Definition                   % of Occurrence Per Record
Term(s) Causing Retrieval
Main Entry                   98.6
Title Statement              100.0
Edition Statement            4.1
Imprint                      100.0
Collation Statement          100.0
Series Statement/Notes       13.6
Bibliographic Price          39.7 °
Subject Added Entries        131.6
LC Card Number               100.0
Profile Number
Expression Number
Threshold Weight
Weight
Source
Form of Content              53.2 °
Language                     100.0
LC Class Number              105.1 °
Dewey Decimal Number         95.8 °
ISBN                         39.7 °

Translation and Conversion Program (LCONV)

The conversion program, called LCONV, converts the weekly MARC tape into a SELDOM MARC II-like format tape.
The input records see the following changes: "%" is used as field terminator, "$" as subfield delimiter, and "@" as record terminator; upper- and lower-case ASCII is translated to upper-case EBCDIC; diacritics are removed; text is compressed; and unromanized characters that cannot be approximated are removed. The program is driven by two tables, one of which consists of the MARC tags in which the OTS is interested, and the other, the processes to which the selected tags will be subjected. Currently, all tags can be handled by one of four processes:

1) Process 1 extracts the language and the form of content code from MARC tag 008, and creates a new field 008 consisting of only these two units. Instead of a one-character code for form of content, a four-letter abbreviation delimited by "$A" is used. Language of publication is delimited by "$B". Process 1 also extracts the first publication date from the original tag 008, and sets up a new field, tagged 009 and delimited "$A".

2) Process 2 handles the Library of Congress (050, 051) and Dewey Decimal Classification (082) numbers. It utilizes only the first subfield, compresses out slashes, and limits the length of these fields to 20 characters.

3) The geographic area code (043) and imprint (260) are routed through a third process which retains the MARC subfield delimiters. Subfield delimiters are retained to narrow the object field and reduce search time.

4) All other tags are routed through process 4, which removes subfield delimiters and heads up the entire field with "$A".

Fig. 1. System Flowchart of SELDOM. [The flowchart shows the weekly MARC II tape entering the conversion program LCONV ("A"); COMPRO ("D") compiling profiles, with the IBM update utility maintaining the current profiles and current addresses; the search program ("B") reading the converted MARC and producing hits and MARC records; and, after sorting, the print program PRINPRO ("C") producing printed profiles and statistics.]
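The two-table, process-driven design above lends itself to a dispatch-table sketch. Everything below (function names, the sample tag assignments) is illustrative; only the routing idea and the stated behavior of processes 2 through 4 follow the text.

```python
# Sketch of LCONV's table-driven routing: one table names the tags of
# interest, a second maps each tag to a process. Simplified illustration,
# not the SELDOM code (process 1's 008 handling is omitted).

def process_2(field):
    """Classification: first subfield only, slashes compressed out,
    field length limited to 20 characters."""
    first_subfield = field.split("$")[0]
    return first_subfield.replace("/", "")[:20]

def process_3(field):
    """043 and 260: keep the MARC subfield delimiters, so a search can
    address a narrow object field."""
    return field

def process_4(field):
    """Everything else: strip subfield delimiters and head the whole
    field with '$A'."""
    text = field.replace("$A", " ").replace("$B", " ")
    return "$A" + " ".join(text.split())

PROCESS_FOR_TAG = {"050": process_2, "051": process_2, "082": process_2,
                   "043": process_3, "260": process_3,
                   "100": process_4, "245": process_4}

def convert_field(tag, field):
    return PROCESS_FOR_TAG[tag](field)

assert convert_field("082", "016.7/02") == "016.702"
```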
Narrowing down the object field is not desirable for fields input to this process. The conversion program also outputs for each record a field identified by 035, a MARC II tag for local system number. This field contains a data base code (R for MARC), the volume and issue number (extracted from the MARC tape label), and the Library of Congress card number truncated to the first eight characters. LCONV sorts the tags, calculates base address and record length, builds a new directory, and writes the SELDOM MARC II-like record out on tape.

The Searching Program (SRCHPRO)

The searching program accepts as input compiled profiles, the converted MARC tape from LCONV, and parameter cards specifying data base and up to 32 search field definitions. Each field definition consists of a term type code, tag, and delimiter of the field or subfield to be searched. Six term types are allowed, although additions, deletions, and changes to these six may be performed upon request. All terms except date may be truncated on the right, with title terms also benefitting from left truncation. The right truncation feature reduces storage and search time requirements. The searches are conducted over the converted tape according to the Boolean expressions which connect symbols representing profile words. Profile words are simply entered into core until the allotted core is filled, and the source tape is sequentially passed against the profiles; i.e., each of the records on the tape precipitates a search of the profile words in core. If all of the profile words could not be entered into core, the source tape is rewound and another search is conducted. This continues until all profiles have been searched. An output tape is created containing the SELDOM record retrieved, with a prefix consisting of the profile number, threshold weight, weight, expression number, hit number, and the terms which caused retrieval of the record. Users also have the option of applying a weight (-99 to +99) to each profile word.
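The passage above describes three mechanisms: truncatable profile words, a core-filling multi-pass search, and per-word weighting. A condensed sketch of all three under illustrative names (the real program matches against converted SELDOM records and evaluates full Boolean expressions, which are reduced here to "any word matched"):

```python
def matches(profile_word, text_word):
    """Truncation as described: a trailing '*' matches any right
    extension; a leading '*' (title terms) matches any left extension."""
    if profile_word.endswith("*"):
        return text_word.startswith(profile_word[:-1])
    if profile_word.startswith("*"):
        return text_word.endswith(profile_word[1:])
    return text_word == profile_word

def search_tape(tape_records, profiles, core_capacity):
    """Core-filling search: load a batch of profiles into 'core', pass the
    whole tape against it, 'rewind', and repeat until every profile has
    been run."""
    hits = []
    for start in range(0, len(profiles), core_capacity):
        batch = profiles[start:start + core_capacity]
        for record in tape_records:              # one sequential tape pass
            for number, words in batch:
                found = [w for w in words
                         if any(matches(w, term) for term in record)]
                if found:
                    hits.append((number, found))
    return hits

def weighted_retrieval(found, weights, threshold):
    """Weight option: words carry weights in -99..+99; a record is kept
    when the matched words' total meets a threshold in -999..+999."""
    return sum(weights[w] for w in found) >= threshold

records = [("GEOLOGY", "ALBERTA"), ("BEACON", "SIX")]
profiles = [(7, ["GEOLOG*"]), (9, ["POLLUT*", "BEACON"])]
assert search_tape(records, profiles, core_capacity=1) == [
    (7, ["GEOLOG*"]), (9, ["BEACON"])]

weights = {"POLLUT*": 10, "ENVIRONMENT*": 5, "TOX*": -99}
assert weighted_retrieval(["POLLUT*", "ENVIRONMENT*"], weights, 12)
assert not weighted_retrieval(["POLLUT*", "TOX*"], weights, 0)
```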
Each time profile words match terms in a record, the weight value of each of the words found is tallied. Upon completing the search of that record and upon satisfying the expression logic, the total of the weight values is compared to a threshold value. Thus, if the total is greater than or equal to the threshold value (-999 to +999), that particular record is retrieved. Another option available to the user is a hit option, in which the user may specify the maximum number of records he would like various expressions in the program to retrieve for him.

The Output Programs

The output from the search program is sorted by calling up the IBM Sort Utility, which sorts the records on prefix. The sorted output is then input to the print program along with the address file. The latter is a separate file that is merely updated using the IBM utility IEBUPDTE. It is in this address file, however, that several options can be specified. Duplicate printouts can be obtained, such that the left and right sides of the page carry identical output, with the right side carrying a feedback mechanism. Two-up printouts, notes, and, if necessary, punched card output can be requested. On the whole, the record printed out (see Figure 2) is similar in format to a 3 x 5 catalog card, the only differences being the fixed format, the term or terms causing retrieval, the lack of name added entries and notes, and the control information at the bottom of each printout.

PROFILE COMPILATION

Because of the Library's bibliographic responsibility to the University, an alerting service such as SELDOM will vastly improve user awareness of published monographic resources. First, users, in house and out, would not only be alerted to many works to be acquired by the Library, but would also be alerted to items that are currently not being purchased. Secondly, they would be assured of personalized services.
Users of SELDOM will not receive listings of just new books, but will be notified of the latest books presumed to be relevant to their interests.

Profiling

When a prospective user or user group wishes to search a weekly MARC tape, its interests are entered onto profile formulation sheets. These sheets (see Figures 3 and 4) contain a description of the user's subject interests, several references to the monographic literature, and a listing of the profile words with logical connectives. The profile words may number as many as 500. Figure 5 shows three of the approximately eighty profiles currently running under SELDOM. The profiles are formulated by search editors using words that appear in the user's narrative and references. Additional words are sought in the Library of Congress' List of Subject Headings. Classification numbers that express the appropriate areas are incorporated; depending upon the information need, personal names, corporate names, geographic area codes, and date are also prescribed. According to Mauerhoff (23), approximately twenty-seven hours per year are required of an information specialist/search editor in order to accurately capture and maintain a user's need for information. This figure incorporates interviewing time, user education, analyses of user feedback, and revision time. The success of this system, or of any information retrieval system, therefore depends on having sufficient profiling staff.

The Compile Program (COMPRO)

COMPRO, the profile compilation program, edits the profile transactions
[Figure 2 reproduces two sample profile notices as printed: one for E. J. W. Irish, Structure of the Northern Foothills and Eastern Mountain Ranges, Alberta and British Columbia (Geological Survey of Canada Bulletin 168, Dept. of Energy, Mines and Resources, 1968), and one for Robert Cundy, Beacon Six (London: Eyre & Spottiswoode, 1970); each notice is headed with the SELDOM Project message to users of the Murray Memorial Library, University of Saskatchewan, Saskatoon, and carries the retrieval terms and control information described above.]
Fig. 2. Sample Profile Notices.

[Figure 3 reproduces a profile formulation sheet (profile number 0003) for the Reference Department, Murray Memorial Library, University of Saskatchewan. Its narrative states that the profile is intended to obtain information on current monographs likely to interest the Reference Department, such as dictionaries, encyclopedias, handbooks, and catalogues, and it lists references to the monographic literature, the first being the UN Economic Commission for Europe's Directory of National Bodies Concerned with Urban and Regional Research (New York: United Nations, 1968).]
"Law and Taxation; a Guide for Conservation and Other Nonprofit Orqanizations". Washinqton: Conservation Foundation, 1970. 47 pp, (KF6449) 3. Hayes, Robert M.' and Becker, Joseph. "Handbook of Data Processing for Libraries." New York; Wiley-Interscience, 1970 . 885 p. 4. Havlice, Patricia Pate. "Art in time". Metuchen, N.J.: Scarecrow Press, 1970. 350 p. (N7225; 016.7; Art -- Indexes} .. ...... .. - .. . .... .. .. :; .. ..... .. .... .. .. :: l.lsf., "PRO'FI~: : woilos/ Mto tii~I'CI( rJe~ESSIONS ON IIEVERSl: SlOE .. ...... . ..... Fig. 3. Sample Profile Formulation Sheet: Narrative and References. MARC II-Based RetrievalfMAUERHOFF and SMITH 153 ., ·II C. f>ROFIL.E WORJ)S .. :' .. .. .. ,· n w · AC . • PROFI\..E IIIORQ~ A A. _AS* _a ANNIIA c ANN IIAI~ n RTRI rnr.llAP~* E ALMANA C* F DICTIONAR* G DI RECTORY H DIRECTORIES I EN CYCLOPED* J FACTS K GLOSSAR* L GUIDE* M HAN DBOOK* N IN DEX* 0 INTERLIBRARY p :HECK [S. * 0 GFNFAIOGY R MANUAL s MANUALS T T OIITI TNF* T u REFERENCE T v REP RINT* T w REVIEW* T X SYLLABUS T y SYLLABI T z CATALOG* T AA ABSTRACT* T AB STA [S' * T AC YEARBOOK* T AD rE rBOO K* .•.. R 99 AIR -~ 2 R 99 LIM-+ 7 3 R 99 Fig. 4. Sample Profile Formulation Sheet: T erms and Logic. 154 Journal of Library Automation Vol. 4/3 September, 1971 p 0007 B (\ DATE : FE3 ?8 , 1970 /!. CIIN<\0 11 13 BR ITIS H CO LU MB IA C AL[l!'RTI\ - . ---- - ··-- --- 0 SASKAJCHE~AN E I'IA.'-HTO'iA F ONTARIO G OUEfiEC OATF : FEB 28 , 197 C p 10 98 K K T T T T T A L~ I C43* B Ll3 104 4* C AvD!C-VISUt. 0 FILM* E AIIOIO* F VIOE(U Fl B tl ~ B B 13 Fl Fl 13 B H N'IVA SC OTIA l PRINCE EDWAiO ISLAN D J NEwF•JU'lOLANC . __ ,_ G INSTRUCT!DNAL H TRA~SP~RENC IES I AV K YUKJN L NURTH~EST T~RRITORIES M QUEEN ' S PMINTER tl N U, S , R. 0 FOR S hl!; I:IY THE SUPT ;- - - --- fl P AVA IL AA L£ F~O~ CLFAR!NGHDUSE 8 0 NA TI ON AL f\ R STAT( 1 S GT , BR IT, 8 T H,'I , S . U. 
[Figure 5 reproduces the computer version of three profiles as listed on February 28, 1970: profile 0007, on Canadian publications, with terms for each province and territory and for government publishers such as QUEEN'S PRINTER and UNITED NATIONS; a profile on audio-visual instruction, with terms such as AUDIO-VISUAL, FILM*, AUDIO*, VIDEO*, INSTRUCTIONAL, TRANSPARENCIES, LANGUAGE LAB*, TEACHING MACHINE*, PROGRAMMED INSTRUCTION, CAI, and COMPUTER-ASSISTED; and profile 0009, on pollution, with terms such as POLLUT*, CONTAMINAT*, POISON*, ENVIRONMENT*, and TOX*.]

Fig. 5. Computer Version of Profiles.

for sequence, syntax, and semantics, and generates codes for the Boolean operators in the search expressions. The program flags incorrect data base specifications, incorrect term types, and incorrect alpha codes (i.e., the symbols corresponding to the profile words). Profile transactions are by way of card input, and can consist of profile additions, profile updates, and profile deletions. Listings accompany all transactions.

OPERATION AND COSTS OF SELDOM

From the time that SELDOM became operable on a day-to-day basis, cost information has been gathered, and since SELDOM is composed of four modules, the recording of items of cost has been easily done. For example, LCONV computer charges are presently $0.019 per record converted, based on weekly files ranging in size from 1,194 to 2,399 records. This works out to about 1,939 records per week, and averages out to about $37 per MARC tape. Following the preparation of the tape for searching, the SRCHPRO-Sort routine is run. The average computer cost has been about $0.186 per profile per issue. SELDOM's user group presently numbers 81, with profile terms numbering 1,121, or 14 terms per profile, and questions or expressions numbering 273, or about 4 per profile. PRINPRO was formerly run under stream-oriented transmission, at a total computer cost of $1.70 per 1,000 lines of output. A shift to record-oriented transmission has lowered charges to $1.50 per 1,000 lines.
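The conversion figure quoted above can be checked directly. All input numbers are the article's; the weekly searching total is a derived illustration of our own, not a figure from the article.

```python
# Reproducing the cost arithmetic reported for SELDOM's weekly run.
PER_RECORD_CHARGE = 0.019      # LCONV: dollars per converted record
AVG_RECORDS_PER_TAPE = 1939    # weekly files ran from 1,194 to 2,399 records

cost_per_tape = AVG_RECORDS_PER_TAPE * PER_RECORD_CHARGE
assert round(cost_per_tape) == 37            # "about $37 per MARC tape"

# Searching: $0.186 per profile per issue over the 81 current profiles,
# a derived weekly total of about $15 (our calculation).
weekly_search_cost = 81 * 0.186
assert round(weekly_search_cost, 2) == 15.07
```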
With profiles having averaged about 832 lines of output, the total cost of printing out search results has been about $1.25 per profile. Overall costs for the 81 profiles are presently about $2.23 per profile per tape, or $116.00 per profile per year. Since the profiles require updating at frequent intervals, charges of $0.37 per profile per tape have been incorporated into this figure to take care of changes in terms and addresses. Costs which have not been included in the calculations are such items as MARC tape subscriptions, forms, and staff time.

DISCUSSION

The OTS and the NSL have at their disposal a program package that is highly flexible. For instance, search keys can be added or deleted at will. Fields from the MARC tapes can either be incorporated into or removed from the directory. Any number of fields and subfields can be searched on tape, and any new directory items may be created, the SRCHPRO limit, however, being 32. This number was chosen because it satisfies 99% of the users' needs. Almost every procedure in the programs is table driven, with the result that variations can easily be introduced. In consequence, if and when BNB MARC tapes are made available, and if and when a Canadian MARC service becomes a reality, searching of these tapes would present no problems whatsoever. The benefits to be derived from SELDOM go beyond the concept of SDI, because SELDOM can produce outputs for a wide variety of applications. SDI and current awareness have received considerable emphasis in the literature by those providing search services over a spectrum of scientific-technical tape services. Since MARC II has also elicited a tremendous response, especially from Kenneth Bierman of the Oklahoma Department of Libraries, these utilities do not merit additional treatment in this paper.
SELDOM, however, is unique in that it is the only MARC-based SDI system capable of searches using six coordinated entry points, linear matching, truncation, weighting, and output options. From the point of view of selection, MARC has great appeal. Since the majority of the University's acquisitions (almost 80%) are English-language monographs, faculty and staff who have the responsibility for book selection would benefit from regular alerting services based on their areas of interest. Apart from receiving verified bibliographic information, the participants benefit from the timeliness of the records. At the same time, selection costs per record will be brought down significantly, especially now that this selection process is being tied in to TESA-1, the Library's automated MARC-based acquisitions and cataloguing system. It has been suggested that selection and ordering could be done for the cost of selection alone. The only problem areas envisaged are the lack of Canadian imprints and the lack of other non-English monographs, such as French, German, Spanish, and Portuguese. A partial solution to this problem may take the form of a Canadian MARC Project. A more complete solution is on its way, since MARC coverage for other languages is anticipated by the beginning of 1972. Collection rationalization, an area receiving considerable attention along regional and national lines, can also benefit from SELDOM. Devising divisions of responsibility in the acquisition of library materials will enable libraries to acquire, organize, store, and make available to the public comprehensive monographic collections. MARC deselection, where practised by subscribers, is being pursued mainly along the lines of time decay. The University of Chicago (21) has so far exhibited the only deselection algorithm employing a subject and intellectual-level approach in addition to date.
They eliminate records from their file, using classification numbers, if the records fall outside their collection policy. The OTS will be able to perform the same function, but much more rigorously, since its deselection criteria can consist of six elements. In this way, file size can be kept to a reasonable level, and update and storage charges will not be so high. Internal library data and information services will be along the lines of SDI, current awareness, demand bibliographies, and management statistics. These in-house utilities, which are already being obtained, have been very useful. The Reference Department, for instance, receives a bibliography each week of MARC II reference sources. Another profile, for one of the catalogers, is monitoring the publications of modern-day novelists and poets.

OUTLOOK

SELDOM has been operational for only several months. While it has tremendous potential in the library field, and although immediate interest has been keen, the system will have to undergo considerable acceptance testing. Attention will have to be given to costs and to the user and his evaluation of the service. How SELDOM fits into a library's patron or reference services will be especially important, since the system will be integrated into a library's current accessions program and also its card catalog service.

ACKNOWLEDGMENTS

Major credit for the existence of the SELDOM Project is due to the systems analysts and programmers at the National Research Council of Canada, Messrs. P. H. Wolters, R. A. Green, J. Heilik, and Miss R. Smith; and to Dr. J. E. Brown, National Science Librarian.

REFERENCES

1. Personal communication with Henriette Avram, MARC Development Office, Library of Congress, Washington, D.C.
2. Bierman, K. J.: "SDI Service," JOLA Technical Communications, 1 (October 1970), 3.
3. Bierman, K. J.; Blue, Betty J.: "A MARC-Based SDI Service," Journal of Library Automation, 3 (December 1970), 304-319.
4. Bierman, K. J.: "An Operating MARC-Based SDI System: Some Preliminary Services and User Reactions," Proceedings of the American Society for Information Science, 7 (1970), 87-90.
5. Bierman, K. J.: Statements of progress of Cooperative SDI Project. In Oklahoma Department of Libraries: Automation Newsletter, 2 (February 1970), 3-4; 2 (June-August 1970); 2 (September 1970), 16, 25-26; 2 (December 1970), 34-35; 3 (February 1971), 1-3.
6. Studer, William J.: Computer-Based Selective Dissemination of Information (SDI) Service for Faculty Using Library of Congress Machine-Readable Catalog (MARC) Records. (Ph.D. Dissertation, Graduate Library School, Indiana University, September 1968).
7. Studer, William J.: "Book-Oriented SDI Service Provided for 40 Faculty." In Avram, Henriette: The MARC Pilot Project, Final Report (Washington, D.C.: Library of Congress, 1968), p. 179-183. Also in Random Bits, 3:3 (November 1967), 1-4; 3:4 (December 1967), 1-4, 6.
8. Avram, Henriette: "MARC Program Research and Development: A Progress Report," Journal of Library Automation, 2 (December 1969), 257-265.
9. Atherton, Pauline: "LC/MARC on MOLDS; An Experiment in Computer-Based, Interactive Bibliographic Storage, Search, Retrieval, and Processing," Journal of Library Automation, 3 (June 1970), 142-165.
10. Atherton, Pauline; Wyman, John: "Searching MARC Tapes with IBM/Document Processing System," Proceedings of the American Society for Information Science, 6 (1969), 83-88.
11. Atherton, Pauline; Tessier, Judith: "Teaching with MARC Tapes," Journal of Library Automation, 3 (March 1970), 24-35.
12. Hudson, Judith A.: "Searching MARC/DPS Records for Area Studies: Comparative Results Using Keywords, LC and DC Class Numbers," Library Resources and Technical Services, 14 (Fall 1970), 530-545.
13. Martin, Dohn H.: "MARC Tape as a Selection Tool in the Medical Library," Special Libraries, (April 1970), 190-193.
14. Veenstra, J. G.: "University of Florida."
In Avram, Henriette D.: The MARC Pilot Project, Final Report (Washington, D.C.: Library of Congress, 1968), pp. 137-140.
15. Weisbrod, D. L.: "Yale University." In Avram, Henriette D.: The MARC Pilot Project, Final Report (Washington, D.C.: Library of Congress, 1968), pp. 167-173.
16. Palmer, Foster M.: "Harvard University Library." In Avram, Henriette D.: The MARC Pilot Project, Final Report (Washington, D.C.: Library of Congress, 1968), pp. 103-111.
17. Tell, B. V.; Larsson, R.; Lindh, R.: "Information Retrieval With the ABACUS Program: An Experiment in Compatibility," Proceedings of a Symposium on Handling of Nuclear Information (Vienna: 16-20 February 1970), p. 184.
18. Heaps, D.; Shapiro, V.; Walker, D.; Appleyard, F.: "Search Program for MARC Tapes at the University of Alberta," Proceedings of the Annual Meeting of the Western Canada Chapter of the American Society for Information Science (Vancouver: September 14-15, 1970), 83-94.
19. Ayres, F. H.: "Making the Most of MARC; Its Use for Selection, Acquisitions, and Cataloguing," Program, 3 (April 1969), 30-37.
20. Dieneman, W.: "MARC Tapes in Trinity College Library," Program, 4 (April 1970), 70-75.
21. "MARC II and Its Importance for Law Libraries," Law Library Journal, 63 (November 1970), 505-525.
22. Wolters, Peter H.; Brown, Jack E.: "CAN/SDI System: User Reaction to a Computerized Information Retrieval System for Canadian Scientists and Technologists," Canadian Library Journal, 28 (January-February 1971), 20-23.
23. Mauerhoff, Georg R.: "NSL Profiling and Search Editing," Proceedings of the Annual Meeting of the Western Canada Chapter of the American Society for Information Science (Vancouver: September 14-15, 1970), 32-53.

BOOK REVIEWS

Basic FORTRAN IV Programming, by Donald H. Ford. Homewood, Illinois: Richard D. Irwin, Inc., 1971. 254 pp. $7.95.
FORTRAN texts are now quite plentiful, so the main question in the reviewer's mind is: What does this book have to offer that no other book has? Regrettably the answer must be: nothing. There are many other good FORTRAN books available, and this one has very little to distinguish it. That is not to say that it is not a good book. The quality of the book is good, the text is very readable, and very good attention has been paid to the examples and proofreading. The book is suitable for an introductory course or for self-study. It does not go completely into all the features of the language, as these are usually best left to the specific manuals relating to the machines available. The book does bring the student to the level where he will be able to use those manuals, and to the level where he will need to use them; it also reaches the level necessary for the person who writes his programs with professional assistance. The author has chosen ANSI Basic FORTRAN IV as the language to be discussed in the book, and in particular relates it to the IBM 360 and 370 computers. This is a common language and is available on most machines with only minor modifications. It was a good choice for the level of book he intended to write, since he did not want to go into the advanced features of the language. The author goes quickly to the heart of FORTRAN programming, so that the reader can start using the computer right away. The basic material is well covered and gives a good introduction to the more advanced features which are available on most machines. The examples are well chosen in that they do not require any specialized knowledge; the emphasis can therefore be put on their programming aspects. He also has very good end-of-chapter problems, ranging in difficulty from straight repetition of text material to programming problems which will require a considerable amount of individual work.
He has a good discussion of mixed-mode arithmetic, one of the more difficult topics of FORTRAN to explain. He also has a good discussion of input/output operations, and his explanation of FORMATting, again a difficult area of the language, is very well done. In discussing each of the statement types in FORTRAN, he begins by giving the general form of the statement in a standardized way, which is very good for introductory purposes and for review and reference. The index of the book does not single these out, however, so anyone who wants to use the book as a reference should make his own index of the places where the general forms of the statements are given. This is a good feature of the book.

Robert F. Mathis

Films: A MARC Format; Specifications for Magnetic Tapes Containing Catalog Records for Motion Pictures, Filmstrips, and Other Pictorial Media Intended for Projection. Washington: MARC Development Office, 1970. 65 pp. $0.65.

This latest format issued by the MARC Development Office is similar in organization to the previously issued formats, describing in turn the leader, record directory, control fields, and variable fields. Three appendices give the variable field tags, indicators, and subfield codes applicable to this format, categories of films, and a sample record in the MARC format. In addition to the motion pictures and filmstrips specified in the subtitle, the coverage of this format includes slides, transparencies, video tapes, and electronic video recordings. Data elements describing these last two have not been defined completely, as the MARC Development Office feels that further investigation is needed in these areas. The bibliographic level for this format is for monograph material, i.e., material complete at time of issue or to be issued in a known number of parts.
Since most of the material covered by this format is entered under title, main entry fields (100, 110, 111, 130) have not been described. This exclusion also covers the equivalent fields in the 400s and 800s. Main entry and other fields not listed in this format but required by a user can be obtained from Books: A MARC Format. This format describes two kinds of data: that generally found on an LC printed card and that needed to describe films in archival collections. Only the first category will be distributed in machine-readable form on a regular basis. One innovation introduced in this format that can only be applauded by MARC users is the adoption of the BNB practice of using the second indicator of title fields (241, 245, 440, 840, but not 740, where the second indicator had previously been assigned a different function) to specify the number of characters at the beginning of the entry which are to be ignored in filing. It is to be hoped that in the future this practice will be applied to books, serials, and other types of works as well as to films.

Judith Hopkins

U.K. MARC Project, edited by A. E. Jeffreys and T. D. Wilson. Newcastle upon Tyne: Oriel Press, 1970. 116 pp. 25s.

This volume, which reports the proceedings of a conference on the U.K. MARC Project held in March 1969, may be of as much interest in the USA as in Britain; although the intake of British libraries is much smaller and the money available for experiments much less, the problems of developing and using MARC effectively within these constraints are for this very reason of special interest.
MARC (the latter being the odd one out, in its departures from AACR 67). Disappointingly, no hint is given of additional national bibliographical products that might come from MARC, such as cumulated and updated bibliographies on given subjects, or listings of children's books, etc. Richard Coward, with his usual clarity and con- ciseness, explains the planning and format of U.K. MARC, in which he has been so centrally involved. As he says, "we have the technology to produce a MARC service but we really need a higher level of technology to use it at anything like its full potential." R. Bayly's paper on "User Programs and Package Deals" is disappointing, dealing only with ICL 1900 computers, and not comprehensively or clearly even with them. Two papers discuss the problems of actually using MARC: E. H. C . Driver's "Why MARC?", which concludes that "the most efficient use of MARC will be made by large library sys tems or groups of libraries," and F . H. Ayres' "MARC in a Special Library Environment," which con- cludes that eventually all libraries will use the MARC tape. Mr. Ayres discusses the proposed use of MARC at A WRE Aldermaston, and also gives a general (and highly optimistic ) blueprint of the sort of way MARC could be used in an all-through selection, acquisition and cataloging system. (The four American experimental uses of MARC reviewed by C. D. Batty-at Toronto, Yale, Rice and Indiana- are probably well enough known in the USA and Canada.) Keith Davidson's discussion of filing problems is first class-and his paper is just as topical as when it was written, because little progress has been made since then. Peter Lewis, in "MARC and the Future in Libraries," makes the point that whereas BNB cards provided a ready-made product for libraries, MARC tapes will merely offer them a set of parts to put together themselves. Of special interest to American audiences may be Derek Austin's paper, "Subject Retrieval in the U.K. 
MARC," since the PRECIS system to which it forms an introduction may represent a major breakthrough in machine manipulable subject indexing. MARC and its uses constitute one of the most rapidly developing areas of librarianship. Regular conferences of this standard are needed to review progress from time to time. Maurice B. Line 5593 ---- lib-MOCS-KMC364-20140106083630 DEVELOPMENT OF A TECHNICAL LIBRARY TO SUPPORT COMPUTER SYSTEMS EVALUATION 173 Patricia Munson MALLEY: Librarian, U. S. Army Computer Systems Support and Evaluation Command, Washington, D.C. This paper reports on the development and growth of the United States Army Computer Systems Support and Evaluation Command (USACSSEC) Technical Reference Library from a collection of miscellaneous documents related to only fifty computer systems to the present collection of approx- imately 10,000 hardware/software technical documents related to over 200 systems from 70 manufacturers. Special emphasis is given to the evolution of the filing system and retrieval techniques unique to the USACSSEC Technical Reference Library, i.e., computer listings of avail- able documents in various sequences, and development uf the cataloging system adaptable to computer technology. It is hoped that this paper will be a contribution toward a standard approach in cataloging ADP collections. The advent of the computer has created a situation which has been labeled the "information explosion." Through automatic data processing, managers of all types can have available to them information previously impossible. Many authors have addressed this situation from many aspects. However, little has been said of the explosive growth of information about computers themselves and of ways to cope with it. This paper is intended to help overcome this void. 
It is a description of the system installed by the United States Army Computer Systems Support and Evaluation Command (USACSSEC) to provide controls on its extensive library of technical literature pertaining to Automatic Data Processing Equipment. The USACSSEC has the mission of selecting and procuring this equipment to satisfy requirements of the Army, a process that involves analyzing and evaluating technical proposals made by computer manufacturers. The analysts of the Command require immediate access to detailed technical literature on all aspects of commercially available ADP hardware and software. This literature is maintained in the Command Technical Reference Library. In form it ranges from single-page summaries to multi-volume bound collections. It includes periodicals, books, brochures, and reference works. In approximately five years, the Library's vendor documentation has grown from approximately 200 to 10,000 manuals on over 200 computer systems. The Library's holdings also include information on peripheral equipment from over 170 manufacturers, e.g., printers, magnetic tape transports, microfilm, platters, memories, etc.; standards; GSA Federal Supply Schedules; Programmed Instruction Courses published by vendors; and major reference works with monthly supplements.

In the early days of the Library's existence, one librarian was able to catalog and shelve the material manually with no difficulty. However, the rapid growth in the availability and use of ADP brought with it a flood of technical literature which threatened to inundate the librarian and the manual filing methods. It was recognized early that some form of automation assistance for the Library was necessary. The system described in this paper is the one which evolved and is now successfully employed.

Journal of Library Automation Vol. 4/4 December, 1971
SYSTEM DESCRIPTION

The system used by the USACSSEC, named ACCESS (Automated Catalog of Computer Equipment and Software Systems), is characterized by simplicity. It is built around a master list of all holdings, and the key to its uniqueness and success is the cataloging scheme. Manufacturers have various methods of identifying their literature, some having structured stock numbers, some using only the document title, and others ranging between these extremes. The only common identifier is document title, which offers inadequate access to the collection. An efficient cataloging scheme is therefore of primary importance as a means of identifying and retrieving documents. Searches made by the analysts for whom the library is maintained usually fall into one of three types:

1) Location of a specifically identified document (e.g., the COBOL Programming Manual for the UNIVAC 1108 computer system);

2) Location of all documents pertaining to specific aspects of a particular computer system (e.g., technical descriptions of all output devices for the Burroughs B3500 system);

3) Location of all documents pertaining to particular aspects of a number of different computer systems (e.g., technical descriptions of line printers for IBM System 360, Burroughs B3500, Honeywell 200, RCA Spectra 70, and UNIVAC 1108).

In 1966, since approximately 75% of the literature in the Library was IBM oriented, IBM's Index of Systems Literature, which categorizes documents by subject, was used as an initial model to classify literature of other manufacturers. Since that time a more sophisticated, explicit and expanded subject index has been developed. Table 1 shows a complete list of categories, together with an explanation of them.

Table 1.
Representative Subject Categories and Codes. Each entry gives the subject code (tab), the abbreviated title, and the subject category content, with examples (EX).

Hardware Categorization

00 General Information. Systems summaries, bibliographies, configurators, publications guide, brochures on systems where no technical documentation is provided, and price lists not in the GSA Federal Supply Schedule. EX: Publications guide with addenda.

01 Machine System. Principles of operation, operator manuals, operating procedures, reference and system manuals. EX: Processor systems information manual, operating manual.

03 Input/Output. Component descriptions of unit record equipment, e.g., line printers, paper tape readers, card readers, etc. EX: Printers Reference Manual, Card Punch Style Manual.

05 Magnetic Tape Units and Controls. Component descriptions and operation of the units. EX: Magnetic tape unit operating manual.

07 Direct Access Storage Units and Controls. Component descriptions and operation procedures. EX: Disc storage subsystem and reference manual.

08 Analog Equipment. Information related to analog computers; also includes the interface equipment for connecting to digital computers. EX: Integrated hybrid subsystem.

09 Auxiliary Equipment. Includes plotters, digitizers, optical character readers, all nonstandard I/O devices, and interface equipment. EX: Graph plotters.

10 Communications and Remote Terminal Equipment. Component descriptions of communication control devices and remote terminals. EX: a. Voice response unit. b. Visual display unit. c. Teletype, typewriter terminals. d. Graphic display units.

13 Special and Custom Features. Special Feature descriptions and Custom Feature descriptions (those devices that must be custom built). EX: a. Satellite coupler. b. Programmed peripheral switch. c. Special feature channel-to-channel adapter. d. European communication line terminal.

15 Physical Planning Specifications. Installation and physical planning manuals. EX: Site preparation and installation manual.

19 Original Equipment Manufacturers Information. Devices subcontracted from other manufacturers. EX: Component subleased from one manufacturer for use on own vendor's equipment.

Software Categorization

20 Programming Systems, General. General concepts and systems summary related to the software of the system. EX: a. Catalog of programs. b. Programmer's guide.

21 Assembler. Reference and programming manuals on the assembly language(s) of the system. EX: a. Assembler language. b. Card assembler reference manual.

24 COBOL. Reference and programming manuals on the COBOL language. EX: COBOL reference manual.

25 FORTRAN. Reference and programming manuals on the FORTRAN language (includes BASIC). EX: FORTRAN IV operations manual.

26 Other Languages. Reference and programming manuals on other higher-order general purpose languages such as ALGOL, JOVIAL, etc. EX: a. ALGOL Programmers' Guide. b. JOVIAL Compiler Reference Manual.

28 Report Program Generator. Reference and programming manuals on Report Program Generator (RPG) languages. EX: Report program generator reference manual.

30 Input/Output Control Systems. Information related to the software facilities for the control and handling of input/output operations. EX: a. Operating systems basic IOCS. b. Computer systems input/output package.

31 Data Management Systems. Information related to generalized information processing systems which include the functions of information storage, retrieval, organization, etc. EX: a. IBM GIS. b. Burroughs Forge. c. GE IDS.

32 Utility Programs. Standard routines used to assist in the operation of the computer, e.g., a conversion routine, sorting routine or a printout routine. EX: a. Utility system general information manual. b. Utility systems programming manual.

33 SORT/MERGE Systems. Information related to software facilities whose major functions are to sequence data in a disciplined order according to defined rules. EX: a. SORT/MERGE Timing Tables. b. General Information SORT/MERGE Routines.

35 Simulators/Emulators, Language Translators. Information related to techniques, hardware or software, utilized to make one computer operate as nearly as possible like some other computer. EX: a. Flow simulator information manual. b. Emulation information manual.

36 Operating Systems, Supervisors-Monitors. Information related to the programs of a system which are responsible for scheduling, allocating and controlling the system resources and application programs. EX: a. Disk/Tape operating system operation manual. b. Operating system programmers' manual.

37 Automatic Testing Programs. Interpretive diagnostic techniques which provide analysis of hardware components or of software programs; e.g., hardware autotest programs, software trace routines. EX: a. Program writing and testing bulletin. b. System Test Monitor Diagnostic.

48 Miscellaneous Programs. Information related to special techniques or application programs. EX: APT General Information Manual.

Documents are shelved (in loose-leaf notebooks) by manufacturer, computer system, subject category and numerical publication identification. The user is aided in his searches by the following three types of listings of holdings:

1) Listing by manufacturer (Figure 1): major sort field, manufacturer; intermediate sort field, computer system nomenclature; intermediate sort field, subject code (tab); and minor sort field, publication number. That is, a document is listed by publication number, within subject code, within the computer system, within the manufacturer. This list serves as an index to the Library's holdings.
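The listings are plain multi-key sorts of the same master file. A minimal sketch in Python (a modern stand-in for the original COBOL batch program; the sample records are illustrative, drawn from the sample figures):

```python
# Each master-file entry mirrors the punch-card fields:
# (manufacturer, system, subject code/tab, publication number, title, date).
records = [
    ("IBM", "SYS/370", "01", "A22-7000-00", "SYS/370 PRINCIPLES OF OPERATION", "700600"),
    ("IBM", "SYS/370", "00", "A33-3006-01", "SYS/370 MODEL 135 CONFIGURATOR", "710300"),
    ("CDC", "6000",    "24", "60191200A",   "64/65/6600 COBOL REFERENCE MANUAL", "690900"),
]

# Listing 1: publication number within tab within system within manufacturer.
by_manufacturer = sorted(records, key=lambda r: (r[0], r[1], r[2], r[3]))

# Listing 2: system within manufacturer within subject code (tab).
by_subject = sorted(records, key=lambda r: (r[2], r[0], r[1]))

# Listing 3: publication number within manufacturer.
by_pub_number = sorted(records, key=lambda r: (r[0], r[3]))

print(by_subject[0][2])  # the lowest tab, "00", heads the subject listing
```

Because each listing is just a different key ordering over one file, adding a new listing sequence costs only another sort pass.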
[Fig. 1. Sample Index Listing by Manufacturer Name and System: a page of the USACSSEC Technical Reference Library catalog as of June 1971, showing IBM System/370 entries with columns for manufacturer, system, tab, publication number, publication title, and publication date; an asterisk indicates new entries since the last catalog.]

2) Listing by subject code (Figure 2): major sort field, subject code (tab); intermediate sort field, manufacturer; and minor sort field, computer system nomenclature. That is, a manual is listed by computer system, within the manufacturer, within the subject code. Within each subject code, or tab, all manuals pertaining to this subject area are listed.

3) Listing by manufacturer name and publication number (Figure 3): major sort field, manufacturer; intermediate sort field, publication number. That is, a document is listed by publication number within the manufacturer.

[Fig. 2. Sample Index Listing by Subject Code (Tab), Manufacturer Name, and System: COBOL-language entries (tab 24) from CDC, RCA, UNIVAC, and XDS systems; an asterisk indicates new entries since the last catalog.]

[Fig. 3. Sample Index Listing by Manufacturer Name and Publication Number: IBM entries ordered by publication number.]

The manufacturer needs only to list his documents pertaining to a proposal and an analyst can find them immediately by using this listing. This listing also aids the manufacturer in updating his documents on file in the Library, as most manufacturers publish their own index of publications in numerical order.

The above lists are generated by sorting and listing a master file. The latter is maintained on magnetic tape and updated with punch cards. Four card formats are employed, one for each of the following: 1) addition of publications, 2) deletion of publications, 3) change of title or date of a publication in the file, and 4) change of other information. Tables 2 through 5 show the format for each type of card. It should be noted that in Table 3, information in columns 1-26 must be identical to that in the entry to be deleted, and that the publication title and publication date are not changed by the card described in Table 5.

Table 2. Punch Card Format for Addition of a Publication

Card Columns    Information
1-3             Manufacturer (abbreviated)
4-12            System number
13-14           Subject code
15-26           Publication number
27              The letter 'A' (key for adding a publication)
28-74           Publication title
75-80           Publication date

Table 3. Punch Card Format for Deletion of a Publication

Card Columns    Information
1-3             Manufacturer
4-12            System number
13-14           Subject code
15-26           Publication number
27              The letter 'D' (key for deleting a publication)

Table 4. Punch Card Format for Change of Title or Publication Date

Columns    Information                                Remarks
1-3        Manufacturer                               Identical to listing
4-12       System number
13-14      Subject code
15-26      Publication number
27         The letter 'C'
28-74      The new title, if applicable
75-80      The new publication date, if applicable

Table 5.
Punch Card Format for Change of Manufacturer, System, Tab, or Publication

Columns    Information                 Remarks
1-3        Manufacturer                Identical to listing
4-12       System number
13-14      Subject code
15-26      Publication number
27         The letter 'X'
28-30      New manufacturer name
31-39      New system number
40-41      New subject code
42-53      New publication number

A simple program written in COBOL for the UNIVAC 1108 is used to implement ACCESS. Data cards are read into memory, and the master tape file is updated. Errors such as "no match" or incorrect format are identified during the update process. The updated master file is sorted to provide the three types of output listings described above.

SYSTEM DEVELOPMENT

The present system evolved over a five-year period. The initial catalogs were prepared and maintained manually, and some of the better features of the early attempts were carried forward into the automated system. Because of this evolution, it is difficult to determine the actual development cost of ACCESS. Much of the detailed design was done in connection with development of the computer program. Approximately seven man-months were required for preparation and debugging of the program. During this period, a total of approximately two hours of UNIVAC 1108 system time was required. Negligible time has been spent on program maintenance since installation of ACCESS.

Not unexpectedly, the greatest effort was expended in collecting and preparing data for the initial master file. The Library in 1967 contained over 3,000 documents, and a punch card had to be prepared for each. The major ADPE manufacturers cooperated in this undertaking by providing properly punched cards for individual documents. Cards were prepared by the USACSSEC for documents provided by small manufacturers and for miscellaneous documents in the Library. The major manufacturers have continued their assistance in maintaining the data base, providing punch cards with all new documents delivered to the Library.
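The update pass over these fixed-column transaction cards is mechanical: columns 1-26 form the key, column 27 the transaction code. A minimal sketch, in Python rather than the original COBOL, with an in-memory dictionary standing in for the magnetic tape master file:

```python
def key_of(card):
    # Columns 1-3 manufacturer, 4-12 system, 13-14 tab, 15-26 publication number
    # (string indexes 0:3, 3:12, 12:14, 14:26).
    return (card[0:3].strip(), card[3:12].strip(),
            card[12:14].strip(), card[14:26].strip())

def apply_card(master, card):
    key, code = key_of(card), card[26]
    if code == "A":                                  # Table 2: add a publication
        master[key] = {"title": card[27:74].strip(), "date": card[74:80].strip()}
    elif code == "D":                                # Table 3: delete a publication
        if master.pop(key, None) is None:
            print("no match:", key)
    elif code == "C" and key in master:              # Table 4: new title and/or date
        if card[27:74].strip():
            master[key]["title"] = card[27:74].strip()
        if card[74:80].strip():
            master[key]["date"] = card[74:80].strip()
    elif code == "X" and key in master:              # Table 5: re-key; title and
        new_key = (card[27:30].strip(), card[30:39].strip(),   # date are unchanged
                   card[39:41].strip(), card[41:53].strip())
        master[new_key] = master.pop(key)
    else:
        print("no match or incorrect format:", card[:27])

master = {}
card = (f"{'IBM':<3}{'SYS/370':<9}{'01':<2}{'A22-7000-00':<12}"
        f"A{'SYS/370 PRINCIPLES OF OPERATION':<47}700600")
apply_card(master, card)
```

As in the real program, errors such as "no match" are reported during the update; the corrected master file is then sorted into the three listings.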
Nevertheless, it cannot be stressed too strongly that the updating and maintenance of this library file is a very difficult and tedious task representing the work of a full-time librarian, library assistant, and clerk. The Library may receive 600 new documents and/or page changes, with or without cards, during a thirty-day period. The master file is updated and new listings produced every sixty to ninety days. More frequent runs would prove more beneficial to the users and require less manpower on the part of the staff. Each run requires approximately ten minutes of UNIVAC 1108 system time.

It is an interesting fact that communication was a problem during detailed design of ACCESS. ADP system analysts and programmers thought and spoke in terms of codes, fields, sorts and files; the Librarian operated in a context of documents, catalog cards, and indexes. A period of mutual education was necessary before effective communication transpired and the system design progressed.

RESULTS

The Library today contains almost 10,000 hardware/software equipment documents on over 200 computer systems from 70 manufacturers. The flexibility inherent in ACCESS permitted the Library to absorb this rapid growth with minor perturbation. During one six-month period documents describing the mini-computers of twenty manufacturers were added. The subject codes accommodated all documents, and the only modification required to the system was the addition of codes for these new manufacturers. The value of ACCESS was demonstrated when IBM and RCA announced the new System 7. Documentation on the available hardware and software was delivered on the day of announcement together with punch cards, and within one week this large addition to the collection was completely integrated into the catalog. ADPE manufacturers also have benefitted from ACCESS.
The Army requires that ADPE vendors, to be eligible for contracts, must maintain current technical documentation of their proposed systems in the USACSSEC Library. Manufacturers are provided copies of the listings pertaining to their equipment to check for compliance with the requirement. Some manufacturers have even accepted the ACCESS cataloging scheme for use in their own libraries.

ACCESS has met the objectives established for it. Benefitting from the evolutionary nature of the cataloging scheme, the system has required a minimum of modifications to date. None of these has been substantive, falling more in the category of debugging than in that of design change. Although ACCESS was initiated and installed to satisfy the unique requirements of the USACSSEC, it has general application. It brings order to the conglomeration of technical information on ADP systems and equipment. The three listings that it produces become, in effect, axes for the multi-dimensional volume of information.

CONCLUSION

The USACSSEC Technical Library is recognized as having the most extensive holdings of ADPE manufacturers' literature in the Washington area. No libraries of equal or greater size are known to exist anywhere. It was planned initially that only USACSSEC analysts and technicians would have access to the information in the USACSSEC Library. However, the resulting interest of various organizations of the Department of Defense (DOD), and the fact that this collection provided information that was otherwise unavailable, prompted the Command to open the Library to a selected group of DOD users. This initial relaxation has gradually evolved into provision for all government and military personnel receiving prior clearance from Command Headquarters, USACSSEC, to utilize the Library for research.
Unfortunately, because of the type of material collected, the quantity available, and the constant demand, it has not been possible to permit the lending of materials. At present, approximately eighty personnel from other government agencies use the Library each month for research. Some of the agencies use it each month for evaluation and selection of computers. User reaction is amazement that such a collection of ADP materials exists.

It is not unusual for relatively new and thoroughly dynamic fields of interest to progress so rapidly that efforts to document them adequately lag behind the latest developments. The problem is particularly acute in the information processing field, whose large amount of technical literature is of little value without an efficient cataloging system. USACSSEC has solved some of the information problems in the computer field by examining in detail the special on-the-job requirements of computer system analysts in general. By developing its library in terms of the computer industry, rather than specifically to one Command's requirements, a generalized library system in ADP has evolved. It is hoped that this paper will be a contribution toward a standard approach in cataloging ADP collections and creation of a commonality among ADP technical libraries.

AUTOMATIC PROCESSING OF PERSONAL NAMES FOR FILING

Foster M. PALMER: Associate University Librarian, Harvard University Library, Cambridge, Massachusetts

Describes a method for preparing personal names already in machine readable form for processing by any standard computer sort program, determining filing order insofar as possible from normally available information rather than from special formatting. Prefix recognition is emphasized; multi-word forename entries are a problem area. Provision is made for an edit list of problems requiring human decision. Possible extension of the method to titles is discussed.
This paper describes a method of computerized filing of personal names for display in book catalogs or other lists intended for direct human consultation. The problem is to be distinguished from a related but different one: computerized storage for retrieval by means of a search key, in which machine rather than human convenience can determine the order.

To the extent that filing is a purely mechanistic sorting process, it is ideally suited to computerization. However, it was early recognized that there are many possible complications in machine filing of library entries, even in the relatively straightforward area of personal names. Some of these complications arise from such factors as upper-case codes, diacritic codes, and punctuation; others are the result of library rules or practices that call for departures from strict alphabetical order. While the latter are especially numerous in subject headings and titles, they affect names as well, for example, the custom of filing Mc as if Mac.

While no general review of the literature on machine filing will be attempted here, attention will be called to selected contributions. Nugent (1) described an approach to computerizing the Library of Congress filing rules and pointed out areas where the present rules do not lend themselves to mechanization. Cartwright and Shoffner (2) discussed four major ways of approaching a solution to the problem and concluded that a mixture of different methods would eventually be required. In a later publication Cartwright (3) developed his ideas further and included a brief description of the present writer's then unpublished work. The principal monograph on the subject is that by Hines and Harris (4). They present a suggested filing code departing significantly from those in widespread use and propose that material be encoded in a certain fashion so that it will be ready for computer sorting.
In particular, considerable dependence is placed on distinctions between single, double, and multiple blanks separating words or fields. In a recent paper, Harris and Hines restate their rules briefly and report on their later research (5).

The present paper describes a different, virtually an opposite, approach. Rather than relying on special formatting of the material at the time of encoding, the system described herein attempts to derive the necessary filing information from normally formatted material. Historically, it grew out of a desire to construct improved indexes for use at the Harvard University Library to the body of records distributed by the MARC Pilot Project, in which there were field indicators and a limited number of delimiters within fields, but a general absence of information added expressly for the purpose of filing. While some early work embraced both personal names and titles, it was soon apparent that names by themselves presented a considerable challenge, and further consideration of the even more difficult areas of titles, corporate entries, and subject entries was deferred. A few comments on the possible applicability of the general method to titles will be made later.

The concrete form which the work eventually took was an Autocoder macro instruction for a second generation computer, an IBM 1401. (A macro instruction is a means of calling forth by means of a single instruction a more extensive routine already worked out and placed in the system "library.") Since the 1401 was a fairly small computer, it was important that the algorithm not require an excessive number of instructions, and since the internal speed of the machine was only moderate, it was also important that processing be direct and economical. The method used, however, is by no means limited to a particular computer or a particular language.
A partial version of the algorithm has been written in ADPAC, as an exercise in the evaluation of that language, and run on an IBM 360-65 using MARC II test data.

The system is based on examination of names (previously identified as such by appropriate tags) and development of parallel sort keys consisting only of letters, numerals, and blanks, readily processable by any standard computer sort package designed for alphanumeric information. The only requirements are that blank sort low and that the letters A-Z and the numerals 0-9 sort in their natural order; whether numbers are considered higher or lower than letters does not matter. Processing starts at the beginning of the name and proceeds until one of three conditions prevails: the number of characters examined is equal to the length of the field as specified in the record; the number of characters developed in the sort key has reached a specified cut-off point or the default value of 40; or a delimiter indicating the end of the name, or the end of the name proper, is encountered (a search being then made beyond the delimiter for a date, which, if found, is added to the sort key).

The sort key is derived by transferring letters (or, in the case of a date, numbers) from the source, with occasional modifications as described below, and inserting one of four filing codes at the end of each word or element of the name. In early work, single special characters were used as filing codes, but this was inappropriate as a general solution since the filing order of these characters depended on the collating sequence peculiar to a particular computer. Furthermore, it was inconvenient because it involved changing all blanks to something else, since a blank within a name, with its implication of something to follow, should not file as low as whatever indicates the very end of the name. The idea of using a two-character code, the first always being blank so that any filing code will file ahead of any letter or date, was derived from Nugent (1) and has been followed in all later work. Only three filing codes were actually used in compiling indexes to the MARC I tapes, and in the first description privately circulated by the author (6). However, at least four are now seen to be necessary, actual need to distinguish the second and third not yet having been encountered but being possible:

Code (blank followed by:)    Placement
3                            The end of the name, including date if any.
5                            Between the name proper and a date.
6                            The end of the surname.
7                            The end of any other "word" of the name. (A word is any element followed by a blank, hyphen, comma, or period, except that prefixes which are identified as such are not considered separate words.)

The following examples illustrate the use of the codes and the general workings of the system. In this and later examples, the left hand column gives data in MARC I format (where diacritics are represented by superscript numbers preceding the letters to which they apply, and the equal sign is a delimiter), and the right hand column gives the sort key as derived by the macro.

Arthur                            arthur 3
Arthur, Joseph                    arthur 6joseph 3
Arthur, Joseph,=1875-             arthur 6joseph 51875 3
Arthur, Joseph Charles            arthur 6joseph 7charles 3
Arthur-Behenna, K.                arthur 7behenna 6k 3
Arthur-Petr2os, Gabriele Maria    arthur 7petros 6gabriele 7maria 3
Wilson, William                   wilson 6william 3
Wilson, William,=1923-            wilson 6william 51923 3
Wilson, William Lyne              wilson 6william 7lyne 3
Wilson-Browne, A. E.              wilson 7browne 6a 7e 3
No extra interval to accommodate added entries as distinguished from main entries was left, because the author did not wish to encourage what he regards as an unwise practice. However, those who insist may easily substitute a new series of codes allowing for it. The distinction between end of name and end of surname serves to bring simple forename entries, that is, those consisting of a single word, e.g. Sophocles, ahead of similar surnames, e.g. Sophocles, Evangelinus Apostolides. No serious work has yet been undertaken on the problem of processing complex forenames, but the distinctive tagging of forenames in MARC II has made available a growing body of experimental data, and the codes 1 (and 2 for subjects) are reserved for possible future use in this connection, without any intent of prejudging the question whether complex forename entries should come before similar surnames. It is the view of the author that the filing of complex forename entries is one of the areas in which all librarians are on the most uncertain ground in assessing the preferences and convenience of readers. In handling such entries as Alexander, Mrs., or Maurice, Sister, the algorithm depends on the presence of a delimiter before Mrs. or Sister to avoid filing after Alexander, Milton or Maurice, Robert. Such delimiters were in fact present in the MARC Pilot Project data. Despite the limitations mentioned in dealing with multiple-word forename entries and with surnames lacking forenames, the algorithm is well suited to names in the normal modern pattern, namely a simple or compound surname followed by a comma and one or more given names or initials. Furthermore, very specifically, it deals with prefix names.
Prefixes with apostrophes are taken care of by a general dropping out of apostrophes and other non-significant punctuation:

[L'Isle, Guillaume de]       lisle 6guillaume 7de 3
O'Brian, Robert Enlow        obrian 6robert 7enlow 3

The same feature also handles such names as the following:

Prud'homme, Louis Arthur     prudhomme 6louis 7arthur 3
Ta'Bois, Roland              tabois 6roland 3

Most prefixes, however, are dealt with by a specific search based on examining the first letter of each new "word" of the name. If the element begins with A, B, D, E, F, I, L, M, O, S, T, V, or Z, a branch is made to a prefix-searching routine tailor-made for the particular letter. Taking names beginning with L as an example, if the second character is "e," "a," or "o," a prefix may be present; otherwise the prefix search is discontinued. If still searching and the third character is a blank or a hyphen, a prefix is adjudged to be present. The letters "le," "la," or "lo" are moved to the sort key output field. Three input and two output characters are counted, effectively skipping over the blank or hyphen. Similarly, if the third character is an "s" followed by a blank or a hyphen, "les," "los," or "las" is moved with a count of four input and three output. Otherwise there is no prefix.

La Place, Pierre Antoine de      laplace 6pierre 7antoine 7de 3
Las Cases, Philippe de           lascases 6philippe 7de 3
Le Fanu, Joseph Sheridan         lefanu 6joseph 7sheridan 3
Lo Presti, Salvatore             lopresti 6salvatore 3

Routines for other letters, similar in approach but varying in detail, produce similar results:

Degli Antoni, Carlo              degliantoni 6carlo 3
De La Roche, Mazo                delaroche 6mazo 3
Fitz Gibbon, Constantine         fitzgibbon 6constantine 3
Van der Bijl, Hendrick Johannes  vanderbijl 6hendrick 7johannes 3

The search for prefixes and quasi-prefixes is not limited to the first surname.
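The per-letter search for L can be sketched as follows. This is a hypothetical Python rendering of the character counts described above, not the macro itself; the real implementation has an analogous tailor-made routine for each of the thirteen initial letters.

```python
def close_l_prefix(text):
    """Close up a prefix beginning with L, per the rules above:
    'le', 'la', or 'lo' followed by a blank or hyphen (three input
    characters, two output), or 'les', 'las', 'los' likewise (four
    input, three output).  Case-insensitive; returns the text with
    the separator removed, or unchanged when no prefix is found."""
    low = text.lower()
    if low[:1] == "l" and low[1:2] in ("e", "a", "o"):
        if low[2:3] in (" ", "-"):                      # le / la / lo
            return text[:2] + text[3:]
        if low[2:3] == "s" and low[3:4] in (" ", "-"):  # les / las / los
            return text[:3] + text[4:]
    return text
```

Note that a name like "Lyne" or "Land" falls through untouched: the second or third character fails the test, so no prefix is adjudged present.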
It is, and quite plainly should be, extended to given names:

Bundy, McGeorge    bundy 6macgeorge 3
Bundy, Mary Lee    bundy 6mary 7lee 3

Whether it should be extended to later elements of compound surnames is problematical. Bowing to the fact that filing is as much an art as a science, in practice a compromise was reached: the prefix search was extended to compounds, except when the prefix of the succeeding element begins with D. The exception was made to accommodate the large number of Hispanic names in this pattern, since it seemed clearly preferable to file all the names beginning "Perez de" before any of those beginning "Perez del":

P2erez, Joaqu2in              perez 6joaquin 3
P2erez de Urbel, Justo        perez 7de 7urbel 6justo 3
P2erez del Castillo, Jos2e    perez 7del 7castillo 6jose 3
P2erez Gald2os, Benito        perez 7galdos 6benito 3

Perhaps skipping prefix treatment in subsequent elements should have been made the rule rather than the exception; but an exception would then have been required for "Mc," "St.," and perhaps others. A list of the prefixes and quasi-prefixes sought for is given in Table 1. Note that in some cases the result is considered doubtful, and a special signal is set. In such situations the program can then set another signal within the macro and reprocess the name using alternate rules.

190 Journal of Library Automation Vol. 4/4 December, 1971

Table 1. List of Prefixes, Etc., Found by Special Search

A 1, 4, 7       Den             St. 4, 15
A 2, 4          Der 4, 11, 18   Ste. 16
Ab              Des             Te 4, 11
al 5            Di              Ten 4
Al 3, 4, 6      Do              Ter
An 4, 7         Dos 4, 11       The 1, 4, 8
Ap              Du              Van 1, 17
At              el 5            Van 2, 4, 12, 17
Aus 17          El 3, 4, 6      Van' ... 4, 9
Aus' ... 4, 9   Fitz            Vande
Bar 10          Im              Vanden
Bat 10          In 17           Vander
Ben 10          La              Ver
Da              Las             Von 17
Das 4, 12       Le              Vande
De 17           Les             Vanden
Degli 1         Lo              Vander
Dei             Los             Z 4, 5
Del             M' 4, 14        Zu 17
Della           Mac             Zum
Delle           Mc 13           Zur
Dello           O

1. Only when followed by blank.
2. Only when followed by hyphen.
3. Only when upper case.
4. "Doubt" signal is set.
5. Bypassed, i.e.
dropped out and disregarded.
6. Bypassed if "alternate" signal is on.
7. Bypassed unless "alternate" signal is on.
8. Bypassed if first word.
9. Aus'm and Van 't are closed up to "ausm" and "vant" by the general dropping of apostrophes, but no attempt is made at further special processing, since their rarity would not justify the necessary elaboration of the algorithm.
10. Not treated as prefix if special parameter is present.
11. Not treated as prefix if "alternate" signal is on.
12. Not treated as prefix unless "alternate" signal is on.
13. Expanded to "mac".
14. Expanded to "mac" unless "alternate" signal is on.
15. Expanded to "saint".
16. Expanded to "sainte".
17. Another prefix may follow, as in De La.
18. Previous notes do not apply when preceded by Van or Von.

Diacritical marks on other than the first letter, or capitalization beyond the normal, such as all caps, would prevent proper processing. Except as indicated, lower case is included along with upper, and prefixes followed by a hyphen are treated the same as those followed by a blank. The MARC I corpus included several names with hyphenated prefixes, and fortuitously a method was available with the 1401 for giving the hyphen search almost a "free ride" along with that for the blank. Since the code for hyphen was a single bit, the so-called B bit, and a blank was represented by no bits, a "branch if bit equal" instruction specifying all the other bits (A, 8, 4, 2, and 1) would branch if any character other than blank or hyphen was present. Implementations for other machines may have to devote a disproportionate number of instructions to the search for the rare hyphenated prefixes, or else risk missing them. No doubt some other prefixes could be added to the list.
"Ua," for example, was considered but not included in the actual working macro after examination of a catalog of five million cards showed that only two beginning with these two letters were not for the prefix. The increase in processing time involved in adding another initial letter to the list of those looked for did not seem to be justified. In the program employing the macro for production of an index to names in the MARC Pilot Project data, whenever the "doubt" signal was set, the name was printed on an edit list for human inspection. The name was then reprocessed with the "alternate" signal set and if a different output form was developed, this form also was printed. If the person reviewing the list accepted the first form, no special action was necessary. If the second was preferred, a card with an identifying number and the code 2 was punched; if a hand-made form was needed, this form was entered on a card with the code 3. These cards and the original output tape were then used to produce an edited output tape, in which the alternate forms were dropped unless a card directed otherwise. A second printed listing, re- cording the action taken, was also produced. The doubtful cases identified by the algorithm are not limited to the prefix problems described above. By far the commonest occasion for doubt was the presence of "a," "o," or "ii." Was it a Germanic umlaut, calling for translation for filing purposes to "ae," "oe," or "ue," or was it something else? This is not the place to debate the practice, followed in most American academic libraries, of filing umlauted letters as if spelled out with an "e." The major bibliographies covering the German book trade do so, but most German dictionaries and encydopedias do not; the example of other reference works and indexes is mixed. Since the aim of the work described here was to produce an index of names that could be used comfortably by librarians used to the practice, a means of continuing it was sought. 
However, it would be manifestly improper to insert an "e" if the mark were a diaeresis rather than an umlaut; and, in the opinion of the writer, almost equally improper for Hungarian, Finnish, and Turkish vowels. Even those who do file such vowels in these languages as if they were Germanic do not usually do so for Chinese. It should be noted here that not all transformations of special letters turn on the doubt signal. "Å" is routinely translated to "aa" and Icelandic thorn to "th." Other occasions for signalling doubt include names with a suspiciously high number of words before the first comma. This provision was introduced in an attempt to catch some non-names in the original data which had been wrongly coded, e.g. Women's Association of the St. Louis Symphony. When found, a card with the code D was punched for the edit run to delete these entries entirely. Statistics of processing for the entire corpus of MARC Pilot Project data, as cumulated and to some slight degree edited at the Harvard University Library, will be useful in seeing the edit list in proper perspective. The entire file consisted of 47,884 records, 4,285 of which lacked names. The remaining 43,599 records contained 55,286 names (or alleged names). Of these, 52,372 or 94.7% were judged to be purely routine. Special processing of some sort not involving doubt (e.g., recognition of a compound surname, expansion of "Mc" to "Mac," closing up of an apostrophe or non-doubtful prefix) was performed on 2,283 names, or 4.1%. The total number of doubtful names printed on the edit list was 631, or 1.1%. Somewhat more than half of these (334) resulted in different forms on being reprocessed with the "alternate" signal on. In 562 of the 631 doubtful cases, or 89% of this group, the first or only form printed was accepted, so that no action beyond inspection was necessary.
Only 69 names, or not quite one out of 800 of the whole number, required the punching of a card: 47 to indicate choice of the second form, 14 supplying a hand-made form, and 8 calling for deletion of non-names. Subsequent changes in the macro would have reduced considerably the number of names requiring hand-made forms. It will be instructive to examine some of the names from the edit list to see what types of problems arise. The first selection of actual consecutive names (from LC card number 66-15363 through 66-17297) is rather typical:

Barnard, Douglas St. Paul           barnard 6douglas 7saint 7paul 3
Ekel4of, Gunnar,= 1907-             ekeloef 6gunnar 51907 3  or: ekelof 6gunnar 51907 3
Woolley, Al E.                      woolley 6al 7e 3
Sch4onfeld, Walther H. P.,= 1888-   schoenfeld 6walther 7h 7p 51888 3  or: schonfeld 6walther 7h 7p 51888 3
J4anner, Michael                    jaenner 6michael 3  or: janner 6michael 3
M4uller, Alois,= 1924-              mueller 6alois 51924 3  or: muller 6alois 51924 3
Huang, Y4uan-shan                   huang 6yuean 7shan 3  or: huang 6yuan 7shan 3
M4uller, Kurt,= 1903-               mueller 6kurt 51903 3  or: muller 6kurt 51903 3

Note the dominance of simple umlauts; also, as a curiosity, the fact that all persons named "Al" appear on the list because of the possibility that it might be an unhyphenated Arabic prefix. Note also that Saint is treated as a separate word, not closed up as a prefix. "St." was originally put on the doubtful list with the thought that it might stand for Sankt or Szent instead of Saint, although normal library practice would not use an abbreviation in such cases. Its inclusion on the doubtful list was unexpectedly justified, however, by the occurrence of the name Erlich, Vera St. It seems likely that in this case "St." may stand for a patronymic, perhaps Stojanova or Stefanova, and there may be other occasions on which St. rather than S. is used as an abbreviation for such a name as Stefan (cf. the French use of Ch. rather than simple C.
as an abbreviation for Charles). The only action required for the names in the list above would be to punch a "2" card for the Chinese name Huang, Yuan-shan. Indeed, just as the umlaut is the largest category on the edit list, so the non-umlaut (a diacritic that looks like an umlaut but does not call for insertion of "e") is the commonest occasion for punching an exception card. Occasionally a diaeresis is found:

Lecomte du No4uy, Pierre    lecomte 7du 7nouey 6pierre 3  or: lecomte 7du 7nouy 6pierre 3

More common are certain front vowels in Hungarian, Finnish, or Turkish, or the vowel ü in Chinese as already encountered:

F4oldi, Mih2aly             foeldi 6mihaly 3  or: foldi 6mihaly 3
T4olgyessy, Juraj           toelgyessy 6juraj 3  or: tolgyessy 6juraj 3
Mett4al4a-Portin, Raija     mettaelae 7portin 6raija 3  or: mettala 7portin 6raija 3
N4arv4anen, Sakari          naervaenen 6sakari 3  or: narvanen 6sakari 3
In4on4u, E.                 inoenue 6e 3  or: inonu 6e 3
S4umer, Mine                suemer 6mine 3  or: sumer 6mine 3
Y4u, Ying-shih              yue 6ying 7shih 3  or: yu 6ying 7shih 3

Some libraries avoid the problem by treating all but the last of these as if umlauted, but determination of the correct category can usually be made at sight. Occasionally a name gives pause, for example these two, which both prove to be Swiss and presumably Germanic, although Chonz may be Romansh:

Ch4onz, Selina              choenz 6selina 3  or: chonz 6selina 3
R4uede, Thomas              rueede 6thomas 3  or: ruede 6thomas 3

Somewhat more troublesome are names where some but not all elements are Germanic:

Vogt, Ulya (G4oknil)        vogt 6ulya 7goeknil 3  or: vogt 6ulya 7goknil 3
Ouchterlony, 4Orjan         ouchterlony 6oerjan 3  or: ouchterlony 6orjan 3
Iv2anyi-Gr4unwald, B2ela    ivanyi 7gruenwald 6bela 3  or: ivanyi 7grunwald 6bela 3

Although Vogt is obviously Germanic, Ulya Goknil is equally obviously not, and therefore the decision is that no umlaut is present. Orjan, on the other hand, is a Scandinavian forename, to be treated as umlauted even though coupled with a surname of Scottish Gaelic origin.
Bela Ivanyi-Grunwald is a more difficult case. Grunwald is of course Germanic in origin, but can it be regarded as Magyarized? In English we might assume that such a name is Anglicized when the bearer starts writing it Grunwald or Gruenwald. However, the case is not so clear in Hungarian, since that language also has the letter "ü." Discussion of such a point may seem to split hairs, but it does involve a significant difference between manual and machine systems. In a manual system, the question of whether to file as Ivanyi-Grunwald or as Ivanyi-Gruenwald would arise only in the exceedingly unlikely event that another name which would file between the two also occurred in the corpus. In a machine system, however, any difference, even this late in a distinctive name, could result in the various works of the author being misfiled among themselves, or a work about him filed before one by him. Use of different codes to represent the same graphic, umlaut on the one hand or diaeresis or other non-umlaut on the other, would drastically reduce both the number of doubtful names and the number of those for which an exception procedure is required. The Harvard College Library actually follows this practice. The Library of Congress experimented with it, but found that catalogers were reluctant in some cases to make the decision. Contemplation of the case of Bela Ivanyi-Grunwald gives the author more sympathy with this reluctance than he originally felt. In attempting to evaluate the method described above, one must acknowledge both strong points and limitations. On the one hand it is very gratifying to see AEsopus and [Aesopus] falling together despite differences in the capitalization of the "e" and the bracketing, and to find such sequences as the following, all without even being referred to the edit list under the rules then prevailing:

Aziz, Khursheed Kamal           aziz 6khursheed 7kamal 3
Aziz Ahmad                      aziz 7ahmad 3
al-Azm, Sadik J.                azm 6sadik 7j 3
Azrael, Jeremy R.               azrael 6jeremy 7r 3
Ba Maw, U                       ba 7maw 6u 3
Baab, Clarence Theodore         baab 6clarence 7theodore 3

Delgado, David J.               delgado 6david 7j 3
Del Grande, John Joseph         delgrande 6john 7joseph 3
Delhom, Louis A.                delhom 6louis 7a 3
Delieb, Eric                    delieb 6eric 3
DeLise, Knoxie C.               delise 6knoxie 7c 3
De Lisser, R. Lionel            delisser 6r 7lionel 3
Dell, Ralph Bishop              dell 6ralph 7bishop 3
Dellinger, Dave                 dellinger 6dave 3
Dell'Isola, Frank               dellisola 6frank 3
Del Mar, Alexander              delmar 6alexander 3
Delmar, Anton                   delmar 6anton 3
Delmar-Morgan, Edward Locker    delmar 7morgan 6edward 7locker 3

While it is certainly true that the system cannot survive without some provision for referring doubtful questions to a human editor, the number of these depends to a considerable extent on the filing and coding policies followed. Provided forename entries are coded as such, the system does a good job of identifying possible problems. (Presently, all multiple-word forename entries are considered doubtful.) "Ua" has already been cited as an example of a prefix deliberately omitted, and there are others which could be added at any time it is thought worthwhile. A more troublesome situation, pointed out by Kelley Cartwright, is the possible occurrence of "Van" as a non-final element of an unhyphenated Vietnamese name. The only way this could be prevented from misfiling by merging it with the next element would be to throw all "Vans," including the numerous ones of Dutch origin, into the doubtful category, expanding the edit list more than twenty percent. This did not seem advisable, particularly since normal library usage is to hyphenate Vietnamese compound names. Up to this point the evaluation is quite favorable.
The system can correctly process a very large proportion of names, including some which involve quite sophisticated points, without reference to a human editor, and it can call virtually all the rest to the attention of an editor. However, human review of problems means that there will be occasions when borderline cases are decided in different ways. If a permanent machine file of all established forms of names in the system is kept, both forms of each doubtful name could be checked against it so that decisions already made would not have to be repeated, thus saving the time of the editor as well as avoiding the hazard of differing decisions. It would of course be very expensive to keep such a file just for this purpose, but a file of this type would probably form a part of a comprehensive mechanized bibliographic system anyway. Another area in which a mixed report would have to be given to the system is its extensibility to types of headings other than names. In work conducted on the same principles with a few thousand early titles from the MARC Pilot Project, there were only two conspicuous problems, one of which may not in fact be a problem: the filing of numbers as such rather than as if they were spelled out in the language of the title. True, the particular algorithm then in use did not provide for bringing numbers of differing length into logical order ("50 great ghost stories" before "200 years of watercolor painting in America"), but this is a readily attainable refinement. The other problem is more refractory and is exemplified by titles beginning with prefix names, for example "De Gaulle," "De Soto," and "Van Gogh." Names within titles could not receive the usual name treatment since there was no way of identifying them as such, and therefore the prefixes were filed as separate words.
Furthermore, while MARC Pilot Project authors were quite a cosmopolitan lot, the titles were almost entirely in English. Therefore, removal of initial articles was not much of a problem. There did not happen to be any work beginning "A to Z of ... ". However, there was a book which, although in English and so coded, had a title beginning with a Spanish article: "La vida," by the late Oscar Lewis. In working toward automatic removal of initial articles from titles, the usual assumption is that machine coding of the language of the work is available and will be checked first. This seems desirable both because it is probably more efficient in machine time than checking every title against a long list of possible articles in many languages, and because words that are articles in one language are not necessarily so in another. Most occurrences of initial "die" are probably German articles, but some are other parts of speech in English, for example "Die Casting" or "Die like a Dog." If the umlaut is the common problem in names, the initial indefinite article which is the same as the numeral "one" in several languages may well be the most frequent occasion for doubt in processing of titles. "Un" or "ein" will usually mean "A," to be dropped; but will sometimes mean "One," to be kept. There are certainly other problems, in addition to the one with prefix names already mentioned, including some that give trouble even in manual filing: "Charles the First," "Charles II," "Charles V et son temps." It may be that at some point in the cataloging process a reviser will have to be on the lookout for certain of these special situations and add flags to indicate that a title includes a prefix name, or that it begins with an article which would not be found by program, or that it does not begin with an article although it appears to do so, or that for some other reason it calls for a hand-made key.
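The language-code-first approach to article removal can be sketched as follows. The language codes and article lists here are illustrative assumptions, not a complete MARC table.

```python
# Illustrative article lists keyed by a language code; the code is
# consulted before any word is tested, as the text recommends.
ARTICLES = {
    "eng": {"a", "an", "the"},
    "ger": {"der", "die", "das", "ein", "eine"},
    "spa": {"el", "la", "los", "las", "un", "una"},
}

def strip_initial_article(title, lang):
    """Drop the first word of a title only if it is an article in the
    record's coded language; 'Die Casting' under an English code is
    therefore left alone, while a German-coded title beginning 'Die'
    loses its article."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES.get(lang, set()):
        return rest
    return title
```

The "La vida" anecdote shows the limit of this approach: a title miscoded (or legitimately coded) as English keeps its Spanish article, which is exactly the sort of case the proposed manual flags would have to catch.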
The system described is not an absolute system, but absolute systems have their own tyrannies. If, as the author believes, Cartwright and Shoffner (2) are correct in thinking that a mixture of methods will be required in actual book catalog projects, then a system along the lines of the one described may well be a useful part of the mix.

REFERENCES

1. Nugent, William R.: "The Mechanization of the Filing Rules for the Dictionary Catalogs of the Library of Congress," Library Resources & Technical Services, 11 (Spring 1967), 145-166.
2. Cartwright, Kelley L.; Shoffner, Ralph M.: Catalogs in Book Form ([Berkeley]: Institute of Library Research, University of California, 1967), pp. 24-27.
3. Cartwright, Kelley L.: "Mechanization and Library Filing Rules," Advances in Librarianship, 1 (1970), 59-94.
4. Hines, Theodore C.; Harris, Jessica L.: Computer Filing of Index, Bibliographic, and Catalog Entries (Newark, N.J.: Bro-Dart Foundation, [1966]).
5. Harris, Jessica L.; Hines, Theodore C.: "The Mechanization of the Filing Rules for Library Catalogs: Dictionary or Divided," Library Resources & Technical Services, 14 (Fall 1970), 502-516.
6. Palmer, Foster M.: A Macro Instruction to Process Personal Names for Filing ([Cambridge, Mass.]: Harvard University Library, 1970). A copy of this document, which contains an Autocoder listing of the actual working macro, has been deposited with the National Auxiliary Publications Service, from which it can be obtained on microfiche (NAPS 01680). In this version there are only three codes, 2 corresponding to 3 as used in this paper, 4 to both 5 and 6, and 6 to 7. There are also a few differences in the treatment of particular prefixes. The macro is made up of 579 cards, of which 125 are comments only.

AN ALGORITHM FOR COMPACTION OF ALPHANUMERIC DATA

William D. SCHIEBER, George W.
THOMAS: Central Library and Documentation Branch, International Labour Office, Geneva, Switzerland

Description of a technique for compressing data to be placed in computer auxiliary storage. The technique operates on the principle of taking two alphabetic characters frequently used in combination and replacing them with one unused special-character code. Such one-for-two replacement has enabled the ILO to achieve a rate of compression of 43.5% on a data base of approximately 40,000 bibliographic records.

INTRODUCTION

This paper describes a technique for compacting alphanumeric data of the type found in bibliographic records. The file used for experimentation is that of the Central Library and Documentation Branch of the International Labour Office, Geneva, where approximately 40,000 bibliographic records are maintained on line for searches done by the Library for its clients. Work on the project was initiated in response to economic pressure to conserve the direct-access storage space taken by this particularly large file. In studying the problem of how to effect compaction, several alternatives were considered. The first was a recursive bit-pattern recognition technique of the type developed by DeMaine (1,2), which operates independently of the data to be compressed. This approach was rejected because of the apparent complexity of the coding and decoding algorithms, and also because early analyses indicated that further development of the second type of approach might ultimately yield higher compression ratios. The second type of approach involves the replacement, by shorter non-data strings, of longer character strings known to exist with a high frequency in the data. This technique is data dependent and requires an analysis of what is to be encoded.
One such method is to separate words into their component parts — prefixes, stems, and suffixes — and to effect compression by replacing these components with shorter codes. There have been several successful algorithms for separating words into their components. Salton (3) has done this in connection with his work on automatic indexing. Resnikoff and Dolby (4,5) have also examined the problem of word analysis in English for computational linguistics. Although this method appears to be viable as the basis of a compaction scheme, it was here excluded because ILO data was in several languages. Moreover, Dolby and Resnikoff's encoding and decoding routines require programs that perform extensive word analysis and dictionary look-up procedures that ILO was not in a position to develop. The actual requirements observed were twofold: that the analysis of what strings were to be encoded be kept relatively simple, and that the encoding algorithm combine simplicity and speed, presumably by minimizing the amount of dictionary look-up required to encode and decode the selected string. One of the most straightforward examples of the use of this technique is the work done by Snyderman and Hunt (6), which involves replacement of two data characters by single unused computer codes. However, the algorithm used by them does not base the selection of these two-character pairs (called "digrams") on their frequency of occurrence in the data. The technique described here is an attempt to improve and extend the concept by encoding digrams on the basis of frequency. The possibility of encoding longer character strings is also examined. Three other related discussions of data compaction appear in papers by Myers et al. (7) and by DeMaine and his colleagues (8,9).

THE COMPRESSION TECHNIQUE

The basic technique used to compact the data file specifies that the most frequently occurring digrams be replaced by single unused special-character codes.
On an eight-bit-character machine of the type used, there are a total of 256 possible character codes (bytes). Of this total only a small number are allocated to graphics (that is, characters which can be reproduced by the computer's printer). In addition, not all of the graphics provided for by the computer manufacturer appear in the user's data base. Thus, of the total code set, a large portion may go unused. Characters that are unallocated may be used to represent longer character strings. The most elementary form of substitution is the replacement of specific digrams. If these digrams can be selected on the basis of frequency, the compression ratio will be better than if selection is done independent of frequency. This requires a frequency count of all digrams appearing in the data, and a subsequent ranking in order of decreasing frequency. Once the base character set is defined, and the digrams eligible for replacement are selected, the algorithm can be applied to any string of text. The algorithm consists of two elements: encoding and decoding. In encoding, the string to be encoded is examined from left to right. The initial character is examined to determine if it is the first of any encodable digram. If it is not, it is moved unchanged to the output area. If it is a possible candidate, the following character is checked against a table to verify whether or not this character pair can be replaced. If replacement can be effected, the code representing the digram is moved to the output area. If not, the algorithm then moves on to treat the second character in precisely the same way as the first. The algorithm continues, character by character, until the entire string has been encoded. Following is a step-by-step description of the encoding element.

1) Load length of string into a counter.
2) Set pointer to first character in string.
3) Check to determine whether the character pointed to can occur in combination. If the character does not occur in combination, point to next character and repeat step 3.
4) If the character can occur in combination, check the following character in a table of valid combinations with the first character. If the digram cannot be encoded, advance pointer to next character and return to step 3.
5) If the digram is codable, move preceding non-codable characters (if any) to the output area, followed by the internal storage code for the digram.
6) Decrease the string length counter by one, advance pointer two positions beyond current value and return to step 3.

In the following example assume that only three digrams are defined as codable: AB, BC, and DE. Assume also that the clear text to be encoded is the six-character string ABCDEF. After encoding, the coded string would appear as:

AB C DE F

A horizontal line is used to represent a coded pair; a dot shows a single (non-combined) character. The encoded string above is of length four. Note that although BC was defined as an encodable digram, it did not combine in the example above because the digram AB was already encoded as a pair. The characters C and F do not combine, so they remain uncoded. Note also that if the digram AB had not been defined as codable, the resultant combination would have been different in this case:

A BC DE F

The decoding algorithm serves to expand a compressed string so that the record can be displayed or printed. As in the encoding routines, decoding of the string goes from left to right. Bytes in the source string are examined one by one. If the code represents a single character, the print code for that character is moved to the output string. If the code represents a digram, the digram is moved to the output string. Decoding proceeds byte by byte as follows until end of string is reached:

1) Load string length into counter.
2) Set pointer to first byte in record.
3) Test character.
If the code represents a single character, point to the next source byte and retest.
4) If the code represents a digram: move all bytes (if any) up to the coded digram, and move in the digram.
5) Increase the length value by one, point to the next source byte, and continue with step 3.

APPLICATION OF THE TECHNIQUE

The algorithm, when used on the data base of approximately 40,000 records, was found to yield 43.5% compaction. The file contains bibliographic records of the type shown in Figure 1.

413.5 1970 70A1350 WARNER M STONE M THE DATA BANK SOCIETY - ORGANIZATIONS, COMPUTERS AND SOCIAL FREEDOM. LONDON, GEORGE ALLEN AND UNWIN, <1970>. 244 P. CHARTS. /SOCIAL RESEARCH/ INTO THE POTENTIAL THREAT TO PRIVACY AND FREEDOM (/HUMAN RIGHT/S) THROUGH THE MISUSE OF /DATA BANK/S - EXAMINES /COMPUTER/ BASED /INFORMATION RETRIEVAL/, THE IMPACT OF COMPUTER TECHNOLOGY ON BRANCHES OF THE /PUBLIC ADMINISTRATION/ AND /HEALTH SERVICE/S IN THE /USA/ AND THE /UK/ AND CONCLUDES THAT, IN ORDER TO PROTECT HUMAN DIGNITY, THE NEW POWERS MUST BE KEPT IN CHECK. /BIBLIOGRAPHY/ PP. 236 TO 242 AND /REFERENCE/S. ENGL

Fig. 1. Sample Record from Test File.

Each record contains a bibliographic segment as well as a brief abstract containing descriptors placed between slashes for computer identification. A large amount of blank space appears on the printed version of these records; however, the uncoded machine readable copy does not contain blanks, except between words and as filler characters in the few fields defined as fixed-length. The average length of a record is 535 characters (10).

The valid graphics appearing in the data are shown in Table 1, along with the percentage of occurrence of each character throughout the entire file.

Table 1. Single-Character Frequency
Graphic   %      Graphic   %      Graphic   %      Graphic   %      Graphic   %
b       14.87    /        4.32    H        1.58             0.63    8        0.31
E        7.63    C        3.48             1.52    W        0.50    (        0.28
N        6.38    L        3.32    '        1.52    2        0.42    )        0.28
I        6.01    D        2.32    1        1.08    K        0.42    +        0.21
A        6.01    U        2.21    V        0.91    3        0.40    J        0.15
O        5.86    P        2.12    B        0.87    5        0.37    X        0.14
T        5.50    M        2.02    9        0.83    7        0.37    Z        0.13
R        4.82    F        1.61    Y        0.82    0        0.35    Q        0.08
S        4.61    G        1.58    6        0.81    4        0.34    Misc.    0.01

As might be expected, the blank (b) occurs most frequently in the data because of its use as a word separator. The slash occurs more frequently than is normal because of its special use as a descriptor delimiter. It should also be noted that the data contains no lower-case characters. This is advantageous to the algorithm because it considerably lessens the total number of possible digram combinations, and because the absence of 26 graphics allows the inclusion of 26 additional coded pairs; as a result, a larger proportion of the file is codable in the limited set chosen as codable pairs.

In the file used for compaction there are 58 valid graphics. Allowing one character for special functions leaves 197 unallocated character codes (of a total of 256 possible). A digram frequency analysis was performed on the entire file and the digrams ranked in order of decreasing frequency. From this list the first 197 digrams were selected as those which were eligible for replacement by single-character codes. Table 2 shows these "encodable" digrams arranged by lead character.

The algorithm was programmed in Assembler language for use on an IBM 360/40 computer. The encoding element requires approximately 8,000 bytes of main storage; the decoding element requires approximately 2,000 bytes.

In order to obtain data on the amount of computer time required to encode and decode the file, the following tests were performed. To find the encoding time, the file was loaded from tape to disk. The tape copy of the file was uncoded, the disk copy compacted. Loading time for 41,839 records was 52 minutes and 51 seconds.
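The encoding and decoding elements described earlier can be sketched in Python. This is an illustration only: the production version was written in Assembler, and the one-character codes "1", "2", "3" below are readable stand-ins for the unallocated byte values the article assigns.

```python
def encode(text, digram_codes):
    """Greedy left-to-right digram substitution (the encoding element).

    digram_codes maps each encodable digram to a single substitute
    character, standing in for the otherwise-unused byte codes."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in digram_codes:        # steps 3-5: codable pair found
            out.append(digram_codes[pair])
            i += 2                      # step 6: skip both characters
        else:                           # single character passes through
            out.append(text[i])
            i += 1
    return "".join(out)

def decode(coded, digram_codes):
    """The decoding element: expand substitute codes back into digrams."""
    expand = {code: pair for pair, code in digram_codes.items()}
    return "".join(expand.get(ch, ch) for ch in coded)

# The worked example from the text: only AB, BC, and DE are codable.
codes = {"AB": "1", "BC": "2", "DE": "3"}
assert encode("ABCDEF", codes) == "1C3F"                  # length four
assert decode(encode("ABCDEF", codes), codes) == "ABCDEF"
```

As in the worked example, BC fails to combine because AB has already consumed the B; dropping AB from the table yields A, BC, DE, F instead.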
The same tape-to-disk operation without encoding took 28:08. The time difference (24:43) represents encoding time for 41,839 records, or .035 seconds per record. A decoding test was done by unloading the previously coded disk file to tape. The time taken was 41:52, versus a time of 20:20 for unloading an uncompacted file. The time difference (21:32) represents decoding time for 41,839 records, or .031 seconds per record.

The compaction ratio, as indicated above, was 43.5 per cent. For purposes of comparison, the algorithm developed by Snyderman and Hunt (6) was tested and found to yield a compaction ratio of 32.5% when applied to the same data file.

Table 2. Most Frequently Occurring Digrams (b represents the blank)

Lead Char.   Eligible Digrams
A            AB AC AD AG AI AL AM AN AP AR AS AT Ab
B            BL BO
C            CA CE CH CI CL CO CT CU Cb C.
D            DE DI DU Db D/
E            EA EC ED EF EL EM EN EP ER ES ET EV Eb E/
F            FE FI FO FR Fb
G            GE GL GR Gb G/
H            HA HE HI HO Hb
I            IA IC IE IL IN IO IS IT IV
L            LA LE LI LL LO LU Lb
M            MA ME MI MM MU Mb
N            NA NC ND NE NG NI NO NS NT Nb N/
O            OC OD OF OG OL OM ON OP OR OU OV Ob
P            PA PE PL PO PR P.
R            RA RE RI RK RN RO RS RT RU RY Rb R/
S            SA SE SI SO SP SS ST SU Sb S, S.
T            TA TC TE TH TI TO TR TS TU TY Tb T/
U            UC UD UL UN UR US UT
V            VA VE VI
W            WO
Y            Yb Y/
b            bA bB bC bD bE bG bI bL bM bN bO bP bR bS bT bU bW bb b/ b- b(
/            /b /A /C /E /I /L /M /P /R /S /T /,
,            ,b
.            .b
-            -b
)            ),

POSSIBLE EXTENSION OF THE ALGORITHM

Currently the compression technique encodes only pairs of characters. There might be good reason to extend the technique to the encoding of longer strings, provided a significantly higher compaction ratio could be achieved without undue increase in processing time. One could consider encoding trigrams, quadrigrams, and up to n-grams. The English word "the", for example, may occur often enough in the data to make it worth coding.
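The frequency analysis that produced Table 2, and that any n-gram extension would have to generalize, can be sketched as follows. The figure of 197 free codes follows the text; the sample records in the test are invented, not drawn from the actual file.

```python
from collections import Counter

def select_codable_digrams(records, free_codes=197):
    """Count every adjacent character pair in the file, rank the pairs
    in order of decreasing frequency, and keep one digram for each
    unallocated character code."""
    counts = Counter()
    for text in records:
        counts.update(text[i:i + 2] for i in range(len(text) - 1))
    return [pair for pair, _ in counts.most_common(free_codes)]

def compaction_percent(original_length, coded_length):
    """Per cent of storage saved by the substitution."""
    return 100.0 * (original_length - coded_length) / original_length
```

For example, a file compressed from 1,000 to 565 characters gives compaction_percent(1000, 565) == 43.5, the figure reported for the 40,000-record test file.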
The arguments against encoding longer strings are several. Prime among these is the difficulty of deciding what is to be encoded. An analysis of digrams is a relatively straightforward affair, whereas an analysis of trigrams and longer strings is considerably more costly, because there are far more combinations. Furthermore, if longer strings are to be encoded, the algorithms for encoding and decoding become more complex and time-consuming to employ.

One approach to this type of extension is to take a particular type of character string, namely a word, and to encode certain words which appear frequently. A test of this technique was made to encode particular words in the data: descriptors. All descriptors (about 1,200 in number) appear specially marked by slashes in the abstract field of the record. Each descriptor (including the slashes) was replaced by a two-character code. After replacement, the normal compaction algorithm was applied to the record. A compaction ratio of 56.4% was obtained when encoding a small sample of twenty records (10,777 characters).

The specific difficulty anticipated in this extension is the amount of either processing time or storage space which the decoding routines would require. If the look-up table for the actual descriptor values were located on disk, the time to retrieve and decode each record might be rather long. On the other hand, if the look-up table were in main storage at the time of processing, its size might exclude the ability to do anything else, particularly when on-line retrieval is done in an extremely limited amount of main storage area. A partial solution to this problem might be to keep the look-up tables for the most frequently occurring terms in main storage and the others on disk. At present further analysis is being done to determine the value of this approach.
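The descriptor-replacement test might be sketched as below. The regex approach and the code table are assumptions for illustration; the report does not give the actual replacement mechanism, and the two-character codes here are hypothetical.

```python
import re

# Hypothetical codes for a few descriptors; the real test assigned
# two-character codes to roughly 1,200 of them.
DESCRIPTOR_CODES = {"/DATA BANK/": "\x80\x01", "/COMPUTER/": "\x80\x02"}

def encode_descriptors(text, codes=DESCRIPTOR_CODES):
    """Replace each /DESCRIPTOR/ (slashes included) with its assigned
    two-character code; descriptors without a code pass through
    unchanged, and normal digram compaction would then follow."""
    return re.sub(r"/[^/]+/",
                  lambda m: codes.get(m.group(0), m.group(0)),
                  text)
```

With a printable code table such as {"/DATA BANK/": "D1"}, the abstract fragment "MISUSE OF /DATA BANK/S" becomes "MISUSE OF D1S".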
CONCLUSIONS

The compaction algorithm performs relatively efficiently given the type of data used in the test data base (i.e., data without lower-case alphabetics, having a limited number of special characters, in primarily English text). The times for decoding individual records (.031 sec/record) indicate that on a normal print or terminal display operation, no noticeable increase in access time will be incurred.

However, several types of problems are encountered when treating other kinds of data. Since the algorithm works on the basis of replacing the most frequently occurring n-grams by single-byte codes, the compaction ratio is dependent on the number of codes that can be "freed up" for n-gram representation. The more codes that can be reallocated to n-grams, the better the compaction. Data which would pose complications to the algorithm, as currently defined, can be separated for discussion as follows: 1) data containing both upper- and lower-case characters (as well as a limited set of special characters), and 2) data which might possibly contain a wide variety of little-used special graphics.

If lower-case characters are used, a possible way to encode data using this technique is to harken back to the time-honored method of representing lower-case characters with upper-case codes, and upper-case characters by their value preceded by a single shift code (e.g., #ACCESS for Access). The digram formed by the shift code and the blank would undoubtedly figure relatively high on the frequency list, making it eligible as an encodable digram.

The second problem occurs when one attempts to compact data having a large set of graphics. A good example of this is bibliographic data containing a wide variety of little-used characters of the type now being provided for in the MARC tapes (11) issued by the U.S. Library of Congress (such as the Icelandic Thorn).
Normally, representation of these graphics is done by allocating as many codes as required from the possible 256-code set. Since the compaction ratio is dependent on the number of unallocated internal codes, a possible solution to this dilemma might be to represent little-used graphics by multi-byte codes, which would free the codes for representation of frequently occurring n-grams.

Further, it is noticeable that the more homogeneous the data, the higher the compression ratio. This means that data all in one language will encode better than data in many languages. There is, unfortunately, no ready solution to this problem, given the constraints of this algorithm. In dealing with heterogeneous data one must be prepared to accept a lower compression factor.

Without doubt, the ability to effect a savings of around 40% in storage space is significant. The price for this ability is computer processing time, and the more complex the encoding and decoding routines, the more time is required. There is a calculable break-even point at which it becomes economically more attractive to buy x amount of additional storage space than to spend the equivalent cost on data compaction. Yet at the present cost of direct-access storage, compaction may be a possible solution for organizations with large data files.

REFERENCES

1. Marron, B. A.; DeMaine, P. A. D.: "Automatic Data Compression," Communications of the ACM, 10 (November 1967), 711-715.
2. DeMaine, P. A. D.; Kloss, K.; Marron, B. A.: The SOLID System III: Alphanumeric Compression. (Washington, D. C.: National Bureau of Standards, 1967). (Technical Note 413).
3. Salton, G.: Automatic Information Organization and Retrieval (New York: McGraw-Hill, 1968).
4. Resnikoff, H. L.; Dolby, J. L.: "The Nature of Affixing in Written English," Mechanical Translation, 8 (March 1965), 84-89.
5. Resnikoff, H. L.; Dolby, J.
L.: "The Nature of Affixing in Written English," Mechanical Translation, 9 (June 1966), 23-33.
6. Snyderman, Martin; Hunt, Bernard: "The Myriad Virtues of Text Compaction," Datamation (December 1, 1970), 36-40.
7. Myers, W.; Townsend, M.; Townsend, T.: "Data Compression by Hardware or Software," Datamation (April 1966), 39-43.
8. DeMaine, P. A. D.; Kloss, K.; Marron, B. A.: The SOLID System II: Numeric Compression. (Washington, D. C.: National Bureau of Standards, 1967). (Technical Note 413).
9. DeMaine, P. A. D.; Marron, B. A.: "The SOLID System I: A Method for Organizing and Searching Files." In Schecter, G. (Ed.): Information Retrieval: A Critical View. (Washington, D. C.: Thompson Book Co., 1967).
10. Schieber, W.: ISIS (Integrated Scientific Information System; A General Description of an Approach to Computerized Bibliographical Control). (Geneva: International Labour Office, 1971).
11. Books: A MARC Format; Specification of Magnetic Tapes Containing Monographic Catalog Records in the MARC II Format. (Washington, D. C.: Library of Congress, Information Systems Office, 1970.)

TITLE-ONLY ENTRIES RETRIEVED BY USE OF TRUNCATED SEARCH KEYS

Frederick G. KILGOUR, Philip L. LONG, Eugene B. LIEDERMAN, and Alan L. LANDGRAF: The Ohio College Library Center, Columbus, Ohio.

An experiment testing the utility of truncated search keys as inquiry terms in an on-line system was performed on a file of 16,792 title-only bibliographic entries. Use of a 3,3 key yields eight or fewer entries 99.0% of the time.

A previous paper (1) established that truncated derived search keys are efficient in retrieval of entries from a name-title catalog. This paper reports a similar investigation into the retrieval efficiency of truncated keys for extracting entries from an on-line, title-only catalog; it is assumed that entries retrieved would be displayed on an interactive terminal.
Earlier work by Ruecking (2), Nugent (3), Kilgour (4), Dolby (5), Coe (6), and Newman and Buchinski (7) comprised investigations of search keys designed to retrieve bibliographic entries from magnetic tape files. The earlier paper in this series and the present paper investigate retrieval from on-line files in an interactive environment. Similarly, the work of Rothrock (8) inquired into the efficacy of derived truncated search keys for retrieving telephone directory entries from an on-line file.

Since the appearance of the previous paper, the Ohio State University Libraries have developed and activated a remote catalog access and circulation control system employing a truncated derived search key similar to those described in the earlier paper. However, OSU adopted a 4,5 key consisting of the first four characters of the main entry and the first five characters of the title, excluding initial articles and a few other nonsignificant words. Whereas the OSU system treats the name and title as a continuous string of characters, the experiments reported in this and the previous paper deal only with the first word in the name and title, articles always being excluded.

The Bell System has also recently activated a Large Traffic Experiment in the San Francisco Bay area. The master file in this system contains 1,300,000 directory entries. The system utilizes truncated derived keys like those investigated in the present experiments.

MATERIALS AND METHODS

The file used in this experiment was described in the earlier paper (1), except that this experiment investigates the title-only entries. The same programs used in the name-title investigation were used in this experiment; the title-only entries were edited so that the first word of the title was placed in the name field and the remaining words in the title field. As was the case formerly, it was necessary to clean up the file.
Single-word titles often carried in the second, or title, field such expressions as ONE YEAR SUBSCRIPTION or VOL 16 1968. In addition there were spurious character strings that were not titles, and in such cases the entire entry was removed from the file. Thereby, the original 17,066 title entries were reduced to 16,792.

The truncated search keys derived from these title-only entries consist of the initial characters of the first word of the title and of the second word of the title. If there was no second word, blanks were employed. If either the first or second word contained fewer characters than the key to be derived, the key was left-justified and padded out with blanks.

To obtain a comparison of the effectiveness of truncated search keys derived from title-only entries as related to keys derived from name-title entries, a name-title entry file of the same number of entries (16,792) was constructed. A series of random numbers larger than the number of entries in the original name-title file (132,808) was generated, and one of the numbers was added to each of the 132,808 name-title entries in sequence. Next the file was sorted by number so that a randomized file was obtained. Then the first 16,792 name-title entries were selected. The same program analyzed keys derived from this file.

RESULTS

Table 1 presents the maximum number of entries to be expected in 99% of replies for the file of 16,792 title-only entries as well as for the name-title file containing the same total of entries. For example, when a large number of random requests are put to the title-only file using a 3,3 search key, the prediction is that 99.0% of the time, eight or fewer entries will be returned. However, in the case of the name-title file, only two entries will be returned 99.3% of the time. The 3,3 key produced only thirteen replies (.12% of the total number of 3,3 keys) containing twenty-one or more entries.
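The key derivation just described can be sketched in Python. The stop list here is illustrative; the experiments excluded articles, and the OSU variant also dropped a few other nonsignificant words.

```python
def truncated_key(title, m=3, n=3, articles=("A", "AN", "THE")):
    """Derive an m,n search key from a title: the first m characters of
    the first word and the first n of the second, excluding initial
    articles and blank-padding short or missing words."""
    words = title.upper().split()
    while words and words[0] in articles:   # articles always excluded
        words.pop(0)
    first = words[0] if words else ""
    second = words[1] if len(words) > 1 else ""
    return first[:m].ljust(m) + "," + second[:n].ljust(n)
```

For instance, truncated_key("JOURNAL OF LIBRARY AUTOMATION") gives "JOU,OF " (with a trailing pad blank), and a one-word title such as "KIM" pads the entire second part with blanks.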
The highest number of entries for a single reply for the 3,3 key was 235 ("JOU,OF" derived from JOURNAL OF). The next highest number of entries was 88 ("ADV,IN" for ADVANCES IN).

Table 1. Maximum Number of Entries in 99% of Replies

            Title-Only Entries            Name-Title Entries
Search      Maximum Entries   Percent     Maximum Entries   Percent
Key         Per Reply         of Time     Per Reply         of Time
2,2                           99.1        7                 99.0
2,3                           99.1        4                 99.6
2,4         11                99.0        3                 99.5
3,2          9                99.1        3                 99.2
3,3          8                99.0        2                 99.3
3,4          8                99.1        2                 99.5
4,2          8                99.1        2                 99.2
4,3          7                99.0        2                 99.6
4,4          7                99.1        2                 99.7

DISCUSSION

The two words from which the keys are derived in name-title entries constitute a two-symbol Markov string of zero order, since the name string and title string are uncorrelated. However, the two words from which keys are derived in the title-only entry are first-order Markov strings, since they are consecutive words from the title string and are correlated. The consequence of these two circumstances on the effectiveness of derived keys is clearly presented in Table 1. The keys from name-title entries consistently produce fewer maximum entries per reply. Therefore, it is desirable to derive keys from zero-order Markov strings wherever possible.

The Ohio State University Libraries contain over two and a quarter million volumes, but on 9 February 1971 there were only 47,736 title-only main entries in the catalog. The file used in the present experiment is 35% of the size of the OSU file. Since 99% of the time the 3,3 key yields eight or fewer titles, it is clear that such a key will be adequate for retrieval for library on-line, title-only catalogs. The 3,3 key also possesses the attractive quality of eliminating the majority of human misspelling, as pointed out in the earlier paper (1). There remains, however, the unsolved problem of the efficient retrieval of such titles as those beginning with "Journal of" and "Advances in".
It appears that it will be necessary to devise a special algorithm for those relatively few titles that produce excessively high numbers of entries in replies.

In the previous investigation it was found that a 3,3 key yielded five or fewer replies 99.08% of the time from a file of 132,808 name-title entries. Table 1 shows that for a file of only 16,792 entries the 3,3 key produces two or fewer replies 99.3% of the time. These two observations suggest that as a file of bibliographic entries increases, the maximum number of entries per reply does not increase in a one-to-one ratio, since the maximum number of entries rose only from two to five while the total size of the file increased in the ratio of one to approximately eight. Further research must be done in this area to determine the relative behavior of derived truncated keys as their associated file sizes vary.

CONCLUSION

This experiment has produced evidence that a series of truncated search keys derived from a first-order Markov word string in a bibliographic description yields a higher number of maximum entries per reply than does a series derived from a zero-order Markov string. However, the results indicate that the technique is nonetheless sufficiently efficient for application to large on-line library catalogs. Use of a 3,3 search key yields eight or fewer entries 99.0% of the time from a file of 16,792 title-only entries.

ACKNOWLEDGMENT

This study was supported in part by National Agricultural Library contract 12-03-01-5-70 and by Office of Education contract OEC-0-72-2289 (506).

REFERENCES

1. F. G. Kilgour; P. L. Long; E. B. Leiderman: "Retrieval of Bibliographic Entries from a Name-Title Catalog by Use of Truncated Search Keys," Proceedings of the American Society for Information Science 7 (1970), pp. 79-82.
2. F. H.
Ruecking, Jr.: "Bibliographic Retrieval from Bibliographic Input; The Hypothesis and Construction of a Test," Journal of Library Automation 1 (December 1968), 227-38.
3. Nugent, W. R.: "Compression Word Coding Techniques for Information Retrieval," Journal of Library Automation 1 (December 1968), 250-60.
4. F. G. Kilgour: "Retrieval of Single Entries from a Computerized Library Catalog File," Proceedings of the American Society for Information Science 5 (1968), pp. 133-36.
5. J. L. Dolby: "An Algorithm for Variable-Length Proper-Name Compression," Journal of Library Automation 3 (December 1970), 257-75.
6. M. J. Coe: "Mechanization of Library Procedures in the Medium-sized Medical Library: X. Uniqueness of Compression Codes for Bibliographic Retrieval," Bulletin of the Medical Library Association 58 (October 1970), 587-97.
7. W. L. Newman; E. J. Buchinski: "Entry/Title Compression Code Access to Machine Readable Bibliographic Files," Journal of Library Automation 4 (June 1971), 72-85.
8. H. I. Rothrock, Jr.: Computer-Assisted Directory Search; A Dissertation in Electrical Engineering. (Philadelphia: University of Pennsylvania, 1968).

NAME-TITLE ENTRY RETRIEVAL FROM A MARC FILE

Philip L. LONG, Head, Automated Systems Research and Development, and Frederick G. KILGOUR, Director: Ohio College Library Center, Columbus, Ohio

A test of the validity of earlier findings on 3,3 search-key retrieval from an in-process file for retrieval from a MARC file. The probability of the number of entries retrieved per reply is essentially the same for both files.

This study was undertaken to test the applicability of previous findings on retrieval of name-title entries from a technical processing system file (1) to retrieval from a MARC file; the technique for retrieval employs truncated 3,3 search keys.
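Under the equal-likelihood assumption used in these studies, evaluating a key scheme amounts to asking what fraction of distinct keys retrieve no more than n entries. A sketch of that computation (the sample keys in the usage note are invented):

```python
from collections import Counter

def cumulative_reply_percent(keys, max_entries=5):
    """Given one derived search key per catalog entry, return the
    percentage of distinct keys whose reply would contain at most
    1, 2, ... max_entries entries."""
    reply_sizes = Counter(Counter(keys).values())  # size -> number of keys
    total_keys = sum(reply_sizes.values())
    out, running = [], 0
    for n in range(1, max_entries + 1):
        running += reply_sizes.get(n, 0)
        out.append(100.0 * running / total_keys)
    return out
```

For four entries yielding the keys ["AAA,BBB", "AAA,BBB", "CCC,DDD", "EEE,FFF"], two of the three distinct keys retrieve a single entry, so the cumulative percentages for one and two entries are about 66.7 and 100.0.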
MATERIALS AND METHODS

The study cited above employed a file of 132,808 name-title entries obtained from the Yale University Library's Machine Aided Technical Processing System. Bibliographic control was not maintained for the generation of records in this file, with the result that the file contained errors that simulated errors in the requests library users put to catalogs. The MARC file employed in the present study contains 121,588 name-title entries that are nearly error free. Whereas the MARC file possesses few records bearing foreign titles, the Yale file has a significantly higher percentage of such titles, as would be expected for a large university library. Initial articles were deleted in Yale titles, but only English articles in MARC titles, because the language of foreign-language titles is not identified in MARC.

Design of the program used to analyze the MARC file was the same as that for the program employed in the previous study. However, the new program runs on a Xerox Data Systems Sigma 5 computer. The test employed the 3,3 search key to make possible comparison with previous results.

RESULTS

Table 1 presents the percentage of time that up to five replies can be expected, assuming equal likelihood of key choice. Inspection of the table reveals that there is no significant difference between the findings from the Yale and the MARC files.

Table 1. Probability of Number of Entries per Reply Using 3,3 Search Key

Number of    Cumulative Probability Percentage
Replies      Yale File      MARC File
1            78.58          79.98
2            92.75          93.28
3            96.83          96.93
4            98.40          98.26
5            99.08          98.91

DISCUSSION

The same result was expected for the MARC file as had been obtained earlier from the Yale file.
Possible influences that might have led to different results were the existence of errors in the Yale file, a significant proportion of foreign titles in the Yale file as compared to the nearly all-English MARC file, and the inability to mechanically delete the initial articles in the few foreign-language MARC titles. It is most unlikely that the effects of these differences are masking one another.

CONCLUSION

The findings of a previous study on the effectiveness of retrieval of entries from a large bibliographic file (1) by use of a truncated 3,3 search key have been confirmed for a similarly large MARC file.

REFERENCE

1. Kilgour, Frederick G.; Long, Philip L.; Leiderman, Eugene B.: "Retrieval of Bibliographic Entries from a Name-Title Catalog by Use of Truncated Search Keys," Proceedings of the American Society for Information Science, 7 (1970), 79-81.

A COMPUTER SYSTEM FOR EFFECTIVE MANAGEMENT OF A MEDICAL LIBRARY NETWORK

Richard E. NANCE and W. Kenneth WICKHAM: Computer Science/Operations Research Center, Institute of Technology, Southern Methodist University, Dallas, Texas, and Maryann DUGGAN: Systems Analyst, South Central Regional Medical Library Program, Dallas, Texas

TRIPS (TALON Reporting and Information Processing System) is an interactive software system for generating reports to NLM on regional medical library network activity and constitutes a vital part of a network management information system (NEMIS) for the South Central Regional Medical Library Program. Implemented on a PDP-10/SRU 1108 interfaced system, TRIPS accepts paper tape input describing network transactions and generates output statistics on disposition of requests, elapsed time for completing filled requests, time to clear unfilled requests, arrival time distribution of requests by day of month, and various other measures of activity and/or performance.
Emphasized in the TRIPS design are flexibility, extensibility, and system integrity. Processing costs, neglecting preparation of input (which may be accomplished in several ways), are estimated at $.05 per transaction, a transaction being the transmittal of a message from one library to another.

INTRODUCTION

The TALON (Texas, Arkansas, Louisiana, Oklahoma, and New Mexico) Regional Medical Library Program is one of twelve regional programs established by the Medical Library Assistance Act of 1965. The regional programs form an intermediate link in a national biomedical information network with the National Library of Medicine (NLM) at the apex. Unlike most of the regional programs that formed around a single library, TALON evolved as a consortium of eleven large medical resource libraries with administrative headquarters in Dallas. A major focus of the TALON program is the maintenance of a document delivery service, created in March 1970, to enable rapid access to published medical information. TWX units located in ten of the resource libraries and at TALON headquarters in Dallas comprise the major communication channel.

In July 1970 a joint program was initiated to develop a statistical reporting system for the TALON document delivery network. Design and development of the system was done by the Computer Science/Operations Research Center at Southern Methodist University, while training and operational procedures were developed by TALON personnel. Both parties in the effort view the statistical reporting system as a vital first step in providing TALON administrators with a comprehensive network management information system (NEMIS). An overview of this statistical reporting system, designated as TRIPS (TALON Reporting and Information Processing System), and its relation to NEMIS is discussed in the following paragraphs. The objectives and design characteristics of NEMIS are stated in (1).
DESIGN REQUIREMENTS

There were two considerations governing the requirements for a network management information system (NEMIS) for TALON: 1) In what environment would TALON function? 2) What should be the objectives of a network management information system, and what part does a statistical reporting system play in its development? The TALON staff and the design team spent an intensive period in joint discussion of these two questions.

TALON Environment

The TALON document delivery network operates in an expansive geographical area (Figure 1). The decentralized structure of the network enables information transfer between any two resource libraries. In addition, TALON headquarters serves as a switching center by accepting loan requests, locating documents, and relaying requests to holding libraries.

A requirement placed on TALON by NLM is the submission of monthly, quarterly, and annual reports giving statistical data on network activity. These statistics provide details on:

1) requests received by channel used (mail, telephone, TWX, other),
2) disposition of requests (rejected, accepted and filled, accepted and unfilled),
3) response time for filled requests,
4) response time for unfilled requests,
5) most frequent user libraries,
6) requests received from each of the other regions, and
7) non-MEDLARS reference inquiries.

Fig. 1. Location of the Eleven Resource Libraries and TALON Headquarters.

Monthly reports require cumulative statistics on year-to-date performance, and each of the eleven resource libraries and TALON headquarters is required to submit a report on its activity.

Needs and Objectives

While the immediate need of the TALON network was to develop a system to eliminate manual preparation of NLM reports, an initial decision was made to develop software also capable of assisting TALON management in policy and decision making.
Eventual need for a network management information system (NEMIS) being recognized, the TALON reporting and information processing system (TRIPS) was designed as the first step in the creation of NEMIS.

Provision of information in a form suitable for analytical studies of policy and decision making (e.g., the message distribution problem described by Nance (2)) placed some stringent requirements on TRIPS. For instance, the identification of primitive data elements could not be made from report considerations only; an overall decision had to be made that no sub-item of information would ever be required for a data element. In addition the system demanded flexibility and extensibility, since it was to operate in a highly dynamic environment. These characteristics are quite apparent in the design of TRIPS.

TRIPS DESIGN

TRIPS is viewed as a system consisting of hardware and software components. The description of this system considers: 1) the input, 2) the software subsystems (set of programs), 3) hardware components, and 4) the output. Emphasis is placed on providing an overview, and no effort is made to give a detailed description.

The environment in which TRIPS is to operate is defined in a single file (FOR25.DAT). This file assigns network parameters, e.g., number of reporting libraries, library codes, and library titles. The file is accessed by subprograms written in FORTRAN IV and DYSTAL (3), the latter being a set of FORTRAN IV subprograms, termed DYSTAL functions, that perform primitive list processing and dynamic storage allocation operations. Because it requires only FORTRAN IV, TRIPS can be implemented easily on most computers.

Input

A transaction log, maintained by each regional library and TALON headquarters, constitutes the basic input to TRIPS. Copies of log sheets are used to create a paper tape description of the transactions.
If and when compatibility is achieved between standard TWX units and telephone entry to computer systems, the input could be entered directly by each regional library. (This is technically possible at present.) Currently, TALON headquarters is converting the transaction descriptions to machine readable form. Initial data entry under normal circumstances is pictured in Figure 2, which shows the sequence of operations and file accesses in two phases: 1) data entry and 2) report generation. Data entry in turn comprises 1) collection of statistics, 2) diagnosis and verification of input data, and 3) backup of original verified input data. TRIPS is designed to be extremely sensitive to input data. All data are subjected to an error analysis, and a specific file (FOR22.DAT) is used to collect errors detected or diagnosed in the error analysis routine. Only verified data records are transmitted to the statistical accumulation file (FOR20.DAT).

Fig. 2. TRIPS Structure (statistical collection and report generation; reimbursable and non-reimbursable statistics reports).

Software Subsystems

TRIPS comprises seven subsystems or modules. Within each module are several FORTRAN IV subprograms, DYSTAL functions, and/or PDP-10 systems programs discussed under hardware components in the following section:

NEWY: Run at the beginning of each year, NEWY builds an in-core data structure and transfers it to disk for each resource library in the network. It further creates the original data backup disk file (FOR23.DAT). After disk formatting, RECORD (the accessing and storage module) may be activated to begin accumulating statistics for the new year.

NEWQ: Run between quarters, NEWQ purges the past quarter statistics for each library and prepares file FOR23.DAT for the next quarter. The report for the quarter must be generated before NEWQ is executed.
NEWM: Run between months, NEWM purges the monthly statistics for each regional library and prepares file FOR23.DAT for the backing up of the next month's data.

DUMP1: The utility module DUMP1 causes a DYSTAL dump of the data base.

RECORD: The accessing and storage module RECORD incorporates the error diagnosis on input and the entry of validated data records into file FOR23.DAT. No data record with an indicated error is permitted, and erroneous records are flagged for exception reporting. The error report (ERMES.DAT) may be printed on the teletype or line printer after execution of RECORD.

REPORT: The reporting module REPORT generates all reimbursable statistics on a month-to-date, quarter-to-date, and year-to-date basis.

MANAGE: Utilization of TRIPS as a network management tool is afforded by MANAGE, which combines statistics from reimbursable and non-reimbursable transactions to generate a report providing measures of total network activity and performance.

The primary files used by the software subsystems are described briefly in Table 1.

Table 1. Primary Files in TRIPS

FOR25.DAT (ASCII): Contains the system definition parameters and initialization values. Created from card input to assure proper format.

FOR20.DAT (Binary): Statistical accumulation for validated data records. Two parts: (1) input translator data structure, and (2) statistical data base.

FOR21.DAT (ASCII): Generation of reports from information in FOR20.DAT. Carriage control characters must be included to generate reports.

FOR22.DAT (ASCII): Collects data records diagnosed as in error. Errors accumulated in FOR22.DAT are transmitted to ERMES.DAT for output.

FOR23.DAT (ASCII): Enables creation and updating of the backup magnetic tape. Each month's validated records are added to the tape.

FOR24.DAT (Binary): Enables recovery read of the backup tape. Tape information is stored prior to transfer of file information to FOR20.DAT.
ERMES.DAT (ASCII): Serves to output messages on data records diagnosed as in error. If six or fewer errors occur, ERMES is not created and the messages are output to the teletype. If more than six errors occur, an estimate of typing time is given to the user, who has the option of printing them on the teletype or in report form on the line printer.

A major concern in any management information system is system integrity. In addition to the diagnosis of input data, TRIPS concatenates sequential copies of disk file FOR23.DAT to provide a magnetic tape backup containing all valid data records for the current year. A failsafe tape, containing all TRIPS programs, is also maintained.

Hardware Components

Conversion of transaction information to machine readable form is currently done off line. Using a standard TWX with ASCII code, paper tapes are created and spliced together. Fed through a paper tape reader to a PDP-10 (Digital Equipment Corporation), the input data are submitted to TRIPS. Control of TRIPS is interactive, with the user monitoring program execution from a teletype. All file operations are accomplished on the PDP-10 via the teletype, and the output reports are created on a high-speed line printer. With SMU's PDP-10 and SRU 1108 interface, report generation can be done on line printers at remote terminals to the SRU 1108 as well.

Output

TRIPS output consists of a report for each library in the network and a composite for the entire network. The report may be limited to reimbursable statistics or include all statistics. Information includes: 1) Errors encountered in the input phase, 2) Number of requests received by channel, 3) Disposition of requests (i.e., rejected, accepted/filled, accepted/unfilled, etc.
), 4) Elapsed time for completing filled requests or clearing unfilled requests, 5) Geographic origin of requests, 6) Titles for which no holdings were located within the region, 7) Types of requesting institutions, 8) Arrival time distribution of requests by day of month, 9) Invoice for reimbursement by TALON, and 10) Node/network dependency coefficient as described by Duggan (4).

SUMMARY

TRIPS is now entering its operational phase. Training of personnel at the resource libraries is concluded, and data on transactions are being entered into the system. Input errors have decreased significantly (from fifteen or twenty percent to approximately two percent). TALON personnel are enthusiastic, and needless to say the regional library staffs are happy to see a bothersome, time-consuming manual task eliminated. In summary, the following characteristics of TRIPS deserve repeating:

1) With its modular construction, it is flexible and extensible.

2) Implemented in DYSTAL and FORTRAN IV, it should allow installation on most computers without major modifications.

3) Designed to operate in an interactive environment, it can be modified easily to function in a batch processing environment.

4) TRIPS is extremely sensitive to system integrity, providing diagnosis of input data, reporting of errors, magnetic tape backup of data files, and a system failsafe tape.

5) Definition of primitive data elements and the structural design of TRIPS enable it to serve as the nucleus of a network management information system (NEMIS) as well as to generate reports required by NLM.

6) Currently accepting paper tape as the input medium, TRIPS could be modified easily to accept punched card input and, with more extensive changes, could derive the input information during the message transfer among libraries.
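The input-sensitivity summarized in point 4 includes the ERMES.DAT reporting rule described earlier: six or fewer errors go straight to the teletype, while more than six produce a typing-time estimate and a choice of output device. A minimal sketch of that rule, with an invented typing speed (the article gives no figure):

```python
def route_error_messages(errors, chars_per_minute=100):
    """Apply the ERMES.DAT rule.  Six or fewer error messages are sent
    directly to the teletype; more than six yield a typing-time estimate
    so the user can choose teletype or line-printer output.  The
    chars_per_minute speed is an assumption for illustration."""
    if len(errors) <= 6:
        return {"device": "teletype", "messages": errors}
    est_minutes = sum(len(m) for m in errors) / chars_per_minute
    return {"device": "user-choice",
            "estimated_typing_minutes": est_minutes,
            "options": ["teletype", "line printer"]}

few = route_error_messages(["bad library code"] * 3)
many = route_error_messages(["bad library code in field 2"] * 10)
print(few["device"], many["device"])  # teletype user-choice
```

The threshold keeps routine runs interactive while sparing the user a long teletype session when a batch of input is badly damaged.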
Finally, the processing cost of operating TRIPS, neglecting the conversion to paper tape, is estimated to be $.05 per transaction (a message transfer from one library to another). Extensive and thorough documentation of TRIPS has been provided. Availability of this documentation is under review by the funding agency.

ACKNOWLEDGMENT

Work described in this article was done under contract HEW PHS 1 G04 LM 00785-01, administered by the South Central Regional Medical Library Program of the National Library of Medicine. The authors express their appreciation to Dr. U. Narayan Bhat and Dr. Donald D. Hendricks for their contributions to this work.

REFERENCES

1. "NEMIS: A Network Management Information System," Status Report of the South Central Regional Medical Library Program, October 26, 1970.
2. Nance, Richard E.: "An Analytical Model of a Library Network," Journal of the American Society for Information Science, 21 (Jan.-Feb. 1970), 58-66.
3. Sakoda, James M.: DYSTAL: Dynamic Storage Allocation Language Manual (Providence, R.I.: Brown University, 1965).
4. Duggan, Maryann: "Library Network Analysis and Planning (LIB-NAT)," Journal of Library Automation, 2 (1969), 157-175.

BOOK REVIEWS

Introduction to Information Science, Tefko Saracevic, ed. New York: Bowker, 1970. 776 pp. $25.00.

The editor has put together a large volume consisting of 776 8½ x 11 pages and weighing almost 5 pounds. It comprises 66 different articles written by almost as many authors and covers the period from 1953 to 1970. Two-thirds of the articles were written during the period 1966-1969. In short, it is a collection of a large number of papers, mostly from the last few years, having to do in some way with information science or, more properly, with information systems. The papers generally are good ones, and in some cases have already become acknowledged classics. In a few cases I am a bit puzzled about their inclusion in a volume of this type.
In the few months since I have had this book I have already found numerous occasions to consult several of the articles. Some of the other papers, which I have not seen recently, I have enjoyed reading again. The book is divided into four parts, which are further subdivided into thirteen chapters. The four parts are Basic Phenomena, Information Systems, Evaluation of Information Systems, and A Unifying Theory. Although the chapter headings are too numerous to list, they include such topics as Notions of Information, Communication Processes, Behavior of Information Users, Concept of Relevance, Testing, as well as Economics and Growth. By virtue of the parts, chapters, and articles the editor has provided a type of classification system or structure for information science without attempting to define information science. Interspersed between each of the parts and chapters is up to a page of introductory and explanatory material provided by the editor. In a volume of this type it is important to recognize what the volume is and what it is not. As I have mentioned, it is a good anthology of important articles related to information. It is not, as the title implies, an introduction to information science. The papers are by and large unrelated to each other, and the introductory comments by the editor do little to provide a unifying relationship. Furthermore, the overall scope of the articles is generally quite limited and, although the editor implies it is not so, tends to equate information science with information systems. The final paper in the volume, by Professor William Goffman, is listed by the editor as Part Four: A Unifying Theory. The precise title of the chapter is somewhat less ambitious: namely, "A General Theory of Communication." The paper is an unpublished one (although similar papers by the author have been published elsewhere) and relates communication in a general sense to the theory of epidemic processes.
Although the theory is an interesting one, it would hardly qualify as a unifying theory for information science. It certainly does not provide the unifying relationships among the various articles included in the text. My guess would be that other qualified individuals, in putting together a similar volume, would have included many different articles. This, however, is the nature of the field at this time. By comparison note the recently published volume Key Papers in Information Science, Arthur W. Elias, editor. This book, although admittedly serving a somewhat different purpose, contains 19 papers with only a single paper in common with those of this particular volume. In summary, this is a good collection of relevant and useful articles in information science. It is probably desirable that they be included in a single volume. Serious students, educators, and research workers will find this volume to be of interest. As a reference book it will be quite useful. The book is not, however, an introduction to information science. The novice, the student, and the casual reader will probably be disappointed and confused, and in some cases might even be misled.

Marshall C. Yovits

Information Processing Letters. North-Holland Publishing Company, Amsterdam. Vol. 1, No. 1, 1971. English. Bi-monthly. $25.00.

This journal is published by a most reputable company and has a most impressive international list of editors and referees. The affiliation of the editors illuminates the orientation of the journal: six of them are from departments of mathematics, computer science, or cybernetics, and two are from IBM laboratories. Understandably, the journal is devoted basically to computer theory, software, and applications, with a heavy accent on mathematically expressed theory related to the solution of computing problems, algorithms, etc. It is directed toward basic and applied scientists and not toward practitioners.
People interested in library automation may, from time to time, find in it theoretical articles broadly related to their work, but they will have to do the "translating" themselves. This journal follows the tradition of "letters" journals in physics, biology, and some other disciplines. The papers are short; publication is rapid; work reported generally tends to be very specific, preliminary to or a part of some larger research project; usually small items of knowledge are reported. The "letters" journals are received with mixed emotions in the fields where they appear. For instance, Ziman (Nature 224:318-324, 1969) questions very much the need for these publications. On the other hand, they are a useful outlet for authors who otherwise would not publish these often useful bits of specific knowledge. Recommended for research libraries related to computer science.

Tefko Saracevic

Handbook of Data Processing for Libraries. By Robert M. Hayes and Joseph Becker. New York: Becker & Hayes, Inc., 1970. 885 pages. $19.95.

To write a universal handbook in a field so full of complex intellectual problems and simultaneously satisfy every potential reader is an impossible assignment. Therefore the authors cannot be faulted for failing to satisfy everyone. They have succeeded in writing for a very important audience: administrators and decision makers. For this group, they have presented difficult technical material in a clear, readable fashion, a reflection of their extensive teaching experience. For many library administrators, this handbook arrives five years too late. Had it been available earlier, a large number of current automation projects might never have been authorized by management, or at least might have been conducted on a sounder basis.
Following a very conservative approach, the authors generally remain within the limitations of the current state of the art, being careful to distinguish that which is feasible (i.e., practical) from that which is possible. Over and over again, they warn librarians about the limitations of computers and caution against excessively high expectations. For administrators, the most useful material is in chapter 3, "Scientific Management of Libraries," and in chapter 8, "System Implementation." A reading of chapter 8 alone suffices to convey to the administrator the magnitude and complexity of even the most seemingly routine computer application in libraries. This chapter, the most important and useful in the entire book, covers planning, organization, staffing, hardware, site preparation, programming, data conversion, phase-over, staff orientation, and training. Each of these topics, deserving of complete chapters in themselves, is treated briefly, but in enough detail to communicate the complexity of each component in the long stream of system development activities, all of which must be completed to the last detail for success. There are three useful appendices: a glossary, an inventory of machine readable data bases, and a list of 115 sources for keeping up to date. Bibliographic footnotes abound, and each chapter ends with a list of suggested readings. However, it is surprising how many references are five or more years old; in fact, there is a scarcity of current references. For example, Ballou's well-known Guide to Microreproduction Equipment, now entering its 5th edition, is cited in the first edition of 1959. The authors have been badly served by their proofreaders. The book is marred by an incredible number of spelling errors in text, tables, footnotes, and references, especially with personal names, plus incomplete citations.
The index contains many entries too broad to be useful, such as: utilization of computer (1 entry), time sharing (1 entry), hardware (3 entries), technical services (3 entries). Lacking from the index are name references to distinguished contributors to the literature, such as Avram, Cuadra, DeGennaro, Fasana, and others. Many of these names appear only in footnotes. The book is rich in tabulated data and specifications for a variety of equipment. Unfortunately, much of this equipment is inapplicable to library use, or the tabulated data is in error. Table 12.25 lists several defunct or never marketed pieces of equipment, such as IBM's Walnut and Eastman Kodak's Minicard, without indication of non-availability. In table 11.22 there are extensive listings of CRT terminals, most of which are unsuitable for library applications by reason of deficient character sets or excessive rentals. Nine of the units listed showed rentals of over $1,000 per month, and two of these were virtually at $5,000 per month, clearly beyond the reach of any library. Table 12.2 suggests the access time to one of 10,000 pages in microfiche is half a second, a figure that is off by an order of magnitude for mechanical equipment and by two orders of magnitude for manual systems. (More nearly correct figures are given in the text on page 396.) Table 12.21 lists several microfilm cameras designed expressly for non-library applications and not adaptable to any library purpose. From a broader perspective, one misses several other features. Is a "handbook" for the practitioner? If so, this volume is too elementary. Can it be used as a textbook in a course in library automation and information science? The book contains no problems for students to attack and, except for references, no aids to the instructor.
Possibly it can serve as supplementary reading, for it contains far too much tutorial material (yet only ten pages of nearly 900 are devoted to flow charting). One wishes for more specifics drawn from the real world. A hypothetical case study in chapter 11 is illustrative: a 5% error rate is assumed for input of a 300,000 record bibliographic data base to be converted to machine readable form. Not revealed in the example is that a relatively low error rate in keyboarding may result in a very high percentage of records which must be reprocessed to achieve a high quality data base. Each reprocessed record will consume computer resources: CPU time, core, disc I/O, tape reading and writing, etc. We know from MARC and RECON that the ratio of total records processed to net yield is on the order of 3:2; i.e., each record must be processed on the average one and a half times to get a "clean" record. The cost of this reprocessing is far beyond the 5% lost by faulty keyboarding. The Handbook will be a useful decision making tool for the generalist, a less helpful aid to the practitioner. It is hoped that a revised edition is in preparation, and particularly that the tabular material will be corrected and brought up to date. Chapter 8, the heart of the book, should be greatly expanded. For the next edition, some consideration might be given to a two-volume work: the first volume for the administrator, and the second containing much more technical detail for the practitioner. If the two-volume pattern is followed, a loose-leaf format with regular updating would be most helpful for the second half.

Allen B. Veaner

Library Automation: Experience, Methodology, and Technology of the Library as an Information System, by Edward W. Heiliger. New York: McGraw-Hill Book Co., 1971. xii, 333 pp.

The need for a handbook and/or general introductory text on the topics of automation and systems analysis in libraries has been sorely felt for quite some time.
During the past year, three have appeared (Chapman and St. Pierre, Library Systems Analysis Guidelines, Wiley, 1970; Hayes and Becker, Handbook of Data Processing for Libraries, Wiley, 1970; and the book here reviewed). Unfortunately, none is completely satisfactory, for different reasons. A serious student wanting a reasonably comprehensive, systematic, and balanced treatment of these subjects will, I'm afraid, be forced to use all three of these titles and, even then, will have constant need for supplementary materials on a number of aspects. The title being considered in this review, by Heiliger and Henderson, if one judged only by the authors' intent as expressed in the Preface, would be exactly the kind of work that we've all felt the need for. As they state on page vii, the purpose "is to provide a perspective of the library functions that have been or might be mechanized or automated, an outline of the methodology of the systems approach, an overview of the technology available to the library, and a projection of the prospects for library automation." And, indeed, if one looks at the table of contents there are four parts that closely parallel this statement of purpose. The parts themselves, though, when inspected more closely, reveal not a systematic treatise or even an in-depth treatment of these topics, but rather a loosely connected series of essays, each on a fairly superficial level, discoursing on a variety of aspects associated with, or tangential to, these topics. This indicates, at least to this reviewer, that the genesis of the book was a series of lectures presented and refined over a period of time by the authors. Although not in itself a bad thing, here it is unfortunate to some degree because not enough effort was expended in amplifying the material with additional data, library-oriented examples, and illustrations, nor in logically integrating the various parts.
Part I, entitled "Experience in Library Automation," begins by broadly citing a number of library automation projects, mostly dating from the early 60's. The level is extremely superficial and the presentation not very enlightening, since only three or four projects are mentioned, and then only in passing. Immediately following are several excellent chapters describing traditional library activities (e.g., acquisition, cataloging, reference, etc.) in functional terms. The approach, though extremely simple, is for the most part effective and is only marred by occasional, overly condescending statements such as "Library filing is a very complicated matter" or "Reference librarians use serials literature extensively." Unfortunately, in the 104 pages of this section there is not one illustration. Part II, "Methodology of Library Automation," attempts to describe the general approach and techniques of systems analysis. In a number of ways, this is the best part of the book. Unfortunately, the concepts that are so simply and succinctly described are only indifferently related to activities that will be familiar to librarians. As a brief essay on the objectives and concepts of systems analysis, it is quite adequate, but as a discussion of how they relate to library problems, it is totally inadequate and often misleading. Part III, "Technology for Library Automation," is probably the least informative part of the book, giving the reader virtually no practical information. All of the important and obvious technological concepts are listed, but are dismissed with what oftentimes is little more than a brief definition. The one exception to this is Chapter 13, entitled, for no apparent reason, "Concepts." This chapter is in fact an innovative and thought-provoking view of a library as a data-handling system. One wishes that this chapter had been amplified and treated more fully.
Part IV, "Prospects for Library Automation," is the least effective part of the book, having in my mind only one merit: it doesn't tack on a Hollywood-style happy ending. The authors' view of the 70's, as far as can be inferred from this too short section, is cautious and mundane. These will be, I'm convinced, the overriding characteristics of automation efforts for the next several years. I only wish that the authors had elaborated more fully on these points and presented their views more coherently. The book is augmented with a 61-page bibliography (1,029 citations), which, though reasonably current, is of dubious worth because it is neither annotated nor particularly well balanced. Certain classics, such as Bourne's Methods of Information Handling, or Information Storage and Retrieval by Becker and Hayes, and certain current, basic items, such as Cuadra's Annual Review of Information Science and Technology and the Journal of Library Automation, are not listed. Each chapter is accompanied by a "Suggested Reading List" wherein materials more or less pertinent to the subject of the chapter are listed. A glossary of terms in three parts (a total of 36 pages) is also included and, though difficult to use because it is in three alphabets and interspersed with the text, provides short but very adequate definitions. Unfortunately, several jargon terms used in the text itself are not included; one that was most irritating to this reviewer is the term "gigabyte," which to my knowledge has very little currency among the cognoscenti. On balance, Library Automation is a title that should be recommended for a wide range of readers. Though it will probably have little to offer experts in the field, it does have value as a text for library students or a general introduction for the average, non-technical librarian.

Paul J. Fasana

Sistema Colombiano de Informacion Cientifica y Tecnica (SICOLDIC).
A Colombian Network for Scientific Information, by Joseph Becker et al. Quirama, Colombia: May-June 1970. 59 p. Mimeo.

The task of the study team which produced this report was to present "an implementation plan for strengthening the scientific communication process in Colombia by providing a permanent systematic mechanism to function in the context of Colombia's internal needs for scientific and technical information in government, industry, and among the research activities in higher education." More specifically, the expressed goal of such a mechanism is "to develop a network which will permit any scientific or technical researcher, in government, industry, or university, to access the total information resources of the country without regard for his own physical location." The study was completed in two months (according to the cover dates) and comprised four areas of investigation, namely: 1) to elucidate the advantages of developing a centrally administered national network including three levels of network nodes and a technical communications plan; 2) to make an inventory of universities, institutes, telecommunications, and computer facilities in Colombia; 3) to recommend a mix of these factors to produce specific services; and 4) to propose a seven-year budget. The Republic of Colombia is about the size of Texas and California combined, and its population is about 1 million less than New York State's. Most scientific and technical workers are located in five major cities, and the country is divided into six administrative zones. Within these zones twenty universities and forty-four institutes were inventoried by the study team with respect to specialization, faculty, book collections, and the like. From these universities and institutes, five primary and seven secondary nodes were named, to be connected by means of a Telex communications system.
The Telex connections are not to be computer-mediated in the foreseeable future, but used for interlibrary loan and other messages. (There were two teleprocessing systems operating in Colombia at the time of the study.) Basic recommendations are: that a governmental unit be established with responsibility for directing the development of SICOLDIC; and that this unit, with a high echelon board of directors, should produce several directories, bibliographies, and union lists, and publish a monthly catalog of government-sponsored scientific and technical research. In addition, a manual for use of the telecommunications system should be produced. The proposed budget is about $250,000 (4.5 million pesos) for the first year, graduating to a 25-fold increase by 1976. In some aspects the SICOLDIC plan follows the pattern of some state library development plans being implemented in the U.S. The advantage of central control of information resources planning and fund control by the SICOLDIC group, with fairly direct access to high governmental authority, provides reasonable insurance for support of the plan, especially since these services contribute to the economic and scientific advance of Colombia. There is no indication of the acceptance of the plan by COLCIENCIAS, the governmental unit which commissioned it. Of the sixty references in the bibliography, Spanish publications predominate.

Ronald Miller

Cooperation Between Types of Libraries 1940-1968: An Annotated Bibliography, by Ralph H. Stenstrom. Chicago: American Library Association, 1970.

This bibliography is an effort to sift, organize, and describe the literature of library cooperation produced during the period 1940-1968.
Two criteria governed the selection of the 348 books and monographs listed: 1) they must deal with cooperative programs involving more than one type of library, and 2) they must describe programs in actual operation or likely to be implemented. Although most of the cooperative projects described are located in the United States, other countries are represented when the material about them is written in English. Cooperative programs in the audio-visual field are included. The annotations explain the nature of the cooperative projects and give the names of participating libraries. An appendix describes briefly about 35 recent cooperative ventures not yet reported in the literature, which the editor learned about through an appeal published in professional journals. Entries are arranged chronologically to facilitate direct access to the most recent developments and to permit tracing the evolution of a particular project over a period of time. Three indexes provide approaches to the material by 1) name of author, cooperative project or library organization, 2) type of

<0,0>, <0,1>, <1,0>, <0,2>, <1,1>, <2,0>, <0,3>, <1,2>, etc. Ordered triples are ordered pairs of ordered pairs and the integers: T(X, Y, Z) = I(X, I(Y, Z)). And so on. Because we have a denumerable set of books we can accomplish a linear mapping by both subject and category. In fact, the problem is trivial because there are only a finite number of books. Physically, however, neither subject nor category will remain together. To suit the library the mapping must be physically simple, but can be abstractly complex. For all his protestations, the classificationist cannot eschew the physical library. If he could, or wished to, the way is open. As I understand classification, it is vacuous without reference to its ability as a finding tool. It must concern itself with the polydimensional aspect of content but cannot disregard the codex.
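The enumeration of pairs quoted above is exactly the order produced by Cantor's diagonal pairing function; the text does not name a particular formula for I, but this is one standard realization of it, with triples built as I(X, I(Y, Z)):

```python
def I(x, y):
    """Cantor's diagonal pairing function, a bijection from pairs of
    non-negative integers to the non-negative integers.  It reproduces the
    order <0,0>, <0,1>, <1,0>, <0,2>, <1,1>, <2,0>, <0,3>, <1,2>, ...
    quoted in the text; whether this is the author's intended formula is
    an assumption."""
    s = x + y
    return s * (s + 1) // 2 + x

def T(x, y, z):
    """Ordered triples as a pair of a coordinate and a pair: I(X, I(Y, Z))."""
    return I(x, I(y, z))

# Sorting a grid of pairs by I recovers the enumeration from the text.
order = sorted(((a, b) for a in range(4) for b in range(4)),
               key=lambda p: I(*p))
print(order[:8])
# [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0), (0, 3), (1, 2)]
```

Because I is a bijection, so is T, which is why a denumerable set of books can always be given a single linear order by any finite number of independent dimensions.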
In answer to the question form "where is the book about ... ?" an appropriate and total response type is "at location (X, Y, Z)." Here X, Y, and Z are the spatial coordinates relative to a particular library both as to origin and values. The Dewey or LC numbers of the book are incomplete answers in that they presume a knowledge of the classification structure as well as knowledge of the architecture of the building. I have suggested that a classification scheme must not disregard the codex, but must insofar as possible not be subservient to physical form. The following scheme takes advantage of the codex form, is as easily automated or computerized as current one-dimensional schemes, advances beyond one dimension, and is very relevant to finding: A library is considered as a three-dimensional entity. Conventions are adopted for run-on from room to room and floor to floor as for the linear scheme. Each book is classified in all three dimensions, the dimensions being independent. The interpretation of each dimension is left to the discretion of the individual library. Thus each book has a relative position in each dimension. (This is not an Alexandrian scheme relying on absolute location.) The following example illustrates the relevant concepts: Choose a subject classification (as commonly understood) for the X dimension. For example, let Dewey numbers be arranged from left to right on the X axis. Choose a category scheme for the Y dimension. One could assign degrees of difficulty from one to seven, for example. Choose a category scheme for the Z dimension. One could assign numbers between one and seven running from most general to most specific.

Cataloging Geometry/HAZELTON 15

[Figure: axes labeled SUBJECT (X) and DIFFICULTY (Y)]

This has the following effect: standing in front of the near shelf (i.e., Z=1) one can choose a subject by moving laterally. The general books will appear first with difficult items at the top, easy ones at the bottom.
If the items are too general, merely move one stack forward and try again. This approach presents an unusually usable instructional layout for circular libraries. A reading lounge can be put dead center with the most subject-specific books ranged about the circumference. Level of difficulty is easily adjusted by looking up or down. Given this apparatus you may wish to change the subject classification scheme. Why not put solid state physics behind general physics instead of to the right or left? The card catalog can now be used with greater meaning. There is no reason why it cannot be a map of the shelves. The axes can be translated for ease of searching (e.g., interchange X and Y for the card catalog). Of particular interest is the relation between this scheme and those of A. D. Booth (4) where access time is minimized by arranging books in the inverse order of their frequency of use. Further refinements consider nonstandard shelf layouts (radial, circular, spiral). One misgiving about shelving by inverse frequency expressed by librarians is that one no longer knows where to look for a particular book in the sense that one knows when using standard schemes. This objection is easily overcome by combining the three-dimensional and frequency schemes. One dimension can be used for frequency, leaving two dimensions in which to group books by subject, difficulty, generality, color, length, or whatever you please. Access time is reduced while physical grouping is retained.

16 Journal of Library Automation Vol. 5/1 March, 1972

One difficulty that will be encountered is the classification of books that are not subject-oriented, poetry and fiction, for example. These areas are not adequately dealt with in linear schemes and they could easily be left as they are. That is, two dimensions could be constants.
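The worked example can be sketched in a few lines of Python. This is our illustration only, not Hazelton's notation: X is a subject position (say, Dewey rank), Y a difficulty from one to seven, and Z a generality from one (most general, the near shelf) to seven (most specific).

```python
def classify(subject_rank, difficulty, generality):
    """Assign a book its relative position in each of the three
    independent dimensions described in the text."""
    if not (1 <= difficulty <= 7 and 1 <= generality <= 7):
        raise ValueError("difficulty and generality run from 1 to 7")
    return (subject_rank, difficulty, generality)

def one_stack_forward(location):
    # "If the items are too general, merely move one stack forward":
    # same subject and difficulty, next value of Z.
    x, y, z = location
    return (x, y, z + 1)
```

The answer to "where is the book about ... ?" is then literally a coordinate triple, relative to the particular library's origin.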
On the other hand, it seems plausible that, given three dimensions in which to work, someone could discover congenial physical groupings that would be reasonable yet impossible in one dimension. Rather than being a problem, three-dimensional classification offers opportunities to cope with literatures that are not subject specific. Each dimension of this scheme can be criticized on the same grounds as the current linear classification. But, taken as a whole, it provides a more powerful, much needed tool for the classificationist while allowing new approaches by automaters. Its simplicity is assured because it is closer to our intuitive notions of information storage. Three dimensions are necessary!

REFERENCES

1. W. H. Auden, "The Cave of Nakedness," About the House (New York: Random House, 1965), p. 32.
2. Jesse H. Shera, "Classification-Current Functions and Applications to the Subject Analysis of Library Material," in Libraries and the Organization of Knowledge (Connecticut: Archon Books, 1965), p. 97-98.
3. Jesse H. Shera, "Classification as the Basis of Bibliographic Organization," in Libraries and the Organization of Knowledge (Connecticut: Archon Books, 1965), p. 84-85.
4. A. D. Booth, "On the Geometry of Libraries," Journal of Documentation 25:28-42 (March 1969).

A TRUNCATED SEARCH KEY TITLE INDEX

Philip L. LONG: Head, Automated Systems Research and Development, and Frederick G. KILGOUR: Director, Ohio College Library Center, Columbus, Ohio.

An experiment showing that 3,1,1,1 search keys derived from titles are sufficiently specific to be an efficient computerized, interactive index to a file of 135,938 MARC II records.

This paper reports the findings of an experiment undertaken to design a title index to entries in the Ohio College Library Center's on-line shared cataloging system.
Several large libraries participating in the center requested a title index because experience in those libraries had shown that the staff could locate entries in files more readily by title than by author and title. Users of large author-title catalogs have long been aware of great difficulties in finding entries in such catalogs. Since the center's computer program for producing an author-title index could be readily adapted to produce a title index, it was decided to add title access to the system. A previous paper has shown that truncated three-letter search keys derived from the first two words of a title are less specific than author-title keys (1). Earlier work had revealed that addition of only the first letter of another word in a title improved specificity (2). Therefore, the experiment was designed to test the specificity of keys consisting of the first three characters of the first non-English-article word of the title plus the first letter of a variable number of consecutive words. The experiment was also designed to produce an index that catalogers could use efficiently and that would operate efficiently in the computer system. It was assumed that the terminal user would have in hand the volume for which an entry was to be sought in the on-line catalog. The index was not to be designed for use by library users; subsequent experiments will be done to design an index for nonlibrarian users. Other investigations into computerized, derived-key title indexes include the previous paper in this series to which reference has already been made (1) and development of a title index in Stanford's BALLOTS system (3). Although Stanford has not published results observed from experiment or experience that describe the retrieval specificity of its technique, it is clear that the Stanford procedure is not only more powerful than the one described in this paper but also more adaptable for user employment.
The Stanford index is probably less efficient.

MATERIALS AND METHODS

A file of 135,938 MARC II records was used in this experiment. This file contains title-only and name-title entries, and keys were derived from titles in both types of entries. A key was extracted consisting of the first three characters of the first non-English-article word of each title plus the first character of each following word up to four. If there were fewer than four additional words, the key was left-justified, with trailing blank fill. Only alphabetic and numeric characters were used in key derivation; alphabetic characters were forced to uppercase. All other characters were eliminated and the space occupied by an eliminated character was closed up before the key was derived. A total of 115,623 distinct keys was derived from the 135,938 entries. These 115,623 keys were then sorted. Each key in the file was compared with the subsequent key or keys and equal comparisons were counted. A frequency distribution by identical keys was thus prepared, and a table constructed of percentages of numbers of equal comparisons based on the total number of distinct keys. This table contains the percentage of time for expected numbers of replies based on the assumption that each key had a probable use equal to all other keys. Next, by eliminating the fourth single character and then the fourth and third, files of 3,1,1,1 and 3,1,1 keys were prepared from the 3,1,1,1,1 file. For example, the 3,1,1,1,1 key for Raymond Irwin's The Heritage of the English Library is HER, O, T, E, L; the 3,1,1,1 key for this title is HER, O, T, E; and the 3,1,1 key, HER, O, T.  The same processing given to the 3,1,1,1,1 file was employed on these two files.

RESULTS

Table 1 contains the maximum number of entries in 99 percent of replies.
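The key-derivation rule just described can be rendered as a short Python sketch (the original implementation would not have been in Python; the function name, article list, and comma-separated rendering of the key are our assumptions).

```python
import re

ARTICLES = {"the", "a", "an"}  # English articles skipped at the head of a title

def truncated_key(title, pattern=(3, 1, 1, 1)):
    """Derive a truncated search key: pattern[0] characters of the first
    non-article word, then the first character of each following word.
    Short titles are left-justified with trailing blank fill."""
    # keep only letters and digits, close up removed characters, force uppercase
    words = [re.sub(r"[^A-Za-z0-9]", "", w).upper() for w in title.split()]
    words = [w for w in words if w]
    if words and words[0].lower() in ARTICLES:
        words = words[1:]
    parts = [words[i][:n] if i < len(words) else "" for i, n in enumerate(pattern)]
    return ",".join(p.ljust(n) for p, n in zip(parts, pattern))
```

With the Irwin example, `truncated_key("The Heritage of the English Library", (3, 1, 1, 1, 1))` yields the 3,1,1,1,1 key HER,O,T,E,L described in the text.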
Inspection of the table reveals that there is a large increase in specificity when the key is enlarged from 3,1,1 to 3,1,1,1; the maximum number of entries (99+ percent of the time) drops from twelve to five. However, when the key goes to 3,1,1,1,1, the number of entries per reply goes down only to four from five. The percentage of replies that contained a single entry was 67.8 for the 3,1,1 key, 84.0 for the 3,1,1,1 key, and 90.0 for the 3,1,1,1,1 key.

A Truncated Search Key/LONG and KILGOUR 19

Table 1. Maximum Number of Entries in 99 Percent of Replies (Title Index Entries)

Search Key    Maximum Entries Per Reply    Percentage of Time
3,1,1         12                           99.0
3,1,1,1       5                            99.1
3,1,1,1,1     4                            99.2

The Irascope cathode ray tube terminals used in the OCLC system can display nine truncated entries on the one screen, and it is felt that catalogers can use with ease up to two screensful of entries. Therefore, the keys producing more than eighteen titles were listed. For 3,1,1,1,1 there were only 33; for 3,1,1,1 there were 67; and for 3,1,1 there were 357. The maximum number of identical keys was 321 for 3,1,1,1,1 and 3,1,1,1; the key was PRO,b,b,b,b (b = blank), most of which was derived from "Proceedings." For 3,1,1 the maximum was 417, for HIS,O,T ("History of the").

DISCUSSION

It is clear from the findings that a 3,1,1 search key is not sufficiently specific to operate efficiently as a title index in a large file. However, the 3,1,1,1 key appears to be sufficiently specific for efficient operation, while the 3,1,1,1,1 key does not appear to possess sufficient increased specificity to justify its additional complexity. The observation that there is a large increase in specificity between keys employing three and four title words that constitute Markov strings suggests that the second and third words may be highly correlated. Indeed this suggestion is substantiated by the maximum case for 3,1,1: HIS,O,T.
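The tabulation behind Table 1 — a frequency distribution of identical keys, the percentage of single-entry replies, and the smallest reply size covering 99 percent of distinct keys — can be sketched as follows. This is our reconstruction of the counting method described under MATERIALS AND METHODS, not the authors' program.

```python
import math
from collections import Counter

def reply_size_stats(keys, coverage=0.99):
    """From a list of derived keys (one per entry), compute the fraction of
    distinct keys that retrieve a single entry and the maximum reply size
    needed to cover `coverage` of all distinct keys."""
    freq = Counter(keys)                # entries per distinct key
    sizes = sorted(freq.values())       # reply sizes, ascending
    single = sum(1 for n in sizes if n == 1) / len(sizes)
    idx = max(0, math.ceil(coverage * len(sizes)) - 1)
    return single, sizes[idx]
```

On a synthetic file of 99 unique keys plus one key shared by five entries, 99 percent of replies contain a single entry, while full coverage requires the five-entry reply.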
In the more-than-eighteen group for 3,1,1,1, these characters occurred in seven keys for a total of 206 entries, and for 3,1,1,1,1 they did not occur at all in the more-than-eighteen group.

CONCLUSION

This experiment has shown that a 3,1,1,1 or 3,1,1,1,1 derived search key is sufficiently specific to operate efficiently as a title index to a file of 135,938 MARC II records. Since a previous paper observed that as a file of entries increases the number of entries per reply does not increase in a one-to-one ratio (1), it is likely that these keys will operate efficiently for files of considerably greater size.

REFERENCES

1. Frederick G. Kilgour, Philip L. Long, Eugene B. Leiderman, and Alan L. Landgraff, "Title-Only Entries Retrieved by Use of Truncated Search Keys," Journal of Library Automation 4:207-10 (Dec. 1971).
2. Frederick G. Kilgour, "Retrieval of Single Entries from a Computerized Library Catalog File," Proceedings of the American Society for Information Science 5:133-36 (1968).
3. Edwin B. Parker, SPIRES (Stanford Physics Information REtrieval System) 1969-70 Annual Report (Palo Alto: Stanford University, June 1970), p. 77-78.

MULTIPURPOSE CATALOGING AND INDEXING SYSTEM (CAIN) AT THE NATIONAL AGRICULTURAL LIBRARY

Vern J. VAN DYKE: Chief, Computer Applications, National Agricultural Library, and Nancy L. AYER: Computer Systems Analyst, National Agricultural Library, Beltsville, Maryland.

A description of the Cataloging and Indexing System (CAIN) which the National Agricultural Library has been using since January 1970 to build a broad data base of agricultural and associated sciences information. With a single keyboarding, bibliographic data is input, edited, manipulated, and merged into a permanent base which is used to produce many types of printed or print-ready end-products.
Presently consisting of five subsystems, CAIN utilizes the concept of controlled authority files to facilitate both information input and its retrieval. The system was designed to provide maximum computer services with the minimum of effort by users.

INTRODUCTION

This article describes an interactive system in operation at the National Agricultural Library which, with a single keyboarding of data, provides all necessary catalog cards, book catalogs, bibliographies, and related internal reports, as well as a computer data base for information retrieval. Primarily in batch mode, the system can operate on an IBM 360 with 256K memory using OS, six magnetic tape drives, a card reader, and a line printer.

BACKGROUND

The National Agricultural Library (NAL) as one of the three national libraries is responsible for the collection and dissemination of agricultural information on a national and worldwide basis. In this pursuit publications are obtained through gifts, exchange agreements, and by purchase of items in many languages. Titles of those items in non-Roman alphabets are transliterated and all non-English titles are translated. The volume of publications handled by NAL in 1969 was in the neighborhood of 600,000, of which approximately 275,000 were added to the collection. This volume was sufficiently large to provide a serious problem to NAL's staff and thus computer assistance was clearly a logical and necessary arrangement. In 1964 a computer group was formed in NAL; it became active in developing systems to prepare voluminous indexes for the Bibliography of Agriculture, the complete Pesticides Documentation Bulletin, and the categorical and alphabetical issues of the Agricultural/Biological Vocabulary. During 1969 these systems were consolidated and expanded so as to process all input data within one coordinated set of parameters.
In January 1970 the new Cataloging and Indexing (CAIN) System was implemented.

SYSTEM DESIGN

CAIN is a complex and comprehensive computer system which has been engineered to handle up to five simultaneous but separate users who share the same controlled authority files. The basic precept in development of computer applications at NAL is to make input and output simple and convenient for the users, with the computer assuming as much detail and data manipulation as is technically feasible. At NAL the current users providing input data are the New Book Section, Cataloging, Indexing, and Agricultural Economics. Operating in parallel, CAIN also services the herbicides data base of the Agricultural Research Service; the International Tree Disease data base of the Forest Service; and in 1971 will be installed in the Library of the Technion-Israel Institute of Technology in Haifa, Israel. The master data record is variable in length with a fixed portion of 173 characters and up to fifty-seven additional segments of 65 characters each. The fixed portion includes basic data plus a directory of data contained in the variable portion. Data elements in CAIN are:

a. File code: delineates the various files.
b. Identification number: on cataloged items this embodies the accession number. All identification numbers include the year of accession, a parallel run code plus a unique control number.
c. Source code.
d. User codes: specific identification of up to five users.
e. English indicator: language of text.
f. Translation code: availability of an English translation.
g. Language, if other than English.
h. Proprietary restrictor: identifies classified records.
i. Title tracing indicator: for catalog cards.
j. Main entry: designates main entry if not normal sequence.
k. Document type: whether journal article, monograph, serial, etc.
l. Filing location: if other than in the library stacks.
m. Categories (two): general area of coverage of subject matter.
Cataloging and Indexing System/VAN DYKE and AYER 23

n. New book description: if the title is not sufficiently explanatory.
o. Titles (three types): (1) vernacular or short, (2) alternate or holdings, and (3) translated title (English).
p. Personal authors: up to 10; names plus identifying data.
q. Corporate authors: maximum of two.
r. Major personal author affiliation.
s. Abbreviated journal title if item is a journal article; imprint if monographs and serials.
t. Collation/pagination.
u. Dates (two): search date, and date on publication if different.
v. Call number.
w. Subject terms: may be nested; up to 45.
x. General notes.
y. Special purpose numbers: patent, grant, analysis, contract, technical, or report.
z. Series statement.
aa. Abstract/extract.
bb. Tracings not otherwise normally generated by the system.
cc. Nonvocabulary cross-references.

The total number of individual elements is limited only by the maximum record size. The NAL-produced software is written in COBOL. The data base is maintained on tape which is nine-track, 800 bpi, blocked 2, in EBCDIC, with standard IBM 360 header and trailer labels. The total system presently consists of forty programs, some of which are multipass. In addition, throughput is sorted twenty-five times during the full computer run. These, of course, include the search and retrieval programs and sorts which are run only on request. The ultimate system which NAL is working toward, and for which the basic design is already substantially complete, is an on-line full library document locator and control system which may be linked via dial-up service to an international and national science and technology information network. Each portion of CAIN is developed with the broader picture in mind. It was this factor which weighed heavily in selecting cathode ray tube (CRT) terminals for the proposed data gathering subsystem inasmuch as CRT's will be the predominant type of terminal in the future network.
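The master record layout just described (a 173-character fixed portion plus up to fifty-seven 65-character segments) can be sketched as follows. The actual software is COBOL; this Python rendering and its field names are ours, for illustration only.

```python
from dataclasses import dataclass, field

FIXED_LEN = 173   # fixed portion, including the directory
SEG_LEN = 65      # each variable-portion segment
MAX_SEGS = 57     # at most fifty-seven segments per record

@dataclass
class CainRecord:
    """Illustrative layout of a CAIN master data record."""
    fixed: str                      # 173-character fixed portion
    segments: list = field(default_factory=list)

    def add_segment(self, data: str):
        if len(self.segments) >= MAX_SEGS:
            raise ValueError("record full: at most 57 segments")
        # pad or truncate to the fixed segment length
        self.segments.append(data.ljust(SEG_LEN)[:SEG_LEN])

    def __len__(self):
        return FIXED_LEN + SEG_LEN * len(self.segments)
```

A record carrying one variable segment thus occupies 173 + 65 = 238 characters on tape.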
For convenience in discussion, the system will be described by its subsystems: data gathering, edit and update, publication, search, and controlled authorities.

DATA GATHERING SUBSYSTEM

From its inception the input to CAIN was in the form of punched cards, a method which has proved to be slow and error prone. In order to eliminate double keyboarding and excessive time lag, as well as to reduce the error rates, it was decided to perform this input function in the library with trained library personnel. To accomplish this, NAL proposes to implement an "on-line" type of input subsystem using CRT's. Although this form of entry is not yet in use, the subsystem should operate substantially as follows. The documents are to be marked by catalogers and indexers and passed to library technicians who will enter the data through CRT's into an on-line storage file. To do this, the technician will call from the hardware prestored formats as desired and fill in the data elements required. These formats use English terms and for the most part call for data rather than codes. In addition, data are to be entered in normal upper- and lowercase without diacritics, thus improving visual scanning for errors. An average of four formats will be needed to enter one item. By use of an algorithm, the system would store formatted records for each ID in such a manner as to permit recall singly or collectively. The physical documents are then to be passed on to an editor who can recall any or all formatted records for review. With the document in hand, stored records will be reviewed and corrected if necessary. When acceptable, the records will then be transmitted to magnetic tape. Variations on this procedure could include input direct to tape, storage to tape without recall to a CRT by an editor, cancellation of actions, and a direct purge of the entire storage file without loss of the controlling matrix.
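The proposed entry-review-release cycle can be sketched as a simple store keyed by item ID and format name. The class and method names are our invention; the article describes the behavior, not an interface.

```python
class InputStore:
    """Sketch of the proposed CRT data-gathering store: each item (ID) is
    entered through several prestored formats; formatted records can be
    recalled singly or collectively for editor review, then released to tape."""
    def __init__(self):
        self.records = {}            # item id -> {format name: entered data}

    def enter(self, item_id, format_name, data):
        self.records.setdefault(item_id, {})[format_name] = data

    def recall(self, item_id, format_name=None):
        # recall one formatted record, or all records for the item
        rec = self.records.get(item_id, {})
        return rec if format_name is None else rec.get(format_name)

    def release_to_tape(self, item_id):
        # once approved, the records are transmitted and leave on-line storage
        return self.records.pop(item_id, None)
```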
The expertise of the library technicians inputting the data should insure far more accuracy than could be expected from multihandling and multikeyboarding. In addition the system has been designed to accomplish basic pre-CAIN editing of such factors as numeric or alphabetic characters in certain fields and overall lengths of the fields. Errors in these categories will be promptly identified by the computer by a blinking feature on the CRT screen. Another major benefit of this direct approach is that documents can be processed through the system so as to reach the stacks twenty-four days faster than under the current keypunch method. Magnetic tapes created by the data gathering system will be periodically converted from ASCII to EBCDIC and processed into the edit and update subsystem of CAIN. The present NAL time schedule for updating master CAIN files is weekly. This is not a requirement of the system but an administrative decision based on other deadlines. The data gathering system as prescribed by NAL will be composed of sixteen CRT's, a large on-line storage file, and one nine-track 800 bpi magnetic tape drive. This configuration will be either a hard-wired "black-box" approach, or controlled by a dedicated mini-computer. The hardware prescribed for this subsystem is not included as a requirement of CAIN inasmuch as transactions can be entered on 80-column cards if desired. An additional feature of this subsystem will be the generation of management information feedback. This will encourage elimination of manual counts and provide accurate throughput volume statistics on a timely basis. Through this means the supervisor will be in a better position to evaluate workload, individual performance, and hardware utilization.

EDIT AND UPDATE SUBSYSTEM

The first step in the acceptance of transactions is a thorough validation of each data element.
The computer is used to relieve librarians of the voluminous and time-consuming edit of many individual elements having predetermined limits. Thus, only a cursory review of the proof-listed records is necessary by a librarian before acceptance. The system cannot detect, of course, logical or typographical errors, but it can determine the absence of necessary information, codes in invalid ranges, and the incorrect placement of data. Elements for which the system supplies authority files are not only verified against the file but also additional transactions are generated from the authority file to assure uniformity in output. This also eliminates the necessity for librarians having to enter those elements which have a direct predictable relationship to another element. Further validations are performed at the point of building new records or updating records already in the master file. The two "master" files are (1) the temporary set of unselected records and (2) the permanent set of those records which have been approved and selected for publication in some form. Data elements specified as required within each record are reviewed. If one or more is missing, the system refuses to approve this record, and a notice is produced concerning this reversal of human input. Fields can be deleted, in whole or in part, replaced or added. Three types of output from this subsystem are:

• New updated master files. Those which have been added or altered during this update run are proof-listed for cursory review by a team of professional librarians. Corrections and/or approvals are submitted in a subsequent update run.
• Activity notices. Every action, whether submitted by the user or system-generated, which has been accepted for processing is reported.
• Error notices. All error and warning messages from this subsystem are compiled into one listing.
This includes errors on individual ele- ments, system-discovered errors of omission, and warnings of computer overriding of submitted actions. Through the use of control cards various handling options are possible. One of these is proof-listing of a specific range or ranges of masters by identification numbers or dates. Subject headings are assigned by professional librarians for monographs and new serial titles. For journal articles, however, the system analyzes the title of the article and creates subject index terms, using single words, 26 Journal of Library Automation Vol. 5/1 March, 1972 combinations of two words not separated by stop words, and singular and plural variations. The generated terms are then processed against the con- trolled authority file. Those accepted as valid are inserted in the record for searching purposes. PUBLICATION AND DISTRIBUTION SUBSYSTEM Each data element of a bibliographic item is captured only once and at the earliest possible time in the receipt process. Master records which have successfully passed the edit and update phase become candidates for various types of publications and other user services. Six major modes of publication products are produced by CAIN, at various times and in a variety of both formats and media. Preliminary to the production of formal output there is a screening for records designated as fully acceptable by the edit and update sub- system. As mentioned above, any record may be identified as being ap- plicable to any combination of from one to five users. By a method of control cards the system is informed as to which users are scheduled for publication/ distribution, and the maximum quantity to be selected in each case. This subsystem reviews each record to ascertain its appropriateness for selection. Records meeting the criteria are siphoned off for individual handling. No record is dropped from the temporary file until it has been selected by all applicable users. 
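The title-derived index terms described for journal articles in the edit and update subsystem — single words, adjacent two-word combinations not separated by stop words, and singular/plural variations — can be sketched as follows. The stop-word list and the naive plural rule are our assumptions; the article does not give either.

```python
STOP_WORDS = {"of", "the", "and", "in", "on", "for", "a", "an", "to"}  # illustrative

def candidate_terms(title):
    """Generate candidate subject index terms from an article title,
    to be validated against the controlled authority file."""
    words = [w.strip(".,;:()").lower() for w in title.split()]
    content = [w for w in words if w and w not in STOP_WORDS]
    terms = set(content)                       # single words
    # combinations of two adjacent words with no stop word between them
    for a, b in zip(words, words[1:]):
        if a not in STOP_WORDS and b not in STOP_WORDS:
            terms.add(f"{a} {b}")
    # naive singular/plural variations (assumption: simple -s rule)
    for w in list(terms):
        if " " not in w:
            terms.add(w[:-1] if w.endswith("s") else w + "s")
    return terms
```

In CAIN, only the generated terms that match the controlled vocabulary would then be inserted in the record for searching.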
A New Book Shelf listing may be printed on photocopy paper on request. On preparation, it is ready to be matted, photographed, printed, and distributed throughout the Department of Agriculture. Only enough new book entries are selected by the computer at one time as will fit on three sheets of a four-page publication. Approved cataloged records are selected weekly. Each record is analyzed for applicability to any or all of the eight major files for which catalog cards are prepared. Each card file has its own criteria both in content and in the number and types of cards produced for it. The system produces a separate record for each card required, sorts together the records for each file, and alphabetizes within that file. Leading articles (regardless of language) are printed but are excluded in the sorting procedure. Cards are printed two-up in upper- and lowercase in the format prescribed by Anglo-American cataloging rules. After printing, the cards are distributed to the appropriate organizations and sections where they may be filed with a minimum of additional effort. Monthly, a book catalog is compiled. This contains not only a listing by main entry but also indexes of personal authors, corporate authors, subjects, and titles. A biographic index (major personal author affiliation) capability is available although not presently used by NAL in the book catalog. This catalog is printed in varying numbers of columns changeable by control card option for each index. Again photocopy paper is used with a standard upper- and lowercase (TN) print train. An alternate option is magnetic tape output formatted for direct input to a computer-driven LINOTRON. See bibliographic description for more detail. Semiannually the index portions of the book catalog are cumulated. Main entry listings are not repeated. Multiyear accumulations may also be produced.
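The card-alphabetization rule above — leading articles are printed but excluded from sorting, regardless of language — amounts to filing by a derived sort key. A sketch in Python; the article list here is purely illustrative, as the text does not enumerate which articles CAIN recognizes.

```python
# Illustrative multilingual article list (our assumption, not NAL's table)
LEADING_ARTICLES = {"the", "a", "an", "der", "die", "das", "le", "la", "les", "el"}

def filing_key(heading):
    """Sort key for card filing: the leading article, in any language,
    stays in the printed heading but is ignored in alphabetization."""
    words = heading.split()
    if words and words[0].lower() in LEADING_ARTICLES:
        words = words[1:]
    return " ".join(words).upper()
```

Sorting headings with `key=filing_key` files "The Zebra" under Z while still printing its article.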
The book catalogs are presently being published from photocopy printout by Rowman and Littlefield, Inc., New York. Bibliographies, either scheduled or special, can be produced with the same indexes as those in the book catalog. These are normally prepared for printing via the LINOTRON. This magnetic tape record contains all formatting requirements with the exception of word divisions. Document title, page, and columnar (subject category) headers are provided by NAL. Running headers are inserted by the LINOTRON. Through predetermined codes, the CAIN tape specifies the print style, print size, and print format. Bibliographies may also be computer printed on photocopy paper similar to the book catalog. Once a month, each record selected for publication is processed through a merge and adjustment program. At this point published records not previously on the permanent master file are added to it. Those which are already on it are compared, and the resident record is adjusted to include the new user for whom the record has just been published. The term field is also verified and updated if necessary. Each term is also used to generate posting records for the subject authority file. The permanent (published) CAIN data base is available on magnetic tape in either the master format or a print format of the linear proof (a listing of each data element). Only records not previously published are added to the monthly sale tapes. These tapes may be ordered individually (new monthly selections) or collectively (whole file) at the cost of reproduction only. The tape is nine-track, 800 bpi, EBCDIC, with standard IBM 360 header and trailer labels. One of the purchasers of CAIN tape is the CCM Information Corporation of New York, which has published the Bibliography of Agriculture from it starting in 1970. Current purchasers include private corporations and universities, both in the United States and abroad.
The last type of output is normal computer printout of numerous internal reports in a variety of customized formats.

SEARCH SUBSYSTEM

The search capability of the CAIN system is not being used by NAL on its own data base at the present time. It is utilized, however, by other organizations who run the CAIN system on a parallel basis, maintaining their own data bases. The following description, therefore, pertains to the programmed system rather than to its use on the NAL data base. This subsystem permits identification and retrieval of records in CAIN format based on search statements as applied to almost every data element or combinations thereof. Such searches may use simple statements or a complex series of nested Boolean parameters. Questions may also be absolute or weighted to give more precise results. The weight factors, if used, are normally assigned to each statement within a search question, with a threshold weight assigned to the overall question. The total weight of all true statements must be equal to or greater than the threshold weight for the full query in order to be considered as meeting the search criteria. If such is not the case, the record will not be selected. Since CAIN uses a controlled vocabulary, query statements on subject terms are first matched against that authority file. At this point each invalid (USE) term is replaced by a corresponding valid (UF) term if appropriate. In addition, if the query statement so specifies, the requested terms may be expanded one level in the hierarchy. In other words, it could generate additional statements requesting all broader, narrower, or related terms as specified if such structure were present for the subject within the vocabulary. Because subject terms comprise the largest percentage of all search elements, an algorithm was developed whereby queries on this type of element are first processed against an inverted file.
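The weighted-threshold selection and the inverted-file prefilter just described can be sketched together. Representing a search question as (term, weight) pairs is our simplification; the real system supports nested Boolean statements over many data elements.

```python
def evaluate_query(record_terms, statements, threshold):
    """A record is selected only if the weights of its true statements
    sum to at least the threshold weight assigned to the question."""
    score = sum(w for term, w in statements if term in record_terms)
    return score >= threshold

def candidates(inverted, statements):
    """Prefilter via the inverted file: collect the identification numbers
    posted under any queried term, so only those records need the full
    weighted evaluation rather than the whole serial file."""
    ids = set()
    for term, _ in statements:
        ids |= inverted.get(term, set())
    return ids
```

For example, with statements [("horses", 2), ("diseases", 1)] and a threshold of 3, only a record carrying both terms meets the criteria.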
Identification numbers are extracted for all terms matching the query, and only those candidate records are searched using the full query. On a serial file such as CAIN, this concept provides a substantial savings in computer run time.

The print options of retrieval output allow either for normal sequence by identification number or for a specific sequence as requested by the originator. The printout may contain all data elements or only those selected, all others being suppressed. At the present time this subsystem is used infrequently by NAL, and only for internal high-priority searches, due to the extremely limited subject indexing terms present. It is used more extensively on the parallel operation established for the International Tree Disease Register maintained for the U.S. Forest Service.

AUTHORITY FILES SUBSYSTEM

This subsystem updates, generates, expands, and maintains three types of authority files: subject terms with associated hierarchy, call numbers of indexed journals with abbreviated titles, and a subject term inverted file carrying the identification number of each record using that term. Each transaction to add, change, or delete any data is both edited and reversed before entering the updating sequence. Thus an addition of a narrower term (for example, HORSE) to a base term (for example, ANIMAL) will automatically generate another transaction to add the broader term ANIMAL to a base term (new or existing) of HORSE. This precludes having to enter both sides of an action manually, as well as assuring reciprocity of entries. Due to the flexibility of the search subsystem of CAIN, this hierarchical continuity is of great importance. If an item is changed, the same procedure is followed. In the instance of deletion, a broader precept is involved.
In this case, the term is deleted from all entries in other hierarchies but is itself left on the authority file and marked as no longer valid. It is thus available for search purposes but is not allowed to be used on subsequent CAIN data records.

During a normal CAIN data run, each call number or subject term in a record is verified against the appropriate file. Each element on these files is carried in two forms: one in stripped uppercase, and the other in preferred print form. When an incoming term is found on the authority file, the system substitutes the proper form. This includes substituting a valid term for an invalid term, as in the "use-use for" relationship, as well as generating the appropriate abbreviated journal title for a given call number.

In order to keep the authority file up to date, the transactions generated by the publication subsystem are now used to insert the record identification number into the inverted file as well as to increase the number of postings per term. This assists search specialists in formulating queries in the manner which will reduce computer processing time to the greatest degree. When published, the authority files themselves can be printed in a special format which displays the entire hierarchy of each term. In addition, up to ten levels of increasingly narrower terms can be listed for each term.

SUMMARY

CAIN is a broad-based, comprehensive, batch-mode system which meets many library requirements. Its flexibility is apparent from the fact that it has already been expanded to select each newly cataloged serial record for transmission in MARC II communication format to the National Serials data bank being created by the three national libraries. Still more capabilities will undoubtedly be built into it before the NAL ultimate on-line system is implemented.
The major thrust of the systems design has been to concentrate on simplifying the user interface while imposing stringent and extensive service requirements on the computer system itself. Due to its inherent fluidity, CAIN is being retained as an in-house system. It is so complex that a single change in one subsystem may have radial effects in any or all of the other portions. Continuing efforts are underway to simplify input, accelerate throughput, and expand its already generous services, both to the staff of the National Agricultural Library and to those organizations utilizing output from the CAIN system.

Journal of Library Automation Vol. 5/1 March, 1972

CIRCULATION CONTROL: OFF-LINE, ON-LINE, OR HYBRID

Michael K. BUCKLAND, Bernard GALLIVAN: Library Research Unit, University of Lancaster, England

The requirements of a computer-aided circulation system are described. The characteristics of off-line systems are reviewed in the light of these requirements. On-line systems are then reviewed and their economic viability queried. A "hybrid" system (involving a dedicated mini-computer in the library, used in conjunction with a larger machine) appears to be more cost-effective than conventional on-line working.

INTRODUCTION

An important feature of a very small library is the close contact between the librarian, his collections, and his users. Over the years, as collections and library usage have both increased enormously, librarians have gradually been losing this important "contact." The trend toward increased book use is a desirable one, but the sheer pressure of transactions has necessitated the adoption of manual and photographic circulation control systems which concentrate on a restricted range of information about borrowing, notably when a book is due back and who has a given book. Computer-based circulation systems offer the prospect of regaining detailed knowledge of book usage, at a price.
This paper reviews three approaches. The desirable features of a circulation control system are that:

1. It should "marry" borrower, book, and date information together rapidly and accurately.
2. It should enable rapid, easy consultation of the issue files at any time in order to detect the location of any book.
3. It should be able to detect and register immediately the fact that an item just returned from loan has been requested by another reader. This ability should not be dependent on whether or not the person returning the book has also remembered to bring in a recall notice.
4. It should prepare suitable overdue notices for books retained too long.
5. It should be possible to produce lists of items out on loan to any given borrower and also to signal "overborrowing" (i.e., having an excessive number of books out on loan at any given time).
6. It should be able to detect delinquent borrowers at the point of issue.
7. When material is returned from loan, the system should amend the circulation records promptly and permit the calculation of any fine.
8. It should facilitate the collection, analysis, and presentation of the "management information" needed to maintain effective stock control and high standards of service.
9. It should perform these tasks reliably and economically.

These requirements vary in importance from library to library but, with some differences in emphasis, they appear to apply equally to both public and university libraries.

OFF-LINE

The commonest approach to computer-aided circulation control is to operate in the off-line mode. Well-documented examples are the IBM 357's (Southern Illinois) (2), Southampton University library (phase II), using Friden equipment (3), and the current Automated Library Systems (ALS) Ltd. equipment.
(4) These systems can perform the basic operations of issuing and discharging books in an economical manner, but because they are operating in an off-line manner they experience difficulties in maintaining an up-to-date overview of their collections and in detecting reservations. They cannot detect delinquent readers at the point of issue.

[Figure 1 (not reproduced): input via cards, badges, and dials to IBM 357, Friden Collectadata 30, and ALS data collection units in the library; an optional trapping store (ALS only) is shown in dashed lines feeding the receiver.]

In order to solve some of these problems, ALS has been developing an off-line system with a certain amount of storage attached to it. This "trapping store," indicated by dashed lines in Figure 1, can contain the numbers of reserved books and delinquent readers to facilitate immediate identification at the point of issue. This system has proved to be quite popular, and at least fifteen have been installed in university and public libraries. It is still not able to provide any better currency of information than is possible with a basic off-line system, and the ALS system will handle only numeric information. Books are identified by number only, so that if one receives an overdue reminder, it is because books 341672, 816649, and 654321 are overdue, unless there is a substantial matching operation against a complete catalog file. In contrast, a system using alphanumeric characters could include brief author, title, and call-number information on, say, an 80-column book card. This would permit the production of lists by author, etc., without reference to a complete catalog file. It may be noted by reference to Table 1 that an outstanding virtue of the ALS system is the low cost of installing additional data collection points.
A notable gap in library automation is the apparent lack of a simple, inexpensive data collection unit capable of reading alphanumeric book cards. If relatively expensive equipment is used (e.g., IBM 357 or Friden Collectadata), there may be difficulties in coping economically with the inherent peakiness of library borrowing.

Table 1. Off-Line Circulation System Costs

                                IBM 357    Friden Collectadata 30    ALS
                                           (6-reader system)
Basic two transmitter and
  receiver system               $13,000    $14,000                   $10,000
Maintenance (p.a.)                  655        900                       800
Trapping store                        -          -                    11,000
Total                           $13,655    $14,900                   $21,800

Notes: 1. The specifications are intended to represent two service points. Since ALS equipment uses separate card readers for borrowing and return, the provision for two borrowing points and two return points would, in fact, have a higher traffic capability than the other two systems. 2. Figures represent British prices expressed as U.S. dollars at $2.40 = £1. 3. Collectadata 30 hardware is at 1967 price. 4. Approximate cost of each ALS reader is $500.

Attempts to mould library use to suit the machinery are unlikely to prove satisfactory. Notably, a general lengthening of loan periods will result in a lower standard of library service in terms of immediate availability (1) and, almost certainly, a decrease in actual book usage. In management information terms, the symptoms of this would be an increase in the size of the issue file compared with the borrowing rate and a drop in issues per capita.

ON-LINE

Since the deficiencies of off-line working are serious, various attempts have been made to develop on-line circulation systems (see Figure 2). This is the second main formula, of which Illinois State Library (5), Queens University, Belfast (6), and Midwestern University library (7) provide well-documented examples. Such a system is able to maintain a completely up-to-date picture of the issue files.
They can detect both reserved material and delinquent readers immediately, and appear to provide the complete answer to the library's needs until their technical requirements are examined. In order to control the circulation system in an on-line manner, the library expects there to be at least ten hours of on-line working available to it each working day. As more than one university library has already discovered, this number of hours of on-line working is very rarely available at present when computer facilities are being shared with many other users, as in a university environment. Furthermore, it is unlikely to become available for quite some years in the future because, with present machines and techniques, on-line working is an inefficient mode of operation unless the computer system is running well below capacity.

[Figure 2. On-Line Circulation Control (not reproduced): input-output data collection units feeding a multiplexer connected to an on-line computer with dedicated storage in the computer unit.]

A further obstacle when sharing facilities is the amount of dedicated storage that must be made available to the library. Storage is a much-prized commodity, and computer centers are unwilling to forfeit valuable storage for any length of time. It should also be noted that no average-sized library will be able to afford or justify possession of its own dedicated computer adequate for on-line working. A library's requirements for storage, printing facilities, and so on would make such an independent system an extravagance: its power would have to be considerable to handle the vast quantities of data input to it, yet it would constitute a grossly under-utilized investment compared with sharing the facilities provided in a university or local government computing center. The data collection units could be teleprinters or card reading stations with some printing or display facilities.
The number and type of such DCU's will depend on the local work load, but we will consider a system using two alphanumeric card reading stations with printout facilities, plus an interrogating printer. An interface into the main computer and a multiplexing device will also be required. In order to answer queries and to control the circulation in a completely on-line manner, the dedicated disk must be large enough to hold the issue file; and having gone to the expense of controlling the issue on-line, it would seem inconsistent to be satisfied with a number-only system. If we plan for an expected maximum of 50,000 records in our issue file at any one time and allow 100 characters per record (i.e., author/title, class or call number, borrower number, date due back, a code to describe the type of loan, i.e., long or short, etc.), the disk must be capable of storing 5 million characters. Since it is usual to store the bulk of the circulation control programs on the same disk and to allow certain parts of the disk to be used as work areas, a total store area of at least 6 million characters will be required. The cost of providing adequate dedicated disk storage will depend on the local situation but could well be anything between $30,000 and $50,000 to purchase. The remaining equipment is likely to cost $20,000, and development costs will be greater than with off-line.

A "HYBRID" CIRCULATION CONTROL SYSTEM

It is possible to meet all the requirements of a library circulation system in a cost-effective manner by exploiting and combining the main advantages of on-line and off-line working in a hybrid system. The basic structure of such a system is shown in Figure 3. As can be seen, the mini-computer is sited in the library building and has the various data collection terminals attached locally to it.
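The on-line disk sizing worked through above can be restated as simple arithmetic; the figures are the article's own, including the 1-million-character allowance for programs and work areas.

```python
# Disk sizing for a fully on-line issue file, per the figures in the text.
max_records = 50_000           # expected maximum issue-file size
chars_per_record = 100         # author/title, call number, borrower, dates, codes
issue_file = max_records * chars_per_record   # 5,000,000 characters

# Programs and work areas customarily share the same disk.
programs_and_work = 1_000_000
total_store = issue_file + programs_and_work  # at least 6,000,000 characters
```

At the quoted 1971 prices, providing that 6-million-character dedicated store is what drives the on-line system toward the $70,000 total cited in the conclusion.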
The mini-computer is also connected via a line to the main computer, to and from which it can send and receive data. The important differences between this system and the conventional on- or off-line systems are that:

1. The mini-computer spools up the transactions as they occur into its own storage (either tape or disk).
2. The on-line link to the main computer is only used two or three times each day. This is important, since it implies that the hybrid system does not require continuous on-line facilities.
3. Supplementary or full listings show the state of the issue file at a particular time.
4. The recent transactions stored by the mini-computer can be interrogated and, in conjunction with the listings, yield the immediately current state of the issue file.
5. Reserved books and delinquent readers have their identifiers stored in the core of the mini-computer to enable their immediate identification at the point of issue.
6. The necessity for dedicated equipment on the main computer (such as a dedicated disk) is avoided.

A fairly heavily used library will be handling approximately 5,000 transactions each day. Since these transactions will, in the main, be either issue or return transactions, if we allow 100 characters of information to identify an issue and 20 characters to identify a return, then on the average we will be handling 300,000 characters of information each day. In the hybrid system the mini-computer acts as a controller to the data collection devices and spools this information onto a magnetic tape until the storage space is becoming full, or until a sufficient time has elapsed since the last updating of the records. At this time the mini-computer passes the information on its magnetic tape to the main computer via the on-line link. The duration of the on-line connection might be ten to fifteen minutes owing to line speed limitations. In order
to operate a hybrid system, the library would need two periods of on-line working each day of approximately ten to fifteen minutes each. Alternatively, the magnetic tape could be physically replaced, the fresh one continuing to record transactions while the full one is carried manually to the main computer. Provided that the tapes can be read by the main computer, on-line facilities would not be required. The recent transactions, having been passed to the main computer, will be sorted and merged with the rest of the issue file, which would be kept on magnetic tape. The precise nature of the listings produced by the main computer will depend on local factors, such as the duration of the loan period, but could be either a fully revised complete listing or a listing of the most recent transactions to supplement an earlier complete listing.

Hybrid Computer Costs

Basic computer (includes teleprinter) ------------ $ 8,650
Extra 4k of store -------------------------------- $ 3,600
Tape controller ---------------------------------- $ 7,200
Dual tape transport ------------------------------ $ 5,700
Data break interface ----------------------------- $   600
D.C.U.'s @ $3,100 (2 off) ------------------------ $ 6,200
Interface D.C.U.-mini-computer ------------------- $ 2,000
Interface mini-main computer --------------------- $ 1,200
Total -------------------------------------------- $35,150

It is worth noting that the most widely adopted computer-aided circulation system in Great Britain is the ALS system, which, if purchased with a "trapping store," is the nearest equivalent to the hybrid system outlined in Figure 3. The chief differences are that (1) the ALS system operates on numbers only, which, in our view, makes it less suitable for university library applications; and (2) the "trapping store" is inflexible in its capability when compared with a mini-computer of similar cost.

In order to utilize to the full the on-line facilities provided by a mini-computer, it would be possible to handle the "short loan" reserve collections of popular texts (commonly borrowable for a few hours only) in a completely on-line manner. In this respect there would be no reliance on the main machine. It might also be appropriate to use the mini-computer to handle other library data processing tasks.

[Figure 3. Simplified Hypothetical "Hybrid" Circulation Control System (not reproduced): library terminals and a console typewriter or VDU attached to a local mini-computer, linked to the main computer, which produces author lists, call number lists, notices, etc.]

Last year at the University of Lancaster the average cost per issue was 12.72 cents. Since the University of Lancaster is a new university in Great Britain, it is in the middle of a period of growth, and student numbers are expected to increase from 3,000 to 5,400 in the next five years. This university has also researched the influence of duplication and loan period adjustment on the availability of stock to prospective users; with the present level of duplication and grades of loan period, there is a per capita borrowing rate approaching 80 issues per annum.
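The daily-volume figures quoted for the hybrid system can be checked with a little arithmetic. The even split between issues and returns and the 2,400 bit/s line speed below are our assumptions (the article mentions only "line speed limitations"); the line speed was chosen because it roughly reproduces the ten-to-fifteen-minute transfer sessions quoted.

```python
# Daily spooled volume for the hybrid system, per the article's figures,
# ASSUMING transactions split evenly between issues and returns.
transactions_per_day = 5_000
issues = returns = transactions_per_day // 2
chars_per_issue, chars_per_return = 100, 20
daily_chars = issues * chars_per_issue + returns * chars_per_return  # 300,000

# Transfer time over the on-line link, split across two sessions per day.
# ASSUMPTION: a ~2,400 bit/s line, 8 bits per character.
line_bps = 2_400
total_seconds = daily_chars * 8 / line_bps    # 1,000 seconds per day
minutes_per_session = total_seconds / 2 / 60  # roughly 8 minutes each
```

Under these assumptions each of the two daily sessions runs around eight minutes, consistent with the ten-to-fifteen-minute connections the authors allow for.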
This figure is expected to increase in the next five years. With these figures as a basis, at least 2 million issues are expected in the next five years. Even allowing for the cost of data conversion and the amortization of hardware over five years, the use of a hybrid circulation control system could be expected to result in an average cost per issue of just under 12 cents.

CONCLUSION

The costs already mentioned can be tabulated thus:

Off-line: $13,000-$22,000
On-line: $70,000
Hybrid: $35,000

This suggests that a hybrid system offers complete control over library circulation in a highly cost-effective manner compared with on-line working. Whether or not a hybrid system is also to be preferred to off-line working will depend on the individual library context. The trade-off between the marginal advantages and the marginal increases in cost and complexity will depend on the detailed costs and value judgments specific to each situation. If our diagnosis is correct, then most attempts to progress from off-line to on-line working are ill-judged and would appear to have no justification in cost-effectiveness. In our view these developments are unlikely to become fully operational. If they do, their life will probably be short or restricted to limited hours unless exceptional circumstances prevail. Such circumstances would include continuing subsidies for research and development or the existence of a system justified on other grounds (e.g., police records).

REFERENCES

1. Michael K. Buckland and others, Systems Analysis of a University Library; Final Report on a Research Project, University of Lancaster Library Occasional Papers, 4 (Lancaster, England: University of Lancaster Library, 1970).
2. R. E. McCoy, "Computerised Circulation Work: A Case Study of the 357 Data Collection System," Library Resources & Technical Services 9:59-65 (Winter 1965).
3. B. A. J. McDowell and C. M. Phillips, Circulation Control System. Automation Project Report No.
1 (SOUL/APR1) (Southampton, England: University of Southampton Library, 1970).
4. Lorna M. Cowburn, "University of Surrey Automated Issue System," Program 2:70-88 (May 1971).
5. Robert E. Hamilton, "The Illinois State Library 'On-Line' Circulation Control System," in Dewey E. Carroll, ed., Proceedings of the 1968 Clinic on Library Applications of Data Processing, University of Illinois Graduate School of Library Science, Urbana, Illinois (London: Bingley, 1969), p.11-28.
6. Richard T. Kimber, "An Operational Computerised Circulation System with On-Line Interrogative Capability," Program 2:75-80 (Oct. 1968).
7. Calvin J. Boyer and J. Frost, "On-Line Circulation Control: Midwestern University Library's System Using an IBM 1401 Computer in a 'Time-Sharing' Mode," in Dewey E. Carroll, ed., Proceedings of the 1969 Clinic on Library Applications of Data Processing, University of Illinois Graduate School of Library Science, Urbana, Illinois (London: Bingley, 1970), p.135-45.

SELECTIVE DISSEMINATION OF MARC: A USER EVALUATION

Lorne R. BUHR: Murray Memorial Library, University of Saskatchewan, Saskatoon, Saskatchewan

After outlining the terms of reference of an investigation of user reaction to the selective dissemination of MARC records, a summary of the types of users is given. User response is analyzed and interpreted in the light of recent developments at the Library of Congress. Implications for the future of SDI of MARC in a university setting conclude the paper.

INTRODUCTION

F. W. Lancaster (1968), in his detailed study of MEDLARS, makes the following statement, which has application to all SDI work: "In order to survive, a system must monitor itself, evaluate its performance, and upgrade it wherever possible." (1) Since SELDOM operates in a fairly new field, SDI for current monographs, an evaluation is most important.
To a great extent it must be made without reference to other systems, since most of the operational SDI services deal with tape services in various fields of scientific journals; although there are some parallels, there are numerous differences. Whereas services such as CAN/SDI cater primarily to the natural and applied sciences, SELDOM opens up the possibilities for SDI in the humanities and social sciences.

The background to the SELDOM Project at the University of Saskatchewan has been outlined earlier by Smith and Mauerhoff (1971) and will not be repeated here. (2) After five months of operation a major questionnaire was sent out to each of 121 participants in the experimental SELDOM service. This questionnaire was based almost entirely on the one used by Studer (1968) in his dissertation at Indiana State University. (3) The general purpose of the study was to elicit user reaction to SELDOM: their evaluation of its usefulness, time necessary to scan the weekly output, suggestions regarding continuance of the service, etc. Besides this general purpose, the gathering and analyzing of data on SELDOM will be useful to the library administration in determining the future of an SDI service of this nature. A separate cost study is being prepared in this connection.

(Material appearing in this paper was originally presented at the third annual meeting of the American Society for Information Science, Western Canada Chapter, Banff, Alberta, October 4, 1971.)

Several factors prompt a cautionary stance in assessing the value of an SDI system on the basis of one questionnaire: (1) There is no control situation to which we can compare SELDOM; i.e., there was no systematic service for current awareness in the field prior to the advent of SELDOM. Faculty and researchers were dependent on their ingenuity to ferret out information on new books which were pertinent to their field of research and instruction.
SELDOM is therefore being compared to a conglomeration of ad hoc methods which may be as numerous as the individuals using them. Therefore, we must be cautious, or we will tend to say, "Something in the field of current awareness is better than nothing," when we really do not know what that "nothing" is. (2) Although SELDOM had been operational for some twenty weeks when evaluation began, this is a relatively short period on which to base an assessment. On the other hand, Studer's evaluation was based on the experiences of thirty-nine users and covered only eight weekly runs against the MARC tapes, scheduled on an every-other-week basis. (3) SELDOM was implemented without any study to determine the adequacy of the ad hoc approaches to which I have already referred, or to assess the patterns of recommendation for purchase. It was assumed that there was a need for SELDOM, and some of the response would indicate that this is a fairly valid assumption, since almost 90 percent of the respondents wanted the service continued. A random investigation in mid-August of 748 current orders in the acquisitions department for books with an imprint of 1969 or later revealed that ninety-five, or 12.7 percent, referred to SELDOM as the source of information for a particular recommendation to purchase. This may or may not be significant, since there is no way of assessing whether these items would have been recommended anyway, only later perhaps. One by-product of orders based on SELDOM information is that correct LC and ISBN numbers are given, and with the capabilities of the TESA-1 cataloging/acquisitions system such orders can be expedited more quickly and can also be cataloged sooner than non-MARC materials, thus ostensibly getting the desired item to the requestor in less time than previously.
SELDOM is valuable in our university setting, therefore, not only as a means of awareness of new items, but also in the actual retrieval of the item for the user, in this case through acquisition. Our analysis, however, must be directed to the effectiveness of SELDOM as an awareness service vis-a-vis the ad hoc approach.

USER GROUP

Of 121 questionnaires sent out, seventy-seven, or 63.5 percent, were returned. Six of these had to be rejected for the purposes of this study, since either only a few questions had been answered or a general letter had been sent instead of answering the questionnaire. Thus, the data presented in this study will be based on seventy-one completed questionnaires, or a 58.6 percent return. Three additional verbal comments were made to the writer, and thus we in fact heard from eighty, or 66 percent, of the users. The term "users" will designate the seventy-one who completed their questionnaires, although comments from the other nine individuals will also be referred to. The users have been grouped into three categories according to Table 1.

Table 1

I.   Library and Information Science
     A. On-campus    12
     B. Off-campus   17    29
II.  Social Sciences and Humanities
     A. On-campus    15
     B. Off-campus    2    17
III. Natural and Applied Sciences
     A. On-campus    23
     B. Off-campus    2    25

Categorization was along fairly traditional lines, with category I being necessary because of the large number of people falling into this area. The seventeen off-campus users coming under designation (I) represent the library schools in Canada as well as librarians/information scientists in Canada and the United States. The on-campus users are library department heads and heads of branch libraries. Included in the social sciences and humanities are the fields of psychology, sociology, history, economics, English, commerce, classics, etc.
The natural and applied sciences include all the health sciences plus physical education, since the two profiles in that area are tending toward the health sciences. Engineering, poultry science, physics, chemistry, biology, etc., are represented here.

OBSERVATIONS

A sample of the questionnaire used appears on p.47-50 and includes a tally of the number of responses for each possible alternative answer to each question. In some cases the total number of replies for a question is less than seventy-one. This is explained by the fact that some questions on some questionnaires were not answered, or were answered ambiguously, so they could not be tallied.

Generally speaking, users found SELDOM to be good to very good in providing SDI for new English monographs. 25.8 percent of the users found the lists very useful, while 48.5 percent said they were useful. Six users said the listings were inconsequential for their purposes; in several instances this may be due to poor profiling, or to profiling for a subject area in which little would appear on the MARC data base. 23.6 percent of the users indicated that in most cases items of interest found on the SELDOM lists were previously not known to them. 45.8 percent said that "of interest" items were frequently new. 76 percent of the group believed that the proportion of "of interest" items which were also new was satisfactory, a percentage which speaks well for the currency and effectiveness of an SDI capability.

One of the chief drawbacks for which SDI services are often cited is the absence of evaluative commentary or abstract material to accompany the citations. Some tape services do provide either an abstract or a good number of descriptors, and this has proved to be an asset in helping the subscriber. SELDOM is based on the MARC tapes, which provide complete cataloging data but give neither evaluations nor a multiplicity of descriptors.
(Some indications are that the information now available in Publishers' Weekly might at some time in the future be added to the MARC tapes.) Interestingly enough, 83.5 percent of the users said the information included in the entries was adequate to determine whether an item was of interest or not. Predictably, title, author/editor, and subject headings were the three indicators, in that order, which were found most useful in making evaluations. This is significant since titles in the humanities and some of the social sciences, particularly, are often not as specific in describing the contents of a work as are titles in the physical sciences. 63.5 percent of the users indicated that SELDOM information is used for recommending titles for acquisition by the library. As a result it is quite possible that purchasing in the areas covered by SELDOM profiles may increase, and the tendency to broaden the collection should increase. Unfortunately, no pattern of pre-SELDOM recommending for purchase is known. Some instructors use the weekly printouts to keep current bibliographies on hand both for teaching purposes and for research purposes. Since over half the users (55.8 percent) needed no more than ten minutes per week to scan the printouts, there is no indication that excessive time is taken up in the use of such an SDI service. In reply to the question, "Would you be willing to increase the number of irrelevant notices received in order to maximize the number of relevant ones?" opinions were nearly balanced, with 58 percent replying in the affirmative and 42 percent answering negatively. On the other hand, increases in the MARC data base expected some time in 1972, when other Roman alphabet language imprints and records for motion pictures and filmstrips are added, did not seem problematic, with only 25 percent of users asking that an upper limit be placed on the quantity of material retrieved by their profiles.
Numerous individuals (thirty) responded favorably to the prospect of wider language coverage by MARC. On the other hand, several individuals commented that non-English output on SELDOM would not enhance the service for them, and this likely reflects language capabilities more than a lack of non-English material in their subject areas. The question regarding format brought interesting comments, especially from library personnel and off-campus librarians: "Computer type format is often confusing." "A book designer should be consulted to improve the format." "Spacing could be improved to separate title and imprint information from subject headings and notes at foot of entry. Would make scanning easier."

Questions fourteen, nineteen, and twenty-one provide an overall summary of user reaction. 88.6 percent of users want the service to continue. Overall value of SELDOM was rated "very high" by 11.3 percent, "high" by 33.8 percent, "medium" by 42.2 percent, and "low" by 12.7 percent. SELDOM served to demonstrate the possibility of SDI for monographs "amply" according to 36.6 percent of users, "adequately" to 50.6 percent of users, and "poorly" to 12.65 percent of users. There was less certainty on how such a program should be administered or costed, particularly since a long-range cost study was not yet available. Clearly those who were impressed with SELDOM's effectiveness and future possibilities wanted other faculty to have the same opportunities, yet they cautioned against a blanket service. One comment sums this up best: "It should be available to anyone who has a perceived need for it, but require them to at least make the effort of setting up the profiles, etc." Many of the less than enthusiastic comments about SELDOM could be correlated with little or no user feedback to the search editor in order to improve relevancy and recall.
User education in this regard is crucial in order that all users fully understand the possibilities and limitations of the SDI service. According to Smith and Lynch (1971), the success of any existing SDI service in the periodical literature has hinged on a good data base and up-to-date, specific profiling. (4) The effectiveness of the profiling is a direct function of the ingenuity and persistence of the user and the profile editor.

DISCUSSION

This study has attempted to weigh the usefulness of an SDI service primarily with regard to its utility as a current awareness service. SELDOM, in order to be worthwhile, must be either faster or broader in its coverage than existing services. Two comparisons readily arise out of the commentary of the users. Some library science professors felt that the LC proofslip service was just as fast as SELDOM, and thus there was no advantage in having the latter when the former was available. A study done at the University of Chicago by Payne and McGee (1970) refutes this argument fairly effectively. (5) Findings at Chicago show that MARC is faster than the corresponding proofslips. A number of users rely heavily on publishers' blurbs and prepublication notices and find that often books for which records appear on SELDOM are already on the library shelves. This observation is not altogether an indictment of SELDOM, since another user observed that he appreciated being able to have the hard copy immediately; and in some cases he might not even have known about the item except for SELDOM. Some users mentioned that waiting for evaluative reviews could put one at least a year behind just in placing the order for the book, let alone receiving it. SELDOM has the virtue of informing individuals of the existence of new books, but the delay in having the actual item might be problematic, so one question was directed to this consideration.
Some people felt that it was at least worth something to know that a book existed, even if one could not consult it immediately. Numerous complaints were aired regarding the slowness of obtaining items ordered through a library's acquisitions department. In fact, one user said this slowness meant he had to purchase personal copies of items he wanted or needed. As indicated in the introduction, the TESA-1 acquisitions-cataloging routine at the University of Saskatchewan library does have the capability to speed up actual receipt of books by the patron.

A recent development at the Library of Congress has definite implications for the future of SELDOM and any other MARC-based SDI programs. The CIP (Cataloging in Publication) program initiated this summer means that LC will now be able to make available cataloging information, except for collation, for books about to be published, up to six weeks before publication. Such MARC records will have a special tag designating them as CIP material. Furthermore, CIP records will appear only on MARC; the number predicted is 10,000 for the first year and 30,000 by the third year, a figure which would include all American imprints. (6) MARC-Oklahoma has already surveyed the subscribers to its SDI Project to determine whether they would prefer to receive both CIP MARC records and regular MARC records, or only one of the two categories. Users preferred to receive both types of information, and appropriate changes have been made to the Oklahoma SDI programs. (7) Beginning with September, MARC CIP records will appear and present information on books thirty to forty-five days before they are published.

Several library personnel appreciated the usefulness of SELDOM as an outreach service of the university library into the academic community. They see SELDOM as a public relations tool.
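The choice MARC-Oklahoma offered its subscribers, CIP records, regular records, or both, amounts to a simple filter on the special CIP tag mentioned above. The following is a hedged sketch in Python, not any actual SDI program: the boolean `cip` field is a hypothetical stand-in for the real MARC tag, which the article does not specify.

```python
# Illustrative sketch of CIP/regular record selection per subscriber
# preference.  The 'cip' key is an assumed placeholder for the special
# MARC tag designating CIP material.

def select_records(records, preference="both"):
    """records: iterable of dicts with a boolean 'cip' key."""
    for rec in records:
        if preference == "both":
            yield rec
        elif preference == "cip" and rec["cip"]:
            yield rec
        elif preference == "regular" and not rec["cip"]:
            yield rec

# Invented sample data for the illustration.
batch = [{"title": "A", "cip": True}, {"title": "B", "cip": False}]
```

A subscriber profile would simply carry one of the three preference values alongside its search terms.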
Numerous efforts are at present being made by librarians to alert individuals to materials in their several fields of interest, and SELDOM can play an important role in providing an active dissemination of information on a systematic basis. This is the direction in which we need to move, so that our role becomes that of both a collector of information and a disseminator of information. Special librarians have been doing this kind of thing for years, and SELDOM allows for specialized service to a larger user group.

IMPLICATIONS AND CONCLUSIONS

1. An SDI service based on MARC can be helpful in building a balanced library collection, depending on the efforts of faculty and/or bibliographers in setting up their profiles and maintaining them. The article by Ayres (1971) is particularly good on this aspect. (8) The parameters of the MARC data base must constantly be kept in mind, just as the constraints of the ad hoc methods must be considered in any comparisons. Publishers' blurbs in journals have the limitation of not systematically covering all the publications in a given subject area; book reviews tend to appear too late to allow users to receive current information on new books; SELDOM corrects the first shortcoming at the expense of not having the evaluations appearing in book reviews. On the other hand, MARC tapes do represent the cataloging of books in the English language by one of the largest national libraries in the world, and thus provide a coverage which is hard to duplicate by any other single alerting service.

2. Comments, especially from users in the social sciences and humanities, indicate that an SDI system for new monographs has greater pertinence in their area than perhaps in the natural and applied sciences, simply because of the nature of research done in the two areas. A recent study by J. L. Stewart (1970) substantiates this for the field of political science.
(9) His detailed analysis of the patterns of citing in the writings appearing in a collective work in political science indicated that 75 percent of such citations were from monographs, leading him to the conclusion that "monographs provide three times as much material as do journals" in the field of political science. By contrast, journals are likely more crucial for the fields of natural and applied science, and provide the key access point for vital information.

3. SDI of MARC, most users felt, should demand a fair amount of effort on the part of users to assure that the service would obtain optimum return for money invested. A blanket service to all faculty would be wasteful, since many faculty would not have a perceived need for it and others would not use it enough if it were simply offered free to everyone. Comments tended to favor making contact through the departmental library representative and channeling weekly printouts through this individual. A cost study will help determine whether it is economically feasible to operate SELDOM in an academic setting with at least 100 users. If current subscription costs for SDI services, such as those offered by CAN/SDI of the National Science Library, Ottawa, can be maintained, and early indications are that they can, a cost of $100 per profile per year may be feasible, bringing the annual expenditure for 100 users to $10,000. A chief variable which makes effective costing difficult is the variation in the number of records appearing on each weekly tape, and this is a variable which can only be dealt with by prediction on the basis of the number of records on past tapes.

4. SELDOM has the virtue of adding a major role, the dissemination of information, to libraries which up until now have primarily operated as storers of information.

[Sample SELDOM printout entry, partially garbled in the source: 822.33 SHAKESPEARE. Fleay, Frederick Gard, 1831-1909. Shakespeare Manual. New York: AMS Press [1970]. xxiii, 312 p. 19 cm. LC card 76-130621; classification PR2895 / 822.33; followed by internal MARC control fields and a garbled ISBN.]

SELDOM Evaluation Questionnaire

1. What is your feeling about the SDI lists as a source for finding out about the existence of newly published works in your fields of interest? Would you say that the lists provided a source which was: (a) very useful 18 (b) useful 34 (c) moderately useful 12 (d) inconsequential 6

2. Do you feel that the SDI lists brought to your attention works of interest which are not generally cited by other sources that you use to learn of new publications? (a) many works 10 (b) some works 39 (c) a few works 19 (d) none 2

3. How would you characterize your feeling about the relative proportions of the items "of interest" (relevant items) and "those not of interest" (irrelevant items) included in the SDI lists? (a) the proportion of relevant items in the lists was satisfactory 57 (b) the proportion of irrelevant items in the lists was too high 13

4. It is inevitable that some "not-of-interest" items are included in the SDI lists. Was the inclusion of irrelevant notices bothersome to you? (a) yes 6 (b) no 65 REASONS:

5. On the other hand, it is possible that for any given search run, some relevant items in the file are missed. The chance of relevant items being missed can generally be minimized by certain search adjustments, but with a resulting increase in irrelevant notices. Would you be willing to increase the number of irrelevant notices received in order to maximize the number of relevant ones? (a) yes 40 (b) no 29 REASONS:

6. The SDI lists notified you of an average of ___ items per list which you judged to be "of interest." On a purely quantitative basis, would you say that this number was satisfactory, or for some reason too small or too large? (a) satisfactory 48 (b) too small 16 (c) too large 1

7.
When the input to the MARC file is increased, your SDI output would also likely increase. Do you feel that you would like to be able to set some arbitrary upper limit on the quantity of items included in each SDI list, even at the risk of missing a number of relevant items? (a) yes 17 (b) no 51 If yes, maximum number: ______ REASONS:

8. The SDI lists alerted you to a number of items which you judged to be "of interest." Would you say that "of interest" items were new to you? (a) in most cases 17 (b) frequently 33 (c) occasionally 17 (d) seldom 5

9. Do you feel that the proportion of items "of interest" which were also "new" to you was: (a) satisfactory 54 (b) too low 17

10. Would you say that, in general, information given for the entries in the SDI lists is adequate to judge whether an item is or is not of interest to you? (a) yes 58 (b) no 10

11. What elements of the entry did you most often find useful in making evaluations? (a) author/editor 38 (b) title 55 (c) publisher 9 (d) series note 4 (e) subject headings 35 (f) classification numbers 8 (g) other (please specify) 1

12. What is the primary use to which you put the SDI information? (a) recommendation for library acquisition 51 (b) personal purchase of item 12 (c) other (please specify) 15

13. If your recommendation originates the library order for a publication, it will be some time before the work is available; and even if already on order, most of the publications included in your lists were probably too new to be available from the library at the same time you received the list. Do you feel that this diminishes the value of the SDI service? (a) significantly 2 (b) somewhat [count illegible] (c) negligibly [count illegible] For what reasons?

14.
A potential value of SDI service, based on the large volume of newly published works cataloged by and for the Library of Congress, is to bring together in one list timely notices for those works in the file which correspond to your several fields of interest. Do you feel that the experimental SDI service demonstrated this capacity? (a) amply 26 (b) adequately 36 (c) poorly 9

15. Is the format of the SDI notices satisfactory? (a) yes 61 (b) no 9 If not, what format would you suggest?

16. Is the distribution schedule of once a week satisfactory? (a) yes 71 (b) no 0

17. On the average, how much time would you estimate it took to examine an SDI list? Roughly, in minutes: (a) 5: 23 (b) 5-10: 16 (c) 10: 9 (d) 10-15: 11 (e) 15: 5 (f) 15-20: 1 (g) 20: 5

18. A possible by-product of this SDI service is the building up of a cumulative MARC tape file which can be searched in various ways by computer. Would you make use of such a file? (a) yes 40 (b) no 18 If yes, for what purposes?

19. Judging from your total experience with the SDI service, would you characterize its overall value to you as: (a) very high 8 (b) high 24 (c) medium 30 (d) low 9

20. The MARC file at present represents English monographs cataloged by the Library of Congress on a week-by-week basis. Sometime in 1972, the Library of Congress will begin to add some non-English monographs to the MARC file. Keeping in mind the forthcoming expanded MARC file on which future SDI service would be based, do you feel that its value to you would then be: (a) increased 30 (b) the same 33 (c) less 7

21. Do you personally want this SDI service to be continued? (a) yes 63 (b) no 3 (c) it doesn't matter 5

22. Do you feel that this SDI service should be offered to the entire faculty? (a) yes 42 (b) no 14 REASONS:

23. Do you feel that this SDI service should appropriately be made available by the university, i.e., that the university should organize and administer the service?
(a) yes 36 (b) no 5 (c) don't know 23

24. Do you feel that the university alone should pay for this faculty SDI service? (a) yes 30 (b) no 6 (c) don't know 25

25. Optional: General comments, pros and cons, elucidation of above replies, attitudes, suggestions, etc., concerning the SDI service.

SCIENTIFIC SERIAL LISTS

Dana L. ROTH: Central Library, Indian Institute of Technology, Kanpur, U.P., India

This article describes the need for user-oriented serial lists and the development of such a list in the California Institute of Technology library. The results of conversion from EAM to EDP equipment and subsequent utilization of COM (Computer-Output-Microfilm) are reported.

INTRODUCTION

Prior to the dedication of the Millikan Memorial Library, which houses the divisional collections in chemistry, biology, mathematics, physics, engineering, and humanities, the libraries at the California Institute of Technology were largely autonomous, reflecting the immediate needs of each division, and exhibited little attempt at interdivisional coordination of library purchases. With centralization of the major science collections, it became apparent that any efforts to reduce duplication, promote more effective library usage, and provide assistance in interdisciplinary research efforts would require a published list of serials and journals (1).

SCIENTISTS VS LIBRARIANS

It is certainly a truism that serial publications constitute the backbone of a library's research collection. Particularly in the sciences, where serial publications serve as the primary record of past accomplishments, studies have shown that over 80 percent of the references cited in basic source journals are to serials (see Table 1). Citation of serials rather than monographs was greater in chemistry than in other sciences, and the overall order may reflect the efficiency of the respective abstracting/indexing services.
In spite of the scientist's heavy dependence on serials, it appears that in most libraries little attempt has been made to reconcile the library record with practices found in the scientific literature.

Table 1. Percentage of citations to serials found in basic source journals for various scientific disciplines*

Discipline        Percentage of citations to serials
Chemistry         93.6
Physiology        90.8
Physics           88.8
[two disciplines illegible in the source]
Mathematics       76.8

*C. H. Brown, Scientific Serials (Chicago: Association of College and Research Libraries, 1956).

This is in part due to the general acceptance of the Library of Congress dictum that serials should be cataloged according to the general principles laid down for monographs. Fortunately, monographs are generally cited in the scientific literature by entries (author/title) which invariably appear in the library catalog. Serials, however, present the special problems of so-called indistinctive titles, frequent title changes, and common reference to the abbreviated form of their title. Most American libraries have followed the Library of Congress/Union List practices and as a result have long suffered user complaints about the use of corporate entries for so-called indistinctive titles, entries under the latest form of title, and the treatment of prepositions and conjunctions as filing elements.
These practices have been defended as attempts to extend the reference value of the catalog, but in doing so they create a number of problems and ambiguities which are only partially resolved by the annoying use of see references. The recent surge of interest in making the library "relevant" and more intimately involved with its users' needs must take into account that, in the minds of scientists, it is a presumptive requirement for them to remember cataloging rules when the library could just as well accommodate the scientific form. In recognition of the long-standing scientific tradition of describing serials by their titles (which considerably predates the corporate entry syndrome), the logical solution would be to provide title added entries for those serials whose main entry is in corporate form (2).

SPECIFIC PROBLEMS

1. Even if scientists were to remember the basic rules for society publications and similar corporate entries, how are the exceptions shown in Table 2 to be reconciled?

Table 2. An example of the difficulties encountered in translating abbreviations of scientific journal titles into LC entries

Abbreviation              Scientific form of title                   Union List entry
Bull. Acad. Pol. Sci.     Bulletin de l'Academie ...                 Polska Akademia Nauk ...
PNAS                      Proceedings of the National Academy ...    National ...
JACS                      Journal of the American Chemical ...       American ...
Berichte                  Berichte der Deutschen Chemischen ...      Deutsche ...
Comp. Rend.               Comptes Rendus ...                         Academie des Sciences ...
Ber. Bunsen...            Berichte der Bunsen...                     Deutsche Bunsen...
Bull. Soc. Chim. Belges   Bulletin des Societes ...                  Bulletin des Societes ...
Bull. Soc. Chim. France   Bulletin de la Societe ...                 Societe Chimique de France

2. The practice of cataloging serials under their latest title then serves as an obstruction to determining the library holdings, since references given in the scientific literature and citations obtained from abstracting/indexing services are obviously to the title currently in use. Another important factor, one that is sometimes overlooked, is the requirement of a classified shelf arrangement. Otherwise, since the title of the bound volume corresponds to the title in use at the time of binding, you have the ambiguity of the catalog referring to the latest title and the shelf locator referring back to the earlier title. These problems are further complicated by the long delays and backlogs in recataloging. In many large libraries this is a major function of serials catalogers, and it is estimated that it takes 50 percent longer to recatalog than to catalog originally (3).

3. The jargon of scientists when discussing or requesting information about various periodicals is replete with acronyms and abbreviated forms. JACS, PNAS, Berichte, Comptes Rendus, Annalen all have well-defined meanings in scientific literature and conversation because of the well-developed title entries and abbreviations given in Physics Abstracts, Chemical Abstracts, and the World List of Scientific Periodicals. The use of prepositions and conjunctions as filing elements constrains these scientists to being able to translate these abbreviations only into title entries where the omitted words are obvious, e.g., Journal of the American Chemical Society, but often causes problems with titles like Journal of the Less-Common Metals.

THE CAL TECH SERIALS LIST: OBJECTIVES AND PROCEDURES

The publication of a serials list oriented to the needs of scientists must then provide for: scientific title entries for corporate and society publications, treatment of each title change as the cessation of the old title, and omission of prepositions and conjunctions as filing elements.
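The filing treatment just described, omitting prepositions and conjunctions as filing elements and replacing them with ellipses, can be illustrated with a short sketch. This is an assumption-laden illustration in Python, not the Caltech procedure itself; in particular, the stopword list is invented for the example, since the article does not enumerate the words actually omitted.

```python
# Sketch of filing-form generation: prepositions, conjunctions, and
# articles are dropped as filing elements and a run of omitted words is
# collapsed into a single ellipsis.  The stopword list is an assumption.

STOPWORDS = {"of", "the", "for", "and", "in", "on", "der", "de", "la",
             "des", "du", "le", "les", "a", "an"}

def filing_form(title):
    out, in_run = [], False
    for word in title.split():
        if word.lower() in STOPWORDS:
            if not in_run:          # collapse a run of omitted words
                out.append("...")
                in_run = True
        else:
            out.append(word)
            in_run = False
    return " ".join(out)

# filing_form("Journal of the American Chemical Society")
#   gives "Journal ... American Chemical Society"
```

Under this treatment, both JACS and a title like Journal of the Less-Common Metals file directly under "Journal ...", with no filing weight given to the omitted words.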
These practices will increase the number of entries by about 40 percent over the number of current titles, but in terms of user appreciation the extra expense is amply justified. The list can then be a logical extension of the library's reference service, and it offers the opportunity of facilitating the research efforts of its users by obviating the need to remember cataloging rules or visit the library to determine its holdings.

Input to the serials list was derived from the library's serials card catalog. The information was typed on oversize card stock and included the full main entry, holdings, and divisional library location, with additional data cards, as required, to reflect title changes. With this data base, an extensive search of the World List of Scientific Periodicals and the List of Periodicals Abstracted by Chemical Abstracts was made to determine the additional scientific title entries to be incorporated in the list. (Each departmental library provides a shelf locator which relates the various forms of entry in the serials list to that chosen for the bindery title and subsequent shelf location.) Prepositions and conjunctions were replaced with ellipses in the final typing of the multilith stencils required for the manual publication of the first edition of the Cal Tech Serials List (4).

During the spring of 1969, the decision was made to employ EDP techniques in the publication of the second edition of the list. As an interim housekeeping device between editions, the author maintained an in-house supplement on punch cards using a single-card format. This experience indicated an unacceptable severity of title abbreviation, which was obviated by adopting a two-card format.
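The two-card format just mentioned (its field positions are given in Table 3) might be decoded as follows. This is a minimal Python sketch for illustration, not the original IBM 360 implementation, and the sample data are invented.

```python
# Sketch of decoding a two-card unit serials record.  Two 80-column
# cards are treated as one 160-column "super" record; field positions
# follow Table 3 (Python slices are 0-based, the table is 1-based).

def parse_unit_record(card1, card2):
    card1 = card1.ljust(80)
    card2 = card2.ljust(80)
    return {
        "title": card1[0:75].rstrip(),      # card 1, cols 1-75
        "holdings": card2[0:27].rstrip(),   # card 2, cols 1-27
        "library": card2[28:32].rstrip(),   # card 2, cols 29-32
        "serial_no": card2[71:78].rstrip(), # card 2, cols 72-78
        "spacing": card2[79],               # card 2, col 80
    }

# Invented sample: a serial number of one letter and six digits.
record = parse_unit_record(
    "Journal of the Less-Common Metals",
    "1,1960+".ljust(27) + " " + "Chem" + " " * 39 + "J001272" + " " + "1",
)
```

The serial number (one letter and six digits, assigned after the cards were in alphabetical sequence) is what later makes numeric merging of update records straightforward.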
This is consistent with the IBM 360 system, wherein input records are read two cards at a time, and thus the unit record may be thought of as a "super" card of 160 columns (of which only a maximum of 131 columns can be printed on a given line, the remaining 29 columns being used for internal records). The unit serials record consists of the title, holdings, divisional library, serial number, and spacing command (see Table 3).

Table 3. The unit serials record

Card No.   Columns   Field Designation
1          1-75      Title
2          1-27      Holdings
2          29-32     Divisional library
2          72-78     Serial No.
2          80        Spacing command

The unit records were created directly from the existing serials list and the cumulated supplement by in-house clericals. This obviated the usual requirement of coding the data for keypunch operators. Subsequent to the preparation of the unit records, having an alphabetical sequence of punched cards, it was a simple matter to program the computer to serially number each second card, using one letter and six digits. An example of the distribution of titles one might expect is given in Table 4.

Table 4. Distribution of titles by initial letter

Letter   Number of Title Entries
A        1,024
B-D      1,126
E-I      1,199
J-M      1,272
N-R      1,413
S-Z      1,471

While the data conversion was being performed, a series of programs was written. These programs were designed to create a master tape, update the tape, and produce a variety of listings. These listings, in addition to the required 131-column printout for the serials list, include the 160-column printout (in sequential 80-column units) and printouts for individual divisional libraries which can be annotated with shelf locations. The data base was then transferred from punch cards to magnetic tape, and subsequent additions and changes involve punch cards and tape-one-onto-tape-two operations. As a protective device, tape one and tape two are the current and previously current tapes, respectively.
Thus in the case of accident the preceding tape can again be updated. As a further precaution, the original punch card data base and update decks are on file.

The economic justification for the use of EDP equipment in libraries is based upon the necessity of maintaining current records that can be published at regular intervals. In the special case of serials lists this involves the periodic merging of small numbers of new and corrected unit records with the much larger number of unit records in the existing data base. The use of serially numbered unit records allows the relatively easy machine function of merging numbered items, in contrast with the difficulties involved in merging large alphabetical fields.

Recent advances in reprographic technology suggested that COM (Computer-Output-Microfilm) could be utilized to produce a quality catalog, free of the normal objections to "computer printout." The flexibility of currently available COM units allows the acceptance, as input, of a normal print tape from most computer systems (IBM, Burroughs, Univac) without reformatting (6). The print processors resident in the front-end computer of the FR-80, for example, allow for upper- and lowercase, bold characters, column format, pagination, and sixty-four character sizes. Variation in character size allows a maximum density of 170 characters per line and 120 lines per (8 1/2 x 11) page.

Table 5. Data presentation and relative spacing

Title                      Holdings     Divisional Library
Faraday Society, London
  Discussions              1,1947+      Chem
                           10,1951+     C Eng
  Symposia                 1,1968+      Chem
                           1,1968+      C Eng
  Transactions             1,1905+      Chem
                           46,1950+     C Eng
Farber-Zeitung             1889-1918    Chem

The application of COM equipment requires the production of a "print tape." This is simply a coded version of the current tape which contains the additional instructions necessary for spacing the unit records, defining the page size, and inserting "continued on next page" statements.
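The print-tape conventions just listed, a blank line between titles, a fixed page size, and continuation notices, can be sketched in outline. This is a hedged Python illustration, not the original 360 print-tape code; the function name and the exact decision rule at the foot of the page are assumptions.

```python
# Sketch of page assembly for the print tape: each title's lines stay
# together where possible, a blank line precedes the next title, and an
# entry that straddles a page break gets a "continued on next page" line.

PAGE_LINES = 90  # the ninety-lines-per-page format described in the text

def paginate(titles):
    """titles: list of entries, each entry a list of print lines."""
    pages, page = [], []
    for lines in titles:
        block = lines + [""]              # blank line before next title
        if len(page) + len(block) <= PAGE_LINES:
            page.extend(block)
        elif len(page) >= PAGE_LINES - 1:
            pages.append(page)            # too near the foot: defer title
            page = list(block)
        else:
            room = PAGE_LINES - len(page) - 1
            page.extend(lines[:room] + ["(continued on next page)"])
            pages.append(page)
            page = lines[room:] + [""]
        if len(page) >= PAGE_LINES:
            pages.append(page[:PAGE_LINES])
            page = page[PAGE_LINES:]
    if page:
        pages.append(page)
    return pages

# 100 invented one-line entries: 200 print lines counting blank separators.
sample = paginate([["title %d" % i] for i in range(100)])
```

The real program made the equivalent decision by analyzing the eighty-ninth line of each page, as described in the following paragraph.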
The use of the spacing command instruction, as an integral part of the unit record, allows all the information on a given title to remain in one unit and easily provides for a blank line before the next title (see Table 5). The additional problem of keeping the information on one title together on a given page, or providing a "continued on next page" statement, was solved by analyzing the information in the eighty-ninth line of each page to determine whether to print another line, insert the "continued on next page" instruction, or begin the title on the next page. Once the film is generated, it is a simple matter to produce plates for the multilith production of hard copy (7).

The choice of a ninety-lines-per-page format was influenced, in part, by our desire to use the serials list to break down the reluctance shown by faculty and students toward microformats. This format results in a one-third reduction of the 112-column computer printout and enables our 5,000 current titles to be accommodated on two microfiches (152 pages).

FOOTNOTES

1. For the purposes of this article, periodical and serial are synonymous and refer to publications which may be suspended or cease but never conclude. The term "serials list" should be restricted to publications which record only serial titles (and supplementary information to distinguish between similar titles), holdings, and internal records. Library catalogs and union lists are quite sufficient sources for relating a title to its successor or precedent, and for providing full bibliographic detail.

2. P. A. Richmond and M. K. Gill, "Accommodation of Nonstandard Entries in a Serials List Made by Computer," Journal of the American Society for Information Science 11:240 (1970); Dana L. Roth, "Letters to the Editor; Comments on the 'Accommodation of Nonstandard Entries ...,'" Journal of the American Society for Information Science (in press).

3. Andrew D.
Osborn, Serial Publications (Chicago: American Library Association, 1955).
4. E. R. Moser, Serials and Journals in the C.I.T. Libraries (Pasadena: California Institute of Technology, 1967).
5. Dana L. Roth, Serials and Journals in the C.I.T. Libraries (2nd ed.; Pasadena: California Institute of Technology, 1970).
6. Robert F. Gildenberg, "Technology Profile; Computer Output Microfilm," Modern Data 3:78 (1970).
7. Computer Micrographics, Inc., Los Angeles, California.

BOOK REVIEWS

Descriptive Cataloguing; A Student's Introduction to the Anglo-American Cataloguing Rules 1967. By James A. Tait and Douglas Anderson. Second ed.; rev. and enl. Hamden, Conn.: Linnet Books, 1971. 122p. $5.00

This second edition contains some corrections to the errors made in the 1968 edition, and includes the changes and clarifications brought out by the AACR Amendment Bulletin. The number of exemplary title pages has been increased from twenty-five to forty, thus giving the student more practice in determining entries and doing descriptive cataloging. This reviewer believes that a more exact title would be "Descriptive Cataloging and Determining Entries and Headings," because this introductory text not only covers descriptive cataloging as defined and explained in "Part II-Descriptive Cataloging" of the Anglo-American Cataloguing Rules, but also includes some of the basic rules for determining entries and headings in AACR's "Part I-Entry and Heading."

There are three distinct sections: descriptive cataloging; determining entries and headings; and facsimile title pages for student practice. Descriptive cataloging is covered in just thirteen pages, but all the basic elements are there. The explanations are clear and examples are shown, but not in the context of a full card. (Unfortunately only one full catalog card is illustrated in the entire book.)
It is in this section, more than in any other, that the differences between British and American cataloging become obvious. British descriptive cataloging varies in so many ways from its American counterpart that a beginning student in an American library school would be quite confused by these variations.

The next section consists of twenty-five pages and is devoted to the basic rules on entries and headings. Examples are used to illustrate the rules, and the authors point out some differences between the British and American texts of the AACR. The remaining seventy pages contain the forty reproduced title pages, which are followed by some commentary and a key corresponding to each title page. These title pages give the student a wide range of experience in transcribing the proper information onto the card and in determining main and added entries.

Even though this book is an excellent introduction to the rudiments of descriptive cataloging and the determination of main and added entries, its use of British descriptive cataloging precludes its being widely adopted in beginning cataloging courses in American library schools.

Donald J. Lehnus

Centralized Processing for Academic Libraries. By Richard M. Dougherty and Joan M. Maier. Metuchen, N.J.: Scarecrow Press, 1971. 254p. $10.00

This is the final report of the Colorado Academic Libraries Book Processing Center (CALBPC) two-part study investigating centralized processing. Phase I, reported by Laurence Leonard, Maier, and Dougherty in Centralized Book Processing, Scarecrow, 1969, was basically a feasibility study, whereas this final report describes the beginning six months of operations that tested the Phase I recommendations. Partially funded by the National Science Foundation, the experiment measured anticipated time and cost savings, monitored acquisitions and cataloging operations, and tested product acceptability for six libraries participating in the 1969, six-month study.
Even though centralized book processing might hold little appeal for the reader, this volume nonetheless is valuable to technical service heads because of its above-average sophistication in applying a systems analysis approach to technical services problems. The authors objectively report their findings, outlining in detail the mistakes, the unanticipated problem areas, and what they believed to be the successes.

From the start the authors encountered problems with scheduling. By the time the experiment began, most participants had a large portion of their book money encumbered, and the center was forced to accept cataloging arrearages in addition to book order requests. Those who did send in orders did not conform to patterns predicted in Phase I. Instead, the center was used as a source for obtaining more difficult materials, including foreign language items. It was discovered that in actual practice CALBPC had no impact on discounts received from vendors. The vendor performance study lacked relevancy because it was based upon the date invoices were cleared for payment rather than the date books were received in house.

In evaluating the total processing time, four libraries reduced their time lag by participating in the center's centralized processing, and the cost of processing the average book was reduced from $3.10 to $2.63. The product acceptance study showed that the physical processing was only partially accepted, with most of the libraries modifying a truncated title that was printed on the book card and book pocket as a by-product of the automated financial subsystem. Other local modifications were made on books processed by the center, but those costs and local error-correction costs were not reported in the study.

CALBPC's automated financial subsystem was besieged with many problems resulting from lack of programming foresight and adequate consulting by those who had previously designed such systems.
Individuals interested in the automation of acquisitions should read this section of the report. CALBPC's problems were typically those of building exceptions to exceptions in order to accommodate unanticipated program omissions. Simply not recognizing that books could be processed before invoices were paid caused delays and bottlenecks of such magnitude that procedures had to be devised to circumvent requirements of the automated subsystem.

Many recommendations were particularly relevant to cooperative ventures. In formulating processing specifications such as call number format and abbreviation standardization, CALBPC had not anticipated the infinite local variations they would have to accommodate. They quickly recognized the need for both greater quality control to minimize errors within the system and better communications and educational programs for participants. A recurring message was that librarians emphasized the esthetics of catalog cards rather than the content; thus a recommendation was made to investigate whether a positive correlation exists between the esthetics of the product and the quality of the library service. The authors emphasized that a cooperative program depends more upon the competencies and willingness of individuals than upon the technical aspects of the operations.

Some diversification of services was called for, but no mention was made of the possibilities of an on-line system. It was felt that in future operations the center should accept orders for out-of-print and audiovisual materials. Those libraries participating in approval programs had received no benefit by having books sent first to the center; thus it was suggested that the center forward those libraries a bibliographic packet only and that the approval books bypass the center.
This well-documented study, half of which is devoted to charts and appendix materials, concluded its recommendations with a positive evaluation of the service the center had performed and suggested that public and school libraries should also be participants.

Ann Allan

FROM THE EDITOR

At the January 1973 Midwinter Meeting of the American Library Association, the Board of Directors of the Information Science and Automation Division appointed me to the position of Editor of the Journal of Library Automation. I wish to express gratitude to Don S. Culbertson, at that time Executive Secretary of both the Information Science and Automation Division and the American Library Trustees Association, for adding yet another hat while he prepared a substantial portion of this June 1972 issue of JOLA.

As incoming editor, I also wish to describe briefly to the subscribers and regular readers of JOLA the situation of the journal and my plans for its immediate future. You are aware that there has been a hiatus in the publication of JOLA. At this writing, the journal is approximately ten months behind schedule. By taxing the capacity of the ALA staff, JOLA should return to its normal schedule within a year. During the intervening period, I will greatly appreciate the support of ISAD members, JOLA readers, authors, and the ALA staff. With this support the task will be made lighter and perhaps will be expedited. No substantial changes in editorial policy will be made in the near future, as all efforts will be turned toward bringing the journal up to schedule.

Susan K. Martin, editor
17 March 1973

BIBLIOS REVISITED

John C. KOUNTZ: Library Systems Coordinator, California State University and Colleges, Los Angeles. When this article was in preparation, the author was Systems Analyst, Orange County Public Libraries, Orange County, California.
In the following, Orange County Public Library's earlier reports on its BIBLIOS system are updated. Book catalog and circulation control modules are detailed, development and operation costs documented, and a cost comparison for acquisitions cited.

"In 1968 ALA began publishing, through its Information Science and Automation Division, a Journal of Library Automation. It is perhaps appropriate to note that in the first three quarterly issues only one public library project was described (1), and this was a project under contemplation, not one actually in operation." (2) This statement by Dan Melcher to substantiate his contention that library automation is suspect is, in itself, suspect. The public library project alluded to as being contemplated in 1968 was brought to fruition by Orange County (California) Public Library in 1969, and has functioned with startling success ever since. In addition, the finished system was reported to the library (3) and data processing (4) worlds in 1969 and 1970, respectively.

Orange County Public Library's BIBLIOS (Book Inventory Building Library Information Oriented System) is a system designed to fulfill all functional requirements of a multibranch library which is growing by leaps and bounds (5). Specifically these functional requirements are: acquisitions, book processing, catalog maintenance, circulation control, and book fund accounting, in addition to management reporting on a level not practical in a manual system.

64 Journal of Library Automation Vol. 5/2 June, 1972

THE FUNCTIONAL SYSTEM

The interrelation of these system elements is shown diagrammatically in Figure 1. Briefly, and from a user's point of view, the system works like this: A title is desired by someone, patron or staff member. The person refers to the book catalog, Figure 2, to see if the item is in the collection. If it is and not in circulation, he gets the book directly.
If the item is in circulation, he can submit a request for it, to receive the book on its return. To update the catalog, a cumulative supplement is produced, keeping current the listing of the library's holdings. If the title is not found in the catalog or supplement, the monthly cumulative on-order list, Figure 3, is consulted. If the title is listed, a request is submitted and, on receipt and processing, the book is released to the requester. If the title is cancelled, the requester is notified.

When a title wanted for the collection is not listed in either the catalog or the cumulative on-order list, a bibliographic information sheet (BIS), Figure 4, is completed and optically scanned into the system. This information is essentially a pre-cataloging bibliographic description of the desired material. Once entered, these same data serve first to create purchase orders and related reports; then, once edited by the catalogers from the book in hand, to create book card and pocket sets (Figure 5), book catalog entries, shown in Figure 2, holding lists (shelf lists) for each branch, and a broad array of operational reports.

It is a feature of BIBLIOS that the descriptive data (from the BIS) are entered in their entirety only once. This means that a bibliographic description need not be initialized by each individual using it; rather, it need only be consulted and, if necessary, corrected or deleted. Thus, an entry once in the system is immediately available for, among other purposes, ordering. This is especially significant since it means that each entry in the book catalog, the catalog supplement, the cumulative on-order list, etc., can be ordered against by simply using the key number for the desired item and the number assigned to the branch wishing to order. This poses the possibility of orders for materials which are OP or otherwise not readily available through the usual vendor channels.
BIBLIOS addresses these potential errors by listing (pre-vend list, Figure 6) all order requirements for review before they are used to create orders. By editing this list against Books in Print and/or publishers' catalogs and taking corrective action, orders for the unobtainable are short-stopped.

On placing an order, while a unique subpurchase order number is mechanically created, the key number continues to document the title for processing purposes. In this role the key number follows the order until it is filled or cancelled. Thus, the key is used by BIBLIOS to update inventory automatically on receipt of an order and to create the card and pocket sets for those materials received. Finally, the key number is used by the branches to report inventory changes and, as a subset of inventory, for circulation control. Since it is through the key number (or key, for short) for a bibliographic citation that the citation is used in the various functions performed by BIBLIOS, perhaps a little detail concerning the key is in order.

Fig. 1. BIBLIOS-the functional system. (Diagram: bibliographic data from optical scan and MARC feed the bibliographic/book catalog module, producing master indices, book catalogs, and supplements; orders for new materials and reorders feed acquisitions accounting, producing sub purchase orders, on-order lists, budget reports, vendor performance reports, and pre-vend/review lists; inventory updates, losses, and gifts feed the inventory locator, producing locator guides and supplements, pocket and card sets, and collection profiles; circulation transactions and patron registrations feed circulation, producing holdings lists, book "tags," patron registers, overdue notices, management reports, and use profiles.)

THE KEY NUMBER

In Figure 2, the key for 73084452 has been underlined. The key number resembles the LC card order number. Wherever an LC card order number is available, it is used. When no LC card order number is available, a unique Orange County (OC) number is applied.
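The rule just stated, use the LC card order number as the key when one exists, otherwise assign a unique Orange County number, reduces to a few lines. The sketch below is a modern Python illustration with hypothetical names (BIBLIOS itself was a COBOL system), forming the OC key, as this article describes it, from two alphabetic characters followed by a six-digit sequential number:

```python
import itertools

# Sequential counter for locally assigned OC keys (illustrative only;
# the real system maintained its own unique-number assignment).
_oc_serial = itertools.count(1)

def assign_key(lc_card_number=None, oc_prefix="AA"):
    """Use the LC card order number when available; otherwise issue a
    unique OC key: two alphabetic characters plus six sequential digits."""
    if lc_card_number:
        return lc_card_number
    return f"{oc_prefix}{next(_oc_serial):06d}"
```

Under this scheme an entry with LC card order number 73084452 keeps that number as its key, while a locally cataloged item receives a key such as "AA000001".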
The OC number consists of two alphabetic characters in the first two positions (at one time the numbers implied year) of the "traditional" number, followed by a six-digit sequential number. Since the Library of Congress has certain idiosyncrasies about its card order number, the key also specifies the type of material it represents (for example, only book keys are in the book catalog), and identifies each volume, or edition, of a title which has a blanket LC card order number. The selection of the LC card order number for this application was based on a suspicion that the bulk of materials in the collection were already assigned a number, a suspicion which was confirmed on completion of conversion through simple reporting of the keys on file. In short, after fifty years of operation of Orange County's libraries, 92 percent of all titles in the collection had an "LC number," a factor one might weigh when trying to decide between ISBN and LC card order number; nor has it been indicated that ISBNs will be developed retrospectively.

Fig. 2. A book catalog page featuring four columns.

AN UPDATE TO THE SYSTEM

In the paper presented to the American Society for Information Science in 1969 (6), neither the book catalog nor the circulation control modules had been implemented.

Book Catalog

In May 1971, the first edition of the BIBLIOS book catalog was released for public use. Since that date, the cumulative supplement has been run six times. The module of BIBLIOS producing the book catalog and cumulative supplement is diagrammed in Figure 7. Input is the title-master file (the system's bibliographic data base) and a specification of the output required. The output options available to the library include the production of either a full catalog or a cumulative supplement (displaying all entries placed on file since production of the full catalog which have been edited by cataloging).
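The full-catalog/supplement distinction can be pictured with a small model. The Python sketch below is an assumption-laden illustration, not the system's code: field names such as `edited_by_cataloging` are invented, and the real title-master file was a tape/disk record layout rather than in-memory objects. A full catalog run takes every entry edited by cataloging and marks it as used, while a cumulative supplement takes only edited entries added since the last full catalog:

```python
from dataclasses import dataclass

@dataclass
class TitleMasterEntry:
    key: str
    edited_by_cataloging: bool = False   # released by the catalogers
    used_in_full_catalog: bool = False   # set on each full catalog run

def produce_catalog(title_master, full=True):
    """Return the keys to display; on a full run, mark entries as used so
    they are precluded from the next cumulative supplement."""
    if full:
        selected = [e for e in title_master if e.edited_by_cataloging]
        for e in selected:
            e.used_in_full_catalog = True
    else:
        # Cumulative supplement: edited entries placed on file since
        # the last full catalog production.
        selected = [e for e in title_master
                    if e.edited_by_cataloging and not e.used_in_full_catalog]
    return [e.key for e in selected]
```

The marking step is what lets the supplement be produced by a simple file scan rather than by comparing two catalog editions.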
In the case of full catalog production, the title-master file is updated to reflect the use of all qualifying entries for catalog production and the date of their use. This updating facilitates cumulative supplement production by precluding the use of these entries from display until the next full catalog run.

In addition to the type of catalog (full or supplement), the library designates the format of the output. Either an off-line print-out or a print file designed to drive a mechanical photocomposition device, or both, can be requested. It is important to note that this print file is designed specifically to be hardware independent, e.g., it will run on RCA, Photon, Alphanumeric, or comparable equipment with equal ease. Hardware independence in its simplest terms means the computer program does not have to be rewritten each time a vendor goes out of business. And, coincidentally, this print file is in the sequence in which it is to be displayed. In short, the vendor only performs that processing necessary to make his device set type to the library's specification for layout, font style, and font size: a specification, it might be added, which calls for upper- and lower-case type from a file in upper-case only.

This approach differs from what has become typical of book catalog production in that sorting, file maintenance, and all related processing are sustained by the library through BIBLIOS. The vendor only sets type, prints, and binds. The results spell savings, since a potentially error-laden file does not have to be committed to the most expensive of all displays, photocomposition, before corrections can be made.

Fig. 3. All outstanding titles are reported in the monthly cumulative on-order list.

Fig. 6. Before producing a sub-P.O., the pre-vend list is checked for O.P. materials, among other things.

Fig. 9. The maintenance of manual shelflists is obviated by a BIBLIOS-produced holdings list for each branch.

Fig. 10. BIBLIOS circulation control subsystem.

cards, cassette, or mini-reels). Ideally, the elusive transactor should be able to "read" a label on the book as well as a patron card. Kimball labels, "Sunburst" tags, magnetically coded swatches and the like have worked and continue to work in the retail trade; there is no reason why they shouldn't work for libraries.
The only deterrent seems to be the reticence of their manufacturers to enter an unknown market where, following the Melcher axiom, they are met with a "stubborn, 'show me' attitude when automation is proposed." (8)

The products designed into the circulation control module include: weed lists, patron "black lists," circulation profiles (graphically displaying patron use of each branch's collection), and automatic duplicate ordering. Reports measure circulation from a manager's viewpoint, but not to the exclusion of such bread-and-butter products as overdue notices, registration lists, and related statistical recapitulations.

A WORD ABOUT DOCUMENTATION

For each program in each subsystem of BIBLIOS, forty unique programs in all, there is a formal package consisting of:

1. A program specification detailing the inputs, processing, outputs, idiosyncrasies, and edits of that program;
2. A listing of the COBOL program itself;
3. An operations binder (notebook) section for set-up and run procedures;
4. A user's guide section relating requirements and diagnostics to the librarians using the program, including typical problems; and,
5. Assorted total system binders (notebooks).

While some might call this "overkill," in automation it is not. The BIBLIOS system has yet to fail a scheduled commitment. Further, it is suspected that the mere discipline of documentation caused many serious reconsiderations of program and procedural logic, at the time and on the spot, with the result that BIBLIOS is a reliable system requiring no major rework and continuing to respond to the library's functional requirements for over two years at this writing.

A WORD ABOUT DEVELOPMENT COSTS

Both developmental and operational costs for BIBLIOS are known and documented. Specifically, the costs to procure such a system are broken out in Table 1, where each subsystem is examined in terms of the dollars it represents and the assorted tasks required to bring it into being.
The totals represent all costs over approximately a three-year period beginning with rough specifications and yielding the first book catalog. It must be noted that final program specifications and coding were performed for Orange County by a contractor. This approach was chosen because a good job done on time was wanted. That the approach was valid is evidenced by the achievement of a successful system on schedule and within budget.

Table 1. BIBLIOS development costs (including full conversion and publication of first book catalog).

                               MARC      Bibliographic  Book Catalog/  Acquisitions  Circulation    Total
                                         Inventory      Locator Guide
Contractor:
  Program Specifications
  & Coding                    $16,686    $ 54,299       $ 25,800       $ 72,305      $ 91,000     $260,090
Orange Co. Public Library:
  Analyst                       3,360       7,840          2,240         14,560         7,000       35,000
  Coordination                  1,225       7,679            818          5,310         5,670       20,702
  Implementation (K.P.,
    Machine Time, Etc.)         4,772      12,263          4,635          7,879        10,110       39,659
  Conversion/Outside
    Services                      800      53,500         41,370             --            --       95,670
  Subtotal                     10,157      81,282         49,063         27,749        22,780      191,031
TOTAL                         $26,843    $135,581       $ 74,863       $100,054      $113,780     $451,121

This approach reflects a contention that librarians can specify their requirements if they "have a mind to," and that a contracted programming staff can satisfactorily perform to predetermined standards and timeframes if properly directed. In direct contrast to this approach are the incredible schedules developed when requirements are not specified (and frozen), and the suspected monumental costs hidden in lost staff time due to extended parallel operations or simply waiting until "they" get the "... thing" to run right.
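The subtotals and grand total in Table 1 can be cross-checked mechanically. The sketch below is illustrative only; the dictionary layout and variable names are mine, and the dollar figures are transcribed from the table (blank cells treated as zero):

```python
# Development costs per subsystem, transcribed from Table 1 (dollars).
# Row order: contractor coding, analyst, coordination, implementation,
# conversion/outside services.
costs = {
    "MARC":          [16_686,  3_360, 1_225,  4_772,    800],
    "Bibliographic": [54_299,  7_840, 7_679, 12_263, 53_500],
    "Book Catalog":  [25_800,  2_240,   818,  4_635, 41_370],
    "Acquisitions":  [72_305, 14_560, 5_310,  7_879,      0],
    "Circulation":   [91_000,  7_000, 5_670, 10_110,      0],
}

# The library's own (non-contractor) subtotal excludes the first row.
for name, rows in costs.items():
    library_subtotal = sum(rows[1:])
    total = sum(rows)
    print(f"{name:14s} subtotal ${library_subtotal:,}  total ${total:,}")

grand_total = sum(sum(rows) for rows in costs.values())
print(f"Grand total ${grand_total:,}")  # agrees with Table 1's $451,121
```

Every column and the grand total reproduce the printed figures, which is some assurance that the table was transcribed correctly.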
The remaining cost components, briefly, reflect direct library analyst time; the cost of coordination meetings; direct keypunch and machine time for programs and their test, debug, string test, and systems test; and, for the bibliographic and book catalog subsystems, conversion and catalog print file generation. The conversion/outside services include a MARC subscription; the creation and use of a group of nine typists to optically scan the library's files to convert them to machine readable form (including error correction); and the contracted services of a photoreproduction house to mechanically compose, print, bind, and deliver 500 sets of the book catalog and 100 sets of the locator guide. These are the costs of setting the system up, training staff, and creating a single operational display: the book catalog.

A WORD ABOUT OPERATING COSTS

Early in 1965, as a prelude to implementing a book acquisition program, a time/cost study was performed to determine how much it cost the library to order a book (one title). This study detailed and costed the typing, sorting, assignment of vendors, and the reduction of a diversity of paper requisite to creating a purchase order. Excluding the cost of the purchase order form itself, the direct manual cost for this process was $1.56 per title, using a clerical rate of $2.10 per hour.

In the intervening years three things have happened. First, clerical rates have increased to $2.79 per hour, which, when applied to the unit cost of the 1965 acquisitions study, means a direct outlay of $2.07 per title (as against the previous $1.56). Second, the number of branches has increased, which implies that, even if the manual system of 1965 could cope with the increased load, it would have required more people and therefore an increase in indirect costs, not to mention the probability of less efficiency due to increased direct costs. Third, Orange County has automated this function (as well as others).
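The $2.07 figure is simply the 1965 unit cost scaled by the growth in the clerical rate; a one-line check of that arithmetic (variable names are mine):

```python
# Scale the 1965 manual ordering cost by the growth in clerical wage rates.
cost_1965 = 1.56   # direct manual cost per title, 1965 study
rate_1965 = 2.10   # clerical rate ($/hour), 1965
rate_now = 2.79    # clerical rate ($/hour) at the time of writing

cost_now = cost_1965 * (rate_now / rate_1965)
print(f"${cost_now:.2f} per title")  # rounds to the article's $2.07
```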
Since Orange County is wont to track costs, it so happens that the cost for creating a purchase order (subpurchase order under the new system) is available. Specifically, Orange County knows computer and peripheral costs and the exact time for processing from actual billings over the past two years. The reduction of these data to a per-unit-handled equivalent, while detailed, is not difficult. Thus, it is possible to deduce the machine costs equatable to those for the earlier manual effort: creating a purchase order for one title, including the purchase order form, now costs $1.89.

Table 2. Typical processing costs for one title in Orange County Public Library's BIBLIOS system.

                                                      Acquisitions²            Book Catalog (Weekly)
                 MARC      Bibliographic¹  Inventory  Order        Receive³    B.C.          Inventory Run
Run Cost         $325.16   $300.40         $201.21    $1244.94     $238.55     $238.00       $26.00
Average Items
  Per Period     115⁴      1,000           8,100      700          700         4,000         --
Cost/Entry       2.83      0.30            0.025      1.78         0.34        0.059         0.0006
Supplies         0.13      --              0.028      0.05         --          --            --
                                                      (Sub P.O.)
Services         --        0.02            --         0.06         --          0.041         0.0028
                           (Convelope)                (Opscan)                 (Comp/Print)  (Comp/Print)
TOTAL            $2.96     $0.32           $0.053     $1.89        $0.34       $0.10         $0.0034

Example: Cost of entry from initial input to display in book catalog (including Convelope; excluding MARC source: $2.77).
¹ 40% Bibliographic.  ² 60% Bibliographic.  ³ Includes invoice, vendor, and budget displays.  ⁴ If all new entries to system came from MARC.

Similar economies can readily be documented, as can the increases in service to our patrons at no increase in staff. The operating costs for those BIBLIOS subsystems in regular use are given in Table 2. Only two entries in this table are not self-explanatory.

MARC

MARC, which is indicated as processed weekly, has not been run for over a year. The explanation is simple economics.
It costs $0.32 to manually place a bibliographic description on file (excluding the time spent to circle an entry in Publishers Weekly (PW)) vs. $2.96 to process the same entry from MARC. This cost for MARC includes the subscription cost prorated to selected entries, the translation and format of all MARC entries, the automatic release of those entries of limited value to a public library, the cumulation of entries which may be of value, the extract and transfer of those entries selected, and the reporting via indices and full listings for the contents of the cumulated file. The unit cost is the actual processing cost for MARC II files for one year divided by the number of titles processed through the rest of BIBLIOS during the same period. This cost does not include corrections to selected MARC entries (invariably in the call number and author fields, for consistency with the library's existing files). The costs affiliated with processing corrective input closely resemble those for bibliographic, e.g., $0.32 each.

Prorated Bibliographic Input

BIBLIOS works on pre-cataloged entries. The 60 percent bibliographic input shown under acquisitions relates to the full initial description for a title being entered by a book selector to effect its order and subsequent reporting; the 40 percent shown under bibliographic is for cataloger input to adjust the entry for title-page accuracy and consistency with existing files and, for nonfiction, the assignment of call numbers and subject headings. It is important to note that for reorders against a title already in the system, no bibliographic input is required. In the case of reorders, the per-title cost is $0.88, including subpurchase order forms.

REFERENCES

1. John C. Kountz, "Cost Comparison of Computer versus Manual Catalog Maintenance," Journal of Library Automation 1:159-77 (Spring 1968).
2. Daniel Melcher, Melcher on Acquisition (Chicago: American Library Association, 1971), p. 135.
3. John C.
Kountz and Robert Norton, "BIBLIOS-A Modular Approach to Total Library ADP," Proceedings of ASIS 6:39-50 (1969).
4. John C. Kountz and Robert E. Norton, "BIBLIOS-A Modular System for Library Automation," Datamation 16:79-83 (Feb. 1970).
5. Orange County Public Library presently has twenty-six branches, three bookmobiles, and plans for at least three more branches and an additional bookmobile in the near future.
6. Kountz and Norton, "BIBLIOS-A Modular Approach."
7. The device affiliated with the book depends on the transactor. The only requirements are that it mechanically represent the key for the book, be practically indestructible, and that it can be prepared mechanically. This last consideration is an absolute when there are 800,000 volumes to convert.
8. Melcher, Melcher on Acquisition, p. 135.

COMPUTER ASSISTED CIRCULATION CONTROL AT HEALTH SCIENCES LIBRARY SUNYAB

Jean K. MILLER: Associate Health Sciences Librarian for Circulation and Dissemination

A description of the circulation system which the Health Sciences Library at the State University of New York at Buffalo has been using since October 1970. Features of the system include automatic production of overdue, fine, and billing notices; notices for call-in of requested books; and book availability notices. Remote operation and processing on the IBM 360/40 and CDC 6400 computers are accomplished via the Administrative Terminal System (ATS) and Terminal Job Entry (TJE). The system provides information for management of the collection and improved service to the user.

INTRODUCTION

The Health Sciences Library of the State University of New York at Buffalo (SUNYAB) serves the teaching, research, and clinical programs of the five schools of Health Sciences at the University (Medicine, Dentistry, Pharmacy, Nursing, and Health Related Professions) as well as the Department of Biology.
It is the biomedical resource library for the five teaching hospitals affiliated with SUNYAB and for the health professionals within the nine counties of the Lakes Area Regional Medical Program.

Service demands had increased steadily since 1961 with the incorporation of the university within the State of New York. This was apparent in the Circulation Department, where statistics indicated a 21 percent increase in the circulation of materials between FY 1967/68 and 1968/69. The circulation system in use was inefficient and time-consuming for both the user and the clerical staff. The user was required to fill out a charge card for each book, giving his name, address, and status; and the title, author, year of publication, volume, copy number, and call number of the book. The card was stamped with the due date and filed alphabetically by main entry. Problems resulted from illegible handwriting, selection of incorrect main entry, and incorrect filing. Control of library materials was inadequate.

The system to be described was adopted following consideration of the requirements of an effective system of circulation control and of the resources available to the library. Planning for the development and implementation of the automated circulation system began in the fall of 1969. Funding was provided by a Medical Library Resource Grant and the Office of the Provost of Health Sciences of SUNYAB. System design began in February 1970; programming was accomplished during June and July; implementation started in August; and the system was operational in October 1970. Costs of operation have been provided by the University Libraries of SUNYAB since April 1971.

COMPUTER FACILITIES

The Health Sciences Library shares the facilities provided by the Department of Computer Services on campus.
The current installation is an IBM 360/40 H with an eight-disk-drive 2319 unit, six magnetic tape devices, a card read and punch unit, and an 1100-line-per-minute printer. It includes a 2703 telecommunications unit supporting forty 2741 terminals and a 2701 unit with a parallel data adapter unit interfacing a channel-to-channel adapter to a CDC 6400 computer. The IBM Operating System, SCOPE 3.2.0 version 1.1, is used.

PROCESSING

The library's circulation system was designed to use the Administrative Terminal System (ATS) and Terminal Job Entry (TJE) for remote operation and processing on the IBM 360/40 and CDC 6400 computers. Programs are written in FORTRAN for the CDC 6400-6500-6600 (version 2.3). The program modules comprising the circulation system require from 1K to 60K and from 0 to 2 tape units for processing. ATS documents are used rather than punched card decks as program and data input media.

The system incorporates several large data bases which are updated at regular intervals. A file of current circulation transactions (80 characters per record) is maintained both on magnetic tape and in ATS storage. This file is merged daily with new transactions. Names and addresses of university personnel and students are maintained on magnetic tape. A file of inactive circulation records (50 characters per record) is also maintained on tape. Other smaller files are stored in ATS and are updated daily and/or weekly. No permanent disk storage is used.

Input of data and programs is made from the IBM 2741 terminal located in the Circulation Office of the library. Data are entered daily by the clerical staff via the ATS terminal. Storage, retrieval, and text editing are performed as required. Processing of data is initiated by the library staff. A properly sequenced assemblage of ATS documents consisting of data and programming instructions (TJE input file) is input from the IBM 2741 terminal.
This input file is submitted through TJE for execution on the CDC 6400 computer. The data are processed in accordance with the specific job command entered at the terminal. After processing, the output is stored as a single ATS document. In some instances, the clerical staff divides the ATS-stored output into discrete output files for storage and subsequent use. Selected segments of the output (notices, save lists, etc.) are produced in hard copy format and delivered to the library by the Computer Center (Figure 1).

HSCCIRC SYSTEM

The Health Sciences Library Circulation System (HSCCIRC) provides:

1. A file (Query File) of all monographs off the shelves, which includes a record of:
   a. Books charged out.
   b. Books on interlibrary loan.
   c. Books on reserve.
   d. Books at the bindery.
   e. Books on the "hold shelf" which have been returned upon the request of another patron.
   f. Books on the "new book shelf."
   g. Books which have been declared lost and are in the process of being replaced.
2. Overdue notices to all borrowers.
3. Billing notices to students for those books not returned after a second overdue has been sent.
4. A file (Fine File) indicating the amount owed by individual students for overdues.
5. Fine notices to students if an overdue book is returned but the fine is not paid.
6. Notices to users having books requested by other patrons.
7. Hold shelf notices alerting library personnel to those books which have been reserved for library patrons.
8. Book availability notices to users who have made "save" requests.
9. A file (History File) containing records of inactive transactions.
10. Daily and cumulative (fiscal year-to-date) statistics of the transactions.

The foregoing lists the information which the system provides on a routine basis. Other modules of the system permit access to additional information as required. For example, lists may be prepared of books currently in circulation to interlibrary loan, on reserve, or at the bindery.
These lists are used by the staff involved in processing these materials and may be updated at their request.

[Figure 1: system overview flowchart showing Query File processing (ATS); Fine File processing (fine file, unpaid fine file, updated fine file); transaction files; History File analysis producing tables, charts, and lists; names/addresses master address tape creation and address file update; semester faculty/staff letters; and special run sequences producing lists (ILL, reserve, bindery).]

Fig. 1. System overview.

The History File is analyzed quarterly. The analysis provides a statistical breakdown, by user categories, of the transactions which occurred since the last analysis. The total numbers of charges, renewals, and save requests for each of the five user categories are tallied. The call numbers of the books borrowed by members of each user category are listed. Multiple charges of the same book are incremented and recorded. This information on book usage and borrowing patterns assists in library management decisions. It is possible to identify high usage of specific volumes or subject areas and to determine whether the demand is from the faculty, staff, or graduate or undergraduate student body. Records of heavy demand and multiple save requests aid in decisions to purchase additional copies of a monograph.

At the end of each semester, faculty/staff letters are prepared and mailed. Each notice lists the call number and due date for overdue books currently charged to the faculty or staff member. The notice requests return of the book(s) before the beginning of the next semester.

Statistics generated by the system (Figure 2) are used in the preparation of monthly, quarterly, and annual reports.
They have been used as a basis for decisions on policy, such as that resulting in a change in the length of the circulation period in April 1971. Subsequent statistics have been used to evaluate such changes.

Fig. 2. Circulation statistics.

                        42572    Year to date
New Chrgs                 112            2225
Holds                       5             185
Spcl chrgs (ILL)            5              85
Spcl chrgs (BND)            0              48
Spcl chrgs (RES)            1             116
Renewals                   12             308
Save requests               3              74
Recall letters              2              63
Hold letters                3              99
Books overdue              48            1230
1st overdue                 0             773
2nd overdue                 0             364
Bills                       0             122
Lost books                  2              61
Discharges                173            2837
Discharges (hld shlf)       5             154

In addition, the system permits rapid, easy consultation of the Query File to detect the location of any book off the shelves. This is accomplished through use of the printed Query File (Figure 3), which is arranged in call number sequence. It contains one line of information for each transaction changing the status of a book. For example, when a book is charged, the Query File contains one line of information relating to the charge. If the book becomes overdue, a second line of information is automatically generated indicating the overdue status of the book. A two-digit transaction code defines the status. The transaction code is entered as part of the input (as code 10 when charging a book), or it is generated by the system, as occurs when a book becomes overdue and initial and subsequent overdue, billing, and/or fine notices are produced (codes 51, 52, 53, 54).

Fig. 3. Query file (excerpt).

*QL696/P2L27/1        *C10 *T 60772  *D 70772 *P 202319    *U3
*QL696/P2L27/1        *C51 *T 71172  *D 71872 *P 202319    *U3
*QL697/G4/1966/1      *C10 *T 62672  *D 62672 *P 71165132  *U1
*QL697/G4/1966/1      *C52 *T 70772  *D 71472 *P 71165132  *U1
*QL698/H25/1          *C10 *T 61972  *D 71972 *P 138551    *U3
*QL737/C2H24/1948/1   *C10 *T 50772  *D 60772 *P 147339    *U2
*QL737/C2H24/1952/1   *C60 *T 42472  *D -0    *P -0        *U0
*QL737/P9S3/1965/V2/2 *C18 *T 100471 *D -0    *P 777777777 *U0

LEGEND:
*   Call number
*C  Transaction code
*T  Date of transaction
*D  Due date or date next notice will be generated
*P  Patron identification number
*U  User category

This same information may be obtained through on-line query of the circulation file from the IBM 2741 terminal during the hours of operation of ATS. Access to the Query File is either by call number of the book or by identification number of the user. The latter is used when producing lists of items out on loan to a borrower and in detecting delinquent borrowers.

OVERVIEW

Comparison of statistics between FY 1969/70 and FY 1971/72 showed a 12 percent increase in circulation. During the same period there was a 61 percent increase in the number of people using the library.
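The printed Query File of Figure 3 is rigidly tagged, so its lines can be parsed mechanically. The sketch below is a hypothetical reconstruction, not part of the HSCCIRC system itself; it relies only on the tag markers documented in the Fig. 3 legend, since the exact column layout of the stored 80-character record is not given in the article:

```python
# Parse one printed Query File line into its tagged fields.
# Tags follow the Fig. 3 legend: leading call number, then
# *C transaction code, *T transaction date, *D due/notice date,
# *P patron id, *U user category.
def parse_query_line(line):
    fields = line.lstrip("*").split("*")
    record = {"call_number": fields[0].strip()}
    tag_names = {"C": "transaction_code", "T": "transaction_date",
                 "D": "due_date", "P": "patron_id", "U": "user_category"}
    for field in fields[1:]:
        tag, value = field[0], field[1:].strip()
        record[tag_names[tag]] = value
    return record

rec = parse_query_line("*QL696/P2L27/1 *C10 *T 60772 *D 70772 *P 202319 *U3")
print(rec["transaction_code"])  # "10" -> a charge, per the text
```

A second line for the same call number with code 51 would then represent the automatically generated first-overdue entry described above.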
The Circulation Department has been able to handle the increased workload more efficiently because of the automated system. A decrease in clerical time required for carrying out the tasks of the department has been realized. The circulation records are now updated five times per week and notices are issued promptly. Previously, updating was possible only once every seven to ten days. Service to the user is much faster, and charging books and providing information on their status is more accurate. Control of items loaned to users is more effective. Information for management of the collection and provision of improved service is available.

System disadvantages are related to the mode of data input and the lack of author and title information on records. Transcription errors occur during manual capture of data at the time the transaction occurs and when the data are entered by the clerical staff from the terminal. Correction of errors requires rekeying and reentry of the corrected data for reprocessing. This increases cost in terms of personnel time and equipment use.

Author and title information is not provided in the Query File or on notices sent to users. This is an inconvenience to the user and requires checking of the shelf list by library personnel to provide the information when required. These potential disadvantages were recognized at the time the system was planned. However, they were not considered serious drawbacks. The decision was made to adopt the system and, when additional funds were forthcoming, to provide machine readable input and add author and title information to the records.

COSTS

The cost of the system during its first year of operation was $10,590.65. This included monthly charges for rental of equipment, use of ATS, storage of records, computer time, and print costs.
IBM 2741 Terminal (including phone line)    $ 1,082.86
ATS sign-on time                              1,042.08
ATS storage                                   3,187.24
Computer time and print costs                 5,278.47
Total                                       $10,590.65

Unit cost figures are imperfect, but over 69,000 transactions were processed and over 20,000 notices generated, at an average unit cost of 11.6 cents. Clerical time is not included in this figure. The number of clerical assistants remained constant although, as noted, all phases of the work of the Circulation Department increased.

FUTURE DEVELOPMENT

In the future, the library hopes to take greater advantage of the on-line query capability of the present system. Additional IBM 2741 terminals at selected locations in the library could provide instantaneous file query. While non-routine queries are made on-line, the library now uses printed listings for most routine queries.

The installation of automatic data input devices, such as IBM 1030 equipment, would permit reading of coded book cards and patron identification cards with direct transmission of data to ATS storage. The hardware and software modification required to implement this additional capability is technically feasible and not financially prohibitive.

The present system is to be installed soon in another library on the SUNYAB campus. Implementation should require only minimal software modifications to identify and keep separate the records of the other library. Adoption is simplified by the fact that book cards are not required and that the circulation file consists only of charged materials and not a record of complete library holdings.

ACKNOWLEDGMENTS

The following individuals contributed their varied talents and support to the development and implementation of the system: Mr. Erich Meyerhoff, former librarian of the Health Sciences Library; Gerald Lazorick, systems design programmer, former director, Technical Information Dissemination Bureau, SUNYAB; Mrs.
Jean Risley, programmer/analyst; Mr. Mark Fennessy, former library intern at the Health Sciences Library; and the clerical staff of the Circulation Department, especially Barbara Helminiak and Evelyn Hufford.

ANALYSIS OF SEARCH KEY RETRIEVAL ON A LARGE BIBLIOGRAPHIC FILE

Gerry D. GUTHRIE, Steven D. SLIFKO: Research & Development Division, The Ohio State University Libraries, Columbus, Ohio

Two search keys (4,5 and 3,3) are analyzed using a probability formula on a bibliographic file of 857,725 records. Assuming random requests by record permits the creation of a predictive model which more closely approximates the actual behavior of a search and retrieval system as determined by a usage survey.

INTRODUCTION

Systems planners are hard pressed to accurately predict the access characteristics of search keys on large on-line bibliographic files when so little is known about user requests. This paper presents a realistic model for analyzing different search keys, and, in addition, the results are compared to actual request data gathered from a usage survey of the Ohio State University Libraries Circulation System. A number of papers are available in the literature concerning search key effectiveness; however, all of these were done on relatively small data bases (1-5). Of particular importance to this paper is Kilgour's article on truncated search keys (6).

PURPOSE

The purposes of this study are (1) to determine the comparative effectiveness of the 4,5 and 3,3 search keys, (2) to compare two predictive models, and (3) to test the results with an actual usage survey.

METHOD

The Ohio State University Libraries Circulation System contained at the time of this study 857,725 titles representing over 2.6 million volumes in the
The data base used for this study was the search key index file which contained one search key for each title in the master file. The search key is composed of the first four letters of the author's last name and the first five letters of the first word of the title excluding non- significant words ( 4,5 key). Title words are passed against a stop-list to determine significance. The stop-list contains the words: a, an, and, annual, bulletin, conference, in, international, introduction, journal, of, on, proceed- ings, report, reports, the, to, yearbook. The search key file is in sequence by search key. For comparative purposes, a second search key file was created and sorted which contained a 3,3 key (the first three characters of the author's last name and the first three characters of the first significant word of the title. ) The two files of sorted search keys were then processed by a statistical analysis computer program. This program created a frequency distribution table of identical keys, i.e., how many keys were unique, duplicated once, duplicated twice, etc. From this table two models were compared. Modell: File entry was viewed as a random process with choice of any unique search key equiprobable. This model has been suggested in the literature mentioned earlier. It states that if X;. number of keys will return i matches then the probability of a file search returning i matches may be written: P(i) = Xi/Ku where Ku is the total number of unique file keys. Likewise, the cumulative probability for I or fewer matches is I I P(I) = ~ P(i) = ( l x;. )/Ku i= l i= l Model 2: File entry is viewed as a random process with the choice of any record equiprobable. Thus, P( i) = ix;/Rt where R t is the total number of file records. 
Correspondingly,

    P(I) = \sum_{i=1}^{I} P(i) = ( \sum_{i=1}^{I} i x_i ) / R_t

Survey: The Ohio State University Libraries Automated Circulation System includes a telephone center to which patrons may telephone requests for library holdings information and for checking out and renewing books. Telephone operators, sitting at cathode ray tube (CRT) terminals, translate the patron's author-title request into a 4,5 search key and proceed with a file search. By having the telephone operators treat telephone calls as random input to the system and recording the number of matches returned for each search used, results can be generated in the same form that both of the models take, i.e., I or fewer matches have been returned P(I) × 100 percent of the time. This is a relatively easy survey to conduct since the output list of matching records for any particular key entry is headed with the exact number of matches which follow. The sample size was 1000 information requests recorded over two one-week periods separated by one month. Before these two subsamples were merged, statistical analysis on their individual means (for percent of 10 or fewer matches) signified they were identical at the 99 percent confidence level.

RESULTS

The results predicted by the two models for both a 4,5 and 3,3 search key for 1-10 matches appear in Tables 1 and 2. The figures pertaining to the 4,5 key can be compared directly to the data received from the survey conducted through the OSU Library's telephone center. This comparison is shown in Table 1 for 1-10 matches.

Table 1. File Access Comparisons (4,5 search key).
(Percent of time I or fewer matches returned)

     I    Actual Survey    Model 1 (random key)    Model 2 (random record)
     1        35.9                81.3                     55.7
     2        53.8                92.9                     71.6
     3        66.0                96.3                     78.5
     4        73.1                97.7                     82.4
     5        78.5                98.4                     84.9
     6        81.3                98.8                     86.6
     7        83.8                99.1                     87.8
     8        85.6                99.3                     88.8
     9        86.6                99.4                     89.6
    10        87.8                99.5                     90.2

To acquire a 99 percent upper confidence limit on the percent of requests returning 10 or fewer matches, the normal distribution was used as an approximation to the binomial distribution (n = 1000, p = .878), producing an upper limit of 90.2 percent.

Table 2. File Access Comparisons (3,3 search key).
(Percent of time I or fewer matches were returned)

     I    Model 1 (random key)    Model 2 (random record)
     1         64.3                      28.0
     2         81.0                      42.5
     3         87.9                      51.7
     4         91.6                      58.0
     5         93.7                      62.7
     6         95.1                      66.3
     7         96.1                      69.3
     8         96.8                      71.8
     9         97.3                      73.9
    10         97.7                      75.7

DISCUSSION

In Table 1 the results of the survey show that 87.8 percent of all searches recorded returned 10 or fewer titles. In Model 1, assuming that requests of the file are random with respect to search key, it is predicted that 99.5 percent of all searches will return 10 or fewer titles. All predicted percentages for Model 1 are consistently higher than observed results. The predicted response in Model 2 more closely approximates the observed behavior of the system as the number of responses increases. However, Model 2 is also consistently higher than the actual survey. Comparing Model 1 and Model 2 only, it is apparent that assuming a random record request more accurately reflects the true usage of a library collection. The lower percentages recorded in the actual survey may be attributable to a number of variables not taken into consideration in this study. Clustering due to common English word titles and common names may account for the greater part of this difference. Table 2 shows the results of predicted response for a 3,3 search key.
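Both models, and the confidence limit quoted for the survey, reduce to a few lines of arithmetic over the key frequency distribution. The sketch below is our own, run on a hypothetical toy distribution rather than the OSU file; the frequency table maps i (matches returned by a key) to x_i (the number of such keys).

```python
import math

def cumulative_probs(freq, I):
    """P(I), the probability of I or fewer matches, under both models.
    freq maps i (matches a key returns) to x_i (number of such keys)."""
    Ku = sum(freq.values())                     # total unique keys
    Rt = sum(i * x for i, x in freq.items())    # total file records
    p1 = sum(x for i, x in freq.items() if i <= I) / Ku      # Model 1
    p2 = sum(i * x for i, x in freq.items() if i <= I) / Rt  # Model 2
    return p1, p2

# Toy distribution: 80 unique keys, 15 keys returning 2 matches each,
# and 5 heavily clustered keys returning 10 matches each.
freq = {1: 80, 2: 15, 10: 5}
p1, p2 = cumulative_probs(freq, 2)   # Model 1 is higher, as in the text

# 99 percent upper confidence limit via the normal approximation to the
# binomial (n = 1000, p = .878), as in the text; 2.326 is the one-sided
# 99th-percentile normal deviate.
upper = 0.878 + 2.326 * math.sqrt(0.878 * 0.122 / 1000)   # about 0.902
```

Even on the toy data the pattern of the tables appears: clustering pulls the record-weighted Model 2 probability well below the key-weighted Model 1 figure.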
In this table, Model 2 predicts that only 75.7 percent of requests will return 10 or fewer titles. Equally important, only 28.0 percent of the requests will return a single record.

CONCLUSION

In predicting the expected behavior of an information retrieval system, it is more accurate to assume random requests by record than to assume random requests by search key. Probability predictions are deceptively high for assumed random key requests and do not reflect actual usage of the file. Even assuming random requests by record will produce higher-than-observed results. Data calculated using Model 2 should be considered as an upper limit or "ideal" performance indicator. Regarding the results of the random record model as the upper limit on the effectiveness of the search key, the data gathered indicate that, as the search key is shortened from 4,5 to 3,3, the deviation between the random key and random record models is considerably heightened. The 4,5 search key is more efficient for retrieval of 10 or fewer records from a large file than the 3,3 key (90.2 vs. 75.7 percent). Based on these data, the OSU Libraries decided to retain the 4,5 search key and not reduce it to 3,3.

Additional studies should be undertaken to determine the effects of common word usage, common names, and their relation to book usage. Secondly, the data presented here could be systematically and randomly reduced in size to predict the behavior of various search key combinations on varying file sizes.

REFERENCES

1. Philip L. Long and Frederick G. Kilgour, "A Truncated Search Key Title Index," Journal of Library Automation 5:17-20 (Mar. 1972).
2. Frederick G. Kilgour, Philip L. Long, Eugene B. Leiderman, and Alan L. Landgraf, "Title-Only Entries Retrieved by Use of Truncated Search Keys," Journal of Library Automation 4:207-10 (Dec. 1971).
3. Frederick G.
Kilgour, "Retrieval of Single Entries from a Computerized Library Catalog File," Proceedings of the American Society for Information Science 5:133-36 (1968).
4. Frederick H. Ruecking, Jr., "Bibliographic Retrieval from Bibliographic Input; The Hypothesis and Construction of a Test," Journal of Library Automation 1:227-38 (Dec. 1968).
5. William L. Newman and Edwin J. Buchinski, "Entry/Title Compression Code Access to Machine Readable Bibliographic Files," Journal of Library Automation 4:72-85 (June 1971).
6. Frederick G. Kilgour, Philip L. Long, and Eugene B. Leiderman, "Retrieval of Bibliographic Entries from a Name-Title Catalog by Use of Truncated Search Keys," Proceedings of the American Society for Information Science 7:79-81 (1970).

AN INTERACTIVE COMPUTER-BASED CIRCULATION SYSTEM FOR NORTHWESTERN UNIVERSITY: THE LIBRARY PUTS IT TO WORK

Velma VENEZIANO: Systems Analyst, Northwestern University Library, Evanston, Illinois

Northwestern University Library's on-line circulation system has resulted in dramatic changes in practices and procedures in the Circulation Services Section. After a hectic period of implementation, the staff soon began to adjust to the system. Over the past year and a half, they have devised ways to use the system to maximum advantage, so that manual and machine systems now mesh in close harmony. Freed from time-consuming clerical chores, the staff have been challenged to use their released time to best advantage, with the result that the "service" in "Circulation Services" is much closer to being a reality.

The transition from a manual to an automated system is never easy. Northwestern University Library's experience with an automated circulation system was no exception.
The first three months of operation were especially harrowing; there were times when only the realization that the bridges back to the old system were burned kept the staff plugging away with a system which often seemed in imminent danger of collapse. That they survived this period is a tribute to their persistence and optimism as well as to the merit of the system. The impressive array of obstacles was offset by a number of positive factors. Even though there were mechanical problems with terminals, the on-line computer programs worked flawlessly from the first. The climate for change was favorable. The automation project had the complete support of library administration; the head of circulation services, although new to the department and untrained in automation, was completely committed to the system and was able to transmit his enthusiasm to his staff. Within three months, the systems analyst, who had been available for advice and trouble-shooting, began to fade from the scene. Only an occasional minor refinement is now necessary. Maintenance problems, both in programs and procedures, are minimal. Basically the system has proved itself workable.

In a previous paper by Dr. James S. Aagaard (JOLA, Mar. 1972), the development of the system is traced and the system is described in terms of its logical design, program, and hardware components. The present paper will describe how the system operates in the library environment. The system accomplishes the traditional library tasks connected with circulation, but the methods used have changed radically. The development of effective procedures must in large part be credited to the circulation staff. These procedures have in a real sense spelled the difference between an adequate system and a good one. It is these procedures on which we will concentrate. The author wishes to thank the head of circulation services, Rolf Erickson, and his assistants, Mrs.
Eleanor Pederson and Mrs. Lillian Hagerty, for supplying the information to bring her up-to-date on procedures as they have evolved over the past three years.

BOOK IDENTIFICATION

Almost 100 percent of the 900,000 books in the main library's circulation collection contain punched cards. Accurately punched book cards, available in all books, can make the difference between success and failure of a circulation system. The book cards contain only the call number and location code. There is no doubt that, if conversion funds had been less limited, we might have elected to capture author/title data. However, an analysis of the amount of data which could be carried on an 80-column card, added to the fact that this would quadruple the cost, led to the decision to omit author/title. As a result, keypunch costs were exceptionally low: 1.1 cents per card. In spite of our fears, the complaints by users because overdue and other notices do not contain author/title have been surprisingly few. Cards for new books are, with a few exceptions, produced automatically as output from the Technical Services Module. All book cards are also on magnetic tape and constitute a physical inventory of the entire circulating collection, which is updated at intervals and listed.

USER IDENTIFICATION

The system requires a unique numeric identification number for each borrower. For faculty and Evanston campus students, this is their social security number; for special users it is a five-digit number assigned by the library from a list of sequential numbers. The number is supplemented by a one-digit code which identifies the type of user. The university's Division of Student Finance has responsibility for issuance of punched plastic badges for students. Each spring at pre-registration time, data are gathered and pictures taken for students planning to return to the university in the fall.
Badges are ready for distribution as soon as school opens. For incoming freshmen, transfers, and returnees, data are gathered and badges punched at registration time in the fall. A temporary paper badge is used during the several weeks required for badge preparation. An outside contractor prepares and punches the badges. There were initial problems with the accuracy of punching, but these have been resolved. The library now has a small IBM 015 badge punch, which it uses for punching special user badges and badges for carrel holders. Student badges are valid for one year. The user code is changed each year to prevent use of an expired badge. Faculty and staff badges are issued by the Personnel Department of the university and are good for three years. These are also produced by an outside firm.

BOOK SECURITY

Exit guards examine all books taken from the library to ensure that they are properly charged. The call number on the book and on the date-due slip are compared; the user number on the date-due slip and on the user's badge are compared. This need not be a character-by-character comparison; a few selected characters will suffice. Student badges contain their pictures, which should bear at least a resemblance to the holder of the badge. Initially, students were not required to show their badges. After a rash of book thefts resulting from the use of lost or stolen badges, this policy was changed. The book-check routine sometimes slows exit from the building during peak periods; however, it is considered a necessary security measure. The problem of lost badges is a serious one. Users tend to leave badges in the terminals. Usually such badges are turned in at the main circulation desk by the next user; the owner is notified to come in and pick it up. If a student loses his badge, he must report it to the circulation desk as soon as possible. He is issued a special user badge, and the computer center is notified to "block" his regular user number.
If someone then tries to use the badge, an "unprocessed" message will appear in lieu of a valid date-due slip. The problem is timing: "blocking" is done only once a day, and a determined thief can charge out a considerable number of books before the number can be blocked. For this reason, a check of the photograph on the badge is important. The maximum number of user numbers which can be blocked is fifty. Fortunately, except for faculty/staff badges, which are good for three years, student badges automatically become invalid at the end of each school year, and special user badges expire at the end of each quarter.

Behind the decision to go on-line was the belief that a university library, to effectively serve its patrons, needs to be able to determine the status of a book without delay. All books which are not in their places on the shelves as indicated by the card catalog are, in theory, reflected in the computer circulation file. Out of a circulating collection of 900,000 items, the number of records in the file at any one time will range from 30,000 to 60,000. This includes books temporarily located in the Reserve Room, books being bound, and books which have been sent to the catalog department. It also includes books which are lost or missing but which have not yet been withdrawn from the catalog. A single 2740 typewriter terminal, located at the main circulation desk, is used for inquiry into the circulation file. A library user, having obtained the call number of a book from the catalog, looks for it in the stacks. If he is unable to find it, he inquires at the terminal. The operator enters a command "search," followed by the abbreviated call number of the book (the key). If one or more records with this key are in the file, the file address, plus the balance of the call number (the key extension), are typed back from the computer for each such record.
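A minimal sketch of this key/key-extension lookup, with a Python dict standing in for the circulation file; the data layout, field names, and sample call numbers are our assumptions, not the actual Northwestern implementation.

```python
# Hypothetical sketch of searching by abbreviated call number: records
# are grouped under a short key, and a search returns every key
# extension filed under it, letting the operator "browse" the copies,
# volumes, and editions of a work that are in circulation.
from collections import defaultdict

circulation_file = defaultdict(list)

def charge(key, key_extension, record):
    """File a circulation record under its abbreviated call number."""
    circulation_file[key].append((key_extension, record))

def search(key):
    """Return (key extension, record) pairs for every record under key."""
    return circulation_file.get(key, [])

charge("QA76", ".9 A3 v.1", {"due": "1972-07-01", "user": "12345"})
charge("QA76", ".9 A3 v.2", {"due": "1972-07-08", "user": "67890"})
hits = search("QA76")   # both volumes come back under one short key
```

The point of the short key is exactly what the text describes: one search surfaces every variant filed under it, even when the full call number on the user's slip is incomplete or inconsistently formatted.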
If one of the listed records is the desired one, the operator then asks for a display of the record. The display includes the due date, type of charge, user number, and, if there is one, the saver number. The ability to use an abbreviated call number to access the file has proved invaluable. The operator can in effect "browse" among all the various editions, copies, and volumes of a particular book which are in circulation. The technique also facilitates finding a record, such as a volume in a serial, where the format is often quite variable and not always obvious from the call slip supplied by the user. If a large number of books all with the same key are in the file, there is sometimes a considerable wait while the typewriter types out the addresses and key extensions for all the records. Once such a listing begins, there is no way at present to cut it off in mid-point. This is a minor inconvenience; it could be remedied quite easily if computer core were not such a precious commodity. The single 2740 terminal is heavily used, and plans are under way to substitute a cathode ray tube in the near future.

BOOK LOCATE PROCEDURES

If a search on the 2740 terminal reveals that a book is not in circulation, the individual may ask that it be "located." A form is filled out and the book is searched nightly in the stacks. (It is also again searched in the 2740, since it may have been charged out to another user after the inquiry.) If it is found, it is brought down and placed on the "save" shelf, and the inquirer is notified that it is available. If it is not found, the form is held for two weeks and searched again, both in the 2740 and the stacks. If it is not found on the second search, it is
The circulation section has found that entering missing books into the file as soon as possible saves them time, because a search for a single book is often duplicated needlessly for a number of different individuals. SAVE PROCEDURES When a user is informed that a book is in the circulation file, he may ask that it be called in for him, provided it is not on loan to the Reserve koom and provided it is not already "saved" for someone else. The 2740 operator calls in the record and adds the saver's identification number to the record. Each weekday morning, '·book needed notices" are sent over from the computer center for books "saved" since the last notice run. The notices are stuffed in window envelopes and mailed. Even though the number of saves is small, in relation to the total number of books charged out, this feature has contributed to the library's and the user's satisfaction with the system. Initially there was some consideration given to providing for multiple saves on the same book. A study of the frequency of multiple saves indicated that the increased system complexity did not warrant it. Moreover, a student usually cannot wait too long for his turn at a book. A better solution in a university library is either to buy more copies or place high demand books in the Reserve Room, or both. The standard loan period is four weeks. A save on a book causes the due date to be recalculated either to two weeks, or to five days from the date of the save, whichever is later. This variable loan period increases the number of users who can use a book in high demand, without inconvenienc- ing the user of a book which no one else needs. To succeed, such a call-in policy must be backed with enough force to ensure that a called-in book is returned promptly. If a book is returned after the revised due date, the user incurs a penalty fine of $1.00 per day in addition to the regular 10 cents per day fine. Expired call-ins result in a weekly computer-generated reminder. 
When a book which is saved is discharged, the terminal printer issues a message to this effect, and the book can be placed on the "save" shelf instead of being sent to the stacks. Each night "book available notices" are produced for all such books discharged since the last notice run. The first copy of the notice is mailed to the saver; the second part is inserted in the book. The saver is given five weekdays to pick up the book.

BOOK CHARGES

Self-service Charges

During the regular school year, from 1000 to 1200 books per day are charged out through the system. Most of these charges are processed by the users on the self-service terminals. A basic objective in the design of the system was to make it easy for the user to charge out books. Initially it was planned to have manned charge-out terminals. However, as the design of the system progressed, it became evident that the vast bulk of charge-out transactions would consist of three simple steps: (1) insert the user badge, (2) insert the book card, and (3) tear off the date-due slip. The idea arose: if the procedure was so simple, why not let the user himself do it, thus saving the cost of terminal operators? There was some concern over user reaction, but it was decided it was worth the risk. A simple set of illustrated instructions is attached to the terminal. Since the terminal will not accept badges or book cards unless they are inserted in the proper direction, the user soon gets the idea. The terminal will also refuse a seriously off-punched badge or book card. If everything is done properly, the printer produces a date-due slip containing the user number, the book call number, and the date due. This is detached and placed in the book pocket. If, instead of a valid date-due slip, the user receives a slip from the printer containing the word "UNPROCESSED," he is instructed to take all materials to the main circulation desk.
This condition will occur if the individual tries to take out a book which is already charged out (perhaps to the Reserve Room or a carrel). It also happens if the badge or book card has fewer than the required complement of characters, or if the user code on the badge has expired, or if the user's number has been "blocked."

Although readers had no difficulty mastering the technique of using the 1031 badge/card reader, the 1033 printer was another story. Despite the printed and illustrated injunction to "tear the slip forward," the users insisted on pulling the paper upward. The result: the continuous roll of paper would start to skew and the paper would eventually jam. To alleviate the skewing problem, we had pin-feed platens installed in the typewriters. These prevented the skew, but the upward pull on the paper caused the pin-feed holes to tear and get out of alignment with the pins. The result: a paper jam. The IBM field engineers valiantly tried to overcome the condition, but to no avail. IBM was unwilling to make any major modification of the paper feed mechanism, and no amount of argument that such an improvement would increase their sales to other libraries had any effect. In desperation, the library finally took its problem to the Physics Shop in Northwestern University's Technological Institute. The technicians there designed and built a hooded feed to channel the paper upward and forward at the desired angle. A hand-actuated knife blade was installed to cut and dispense the ticket-type slip. In spite of these heroic efforts, paper jams still occur with enough frequency to be annoying. Since the terminals are isolated in the stacks, a jam often goes undetected until a user comes down to the main circulation desk with a complaint. For this reason, we have plans to install a "ticket printer," which will automatically cut and eject a ticket with no user intervention.
Unlike the 1033 printer, the 1031 badge/card reader has caused very little down-time due to malfunctioning. Because of their isolation on the stack floors, there was some early tampering with the terminals. Now that the newness has worn off, the terminals seem to have lost their appeal to pranksters, except that the photographs used to illustrate procedures have a way of disappearing. Everything taken into consideration, the self-service concept has proved completely feasible. It saves staff time and user time. The time required to charge out a book ranges from ten to fifteen seconds.

Carrel Charges

Each quarter, the circulation section assigns carrels to individuals, mostly graduate students and faculty. Carrel holders may charge out books for use in their carrels. A special loan code is entered which results in the date-due slip bearing the word "CARREL." The user cannot take these books from the building. Carrel charges are subject to call-in after two weeks but are not subject to fines. At the end of each quarter, unless the carrel has been reassigned to the same individual, any remaining books in the carrel are picked up and discharged. Once a quarter, the carrel user receives a computer-printed list of books charged to his carrel. Carrel holders tend to charge large numbers of books. To save time on their part and on the part of the staff, plastic badges are issued. These will contain the carrel number, the carrel code, and an expiration date. Carrel holders may then use the self-service terminals in the stacks.

Charges to the Reserve Room

The Reserve Room does not use the circulation system for charges to individuals, since the loan period is so limited. However, the circulation file contains a record of all books located in the Reserve Room. When a book is charged to the Reserve Room, the identification number of the Reserve Room is entered in the 1031 slides, together with a loan code indicating an indefinite loan period.
Processing of large batches of books is speeded up by suppressing the printing of date-due slips for all intralibrary charges. After charging, the punched book card is removed and held until the book is ready to be returned to the stacks, at which time the book is discharged in the regular manner. If a book needed for reserve cannot be found in the stacks, it is searched in the 2740 terminal. If it is in the file, a save is placed on the record, which generates a book-needed notice. The user is given five days to return the book. When the book is returned and discharged, a printer message alerts the discharger, who places the book on the shelf for pick-up by the Reserve Room. If the book is not in the file, it goes through the "book locate procedure," after which, if it is not found, it is processed as a "missing" book. If such a missing book turns up, it can be immediately identified as needed by the Reserve Room. A quarterly listing, in call number order, is received from the computer center for all books charged out to the Reserve Room. This list serves as the Reserve Room's shelf list.

Bindery Charges

If a book is found to be in bad condition, it is set aside for a bindery decision. If it is beyond repair, it is charged out to the Catalog Department to be replaced or withdrawn. (After it is withdrawn, it is deleted from the file.) If it can be repaired in-house, it is charged out to the mending section. If it must be sent to a commercial binder, it is charged to the bindery. The bindery section prepares an extra copy of the bindery ticket for all periodicals and unbound items, which it sends to the bindery. This ticket is used to keypunch a book card, which is then used to charge the book to the bindery. Whenever a book is back from mending or binding, it is discharged before being sent to the stacks.

Renewals

All renewals are processed at the main circulation desk.
The procedure is identical to a regular charge except that a slide on the terminal is set to "renew." The new date-due slip will contain the phrase "RENEW TO." In theory, the self-service terminals could be used for renewals. In practice, unless elaborate precautions were taken, a user could renew a book before it became due and then return it for discharge, leaving one slip in the book and keeping the other. After the book reached the stacks, the user could insert the extra date-due slip and walk out undetected. As protection against this, the original date-due slip must be in the book when it is renewed. Phone renewals are not accepted. However, if the user mails or brings in his date-due slips, the renewal is processed on the 2740. In the renewal of a book via the 2740, the record is called in and modified to change the date due and enter the correct renewal code. The original date-due slip is stamped with the new date and the phrase "RENEWED." The slip is mailed to the user. Although record modification via the 2740 is a valuable and necessary feature, it must be used with discretion, since the generalized file management system governing the 2740 does not have the controls contained in the circulation-specific portions of the program which handle data from the 1030's: for example, automatic calculation of the date due, rejection of renewals on saved books, validation of codes, etc.

BOOK DISCHARGE

Returned books are left in book bins, one inside the building and one outside. It became very evident during the implementation phase of the system that the success of the system depended on a thorough screening before discharge, for purposes of detecting and deflecting potential problems before they got to the discharge terminal. Books are first placed on dated trucks and then screened.
Books without Punched Book Cards

If the punched book card is missing, there will usually be a hand-written date slip in the book (the result of a manual charge). The screener pulls the matching book card from the "book-cards-pending" file. (After a manual charge, book cards are punched and filed in this file to await the return of the book.) The book is then ready for regular discharge. If there is no book card waiting, the book must be held until a card is ready. This is done to avoid the charge being made after the discharge.

Books with Incorrect Book Cards

All book cards are checked to see that they match the call number on the book pocket. Sometimes cards get switched between two books by the user when he charges them; sometimes the error was made when the card was originally matched with the book. If a book is found to contain an incorrect card, sometimes the correct card will be found in the "cards-pending" file. If so, it is pulled and inserted and the book sent for regular discharge. (The incorrect card becomes a "snag.") If the correct card is not found, the record is searched in the 2740 under both call numbers (the one on the card and the one on the book). If the record is under both call numbers, the record which matches the book is deleted; the book is sent to keypunching; the unmatched book card is filed in the "cards-pending" file to await the return of the book which matches it. If the record is found under only one of the two call numbers, it is deleted. The book is sent to keypunching; the unmatched book card becomes a "snag." "Snag" cards will be searched in the shelf list and, if they represent valid books, will be searched in the stacks. This is done to determine if a matching book can be found.

Books without Date Due Slips

The presence of a date-due slip in a book usually indicates that the book should be in the circulation file.
A slip will be missing if the user never charged it out or if he lost (or removed) the slip after charging it out. Such books are searched on the 2740. If no record is found, the book is sent to the stacks. If a record is found, it is deleted. However, we wish to guard against the user returning to insert the date-due slip and walk out with it; thus, the book is not sent to the stacks until the date due is past.

Regular 1031 Discharges

The speed and accuracy of discharge are features which have contributed much to the success of the system. A book with a date-due slip and a book card which matches the book goes to the 1030 terminal at the main circulation desk for discharge. One slide is set to either "fine paid" or "fine not paid." (If the user paid a fine at the time he returned the book, a "fine paid" flag will be in the book.) Another slide is set to "book returned today," "book returned yesterday," or "book returned prior to yesterday." If the last condition applies, the date of return is also set in the slides. Once set, these slides need not be reset until there is a change of date or fine condition. To minimize the resetting of slides, books are segregated into groups all of the same type. Discharging is the essence of simplicity. The book card is inserted in the reader; it feeds down and out and is replaced in the book. The date-due slip is discarded and the book is ready for shelving. For the purpose of speeding up discharge, no printer message is received unless there is an error (record not in file), the book has a save on it, or it is a "found" lost book. One operator can discharge five to six books per minute. Books are almost always discharged within one day of return and usually within three or four hours. If a large number of books should pile up after a period of computer down-time (fortunately rare), a massive discharge campaign is launched.
Two operators, working together on the terminal, can discharge books at the rate of one every eight seconds. If at the time of discharge a "save" message appears on the printer, the book is placed on the save shelf instead of being sent to the stacks. If a lost book is "found," the message alerts the operator to send the book to the staff member in charge of lost and missing books. If a message is received to the effect that no record exists in the file, the book is routed to the 2740 operator. Occasionally the 1031 terminal will misread a card, usually due to improper folding. If a card is folded outside the punched area it causes no trouble. Unfortunately, some of the original cards were folded in the middle, which sometimes results in a punch being missed. This, in some cases, cannot be detected by the computer program. If the error resulted from a mis-read card, the terminal operator can usually determine, from the date-due slip and the error-message slip, the key under which the record exists. The record is deleted and the book sent to have a replacement card punched. An occasional cause of the "record-not-in-file" condition results when the charge was processed on the Standard Register punch (the mechanized back-up system). This punch has a disconcerting habit of dropping a punch from badges which have a slight defect. There is no warning when this happens, and the error is often not detected until the transaction is later processed through the 1030 terminal. Since it is impossible to identify the user with certainty, such cards are simply discarded without processing, on the assumption that most users are basically honest and will return the book. The 2740 operator, seeing a date-due slip with a short identification number, is safe in assuming the record never got into the file. Sometimes the "record-not-in-file" condition is the result of a discharger absent-mindedly discharging a book twice.
If the 2740 operator cannot find a record, she gives up and sends the book to the stacks. During the early days of operation, when much of the charging was being done on the source record punch, the "record-not-in-file" condition was often due to the book being "discharged" before the charge was processed. The very small amount of down-time now, coupled with careful scheduling when it does occur, has almost eliminated this source of error.

OVERDUE BOOKS

Overdue notices for students and special individual users are prepared once a week. To avoid sending out large numbers of notices for books only a few days overdue, an overdue notice is prepared only if the book is at least four days overdue. A second notice is prepared two weeks after the first; a third and final notice is prepared two weeks after that. If there is no response to the final notice within two weeks, a "delinquent" notice is prepared; it is not sent out but is used to prepare a bill for a "lost" book. The overdue-notice run also produces reminders of expired call-ins.

FINES AND FINE COLLECTION

Faculty and staff are fine-exempt. Students and other individual users pay a fine of 10 cents per day for books overdue more than three days. In addition, if a reader does not respond to a call-in by the revised due date, he is charged a penalty fine of $1.00 per day. A user may elect to pay a fine on an overdue book at the time he returns it, in which case a "fine-paid" flag is inserted to alert the discharger to set the proper slide. No fine notice will result if this slide is set. For all other books returned late, fine notices are computer-prepared each weekday. These are on four-part forms; one copy is inserted in a window envelope and mailed; the other three parts are filed alphabetically by name. When the user pays his fine, the extra slips are discarded. If the fine is not paid in a reasonable period, one of the extra copies is sent as a follow-up notice.
If no response to the follow-up is received, and if the total bill exceeds $3.00, the bill is sent to the Department of Student Finance for collection. Sometimes the receiver of an overdue notice will come in to report that he (1) returned the book, (2) lost it, or (3) never had it. In such cases the book is searched in the 2740, because the book may have been returned since the overdue notice was prepared. If the record is still in the file, the item is verified in the shelf list. In some instances an incorrectly punched card is responsible for the item not being properly discharged. If a call number on a notice cannot be found in the shelf list, there is no alternative except to delete the record and absolve the reader of responsibility. If the call number on the notice represents a valid book, it is searched in the stacks and, if found, is brought down and discharged, with a resultant fine notice. When the book cannot be found, the reader is usually held responsible for it, unless it was a case of a lost badge which was reported promptly, in which case the library is usually lenient. If no lost badge was involved, the book is processed as a "lost" book and the user is billed. A book is also considered lost (and the user billed) if the user does not respond to three overdue notices. Weekly overdue notices are not prepared for faculty. Instead, a once-a-quarter computer-produced memo is prepared informing the individual of the books charged to him. He is asked to return them or to notify the library, by carbon copy of the list, that he wishes to retain them. If a faculty member does not return the list, the library calls in the books individually. As part of this quarterly memo run, listings of books charged to carrels and to departments (reserve room, bindery, cataloging, etc.) are produced. These listings have proved very valuable in maintaining control over books charged out on a long-term basis.
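The notice schedule and fine rules above reduce to a small calculation. The following is a minimal sketch in Python, assuming the 10-cents-per-day fine counts every day the book is late once the three-day grace period is exceeded (the article does not spell out that arithmetic); the function names and exact rounding are illustrative, not the library's actual program logic.

```python
from datetime import date, timedelta
from typing import Optional

GRACE_DAYS = 3          # students pay only if more than 3 days overdue
FINE_PER_DAY = 0.10     # 10 cents per day
CALL_IN_PENALTY = 1.00  # $1.00 per day past a revised (call-in) due date
BILL_THRESHOLD = 3.00   # bills over $3.00 go to the Department of Student Finance

def notice_dates(due: date):
    """First notice when the book is four days overdue; the second,
    third, and unsent 'delinquent' notices each follow two weeks
    after the previous one."""
    first = due + timedelta(days=4)
    second = first + timedelta(weeks=2)
    third = second + timedelta(weeks=2)
    delinquent = third + timedelta(weeks=2)   # used to bill for a "lost" book
    return first, second, third, delinquent

def fine(due: date, returned: date, exempt: bool = False,
         revised_due: Optional[date] = None) -> float:
    """Ordinary fine plus any call-in penalty; faculty and staff are exempt."""
    if exempt:
        return 0.0
    days_late = (returned - due).days
    total = FINE_PER_DAY * days_late if days_late > GRACE_DAYS else 0.0
    if revised_due is not None and returned > revised_due:
        total += CALL_IN_PENALTY * (returned - revised_due).days
    return round(total, 2)
```

Batching the dates this way mirrors the weekly notice run: each week the program only needs to compare today's date against the next scheduled notice date for every overdue record.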
LOST BOOKS

When a book is determined to be "lost," a duplicate book card is prepared. The history of the loss, including the name and address of the individual involved, is entered on the card. If the reader is held responsible, the book is priced and a bill is prepared. The original record is left in the file until all the documents are prepared. Then it is deleted via the 2740, and the duplicate card is immediately used to charge the book out to the "lost" category. The duplicate card is then filed in the "lost/missing" file, by call number. Another category of books is known as "missing." These are books which, although not charged out to anyone, cannot be found in the stacks. A duplicate card is prepared and used to charge the item out to the "missing" category. The card is filed in the "lost/missing" file. Once a quarter, a computer-produced listing of lost/missing books is received. Using this list, the stacks can be searched to see if the books have turned up. The list of books lost or missing for more than two years is turned over to the Catalog Department for withdrawal. After official withdrawal, the record will be deleted from the file. The fact that all lost/missing books are reflected in the file has aided in detecting them if they turn up. If such a book is discharged, a printed message alerts the operator, who routes the book to the person in charge of lost/missing books. The duplicate card is then pulled from the lost/missing file. Since the card contains the name of the responsible individual, it is possible to trace down the original bill in case an adjustment is necessary. Lost/missing books also turn up if someone tries to charge them out. The "UNPROCESSED" message which is printed instead of a date-due slip will usually cause the reader to bring the book to the main circulation desk, where the proper action can be taken to reinstate it in the collection.
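The discharge-time routing in the sections above (save shelf, found lost books, record-not-in-file) is essentially a dispatch on the printer message. The sketch below paraphrases that routing; the message strings and destination names are our own labels, not output from the actual 1030/1031 system.

```python
from typing import Optional

def route_discharge(message: Optional[str]) -> str:
    """Decide where a just-discharged book goes, based on the printer
    message (if any) received at the discharge terminal.  No message
    means a normal discharge."""
    if message is None:
        return "stacks"                 # normal case: shelve the book
    if message == "SAVE":
        return "save shelf"             # another reader is waiting for it
    if message == "FOUND LOST BOOK":
        return "lost/missing desk"      # pull duplicate card; adjust the bill
    if message == "RECORD NOT IN FILE":
        return "2740 operator"          # mis-read card, back-up charge, etc.
    return "main circulation desk"      # anything unexpected gets a human
```

Keeping the normal path silent, as the article describes, is what lets one operator discharge five to six books per minute: the terminal only interrupts for the exceptional cases above.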
MANUAL CHARGES

The system had to be designed so a book could be charged out even if it did not have a punched book card. Such books are brought to the main circulation desk, where a two-part form is hand-prepared. One part becomes a date-due slip; the other part goes to keypunching. A composite card containing the call number, the user number, and the loan code is punched, which is then fed through the 1030 to create a charge record. Also keypunched at this time is a regular book card, which is filed in the "cards-pending" file to await the return of the book. Such manual charges are very unsatisfactory. Call numbers and identification numbers are often illegible or miscopied. Keypunch errors are not uncommon. Care must be taken that the composite cards are processed through the 1030 before the book is returned for discharge. Fortunately, books without cards are now a rarity.

MECHANIZED CHARGES

Although the amount of computer down-time is very slight, some means had to be devised to charge out books during such periods. The manual charge procedure could have been used; however, the high error rate in copying and punching, coupled with the delay in keypunching any substantial volume of cards, caused us to reject this as a back-up system. A Standard Register Source Record Punch is used instead. This punch reads the badge and book card and transfers the data, plus data from a series of internal slides, to produce a printed date-due slip and a punched composite card. When the computer comes back on, the composite card is fed through the 1031 to set up the charge record. Since only one machine can be justified from a cost standpoint, the process of charging books out in this fashion is slow. Long lines of people often form, waiting for service. Resetting the internal slides between one loan code and another is awkward and error-prone. The machine is extremely sensitive to badge quality and often misses a punch.
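The composite card described above carries the call number, user number, and loan code in fixed columns of one punched card. Here is a sketch of building and checking such a card, assuming a hypothetical 80-column layout; the article does not give the actual column assignments, so the slices below are illustrative only.

```python
# Hypothetical column layout for an 80-column composite charge card.
CALL_NO = slice(0, 20)     # assumed columns 1-20: call number
USER_NO = slice(20, 30)    # assumed columns 21-30: user number
LOAN_CODE = slice(30, 32)  # assumed columns 31-32: loan code

def build_card(call_no: str, user_no: str, loan_code: str) -> str:
    """Left-justify each field in its assumed columns and pad to 80."""
    card = f"{call_no:<20}{user_no:<10}{loan_code:<2}"
    return f"{card:<80}"

def parse_card(card: str) -> dict:
    """Recover the three fields; a wrong-length card signals a bad punch."""
    if len(card) != 80:
        raise ValueError("composite card must be exactly 80 columns")
    return {
        "call_no": card[CALL_NO].strip(),
        "user_no": card[USER_NO].strip(),
        "loan_code": card[LOAN_CODE].strip(),
    }
```

A length or format check like this is also where a dropped punch from a defective badge (the Source Record Punch problem described below) would first be noticed in software.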
However, as with manual charges, the most significant disadvantage is that charges are made "blind." There is no way to determine whether a record for the book is already in the file or, if it is being renewed, whether it has a save on it. The user's number may be one of those "blocked" from use; this fact is not detected until it is too late. As with manual charges, care must be taken that all such mechanized charges are processed through the 1030 before any discharging is done. In spite of its defects, the Source Record Punch has proved useful as a system back-up. The error rate in transfer, while higher than on the 1030, is significantly less than the error rate of manually prepared and keypunched charges. Although slow, records get into the file much faster than if they had to be keypunched.

THE IMPACT OF THE SYSTEM ON THE LIBRARY

The new system has had a profound impact on the operation of the circulation services section, but other departments have also been affected, particularly Technical Services. Tighter control of cataloging is now maintained. No longer is it feasible for small uncataloged collections or collections with off-beat cataloging to exist in virtual isolation from the rest of the library. Regulations as to depth of classification have had to be adopted; the formation of the Cutter number and work letters must be carefully regulated; the assignment of volume and edition numbers must be uniform. Location symbols require careful control; no longer can books be casually passed from one collection to another without official transfer. Withdrawal of lost and missing books must be systematically performed. The system gives maximum flexibility: books may circulate on LC class numbers or document numbers as well as on a Dewey number. Ways of handling non-standard Cutter numbers and work letters have been improvised.
At the same time, the system operates to prevent unnecessary, haphazard, and shortsighted practices. Within the circulation services section, the computerization of circulation has not resulted in fewer personnel; it has, however, resulted in the same number of staff members being able to handle a much larger volume of circulation and to handle it more efficiently. In addition, Circulation Services has taken on a number of tasks which in the past were either not its responsibility or, if they were, were given only perfunctory attention. A comprehensive inventory of the entire collection of 1,200,000 books in the main library is in progress. Errors both on books and in the catalog are being corrected. The physical condition of the collection is being attended to. The content and quality of the collection are receiving increased attention. Incomplete serial holdings are being brought to light for possible acquisition. Books in the stacks which are candidates for inclusion in the "Special Collections" Department are being detected. So far as circulation proper is concerned, it can be said without reservation that the system saves a great deal of clerical effort. Staff time spent in charging out books is very small. Discharging an average day's books requires three or four man-hours. Filing has almost disappeared, as has most of the typing formerly required. A 2740 operator is required for inquiry and processing of mail renewals for the better part of the day and evening. The collection and follow-up on fines and bills is still a time-consuming job, although the extra forms available for follow-up have supplied some relief. The system is not perfect. There are certain improvements, such as on-line validation of users and automatic regulation of loan privileges, which would be made if the time and money were available for them.
However, considering the modest cost of developing and operating the system, the imperfections are bearable. Not the least of the benefits derived from the system is a somewhat intangible one. The role of the circulation librarian, and that of his staff, has changed. No longer are they chained to mountains of cards which, as soon as they are filed, must be unfiled. Staff members have been challenged to use their released time to the best advantage. Much thought and ingenuity has gone into setting up procedures to achieve maximum efficiency and accuracy. For the first time, perfection is seen as an attainable goal. Each day the staff develops more sophistication and gets a step closer to that goal.

Fig. 1. User inserts identification badge and punched book card in self-service circulation terminal.
Fig. 2. Specially designed attachment is used to cut off printed ...
Fig. 3. User inserts date-due slip in book pocket, completing charge procedure.
Fig. 4. Terminal at circulation desk has manual entry unit, which can be set to process charges without an identification badge, renewals, or discharges.
Fig. 5. Typewriter terminal is used for inquiry into file, placing saves on books, and occasionally for renewals.

Journal of Library Automation Vol. 5/2 June, 1972

AUTOMATION OF ACQUISITIONS AT PARKLAND COLLEGE

Ruth C. CARTER: System Librarian, University of Pittsburgh Libraries. When this article was in preparation, the author was Head of Technical Services and Automation, Parkland College, Champaign, Illinois.

This paper presents a case study of the automation of acquisitions functions at Parkland College. The system, utilizing batch processing, demonstrates that small libraries can develop and support large-scale automated systems at a reasonable cost.
In operation since September 1971, it provides machine-generated purchase orders, multiple order cards, budget statements, overdue notices to vendors, and many cataloging by-products. The entire collection, print and nonprint, of the Learning Resource Center is gradually being accumulated into a machine-readable data base.

INTRODUCTION-BACKGROUND

Parkland College, opened in 1967, is a two-year community college located in Champaign, Illinois. Before the librarian-analyst, who combines a library degree with several years' experience as a computer systems analyst and six months of programming training, was hired by Parkland, the administration decided that automation of some library procedures was feasible. At the time the library decided to initiate automation planning (December 1970), it had a book collection of just under 30,000 volumes plus 1,000 audio-visual items. The decision to automate would not have been possible unless a computer was available at the college. In the spring of 1970, when the librarian-analyst was hired, Parkland owned an IBM 360/30 with 32K. Before automation plans were under way, the college purchased an IBM 360/30 with 64K. The computer's increased capacity provided even more incentive for utilizing the computer for significant projects in addition to instructional and administrative functions. Among the reasons in favor of automation was a general consensus indicating that automation was the way to go, and that the library, with its many individual records, is a natural candidate for utilizing the computer. The automation of library acquisitions at Parkland is notable for several reasons. First, automation was done relatively easily and rapidly; actual systems design and programming were completed in six months. Full implementation was achieved within nine months of the formal beginning of the project.
Second, documentation of the system is exhaustive and is based on a detailed method of communication between the system's librarian-analyst and the programmer. Third, automation in this instance was accomplished economically. Fourth, the entire system can be run on an IBM 360/30 with 32K having two disk drives and two tape drives, and a standard print chain consisting of just upper-case letters.

WHAT TO AUTOMATE?

This, of course, is a crucial question. Where, out of the various alternatives of circulation, acquisitions, cataloging, and others, does one begin? Neither the librarian-analyst nor the rest of the library staff made any attempt to work out an answer during the fall of 1970. The librarian-analyst, as head of Technical Services, spent the first four months concentrating on cataloging and learning the problems in the acquisitions area. By December she was ready to begin planning for automation. Meetings were arranged with the director of the Learning Resource Center and the director of the Computer Center. Informal discussions with the library staff were held. Circulation was eliminated early from consideration, since Parkland is in temporary quarters. It seemed more logical to develop the area of circulation with the move to the permanent campus. In addition, the volume of circulation did not appear to warrant the time and personnel commitment necessary to develop a comprehensive system at this time. Several possibilities remained: the acquisition of new materials, conversion of our whole catalog, and periodicals control, including automatic claim notices. Periodicals seemed the least likely of the three, because our holdings numbered less than 700, and it was felt that the volume involved did not justify the effort and expense of going to a computer system, particularly for the first computer system within the library. Converting the whole catalog had some positive arguments.
It would provide a data base for later circulation efforts and also make it possible to produce bibliographies and other service features for faculty members. However, this idea was discarded due to the large initial data-conversion problem, and because it did not provide relief for existing problems within the library. The library staff concluded that acquisitions had first priority for automation. To this the director of the Computer Center heartily agreed, on the grounds that it was a conventional data processing type of application, and it would dovetail with existing data bases already maintained for administrative purposes, in particular the Vendor File and Financial Reporting Files. Furthermore, the library could then produce its encumbrance data to be entered into the budget programs for the Business Office accounting records. From the standpoint of the library staff, it was believed that by utilizing the computer in acquisitions we could improve overall staff utilization in the area. Probably the strongest point is that, while we did not expect clerical work time to decrease, its nature would change. One specific function to be eliminated was the manual bookkeeping, although a machine system would still require checking for accuracy. We expected that the acquisitions librarian, once freed from some routine responsibilities concerning the budget, would be able to devote that time to more professional activities. Other advantages in automating acquisitions were: more accurate and up-to-date information, especially in regard to budget figures, would be available; human errors in sending out orders would be cut down; and statistics on orders could be compiled automatically. At this point, as well as previously, the literature was searched for relevant discussions of acquisitions systems and/or mechanization applications in small libraries.
Relatively little had appeared in print describing library automation in junior colleges. Those articles found to be helpful included: Burgess, Cage, Corbin, Dobb, Dunlap, Macpherson, Morris, and Vagianos (see references 1-5 and 7-9). Also, Hayes and Becker's Handbook of Data Processing for Libraries (6) became available at this time. It was especially useful for the summary of features usually present within the scope of standard acquisitions applications. Along with use of the literature, several visits were made to other libraries with operational systems. A visit of particular importance was made in January 1971 to study an established off-line acquisitions system. As soon as there was general agreement on proceeding with plans for acquisitions, a list was prepared of the criteria the library staff would expect from the automation of acquisitions. The list items included:

1. The system should be open-ended, i.e., it should be planned with other potential future systems in mind.
2. It should handle the preparation of outgoing forms such as purchase orders, book-order cards, notifications to faculty requestors, and overdue notices to vendors.
3. The system should perform bookkeeping functions and provide many different access points for inquiry into the data base.
4. There must be a status list of items in the acquisitions process, up to and including the point of receiving cataloging.
5. It should have as much automatic editing of input data as possible.
6. The system must have flexible updating and file maintenance routines.
7. It should provide the library staff with decision-aiding information, including many of our previously manually maintained statistics.
8. It must be flexible.
9. It should maintain simplicity.
10. It should provide better service to the faculty through faster and more accurate ordering and notifications.
Along with the criteria for an acquisitions system, a Possible Sequence of Automation Development was submitted. This was to provide a means for keeping clearly in mind that, while acquisitions would get first attention, this was only a starting point, and that the system should be planned in such a manner as to facilitate its compatibility with future developments. As originally stated, acquisitions, strictly speaking, represented Phase 1, and materials added to the collection were Phase 2. However, Phases 1 and 2 were planned and programmed at the same time. Thus, from the beginning, Parkland College has included in its system cataloging information such as the complete call number and up to three subject headings of fifty characters each. The decision regarding the number and length of subject headings will be discussed later. (See master record layout at Figure 1.)

TIME ESTIMATE-SCHEDULE

In January 1971, a proposed time estimate (see Figure 2) was submitted to the director of the Computer Center for his approval. This time estimate was prepared with the goal of automating acquisitions beginning with the fiscal year 1972 (i.e., July 1971). The proposed schedule also took into account the fact that most of the librarians were expected to be on vacation all (or at least most) of August, and also that during September, with the registration of students and other demands on the computer resulting from the beginning of a new academic year, computer time and personnel would be tight and probably could not provide the necessary support to a system still in its developmental stages. The schedule called for the librarian-analyst to begin full-time work on analysis on February 15, with final implementation of the system by the end of July. Preparation of this estimate was based on computer output if everything went right. It was an extremely rigorous schedule.
Considering that problems did arise, the implementation of this system during the first week of August is truly notable. Of course, bugs remained after the system was actually in operation, and, as with all systems, changes were still being made several months later, both in specifications for programming and in the programs conforming to the specifications. When the time estimate was submitted, it was also necessary to make firm decisions regarding personnel to perform all the necessary tasks. The librarian-analyst assumed responsibility for all systems analysis and program definitions. The library staff supplied the keypunching support. One clerk had been hired previously because of her keypunch training. On July 1, an additional clerk was hired with this skill.

Fig. 1. Master record layout. Library master files (On-Order, In-Process, History); fixed-length records, blocked 400 x 9. Positions 301-350 hold Subject Heading No. 2; positions 351-400 hold Subject Heading No. 3.

The main problem was programming, because the Computer Center did not have the full-time personnel to support a major new effort. This was resolved by hiring a programmer on a special three-month contract running from April 15 to July 15, 1971. Prior to implementation, the library was forced to rely on the availability of keypunch machines at the Computer Center. In September 1971, an IBM Model 129 keypunch and verifier was installed in the Technical Services Department of the library.
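The fixed-length master record of Figure 1 can be read with plain string slicing. The sketch below assumes only the positions the layout documents for subject headings 2 and 3 (301-350 and 351-400); the position of the first subject heading and the blocking loop are our own inferences and are marked as such in the comments.

```python
RECORD_LEN = 400                 # fixed-length master record, Fig. 1
BLOCK_FACTOR = 9                 # records blocked 400 x 9 on tape

# Fig. 1 documents headings 2 and 3; heading 1 at 251-300 is an inference.
SUBJECT_1 = slice(250, 300)      # assumed positions 251-300 (1-indexed)
SUBJECT_2 = slice(300, 350)      # positions 301-350 (documented)
SUBJECT_3 = slice(350, 400)      # positions 351-400 (documented)

def parse_master(record: str) -> dict:
    """Extract the three 50-character subject headings from one record."""
    if len(record) != RECORD_LEN:
        raise ValueError("master record must be exactly 400 characters")
    return {
        "subject_1": record[SUBJECT_1].strip(),
        "subject_2": record[SUBJECT_2].strip(),
        "subject_3": record[SUBJECT_3].strip(),
    }

def read_blocked(block: str):
    """Split a tape block into its fixed-length records, one at a time."""
    for i in range(0, len(block), RECORD_LEN):
        yield block[i : i + RECORD_LEN]
```

Fixed-length records trade storage for simplicity, exactly the design rationale discussed below: every field is recoverable by offset alone, with no delimiters to parse.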
A Model 129 was chosen for the library in conformance with the initial requirement set by the director of the Computer Center that all library data for the computer be verified. This has proven to be a wise decision, as we have had relatively limited problems with invalid or erroneous data.

REQUIREMENTS SPECIFICATION PHASE (ANALYSIS)

Three weeks were allowed for identification and specification of all output desired from the initial system. Many of these requirements were alluded to in the preliminary list of criteria for the system. To meet the library's needs we decided that the system must produce: purchase orders; individual order cards (including a copy used to order catalog cards from the Library of Congress); budget statements including all encumbrances and payments as well as other financial data; lists of all books on order, in process, or cancelled; notices to vendors regarding items on order more than 120 days; notices to each faculty member of the additions to the collection of items they requested, complete with call number; and a monthly accession list of all newly cataloged items that could be circulated to all faculty members.

    Development Steps                                   Time Required   Date to Start   Date to Complete
    I.   Requirements specifications                    3 weeks         Feb. 15         March 5
    II.  Detailed design-System flow                    3 weeks         March 8         March 26
    III. Detailed design-Programming specifications     10 weeks        March 29        June 4
    IV.  Programming-Acquisitions                       10 weeks        April 15        June 23
    V.   Programming-Materials accessioned              3 weeks         June 24         July 14
    VI.  Computer Program System Test-Acquisitions
         & Materials Accessioned                        2 1/2 weeks     July 1          July 26
    VII. Implementation                                                                 July 1971

Fig. 2. Time estimate for automation of acquisitions at Parkland College as submitted in January 1971. A beginning and ending date for each phase is indicated, and the actual time in weeks required is shown.
Once it was known what forms were required, orders were placed for the necessary pre-printed forms. With some outside advice in the matter of forms suppliers, specifications for three new forms were delineated, two of which would be for use on the computer. The first form encountered in outlining the acquisitions process was a request form. The request form is used to make a record of all items ordered and to serve as a checklist in the searching process (see Figure 3). Later, it is stamped with a six-position control number and serves as the source document for keypunching new orders, which require three input cards per item ordered. The request form is then retained in control-number sequence until the item has completed its way through the technical services process. Specifications for the purchase orders were drawn up by Parkland's business manager. The machine-generated purchase orders used by Parkland are almost identical to the conventional manual purchase orders used throughout the college. In this case, automation of the library's purchase orders is a likely precursor to automation of the purchase orders for the remainder of the college. The most complicated form to design, from the library's viewpoint, was the individual order form. This was required in five parts, including a copy complying with Library of Congress specifications for use with OCR equipment. (This is illustrated in Figure 4.)

Fig. 3. Request form, used as a control record for each item ordered.
Fig. 4. Copies one and two of the multiple-part order form. The original copy is used to order catalog cards from the Library of Congress; the second copy is sent to the vendor.

It was important to determine forms requirements early, as it was anticipated that several months' time would elapse before they would be received. Naturally, it was desired that the forms be on hand by the time the programs would be ready for testing, which was planned for late June or early July. One of the most critical parts of the requirements specification phase was the determination of data elements to be included in the master records. Perhaps the most perplexing of the possibilities considered was subject headings. Since we wanted an open-ended system which would leave us some room for future development without major modifications, a decision was made to include three 50-character subject headings in each record.
Here we were limited because of the decision made (for purposes of simplicity of design and programming) to confine the system to fixed-length records. It was considered desirable for storage purposes to keep the master record length within 400 characters. Whether or not the decision on subject headings proves adequate in the long run, it does give Parkland's library a good starting point for some projects using subject headings, such as developing bibliographies on demand. Despite possible future modifications to the data base, all items going into the History (Master) File included headings as defined above. Additional determinations made in the initial phase regarded files to be maintained. Here a crucial factor was the physical limitations of the college's computer system. As only two tape drives and two disk drives comprised the primary storage facilities, the capability for performing sorts was limited. In fact, one of the disk drives was reserved strictly for systems programs and could not be utilized directly by the library. This contributed to the decision to maintain separate On-Order and In-Process Files, as well as a History File, on tape. The college Vendor File and the Library Budget File are maintained on disk. A final area of effort in the initial phase was developing codes to be utilized throughout the system. Naturally, many conditions would be indicated in the computer records by the use of a one- or two-position code. One example is the Format Code, a one-position code, which indicates the type of item, such as: B=Book, R=Record, and S=Filmstrip.

DESIGN PHASE-SYSTEM FLOW

Three weeks were allotted to developing the overall systems flow chart. This time was spent working out each separate program that would be required, and flow-charting the entire series of programs. A flow chart of the system (without minor additions dating after September 1971) is shown in Figure 5.
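The fixed-length master-record decisions described above can be sketched briefly. The actual programs were written in COBOL; the Python below is only an illustration, and every field name and width other than the three 50-character subject headings and the 400-character ceiling is an assumption.

```python
# Illustrative sketch of a fixed-length master record like the one described
# in the text: three 50-character subject headings, total length held under
# 400 characters. Field names and widths other than the subject headings are
# assumptions; the real system was written in COBOL, not Python.

FIELDS = [                    # (name, width)
    ("control_no", 6),        # six-position control number
    ("format_code", 1),       # one-position code: B=Book, R=Record, S=Filmstrip
    ("author", 40),
    ("title", 60),
    ("subject_1", 50),
    ("subject_2", 50),
    ("subject_3", 50),
]
RECORD_LEN = sum(w for _, w in FIELDS)   # stays well under 400 characters

def pack(rec: dict) -> str:
    """Build one fixed-length record; each value is padded/truncated to width."""
    return "".join(rec.get(name, "").ljust(width)[:width] for name, width in FIELDS)

def unpack(line: str) -> dict:
    """Split a fixed-length record back into named fields."""
    out, pos = {}, 0
    for name, width in FIELDS:
        out[name] = line[pos:pos + width].rstrip()
        pos += width
    return out
```

Because every record has the same length, no field delimiters are needed and sequential tape processing stays simple, which is the trade-off the designers accepted when they capped the subject headings at three.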
However, it does not necessarily indicate the sequence in which programs are run. In general, maintenance of each of the separate files is run prior to the processing of new data. This procedure has proved to work well.

Automation of Acquisitions/CARTER 127

Fig. 5. System flow chart.

In most cases, pre-sorting of card input is provided. This decision was not based on optimum efficiency but on compatibility with routine procedures and facilities in the Computer Center.

DESIGN PHASE-PROGRAM SPECIFICATIONS

One of the most significant parts of the development of Parkland's automated library acquisitions system is the exhaustive documentation provided by detailed written specifications for each program in the system. Each program, including utilities such as sorts, was assigned a job number and then described under each of the following topics: purpose, frequency, definitions (any unusual terms), input, output, and method. A format was provided for each input and output, whether it was a card, tape, disk, list, or other printed report or form. These accompanied each individual program specification. The Method section is particularly important. Here the librarian-analyst stated the procedure used to arrive at the given output based on the given input. Any necessary constants were defined. Because the librarian-analyst has had programming training, these specifications are detailed to the point where the programmer does not have to do much more than code the problem, making it possible for programming to proceed quickly. This thorough problem definition for each program by the librarian-analyst was one of the major factors (perhaps the primary key) in our success in accomplishing acquisitions automation rapidly and efficiently.
It had the advantage of obviating the need for a senior programmer, or for having someone from the Computer Center become highly involved in the analysis of library details. Furthermore, and perhaps most important, is the fact that it provides the detailed documentation of the system. There should be no doubt as to the procedures within each program. An example of a specification for one of the programs in the Parkland College Library Acquisition Series is presented in the Appendix. It should be mentioned that most of the programs are written in COBOL. There are a few in Assembler, and some minimal use is made of RPG.

TESTING OF THE PROGRAM

The original plans called for testing with test data which would proceed simultaneously with programming. However, as things developed, most coding was done prior to very much testing. As a result, the period originally devoted to live-data testing of the whole system was instead devoted to testing the programs with test data. Thus, in early July, we were about two weeks behind the original time estimate, and that is where the schedule ended up. The usual problems showed up in testing with test data. Moreover, during the first week of July, it was learned that the Business Office was changing the length of the account numbers from 9 to 11 positions. Fortunately, space had been planned for up to a 12-position field, so the lengthened number could be easily accommodated by the system. However, the changing of numbers required modification of any program which edited data for valid account numbers. This was a minor problem and easily resolved. On July 15 the programmer completed the job for which he was hired, i.e., to complete a programming and systems test utilizing live data and to make appropriate changes as identified during testing. Since not even test-data testing was complete on July 15, he stayed until July 20 and finished that work.
Meanwhile, the director of the Computer Center had already selected the individual to be the operator when the library's jobs were being run on a regular basis. This employee would also provide program maintenance. On July 21, this permanent staff member took over programming. For the next two weeks, while summer school classes were in session, most of the trial runs of the library series had to be done during evenings, nights, and on weekends. By the end of July, most of the major bugs appeared to be out of the programs.

IMPACT ON TECHNICAL SERVICES

Success on the first usable purchase order and order cards came on August 3. Within the next day or two, a workable budget statement was produced along with a WITS List (Work in Technical Services). By August 13, when vacation time came, nearly one thousand books had been ordered via the automated system. While a few bugs remained to be dealt with in September, the system was accomplishing its basic mission essentially on time. It took less than eight months to identify requirements, and design, program, and test a system consisting of twenty-seven programs in its original design! During the remainder of 1971, various bugs were found and, it is to be hoped, eliminated from the system. More bugs occurred in the budget series than in any other single segment of the system. Over a period of several months, these were worked out; as of March 1972, the budget sequence of programs worked smoothly.

IMPLEMENTATION

Following the implementation of the automated technical services system, several effects were evident. An obvious effect was the saving of two to three days per month formerly spent on bookkeeping. On the other hand, one permanent staff member was added to Technical Services. This addition had two causes: the keypunching load, and the fact that many more books were ordered directly from publishers, with a consequent major increase in processing in-house.
Therefore, much of what was expended in salary for the extra clerk was saved by eliminating most prepaid processing costs. For several months after implementation, some duplication of effort was required, especially by acquisitions personnel. Thus, the total effect on changing the nature of work was not immediately obvious. By March 1972, duplication was essentially phased out, and more realistic assessments of the 130 Journal of Library Automation Vol. 5/2 June, 1972 impact of automation in changing the nature of the workload are now being made. One of the most obvious changes is the increased number of bills to be approved for payment. By utilizing the computer to batch purchase orders and order cards, almost all materials are now ordered directly from publishers, rather than pre-processed from a jobber. Although the speed by which items are received and processed has increased substantially, there has been a corresponding increase in paper work in this regard. ADDITIONAL SERVICES Besides the immediate effects of the automation of acquisitions within Technical Services, other parts of the library and the college felt the impact. This is especially true of reference, which now has a weekly updated listing of all items on order, in process, or cataloged within the last month, in both author /main entry and title sequence. Budget statements are now available to the Director of the Learning Resource Center and other personnel on a weekly rather than monthly basis. Not only are they received sooner, but they provide more information than is present in the statement originating from the Computer Center. A useful fringe benefit is the availability of overdue notices to vendors when items have heen on order more than 120 days. A computer-generated notice is sent each week to faculty members regarding items requested, cancelled, or cataloged. The response of the library staff and the rest of the faculty to the automated system has been very favorable. 
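The 120-day overdue-to-vendor check above is simple to sketch. This Python fragment is only an illustration (the system itself was in COBOL): it assumes order dates are held in the Julian YYDDD form the article describes elsewhere (e.g., 72001 for 1 January 1972), and the field names are invented for the example.

```python
# Sketch of the overdue-vendor check: flag items on order more than 120 days.
# Dates are assumed to be in the Julian YYDDD form the system used for easy
# computation (e.g., 72001 = 1 Jan 1972); field names are illustrative only.
import datetime

def julian_to_ordinal(yyddd: int) -> int:
    """Convert a YYDDD Julian date (1900s) to an absolute day number."""
    year = 1900 + yyddd // 1000
    day_of_year = yyddd % 1000
    return datetime.date(year, 1, 1).toordinal() + day_of_year - 1

def overdue_orders(on_order, today_yyddd, limit=120):
    """Return control numbers of items on order more than `limit` days.

    on_order: iterable of (control_number, order_date_yyddd) pairs.
    """
    today = julian_to_ordinal(today_yyddd)
    return [ctl for ctl, ordered in on_order
            if today - julian_to_ordinal(ordered) > limit]
```

For example, with today's date 72125, an item ordered on 72001 has been on order 124 days and would draw a notice, while one ordered on 72100 would not.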
COST

At this date (March 1972), costs are difficult to assess, but certainly seem minimal. The only direct costs are the installation of a 129 keypunch, which rents for $170 per month, plus the salary of the extra staff member for keypunching. However, the extra salary is compensated for by no longer ordering items pre-processed at an average cost of $2.05 per item. Naturally, there is some local cost for processing materials such as pockets and labels, but it is minor on a per-volume basis. In addition, by being processed locally, materials are available to the users much more rapidly. Among other costs, the Learning Resource Center had to pay a three-month salary for a programmer. Other computer support, whether personnel or machine time, has not been directly billed to the library. Analyst time is absorbed, in part, in general library salaries, as the librarian-analyst is also head of Technical Services and is responsible for original cataloging. About one-half of her time is devoted to automation activities. As an indirect cost of automation, it is reasonable to include the cost of a special summer project contract of about $1500 for the reference librarian to catalog A-V materials. This was necessary because the librarian-analyst was directly involved with automation and thus not able to keep up with all media of materials to be cataloged. Purchase-order forms previously covered by the Business Office budget cost the library $900. However, it was a two-year supply which was paid for by money the college, if not the library, would have expended anyway. The multiple-order forms for computer use exceed the cost of more standard forms by several hundred dollars per year. The library also expends about $400 per year to buy punch cards and magnetic tape. Some direct savings resulted from what are by-products of the automated system, but which were previously done manually.
These include production of a monthly accession list and notices to faculty members of items they requested which were ordered, cancelled, or cataloged. The accession list was previously compiled by Xeroxing, in ten copies, the shelflist card for each item added to the collection during a month. This involved both Xerox charges and student assistant time. Notices to faculty were previously sent out by both the order and processing sections. Now these notices are consolidated, which produces savings in addressing time, as well as eliminating manual production of each notice. Overall, in calculating costs and savings, direct and indirect, it appears at this point that Parkland has automated many library routines very inexpensively, although specific cost figures remain to be determined. With the availability of a similar computer, many other libraries should be able to undertake automation of certain basic functions without large expenditures of either money or personnel time.

PROBLEMS

As with all automated efforts, some problems were encountered at almost every stage of development. Taken as a whole, these were minor and, for the most part, few hitches were encountered. However, so that others may profit from the library automation experience at Parkland, those problems will be discussed. The major problem was the original programmer of the series. This person was not a regular employee of Parkland and was not concerned with being retained. Since he was not part of the staff, he worked erratically and frequently was hard to reach. We were working on a tight time schedule, and it was very important to maintain close supervision of the progress being made, although sometimes this was difficult. In addition, even though it was strongly desired that tests be conducted throughout the three-month period, the programmer waited until all coding and compiling were completed before beginning even test-data testing with most programs.
Fortunately, it worked out satisfactorily, as the regular staff member of the Computer Center, who presently runs our jobs and does program maintenance, took over in mid-July and was available for live-data tests. All staff members directly involved with automation worked very hard the last two weeks of July and the first week of August to complete testing with live data. The programs were further refined during August and September, and most of the bugs were out by early fall. Naturally, changes in specifications continued to be made, and our acquisitions system is definitely not static. The lesson we learned from the experience with the initial programmer is that, if a regular staff member of the institution can be assigned to the development of programs for the library, avoiding other assignments during that time period, a more satisfactory response can be achieved from the programmer. Also, in such an operation it would be possible to monitor progress on a more regular basis. Another group of problems arose in connection with the new forms required for the automated system. Fortunately, these were not serious. The forms arrived later than promised and, without exception, their cost was about 25 percent more than the original estimates. Because custom forms can take a long time to be completed, it is wise to identify output requirements early in the development of an automated system, so that the forms can be completed and delivered by the time the system is ready for final testing and implementation. A few minor problems revolved around decisions made in file design. To conserve space and hold down the size of the master record, it was decided to pack numerical fields. This would have been satisfactory if packing had been limited to such fields as the Julian date, e.g., 72001 rather than 01-01-72. (This form of the date was used to provide easy computation when calculating overdue orders.
) Unfortunately, fields such as the numerical part of the LC card number and the Parkland College account numbers were also packed. No problem existed except when the LC card number was blank at order time; then the LC number printed as zeros. Of course, these could be suppressed once the problem was identified, although it was decided to make space to unpack the field. It was learned that packed fields always print zero when unpacked, unless this is specifically suppressed, and also that it is impossible to debug packed fields on routine file dumps that are requested with provisions for unpacking and reformatting the dump. This is because packed fields print blank when they are dumped. Other minor difficulties included:

1. The print chain did not print colons or semicolons, except as zero; therefore, the library's records all contain commas instead.

2. In the midst of programming, the account numbers of all the college's funds were changed, thus requiring the change of constants and edit criteria in many programs.

3. As originally specified for input, the LC classification number did not sort in shelflist order; for instance, BF 8 sorted after BF 21. This was eventually remedied by left-justifying the letters and right-justifying the numbers within separate fixed fields.

4. Routine delays for machine repair and maintenance were a concern, since it is necessary to adhere to a tight schedule in systems development.

FUTURE DEVELOPMENT

As is so frequently the case, now that Parkland is committed to automated functions within the library, more and more applications are seen. Even the former skeptics on the staff are enthusiastic, and all the professionals have made suggestions for the future. Several additions to the acquisitions system were made in the first six months following implementation of the system.
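The shelflist-sort remedy in item 3 above can be sketched as a sort-key routine: the class letters are left-justified and the class number right-justified in separate fixed-width fields, so that a plain character comparison puts BF 8 before BF 21. This Python sketch is only an illustration of the technique (the originals were COBOL programs), and the field widths are assumptions.

```python
# Sketch of the shelflist-sort fix: letters left-justified, numbers
# right-justified in separate fixed-width fields, so "BF 8" sorts before
# "BF 21". The widths (3 and 5) are illustrative assumptions.
import re

def shelflist_key(call_no: str) -> str:
    """Build a fixed-field sort key from an LC class number such as 'BF 21'."""
    m = re.match(r"([A-Z]+)\s*(\d+)", call_no)
    letters, digits = m.group(1), m.group(2)
    return letters.ljust(3) + digits.rjust(5)   # e.g. 'BF ' + '   21'

titles = ["BF 21", "BF 8", "B 99", "BF 199"]
titles.sort(key=shelflist_key)
# "BF 8" now precedes "BF 21" in shelflist order
```

Right-justifying the digits makes the blanks act as leading zeros would, which is why a one-digit number no longer collates after a two-digit one.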
These included a list of purchase orders sequenced by vendor and enlarging the machine-generated notices to faculty requestors to cover items ordered and cancelled. Various additions have been made in several programs originally part of the system, which expand the services the system can provide for the library staff. Many more minor modifications and supplementary features in acquisitions have been identified for inclusion in the system, and will be added as time permits. The first additional area to benefit directly from the computer's availability has been periodicals. Without involving complicated programming, the periodicals holdings have been converted to a card file which is then listed directly, card by card, without changes, except for suppression of a control and sequence number. Nothing more is planned for periodicals in the near future, because the new card file enables the master holdings list of 800 titles to be updated in Technical Services by the periodicals assistant, who also keypunches half time. The time-consuming retyping of the holdings list is now eliminated, and multiple copies of up-to-date holdings lists can be produced more frequently with less effort. Another new area, for which programming specifications were released in December 1971, is reference. In this system it is hoped that subject bibliographies and holdings lists, based on Library of Congress classification, can be produced. This system will have a multitude of purposes, one of the primary ones being to give better service to our faculty members. We get many requests for copies of portions of our shelflist or other extracts of holdings. Rather than filling these requests by Xeroxing cards or tedious typing, a few extract specifications will permit computerized retrieval and printing. Also, search time in the catalog will be cut down considerably.
In the subject bibliographies, the library plans to be able to extract on any heading, stem of a heading, or any part of a heading, thus getting much more flexibility than in manual use of the card catalog. Programming for this is currently under way, and after the system has been completed and is operational, some interesting results should be identified. By including three subject headings of fifty characters in our original file design, it was possible to design and program the reference series as a spin-off of the acquisitions-technical services system with a minimum of additional effort. Even if it is eventually decided to increase either the number or size of the subject headings contained in Parkland's file, useful services will have been provided under the original design, as well as a base for further decisions and developments. Other projects which are being considered for future action are serials holdings (in Parkland's case, mostly annuals and yearbooks which get cataloged), including an anticipation list, and management statistics consisting of holdings percentages by class letter versus collection additions and circulation figures by class letter. Circulation itself will undoubtedly not be designed prior to actual residence on the permanent campus (anticipated for fall 1973), but all of the above are possibilities and some will receive attention in the immediate future. By building a data base which includes subject headings and call numbers, many future projects will be practical to consider, as the file maintenance programs and the data base will already exist. These, of course, may be modified from time to time to meet changing conditions and requirements. Additionally, Parkland's library staff has been following cooperative library automation efforts involving other libraries, and would happily consider participation in appropriate cooperative ventures.
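The three matching modes planned for the subject-bibliography extract (whole heading, stem of a heading, or any part of a heading) can be sketched as follows. This is an illustrative Python sketch, not the planned COBOL extract program; the record layout and function names are assumptions.

```python
# Sketch of the planned subject-bibliography extract: select records whose
# subject headings match a full heading, a stem, or any substring. The
# (call_no, title, headings) record shape is an assumption for illustration.

def extract(records, term, mode="part"):
    """Return (call_no, title) pairs whose headings match `term`.

    mode: "heading" = whole heading, "stem" = heading starts with term,
          "part"    = term appears anywhere in the heading.
    """
    term = term.upper()
    hits = []
    for call_no, title, headings in records:
        for h in headings:
            h = h.upper().strip()
            if ((mode == "heading" and h == term) or
                (mode == "stem" and h.startswith(term)) or
                (mode == "part" and term in h)):
                hits.append((call_no, title))
                break   # one match per record is enough
    return hits
```

A card catalog only files each heading under its leading word, so the "stem" and "part" modes are precisely where the machine file gains flexibility over manual lookup.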
CONCLUSION

In the opinion of both the library and computer staff, the automation of acquisitions is a success. It was accomplished rapidly, essentially on time, and economically, with few costs higher than originally anticipated. Now that the system is operating smoothly, with only an occasional bug cropping up, the extra workload caused by parallel operations has been phased out, and the total efficiency of the system should continue to improve. The system to date has been running on a weekly basis, and this has proved satisfactory to both the Computer Center personnel and the library. The library is among the first parts of Parkland to be on a regular weekly schedule using the computer; most other processing is on a monthly or quarterly cycle. In approaching any automated systems development, a general attitude of flexibility combined with thoroughness is very important and will probably bring the best long-term results. By being flexible and open-ended, regardless of what portion of a library's functions were originally automated, the way will be paved to provide a data nucleus for applications in other areas of the library. Thoroughness in design and attention to initial detail are also important, as it is sometimes harder than expected to find the time to make changes later. There is probably a tendency to get along with an operational system as it is, rather than making minor non-crucial modifications in it, although such changes do get worked in as time permits. Nonetheless, it is very important that in the initial stages a system be as comprehensively planned as feasible. The Parkland College Learning Resource Center is fortunate in that the original specifications (on the whole) were well thought out and provided a cohesive unit, which is also characterized by built-in flexibility and, as a result, is adaptable to future growth.
ACKNOWLEDGMENTS

Numerous individuals have participated in and supported library automation efforts at Parkland College. David L. Johnson, director of the Learning Resource Center, provided the initial inspiration and determination. Robert O. Carr, director of the Computer Center, welcomed the library's commitment to automation and provided technical advice where necessary. Sandra Lee Meyer, acquisitions librarian, gave full cooperation, including tireless aid in clarification of requirements and debugging of test results. Since late July 1971, Bill Abraham has been the programmer-operator for the library system and has consistently given more than one hundred percent effort. Jim Whitehead from Western Illinois University contributed valuable advice based on his prior experience in acquisitions automation. Finally, Kathryn Luther Henderson, an inspirational teacher and friend, voluntarily spent many hours writing test data and offering the opportunity for many fruitful discussions.

REFERENCES

1. Thomas K. Burgess, "Criteria for Design of an On-line Acquisitions System at Washington State University Library," in Proceedings of the 1969 Clinic on Library Applications of Data Processing, edited by Dewey E. Carroll (Urbana: University of Illinois, Graduate School of Library Science, 1970), p. 50-66.

2. Alvin C. Cage, "Data Processing Applications for Acquisitions at the Texas Southern University Library," in Proceedings, Texas Conference on Library Automation, 1969 (Houston: Texas Library Association, Acquisitions Round Table, 1969), p. 35-57.

3. John B. Corbin, "The District and Its Libraries - Tarrant County Junior College District, Fort Worth, Texas," in Proceedings of the 1969 Clinic on Library Applications of Data Processing, edited by Dewey E. Carroll (Urbana: University of Illinois, Graduate School of Library Science, 1970), p. 114-34.

4. T. C.
Dobb, "Administration and Organization of Data Processing for the Library as Viewed from the Computing Centre," in Proceedings of the 1969 Clinic on Library Applications of Data Processing, edited by Dewey E. Carroll (Urbana: University of Illinois, Graduate School of Library Science, 1970), p. 75-80.

5. Connie Dunlap, "Automated Acquisitions Procedures at the University of Michigan Library," Library Resources & Technical Services 11:192-202 (Spring 1967).

6. Robert M. Hayes and Joseph Becker, Handbook of Data Processing for Libraries (New York: Wiley-Becker and Hayes, 1970).

7. John F. Macpherson, "Automated Acquisition at the University of Western Ontario," in Automation in Libraries: Papers presented at the C.A.C.U.L. Workshop on Library Automation at the University of British Columbia, Vancouver, April 10-12, 1967 (Ottawa, Ontario: Canadian Library Association, 1967).

8. Ned C. Morris, "Computer-Based Acquisitions System at Texas A & I University," Journal of Library Automation 1:1-12 (March 1968).

9. Louis Vagianos, "Acquisitions: Policies, Procedures, and Problems," in Automation in Libraries: Papers presented at the C.A.C.U.L. Workshop on Library Automation at the University of British Columbia, Vancouver, April 10-12, 1967 (Ottawa, Ontario: Canadian Library Association, 1967), p. 1-9.

TECHNICAL NOTE

HELP: The Automated Binding Records Control System

An interesting new aspect of library automation has been the appearance of commercial ventures established to provide for an effective use of the new ideas and techniques of automation and related fields. Some of these ventures have offered the latest in information science research and development techniques, such as systems analysis, management planning, and operations research.
Others have offered services based on new procedures, for example, computer-produced book catalogs, selective dissemination of information services, indexing and abstracting activities, mechanized acquisitions, and catalog card production systems. One innovation is a new technique devised for libraries to reduce the clerical effort required to prepare materials for binding and to maintain the necessary related records. The technique is called HELP, the Heckman Electronic Library Program. It was developed by the Heckman Bindery of North Manchester, Indiana, with the cooperation of the Purdue University Libraries. It was recognized by Heckman's management that the processing of 10,000 to 20,000 periodicals weekly and the maintenance of over 250,000 binding patterns would soon become too unwieldy and costly unless more efficient procedures were developed. It was additionally realized that any new system should also be designed as a means to aid libraries with their interminable record-keeping problems. The latter purpose could be accomplished by providing a library with detailed and accurate information regarding each periodical it binds, and by simplifying the library's method of preparing binding slips for the bindery. In the fall of 1969, after a detailed analysis, the Heckman Bindery management began the development and programming of a computerized binding pattern system. This system was the result of a team effort involving the management, sales, and production departments. John Pilkington, Data Processing manager, directed the installation of the system, and Earl Beal performed the necessary programming functions. In December of 1971 approximately 700 libraries were using the system, and about 100,000 binding patterns were in the data file.
As the system was developed, a library's binding pattern data were converted to machine-readable form, which then made it possible for the bindery automatically to provide nearly complete binding slips for each periodical title bound. In addition, the system provides an up-to-date pattern record for the libraries' files, and the bindery maintains the resultant data bank of pattern records as the library notifies it of additions, changes, and deletions. In this manner, the bindery expects to establish an efficient method for purging the file of out-of-date information. The system revolves around four forms: the binding pattern index card, the binding slip, the variable posting sheet, and the binding historical record. The binding pattern index card (Figure 1) is a 5" x 8½" card, pink in color, which is a computer printout. One of these cards is retained in the library as its pattern record for each set of each periodical bound by the library. The data given on the card are essentially the same as those maintained by most libraries in their manual pattern files, except that more detail is provided by the HELP system, and the library does not maintain the record in machine-readable form; the bindery does. As changes are made to the patterns, the library clerk simply crosses out the old data on the appropriate binding slip and writes in the new data. When the bindery receives the binding slip, a new index card is produced, among other records, and forwarded to the library with the returned shipment of bound volumes. The system also provides for one-time changes that do not affect the pattern record. The data contained on the index cards include the library account number, the library branch or department code, the pattern number, color, type size, stamping position, title (vertical or horizontal spine positions), labels, call number, library imprint, and collating instructions.
The collating instructions, which are listed in the instruction manual provided by the bindery, are given as a series of numeric codes. Asterisks are used to indicate the end of a print line. The binding slips are also 5" x 8½" forms, but they are four-part multiple forms, of which three parts are sent to the bindery with the periodical to be bound, and one part, a card form, is retained by the library as its "at bindery" record. The information required by the binding slip is essentially the same as that included on the index card. The library, however, must provide the variable data such as volume number(s), date(s), month(s), or whatever information is required to identify a specific volume. The variable posting sheet (Figure 2) is an 8½" x 11" form that is used by the library when it sends several volumes or copies of a volume to the bindery at the same time. Since the bindery cannot determine beforehand the number of physical volumes of a title a library will want to send for binding at a given time, it sends to the library only one printed-out binding slip to be used for the next volume of a given serial.

Fig. 1. Binding pattern index card.

If multiple volumes of
Binding pattern index card. 140 Journal of Library Automation Vol. 5/ 2 June, 1972 BINDING PATIERN VARIABLE POSTING SHEET 1HE. HECKMAN BIN'DE.~Y, INC. CUST. ACCT. NO. 1 ~.18RJ.RV rATTERN NO.,l-ISRARV NAME PERIODICAl- NAME 'POST PATTERW VARIABl-E INFORMATION FROM \.EFT TO RIGHT IN SEQI./li:NC'E I z 3 4 5 . 6 ; .... '-......_~ .... "-....... - ,_......-"'\_'-~r··-~ ........... ..____.._ · -l, )~ I / -- - ~ Fig. 2. Variable posting sheet. a set are to be bound, the library clerk provides the variable information for the first volume by using the single binding slip, and the variable data for each additional volume of the same title are posted by the clerk on the posting sheet. The bindery will automatically produce from its pattern data bank the binding slips necessary for binding the additional volumes that are listed on the posting sheet. The binding historical record (Figure 3) is a form provided for the use of the library if it desires a permanent record of every volume bound. The use of this form is not required by the system; it is simply a convenience record for the library binding staff. The form is printed on the back of the pattern index card. Spaces are provided for volume, year, and date sent to the bindery, and most of the back of the card is available for posting. All data fields are of fixed length with the maximum size of the records at 328 characters. Some of the data formats are shown in Figure 4. A few of the data fields in the example need additional explanation. The fifth field labeled "PRINT" refers to the color of the spine stamping, i.e., gold, black, or white. The "TRIM #1 & 2" fields are for bindery use only, and indicate volume size within certain groups for printing purposes. The "SPINE" field is also for bindery use, and it indicates the size of type that can be used according to the width of the spine. "PRODUCT NO." 
refers to certain types of publications such as magazines, matched sets, or items which will be pamphlet (inexpensively) bound.

Fig. 3. Binding historical record.

Fig. 4. Data formats.
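Because every field in the 328-character record is fixed length, a program can extract fields by position alone, with no delimiters. The sketch below shows the technique; the offsets here are hypothetical (the real positions are those of the Figure 4 layout, which does not survive reproduction well).

```python
# Fixed-length record parsing in the style of the 328-character HELP record.
# Field offsets below are invented for this sketch; the real layout is Figure 4's.
RECORD_LENGTH = 328

# (name, start, length) -- 0-based offsets, illustration only
FIELDS = [
    ("cust_no", 0, 6),
    ("lib_no", 6, 2),
    ("pattern_no", 8, 5),
    ("print_color", 13, 5),   # "PRINT": spine-stamping color (gold/black/white)
    ("trim_1", 18, 2),        # bindery use: volume size group for printing
    ("spine", 20, 2),         # bindery use: type size allowed by spine width
]

def parse_record(rec: str) -> dict[str, str]:
    """Slice a fixed-length pattern record into named fields."""
    if len(rec) != RECORD_LENGTH:
        raise ValueError("HELP records are fixed at 328 characters")
    return {name: rec[start:start + size].strip() for name, start, size in FIELDS}

# Build a sample record: data fields, then blank fill out to 328 characters.
rec = ("123456" + "01" + "00007" + "GOLD " + "A1" + "B2").ljust(RECORD_LENGTH)
fields = parse_record(rec)
```

Fixed positions are what let an RPG II program on the System/3 describe each field once in its input specifications and process every record identically.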
Fig. 5. Pattern printing setup.

One additional form used in the system is for Heckman's internal operations: a data input form known as the "pattern printing setup" (Figure 5). This form is used by the bindery's input clerks to prepare new binding patterns for conversion to machine-readable form. The data prescribed by the form are much like those required by the binding pattern index card, except that data tags are shown for keypunching purposes. The system operates on an IBM System/3 computer with two 5445 disk drives and a 1403 N1 printer. The disk drives provide a total of 40,000,000 characters of on-line storage in addition to the 7,500,000 usable characters provided by the System/3 itself. Five 5496 Data Recorders are used for data conversion. The programs are written in RPG II. The development of computer-oriented commercial services for libraries suggests that, perhaps, if librarians wait long enough, they will not have to automate their libraries, as commercial ventures will do it for them.
The rapid appearance of systems-analysis firms, commercial and societal abstracting and indexing services, management and planning consulting groups, and data processing service bureaus tends to bear this theory out. At the very least, libraries will not be able to automate internally without providing for the incorporation of such ready services into their systems. When a service such as HELP is made available at no additional charge, there is no way for libraries to avoid automation. Donald P. Hammer

Donald P. Hammer is Associate Director for Library and Information Systems, University of Massachusetts Library, Amherst. At the time the system described in this article was developed, Mr. Hammer was the head of Libraries Systems Development at Purdue University.

Journal of Library Automation Vol. 5/2, June 1972

BOOK REVIEWS

Book Catalogs. By Maurice F. Tauber and Hilda Feinberg. Metuchen, N.J.: Scarecrow Press, 1971. 572 p. $15.00

In 1963 Kingery & Tauber published a collection entitled Book Catalogs. This is a much larger follow-up, containing twenty papers published between 1964 and 1970 and eight previously unpublished pieces. Not surprisingly, nearly all of them are concerned with computer-produced book catalogs in academic, special, county, public, and school libraries. Although nearly all of the previously published papers appeared in well-known journals, it is useful to have them collected together; the older ones are now of mainly historical interest, but, taken as a whole, they form a valuable record of trial and error, and also of progress. It would be unfair to single out any of the published articles for special praise or blame. In a rapidly changing field, even the good is soon improved upon. It is the examples, the costings, and above all the mistakes that are so helpful.
There is no excuse now for running into problems that have in the past led to the total scrapping of some computer systems: unforeseen filing difficulties, insufficient computer storage, bad economic estimating, and inability to produce an acceptable product. One major problem is still unsolved and indeed has not really been tackled systematically: the pattern of output (main sequence and supplements) that provides maximum usability at minimal cost, a problem surely amenable to OR techniques. As a reviewer from the United Kingdom, I would like to have seen a little more on relevant events there than is provided by Frederick G. Kilgour's general review: the smaller budgets of British libraries have generally enforced much more careful planning and, although there may be fewer successes, there are also very few failures. The introduction and the three final pieces, all specially written, are of great value, particularly Hilda Feinberg's "Sample Book Catalogs and Their Characteristics" (some samples are unbelievably horrible). For good measure there is a bibliography, a (computer-produced) index, and the listing of "Book Form Catalogs" reprinted from LRTS. I would hazard a guess that it is with COM that the future lies for many libraries. The next collection of papers, for which I hope we shall not have to wait eight years, must surely be entitled "Book and Microform Catalogs." Maurice B. Line

An Introduction to PL/1 Programming for Library and Information Science. Library and Information Science Series. By Thomas H. Mott, Jr., Susan Artandi, and Leny Struminger. New York: Academic Press, 1972. 231 p.

The importance of this text rests in the authors' assumption that the acquisition of programming skills by the library student is an essential component of his education in the fields of library automation and information retrieval.
Such skills should enable the student to examine critically the relevance of automated information handling for the library, to experiment with some basic methods of manipulating machine-readable textual material, and "to acquire an understanding of the role of the programmer in the development of ... information handling techniques." The selection of a programming language for this text deserves some comment. PL/1 has been recognized as a particularly suitable language for the processing of textual material and data base management applications. Its extensive and powerful repertoire of bit, character, string, array, record, and file manipulation capabilities argues strongly in favor of its adoption for library and other information handling applications. Students should be encouraged by the selection of PL/1 for this text, for it offers the novice great flexibility and ease in constructing and manipulating even the most complex types of information structures. This title constitutes the first published attempt to tailor an introductory programming text to the needs of the library student. As such, it possesses several characteristics which distinguish it from other basic programming books, including other PL/1 texts. The language features receiving the greatest share of attention in the present title are the set of built-in functions in PL/1 designed to facilitate the manipulation of strings of both binary and character data. Discussion of four of these functions (BOOL, UNSPEC, VERIFY, and TRANSLATE) is usually omitted from general introductory PL/1 textbooks. Although the discussions of the BOOL and UNSPEC functions are reasonably complete, the explanations of VERIFY and TRANSLATE fail to indicate the scope of their applications. For example, the utility of the VERIFY function as an index function for ranges is completely ignored.
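For readers without a PL/1 compiler at hand, the behavior of these two built-ins can be sketched in modern terms. The Python below is an analogue, not PL/1: VERIFY returns the 1-based position of the first character of a string that is not in a test set (0 if every character verifies, which is what makes it usable as an index function for ranges), and TRANSLATE maps characters position-for-position between two strings.

```python
def verify(s: str, charset: str) -> int:
    """Analogue of PL/1 VERIFY: 1-based position of the first character
    of s that is NOT in charset; 0 if all characters verify."""
    for i, ch in enumerate(s, start=1):
        if ch not in charset:
            return i
    return 0

def translate(s: str, to_chars: str, from_chars: str) -> str:
    """Analogue of PL/1 TRANSLATE(s, to, from): each character of s found
    in from_chars is replaced by the character at the same position of to_chars."""
    return s.translate(str.maketrans(from_chars, to_chars))

# VERIFY as a range check: position of the first non-digit.
assert verify("1972a", "0123456789") == 5
assert verify("1972", "0123456789") == 0

# TRANSLATE as a character mapping (e.g., one step of a code conversion).
assert translate("abc", "ABC", "abc") == "ABC"
```

A full ASCII-to-EBCDIC conversion, of the kind the review mentions, would simply be a TRANSLATE call with 256-character mapping strings.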
A more illuminating example of the power of the TRANSLATE function could have explored its usefulness in converting ASCII characters to the corresponding characters of the EBCDIC set. This might have clarified the section entitled "Internal Representation of PL/1 Characters," which contains an equivalence table for the PL/1 character set in ASCII and EBCDIC without indicating its purpose. Additional use of this example could have been made in the presentation of the MARC material, where the practical value of such a function could be stressed. Another desirable feature of this text for the instructor and the library student is the inclusion of sample problems and exercises which, since they refer exclusively to text processing, library automation, and information retrieval, should be readily understandable. Unfortunately, the present volume omits any mention of the PICTURE attribute and its uses. As a powerful device facilitating the interchange of data between numeric and character variables and the uncomplicated editing of numeric fields prior to output time, its inclusion would have proved valuable to the text-handling programmer. However, it should be emphasized that this appears to be the single instance in the text in which a generally acknowledged basic language feature has been entirely excluded. It seems to me that too much of the text (15-25 percent) is devoted to developing some of the elementary concepts of boolean algebra and constructing a theoretical model of document retrieval based on these concepts. One possible explanation for this emphasis is the fact that the material for the book was drawn from a graduate seminar in programming theory for information handling. Although these chapters are informative and the exposition of ideas is straightforward, they should have been omitted.
The space which they occupy could have been used more successfully to explore those PL/1 features essential for information handling but excluded or treated too briefly in the present volume. A list of such topics would include: an expanded discussion of program interrupts and the ON condition, a description of PL/1 record formats emphasizing the variable-length record, and a guide to the use of the varying structure method of writing variable-length records. The deficiencies of this text are its overemphasis of information retrieval theory and applications, and its failure to stress those features of PL/1 which would enable the student to appreciate the file-handling capabilities of the language. However, for many instructors the availability of programming examples which should be easily grasped by the library student may strongly outweigh these disadvantages. Howard S. Harris

Guidelines for Library Automation; A Handbook for Federal and Other Libraries. By Barbara Evans Markuson, Judith Wagner, Sharon Schatz, and Donald Black. Santa Monica, Calif.: System Development Corporation, 1972. 401 p. $12.50

This handbook is the result of a 1970 study on the status of federal library automation projects which was conducted under the auspices of the Federal Library Committee's Task Force on Automation. The survey was carried out by the System Development Corporation and funded by the U.S. Office of Education. It is one of two reports generated from the study data, the other report being Automation and the Federal Library Community. The study consisted of a questionnaire survey of 2,104 federal libraries of which 964 responded. Of that number, 57 libraries had one or more functions automated and ten had one or more functions in various stages of development or planning.
The survey revealed that, among other activities, 27 cataloging systems (presumably "cataloging" means catalog card production), 25 serials systems, and 13 circulation systems were operational. The handbook purports to help the federal librarian answer the question: "Is it feasible to use automation for my library?" It attempts to do this by presenting step-by-step guidelines "from the initial feasibility survey through systems analysis and design to fully operational status." That material more or less follows a pattern of discussion on automation procedure followed by a checklist of the procedures in chart form. The areas covered include "feasibility guidelines" concerning such points as equipment, personnel, budget, and existing files; and "systems development guidelines" which include planning, analysis, design, implementation, and operation. The discussions include brief reviews of the various aspects of automation development, and statements describing the experiences of federal librarians as reported in the study. In this fashion, the reader is informed of the steps that should be considered with each aspect of automation development and, additionally, he is informed of what his colleagues have previously done about each phase and/or problem. Much of this material is too general and too brief to do more than call the reader's attention to the fact that certain requirements must be met in the successful development of an automation project. A large portion of the book is taken up with descriptions of automation projects in 59 federal libraries. This overview of the federal sector provides limited descriptive information about each library and reviews the various applications in terms of system descriptions, equipment, programs, future plans, documentation, etc. The reviews are not consistent in that not all of the above points are included in every review. This, however, is the result of the data submitted to the survey by the respondents.
Approaches have been provided to this survey material by automated application, form of publication, type of equipment used, and by the special features of each system. Surprisingly, there is no approach by name of library. At least one very important library is not represented, i.e., Livermore, but for some reason a similar library, Los Alamos, is included. The final section of the book is a potpourri of information about non-federal automation activities and is the weakest section of the volume. It includes a list of "automated libraries" that was published before and is very incomplete and poorly defined. Additionally, it briefly discusses data bases, commercial ventures, and for no apparent reason suddenly includes 22 pages of information on microforms in libraries. It just as suddenly reverts back to automation and proceeds to provide 23 pages of data on input/output hardware in libraries. The final section is a selected bibliography that seems almost as aimless as the section before it. The items included "have been selected on the basis of their particular interest and applicability to federal libraries," it is stated. They range over the whole spectrum of library automation, and some items have nothing to do with automation at all. There is no index to the book as a whole, and a fair number of errors are present. In summary, the book includes a limited amount of rather old information, most of which is available in other places in far greater detail. It appears that SDC had some rather weak survey data that seemed like it should be used! As a book of "guidelines" it does succeed in providing information in uncluttered and simplified form, but it is a very disappointing publication that leaves much to be desired both in substance and in organization. Donald P.
Hammer

Canadian MARC; A Report of the Activities of the MARC Task Group Resulting in a Recommended Canadian MARC Format for Monographs and a Canadian MARC Format for Serials. Recommended to the National Librarian, Dr. Guy Sylvestre. Ottawa: The National Library of Canada, 1972.

Canada's approach to the realization of a proposed format for machine-readable cataloging data was influenced by several factors. First and foremost was the fact that Canada is bi-lingual, dictating the requirement for the possible representation of data in both French and English. In addition, the National Library of Canada wanted to continue its interaction with the Library of Congress and also to coordinate the development of a Canadian MARC with international developments. The formats recommended are for the communication of machine-readable cataloging data. The processing of the data by local libraries was not ignored. It was recognized that this could involve (1) expansion of the format to accommodate processing data (e.g., for acquisitions, serial control); and (2) the development of data-format-independent software for effective data storage and retrieval (e.g., a data management system with logical and physical characteristics of data described independently of specific applications software). The MARC Task Group was established as a result of the recommendations of the Conference on Cataloguing Standards held at the National Library of Canada in May 1970. The mission of the Task Group was to study the requirements for a format for machine-readable bibliographic records to be used in Canada. The group was not to concern itself with cataloging standards as such, since these were to be considered by the Task Group on Cataloguing Standards. The MARC Task Group limited its attention to monographs and serials because this was the greatest need at the time.
It was felt that after development of these two basic formats, i.e., monographs and serials, other formats for films, manuscripts, maps, etc., could be more logically developed. Recognizing that Canada has two official languages and that this creates specific bibliographic needs, the Task Group's first recommendation was that the National Library of Canada assume the responsibility for developing a distinctive Canadian MARC format. Variations from the Library of Congress format are to be kept to a minimum, due to:

• Economic considerations.
• Dedication of Canadian library communication (in common with the Library of Congress) to the full application of the AACR, American edition, and the "version française."
• Willingness of Canada to continue heavy reliance upon the Library of Congress for answering its bibliographical needs in both the traditional way and in machine-readable form.
• Readiness of Canada to accept future bibliographic developments and amendments proposed by the Library of Congress, e.g., new filing rules.

It is further recommended that:

• The development of a separate Canadian MARC be coordinated with international developments such as ISBD (International Standard Bibliographic Description) and ISDS (International Serials Data System).
• The National Library of Canada adopt PRECIS (Preserved Context Index System), developed for BNB, for the purpose of adding subject data to MARC records for Canadian publications in the form of descriptors.
• Any new data elements and varying levels of completeness of data introduced into the format in the future (for other media, specialized collections, or retrospective conversions) not conflict with the basic specifications recommended for Canadian MARC.

Several studies were made by the Task Group.
One addressed the need for MARC formats and the user requirements for such formats, keeping in mind the need for bi-lingual content in the perspective of an international MARC as to data for author, title, collation and notes, geographic names, and subject. Format requirements were based on a comparison of the United States and United Kingdom formats and the examination of Italian and other national MARC formats. An intensive study was made of the proposed Library of Congress format for serials. The implications and requirements for a MARC format to be used in conjunction with information retrieval and indexing systems were also examined. The best formats were then defined and recommended to the National Librarian. The format recommended for monographs may be summarized as follows:

1. The tags are mainly from the Library of Congress MARC-II, with adoptions from BNB and MONOCLE. Particular attention was paid to avoiding conflict with any of the national formats. The Library of Congress 900 tags were expanded to provide Canadian libraries the option of selecting data in bi-lingual content, i.e., the data for the secondary entry fields could be represented in either the French or English equivalent.
2. The indicators specified in the Library of Congress format have been retained. Some additional ones from BNB and MONOCLE have been added.
3. The subfield codes of the Library of Congress format have been used most often, with additional ones from BNB. There is no basic conflict with the Library of Congress MARC. Canadian MARC is more specific, and the more precise specifications are hospitable to the Library of Congress format. It was felt that the subfielding for filing values or relationships found in MONOCLE could be met by software.
4. Descriptive and bibliographic content are not altered in any way, since they are dealt with by cataloging codes.
However, for codified content (e.g., codes for language, geographic area, bibliographic area, intellectual level), use of standard international codes is recommended. Meanwhile, Library of Congress MARC-II codes will be used for some fields, e.g., languages and geographic area. For serials, it was the intention of the Task Group to maintain compatibility with the Canadian MARC format for monographs. However, it was necessary to study the proposed formats for serials issued by the Library of Congress; MASS, a MARC-based automated serials system proposed in the United Kingdom by the Birmingham Libraries Co-operative Mechanisation Project; and the French MONOCLE. The proposed Canadian MARC format for serials has been based on the recommendation for the processing of serials issued by the Task Group on Cataloguing Standards. Data elements were isolated to meet special applications such as:

1. The preparation of union lists for serial holdings with minimal bibliographic data (e.g., by broad subject groupings, by form division).
2. The bibliographic description of Canadian serials for a national bibliography.
3. The development of local library in-house systems for acquisition, processing, and control of serials.
4. The preparation of a Canadian serials directory incorporating a minimum of data and with a constant update facility.

This diversity of requirements led the Task Group to state several beliefs. First, the isolation of data elements for local library in-house systems and the compatibility of these data elements to allow for the exchange of computer programs can best be done by allocating a tag structure in a format separate from the main serials communication format. Second, there is a requirement for the relating of entries in the serial and monograph formats (e.g., monographs in series, which may appear in either format).
If an exchange of data between the two formats is necessary, there may be a need for an additional tag or a more extensive tagging structure for titles and series title entries. The specific recommendations for serials were that the National Library should:

1. Participate in the UNESCO proposals for an international serials data system, in which the isolation of data elements for international exchange will have a direct bearing on the elements in a Canadian MARC serials format.
2. Immediately initiate any action deemed advisable within the international proposals to provide standard serial numbers for Canadian serial publications.
3. Consider the preparation of a Canadian serials directory as a separate project.
4. Initiate a pilot project with other libraries to test the proposed Canadian serials format prior to full implementation.
5. On the basis of the above recommendations, explicitly state which data elements are necessary. (The proposed format for serials has those elements asterisked that the Task Group believed were not necessary. These are all processing-control oriented, e.g., frequency control, publication patterns, and indexing and abstracting coverage.)

The report includes three comparative tables to be used in evaluating the proposed Canadian MARC formats. Table 1 compares, for monographs, the Library of Congress, United Kingdom, French (MONOCLE), and Italian formats against the format proposed for Canada. Table 2 compares the proposed Library of Congress format and the MASS format for serials against the format proposed for Canada. Table 3 compares the Canadian format for monographs against the Canadian format for serials. Copies of Table 1 were submitted to the United States, the United Kingdom, France, and Italy for review and comments. The resulting revisions were not incorporated in the report since this would have delayed publication.
The tagging structure, therefore, may be slightly revised when the Canadian MARC User's Manual is finalized. However, those interested in the compatibility of the Canadian formats with the Library of Congress formats, and in the implications of the Canadian formats for an international MARC format, will find the tables sufficient. Lillian H. Washington

MONOCLE: Projet de Mise en Ordinateur d'une Notice Catalographique de Livre. Publications de la Bibliothèque Universitaire de Grenoble, 4. Par Marc Chauveinc. 2ème éd. Grenoble: Bibliothèque Interuniversitaire, 1972. 197 p. plus 25 annexes and errata.

A review of the 1st edition of MONOCLE appeared in JOLA in March 1971 (v. 4, no. 1, pp. 57-58). Readers are referred to that review and to the article by M. Chauveinc in the September 1971 issue of JOLA (v. 4, no. 3) for a description of the structure of MONOCLE. The format has undergone little change in essentials, but many changes in detail have been made. New fields have been added (249: Abridged title of periodical; 270: Printer's imprint; 545: Note showing title of periodical analyzed), subfield codes have been changed or added, new indicators have been created (see below), and the names (and therefore the contents) of some fields have been changed (cf. 241 and 242). The Leader has been enlarged from 19 to 24 bytes to show more exactly the address of the index related to a particular bibliographic record (4 new bytes), the current number of fields in the record (2 new bytes), and the current length of the record (2 new bytes), as well as the initial number of fields and the initial length. The length of the index is no longer given. Thus the Leader makes use of 8 new bytes and has discontinued 2 (only 18 of the original 19 were utilized).
What has remained unchanged is the emphasis on coding for filing arrangement and on the use of tags to identify not only the nature of a field but its different functions and its relationship with other data. There is increased emphasis, however, on the importance of the integration and collaboration of several libraries in automation activities and, therefore, on the need for MONOCLE to be generalized so that it is usable by institutions with other goals, hardware, and processing languages than the University of Grenoble. Mention is made throughout the volume of the variant approach of the Bibliothèque Nationale, which uses MONOCLE to prepare the Bibliographie de la France. One change in the second edition is the increased awareness of the complexities involved in dealing with subrecords. The use of the subrecord technique has therefore been limited to works meeting certain requirements. The requirements are so strict that, for all practical purposes, Grenoble does not use subrecords. Instead, it uses secondary entries, or series headings, or contents notes. An important change has been made in the first indicator position of personal name fields (100, 400, 600, 700, 800, 900) which, in the 1st edition, was similar to MARC. A new indicator structure has been created to facilitate construction of sort keys. A first indicator of '0' is used for forenames of saints, popes, and emperors. A '1' indicates a name that is to be filed exactly as given, whether it is a forename, simple surname, or multiple surname. A '2' is used for multiple surnames containing a hyphen that is to be replaced by a blank, e.g., Saint-Exupéry. A '3' is used when a name contains a blank, apostrophe, or hyphen that is to be deleted, e.g., La Fontaine. A '4' is used for complex names, whether simple or multiple, in which it is necessary to keep some blanks and/or letters and to delete others.
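The effect of these indicator values on sort-key construction can be sketched as follows. This is a hedged modern illustration of the rules just described, not MONOCLE's actual code; only the simple indicators '1' through '3' are modeled.

```python
def sort_key(name: str, indicator: str) -> str:
    """Build a filing key from a personal-name field, following the
    MONOCLE first-indicator rules described above (sketch only)."""
    key = name.upper()
    if indicator == "1":   # file exactly as given
        return key
    if indicator == "2":   # hyphen replaced by a blank, e.g., Saint-Exupery
        return key.replace("-", " ")
    if indicator == "3":   # blank, apostrophe, hyphen deleted, e.g., La Fontaine
        return key.replace(" ", "").replace("'", "").replace("-", "")
    # Indicators '0' and '4' require the three-vertical-bar markup
    # (print/sort text distinctions) and are not modeled in this sketch.
    raise ValueError("indicator not handled in this sketch")
```

The payoff of the indicator scheme is that the filing program only needs to scan for the vertical-bar markup in fields flagged '0' or '4', rather than in every field.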
For this purpose, MONOCLE makes use of three vertical bars to distinguish text to be printed and used for sorting from text to be printed only from text (supplied) to be used only for sorting. Since the three bars are used only in fields with a 1st indicator of '0' or '4', the use of these indicators enables the program to test for them only when these indicators are present instead of in every field. The 1st indicator of '4' is used for complex arrangements utilizing the three bars in other fields as well: 110, 111, 241, 243, 245, 410, 411, 441, 443, 445, and the equivalent 6xx, 7xx, 8xx, and 9xx fields. The errors in this volume are minor. MONOCLE still lists field 653 (Proper names incapable of authorship) as an LC subject field, although this field was discontinued almost as soon as it was created, so that it does not even appear in the 1st edition (1969) of the MARC Manuals. In a discussion of the use of terminals to catalog books, it footnotes 'the library' of 'Ohio College' rather than 'the libraries' affiliated with the Ohio College Library Center. The review of the 1st edition pointed out that one of the values of MONOCLE for American librarians was the light it threw on MARC. That statement still holds true. To facilitate such use, an English language translation might be of value. Judith Hopkins
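The filing indicators described in this review lend themselves to a mechanical sketch. The Python fragment below is an illustration only, not the actual MONOCLE code; it shows how indicators '1' through '3' might drive sort-key construction, while the '0' and '4' cases, which depend on the three-vertical-bar convention, are left out.

```python
def sort_key(name: str, indicator: str) -> str:
    """Build a filing key from a personal name, following the
    MONOCLE first-indicator conventions sketched in the review.
    Illustrative only; the real system handles indicators '0'
    and '4' via supplied sorting text between vertical bars."""
    if indicator == "1":          # file exactly as given
        return name.upper()
    if indicator == "2":          # hyphen replaced by a blank
        return name.replace("-", " ").upper()
    if indicator == "3":          # blank, apostrophe, hyphen deleted
        for ch in (" ", "'", "-"):
            name = name.replace(ch, "")
        return name.upper()
    raise ValueError("indicators '0' and '4' require supplied sorting text")

print(sort_key("Saint-Exupery", "2"))   # SAINT EXUPERY
print(sort_key("La Fontaine", "3"))     # LAFONTAINE
```

The point of the indicator, as the review notes, is that the filing program need not inspect every field for special markup; the indicator tells it in advance which normalization to apply.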
THE SHARED CATALOGING SYSTEM OF THE OHIO COLLEGE LIBRARY CENTER

Frederick G. KILGOUR, Philip L. LONG, Alan L. LANDGRAF, and John A. WYCKOFF: Ohio College Library Center, Columbus, Ohio

Development and implementation of an off-line catalog card production system and an on-line shared cataloging system are described. In off-line production, the average cost per card for 529,893 catalog cards in finished form and alphabetized for filing was 6.57¢. An account is given of system design and equipment selection for the on-line system. File organization and programs are described, and the on-line cataloging system is discussed. The system is easy to use, efficient, reliable, and cost beneficial.

The Ohio College Library Center (OCLC) is a not-for-profit corporation chartered by the State of Ohio on 6 July 1967. Ohio colleges and universities may become members of the center; forty-nine institutions are participating in 1971/72. The center may also work with other regional centers that may "become a part of any national electronic network for bibliographic communication."
The objectives of OCLC are to increase the availability to individual students and faculty of resources in Ohio's academic libraries, and at the same time to decrease the rate of rise of library costs per student. The OCLC system complies with national and international standards and has been designed to operate as a node in a future national network as well as to attain the more immediate target of providing computer support to Ohio academic libraries. The system is based on a central computer with a large, random access, secondary memory, and cathode ray tube terminals which are connected to the central computer by a network of telephone circuits. The large secondary memory contains a file of bibliographic records and indexes to the bibliographic record file. Access to this central file from the remote terminals located in member libraries requires fewer than five seconds.

158 Journal of Library Automation Vol. 5/3 September, 1972

OCLC will eventually have five on-line subsystems: 1) shared cataloging; 2) serials control; 3) technical processing; 4) remote catalog access and circulation control; and 5) access by subject and title. This paper concentrates on cataloging; the other subsystems are not operational at the present time. Figure 1 presents the general file design of the system. The shared cataloging system has been the first on-line subsystem to be activated, and the files and indexes it employs are depicted in Figure 1 by the heavy black lines and arrows. As can be seen in the figure, much of the system required for shared cataloging is common with the other four subsystems. The three main goals of shared cataloging are: 1) catalog cards printed to meet varying requirements of members; 2) an on-line union catalog; and 3) a communications system for requesting interlibrary loans. In addition, the bibliographic and location information in the system can be used for other purposes such as book selection and purchasing.
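The file design just described, with several indexes pointing into a single bibliographic record file, can be sketched in miniature. The Python below is a toy model using assumed names and in-memory offsets standing in for disk addresses; it is not OCLC code, merely the shape of the lookup path from a search key to a full record.

```python
# Toy model of the Figure 1 file design: several indexes map search
# keys to the position of a record in one bibliographic record file.
# Names, sample records, and structures are assumptions for illustration.

bib_file = [
    {"lccn": "70-123456", "title": "International economics", "author": "Kreinin"},
    {"lccn": "76-159031", "title": "Intersectoral capital flows", "author": "Lee"},
]

# Each index maps a key to the record's position in bib_file,
# the way the LC card number, author, and title indexes point
# into the central bibliographic record file.
lccn_index = {rec["lccn"]: i for i, rec in enumerate(bib_file)}
title_index = {rec["title"]: i for i, rec in enumerate(bib_file)}

def lookup(index: dict, key: str) -> dict:
    """Resolve a key through an index to the full bibliographic record."""
    return bib_file[index[key]]

print(lookup(lccn_index, "76-159031")["author"])   # Lee
```

Because every index resolves to the same record file, adding a new access point (a call number index, say) costs one more small index rather than a second copy of the bibliographic data.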
The only description of an on-line cataloging system that had appeared in the literature during the development of the OCLC system is that of the Shawnee Mission (Kansas) Public Schools (1). The Shawnee Mission cataloging system produces uniform cards from a fixed-length, non-MARC record. The OCLC system uses a variable-length MARC record and has great flexibility for production of cards in various formats. There are a number of reports describing off-line catalog card production systems, including systems at the Georgia Institute of Technology (2), the New England Library Information Network (NELINET) (3), and the University of Chicago (4). The flexibility of the OCLC system distinguishes it from these three systems as well.

[Fig. 1. General File Design; Shared Cataloging Subsystem in Heavy Lines. Indexes (subject class, LC card number, call number, author, name-and-title, title) point into the Bibliographic Record File, which is linked to the Holding Library, Multiple and Partial Holdings File; the Date File; the Name and Address File; the Technical Processing System; and note, dash entry, and extra added entry files.]

CATALOG CARD PRODUCTION-OFF-LINE

An off-line catalog card production system based on a file of MARC II records was activated a year before the on-line system (5). OCLC supplied member libraries with request cards (punch cards prepunched with symbols for each holding library within an institution). For each title for which catalog cards were needed, members transcribed Library of Congress (LC) card numbers onto a request card. Members sent batches of cards to OCLC at least once a week. At OCLC, the LC card numbers were keypunched into the cards and new requests were combined with unfilled requests to be searched against the MARC II file. By the spring of 1971, over 70 percent of titles requested were found the first time they were searched.
The selected MARC II records were then submitted to a formatting program that produced print images on magnetic tape for all cards required by a member library. The number of cards to be printed was determined by the number of tracings on the catalog record and the number of catalogs into which cards were to go, including a regional union catalog (the Cleveland Regional Union Catalog) and the National Union Catalog. Individual cards were formatted according to options originally selected by the member library. These options included: 1) presence or absence of tracings and holdings information on each of nine different types of cards; 2) three different indentions for added entries and subject headings; 3) a choice of upper-case or upper- and lower-case characters for each type of added entry and subject heading; and 4) many formats for call numbers. OCLC returned cards to members in finished form, alphabetized within packs for filing in specific local catalogs. The primary objective of off-line operation was the production of catalog cards at a lower cost than manual methods in OCLC member libraries. Early activation of off-line catalog card production did reduce costs and gave some members an opportunity to take advantage of normal staff turnover by not filling vacated positions in anticipation of further savings after activation of the on-line system. Other objectives of off-line operation were the automated simulation of on-line activity in member libraries and the development and implementation of catalog card production in preparation for card production in an on-line operation. The number of catalog card variations required by members, even after members had reviewed and accepted detailed designs of card products, proved to be higher than anticipated. More than one man-year was expended after activation of the off-line system in further development and implementation to take care of the format and card dissemination variations requested by specific libraries.
The one-year advance start on catalog production made possible by using MARC II records in the off-line mode proved to be a far greater blessing than anticipated, for it would have been literally impossible to have activated on-line operation and catalog card production simultaneously. A major goal of OCLC card production is elimination of the uniformity required by standardized procedures. The OCLC goal is to facilitate cooperative cataloging without imposing on the cooperators. The cost to attain this goal is slight, for although there is a single expense to establish a decision point in a computer program, the cost of selection among three or thirty alternatives during program execution is infinitesimal. Design of catalog cards and format options began four months before off-line activities. Two general meetings of the OCLC membership were held at which card formats were reviewed and agreed upon in a general sense. Next, the OCLC staff published a description of catalog card production and procedures for participation (6). This publication was reviewed by the membership and format variations were reported for inclusion in the procedure. Members reported few variations at this time, but when implementation for individual members was undertaken, it was necessary to build many additional options into the computer programs. To assist the OCLC staff in defining options for off-line catalog products and on-line procedures, an Advisory Committee on Cataloging was established. This committee met several times and provided much needed guidance and counsel. The catalog card format options that members could select were extensive. For example, although the position of the call number was fixed in the upper left-hand corner of the card, there were 24 basic formats for LC call numbers, and libraries using the Dewey Decimal Classification could format their call numbers as they wished.
In general, the greatest number of format options are associated with call numbers, probably because there has never been a standard procedure for call number construction.

Programs

Because designing, writing, coding, and debugging of catalog card production programs can cost tens of thousands of dollars, OCLC sought existing card production programs that could run on computers at Ohio State University, which is the generous host of the Ohio College Library Center. Only two programs were located that could both produce cards in the manner required by OCLC and run on OSU computers. Card production costs were not available for one of the programs, but because analysis suggested that the design of the program would create very high card costs, this program was not selected. The other program had been written and used at the Yale University Library, and although its card production costs were high, it was known that changes could be made to increase efficiency. Thus, arrangements were made to obtain and run the Yale programs at OSU. Members were free to choose a variety of format options and submitted their specifications for each catalog on a Catalog Profile Questionnaire (Figure 2). Holdings information and tracings could be printed on any or all of nine types of cards: 1) shelf list; 2) main entry; 3) topical subject; 4) name as subject; 5) geographic subject; 6) personal and corporate added entries; 7) title added entry; 8) author-type series added entry; and 9) title-type series added entry. Subject headings and added entries could have top-of-card or bottom-of-card placement and could be printed in all upper-case or in upper- and lower-case characters. Any type of subject heading and added entry could begin at the left edge of the card or at the first, second, or third indention. Other options are described in the Manual for OCLC Catalog Card Production (5).
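The profile choices just enumerated (nine card types, heading placement, case, and indention) amount to a small per-catalog table of options. The Python sketch below shows one way such a profile might be represented; the field names and values are invented for illustration and do not reproduce the actual questionnaire or Pack Definition Table layout.

```python
# Hypothetical representation of one library's catalog profile,
# mirroring the questionnaire options described in the text.
# Field names and values are invented for illustration.

CARD_TYPES = (
    "shelf list", "main entry", "topical subject", "name as subject",
    "geographic subject", "personal and corporate added entries",
    "title added entry", "author-type series added entry",
    "title-type series added entry",
)

profile = {
    "tracings_on": {"shelf list", "main entry"},  # print tracings on these only
    "placement":   "top-of-card",                 # or "bottom-of-card"
    "case":        "upper-and-lower",             # or "upper"
    "indention":   2,                             # 0 = left edge, 1-3 = indentions
}

def wants_tracings(profile: dict, card_type: str) -> bool:
    """Does this catalog's profile call for tracings on this card type?"""
    assert card_type in CARD_TYPES
    return card_type in profile["tracings_on"]

print(wants_tracings(profile, "main entry"))       # True
print(wants_tracings(profile, "topical subject"))  # False
```

The attraction of this scheme, as the article argues, is that once the decision point exists in the program, adding a thirtieth option to the table costs almost nothing at execution time.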
The data received on Catalog Profile Questionnaires were transferred to punch cards, and a computer program written in SNOBOL IV embedded the information in the form of a Pack Definition Table (PDT) in one of the principal catalog production programs, named CONVERT (CNVT). Each PDT defined the cards to go into the catalogs of one holding library, a holding library being a collection with its own catalog. The first major program in the processing sequence was PREPROS, which was written in IBM 360 Basic Assembler Language (BAL) and run on an IBM 360/75. PREPROS converted records from the weekly MARC II tapes to an OCLC internal processing format, including conversion of MARC II characters from ASCII to EBCDIC code. This program also parsed LC call numbers and partially formatted them. It also checked for end-of-field and end-of-record characters and verified the length of each record. Finally, it wrote the output records in LC card number sequence into large variable-format blocks of 20,644 characters. The large blocks reduced computer costs, since the pricing algorithm employed on the IBM 360/75 imposed a charge for each physical read and write operation. The magnetic tape output weekly by PREPROS was then submitted to CNVT together with the old master file of bibliographic records in LC card number order and a file of request cards that had been sorted in LC card number order. CNVT merged the records on the weekly tape with the master file and then matched the requests by LC card number. When a match was obtained, CNVT deleted some fields from the bibliographic record and formatted the call number according to the specifications of the library that had originated the request. It then wrote the modified record and associated PDT's onto an output tape in external IBM 7094 binary-coded-decimal (BCD) character code, with the record format converted to that of the Yale Bibliographic System.
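CNVT's request matching is a classic sorted-file merge-match: because both the master file and the request file are in LC card number order, one sequential pass over each suffices. A schematic version in Python (the original was tape-based assembler; the data and simple string comparison here are illustrative only):

```python
# Schematic one-pass merge-match of sorted request numbers against a
# sorted master file, as CNVT did on tape. Illustrative only.

def merge_match(master, requests):
    """master: sorted list of (lccn, record) pairs.
    requests: sorted list of LC card numbers.
    Returns (matched records, unfilled requests)."""
    matched, unfilled = [], []
    i = 0
    for lccn in requests:
        # Advance through the master file; never back up, since
        # both inputs are in the same sort order.
        while i < len(master) and master[i][0] < lccn:
            i += 1
        if i < len(master) and master[i][0] == lccn:
            matched.append(master[i][1])
        else:
            unfilled.append(lccn)   # recycled into a later weekly run
    return matched, unfilled

master = [("68-1", "rec A"), ("68-5", "rec B"), ("70-2", "rec C")]
hits, misses = merge_match(master, ["68-1", "68-9", "70-2"])
print(hits)    # ['rec A', 'rec C']
print(misses)  # ['68-9']
```

The single forward pass is what made the weekly tape run economical: unfilled requests simply went back into the sorted request file for the next cycle, exactly as the article describes.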
The second principal product of CNVT was the new master tape of bibliographic records that would become the old master for the next week's run. CNVT also punched out a card bearing the LC card number for each request card for which there was a match. These punch cards were used to withdraw cards from the request card file so that they would not be submitted again. CNVT was first run on an IBM 360/50.

[Fig. 2. Catalog Profile Questionnaire. The form asks the member to define the pack of a receiving catalog; to name the holding library or collection whose cards the pack contains; to name the receiving catalog into which the pack will go; and, if that catalog is not in the holding library, to supply the stamp to appear above the call number.]

The tape file of modified records and PDT's was then submitted to EXPAND, a modified Yale program written in MAD and run on an IBM 7094. By combining the number of tracings and the PDT requirements, EXPAND developed a card image for each catalog card required by the requesting library. It also prepared a sort tag for each image so that the image could subsequently be sorted by library into packs and alphabetized within each pack. EXPAND essentially did the formatting of catalog cards, except for the complex LC call number formatting carried out by CNVT. The file of card images was passed to a program named Build Print Tape (BLDPT), written in BAL and run on the IBM 360/75.
BLDPT first converted the external IBM 7094 BCD characters to EBCDIC. Next BLDPT sorted the images, and finally it arranged the images on a single tape to allow printing on continuous, two-up catalog card forms: the first half of the sorted file was printed on the left-hand cards and the second half on the right. The PRINT program was also written in BAL but run on an IBM 360/50. It was designed so that either the entire file or a segment as small as four cards could be printed; the latter feature was of greatest use in reprinting cards that for one of several reasons were not satisfactorily printed during the first run. Cards were printed six lines to an inch, and the print train used was a modified version of the train designed by the University of Chicago, which in turn was a modified version of the IBM TN train. The printer attached to the IBM 360/50 was an IBM 1403 N1 printer. This printer appears to be superior to any other high-speed printer currently available, but to obtain a product of high quality it was necessary to fine-tune the printer, to use a mylar ribbon from which the ink does not flake off, and to experiment with various mechanical settings to determine the best settings for tension on the card forms and for forms thickness. Above all, patience in large amounts was required during initial weeks when it seemed as though a messy appearance would never be eliminated. OCLC off-line catalog card production programs were written in assembler language and higher level languages. Use of higher level languages for character manipulation incurs unnecessarily high costs. Therefore, for a large production system like OCLC, it is absolutely required that processing programs and subroutines that manipulate all characters, character by character, be written in an assembler language to obtain efficient programs that run at low cost.
Programs that do not manipulate characters, such as the OCLC program for embedding PDT's in CNVT, may well be written in a higher level language.

Materials and Equipment: A Summary

Off-line catalog production was based on the availability of MARC II records on magnetic tapes disseminated weekly by the Library of Congress. Without the MARC II tapes, the off-line procedure could not have operated. Each week, the new MARC II records were added to the previous cumulated master file, also on magnetic tape, and previously unfilled and new requests were run against the updated file. OSU computers employed were an IBM 360/75, an IBM 360/50, an IBM 7094, and an IBM 1620. The run procedure was complex and therefore somewhat inefficient, but this inefficiency was traded off against the predictably high expense of writing a new card formatting program. Members submitted a request for card production on a punch card on which the member had written an LC card number. Members could specify a recycling period of from one to thirty-six weeks for running their request cards against the MARC II file before unfulfilled requests would be returned. In general, request cards bore LC card numbers for that section of the MARC II file that was complete; at first, the file was inclusive for only "7" series numbers, but in early 1971 the RECON file for "69" numbers was added. Request cards often numbered several thousand a week. Catalog card forms are the now-familiar two-up, continuous forms with tractor holes along each side for mechanical driving. The card stock is Permalife, one of the longest-lived paper stocks available. A thin slit of about one thirty-second of an inch in height converts each three-inch vertical section of card stock to 75 mm. The lowest price paid in a lot of a half million cards has been $8.065 per thousand. After having been printed, the card forms are trimmed on a modified UARCO Forms Trimmer, model number 1721-1.
This trimmer makes four continuous cuts in the forms and produces cards with horizontal dimensions of 125 mm. Cards are stacked in their original order as printed and are therefore in filing order. The trimmer operates at quoted speeds of 115 and 170 feet per minute, or 920 and 1,360 cards per minute. Measurements of speeds of operations confirmed these ratings.

Results

The off-line catalog production system produced 529,893 catalog cards from July 1970 through August 1971 at an average cost of 6.57 cents per card. This cost includes over twenty separate cost elements plus a three-quarter cent charge for overhead. The firm of Haskins & Sells, Certified Public Accountants, reviewed the costing procedures that OCLC employs, found that all direct costs were being included, and recommended the three-quarter cent overhead charge. The number of extension cards varies from library to library, depending almost entirely on the types of cards on which libraries have elected to print tracings. However, one university library with a half-dozen department libraries and requiring tracings on only shelf list and main entry cards averages approximately six cards per title. Cataloging using the OCLC off-line system results in a decrease of staff requirements, and some libraries that used the system during most of the year found that they needed less staff in cataloging. Reduction of staff by taking advantage of normal staff turnover facilitated financial preparation for the OCLC on-line system in these libraries.

Evaluation

Despite the obvious inefficiencies generated by running production computer programs on four different computers in two different locations, and despite inefficiencies in the programs themselves, the computer cost to process MARC II tapes and to format catalog cards, but not to print them, was 2.27 cents per card.
As will be shown later, newer and more efficient programs have halved this cost, but even at 2.27 cents per card for formatting and .33 cents per card for printing, the cost of OCLC off-line card production is less than half the cost of more traditional card production methods (7). Two features originally designed into the system were never implemented, somewhat diminishing the usefulness of the system for some libraries. One of the unimplemented features was a technique for deleting, changing, or adding a field to a MARC record (this capability exists in the on-line system). Absence of this procedure meant that libraries had to accept LC cataloging without modification except to call numbers. The second missing feature was the ability to print multiple holding locations on cards (this capability also exists in the on-line system), although it was possible to print multiple holdings in one location. This deficiency limited the usefulness of the system for large libraries processing duplicates into two or more collections. Both of these features could have been activated, but shortage of available time prior to activation of the on-line system prevented their implementation. Figure 3 shows the high quality of the catalog cards produced. Subsequent to attainment of this level of quality, there have been no complaints from members except in cases where a piece of chaff from the card forms went through the printer and caused omission of characters. OCLC continues to vary the design of its continuous forms to achieve completely chaff-free stock. The shortest possible time in which cards could be received by the member library after submitting a request card was ten days, but it is doubtful that this response time was often achieved. The minimum average response time for the three-quarters of requests for which a MARC record was located on the first run was two weeks.
Delays at a computer center or incorrect submission of a run could extend this delay to three and four weeks, and unfortunately such delays were cumulative for subsequent requests until the "weekly" runs were made sufficiently more often than weekly to catch up. If another delay occurred during a catch-up period, the response time further degraded. During the fourteen months of operation, there were two serious delays. The amount of normal turnover that occurred in OCLC libraries during the fourteen months, and that was taken advantage of by not filling positions, was too small to reduce the financial burden incurred in starting up the on-line system. A few libraries demonstrated that it was possible to take advantage of such attrition. However, 20 percent of the libraries did not participate in the on-line system, and perhaps half of those who did participate were uncertain as to whether the on-line cataloging system would operate, or would operate at a saving. When the feasibility of on-line shared cataloging has been substantiated and other centers begin to implement similar systems, it should be possible to activate off-line catalog production sufficiently in advance of on-line implementation to enable participants to take adequate advantage of normal attrition to minimize, or nearly eliminate, additional expenditures. Experience such as that of OCLC will enable new centers to calculate the number of months of off-line production required to reduce salary expenditures by the amount needed to finance the on-line system.

SHARED CATALOGING-ON-LINE

The cataloging objectives of the on-line shared cataloging system are to supply a cataloger with cataloging information when and where the cataloger needs the information and to reduce the per-unit cost of cataloging.
Catalog products of the system are the same as those of the off-line system: catalog cards in final form, alphabetized for filing in specific catalogs. The on-line system is not limited to MARC II records but also allows cataloging input by member libraries. The shared cataloging system, which accommodates all cataloging done in modern European alphabets, builds a union catalog of holdings in OCLC member libraries as cataloging is done. One library, Wright State University, is converting its entire catalog to machine-readable form in the OCLC on-line catalog. The third major goal is a communications system for transacting interlibrary loans.

[Fig. 3. Computer-Produced Catalog Cards (reduced 25%). Sample finished cards showing call numbers, headings, imprint, collation, and notes.]
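The union catalog grows as a by-product of cataloging: when a member catalogs a title against an existing record, its holding symbol is attached to that record. A minimal Python sketch of that bookkeeping follows; the structures and library symbols are assumptions for illustration, not OCLC's actual file design.

```python
# Minimal sketch of union-catalog bookkeeping: cataloging a title
# registers the library's holding symbol on the shared record.
# Structures and symbols are invented for illustration.

union_catalog = {}   # LC card number -> set of holding-library symbols

def catalog(lccn: str, library_symbol: str) -> None:
    """Record that library_symbol has cataloged, and so holds, lccn."""
    union_catalog.setdefault(lccn, set()).add(library_symbol)

catalog("70-123456", "WSU")   # hypothetical symbol for one member
catalog("70-123456", "OSU")   # a second member cataloging the same title
print(sorted(union_catalog["70-123456"]))   # ['OSU', 'WSU']
```

The same location data, once accumulated, serves the interlibrary-loan goal mentioned above: a borrowing library can see at once which members hold a title.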
System Design and Equipment Selection

Figure 4 depicts the basic design of the computer and communication components for the comprehensive system comprised of the five subsystems described in the introduction. The machine system for shared cataloging was designed to be a subsystem of the total system so that subsequent modules could be added with minimal disruption. Similarly, the logical design of the shared cataloging subsystem was constructed so that the modules of shared cataloging would be common to the remaining file requirements, as shown in Figure 1. Design of the on-line shared cataloging system began with a redefinition of the catalog products of off-line catalog production (5). In this exercise, the Advisory Committee on Cataloging, comprised of members from seven libraries, contributed valuable assistance. The committee was also most helpful in designing the formats of displays to appear on terminal screens. Important decisions in the design of the computer, communications, and terminal systems were those involving mass storage devices and terminals. Random access storage was the only type feasible for achieving the objective of supplying a user with bibliographic information when and where he needed it. Hence, random access memory devices were selected for the comprehensive system and ipso facto for shared cataloging.

[Fig. 4. Computer and Communication System. Two CPUs share memory, data channels, drive control, and the catalog file; cross-connections allow either CPU to take over if the other malfunctions.]

The cathode ray tube (CRT) type of terminal was selected primarily because of its speed and ease of use by a cataloger. CRT terminals are far more flexible in operation than typewriter terminals from the viewpoint of both the user and the machine system designer.
For these reasons, CRT terminals can enhance the amount of work done by the system as a whole.

It was originally planned to select a computer without the assistance of computerized simulation, but in the course of time it became clear that it was impossible to cope with the interaction among the large number of variable computer characteristics without computerized simulation. Therefore, a contract was let to Comress, a firm well known for its work in computer simulation. Ten computer manufacturers made proposals to OCLC for equipment to operate the five subsystems at peak loading (an average of five requests per second over the period of an hour). All ten proposed computer systems failed because simulation revealed inefficiencies in their operating systems for OCLC requirements. OCLC and Comress staff then proposed a modification in operating systems, which the manufacturers accepted. The next series of trials revealed that more than half of the computers or secondary memory files would have to be utilized over 100 percent of the time to process the projected traffic. As a result of these findings, one computer manufacturer withdrew its proposal, and five others changed proposals by upgrading their systems. On the final simulation runs, the percent of simulated computer utilization ranged from 19.70 percent to 114.31 percent.

A subsequent investigation of predictable delays due to queuing in such a system showed that unacceptable delays could arise if computer utilization rose above 30 percent at peak traffic. Three manufacturers proposed computer systems that were under 30 percent utilization and, for these, a trade-off study was made that included such characteristics as cost, reliability, time to install the applications system, and simplicity of program design. The findings of the simulation and trade-off studies provided the basis of the decision to select a Xerox Data Systems Sigma 5 computer.
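The article does not state which queuing model was used in the delay investigation; as an illustration only, the standard M/M/1 approximation shows why response time grows rapidly once utilization passes a threshold (the 50 ms service time below is an assumption, not a figure from the article):

```python
def mm1_response_time(service_time_s: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: T = S / (1 - rho).

    As utilization rho approaches 1, response time grows without bound,
    which is why a utilization ceiling was imposed on candidate systems.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)

# Illustrative 50 ms service time per request (an assumption).
for rho in (0.20, 0.30, 0.60, 0.90):
    print(f"utilization {rho:.0%}: mean response "
          f"{mm1_response_time(0.05, rho) * 1000:.0f} ms")
```

Under this simplified model, delay roughly doubles between 30 percent and 60 percent utilization and grows tenfold by 90 percent, which is consistent with the article's conclusion that utilization above 30 percent at peak traffic could produce unacceptable delays.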
Major components of the OCLC Sigma 5 are the central processing unit (CPU); three banks of core memory with a total capacity of 48 thousand 32-bit words, or 192 thousand 8-bit bytes; a high-speed disk secondary memory; ten disk-pack spindles with a total capacity of 250,000,000 bytes, plus two spare spindles; two magnetic tape drives; two multiplexor channels; five communications controllers; a card reader; a card punch; and a printer. The character code is EBCDIC. Figure 5 illustrates the Sigma 5 configuration at OCLC. In this configuration, the burden of operating communications does not fall on the CPU, so that there is no requirement for "cycle stealing" that slows processing by a CPU.

The lease cost to OCLC of the equipment represented in Figure 5 is $16,317 monthly. The listed monthly lease of the equipment is $21,421, from which an educational discount of 10 percent is deducted. (The remaining difference is due to a rebate because the original order included secondary memory units that XDS was to obtain from another manufacturer who proved incapable of supplying units that fulfilled specifications. Hence, XDS was forced to supply other memory units having a higher list price but has done so at the cost per bit of the units originally ordered.)

The printer furnished with the Sigma 5 does not provide the high-quality printing required for library use. At the present time, OCLC prints catalog cards on an OSU IBM 1403-N1 printer that without doubt provides the highest quality printing currently available from a line printer. However, OCLC is designing an interface between a Sigma 5 and an IBM 1403 printer; XDS is also developing a new type of printer that will provide high-quality output. When the Sigma 5 can produce quality printing, it will be fully qualified to be used for nodes in national networks.

[Fig. 5. XDS Sigma 5 Configuration, showing the three memory banks, Sigma 5 CPU, multiplexors, operator's console, card reader and punch, magnetic tape units, and data base disk banks.]

As has already been stated, the CRT-type terminal was selected because of its ease of use. Moreover, the simulation study confirmed that CRT terminals would place far less burden on the central computer and therefore, for the OCLC system, would make possible selection of a less expensive computer than would be required to drive typewriter terminals. Although typewriter terminals cost less, the total cost could be higher for a system employing typewriter terminals than for one using CRT's because of greater central computer expense.

Library requirements for a CRT terminal are: 1) that the terminals have the capability of displaying upper- and lower-case characters and diacritical marks; 2) that the image on the screen be highly legible and visible; 3) that the terminal possess a large repertoire of editing capabilities; and 4) that interaction with the central computer and files be simple and swift. System requirements were: 1) that the terminal accept and generate ASCII code; 2) that it make minimal demands for message transmissions from and to the central site; 3) that it have the capability of operating with at least a score of other terminals on the same dedicated line; and 4) that its cost, including service at remote sites, be about $150 per month.

Data were collected on CRT's produced by fifteen manufacturers, and three machines were identified as being prime candidates for selection. OCLC carried out a trade-off study in which thirty-three characteristics were assessed for these three machines. One of the thirty-three (reliability) could not be judged for any of the three because none had yet reached the market.
For the remaining characteristics, the Irascope LTE excelled or equaled the other two terminals for twenty-eight characteristics, including all nineteen characteristics of importance to the OCLC user. Moreover, the Irascope was outstandingly superior in its ability to perform continuous insertion of characters, line wrap-around during insertion of characters, repositioning of characters so that each line ends in a complete word, and full use of its memory. However, the Irascope was the most expensive: $175 a month, as compared with $153 and $166. Nevertheless, the Irascope was selected because of its obvious superiority. Pilot operation by library staffs has not produced complaints concerning visibility or operability; complaints during pilot operation have sprung from failures caused by a variety of bugs in telephone systems and a couple of bugs in the terminals that were subsequently exterminated.

The number of terminals needed by a member library for shared cataloging was calculated on the assumption that six titles could be processed per terminal-hour. It was also assumed that a library might have only one staff member to use the terminal throughout the year. It was further assumed that as much as three months of the terminal operator's time would be lost to vacations, sick leave, and breaks. At the rate of six titles per terminal-hour and with 2,000 working hours in a year, 12,000 titles would be processed annually assuming full-time use. Since only nine months was assumed to be available, it was estimated that 9,000 titles would be processed at each terminal.
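The terminal-capacity arithmetic above can be sketched as follows (function and variable names are illustrative, not from the article):

```python
import math

TITLES_PER_TERMINAL_HOUR = 6
WORKING_HOURS_PER_YEAR = 2000
MONTHS_AVAILABLE = 9  # three months lost to vacations, sick leave, and breaks

def annual_titles_per_terminal() -> int:
    """Titles one operator can process at one terminal per year."""
    full_time = TITLES_PER_TERMINAL_HOUR * WORKING_HOURS_PER_YEAR  # 12,000
    return full_time * MONTHS_AVAILABLE // 12                      # 9,000

def terminals_needed(titles_cataloged_per_year: int) -> int:
    """One terminal for every 9,000 titles or fraction thereof."""
    return math.ceil(titles_cataloged_per_year / annual_titles_per_terminal())

print(annual_titles_per_terminal())  # 9000
print(terminals_needed(20000))       # 3
```

The ceiling division captures the "or fraction thereof" clause: a library cataloging 20,000 titles annually would need three terminals under these assumptions.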
In large libraries where there would be more than one staff member to operate a terminal, there would be three months of time available to do input cataloging, and since only a few libraries will be obtaining less than 75 percent of cataloging from the central system, it appears that a formula of one terminal for every 9,000 titles or fraction thereof cataloged annually would give each library sufficient terminal-hours. In actual operation, operators have been able to work at twice the assumed rate of six titles per terminal-hour, so that there is reason to believe that these guidelines will provide adequate terminal capability.

File Organization

The primary data that will enter the total system are bibliographic records, and since the system is being designed to conform to standards, the National Standard for Bibliographic Interchange on Magnetic Tape (8) has been complied with in file design. In other words, the system can produce MARC records from records in the OCLC file format; more specifically, the system can regenerate MARC II records from OCLC records derived originally from MARC II records, although an OCLC record contains only 78 percent of the number of characters in the original MARC II record. Similarly, the system can generate MARC II records from original cataloging input by member libraries.

The simulation study clearly showed that bibliographic data would have to be accessed in the shortest possible time if the system were to avoid generating frustrating delays at the terminal. Imitation of library manual files or of standard computer techniques for file searching would not provide sufficient efficiency. OCLC, therefore, set about developing a file organization and an access method that would take advantage of the computation speeds of computers.
OCLC research on access methods has produced several reports (9,10,11) and has developed a technique for deriving truncated search keys that is efficient for retrieval of single entries from large files. These findings have been employed in the present system, which contained over 600,000 catalog records in April 1973, arranged in a sequential file on disks and indexed by a Library of Congress card-number index, an author-title index, and a title index. The research program on access methods did not, however, investigate methods for storing and retrieving records.

Research on file organization included experiments directed toward development of a file organization that would minimize processing time for retrieval of entries or for the discovery that an entry is not in the file. Since the OCLC system is designed for on-line entry of data into the data base, it was not possible to consider a physically sequential file for the index files. The indexed sequential method of file organization obviates the data-entry obstacle posed by physical sequential organization, but is inefficient in operation. Consequently, scatter storage was determined to be the best method for meeting the efficient file organization requirements of the system.

The findings of the investigation have shown that very large files of bibliographic index entries organized by a scatter-store technique, in which search keys are derived from the main entry, can be made to operate very efficiently for on-line retrieval and at the same time be sparing of machine time even in those cases where requests are for entries not in the file (12). This research also produced two powerful mathematical tools for predicting retrieval behavior of such files, and a design technique for optimizing record blocking in such files so that, on the average, only one to two physical accesses to the file storage device are needed to retrieve the desired information.
The files displayed in Figure 1 are constructed by a single file-building program designed so that additional modules can be embedded in the program. The program accepts a bibliographic record, assigns an address for it in the main sequential file, and places the record at that address. Having determined the bibliographic record address, the program next derives the author-title search key and constructs an author-title index file entry which contains the pointer to the bibliographic record. Then the program produces an LC card number index entry and a title index entry, each of which contains the same pointer to the bibliographic record. When a bibliographic record is used for catalog card production, an entry is made in the holdings file. When the first holdings entry is made for a bibliographic record, a pointer to the holdings entry is placed in that record; the pointer to each subsequent holdings entry is placed in the previous holdings entry. An entry is made at the same time in the call number index containing a pointer to the holdings entry.

This file organization operates with efficiency and economy. The files containing the large bibliographic records and their associated holdings information are sequential and, hence, are highly economical in disk space. The technique used ensures that only a low percentage of available disk area need be reserved for growth of these large sequential files. Disk units can be added as needed. Each fixed-length record in the scatter-store files is less than 3 percent of the size of an average bibliographic record, and since 25 percent to 50 percent of these files are unoccupied, the empty disk area is small because of the small record lengths.

Sequential Files

The bibliographic record file and holdings file are sequential files, the holdings file being a logical extension of the bibliographic record file.
A record is loaded into a free position made available by deletion of a record or into the position following the last record. Whenever a new version of a record updates the version already in the file, the new record is placed in the same location as the old if it will fit; otherwise, it is placed at the end of the file and pointers in the indexes are changed. There is a third, small sequential file containing unique notes for specific copies, dash entries, and extra added entries.

Each bibliographic record contains the information in a MARC II record. Each record also contains a 128-bit subrecord capable of listing up to 128 institutions that could hold the item described by the record. At the present time, only 49 of the 128 bits are used, since there are 49 institutions participating in OCLC. The record also includes pointers to entries in index files, so that the data base may be readily updated, and a pointer to the beginning of the list of holdings for the record. In addition, each record has a small directory for the construction of truncated author-title-date entries, which are displayed to allow a user to make a choice whenever a search key indexes two or more records.

Although each bibliographic record includes all information in a standard MARC II record, records in the bibliographic record file have been reduced to 78 percent of the size of the communication record, largely by reducing redundancy in structural information. OCLC intends to compress bibliographic records further by reducing redundancy in text, employing compression techniques similar to those described in the literature (13,14).

The holdings file contains a string of holdings records for each bibliographic record; individual records are chained with pointers.
Information in each record includes the identity of the holding institution and the holding library within the institution, a list of each physical item of multiple or partial holdings, the call number, and pointers to the next record in the chain and to the call number index. The last record in the chain also has a back-pointer to the associated bibliographic record. Whenever there is a unique note, dash entry, or extra added entry coupled to a holding, that holding has a pointer to a location in the third sequential file in which the note or entry resides.

Index Files

Indexes include an author-title index, a title index, and an LC card number index. Research and development are under way leading to implementation of an author and added-author index and a call number index. A class number index will be developed and implemented in the future. With the exception of the class number index, which by its nature is required to be a sequentially accessible file, the OCLC indexes are scatter storage files.

The construction of and access to a scatter storage file involve the calculation of a home address for the record and the resolution of the collisions that occur when two or more records have the same home address. The calculation of a home address comprises derivation of a search key from the record to be stored or retrieved and the hashing or randomizing of the key to obtain an integer relative record address that is converted to a storage home address. The findings of OCLC research on search keys have been reported (9,10,11). The hashing procedure employs a pseudo-random number generator of the multiplicative type:

    Home Address = rem(K * x_k / m)

where K is the multiplier 65539, x_k is the binary numerical value of the search key, and m is the modulus, which is set equal to the size of the index file; 'rem' denotes that only the remainder of the division on the right-hand side is used. Philip L.
Long and his associates have shown that the efficiency of a scatter storage file is rapidly degraded when the loading of the file exceeds 75 percent (12); therefore, OCLC initially loads files at 50 percent of physical capacity. Hence, the modulus is chosen to be twice the size of the initial number of records to be loaded. When 75 percent occupancy is reached, a new modulus is chosen and the file is regenerated.

Collisions are resolved using the quadratic residue search method proposed by A. C. Day (15) and shown to be efficient (12). In this method, a new location is calculated when the home address is full; the first new location has the value (home address - 2), the second (home address - 6), the third (home address - 12), and so on until an empty location is found if a record is being placed in the file, or the end of the entry chain is found if records are being retrieved. When the file size m is a prime having the form 4n + 3, where n is an integer, the entire file may be examined in m searches.

Retrieval Techniques

The retrieval of a record or records from the OCLC data base is achieved in fractions of a second when a single request is put to the file, and rarely exceeds a second when queuing delays are introduced by simultaneous operation of upwards of 50 terminals. Response time at the terminal is greater than these figures because of the low communication line data rate, but terminal response time rarely exceeds five seconds.

Figure 6 shows the map of a record in the author-title index file and the title file. In the author-title file, the search key is a 3,3 key, the first trigram being the first three characters of the author entry and the second being the first three characters of the first word of the title that is not an English article (9). For example, "Str,Cha" is the search key for B. H. Streeter's The Chained Library. However, any or all of the characters in the trigrams may be in lower case.
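The key derivation, hashing, and collision resolution described above can be sketched as follows. This is a minimal illustration, not OCLC's assembly-language code; in particular, the conversion of a key to its "binary numerical value" via ASCII bytes and the handling of articles are assumptions:

```python
K = 65539  # multiplier from the multiplicative pseudo-random generator

ENGLISH_ARTICLES = {"a", "an", "the"}

def author_title_key(author: str, title: str) -> str:
    """3,3 key: first three characters of the author entry, a comma, and
    the first three characters of the first non-article title word."""
    first_word = next(w for w in title.split()
                      if w.lower() not in ENGLISH_ARTICLES)
    return f"{author[:3]},{first_word[:3]}"

def key_value(search_key: str) -> int:
    """Binary numerical value of the key (assumed here: its ASCII bytes)."""
    return int.from_bytes(search_key.encode("ascii"), "big")

def home_address(search_key: str, m: int) -> int:
    """Home Address = rem(K * x / m), with m the index file size."""
    return (K * key_value(search_key)) % m

def probe_sequence(home: int, m: int):
    """Day's quadratic residue search: home, home-2, home-6, home-12, ..."""
    yield home
    i = 1
    while True:
        yield (home - i * (i + 1)) % m
        i += 1

print(author_title_key("Streeter", "The Chained Library"))  # Str,Cha
```

For placement, the probe sequence is followed until an empty location is found; for retrieval, it is followed until the end of the entry chain.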
The author-title index also indexes title-only entries, but the title index provides a more efficient access to this type of entry. The pointer in the record map in Figure 6 is the address of the bibliographic record from which the search key was derived. The Entry Chain Indicator Bit is set to 0 (zero) if there is another record in the entry chain and to 1 if the record is last in the chain. When this bit is 0, the search skips to the next record as calculated by Day's skip algorithm. The Bibliographic Record Presence Indicator Bit is set to 0 (zero) to indicate that the bibliographic record associated with this index entry has been deleted; it is set to 1 to indicate that the bibliographic record is present.

An author-title search of the data base is initiated by transmission of a 3,3 key from a terminal. A message parser analyzes the message and identifies it as a 3,3 author-title search key by the presence of the comma and by there not being more than three characters on either side of that comma. Next, the hashing algorithm calculates the home address, and the location is checked for the presence of a record. If no record is present, a message is sent to the terminal stating that there is no entry for the key submitted and suggesting other action to be taken. If a record is present and its key matches the key submitted, and if the entry-chain indicator bit signifies that the record at the home address is the only record in the chain, the bibliographic record which matches the key submitted is displayed on the terminal screen. If the entry-chain bit signifies that there are additional records in the chain, those records are located by use of the skip algorithm. If more than one record possesses the same key as that submitted, truncated author-title-date entries derived from the matching bibliographic records are displayed with consecutive numbering on the terminal screen.
The user then indicates by number the entry containing information pertaining to the desired work, and the program displays the full bibliographic record.

The title-index record has the same map as the author-title record and is depicted in Figure 6. The title index is also constructed and searched in the same manner as the author-title index.

[Fig. 6. Author-Title and Title Index Records, each showing a search key (8 bytes), a bibliographic record pointer (4 bytes), an entry chain indicator bit, and a bibliographic record presence indicator bit.]

The title search key (3,1,1,1) consists of the first three characters of the first word of the title that is not an English article, plus the initial character of each of the next three words. Commas separate the characters derived from each word. The title search key is "Cha,L,," for B. H. Streeter's The Chained Library, the three commas signifying that the message is a title search key. The bibliographic record pointer and the two indicator bits have the same function as in the author-title record.

Figure 7 exhibits the map for a record in the LC card number index. The three left-most bytes in the LC card number section contain an alphabetic prefix to a number where this is present or, more usually, three blanks when there is no alphabetic prefix. Similarly, the right-most byte contains a supplement number or is blank. The middle four bytes contain eight digits packed two digits to a byte, after the digits to the right of the dash have been, when necessary, left-filled with zeroes to a total of six digits. The dash is then discarded. For example, LC card number 68-54216 would be 68054216 before being packed. The pointer and the two indicator bits have the same function as in the author-title index record.

An LC card number search is started with the transmission of an LC card number as the request.
The parser identifies the message as an LC card number search by determining that there is a dash in the string of characters and that there are numeric characters in the two positions immediately to the left of the dash. The remainder of the search procedure duplicates that for the author-title index.

On-Line Programs

As is the case with all routinely used OCLC programs, the on-line programs are written in assembly language to achieve the utmost efficiency in processing. In addition, every effort has been made to design programs to run in the fastest possible time. In other words, one of the main goals of the OCLC on-line operation is the lowest possible cost.

The simulation study had shown that it was necessary to modify the operating system of the XDS Sigma 5 so that the work area of the operating system would be identical with that of the applications programs. The XDS Real-time Batch Monitor, which is one of the operating systems furnished by XDS for the Sigma 5, has been extensively altered, and one of the alterations is the change to a single work area. Another major change to the operating system was building into it the capability for multiprogramming. At the present time, the on-line foreground of the system operates two tasks, in that two polling sequences are running simultaneously, and the background runs batch jobs at the same time. This new monitor is called the On-Line Bibliographic Monitor (OBM).

An extension of OBM is named MOTHERHOOD (MH); MH supervises the operation of the on-line programs. MH also keeps track of the activities of these programs and compiles statistics of these activities. In addition, MH contains some utility programs, such as the disk and terminal I/O routines.

The principal on-line application program is CATALOG (CAT); its functions are described in detail in the subsequent sections entitled Cataloging with Existing Bibliographic Information and Input Cataloging.
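The request parsing and card-number normalization described in the preceding sections can be sketched as follows (names are illustrative, and the real parser in CAT handles more cases than this simplified classifier):

```python
ENGLISH_ARTICLES = {"a", "an", "the"}

def title_key(title: str) -> str:
    """3,1,1,1 title key: three characters of the first non-article word,
    then the initial character of each of the next three words."""
    words = title.split()
    while words and words[0].lower() in ENGLISH_ARTICLES:
        words = words[1:]  # skip leading English articles
    parts = [words[0][:3]]
    for i in range(1, 4):
        parts.append(words[i][0] if i < len(words) else "")
    return ",".join(parts)

def pack_lc_card_number(number: str) -> str:
    """Left-fill the digits right of the dash with zeroes to six digits,
    then discard the dash: 68-54216 -> 68054216."""
    left, right = number.split("-")
    return left + right.zfill(6)

def classify_request(message: str) -> str:
    """Identify a terminal request by its shape."""
    dash = message.find("-")
    if dash >= 2 and message[dash - 2:dash].isdigit():
        return "lc-card-number"   # digits immediately left of a dash
    if message.count(",") == 3:
        return "title-key"        # three commas signify a 3,1,1,1 key
    if message.count(",") == 1:
        left, right = message.split(",")
        if 0 < len(left) <= 3 and 0 < len(right) <= 3:
            return "author-title-key"
    return "unknown"

print(title_key("The Chained Library"))   # Cha,L,,
print(classify_request("68-54216"))       # lc-card-number
```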
In general, CAT accepts requests from terminals, parses them to identify the type of request, and then takes appropriate action. If a request is for a bibliographic record, CAT identifies it as such, and if there is only one bibliographic record in the reply, CAT formats the record in one of its work area buffers and sends the formatted record to the terminal for display. If more than one record is in the reply, CAT formats truncated records and puts them out for display. After a single bibliographic record has been displayed, CAT modifies the computer memory image of the record in accordance with update requests from the terminal. For example, fields such as the edition statement or subject headings may be deleted or altered, and new fields may be added. When the request is received from the terminal to produce catalog cards from the record as revised or unrevised, CAT writes the current computer memory image of the record onto a tape to be used as input to the catalog card production programs.

The catalog card production programs operate off-line, and the first processing program is CONVERT (CNVT), which formats some of the fields and call numbers. The major activity of CNVT is the latter, for libraries require a vast number of options to set up their call numbers for printing. CNVT also automatically places symbols used to indicate oversized books above, below, or within call numbers as required.

FORMAT is the second program; it receives partially formatted records from CNVT. FORMAT expands each record into the total number of card images corresponding to the total cards required by the requesting library for each particular title.

[Fig. 7. Library of Congress Card Number Index Record: an LC card number (8 bytes), a bibliographic record pointer (4 bytes), a bibliographic record presence indicator bit, and an entry chain indicator bit.]
FORMAT determines this total from the number of tracings and pack definition tables previously submitted by the library that define the printing formats of cards to go into each catalog. FORMAT, which is an extensive revision of EXPAND, contains many options not found in the old off-line catalog card production system. FORMAT can set up a contents note on any particular card, and puts tracings at the bottom of a card when tracings are requested. The author entry normally occurs on the third line, but if a subject heading or added entry is two or more lines long, FORMAT moves the author entry down on the card so that a blank line separates the added entry from the author entry. In other words, each card is formatted individually. The major benefit of this feature, which allows the body of the catalog data to float up and down the card, is that the text on most cards can start high up on the card, thereby reducing the number of extension cards. The omission of tracings from added entry cards has a similar effect.

Table 1 presents the percentage of extension cards in a sample of 126,738 OCLC cards for 18,182 titles produced for twenty-five or more libraries during a seventeen-day period, compared with extension cards in Library of Congress printed cards and in a sample of NELINET cards "for over 1,300 titles" (16). The table shows that the OCLC mixture of cards with and without tracings and with the floating body of text yields about 10.8 percent more extension cards compared to Library of Congress printed cards. Were libraries to restore the original meaning to the phrase "main entry" by printing tracings only on main entry cards, the percentage of extension cards in computer-produced catalog cards printed six lines to the inch would probably be less than for LC cards.

FORMAT also sets up a sort key for each record, and a sort program sorts the card images by institution, library, catalog, and by entry or call number within each catalog pack.
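The multi-level sort above can be sketched with a composite sort key; the record fields below are illustrative, not FORMAT's actual card-image layout:

```python
from operator import itemgetter

# Illustrative card images awaiting printing; field names are assumptions.
card_images = [
    {"institution": "OSU", "library": "Main", "catalog": "official", "entry": "Str,Cha"},
    {"institution": "OSU", "library": "Main", "catalog": "official", "entry": "Kre,Int"},
    {"institution": "KSU", "library": "Main", "catalog": "shelf",    "entry": "Lee,Int"},
]

# Sort by institution, library, catalog, then entry within each catalog pack.
card_images.sort(key=itemgetter("institution", "library", "catalog", "entry"))
print([c["entry"] for c in card_images])
```

Sorting on the composite key groups each institution's cards together, and within each catalog pack orders them by entry, so the subsequent tape-building and print steps can run strictly sequentially.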
Another program, BUILD-PRINT-TAPE (BPT), arranges the sorted images on tape so that cards are printed in consecutive order in two columns on two-up card stock. Finally, a PRINT program prints the cards on an IBM 1403-N1 printer attached to an IBM 360/50 computer.

Table 1. Extension Catalog Card Percentages

    Number of    OCLC MARC II    Library of Congress    NELINET MARC II
    Cards        Cards           Printed Cards          Cards
    1            77.2            87.8                   79.9
    2            18.9            10.0                   16.7
    3             2.5             1.6                    2.5
    4             1.1              .3                     .6
    5              .2              .2                     .1
    6              .1                                     .2

Cataloging With Existing Bibliographic Information

This section describes cataloging using a bibliographic record already in the central file; the next section, entitled Input Cataloging, describes cataloging when there is no record in the system for the item being cataloged.

The cataloger at the terminal first searches for an existing record, using the LC card number found on the verso of the title page or elsewhere. If the response is negative or if there is no card number available, the cataloger searches by title or by author and title, using the 3,1,1,1 or 3,3 search keys respectively. If these searches are unproductive, the cataloger does input cataloging.

When a search does produce a record, the cataloger reviews the record to see if it correctly describes the book at hand. If it is the correct record and if the library uses Library of Congress call numbers, the cataloger transmits a request for card production by depressing two keys on the keyboard. Cataloging is then complete. If the LC call number is not used, the cataloger constructs and keys in a new number and then transmits the produce-cards request. If the record does not describe the book as the cataloger wishes, the record may be edited. The cataloger may remove a field or element, such as a subject heading.
Information within a field may be changed by replacing existing characters, such as changing an imprint date by overtyping, by inserting characters, or by deleting characters. Finally, a new field, such as an additional subject heading, may be added. When the editing process is complete, the cataloger can request that the record on the screen be reformatted according to the alterations. Having reviewed the reformatted version, the cataloger may proceed to card production.

When a cataloger has edited a record for card production, the alterations in the record are not made in the record in the bibliographic record file. Rather, the changes are made only in the version of the record that is to be used for card production. The edited version of the record is retained in an archive file after catalog card production so that cards may be produced again from the same record for the same library, should the need arise in the future.

The author index currently under development will enable a cataloger to determine the titles of works in the file by a given author. The call number index, also currently being developed, will make it possible for a cataloger to determine whether or not a call number has been used before in his library. The class number index that will be developed in the future will provide the capability of determining titles that have recently been placed under a given class number or, if none is under the number, the class number and titles on each side of the given number.

Input Cataloging

Input cataloging is undertaken when there is no bibliographic record in the file for the book at hand. To do input cataloging, the cataloger requests that a work form be displayed on the screen (Figure 8). The cataloger then proceeds to fill in the work form by keyboarding the catalog data and transmitting the data to the computer field by field as each is completed.
As shown in Figure 8, a paragraph mark terminates each field; each dash is to be filled in by the cataloger for each field used. Input cataloging may be original cataloging or may use cataloging data obtained from some source other than the OCLC system.

Fig. 8. Workform for a Dewey Library. (The workform provides fixed fields for Type, Form, Intellectual level, Bibliographic level, Language, ISBN, and card number, followed by thirteen numbered lines with tags 1--, 24-, 250, 260, 300, 4--, 5--, 6--, 7--, 8--, 092, 049, and 590.)

When the catalog data have been input, revised, and correctly displayed on the terminal screen, the cataloger requests catalog card production. In the case of new cataloging, not only are cards produced, but the new record is also added to the file and indexed so that it is available within seconds to other users. If a MARC II record for the same book is subsequently added to the file, it replaces the input-cataloging record but does not disturb the holdings information.

Union Catalog

Each display of a bibliographic record contains a list of symbols for those member institutions that possess the title. In other words, the central file is also a union catalog of the holdings of OCLC member libraries, although in the early months of operation these holdings data are very incomplete. Nevertheless, they will approach completeness with the passage of time and with retrospective conversion of catalog data. Titles cataloged during the operation of the off-line system have been included in the union catalog. The union catalog is an important function of the shared cataloging system, for it makes available to students and faculties, through the increased information available to staff members, the resources of academic institutions throughout Ohio.
Libraries also use the union catalog as a selection tool, since they can dispense with expensive purchases of little-used materials residing in a neighboring library. Members also use the file to obtain bibliographic data to be used in ordering.

Assessment

With over nine hundred thousand holdings recorded in the union catalog as of April 1973, it is clear that having this type of information immediately at hand will greatly improve services to students and faculties. Enlargement of the holdings recorded will enhance the union-catalog value of the system. Wright State University is in the process of converting its holdings using the OCLC system, and the Ohio State University Libraries (the largest collection in the state) has already converted its shelf list in truncated form. The OSU holdings information will soon be available to OCLC members.

Members using the OCLC system report a large reduction in cataloging effort. Two libraries using LC classification report that they are cataloging at a rate in excess of ten titles per terminal hour when cataloging already exists in the system. Libraries using Dewey classification are experiencing a somewhat lower rate. The original cost-benefit studies were done on the basis of a calculated rate of six titles per hour for those books for which there were already cataloging data in the system. The net savings will be realized when the file has reached sufficient size to enable the largest libraries to locate records for 65 percent of their cataloging and the smallest to find 95 percent. To reach this level, members collectively would have to use existing bibliographic information to catalog 350,000 titles in the course of a year, or an average of approximately 1,460 titles for the total system per working day. It was thought that this rate would be attained by the end of the second year of operation.
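The per-day figure follows from simple arithmetic. The 240 working days per year assumed below is our inference from the yearly and daily totals, not a number the paper states.

```python
titles_per_year = 350_000
working_days = 240          # assumption: roughly 48 five-day weeks
per_day = titles_per_year / working_days
print(round(per_day))       # close to the "approximately 1,460" quoted
```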
However, at the end of the first month of on-line operation, over a thousand titles per day were being cataloged.

The new catalog card production programs operating on the Sigma 5 are much more efficient than the programs used in the older off-line system. Earlier in this paper it was reported that the cost of the older programs to format catalog cards, but not to print them, was 2.27 cents per card. If costs of the Sigma 5 are calculated at commercial rates, the new programs format cards at 2.21 cents per card. However, if actual costs to OCLC are used, with the total cost being assigned to one shift, the cost of formatting each card becomes 0.86 cents. The total cost of producing catalog cards is, of course, much more than the cost to format them on a computer. Nevertheless, either the 2.21-cent or the 0.86-cent rate might serve as a criterion for measuring the efficiency of computerized catalog card production.

The low response-time delay in the operation of seventy terminals is a good gauge of the efficiency of the on-line system. In particular, the file organization is efficient, for it enables swift retrieval of a single entry from a file of over 600,000 records. Moreover, no serious degradation in retrieval efficiency is expected to arise as the result of the growth of file size.

The system operates from 7:00 A.M. to 7:00 P.M., Mondays through Fridays, and at times the interval between system downtimes has exceeded a week. It is rare for the system to be down on successive days, and when a problem does occur, the system can be restored within a minute or two. Moreover, when the system goes down, only two terminals will occasionally lose data, and most of the time there is no loss of data. Hence, it can be concluded that the hardware and software are highly reliable.

In summary, it can be said that the OCLC on-line shared cataloging system is easy to use, efficient, reliable, and cost beneficial.
ACKNOWLEDGMENTS

The research and development reported in this paper were partially supported by Office of Education Contract No. OEC-0-70-2209 (506), Council on Library Resources Grant No. CLR-489, National Agricultural Library Contract No. 12-03-01-5-70, and an L.S.C.A. Title III grant from the Ohio State Library Board.
184 Journal of Library Automation Vol. 5/3 September, 1972

TWO TYPES OF DESIGNS FOR ON-LINE CIRCULATION SYSTEMS

Rob McGEE: Systems Development Office, University of Chicago Library

On-line circulation systems divide into two types. One type contains records only for charged or otherwise absent items. The other contains a file of records for all titles or volumes in the library collection, regardless of their circulation status. This paper traces differences between the two types, examining different kinds of files and terminals, transaction evidence, the quality of bibliographic data, querying, and the possibility of functions outside circulation. Aspects of both operational and potential systems are considered.

INTRODUCTION

A literature survey was made of on-line circulation systems (1). To qualify for study, a system needed to perform any major circulation function on-line. Charging and querying were common. Some systems were also found to perform some acquisitions, cataloging, and reference work.
Criteria used to examine systems have been presented in an earlier paper as key factors of circulation system analysis and design (2). This paper conceptualizes the survey findings, and goes on to consider general problems and alternatives of designing on-line circulation systems.

The survey shows that on-line circulation systems divide into two types, according to the scope of their bibliographic records. We give the term "absence file" to a set of records for only those items that have been charged or otherwise removed from their assigned locations. The name "item file" is given to what is, or approaches being, a comprehensive file of records for all titles or volumes in the library collection, regardless of their circulation status. Each on-line circulation system either does or does not have an item file. Systems without an item file must contain an absence file, and are therefore called "absence systems." Systems with an item file are called "item systems." (An item system may also have an absence file, depending upon its design.) Note that an "absence file" and an "item file" are each conceptually or logically defined as a single file, whereas in some operational systems either may be stored as more than one physical file.

Two other basic files generally appear in operational systems: a user file of complete records for users; and a transaction file that may be variously used for data collection, system update, system backup, and batch generation of notices. We can now generalize common but not exclusive file definitions for the two design types. Absence systems usually contain three main logical files: 1) a user file; 2) an absence file that contains records only for charged or otherwise absent items; and 3) a transaction file. User identification number and complete item data (all the item data the system is to hold) are input at transaction time to create charge records.
These data are typically collected from machine-readable sources such as punched cards or magnetic strips; the surveyed systems use punched cards. Time data, such as charge date or due date, and circumstantial data, such as charging location, may also be collected. During batch processing, user records are accessed by identification number to obtain name, address, and so forth. Examples of absence systems are found at West Sussex County Library (3,4,5), Illinois State Library (6,7), Midwestern University library (8,9,10), Queen's University library (11,12,13,14,15), Northwestern University library (16), and Bucknell University library (17).

Item systems are characterized by three or four major files: 1) a user file; 2) an item file of bibliographic records for all library volumes or titles, or for as many as machine records can feasibly be created and stored; 3) a transaction file that may be used for update of the item file, data collection and analysis, and perhaps notice generation; and optionally 4) an absence file of records for circulating items, if transaction data for them are more efficiently kept here than in the item file. Records in both the user and item files contain either full data or at least enough data to address messages to users and to adequately describe items. If an absence file is used in an item system, it may copy bibliographic data from the item file, or the two files may be linked to avoid data redundancy. Item systems are in operation at Bell Laboratories library (18,19), Eastern Illinois University library (20), Ohio State University libraries (21,22), and the Technical Library of the Manned Spacecraft Center, Houston (23). (The Manned Spacecraft Center library, alone among the systems we have surveyed, does not have a user file. Instead, the user's last name, initials, and address code are input at charge time.)
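The contrast between the two design types can be sketched in code. This is a hypothetical illustration of the logical files described above, not any surveyed system's implementation; all names and record layouts are invented.

```python
from dataclasses import dataclass

@dataclass
class Charge:
    user_id: str
    item_data: dict   # in an absence system, ALL item data the system holds
    due_date: str

# Absence system: the charge record is created entirely from data input
# at transaction time; discharging removes the item's only machine record.
absence_file: dict[str, Charge] = {}

def absence_charge(user_id, call_no, item_data, due_date):
    absence_file[call_no] = Charge(user_id, item_data, due_date)

def absence_discharge(call_no):
    del absence_file[call_no]

# Item system: bibliographic records are machine-held before any charge,
# so a charge transaction needs only identifying numbers plus time data.
item_file = {"QA76.K5": {"author": "Kilgour", "title": "Shared cataloging"}}
charges: dict[str, tuple] = {}

def item_charge(user_id, call_no, due_date):
    if call_no not in item_file:
        raise KeyError("no bibliographic record: do input cataloging first")
    charges[call_no] = (user_id, due_date)
```

The sketch makes the basic distinction concrete: in the absence system the item description must travel through the charge transaction, while in the item system it is already on file and only a key is keyed or read.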
Since the basic distinction between absence and item systems is whether descriptive data for an item are machine-held prior to its charge time, item file records are limited primarily by the costs of conversion and machine storage, but can have full, even MARC-like formats; whereas absence system records are restricted by the quantity of data that can be input at charge time, i.e., by the capacities of source record coding and data transfer techniques.

BASIC APPROACHES TO ON-LINE CIRCULATION SYSTEM DEVELOPMENT

Three approaches to the design of on-line circulation systems have originated from different notions of circulation control (24). First is the view that circulation control is a separate library function, or one with minimal relationships to other library data processing. Exclusive requirements for user and item data are formulated; the format of bibliographic data and the design of data management capabilities are developed explicitly for circulation control, to the exclusion of other library data processing requirements. Absence systems have been developed with this approach, but thus far item systems have not.

The second approach is to create a circulation system that is operationally independent of other library data processing activities, but designed with a view toward the possibility of shared usage of bibliographic and other data, and of general library data processing facilities. Compatibility with other functions is provided, to aid later combination. Either system design can take this approach, but item systems can take better advantage of the integration of functions.

A third approach is to add circulation control to other large file processes (such as a cataloging system), or to develop them concurrently.
This follows an integrated view of library data processing that sees a circulation system operating with many of the same data and processing requirements as other library functions, all of which are handled by a general library data management system. The broad range of library data processing activities needs to be addressed, and an item system design is likely to be preferred.

Two concepts underlie these approaches: 1) an integrated library system, and 2) a remotely accessible library catalog. An integrated view of the library is one of a total operating unit with a variety of operations that are logically interrelated and interconnected by their mutual requirements for data and processing (25). The term "integrated system" usually implies a system in which centralized, minimally redundant files undergo shared processing by different library functions.

It is not clear exactly how the concept of a remotely accessible catalog should be defined, or exactly what the phrase means to various users. If we take it to mean the capability to access information from a given catalog at remote locations, then a variety of systems may qualify: e.g., telephone access to a group that performs manual card catalog lookups; multiple locations of book catalogs or microform catalogs; and terminal access to an on-line, computerized catalog. The last is pertinent to our discussion. How an integrated system is implemented determines if its central bibliographic file is accessible from multiple remote locations; how an on-line, remotely accessible catalog is used determines if the system is integrated. Recently the Ohio State University libraries circulation system has come to be explicitly called a remote catalog access system. We have not yet found reports of any on-line system that has integrated all of technical services and circulation.
The addition of circulation control to existing on-line cataloging systems has been planned for the Shawnee Mission system (26) and mentioned for the system at Western Kentucky University, Bowling Green (27). The Ohio College Library Center has not yet decided how it will handle circulation. As long as we define an integrated system on the basis of multiple uses of nonredundant data (among other characteristics), and a remote catalog access system upon physical accessibility, various systems may qualify as either or both.

Recognizing these two concepts helps to show how the three approaches to on-line circulation system development capsulize broader trends in library systems. First, the redundancy of bibliographic data in operationally separate but conceptually related functions has characterized traditional manual systems, batch computer systems, and now some on-line systems. Second, the construction of individual, independent subsystems, while planning for their eventual combination, has been called an evolutionary approach to a total system by De Gennaro (25). He has also defined a third, integrated approach, in which computerized systems are designed to take advantage of interrelationships among different subsystems, for example, by accepting one-time inputs of given data and processing them for multiple library functions and outputs. These three trends have been widely experienced in changing relationships among traditional and innovative systems for acquisitions and cataloging. The pattern is repeating now in the evolution of on-line systems with respect to technical services processing and circulation control. Large on-line systems are emerging to perform acquisitions, cataloging, circulation control, and reference functions with shared processing facilities and data bases.
TERMINAL DEVICES

Most on-line circulation systems do not perform all functions on-line, although the following are possibilities: charge, discharge, inquiry, and other record creations and updates, such as reserving items in circulation, renewing loans, recording fines payments, and even converting files to machine-readable form. What do their input/output requirements imply for terminal devices? Inputs for charges may be minimal user and item identification numbers, or full borrower and book descriptions. Evidence of valid charges may be produced; printouts of user number, call number, short author and title, and due date are common. There are also special "security systems" that switch two-state devices in books (such as sensitized plates or labels) to record a valid charge, but as yet no such system has been coupled with on-line charging. Discharge inputs need only to match existing records; simple access keys such as call number or accession number are adequate. Querying, too, may be accomplished with simple search keys, or with bibliographic inputs such as author and title. All these functions can be performed with keyboard input and output display of alphanumeric data.

Absence Systems

Although not a requirement by our definition, most absence systems feature machine-readable user and item cards (the Queen's University system does not); the terminals used for charge and discharge must have card-reading capabilities. Thus for on-line tasks, charge stations need card readers and the ability to produce charge evidence, usually in hard copy; querying by bibliographic search keys requires keyboard input and output display of alphanumeric data; discharge stations need a card-reading capability and a display mechanism to identify reserved items; and file creation requires inputs of alphanumeric data in a character set that may range from minimum to full.
There are at least two problems in choosing terminal equipment for absence systems. First, any single terminal or configuration that satisfies input and output requirements for all basic functions may be too expensive to install at every library location of these activities. Second, the combination of separate hardware units (such as keyboards, printers, and card readers) may require special hardware or software interfaces that prove difficult and expensive (16,28). Alternatively, separate circulation stations with different terminal devices can be established for specific functions. This solution may introduce problems of hardware and personnel redundancy and backup. Difficulties with terminal devices explain in part why most systems perform not all but only selected functions on-line.

Item Systems

It is possible, in systems with user and item files, to access records by using search keys that are either keystroked or machine-read from cards. The use of machine-readable cards involves the same problems as those described for absence systems. However, choosing keyboard entry of accession or call numbers eliminates card reading, and simplifies requirements so that keyboard devices with display capabilities can perform all basic functions. The feasibility of keyboarding the inputs at transaction time has been demonstrated by the systems at Queen's University library (an absence system without machine-readable cards), Ohio State University libraries, Bell Laboratories, and the Technical Library of the Manned Spacecraft Center. A system based on a single terminal device that handles all real-time functions offers attractive simplifications for hardware and teleprocessing software. The primary disadvantages also center on the device itself. Factors such as input error, transmission and printing rates, character set, special function keys, noise, and cost have various implications for system design and operations.
Obviously, in a system based on a single terminal device, the characteristics of that device are influential. Kilgour has stated that the two most important factors in configuring a computer system for an item file design are, first, the nature of secondary memory, and second, the kind of terminal device to be employed (29). The need to quickly access large stores of data is basic. As for keyboard devices, one often finds that typewriter terminals may require far more computation of a central computer than do cathode ray tube terminals, because many CRT systems have substantial computing power of their own, giving the effect of a satellite computer. This can be important for systems that will run on time-shared machines, or transmit data over long distances.

The problems we have described for circulation terminals can be overcome; appropriate devices can be built. Too many library systems have been designed around unsuitable hardware; there has been little choice but to develop circulation systems (both on-line and batch) with data collection devices designed for industrial applications. Their influence, frequently bad, is fundamental to the nature of resulting systems. In fairness, suppliers need both direction and marketing potential. The deeper fault is with librarians, who have inadequately documented requirements and not proven the existence of a market.

The integrated approach to library automation ultimately visualizes all library functions using a single set each of bibliographic, user, and other kinds of records, although different pieces of data for different purposes. Similarly, one can say that different sets of terminal requirements arise from the different input/output specifications among library tasks and not so much from the nature of bibliographic and other data.
As functional requirements of different activities (e.g., acquisitions, cataloging, and circulation) overlap, the opportunity to use identical or similar terminals in a variety of library processes is enhanced. Extending the integrated approach to library-related hardware fits well with the concepts of modular hardware design and add-on features. Take, for example, a basic keyboard/display screen terminal to which modules can be added to read book and borrower identification and to produce hardcopy printout.

TRANSACTION EVIDENCE

A variety of transactions may occur between a library and its users: charging and discharging books, placing reserves on circulating material, paying fines, etc. Evidence may be provided to verify transaction accuracy and to furnish receipts for users. This evidence can be in various formats: it might be a hardcopy record (or worksheet) of transaction inputs, or a printout or screen display of system responses. Printed charge evidence is a familiar example, and is sometimes used for inspection of items that library users carry from the building. Two kinds of charge evidence may be defined (2). Simple evidence contains no more user and item data than are input at transaction time. Complex evidence contains user or item data other than charge-time input, and requires the system to extract data from machine-held file(s). Printed evidence typically contains an item due date that may be calculated from either user or item criteria, or both, or directly specified at the time of charge. Let us look at the implications of printed charge evidence for the two system types.

Absence Systems

In most absence systems user identification number and full item data are transferred into the system from machine-readable cards at charge time. There are various ways of printing simple transaction evidence; the following are illustrative.
One technique is to transmit data directly to a computer that formats them for output, calculates a due date, and returns them to a printing device. Another method is to process source record data with a terminal system that can buffer and format them, select a due date, and output the evidence on a printer. Shifting functions from the computer to a terminal system may simplify teleprocessing software, save time at the central processing unit, and permit nearly normal charge operations during computer downtime. If more elaborate user data than identification number are required, there are two obvious solutions. Central user records may be accessed to provide complex evidence, possibly increasing central processing unit time and response time. Or, user cards (such as magnetically encoded ones) that contain fuller data could be employed with a terminal system that handles them independently of the central computer.

Item Systems

In item systems, as in absence systems, it is possible to use machine-readable cards, with the same implications for printing charge evidence. However, if 1) user and item numbers are keystroked, 2) these are considered sufficient borrower and book information, and 3) decision rules for loan periods are simple, then little or no computer response is required for charging. Due date may be returned and printed to signal a completed transaction, or predated date-due slips may be used. Alternatively, special terminal features may be added to select and print a due date. This complicates otherwise simple terminal requirements. Sophistications such as status checks on the borrower (e.g., any outstanding fines?) and item (e.g., is it reserved for another user?) will of course require more extensive processing and responses. If charge-time inputs are indeed keystroked, user and item numbers with check digits are desirable, to minimize the effects of input error.
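As a concrete illustration of check digits: the mod-10 (Luhn) scheme below is one common choice, not necessarily the scheme used by any of the surveyed systems.

```python
def check_digit(payload: str) -> int:
    # Mod-10 (Luhn): double every second digit from the right, subtract
    # 9 from any doubled result over 9, and choose the digit that brings
    # the total to a multiple of ten.
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10

def is_valid(number: str) -> bool:
    # Catches any single mis-keyed digit and most adjacent transpositions.
    return check_digit(number[:-1]) == int(number[-1])

user_number = "7992739871" + str(check_digit("7992739871"))
print(user_number)  # 79927398713
```

A terminal or the central computer can reject an invalid number immediately, before any charge record is created from the bad input.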
For complex evidence, response time is important, especially if terminals are typewriter-like devices. The time required is determined by the sources of response data, their access times, how much data must be transmitted, and the transmission and terminal display rates. Through careful design, the time required to obtain charge evidence and complete the transaction can be minimized. For example, if the user number carries a code for borrower class, then a due date can be quickly selected and printed while the item file is accessed for needed author/title data. It is clear that in an item system containing a user file, only very simple inputs are required to record a charge transaction. The additional requirements for charge evidence, status checks on user and item, and so forth determine how elaborate and slow system responses may become.

AVAILABILITY, HOLDINGS, AND ABSENCE INFORMATION

One may take the view that a library should provide the following kinds of responses to users. If a title is requested, library holdings for it should be given. If a specific item is wanted, either its absence or presumed location should be reported. If the item cannot be immediately provided, the library should determine its future availability and inform the user.

The terms "availability," "holdings," and "absence information" have special meanings. The availability of a specific item to a library's users is mapped onto the universe of items by the library's acquisitions, cataloging, circulation control, and interlibrary borrowing functions. Availability information obtains from all these sources, but particularly from the public catalog of library holdings. Absence information, in contrast, corresponds only to a subset of library holdings; it tells the locations of library-owned items when they are absent from the locations indicated by the catalog.
Absence information therefore corresponds to a subset of holdings information; holdings information is a subset of availability information. In the context of our discussion, an absence system provides full absence information and only partial holdings and availability information. An item system can provide full holdings and absence information, but only partial availability information, since items not owned may be ordered, or borrowed from another library. Such considerations strengthen the argument that circulation control shares a functional unity with other library processes, and should therefore be considered as one of several integrated functions. The provision of absence and availability information is the essence of circulation system querying requirements. Figure 1 shows that different query keys access different subsets of availability information. Note the wider utility of some keys than for just circulation control.

Absence Systems

In on-line circulation systems built around an absence file, the data representing each physical item may range from a simple accession or item call number (as in the Queen's University and Northwestern University systems) to larger records containing as much data as may be stored and transferred with a machine-readable card (for example, a Hollerith-punched book card). If availability information is to be obtained from a library's public catalog, and not from the circulation system, then access to the file of absence information may be with any key shown in the catalog records: e.g., author and title, title and author, call number, accession number. Only the simpler keys, item call number and accession number, have been used in absence systems developed so far. Consequently their requirements for file organization and access software are minimal.
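The containment relations just described can be restated as set inclusions. The sample items below are invented purely to make the sketch runnable.

```python
# availability ⊇ holdings ⊇ absence: each kind of information covers a
# (possibly proper) subset of the one above it.
availability = {"owned-1", "owned-2", "owned-3", "on-order-1", "ill-loan-1"}
holdings = {"owned-1", "owned-2", "owned-3"}  # items the library owns
absence = {"owned-2"}                          # owned items now charged out

assert absence <= holdings <= availability
# An absence system keeps machine records only for `absence`; an item
# system covers all of `holdings` but still not all of `availability`,
# since items may be on order or borrowed from another library.
```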
In most systems these keys permit exact matches to only single records, but in the Northwestern University system a call number query may cause display of a set of related records (16).

Item Systems

The query function in item systems is bound by different constraints than those of absence systems. The amount of bibliographic data is not restricted by the storage capacities of machine-readable cards, transaction response time, or the transfer rates of charge-time inputs. However, the following questions arise: How many and which records from the library's data base (e.g., its shelflist) must be converted to provide a sufficient item file? For each record converted, how much and which data are required? What functions shall such a data base ultimately support? What kinds of absence and availability information will be provided? These deserve special discussion before we consider querying in item systems.

ITEM SYSTEM BIBLIOGRAPHIC DATA

How much data are required for item file records? In an integrated system full records are ultimately produced by the cataloging process. Should one use full, variable-length, MARC-like records for circulation control? The conversion, on-line storage, management, and access of a large file of full bibliographic records are expensive propositions. One may be compelled toward a lesser effort. Under much of the popular data management software it is easier to organize and store fixed-length records than variable-length records with different combinations of fixed- and variable-length fields.
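The convenience of fixed-length records under simple data management software can be sketched as follows. The field widths and layout here are hypothetical, chosen only to total 124 bytes (echoing the Eastern Illinois record size discussed in this section); they do not represent any actual system's format:

```python
import struct

# Hypothetical fixed-length item record: 20-byte call number,
# 30-byte author, 60-byte title, 1-byte circulation status, and a
# 13-byte charge field (user number plus due date) = 124 bytes.
RECORD = struct.Struct("20s30s60s1s13s")

def pack_item(call_no, author, title, status, charge):
    # Fixed-length fields are blank-padded, as on punched-card media.
    return RECORD.pack(call_no.ljust(20).encode(),
                       author.ljust(30).encode(),
                       title.ljust(60).encode(),
                       status.encode(),
                       charge.ljust(13).encode())

def unpack_item(buf):
    # Strip the blank padding on the way back out.
    return tuple(f.decode().rstrip() for f in RECORD.unpack(buf))

rec = pack_item("QA76.M3", "McGee, R.", "On-line circulation", "C",
                "U1042 720915")
assert len(rec) == 124
```

Because every record is the same length, the nth record can be located by direct byte arithmetic, which is what makes fixed-length files easy to organize, store, and access.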
The files of current item systems hold less-than-full bibliographic records: the Bell Laboratories library utilizes two basic fixed-length formats of 155 and 188 characters (18,19); the Eastern Illinois item file consists of 124-byte records (20); although the Ohio State system contains variable-length records, they are less-than-full bibliographically, averaging 103 bytes (22); the Manned Spacecraft Center system has fixed-length records of 168 characters (23).

If not a full, MARC-like record, then what? Two questions may be asked: How much data should be converted for each record? and: How much of these data should be put initially into an item file? If one believes a fully integrated system may eventually take over some public catalog functions, then traditional author-title-subject accesses must be maintained, at least until proven unneeded. The minimum genuinely useful set of bibliographic data elements needed for futuristic information retrieval from library catalogs has not been proven; the safe but expensive answer is to convert full records. Initially, however, one might want no more data in an item file than are functionally justified. How much are actually needed?

[Fig. 1. Possible Access Keys to Sets of Availability Information. The figure maps access keys against the universe of all items: standard bibliographic description; a standard but noncomprehensively applied single-element unique identifier (e.g., ISBN, SSN); and library-assigned keys such as item call number and accession number. Arrow notation indicates whether an access key retrieves all members of a set or only some members.]

The four existing item systems provide traditional information in new ways that have dramatically improved services to users. They answer several basic kinds of questions on-line: Does the library have ___? Is it available now? What books do I have charged out?
Such queries can be answered by nonsubject, descriptive bibliographic data, and by circulation status information that shows if items are absent, and when they may become available. For this an item file needs records only for items that are used, in contrast to a comprehensive on-line shelflist. Which records to include becomes a problem remarkably similar to deciding what books to put into low-access, compact storage, or to discard. The two university libraries with item systems chose comprehensive conversions: Eastern Illinois for 235,000 volumes (20), and Ohio State for 800,000 titles (22).

What are the potential advantages to users of an item system? If one only wants to know what books are charged, an absence system will suffice. Both the penalties and promise of an item system lie in its bibliographic store: in the records it holds (scope), and in the data these records contain (content). Unless real-time querying of an item file can substitute for at least some manual searches of the public catalog, and in an improved way, its bibliographic data offer no direct advantages to users of a circulation system; an item system will provide no direct circulation services that an absence system could not. Applying this as a test to the utility of a noncomprehensive item file (a file of records only for items that are, or are likely to be, in use), we find perhaps the key question for development of item systems among libraries with very large catalogs: To what extent may a noncomprehensive item file substitute for accesses to a comprehensive public catalog? Although related, this is not the same question as what proportion of a library's book stock circulates. This is a question of how the public catalog is used: by whom, and for what? Lipetz's study of the card catalog in Yale University's Sterling Memorial Library gives insight to at least that institution's catalog use (30).
He found that 73 percent of the users attempting searches were looking for particular documents (known items). Overall, users' approaches to catalog searches were: author, 62 percent; title, 28.5 percent; subject, 4.5 percent; and editor, 4 percent. This may encourage one to believe that an item file which is accessible by author and title can handle a significant portion of manual catalog lookups. If so, developers of item systems may want to consider strategies similar to the following.

If it is shown that satisfactory author/title access can be provided by an item file, then perhaps a large library is justified in dividing its card catalog and retaining only the subject-access portion. The argument is that author/title access can be provided by an item file of partial records containing nonsubject descriptive data, whereas the requirements for subject access involve still more data that are likely to change as subject descriptions do. However, if a manual card catalog for subjects were maintained, this would facilitate updates of subject headings, and at the same time permit the most efficient format and smallest set of machine-held item file records to be kept. Through the use of machine-held subject authority files, maintenance instructions and replacement heading cards could be computer-produced for update of the manual subject catalog. (Distribution of machine-readable subject headings is being considered by the Library of Congress MARC Development Office.) Reduction in the maintenance and use of a full manual author-title-subject catalog by library technical services departments could produce significant savings, aside from whatever direct improvements in access that machine files might provide. If the item file were noncomprehensive, or contained retrospective records only for those items that circulated, then author/title accesses would of course be limited to the contents of that file.
This would require maintenance of full manual catalogs for noncirculating items.

Two general alternatives to a comprehensive item file of full records come to mind. One is to utilize records as they are created by the cataloging process, complemented by partial-record conversion (conveniently, in-house) for only those retrospective items that circulate. Another is to create a special circulation-only item file of partial records. This kind of system would use an item file primarily as an alternative to machine-readable book cards. Absence and holdings querying would be supported, but not acquisitions or cataloging functions. A system like this, with an item file of partial records, may be the most reasonable answer for large research libraries (28). It should be able to give the same circulation services as an absence system, in addition to satisfying certain kinds of public catalog searches. The simplifications for data conversion, data management software, and terminals are worth special evaluation as a middle or simple approach to on-line circulation system development, with an item system design.

ITEM SYSTEM QUERYING, BIBLIOGRAPHIC DATA STRUCTURE, AND FILE ORGANIZATION

The querying capabilities featured by each item system differ somewhat, and are explained in part by differences in bibliographic data structure and file organization. The data and design of an information-providing system are fundamental to the kinds of services it can provide. A useful conceptual model is the traditional manual library system in which separate files are used for different functions: an in-process file for technical processing, a shelflist for the official holdings, a public catalog for reference, and a circulation file to control item absences.
Among these the file for circulation contains less bibliographical data than the others, since even a single data element such as the call number can uniquely identify a physical item and relate it to a fuller description, such as a shelflist record. A circulation file of this nature is in effect a manual absence file, and serves no major purpose other than circulation control. Were the processing requirements not impractical, circulation status could be more usefully recorded in public catalog records, in the manner of an item file.

The Eastern Illinois University system has an indexed sequential item file organized by item call number plus accession number. It may be queried by this key to get an exact match to a single record, or by a classification number to get a file scan of corresponding records. Query by user number displays charges to the user. The Ohio State system has a read-only item file that is randomized by item call number, but it may also be accessed by an author/title key that consists of the first four characters of the main entry plus the first five characters of the first significant word or words of the title. The second five characters can be blanked to provide author-only access. The file is also accessible by an item record number that is assigned sequentially to new records entering the file. The Bell Laboratories system provides access by item number to its item file, and uses a set of twelve query codes to obtain status and other factual information on users and items. User number and item number are the query keys. The item file is also used to produce a book catalog that gives the item numbers by which queries can be made. The item file of the Manned Spacecraft Center library is sequentially organized by item number, and can also be queried by call number and user number.
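As an illustration of a derived key of the Ohio State kind (first four characters of the main entry plus first five characters of the first significant title word), the following sketch builds such a key; the normalization rules and stopword list are assumptions for the example, not the system's documented algorithm:

```python
# Illustrative derivation of a "4,5" author/title search key, in the
# style of the Ohio State key described above. The stopword list and
# normalization below are assumptions made for this sketch.
STOPWORDS = {"a", "an", "the"}

def search_key(main_entry, title, author_only=False):
    # First four alphanumeric characters of the main entry.
    author_part = "".join(c for c in main_entry.upper() if c.isalnum())[:4]
    # First five characters of the first significant word of the title.
    words = [w for w in title.upper().split() if w.lower() not in STOPWORDS]
    first_word = "".join(c for c in words[0] if c.isalnum()) if words else ""
    # Blanking the title portion yields author-only access.
    title_part = "     " if author_only else first_word[:5].ljust(5)
    return author_part.ljust(4) + title_part

assert search_key("McGee, Rob", "The Types of Designs") == "MCGETYPES"
```

Blanking the five title characters collapses many title keys onto one author key, which is why an author-only query retrieves a set of matching records rather than a single exact match.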
These systems demonstrate alternatives for bibliographic data structure and file organization and access methods that are summarized by the author in a separate work (1) and explained by the references for each system (18,19,20,21,22,23). Briefly, the Eastern Illinois, Bell Laboratories, and Manned Spacecraft Center systems use a fixed-length item record structure, and charge data are written directly to item record fields that are defined for this purpose. The Ohio State system has a variable-length, read-only item record. Transaction data are recorded in an absence file, and linked to the item file.

In the Bell Laboratories system what is conceptually a single item file is actually two separately organized physical files with different record formats. Fixed-length book records are organized sequentially, and each contains fields for three loans and two reserves; all copies and volumes are represented. Journal records are organized by an indexed sequential method, and do not contain copy and volume data, which must be added at transaction time. In the Eastern Illinois and Manned Spacecraft Center systems the item file contains a separate record for each physical volume in the library. The Ohio State item file contains one record per title.

Although it is difficult to tell without detailed programming knowledge of these systems, the Bell Laboratories data structure seems to enable exact matches to single records for status queries (e.g., What is the status of title number ___? What is the status of copy ___?) in ways that the Eastern Illinois and Ohio State systems can only accomplish through a terminal operator's interpretation of a displayed set of matching records. The Bell Laboratories system can therefore conduct queries of this nature with keyboard/printer terminals, whereas the Eastern Illinois and Ohio State systems require CRT devices to display large amounts of information.
It can also ask what overnight loans are still out, possibly a function of its journal file's data structure.

The software implications of these various capabilities will not be discussed here. Suffice it to say that absence systems require simpler accesses and data management than do the kind of item systems we have discussed, and that as item files are designed to replace all or selected public catalog functions, their data management and user interface requirements become greater.

SPECIAL ASPECTS OF THE CHARGE FUNCTION

Two aspects of the charge function have special significance for on-line systems: patron self-charging, and a telephone and mail or delivery service. Among the on-line systems we surveyed, only the one at Northwestern University is reported to be self-charging. To have patron self-charging requires that charge transactions be simple and convenient. Data transfer methods that require little effort are therefore preferred, and the use of machine-readable user and item identifications seems to be the best current choice. The Northwestern system uses Hollerith-punched user badges and book cards. Other methods of data entry such as magnetic card reading and optical scanning are often mentioned for circulation control, but as of December 1971 we know of none that has resulted in a practical terminal-based system for on-line charging.

Two of the item systems promote a telephone and delivery service: the Bell Laboratories system and the Ohio State University system. In each system inquiries can be directed to operators, who may conduct on-line searches of library holdings and circulation information for specific items. The kinds of questions that can be asked are "Does the library have ___?" and "Is it charged?"
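The two-step operator lookup just described, a holdings check followed by a circulation-status check on the same key, can be sketched against an item system's files; the file contents and record layout here are invented for illustration:

```python
# Hypothetical sketch of the two telephone-inquiry questions above:
# "Does the library have ___?" is answered from a holdings file, and
# "Is it charged?" from an absence (charge) file keyed by the same
# call number. Both files and their contents are invented examples.
holdings = {"QA76.M3": "On-line circulation systems"}
absences = {"QA76.M3": {"user": "U1042", "due": "1972-09-15"}}

def answer_inquiry(call_no):
    # Holdings question first: is the item owned at all?
    if call_no not in holdings:
        return "not owned"
    # Absence question second: is the owned item charged out?
    charge = absences.get(call_no)
    if charge is None:
        return "on shelf"
    return f"charged, due {charge['due']}"

assert answer_inquiry("QA76.M3") == "charged, due 1972-09-15"
```

The sketch also shows why an absence system alone cannot answer the first question: the holdings file must carry a key (here the call number) by which the corresponding absence record can be reached.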
We noted earlier that a catalog can be made "remotely accessible" in several ways: e.g., by a special group that performs manual card catalog lookups for telephoned requests, or by users' consulting multiple copies of book or microform catalogs. In principle, a variety of catalogs and circulation systems can be used together in a telephone inquiry system of this nature. For example, the Library of the Georgia Institute of Technology has recently implemented an "extended catalog access" and delivery service that is based on microfiche copies of its catalog at thirty-six campus locations, coupled with telephone inquiry to a manual circulation system (31). Readers look up wanted items and telephone the library to request them. The manual circulation file is checked: available items are charged for delivery, or reserves may be placed for items that are already loaned.

Presumably, the currency of information and quickness of response times are better in an on-line circulation system than in any other type. An item system can furnish both holdings and absence information. An absence system needs to be coupled to another system to furnish holdings information: a requirement is that the holdings information must contain a key by which the corresponding absence records can be accessed. These are basic considerations in providing a telephone and delivery service.

SYSTEM BACKUP

The problems considered here derive from two conditions: unexpected system downtimes and scheduled periods when the system is not in operation. At these times a system cannot execute on-line tasks. Two classes of backup problems are: 1) provision of service to users during the downtimes; and 2) updating system files to record downtime transactions. The latter are termed recovery problems. One way to back up the query function is to print a list of circulating items periodically.
The frequency and ease of access (e.g., number of copies, their locations, telephone access to them) of such a list can pose substantial problems. An alternative to scheduled printings is an arrangement for quick printouts of a frequently copied backup tape on a redundant computer system.

The basic recovery problem is how to enter data into the system for transactions that took place during downtimes. Presumably, if unexpected downtimes are not inordinately long, discharges and other file updates may be postponed. This simplification is helpful, since transaction sequences among different kinds of updates can become quite complicated (e.g., discharges undo charges, and confusing the sequence causes problems). Although other kinds of system updates have their own special problems, the following paragraphs only briefly discuss the backup and recovery of charging activities.

Absence Systems

The provision of transaction evidence in off-line mode has already been suggested for absence systems that have the necessary terminal capabilities. Similarly, there are configurations which, in off-line mode, read user and item cards and produce machine-readable transaction records that can be read in during post-downtime recovery procedures. The Northwestern University system has a special backup terminal for this purpose. The provision of automatic recovery facilities is an attractive feature. Alternatively, multiple-part manual transaction records can be made for charges during downtimes. One part may serve as transaction evidence; the other can be used for manual input of recovery data when the system is up again. Exactly how this is done depends upon other details of the particular system.

Item Systems

Since inputs of user and item identification numbers are sufficient to record charges in item systems, the recovery problem can be simpler than for absence systems.
Typewriter-like terminals with card or paper tape punches or magnetic recorders can be used to create machine-readable recovery data. The requirements for transaction evidence may be crucial. Perhaps the solution to the worst case is the use of a two-part manual transaction form: one copy for transaction evidence, and the other, as above, for post-downtime recovery inputs.

We can summarize three hardware solutions for transaction backup in either system type: 1) total system redundancy, 2) backup at the terminal level, and 3) a backup facility between terminal and computer. The cost of full system redundancy makes it unlikely. A facility to log transactions during downtimes is more feasible; there are several choices. One such alternative is to record transaction data off-line in machine-readable form at each data collection point: e.g., to punch paper tape or cards. Another alternative is to record data from several terminals with a single device, such as a magnetic recorder, or a control unit that coordinates a multiterminal system. A third solution, a variation on the second, is a mini-computer which links terminals and handles telecommunications with a larger machine that holds system files. This approach has been taken by Bucknell University. It affords more comprehensive backup than merely capturing transaction data. Other functions, such as checks for user validity and reserved items, can be performed on a relatively reliable mini-computer dedicated to circulation.

CONCLUSION

On-line library catalogs are now a reality, but not yet for the exotic information retrieval work once popularly projected. Instead, relatively straightforward accesses by author, title, and call number are supporting circulation, reference, and technical processing functions.
The needs for better circulation systems and network processing of shared cataloging data have stimulated developments of large-scale operational (not experimental) systems around resident files of on-line bibliographic records. Developers have not waited for solutions to fundamental problems of automatic indexing and information retrieval; they have put large bibliographic files on-line and provided relatively simple, multiple access keys. The advances that have been made are in methods of physical access to bibliographic records, not in the intellectual or subject access to information. No new information is being retrieved, but familiar processes are being performed in better ways. Improvements in the ease and time of accessing library files have dramatically upgraded the library's responses for its own routine work and to the public in general. We are experiencing the first of a new generation of practical systems that perform traditional functions with on-line rather than manual files, with as much benefit as possible short of better subject access. The new systems are transcending the barriers to convenient use that have been imposed by the size, complexities, and awkwardness of large manual systems.

Historically, it has been impractical to add circulation information to each record in the public catalog for an item. With on-line files of single records per item this is now possible. State-of-the-art computing affords multiple access keys to a record, instead of duplicating it for additional entries as in manual catalogs. How many and which keys are furnished largely determines the extent to which an on-line catalog can replace a traditional one. Difficult cost and technical problems explain the current approaches. Full requirements of a public catalog have been avoided; simpler files have been built to handle explicit processing functions. The advantages are simplified records and fewer access points.
Full bibliographic records are variable length, often large, and sometimes eccentric, and therefore relatively expensive to handle in machine form. In principle the overhead for access is the same as for manual files: the more entries that are provided, the greater the storage, processing, and cost. Systems with simpler files than the public catalog have therefore been built. There have been no machine equivalents of large library catalogs; so we have studied manual ones to theorize ideal characteristics. In some cases this model may have supplied a misleading bias. Studies of the new on-line systems at work could possibly revise our notions of what is needed.

The kinds of systems now emerging are answers for the foreseeable future. The tradition of separately organizing and managing public and technical services will be challenged by the integrated systems. Their centralized files, data handling, and access methods transcend functional boundaries which grew between library tasks that used different but redundant manual files and evolved separate units and procedures to accomplish virtually the same basic data processing functions. The profession has yet to widely appreciate the new overview and managerial changes that are invited. Reaction to them may be projected as a fourth and perhaps painful trend. Insofar as no fully integrated systems have yet been developed, it is likely that as they emerge they will force substantial changes to traditional patterns of library organization and management.

ACKNOWLEDGMENTS

This work was supported by the University of Chicago Library Systems Development Office under CLR/NEH Grant No. E0-262-70-4658 from the Council on Library Resources and the National Endowment for the Humanities, for the development and operational testing of a library data management system.

REFERENCES

1.
Rob McGee, A Literature Survey of Operational and Emerging On-Line Library Circulation Systems (University of Chicago Library Systems Development Office, Feb. 1972). Available as ERIC/CLIS ED 059 752. MF $0.65, HC $3.29.
2. ____, "Key Factors of Circulation System Analysis and Design," College and Research Libraries 33:127-140 (Mar. 1972).
3. H. K. G. Bearman, "Library Computerisation in West Sussex," Program: News of Computers in British Libraries 2:53-58 (July 1968).
4. ____, "West Sussex County Library Computer Book Issuing System," Assistant Librarian 61:200-202 (Sept. 1968).
5. Richard T. Kimber, "An Operational Computerised Circulation System with On-Line Interrogation Capability," Program: News of Computers in British Libraries 2:75-80 (Oct. 1968).
6. Homer V. Ruby, "Computerized Circulation at Illinois State Library," Illinois Libraries 50:159-162 (Feb. 1968).
7. Robert E. Hamilton, "The Illinois State Library 'On-Line' Circulation Control System," in: Proceedings of the 1968 Clinic on Library Applications of Data Processing (Urbana, Ill.: University of Illinois Graduate School of Library Science, 1969) p. 11-28.
8. IBM Corp., On-Line Library Circulation Control System, Moffet Library, Midwestern University, Wichita Falls, Texas. Application brief K-20-0271-0. (White Plains, N.Y.: IBM Corp., Data Processing Div., 1968) 14 p.
9. Calvin J. Boyer and Jack Frost, "On-Line Circulation Control: Midwestern University Library's System Using an IBM 1401 Computer in a 'Time-Sharing' Mode," in: Proceedings of the 1969 Clinic on Library Applications of Data Processing (Urbana, Ill.: University of Illinois Graduate School of Library Science, 1970) p. 135-145.
10. Charles D. Reineke and Calvin J. Boyer, "Automated Circulation System at Midwestern University," ALA Bulletin 63:1249-1254 (Oct. 1969).
11.
Belfast, Queen's University, School of Library Studies, Study Group on the Library Applications of Computers, First Report of the Working Party (Belfast University, July 1965) 18 p.
12. Richard T. Kimber, "Studies at the Queen's University of Belfast on Real-Time Computer Control of Book Circulation," Journal of Documentation 22:116-122 (June 1966).
13. ____, "Conversational Circulation," Libri 17:131-141 (1967).
14. ____, "The Cost of an On-Line Circulation System," Program: News of Computers in British Libraries 2:81-94 (Oct. 1968).
15. Ann H. Boyd and Philip E. J. Walden, "A Simplified On-Line Circulation System," Program: News of Computers in Libraries 3:47-65 (July 1969).
16. Velma Veneziano and Joseph T. Paulukonis, "An On-Line, Real-Time Circulation System." [This documentation of the Northwestern University library system was made specially available to the author. A later version with the same title appears in LARC Reports 3:7-48 (Winter 1970-71).]
17. H. Rivoire and M. Smith, Library Systems Automation Reports 1971-A-2, Bucknell Library On-Line Circulation System (BLOCS). Ellen Clarke Bertrand Library (15 Mar. 1971) 19 p.
18. R. A. Kennedy, "Bell Laboratories' Library Real-Time Loan System (BELLREL)," JOLA 1:128-146 (June 1968).
19. ____, "Bell Laboratories' On-Line Circulation Control System: One Year's Experience," in: Proceedings of the 1969 Clinic on Library Applications of Data Processing (Urbana, Ill.: University of Illinois Graduate School of Library Science, 1970) p. 14-30.
20. Paladugu V. Rao and B. Joseph Szerenyi, "Booth Library On-Line Circulation System (BLOC)," JOLA 4:86-102 (June 1971).
21. Richard H. Stanwood, "Monograph and Serial Circulation Control," a paper for the International Congress of Documentation, Buenos Aires, Sept. 21-24, 1970. National Council for Scientific and Technical Research, Buenos Aires (1970) 23 p.
22.
IBM Corp., Data Processing Division, Functional Specifications: A Circulation System for the Ohio State University Libraries, Gaithersburg, Maryland (November 26, 1969) various paginations. [This and other technical documentation were made specially available to the author. This is now available through ERIC/CLIS as: On-Line Remote Catalog Access and Circulation Control System. Part I: Functional Specifications. Part II: User's Manual. November 1969. 151 p. ED 050 792. MF $0.65, HC $4.00.]
23. Edward E. Shumilak, An Online Interactive Book-Library-Management System. NASA Technical Note NASA TN D-7052. National Aeronautics and Space Administration, Washington, D.C. (March 1971) 40 p. [This document is available through the National Technical Information Service under Document Number N71-20526.]
24. University of Chicago Library, A Proposal for the Development and Operational Testing of a Library Data Management System, Herman H. Fussler and Fred H. Harris, principal investigators. (Chicago, Ill.: 1970) 44 p.
25. Richard De Gennaro, "The Development and Administration of Automated Systems in Academic Libraries," JOLA 1:75-91 (Mar. 1968).
26. Ellen W. Miller and B. J. Hodges, "Shawnee Mission's On-Line Cataloging System," JOLA 4:13-26 (Mar. 1971).
27. Simon P. J. Chen, "On-Line and Real-Time Cataloging," American Libraries 3:117-119 (Feb. 1972).
28. University of Chicago Library, Development of an Integrated, Computer-Based, Bibliographical Data System for a Large University Library, Annual Report 1967/68. By Herman H. Fussler and Charles T. Payne. University of Chicago Library, Chicago, Illinois (1968) 17 p. + appendixes.
29. Frederick G. Kilgour, letter to the author, 23 November 1971.
30. Ben-Ami Lipetz, User Requirements in Identifying Desired Works in a Large Library, Final Report, Grant No. SAR/OEG-1-71071140-4427, U.S.
Department of Health, Education, and Welfare, Office of Education, Bureau of Research. (New Haven, Conn.: Yale University Library, June 1970) 73 p. + appendixes.
31. "Library Extends Catalog Access and New Delivery Service," [4 p.] a brochure issued by Price Gilbert Memorial Library, Georgia Institute of Technology, Atlanta, Georgia, 1972.

BOOK REVIEWS

The Proceedings of the International Conference on Training for Information Work, Rome, Italy, 15th-19th November 1971, edited by Georgette Lubock. Joint publication of the Italian National Information Institute, Rome, and the International Federation for Documentation, The Hague; F.I.D. publ. 486; Sept. 1972, Rome, 510 p.

Let's face it: there is something about any Proceedings that elicits a very personal reaction in many of us: "Here are papers that either, a) got their authors a trip to the conference city; b) tell how we did good at our place; or c) unabashedly present H.B.I.'s (half-baked ideas)." I personally like Proceedings that have many papers under category c); such papers make me think (or laugh). The great majority of papers in these Rome Proceedings fall basically under category b), i.e., 'how we done it good,' and some quite obviously under a), i.e., 'have paper will travel'; well, it was Rome, Italy, after all. However, there is a smattering of papers that fall under c), i.e., H.B.I.'s. So for those interested in the topic, these Proceedings offer among other things some food for speculative thought.

For these other things let us start at the beginning. The contents consists of prefatory sections, one opening address, sixty-six papers, a set of twenty brief conclusions, three closing addresses, a summary of work at the conference, an author index, and a list of participants and authors' addresses.
The papers are organized according to two major sessions: one on "Training of Information Specialists" (nine invited and forty-two submitted papers) and another on "Training of Information Users" (six invited and nine submitted papers). The larger number of papers on training of specialists vs. training of users probably represents a good assessment of real education interests in the field.

The conference was truly international: authors came from four continents, twenty countries, and four international organizations. Most represented were: Italy as host country with fifteen papers, the USA with eight, Great Britain with seven, and France with six papers.

The concern for information science education is indeed worldwide; however, if the presented papers are any measure, such education is in big trouble, because one is left with the impression that information science education is in some kind of limbo: the bases, relations, and directions are muddled or nonexistent. But then isn't all contemporary higher education in big trouble, and in limbo?

The conceptions of what information science education is all about differ so widely from paper to paper that the question of this difference in itself could be a subject of the next conference. It is my impression that the differences are due to a) widely disparate preconceptions of the nature of "information problems," and b) incompetence of a number of authors in relation to the subjects. Accomplishments in some other field or, even worse, a high administrative title does not necessarily make for competence in information science education.

The Proceedings offer a fascinating picture of information science education by countries and by various facets. It also offers frustration due to unbelievably unhygienic semantic conditions in the treatment of concepts, including a confusion from the outset of "training" and "education."
The first business of the field should be toward clearing its own semantic pollution; such a conclusion can be derived even after a most cursory examination of the papers.

My own choices for the three most interesting papers are:

-V. Slamecka and P. Zunde, "Science and Information: Some Implications for the Education of Scientists" (USA);
-S. J. Malan, "The Implications for South African Education in Library Science in the Light of Developments in Information Science" (South Africa);
-W. Kunz and H. W. J. Rittel, "An Educational System for the Information Sciences" (Germany).

The editing of the Proceedings is exemplary; the editors and conference organizers worked hard and conscientiously. The Proceedings also provide the best single source published so far from which one could gain a wide international overview not only of information science education but also of information science itself, including implicitly the problems the field faces. In this lies the main worth of the Proceedings.

Tefko Saracevic

Computer Processing of Library Files at Durham University; An Ordering and Cataloging Facility for a Small Collection Using an IBM 360/67 Machine. By R. N. Oddy. Durham, England: University Library, 1971. 202 p. £1.75.

The task of the book is to guide the reader in the use of the LFP (Library File Processing) System developed by the Durham University library. The LFP System orders items and prints book catalogs in various sequences for a small collection of items with the aid of an electronic digital computer. The system is batch with card input and printed output; the programs are written in PL/1. "The LFP System was designed to be flexible and easy to operate for small files, and is less suitable for files larger than 10,000 items because there are then other problems which it does not attempt to solve." (p. 10).
The book fulfills its assigned task well; it is an excellent example of explanations and instructions for the personnel charged with the day-to-day operations of the particular system described. The book includes excellent introductory chapters on job control language, how computers operate, file maintenance, etc. Outside of the Durham University library, however, the book has little use except as a model of a well done operations guide.

Kenneth J. Bierman

CONTENT DESIGNATORS FOR MACHINE-READABLE RECORDS: A WORKING PAPER

Henriette D. AVRAM and Kay D. GUILES: MARC Development Office, Library of Congress, Washington, D.C.

Under the auspices of the International Federation of Library Associations' Committees on Cataloging and Mechanization, an International Working Group on Content Designators was formed to attempt to resolve the differences in the content designators assigned by national agencies to their machine-readable bibliographic records. The members of the IFLA Working Group are: Henriette D. Avram, Chairman, MARC Development Office, Library of Congress; Kay D. Guiles, Secretary, MARC Development Office, Library of Congress; Edwin Buchinski, Research and Planning Branch, National Library of Canada; Marc Chauveinc, Bibliothèque Interuniversitaire de Grenoble, Section Science, Domaine Universitaire, France; Richard Coward, British Library Planning Secretariat, Department of Education & Science, United Kingdom; R. Erezepky, Deutsche Bibliothek, German Federal Republic; F. Poncet, Bibliothèque Nationale, Paris, France; Mogens Weitemeyer, Det Kongelige Bibliotek, Denmark. All working papers emanating from the IFLA Working Group will be submitted to the International Standards Organization Technical Committee 46, Subcommittee 4, Working Group on Content Designators.
Prior to any attempt to standardize the content designators for the international exchange of bibliographic data in machine-readable form, it is necessary to agree on certain basic points from which all future work will be derived. This first working paper is a statement of: 1) the obstacles that presently exist which prevent the effective international interchange of bibliographic data in machine-readable form; 2) the scope of concern for the IFLA Working Group; and 3) the definition of terms included in the broader term "content designators."

If an international standard format can be derived, it would greatly facilitate the use in this country of machine-readable bibliographic records issued by other national agencies. It should also contribute significantly to the expansion of MARC to other languages by the Library of Congress.

208 Journal of Library Automation Vol. 5/4 December, 1972

At present, the assignment of content designators of most national systems is so varied that tailor-made programs must be written to translate each agency's records into the United States MARC format. The international communications format might become the common denominator between all countries, each national system maintaining its own national version.

INTRODUCTION

The International Organization for Standardization standard for bibliographic information interchange on magnetic tape (1) has recently been adopted, following on the adoption of the American National Standard (2). These events, along with the implementation of the United States and the United Kingdom MARC projects and similar projects in other countries, have emphasized the importance of the international exchange of bibliographic data in machine-readable form. There are many problems to be resolved before we can approach a truly universal bibliographic system. Many of these have been described in an article by Dr. Franz Kaltwasser (3).
Basic to the exchange of bibliographic data is the requirement for an interchange format which can be used to transmit records representing the bibliographic descriptions of different forms of material (such as records for books, serials, and films) and related records (such as authority records for authors and for subject terms). A format for machine-readable bibliographic records is composed of the following three elements:

1. The structure of the record, which is the physical representation of the information on the machine-readable medium.
2. The content designators (tags, indicators, and data element identifiers (4)) for the record, which are means of identifying data elements or providing additional information about a data element.
3. The content of the record, which is the data itself, i.e., the author's name, title, etc.

OBSTACLES

The structure of the record, as described in ANSI Z39.2-1971 and in the ISO standard on bibliographic information interchange on magnetic tape, has been fairly well accepted by the international bibliographic community. However, events have shown that as the different agencies examine their requirements and establish the content of their machine-readable records, the content and the content designators so established are not the same across all systems. This lack of uniformity is the result of at least four principal factors:

1. The different functions performed by various bibliographic agencies.

Content Designators / AVRAM and GUILES 209

Bibliographic services are provided by many types of organizations issuing a variety of products. These products are dissimilar because the uses made of them vary, reflecting dissimilarities in the principal functions of the agencies involved. The main products of some of the different bibliographic services are briefly described as follows:

Catalogs serve to index the collections of individual libraries by author, title, subject, and series.
To enable a user to find a physical volume rather than merely a bibliographic reference, catalogs also provide a location code. A unique form of entry for each name or topical heading used as an access point is maintained by means of authority files. The various access points serve to bring together works by the same author, works with the same title, works on the same subject, and works within the same series. A unique bibliographic description of each item makes it possible to distinguish between different works with the same title, and different editions of the same work.

National bibliographies provide an awareness service for those items published within a country during a given time period. A national bibliography is not a catalog, since it is not based on or limited to any single collection, nor is it concerned with providing access to the physical item itself.

Abstracting and indexing services are principally concerned with indexing technical report literature and individual articles from journals and composite works. Because these services generally index more specialized materials and are aimed at the specialist in a particular discipline, more complete indexing by means of a relatively large number of very specific subject terms is the rule. Like the national bibliography, the abstracting and indexing service is not concerned with a single collection or, in most cases, with providing access to the item on the shelf.

2. The lack of internationally accepted cataloging practices.

The Paris Conference of 1961, which resulted in the Paris Principles, set the framework for an international cataloging code. Following the conference, progress in standardization was evident in the work begun on the formulation of cataloging codes embodying, in varying degrees, the Paris Principles. One such code is the Anglo-American Cataloging Rules (AACR) (5).
However, we are concerned with the present, and the differences that exist in the cataloging codes of various countries do create differences in the content that may affect content designation of machine-readable bibliographic records.

The differences between cataloging rules practiced in the library community and in the information community (6) are even more prominent. In the United States, these differences are clearly seen in a comparison between AACR and the COSATI rules (7). Even more significant is the fact that in preparing entries for abstracting and indexing services, it is common practice to use a name as it appears on the document, without attempting to distinguish it from names of other persons so as to bring together the works of a single author. In addition, cataloging practice in the information community often requires inclusion of data elements that are not used in the library community (e.g., organizational affiliation). It is obvious that these differences in practice are serious obstacles to achieving agreement on details of content designation for machine-readable records used in each environment.

3. Lack of agreement on organization of data content in machine-readable records in different bibliographic communities.

Bibliographic data can be organized in machine-readable form in many different ways. For example, one approach could be the grouping of data elements by bibliographic function, such as main entry, title, etc.; another approach could be the grouping together of information by type, such as all personal names, all corporate names, etc. There are pros and cons associated with each of these groupings. This difference in organization exists in some instances between the library community and the information community. For the present discussion, it is not appropriate to analyze the relative merits between the two points of view.
It must be emphasized, however, that there is no optimum organization, and that a variety of users will use the data in a variety of ways. It is certainly true that any given system can define, upon agreement of its members, a particular use to be made of the data exchanged and, in this case, perhaps an optimum data organization can be defined ("perhaps" is used because hardware is another variable that comes into play).

4. Lack of agreement as to the functions of content designators.

There is a lack of agreement as to the functions of content designators, as well as a misunderstanding, in some instances, of the rationale for the assignment of certain of them to specific data elements. The lack of agreement as to the functions of content designators is clearly seen when one examines the use of the data element identifiers in the different national formats. For example, in some cases the data element identifier is assigned to the data element according to its value in a collation sequence (e.g., a is smaller than b, b is smaller than c). The result is a prescribed order, from the smallest value to the largest, for selecting the data elements to build a sort key for file arrangement. In other systems, the data element identifier assigned to a data element is for the unique identification of that data element. There is no prescribed ordering built into the data element identifiers; the identification of the data elements allows them to be selected according to the requirements of the user to build a sort key for file arrangement. Data element identifiers in some cases are tag dependent, i.e., they identify the same data elements consistently when used with a particular tag and data field, regardless of the combination of data elements present in the data field for any particular record. In other cases, the data element identifiers are tag, indicator, and data dependent, i.e.
, the meaning of the data element identifiers changes and the data element identifiers are assigned to different data elements, depending upon the combination of data elements occurring in a data field for a particular record.

SCOPE

The scope of responsibility for the IFLA Working Group is to investigate the present assignment of content designators for the purpose of determining those areas in which there is uniformity of assignment and those areas in which there is not uniformity. Once this has been done, the Working Group's next task is to explore how best these differences can be accommodated so as to arrive at a standard for the international interchange of bibliographic data. Within that scope, the Working Group will first be concerned with the requirements for the international library community, i.e., libraries and national bibliographies. The magnitude of this assignment is such that it appears unwise to impose the additional problems of the needs of the information community concurrently. If the attempt is made to do so, and the result of the effort is failure, it will not be clear whether we failed because the task was too difficult or whether it is not possible to merge two communities with significant variation throughout their systems. On the other hand, if only the library community is approached at this time, the result of the effort can be success; but if the result is failure, at least one factor will be clear, if only in a negative sense: there will be no lingering question as to whether the attempt might have succeeded had the problems of only one community been addressed at one time.
In summary, it may be stated that our attempt to standardize content designators within the library community will be complicated by: 1) the lack of an international cataloging code; 2) the dissimilarities in the products of various agencies created by the different functions performed by those agencies; and 3) the lack of agreement on the functions of the content designators themselves. The lack of agreement on an international cataloging code will have an impact on our work, but is an area which is out of scope for the Working Group, and therefore can be considered a variable over which there is no control. The dissimilarities in the functions of the different bibliographic services are also a given. However, since it was possible to work around these differences in the formulation of the International Standard Bibliographic Description, it may be possible to do so for the standardization of content designators. Therefore, within the two variables given above, our emphasis should be placed on attempting to resolve the lack of agreement on the functions of content designators, and then we can proceed to attempt to standardize the assignment of tags, indicators, and data element identifiers.

The present paper concentrates on the substance of the problem, namely, a statement of the definition of tags, indicators, and data element identifiers and their functions, i.e., the information they are intended to provide to a system processing bibliographic data.

The concept of a SUPERMARC has been discussed in the literature (8, 9) as an international system for exchange, leaving the various national systems as they now exist. Each country would have an agency that would translate its own machine-readable record into that of the SUPERMARC system; likewise, each agency would translate the SUPERMARC record from national bibliographic systems into its own format for processing within the country concerned.
At the international level, there would be only one record format. This concept has the theoretical advantage of eliminating the difficulties inherent in seeking agreement internationally. However, what has not been addressed is the problem inherent in this concept, namely, the problem associated with any switching language. This may be illustrated in the following manner. Consider the case of a national agency (called System 1) whose format is not detailed in regard to content and/or content designation. When System 1 translates to SUPERMARC, the result will be a SUPERMARC record, but it will be a SUPERMARC record still only defined at the level of detail of the limited record of System 1. This will be true regardless of the level of detail at which SUPERMARC is originally defined. Likewise, when a national agency (called System 2) accepts records from System 1 via SUPERMARC and translates the SUPERMARC records into its own format, the resulting records will be the limited records of System 1, regardless of the detail of System 2's local records. This may be schematically represented as follows:

System 1 (little detail) -> SUPERMARC (great detail) = no more detail than System 1
SUPERMARC record from System 1 -> System 2 (great detail) = no more detail than SUPERMARC record from System 1

The result of this analysis suggests that systems with formats of less detail than that of SUPERMARC must permanently upgrade their national formats to the level of detail of SUPERMARC, while systems with formats more detailed than SUPERMARC must be prepared to accept the fact that records from other countries will probably require significant modification. Therefore, although national variation is allowed in a SUPERMARC system, the international community still faces all the problems of international agreement, i.e., arriving at an acceptable level of content designation for SUPERMARC.
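The switching-language limitation described above can be reduced to a few lines. In this hypothetical sketch (the element names and detail levels are invented; SUPERMARC was never specified at this granularity), a record is modeled simply as the set of data elements its format distinguishes, and translation as set intersection, which can preserve but never manufacture detail:

```python
# Hypothetical element sets; SUPERMARC itself was never defined in these terms.
SUPERMARC_ELEMENTS = {"author", "title", "place", "publisher", "date", "printer"}

def translate(record, target_elements):
    """Translation keeps only elements both formats distinguish;
    it cannot add detail the source record lacks."""
    return record & target_elements

# System 1 keeps little detail; System 2's format keeps much more.
system1_record = {"author", "title", "date"}
system2_elements = SUPERMARC_ELEMENTS | {"series", "edition"}

# System 1 -> SUPERMARC -> System 2
via_supermarc = translate(translate(system1_record, SUPERMARC_ELEMENTS),
                          system2_elements)

# The record System 2 receives is no more detailed than System 1's original,
# no matter how detailed SUPERMARC or System 2 may be.
assert via_supermarc == system1_record
```

The sketch makes the paper's point concrete: detail flows through a switching format monotonically downward, so agreement on an acceptable level of detail cannot be avoided.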
CONTENT DESIGNATORS

Bibliographic records in machine-readable form permit the manipulation of data and allow greater flexibility for the creation of a variety of products. The full potential of machine-readable files has not been exploited to date, but based on experience and the projection of this experience into the future, it may be said that the variety of uses of machine-readable cataloging data will be limited only by the imagination of the user. Among the possible products are printed catalog cards, book catalogs, special bibliographies, special indexes, book preparation materials, CRT display of cataloging information, management statistics (analysis of data by type of material, subject, language, date, or other parameters), etc. All of the above are possible in a wide variety of output formats.

In order to produce these various tools, there are four basic operations (10) which are performed on the data.

1. Store-the storage operation is the internal (to the computer) management of the data, i.e., how files are organized, the type of accessing technique(s) used, and the data elements (e.g., author, title) selected as keys to the complete bibliographic record.
2. Retrieve-the retrieval operation is used here in its broadest sense, to cover the following kinds of retrieval: the retrieval of a single element from a record; the retrieval of a known item, such as the selection of a record by unique number or author and title; the retrieval of a category of records, such as those for all French language monographs on a particular subject with an imprint date of 1968 or later; the retrieval of all bibliographic records for a particular form of material, e.g., serials. (The latter retrieval capability allows segmentation of files not only for display purposes but also for the implementation of certain file organization techniques.)
3.
Arrange-the arrange operation puts information in a sequence that is most useful for the user of the product, i.e., an alphabetic sequence or a systematic arrangement.
4. Display-the display operation as used in this context implies formatting, the purpose of the operation being to make the information human-readable, e.g., display on a CRT, computer printout, and photocomposed output.

For example, to display a particular catalog record on a CRT device, the record must be retrieved from the data base by a known number or other means of access and formatted for display; or, to prepare a special bibliography, all records satisfying a particular search argument are retrieved from the data base, arranged in some predefined order, formatted, and printed. The storage operation is implicit in the examples.

In order to perform these four basic operations through machine manipulation, content designators are assigned to the data content of the record. Therefore, it may be stated that the function of content designators is to provide the means for the user to store, retrieve, arrange, and display information in a variety of ways to suit his needs.

There are three types of content designators currently in use: tags, indicators, and data element identifiers. For the purposes of standardization, agreement must not only be reached on the definition of those three elements but also on other basic issues. The definitions for the elements are given below, as well as a general discussion of some of the decisions that must be made concerning each of the elements, prior to attempting to achieve standardization.

1. A tag is a series of characters used to identify or name the main content of an associated data field (11). The designation of main content does not require that a data field contain all possible data elements (units of information) all the time.
For example, the imprint may be defined as a data field containing the data elements place, publisher, date of publication, printer, and address of printer. The tag for the data field called imprint would be the same if only a partial set of the data elements existed for any single occurrence of the data field in a bibliographic record. Should the method of assigning tags be simply to assign a unique series of characters to a data field, whereby the characters have no meaning other than to name the main content of the data field? Or is it desirable to give values to the characters making up the tag? In the latter case, a tag may identify a data field both by function and type of entry, thus allowing greater flexibility in internal organization of the data as well as its formatting for output.

2. An indicator is a character associated with a tag to supply additional information about the data field or parameters for the processing of the data field. Indicators are tag dependent because they provide both descriptive and processing information about a data field. Should alphabetic characters as well as numeric characters be assigned to indicators? Should the character b (blank) always mean a null condition and the character 0 (zero) have a value or a meaning? Should indicators with the same values and meanings be used for different data fields and their associated tags where the situation warrants this equality? For example, a personal name may be a main entry, an added entry, or a subject entry. If it is deemed desirable to further describe the type of personal name, such as forename, single surname, multiple surname, or name of family, the indicators set for each of the data fields mentioned above would have the same value and the same meaning. This technique has the advantage of simplifying machine coding for the processing of different functional fields containing the same types of entries.

3.
A data element identifier is a code consisting of one or more characters used to identify individual data elements within a data field. The data element identifier precedes the data element which it identifies (12). Should data element identifiers be given a value, i.e., a file arrangement value, other than the identification of the data element? Should data element identifiers be tag dependent only, or tag, indicator, and data dependent? Should the same data element identifiers be assigned, so far as is possible, to the same data element regardless of the field in which the data element occurs? Should data element identifiers be restricted to alphabetic characters, or should they be expanded to allow the use of numerics and symbols?

The assignment of a filing value to a data element identifier is intended to minimize the effort required to create software for filing. However, assigning filing values to data element identifiers results in identifiers that are tag, indicator, and data dependent. On the other hand, without assigning filing values to the data element identifiers and using computer filing algorithms, the system can avoid data dependent codes, thus ensuring maximum consistency across all fields. For example, the use of the same data element identifier assigned to a title wherever a title appears in the record allows the flexibility of selecting all titles by data element identifier. Furthermore, tag, indicator, and data dependent data element identifiers create additional complexity in the editing procedure (13).

Although fixed fields are not content designators, they do take on similar characteristics as to function, i.e., to provide the means for the user to store, retrieve, arrange, and display information in a variety of ways to suit his needs. Therefore, they should be considered by the Working Group along with the content designators.
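The three kinds of content designators defined above can be pictured with a small, hypothetical data structure. The tag number, indicator value, identifier letters, and the name data below are illustrative inventions, not a proposal; the `element` helper shows the selection-by-identifier consistency argued for in the discussion of filing values:

```python
from dataclasses import dataclass

@dataclass
class DataField:
    tag: str          # names the main content of the field, e.g. "100"
    indicators: str   # tag-dependent codes describing the field
    subfields: list   # (data element identifier, data element) pairs

# A hypothetical personal-name main entry: indicator "1" might mean
# "single surname"; identifier "a" marks the name, "d" the dates.
field = DataField(
    tag="100",
    indicators="1 ",
    subfields=[("a", "Doe, Jane"), ("d", "1920-1984")],
)

def element(field, identifier):
    """Select a data element by its identifier alone, independent of the
    tag or of which other elements happen to be present in the field."""
    for code, value in field.subfields:
        if code == identifier:
            return value
    return ""
```

Because the identifiers here carry no filing value, the same helper works for any field and any combination of present elements, which is precisely the consistency that data-dependent identifiers sacrifice.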
A fixed field is one in which every occurrence of the field has a length of the same fixed value, regardless of changes in the contents of the field from occurrence to occurrence. The contents of the fixed field can actually be data content, e.g., date of imprint; or a code representing data content, e.g., type of illustration; or a code representing information about the record, e.g., language of the record; or data concerned with the processing of the record, e.g., date entered on file. Here again, certain basic issues must be resolved. Should the character b (blank) be used to signify a null condition, e.g., in a record without any type of illustration, would b (blank) be used? Should the codes that represent more than two possible conditions be alphabetic or numeric? Should the characters 1 (one) and 0 (zero) be used to indicate an on-off condition, e.g., a book contains an index to its own contents (1) or it does not (0)?

It is important to keep in mind the eventual necessity of correlating the content designators and fixed fields for all the formats defined for different forms of material (books, serials, maps, films, music, etc.). By adhering as much as possible to the same content designators and fixed fields, the processing of different forms of material will be facilitated in terms of the software required to perform a particular process and to combine all forms of material in a single product, such as a book catalog.

REFERENCES

1. International Organization for Standardization. Bibliographic Information Interchange-Format for Magnetic Tape Recording. Draft international standard ISO/DIS 2709. Technical Committee ISO/TC 46 Secretariat (Germany), 1972.
2. American National Standards Institute. American National Standard for Bibliographic Information Interchange on Magnetic Tape. ANSI Z39.2-1971. New York: American National Standards Institute, 1971.
3.
Franz Georg Kaltwasser, "The Quest for Universal Bibliographical Control," Unesco Bulletin for Libraries, 25:252-259 (Sept./Oct. 1971).
4. Data element identifiers have more commonly been referred to as subfield codes.
5. Anglo-American Cataloging Rules. Prepared by the American Library Association ... North American Text. Chicago: American Library Association, 1967.
6. The term bibliographic services has been used to include all agencies concerned with bibliographic products. For this paper such agencies have been further subdivided into two communities: the library community, defined as including libraries and national bibliographies; and the information community, defined as including the abstracting and indexing services. This broad definition has been used for the sake of simplicity.
7. Committee on Scientific and Technical Information. Standard for Descriptive Cataloging of Government Scientific and Technical Reports. Washington: Committee on Scientific and Technical Information, 1966.
8. R. E. Coward, "MARC: National and International Cooperation," in International Seminar on the MARC Format and the Exchange of Bibliographic Data in Machine-Readable Form, Berlin, 1971: The Exchange of Bibliographic Data and the MARC Format (München-Pullach, 1972), 17-23.
9. Roderick M. Duchesne, "MARC and SUPERMARC," in International Seminar on the MARC Format ..., p. 37-56.
10. These basic operations are not used in this context to mean basic machine operations such as add, subtract, multiply, and divide.
11. A data field is a variable length field containing bibliographic or other data not intended to supply parameters to the processing of the bibliographic record, i.e., content data only.
12. There are in existence formats in which the data element identifier is a single character, i.e., a delimiter. In this case, there is no explicit identification function built into the data element identifier.
If, in the particular data field, the data elements are all of the same type, such as a multiname data field, then the meaning of the delimiter is implicit.
13. Editing is used in this context to mean the human or machine assignment of content designators.

REGIONAL NUMERICAL UNION CATALOG ON COMPUTER OUTPUT MICROFICHE

William E. McGRATH: Director of Libraries; and Donald SIMON: Systems Analyst and Computer Programmer, University of Southwestern Louisiana Library, Lafayette, Louisiana.

A union catalog of 1,100,000 books in twenty-one Louisiana libraries on computer output microfiche (COM) is described. The catalog, called LNR for Louisiana Numerical Register, consists not of bibliographic information, but primarily of the LC card number and letter codes for the libraries holding the book. The computer programs, the data bank, and output are described. The programs provide the capability for listing over two million entries. Also described are the statistical tabulations which are a by-product of the system and which provide a rich source for analysis.

Twenty-one Louisiana libraries have produced on Computer Output Microfiche (COM) a union catalog containing locations for 1,100,000 books. About 150,000 of these are current acquisitions (books acquired in the last two years); the rest are volumes in the retrospective collections of ten of the twenty-one libraries. The Numerical Register of Books in Louisiana Libraries, as the catalog is now entitled, is the second step toward what is hoped will be a comprehensive current and retrospective list of over two million volumes, the estimated holdings of the participating libraries. The first was a conventionally printed Register of 550,000 books, issued in 1971 and distributed to fifty Louisiana libraries.

The new Register is not a bibliography. It includes no bibliographic information.
It is a location device for books whose bibliographic information is already known and includes nothing that is not also listed by the Library of Congress. The title was deliberately chosen to distinguish it from an older bibliographic Louisiana Union Catalog. All books listed in the Register are those having a Library of Congress (LC) card number; indeed the LC card number is the entry. The term "numerical" was chosen because we anticipate using other numbers besides the LC number, e.g., the Mansell number and the International Standard Book Number (ISBN). The LC card number is the most widely used book number we now have. This fact is put to good use by the Library of Congress in its own NUC-Register of Additional Locations. There are other LC number indexes, but they are not union catalogs. (The Mansell number, of course, will be very useful when publication of the NUC-Pre-1956 Imprints is complete.) Many more titles can be represented on a page by number codes than by complete bibliographic data, at a ratio of perhaps 600 to 9. Unit costs are, therefore, much less. The first edition (1971) containing 550,000 volumes was produced for an estimated total cost of $22,600 ($8,600 grant plus $14,000 absorbed). One hundred copies of the Register were printed in hard copy form with approximate overall unit costs for keypunching, computer, travel, salaries, and printing, as follows:

                        In terms of actual            In terms of total funds,
                        expenditures (grant funds)    expended plus absorbed
    Per title entry     2.5¢                          6.0¢
    Per volume entry    1.6¢                          3.8¢

The second edition (November 1972) contains over 1,100,000 volumes and, in terms of the second grant, was produced on Computer Output Microfiche for an estimated total cost of $31,200, i.e., $10,000 grant plus $21,200 absorbed. (Reproduction costs for the COM are negligible.
For an original copy of 5 fiche, containing all 1,100,000 volumes, we were charged $25 by a commercial firm, and for extra copies, $3 each. Copies for distribution will be sold at a slightly higher price.) Unit costs for the COM edition are:

                        In terms of actual            In terms of total funds,
                        expenditures (second          second grant expenditures
                        grant funds)                  plus absorbed
    Per title entry     1.8¢                          5.6¢
    Per volume entry    .9¢                           2.8¢

Unit costs computed on the basis of total costs to date suggest that they remain relatively constant from cumulation to cumulation.

The concept of a numerical register is not new. The idea was discussed at length in a proposal by Harry Dewey (1) almost a generation ago in which he espoused all the essential ideas, and again in 1965 by Louis Schreiber (2). Both argued that if the bibliographic data including the LC card number were already in hand, one could then merely look up the number in a numerical union catalog to determine a location. Goldstein and others (3) have also studied what they called the "Schreiber catalog" and have produced a sample computer printout of LC numbers. Computer output microfiche, on the other hand, was not anticipated in the original concept. It has made reproduction and distribution cheap, fast, and eminently feasible. The history of the Register and its rationale have been discussed more fully by McGrath (4).

PROGRAMS COMPRISING THE UNION CATALOG SYSTEM

The Union Catalog data record is shown in Table 1. The first three fields are the familiar LC card number, and the fourth, the library location.

Table 1. The Data Record

    (1) Alpha    (2) Year or       (3) Serial number        (4) Library
        series       numeric           within numeric
                     series            series
    Agr          69                2354                     C

(1) Alpha series prefix - this data field may contain from 1 to 4 alphabetic characters denoting a special series.
(2) Numeric series prefix - this data field may contain 1 or 2 digits.
(3) Serial number - this data field may contain up to 6 numeric digits.
(4) Alphabetic library designation code - this field contains a preassigned alphabetic code (up to 26) designating the participating library.

The three programs which use this data record and comprise the Union Catalog System are shown in Figure 1 and described below.

LNREDT PROGRAM

LNREDT is an editing program which examines all card input data to determine whether they are acceptable or not. Each data field as shown above is examined as follows: Field 1 for the presence and rejection of nonalphabetic characters, and also to determine if the alphabetic code is a member of the accepted set of codes obtained from the Library of Congress; the accepted records are transferred after checking all fields to a magnetic tape file for subsequent use; rejected data records are printed and visually scanned for the source of error; Fields 2 and 3 for the presence and rejection of nonnumeric characters; Field 4 to determine if alphabetic.

LNRSRT PROGRAM

LNRSRT sorts all records on the above mentioned tape file. The major sort key is the numeric prefix, Field 2. The minor sort keys in order of the sort sequence are: Field 1, the alphabetic special series indicator; Field 3, the book serial number; Field 4, the library code designation.

LNRLST PROGRAM

[Fig. 1. Flow Chart of the Programs Comprising the Register System: (1) LNREDT, editing of card record data fields; (2) LNRSRT, sorting of records; (3) LNRLST, generation of records of unique titles in combinations, combinations entered in memory matrix and counts initialized, subsequent combinations matched and tallied, calculation of statistical tables.]

LNRLST is the main program which uses the sorted data tape to:
a. create a single record for each unique LC number containing the library code designation of each library having this particular book;
b.
produce a listing of the above records in LC card number order;
c. generate records of unique titles in combinations of libraries owning the titles;
d. enter into a memory matrix the combinations of libraries created in part (c); combinations are then counted; each time a combination is encountered, the matrix is searched for a match; if a match is found, the corresponding matrix position is incremented by one; if no match is found, a new matrix position is created with the new combination and the corresponding count initialized to one; this routine also provides for a total count of each library's contributions plus a grand total of all libraries' contributions;
e. tabulate, from the data compiled in (d) above, several elaborate tables of summary statistics; these statistics are described later in this paper.

The number of libraries the program LNRLST can accommodate is a variable and is entered as an execution-time parameter along with the library names and code designations. The main program occupies approximately 150,000 bytes of core memory.

THE OUTPUT

A sample of the Register entries appears in Figure 2. A simple one-letter designation was used to identify each library rather than the usual National Union Catalog (NUC) designation in order to save space in the printout. These letters appear alphabetically to the right of each LC number. A typical page of the Register contains ten columns of up to six-digit LC numbers, with the two-digit series number appearing only once at the beginning of each series. Thus each page contains about 600 LC numbers. The latest cumulation of 1,100,000 volumes (560,000 LC numbers) consists of nearly 1,000 pages. The entire output was produced on five pieces of fiche directly from the cumulated tape. The COM program was written by the commercial firm which contracted to run it. The computer output microfiche was issued on five 4x6 pieces in 42X.
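The three-program pipeline described above (LNREDT editing, LNRSRT sorting, LNRLST tallying) can be sketched compactly. The following Python fragment is a modern illustration, not the original 1972 code: the field rules follow the edit checks given for LNREDT, and the accepted alpha-prefix set is a hypothetical stand-in for the list obtained from the Library of Congress.

```python
from collections import defaultdict

def edit_record(alpha, series, serial, library):
    """LNREDT-style field checks; returns True if the record is acceptable."""
    ACCEPTED = {"", "AGR"}   # hypothetical accepted prefix set from the Library of Congress
    ok_alpha = alpha == "" or (alpha.isalpha() and len(alpha) <= 4
                               and alpha.upper() in ACCEPTED)
    ok_series = series.isdigit() and len(series) <= 2
    ok_serial = serial.isdigit() and 1 <= len(serial) <= 6
    ok_library = library.isalpha() and len(library) == 1
    return ok_alpha and ok_series and ok_serial and ok_library

def tally(records):
    """LNRLST steps (a)-(d): one record per unique LC number, then a count
    of titles held by each exact combination of libraries."""
    holdings = defaultdict(set)            # LC number -> set of library codes
    for alpha, series, serial, library in records:
        if edit_record(alpha, series, serial, library):
            holdings[(alpha, series, serial)].add(library)
    combinations = defaultdict(int)        # the "memory matrix"
    for libs in holdings.values():
        combinations["".join(sorted(libs))] += 1
    return holdings, combinations

records = [("", "77", "5", "A"), ("", "77", "5", "A"),        # duplicate report of one book
           ("", "77", "75937", "A"), ("", "77", "75937", "B"),
           ("AGR", "69", "2354", "C"),
           ("", "7X", "1", "A")]                              # rejected: nonnumeric series
holdings, combinations = tally(records)
```

The `combinations` dictionary plays the role of LNRLST's memory matrix: each key is a sorted string of library codes, and each value is the count of titles held by exactly that combination of libraries.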
Each piece contains 208 frames and each frame contains an average of 1,126 volumes and 573 titles. The data can be produced on 24X fiche as well as roll film.

STATISTICAL SUMMARY

The large samples of holdings (from an initial 5,000 volumes, through successive cumulations to 90,000 and, the most recent, 1,100,000) provide an excellent data base for statistical analysis. We believe the samples may be the largest title by title comparison of monographs ever tabulated in this format. Very little analysis is presented in this paper, but the data base and its format will be explained. Even without analysis, many interesting observations can be made.

[Fig. 2. Portion of a typical page of the computer printout showing the 2-digit 76 and 77 series, a typical prefix, PS, the serial numbers within the series, and letter codes to the right of each serial number. For example, Library A has the book 77-5; seven libraries (A, B, C, M, N, O, and Z) hold the book 77-75937. Each page contains ten columns; only five are shown.]
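The capacity and cost figures reported above can be cross-checked with simple arithmetic. A quick sketch, in which every input is a figure given in the text:

```python
# Five fiche at 208 frames each, averaging 1,126 volumes and 573 titles per frame
frames = 5 * 208                     # 1,040 frames in the COM edition
volumes = frames * 1126              # about 1.17 million, consistent with "over 1,100,000"
titles = frames * 573                # about 596,000, on the order of the 560,000 LC numbers

# Second-edition unit costs: $10,000 grant, $31,200 total,
# 560,000 title entries and 1,100,000 volume entries
cents_per_title_grant = 100 * 10000 / 560000     # ~1.8 cents per title entry
cents_per_volume_total = 100 * 31200 / 1100000   # ~2.8 cents per volume entry
```

The computed unit costs agree with the rounded figures of 1.8¢ and 2.8¢ quoted for the COM edition.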
Most of the tabulations are designed to throw light on the various aspects of the overlap problem, since a decisive factor in determining the utility of the Register is a knowledge of the number of titles held in common by all the libraries. Over the years there has been continuing interest in overlap. Probably the first and most elaborate of the early studies was by Leroy Merritt (5), and one of the most recent by Leonard, Maier, and Dougherty (6). Continuing interest is expressed in such proclamations as that by Ellsworth Mason where he claims that materials are "being acquired in duplications that are rather staggering across the country." (7).

The following statistics were tabulated from input for current acquisitions, the most recent being a total of 90,302 volumes, rather than the retrospective and current totals in the production runs. The 90,302 volumes were acquired for the most part during the two year period, fall 1969 to fall 1971. The statistics show holdings for sixteen libraries.

THE BASIC TABULATION: TITLES HELD IN COMMON BY UNIQUE COMBINATIONS OF LIBRARIES

The basic tabulation sections which are shown in Table 2 actually fill seven pages of computer printout. The tabulation is designed so that each unique and actual combination of libraries is separately listed, and the books held by each combination are counted.

[Table 2. Titles Held in Common by Each Unique Combination of Libraries. Columns: combination of library codes (e.g., AB, ABC, ABE, ABEZ); titles in common; combined holdings; percentage of common holdings.]

Thus, in the table, although the total number of books held in common by Libraries A and B is 127, the number of books held in common by them and no other library is only 52. The number of books held by Libraries A, B and Z, and no other library is 18. None of these 18 is included in the count of 52, and none of the 52 in the 18. They are mutually exclusive. But the 18, plus the 52, plus the small counts in each of the other combinations in which A and B share holdings is 127. The percentage of common holdings for each combination is also given except when the percentage is less than .01. Thus libraries A and B have .48 percent in common of their total combined holdings of 10,688 volumes. It is interesting to note that of the 65,535 possible combinations, in only 444 combinations did the percentage of common holdings exceed .01 percent, and in only 8 did the percentage exceed 1 percent. Of these, the highest is 5.43 percent (A and Z). This 5.43 percent means that 678 of A and Z's common holdings were held by no other library. The total of A and Z's common holdings that were also held by other libraries is 1,315, or about 10.5 percent of 12,470. Again this is the highest percentage of any combination.

Summary of Titles Held in Common

The basic tabulation of titles held in common is summarized in Table 3. Column 1 is the number of libraries from 1 to 16 in each combination. Column 2 is the total number of titles counted in all combinations. For example, 59,907 titles exist in unique copy, thus there were only 59,907 copies (column 3), but there were only 8 titles which as many as 9 libraries held, for a total of 72 copies (column 3).
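The mutual exclusivity of the Table 2 counts gives a precise rule: a pair's total common holdings is the sum of the exclusive counts over every combination containing both libraries. A minimal sketch, in which the AB and ABZ counts are from the text and the remaining counts are hypothetical values chosen only so that the sum reaches the reported 127:

```python
# Exclusive title counts per combination; "AB" (52) and "ABZ" (18) are from the
# text, the others are hypothetical stand-ins for the many small real combinations.
exclusive = {"AB": 52, "ABZ": 18, "ABC": 30, "ABE": 20, "ABCEZ": 7}

def common_holdings(pair, exclusive):
    """Total titles held in common by the pair, summed over every mutually
    exclusive combination that contains both libraries."""
    return sum(count for combo, count in exclusive.items()
               if all(lib in combo for lib in pair))

total_ab = common_holdings("AB", exclusive)   # 52 + 18 + 30 + 20 + 7 = 127
```

Because the per-combination counts are mutually exclusive, no title is double-counted in the sum.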
Column 4 shows that all 16 libraries contributed unique titles and that there were 117 different combinations of two libraries, out of a possible 120 (column 5). Thus there were 3 combinations of 2 libraries which had no titles in common. It is also most interesting that there were only 7 combinations of 9 libraries out of a possible 11,440, and no combinations of 10 or larger. According to the binomial distribution, there are 65,535 theoretical ways that 16 libraries can combine (total, column 5), whereas, in this sample, only 1,198 combinations occurred (total, column 4). Column 6 is the result of column 2 divided by column 4. Thus 3,744.19 is the average number of unique titles contributed by each library. 74.92 is the average number held by any combination of 2 libraries, and 6.89 is the average held by any combination of 3.

Table 3. Summary of Titles Held in Common by Unique Combinations of Libraries (Spring 1971 tabulation)

    Column 1       Column 2        Column 3        Column 4        Column 5             Column 6
    No. of         Total No. of    Total No. of    No. of Times    Theoretical No. of   Average Title
    Libraries in   Titles in all   Copies in all   a Combination   Times a Combination  Overlap per
    Each           Combinations    Combinations    Occurred        can Occur (Binomial  Combination
    Combination                                                    Distribution)
     1             59,907          59,907             16                16              3,744.19
     2              8,766          17,532            117               120                 74.92
     3              2,453           7,359            356               560                  6.89
     4                782           3,128            360             1,820                  2.17
     5                279           1,395            214             4,368                  1.30
     6                 84             504             75             8,008                  1.12
     7                 43             301             41            11,440                  1.04
     8                 13             104             12            12,870                  1.08
     9                  8              72              7            11,440                  1.14
    10                  0               0              0             8,008                  0.00
    11                  0               0              0             4,368                  0.00
    12                  0               0              0             1,820                  0.00
    13                  0               0              0               560                  0.00
    14                  0               0              0               120                  0.00
    15                  0               0              0                16                  0.00
    16                  0               0              0                 1                  0.00
    Totals         72,335          90,302          1,198            65,535                 60.38

SUMMARY OF EACH LIBRARY'S MULTIPLICATED TITLES

The administrators of each library are especially interested to know how many of their own titles are also held by other libraries. This information for total input (i.e., for titles with LC prefixes from 1900 to the present) is given in Table 4. (Tables were also produced giving the same kind of information by decade and for the last two years, but are not reproduced here.)

Table 4. Summary of Each Library's Multiplicated Titles (1900-1971 imprints)

    Column 3: Volumes Contributed by Each Library; Column 4: Each Library's Volumes as a % of Total Volumes; Column 5: No. of Titles for Which Copies are also Held by Other Libraries; Column 6: Each Library's Multiplicated Titles as a % of Own Titles (Col. 5 / Col. 3); Column 7: Each Library's Multiplicated Titles as a % of Grand Total (Col. 5 / Total, Col. 3).

    Library                                      Code   Col. 3   Col. 4   Col. 5   Col. 6   Col. 7
    Louisiana State Library                       A      4,708     5.21    2,497    53.03     2.76
    Louisiana Tech University                     B      5,980     6.62    2,378    39.76     2.63
    University of Southwestern Louisiana          C      6,353     7.03    1,932    30.41     2.13
    Louisiana State University-Baton Rouge        E     29,186    32.32    6,190    21.20     6.85
    Louisiana State University Medical Center     F        580      .64      168    28.96      .18
    Grambling                                     G      1,606     1.77      471    29.32      .52
    Centenary                                     H      4,472     4.95    2,061    46.08     2.28
    Louisiana State University-Alexandria         I      2,765     3.06    1,087    39.31     1.20
    Southeastern Louisiana                        J      4,153     4.59    1,849    44.52     2.04
    Northwestern Louisiana                        K        563      .62      230    40.85      .25
    Northeastern Louisiana                        L      4,891     5.41    1,980    40.48     2.19
    Loyola-New Orleans                            M      3,803     4.21    1,744    45.85     1.93
    Louisiana State University-Shreveport         N      4,291     4.75    1,749    40.75     1.93
    Louisiana State University-New Orleans        O      5,968     6.60    1,783    29.87     1.97
    Nicholls                                      P      3,221     3.56    1,048    32.53     1.16
    New Orleans Public                            Z      7,762     8.59    3,228    41.58     3.57
    Totals                                              90,302   100.00   30,395
    Average                                              5,644     6.25    1,900    37.78     2.09
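Column 5 of Table 3 is the binomial coefficient C(16, k), and the column total is 2**16 - 1 = 65,535. A short Python check of those coefficients, and of column 6 recomputed from columns 2 and 4 (row 1 works out to 59,907 / 16 = 3,744.19):

```python
from math import comb

# Column 5 of Table 3: the number of ways k of the 16 libraries can combine
theoretical = [comb(16, k) for k in range(1, 17)]
assert theoretical[:4] == [16, 120, 560, 1820]
assert sum(theoretical) == 2**16 - 1 == 65535

# Column 6 = column 2 / column 4, recomputed for the first three rows
averages = [round(titles / occurred, 2)
            for titles, occurred in [(59907, 16), (8766, 117), (2453, 356)]]
```

The recomputed averages for two and three libraries, 74.92 and 6.89, match the printed table exactly.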
The column labels are self-explanatory, but it may be observed that the total in column 5, 30,395, equals the difference between the total copies, 90,302 (column 3, Table 3), and the number of titles held by one library only, 59,907 (columns 2 and 3, Table 3).

DISTRIBUTION OF TITLES PUBLISHED AND MULTIPLICATED BY DECADE

Table 5 shows that the very largest overlap, in current acquisitions, occurs among books with recent imprints. This is to be expected since these figures do not make any comparison of older books recently acquired by one library to those already in another library, and since the acquisition of older books is from a much larger universe than that for current books.

Table 5. Distribution of Contributed Titles Published and Multiplicated by Decade (Titles acquired from 1969 to 1971)

    Imprint Period   Number of Titles   % of Titles    Number of Volumes   % of Total Volumes
                     Contributed        Contributed    Multiplicated       Multiplicated
    1900-1909             1,483             2.05              23                 .13
    1910-1919             1,049             1.45              29                 .16
    1920-1929             1,180             1.63              22                 .12
    1930-1939             1,816             2.51              74                 .41
    1940-1949             2,539             3.51             102                 .57
    1950-1959             5,353             7.40             361                2.01
    1960-1971            58,915            81.45          17,356               96.59
    Totals               72,335           100.00          17,967              100.00

OTHER SUMMARY STATISTICS

The foregoing tables illustrate the kind of tabulations that can be made with this type of data. More detailed tables can be compiled, and indeed were, e.g., tables giving the percentage of books acquired for each year and each decade for each library, with ten year totals and averages. Other possibilities would be frequency distributions and summaries for clusters of similar libraries. This material awaits analysis. We believe it contains many heretofore unsuspected insights.

FUTURE PLANS

Since the data can be updated so readily, plans are being made to provide funds for the extraction and keypunching of LC numbers in the remaining retrospective collections of the participating libraries. These libraries contain an estimated total of two million volumes.
Succeeding cumulations will be readily produced on COM. Most of the cost has been for extracting retrospective numbers from card catalogs. Once the remaining retrospective collections are cumulated, costs for cumulating current input will be negligible. Any final catalog of course can never list complete holdings since each library has many titles without LC numbers. Those titles could be listed in more conventional form. Since they are in a minority, the expense would be far more reasonable than it would be to reproduce entire holdings in conventional form.

We have said nothing about other aspects of the project. In committee discussions, however, much has been said about the feasibility of using the LC card number to access the information in other major projects such as MARC, and possibly even the data bank in the Ohio College Library Center. Technically, it is feasible to print a conventional bibliographic catalog by matching up our LC numbers with titles listed in the current MARC tapes; pragmatically and economically, of course, it is another matter. Other possibilities are the printing of a list of specialized holdings by accessing the subject headings on the MARC tapes, assignment of specialized acquisitions, and the gathering of information which might affect development of a joint processing center.

ACKNOWLEDGMENTS

This project was supported in part by the Library Services and Construction Act Title III funds administered by the Louisiana State Library. The authors wish to give special thanks to Miss Sallie Farrell, Louisiana State Librarian, for her enthusiastic support and fine advice. We wish also to thank the other members of the L.L.A. Committee on the Union Catalog: Mr. Sam Dyson, Louisiana Tech University; Mrs. Jane Kleiner, Louisiana State University, Baton Rouge; Mrs. Elizabeth Roundtree, Louisiana State Library; Dr. Gerald Eberle, Louisiana State University, New Orleans; Mrs.
Hester Slocum, New Orleans Public Library; Mr. Charles Miller, Tulane University, New Orleans; Mr. Ronald Tumey, Rapides Parish Library, Alexandria; and finally, Mr. John Richard, past president of the Louisiana Library Association, who saw the importance of the project, and who appointed the original committee.

Complete documentation for this project, including computer programs, has been deposited with the ERIC Clearinghouse on Library and Information Science (8).

REFERENCES
1. Harry Dewey, "Numerical Union Catalogs," Library Quarterly 18:33-34 (Jan. 1948).
2. Louis Schreiber, "A New England Regional Catalog of Books," Bay State Librarian 55:13-15 (Jan. 1965).
3. Samuel Goldstein, et al., Development of a Machine Form Union Catalog for the New England Library Information Network (NELINET). (Wellesley, Mass.: New England Board of Higher Education, 1970) (U.S. Office of Education final report, Project No. 9-0404.) ED 043 367.
4. William E. McGrath, "LNR: Numerical Register of Books in Louisiana Libraries," Louisiana Library Association Bulletin 34:79-86 (Fall 1971).
5. Leroy C. Merritt, "The Administrative, Fiscal, and Quantitative Aspects of the Regional Union Catalog," in Union Catalogs in the United States (Chicago, Ill.: American Library Association, 1942).
6. Lawrence E. Leonard, Joan M. Maier, and Richard M. Dougherty, Centralized Processing: A Feasibility Study Based on Colorado Academic Libraries. (Metuchen, N.J.: Scarecrow Press, 1969).
7. Ellsworth Mason, "Along the Academic Way," Library Journal 96:1671-76 (15 May 1971).
8. William E. McGrath and Donald J. Simon, LNR: Numerical Register of Books in Louisiana Libraries; Basic Documents (Lafayette, La.: Louisiana Library Association, Dec. 1972) (U.S. Office of Education) ED 070 470, ED 070 471.

COMPUTER-BASED SUBJECT AUTHORITY FILES AT THE UNIVERSITY OF MINNESOTA LIBRARIES

Audrey N.
GROSCH: University of Minnesota Libraries

A computer-based system to produce listings of topical subject terms and geographically subdivided terms is described. The system files and their associated listings are called the Subject Authority File (SAF) and the Geographic Authority File (GAF). Conversion, operation, problems, and costs of the system are presented. Details of the optical scanning conversion, with illustrations, show the relative ease of the technique for simple upper case data files. Program and data characteristics are illustrated with record layouts and sample listings.

INTRODUCTION

As a corollary to the creation and maintenance of large library catalogs, it has become necessary for academic or research libraries to maintain authority files of various kinds, such as author name, subject, series. In a manual cataloging system these files serve to unravel the mysteries of form, meaning, and usage to the cataloger. They also serve as a control to help avoid conflicts, synonyms, or overlapping subjects. With a system of decentralized catalogs using different subject entries from a system's union catalog, some method must be derived to preserve such usage for the cataloger. A computer-based subject authority file provides that means.

In January 1970, the University of Minnesota libraries began studying the relationship of subject authority files to both the present manual cataloging system and to a planned mechanized system employing the MARC II format for storage of bibliographic data. Minnesota's subject authority files are divided into two distinct logical files: Subject Authority and Geographic Authority Subdivisions. The Subject Authority File (SAF) contains all topical subject heading terms and their subdivisions down to nine levels of term, and Geographic main headings, i.e. U.S. with nongeographic subdivisions.
Nonterm data such as origin, usage notes, "libraries using," and other kinds of information are contained in the SAF. The Geographic Authority File (GAF) contains topical headings found in the SAF, with geographical place names as subdivisions and indications of direct or indirect terms in geographic heading assignment. Also similar nonterm data as found in the SAF are found in the GAF.

Immediate and long range benefits, together with the cost of conversion versus photocopying, showed that greater flexibility would be achieved through the conversion to machine-readable form. Some of the benefits were:
1) immediate assistance to the libraries performing their own decentralized cataloging, while providing cards to the union catalog at Minnesota;
2) future assistance to our coordinate campus libraries should they wish to increase compatibility of their catalogs to the Minneapolis Campus union catalog;
3) future provisions of a machine-readable authority to enable linking of various subject vocabularies together for an on-line controlled vocabulary subject searching system.

When the decision had been made to convert the files to machine-readable form, we tried to determine what others had done regarding this application. Although much previous work has been done on subject analysis, cataloging, vocabulary construction, and mechanization of bibliographic processes, very few designers have developed systems to support thesauri or subject heading files. In 1967 Heald (1) reported on the system for TEST (Thesaurus of Engineering and Scientific Terms). The following year Hammond of Aries Corp. (2) described the NASA Thesaurus and Way (3) outlined in detail the Rand Corporation Library Subject Heading Authority List (SHAL) mechanized using punch cards and computer in 1967. Mount and Kollin (4) described the use of the computer in the updating and revision of the subject heading list for Applied Science and Technology Index.
Of course several famous information systems use mechanized thesauri, among them the National Library of Medicine's MEDLARS System with its MeSH vocabulary and the Department of Defense DDC Descriptors. In addition, the seventh edition of the Library of Congress Subject Headings utilized computer photocomposition.

Another reported work on subject headings in a mechanized system is that of the Library of Congress in which a MARC record for subject headings is discussed. Avram et al. (5) give examples of this record and describe the system now under development at LC. Unfortunately for us, we completed the work herein reported in 1971, thereby not structuring our file to MARC specifications. We mention this work here, as our file will lend itself to such a conversion, should we later require it.

DATA PREPARATION AND FILE CONVERSION

The SAF and GAF files comprised 59 catalog card drawers of information (about 115,000 lines of typed data). Each file would be converted and maintained separately, but would use the same system design and processing programs. At a later stage, merging the files would be considered. Moreover, the cost of the system would be lower if one design could be used for both files.

Two conversion methods were evaluated, keypunching and optical scanning. Other methods would have lent themselves to this conversion, such as IBM Magnetic Tape Selectric Typewriters (MT/ST) or an on-line system such as IBM's Administrative Terminal System (ATS). However, because of the relatively small file size (under six million characters) and a desire for as economical a conversion as possible, only keypunching and optical scanning input were seriously considered. MT/ST typewriters were ruled out because of cost and lack of locally available tape conversion equipment. Keypunching was considered too slow in relation to typing.
Our assessment of optical scanning as the cheapest method was confirmed later, after completion of the conversion phase of the project, as an estimated $1800 in total savings over keypunching.

Files were converted without intermediate coding, permitting the typists to transcribe directly from the subject and geographic authority card files. The data preparation was done by the Catalog Division's subject authority coordinator. This librarian edited the file to eliminate ambiguities before the typist received the drawer. Otherwise, except for a quick check of the typist's finished sheets, the data were not examined again until after they were in machine readable form on tape. This procedure worked very smoothly, and caused the staff of the Catalog Division little inconvenience during the conversion phase. Figure 1 shows flow of the complete conversion activity.

[Fig. 1. Conversion Process for SAF and GAF: typed sheets are scanned on the CDC 915 page reader with its processing console producing an error list and reject sheets; a CDC 3300 converts the raw tape to intermediate format and lists it for error checking, then updates and recreates the SAF and GAF master tapes, from which the SAF and GAF master lists are printed.]

Equipment used for preparation of the data consisted of two IBM Selectric typewriters Model 715 with carbon ribbon, dual cam inhibitor, and 065 typing element (Rabinow font). One machine had a pin feed platen. This feature later proved to make no discernible difference in the quality of the typed output, but some typists stated that they preferred the pin feed platen over the standard platen.

The Control Data 915 page reader with a CDC 8092 Teleprogrammer operating under GRASP III software was used for the conversion. Block time was rented at a commercial service bureau for $50.00 per hour. Library Systems Division personnel operated the system during these time periods. Control Data provided a system manual and debugging time in order to prepare for our operation during conversion. However, little assistance in handling the application was actually received from the Control Data personnel, who were familiar only with business data processing.

A stock form, called the CDC 915 page reader form, procured from a local forms vendor, was used. This form has a typing area of 9½" x 13" marked off by faint blue lines. Top and bottom alignment areas are provided to check for line skew. Scanner throughput is increased by use of the longest permissible form with as much single line data as possible. Figure 2 shows a portion of a typed page from the SAF.

[Fig. 2. SAF Input Typing Sample Page: a typed page of tagged entries such as T SOCIAL SCIENCE RESEARCH$SGF, T ESSAYS, T PERIODICALS, D CL, E, T SOCIAL SCIENCES, and a note line N DO NOT SUBDIVIDE FURTHER WITHOUT APPROVAL, each line ending with the line-termination symbol.]

Line 1 is the format recognition line which was repeated on each sheet as a precaution against its loss by the optical scanner program during processing. Such a loss of the format recognition line would have forced complete rerunning of the job. The remaining lines show the various data elements identified by tag characters. The complete set of tag characters is shown in Table 1. The end of page symbol # is used on pages which terminate before the last physical line of the page to increase scanner throughput. The line-termination symbol ends each line and serves the same speed-increasing function.

Table 1.
Conversion Identification Tags

Tag   Description
T     Term
D     Departmental catalog in which the term is used
N     Scope note or general note on use of the term
C     Continuation line
R     Reference from which the term was verified, if other than LC
Z     Followed by S = See; by SA = See also; by X = See tracing; by XX = See also tracing
X     Geographic authority file cross-reference tracing (implied)

Indentation spaces serve as a flag to the conversion program to show the level of the term or other data element. This technique decreased the number of characters to be typed, yet level errors were easy to detect during proofreading. Subfield indicators for certain nonterm data completed the input format used during conversion. Table 2 describes these indicators and the meaning of each subfield.

Table 2. Term Subfield Indicators

Indicator   Description
$SGF        Term also entered in GAF
$DIR        Direct
$IND        Indirect
$MNU        Local University of Minnesota subject term
$PROV       Provisional term
$MeSH       Medical Subject Heading term
$NAL        National Agricultural Library term

The GAF typed input is shown in Figure 3. Note the similarity between the two files, yet the presence of the variant treatment of an older term (SOCIAL SURVEYS IN) beside a newer term (SOCIAL SCIENCES). As a result the Catalog Division has now changed these old form terms to conform with Library of Congress subject heading forms.

Fig. 3. GAF Input Typing Sample Page [typed sample, partially legible; lines include "T SOCIAL SCIENCES", "T HISTORY$DIR", "T BYZANTINE EMPIRE", "T SOURCES", "D ART", "T SOCIAL SURVEYS IN$DIR", "X AFRICA, SOUTH", "X ALABAMA", "X BRYNMAWR, WALES", and the end-of-page symbol #]

During typing, error correction by typists was facilitated by the use of three special characters: one to delete the line, one to delete the preceding character, and one typed over a character to delete it without inserting a blank. (The characters themselves are illegible in this scan.)
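The typed input conventions above (a one-character tag, indentation giving the level, and $-prefixed subfield indicators) can be sketched as a small parser. This is only an illustration of the conventions in Tables 1 and 2: the function name and the exact spacing rules are ours, not taken from the GRASP III setup.

```python
# Hypothetical sketch of the typed-line conventions described in the
# text: tag character, indentation level, term text, $-indicators.

TAGS = {"T": "term", "D": "departmental catalog", "N": "scope note",
        "C": "continuation", "R": "verifying reference",
        "Z": "see/see-also reference", "X": "GAF cross reference"}

INDICATORS = {"SGF": "also in GAF", "DIR": "direct", "IND": "indirect",
              "MNU": "local Minnesota term", "PROV": "provisional",
              "MeSH": "MeSH term", "NAL": "NAL term"}

def parse_line(line):
    """Split a typed input line into tag, level, term, and indicators."""
    tag, rest = line[0], line[1:]
    level = len(rest) - len(rest.lstrip(" "))   # indentation -> level
    term, *subs = rest.strip().split("$")       # $ introduces indicators
    return {"tag": TAGS.get(tag, "?"),
            "level": level,
            "term": term.rstrip(),
            "indicators": [INDICATORS.get(s.strip(), s) for s in subs]}

rec = parse_line("T  SOCIAL SCIENCE RESEARCH$SGF")
```

For the sample line, the parser reports a term at level 2 carrying the $SGF indicator, which is exactly what the conversion program had to deduce from the scanned sheet.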
A program is typed on an optical scanning sheet in an assembly-level language for the CDC 915 page reader. It is then assembled into object code which operates the page reader and its controlling computer. An example of the program used in this conversion is shown in Table 3. Line 1 of this program defines the input-output and control characters, together with a coordinate to terminate reading of a line if data are not found on the line. It also defines the special characters described above for error correction, end of line, etc. Line 2 specifies that a stock form (not preprinted) is to be read, giving the left-most and right-most character positions and the maximum number of lines per page, together with the first line number, to establish the scanning area coordinates. These coordinates are expressed as three-digit octal values determined through use of a forms grid and ruler. Line 3 describes the tape record format, including the field size, the blank fill character, left or right justification, and alphanumeric or numeric-only data field content. Line 4 instructs the 8092 Teleprogrammer unit to convert certain characters to octal values matching the CDC 3300 computer system, which are not identical to the normal 915 page reader octal values. The final E terminates reading of the program sheet. From this sheet GRASP III compiles an object program which is stored in the 8092 Teleprogrammer memory, enabling scanner operation.

SYSTEM DESCRIPTION AND OPERATION

The raw data tape created during optical scanning was used to build the SAF and GAF data files. The magnetic tape coding is binary (odd parity) at 800 bpi density. A fixed-length record of 20 characters is used, with 100 records per physical block. As many 20-character Format C (continuation of data) records are used as needed to achieve variable-length logical records. Table 4 shows the three record formats used.

Table 3. CDC 915 Program for Raw Data Tape Creation [the four-line program listing and its terminating E, described in the text above, are largely illegible in this scan]
Table 4. SAF and GAF Record Formats

Format A - Control Record
Char. Pos.   Contents                                  Values
1            Record type
2-5          Page number                               1-9999
6            Column number                             1-3
7-14         File creation date                        MM-DD-YY
15           File identification                       S = Subj. Auth. (SAF); G = Geog. Auth. (GAF)
16-18        Columns used (123 standard)               123, 121, 111, 131
19-20        Number of lines per page (75 standard)    80 max.

Format B - Data Record (initial)
Char. Pos.   Contents                                  Values
1            Record type                               T = Term; X = Reference term (GAF only); R = Reference; D = Dept. Library; 1 = See; 2 = See also; 3 = See from; 4 = See also from
2            Level number                              1-7
3            Sort exception code                       N = numeric exception; H = hyphen exception; S = substitution exception; U = U.S. abbreviation; Gt. Brit. abbreviation (code illegible in scan)
4            Qualification code (6-bit binary)         SGF (See Geographic) = 1; DIR (direct entry) = 2; IND (indirect entry) = 4; PROV (provisional entry) = 8; MNU (Minnesota term) = 16; MESH (medical subject heading term) = 32; NAL (National Agri. Library term) = 48. Combinations are stored by adding the values together, e.g. 17 = MNU/SGF
5-6          Number of display lines for item
7-20         First 14 characters of item

Format C - Data Record (continuation)
Char. Pos.   Contents                                  Values
1            Record type                               blank or (illegible in scan)
2-20         Continuation of item

To change or modify the file, keypunched cards are used; one transaction card is used for each correction, for both SAF and GAF files. Table 5 shows the layout of this card.

Table 5.
SAF and GAF Transaction Card

Column   Contents                 Values
1-4      Page of master list      1-9999
5        Column of master list    1-3
6-7      Line of master list      1-80
8-9      Sequence number          00-99 or blank
10       Deck number              0-9
11       Continuation number      blank or 0-9
12       Level number             1-7
13       Transaction type         A = Add; C = Cancel; M = Modify
14-15    Record type              T = Term; XT = Reference term (GAF); R = Reference; D = Departmental Library; S = See; SA = See also; X = See from; XX = See also from
16-80    Data

Catalogers in the Wilson Library (the University's largest and central library) and the Bio-Medical Library use a 3 x 5 card as an input form. This card is filled in and transmitted to the librarian acting as subject coordinator. The information is then keypunched and prepared for submission to an updating run. The schedule as originally planned was to run a cumulative supplement monthly, with a quarterly full updating of the file. However, this schedule has been flexible, as the transaction volume has varied considerably from early estimates. Currently updates are run quarterly to produce supplements, with a full listing annually. These updates vary from 5,000 to 14,000 transactions.

The program for the system is written in COBOL for the CDC 3300 computer operating under the MASTER operating system. Upon demand the program performs four basic functions on the data files: 1) creation of a cumulative supplement list from a transaction card deck; 2) updating of the tape files from the transaction card deck; 3) preparation of master lists, either during the update process or independently; and 4) querying the file on the basis of user-defined search terms. Parameter cards control the options available when supplements or master lists are to be run.
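The fixed-column card layout of Table 5 can be decoded mechanically. The sketch below follows the column boundaries given in the table; the field names and the sample card are ours, added for illustration.

```python
# Decode an 80-column SAF/GAF transaction card per Table 5.
FIELDS = {                  # (start, end) in 1-based card columns
    "page":         (1, 4),
    "column":       (5, 5),
    "line":         (6, 7),
    "sequence":     (8, 9),
    "deck":         (10, 10),
    "continuation": (11, 11),
    "level":        (12, 12),
    "transaction":  (13, 13),   # A = add, C = cancel, M = modify
    "record_type":  (14, 15),   # T, XT, R, D, S, SA, X, XX
    "data":         (16, 80),
}

def parse_card(card):
    card = card.ljust(80)       # pad short cards to 80 columns
    return {name: card[s - 1:e].strip() for name, (s, e) in FIELDS.items()}

# Hypothetical card: add a level-1 term at page 142, column 2, line 25.
card = "0142" + "2" + "25" + "00" + "0" + " " + "1" + "A" + "T " \
       + "SOCIAL SCIENCE RESEARCH"
rec = parse_card(card)
```

Each keypunched correction thus carries both its position in the master list (page, column, line) and the change itself, which is how a single card format serves both files.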
The ACCEPT, DECK, LIST, ABORT, LINE, SPACE, and COLUMN parameters provide control over the cutoff for a new supplement, the transaction card list form, termination of the job if the number of error cards exceeds a given value, the number of lines per page of output, the number of blank lines before and after each transaction on the supplement, and whether a single- or double-column supplement is to be produced. Figure 4 shows a sample from the SAF Supplement.

The updating phase of the program creates the new master file and produces an update error listing, accompanied by a report on the composition of the file by level number, kind of data, and logical/physical record counts.

The master list printout is also controlled through parameter cards. The LINE, COLUMN, and SELECT options indicate the number of lines of data to be printed in each column, the number of columns per page, and which pages are to be listed. This latter feature permits supplying replacements for pages improperly printed or bound, and suppression of printing when a program restart is necessary. Figure 5 shows the most commonly used Master List format.

The file query function is performed upon demand to assist in file revision, to change a term throughout the file, or for other special purposes. The search items can be composed of any and/or combinations of record types, record levels, qualification codes, sort exception codes, and key words or phrases. A keyword search is a character-by-character search of file items. Thus, by specifying a root word, all derivatives of the word formed by adding prefixes or suffixes will be identified. If these derivatives are not desired, a blank preceding and/or following the root word in the search key will prevent their display. However, the word will not be identified if it is [remainder of sentence illegible in this scan].

Fig. 4. SAF Supplement [sample printout; illegible in this scan]
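The root-word matching behavior just described, where blanks in the search key demand word boundaries, can be sketched in a few lines. The function name and sample data are illustrative only; the actual query program ran in COBOL on the CDC 3300.

```python
# Sketch of the keyword-search rule: a bare root word matches any item
# containing it as a substring; blanks around the root restrict the
# match to whole words, as in the file query function described above.

def matches(item, key):
    if key != key.strip():              # key carries boundary blanks
        return key in f" {item} "       # pad item so edges can match
    return key in item                  # plain substring search

items = ["SOCIOLOGY", "SOCIAL SCIENCES", "PSYCHOLOGY"]
hits_root  = [i for i in items if matches(i, "SOCI")]      # derivatives too
hits_exact = [i for i in items if matches(i, " SOCIAL ")]  # whole word only
```

The root "SOCI" picks up both SOCIOLOGY and SOCIAL SCIENCES, while the blank-delimited " SOCIAL " retains only the latter, mirroring the display-suppression behavior of the query function.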
Fig. 5. Master List Format Using 3 Column Standard [sample three-column printout; illegible in this scan]

[... preceding text illegible ...] expansion of the shelf list. Although file conversion took five months to complete, the program to operate the system was delayed because of termination of the programmer originally assigned to the project. Although the basic program features were ready in about 3-4 months, it was not until January of 1971 that the system was installed. During that year the staff gained experience with the system and cleansed the data of many ancient errors. By the end of the year, the system was an integral part of our Catalog Division support activities.

COSTS

As was pointed out previously, consideration was given to photocopying the authority files to provide a duplicate set for the Bio-Medical Library. It was determined that this would cost $2,400 (60,000 cards @ $.04 each). This equalled the cost of the typing personnel and rental of optical scanning equipment. Moreover, there would have to be duplicate cards and filing to maintain both files, with no assurance that they would remain exact duplicates of one another. In our opinion the benefits of this computer-based system offset the additional cost over the photocopying approach. To create these files completely cost $5,296.21 in direct expenditures for clerical help, scanner time, typewriter purchase and rental, supplies, and CDC 3300 computer time.

Table 6. Conversion Costs

Item                                               Cost
Senior clerk typists @ $2.40 (2 FTE for 3 mos.)    $1810.56
CDC 915 rental (20.1 hours @ $50 per hour)          1007.50
Typewriter purchase                                  532.70
Typewriter rental (2 mos.)                            60.00
Magnetic tape                                         74.00
CDC 915 forms                                        400.00
CDC 3300 computer time @ $95.00/hr                  1411.45
Total                                              $5296.21
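The itemized figures in Table 6 can be cross-checked against the quoted total with a short script; this is purely an arithmetic aid, with item labels paraphrased from the table.

```python
# Arithmetic check of the Table 6 direct-cost breakdown.
costs = {
    "senior clerk typists (2 FTE, 3 mos. @ $2.40/hr)": 1810.56,
    "CDC 915 rental (@ $50/hr)":                       1007.50,
    "typewriter purchase":                              532.70,
    "typewriter rental (2 mos.)":                        60.00,
    "magnetic tape":                                     74.00,
    "CDC 915 forms":                                    400.00,
    "CDC 3300 computer time @ $95/hr":                 1411.45,
}

total = round(sum(costs.values()), 2)
assert total == 5296.21   # matches the quoted project total
```

The items do sum to the quoted $5,296.21. (Note that the 915 rental line corresponds to slightly more machine time than the 20.1 hours cited, since 20.1 hours at $50 would be $1,005.00.)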
Table 6 shows the breakdown of these costs. During the conversion and development phase, salaries of the systems personnel were absorbed by the library, so that only these direct costs were charged to the project. The library also absorbed the Subject Coordinator's time for editing the file of cards prior to typing. Two senior clerk-typists at $2.40 per hour each were employed full time for three months to type the data.

Operating costs are borne by the library, which requires a half-time librarian as Subject Coordinator and a student keypunch operator for 15-20 hours per week. The Systems Division provides program maintenance as required. Supplies and computer time require about $2,100 per year if quarterly full lists are used with monthly supplements.

Some idea of the relative processing economy can be gained by examining some typical running times on the computer. The SAF and GAF files contain 4.35 and 1.75 million characters respectively. A typical supplement with 12,000 transactions takes 45 minutes to print on the CDC 3300, equipped with a 1000 line-per-minute printer, for either SAF or GAF. Printing a full master list takes 1 hour 25 minutes for the SAF and 45 minutes for the GAF. Updating the files takes about 1 hour 40 minutes for 12,000 transactions. A query of the file takes about 30 minutes. Current computer and channel charges are $95 per hour.

GENERAL OBSERVATIONS

Our experience with this project has shown us the high reliability of the CDC 915 page reader as a conversion device. Less than 1 percent of the total amount of data the page reader scanned was rejected. The errors rejected were easily spotted and retyped. No scanner-produced errors were found in the data; however, there was an occasional failure to pick up spaces when more than three occurred together. These errors were very infrequent and were discovered in the raw data proofreading.
These errors were corrected and, after the final output file was generated, we again checked for similar conditions and found everything in order with regard to term level indication. With an upper-case file such as this, use of the CDC 915 is simple and easily accomplished. However, the library should not rely upon a scanner manufacturer, or the installation where a unit is being leased, to provide all the assistance required. The library will have to design its application and become familiar with the equipment in order to achieve the best results.

All optical scanning usage requires that certain care be exercised in the typing operation. Lines must not be skewed, characters must not be blurred, and length of line can be critical even though the scan optics may be opened and closed over longer lines than are intended to be typed. Further, it is imperative that the paper used in the scanning operation meet specifications for use with the chosen scanner. Our experience indicates that a pin feed platen is not necessary to maintain forms alignment if typists use care in initial alignment.

We experienced some operational problems when we actually tried our program on the page reader. Initially, the system would not compile our program. This was due not to a catastrophic error in our program, but rather to a hardware fault in the 8092 Teleprogrammer. In trying to read the program onto tape after compilation, the system consistently failed. We finally gave up trying and recompiled from the scanned input sheet at the beginning of each conversion run. No one at the data center could explain our failure to load, but we must assume an intermittent or undetected hardware problem. During the job run it was imperative that the scanner be watched closely, as occasionally it would stop reading or fail to feed a sheet. These were not difficult problems, but they did require occasional attention by the center's customer engineer.
On one occasion the scanner failed during our run, and we could not achieve a timely repair. We rescheduled for the next week and then experienced no problem. After our experiences with the 915 page reader at the data center, we felt that we knew as much about the equipment as any of the operators we met while doing our production runs. We would not hesitate to use the page reader again for a simple file conversion, and we would continue to handle the operation ourselves, as the center operators were no better able to run our job.

ACKNOWLEDGMENTS

The author wishes to thank Mr. Eugene D. Lourey for developing the program for this system. Mr. Curt Herbert deserves recognition for the preliminary design of the system and for initiating the optical scanning activities. Also, Mr. Carl O. Sandberg, who was responsible for the many details of the conversion portion and who now maintains these programs, contributed many significant design parameters. The staff of the Catalog Division, too, deserve our gratitude for their file cleansing and data editing during and after conversion.
BOOK REVIEWS

Proceedings of the Conference on Interlibrary Communications and Information Networks, edited by Joseph Becker; sponsored by the American Library Association and the U.S. Office of Education, Bureau of Libraries and Educational Technology; held at Airlie House, Warrenton, Virginia, September 28-October 2, 1970. Chicago: American Library Association, 1971. 347p.

To see how rapidly the field of library networking and communications has moved in recent times, one need only try to review a conference on the subject some years after it was held. What was fresh, imaginative, innovative, or blue-sky has become accepted or been gone beyond; errors in thinking or bad guesses as to the future have been shown up; and the blue sky has been divided into lower stratospheres and outer space for ease of working. Under these circumstances one can only review such proceedings as history.

The assumptions on which the conference was based were the traditional ones of librarians and information scientists: that access to information should be the right of anyone without regard to geographical or economic position, and that pooling of resources (here by networking operations) is one of the best ways to reach that goal. Since 1970 both of these assumptions have been questioned, but at the time of the conference there were no opposing voices.

The final conclusions, of course, were based on these assumptions. National systems were recommended, both governmental and private, with the establishment of a public corporation (such as the Corporation for Public Broadcasting) as the central stimulator, coordinator, and regulator, to be served by input from a large number of groups. Funding, the attendees decided, should be pluralistic, from public, private, and foundation sources (are there any others?), but with the federal government bearing the largest burden of support.
Since it was deemed desirable to give the widest chance for all individuals to use these networks, it was recommended that fee-for-service prices be kept low through subventions of the telecommunications costs by libraries and information centers. And since new techniques and methods need to be learned, both education and research in the field must be strengthened and enlarged.

The basic components of networks of libraries and information centers were conceived as being:
1. Bibliographic access to media
2. Mediation of user requests to information
3. Delivery of media to users
4. Education

Accordingly, traditional questions of bibliographic description, the most useful forms of public service (including such things as interviewing requestors, seeking information on the existence of answers, locating the answers physically, providing them, evaluating them, and obtaining feedback), as well as the best ways to set up networks, were discussed at length. Moreover, since new technologies have sometimes been touted as the answer to many of these problems, a whole section on network technology was included. Such subjects as telecommunications, cable television, and computers were examined; here most of the recommendations still remain to be carried out.

The organization proposed for these networks again plowed old ground. The conferees felt that one should use the tremendous national and disciplinary resources already established (the Library of Congress, the National Library of Medicine, the National Agricultural Library, Chemical Abstracts, etc.); that there should be a coordinating body to minimize duplication of effort and assure across-the-board coverage; that the systems must be sold to legislators if public money is to be provided; and that more research on the best networking operations is necessary.
Above all, in almost every section of the report and in the Preface, the then-new National Commission on Libraries and Information Science was referred to as the great savior. Together with requests for public money, it might be said, this was the thread binding all sections of the conference together.

Was this conference necessary? Could it have brought forth something more useful than the gentle spoof in Irwin Pizer's poem "Hiawatha's Network"? It was undoubtedly very inspiring for those at the conference (all 100 of them), who probably learned more over the cocktail glass and dinner plate than at the formal sessions, and who learned as they grappled with the difficulties of consensus-making. But need the proceedings have been published? Is everything ever said at a meeting always worth preserving? How about the concept of ephemera rather than total recall? Would not a short summary of the recommendations have sufficed?

Estelle Brodman

File Structure for an On-Line Catalog of One Million Titles

J. J. DIMSDALE: Department of Computing Science, University of Alberta, Edmonton, Canada, and H. S. HEAPS: Department of Computer Science, Sir George Williams University, Montreal, Canada.

A description is given of the file organization and design of an on-line catalog suitable for automation of a library of one million books. A method of virtual hash addressing allows rapid search of the indexes to the catalog file. Storage of textual material in a compressed form allows considerable reduction in storage costs.

INTRODUCTION

An integrated system for on-line library automation requires a number of computer accessible files. It proves convenient to divide these files into three principal groups: those required for the on-line catalog subsystem, those required for the acquisition subsystem, and those required for the on-line circulation subsystem. The present paper is concerned with the files for the catalog subsystem.
Files required for the circulation subsystem will be discussed in a future paper. The files for an on-line catalog system should contain all bibliographic details normally present in a manual catalog, and the file should be organized to allow searches to be made with respect to title words, authors, and Library of Congress (LC) call numbers. It may also be desired to search on other bibliographic details, in which instance the appropriate files may be added to those described in the present paper.

The file organization should support economic searching with respect to questions in which terms are connected by the logic operations AND, OR, and NOT. It should also allow question terms to be connected by operations of ADJACENCY and PRECEDENCE, and it should allow question terms to be weighted and the search made with reference to a specified threshold weight. It may be desirable for the file organization to include a thesaurus that may be used either directly by the user or by the search program to narrow, or broaden, the scope of the initial query, or to ensure standardization of the question vocabulary.

The file organization and search strategy should ensure that the user of the on-line catalog system receives an acceptable response time to his
The system should be a special purpose time-sharing system such as the Time Sharing Chemical Informa- tion Retrieval System described by Lefkovitz and Powers and by Wein- berg.1· 2 In this system the queries time-share disk storage as well as the central processor. Since an on-line catalog is a large file, and hence expensive to store in computer accessible form , it is desirable to store it in as compact a form as possible. For example, a catalog file for one million titles is likely to in- volve between 2 x lOS and 5 X 108 alphanumeric characters. If stored char- acter by character the required storage capacity would be equivalent to that supplied by from seven to sixteen IBM 2316 disk packs. It is also impor- tant to design the frequently accessed files so as to minimize the number of disk, or data cell, accesses required to process each query. The files described in the present paper include ones stored in com- pressed form and organized for rapid access. Throughout the present paper the term title is used in a general sense. It may include periodical titles as well as book titles. However, it is sup- posed that frequently changing information, such as periodical volume ranges, will be stored as part of the circulation subsystem rather than the catalog subsystem. OVERALL FILE ORGANIZATION The complete bibliographic entries of the catalog may be stored in a serial (sequential) file so that any record may readily be read and dis- played in its entirety. However, as indicated by Curtice, use of an inverted file is to be preferred for purposes of searching.3 An alternative to the simple serial file is one organized in the form of a multiple threaded list ( multilist) in which all records that contain a particular key are linked together by pointers within the records themselves. 
The first record in each list is pointed to by an entry in a key directory as described by Lefkovitz, Holbrook, Dodd, and Rettenmayer.4-7 For very small collections of documents Divett and Burnaugh have attempted to organize on-line catalogs by use of ring structured variations of the multilist technique.8,9 Neither file organization is feasible for a collection of a million documents because of the long length of the threads involved. Many disk accesses would be needed in order to retrieve all elements of a list, and hence there would be a very slow response to queries. The cellular multilist structure proposed by Lefkovitz and Powers, or the cellular serial structure proposed by Lefkovitz, may well prove to be a viable alternative to the organization proposed in the present paper.10,11

File Structure for an On-Line Catalog / DIMSDALE

However, as indicated by Lefkovitz, the inverted organization provides shorter initial, and successive, response times in answer to queries.12 In the present paper it is supposed that the on-line catalog file consists of both a serial file of complete bibliographic entries and an inverted file organized with respect to search keys such as title words, subject terms, author names, and call numbers. Such a two-level structure is often assumed and has been termed a "combined file" by Warheit, who concluded it to be superior to either a single serial file or a threaded list organization.13-17 The file structure described in the present paper uses indexes based on the virtual scatter table as described by Morris and Murray, the scatter index table discussed by Morris, and the bucket as treated by Buchholz.18-20 The attractiveness of a similar structure for use in the Ohio College Library Center has been analyzed by Long, et al.21 The basic elements of the file organization are shown in Figure 1.
It is supposed that the access keys are title words, but a similar file structure is used for access with respect to keys of other types.

Fig. 1. Overall File Organization (key, e.g. title word → hashing function → hash table file)

Any key may be operated on by a hashing function which transforms it into a pointer to an entry in a hash table file. This file contains pointers to both a dictionary file of title words and an inverted index which is stored in a compressed form. Entries within the compressed inverted index serve as pointers to the catalog file of complete bibliographic entries. Terms, such as title words, within the catalog file are coded to allow a compressed form of storage. The codes used in the compressed catalog file serve as pointers to the uncoded terms stored in the dictionary file. There would be a separate hashing function, hash table file, dictionary file, and compressed inverted file for use with each different type of key. However, there is only one compressed catalog file.

For a search scheme that allows use of a thesaurus of synonyms, narrower terms, broader terms, and so forth, a thesaurus file may be added (Figure 2).

The files must be organized to allow for ease of updating. As further bibliographic entries are added it is necessary to add additional pointers from the inverted index. Also, whenever a new key occurs in a bibliographic entry it must be added to the dictionary, assigned a code for storage in the compressed catalog file, and entered into the compressed inverted index.

Fig. 2. File Organization with Inclusion of a Thesaurus

STRUCTURE OF THE HASH TABLE FILE

In order to locate the set of inverted index pointers that corresponds to a given search key K, the key is first operated on by a hashing function that transforms it into a bit string of length v bits.
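As an illustration only, the chain of files in Figure 1 can be mocked with in-memory dictionaries. Everything here is a hypothetical sketch: Python's built-in `hash` stands in for the hashing function, and the variable names are not the author's.

```python
# Toy model of Figure 1: key -> hashing function -> hash table entry,
# which points to the dictionary (uncoded term) and to inverted-index
# postings, which in turn point into the catalog file of full entries.
catalog = ["A history of libraries", "Libraries and automation"]  # catalog file
dictionary = {}   # code -> uncoded term (dictionary file)
inverted = {}     # hash slot -> (dictionary code, list of catalog positions)

def build(titles):
    next_code = 0
    for rec, title in enumerate(titles):
        for word in title.lower().split():
            slot = hash(word)            # stands in for the hashing function
            if slot not in inverted:
                dictionary[next_code] = word
                inverted[slot] = (next_code, [])
                next_code += 1
            inverted[slot][1].append(rec)

def lookup(word):
    entry = inverted.get(hash(word.lower()))
    return [catalog[r] for r in entry[1]] if entry else []

build(catalog)
# "libraries" occurs in both records, "automation" in only one
```

The real structures are disk-resident and compressed; this mock only shows how a key traverses the four files.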
Each such bit string is said to represent a virtual hash address, and is regarded as the concatenation of two substrings of length r and v − r bits. The two substrings are respectively said to constitute the major and the minor M(K) of the virtual hash address. The major is further divided into two bit strings B(K) and I(K) that define a bucket number B(K) of a bucket β(K), and an index number I(K) of an entry within the bucket. The major that represents the pair of numbers B(K), I(K) is said to constitute a real hash address.

The hash table file is divided into portions, or buckets, of equal length. Each bucket is further divided into an index section, a content section, and a counter section (Figure 3). The index sections of all buckets have the same length. Similarly, all content sections are of equal length, and so are all counter sections. As the hash table is created, entries are added sequentially into the content section so that any unfilled portion is at the end. In contrast, the index section of any bucket may contain unfilled entries at random positions and hence constitutes a scatter table.

The hash table file is created as follows. The various keys are transformed by the hashing function into bit strings B(K), I(K), M(K). In the bucket β(K) of number B(K) an entry as described below is added to the content section, and the vacancy pointer within the counter section is incremented to point to the beginning of the unfilled portion of the content section. The I(K)th entry number in the index section is then set to point to the position of the entry added to the content section. The entry placed in the content section includes the minor M(K) and a dictionary pointer to where the key is placed in the dictionary file as well as a pointer to an entry in the compressed inverted index.
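The decomposition of a v-bit virtual hash address into B(K), I(K), and M(K) can be sketched as follows. The parameter values follow the title-word example computed later in the paper (r = 18, v = 33, b = 64 index slots per bucket); the MD5-based hashing function is an assumption of this sketch, not the author's choice.

```python
import hashlib

R_BITS, V_BITS, SLOT_BITS = 18, 33, 6   # r, v, and log2(b) with b = 64

def virtual_hash_address(key: str):
    """Transform a key into its bucket number B(K), index number I(K),
    and minor M(K), per the major/minor split described in the text."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest(), "big")
    addr = h & ((1 << V_BITS) - 1)                   # v-bit virtual address
    major = addr >> (V_BITS - R_BITS)                # leading r bits
    minor = addr & ((1 << (V_BITS - R_BITS)) - 1)    # trailing v - r bits: M(K)
    bucket = major >> SLOT_BITS                      # B(K)
    index = major & ((1 << SLOT_BITS) - 1)           # I(K)
    return bucket, index, minor

b, i, m = virtual_hash_address("catalog")
```

Concatenating B(K), I(K), and M(K) reassembles the original v-bit virtual address, which is what makes the real address and the minor together a complete substitute for storing the key.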
If there has previously occurred a bit string B(K1), I(K1), M(K1) in which B(K1) = B(K), I(K1) = I(K), M(K1) ≠ M(K), then no change is made to the I(K)th entry in the bucket β(K) or to the minor M(K1) in the content section. Instead, the chain pointer is set to point to the location of a new entry that is added to the content section. In this new entry the minor is set to M(K) and the dictionary pointer is set to indicate where the new key is placed in the dictionary file. There is said to have resulted a collision at the real hash address B(K), I(K).

If there has previously occurred a bit string B(K1), I(K1), M(K1) in which B(K1) = B(K), I(K1) = I(K), M(K1) = M(K), where K1 ≠ K, then the collision bit that precedes M(K1) is set to 1, and a further content entry containing M(K) is chained from the entry that contains M(K1). There is said to have occurred a collision at the virtual hash address B(K), I(K), M(K).

Fig. 3. Bucket of the Hash Table File (index section, content section, and a counter section recording the number occupied, the number of overflows from, and the number of overflows into the bucket)

The last three entries included in the counter section shown in Figure 3 are optional but are useful for monitoring the performance of the hashing function with respect to bucket overflows and so forth.

A bucket becomes full when there is no remaining unfilled space in its content section. If a further chain pointer is required from a content entry, its preceding overflow bit Qc is set to 1 to indicate that the pointer is to another bucket. Likewise, if a further entry is required in the index section its preceding overflow bit Qr is set to 1 to indicate that it refers to an entry within another bucket. The bucket is then said to have overflowed. Methods of handling bucket overflow, and choice of the new bucket, are discussed in a subsequent section.
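The two collision cases (real-address and virtual-address) can be modeled with a simplified in-memory bucket. This is a sketch only: overflow buckets, field widths, and the counter section are abstracted away, and the class and field names are illustrative, not the author's.

```python
class Bucket:
    """Toy content-section insertion with chaining and collision bits."""
    def __init__(self):
        self.index = {}      # I(K) -> position of first content entry
        self.content = []    # entries: [minor, collision_bit, chain, key]

    def insert(self, i, minor, key):
        pos = self.index.get(i)
        # Walk the chain rooted at the I(K)th index entry.
        while pos is not None:
            entry = self.content[pos]
            if entry[0] == minor and entry[3] != key:
                entry[1] = 1          # virtual-address collision: set collision bit
            if entry[2] is None:
                break                 # terminal entry of the chain
            pos = entry[2]
        new_pos = len(self.content)
        self.content.append([minor, 0, None, key])
        if pos is None:
            self.index[i] = new_pos   # first entry for this real address
        else:
            self.content[pos][2] = new_pos   # real-address collision: chain it
```

Inserting two keys with the same I(K) but different minors only chains a new entry; a third key sharing both I(K) and the minor additionally flips the collision bit on the matching entry.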
It should be noted that use of a hash table as described above retains most of the advantages of the usual scatter index method in which the index entries and content entries are stored in two separate files. It has the further important advantage that in most instances a single disk access is sufficient to locate both the index entry and the corresponding content entry.

As noted by Buchholz and Heising, if it is known that certain keys are likely to appear with high frequency in search queries then it is advantageous to enter them at the start of creation of the hash file.22,23 They will then tend to appear near the beginnings of the content entry chains and hence require little CPU time for their subsequent location. Furthermore, they will tend to appear in the same bucket as their corresponding index entries, and hence their location will usually require only a single disk access.

NUMBER OF BITS FOR VIRTUAL HASH ADDRESS

Suppose the hashing function is chosen so that the majors of the transformed keys are uniformly distributed among the R slots available for real hash addresses B, I. If there are N keys then a = N/R may be termed the load factor. It is the average number of keys that are transformed into any given real hash address. The probability that any given real hash address corresponds to k keys is given by Murray24 as

(1) P_k = e^(−a) a^k / k!

Hence, for any given real address the probability of a collision occurring is

(2) C = Σ(k=2 to N) P_k = 1 − P_0 − P_1 = 1 − (1 + a)e^(−a).

If a collision occurs at a particular real hash address, the expected length of the required chain within the content section is

(3) L = Σ(k=2 to N) k P_k / C = (1/C)(Σ(k=0 to N) k P_k − P_1) = (1/C)(a − a e^(−a)) = a(e^a − 1) / (e^a − 1 − a).

It may be noted that if the load factor a is equal to 1 then L = 2.39.
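Equations 1 through 3 are easy to check numerically. The sketch below evaluates the closed forms and verifies Equation 3 against a direct (truncated) Poisson sum; the function names are illustrative.

```python
import math

def p_k(a, k):
    """Equation 1: Poisson probability that a real address holds k keys."""
    return math.exp(-a) * a**k / math.factorial(k)

def collision_prob(a):
    """Equation 2: probability of a collision at a given real address."""
    return 1 - (1 + a) * math.exp(-a)

def expected_chain_length(a):
    """Equation 3: expected chain length given that a collision occurred."""
    return a * (math.exp(a) - 1) / (math.exp(a) - 1 - a)

a = 1.0
print(round(collision_prob(a), 4))          # 1 - 2/e, about 0.2642
print(round(expected_chain_length(a), 2))   # about 2.39
```

The closed form follows because Σ k·P_k over all k equals a, so removing the k = 0 and k = 1 terms leaves a − a·e^(−a) in the numerator.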
If all the transformed keys are distributed uniformly among the V possible virtual addresses B, I, M then the expected total number of collisions at virtual addresses is given by Murray25 as

(4) p = N^2 / 2V

provided V ≫ N. The expected relative frequency of collisions at virtual addresses is therefore

(5) f = N / 2V.

It proves convenient to regard N, f, and a as basic parameters in terms of which may be determined the number r of bits required in the major and the number v of bits required in the virtual hash addresses. The value of r must be at least as large as log2 R = log2(N/a), and hence r may be chosen according to the formula

(6) r = ⌈log2(N/a)⌉

where ⌈x⌉ means the smallest integer greater than or equal to x. The value of v must be at least as large as

(7) v = ⌈log2 V⌉ = ⌈log2(N/2f)⌉.

If N and f have the form N = 2^n and f = 2^(−γ) then v may be chosen according to the formula

(8) v = n + γ − 1

and the number of bits required for the minor is

(9) m = v − r.

CHOICE OF BUCKET CAPACITY

With an 8-bit byte-oriented computer, such as the IBM 360, it proves convenient to use 8 bits of storage for each entry number plus overflow bit within the index section. If a value of zero is used to indicate an unused index entry there remain up to 127 possible values for entry numbers. Thus the number c of entries in the content section must be less than or equal to 127.

Suppose there are b slots for index entries in each bucket. The total number of index entries in the entire file is R. It follows from the results of Schay and Spruth,26 Tainter,27 and Heising28 that the probability P(b, c) of overflow of any bucket is given by

(10) P(b, c) = Σ(k=c+1 to ∞) e^(−ab) (ab)^k / k!

For selected values of b, Beyer's tables of the Poisson distribution have been used to compute P(b, c) and to determine the smallest value of c for which P(b, c) ≤ 0.01.29 The results are shown in Table 1 for the instance in which a = 1.
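The bucket-capacity calculation of Equation 10 can be sketched directly: find the smallest number c of content entries for which the Poisson tail probability of overflow does not exceed 0.01. The values computed this way agree with most entries of Table 1; small differences for a few values of b can arise from the rounding of the published Poisson tables.

```python
import math

def overflow_prob(b, c, a=1.0):
    """Equation 10: probability that a bucket with b index slots and
    c content entries overflows, with load factor a."""
    lam = a * b
    head = sum(math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(c + 1))
    return 1.0 - head

def smallest_c(b, a=1.0, limit=0.01):
    """Smallest content-section capacity c with P(b, c) <= limit."""
    c = 0
    while overflow_prob(b, c, a) > limit:
        c += 1
    return c

print(smallest_c(2), smallest_c(3))   # 6 and 8, as in Table 1
```

With b = 64 and a = 1 this procedure also confirms that c = 85 keeps the overflow probability at or below 0.01, the value the text adopts.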
A similar table has been computed by Buchholz30 for the instance in which c = b and a ranges from 0.1 to 1.2. As is apparent from Table 1, an increase in the value of b allows use of a smaller ratio c/b and hence permits more economical use of storage. With b = 64 the allowed value of c/b is 1.33 and hence c may be chosen equal to 85.

The reduction in access time that results from structuring the file so that each bucket contains both index and content entries is, of course, effected at the expense of additional storage costs. For example, if c/b = 1.33 then the space allocated for storage of content entries is 33 percent greater than if content entries are stored in a separate file. Relaxation of the condition P(b, c) ≤ 0.01 allows a reduction in c/b, but the increased number of bucket overflows will cause additional disk accesses to be required.

Table 1. Values of b, c, and c/b for which P(b, c) ≤ 0.01 when a = 1.

  b      c     c/b
  1      5    5.00
  2      6    3.00
  3      8    2.66
  4     10    2.50
  5     11    2.20
  6     13    2.17
  7     14    2.00
  8     15    1.88
  9     17    1.89
 10     18    1.80
 11     19    1.73
 12     20    1.67
 13     22    1.69
 14     23    1.64
 15     24    1.60
 16     25    1.56
 17     27    1.59
 18     28    1.55
 19     29    1.53
 20     30    1.50
 60     80    1.33
100    125    1.25

TREATMENT OF BUCKET OVERFLOWS

When a new key is found to map into a bucket whose content section is full then some means must be found to provide space in some other bucket. The particular procedure that should be used depends on the extent to which the entire set of buckets contains unfilled portions.

Suppose that many buckets are almost full and that the number c of allowed content entries is less than 127. The entire hash file may then be expanded with the same index sections but with longer content sections.
If many buckets are almost full and c = 127 then the entire file may be expanded in such manner that each bucket is replaced by a pair of buckets that contain the same number b of allowable index entries, but whose number c1 of allowable content entries is chosen to ensure that P(b, c1) ≤ 0.01. Such doubling of buckets also doubles the number of index entries but it does not double the storage required for the entire file. Each key K that corresponds to an entry in the original bucket is associated with an entry in the first, or second, of the new buckets according as the leading bit of either its index address I(K) or its minor M(K) is equal to 0 or 1. The effect is to shift one bit from I(K) or M(K) into the bucket address B(K). This method is based on a suggestion of Morris.31

Suppose that few buckets are almost full. Then a suitable means of determination of an unfilled bucket for storage of the minor is through use of some overflow algorithm that determines a sequence of bucket numbers B0(K), B1(K), B2(K), etc., corresponding to any given full bucket β0(K). Suppose there are nb buckets. A quadratic residue algorithm

(11) Bj(K) = [B0(K) + aj + bj^2] mod nb

(where a and b here denote constants of the algorithm, not the load factor and bucket size) has been considered by Maurer and by Bell for use with in-core hash tables, but it suffers from the disadvantage that the existence of a full bucket β0(K) will divert entries into the particular buckets β1(K), β2(K), etc., and hence cause them to fill more rapidly than other buckets which may contain fewer entries.32,33

It is believed that a more desirable form of the quadratic residue algorithm is

(12) Bj(K) = {B0(K) + fj[I(K)]} mod nb

where fj is a suitably chosen function. Letting Bj(K) depend, through fj, on both j and I(K), instead of on j alone, allows reduction of the tendency to fill a particular set of buckets.
To prevent a tendency to overflow particular buckets it is also desirable for the overflow algorithm to produce bucket numbers that are uniformly distributed among all possible bucket numbers. Among the more promising forms to be chosen for the fj[I(K)] are the following:

(13a) fj[I(K)] = I′(K) j

where j = 1, 2, ..., nb − 1, and I′(K) denotes I(K) if I(K) is odd, but denotes I(K) + 1 if I(K) is even. Since nb is a power of 2 such choice of I′(K) ensures that I′(K) and nb have no common factors, and hence that Bj(K) steps through the sequence β0(K), β1(K), etc., covering every bucket in the file.

(13b) fj[I(K)] = I′(K) j^2

where j = 1, 2, ..., ⌈√nb⌉ − 1, and ⌈x⌉ means the least integer greater than or equal to x.

(13c) fj[I(K)] = Rj[I′(K)]

where j = 1, 2, ..., nb, and Rj[I′(K)] denotes a number output by a pseudorandom number generator of the form suggested by Morris34 with an initial input of I′(K) instead of 1.

It may be remarked that use of Equation 13a requires the least number of machine instructions, and the least CPU time per step, but it has a strong tendency to cluster the βj(K) immediately after the β0(K) and hence it is likely to be the least effective of the three methods. Use of Equation 13b produces less clustering, but the sequence does not include all buckets of the file. Use of Equation 13c requires the largest number of instructions and CPU time per step, but the βj(K) are less likely to cluster and they are uniformly distributed among all possible buckets. Thus Equation 13c produces shorter chains of overflow buckets and hence requires fewer disk accesses.

If a new key K maps into a full bucket β0(K) then the following procedure is used to determine the bucket into which the minor of K is to be inserted:

(i) The chain of pointers from the I(K)th entry of the bucket β0(K) is followed, possibly through overflow buckets given by Equation 12, in order to locate the terminal entry of the chain. Suppose this terminal entry is within a bucket βj(K).

(ii) If there is available space in bucket βj(K) then the minor M(K) is entered and chained as described previously.

(iii) If bucket βj(K) is full, but there is space in βj+1(K), then the minor M(K) is entered into βj+1(K) and chained as described previously.

(iv) If buckets βj(K) and βj+1(K) are both full, and bucket βj+1(K) contains at least one nonempty index entry I(K′) whose chained content entries are all contained within βj+1(K), then the minor M(K) is stored according to the following displacement algorithm: The terminal member of the chain from I(K′) is displaced to an overflow bucket βr(K′) determined by use of Equation 12, except that if both βr(K′) and βr+1(K′) are full then a further bucket is determined by use of the displacement algorithm. The minor M(K) is substituted for the displaced entry in bucket βj+1(K) and is chained appropriately.

(v) If application of Step (iv) leads to a bucket βj+1(K), or βr+1(K), that contains no nonempty index entry whose chained content entries are all contained within it, then the entire hash file must be expanded by use of one of the procedures described at the beginning of the present section.

It should be emphasized that, although Step (iv) is necessary for completeness, the probability of its use is very low. With a probability of less than 0.01 for a bucket overflow, the probability of use of Step (iv) is less than (0.01)^3.

SEARCH PHASE AND PROBLEM OF MISMATCH

In the previous sections the structure of the hash index file has been discussed with emphasis on details of its creation and update.
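The three overflow functions of Equations 13a, 13b, and 13c can be sketched as follows. This is a minimal illustration assuming nb is a power of two; Python's `random` module stands in for Morris's generator in 13c, which is an assumption of the sketch.

```python
import math
import random

def i_prime(i):
    """I'(K): force the index number odd, as the text requires."""
    return i if i % 2 == 1 else i + 1

def overflow_13a(b0, i, nb):
    # Equation 13a: linear steps of size I'(K); since I'(K) is odd and
    # nb is a power of 2, the sequence visits every other bucket.
    ip = i_prime(i)
    return [(b0 + ip * j) % nb for j in range(1, nb)]

def overflow_13b(b0, i, nb):
    # Equation 13b: quadratic steps; the sequence does not cover the file.
    ip = i_prime(i)
    return [(b0 + ip * j * j) % nb for j in range(1, math.isqrt(nb))]

def overflow_13c(b0, i, nb):
    # Equation 13c: pseudorandom steps seeded by I'(K) (stand-in generator).
    rng = random.Random(i_prime(i))
    return [(b0 + rng.randrange(nb)) % nb for _ in range(nb)]
```

For 13a, oddness of I′(K) makes it coprime with nb, so the nb − 1 steps are a permutation of all buckets other than β0(K) itself, which is the coverage property the text claims.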
During search of the catalog files by use of the inverted index, each search key is processed by the following search algorithm:

Step 1: The search key K is transformed by the hashing function into a virtual hash address B(K), I(K), M(K).

Step 2: The bucket β(K) is read into core.

Step 3: The index entry specified by I(K) is examined. If it is empty then the search key is not present in the data base. If it is not empty then Step 4 is performed.

Step 4: The overflow bit of the index entry specified by I(K) is examined. If it is equal to 1 then Step 5 is performed. If it is equal to 0 then Step 6 is performed.

Step 5: The overflow algorithm is used to determine the address of the required overflow bucket, which is then read into core, and Step 6 is executed.

Step 6: The minor of each entry in the chain of content entries is compared to the minor of the search key's virtual hash address until either a match is found or the chain is exhausted. Whenever the chain leads to an overflow bucket then Step 5 is performed.

Step 7: If a match is found for M(K) then the collision bit of the entry is examined. If it is equal to 0 then Step 9 is performed. If it is equal to 1 then Step 8 is performed.

Step 8: The dictionary entry that corresponds to each content entry in the virtual address collision is read into core and compared to the search key K. If no match is found then the search key is not present in the index.

Step 9: This step is included because there is a small probability that a misspelled search key, or one not present in the hash file, may be transformed into the same virtual address as some key already included in the file. The step consists of reading the corresponding dictionary entry into core and comparing it with the search key. For reasons discussed later in the present section it is desirable to omit this step.
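The core of the search algorithm (Steps 3 through 8) can be condensed into a sketch over a toy in-memory bucket. Overflow buckets and Step 9 are omitted, the latter as the text itself recommends; the data layout and names are illustrative only.

```python
def search(bucket, i, minor, key, dictionary):
    """Return the inverted-index pointer for a key, or None if absent."""
    pos = bucket["index"].get(i)          # Step 3: empty slot -> absent
    while pos is not None:                # Step 6: walk the content chain
        entry = bucket["content"][pos]
        if entry["minor"] == minor:
            if not entry["collision"]:    # Step 7: unique at this virtual address
                return entry["inverted_ptr"]
            # Step 8: virtual-address collision -> confirm via the dictionary
            if dictionary[entry["dict_ptr"]] == key:
                return entry["inverted_ptr"]
        pos = entry["chain"]
    return None

bucket = {"index": {4: 0},
          "content": [{"minor": 9, "collision": 0, "chain": 1,
                       "dict_ptr": 0, "inverted_ptr": 100},
                      {"minor": 11, "collision": 0, "chain": None,
                       "dict_ptr": 1, "inverted_ptr": 200}]}
dictionary = {0: "library", 1: "catalog"}
```

Note that the dictionary is consulted only when the collision bit is set, which is exactly why Step 8 so rarely costs an extra disk access.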
It should be noted that in most instances the search algorithm will not require execution of Steps 5 and 8. In fact, with the hash index files designed as described in the previous sections, the probability of execution of Step 5 is about 0.01 and the probability of execution of Step 8 is about 2^(−16). Consequently, if Step 9 is also omitted the number of disk accesses required to find the index entry corresponding to a search key is approximately 1.01.

The mismatch problem, which gives rise to Step 9 of the search algorithm, is less serious than might be expected. Suppose the hash function distributes the transformed keys uniformly over all hash addresses. The probability that a new, or misspelled, key maps into an existing entry is given by

(14) P_c = N/V.

The probability that a search leads to a mismatch is therefore

(15) P_m = P_s N/V

where P_s is the probability that the search key is misspelled or not in the hash table. Thus, for a hash table of N = 2^16 = 65,536 title words and V = 2^31, an assumption of P_s = 0.1 leads to P_m = 3 × 10^(−6). Because P_m is extremely small, and because each execution of Step 9 requires up to two disk accesses, it is desirable to omit this step. If experience shows that particular new or misspelled search keys occur frequently, and cause mismatches, they may themselves be entered into the hash index file. In fact, some degree of automatic spelling correction may be provided if some common misspellings are included in the hash files and chained to the content entries that correspond to the correctly spelled keys. Correct, but alternative, spellings of search keys may also be treated in the same manner.

SIZE OF HASH FILE FOR TITLE WORDS

Suppose the document collection contains T different titles that comprise a total of W words of which there are N different words. Let w = W/T denote the average number of words in each title. Reid and Heaps35 have reported word counts on the 57,800 titles included on the MARC tapes between March 1969 and May 1970 and have noted that

(16) w = 5.5

(17) log10 N = 0.6 log10 W + 1.2.

Examination of other data bases has led to the conclusion that log N is likely to be a linear function of log W over the range 0 ≤ W ≤ 10^6. For a library of one million titles the Equations 16 and 17 may therefore be used to predict that when T = 10^6 then

(18) W ≈ 5.5 × 10^6 and N = 1.8 × 10^5.

It follows from Equation 6 that if a = 1 the number of bits required in the major is

(19) r = 18.

According to Equation 7, in order to reduce the frequency f of collisions at virtual addresses to 2^(−16) the number of bits required in the entire virtual address is

(20) v = ⌈log2(1.8 × 10^5) + 16 − 1⌉ = 33.

Consequently, the number of bits in the minor is

(21) m = v − r = 15.

However, with such a choice of r then R = 2^18 and the value of the load factor is, in fact,

(22) a = N/R = 0.7.

It follows from Equation 4 that the expected total number p of collisions at virtual addresses is equal to approximately 2. It may be further noted that Murray36 has derived the following approximation for the probability that the number of collisions at virtual hash addresses lies within the range a to d:

(23) P(a, d) = Σ(i=a to d) e^(−p) p^i / i!   (0 ≤ a ≤ d ≤ N)

When p = 2 the equation gives a value of 0.9998 for the probability that the total number of collisions lies between 0 and 8. Thus the above choice of r, v, and m leads to a title word hash table file with excellent virtual address collision properties.

Use of Equation 10 with b = 64 and a = 0.7 leads to the result that the probability of bucket overflow may be reduced to 0.01 by choosing c = 62. In view of the above value of m it proves convenient to allocate 10 bytes of storage for each content entry.
Each entry consists of a 2-byte portion to contain the 15-bit minor preceded by a collision bit, a 1-byte portion to contain a 7-bit chain pointer preceded by an overflow bit, a 3-byte dictionary pointer, and a 4-byte pointer to an inverted index. The 64 one-byte index entries, the 62 ten-byte content entries, and 4 one-byte counters constitute buckets of length 688 bytes. The entire hash file consists of R entries, and hence R/b = 2^12 buckets. Its storage requirement is therefore for 2^12 × 688 = 2.82 × 10^6 bytes.

It may be remarked that nine 688-byte buckets may be stored unblocked in one track of an IBM 2316 disk pack, and that the entire hash file occupies 11.38 percent of the disk pack. When the disk and channel are idle the average time to access such a bucket is the sum of the average seek time, the average rotational delay, and the record transmission time. For storage on an IBM 2314 disk drive the average bucket access time is therefore 60 + 12.5 + 2.8 = 75.3 milliseconds. The average access time for a sequence of accesses could be reduced by suitable scheduling.

SIZE OF HASH FILE FOR LC CALL NUMBERS

For a library of one million titles the number N of call numbers is 10^6. If a = 1 and f = 2^(−16) it follows from Equations 6, 7, 9, and 4 that

(24) r = 20, v = 35, m = 15, p = 16.

With such a choice of r the load factor is approximately equal to 1. Equation 23 gives a probability of 0.9998 that the total number of virtual address collisions lies between 0 and 34. Use of Equation 10 with b = 64 and a = 1.0 shows that the probability of bucket overflow may be reduced to 0.01 by choosing c = 85.

The content entries for LC call numbers may be arranged as for title words except that the 4-byte pointer to an inverted index is replaced by a 3-byte pointer to the compressed catalog file. The bucket length is therefore 64 + 85 × 9 + 4 = 833 bytes.
The storage requirement for the hash file is (2^20/2^6) × 833 = 13.65 × 10^6 bytes, which may be stored in 2184 tracks, or 54.6 percent, of an IBM 2316 disk pack. The average time to access a bucket is 60 + 12.5 + 3.3 = 75.8 milliseconds.

SIZE OF HASH FILE FOR AUTHOR NAMES

In the present section the term "author" will be used to include personal names, corporate names, editors, compilers, composers, translators, and so forth. It will be assumed that for personal names only surnames are entered into the author dictionary. A search query that includes specification of authors with initials is first processed as if initials were omitted, and the resulting retrieved catalog entries are then scanned sequentially to eliminate any entries whose authors do not have the required initials. It will also be supposed that each word of a corporate name is entered separately into the author dictionary, and that the inverted index contains an entry for each term.

In the absence of reliable statistics regarding the distributions of author surnames, words within corporate names, and so forth, the following assumptions have been made in order to estimate the size of the author dictionary and hash file for a library of one million titles:

(i) Personal author names contain 2 × 10^5 different surnames of average length 7 characters.

(ii) The corporate author names include 4 × 10^4 different words of average length 6 characters.

(iii) The author names include 1.6 × 10^4 different acronyms such as IBM, ASLIB, and so forth; their average length is 4 characters.

It is thus supposed that N = 2.56 × 10^5 entries are required in the author hash files. Calculations similar to those of the previous section show that

(25) r = 18, v = 33, m = 15, p = 4, a = 1.0.

Equation 23 gives a probability of 0.9999 that the total number of virtual address collisions lies between 0 and 13.
The probability of bucket overflow may be reduced to 0.01 by choosing c = 85. Content entries of 10 bytes may be arranged as previously described for title words. Hence each bucket requires 918 bytes of storage. The storage requirement for the hash file is (2^18/2^6) × 918 = 3.76 × 10^6 bytes, which may be stored in 586 tracks, or 14.6 percent, of an IBM 2316 disk pack. The average time to access a bucket is 76.1 milliseconds.

STRUCTURE OF DICTIONARY FILES

The structure of the dictionary files for title words and author names is as described by Thiel and Heaps.37,38 Each dictionary file contains up to 128 directories each of which points to up to 128 term strings that may each contain space for storage of 128 terms of equal length. Thus each dictionary file contains up to 2^21 different terms.

The dictionary pointers in the hash files are essentially the codes stored instead of alphanumeric terms in the catalog file. The most frequent 127 title words are assigned dictionary pointers of the form

(26) 10000000 10000000 1XXXXXXX

and do not have corresponding entries in the inverted index file. The last byte forms the code PT used to represent the title word within the compressed catalog file.

The next most frequent 16,384 title words are assigned dictionary pointers of the form

(27) 00000000 1XXXXXXX 1XXXXXXX

or

(28) 10000000 0XXXXXXX 1XXXXXXX

according as there is, or is not, a corresponding entry in the inverted index. The last 2 bytes are used as codes in the compressed catalog file.

The remaining title words are assigned dictionary pointers of the form

(29) 0XXXXXXX 0XXXXXXX 1XXXXXXX

in which the three bytes form the codes PD, PS, and PT respectively. They all have corresponding entries in the inverted index file, and the 3 bytes are used as codes in the catalog file.
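A classifier for the pointer forms of Equations 26 through 29 can be sketched by testing each byte against the stated bit patterns. The function name and return values are illustrative, not part of the original design.

```python
def pointer_form(b1, b2, b3):
    """Classify a 3-byte dictionary pointer: returns (code type,
    code length in bytes, has inverted-index entry)."""
    if b1 == 0x80 and b2 == 0x80 and b3 & 0x80:
        return ("PT", 1, False)         # Eq. 26: most frequent 127 words
    if b1 == 0x00 and b2 & 0x80 and b3 & 0x80:
        return ("PS,PT", 2, True)       # Eq. 27: inverted entry present
    if b1 == 0x80 and not b2 & 0x80 and b3 & 0x80:
        return ("PS,PT", 2, False)      # Eq. 28: no inverted entry
    if not b1 & 0x80 and not b2 & 0x80 and b3 & 0x80:
        return ("PD,PS,PT", 3, True)    # Eq. 29: remaining words
    raise ValueError("not a valid dictionary pointer")

# Average code length in the compressed catalog file (see next section):
print(round(0.50 * 1 + 0.45 * 2 + 0.05 * 3, 2))   # 1.55 bytes
```

The leading bits make the four forms disjoint: a high bit of 1 in the second byte separates Equation 27 from 29, and the fixed 10000000 first byte separates Equations 26 and 28 from both.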
The reason that terms coded in the form 26 or 28 do not have corresponding entries in the inverted index file is that very frequently occurring terms form very inefficient search keys. Also, previous results suggest that omission of corresponding entries in the inverted index allows its size to be reduced by about 50 percent.39,40

The codes of type PT, (PS, PT), and (PD, PS, PT) are used respectively for approximately 50 percent, 45 percent, and 5 percent of the title words. The average length of the coded title words in the compressed catalog file is therefore 1.55 bytes.

Associated with each dictionary file there is a directory of length 512 bytes whose entries point to the beginnings of term strings within the dictionary file and also indicate the lengths of the terms. Within the hash table file a dictionary pointer of the form PD, PS, PT points to the PTth term of the PSth term string in the dictionary associated with the PDth directory. There is a single directory associated with each set of pointers of type PT and PS, PT.

The average length of the 1.8 × 10^5 different title words is 7.6 characters, and hence the entire set of term strings requires 1.8 × 10^5 × 7.6 = 1.37 × 10^6 bytes for storage of title words. Since twelve directories occupying 12 × 512 = 6144 bytes will be required, and since some term strings will contain unfilled portions, the storage requirement of the dictionary file will be slightly larger. If the title word dictionary is stored on disk in 1,000-byte records then the storage requirement is 238 tracks, or 5.95 percent, of an IBM 2316 disk pack.

The assumptions made previously regarding author names imply an author dictionary size of 1.70 × 10^6 bytes and sixteen directories whose total storage requirements are 16 × 512 = 8,192 bytes. Using an IBM 2316 disk pack the storage requirement is for 286 tracks, or 7.15 percent.
On completion of a search through use of the inverted index file there results a set of sequence numbers that indicate the position of the relevant items in the compressed catalog file. Before such items are displayed to a user of the system, each term must be decoded through access to the directory and dictionary to which it points.

The time required to decode a catalog item depends on how the directories and dictionaries are partitioned between disk and core memory. Several partitioning schemes for title words have been analysed, and the results are summarized in Table 2.

52 Journal of Library Automation Vol. 6/1 March 1973

In the calculations used to obtain Table 2 it is assumed that title words occur with the frequencies listed by Kucera and Francis (41). It is supposed that both the directory and term strings corresponding to codes of form P_T are stored in a single physical record, that every other directory is contained wholly within a physical record, and that each dictionary term may be located by a single access to a term string. Any required CPU time is regarded as insignificant compared to the time needed for file accesses.

From the results shown in Table 2 it appears that the best partition between core and disk is probably that which gives an average decode time of 42 milliseconds while requiring a dedicated 1501 bytes of core memory. This results when core is used to store both the directories and term strings for terms that correspond to pointers of type P_T, and the directories only for terms that correspond to pointers of type (P_S, P_T).

COMPRESSED CATALOG FILE

Since the title word codes stored in the compressed catalog file have an average length of 1.55 bytes, whereas uncoded title words and their delimiting spaces have an average length of 6.5 characters, the compressed title fields occupy only 24 percent of the storage required for uncompressed words.
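The 24 percent figure follows directly from the two averages just quoted:

```python
avg_coded_len = 1.55   # bytes per coded title word (derived above)
avg_plain_len = 6.5    # characters per uncoded title word plus delimiting space
ratio = avg_coded_len / avg_plain_len   # fraction of uncompressed storage used
# ratio is roughly 0.24, i.e. 24 percent
```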
Uncoded author names and their delimiting spaces have an average length of 7.6 characters and are coded to occupy not more than 3 bytes; hence coding of author names effects an average compression factor of less than 3/7.6 = 40 percent. For LC call numbers the compression factor is less than 30 percent. Clearly, subject headings, publisher names, and series statements may be coded with even more effective compression factors.

The saving in space through compression of the catalog file may be translated into a cost saving as follows. If there are an average of 5.5 words in each title then one million titles include 5.5 x 10^6 title words and delimiting spaces which, if stored in the catalog file in uncoded form, would require 3.63 x 10^7 bytes (42). When stored in coded form the requirement is for 8.54 x 10^6 bytes. Charges for disk space vary considerably with different computing facilities. At the University of Alberta users of the IBM 360 Model 67 are charged a monthly rate of $.50 for each 4,096 bytes of disk storage. Thus, for title words alone the advantage of storing the catalog file in compressed form is to allow the monthly storage cost to be reduced from $4,440 to $950.

CONCLUDING REMARKS

The results reported in the present paper indicate that a satisfactory structure for a catalog file may be designed to use the concept of virtual hash addressing and storage of terms in compressed form. Access and decoding times may be reduced to acceptable amounts.

It may prove advantageous to arrange the items in the catalog file in the order of their call numbers. This will tend to reduce the number of disk accesses needed to retrieve catalog items in response to queries since it will tend to group relevant items. However, the benefits should be weighed against the additional expense required to maintain and update the ordered file.

Table 2. Average Time to Decode a Title Word of the Compressed Catalog File.

Core Resident       Core Resident       Average Number   Average Decode        Dedicated Core
Directories         Term Strings        of Accesses      Time (milliseconds)   Memory (bytes)
None                None                1.50             115                   0
P_T                 P_T                 1.01             77                    989
P_T, (P_S,P_T)      P_T                 0.55             42                    1501
All                 P_T                 0.50             39                    7133
P_T, (P_S,P_T)      P_T, (P_S,P_T)°     0.49             38                    2474
All                 P_T, (P_S,P_T)°     0.44             34                    8106

(P_S,P_T)° signifies the 128 most frequent of the codes (P_S, P_T).

The present paper has omitted discussion of the form of the query language or the search algorithm that operates on the elements of the inverted index. A formal definition of one form of query language has been discussed by Dimsdale (43). Details of a search algorithm and structure of a compressed form of inverted index have been discussed by Thiel and Heaps (44). It may be noted that each content entry in the hash table file has 4 bytes reserved for a pointer to a bit string of the inverted index. Whenever the bit string is less than 4 bytes in length it is stored in the content section and no pointer is required. Storage of such bit strings within the content entries significantly reduces the storage requirements of the inverted index and also reduces the number of required disk accesses in the search phase of the program.

ACKNOWLEDGMENT

The authors wish to express their appreciation to the National Research Council of Canada for their support of the present investigation.

REFERENCES

1. D. Lefkovitz and R. V. Powers, "A List-Structured Chemical Information Retrieval System," in G. Schecter, ed., Information Retrieval (Washington, D.C.: Thompson Book Co., 1967), p.109-29.
2. P. R. Weinberg, "A Time Sharing Chemical Information Retrieval System" (Doctoral Thesis, Univ. of Pennsylvania, 1969).
3. R. M. Curtice, "Experimental Retrieval Systems Studies. Report No. 1. Magnetic Tape and Disc File Organization for Retrieval" (Master's Thesis, Lehigh Univ., 1966).
4. D.
Lefkovitz, File Structures for On-Line Systems (New York: Spartan Books, 1969).
5. I. B. Holbrook, "A Threaded-file Retrieval System," Journal of the American Society for Information Science 21:40-48 (Jan.-Feb. 1970).
6. G. G. Dodd, "Elements of Data Management Systems," Computing Surveys 1:117-33 (June 1969).
7. J. W. Rettenmayer, "File Ordering and Retrieval Cost," Information Storage and Retrieval 8:19-93 (April 1972).
8. R. T. Divett, "Design of a File Structure for a Total System Computer Program for Medical Libraries and Programming of the Book Citation Module" (Doctoral Thesis, Univ. of Utah, 1968).
9. H. P. Burnaugh, "The BOLD (Bibliographic On-Line Display) System," in G. Schecter, ed., Information Retrieval (Washington, D.C.: Thompson Book Co., 1967), p.53-66.
10. Lefkovitz, Powers, "A List-Structured Chemical Information," p.109-29.
11. Lefkovitz, File Structures for On-Line Systems, p.141.
12. Ibid., p.177.
13. F. G. Kilgour, "Concept of an On-Line Computerized Catalog," Journal of Library Automation 3:1-11 (March 1970).
14. J. L. Cunningham, W. D. Schieber, and R. M. Shoffner, A Study of the Organization and Search of Bibliographic Holdings Records in On-Line Computer Systems: Phase I (Berkeley: Univ. of California, 1969).
15. R. S. Marcus, P. Kugel, and R. L. Kusik, "An Experimental Computer Stored, Augmented Catalog of Professional Literature," in Proceedings of the 1969 Spring Joint Computer Conference (Montvale: AFIPS Press, 1969), p.461-73.
16. J. W. Henderson and J. A. Rosenthal, eds., Library Catalogs: Their Preservation and Maintenance by Photographic and Automated Techniques; M.I.T. Report 14 (Cambridge, Mass.: M.I.T. Press, 1968).
17. I. A. Warheit, "File Organization of Library Records," Journal of Library Automation 2:20-30 (March 1969).
18. R. Morris, "Scatter Storage Techniques," Communications of the ACM 11:38-44 (Jan. 1968).
19. D. M.
Murray, "A Scatter Storage Scheme for Dictionary Lookups," Journal of Library Automation 3:173-201 (Sept. 1970).
20. W. Buchholz, "File Organization and Addressing," IBM Systems Journal 2:86-111 (June 1963).
21. P. L. Long, K. B. L. Rastogi, J. E. Rush, and J. A. Wyckoff, "Large On-Line Files of Bibliographic Data: An Efficient Design and a Mathematical Predictor of Retrieval Behavior," in Information Processing 71 (North Holland Publishing Company, 1972), p.473-78.
22. Buchholz, "File Organization," p.102-3.
23. W. P. Heising, "Note on Random Addressing Techniques," IBM Systems Journal 2:112-16 (June 1963).
24. Murray, "A Scatter Storage Scheme," p.178.
25. Ibid., p.181.
26. G. Schay and W. G. Spruth, "Analysis of a File Addressing Method," Communications of the ACM 5:459-62 (August 1962).
27. M. Tainter, "Addressing for Random-Access Storage with Multiple Bucket Capacities," Journal of the ACM 10:307-15 (July 1963).
28. Heising, "Note on Random Addressing," p.112-16.
29. W. H. Beyer, Handbook of Tables for Probability and Statistics (Cleveland: The Chemical Rubber Company, 1966).
30. Buchholz, "File Organization," p.99.
31. Morris, "Scatter Storage," p.42.
32. W. D. Maurer, "An Improved Hash Code for Scatter Storage," Communications of the ACM 11:35-38 (Jan. 1968).
33. J. R. Bell, "The Quadratic Quotient Method: A Hash Code Eliminating Secondary Clustering," Communications of the ACM 13:107-9 (Feb. 1970).
34. Morris, "Scatter Storage," p.40.
35. W. D. Reid and H. S. Heaps, "Compression of Data for Library Automation," in Canadian Association of College and University Libraries: Automation in Libraries 1971 (Ottawa: Canadian Library Association, 1971), p.2.1-2.21.
36. Murray, "A Scatter Storage Scheme," p.183.
37. L. H. Thiel and H. S. Heaps, "Program Design for Retrospective Searches on Large Data Bases," Information Storage and Retrieval 8:1-20 (Jan. 1972).
38. H. S.
Heaps, "Storage Analysis of a Compression Coding for Document Data Bases," INFOR 10:47-61 (Feb. 1972).
39. Thiel and Heaps, "Program Design," p.15-16.
40. Reid and Heaps, "Compression of Data," p.2.1-2.21.
41. H. Kucera and W. N. Francis, Computational Analysis of Present-Day American English (Providence: Brown University Press, 1967).
42. Reid and Heaps, "Compression of Data," p.2.4.
43. J. J. Dimsdale, "Application of On-Line Computer Systems to Library Automation" (Master's Thesis, Univ. of Alberta, 1971), p.50-68.
44. Thiel and Heaps, "Program Design," p.1-20.

The New York Public Library Automated Book Catalog Subsystem

S. Michael MALINCONICO: Assistant Chief, Systems Analysis and Data Processing Office, and James A. RIZZOLO: Chief, Systems Analysis and Data Processing Office, The New York Public Library.

A comprehensive automated bibliographic control system has been developed by the New York Public Library. This system is unique in its use of an automated authority system and highly sophisticated machine filing algorithms. The primary aim was the rigorous control of established forms and their cross-reference structure. The original impetus for creation of the system, and its most highly visible product, is a photocomposed book catalog. The book catalog subsystem supplies automatic punctuation of condensed entries and contains the ability to produce cumulation/supplement book catalogs in installments without loss of control of the cross-referencing structure.

BACKGROUND

In 1965 studies confirmed what much of the New York Public Library's administration had long felt: the public card catalog of the Research Libraries, containing entries dating back to 1857, was rapidly deteriorating.1 It was estimated that 29 percent of the cards were illegible, damaged, or in some other way unusable. Further, cataloging and card filing arrearages were monotonically increasing at an alarming rate.
Increases in labor costs were eroding all efforts to cope with these problems manually. In addition, the deputy director at that time (now director), John M. Cory, realized that a wider base of support was absolutely essential to the survival of the New York Public Library as an institution.

As a result of these disquieting observations, three logical conclusions followed. First, the existing card catalog would have to be closed off, rehabilitated, and photographically preserved. Second, available technology should be explored as a possible solution to some of the spiraling arrearage problems. In particular the applicability of computer technology was to be explored. This exploration appeared to offer some most attractive long-term solutions. The capture of all future cataloging in a machine-readable form would obviate for all time the deterioration problem. This strategy could also provide a basis for a check against spiraling costs, since traditionally unit costs have tended to increase in manual and decrease in automated systems.2 Seen within the context of the MARC project at the Library of Congress (LC), the economies were becoming manifestly obvious. The long-term benefits to the entire library community of a national network of shared machine-readable bibliographic data could not be denied. Capture of data in machine-readable form for use by information retrieval systems which might become economically feasible in the near future had to be viewed as a matter of great value. Third, wider access to the resources of the New York Public Library had to be provided if a wider base of support for the library's operation was to be sought.

The solution decided upon was the development of an automated bibliographic control system capable of producing photocomposed book catalogs.
The book catalog would then serve as the prospective catalog and augment the retrospective card catalog, which would also appear in book form following photographic duplication of the cards.3 This solution, at one stroke, addressed itself to all three of the major problems, and showed great promise as a future investment. Reproducible book catalogs could be widely distributed. A machine-based system would eliminate manual filing, would take full advantage of cataloging available from MARC, and would begin at the earliest possible time the establishment of an invaluable machine-readable bibliographic data base.

Photographic techniques had already been employed in producing book catalogs, e.g. the National Union Catalog, the Book Catalog of the Free Library of Philadelphia, and the Enoch Pratt Free Library catalogs, among others.4 Computer-produced book catalogs embodying various techniques (computer line printing, photo-typesetting, etc.) and levels of sophistication were being produced by many institutions, e.g. Harvard University's Widener Library Shelflist, Stanford University's Undergraduate Library Catalog, and Baltimore County Public Library's catalog, among others.5-7 An extensive review of various types of book catalogs including typical pages of each is given by Hilda Feinberg.8

Following extensive studies conducted by Messrs. Henderson, Rosenthal, and Nantier of the NYPL Research Libraries, the Systems Analysis and Data Processing Office (SADPO) was formed, staffed by EDP and library specialists, to be completely dedicated to the solution of problems of automated bibliographic control and library automation. From the beginning it was decided that if EDP technology were to be utilized, it should be utilized in a manner which took full advantage of the properties of the medium. The computer was not to be used as an ultrasophisticated and costly printing press.
The application of new technology to a field will invariably lead to waste and awkward results if the intrinsic properties of the technology are not fully utilized. The fundamental properties of EDP technology lie in its abilities to:
1. Reorganize and combine data;
2. Select items meeting a set of predefined conditions;
3. Maintain a permanent but flexible correlation between items;
4. Transform a set of conditions into data;
5. Perform all of the above with remarkable speed and accuracy;
6. Perform all operations with a merciless consistency.

Thus, it was realized, at the outset of the project at NYPL, that technology could provide a great deal more than the maintenance of a machine-readable record and its reorganization for display. A rigorous control of bibliographic data was possible, and would extract maximum utility from any investment in EDP technology. It was with these ideas in mind that machine-based authority control and filing systems were developed. The authority control file provides the fundamental utility of the system. Control of data usage has always been of paramount concern to the professional bibliographer. It becomes even more important in a machine-based system in which the data lie in an essentially invisible form until a fairly complex display operation is performed.

Advantages of an Authority File

Another bibliographic aid which the computer could provide through an authority control system was the maintenance and integrity of a cross-reference structure. In addition, one of the classical functions of cross-referencing could be eliminated: it would no longer be necessary to direct a user from one classification which has been used extensively to a newer one when terminology changes. Consider the problems which might arise if the Library of Congress were to change its current usage of the heading Aeroplane to Airplane.
It would be virtually impossible, under a manual system, for a library to attempt to locate, alter, and refile all cards bearing the tracing Aeroplane. With a central authority file the problem is reduced to a single transaction and a fraction of a second of effort by the computer. The change is effected with an accuracy unattainable in a manual system. Finally, the common nuisance of a cross-reference leading to yet another cross-reference is automatically obviated.

The presence of a machine-readable authority file and the ability to verify use of all forms against this central authority, with machine accuracy, eliminates all clerical errors in the usage of names and headings to which a manual system is susceptible. The problem of consistent usage is greatly compounded in a machine-based system which does not provide mechanical verification. Inconsistencies in any automated system generally tend to diminish its utility, and invariably lead to ludicrous results. Nonetheless, inconsistencies of usage in an automated system are more readily corrected than those in a manual system. The existence of a central authority file, however, reduces the operation to maximum simplicity and allows no deviation from established standards.

While maximum rigor in machine control was attempted, an attempt was also made to shield the professional librarian, who would be using the system, from as much of the tyranny imposed by the machine as possible. In the system finally adopted, the librarian need only exercise care when establishing a form. Following establishment of the form, the cataloger need not be concerned with any of the details of the entry, such as punctuation, accent marks, MARC delimiting or categorization. The authority subsystem supplies all such details. In short, the cataloger is only required to spell the form correctly.
The machine will identify any incorrect usage; thus a great deal of tedious and time-consuming (and thereby costly) manual searching is eliminated.

At the same time that work began on the automated system at NYPL, extensive activity in library automation was also in progress in many other parts of the country, involving virtually all areas of library operation: cataloging, acquisitions, serials control, circulation, and reference services (information retrieval). Since, at NYPL, it was assumed that the bibliographic data base and its control would form the cornerstone of each of these systems, cataloging was given first priority. This approach differed from that taken at other institutions; others, Columbia University for example, chose to develop an acquisitions system first.9 Still others developed highly sophisticated circulation systems, Ohio State University being notable among these.10

Even among those institutions which chose to address themselves to the problems of automated cataloging, important differences in approach were evident. These differences were largely a result of attempts to solve different types of problems related to cataloging. Among the many projects initiated at that time two will be mentioned, as they are representative of the differences in approach to automated cataloging.

The first is represented by the University of California Union Book Catalog project, undertaken by the Institute of Library Research (ILR). This system is characterized by an attempt to minimize, via computer programming, manual intervention in data preparation. Employing the technique of Automatic Format Recognition, the ILR staff attempted to find the most economical means of rendering a vast amount of retrospective data into machine-readable form.11 In converting such a large amount of data they had to also concern themselves with the statistical error levels to be expected from keying.
Having decided that extensive manual edit was too time-consuming and costly, and itself prone to statistical error, they attempted to create computer programs which would use the massive amounts of data as a self-editing device. In a sense, ILR used the nature of the problem as its own solution. The goal of the project was the production of a book catalog representing a five-year cumulation (1963-1967) of materials on the nine University of California campuses, and a MARC-like data tape clean enough for print purposes. NYPL, on the other hand, decided to consider only prospective materials in a continuously published catalog, and the creation of a MARC-like record which would approach in completeness, as closely as was economically feasible, that created by the Library of Congress. To this end manual tagging and editing were absolutely essential.

The second system to be considered is the shared cataloging system developed by the Ohio College Library Center.12 The primary emphasis here is on the economy to be derived by instantaneous access to the combined cataloging efforts of a cooperating group of libraries. At OCLC the primary emphasis was placed on on-line bibliographic data input and access. The major bibliographic product to be produced was a computer printed card set. The overriding consideration of OCLC was the sharing of resources among many users, while at NYPL the major concern was the content integrity of a single user's file.

Advantages of a Book Form Catalog

A book form catalog has several advantages over a card form catalog: it is portable, compact, more readily scanned, and extremely simple to reproduce. When coupled with an automated system for maintenance and production the advantages are greatly magnified, as manual filing is virtually eliminated.
The format, sequencing, and usage of terms in a book catalog may be varied at will to accommodate users' needs and library service policies. Advantages and disadvantages of book catalogs are summarized in the introduction to Tauber and Feinberg's collection of articles on book catalogs.13 Comparisons of book versus card catalogs are presented by Catherine MacQuarrie and Irwin Pizer in articles reprinted in the work cited above.14, 15

The most obvious advantage of the book catalog is its portability. Wide availability of the catalog of a library's collection makes possible a level of service not economically feasible under any other system. Access to the complete collection of a library system can be made available economically to every educational institution in the region served by the system. Access to a highly valuable research collection can be made available to a much wider geographic region than was hitherto possible. The concept of a union catalog for a region becomes much more viable, making possible regional cooperation in acquisitions policies and relieving the burden of heavily duplicated collections currently borne by library systems within manageable geographic regions. Such cooperative ventures allow the cost of maintaining the catalog to be defrayed among the various members of the consortium. Thus, a book form catalog would appear to provide groups of libraries with the possibility of operating economies, while increasing the overall level of service to the public they serve.

The utility of a book form union catalog has already been demonstrated by the experience of the Mid-Manhattan Libraries in New York. Mid-Manhattan, a central circulating library, consists essentially of five libraries in two locations. Provision of complete bibliographic access with a traditional card catalog would require the manual maintenance of five individual and two union catalogs.
The utility of the Mid-Manhattan catalog has been further increased with the inclusion of the entire NYPL Branch Library system in January 1973.

A library's internal operation benefits by wide availability of the catalog, as individual copies of the catalog can be made available to the acquisition division, the cataloging division, and each of a library's special collection administrators, making references to the traditional Official Catalog more efficient; such has been the experience of NYPL. Baltimore County Public Library reports a similar finding.16

A perhaps hidden advantage of a book catalog lies in its compactness. A book catalog requires neither the space nor the expensive furniture required by a card catalog. The problem of space becomes more and more acute as the "information explosion" continues to mushroom. An ironic squeeze is encountered in that the collection yearns more and more for the space occupied by the catalog, while the catalog, in growing, continues to make its own demands on available space.

DESCRIPTION OF THE NYPL BIBLIOGRAPHIC SYSTEM FILES

Before attempting to describe the book catalog subsystem, we shall briefly describe the nature of the files from which the bibliographic data are drawn. The complete bibliographic system consists of four major files and computer programs for their control and maintenance.* The files are:
1. Complete MARC data base (updated weekly with all changes and additions) from which cataloging may be drawn;
2. Bibliographic master file;
3. Authority master file;
4. Bibliographic/Authority linkage file.
For the purpose of this discussion we shall take the existence and main- tenance of these files for granted, and concern ourselves solely with their use in the production of photocomposed book catalogsP Bibliographic Master File This file contains unit records for each bibliographic item in the collec- tion; t books and book-like materials, monographs, serials, analytics and in- • The system actually consists of three independent sets of such files (MARC is com- mon to all)-one each for the Research Libraries, the Branch Libraries, and the Dance Collection. t Separate data bases are maintained for the Research and Branch Library systems. The Research Libraries file contains all book and certain book-like material added to its collections since January 1971. The Branch Libraries' file contains all holdings of books and book-like materials of the Mid-Manhattan Library collections. This file currently duplicates to a large extent the holdings of the rest of the Branch System, and will eventually encompass the entire system. Automated Book Catalog Subsystem/MALINCONICO 9 dexing items are included.: The information content is identical to that of MARC records. Tagging and delimiting adhere to the MARC conven- tions except in those cases in which it was necessary to expand delimiting in order to enhance the functional utility of the MARC coding structure. Some data distinctions which MARC has since dropped, but which are nonetheless useful, have been retained. The expansions consist of the ad- dition of several delimiters not used by MARC in order to provide filing forms (which are automatically generated, but which may be manually overridden) for titles, and sequencing information for volume numbers of series and serials. Transformations from a MARC II communications for· mat to the NYPL format and vice versa are possible due to the isomor· phism of the two records. 
The transformation of MARC II format records into NYPL processing format is carried out in the normal course of processing, in which MARC records are selected for addition to the NYPL files.

Authority Master File

This file is the central repository of all established forms. Names (personal, corporate, and place), series titles, uniform titles, conventional titles, and topical subject headings are all established on this file. Categorization of each form is controlled by this file. No form is accepted for use in a bibliographic record unless it matches a form already established on the authority file, and it must be used consistently with the categorization assigned to it; e.g., a form categorized as a topical subject is never permitted as an author, a series title may only match a form categorized as a title, etc.

The cross-reference and note structures are maintained on this file. An additional heading employed by NYPL, the dual entry, which falls conceptually halfway between a cross-reference and a subject heading, is also controlled here. The dual entry heading serves to bring together, under a non-LC heading, bibliographic items which NYPL considers unique by virtue of the nature of its collection. An example might be found in the genealogy division, which contains a very extensive collection dealing with New York City. Use of the dual entry allows a sequencing under both a subject heading indirectly regionalized to New York City (the LC heading) and, at the same time, a drawing together of all items about New York City into a single sequence headed by New York City. Take, for example, the LC established heading Elections-New York (City); NYPL automatically causes all items traced to the above heading to appear under both the LC heading and the dual entry New York (City)-Elections (Figure 1). The dual entry merely provides an alternate form of organization for display. No bibliographic tracing is permitted directly to a dual entry. The additional entry point is automatically created when a catalog is printed. Manual effort by the cataloger in order to provide the additional entry point is prevented; in addition, the bibliographic record remains rigorously MARC-compatible.*

* At NYPL a distinction is made between analysis and indexing of a work. The latter refers to selective analysis, used when it is desired to provide, for example, subject access to a significant article in a periodical without creation of the series added entry. There are two types of indexing provided by the NYPL system. The first creates only a subject tracing; such treatment might be accorded an article of topical significance by a staff writer of a popular periodical. The second would create both an author and a subject entry; this might be used in the case of an author of note writing on a significant subject in a popular periodical, e.g., Norman Mailer writing on political conventions for Esquire magazine.

Journal of Library Automation Vol. 6/1, March 1973. Automated Book Catalog Subsystem / MALINCONICO.

Fig. 1. The NYPL Research Libraries Dictionary Catalog, July 1972: CIU-F page 201 on the left, and L-N page 297 on the right. Dual entries under New York (City) are shown on the right. This catalog was produced in 6 and 8 pt. type set on 8 pt. body.

Automatic control of cross-references, dual entries, and the en masse alteration of classification are facilitated by the authority subsystem together with the correlative and reorganizational capabilities of the computer. There is some irony in the relative ease with which the computer allows such individualized organization of data to be effected, given the computer's richly deserved reputation for imposing a bland uniformity on its victims.

The authority file provides one other invaluable service: it controls, in a single location, the filing forms to be associated with a heading. Consistency of filing is assured and, again, extreme simplicity of alteration is possible. Only one record need be changed in order to alter the filing of the entire body of material associated with a heading. Filing forms are automatically generated, with provision made for a manual override.
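The matching and categorization rules of the authority file, including the automatic dual entry, can be sketched as follows. This is a toy in-memory model, not NYPL's actual file structure; all field names and the dictionary layout are invented for illustration.

```python
# Sketch of the authority-matching rule: a heading is accepted only if it
# matches an established form AND is used consistently with the category
# assigned to it on the authority file.  The dual entry, when present, is
# generated automatically; the cataloger never traces to it directly.

AUTHORITY_FILE = {
    "Elections-New York (City)": {"category": "topical subject",
                                  "dual_entry": "New York (City)-Elections"},
    "Mailer, Norman": {"category": "name", "dual_entry": None},
}

def validate_heading(form, used_as):
    """Reject a tracing unless the form is established and its usage
    matches the categorization carried on the authority file."""
    entry = AUTHORITY_FILE.get(form)
    if entry is None:
        return False, "form not established on authority file"
    if entry["category"] == "topical subject" and used_as == "author":
        return False, "a topical subject is never permitted as an author"
    # the associated dual entry (if any) is created at catalog-print time
    return True, entry["dual_entry"]

ok, dual = validate_heading("Elections-New York (City)", "subject")
print(ok, dual)   # True  New York (City)-Elections
```

The point of the sketch is the single point of control: changing one authority record changes every entry filed under it.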
Automatic filing has been found to be correct in better than 95 percent of the cases currently in use. The remaining 5 percent require manual intervention. The machine filing algorithms are based on language and on MARC categorization and delimiting (18). Initial articles are dropped in each of thirty-eight languages, including the major languages transliterated into a romanized alphabet (those employing Cyrillic alphabets, oriental languages, Hebrew, and Yiddish). Chronological subdivisions are filed automatically, observing rules regarding inclusiveness of dates, etc. Important chronological periods (currently fifty-four such periods) are recognized and filed automatically, e.g., the American Revolutionary and Civil Wars, the French Revolutions, Chinese dynasties, the Middle Ages, etc. Roman enumeration is automatically filed in correct decimal sequence.

Bibliographic/Authority Linkage File

The basic function of the bibliographic/authority linkage file is to provide a communications channel between the two major files by assigning to each authority form a neutral unique number. The linkage file then provides access to the established form regardless of the metamorphoses it may have undergone since its original use (the number remains inviolate). Each authority, upon addition to the file, is assigned a unique number; however, the authority file is sequenced by an alphabetic sort key. This sort key bears no logical relationship to the filing form of the heading; it is constructed by dropping punctuation and accent marks, converting to upper case, dropping multiple blanks, and appending a hash total. The linkage file maintains the correspondence between authority control number and alphabetic sort key. Only the authority control numbers, determined by the first bibliographic/authority file match for each field, are carried in the bibliographic records.
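The sort-key construction just described can be sketched directly from the four stated steps. The article does not specify how the hash total was computed, so the hash below is an arbitrary stand-in, and the whole function is an illustration rather than the original algorithm.

```python
import string
import unicodedata

def sort_key(heading):
    """Build the alphabetic sort key described above: drop punctuation
    and accent marks, convert to upper case, collapse multiple blanks,
    and append a hash total (the hash shown is illustrative only)."""
    # strip accents by decomposing characters and discarding combining marks
    decomposed = unicodedata.normalize("NFD", heading)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = "".join(c for c in no_accents if c not in string.punctuation)
    collapsed = " ".join(no_punct.upper().split())
    hash_total = sum(ord(c) for c in collapsed) % 10000
    return f"{collapsed}#{hash_total:04d}"

print(sort_key("Élections-New  York (City)"))
```

Two headings that differ only in punctuation, accents, case, or spacing collapse to the same key, which is exactly why the key bears no relationship to the filing form and must be tied back through the linkage file.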
In addition, information is provided to the book catalog subsystem regarding changes in the authority file (alteration of established forms, etc.) which would cause an entry exhibiting such alterations to be immediately regenerated for inclusion in a book catalog supplement. Appropriate action is taken against the bibliographic file when activity to an authority heading is sensed by the book catalog subsystem. The presence of a dual entry form, which will require the creation of an additional entry under the associated variant form, is also indicated here.

Alternative Input Files

It should be mentioned that the full set of files described above is not a mandatory requirement for creation of a book catalog. A bibliographic file in a MARC II communications format alone will suffice. We have performed tests using both another library's data file and the MARC file as sole input to the system. Using unmodified file update software, we have generated from these MARC II format data bases complete authority files, and thence book catalogs. No cross-references or scope notes are possible in this mode of operation, since MARC makes no provision for them. A further experiment was performed using another library's data base (in MARC II format) in combination with the cross-reference structure of the NYPL authority file. This led to highly satisfactory results, demonstrating that a photocomposed catalog could be created, and exhibiting the utility of the input file enhanced by cross-references.§
THE PHOTOCOMPOSED BOOK CATALOG SUBSYSTEM

The system for production of book catalogs represents only the visible tip, albeit a large and complex tip, of the entire bibliographic system. It consists, in all, of ten computer programs and several score modules. The system was designed with thought toward production of catalogs with a variety of output options. In most cases, these options can be attained by the elimination of entire programs or modules. Space does not permit a consideration of all possible variations; the most important will be mentioned in the course of the discussion.

One consideration deemed of paramount importance was to remain as independent of photocomposition hardware as possible. Photocomposition is yet in its infancy; hence, it was decided that an inextricable commitment to a particular device was to be avoided. The final approach taken was the design, by SADPO, of generalized photocomposition software which is responsive to device-independent typographic commands. The only function of this software is to accept, as input, completely defined text data and typographic instructions, from which it generates formatted pages. This task is accomplished via a translation of device-independent into device-particular commands in the form of a photocomposition device driver tape. Should a new or more desirable photocomposition device become available, or significant advantage be found in employing a different photocomposition vendor, only one program need be altered. The photocomposition software is completely generalized and can be used to generate anything from book catalogs to typeset prose, in virtually any format (see the section on the pagination program for a discussion of the formatting options provided). Figures 2, 3, and 4 demonstrate some of the possibilities.

The creation, organization, and control of data to appear in the catalog was undertaken as a completely distinct set of programming tasks.

§ The September 1972 Hennepin County Public Library book catalog was published with a bibliographic data base produced by the Hennepin County Library combined with the NYPL Research Libraries' authority file.

Design Objectives of the Book Catalog System

Before embarking upon a discussion of the technical aspects of each
Fig. 2. The NYPL Research Libraries Dictionary Catalog Supplement, November 1972: A-Z page 1. This catalog was produced in 6 and 7 pt. type set on 7 pt. body utilizing a three column format. Captions are in 8 pt. type.

Fig. 3. The Mid-Manhattan Names Catalog, April 1972: A-CIT page 20. This is a divided catalog produced in a two column format utilizing 6 and 8 pt. type set on 8 pt. body.
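The device-independence layer described above can be sketched as a single translation table that maps device-independent typographic commands onto device-particular byte codes for the driver tape. The command names, byte codes, and target device here are all invented; retargeting a new photocomposition device would mean swapping only this table, which is the design point the article makes.

```python
# Sketch of generalized photocomposition software: completely defined text
# plus device-independent commands in, device-particular driver tape out.
# All command names and byte codes below are illustrative inventions.

DEVICE_TABLE = {                 # the one device-specific piece
    ("FONT", "6pt"):   b"\x1bF6",
    ("FONT", "8pt"):   b"\x1bF8",
    ("NEWLINE", None): b"\x0a",
}

def drive(commands):
    """Translate (command, argument, text) triples into a driver-tape
    byte stream for one particular photocomposition device."""
    tape = bytearray()
    for cmd, arg, text in commands:
        tape += DEVICE_TABLE[(cmd, arg)]     # device-particular command
        tape += text.encode("ascii")         # completely defined text data
    return bytes(tape)

page = [("FONT", "8pt", "NEGRO ART"), ("NEWLINE", None, ""),
        ("FONT", "6pt", "Chase, Judith Wragg.")]
print(drive(page))
```

Because only the table (and the one program around it) knows the device, the catalog-building programs upstream never change when the vendor does.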
processing step, we shall state the objectives which we set out to meet, and the constraints, generally economic, under which they were met.

Fig. 4. The Mid-Manhattan Titles Catalog Supplement, July 1972. This page was created as a test utilizing 4 pt. type on 4 pt. body. The actual supplement created for use by the public was created in 6 and 8 pt. type.

Method of Publication

As it is economically impractical to publish the entire catalog on a very frequent basis, a cumulation/supplement scheme was adopted. Two basic types of supplements are possible: (1) a supplement containing only new items for the period represented; or (2) a cumulative supplement containing all items new to the system since the last appearance of a cumulation of the entire collection, automatically replacing all previous supplements. The latter is more costly than the former, but its economy was forgone in favor of convenience to the user. Under the scheme adopted, a user has, at any time, only three sources to consider: the retrospective catalog,† the prospective cumulation, and the cumulative supplement.
We have derived several optimization formulae for reaccumulation schedules (19). Application of these formulae indicated a reaccumulation cycle of approximately one year, assuming that supplements would appear monthly. The formulae also indicated that a small premium would have to be paid for the administrative convenience of spreading the printing and processing load of the cumulation over the span of the entire reaccumulation period, compared to the cost of a complete printing at the beginning of each period. The adopted publication scheme calls for the publication each month of one-twelfth of the cumulation, together with a supplement containing all items which have not yet appeared in the cumulation and those which have been altered since their appearance in a cumulation. The division into twelve segments is table-controlled; the number of segments may be varied from one to sixteen. For example, in January a cumulation is published for the alphabetic span A-B; a supplement is published for the remaining letters of the alphabet. A similar situation occurs the following month, etc. Thus, at any given time the public is presented with a set of volumes representing the cumulated catalog and a supplement which contains all material not found in the former. The public is unaware that the cumulation is being cyclically updated; they are aware only that they have no more than three sources to consult: (1) the old card catalog, (2) the basic cumulative book catalog, and (3) the cumulative supplement. The fact that entries are migrating from the supplement to the basic cumulation each month is of no consequence from the standpoint of catalog usage.

The decision governing representation of an item in a cumulation or supplement is made on an entry by entry basis.
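The rotation and the entry-by-entry routing decision above can be sketched as follows. The segment table and month numbering are illustrative, and the sketch ignores the altered-entry case, which the real system also routes to the supplement.

```python
# Sketch of the twelve-segment publication cycle: each month one alphabetic
# segment of the cumulation is republished, and every entry not yet absorbed
# into a published cumulation segment appears in the cumulative supplement.

SEGMENTS = ["A-B", "C-D", "E-F", "G-H", "I-J", "K-L", "M-N",
            "O-P", "Q-R", "S-T", "U-V", "W-Z"]   # table-controlled, 1 to 16 allowed

def route_entry(filing_segment, in_cumulation, month):
    """Decide, entry by entry, where a catalog entry appears this month."""
    if SEGMENTS[month % len(SEGMENTS)] == filing_segment:
        return "cumulation"       # absorbed; its indicator would now be set
    return "cumulation" if in_cumulation else "supplement"

# In January (month 0) the A-B span cumulates; a new C-D entry waits in the
# supplement until its own segment comes up.
print(route_entry("A-B", False, 0))   # cumulation
print(route_entry("C-D", False, 0))   # supplement
```

The table-controlled segment list is what lets the number of segments vary without touching the routing logic.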
For example, one of the subject added entries may have migrated into the cumulation; hence, it will no longer appear in a supplement. However, the main entry, and all other added entries, falling into different filing ranges, will continue to appear in a supplement until they too can be absorbed into the cumulation. Similarly, alterations to a bibliographic record will cause only those entries whose text or sequencing is affected to reappear in a supplement. A change to, or an addition of, a subject tracing will cause only that subject added entry to be regenerated for inclusion in a supplement. The main entry, and all other added entry citations, which remain unaltered, need not reappear in a supplement (assuming they have previously migrated into the cumulation).

† All material in the card catalog has become known as the retrospective collection, and all material entered into the automated system after January 1972 has become known as the prospective collection.

Condensed Added Entries

In order to keep printing costs to a minimum, all added entries are condensed; title page extension, publisher, and bibliographic notes do not appear under any of the added entries, the assumption being that a user interested in such data will take the trouble to refer to the main entry, which contains the complete bibliographic citation. This type of back-and-forth reference, while quite awkward in a card environment, is extremely simple in a book catalog. Economic considerations also led to the decision to suppress tracings from the main entry. The system was designed so that these decisions are not irreversible: the choice of data which are to appear with an entry is governed by a set of tables which may be readily altered should it be desired to change the format or content of an entry. Punctuation of condensed entries is accomplished automatically.
This is not a trivial problem, and one that only a cataloger can truly appreciate. Consider, for a moment, the myriad ways in which bracketing may occur within the title or imprint statement, and the ways in which these may span the two fields. Add to these factors the rules which do not permit the appearance of double punctuation. We have found that punctuation of added entries is effected correctly in 98 percent of catalog entries. In those instances in which ALA punctuation rules are observed in the complete record, correct punctuation is assured (this is not true of cataloging obtained from European sources).

Control of Cross-references

It is in the realm of cross-references that the mindless consistency of the computer is most effectively employed. The goal to which we addressed ourselves was the absolute integrity of cross-referencing. Under no circumstances, short of erasing a cross-reference from a previously published catalog, were cross-references to refer the user to a heading which did not have an associated bibliographic citation. All meaningful cross-references providing alternate access points to a citation must appear. By the same token, in order to minimize costs, cross-references which appear in a cumulation available to the public are not to be repeated in a supplement. Cross-references to a heading are considered valid entry points to the catalog when bibliographic citations appear under a subdivision of that heading. For example, the appearance of bibliographic citations under Negro Art-Exhibitions would cause all cross-references to Negro Art to be generated (Figure 5). The same rules concerning appearance in supplements and cumulations are observed for these secondary cross-references.
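The subdivision rule just described can be sketched as a filter over the cross-reference structure. The data layout and the hyphen-delimited subdivision convention are simplifications invented for this sketch.

```python
# Sketch of the subdivision rule: cross-references to a heading become valid
# entry points whenever citations appear under the heading itself OR under
# any subdivision of it (e.g. Negro Art-Exhibitions activates references
# pointing to Negro Art).

CROSS_REFS = {"Negro Art": ["Negroes-Art see Negro Art"]}

def active_cross_refs(headings_in_use):
    """Return every cross-reference whose referenced heading, or a
    subdivision of it, heads at least one bibliographic citation."""
    refs = []
    for target, ref_list in CROSS_REFS.items():
        if any(h == target or h.startswith(target + "-")
               for h in headings_in_use):
            refs.extend(ref_list)
    return refs

print(active_cross_refs(["Negro Art-Exhibitions"]))
```

A heading such as "Negro Artists" does not activate the reference, since it is a different heading rather than a subdivision; the hyphen test captures that distinction.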
Alterations to cross-references which have appeared in a cumulation will cause the altered forms to reappear immediately in a supplement, provided the referenced heading is still in use in the catalog. Similarly, alteration of the referenced heading would cause the reference to the new form to be automatically generated.

Fig. 5. The NYPL Research Libraries Dictionary Catalog Supplement, October 1972: L-Z pages 120 and 121. These pages demonstrate the generation of the cross-reference Negroes-Art see Negro Art even though only subdivisions of Negro Art appear in the catalog.

A further consideration extends to cross-references which have migrated into a cumulation. When a cumulation segment is updated, all cross-references which previously appeared in it should continue to appear if, and only if, the referenced heading is still in use in either the same segment of the cumulation, another segment of the cumulation, or a supplement; if not, its use is discontinued. Subsequent use of the referenced heading would then call up the cross-reference for reuse. Each of the above desiderata requires rather intricate logic when the cumulation is being produced in monthly installments, as any of the following is possible:

1. Cross-reference in a supplement, referenced heading in a supplement;
2. Cross-reference in a supplement, referenced heading in a cumulation;
3. Cross-reference in a cumulation, referenced heading in a supplement;
4. Cross-reference in a cumulation, referenced heading in a cumulation.
In each case, the cross-reference must be suppressed whenever the referenced heading disappears from the catalog available to the public, but must be retained when it refers to a heading existing in any part of the catalog. The cross-reference and referenced heading may easily appear in catalog segments published as much as eleven months apart, making it absolutely essential that both the authority and book catalog subsystems maintain strict control of the cross-reference structure.

Control of Hierarchies

It was decided that the appearance of cataloging under a subdivision of a heading which contains associated notes should cause the higher level heading, with its attendant notes, to appear. Such a heading is forced to appear regardless of whether or not it itself heads a bibliographic citation, under the assumption that notes concerning a heading might be valuable to a user interested in a subdivision of that heading (see Figure 6 for an example).
Fig. 6. The NYPL Research Libraries Dictionary Catalog Supplement, August 1972: page 2. The heading Actors is caused to appear due to the presence of a scope note and the use of a subdivision of the heading.

Dictionary and Divided Catalogs

The same system was required to serve two divisions of the New York Public Library, each of which has different traditions and philosophies of service to identifiably different users. Therefore, an additional flexibility was required of the system: the ability to produce both dictionary form and divided catalogs. The Research Libraries, which have traditionally used a dictionary form of catalog, wished to continue that practice. The Branch Libraries, on the other hand, felt that their public could be better served by a divided catalog, separated into Titles, Subjects, and Names. The system was designed in such a manner that the modification of a single parameter in the final sort would produce either form of catalog.

BOOK CATALOG SUBSYSTEM-TECHNICAL DESCRIPTION

The entire subsystem consists of ten separate programs, each of which will be described below. The flow charts in Figures 7, 8, and 9 depict the processing flow of the subsystem. The system was designed to operate on an IBM 360 model 40 (which has since been replaced with a 370 model 145) with 256K bytes of core storage. The programs were written exclusively in BAL for a DOS configuration. A conversion to full OS has recently been completed. Each processing step described below is executed sequentially. Significant peripheral devices required are: five tape drives, one disk drive in addition to those required by the operating system, and a line printer.

Please refer to Figures 7, 8, 9, and 10 for the programs and files referenced by symbols P1, T1, D1, etc.

Entry Explosion and Construction-Program P1

This program serves as the driver for the entire subsystem.
In this step entries are selected for inclusion in a supplement or cumulation segment. Requests for data required from the authority file are initiated. The format and data content of each entry are defined by this program via a set of tables. These tables may be altered at will, allowing redefinition of the format and content of any entry. The bibliographic master file is updated to indicate the appearance of an entry in a cumulation, preventing its subsequent appearance in a supplement. In addition, this program is charged with accepting communication of activity to the authority file and taking the appropriate action with respect to the bibliographic file. This activity may take several forms: alteration of a heading, change of delimiting, change to a filing form, posting or removal of a cross-reference or dual entry, change of categorization, or the complete transfer of all cataloging from one valid heading to another.

Fig. 7. Subsystem flow chart: explode catalog entries (P1); generate responses, select headings, and update x-reference linkage; create requests for x-references, dual entries, and higher level headings (P3.1).

Fig. 8. Subsystem flow chart: eliminate duplicate heading requests (P4); locate higher level headings and dual entries (P5); format headings module.

Fig. 9. Subsystem flow chart: create requests for secondary x-references, write headings (P3.2); locate x-references, update x-reference indicators (P6); format headings module.

Fig. 10. Subsystem flow chart: insert authority text data into skeleton catalog entries (P7); pagination (P8).

Evidence of activity to an authority heading is carried on the authority/bibliographic linkage file (D0). When such activity has affected a heading used by a bibliographic record as an authority field, the field is tagged for verification by the authority file in the next file update/authority-interface run. The indicator for the field in question, denoting previous appearance in a cumulation, is turned off. At the same time, the indicators for all other catalog tracings which require that authority field as data are turned off. When a transfer from one heading to another has occurred, the new linkage number is inserted into the authority directory of the bibliographic record. This is not absolutely necessary, as the authority/bibliographic linkage file provides the link via a chain when a transfer has occurred. Nonetheless, the insertion of the true authority control number into the bibliographic file eliminates the necessity of a chained search in all future accesses of the tracing, space on the linkage file is conserved, and no additional indicators are required to note that the entry has been caused to reappear in a supplement as a result of the transfer. In all cases of activity to an authority record, reverification is forced for the associated tracing field in order to guarantee correct usage of the altered authority.

Each bibliographic record is examined to determine whether it will contribute to the catalog. This is done on an entry by entry basis. Each field of the bibliographic record capable of defining a catalog entry is examined. All fields which define a catalog entry (tags 1--, 245, 4--, 6--, 7--) carry a set of indicators denoting appearance in the cumulation, and a number defining the cumulation segment into which the entry should file. An additional indicator for authority fields denotes the presence (or absence) of an associated dual entry on the authority file.
Appearance in the cumulation and filing segment number of the dual entry are also carried in the bibliographic record, allowing independent control of the dual entry citation. As may be readily seen, the dual entry acts as a phantom tracing in the bibliographic record and will thus not be specifically mentioned in the discussion of selection criteria below.

An entry is selected for construction on the basis of the following criteria:
1. The bibliographic record is in a valid status, i.e., has passed all editing tests, and sufficient time for proofreading has elapsed.
2. All authority fields required for construction of the entry have been verified against the authority file in the weekly bibliographic file update/interface production runs.
3. It files in the segment being produced that month.
4. The indicator denoting appearance in the cumulation is NOT set. Thus, any alteration to the content of a bibliographic record, warranting immediate reappearance of an entry, may be communicated to the book catalog subsystem by the extinction of the cumulation indicator.

Both cumulation and supplement entries are created in the same run. The entries are separately collated by causing the highest level of the final sort to be a code denoting supplement or cumulation.

It will prove fruitful at this point to draw a distinction between a catalog entry (the printed bibliographic citation) and the machine record which is created by the system prior to phototypesetting. The machine record is nothing more than a highly organized print record. The final merging of such print records from various processing steps completely defines the text, typography, and sequencing of the final printed catalog. The machine print records created by the system up to step P8 will be referred to as Text Entry (TE) records.
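The four selection criteria above reduce to a single predicate. The following is a hypothetical sketch; the field names are invented stand-ins for the indicators the text describes.

```python
# Hypothetical sketch of the four selection criteria described above.
# Field names are invented for illustration.

def select_for_construction(rec: dict, current_segment: int) -> bool:
    return (
        rec["status_valid"]                    # 1. passed editing; proofreading time elapsed
        and rec["authorities_verified"]        # 2. authority fields verified in weekly runs
        and rec["segment"] == current_segment  # 3. files in the segment produced this month
        and not rec["in_cumulation"]           # 4. cumulation indicator NOT set
    )
```

Clearing `in_cumulation` is exactly the mechanism the text describes for forcing an altered entry to reappear in a supplement.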
When an entry is to be included in a particular month's catalog segment or supplement, a table for the particular type of entry is consulted in order to determine the data and the typographic commands which will govern the entry's format. At this point only a skeleton Text Entry record is constructed, as all authority data will be obtained from the authority file.

The sequencing information is contained in the sort key of each TE record, which defines six levels of sorting:
1. Collation: catalog or supplement. This is further refined when a divided catalog is being produced.
2. Level I sort, and sort code.
3. Level II sort, and sort code.
4. Level III sort, and sort code.
5. Publication date.
6. Publisher.

In the case of certain series entries, levels II and III may be split into two half-size levels by the program in order to further refine the sort sequence. As an example of the use of sort levels I, II, and III, we might consider a subject added entry. In that case, the level I sort is defined by the filing form of the subject tracing, level II by the filing form of the author's name, and level III by the filing form of the title of the work. The sort codes are used to separate entries which would result in the same sort keys but are conceptually different, e.g., a name which might simultaneously define a title added entry, a main entry, and a subject added entry. A similar situation exists at the second sort level, where conventional titles are to be separated from titles or subject title entries.

Sort key levels, as all other data elements required in a TE record, will be directly inserted into the record under construction if they consist of nonauthority data, and will be identified by linkage codes for later insertion when the filing form data is returned from the authority file. The final TE record will not be completed until step P7, to be described below.
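The six-level sort key described above maps naturally onto a lexicographically compared tuple. This is an illustrative sketch only: real keys would use packed filing forms and sort codes, and the field names here are invented.

```python
# Hypothetical sketch of the six-level TE sort key. A tuple compares
# element by element in the stated order of precedence.

def te_sort_key(entry: dict) -> tuple:
    return (
        entry["collation"],               # 1. catalog vs. supplement (refined for divided catalogs)
        entry["level1"], entry["code1"],  # 2. level I filing form + sort code
        entry["level2"], entry["code2"],  # 3. level II
        entry["level3"], entry["code3"],  # 4. level III
        entry["pub_date"],                # 5. publication date
        entry["publisher"],               # 6. publisher
    )
```

For a subject added entry, level I would hold the subject tracing's filing form, level II the author's name, and level III the title, as in the example in the text.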
Following construction of the sort key (or indications to complete a sort key), typographic commands and text data are inserted into the TE record. The typographic commands are contained as binary bit settings in a record directory. The directory also defines the location and length of each data element, or gives a linkage code when the data are to be obtained from the authority file, and hence cannot be inserted until program P7. The order of entries in the directory defines the printing sequence of text data. Thus, when text data are available, true locations and lengths are provided in the record. When they are not, linkage codes replace them in the directory. These linkage codes are simply replaced by true locations and lengths when the authority text is added to the end of the record by another program (P7). It will suffice at this point to mention that all typographic commands are present in the record. The function of the commands will be discussed in detail below when the pagination program (P8) is discussed.

Having constructed a set of skeleton TE records, the program initiates requests to the authority file for authority text data and filing forms. Requests are also made to the authority file for headings which are to print above the bibliographic citations. These headings will be constructed in the same manner as catalog entries, i.e., as TE records. They will then be merged with the respective TE records as citation entries. These heading requests also initiate a sequence of processing steps culminating in the location and formatting of all relevant cross-references. The necessary cross-references are formatted into TE records, and are likewise merged to form the complete catalog.

When an entry is chosen for inclusion in a cumulation segment, indicators to that effect are set in the bibliographic master record; it is then written onto the updated bibliographic master file.
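The directory mechanism just described, in which linkage codes are later replaced by true locations and lengths when authority text is appended, can be sketched as below. This is a hypothetical illustration; the data shapes are invented, and the real records were packed binary structures.

```python
# Hypothetical sketch: a TE record directory whose slots either carry a
# true (offset, length) into the record's text, or a linkage code to be
# resolved when authority text is appended (as program P7 does).

def resolve_linkage(directory, text, authority_responses):
    """Append authority text and replace linkage codes with real offsets."""
    for slot in directory:
        if "linkage" in slot:          # unresolved: data lives on the authority file
            data = authority_responses[slot.pop("linkage")]
            slot["offset"], slot["length"] = len(text), len(data)
            text += data               # authority text goes at the end of the record
    return directory, text
```

The directory order is untouched, so the printing sequence of text data is preserved exactly as the text requires.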
Locate Authority Data and Select Headings-Program P2

All inquiries to the authority file are sorted into authority sort key sequence and matched with the authority file. All inquiries will result in a match to a valid authority record. A match for each inquiry is assured by the weekly file update/interface processing programs.

Inquiries to the authority file result in any combination of the following actions: (1) authority text and filing data are supplied, via a response record, to program P7 for the completion of TE records created by program P1; (2) authority records are selected to serve as headings above bibliographic citations (these same records will also cause cross-references to be selected); (3) authority records are selected in order to initiate a search for the associated dual entry, as per instructions contained in the inquiry record.

The selected headings consist of complete authority records with instructions regarding their eventual use and routing. Headings are routed, via a collation code, into cumulation segments or supplements. Since a single authority heading may appear as both a main entry and subject heading, indicators are set defining its eventual use as one, the other, or both. These indicators will be called usage indicators. Usage decisions made by P1 are passed to this step as part of the inquiry records. The results of these decisions are then transmitted as a set of codes inserted into the selected authority records.

This program is further charged with the responsibility of keeping current the catalog status indicators for cross-references by maintaining two binary indicators with every cross-reference. A cross-reference record with multiple see fields will have a pair of indicators for each see field. The first binary indicator denotes prior appearance of a cross-reference in a cumulation segment.
The second indicates that the referenced heading currently appears in some part of the catalog. In passing through the entire authority file, this program will note that a heading which falls in the current month's filing range has had no requests for its use lodged against it. When this is the case, transactions are created for every cross-reference, defined by see froms in the heading record, extinguishing the second binary indicator described above. The cross-reference will then not be used again until it is required. The need for this operation will become more evident when we discuss program P6.

The maintenance of the physical linkage between cross-references and headings is performed by the authority file update subsystem. This subsystem guarantees that the linkage is kept current regardless of alterations to headings and cross-references. Hence, all see froms are guaranteed to refer to a cross-reference (direct see) record on the file.

Explode Hierarchies, Cross-references and Dual Entries-Program P3.1

The selected authority records are examined for the presence of see from fields. If any are found, they are used to create further inquiries to the authority file for cross-references. A similar operation is performed for dual entries, with the exception that the dual entry inquiry is not created unless it was requested by program P1. The request is passed via indicators in the inquiry record (as discussed above in the description of program P2).

All records which are subdivisions of headings, e.g., Sculpture--Technique, will cause inquiries for all significant higher level headings (Sculpture in this case) to be created. Higher level headings will supply additional entry points via cross-references to them, or may themselves appear if they contain notes.

Cross-reference requests are separated for later processing. They will be processed with requests for secondary cross-references to be generated by program P3.2 below.
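The "explosion" of a subdivided heading into requests for its higher level headings can be sketched as follows. The function name and separator convention are ours; the Sculpture--Technique example is from the text.

```python
# Hypothetical sketch of exploding a subdivided heading into requests
# for its significant higher-level headings: Sculpture--Technique also
# generates a request for Sculpture.

def higher_level_headings(heading: str, sep: str = "--") -> list:
    parts = heading.split(sep)
    # Each truncation of the subdivision chain is a higher-level heading,
    # listed from most to least specific.
    return [sep.join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
```

A heading with no subdivisions produces no requests, matching the behavior described above.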
Exclude Duplicate Headings and Separate Inquiries-Program P4

This program is nothing more than a sort with exits. The input tape of selected headings and higher level heading requests is sorted, and if a request for a higher level heading has already been filled by a heading selected in P2, the request is dropped. All usage information carried by the request is logically added to the matching heading. When multiple requests for the same higher level heading are discovered, all but the first are dropped. Usage information from all duplicates is added to the retained request by a logical OR operation.

The authority records which were selected by P2 for use as headings are formatted into complete text entry (TE) records for later input to the pagination program. TE heading records are formatted by a single module invoked by this step and again in P5.

The surviving hierarchy requests, and all dual entry requests, are separated for processing in the next step.

Format Headings Module

All heading records selected for print are processed by this module, which converts the input text and filing data of authority records into TE records. At times quasi-duplicates of the TE record are constructed with different filing and typography codes for use as main entry and subject headings. At times portions of the data are encoded as nonprinting because it is known that the print data will be provided by other heading records. This is the case with author/conventional title records. The author heading is assured because of the explosion of higher level headings; hence, a simple method is provided for insuring its appearance only once regardless of the number of associated conventional titles.
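P4's duplicate elimination with a logical OR of usage indicators can be sketched as below. The bit-flag names are invented; the OR-merge of duplicates is exactly what the text describes.

```python
# Hypothetical sketch of P4's "sort with exits": duplicate heading
# requests are dropped, but their usage indicators (here, bit flags)
# are merged into the survivor with a logical OR.

USE_MAIN, USE_SUBJECT = 0b01, 0b10

def merge_requests(requests):
    """requests: iterable of (heading, usage_bits), assumed already sorted."""
    merged = {}
    for heading, usage in requests:
        merged[heading] = merged.get(heading, 0) | usage  # OR usage into survivor
    return merged
```

A heading requested once as a main entry and once as a subject thus survives as a single request carrying both usage bits.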
When a subject heading record is created, the heading is made to appear twice in the record, once in upper case for printing, and once in its normal upper and lower case form, encoded as nonprinting, for possible use as a dictionary heading by the pagination program. The conversion to upper case is effected via a translate table, because of the presence of control information within the text for floating diacritics. Also, diacritics and many special characters do not have a simple upper case equivalent due to the use of the complete ALA character set.

Punctuation of cross-references is effected in this module. The complexities by no means approach those encountered in punctuating condensed added entries; nonetheless, they do exist. For example: terminal periods in headings referenced in a cross-reference must be replaced with semicolons when more than one heading is referenced; a blank must be inserted following the hyphen and preceding the semicolon in open ended dates; the final referenced heading in a string must end in a period unless it terminates with a hyphen, quote mark, exclamation point, question mark, parenthesis, etc.

Typographic codes which apply to headings, notes associated with headings, and phrases in cross-references are inserted by this program when TE records are created.

Locate Hierarchies and Dual Entries-Program P5

All heading requests are applied to the authority master file. When the heading corresponding to a request is located, the entire authority record is written onto an output file for further processing. This process is similar to that executed when the original heading requests were processed in program P2. Higher level headings are encoded for use in accordance with their categorization and filing form. When a requested dual entry heading is located, a TE record is written for later processing by the pagination program.
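The cross-reference punctuation rules given above (in the Format Headings module) can be sketched as follows. This is a simplified, hypothetical rendering covering only the two rules about terminal periods; the open-ended-date rule is omitted, and the terminator list is abbreviated.

```python
# Hypothetical sketch of two cross-reference punctuation rules from the
# text: terminal periods become semicolons when more than one heading is
# referenced, and the final heading gets a period unless it already ends
# in a hyphen, quote mark, exclamation point, question mark, or parenthesis.

NO_PERIOD_AFTER = tuple("-\"'!?)")

def punctuate_references(headings):
    parts = [h.rstrip(".") + ";" for h in headings[:-1]]  # periods -> semicolons
    last = headings[-1]
    if not last.endswith(NO_PERIOD_AFTER):
        last = last.rstrip(".") + "."
    return " ".join(parts + [last])
```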
A response record containing the filing form of the dual entry is also written onto an indexed sequential disk file. A direct access file is necessary since the catalog record contains only a link to the primary heading, and all requests for the dual entry come via a request against the primary heading in program P2. Rather than attempting a complex scheme for keeping track of all bibliographic items requiring the dual entry data, only one copy of the dual entry response is isolated and indexed by the control number of the primary heading. It is then retrieved on that basis when needed.

Explode Secondary Cross-references, Separate and Select Hierarchical Headings-Program P3.2

This program is simply a phase of program P3.1 described above. The major difference lies in its handling of the authority records which it accepts as input. They are written out as TE records, but only if they meet one of two conditions: if the authority record matching the heading request contains notes, it is selected for eventual formatting into a heading; or if it represents an author required as part of an author/conventional title combination. In all other instances higher level headings are not selected for printing. The Format Headings module is invoked by this step for all higher level headings selected for print. If secondary cross-references are not desired, the explosion module which creates the requests is simply bypassed. Similarly, higher level headings may be suppressed. No further attempt is made to generate higher level headings, as they have all been exploded in P3.1. The exploded cross-reference requests are separated in this program, just as they were in P3.1.

Locate Cross-references-Program P6

Prior to execution of this step, tapes T3.1, T3.2, and T3.3 are sort/merged into a single tape T3.4 (Figure 9). T3.4 now contains all of the transactions generated by program P2, and all cross-reference requests.
Recall that P2 has created transactions extinguishing the indicator carried by cross-reference headings, denoting that the referenced heading appears somewhere in the catalog. The sort causes all of these transactions to be applied before any cross-reference requests are processed. It might appear a bit paradoxical that a request should be made to a cross-reference whose referenced heading was not selected in P2; however, recall that a cross-reference may be invoked as the result of the use of a subdivision of the referenced heading (secondary cross-reference).

At this point some discussion of the cross-reference record is in order. A cross-reference may point to several headings simultaneously, e.g., Animals see Aardvarks/Bears/Cats/ ... Zebras. Each referenced heading is controlled individually. Only the required references are extracted as needed. In the example above, if Aardvarks and Cats appeared in the catalog, those two references would have been selected, and no others.

Hence, the discussion which follows will be greatly simplified if we consider each cross-reference transaction to apply to only a single reference. This is effected operationally by carrying the control number of the heading which gave rise to the cross-reference request within the request.

Following the application of transactions, if any, to extinguish indicators, the selection for print logic is executed. Cross-references are selected for printing when the indicator specifies that the cross-referenced heading appears somewhere in the catalog available to the public, regardless of whether there is a specific request for it, and the cross-reference is filed in the segment being produced. A request for a cross-reference which already appears in a cumulation segment currently in use is ignored. A request for a cross-reference which is not already in the catalog is honored.
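The selection-for-print test for a single cross-reference, using the two binary indicators described above, reduces to this predicate. The parameter names are ours.

```python
# Hypothetical sketch of P6's selection test for one cross-reference.

def select_xref(in_cumulation: bool, target_in_catalog: bool,
                files_in_segment: bool) -> bool:
    """Print the reference only if its target appears somewhere in the
    public catalog, it files in the segment being produced, and it is
    not already sitting in a cumulation segment currently in use."""
    return target_in_catalog and files_in_segment and not in_cumulation
```

As the text notes, the production logic is more involved than this, but the end result is the same.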
The actual logic is somewhat complex; however, the end result is as described above. Cross-references to be printed are routed to either a supplement or cumulation installment depending upon the filing range in which they fall. When a divided catalog is being produced, cross-references are further routed into the appropriate catalog on the basis of categorization. Following the selection of, or refusal to select, a heading, the indicators denoting prior appearance in the catalog and linkage to a heading in use are updated. Continuing integrity of the cross-reference structure for future printings of the catalog is thus assured.

Complete Citation Text Entry Records-Program P7

Prior to execution of this processing step, response records emanating from P2 are sorted into bibliographic item number sequence. Sequencing is necessary since the skeleton TE records are in the same sequence as the bibliographic master file. Identification of authority response data required by a TE record is via bibliographic item number and a sequence number assigned to each authority field within a bibliographic record. Subfields of a response record are identified by delimiter.

Response records are matched to skeleton TE records bearing the same item number. Following the match, all required data are inserted into the skeleton TE record. Codes are carried in the TE record directing this program to perform certain formatting functions not possible in step P1. These functions include insertion of certain combinations of parentheses and brackets required by series notes, addition of a series note to certain call numbers, and the replacement of the author portion of an author-title combination series note with His:, Her:, In His:, In Her:, etc. None of the above could have been accomplished in a typographically acceptable manner in program P1.

Dual entry data are obtained from the indexed sequential file (D1).
The identification of such data is via the authority control number of the primary LC subject heading carried in the bibliographic record. This number is used to access file D1 for the required text and filing data.

Pagination-Program P8

Prior to execution of this step, a set of page initialization records is created for the particular type of catalog being produced. These records are prepared by a program not shown in the subsystem flow. Initialization records govern the overall format of the book to be produced. There are six such initialization records, all of which must appear at the beginning of the input tape. They may also appear embedded anywhere among the TE records in various combinations.

The first initialization record, known as a Page Dimension (PD) record, defines the physical dimensions of the page to be printed. Parameters carried in this record also determine the dimensions of inner and outer page margins, head and foot margins (independently for recto and verso pages), number and width of columns, body size on which to set type, and spacing between entries. When an embedded PD record is encountered, the program will terminate any page currently being formatted, begin a new page, and continue formatting in accordance with the redefined dimensions.

The second initialization record defines the starting page number, and indicates whether paging is to start with a recto or verso page. The pagination program may also be directed via this record to place a black square at the edge of a page, at a location defined by the record, to serve as a thumb index. This record may also appear anywhere else on the tape. When it does appear as an embedded record, it commands the program to terminate the page being formatted at that point, to begin a new page, and possibly provide a number of blank pages. This allows volumes to be broken at predefined sort points.
In this manner we may separate alphabetic segments, the various volumes of a divided catalog, or cumulation and supplement volumes, and move the thumb index.

Subsequently, four records define caption and legend text (independently for recto and verso pages). Any one or combination of these records may also occur elsewhere on the tape. When they do occur as embedded records, the program terminates the page currently being formatted, alters the appropriate caption and/or legend text, and continues to format text. Interfiling of these records with TE records allows captions to be changed automatically between volumes of a divided catalog, or between supplement and cumulation volumes, or at any other desired sort point.

The six records described above control those aspects of page format which are common to a large class of entries. Individual TE records carry typographic commands which are specific to the entry, or to an element of the entry. A code carried by each TE record (Entry Format Code) defines typographical rules for the entry as a whole. This code is used to identify data to be used in the formation of dictionary and column headings when page breaks occur. Certain widow rules affecting the entire entry are specified, e.g., entry may not span columns, entry may not form the last line of a column, etc. Line advance commands, defining the amount of space (if any) to be left between entries, are carried in this code.

Data elements within an entry may require different typographic rules. Format codes for each such element are carried within a record directory. The directory also serves to identify the location and length of text data to be typeset in accordance with the typography specified by Element Format Codes. Element Format Codes consist of 32-bit fullwords. Groups of bits within the word define separate typographic rules.
These bits may be set in any combination, defining a complete spectrum of typography. The major typographic parameters governed by these bit settings are:
1. Starting indention ("continue on the previously used line" is included).
2. Overflow indention to be used if the element must be continued onto another line.
3. Space to be left on a line before adding any additional text to a previously used line.
4. Justification: left, right, center of column, and center of page.
5. Type size height.
6. Type size width relative to height.
7. Type face: bold or light.
8. Type style: Roman or Italic.
9. Element widow rules: restrictions which do not allow text to span columns, form the first line of a column, span from a verso to a recto page, or span from a recto to a verso page.
10. Line break: indicating whether lines may be broken at blanks only, or may be broken at blanks and certain special characters. Line break decisions observe a hierarchy of rules, e.g., if the indicator is set to break at blanks only and no blanks are found within the entire line, the program automatically reverts to the second option (break at blanks and special characters); should that also fail, the line will be broken arbitrarily at the last character which fits on the line.
11. Hyphenation indicator: due to the great number of foreign languages used in the NYPL catalog, no hyphenation routine is employed. Allowance has been made, however, for the inclusion of a hyphenation module should it be desired in the future, and an indicator provided in order to invoke it.

Other rules of lesser importance exist, but space does not warrant their discussion.

The entire ALA character set, plus several additional characters specified by NYPL, may be typeset via this program on an III Videocomp. Diacritics are floated onto the characters they accent.
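The packing of typographic parameters into bit groups of a 32-bit fullword, as the Element Format Codes do, can be sketched as below. The field positions and widths here are entirely invented; only the packing technique corresponds to the text.

```python
# Hypothetical sketch of an Element Format Code: groups of bits in one
# 32-bit word, each group carrying one typographic parameter. Widths
# and positions are illustrative, not NYPL's actual layout.

FIELDS = {                 # name: (shift, width in bits)
    "indent":  (0, 4),     # starting indention
    "justify": (4, 2),     # 0=left 1=right 2=center-column 3=center-page
    "size":    (6, 4),     # type size height
    "bold":    (10, 1),    # face: bold or light
    "italic":  (11, 1),    # style: Roman or Italic
}

def pack(**values) -> int:
    """Pack named parameters into a single format-code word."""
    word = 0
    for name, value in values.items():
        shift, width = FIELDS[name]
        word |= (value & ((1 << width) - 1)) << shift
    return word

def unpack(word: int, name: str) -> int:
    """Extract one parameter's bit group from a format-code word."""
    shift, width = FIELDS[name]
    return (word >> shift) & ((1 << width) - 1)
```

Because every parameter occupies an independent bit group, any combination of settings is representable in one word, which is the property the text emphasizes.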
The coding structure adopted by NYPL consists of two unique codes preceding a pair of characters to be overprinted. The first code indicates to all processing programs that the data to follow must be interpreted in a unique manner. The second defines the unique treatment to be accorded. We currently employ only two such function codes; both imply a form of overprint. Coding in this manner allows unlimited expansion of the character set. A function code has been assigned but not yet utilized for overprinting of triplets. This would be necessary in handling doubly accented characters, such as are found in Vietnamese. Function codes have been assigned defining escapes to nonroman alphabets.

The character set includes two blanks in addition to the normal word space. One of these will provide a word space on printed output but will fail line break tests. Such a character is of great utility as a separator in abbreviations and as a word space preceding such terminal characters as a close parenthesis. Conversion of the NYPL data base to utilize this super blank will be effected following definition of sufficiently reliable rules for its automatic generation at input. The second blank is a zero set width character. This character, when present in a machine record, is assigned a null width by the phototypesetting device. Its utility lies in areas in which it is required to remove only one or two characters from a record, but it is not desired to expend the programming or processing time in restructuring the record.

All of the input text data and format codes are translated into commands to an III Videocomp 830 and written onto a driver tape. The driver tape is then delivered to a photocomposition vendor who mounts it on a Videocomp to produce camera ready copy for catalog pages.
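The behavior of the "super blank", a character that prints as a word space but fails line break tests, can be sketched as follows. NYPL's actual code point for it is not given in the text; here the Unicode no-break space stands in for it purely for illustration, since it has the analogous modern semantics.

```python
# Hypothetical sketch of the "super blank" described above: it prints
# as a word space but is never offered as a line-break point. We borrow
# U+00A0 (no-break space) to stand in for NYPL's internal code.

SPACE, SUPER_BLANK = " ", "\u00a0"

def break_points(line: str) -> list:
    """Indices at which the line may be broken: ordinary blanks only.
    Super blanks print as spaces but fail this test."""
    return [i for i, ch in enumerate(line) if ch == SPACE]
```

A super blank between an abbreviation and the word it qualifies thus keeps the pair on one line, exactly the use case the text names.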
The camera ready copy is then delivered to a printer who produces multilith plates, and thence, pages which are bound into monthly supplements and cumulation segments.

CONCLUSION

Photocomposed book catalogs have been in use at NYPL since January 1971. The effectiveness of the system can, perhaps, best be judged by the only adverse reaction received thus far: in the case of material which must pass through the bindery after cataloging, entries appear in the catalog before the materials reach the shelves, thereby causing annoyance to users. Judged by more serious criteria, the system has proven to be an operational success. The processing budget for the Research Libraries is now insignificantly higher than it was under the manual system, but cataloging volumes have increased dramatically: 7,500 titles/mo. cataloged vs. 5,500 titles/mo. under the old manual system. The increase in productivity cannot be solely attributed to the automated system. Some of it is attributable to the revision, by the head of Preparation Services, of manual procedures.

Expansion of Book Catalog Coverage

The entire bibliographic system is currently in the final stages of revision for production of a multimedia catalog of the Dance Collection of the Research Library of the Performing Arts (20). The organization of citations referring to material in diverse media will be accomplished by providing separate sequences under appropriate headings, denoting: Works by, Works about, Visual works, Music, Audio materials. Listed under each of these headings will be the following types of materials:
1. Works by: Written works by an author.
2. Works about: Written works about an author, performer, etc. (The subheading is not used under topical subjects.)
3. Visual works: Photographs (original and indexed), prints and original designs, motion pictures and videotapes, filmstrips and slides.
4. Music: Music scores.
5.
Audio Materials: Phono records and phonotape.

These headings are not as specific as those suggested by Riddle et al.; however, they do provide the early warning function discussed by Virginia Taylor (21, 22). This catalog is due for publication in early 1974. Pending the success of this venture, a study will be made of the means of extending the scope of the Research Libraries' catalog to include nonbook materials.

In late fall 1973, an extremely exciting and bold step will be taken by the Jewish Division of the Research Libraries. They will begin data input of material in Hebrew, using the recently defined ANSI correspondence scheme for Hebrew characters (23). Within this scheme roman and special keyboard characters have been assigned to each character of the Hebrew alphabet. Book catalog display of Hebrew text will utilize these characters in a left to right print mode until such time as development money is found for the digitization of Hebrew character fonts, and for modifications to the pagination program in order to display mixed roman and Hebrew text. All Hebrew entries will be filed in accordance with conventions for sequencing Hebrew text. The Hebrew entries will be interfiled with entries in romanized forms by conceptually assuming the sequencing alphabet to contain 57 characters: blank, A, B, ..., Z, 0, 1, ..., 9, א, ב, ..., ת. If we have an author who has written several titles in roman alphabet languages, and others in Hebrew, we would create a sequence of main entries under his name interfiled according to the alphabetic sequence shown above. All Hebrew or variant title added entries would be found in a sequence starting at the end of the roman alphabet.

The primary reasons for adopting such a scheme, as opposed to the more traditional romanization, are:
1. A nationally endorsed correspondence schedule has been provided by ANSI.
2.
It is desired to enter this data into the automated system and end the manual operation at the earliest possible time.
3. It is desired not to have to revise all cataloging when true Hebrew text may be economically displayed. It is virtually impossible to recover the true form of nonroman text from its romanized form.

These two areas, nonroman alphabet display and inclusion of nonbook materials, represent the only areas in which further development of the book catalog system is planned. Future efforts will be directed to conversion of the batch-oriented processing system to one with on-line file maintenance capability.

It should be stressed again that the primary aim of the bibliographic system is not production of book catalogs. The system was designed to create a highly controlled data base which could be used in conjunction with whatever display medium is technologically and economically feasible. On-line access to the catalog will require extreme control of the data, as automated retrieval techniques require very precise definition of access points. The problems of data organization become greatly magnified when CRT display devices are used, as the visual scan range produced is severely limited.

The extensive development effort to produce book catalogs was undertaken at NYPL since it was felt that for at least the next decade book catalogs in printed or microform would provide the only economically viable form of access to the collection. Book catalogs will, no doubt, also serve as backup forms of display for a considerable time after introduction of electronic access techniques.

REFERENCES

1. Seoud Makram Matta, The Card Catalog in a Large Research Library: Present Conditions and Future Possibilities in The New York Public Library. Submitted in partial fulfillment of the requirements for the degree of Doctor of Library Science. (New York: Columbia University, School of Library Service, 1965).
2. I.
A. Warheit, "Automation of Libraries-Some Economic Considerations," Pre- sented to: Canadian Association of Infornuition Science, Ottawa, Ontario, Canada, 27 May 1971. 3. James W. Henderson and Joseph A. Rosenthal, eds., Library Catalogs: Their Preservation and Maintenance by Photographic and Automated Techniques (MIT Report No. 14.) (Cambridge, Mass .: MIT Press, 1968). 4. Margaret C. Brown, "A Book Catalog at Work (Free Library of Philadelphia)," Library Resources and Technical Services 8:349-58 (Fall1964). 5. Richard De Gennaro, "Harvard University's Widener Library Shelflist Conversion and Publication Program," College & Research Libraries 31:318-33 (September 1970). 6. Richard D. Johnson, "A Book Catalog at Stanford," Journal of Library Automation 1:13-50 (March 1968). 7. Paula Kieffer, "The Baltimore County Public Library Book Catalog," Library Re- sources and T echnical Services 10:133--41 (Spring 1966). 8. Hilda Feinberg, "Sample Book Catalogs and Their Characteristics." In: Book Catalogs by Maurice F . Tauber and Hilda Feinberg. (Metuchen, N.J.: The Scare- crow Press, 1971) p.381-511. 9. Paul J. Fasana and Heike Kordisb, The Columbia University Libraries Integrated Technical Services System. Part II: Acquisitions. (a) Introduction. (New York: Columbia University Libraries Systems Office, 1970). 62 p. 10. Gerry D. Guthrie, "An On-line Remote Access and Circulation System." In: Amer- 36 Journal of Library Automation Vol. 6/ 1 March 1973 ican Society for Infor11Ultion Science. Annual Meeting. 34th, Denver, Colorado, 7-11 November 1971. Proceedings 8:305- 9, Communications for decision-makers. (Greenwood Publishing Corp.: Westport, Connecticut, 1971). 11. Ralph M. Shoffner, "Some Implications of Automatic Recognition of Bibliographic Elements," Journd of the American Society for Infor11Ultion Science 22:275-82 (July/ August 1971) . 12. Frederick C . .Kilgour, "Initial Design for the Ohio College Library Center: A Case History." 
In : Clinic on Library Applications of Data Processing, 1968. Proceedings (Urbana: University of Illinois, Graduate School of Library Science, 1969) , p. 54-78. 13. Maurice F. Tauber and Hilda S. Feinberg, Book Catalogs (Metuchen, N. J.: The Scarecrow Press, 1971). 14. Catherine 0. MacQuarrie, "Library Catalogs: A Comparison," Hawaii Library Association ]ournal21:18-24 (August 1965). 15. Irwin H. Pizer, "Book Catalogs Versus Card Catalogs," Medical Library Association Bulletin 53: 225-38 (April 1965). 16. Kieffer, "The Baltimore County Public Library," p.l33--41. 17. James A. Rizzolo, "The NYPL Book Catalog System: General Systems Flow," The LARC Reports 3:87-103 ( Falll970). 18. Edward Duncan, "Computer Filing at The New York Public Library," The LARC R eports 3:66-72 (Fall1970). 19. S. Michael Malinconico, "Optimization of Publication Schedules for an Automated Book Catalog," The LARC Reports 3:81- 85 (Fall 1970) . 20. Dorothy Lourdou, "The Dance Collection Automated Book Catalog," The LARC Reports 3: 17- 38 (Fall 1970). 21. Jean Riddle, Shirley Lewis, and Janet Macdonald, N on-book Materials: The Or- ganization of Integrate d Collections. Prelim. ed. (Ottawa, Ont.: Canadian Library Association, 1970). 22. Virginia Taylor, "Media Designators," Library Resources and Technical Services 1:60-65 (Winter 1973) . 23. Edward A. Goldman, et al., "Transliteration and a 'Computer-Compatible' Semitic Alphabet," Hebrew Union College Annual 42:251-78 (1971). 5761 ---- lib-s-mocs-kmc364-20141005043558 56 HIGHLIGHT OF MINUTES Information Science and Automation Division Board of Directors Meeting 1973 Midwinter Meeting Washington, D. C. Monday, January 29, 1973 The meeting was called to order by President Ralph Shoffner at 8:10a.m. The following were present: BOARD-Ralph M. Shoffner (Chairman ) , Richard S. Angell, Don S. Culbertson (!SAD Executive Secretary), Paul J. Fasana, Donald P. Hammer, Susan K. Martin, and Bemiece Coulter, Sec- retary, ISAD. COMMITTEE CHAIRMAN-Stephen R. 
Salmon. GUESTS-Charles Stevens and David Weisbrod.

REPORT OF NATIONAL COMMISSION ON LIBRARY AND INFORMATION SCIENCE. Mr. Charles Stevens, Executive Director of the National Commission on Library and Information Science, discussed the Commission's priorities and objectives for planning library and information services for the nation. The Commission has identified six areas of activity in which to conduct investigations in relation to its charge, which is to study "... Library and information services adequate to meet the needs of the people of the United States." These six areas are: (1) understanding information needs of the users; (2) adequacies and deficiencies of current library and information services; (3) patterns of organization; (4) legal and financial restrictions on libraries; (5) technology in library and information systems; and (6) human resources.

REPORT TO ALA PLANNING COMMITTEE. The report to the ALA Planning Committee on ISAD's long range plans was deferred until after the ISAD Objectives Committee report is received in June.

OBJECTIVES COMMITTEE INTERIM REPORT. Mr. Stephen Salmon, chairman, provided an interim report of the committee. The committee will recommend that the Division continue to exist and will list its proposed objectives, which may differ from the original objectives. At the request of Louise Giles, chairman of the Information Technology Discussion Group, special attention will be given to that group's interests in formulating the statement of objectives.

MEMBERSHIP SURVEY COMMITTEE. Mr. Shoffner relayed Ms. Pope's report that the membership survey will cost $700.00, which is not available in the current budget. Mr. Culbertson said that the cost could be decreased by surveying a sample of 1,000 members. The decision was to request the full amount for the survey, to be performed in the coming fiscal year.

ASIDIC REPRESENTATIVE. Mr. Peter T. Watson, through correspondence with Mr.
Shoffner, reported that ASIDIC is interested in liaison with ALA, and was concerned with the possibility of accomplishing this through ISAD. Mr. Culbertson reported that ASIDIC could become an affiliate of ALA for a $40.00 fee, but that ISAD could recommend a formal liaison, especially if ISAD and ASIDIC had similar interests.

MOTION. It was moved by Paul Fasana that this matter of ASIDIC liaison with ALA be passed on to the Executive Director, Mr. Robert Wedgeworth, and that the President of ISAD write and inform him of such. SECONDED by Richard Angell. CARRIED.

POLICY STATEMENT ON PRIVACY OF DATA PROCESSING RECORDS. Mr. Culbertson had been approached about ISAD's making a statement on broad issues of data processing, including privacy. A need has been made known by the ALA Washington Office for having such a statement on which to base their stand in certain hearings. Mr. Hammer felt it very appropriate that the Association (ALA) take a position on it. Mr. Weisbrod mentioned that ISAD could be involved because of the vulnerability of machine-readable files due to the large quantity of data processed.

MOTION. It was moved by Paul Fasana that the ISAD Board recommend to the ALA Council that it (ALA) develop some policy expressing its membership's attitude toward the privacy of machine-readable data. SECONDED by Donald Hammer. CARRIED.

JOLA EDITOR. Mr. Shoffner reported, concerning the appointment of an editor, that two contacts were outstanding and he would report to the Board on Wednesday. Mr. Culbertson has been serving as temporary editor. Mr. Fasana noted that the schedule for 1972 was for four issues, but only one had appeared. He asked what plans there were to catch up or cancel. Mr. Culbertson said that legally ISAD could not cancel any issues, and that a statement had been written for the "Memo to Members" section of American Libraries.
He also mentioned the previous board action to have JOLA Technical Communications become a part of the 1973 volume.

Wednesday, January 31, 1973

Mr. Shoffner called the meeting to order at 10:00 a.m. Those present were: BOARD-Ralph M. Shoffner (Chairman), Richard S. Angell, Don S. Culbertson (ISAD Executive Secretary), Paul J. Fasana, Donald P. Hammer, Susan K. Martin, and Berniece Coulter, Secretary, ISAD. COMMITTEE CHAIRMEN-Brigitte Kenney, Ronald Miller, and Velma Veneziano. GUEST-Peter Watson.

CONFERENCE PLANNING COMMITTEE REPORT. Mrs. Susan Martin, chairman, reported that the 1972 seminar on telecommunications had been successful, and the April seminar with the National Microfilm Association in Detroit was proceeding as scheduled. The seminar on the national libraries, originally scheduled for January, and the seminar on networks which was to be in March had both been postponed until the next fiscal year.

Planning of the Las Vegas preconference program is continuing smoothly; the institute is to be concerned with a review of the state-of-the-art of library automation. It will update the ISAD preconference institute of 1967.

ISAD/LED EDUCATION COMMITTEE REPORT. A written report was submitted. (See Exhibit 1.)

RTSD/ISAD/RASD REPRESENTATION IN MACHINE-READABLE FORM OF BIBLIOGRAPHIC INFORMATION COMMITTEE REPORT. Chairman Velma Veneziano reported that, as a result of a JOLA Technical Communications announcement that the committee meeting was open and that there would be discussion of the controversial International Standard Bibliographic Description (ISBD), 200-300 persons attended the committee meeting. The committee felt that changes such as ISBD in the MARC records by the Library of Congress should take into account the users of the MARC distribution service. Committee action on the ISBD was delayed until the ISBD for Serials proposal was further along.
It was stated that the ISBD for serials should be as consistent as possible with the ISBD for monographs. The committee suggested that each division publish these standards in its journal.

MOTION. It was moved by Paul Fasana that the ISAD Board suggest to the JOLA Editorial Board that discussion drafts of standards be published in the Journal of Library Automation. SECONDED by Donald Hammer. CARRIED.

Mrs. Veneziano pointed out that a resolution was passed concerning the formation of an ad hoc task force for a period of two years. The task force would work with emerging standards relating to character sets: Greek and Cyrillic alphabets; mathematical and logical symbols; and control characters relating to communications. Three persons were suggested for the task force: Charles Payne of the University of Chicago, David Weisbrod of Yale, and Michael Malinconico of the New York Public Library, in addition to Lucia Rather and Henriette Avram of the Library of Congress. The task force would report back to the Board through the committee.

MOTION. It was moved by Paul Fasana that ISAD consider the creation of a task force to work with emerging standards relating to character sets and the insertion of a fund request in the ISAD budget for $1,060 ($700 for 2 trips for 3 persons and $360 per diem for 3 persons for 2 days for each of 2 trips). SECONDED by Donald Hammer. CARRIED.

The committee wished to go on record that, since RTSD had recently formed a Committee on Computer Filing, computer filing rules were a function of the Interdivisional Committee on Representation in Machine-Readable Form of Bibliographic Information.

The subject of library codes was discussed. Bowker was assigning numeric codes to libraries, book publishers, and book dealers. The committee is concerned about standards and does not wish to see the creation of systems of incompatible codes.

TELECOMMUNICATIONS COMMITTEE REPORT.
Brigitte Kenney, chairman, submitted a written 18-month report of the committee (Exhibit 2). Miss Kenney announced that she was resigning as chairman of the committee and that no present member was available to assume the chairmanship. Mr. Hammer, as President-Elect, was charged with appointing the next chairman.

The function statement of the Telecommunications Committee has been grouped into four areas: (1) communication to members; (2) training; (3) legislative matters; and (4) research.

She pointed out that both JOLA Technical Communications and American Libraries had said in writing that they would accept articles on telecommunications, particularly cable TV, and had accepted none. Also, she had attempted for a year and a half to assemble an information packet at ALA Headquarters, but did not know the status of the project. Headquarters had requested guidelines on cable policy from the committee; she stated that they had not succeeded in completing this task. No guidelines had been provided. After ISAD and ALA sources did not respond to a request to publish a cable newsletter, the American Society for Information Science was approached. The ASIS Council approved this the previous Friday and she had obtained seed money from the Markle Foundation.

Miss Kenney referred to the resolution introduced that afternoon in Council that an ad hoc ALA committee be established to address itself exclusively to cable matters and be representative of all units of ALA, and that it take on very specific tasks with clearly delineated time limits. She further stated that she had not felt that ISAD had given adequate support to the ISAD Telecommunications Committee's activities, and thought that the Board would have to decide if this was an appropriate committee for ISAD. If so, was the function statement too broad? Should it be narrowed to just data transfer?
Miss Kenney also suggested that the committee be expanded in size to include more people involved in telecommunications. In the discussion which followed it was indicated that it could take from two to three years to set up a committee in ALA as an interdivisional committee. It was decided that a committee chairman should be found and that the Board could then work with the chairman in the definition of the tasks to be performed.

PUBLISHING OF MINUTES. It was decided that the Board of Directors express to the Editorial Board their desire that the minutes of Board meetings be published in the journal.

SEMINAR AND INSTITUTE TOPICS COMMITTEE REPORT. Ronald Miller, chairman, enumerated the following points of the committee's meeting: that (1) a long range plan for seminar programs be written to cover the period from July 1974 through June 1978; (2) part of the money from the institutes be budgeted to support a professional staff person at ALA Headquarters to handle the burden of the work; (3) policy be established concerning commercial groups using ISAD programs as a marketing channel, particularly for products of use to libraries; (4) institutes or seminars be regionalized in the U.S. and Canada; and (5) liaison efforts be utilized (a) within the network of ALA, (b) through subcontractors, and (c) through continuing education programs of library schools or other institutes of higher education. In the discussion by the Board it was agreed that a written document, both specific and general, be put before the ISAD membership concerning future seminar and institute topics in order to obtain reactions.

JOLA EDITOR APPOINTED

MOTION. It was moved by Donald Hammer that the Board approve the appointment of Susan K. Martin as editor of the Journal of Library Automation. SECONDED by Paul Fasana. CARRIED.

TRIBUTE TO DON CULBERTSON. "The Board commended Don S.
Culbertson for long, energetic and useful service to ISAD."

EXHIBIT 1

January 23, 1973

ISAD/LED Education Committee Report

The ISAD/LED Education Committee met Sunday, January 28, at 9:30 a.m. in the Garden Restaurant of the Shoreham Hotel. Present were members James Liesener, Robert Kemper, Gerald Jahoda, Edward Heiliger, and (ex officio) Ralph Shoffner. Absent were Ann Painter and Duane Johnson.

Discussion focused on DISC (Developmental Information Science Curriculum), what has been achieved by the DISC contingent working under the aegis of ASIS, and how ISAD/LED could contribute to achieving the DISC objective of producing transferable "modules" or packaged programs for information science teaching. It was decided that to reach this objective what would be required were: (1) an overall structure or frame of reference which could be used to coordinate modules developed by interested and dedicated individuals; (2) specifications for module construction.

Re 2-It was decided to await the completion of modules currently being developed by Charles Davis and David Buttz and to examine these (at Las Vegas) as providing guidelines for module specifications.

Re 1-It was suggested by Ralph Shoffner that a frame of reference might be achieved, with some dispatch, by drawing up a list of about 20 questions in the area of information science, which library schools might expect their graduating students to answer, each question being answerable in no more than an hour. The idea was that modules might be designed around these questions. Also, it was seen that these questions might serve a useful purpose in organizing information science teaching in light of professional program evaluation and accreditation.
The suggestion of "questions" was enthusiastically received, and the following day Gerald Jahoda, Edward Heiliger, and Charles Davis drew up a "sample" list of questions and outlined the following procedure:

(1) The sample list of questions is sent to ISAD/LED Education Committee members as well as to ASIS SIG/EIS and ASIS Education Committee members for recommendations in the way of additions, deletions, and word revisions. By February 15, 1973.

(2) The questions are revised and edited by an ad hoc committee consisting of interested members of the three committees involved. By March 30, 1973.

(3) The revised list of questions is sent to accredited library schools in the U.S. and Canada for additions, deletions, and word revisions. By April 15.

(4) ISAD/LED Education Committee members together with invited members from the ASIS committees involved revise the question list at Las Vegas.

(5) Designating potential module constructors for each of the questions on the final question list. Formulation of module specifications at Las Vegas.

Immediately after Las Vegas the designated module constructors will be solicited. They will be sent a "question" together with module specifications. This is where we are January 29, 1973.

Respectfully submitted, Elaine Svenonius

EXHIBIT 2

TELECOMMUNICATIONS COMMITTEE Annual Report 1972/73

1. Communications:

a. Cable Newsletter: After exhausting every possible avenue within ALA (AMLIBS, JOLA Technical Communications, Headquarters Clearinghouse, Information Packet), the chairman received the mandate considered necessary to go ahead with plans for an effective communications medium. The mandate came in the form of a unanimous resolution from the 104 attendees at the Cable Institute, held in September, to produce such a newsletter. ISAD Board approval/endorsement was obtained, and the chairman approached ASIS, which will publish the newsletter.
Start-up money was obtained from the Markle Foundation for the first promotional issue, which will receive widest distribution. Based on response to the initial mailing, the newsletter will continue on a subscription basis, provided 750 subscriptions are obtained. The chairman and two other people will volunteer their time as coeditors.

b. The chairman has been operating a clearinghouse on cable information out of her office, which has become incredibly time-consuming. It is impossible for one person to do all that is needed; innumerable letters have been written and phone conversations held with people and groups wanting advice on dozens of issues connected with cable. It is hoped that the newsletter, the Proceedings of the Cable Institute, and a soon-to-be-established Task Force within SRRT on Cable will lessen the almost impossible load.

c. Specific letters were written in response to requests from the Rocky Mountain Federation (justification of library use of the ATS-F satellite) and Senator Mike Gravel (introduced several bills on telecommunications, wanted to know what libraries could do with this medium), and a presentation will be made to the National Commission hearings in New York.

d. A librarian-representative was located, suggested, and subsequently appointed to the FCC Federal-State-Local Advisory Committee on Cable. (A first for librarians!)

e. Liaison was maintained with nonlibrary groups: Publicable, of which the chairman is a member; the MITRE Symposium on Cable, to which the chairman was invited; and, as a result of that meeting, the Aspen Workshop on Cable in Palo Alto, which the chairman attended by invitation from Douglass Cater, together with eight other people, to decide on the direction this activity should take. At all three meetings the chairman attempted to represent the library viewpoint on cable.

f. A Las Vegas program was to be planned, together with ACRL and the AV Committee.
Plans did not materialize, and the committee is being approached by the soon-to-be-established SRRT Task Force on Cable to cosponsor a program on cable at Las Vegas.

2. Training:

1. Institute on Cable Television for Librarians: Held September 17-20, 1972, and attended by over 100 librarians from thirty-four states, representing public and state libraries primarily, this was directed by the chairman and funded by USOE. Russell Shank and Frank Norwood, consultants to the TC Committee, presented major talks. The entire Institute was videotaped and the tapes are available. Proceedings will be issued in March as a double issue of the Drexel Library Quarterly. The Institute was designed to provide a format and material (including videotaped presentations) to allow others to do their own institutes.

2. Telecommunications Seminar: Conducted by Russell Shank, consultant to the TC Committee, it presented an overview of various aspects of telecommunications. Held in Washington September 25-26, 1972, it, too, was attended by almost 100 librarians from all types of libraries. The chairman and Frank Norwood, consultant, participated in the presentation of papers.

3. Legislative Matters:

The committee expressed its concern to the ALA Legislation Committee about the lack of sufficient personnel to keep abreast of legislative and regulatory matters affecting telecommunications. The chairman of the Legislation Committee responded by stating that the ALA Washington Office had been trying to do their best, in the absence of funding for additional personnel, and would continue to do so. The committee attempts to follow legislative and regulatory developments in the telecommunications area, and works closely with the Washington Office in this activity, providing persons to testify and supplying two of the four members of the Subcommittee on Copyright (Shank and Kenney).
The committee participated actively in the revision of the ALA Policy Booklet, concerning itself with matters pertaining to networks and telecommunications. All recommendations were incorporated in the final draft of this document.

4. Research:

The telecommunications requirements study, long ago proposed, is dormant. Shank and Kenney are actively working on putting together a proposal to respond to a call for proposals from NSF in the area of telecommunications policy research. The committee will discuss the proposal during the Midwinter Meeting, 1973.

Respectfully submitted, Brigitte Kenney

TECHNICAL COMMUNICATIONS

ANNOUNCEMENTS

With this issue we begin the process of shifting the emphasis and content of Technical Communications. Some of the newsletter features of Technical Communications will be dropped because, as a quarterly publication, it cannot be satisfactorily used to disseminate certain kinds of temporal information (e.g., short lead-time announcements, notification of institutes, seminars, meetings, etc.). Instead, brief articles, letters, or comments on JOLA articles, and pertinent information about technical developments will, it is hoped, assume a larger percentage of the allotted pages for Technical Communications. The ISAD Editorial Board, in approving these changes, voiced the opinion that Technical Communications would be much more useful as a result.

Concise technical communications and information notes featuring any aspect of the application of computers, systems analysis, or other technological developments (hardware, software, or techniques) pertinent to libraries are solicited. The design is also meant to provide a forum for the more rapid dissemination of information that will sometimes serve as the basis for the longer, more detailed articles which are published in JOLA.
Thus, the salient findings in a study, or the important developments taking place in a project or ongoing operation, can be made known long before they might otherwise be brought out in a formal presentation.

These changes should become evident by the March 1974 issue, and to insure that this type of material does begin making its appearance, please send your letters, notes, and technical communications to the editor of Technical Communications. (See cover sheet, Page II.)

TECHNOLOGICAL INROADS

CATV Library Application

In Mobile, Alabama, a cable television subscriber can telephone the public library's reference department and turn to the library service channel to see the information requested over the telephone. The library installation costs are reported to be less than $500.

The spectrum of patrons making use of the service includes financial analysts (looking at charts and graphs), illustrators and advertising personnel (obtaining pictorial representations), technicians requesting information from manuals, teachers, and even tourists looking for directional information.

Business applications loom important in the future and are already underway. It is now possible to offer centralized microfilm storage with coded access to various documents. Similarly, it was noted that retrieval and transmission of videotapes for the use of realtors will be explored. This would provide real estate agents with the ability to give a videotape tour of properties for sale. Transmission time of a tape could be metered and billed to the appropriate realtor.

Other possible applications encompass such library activities as story hours, instruction for children in schools, and live telecast of library functions.
(Extracted from The American City, March 1973)

PEACESAT (Pan Pacific Education and Communication Experiments by Satellite)

Populations in the Pacific Basin are often small in size and divided by great distances, making it impossible for many to sustain adequate levels of education, health care, and technically based services. Inadequate communications constitute a principal barrier to development.

Pan Pacific Education and Communication Experiments by Satellite (PEACESAT) is a demonstration project in which selected educational and medical institutions in the Pacific Basin are linked by means of communication satellite relay. Voice and facsimile are sent and received by each location in the system. Slow-scan television and teletype will be used at some locations.

The PEACESAT project is not a permanent service. It is a pilot demonstration to provide experience in the use of long distance transmission on which to base the design of future telecommunication services. Its objectives are to increase the quality of education in the Pacific by facilitating sharing of scarce, costly resources; to improve professional services in sparsely populated areas through telecommunication support; and, generally, to assist in applying the potential of satellite technology to the solution of domestic social problems and peaceful world development.

The system is unique in the world. The satellite used is the ATS-1, operated by the National Aeronautics and Space Administration. Only established and tested technology is used in the system. The costs to participants are small. Exchanges conducted through the PEACESAT facilities involve two-way communication, two or more locations interconnected at one time, and often many users at each location engaged in dialogue with users at other terminals. The format and content are determined by the users.
The idea of using satellite relay to facilitate communication for educational, health, and community services in remote areas of the Pacific Basin was proposed in 1969 to the National Aeronautics and Space Administration by Dr. John Bystrom, professor of communication at the Manoa campus of the University of Hawaii. A start on the project was made in December 1970 when President Harlan Cleveland approved a grant from the University's Innovative Program. In February 1971, NASA approval for use of the ATS-1 was granted. Dr. Paul Yuen, professor of electrical engineering, and Katashi Nose, associate professor of physics, had two prototype ground terminals available when the Federal Communications Commission approved licenses for the experiment.

In Phase I of the project, beginning April 1971, ground terminals constructed at the university were successfully test-operated and utilized between Hawaii Community College in Hilo and the Manoa campus of the University of Hawaii. The Hawaii State Legislature emphasized its support of the project by appropriating $75,000 in April 1971.

The international network began in January 1972 with terminals at Wellington Polytechnic in Wellington, New Zealand, and the University of the South Pacific in Suva, Fiji, joining the system. Additional terminals have been established at Maui Community College, Kahului, Maui (Hawaii); Papua New Guinea Institute of Technology, Lae, PNG; the University of the South Pacific Centre, Nuku'alofa, Tonga; and the Department of Education, Pago Pago, American Samoa. Operating terminals are being established at Saipan and Truk in the Trust Territory of the Pacific Islands.

The project is administered by the University of Hawaii with the assistance of the Governor's Committee on Pan Pacific Educational Communications, appointed by Governor John Burns and headed by UH President Harlan Cleveland. A Faculty Advisory Committee assists development at the University of Hawaii.
Recommendations for long range planning in medical research are provided by a Medical Communications Study Advisory Committee. Project director is Dr. John Bystrom, assisted by James McMahon, system coordinator. Technical design and development is under the direction of Dr. Paul Yuen. Key to the system is a small, inexpensive ground terminal designed and constructed at the university by Katashi Nose.

Each of the educational institutions which have terminals has its own autonomous staff and organization which operate the equipment and develop educational uses of the system. Management of the PEACESAT terminal on the Manoa campus is under Carol Misko, terminal manager.

During its relatively short existence, the PEACESAT system has been utilized in a wide variety of educational and scientific programs. The East-West Center used a receiving station on the ocean liner President Wilson to conduct orientation sessions with its arriving grantees. Hamilton Library on the Manoa campus has demonstrated exchange of materials with other locations via PEACESAT. Doctors of the Pacific Research Section of the National Institutes of Health consult with doctors at the National Library of Medicine in Bethesda, Maryland. The Hawaii Cooperative Extension Service has used the system to conduct seminars with specialists from New Zealand, Fiji, Tonga, and Hawaii locations.

Faculty and students at the various campuses of the system have utilized the communication channels made available by PEACESAT. A few among the many disciplines they represent are political science, English, Spanish, education, Indonesian languages, physics, oceanography, computer science, journalism, urban planning, and speech-communication. It was the PEACESAT system which carried the world's first regularly scheduled class of instruction via satellite.
Within the Pacific Basin keen interest has been shown in the development of this project, as evidenced by discussion of PEACESAT at meetings of the South Pacific Forum and the South Pacific Commission. The PEACESAT network recently provided the means for South Pacific poets to exchange their works with one another. Among those joining in the well-received poetry series was the poet laureate of Tonga. In April 1972 the National Library of Medicine awarded the University of Hawaii a contract for a study of medical networking in the Pacific, incorporating demonstrations of library and professional exchanges. Hours of operation for the network are currently 9:00-10:00 a.m. and 4:30-6:00 p.m., Monday through Friday (Honolulu time). The Manoa Exchange Center is located in George Hall (212) on the campus of the University of Hawaii. PEACESAT Project, Program in Communication, University of Hawaii, Honolulu, HI 96822. Phones: (808) 948-8848, (808) 948-8771, RCA Telex #723597.

Technical Communications 65

Tomorrow's Library: Spools of Tape

Libraries with ranks of musty tomes and files of catalog cards may be difficult to find in the future. Books probably will be in museums; libraries will be on spools of computer tape. Library users might push a button for a no-deposit, no-return paperback printout, instead of standing in line for a hardback from the stacks. Movement in these directions has already begun at the University of Georgia, where a staff of 110 and $9 million of computer hardware provide the following type of service: A professor sits before a CRT and types out the chemical names of DDT on the keyboard. Almost immediately, the television screen above the keyboard displays a list of 176 scientific references to DDT. This information is the result of an electronic search of about 40,000 issues of Chemical Abstracts, a title compilation on computer tape of all published scientific papers in chemistry.
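The kind of lookup just described can be illustrated with a small inverted index mapping each title word to the numbered references that contain it. This is a generic sketch, not the University of Georgia implementation, whose internals the article does not describe; the record numbers and titles below are invented for the example.

```python
# Illustrative sketch only: a minimal inverted index over titles, the general
# idea behind a keyword lookup such as the one described above. All record
# numbers and titles are invented for the example.
from collections import defaultdict

def build_index(records):
    """Map each lowercased word of each title to the set of record numbers using it."""
    index = defaultdict(set)
    for number, title in records.items():
        for word in title.lower().split():
            index[word].add(number)
    return index

records = {
    101: "Persistence of DDT in agricultural soils",
    102: "Chromatographic analysis of DDT residues",
    103: "Synthesis of substituted benzenes",
}
index = build_index(records)

# A query such as the professor's "DDT" returns every matching reference.
hits = sorted(index["ddt"])
print(hits)  # [101, 102]
```

The design choice mirrored here is the classic one for such systems: the expensive pass over the full file happens once, at index-building time, so each interactive query is a cheap dictionary lookup.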
Similar abstracts are available in other scientific fields, and three large foundation grants will enlarge these holdings to include literature in engineering, education, and the humanities. The information retrieval system allows a user to "browse as he would in a library." But the browsing is done through one of 37 remote terminals. The number of remote terminals is expected to more than quadruple in future years, giving a total of some 200 individual outlets. (Extracted from College Management)

LIBRARY PROJECTS AND PROGRAMS

Microfiche Catalog by Tulsa City-County Library

The Tulsa City-County Library computer output microfiche catalog was published in early March, according to Ruth Blake, director of technical services, Tulsa City-County Library.

66 Journal of Library Automation Vol. 6/1 March 1973

The catalog is in register-index format. The register, arranged by number, contains full bibliographic information for each title. Adult and juvenile indexes contain brief bibliographic entries, location information, and a reference to the register number of each title. Both indexes are in dictionary form, with authors, titles, and subjects in a single alphabet.

Minnesota Bio-Medical Mini-Computer Project

The University of Minnesota Bio-Medical Library has received a $361,729 three-year grant from the National Library of Medicine to provide support during the development of a low-cost, stand-alone, library-dedicated computer system. The system will employ on-line terminals for data entry and file query functions, and will be based on an integrated system design of a processing system which would be suitable for use in other libraries of a similar size.
The premise of the development is that an integrated acquisitions, accounting, and in-process control system for all library materials, coupled with an on-line catalog/circulation control system, can be operationally affordable by a library or system of libraries in the 200,000 volume class using its own computer system. A Digital Equipment Corp. PDP 11/40 system has been selected. The CPU features 16K core, a 16 bit word, power fail/automatic restart, a programmable real time clock, the extended instruction set, and the memory management option which permits access to 124K of memory. A DECwriter data terminal will be used as console and initial terminal on the system. Two 9-channel 800 bpi tape drives and one 40 million character moving head disk pack drive comprise the system's initial mass storage. A 132-column, 96-character set line printer completes the initial hardware configuration. Before the system is installed, suitable CRT type terminals and communications interfaces will be chosen. Six of these terminals will be required when the system is fully operational. Memory expansion in the CPU and additional mass storage may be acquired depending upon needs, although the design effort will be to minimize the amount of core required for the system and to use the available mass storage most efficiently. One of the problems of using a minicomputer system to service an interactive on-line library system is the lack of a suitable operating system which requires minimal residency in core, yet contains only the functions needed in a library system. Current timesharing operating systems provide some parts of such a system, such as device handlers, but require too great an allocation of core, or programming in a compiler level language such as BASIC. This approach has been deemed unsatisfactory if system costs for hardware are to be kept reasonable.
During the development period a PDP 11/40 DOS operating system will be used to assist in writing a hybrid operating system and utilities using the PDP 11/40 assembler language. Also under development will be the file design, the system common modules, and the system dictionary. These elements of the system will be required before the individual system applications can be designed and programmed. Since the grant does not provide any support for data conversion, the circulation application will be developed and installed for the reserve materials. These number only a few thousand and involve short loan periods and other complexities which will provide an excellent test of a circulation control system for general library-wide use. Other application systems, such as acquisitions and serials, already are computer supported and therefore have existing machine-readable data files. The project staff includes Glenn Brudvig, director of the Bio-Medical Library, as principal investigator; Audrey N. Grosch of the University Libraries Systems Division as project director; and the following systems specialists: Bob Denney, Carl Sandberg, Eugene Lourey, and Don Norris.

PERTINENT RECENT PUBLICATIONS

Nationwide Survey of Library Automation - Phase I.

The California State University and Colleges has published the final report of Phase I of its nationwide survey of library automation. This comprehensive survey, performed for the Chancellor's Office-Library Systems Project by Inforonics, Inc., covers over twenty-five library automation projects in the United States and Canada. Those interested in obtaining a copy should write, enclosing a check in the amount of $5.00 (Californians remember the 6 percent tax), to Chancellor's Office; The California State University and Colleges; 5670 Wilshire Blvd., Suite 900; Los Angeles, CA 90036.

A Survey of Commonplace Problems in Library Automation, compiled by Frank S. Patrinostro.
This survey documents actual library experiences concerning problems encountered, their causes, and what steps were taken to solve the problems. Order from LARC Press, Ltd.; 105-117 W. Fourth Avenue; Peoria, IL 61602.

Survey of Commercially Available Computer-Readable Bibliographic Data Bases, edited by John H. Schneider, Marvin Gechman, and Stephen E. Furth. Published by ASIS.

This reference tool provides descriptions of eighty-one machine-readable data bases.

Key Papers on the Use of Computer-Based Bibliographic Services, edited by Stella Keenan. Published jointly by the National Federation of Abstracting and Indexing Services (NFAIS) and ASIS.

Contains selected papers on the use and evaluation of computer-based services.

Cost Reduction for Special Libraries and Information Centers, edited by Frank Slater. Published by ASIS.

The four sections of the book cover an overview of recent literature on costing for libraries; general cost reduction considerations; show and tell (special cost reduction efforts); and real costs for information managers. (The three preceding publications are available from Publications Division, American Society for Information Science, 1140 Connecticut Ave., N.W., Washington, DC 20036.)

BOOK REVIEWS

Indiana Seminar on Information Networks (ISIN). Proceedings. Compiled by Donald P. Hammer and Gary C. Lelvis. West Lafayette, Indiana: Purdue University Libraries, 1972. 91 p. (Available at no charge from the Extension Division, Indiana State Library, 140 North Senate Avenue, Indianapolis, Indiana 46204 as long as the supply lasts.)

The Indiana Seminar on Information Networks (October 26-28, 1971) was an attempt to introduce Indiana librarians to the benefits (and presumably problems) of library networking. Papers included in the proceedings are Introduction to Networks (Maryann Duggan), Library of Congress MARC & RECON (Lucia J.
Rather), NELINET (Ronald F. Miller), An On-Line Interlibrary Circulation and Bibliographic Searching Demonstration (Gary C. Lelvis and Donald P. Hammer), Ohio College Library Center (Frederick G. Kilgour), User Response to the FACTS (Facsimile Transmission System) Network (Lynn R. Hard), Indiana TWX Network Discussion (Margaret D. Egan & Abbie D. Heitger), and How Does the Network Serve the Researcher? (Irwin H. Pizer). As with any collection of written papers or oral presentations, the quality is mixed. The papers are introductory in nature, the Pizer article being the exception. The majority report "case studies" of particular automated operations and/or networks (MARC & RECON, NELINET, OCLC, FACTS). The FACTS article is the most interesting of these "case studies" because it moves beyond simply reporting "how we done it good" into an evaluation of why the network did not succeed (the network did not meet a real and/or consciously recognized need of the libraries it was proposing to serve) and emphasizes the importance of careful planning. Any would-be network planner should read this article; there are many lessons to be learned. Although the collected papers have all of the disadvantages usually associated with a collection of oral presentations (material is loosely organized and lacks continuity, introductory and oversimplified, repetitive, and out of date), they are a valuable addition to the growing body of literature dealing with networks, both from the idealized conceptual view and, perhaps more importantly, from the practical reality view of existing networks.

Kenneth J. Bierman
Systems Librarian
Virginia Polytechnic Institute

Computers and Systems; An Introduction for Librarians, by John Eyre and Peter Tonks. Hamden, Connecticut: Linnet Books (Shoe String Press), 1971. 127 p. $5.75. ISBN: 0-208-01073-4.

At last, an inexpensive introductory text specifically written for librarians and library students! Not since N. S. M.
Cox's The Computer and the Library have we had such a short, easy to read, yet comprehensive description of the essentials. Complementing the text are twenty-nine figures illustrating everything from batch and real-time processors, disc drives, program process, and systems flowcharts to data elements, formats, and input procedures, MARC II records on magnetic tapes, and sample pages from a computer-produced author catalog. The text reads like a well-organized glossary and treats the subjects of library use of computers and systems analysis in a way at once simple and informative. The authors had tested the material with students in courses at the School of Librarianship of the Polytechnic of North London. Thanks to the British-American cooperation surrounding MARC efforts, this book will be as useful in our library school classes as it is in theirs. The index deserves a special note because it was compiled after the style of PRECIS, developed by the British National Bibliography. It is a facet analysis of the text featuring access to "activity:thing:type:aspect" in a prescribed permuted order. Although there is not much emphasis in such a text on subject access or information retrieval, this is not entirely overlooked, and this index serves as an excellent example of what could be done by computer. Truly an excellent introduction to computers and systems analysis for librarians! A two-page bibliography contains suggestions for further reading on the topic or for an expanded reading of various applications of computers in libraries.

Pauline Atherton
School of Library Science
Syracuse University

ISIS: Integrated Scientific Information System; A General Description of an Approach to Computerised Bibliographical Control, by William Schieber. Geneva: International Labour Office, 1971. 115 p. $1.50.

This document is a well-written description of the computerized library system developed at the International Labour Office.
Planning and development for the system began in 1963. It has been implemented and is now in operation within the Central Library and Documentation Branch of the ILO. The ISIS Bibliographic Control System is a large file system for storing, processing, and retrieving bibliographic information. The ILO data base consists of some 45,000 records of books, periodical articles, and other documents. Each record consists of conventional bibliographic data (with less detailed definitions than MARC data, however) plus an abstract. In form, the abstract appears to be written in natural language, but all descriptor words used in the abstract are taken from a controlled vocabulary and, in fact, provide subject indexing. On-line terminals are used for searches. The search system allows searches by subject descriptors, language, and date of publication. Sequential formulation of the search allows control of the number of responses to a desirable size. Records are also indexed on various data fields, such as author and title. Display of records and browsing are handled on line, but printing of lists or bibliographies is handled through subsequent batch printing jobs. Regularly scheduled outputs of the system include printed catalogs, indexes, and authority lists. Two other systems have been developed at the ILO using some programs and files of the Bibliographic Control System. One is for controlling loans of library books; the other is for serials data and includes a subsystem for routing library periodicals. These three major systems are described in some detail in this report. A fourth section deals with system monitoring and control. Costs are discussed here. The ISIS system is an interesting and unique one even though it is geared primarily to a special library environment. It is evident that much careful thought and attention to detail went into the system design and development.
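The sequential formulation the review describes, adding one condition at a time until the number of responses is of a desirable size, can be sketched as follows. This is only an illustration of the idea: the review does not give ISIS's actual query syntax, and the field names and sample records below are invented.

```python
# A sketch of sequential search formulation: each added condition narrows the
# previous result set. The record fields and sample data are invented; this is
# not the ISIS query language, which the review does not describe.
records = [
    {"id": 1, "descriptors": {"employment", "training"}, "lang": "en", "year": 1969},
    {"id": 2, "descriptors": {"employment"}, "lang": "fr", "year": 1970},
    {"id": 3, "descriptors": {"employment", "training"}, "lang": "en", "year": 1965},
]

def narrow(hits, predicate):
    """Apply one more condition to the current result set."""
    return [r for r in hits if predicate(r)]

# Step 1: search by subject descriptor.
hits = narrow(records, lambda r: "employment" in r["descriptors"])
# Step 2: too many responses, so add a language condition.
hits = narrow(hits, lambda r: r["lang"] == "en")
# Step 3: still too many, so add a date-of-publication condition.
hits = narrow(hits, lambda r: r["year"] >= 1968)
print([r["id"] for r in hits])  # [1]
```

The point of the sequential style is that the searcher sees the response count after each step and only adds conditions when the set is still too large to browse.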
The integrated use of programs and files as described here and the details of some design elements make this a useful document. The report itself is well done. Describing a complex system for a varied audience is a difficult task. The author, William D. Schieber, has put together an excellent example of a systems report document.

Charles T. Payne
Systems Development Office
University of Chicago Library

Title Derivative Indexing Techniques: A Comparative Study, by Hilda Feinberg. Metuchen, N.J.: The Scarecrow Press, 1973. x + 297 p.; index and bibliography.

This book is primarily a survey of key word indexes, with some discussion of issues in indexing. The survey is quite good, but already out of date. The discussion is unfortunate. The survey covers a wide range of computer-based article title key word indexes, including extreme cases such as Permuterm. Sample pages are included for fifty-six indexes, and thirteen lists of excluded words ("stopwords") are given. Reproduction of samples is generally excellent, and this portion is valuable in showing the virtues and defects of various approaches to key word indexing. Since this survey, at least three major libraries have begun publication of key word indexes to serial titles, a type of index with different problems which is likely to be more common in the future. The discussion suffers from a lack of focus. There are no clear standards for key word indexes or the traditional tools they complement or replace, and studies of user preference and convenience have been limited and inconclusive. It is difficult to say what makes a key word index more or less workable, and this book seems to cloud the issues even more. Ms. Feinberg makes some questionable and unsupported assumptions about what users think, want, and need, and a number of recommendations which are at best only applicable to indexes of article titles in scientific fields.
Take three major recommendations: plural and singular forms should be interfiled, synonyms and similar words should be interfiled, and foreign titles should be translated. The University of California (Berkeley) library found "College," "University," "Company," and "Papers" to be good exclusion words, while "Colleges," "Universities," "Companies," and "Paper" are good subject words. Synonym control increases homonym problems, makes for longer (and thus more difficult to use) lists, and entails difficult decisions as to what constitute true synonyms. Translation raises the question of whether a user should be guided to a publication he may not be able to read. In sum, these and similar decisions should depend much more on the field of study and user population than on this type of general treatment. There are other problems reflecting deficiencies in the areas of technical background, understanding of typography, and appreciation of some reasons for key word indexing. Ms. Feinberg comes out strongly in favor of "title enrichment," adding artificial titles to improve indexing. This, however, adds cost and time to the key word approach, and subtracts from its clear advantages. A large section is devoted to an experimental study of different indexing programs, with the result that different programs produce different indexes. Generally, the discussion detracts from the survey. Finally, the title chosen seems unfortunate. "Key word indexing" may not be an ideal term, but it is fairly well known; must we introduce yet another vague, polysyllabic phrase, "title derivative indexing"?

Walt Crawford
University of California, Berkeley

Accountability: Systems Planning in Education. Leon Lessinger & Associates. Creta D. Sabine, editor. Homewood, Ill.: ETC Publications, 1973. 242 pages.
"Accountability" has become a rallying cry in many educational circles of late: for the public in its demand for visible results for educational dollars, and for educators as they attempt to define and defend new programs. This well-sequenced collection of nine papers on this subject addresses the problem of accountability at all levels of the educational enterprise. First is a conceptualization of systems-planning through an explanation of the systems approach, cost effectiveness, and cost analysis. Next are specific methods of systems-planning at the classroom, community college, university, and state levels.

On-Line and Back at S.F.U. / SANDERSON

[A cost-comparison table listing staff tasks (stuffing envelopes, looking up addresses, reserves staff) and equipment rentals (1030 system, 2260 terminals, share of 2848, 2741 terminals) is not legible in this copy. Its summary lines show a net saving in annual cost of the batch system of $36,988 over Phase I and $31,293 over Phase II.]

APPENDIX 2
GROSS COMPUTER OPERATING COSTS DURING PHASE I
Costs shown include all circulation runs.

Month         Cost         CPU Hrs.
Nov. 1970     $5,402.57    36.0241
Dec. 1970      4,265.12    28.4410
Jan. 1971      3,605.33    24.0419
Feb. 1971      3,937.78    26.2595
March 1971     4,349.41    29.0043
April 1971     2,981.39    19.8820
May 1971       2,421.39    16.1487

Average monthly operating cost of Phase I over seven months: $3,851.85
Average monthly operating cost of former batch system: $3,100.00

APPENDIX 3
DEVELOPMENT COSTS FOR PHASE II COMPLETION
Present system (Phase I) converted to Minerva, with new file organization, etc., and interface to batch system.
[The itemized figures, covering the Systems Computing Centre, IBM support, library personnel, programming and systems tests, Pacific Western Consulting (Minerva) at $150 per day, computer time, forms and staff training, parallel runs, and equipment rental, are garbled in this copy and cannot be reliably reconstructed.]

APPENDIX 4
[Appendix 4 reproduces system flowcharts: (a) the original circulation system, (b) Phase I, and (c) the proposed Phase II. The diagrams, whose legible labels include the IBM 1034 card punch, the 1030 system, 1031 badge-card readers, circulation and payment cards, a circulation master listing, reserve listings by course, creation of an on-line loan master, and an inquiry and update program (status, holds, and renewals), are not reproducible in this copy.]

Fig. 1. Number of Names Retrieved 90, 99, and 99.5 Percent of the Time for Different Key Structures
[The figure's data points are not legible in this copy.]

Where the truncated name contained fewer characters than the key segment to be derived, the segment was left-justified and padded out with blanks. If there was no middle name or middle initial, a blank was used. Another program derived shorter keys from the 8,7,1 structure, ranging from 3,0 to 5,2,1. Next, a sort program arranged the shorter keys in alphabetical order. A statistics collection program then processed the alphabetical file.
This program counted the number of distinct keys, built a frequency distribution of names per distinct key, and built cumulative frequency distributions of names per distinct key in percentile groups.

RESULTS

Catalog Records Retrieved / LANDGRAF 105

Figure 1 presents the findings at three levels of likelihood for retrieving n or fewer names when a variety of search key combinations were employed, ranging from three to six characters from the surname, zero to three characters from the first name, and with or without the middle initial.

Table 1. Number of Names Retrieved With 90 Percent Likelihood

No. of Characters   Key Structure   No. of Names Retrieved
3                   3,0             (>200)
4                   4,0             (>200)
                    3,1             (>200)
5                   5,0             (>200)
                    3,2             26
                    4,1             25
                    3,1,1           16
6                   6,0             171
                    5,1             18
                    3,3             17
                    4,2             12
                    3,2,1           8
                    4,1,1           8
7                   6,1             16
                    5,2             9
                    5,1,1           6
                    3,3,1           5
                    4,2,1           5

Table 1 is an extraction from Figure 1 and contains the number of names retrieved at a level of 90 percent likelihood for the various search keys employed. Figure 2 has the same structure as Figure 1 but contains the degree of distinctness as a percentage:

    distinctness = (no. of distinct keys / no. of entries) x 100 percent.

Table 2 records distinctness arranged by number of characters per key. Figure 3 is a graphical representation of the degrees of distinctness of the various keys. In this figure, different types of lines connect points representing key structures that contain an equal number of characters. The bottom line in Table 1 may be read as saying that 90 percent of the time a 4,2,1 key will retrieve five or fewer names from a file of 167,745 personal name keys. The bottom line of Table 2 states that from the same file the 4,2,1 key yields a single name 64.1 percent of the time.

DISCUSSION

This experiment has shown the degree of distinctness, that is to say, the number of distinct keys divided by the total number of entries from which all keys were derived, to be a useful tool in determining what key structures may be efficiently used.
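The key derivation and distinctness measurement described above can be sketched in miniature. The truncation, padding, and blank-for-missing-middle-initial rules follow the article's description; the sample names are invented and the helper names are our own.

```python
# A sketch of the experiment's pipeline under the article's definitions:
# derive a fixed-length search key from surname, first name, and middle
# initial (here the 4,2,1 structure), then measure distinctness as the
# number of distinct keys divided by the total number of entries.
def derive_key(surname, first, middle, structure=(4, 2, 1)):
    parts = (surname, first, middle)
    # Each segment is truncated to its length; names shorter than the segment
    # are left-justified and padded with blanks, and a missing middle
    # initial becomes a blank.
    return "".join(p[:n].ljust(n).upper() for p, n in zip(parts, structure))

def distinctness(keys):
    """Distinct keys as a percentage of all entries."""
    return 100.0 * len(set(keys)) / len(keys)

names = [("Johnson", "Robert", "A"), ("Johnson", "Roberta", ""), ("Johns", "Rose", "A")]
keys = [derive_key(*n) for n in names]
print(keys)  # ['JOHNROA', 'JOHNRO ', 'JOHNROA']
print(round(distinctness(keys), 1))  # two of the three keys collide: 66.7
```

Note how the first and third names, though different, yield the same 4,2,1 key; it is exactly this collision rate that the distinctness statistic summarizes.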
As seen by comparing Figure 1 with Figure 2 and Table 1 with Table 2, there is a high degree of correlation between distinctness and the likelihood of retrieving a certain number of names 90, 99, and 99.5 percent of the time.

[Figures 2 and 3, plotted against the number of characters extracted from the surname, are not reproducible in this copy.]

Susan Martin, and Berniece Coulter, Secretary, ISAD. GUESTS: Stephen Salmon, James Rizzolo, Douglas Ferguson, Brett Butler, David Waite, Velma Veneziano, Pearce Grove, Ronald Miller, Frederick Kilgour, and Lawrence W. S. Auld. President Shoffner pronounced a quorum present.

Highlights of Minutes 149

SEMINAR AND INSTITUTE TOPICS. Chairman Ron Miller said the committee had reviewed the literature and felt there was value in regard to continuing education programs. He said they had received summaries and statistics on previous seminars and felt the institutes should be continued.

CONFERENCE PLANNING COMMITTEE REPORT. Brett Butler, chairman, summarized the seminars held during the year. The microforms seminar in Detroit was not held as a separate seminar but was incorporated into sessions of the National Microform Association's Annual Meeting. Two other seminars were not held as planned but were postponed. Surplus monies in the Preconference fund were to be used for publishing the proceedings of the Preconference. Tapes had been professionally made of these proceedings. The 1975 meetings were being planned now with the goal of cooperating with ASIS. The program in New York on national libraries, which was cancelled last Midwinter (1973, Washington, D.C.), was being considered with the scope increased to include other national libraries (France, Great Britain, etc.). The program could require more time than the normal time slots. Maryann Duggan was preparing a goals document to be distributed to each participating library. Mr.
Butler continued with his report, saying that Maryann Duggan would plan and coordinate the Networks Seminar for the spring. The focus would be on the proposed theme, "Advertising Library Automation: How to Share Your Efforts." The general feeling was that ISAD should also do something with other associations in state library operations and library schools. Mr. Fasana stated that some feedback on the Las Vegas Preconference had been related in such questions as "Why was the registration fee for the preconference so expensive this year?" Mr. Butler reported that a reduction in the price of the proceedings was offered to registrants. ISAD would have to subsidize ALA Publishing Services for the discount offered these registrants because ALA cannot sell at different prices to the membership. Mr. Miller stated that his committee felt that a good deal of the work now done by volunteers should be done by a staff member and those costs be included in the registration fees. This approach should be used for future seminars. Mr. Fasana said that some divisions list an analysis or breakdown of costs in their advertising or program for preconferences. Apparently, President Shoffner said, the price was reasonable, judging from the response. Mr. Butler said another objection was the conflict with ACRL's preconference on networks. Had there been knowledge of their preconference previously, the two could have been coordinated. Mr. Shoffner said that in a large conference conflicts were to be expected. Mrs. Veneziano felt there were few people in attendance at the networks preconference who, had it not been held, would have attended ISAD's.

PRESIDENT GIVEN AUTHORITY TO APPOINT COMMITTEE
This was previously given to Mr. Shoffner. Mr. Kilgour asked that the Board invoke Section 2B of the Bylaws and provide that the appointments last until the end of the president's term. Mr. Fasana suggested that a sense of "yes" be given. REPORT OF MARBI COMMITTEE. Mrs. Veneziano, the chairman, suggested the acronym "MARBf' (Machine-Readable Bibliographic In- formation) be used for convenience in remembering the lengthy title of the committee. The committee was still concerned with trying to define its role and the mechanism to implement the role. One important consideration of the committee was to serve as a link be- tween the members of ALA and the Library of Congress in order to avoid the repetition of the problem which arose regarding ISBD and MARC rec- ords. Henriette Avram had prepared a position paper on this committee's relationship to any impending content changes to MARC records. These would not be changes required as a result of changes in cataloging rules, over which LC has no say, but rather changes in the needs of the .library community. The paper was not intended to set forth a permanent operation, but to propose guidelines. It outlined what would happen at the point where some possibility for change was discovered, and how LC and the commit- tee would communicate. It did not, of course, detail the committee's com- munication with division members and the relationship of the committee to the MARC Users Discussion Group. The committee concluded that it should communicate at least with the MARC subscribers, and that James Rizzolo, chairman of the MARC Users Discussion Group, would assume the responsibility of circulating information on impending changes to the MARC subscribers and MARC users and get the information back to the committee which would determine if there was a consensus and in its best judgment give a reply back to LC. A second activity of the committee would be to reach interested people through some means as ]OLA TC or LRTS. 
The voluminous amount of papers should not be distributed generally, as they become obsolete quickly, but a center is needed for storing these papers, and the fact that they exist should be circulated to the library field in general. Copies could be made available for a price to those interested. Mrs. Veneziano hoped this could be worked out with someone at ALA Headquarters. Another area, that of nonbibliographic data (e.g., uniform library codes, dealer codes, etc.), is of interest to the committee. Mrs. Veneziano's personal opinion was that though the function statement of the committee indicates responsibility for bibliographic information only, the committee should also be involved with anything which impacts the use of that bibliographic data. She expressed hesitancy to have many other committees working in this area, as too much time is devoted to getting feedback from other committees before a decision can be made. The committee would like to propose adoption of a mechanism whereby it can set up a subcommittee, task force, or working group with a limited life-span which would study and react to very specific technical proposals, working papers, etc., which are developing informally at the national and international level. These subgroups must be very responsive so that the committee will not be placed in a position where it cannot take action readily. Also there must be a flexible mechanism for establishing subcommittees. RTSD has strong feelings about setting up such a group without approval by the division. She did not think, however, there was any objection to creating task forces and felt that it was the only way to obtain expert comment on some of these materials. The feeling of the committee was that Henriette Avram's position paper should be accepted with minor provisos: the implementation (Section 1-B) and the time frame allowed for reporting back to LC.
Henriette Avram is to go back to LC and check these modifications. If they meet LC's approval, the committee will accept the position paper. The Character Set Subcommittee, consisting of Charles Payne, David Weisbrod, and Michael Malinconico, will study the latest draft working papers and comment to Mrs. Avram, who is on the Task Force. Mr. Fasana said the committee had power to set up subcommittees because of the board's previous approval on this; in addition, the function statement gives the committee the right to set up a task force. President Shoffner pointed out that the results of deliberations require a joint submission to all three boards. Mr. Hammer volunteered help in any coordination which might be needed. Comment regarding the distribution of the opinion survey that Mrs. Avram's report calls for was that it should not be general but that it be noted (perhaps in JOLA TC) that the survey is available. Mr. Shoffner suggested that the board accept John Kountz's statement regarding the establishment of a committee on nonbibliographical data and reconsider the matter again at Midwinter. John Linford suggested that it would be best to expand the charge to the committee to include the nonbibliographical area. Paul Fasana stated that the committee's function statement now is so worded that it can include noncataloging data. The sense of the board was agreement that the authority already existed.
TELECOMMUNICATIONS COMMITTEE REPORT.
The new chairman, David Waite, reported that the committee first discussed the committee's focus, as it was the desire of the board, as he understood it, to make some changes in this committee. In the past the activities of the committee were basically in cable TV. The present members were not too interested in making that their prime target, but instead in the electronic communications of bibliographic data. They are not going to look just at hot issues but are currently proceeding in the area of telecommunications information, at the same time keeping their eyes open for important developments in the technological field under the broad base of telecommunications.
152 Journal of Library Automation Vol. 6/3 September 1973
The two main focal points of the committee would be education and standards. In education the committee would try to communicate with decision makers on aspects of communications to be investigated, and then overflow to the general library community. Mrs. Martin asked if there might not be a problem with the committee's taking on this role, since seminars, institutes, etc., were the function of the Conference Planning Committee. The planning for such seminars, Mr. Butler said, on a six or nine month basis did not work adequately. There was no objection to the Telecommunications Committee functioning in this area. Mr. Kilgour remarked that AT&T and other phone companies, as well as the FCC, had a great deal going on with impact on telecommunications and with networks presently, and he felt that ALA should present a position to the FCC. The committee should therefore inform itself extensively as to what is going on so that if it appears some action by ALA is needed, we would be prepared.
COMMITTEE ON OBJECTIVES REPORT.
Chairman Stephen Salmon summarized the discussion by the Objectives Committee of the three issues raised at the first session of the ISAD Board meeting regarding the Information Technology Discussion Group: (1) How such a group should fit into the organizational structure of ISAD. The committee sensed a media committee was not an answer but felt a discussion group was appropriate. The media group should be continued even if transformed into a committee. (2) The restatement of the objectives and activities of the division to include the media group.
Consideration was given to Paul Fasana's words that "related technology" in the ISAD Bylaws' Objectives Statement included educational technology already. But the committee agreed with the board that a change in the language would help clarify, and Mr. Kilgour had some recommendations for rewording the Objectives Statement to solve the problem. (3) Terminology for the name of the division and the journal. They finally identified three possible name changes for the division: (a) Information Science and Library Automation, (b) Information Science and Educational Technology, and (c) Information Science and Technology. The final decision was that the present name of the division, "Information Science and Automation," was the best. The committee also thought the draft report should specifically include another objective, i.e., to offer expertise in this area to others in ALA and other professional organizations like ARL. Mr. Salmon listed the additions and changes made and included in the final draft of the committee's report.
MOTION: Paul Fasana moved that the ISAD Board accept and adopt the report of the Objectives Committee. SECONDED by Susan K. Martin. CARRIED.
Mrs. Martin remarked that the Information Technology Discussion Group was already within ISAD. Mr. Shoffner explained that the Board had accepted the group only for one year, and during that year ISAD intended to determine whether this activity was within ISAD's scope. Mrs. Martin asked, if ISAD considers educational technology and audiovisual concerns to be within its scope, what the relationship would be with the other audiovisual committees in ALA and also what COO's role would be? The Board was not asserting what is out of scope for any other parts of ALA, Mr. Shoffner answered, only what was within ISAD's scope.
President Shoffner thanked Chairman Salmon for his report and the committee for its work in carrying out the original charge as given and meeting the time schedule. He then declared the committee disbanded. There was some discussion on coordinating with other AV committees in ALA and what aspect of AV ISAD would be concerned with. Mr. Kilgour suggested that the Information Technology Discussion Group should pursue its own goals and not concern itself with the coordination of all ALA AV groups; such coordination he felt was impossible. Whether a number of committees or subcommittees in the discussion group could be formed was also discussed. Mr. Shoffner stated that these should be "units" of the discussion group, not "committees." He further said he was reluctant to establish committees and would do so only after a group of people committed to doing a job showed, over some continuing period of time, productive activity on a number of different tasks that relate to each other.
REPORT OF EDITORIAL BOARD.
Mrs. Martin said that at the Monday ISAD Board meeting she had talked of retaining JOLA TC as a separate publication, but the final feeling of the Editorial Board was negative. The thought was to create a separate section within JOLA but with a different format, as the green sheets ("Liaison") are inserted in the Library Association Record. Don Bosseau, editor of JOLA TC, remarked that the Editorial Board had provided insight into another need, which was for truly technical communications, e.g., a short summary which would show up later in a longer, detailed article. The Editorial Board felt that TC should be made into something that has more impact than news releases.
ISAD/LED EDUCATION COMMITTEE REPORT.
A written report was submitted by the committee. (See Exhibit A.)
COLA REPORT.
The membership of the Discussion Group had increased to 145.
Chairman Don Bosseau, who has held that position since the incorporation of the group within ISAD, said ballots would be sent out shortly for the election of a new chairman. Mr. Bosseau also asked about control of membership in the group and stated that ALA's guidelines indicate one person per institution as a maximum membership. The Board corrected this idea by saying that this limitation was not ALA's but the old COLA limitation; there is no limit on membership by ALA. Mr. Butler asked that the planning of the COLA, MARC, and Information Technology Discussion Groups' meetings be coordinated. Mrs. Martin said that David Weisbrod had suggested that there be a COLA meeting at ASIS and that would be part of the cooperation between ASIS and ISAD in the program area.
MARC USERS DISCUSSION GROUP.
Mr. James Rizzolo told of his intent to make a survey by breaking up the mailing lists he had into three groups: (1) MARC subscribers, (2) those interested in using MARC, and (3) an informational group. Mr. Kilgour thought the group was called "MARC Subscribers," not "MARC Users." Mr. Shoffner said the name had always been MARC Users. Originally there had been the intent to set up a "MARC Subscribers" group, but Mr. Culbertson had said that it would not fit into either ISAD or ALA's structure. They then settled on MARC Users Discussion Group. It was stated that both the MARC and COLA Discussion Groups should be in the program section of the ALA Conference Program book. It was pointed out that program time can be requested by committee or discussion group chairmen.
INFORMATION TECHNOLOGY DISCUSSION GROUP.
Mr. Shoffner requested Mr. Donald Hammer to inform the ISAD Information Technology Discussion Group that the board would not establish an AV committee, but intended to continue with the Information Technology Discussion Group, in response to their memo of March 2, 1973 requesting an AV committee within ISAD.
REPORT TO ALA PLANNING COMMITTEE.
Mr. Shoffner also requested Mr. Hammer to forward the Objectives Committee report on the long range plans of ISAD to the ALA Planning Committee as a means to meet their request. (This report had been deferred from Midwinter so that the final report of the Objectives Committee could first be heard.)
RTSD COMPUTER FILING COMMITTEE.
Mr. Fasana said he was asked by the RTSD Board why ISAD had refused their request to appoint an ISAD member to the RTSD Computer Filing Committee. Mr. Hammer said he would see that a committee member was appointed. Mr. Shoffner expressed appreciation to the board and turned over the gavel to President Fred Kilgour. The meeting was adjourned at 12:00 noon.
EXHIBIT A
JUNE 25, 1973
MINUTES OF THE 1973 ANNUAL ISAD/LED MEETING
The 1973 annual meeting of ISAD/LED convened June 25 at Caesar's Palace, Atrium I, Las Vegas. Present were members Jim Liesener, Ann Painter, and Elaine Svenonius; and visitors Martha West (California State University, San Jose), Barbara Fleming (University of Nevada, Reno), and Philip Heer (University of Denver); in attendance were Pauline Atherton and Charles Davis. Discussion centered on two topics: the DISC questions as commented upon by library school faculties, and the future course of DISC. The general and specific comments on the DISC questions given by library school faculties are given on the attached sheets. These sheets include, in addition to responses reported at ALA, responses which arrived belatedly throughout the summer. General criticisms are primarily of two types: the questions are either too broad or they are outside the domain of information science. It was felt that had the use to which the questions are to be put (viz., to develop modules, not to examine graduating students) been clearer, the charge "too broad" would not have resulted.
As to what is to be included in the domain of information science, this was precisely the point of the exercise of generating questions, and comments limiting or extending the domain of information science should be accorded consideration. At the June 25 meeting, the individual questions were discussed generally in light of comments received. Following the discussion, participants at the meeting expressed informally, and with varying degrees of determination, interest in developing modules around certain of the questions. The meeting ended with a discussion of the future of DISC. A technical session is being planned by the ES SIG at Los Angeles in October: Program Modules for Developing Curricula in Information Science; the plan for module development will be advertised and some demonstration modules shown with a view to drawing up module specifications. Also contemplated is a program by ISAD/LED in January at Midwinter ALA. Elaine Svenonius, August 15, 1973.

Corporate Author Entry Records Retrieved by Use of Derived Truncated Search Keys
Alan L. LANDGRAF, Kunj B. RASTOGI, and Philip L. LONG, The Ohio College Library Center.
An experiment was conducted to design a corporate author index to a large bibliographic file. The nature of corporate entries necessitates a different search key construction from that of personal names or titles. Derivation of a search key to select distinct corporate entry records is discussed.
INTRODUCTION
This paper describes the findings of an experiment conducted to design a corporate author index to entries in a large file of catalog records at the Ohio College Library Center; a companion paper describes findings of a similar investigation into retrieval employing a personal author index.1 The center has operated an on-line, shared cataloging system since August 1971. In addition to a Library of Congress card number index, the system maintains truncated name-title and title index files.
The user is thus able to retrieve entries employing truncated search keys. Three previous papers report results of experiments which led to the design of the name-title and title indexes.2-4 For monographs having personal names as main entries, a truncated 3,3 search key consisting of the first three letters of the author's name plus the first three letters of the first non-English-article word of the title was judged to be satisfactory, in that this key yielded five or fewer entries per query in more than 99 percent of the cases when keys were selected at random.5 However, a recent study by Guthrie and Slifko reveals that a model which employs random selection of entries yields results closer to actual experience, and with a higher average number of entries per reply.6 A search key composed of the first five or four characters of the surname and the first or first and second initials makes possible efficient retrieval.7 However, the situation is different in the case of corporate entries because many corporate names begin with the same or similar words. For example, in the records examined, the initial words of more than 1,300 publications are "U.S. Congress, House Committee On ...." Obviously a type of search key different from that which proved efficient for retrieving personal authors is required for retrieval of corporate entries.
MATERIAL AND METHODS
The experiment used a file of approximately 200,000 MARC II records having a total of 68,169 corporate name entries. Corporate entries were extracted from the 110, 111, 410, 411, 710, 711, 810, and 811 fields in the records. A program edited the file to extract keys; initial English language articles were removed from each entry, and the words "United States," "U.S.," "U. S.," "Great Brit.," and "Great Britain" appearing anywhere in the entry were replaced with "US" and "Gt Brit" respectively.
A blank was substituted for each subfield delimiter and associated code, and unwanted characters such as punctuation, diacritics, and special symbols were removed; the program also closed up the space that the unwanted character had occupied. One blank replaced multiple blanks. The elements extracted consisted of five segments of eight characters each, representing the initial eight characters of the first five words of the corporate entry. Segments containing fewer than eight characters were padded out with blanks. If a corporate name had fewer than five words, the remaining segments were blank. To study a given type of key, the file was sorted on a specified number of initial characters of each segment; these initial characters were then employed as search keys by a program which sequentially compared the characters in the key, counting distinct and identical keys.
RESULTS AND DISCUSSION
Table 1 presents the number of distinct keys and the maximum number of occurrences of identical keys for the structures studied in the experiment. The larger the number of distinct keys for a fixed number of entries in the file, the better the key will be for retrieval purposes. Given two search keys which are more or less equally specific, the one which is simpler to use is preferable. The peculiarity of corporate-entry keys can be observed from Table 1. Even for the 8,8,8,8,8 key structure the percentage of distinct keys (33.7 percent) is low, and the maximum number of occurrences of an identical key (1304) is high. Another observation revealed by Table 1 is that as the key structure goes from five to three segments, there is a steady decrease in the percentage of distinct keys and consequently an increase in the maximum number of entries per key. However, a reduction in the number of characters in a segment does not cause a great deal of deterioration.
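The editing and segmentation rules above can be sketched as follows. This is a minimal illustration of the key derivation rather than the center's actual program; the function name and the exact ordering of the clean-up steps are assumptions.

```python
import re

# Wholesale substitutions made before key extraction, per the paper.
SUBSTITUTIONS = [
    ("UNITED STATES", "US"),
    ("U. S.", "US"),
    ("U.S.", "US"),
    ("GREAT BRITAIN", "GT BRIT"),
    ("GREAT BRIT.", "GT BRIT"),
]
INITIAL_ARTICLES = {"A", "AN", "THE"}

def derive_key(corporate_name, segment_lengths=(2, 2, 2, 2, 2)):
    """Derive a truncated search key from a corporate entry: the first n
    characters of each of the first five words, blank-padded when a word
    is short or missing. The default lengths give the 2,2,2,2,2 key."""
    name = corporate_name.upper()
    for old, new in SUBSTITUTIONS:
        name = name.replace(old, new)
    # Drop punctuation, diacritics, and other unwanted characters,
    # closing up the space they occupied; split() collapses multiple blanks.
    words = re.sub(r"[^A-Z0-9 ]", "", name).split()
    if words and words[0] in INITIAL_ARTICLES:
        words = words[1:]  # remove an initial English-language article
    return "".join(
        (words[i] if i < len(words) else "")[:n].ljust(n)
        for i, n in enumerate(segment_lengths)
    )
```

For example, derive_key("The University of California, San Francisco") yields "UNOFCASAFR", while a name with fewer than five words, such as "Ohio College Library Center", is padded with trailing blanks.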
For example, for 8,8,8,8,8 keys, the percentage of unique keys and the maximum number of entries per key are respectively 33.7 percent and 1304, while for 2,2,2,2,2 keys, the corresponding figures are 32.3 percent and 1307. Thus, the 2,2,2,2,2 key structure seemed a good candidate for a corporate entries index, and therefore the number of entries per reply for this key structure was more intensely studied.

Table 1. Number of Distinct Keys and Maximum Number of Identical Entries Per Key for Different Key Structures in 68,169 MARC II Records.

Key Structure   Number of       Distinct Keys as a Percent   Maximum Number of
                Distinct Keys   of Total Number of Records   Entries Per Key
8,8,8,8,8       22982           33.7                         1304
8,8,8,8,0       20476           30.0                         1305
8,8,8,0,0       16283           23.9                         1802
4,2,2,2,2       22411           32.9                         1307
4,2,2,2,1       22120           32.4                         1308
4,2,2,2,0       19513           28.6                         1311
4,2,2,1,0       18589           27.3                         1311
4,2,2,0,0       14801           21.7                         1807
3,3,2,2,2       22417           32.9                         1307
3,3,2,2,1       22132           32.5                         1308
3,3,2,2,0       19560           28.7                         1311
3,3,2,1,0       18654           27.4                         1311
3,3,2,0,0       14922           21.9                         1806
2,2,2,2,2       22053           32.3                         1307
2,2,2,2,1       21743           31.9                         1308
2,2,2,2,0       19034           27.9                         1311
2,2,2,1,0       18036           26.5                         1311
2,2,2,0,0       13842           20.3                         1807
1,1,1,1,1       19028           27.9                         1308

On the average it is desirable that the number of replies per query be such that information by which the user can choose among the possible replies can be displayed on a single CRT screen. This maximizes the utility of a computer system, since it minimizes the amount of system activity needed to promptly satisfy a user's request. Since some query keys produce but one reply while others produce hundreds of candidate records, it is necessary to use the mathematics of probability to determine the likely long-term effect of a given choice of system parameters. Using the approach indicated as useful by Guthrie and Slifko, the analysis of the effect of various choices of search key becomes the following. Assume that every entry has an equal probability of being accessed. Then, in attempting to retrieve each entry once, keys having i entries will cause a total of i² entries to be accessed. If fᵢ denotes the frequency of keys having i entries and M denotes the maximum allowable occurrences of any key in the file, the average number of entries per reply, ȳ, is given by:

    ȳ = ( Σ_{i=1}^{M} i² fᵢ ) / ( Σ_{i=1}^{M} i fᵢ )

where Σ_{i=1}^{M} i fᵢ is the number of entries in the file whose derived keys have a frequency of M or less. The above formula yields an average number of entries per reply for the 2,2,2,2,2 key much larger than 20 for M > 100, and some 2,2,2,2,2 keys corresponded to more than 500 file entries. A typical CRT display terminal can accommodate only ten or fewer entries per screen. Therefore, if the average number of entries per reply is to be ten or fewer, it is necessary either to ignore entries with high multiplicity or to adopt a different scheme of storing and retrieving such items, in which case the mathematical result would be the same as ignoring high-frequency items. The average number of entries per reply was computed for five different values of M (19, 29, 39, 49, and 59); the results of these computations are in Table 2, which reveals that if keys in the file are allowed a maximum recurrence of 39 entries per key, it would be possible to have keys in the main index for about 75 percent of total records, while entries for only 142 high-frequency keys would have to be shunted to a secondary index.

Table 2. Average Number of Entries Per Reply for Key Structure 2,2,2,2,2 for Various Multiplicity of Entries.

Maximum Frequency   Total Records   Percent of      Number of Distinct   Average Number of
of Any Key in File  in File         Total Records   Keys Eliminated      Entries Per Reply
19                  44174           64.8            389                  5.0
29                  48127           70.6            223                  6.6
39                  50854           74.6            142                  8.1
49                  52422           76.9            107                  9.1
59                  53513           78.5            87                   10.1
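The two computations in this section, the average number of entries per reply and the per-reply probability used for Table 3, can be sketched as follows. This is a sketch of the arithmetic only; the function names are invented, and the small frequency distribution in the usage note below the code is hypothetical rather than the experiment's data.

```python
def avg_entries_per_reply(freq, M):
    """Average number of entries per reply for keys occurring at most M
    times: y-bar = sum(i^2 * f_i) / sum(i * f_i), i = 1..M, where freq
    maps i (entries per key) to f_i (number of keys with i entries)."""
    accessed = sum(i * i * f for i, f in freq.items() if i <= M)  # total entries touched
    entries = sum(i * f for i, f in freq.items() if i <= M)       # entries in the index file
    return accessed / entries if entries else 0.0

def reply_size_probability(freq, M):
    """P(i) that a randomly requested entry's key returns exactly i
    entries: P(i) = i * f_i / sum(j * f_j, j = 1..M)."""
    entries = sum(j * f for j, f in freq.items() if j <= M)
    return {i: i * f / entries for i, f in freq.items() if i <= M}
```

Applied to the frequency distribution of Table 3 with M = 39, the first function gives roughly eight entries per reply, consistent with Table 2, and the second gives P(1) of about 0.29, consistent with the first row of Table 3. As a toy check, a file whose keys occur with frequencies {1: 3, 2: 2, 5: 1} averages 36/12 = 3 entries per reply.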
In this case, the average number of entries per reply would be about eight. Table 3 gives the probability of number of entries per reply for the index file consisting of 50,854 (out of a total of 68,169) records, with the maximum frequency of any key in the file being 39. For preparing this table the assumption is made that each entry in the file has an equal probability of being accessed. Thus the probability of obtaining i entries per reply is given by:

    P(i) = i fᵢ / ( Σ_{i=1}^{M} i fᵢ )

where fᵢ is the frequency of keys occurring exactly i times in the index file.

Table 3. Probability of Number of Entries Per Reply for an Index File Using 2,2,2,2,2 Key.

Number of Entries   Frequency   Probability   Cumulative Probability
                                Percentage    Percentage
 1                  14820       29.1           29.1
 2                   2893       11.4           40.5
 3                   1276        7.5           48.0
 4                    726        5.7           53.7
 5                    427        4.2           57.9
 6                    312        3.7           61.6
 7                    248        3.4           65.0
 8                    195        3.1           68.1
 9                    150        2.6           70.7
10                    120        2.4           73.1
11                     78        1.7           74.8
12                     88        2.1           76.9
13                     56        1.4           78.3
14                     71        1.9           80.2
15                     62        1.9           82.1
16                     48        1.5           83.6
17                     41        1.3           84.9
18                     28        1.0           85.9
19                     24        0.9           86.8
20                     22        0.9           87.7
21                     18        0.7           88.4
22                     16        0.7           89.1
23                     23        1.1           90.2
24                     25        1.1           91.3
25                     13        0.7           92.0
26                      9        0.4           92.4
27                     12        0.7           93.1
28                     18        1.0           94.1
29                     10        0.5           94.6
30                     11        0.7           95.3
31                     11        0.7           96.0
32                     13        0.8           96.8
33                      6        0.4           97.2
34                      9        0.6           97.8
35                      7        0.4           98.2
36                      6        0.5           98.7
37                     11        0.8           99.5
38                      5        0.3           99.8
39                      2        0.2          100.0

An inspection of this table shows that 87.7 percent of the time there would be 20 or fewer replies. This represents two screensful of information on a typical CRT display.
CONCLUSION
A file containing only those entries for which the frequency of the 2,2,2,2,2 search key is 39 or fewer would produce 20 or fewer entries per reply approximately 88 percent of the time, but such a file excludes 142 high-frequency keys covering 17,315 of a total of 68,169 entries.
Therefore, a special technique for handling corporate-entry derived keys of high multiplicity is desirable.
REFERENCES
1. A. L. Landgraf and F. G. Kilgour, "Catalog Records Retrieved by Personal Author Using Derived Search Keys," Journal of Library Automation 6:103-8 (June 1973).
2. F. G. Kilgour, P. L. Long, and E. B. Leiderman, "Retrieval of Bibliographic Entries from a Name-Title Catalog by Use of Truncated Search Keys," Proceedings of the American Society for Information Science 7:79-82 (1970).
3. F. G. Kilgour, P. L. Long, E. B. Leiderman, and A. L. Landgraf, "Title-Only Entries Retrieved by the Use of Truncated Search Keys," Journal of Library Automation 4:207-10 (Dec. 1971).
4. P. L. Long and F. G. Kilgour, "A Truncated Search Key Title Index," Journal of Library Automation 5:17-20 (March 1972).
5. Kilgour, Long, Leiderman, "Retrieval of Bibliographic Entries."
6. G. D. Guthrie and S. D. Slifko, "Analysis of Search Key Retrieval on a Large Bibliographic File," Journal of Library Automation 5:96-100 (June 1972).
7. Landgraf and Kilgour, "Catalog Records Retrieved."

Grant Project Information via a Shared Data Base
Justine ROBERTS: The Library, University of California, San Francisco
A quarterly keyword index to campus grant projects is provided by the Health Science Library at the University of California, San Francisco, using a data base created and maintained by the campus' Contracts & Grants Office. The index is printed in KWOC format, using the chief investigator's name as the key to a section of project summaries. A third section is also included, listing the summaries under the name of the sponsoring department.
INTRODUCTION
Communication channels between the computer center and the library at the University of California, San Francisco are open despite the "normal" and accompanying library use traumas of an all-purpose university computing center.
Thus the library's chief administrator received an immediate, if unexpected, response to her statement of campus need for subject access to information about local research and training projects. As she summarized it, the information need is expressed as "Who is doing what, where, with what amount of funds?" Such queries about campus work come to the Health Science Library regularly, but had often remained unanswered because of inadequate published sources and the lack of easily accessible local sources. Neither the campus Contracts and Grants (C&G) Office, nor any other campus unit, had files organized to allow inquiry by subject or by department name, nor were any departments staffed to provide a general information service of this nature. Previous investigation by the library had revealed the fiscal infeasibility of extracting citations of publications on campus research projects from a commercially available data base. The latest locally compiled directory of campus work was eleven years old. Response by the computer center director to the library statement was a suggestion for a three-department cooperative project between the library, the computer center, and the C&G Office to produce a quarterly index of the machine-readable administrative file of the latter department. This accession-ordered file is comprised of 1441-character records which provide for 42 data elements needed by the C&G Office to monitor the progress and fiscal status of all extramurally funded projects and proposals (Figure 1).
Fig. 1. Dump of Contracts & Grants Office Master Tape Record
Original specifications for these project records had in fact included a gesture toward information retrieval in the form of a 5-digit "discipline" code; this code had quickly become null when problems of maintenance and interpretation revealed themselves. However, the regular monthly entry and updating of other descriptive data was already twelve months underway at the time the cooperative project was suggested.
LIBRARY INDEX
The original proposal for a library index to the C&G file was production of a standard KWIC (keyword-in-context) index to project titles, to be based on use of an IBM SHARE library program developed by computer center staff.
After review of available library programs and output, the product was finally specified to be a KWOC (keyword-out-of-context) index to project titles, using the chief investigator's name as key to a second "bibliographic" section of project summaries (Figures 2, 3). A third section was added to list project summaries indexed by campus department name (Figure 4). The C&G file included 12 elements which the library considered to be of general campus interest. These elements, comprising the project summaries, are: (1) project title; (2) chief investigator's name; (3) award status (i.e., funded or proposed); (4) project site (i.e., campus or affiliated institution); (5) project type (e.g., training, basic research, applied research); (6) grant number; (7) total project duration to date; (8) award period; (9) award amount; (10) granting agency name; (11) campus department; and (12) school. Items 10, 11, and 12 of this list exist as numeric codes in the records, with decoding tables comprising a separate file at the beginning of the tape volume. These items, together with items 3, 4, and 5, are coded on input but are not uniformly edited by the regular C&G update program.
Fig. 2. Keyword Subject Section
Fig. 3. Chief Investigator Section
PROGRAM REQUIREMENTS
Necessary program functions included the selection of active grant records; the editing, decoding, and formatting of selected data elements for printing; and the extraction and sorting of index terms. These functions were divided between a main routine coded by library staff, an indexing subroutine and print program written by a computer center staff member, and an IBM utility sort. Local programs were written in PL/1, using the newly installed PL/1 Optimizing Compiler, and provided several tests of the compiler's capacities.
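The KWOC indexing idea behind the subject section can be illustrated with a small sketch. This is a simplified Python illustration rather than the PL/1 routines actually used; the stoplist is invented for the example, and the two sample records echo titles visible in Figure 2.

```python
# Minimal KWOC (keyword-out-of-context) sketch: every non-stopword of a
# project title becomes a heading, posted with the title and the chief
# investigator's name, which in turn keys into the summary section.
STOPWORDS = {"A", "AN", "AND", "FOR", "IN", "OF", "ON", "THE", "TO"}  # illustrative stoplist

def kwoc_index(records):
    index = {}  # keyword -> list of (title, investigator) postings
    for title, investigator in records:
        for word in title.upper().split():
            word = word.strip(".,;:()&")
            if word and word not in STOPWORDS:
                index.setdefault(word, []).append((title, investigator))
    return dict(sorted(index.items()))

records = [
    ("Studies of Bile Pigment Metabolism", "SCHMID, R"),
    ("Inborn Errors of Metabolism", "SMITH, L"),
]
for keyword, postings in kwoc_index(records).items():
    for title, name in postings:
        print(f"{keyword:<12} {title}  {name}")
```

Both sample projects are posted under METABOLISM, while stopwords such as OF never become headings; the investigator's name printed with each posting is what links a subject entry to the full summary in the second section.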
Only projects currently in "award" status, or with proposals outstanding during the preceding twenty months, are selected for printing: approximately two-thirds of the file at this time. These conditions are tested on various data fields in the C&G record, and data from selected records are reformatted into a partial print line and passed to the keyword extraction subroutine.

[The sample printed output that followed, apparently the campus department section (Figure 4), is illegible in this reproduction.]
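The selection test just stated can be sketched as a predicate over each record. The field names and date handling here are hypothetical; the actual tests ran against C&G record fields:

```python
# Keep a record if it is currently in "award" status, or if it is
# a proposal dated within the preceding twenty months.
from datetime import date

def months_between(earlier, later):
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def select(record, today):
    if record["status"] == "award":
        return True
    return (record["status"] == "proposal"
            and months_between(record["date"], today) <= 20)

assert select({"status": "award", "date": date(1970, 1, 1)}, date(1973, 6, 1))
assert not select({"status": "proposal", "date": date(1971, 1, 1)}, date(1973, 6, 1))
```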
[An error-control listing closes the sample output, reporting FILE2 read errors, agency codes not in the C&G table, department codes not found, and site codes not found; the listing itself is illegible in this reproduction.]
Fig. 1. Minnesota Union List of Serials. Preliminary edition sample page.

provide maximum benefit for Minnesota's library users. Our state's library environment features:
• one large academic research library, the University of Minnesota;
• many smaller academic libraries in the 75,000-250,000 volume class;
• two large public library systems, the Minneapolis Public Library and the St. Paul Public Library;
• one private research library, the James Jerome Hill Reference Library, which serves as a nucleus for the metropolitan area private college library network called CLIC (Cooperating Libraries in Consortium); and
• some library automation activities among these libraries, with the largest automation staff and activity at the University of Minnesota.
The parallel developments of networking and systems design at the university made possible the proposal to the MINITEX Program Advisory Board for funds to develop the system and publish the first union list. In summer 1971 this program received approval, and work began in mid-August. On September 1, 1972, the Preliminary Edition of MULS was published and distributed to participating university and MINITEX network members. Following is a report of this work, its results, and problems.

PROGRAM SCOPE
Obviously, to create a system capable of eventually including library holdings statewide, and to convert such data, requires definition of an initial and future scope.
The initial scope was defined as:
Conversion of the University of Minnesota Libraries' actively received titles, departmental libraries' complete titles, and inactive titles in the Libraries' Periodical Division.
Development of a batch input tape software system capable of supporting initial conversion, correction, and updating to produce the Preliminary Edition of MULS.
The future scope would potentially include the augmentation of the MULS data base with the following non-University of Minnesota holdings:
a. Eight metropolitan area private colleges in the CLIC network, with production of a CLIC union list for their members' use;
b. Minneapolis Public Library serials and unique titles from other public libraries of over 50,000 volumes, with production of a public libraries union list;
c. Holdings of all state agencies, which would include the Minnesota Historical Society, State Law Library, State Department of Health, and Legislative Reference Library, with production of a union list for their internal resource sharing;
d. State supported colleges' holdings;
e. University of Minnesota inactive general collection serials, thereby completing access to the state's largest research library;
f. Private college holdings outside of the metropolitan area CLIC institutions; and
g. Selected special libraries' holdings.
At this writing the initial scope is completed, items a, b, and c are nearly completed, and work on d and e is planned for 1973. In view of this scope, the initial MULS magnetic tape system was based on the MARC format to permit:
• publication of a photocomposed or line-printer full union list;
• publication of regional combination or individual library lists using an IBM 1403 line printer equipped with the ALA graphic print train;
• storage of complete and verified information on each serial as known, together with the source of the cataloging data;
• extraction of the data for individual libraries to assist those wishing to develop automated serials management systems including check-in, claiming, binding, etc.;
• conversion of the file to other storage media such as disk;
• fulfillment of the smallest to the largest libraries' needs for bibliographic detail; and
• extension to a fully automated resource sharing system which would further improve the benefits of library cooperation.
With this picture of the program scope, the design factors, data conversion, computer system, programs, photocomposition, costs, and problems will be described below.

SYSTEM DESIGN
The easiest way to approach the MULS design is to gain an understanding of the MULS MARC record content as shown in Table 1. This record is the basic unit which is entered, including all associated cross-references or added entries to be made. It in turn generates each of these secondary entries in the file. In this brief description we assume the reader is familiar with the MARC serials record as described in Serials: A MARC Format: Preliminary Edition and its Addendum No. 1 (1, 2). There are some differences between the MULS format and the LC MARC format, most importantly the addition of a sort field (Tag 249) and the subfield arrangement for holdings fields (Tag 850). Other variations are indicated in Table 1, which uses the same organization as the LC format description referred to above. Figure 2 shows a page from a master-file listing. Note entry no. 2074000. This listing is formatted with the sequence number of the record appearing on the first line, followed by the bibliographic level and the remaining leader information.
Next the record directory entries are found for fields 008-950 as applicable. On the next line are the 008 fixed-length data elements.

Table 1. MULS MARC Record Content

A. Leader
1. Logical record length: five characters
2. Record status = 1 for MARC record
3. Legend = 4 for added entry (AET) or cross-reference (XRF) entry
   a. Type of record: not used (blank)
   b. Bibliographic level = s
   c. Two blank characters
4. Indicator count = 2
5. Subfield code count = 2
6. Base address of data = 5 characters
7. Sequence number = 7 characters

B. Record directory
1. Variable field tag = 3 characters
2. Field length = 4 characters
3. Starting character position = 5 characters

C. Control fields: 008 Fixed Length Data Elements
1. Date typed
2. Publication status
5. Country of publication code
9. Type of serial designator
10. Physical medium designator
12. Form of content
   a. Type of material code
   b. Nature of contents codes
13. Government publication indicator
14. Conference publication designator
20. Language code
21. Modified record designator
22. Cataloging source code

D. Variable fields
1. Indicators
In general we have not followed LC in the use of indicators. One exception is the use of the filing indicator for the 100 and 200 series tags, which we implemented before seeing that this feature was provided in Addendum No. 1 to the LC format. Except as above, both indicators are blank.
2. Subfield codes
Except for the holdings statements (TAG 850) we have generally followed LC philosophy. For TAG 850 we now precede the $a subfield with a $z subfield, suppressed on printing, which contains the 4-digit number identifying each specific holding library; this number is also found at the end of the 008 field.
3. Variable fields currently used.
010 LC card number
022 Standard serial number
041 Languages
100 Main entry, personal name
110 Main entry, corporate name
111 Main entry, conference or meeting
200 Title as it appears on the piece
245 Full title
249 Sort key from 100 or 200 series tags, stored in collating codes and limited to 120 characters
250 Edition statement
260 Imprint
500 General note
501 Bound-with note
515 Note for explanation of dates, volumes, etc.
525 Supplement note
555 Cumulative index note
730 Added entry
850 Holdings
950 Cross-reference tracing

NOTE: We have followed LC numbering for the above data elements and have substituted blanks on the tape record for those elements omitted. We have also expanded the 008 field to include a variable number of 4-character elements which contain the index number of each holdings location listed in the $z subfield of TAG 850.

Fig. 2. Master-File Listing. [The sample listing is illegible in this reproduction.]

The last four digits of the expanded 008 field give the holdings location index number, which is the same as the suppressed $z subfield in the 850 field. Then the variable fields are listed in numeric sequence. Note the subfields as indicated by $z, $b, etc.; the number to the left of each $a is the MARC tag number. Another departure from MARC is to store the call number as a subfield of the holdings statement, since it may vary among participating libraries. To contrast how the information is stored with how it appears when published, the same record is shown in the left column of Figure 1. Also, the next record shown is generated from an added entry (TAG 730) in this parent record. We have prepared a detailed coding manual which is followed by our coders; this document presents various examples of conditions and details the full system structural requirements. These changes in the format were made to simplify wherever possible, to provide for conditions which the original LC format did not cover, and to preserve the MARC structure with full text. With the exception of subject headings, all bibliographic text is stored. Other MARC tags may be added to the system at any time. The initial system was tape-based, as our computer system at that time did not have uncommitted disk drives. Also, we needed to gain some detailed knowledge of the file and record characteristics to most effectively design the disk-based system. This knowledge could be gained easily after some basic data were stored in the system.
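The directory layout in Table 1 (a 3-character tag, a 4-character field length, and a 5-character starting position, twelve characters per entry) can be sketched by walking the directory string. This is an illustrative Python fragment with fabricated data, not part of the MULS software:

```python
# Walk a MULS-style record directory: each 12-character entry is a
# 3-character tag, 4-character field length, and 5-character
# starting position within the record's data area.
def parse_directory(directory):
    entries = []
    for i in range(0, len(directory), 12):
        chunk = directory[i:i + 12]
        entries.append((chunk[0:3], int(chunk[3:7]), int(chunk[7:12])))
    return entries

print(parse_directory("008004100000245003800041"))
# [('008', 41, 0), ('245', 38, 41)]
```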
Since programmer time was our most precious commodity, this phased approach was used to: (1) achieve enough support on the tape system to permit publication of the Preliminary Edition of MULS while gathering file and data characteristics; and (2) bring into operation a disk-based system with completely automatic added-entry correction and generation, coupled with very flexible correction procedures.

DATA CONVERSION
Various methods of data conversion were investigated. Two requirements seemed obvious in our system: compilation of data on a code sheet, and efficient, accurate keyboarding. Further, since the MARC character set was being used, any potential device had to provide a minimal keying situation to accommodate this character set. Compilation of data on a code sheet was necessary because multiple files in multiple locations would be checked to gather all of the information. Keyboarding had to be efficient, as it was initially estimated that some 25 million characters would be entered before we were ready to publish the union list. The IBM Model V Record Only Magnetic Tape Selectric typewriter (MT/ST) was chosen as offering the best approach for high-volume, short-duration use. Three machines, each equipped with the special MARC element and key buttons, were leased. Typists easily corrected their discovered errors on these units. Each typist followed detailed typing instructions and, after mastering the coding manual practices and procedures, was a trained coder. During July/August 1971 all training aids were prepared, forms designed, and staff recruited. The initial staff complement received their training during the last two weeks of August. During September the data gathering staff was brought to full strength and consisted of:
Project Director: 1 FTE
Editors (librarians, library assistants): 4 FTE
Senior Clerk-Typists: 6 FTE
Clerks (students): 12 FTE
Full-time equivalents are used, as staff were in many cases part time or temporarily lent to the project.
During the period August 1971-June 15, 1972, which comprised the total data preparation time for the Preliminary Edition, five librarians and thirty-five students were trained and participated in the project. It took about six weeks to bring most of the staff to an acceptable performance level. Some students found the work too complex or detailed and voluntarily left the project. One clerk-typist did not gain sufficient proficiency to pass out of trainee status and was terminated at the end of her probation period. Thereafter, with a staff of this size, performance problems were minimal.

The data to be included in the Preliminary Edition comprised the university's
• currently received, centrally recorded serials (20,000 titles);
• inactive Periodical Division titles (8,000 titles);
• coordinate campus locations of the university (4,000 titles); and
• complete departmental library titles excluding the Bio-Medical Library (6,000 titles).
The Bio-Medical Library was excluded due to its present mechanized serials system, which would be used to produce a separate serials list, issued as volume 3 of MULS to the university and the MINITEX participating libraries. This separate publication was necessary due to the short time in which the initial data were to be collected. However, the Bio-Medical Library is now also being included in the body of the MULS data base. These four categories of serials necessitated quite different approaches dependent upon the available check-in files, shelflists, or catalogs. For example: to capture data on the currently received, centrally recorded titles, we photocopied the Kardex drawers from the serial check-in file maintained in our headquarters library. These running titles were checked against the official card catalog in the library. If the title was found, the bibliographic information was transcribed, together with all Kardex and catalog locations.
If not, the Kardex data were copied onto a code sheet for subsequent verification together with its listed location. About 5 percent of the time the photocopied sheet was illegible. These entries had to be transcribed from the check-in file, verified, and then passed on to the next step. When bibliographic data had been assembled on the code sheets, they were edited in groups, each group accompanied by its photocopied sheet. Corrections were entered by editors, the catalog or check-in file was rechecked as necessary, and then the sheets were sorted by holding location. Next, all holdings information was procured from the remote location to make sure it was the most reliable information. Finally, the sheets were returned to be rechecked and typed. "Mopping up" occurred at each holding location to encode inactive titles and uncataloged serials. When a title could not be verified, the piece itself was used to develop the main entry, added entries, and other pertinent cataloging information. Similar procedures were used on the inactive Periodical Division shelflist. Departmental library locations involved the use of shelf-locator visible indexes and shelflists, coupled with check-in files and branch catalogs. Coordinate campus locations outside the Twin Cities metropolitan area required the checking of title/holdings listings provided by these campus libraries. Many entry problems resulted, because variant cataloging approaches were used in many of these libraries. Typing and subsequent input were done as coding sheets became ready for keyboarding, and were therefore in random order. Over 40,000 individual records were typed, each averaging about 480 characters (approximately 18 million keystrokes).
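As a check on the keyboarding figures, the quantities quoted above multiply out as follows (a trivial sketch; strictly, 40,000 records at 480 characters is 19.2 million strokes, which the article rounds to an "approximate 18 million"):

```python
# Keyboarding volume implied by the conversion figures above.
records = 40_000        # "over 40,000 individual records"
avg_chars = 480         # "averaging about 480 characters"
initial_keystrokes = records * avg_chars
print(f"{initial_keystrokes:,}")  # 19,200,000
```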
During the period February-June 15, 1972, when the complete file was proofread from the thirteen-volume master-file listing, another 5 million keystrokes were required to delete, reenter, and correct entries and associated cross-references. Our final keyboarding stroke count was exceedingly close to our original estimate of 25 million characters. The proofreading portion of the data conversion took twice as long as originally anticipated, causing a delay of two months in photocomposition scheduling. Proofreading was completed on June 15, 1972, and on the following Monday the photocomposition vendor received the final output tape. Due to some format changes and continued systems problems, the photocomposition output was not received until July 21. Printing and binding followed, and on August 28 the Preliminary Edition, consisting of 1,566 text pages in two class A bound volumes, was ready for distribution.

COMPUTER SYSTEM
Two computer systems were used in MULS production. One system was used to convert MT/ST cassette tapes and involved initially an IBM 2495 cassette converter coupled to an IBM 360/20 system. This configuration was replaced by off-line tape conversion using a Data Action Tape Pooler and the same computer for code conversion and record blocking. Two-hour to one-day service was provided by this service center, located in a local insurance company. The raw data tape resulting from the above process then required processing on the second computer system, an IBM 360/50 at the University of Minnesota. All programs are written for the COBOL F compiler and operate under OS/MFT using 1600 cpi magnetic tape. Two 80K core partitions are required for the updating and printing programs. The ALA graphic print train is used to print the file and control listings. Figure 2 was printed with this character set.
PROGRAMS
MULS programs for the present tape system were conceived as two sets: (1) conversion, file creation, and updating; and (2) printing functions. The first set performs the following functions:
• identification and checking of fields for validity, tagging, and structure from the raw input tape;
• creation of MARC-type main entries;
• creation of secondary entries generated from the added entry (TAG 730) and cross-reference (TAG 950) fields;
• creation of correction and deletion entries;
• sorting of main entries and the generated secondary entries in alphabetical sequence;
• sorting of correction and deletion entries in sequence number order;
• addition of new records;
• deletion of an old record;
• addition of a new variable field, including holdings statements;
• substitution of data in a variable field;
• deletion of a variable field;
• production of a transaction file reflecting changes to the data base; and
• generation of a new master tape, which can include resequencing the entire file and/or producing a work list of the file.
However, any change in a 100, 200, 730, or 950 tag requires deletion of the complete record with its secondary entries, and reentry of the record in its changed form. This is because a two-pass update would be required in the tape system to automatically correct secondary entries as well as to generate them. The second set of programs performs the following functions:
• printing of a formatted work list, selectively by location or combination of locations, with each diacritic printed preceding the character to which it applies; and
• printing of a conventional union list format which closely duplicates the design of the photocomposed page in Figure 1. Selectivity by location or groups of locations is present, and all diacritical characters are overprinted as in the photocomposed list.
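One detail the print programs must honor is the suppressed $z subfield of the holdings field (TAG 850): the 4-digit holding-library code precedes $a but never appears in printed output. A hypothetical rendering with simplified delimiters (not the actual COBOL print program):

```python
# Print a TAG 850 holdings field, suppressing the $z subfield that
# carries the 4-digit holding-library code. Subfield delimiters and
# the sample data are simplified for illustration.
def print_holdings(field):
    parts = field.split("$")
    visible = [p[1:] for p in parts if p and p[0] != "z"]
    return " ".join(visible).strip()

print(print_holdings("$z0042$aMnU$dV.1-15 (1884-1958)"))
# MnU V.1-15 (1884-1958)
```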
PHOTOCOMPOSITION
The Preliminary Edition of MULS, as shown in Figure 1, was photocomposed by a Twin Cities firm using a Harris Fototronic CRT composition system and an IBM 370/145 computer system. We chose the lowest bidder, which was fortunately a local firm. The bid required the vendor to program, from our MARC format master file tape, an input tape for the photocomposer which would produce the specified format, using the MARC character set in a font to be chosen from sample text pages. The vendor's bid included programming, composing, and procurement of several of the characters used by MARC which were not in his current font repertoire. A test tape was provided to the vendor for his developmental use, together with documentation on the MARC MULS system. We were not pleased with the initial result of our specified format, for several compounding reasons:
• the vendor had not followed some of the suggestions;
• the vendor had made some unspecified changes;
• the program had injected some data errors and other unacceptable conditions; and
• the library, in its total lack of experience with this variable density form of display, had no idea of the real effect of its proposed format in achieving efficient character density coupled with attractiveness.
Each of the design problems was examined in order to adjust character size, column length and width, continuation line placement, display form (bold, regular, oblique), and relative data element placement. Four iterations were required to produce the format shown in Figure 1. As a result, our photocomposition and printing costs were half what they would have been under the originally proposed format, and style and readability also improved dramatically. The choice of type font was made by comparing sample pages in both serif and sans serif styles, including Times Roman and other well-known fonts.
Various library staff members were asked to vote on their preferred font. News Gothic was an overwhelming favorite among both public services and technical services oriented librarians. The photocomposition vendor had produced many catalogs and books using other special alphabets and characters, but had not previously produced any catalog from a MARC format tape. This gave the vendor a high degree of expertise in handling our special character requirements, but added some developmental problems because of the lack of MARC format experience. Except for superscripts, subscripts, and the underline, all MARC characters have been needed to display the text. Our advice to those considering catalog photocomposition is to request bids, as the price of this service has continued to drop. The page price will depend upon the services performed. In our case the vendor handled all composition programming; one can estimate that at a minimum 40-50 percent of the page charge would be involved in this service. Also, the size of the job will cause a variance in the price a vendor will quote: the larger the number of pages, the cheaper the cost per page. On a very large application it may be to the library's advantage, if resources permit, to train its own programmer to program the composing device. However, we feel that our needs were best served by contracting for this support, as our programming staff was limited and did not have any prior composing-machine experience.

COSTS
The expenditure to produce a computer-based serials catalog will vary depending upon salary and equipment rates and the conditions found in the library system. In the case of MULS, the condition of the files used ranged from disastrous to excellent, yet with only fragmentary information in each file. Moreover, entry forms varied greatly among the many check-in, shelflist, and catalog files.
Therefore, data collection was much more expensive than it would have been had we keyboarded directly from one existing file of data. To give others planning similar activities some idea of costs, we have developed some average costing information from our expenditures.

178 Journal of Library Automation Vol. 6/3 September 1973

Each main entry in MULS cost $2.81 on average, figuring all known actual charges and subsidized costs. This main entry cost includes all associated secondary entries, of which about one was generated per 1.5 main entries. The $2.81 breaks down to approximately $1.00 for design, programming, and administrative costs; $1.40 for data conversion; and $.41 for photocomposition, final printing, and binding.

Let us look at some specific items which figure into this average cost per record, to give the reader some idea of what is reasonable to expect in a project of this sort. A good example is conversion of MT/ST cassette tapes to computer-compatible magnetic tape, including code conversion and blocking of the records. Our per-cassette conversion cost varied from $.50 to $2.00. This variance was caused by a change from on-line to off-line conversion and the problem of handling cassette tapes which did not have the proper stop code at their end. Our actual billed average throughout the whole project was $.73 per cassette. If no tapes had been prepared omitting stop codes and if total off-line conversion had been used, our average would have been $.50 per cassette. A typical cassette tape averaged seventy-five new MARC entries, so this was a very economical charge for this method.

Another specific cost to examine is computer time. On our IBM 360/50 system, time is billed as time on/off the system and not according to some calculation of CPU/channel/storage/peripheral device usage.
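The per-record arithmetic above can be tied together in a short sketch. Python is used for illustration only; all dollar figures are the article's, and the variable names are ours.

```python
# Reported MULS cost components per main entry (figures from the article).
design_admin = 1.00   # design, programming, and administrative costs
conversion = 1.40     # data conversion
composition = 0.41    # photocomposition, final printing, and binding
cost_per_main_entry = design_admin + conversion + composition  # $2.81

# MT/ST cassette conversion: a typical cassette held about 75 new MARC
# entries, and the billed average was $0.73 per cassette, so this step
# contributed roughly a cent per entry.
media_cost_per_entry = round(0.73 / 75, 4)
```

At these rates the media-conversion step is a small fraction of the $1.40 per-entry data conversion cost, most of which went to keying, editing, and proofreading labor.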
Normally an internal university rate is a great deal cheaper than a commercial rate for the same equipment. However, the billing method used in our system has probably increased our costs for computer time over a CPU-time method of billing, since the user is at the mercy of the other jobs contending on the system at the same time; i.e., he must wait for his processing turn. This has had a noticeable effect in our case: run times to update the file have varied from four to six hours of machine on/off time, almost independent of the number of transactions being processed.

Photocomposition page rates over the last few years have been dropping as competition in this area has flourished. Two years ago it was common to receive quotes of $6.00 per page or even higher. Most prices we received were under this figure; but at the time our contract was signed, our successful bidder, who was also our lowest bidder, quoted $2.60 per page. This included full programming support to convert our MARC format tape for creation of the photocomposer input tape. Today much lower rates can be found. Moreover, rates under $1.00 per page can be obtained if the customer is able to create his own input or driver tape for the photocomposition device, making this method considerably more attractive even for low-volume per-page printing.

In the case of MULS, one photocomposed page equals ten double-column computer-printed pages without photoreduction. Photoreduction can cut computer output pages by about one-third, yet obviously not to the limit achieved through photocomposition. Therefore, considerable printing costs can be saved, dependent upon the number of copies of each page printed.

PROBLEMS

The problems encountered during this project and its daily operation have been, for the most part, those commonly found in any large-scale project.
The large volume of data, a less than ideal computer environment, the condition of the original data, and the large staff required to produce this effort all magnify many problems which seem unimportant in a small or short-term project. In general these problems fall into the following categories: (1) data handling and bibliographic; (2) communications; (3) estimating; and (4) hardware or computer-related problems.

Data Handling and Bibliographic. Those who create and use research library catalogs can appreciate the formidable physical problem in any data conversion activity. A half century or more of cataloging variations must be brought together; mistakes in the original data, differences in the format of cards, and spelling or usage inconsistencies must be weeded out. Couple this situation with a new staff, large in number but containing few professionals: the result could be disastrous if proper decision-making and problem identification did not occur. Not knowing the magnitude of these problems, we decided on almost verbatim transcription of records, but with all abbreviated words in any filing field spelled out.

When our first file listing appeared, some 40,000 main entries plus 30,000 secondary entries, we saw that the filing arrangement was very poor, due mainly to spelling variants, failure to consistently follow instructions to spell out abbreviated terms (which somehow escaped editing), and different entry forms for the same body. Transcription of data from the original source was very accurate, but because of these problems in the original data our proofreading resulted in changes to about 10,000 of these 70,000 records. The use of punctuation marks in main entries varied so much that some corporate entries were filing in five or six separate groups in the list, each separated perhaps by several pages. The great shocker was the arrangement under United States, as some coders had copied exactly from the card without spelling out U.S.
and inserting a period and space. About a dozen entries had escaped the editors and appeared as one block. To compound the problem, others spelled out United States but forgot to insert a period after it. Moreover, very early in the project the typists incorrectly inserted two spaces after the period. In all, there were six forms of the U.S. entries alone, with only one being correct. This lesson taught us that no matter how well instructions and examples are prepared, misunderstanding can result; and, of course, editors and others will not catch all possible errors. However, these major errors were eliminated before publication. With the large volume of data and limited funds, our conversion process was quite streamlined, with most of the error-checking occurring after the data were on tape and displayed in their proper relation to other records. Few keyboarding errors occurred which were not caught at typing. The predominant errors resided in the nature of the original data, or in the lack of some piece of information from the three or four different files which may have been checked in building the full record.

Communications. In any large project, effective communication is necessary to improve the quality of work and progress toward completion of the scheduled task. Frequently scheduled staff meetings were used to inform all project members of decisions, receive their suggestions and criticisms, and develop coordinated work assignments among the teams under each editor/librarian. All typing personnel were trained as coders and were periodically relieved of typing to code. This gave them an insight into detecting problems for referral to the professional staff, renewed their knowledge of proper format, and provided more variety in their work. All project members were capable of performing the tasks of coding, control list checking, and proofreading.
The most capable clerical staff also assisted the editors in editorial work. We felt that our use of the team approach, unified training, frequent staff meetings, and very detailed written documentation served to channel communication, with a resultant minimization of these problems once the first few months of the project had passed.

Estimating. Most data conversion work requires accurate estimating on many matters. Some estimates we made were very accurate, such as the basic time and staff to complete initial coding, typing time and staff, and supplies needed. Other estimates were not. For example, the time to edit and correct the file once basic data collection was completed was double our original estimate and required more typing than anticipated. This delayed the publication schedule by two months. Difficulties at the computer center and at the photocomposition vendor caused another two months' delay, though it is doubtful that our photocomposition firm would have been ready had we met our original estimate. Our original target was publication not later than two months after the basic data collection period of six months, i.e., in eight months. However, on a project of this size, and with the addition of about 7,000 more titles than we had originally estimated, we did not feel that the fifty-four weeks actually required was excessive.

Computer time was also difficult to estimate because of the on/off-system billing. Dependent upon the nature of the other jobs on the computer, this time varied greatly, for updating runs were almost independent of the number of transactions. There is always room for improvement in estimating, and we have obviously learned many things from this experience to use in further work.

Hardware/Computer Center. Our largest problem was obtaining firm computer scheduling commitments on our campus IBM 360/50 computer, which serves the business functions of the university.
All other campus computing facilities use Control Data equipment, which is six-bit-character, word oriented. With the extended character set requirement and the availability of the IBM 360/50, which we were already using for other work with the ALA graphic print train, it was natural for us to choose this system. Current facilities are now satisfactory to permit our tape batch system operation and the development of our new disk-based batch system.

Tape pooling operations for the MT/ST have caused some problems due to equipment changes at our vendor. We have now switched to a new conversion source, as our former vendor upgraded his data entry system to key-to-disk. The three MT/ST typewriters we leased performed quite reliably, but one machine seemed to have more down time than the others. Now that our typing load is down, we have cancelled two Model Vs and will maintain two machines. We are now choosing a new system for key input to cassette tape. On the new equipment we will do our proofreading and initial correction off-line, resulting in a further cost saving. This was not possible previously, as our typing load required two-shift operation on all machines during the Preliminary Edition preparation time.

CONCLUSION

A great amount of effort has been expended to achieve a unified serials data base to serve Minnesota's libraries. It is our hope that this system can continue to be developed in as flexible a way as possible so that future needs can be supported through it. Only the imagination of those involved in networking limits the future needs that can be met through access to this data base. Of course, we would hope that one day our data could benefit the development of similar programs in other states and, perhaps more importantly, the achievement of a true national serials data base.
ACKNOWLEDGMENT

Many staff members at the university and other institutions contributed their invaluable counsel as we proceeded on the development of the system and the data base. The MULS project staff particularly receives our deep gratitude for its yeoman effort. Special commendation is due Mr. Don Norris for systems design and principal programming support. Mr. Carl Sandberg, who wrote all printer output programs, also contributed invaluable assistance to the project. The MINITEX program and University Library administration receive our appreciation for placing their confidence in the Systems Division. MULS and its support system is truly a product of the coordinated concern and interest of the aforementioned individuals and groups.

182 TECHNICAL COMMUNICATIONS

REPORTS-LIBRARY PROJECTS AND ACTIVITIES

Ohio State University Health Sciences Library Uses Automated Bookstack System

The new Health Sciences Library at Ohio State University began serving students in May 1973 with some of the most advanced features in any library in the country. It contains an automated bookstack system to locate and file books, and is the fourth library in the country to have the system (Randtriever, manufactured by Remington Rand Corp.), says Jo Ann Johnson, director of the Health Sciences Library. "The bookstack system will find and deliver a book via a conveyor belt in about a minute," Miss Johnson said. The chief advantages of the system are that it saves space and is speedy and accurate, she pointed out.
"The book stacks in the new library take up about 15 percent of the total space while in most libraries the stacks take 40 to 60 percent of the space," Miss Johnson said. Aisles in the stacks are narrow, about 15 inches, and the shelves rise through two stories of the library-twenty-two feet in all, she said. The library has a ca- pacity of 175,000 volumes. The accuracy of the system will reduce the problem of misfiling. Also, book theft should dwindle because the stacks will be closed to users, she said. The library is connected with the com- puterized circulation system of the univer- sity library, made up of a main library and twenty-three branch libraries. This circu- lation system is the first of its type in the country and permits library users to place telephone calls to learn titles and authors and to charge out books. Other features of the modern library will include a computer-assisted instruc- tion area to be completed later, and con- nections to MEDLINE, the international computerized information system of med- ical journals. Miss Johnson explained that the auto- mated books tack system works like this: A library staff member sends instructions via a terminal to an electronic device in an aisle. The device travels on vertical and horizontal columns in the aisles. It picks out a small bin of books containing the requested one, then travels to the end of the aisle and places it on a conveyor belt. At the terminal, the staff member se- lects the requested book from the bin, usually containing about eight volumes, and sends the bin back for refiling. A glass window permits observation of the system. University of California, Berkeley Serials Key Word Index The University of California, Berkeley, General Library has published a Serials Ke y Word Index to titles of 45,741 seri- als. The computer-produced index is the largest of a fairly new variety of key word indexes, covering titles of serials rather than articles. 
The 360/Assembler programs written by the Library Systems Office include a number of innovations. Berkeley serial records are stored in MARC format, upper-lower case, capitalized by citation rather than catalog standards. The key word extract program ignores prepositions, conjunctions, etc. (which are not capitalized); treats certain multiword terms (La Paz, United Nations) as single words; prepares a library-standard sorting key (with U.S. filing as UNITED STATES, & filing as AND, and a distinction made between two types of hyphenation); and does no stop-list searching or other searching for excluded words. Key lines are sorted by key word; all other processing is based on an alphabetic file of key words attached to main entries. Thus, vocabulary control (forced interfiling of abbreviations, synonyms, cognates, etc., not heavily used in this edition) is a fast, simple runtime operation, changing certain key words (on a single alphabetic pass) and generating "see" references. Exclusion of low-content words is also a fast, simple runtime operation, done in the printing program, allowing excluded-word entries to print if the word occurs first in either title or author, and generating an explanatory note under each excluded key word.

Listings are main-entry, alphabetic under key word groups, with brief holdings, campus location, and call number where available. The key word appears in all capital letters within each entry, and redundant entries are collapsed; that is, if a word appears more than once in an entry, each occurrence is capitalized, but the entry is listed only once under the key word. The first edition limits entries to 98 characters and holdings to 13 characters; the programs have since been revised to allow entries of up to 193 characters and up to 45 characters of holdings.
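The extraction and filing rules just described can be illustrated with a short sketch. The actual programs were written in 360/Assembler; this Python fragment is only an illustration of the stated rules, and the function names and multiword-term list are invented.

```python
# Illustrative sketch of the Berkeley key word rules described in the text.
MULTIWORD_TERMS = ["United Nations", "La Paz"]  # treated as single key words

def key_words(title):
    """Extract key words: capitalized words only. Prepositions and
    conjunctions are not capitalized, so no stop-list search is needed."""
    words = []
    rest = title
    for term in MULTIWORD_TERMS:
        if term in rest:
            words.append(term)
            rest = rest.replace(term, " ")
    words += [w for w in rest.split() if w[0].isupper()]
    return words

def sort_key(word):
    """Library-standard sorting key: U.S. files as UNITED STATES,
    & files as AND."""
    return word.replace("U.S.", "UNITED STATES").replace("&", "AND").upper()
```

For example, key_words("Journal of the United Nations") skips "of" and "the" and keeps "United Nations" as one key word, while sort_key("&") files the ampersand as AND.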
Both versions of the programs retain as much runtime flexibility as possible, while maintaining extremely low running time.

The first edition, including mostly non-document currently-received titles, is photocomposed in a 6-point slab-serif type and published in three paperbound volumes. Copies are available for $60 a set from Systems Office, Main Library, University of California, Berkeley, CA 94720.-Walt Crawford, University of California, Berkeley

PROGRAMMING AND COMPUTERS

PLEA, a PL/1 Efficiency Analyzer

PL/1 users find that the language offers infinite ways to invoke inefficient code. Partial defense is provided by careful manual reading. Another, and very

[Fig. 2. PLEA utilization analysis page: statement trap counts for the main procedure TESTIME. The statement listing (a binary search loop: J = (TOP + BOT) / 2; IF B > A(J) THEN BOT = J; ELSE TOP = J; END; ...) and its trap-count and percentage columns are largely illegible in this copy. Caption: Each test repeated 2000 times with Argument B in Array and 2000 times with Argument B not in Array.]

to get a reasonable sampling of the remaining blocks. A comparative run showed that the optimization overhead was charged to the proper statement groups, but pragmatists will note that the problem setup was biased against the binary search solution. However, using the trap totals from Tests 2, 3, and 4, the 50 percent Probability Test indicates that the probability of no significant difference between methods 2 and 4 is more than 5 percent; the probability of no significant time difference between methods 2 and 3 is more than 1 percent.
PLEA is available at a program distribution fee of $25 from the SHARE Program Library Agency, Triangle University Computer Center, P.O. Box 12076, Research Triangle Park, NC 27709. Thanks are due Dr. David Gomberg, University of California, San Francisco Computer Center, for most of the runtime and several of the statements used in the test.-Justine Roberts, Systems Librarian, UC-San Francisco

INPUT

To the Editor:

I am writing to you concerning the article which appeared in the September 1972 issue of Journal of Library Automation entitled "The Shared Cataloging System of the Ohio College Library Center." I also note that this issue of your journal, even though dated September 1972, was not received by this library until July of 1973, and indeed it was a timely arrival, for at the present time the Northwest Association of Private Colleges and Universities is investigating the feasibility of seeking service from the OCLC for some of its library requirements. However, in talking with Mr. Kilgour and his associates at ALA this summer it was exceedingly difficult to get a complete picture of the cost of participation in the OCLC, and to this date we have not been able to get a complete breakdown of the cost obligation.

In this regard, this article was extremely interesting, and I asked one of my staff members to do a careful analysis of the cost aspects of the OCLC services. I am attaching this analysis for your interest, and perhaps it will be of suitable pertinence for the readership of your journal. Certainly I, and others of my colleagues at this university and in NAPCU, would be more than interested in a response by Mr. Kilgour and his associates.

Desmond Taylor
Library Director
Collins Memorial Library
University of Puget Sound
Tacoma, Washington

AN ANALYSIS

SUMMARY: "Average cost per card for 529,893 catalog cards in finished form and alphabetized for filing was 6.57¢ each ...
the system is easy to use, efficient, reliable, and cost beneficial. An off-line catalog card production system based on a file of MARC II records was activated a year before the on-line system." Requests were batched weekly. Library of Congress card numbers were keypunched onto cards for searching. Seventy percent were found on the first search. "Members could specify a recycling period of from one to thirty-six weeks ... before unfulfilled requests were returned." The lowest price in lots of one-half million Permalife cards was $8.01 per thousand. CPAs checked the system and found that all direct costs were included in the 6.57¢ cost. No mention is made of the preexisting cataloging systems.

BIBCON / GIBSON 239

[Fig. 1. BIBCON: Basic System Schematic. The schematic is largely illegible in this copy; its legible component labels include the initial input data processor, field recognizer, MARC II record formatter, record creator, PRINTSUS proof listing formatter, output page formatter, BIBLIST output entry and column formatter, IBM sort, and output file sorter.]

must be prepared for book catalog production, with any of the standard catalog entries as keys. (b) Provision must be made for the widest feasible variety of columnar output formats. (c) The format for any machine-readable records must be compatible with the MARC standard.

The system has been installed, with revisions and modifications, on an IBM 360 Model 50 computer used by the California State Library. All programs in this version are written in the IBM Basic Assembler Language (BAL) instead of the original combination of BAL and COBOL. In its first version, BIBCON processed monographic records exclusively. Various programs have now been modified so that the system will also process serial records in a simplified MARC serials format. This article, however, will describe only the system for processing monographic records. The system has been used to produce catalogs of monographs for UC Santa Cruz, UC San Diego, and the one million record supplement to the UC catalog of books.2- Portions of the system were used to produce the initial copies of the University of California Union List of Serials. The California State Library Automation Project is using this basic file management system to process both monographic and serial records for the production of several book catalogs. These will include, principally, the California Union List of Periodicals, reflecting the periodical holdings of libraries throughout California, the California State Library List of Periodicals, and the Catalog of Books in the California State Library.

Knowledge Numbers
090 Call Number

Main Entry
100 Main Entry

Supplied Titles
240 Uniform Title

Title Paragraph
245 Title

Collation
300 Collation

Series Notes
400 Series, Traced (Personal)
410 Series, Traced (Corporate)
440 Series, Traced (Title)
490 Series, Untraced or Traced Differently

Bibliographic Notes
500 Notes

LC Subject Headings
650 Subject Added Entry

Other Added Entries
700 Author Added Entry
740 Title Added Entry (Traced Differently)

Series Added Entry
810 Series Added Entry (Traced Differently)

Remaining Unspecified Data
099 Remaining Unspecified Data

Fig. 2. Variable Field Tags-AFR-MARC II

AUTOMATIC FIELD RECOGNITION (AFR)

At the heart of the system is the program which creates MARC-like records from unedited input data. This program, called Automatic Field Recognition or AFR, identifies control and variable fields and creates a leader and record directory for each record submitted to it. To accomplish this, when a record is submitted to the program, it first sets aside areas into which data for each of the four parts can be placed. Field identification progresses on the basis of two signal symbols which are inserted between fields during input and on the basis of the order and content of the fields. When a control or variable field is identified, a standard MARC record directory entry is created, containing the AFR-MARC II field tag, the length of the field, and the starting character position of the field (Figure 2). Necessary indicators and subfield delimiters are also created and placed in their proper positions in the field's data stream, and the field, along with its field terminator, is placed into the area set aside for data fields.

240 Journal of Library Automation Vol. 6/4 December 1973

Control Numbers
010 LC Card Number
011 Linking LC Card Number
015 National Bibliography Number
016 Linking NBN
020 Standard Book Number
021 Linking SBN
025 Overseas Acquisitions Number (PL480, LACAP, etc.)
026 Linking OAN Number
035 Local System Number
036 Linking Local Number
040 Cataloging Source
041 Languages
042 Search Code

Knowledge Numbers
050 LC Call Number
051 Copy Statement
060 NLM Call Number
070 NAL Call Number
071 NAL Subject Category Number
080 UDC Number
081 BNB Classification Number
082 Dewey Decimal Classification Number
086 Supt. of Documents Classification
090 Local Call Number

Main Entry
100 Personal Name
110 Corporate Name
111 Conference or Meeting
130 Uniform Title Heading

Supplied Titles
240 Uniform Title
241 Romanized Title
242 Translated Title

Title Paragraph
245 Title
250 Edition Statement
260 Imprint

Collation
300 Collation
350 Bibliographic Price
360 Converted Price

Series Notes
400 Personal Name-Title (Traced Same)
410 Corporate Name-Title (Traced Same)
411 Conference-Title (Traced Same)
440 Title (Traced Same)
490 Series Untraced or Traced Differently

Bibliographic Notes
500 General Notes
501 "Bound With" Note
502 Dissertation Note
503 Bibliographic History Note
504 Bibliography Note
505 Contents Note (Formatted)
506 "Limited Use" Note
520 Abstract or Annotation

Subject Added Entries
600 Personal Name
610 Corporate Name (excluding political jurisdiction alone)
611 Conference or Meeting
630 Uniform Title Heading

LC Subject Headings
650 Topical
651 Geographic Names
652 Political Jurisdictions Alone or with Subject Subdivisions

Other Subject Headings
660 NLM Subject Headings (MESH)
670 NAL Subject Headings
690 Local Subject Heading Systems

Other Added Entries
700 Personal Name
710 Corporate Name
711 Conference or Meeting
730 Uniform Title Heading
740 Title Traced Differently
750 Name Not Capable of Authorship

Series Added Entries
800 Personal Name-Title
810 Corporate Name-Title
811 Conference or Meeting-Title
840 Title

Fig. 3. Variable Field Tags-LC-MARC II

AFR-MARC II Records

It is important to emphasize that the system produces MARC-like records rather than full MARC records. While the basic record structure is exactly like that of standard Library of Congress MARC, distinctions such as personal versus corporate main entry are not shown by the field tagging, and the degree of subfield delimiting is extremely restricted.5 Compare the list of variable field tags for AFR-MARC II (Automatic Field Recognition MARC II) records to that for LC-MARC II (Library of Congress MARC II) records (Figures 2 and 3). At present, AFR-MARC II provides detailed subfield tagging for only two fields, call number (090) and title (245). This lack of detailed discrimination causes no problem, however, for output of book catalog entries. It can affect filing sequence, since ALA filing rules depend on such distinctions as personal versus corporate author to determine proper sorting. The decision to omit detailed subfield discrimination is a concession to cost.
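The directory mechanics can be sketched briefly. In the standard MARC structure, each directory entry is a three-character tag, a four-digit field length (counting the field terminator), and a five-digit starting character position, which is how an entry such as 245014900240 in the Figure 6 directory reads. The helper below is our illustration, not BIBCON's code.

```python
def directory_entry(tag, field_data, start):
    """Build a MARC-style 12-character directory entry for one field:
    3-char tag + 4-digit length + 5-digit starting position."""
    length = len(field_data) + 1  # + 1 for the field terminator
    return "%s%04d%05d" % (tag, length, start)

# A 245 title field whose data (without terminator) runs 148 characters
# and begins at character position 240 of the data area:
entry = directory_entry("245", "x" * 148, 240)  # "245014900240"
```

AFR appends one such entry to the directory for every control or variable field it identifies, which is all the structure a MARC reader needs to locate each field in the data stream.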
The two principal developers (UC and CSL) decided that, for book catalog production, detailed subfield delimiting would be of little value and that the benefits of such detail (i.e., the ability to sort according to LC filing rules) would not justify the added costs in editing, input, programming, and processing which would be required to provide it.

A sample of an AFR-MARC II record created by the Automatic Field Recognition program is shown in Figures 4 and 6. It can be contrasted with the LC-MARC II record for the same title (Figures 5 and 7). Both a machine-based representation (Figures 4 and 5) and a formatted output example (Figures 6 and 7) of the record are shown.

[Fig. 4. Sample Library of Congress Card in AFR-MARC II Format, and Fig. 5. Sample Library of Congress Card in LC-MARC II Format. Both reproductions are largely illegible in this copy; legible fragments include the subject and added entry fields "$aSocial policy--Bibl." and "$aUnited Nations. Dept. of Social Affairs." for the card "United Nations Educational, Scientific and Cultural Organization. Education Clearing House. Education for community development," with blank, field terminator, and end-of-record symbols keyed.]

Input Data

AFR creates MARC structured records from unedited input data. To what does "unedited" input refer? Without a program such as AFR, each MARC field tag, subfield code, indicator, etc., for every MARC or MARC-like record must be manually supplied by a human editor. With AFR the input keyer simply indicates that some field is beginning; it is then up to the AFR program to identify the field. AFR will accept input created by a variety of methods. The decision on input method is based principally on cost.
Since input costs can vary widely as a result of various local conditions, provision has been made in the BIBCON system to accept data in card or tape format. Keypunch and Op- tical Character Recognition (OCR) input are the two methods used thus far. A sample OCR input record appears in Figure 8. While input instruc- tions will vary according to the input method used, the four basic keying requirements remain the same: 1. Begin an input record with an identification number. 2. Place a field separator symbol before each field (i.e., each indention on the catalog card). 3. Place a different symbol (called a "location" symbol) after call num- ber and after the library location data. 244 Journal of Library Automation Vol. 6/4 December 1973 ERRORS TAG1 IND 2 SUB3 DATA RECORD NO. 0000001 LEADER DIRECTORY 00689nam 00145 , 008004100000090002500041099007700066100009700143245014900240 300001800389410004500407650002500452650002100477700004600498 008 090 $a $c 100 10 $a 245 1 300 $a $b $c $a 410 21 $a 650 0 $a 650 0 $a 700 10 $a 099 $a 710324sl954 Z 7165 S66 US Ref. 00000 eng United Nations Educational, Scientific and Cultural Organization. Education Clearing House. Education for community development; a selected bibliography, prepared by UNESCO and United Nations [Division of Social Affairs. Paris, 1954] 49 p. 28cm. Its Educational studies and documents, 7 Social policy--Bib!. Education--Bib!. United Nations. Dept. of Social Affairs. LB5.U37 no.7 /016.370193 /55-373 /Z7164.S66U5 /Library of Congress$ 1. TAG = Field tag. 2. IND = Indicator. 3. SUB = Subfield code. Fig. 6. AFR-MARC II PRINTSUS Output Format 4. End each input record with an end-of-record symbol ·! Variations on these four basic rules may be required because of restric- tions of the input device used, because of variations in content or form of the input data, or because output specifications require nonstandard treatment by the programs. 
The task of manipulating the varying input into a form which is acceptable to AFR is performed by a program called PREAFR.

PRE AUTOMATIC FIELD RECOGNITION (PREAFR)

This program provides the interface between any one of the different input methods and the AFR program. Basically, PREAFR accepts data from keypunched cards, and OCR PREAFR accepts it from tape records. Both forms of the preprocessing program combine input data segments until an end-of-record symbol is reached, indicating that all the data for one bibliographic record have been assembled. A character-by-character search is made, and special characters and diacriticals which were input as special codes are translated into the values necessary for output processing.

Fig. 7. LC-MARC II PRINTSUS Output Format

In addition, the program can perform several editing and checking functions. These functions are optional and are dependent upon the input equipment and upon the wishes of the user.
Options such as deletion of data on the basis of special input symbols, checking to determine that the record control number is valid, and production of a file of control numbers for records in which data could not be interpreted by the input device are standard. Because this program provides the interface between different, nonstandard input methods and one standard record formatting program, it is very user-dependent. The basic logic will remain the same, but individual options will have to be added or subtracted by each separate user.

Fig. 8. Sample OCR Input. (Data are from the catalog card shown in Figure 5.)
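The preprocessing just described can be sketched in a few lines: segments are accumulated until an end-of-record symbol appears, special codes are translated, and records whose control numbers fail validation are diverted to a reject file. The symbols, the code table, and the seven-digit control-number rule are assumptions introduced for this sketch; the real PREAFR logic was device-dependent.

```python
EOR = '$'                           # assumed end-of-record symbol
SPECIAL_CODES = {'%E': '\u0303'}    # assumed code table: combining tilde,
                                    # placed after the letter it modifies

def preafr(segments):
    """Combine card/tape segments into whole records, translate special
    codes, and divert records with bad control numbers to a reject list."""
    records, rejects, current = [], [], ''
    for seg in segments:
        current += seg
        if not current.endswith(EOR):
            continue                        # record not yet complete
        body = current[:-len(EOR)]
        current = ''
        for code, value in SPECIAL_CODES.items():
            body = body.replace(code, value)
        ctl = body.split(' ', 1)[0]
        if len(ctl) == 7 and ctl.isdigit():
            records.append(body)
        else:
            rejects.append(ctl)             # file of uninterpretable records
    return records, rejects
```

Feeding in two card images for one record and a garbled record yields one assembled record and one rejected control number, mirroring the standard options described above.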
Fig. 9. PREAFR Output Data-Printed From Tape Record

PREAFR produces a file of variable length, machine-readable records (Figure 9) which are passed to AFR for formatting into a MARC structure with limited MARC II tagging, as described in the section on AFR.

RECORD PROOFING AND CORRECTING

PRINTSUS

PRINTSUS is an output program which provides formatted AFR-MARC II records, showing field tag, subfield delimiters, indicators, etc. This printout is designed for proofing of the MARC records created by AFR. Samples of this type of output appear in Figures 6 and 7.

FIX

By processing data according to "FIX commands," this program corrects records in MARC format, operating as a context editor. Corrections can be made to content or structure. Entire records can be deleted and new records can be created using FIX "correction" statements.
When any change is made, FIX automatically updates the record's leader and directory to reflect the record as changed. There are two input files: bibliographic records, in MARC format, and the FIX correction data. The input records file must be in MARC format and must be in the same order (by record I.D. numbers) as the FIX correction data file in order to update the records successfully. The FIX program's method of making corrections is based on the FIX expression, which can be considered a "language," with rules of grammar governing the structure of expressions (sentences), the order of elements within the expressions, and the possible contents of each element (see Figure 10).

Output Processor

The output processor consists of three programs and an IBM utility sort program. These general-purpose programs, which are designed to create book catalog page output, allow a variety of options for sorting as well as formatting.

SORT KEY EDIT (SKED)

This program performs two major functions (Figure 11): (a) from a single MARC record it creates a record for each point of access to that record, as specified by the program user; and (b) it establishes a 256-character sort key at the head of each record extracted. The file is then passed to an IBM sort package for sequencing.

Record Extraction

SKED does not actually extract data from the original MARC record. Instead, it replicates the full record for each access point specified. It is left to the BIBLIST program to extract the required data from these records. Thus, if a particular bibliographic record should have five access points (one for main entry, one for title, two for subjects, and one for some other added entry), SKED would output five full MARC records. Essentially the only differences in the output SKED records would be in the data found in the sort keys prefixed to each record.
The record for main entry access would contain main entry data as its first element; the title entry access record would contain title data first, etc.

Sort Key Creation

Data for the sort key are selected on the basis of user-specified tables.

Fig. 10. Sample FIX Data, Illustrating FIX Operations
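The effect of a FIX run can be sketched as follows. The correction format below (record I.D., old string, new string) is a simplification of the FIX expression language, and refreshing the record-length figure stands in for the full leader and directory update; both files are assumed sorted by record I.D., as the text requires. The sample correction is the one shown in Figure 10.

```python
def apply_fix(records, corrections):
    """Apply FIX-style corrections to MARC-like records matched by record
    I.D., refreshing each record's length figure afterward (a stand-in for
    FIX's automatic leader and directory update)."""
    fixed = []
    queue = list(corrections)
    for rec_id, _, data in records:
        while queue and queue[0][0] == rec_id:
            _, old, new = queue.pop(0)
            data = data.replace(old, new, 1)
        fixed.append((rec_id, len(data), data))   # length kept consistent
    return fixed

before = [('0200380', 35, '1. Loans, Parsonal - San Francisco.')]
after = apply_fix(before, [('0200380', 'Parsonal', 'Personal')])
```

The corrected record carries the same I.D., the repaired subject string, and a recomputed length, just as FIX recomputes the leader and directory after every change.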
Fig. 11. General Description of SKED Subsystem. (The field table indicates the fields to be extracted and determines the number of SKED records to make; the field control table and subfield control table govern the building of each 256-byte sort key: subfield sequencing, insertion of blanks, dropping of initial articles, and miscellaneous editing such as forcing upper case and removing punctuation. The original MARC record is replicated and a unique sort key is affixed to the front end of each copy.)

SAMPLE SKED TABLES - AUTHOR/TITLE LIST
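The explosion and sort-key steps summarized in Figure 11 can be sketched as below. The record layout, field names, and key-building rules (upper-casing, blank removal, padding to 256 bytes) are simplified assumptions standing in for SKED's table-driven logic.

```python
SORT_KEY_LEN = 256

def sked(record, access_fields):
    """Replicate a MARC-like record once per access point, prefixing each
    copy with a fixed-length sort key built from that access point's data."""
    sked_records = []
    for field in access_fields:
        for value in record.get(field, []):
            key = value.upper().replace(' ', '')       # crude key editing
            key = key[:SORT_KEY_LEN].ljust(SORT_KEY_LEN)
            sked_records.append((key, record))         # full record replicated
    return sked_records

rec = {'main entry': ['Smith, John'],
       'title': ['First World War'],
       'subject': ['World War']}
copies = sked(rec, ['main entry', 'title', 'subject'])
```

As in the text's example, a record with five access points would yield five full copies differing only in their 256-byte keys; the IBM sort then sequences the file for BIBLIST to format.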
Fig. 15. Sample BIBCON Output: CSL Education Catalog

…rials. Additionally, portions of the software have been transferred successfully to the Hennepin County Library, Minnesota.

Disadvantages:

1. Personnel Dependency:

BAL: The system is written in Basic Assembler Language, thus necessitating the services of an experienced programmer.

MARC: Because the system operates upon the MARC structured record format, the average programmer may well have a difficult time in dealing with the added complexities introduced by this aspect.
OPTIONS: The wide range of options provided by the system necessitates highly complex programs which may be difficult for the average programmer to grasp readily.

2. Equipment Dependency:

IBM: Because the programs are written in IBM Basic Assembler Language, the system is presently usable on IBM equipment only.

Conclusion

The BIBCON-360 system is a versatile and inexpensive method for producing book catalogs when a wide range of format options is required and when the catalogs must contain bibliographic information with more than one entry or access point per bibliographic record. If a simple, main entry catalog is needed, microfilm reproduction of the catalog cards may still be much cheaper. BIBCON-360 is most useful for producing large-scale catalogs (e.g., union catalogs) to be distributed widely, assisting the effort to provide the widest possible dissemination of library information at the least possible cost.
257 Technical Communications

ANNOUNCEMENTS

ISAD Institute on Bibliographic Networking

The Information Science and Automation Division (ISAD) of the American Library Association will hold an institute in New Orleans on February 28-March 1, 1974 at the Monteleone Hotel in the French Quarter. The subject of the institute will be "Alternatives in Bibliographic Networking, or How to Use Automation Without Doing It Yourself." The seminar will review the options available in cooperative cataloging and library networks, provide a framework for identifying problems and selecting alternative cataloging systems on a functional basis, and suggest evaluation strategies and decision models to aid in making choices among alternative bibliographic networking systems. The institute is designed to assist the participant in solving problems and in selecting the best system for a library. Methods of cost analysis and evaluation of alternative systems will be presented, and special attention will be given to comparing on-line systems with microfiche-based systems. The speakers and panelists are recognized authorities in bibliographic networking and automated cataloging systems and will include: James Rizzolo, New York Public Library; Maryann Duggan, SLICE; Jean L.
Connor, New York State Library; Maurice Freedman, Hennepin County Library, Minneapolis; Brett Butler, Information Design, Inc.; and Michael Malinconico, New York Public Library. The cost will be $60 for ALA members and $75 for nonmembers. For hotel reservation information and a registration blank, write to Donald P. Hammer; ISAD; American Library Association; 50 E. Huron St.; Chicago, IL 60611. P.S. Mardi Gras is February 26!

ISAD Forms Committee on Technical Standards for Library Automation (TESLA)

The Information Science and Automation Division of the American Library Association now has a Committee on Technical Standards for Library Automation (TESLA). TESLA, recently formed with the approval of the ISAD Board of Directors, will act primarily as: a clearinghouse for technical standards relating to library automation; a focal point for information relating to automation standards; and a coordinator of standards proposals with appropriate organizations, e.g., the American National Standards Institute, the Electronic Industries Association, and the National Association of State Information Systems. The committee's initial work will be to formulate areas and priorities in which standards are required, to document existing standards sources, and to develop a "library" of applicable standards to be drawn upon by the membership of ALA. According to the new committee's chairman, John Kountz, California State Universities and Colleges, "It is auspicious that this time be selected for the implementation of a standards committee for library automation. With the current introduction en masse of production library automation systems and the fading of research and development activities, such standards will come into good use as they may be developed for library automation.
In addition, the close linkage with new developments such as the Information Industries Association, and the availability of standardized data bases, hardware, and communication standards, are becoming requirements. The standards which shall be emphasized in the committee activities are those relating to areas of interest for administrators and automators alike. These standards are intended to fill the void for future library automation operations." The committee's efforts should be measured in terms of facilitating the automation of library functions as required on an individual library basis. Information relating to the standards committee activities and its scope, or general information relating to library information technical standards, should be addressed to: ALA/ISAD Committee on Technical Standards for Library Automation, John Kountz, Chairman, 5670 Wilshire Blvd., Suite 900, Los Angeles, CA 90036.

Formation of an Ad Hoc Discussion Group on Serials Data Bases

As a result of an informal meeting held during the ALA Conference in Las Vegas to discuss the problems associated with the establishment and maintenance of union lists of serials, an Ad Hoc Discussion Group on Serials Data Bases was formed, with Richard Anable acting as interim coordinator. The Council on Library Resources agreed to fund a meeting of the group's steering committee on September 21, 1973 at York University in Toronto, Canada. Many of the major union list activities on this continent will be represented, as well as the National Libraries and ISDS National Centers from both Canada and the United States. A list of the subgroups that have been formed gives a good idea of the individual problem areas which the group is tackling:

a. Record format comparison
b. Minimum record data element requirements
c. Cooperative conversion arrangements
d. Organizational relationships and grant support
e.
Holding statement notation
f. Bibliographic standards
g. Authority files
h. Software evaluation and exchange

A detailed description of the history and activities of the Discussion Group can be found on page 207 of this issue. For further information contact: Richard Anable, York University, Downsview, Ontario, Canada, M3J 2R2, (416) 667-3789.

TECHNICAL EXCHANGES

File Conversion Using Optical Scanning: A Comparison of the Systems Employed by the University of Minnesota and the University of California, Berkeley

By this time most large libraries in the U.S. have converted into machine-readable form at least some of their files. Most of them, however, have used relatively inefficient techniques (such as keypunching) or relatively expensive ones (such as on-line data entry). It was with pleasure, then, that I read Ms. Grosch's recent article ("Computer-based Subject Authority Files at the University of Minnesota Libraries," Journal of Library Automation, Dec. 1972) describing a conversion technique that she, like the library at the University of California at Berkeley, has found to be extremely cost effective, namely optical character recognition using a CDC 915 scanner. Berkeley has used (and still is using) this technique in its efforts to create what will soon be among the largest machine-readable serials files of any university in the world. That file currently contains records for over 50,000 serials (in the MARC structure). It is expected to contain records for about 90,000 unique titles (approximately 30 million characters) before the end of the current fiscal year. Based on our experience in this undertaking, I would like to offer the following comments on the use of the CDC 915 scanner as it is used in Minneapolis and in Berkeley.

Costs-It should be crystal clear that the main reason for using the scanner is the cost of the keyboarding device.
That is, the keyboarding device for the CDC 915 scanner is an ordinary ten-pitch Selectric typewriter, which can be purchased for under $500.00 or rented for from $11.00 to $30.00 per month. When not used as a computer input device, the machine functions as a normal office typewriter. A device like an MT/ST that rents for about $110.00 a month costs about $.60 an hour for every hour it is used, or ten times as much. Keyboard operators for a typewriter are easily obtained, since there is no need to train an operator in the idiosyncrasies of keypunch cards, CRT terminals, magnetic or paper tape devices, etc. Keyboarding is fast and easy, especially when compared to a keypunch. Mistakes are easily corrected by, for example, merely crossing out the character(s) in error. Keyboarding on a Selectric for a scanner and keyboarding on a device like the MT/ST both require a "converter" (the scanner itself or the MT/ST-to-computer-tape converter). These "converters" are equally available, and the decision to use one keyboarding device over another should not hinge on the "availability" of such "converters," as is usually the case. In addition to selecting a cost-effective keyboarding device, Minnesota has also operated a system that delivers the data to the keyboarding device in an efficient manner: the typing is done from the source document itself, rather than from a copy of that document that has been transcribed onto a "coding sheet" or a photocopy of that document. Ms. Grosch points out that photocopying the source document would have raised the project costs by about 50 percent. In addition, keyboarding from photocopied documents would probably have been much slower and less accurate. The Berkeley typists also keyboard from the original document, even when that document is a public catalog card that must be temporarily marked up in order to resolve ambiguities for the typists.
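The per-hour comparison above can be checked with simple arithmetic. The figure of roughly 176 usable hours in a working month (8 hours by 22 days) is an assumption introduced here to reproduce the quoted rates.

```python
HOURS_PER_MONTH = 8 * 22          # assumed working hours in a month

def hourly_cost(monthly_rent):
    """Convert a monthly equipment rental into an hourly cost of use."""
    return monthly_rent / HOURS_PER_MONTH

mtst = hourly_cost(110.00)        # MT/ST rental: about $.60 an hour
selectric = hourly_cost(11.00)    # low-end Selectric rental
```

This yields about $0.63 an hour for the MT/ST against about $0.06 for the low-end Selectric, the "ten times as much" of the text.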
Supplies-It is true that the ordinary Selectric typewriter (without the pinfeed platen) performs satisfactorily. Thus, one does not need continuous forms for the typewriter. Indeed, it is not necessary even to use a "stock form"; plain 20-pound white long-grain paper will do. We use Zellerbach's Hammermill Bond 820, which costs $2 a ream. At Minnesota, using this paper instead of the "stock form" would probably have reduced the supplies cost from $400 to less than $25. Had a keypunch been used, the operation would probably have required about $150 worth of IBM cards.

Scanner Throughput-Careful design of the format of the data on the typed sheet can substantially improve throughput on the CDC 915 scanner. With double-spaced typing (three lines per inch), the CDC scanner is capable of reading data at the rate of over a half million characters an hour, or about twice as fast as was actually achieved at Minnesota. Thus, with altered design of the input format, about half of the cost of the "converter" (the scanner) could have been saved, representing an additional savings of $500. The principle applied to maximize throughput on a scanner such as the CDC 915 is to enter as much data as possible on a line and as many lines as possible on a page without crowding the data so much as to cause the machine to misread. (The machine enforces stricter tolerances as its capabilities are pushed to their limits.) One wants to get as much as possible on a line for the same reason that one wants to get as much as possible onto a punched card: there is a fair amount of machine overhead involved in advancing to the next line and/or page. The Berkeley system uses a sheet of paper that is 8½ x 14 inches in size, and the typists type each line a full 6¾ inches long.
Typing is double-spaced (even though the machine is capable of handling single-spaced typing) because this increases the vertical skew tolerance from one-half of a character height to a full character height. Figure 1 is an example of a page typed at Berkeley. At Berkeley, more than one field may be placed on a line, each field being separated by the "fork" character (Y). Like Minnesota, typists identify each field by a one-character code at the beginning of the field (A for author, T for title, H for holdings, C for call number, B for branch library location, etc.). Typists are instructed to type until the margin locks. The beginning of each logical record is identified by the "chair" character plus the typist's initials at the beginning of the line. Thus the entire line is utilized, and the machine is not required to read a large number of blank spaces at the beginning of the line (which, as Ms. Grosch points out, it has trouble doing, since it cannot readily tell whether six blanks may, in fact, really be five or seven blanks).

nSSYA=FAOYTPROCEEDINGS OF A SYMPOSIUM ON MAN MADE FORESTS AND THEIR INDUSTRIAL IMPORTANCE, CANBERRA, 1967YH1-3, 1967//YCSD118.F5
nSSYA=FAOYTREPORT ON A SURVEY OF THE AWASH RIVER BASINYH1-5, 1965//
nSSYA=FAOYTETUDES PEDROHYDROLOGIQUES, TOGOYH1-3, 1967//YCS599.C7F6

Fig. 1. Berkeley Optical Scanner Input.

We generally do not proofread the sheets after they are typed. We have found that when proofreading is necessary (usually during training), it is not difficult to proofread data typed in the format that we use.

Data Element Identification-At Berkeley, as at Minnesota, the typist identifies the data element (e.g., the author or the title) rather than relying on a computer algorithm of the kind used by the Library of Congress or the Institute of Library Research (automatic format recognition).
This approach was selected because it was felt (a) that the typist could perform this task better than the computer could, and (b) that the routine nature of the typing job necessitated the insertion of more meaningful tasks for the typists. The data presented to the typists for interpretation can be in a wide variety of languages and may be transcribed on the source document according to any one of the conventions used by the library during the past several decades.

Typing Throughput: The Berkeley conversion system includes the use of certain "super abbreviations" that typists may use in place of commonly occurring words or phrases. All such abbreviations are two or three characters in length and are preceded by an equal sign. For example, "=FAO" is translated into "Food and Agriculture Organization of the United Nations" by the computer software. Although this substantially improves keyboarding throughput, its chief advantage is the assurance that the long phrase is entered into the file correctly and consistently. I personally find the requirement that the typist at Minnesota type the "format recognition line" at the top of each sheet, in order to avoid the necessity of a "complete rerunning of the job," to be not only wasteful but playing brinkmanship with systems design.

Expanding the Character Set: Although the CDC 915 scanner is capable of reading only the OCR-A font (an all upper case font), it is relatively simple to produce upper-and-lower-case output from data input via the CDC 915. Two alternatives are:

1. Have the typist key a special character that means "next character is to be capitalized" before each upper case character (the technique used by typists throughout the Western world, in the form of the shift key). If, for the CDC scanner, the dollar sign were chosen to be that special character, then "$JOHN" would represent "John" and "JOHN" would represent "john."
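The two keying conventions described above, the "=" super abbreviations and the "$" shift character, could be handled by a decoding pass like the following sketch. The fixed three-character abbreviation slice and the one-entry table are simplifying assumptions; the article says abbreviations may be two or three characters:

```python
# Sketch of decoding all-caps OCR-A input into mixed-case text,
# assuming "$" as the shift character and "=" introducing a
# three-character "super abbreviation" (a simplification: the article
# allows two- or three-character abbreviations).
ABBREVIATIONS = {"FAO": "Food and Agriculture Organization of the United Nations"}

def decode(scanned: str) -> str:
    out, i = [], 0
    while i < len(scanned):
        ch = scanned[i]
        if ch == "$" and i + 1 < len(scanned):   # next character upper case
            out.append(scanned[i + 1].upper())
            i += 2
        elif ch == "=":                          # super abbreviation
            abbr = scanned[i + 1:i + 4]
            out.append(ABBREVIATIONS.get(abbr, abbr))
            i += 1 + len(abbr)
        else:
            out.append(ch.lower())               # default: lower case
            i += 1
    return "".join(out)
```

Under these assumptions, `decode("$JOHN")` yields "John" and `decode("JOHN")` yields "john", matching the convention in the text.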
This technique can be used to expand the keyboard to include diacritical marks. A Berkeley typist keys "ESPAN%EOL" to produce "español," since the computer translates "%E" into a tilde over the preceding character.

2. Do all capitalization by logic contained within the software. A primitive computer algorithm might simply say "capitalize the first word of every sentence plus the following proper nouns. . . ." The Berkeley library currently uses such a technique for the capitalization of words in serial entries. This has been done in order to print out the serial entries following standard rules of
style, namely that every significant word in the title is capitalized, rather than the traditional rules of librarianship. (Did the library practice arise because early typewriters had shift keys that were hard to use?) Our computer algorithm says essentially "capitalize all words in the entry except the following insignificant ones. . . ." This technique has created an upper-lower-case file without having typists use the shift key, or its equivalent, at least a half million times. Figure 2, a page from Berkeley's Serials Key Word Index, illustrates the results of this system.

Fig. 2. Berkeley's Serial Key Word Index: Sample Page. [A sample page (page 372) of the University of California, Berkeley, General Library Serials Key Word Index, listing call numbers, branch locations, keyword-in-context serial titles such as "COMPUTERS and the Humanities," and holdings statements.]

The Real Problem: I do not mean to imply that everything is rosy in file conversion land. A file conversion is a messy, difficult, and essentially unproductive task, no matter how well done, because it merely transforms existing data into another form and in so doing exposes, for all to see, the "many ancient errors" which we do not want to see. It also exposes the "ambiguities" that were perhaps better left ambiguous, not to mention the inconsistencies that have cropped up as library practices varied. I would suggest that any file conversion that works from files that have been built up over some time period requires more in the way of resources for the "cleansing" than for the conversion. That is, in the case of the subject authority files at Minnesota, I would guess that far more than $5,296.21 (the total amount spent on typists, keyboards, computers, supplies, etc.) was spent resolving ambiguities (before the drawer was handed to the typist) and "cleansing" the data in the one year between the time when the data had been converted and the time that they were put to use. This has been our experience at Berkeley.

Stephen Silberstein
University of California, Berkeley
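The capitalization-by-software pass Silberstein describes ("capitalize all words in the entry except the following insignificant ones") can be sketched in a few lines. The stopword list below is an invented placeholder, not Berkeley's actual list:

```python
# Sketch of the stopword-based capitalization pass described in the
# letter. The stopword list is an assumption for illustration.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "for"}

def capitalize_entry(entry: str) -> str:
    """Turn an all-caps serial entry into title case, stopwords excepted."""
    words = entry.lower().split()
    out = []
    for i, w in enumerate(words):
        # The first word is always capitalized; stopwords stay lower case.
        out.append(w.capitalize() if i == 0 or w not in STOPWORDS else w)
    return " ".join(out)
```

For example, `capitalize_entry("COMPUTERS AND THE HUMANITIES")` yields "Computers and the Humanities", which is the kind of output the sample index page shows.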
REPORTS: LIBRARY PROJECTS AND ACTIVITIES

Bucknell University Plans Entire Bibliographic File to Go On-Line

Bucknell University's already strong computer-usage program is expected to be strengthened in 1973/74 to permit students and faculty to conduct fast, accurate searches of the university library from any of thirty-five campus terminals. A $28,000 grant to the Bucknell University library from the Council on Library Resources is supporting this program. Seventy-five percent of Bucknell's students already use the campus computer in course work, and Bucknell's on-line library data base includes records of approximately 25,000 of the library's 200,000 books. The council grant will enable additional computer storage to be rented to permit the entire bibliographic file at Bucknell to go on-line. The complete file is already in machine-readable form. While Bucknell's current system enables a search of the on-line files by author-title, title alone, and Library of Congress (LC) number, its enlarged plan calls for subject search capability as well. Using LC classification numbers, a user will be able to ask the computer to locate and display the authors and titles associated with the subject of interest, examine the near neighbors of his original hit in the file, or pick an author's name from the response and enter the system again on the author's name to see what else the author may have written.

Stanford University Data File Directory

The Stanford University Data File Directory, compiled by Douglas Ferguson, is available as an example of a library-produced access publication for computerized data files on a university campus. The directory lists and describes collections of social, economic, political, and scientific research data on punched cards, computer tape, and disk, located on the Stanford campus.
Each file description directs the user to documentation and published research in the university library collection or elsewhere. Access to each data file is controlled by the owner and is listed in each file description. The directory is available, for prepayment of $4, from the Financial Office, Stanford University Libraries, Stanford, CA 94304.

STANDARDS

Editor's Note: The recent flurry of activity concerning standards which affect library automation, data bases, etc., is pointed up in the several actions reported in the last issues of TC. Perhaps the futility of keeping up with standards, and the need for a clearinghouse type of operation, is best recognized by noting a sample of some recently adopted standards which now have, or will potentially have, ramifications in library automation. The following list does not represent a complete accounting of all pertinent standards, due to the lack of a comprehensive source.

Selected ANSI Standards

Many ANSI standards published in the ANSI categories of "Information Processing Systems" and "Information Systems" may be of interest to ISAD members. Selected items are listed below. The new American National Standards Institute (ANSI) catalog is available free of charge from the Institute's Sales Department at 1430 Broadway, New York, NY 10018. The catalog lists "ISO Standards" and "ISO Recommendations" as well.

X3.14 RECORDED MAGNETIC TAPE FOR INFORMATION INTERCHANGE (200 CPI, NRZI) (REVISION OF ANSI X3.14-1969): Provides the standard technique for recording American National Standard Code for Information Interchange (ASCII), X3.4-1968, on magnetic tape at 200 characters per inch (CPI) using non-return-to-zero, change-on-ones (NRZI) recording techniques. Approval date: December 12, 1972.

X3.38 COMPUTER CODE FOR STATES: X3.38-1972 provides two-digit numeric codes and two-character alphabetic abbreviations for both the states and the District of Columbia.
The numeric codes will allow the states and the District of Columbia to be sorted into alphabetic sequence. ANSI X3.38-1972 may be obtained from the American National Standards Institute at $1.25 per copy. It was developed under the secretariat of the Business Equipment Manufacturers Association.

X3.31 STRUCTURE FOR THE IDENTIFICATION OF THE COUNTIES OF THE UNITED STATES FOR INFORMATION INTERCHANGE (NEW STANDARD): Identifies a three-digit numeric code structure for the counties of the states of the United States, including the District of Columbia. Supersedes the listing which appeared in the March 26, 1971 issue of Standards Action. Approval date: March 14, 1973.

X3.39 RECORDED MAGNETIC TAPE FOR INFORMATION INTERCHANGE (1600 CPI, PHASE ENCODED) (NEW STANDARD): Presents the standard technique for recording the coded character set provided in American National Standard Code for Information Interchange, X3.4-1968 (ASCII), on magnetic tape at 1600 characters per inch (CPI) using phase recording techniques. Approval date: March 7, 1973.

X3.40 UNRECORDED MAGNETIC TAPE FOR INFORMATION INTERCHANGE (9-TRACK 200 AND 800 CPI, NRZI, AND 1600 CPI, PE) (NEW STANDARD): Presents the minimum requirements for the physical and magnetic interchangeability of ½-inch wide magnetic tape and reels between information processing systems, communication systems, and associated equipment using American National Standard Code for Information Interchange, X3.4-1968 (ASCII). Approval date: March 5, 1973.
BSR X3.41 CODE EXTENSION TECHNIQUES FOR USE WITH THE 7-BIT CODED CHARACTER SET FOR ASCII (ANSI X3.4-1968) (NEW PROPOSED STANDARD): Provides means for augmenting the standard repertory of 128 characters of American National Standard Code for Information Interchange, X3.4-1968 (ASCII), with additional graphics or control functions, by extending the 7-bit code while remaining in a 7-bit environment, or by increasing to an 8-bit environment in which ASCII is a subset. Order from: Business Equipment Manufacturers Association, 1828 L St., NW, Washington, DC 20036. Single copy price: free.

BSR X3.47 IDENTIFICATION OF NAMED POPULATED PLACES AND RELATED ENTITIES OF THE STATES OF THE UNITED STATES, STRUCTURE FOR THE (NEW PROPOSED STANDARD): Provides the structure for an unambiguous, five-digit code for named populated cities, towns, villages, and similar communities, and for several categories of named entities similar to these in one or more important respects. Order from: Business Equipment Manufacturers Association, 1828 L St., NW, Washington, DC 20036. Single copy price: free.

BSR X11.6 OPERATIONAL DATA PROCESSING APPLICATIONS CONTAINING CONSTITUTIONALLY PROTECTED DATA, DOCUMENTATION REQUIREMENTS FOR (NEW PROPOSED STANDARD): Provides all those involved with operating electronic data processing applications involving constitutionally protected data with a list of minimum documentary requirements which apply to such applications. Order from: Society of Certified Data Processors, 38 Main St., Hudson, MA 01749. Single copy price: $2.00.
BSR X11.1 CATEGORIES OF ERROR-CREATING CHARACTERISTICS OF VARIOUS DATA STORAGE SYSTEMS USED WITH ELECTRONIC DATA PROCESSING APPLICATIONS (NEW PROPOSED STANDARD): Provides the consumers of electronic data processing applications, and the suppliers and implementors of such applications, with a technique for defining the error-generating capabilities that exist in the data storage system used to hold the consumer data. It is one of a series of data storage standards being prepared by the Society of Certified Data Processors Technical Standards Committee to provide a method whereby the application implementor and the application consumer may communicate easily, allowing the application consumer to take responsibility for the accuracy of the maintenance of the data base by electronic data processing systems. Order from: Society of Certified Data Processors, ATTN: Chairman, Technical Standards Committee, 38 Main St., Hudson, MA 01749. Single copy price: $2.00.

BSR X11.2 DATA ITEMS STORED IN GENERAL DATA BASES, CLASSIFICATION OF (NEW PROPOSED STANDARD): Provides the suppliers of data to a general data base with a means of communicating with the operators of the base regarding the characteristics of the data items being supplied. Order from: Society of Certified Data Processors, ATTN: Chairman, Technical Standards Committee, 38 Main St., Hudson, MA 01749. Single copy price: $2.00.

BSR X11.3 DATA BASE PROCESSING ACTIVITIES BASED ON DATA ITEMS USED, CATEGORIES OF (NEW PROPOSED STANDARD): Provides the application designers of data base applications, and the operators of several data bases, with a means of describing the characteristics of the data items stored in the data base. Order from: Society of Certified Data Processors, ATTN: Chairman, Technical Standards Committee, 38 Main St., Hudson, MA 01749. Single copy price: $2.00.
BSR X2.3.4-1959 CHARTING PAPERWORK PROCEDURES, METHOD OF: This standard was one of the original input documents considered in the development of American National Standard Flowchart Symbols and Their Usage in Information Processing, X3.5-1970 (originally ANSI X3.5-1966). However, ANSI X2.3.4-1959 was not considered sufficiently useful to serve the needs of the community which now uses ANSI X3.5, nor at that time did X3 have responsibility for ANSI X2.3.4 or feel that it should initiate action to modify the older standard. The subject standard was subsequently assigned to American National Standards Committee X3 for review and revision, reaffirmation, or withdrawal. Current review finds no interest in this standard, either in the form of users of the standard or of an organization desiring to assume its maintenance. Order from: American National Standards Institute, Dept. BSR, 1430 Broadway, New York, NY 10018. Single copy price: $1.00.

SC/20 STANDARD SERIAL CODING: The American National Standard Identification Number for Serial Publications, Z39.9-1971, is available from ANSI at $2.25 per copy. In June 1970, ISO/TC 46/WG 1 accepted the system as outlined in Z39.9-1971 as the basis for the international standard numbering system. A final ISSN standard was presented to the Plenary Session of TC 46 in October 1972 at The Hague. The International Center (IC) of the International Serials Data System (ISDS) is responsible for the administration of the ISSN as a central authority. The IC-ISDS was established with headquarters in the Bibliotheque Nationale, with financial support being shared by the French Government and UNESCO. The National Serials Data Program (NSDP) has been selected to serve as the United States National Center and as such is the sole agency responsible for the control and assignment of ISSN in the U.S.
(Note: The ANSI ISTAB (Information Systems Technical Advisory Board) rejected the proposed ANSI Z219.1-1971, Use of CODEN for Periodical Title Abbreviations. This proposal had been submitted to ANSI by the American Society for Testing and Materials in 1971 for approval as an American National Standard; Z39 members were asked to comment on it during the public review in July and August 1971. After considerable discussion the ISTAB came to the conclusion that the proposed standard was in conflict with Z39.9-1971, the ANSI Identification Number for Serial Publications.)

SC/2 MACHINE INPUT RECORDS: The members of SC/2 have agreed that this standard cannot be written at this time. The purpose of the proposed standard was general information interchange at the interface between data processing terminal equipment (such as data processors, data media input/output devices, office machines, etc.) and data communications equipment (such as data sets, modems, etc.). The decision was based on the fact that the problem being addressed here is not that of designing a format (that standard already exists, namely Z39.2-1971) but rather the problem of network protocol. Therefore, the transmission of the bibliographic record itself, taken in this context, is only a small part of the total picture. Subcommittee 2 has concluded, however, that in the light of future developments in network protocol, bibliographic data should be transmitted in the Z39.2-1971 interchange format standard. In order to further this recommendation, the present Z39.2-1971, the American National Standard for Bibliographic Information Interchange on Magnetic Tape, will be revised by SC/2 to reflect a broader scope, i.e., information interchange in digital form, with appropriate sections in the document describing the existing standards for different media (the first of these would be magnetic tape, since this standard already exists).
This should have the effect of using the standard format in future systems via telecommunications as well as via magnetic tape. The additional sections discussing various media will aid the user of the format regardless of the media involved.

INPUT

To the Editor:

Say it isn't so. Tell me that, as editor of Technical Communications, you are not responsible for the item on page 65 of vol. 6, no. 1. I refer to the squib headed "Tomorrow's Library: Spools of Tape." I am particularly offended to see this kind of outdated foolishness promoted after noting, two pages earlier, that the new directions for Technical Communications will involve pertinent information about technical developments. How could a publication entitled College Management possibly contribute technically significant information about such a specialized and sophisticated area as library automation? In general, I think blue sky articles are inappropriate for TC.

Carl M. Spaulding
Council on Library Resources

The new format and content of Technical Communications is expected to evolve, and thus no step function change was anticipated. In the meantime, while operating on an accelerated publication schedule, I have attempted to find pertinent (if not completely appropriate) articles for TC. I would like to see more contributions of hardcore technical communications from the field, but until people accept the new design for TC and contribute to it, the selections will be scarce. Incidentally, I have received some comment to the contrary: that perhaps a "Blue Sky?" category of news notes in TC would serve the useful purpose of providing another perspective, or putting "far out" items into context. Certainly, contributions of the type submitted by Stephen Silberstein in this issue and Justine Roberts in the last issue of TC represent the directions envisaged for TC's content.
In most technical fields there is a place for the proposed TC type of forum, and I'm confident library automation and technology have a similar need. I would appreciate more readers' comments and, more importantly, brief write-ups of the technical aspects of your accomplishments and findings which would be of interest to ISAD members. -DLB

POTPOURRI

UNISIST International Serials Data System

The International Serials Data System (ISDS), as established within the framework of the UNISIST program, is an international network of operational centers jointly responsible for the creation and maintenance of computer-based data banks. The objectives of the ISDS system are:

a. To develop and maintain an international register of serial publications containing all the necessary information for the identification of the serials.
b. To define and promote the use of a standard code (ISSN) for the unique identification of each serial.
c. To facilitate retrieval of scientific and technical information in serials.
d. To make this information currently available to all countries, organizations, or individual users.
e. To establish a network of communications between libraries, secondary information services, publishers of serial literature, and international organizations.
f. To promote international standards for bibliographic description, communication formats, and information exchange in the area of serial publications.

The ISDS is designed as a two-tier system consisting of an International Centre (IC) and National and Regional Centres. The ISDS International Centre is established in Paris by agreement between Unesco and the French Government. It is temporarily located at the Bibliotheque Nationale. The ISDS-IC will establish an international file of serials from all countries.
This file will be limited, initially, to scientific and technical publications, and will be gradually extended to include all disciplines. Each serial will receive an International Standard Serial Number (ISSN), which has been developed by the International Organization for Standardization (ISO). Products which could be derived from the International Serials Data System are as follows: Titles Index; ISSN Index; ISDS Register of Periodicals (Register); Classified Titles Index (CTI); New and Amended Titles Index (N&AT); Cumulated New Titles (CNT); Permuted Index; Microform Reference File (MRF). A magnetic tape service will be provided of the current master file, and of the new and amended titles. The responsibility for the establishment of National or Regional Centres belongs to Unesco member states and associate members who wish to participate in the UNISIST program. Upon establishment, each National Centre will obtain a block of ISSNs from the International Centre and will gradually take over the responsibility for the registration of serials published in its territory. A regular information exchange program will be established between the national centers and the international center. The international register will thus be a regularly updated cumulation of the initial file established by the IC and the National or Regional files. Serials published in countries with no National or Regional Centres will be registered by the International Centre, which will endeavor to obtain the necessary information. The relationship with users of ISDS is primarily through National or Regional Centres, but this general rule does not exclude direct contact with the International Centre. The building of a consistent international file of serials implies close cooperation between all members of ISDS.
The work in all countries will be based on a common set of rules concerning bibliographic description, communication format, character sets, abbreviations, transliteration, etc. Coordination between all members of the system is one of the main tasks of the International Centre. Close cooperation has also been established with various international organizations, the objectives of which are closely related to those of ISDS. In November 1972 the Director-General of Unesco informed member states of the creation of the International Centre and invited them to cooperate in ISDS by establishing national or regional centers. To assist in the creation of these national or regional centers, provisional guidelines were made available. These guidelines are at present being finalized and will shortly be widely distributed in English, French, Spanish, and Russian. The response of member states was most encouraging, and to date the following countries have set up or are in the process of setting up national or regional centers: Argentina, Australia, Austria, Canada, Colombia, Dahomey, France, Federal Republic of Germany, Guatemala, India, Italy, Malta, New Zealand, Nigeria, Philippines, Union of Soviet Socialist Republics, United Kingdom, and United States of America. For further information and ISSN assignment contact the ISDS International Centre, Bibliotheque Nationale, 58 rue de Richelieu, Paris 2eme, France.

ADL to Conduct Study of the Data Base Publishing Industry

Arthur D. Little, Inc., the Cambridge, Massachusetts, consulting firm, is launching a major study of the data base publishing industry. The study, which will be available on a subscription basis, will cover present and future technology utilization, economics, markets, and business and competitive structure.
More specifically, the study will:

• Characterize typical data base publishing activities in terms of markets, products, sales strategies, methods of data base collection, distribution, etc.;
• Identify the current and expected roles of private industry sectors, government, and professional associations;
• Analyze existing and latent markets for data base publishing ventures and estimate market growth over the next five years;
• Describe criteria for analyzing the economics of data base publishing services and pricing them;
• Review hardware, software, and developments likely to affect the industry in the next five years, including the emergence of lower-cost switched data networks;
• Describe the probable impacts of public policy and regulatory developments, including copyright legislation, patentability of software, and concern over protection of confidentiality of personal information; and
• Characterize the reasons for past failures of certain data base publishing ventures and propose strategies for successful involvement.

The study will be directed by Vincent Giuliano and Robert Kvaal. Dr. Giuliano has extensive experience working with major information dissemination systems, ranging from libraries to telecommunications-based computer systems. He has led a variety of systems development, systems analysis, evaluation, and market research projects at ADL. Mr. Kvaal has focused his recent work on strategic planning issues facing computer services companies, and on assisting computer users in financial institutions, retail, and distribution companies. This work has included operational and management audits, planning and implementation assistance, management information systems development, and the overall design of a nationwide teleprocessing system.
According to Giuliano and Kvaal, data base publishing enterprises tend to evolve through well-defined stages of automation and business development: maintenance of manual data bases (reports, clippings, etc.) and the manual preparation of conventional printed products; partial computerization of the data base and some computer usage in preparation of conventional printed products; considerable automation of the data base and output process; offering of information retrieval and specialized search services on an overnight or phone call basis; and offering direct access to the data base via remote computer terminals. "But," Giuliano and Kvaal note, "the growing tendency of data base enterprises to evolve along this scale is creating dislocations in many of them, while at the same time offering new opportunities for participants and suppliers. This uncertainty makes a study such as ADL's especially useful at this point in the industry's development." The results of ADL's study will be presented to clients in published form and in group meetings held in appropriate locations. The cost to each subscriber is $2,000. Additional information may be obtained from Philip A. Untersee (617-864-5770).

PERTINENT PUBLICATIONS

New 1973 ACM Publication Catalog

The new, expanded thirty-four-page Publication Catalog of the Association for Computing Machinery has been released. The catalog covers technical publications in over thirty major segments of the computing and automation field. Copies are available upon request by writing to: Publication Services Department, Association for Computing Machinery, 1133 Avenue of the Americas, New York, NY 10036.

Proceedings of 1973 National Computer Conference

The Proceedings of the 1973 National Computer Conference & Exposition are now available from the American Federation of Information Processing Societies, Inc. (AFIPS).
The Conference Proceedings, Volume 42, contains more than 160 technical papers and abstracts covering a wide range of topics in Computer Science & Technology and Methods & Applications featured at the recent '73 NCC, June 4-8 in New York. The price of the 920-page hardcover volume is $40. A reduced rate of $20 is available for prepaid orders from members of the AFIPS constituent societies stating their affiliation and membership number. Copies of the Proceedings may be ordered from AFIPS Press, 210 Summit Ave., Montvale, NJ 07645.

Computerized Serials Systems

The LARC Association announces a new publication series entitled Computerized Serials Systems. Each volume in the series will consist of six issues published at bimonthly intervals in both paperback and hardbound editions. Each issue will be authored and edited by a person directly affiliated with the project reported, and each issue will be devoted to papers relating to an automated serials project undertaken by a specific library. The format of the new series is designed to promote understanding through clear narrative description and extensive illustrative materials. For details concerning the purchase of individual issues or a subscription to the complete volume, contact LARC Press, 105-117 West Fourth Ave., Peoria, IL 61602.
Oh, of course some names are missing and are missed- Mooers, Taube, Fairthorne, Perry and Kent, Bar-Hillel, Bush, Shaw-but enough of them are here to give a full flavor of the times. The question is whether, as a collection, this set of papers has value be- yond nostalgia. Before turning to that question, however, let's see what they consist of. The volume groups nineteen papers into four categories: (1) Background and Philosophy, (2) Information Needs and Systems, ( 3) Organization and Dissemi- nation of Information, and ( 4) Other Areas of Interest. The first includes pa- pers by Borko, by Shera, and by Otten and Debons that attempt to define infor- mation science, its relationship to librari- anship, and its potential as an indepen- dent discipline. The second includes pa- pers by Weinberg, by Murdock and Lis- ton, by Taylor, by Parker and Paisley, and by Kertesz that outline the purposes and functions of information transfer, especial- ly for the sciences. The third includes pa- pers by Doyle, by Fischer, by Conner, and by Rees that present some of the techniques which have been developed for handling, organizing, and presenting information-especially mechanized ones such as KWIC indexes, automatic index- ing and abstracting, and SDI. The final section presents a potpourri of topics: a paper by Lipetz on information storage Book Reviews 269 and retrieval, one by De Gennaro on li- brary automation, one by Garvin on na- tural language, one by Borko on systems analysis, and one by Heilprin on technol- ogy and copyright. The defined purpose of this collection is to serve students and instructors in in- troductory courses in information science, by making these key papers readily avail- able as assigned readings. They indeed are useful readings, and the organization imposed on them by the editor, Elias, adds greatly to their usefulness, making them far more than a simple chronological listing. 
Despite this, however, I must confess that, as the instructor in an introductory course in which we used the Key Papers for the purpose for which it was intended, it fell short of meeting the needs. Since then, I've tried to evaluate why. Recognizing that the difficulties may have been due to the style of the instructor and the form of the course, the fact is that any collection of readings, valuable though the readings individually may be, has many deficiencies. I suppose they can all be summed up as follows: a collection of papers has the appearance of a book without being a book. It lacks congruity; it lacks balance; it lacks inherent structure in contrast to that which is imposed; it lacks a theme or point to be made; it lacks a consistent style. As a sometime publisher, as an editor of a series of books, and as a reviewer of prospective manuscripts, I have felt that these things are as important in evaluation as substance and content. Beyond this, a more important fact is that these papers, "key" though they are, represent the past, not the present. An introduction to information science requires reading assignments in the work of today, not just those of historical importance. On the other hand, the fact remains that these are important papers, ones with which students should become familiar, and not simply for historical purposes, and that most instructors and classes should find this a useful volume.

Robert M. Hayes
Becker & Hayes, Inc.

Hoist by Their Own Petard

A funny thing happened at ALA Midwinter. What's more, it was fascinating as well, for it was one of the loveliest examples of "communications dysfunction" I've ever seen. (Dysfunction: impaired or abnormal functioning.) Librarians, information scientists, have always been concerned with the transfer of information. In recent times, this concern has been explicitly identified as constituting the major component of the profession's domain.
Whether one interprets information to be the book, and discusses its transfer in terms of acquisitions, circulation, and interlibrary loan, or one interprets information to be datum, and discusses transfer in terms of access, retrieval, and transfer, the fact remains that information transfer is the area of concern of the information profession. Yet, as is already evident from the paragraph above, the medium being used to relay the message, the unit which is basic to the process of information transfer, i.e., the word, is a fractious thing. One would think that informationalists would be among the most alert to this frailty of language; yet, though the problem has been addressed at great length by a great many, members of our profession have not been predominant among them. We, too, use words ever more loosely, violate structure ever more often, and transpose jargon ever more freely, unaware, and, apparently, uncaring that in the process we are vitiating the very foundation of our field. And thus, at the Palmer House in Chicago, during a very balmy January Midwinter Meeting of the American Library Association, a select group of professional practitioners who had gathered together to work together found themselves caught in their own trap. They were unable to communicate! Information specialists: listening without hearing, reading without comprehending, talking without communicating. It was almost frightening. "Network" concerns got defined in terms of the need for reimbursement for interlibrary loan. The phrases "data base interchange," "machine-readable record exchange," and "networking" were being used interchangeably, engendering damaging misconceptions. The distinction between "contract negotiation assistance" (which CLR will provide the Anable serials group) and "contracting" (which CLR is not doing here) was not made. Legislative "networks" described procedural, not substantive, activity.
The jargon of Internal Revenue Code section 4942(j)(3) (operating foundation) and the jargon of the technical sector (operations) were interpreted as being synonymous. And the word standard lost its identity altogether. The irony is overwhelming. Like the old adage about the shoemaker's children who don't have shoes, it would appear that it is the information specialists who cannot communicate.-Ruth L. Tighe, New England Library Information Network

Journal of Library Automation Vol. 7/1 March 1974

Institutional Political and Fiscal Factors in the Development of Library Automation, 1967-71

Allen B. VEANER: Stanford University, Stanford, California.

This paper (1) summarizes an investigation into the political and financial factors which inhibited the ready application of computers to individual academic libraries during the period 1967-71, and (2) presents the author's speculations on the future of libraries in a computer dominant society.* Technical aspects of system design were specifically excluded from the investigation. Twenty-four institutions were visited and approximately 100 persons interviewed. Substantial future change is envisaged in both the structure and function of the library, if the emerging trend of coalescing libraries and computerized "information processing centers" continues.

SUMMARY OF MAJOR FACTORS WHICH INHIBITED THE APPLICATION OF COMPUTERS TO LIBRARY PROBLEMS, 1967-71

Major factors which inhibited the application of the computer to the library during the period 1967-71 can be categorized under three broad headings: (A) Governance, organization, and management of the computer facility; (B) Personnel in the computer facility; and (C) Deficiencies in the library environment.

A. Governance, Organization, and Management of the Computer Facility

1.
Uncertainty over who was in charge of the computer facility.-This problem was partly attributable to the fact that the goals and objectives of the facility were imprecisely stated or not stated at all. Often there was no charter, no systematic procedures for establishing priorities, and excessive autonomy by the computer facility. These factors often permitted the facility to operate as a self-directing, self-sustaining entity, responsible to no informed, upper level manager.

* The paper is based on a CLR Fellowship Report to the Council on Library Resources, Inc., for the period January-June 1972.

2. Effect of high level administrative changes.-In a few instances, the library automation effort was instigated by the president of the institution. He could, in effect, personally direct the allocation of resources. However, whenever a high administrative official leaves, the resulting vacuum is quickly filled by other interests, the atmosphere changes, and his personal program goals dissolve.

3. Management inadequacies.-The effects of domination by a technician or special interest group are described below in more detail. Although more and more organizations are putting together influential user groups to point the way toward better management, decision-making responsibility and authority continued to be misplaced in a few institutions which vested authority for technical decisions in a committee of deans who were somewhat remote from current trends in computing because of their administrative responsibilities. (In one institution, it was half jokingly stated that a dean in any hard science could be characterized as suffering from a minimum technological time lag of two years.)

4.
Lack of long-range planning inclusive of attention to community priorities.-Few facilities visited had any written long-range plans, either for the acquisition of hardware, the conversion of older programs, or the involvement of users in systems design. Ad hoc arrangements were prevalent.

5. System instability.-This was more the rule than the exception, especially in software, operating systems, hardware configuration, and pricing. Wherever an academic computing facility was used for library development, the same broken record always seemed to be playing: the facility was always being taken apart and put together again. Of course library development was not the only user affected; complaints arose from all users.

6. Biased pricing algorithms.-In the academic facility, student and research use were competitive. Hence systems were typically geared to distribute computing resources around the clock in some equitable and rational way. For instance, short student jobs were sometimes given a high priority for rapid turnaround, while long, grinding calculation work was pushed off to the evening or night shift by means of variable pricing schedules or algorithms. A pricing algorithm is basically a load leveling device to smooth out the peaks of over-demand and the valleys of under-utilization which would have occurred in the absence of such controls. Devising pricing algorithms is by no means a simple task, since many factors must be taken into account: the kinds of machine resources available, their respective costs, the data rates at which they can function, market demand, hardware and software available, and system overhead, to name but a few. Library jobs tended to suffer in both batch and on-line processing.
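The load-leveling idea behind such pricing schedules can be sketched in modern terms. The following is a minimal, hypothetical illustration, not a reconstruction of any facility's actual algorithm: the rates, shift boundaries, and the `job_price` function are all invented for the example, to show how charging more per CPU-second during the prime shift pushes long jobs to off-peak hours.

```python
def job_price(cpu_seconds, hour, priority="normal"):
    """Illustrative time-of-day pricing (hypothetical rates).

    Charges more per CPU-second during the prime shift, so that
    long batch jobs are cheaper to run in the evening or at night.
    """
    base_rate = 0.10  # dollars per CPU-second (invented for the example)
    if 8 <= hour < 17:       # prime shift: discourage long jobs
        multiplier = 2.0
    elif 17 <= hour < 23:    # evening shift: standard rate
        multiplier = 1.0
    else:                    # night shift: encourage deferred batch work
        multiplier = 0.5
    if priority == "rush":   # short jobs paying a premium for turnaround
        multiplier *= 1.5
    return round(cpu_seconds * base_rate * multiplier, 2)

# The same hour-long batch job costs four times as much during the
# prime shift as it does on the night shift:
print(job_price(3600, hour=10))
print(job_price(3600, hour=2))
```

Under a schedule like this, a large library file-maintenance run is priced off the prime shift, which is exactly the effect the author describes.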
In the former case, because batch jobs on large data bases took so much time, library work generally could not be done during the prime shift; in the latter case, an on-line library system made substantial demands upon a facility's storage equipment and telecommunications support, and competed with all other on-line users.

7. Sense of competition with the library for hard dollars.-This problem, which is related to pricing bias, is detailed further on page 21.

8. Scheduling problems.-Many of the institutions visited had systems or charts for scheduling production, development, and maintenance. But conversations with system users often verified that schedules were either not met or had been unrealistically established. This was especially the case with development work.

B. Personnel in the Computer Facility

1. Selection and evaluation.-Inasmuch as the library often did not have the competence to judge personnel nor the ability to generate meaningful specifications, there was generally very little protection from incompetence in this area.

2. Elitism: The notion that the masters of the computer are inherently superior to and have better judgment than computer customers.-Elitism is a paradox: it can be positive or negative; positive when the best brains produce software designs of true genius with respect to function, performance, economy, and reliability, but in its negative manifestation, reminiscent of the girl with the curl in the middle of her forehead: "When she was good, she was very, very good; when she was bad, she was horrid." During the boom years when computer facilities were expanding faster than the supply of competent staff, elitism seemed fairly common in the computer center.
The excitement of rapid development, the seemingly unlimited intellectual challenge presented by the powerful apparatus, and high strung dispositions sometimes caused tempers to flare or immaturity to sustain itself beyond a reasonable time. Strange hours, strange habits, bizarre behavior, all seemed to conspire against ordered and rational development. Fortunately, as the field matures, the negative aspects of elitism are dying; managers now can concentrate on staff development work to turn top intellectual talents toward productive achievement.

3. Disinterest.-This factor may be allied to elitism. In some instances, the computer center's staff gave considerable attention to the library during the period immediately following machine installation, when utilization was low. Later, the staff's keen interest became "dulled" at the thought of operating a production system. "More interesting jobs" were challenging the programmers and beginning to fill up the machine.

4. Fear of the unknown big user.-It was recognized early that the library could be among the computer facility's largest potential customers, perhaps the largest. In some facilities, this recognition may have induced a fear of being taken over or overwhelmed by the user, who would then be in a position to dominate and dictate the direction of further development and operations.

5. Fears of an unknown production environment.-Simply expressed, a production environment removes much of the stimulus for creative approaches to problem solving unless continuous development is maintained for new systems and new applications. Many of the best programmers did not wish to lose their freedom to innovate and actively resisted participation in establishment of a production environment, with its concomitant requirement of "dull" maintenance support work.

C. Deficiencies within the Library Environment

1.
Failure to understand in full detail the current manual system.-Even where the manual system was understood, there was often an inability to describe it in the clear, unambiguous style essential to system design work. These deficiencies were further compounded by the unwillingness of some librarians to learn how to communicate adequately with computer personnel.

2. Inability to communicate design specifications.-Many did not understand how to put together a specification document; particularly they did not know how to account exhaustively for all possible cases or alternatives. Librarians were unaccustomed to defining their data processing requirements quantitatively or with precision, both absolutely indispensable to the computer environment. Also, as much as the computer facility changed its software environment, many library development efforts were constantly changing their system requirements, a condition which made it all but impossible to program efficiently.

3. Failure to understand the development process.-Development is a new phenomenon in libraries. Most librarians were not educated to comprehend development as an iterative process, characterized by experimentation, error, feedback, and corrective measures. Accustomed to the relative stability of long-established procedures, some of which had stood for generations, even centuries, some librarians were baffled by the rapidly changing new technology; others showed impatience and a low tolerance for frustration. Many expected development projects to resemble turnkey operations, and the failure of the process to accommodate these expectations produced disappointment and an inability to cope with the computer environment.

4. Failure to recognize the computer as a finite resource.-Both librarians and early facility managers seemed to look upon the computer as an inexhaustible resource, the former through lack of sophistication and the latter apparently through myopia or possibly ambition.
Some managers must have told their users that there was "no way" their equipment could be saturated in the foreseeable future. Apparently some library users were naive enough to believe.

5. Excessive or unrealistic performance expectations.-Few library users understood the relationship between the system specifications and functional results, and fewer still understood the significance of performance specifications. The situation was not assisted by notions of "instantaneous" retrieval pushed by salesmen or the popular press. (The writer recalls vividly how one salesman told him the library could have a CRT device for $1 a day! And indeed, the device itself was $1 per day if one cared to do without the keyboard, without cables, installation, control units, teleprocessing overhead, a computer, software, etc.)

6. Lack of an established tradition of research and development (R & D) and the lack of venture capital in the library community.-The challenge of the computer may have been largely responsible for activating research and development as a serious and continuous effort in librarianship. Inexperience in raising and managing funds for R & D, as well as a general lack of knowledge of computer cost factors, inhibited progress or tended to make the development effort inefficient and full of surprises.

7. Human problems.-Some libraries having prior experience with small batch systems underestimated the scale of effort for contributing to the design of the large system, selling it to the users, installing it, and training the users.

8. Insufficient support from top management.-In some instances, library management did not accord the automation effort the kind and degree of support essential to success. In particular, some librarians seemed to feel that automation was a temporary affair, definitely of less importance and significance than current manual operations.
Some did not recognize the sacrifices in regular production that would be necessary and some did not appreciate the continuing nature of development work.

BACKGROUND

Two important prerequisites to progress in library automation were money and technical readiness. The government supplied the first, industry the second. The announcement by IBM in 1964 of its System 360 occurred at a fortunate time for the American library community. President Johnson's administration had launched enormous programs in support of education. The Library Services and Construction Act was soon to channel millions of dollars into library plant expansion and, perhaps more significantly, the Higher Education Act of 1965 was to sponsor research, which until then had only the support of limited funds from the Council on Library Resources, Inc., and the National Science Foundation. (Support from the National Science Foundation was largely, although not exclusively, directed toward discipline-oriented information services; one of the largest NSF grants went to the University of Chicago Library.) It was the right time to invest in library automation. Important milestones were already behind the library community: the National Library of Medicine's MEDLARS program was well underway, the Airlie Conference on library automation had been held and its report published ("the White Book"), and the Library of Congress automation feasibility study ("the Red Book") had appeared (1, 2). The first MARC format was being tested in the field. In computer technology, third generation equipment represented major increases in computing power, processing speed, reliability, and capacity to store data in machine-readable form. IBM's sales force was successful beyond imagination in getting System 360's installed in large universities, as well as in business and government.
IBM promised a new kind of software, time-sharing, which would virtually eliminate the tremendous mismatch of data processing speed between the human being and the machine. The new methods of spreading computer power through teleprocessing and time-sharing promised to make the computer at least competitive with and possibly an improvement over "antiquated" manual systems of providing rapid access to large and complex data files. Within this relatively unknown environment, universities and libraries entered the software development process, which, if successful, could enable them to catch up where they had been hopelessly falling behind. Circulation, book purchasing, and technical processing loads in many libraries seemed to double and triple overnight as the country's schools and their programs grew to accommodate expanding enrollments. Manual systems that had been reasonably workable and responsive in environments characterized by slow growth demonstrated significant and disturbing defects: the inability to deal with peak loads, or rapidly changing loads. The same effects were felt in administrative and academic computing: a bigger and more complex payroll, more students to register, construction contracts to monitor, more research grants which demanded bigger computers, and so on. These were truly boom years. But in the academic community there was still another force developing which was ultimately to be of even greater significance for libraries than the inconveniences of being unable to handle the housekeeping load: a dramatic rise in the expectations of patrons, especially in the academic community, where computers already abounded. Libraries had come to be felt by some as strongholds of conservatism and expensive luxuries; librarians were faulted for not "putting the card catalog onto magnetic tape," for not implementing automated circulation systems, or otherwise failing to take advantage of new and powerful data processing techniques.
The libraries were caught amidst a variety of sometimes conflicting, sometimes complementary factors: the visionary ignorance of the computer salesman, the senior academic officer possessed by the computer dybbuk, a lack of sympathy or understanding among some computer center managers, a lack of appreciation by students and faculty of the complexity of identifying, procuring, and cataloging unique copies of what must be the least standardized product known to man, and their own lukewarm commitment to undertake the hard work required to learn how to use the computer resource. Anxieties about job displacement caused some library staff to look upon computers with trepidation, thus further placing the librarian in a defensive position. While these forces were taking shape, the library's bibliographic activities continued to be seriously hampered by inadequate international bibliographic control.** Some essential computer hardware, especially the programmable CRT terminal with an adequate character set, was either nonexistent or totally unsuitable to library applications. In this institutional context librarians entered the world of computers and data processing.†

PURPOSE

It is the purpose of this report to examine in some detail how internal institutional factors affected the development of computerized bibliographic systems, and especially to consider nontechnical, negative factors: what slowed down or inhibited the applications of computers in librarianship? This report is not concerned with the merits or demerits of specific systems or their features; indeed, the investigator did not inquire about system specifications. Major questions centered about the factors which fostered or hindered the development process, regardless of the merit of a project or system.
SCOPE

Investigation was limited almost solely to those institutions considered likely to have large scale, in-house development projects using third generation computer equipment. The majority of places visited were large academic libraries. The time span included in the survey begins approximately in 1967 and ends in 1971. A total of twenty-four institutions was visited and some 100 persons interviewed; a list of the institutions visited is in Appendix 1.

METHODOLOGY

Site Visits and Interviews

Arrangements were made to visit four types of individuals: the director of libraries, the head of the library's system development department, the director of the computation center, and whatever principal institutional officer was managerially and/or financially responsible for campus computing. Considerable variation was found in the type of person assigned this last responsibility; it could be the provost, the vice-president for academic affairs, or the vice-president for business/financial affairs. Choice of the major institutional official to be interviewed was often determined by the pattern of computing in a particular institution, or the facility which supported the development effort.

** Implementation of the Library of Congress' Shared Cataloging Program under Title II of the Higher Education Act of 1965 was soon to alter this situation dramatically.
† The painful trauma libraries and librarians experienced in getting into computers is too well documented to summarize here. Perhaps the best summary has been done by Stuart-Stubbs (3).

At first the investigator attempted to utilize a structured questionnaire for interviewing. This very quickly broke down, as the interviewees were generally voluble and ranged widely over many related topics or items which they would have been asked about later.
Accordingly, after the first few interviews, the formal questionnaire approach was dropped and a simple checklist of major questions kept on a few cards to make sure that each major issue had been addressed. Every interviewee received the investigator graciously and none was unwilling to talk; indeed, if anything the opposite was the case: most persons seemed to be eagerly waiting for an opportunity to air their views. Visits and interviews occurred during the period January-April 1972.

Literature Searches

Searching the literature on this topic has been extremely frustrating. In the literature of computer science and management, there are many articles on pricing algorithms, machine resource allocation schemes, and issues of managing the computer facility, but none specific to the topic of this report. Besides scanning professional literature, the author has regularly conducted for the past year monthly computer searches via the UCLA Center for Information Service's SDI Service. Abstracts and citations were searched in Research in Education (RIE) and Current Index to Journals in Education (CIJE). With respect to problems faced by the library in acquiring computer services, the results have been nil in both cases. The author reluctantly concludes that no major recent studies have yet been published in this sensitive area, although two papers by Canadian librarians are very helpful (3, 4). The National Academy of Sciences/Computer Science and Engineering Board's Information Systems Panel appears to have come closest to identifying the issues in its report, Library and Information Technology: A National Systems Challenge. Still, the comments in that report are highly generalized and do not grapple with specifics (5).

STRUCTURE OF EDUCATIONAL COMPUTING

Most of the visited institutions maintained separate facilities for administrative and academic computing, while a few ran combined facilities or were in the throes of consolidating their facilities.
The differences between administrative and academic computing have historical roots deeply embedded in institutional soil. Administrative computing is usually an outgrowth of punched card installations first set up for payroll and financial reporting. Academic computing, on the other hand, has its origins within the institution's instructional and research programs. Typically it has been supported by external grants and contracts and has been oriented toward the "hard" sciences. Until the recent dropoff in federal support of higher education, academic computing was a money maker (through the overhead on grants and contracts) while administrative computing was a money spender.

ADMINISTRATIVE COMPUTING

Typically very little computational work is done in administrative applications; most of the computer work is associated with input, update, reading records, writing records, and printing reports. Except for the payroll application, the consumer group has tended to be somewhat smaller and less transient than the academic group. But to university administrators the computer could do much more than write checks and pay bills. Many significant administrative applications had already been installed on second generation equipment: faculty-staff directories, inventories of space, supplies, and equipment, records of grades, course consumption reports, etc. All these tended to expand the user group, increasing competition for the resource. The advent of third generation equipment made it attractive for administrators to think about applications centered around the so-called "integrated data base." This led to a demand for further new services for the registrar, fund raising and gift solicitation, student services, purchasing, etc.
Conventional administrative computing, particularly that part of it which generated regular reports, lent itself naturally to batch processing, and indeed many of the early computer installations actually continued established punched card operations, merely using the computer as a faster calculator and printer. The administrative computing shop is typically characterized by (or hopes to be characterized by) great systems stability and dependability, a cautious and measured rate of innovation, and, in the opinion of some academic computing types, not much imagination. File integrity, backup and recovery, and timely delivery of its products are prime goals in an administrative computing system. The administrative computing facility very much resembles the library in two important aspects: (1) it is a production system; and (2) it is almost entirely an overhead function, i.e., there is little or no attempt at cost recovery from system users for its services.

ACADEMIC COMPUTING

Academic computing is a much different world. It serves a large, vociferous, influential, and mostly technological user community, many of whom are not only competent in programming, but more importantly, possess ready cash. But this is changing: as academic computing expands to service users in the humanities and social sciences rather than mainly those in the "hard" sciences, the user group is growing and it will probably not be long before it embraces the total academic community. In hard science applications, the academic facility typically performs an enormous amount of computing ("number crunching") with a relatively small amount of output. System backup and recovery is important to the academic computing facility, but file integrity responsibility may often be assigned to the user since such a center sometimes does not maintain the data base but merely provides a service for manipulating it.
The main components of academic use are department- or discipline-oriented research and student instruction, the latter being particularly strong if there is a well-established computer science department. Software development has customarily played a major role in academic computing, and the usual practice was to actively seek out imaginative systems programmers for whom change and system improvement are food and drink. Consequently, instability, both in hardware and software, has been more the rule than the exception in the recent past, although as the management of computer facilities matures, this too is changing.

CURRENT TRENDS AND STATUS

It is obvious from the above that administrative and academic computing have been characterized by diametrically opposed machine and managerial requirements. Where they have been combined in the same facility, tensions have prevailed and neither user was happy. In a few instances known to the writer, such combinations have been abortive and a reversion made to divided facilities. But as computing matures it is becoming evident that operational stability is needed for all types of computing, not just administrative computing. Additionally, the financial crises now prevalent in institutions of higher education have brought more realistic attitudes to the fore in understanding just what kinds of facilities can be afforded, and how they should be managed. Moreover, the economies of scale, the increasing flexibility of hardware, and the growing sophistication of software are now combining to form an environment which can better satisfy all potential users of computers. There are clear indications that a unified, well-managed shop with competent staff might now economically and efficiently serve a variety of applications, including administrative and academic, on the same facility. However, this is a developing trend and does not correspond with what the writer actually observed during his visits.
In situ he saw much evidence that Anthony Oettinger's observations of some years ago were still valid:

... routine scheduled administrative work and unpredictable experimental work coexist only very uneasily at best, and quite often to the serious detriment of both. Where the demands of administrative data processing and education require the same facilities at precisely the same time, the argument is invariably won by whoever pays the bills. Finances permitting, the loser sets up an independent installation. (6)

Indeed, it would not be unreasonable to conclude from the interviews that in most places visited, computing during the period 1967-71 was in a state of disarray. There is abundant and disagreeable evidence of technical incompetence, lack of management ability, ill-spent money, communication failures, and naive and disillusioned users.

Institutional Political and Fiscal Factors / VEANER 15

But it would be a mistake to conclude that the failures in library automation are attributable primarily to computer-oriented personnel or hardware problems; librarians in their own way displayed many of these same failures. It would be another mistake to dwell excessively on the high failure rates observed. In any complex technological endeavor, the rate of failure is dramatically high at the beginning; there is ample evidence here from the aircraft and space industries. Indeed, the likelihood of a first success in anything complex (library automation is complex, as we have learned the hard way) is practically nil.

ORGANIZATION AND MANAGEMENT PROBLEMS: THE ACADEMIC COMPUTING ENVIRONMENT

Early academic computing facilities were typically run by faculty members in engineering, applied mathematics, computer science, or related fields. This arrangement was satisfactory when computers were small, relatively primitive, and the user community was confined to those few people who could program in machine language or assembly language.
As equipment became bigger and more powerful, and as higher level programming languages developed, more and more people learned programming. Correspondingly, the task of managing the computer facility grew rapidly in size and scope. The budget of a large computer center in a modern university can easily run to several millions of dollars annually. The manager must balance seemingly innumerable, complex forces: personnel, management, government and vendor relationships, demands from vocal users, establishing priorities, the challenge of hardware advances, marketing, pricing services, balancing the budget, etc. It soon became clear that few faculty members possessed either the multifaceted talents or the experience required for effective management.

As the center's budget grew, and particularly as the shift was made from second to third generation equipment, the faculty member tended to be replaced by the technician as manager. Unfortunately for many of the facility users, the technician tended to promote his own technical interests in software development or hardware utilization. In some instances, the user community felt that the facility was being run more for the benefit of the staff than for the users. The technician-manager often looked at the computer as his personal machine, much as some faculty members had earlier felt the computer to be their own private preserve. The vice-president of one university expressed the view that the technician-manager doesn't really have an institutional loyalty tied to the goals and objectives of the academic programs; he is more loyal to the machine or the software. In a school with a long history of computer utilization, there had been no technician in charge of the computer facility for a decade.
Yet in a school not too far away, an officer indicated that his institution had "made the same mistake twice in a row" by hiring a technician to manage the computer facility. The technician-manager represents a highly personalized management style, one in which goodwill, friendship, or personal interest is the key to effective service. It can hardly represent an arrangement for the successful development and implementation of computerized bibliographic systems.

In the third and current organization and management phase of academic computer facilities, the professional manager is in charge. Schools are now beginning to see the need to develop formal charters for their computing centers, quasi-legal instruments which will lay out their specific responsibilities as service agencies. A professionally managed service agency eliminates one of the most irritating elements in the allocation of computer resources: personal judgment by the faculty or technician-manager as to the worth of a project, which was so prevalent during earlier management stages. At the time of the interviews, very few institutions actually had such charters, but their need was being recognized. It is now universally accepted that the computer center can no longer be the plaything of the faculty nor the expensive toy of the technician.

ORGANIZATION AND MANAGEMENT: THE ADMINISTRATIVE ENVIRONMENT

Because of its historical development the administrative computing facility was usually first run by someone with an accounting or financial background. (Academic computing persons occasionally put disparaging labels on such people as "EDP-types" or characterized them as having a "punched card mentality.") The nature of the workload virtually meant that the administrative shop would be set up mainly for batch processing and any data base services provided for other users would involve printed lists.
Such facilities were found satisfactory by a number of libraries even for applications such as circulation, which produced gigantic lists, probably because it represented a vast improvement over an antiquated, poorly designed, or overloaded manual system.

However, there was at least one major technical consideration which had direct political and financial implications for the library which turned to the administrative computing facility for its computer support. This was the library's need to support and manipulate a data base with nearly every data element of variable length, a requirement that was practically nonexistent in administrative computing. Some facilities were unable or unwilling to meet this requirement.

The move from tape-oriented systems to mixed disc and tape systems on third generation equipment necessitated an upgrading of programming staff, and brought into the administrative shop the same clearcut distinction between system programmers and application programmers which had emerged earlier in the academic shop. This change in turn demanded appointment of more knowledgeable facility managers, many of whom were drawn from business and industry rather than the ranks of in-house accounting staff.

This transitional period was characterized by two enormously challenging parallel efforts: the conversion of existing programs to run on third generation equipment and the development of new applications. To an extent these responsibilities were competitive, and from this viewpoint it was certainly not a propitious time to embark upon anything as complex as bibliographic data processing. Yet numerous workable systems emerged for circulation, book catalogs, ordering and accounting systems, and serials lists.
These were not accomplished without anguish, as the library did not control the machine resources and often did not control the human resources: the facility manager tended to make his priority decisions to please his boss, who was certainly not the librarian. Besides, no application could really take precedence over payroll or accounting in the administrative shop. To the librarian it was more like borrowing another person's car than renting or owning a car: when the resource was urgently needed someone else had first call.

ORGANIZATION AND MANAGEMENT: THE LIBRARY AUTOMATION ENDEAVOR

A detailed study of this subject is not within the scope of this investigation. However, it will be useful to note that the organization and management of library automation activities demonstrate development phases which closely parallel those in the computing environment:

1. A stage in which the user himself (cf. accountant or faculty member) undertakes to perform the activity. In this stage individual librarians learned programming, did their own design work, wrote, debugged, and ran programs themselves. (This was possible in the "open shop" environment prevalent in many early computer facilities.)

2. A stage in which the technician (in this case a librarian with appropriate public service expertise for circulation applications, or technical processing knowledge for acquisitions, cataloging, or serials) took charge of an organized development effort, hired his own programmers and systems analysts, and negotiated directly with the computer facility.*

3. A stage in which the professional system development manager is hired to oversee the total effort. Such a person is sometimes drawn from business or industry, is a seasoned project manager, and has broad knowledge of computers, especially in the area of costs. Such an appointment is more common in the large library, the consortium, or network.

* The technical person need not be a librarian. Northwestern University represents a significant instance where a faculty member in Computer Sciences and Electrical Engineering undertook the development effort.

HUMAN PROBLEMS ASSOCIATED WITH RAPID CHANGE IN INSTITUTIONS

Some institutions, particularly in their administrative functions, became embroiled in a seemingly endless round of internal psycho-social problems which did not make the environment conducive to problem solving. The move to computerizing manually oriented functions, whether in the library or other parts of an institution, was found to be extremely threatening to established departmental structures. It was consistently reported that the political and emotional aspects of system conversion, both in the library and elsewhere, were much more aggravating than the technical aspects. The problem simply showed up first outside the library because applications of computers occurred there earlier. Departments were sometimes unwilling to give up data for computer manipulation for fear that computerization would take jobs away. This phenomenon is not unknown in librarianship, where some professionals take an extremely proprietary attitude toward bibliographic data. Now pressures from governments, legislatures, and the academic community at large are gradually establishing the concept that some categories of data are corporate, and do not belong to a specific individual or department, or even to an institution, but should be shared through networking or other mechanisms. But the rapidity of microsocial change and its upsetting emotional consequences caught some library leaders unawares. A considerable reeducational process for both management and labor is required to smooth the transition to the new view.
MOTIVATION PROBLEMS

It is difficult to elicit sound comment concerning motivation (or lack thereof) as a deterrent to progress in library automation. It is an emotional subject, and neither the librarians nor the programmers come out "clean." The prima donna computer programmer, much in evidence in the early days of computer center development, is very much on the wane these days. Like the spoiled child, the prima donna programmer could only exist where personal interests were permitted to take precedence over social goals, or perhaps where institutional goals for the computer facility had not been clearly articulated or had not yet come into focus. Some prima donnas, partly out of ignorance, partly through a stereotyped image of library activities, were inclined to disdainfully dismiss library applications as "trivial," and demand "really challenging" assignments.

But the librarians had their prima donnas, too. Some had learned enough programming to be a little dangerous, and they then felt like peers who could tell the computer center not only what to do but how to do it. At first, few members of the library staff were willing to learn how to articulate their specifications and requirements to the management of a computer facility. Most librarians expected some kind of miraculous magic, akin to a wave of the hand, to bring a computer system to reality. Very few understood the heuristic nature of development.

So there were barriers of status, depth of knowledge, and language, any one of which would have sufficed to kill the development of the good motivation essential to breaking new ground. In the wrong combination they could present an overwhelming conspiracy, for their mutual interaction could only produce polarization and intransigence.

THE LIBRARY AND THE COMPUTER FACILITY

The Role of Similarities and Differences

For a long time the library has been the "heart of the university."
Until the advent of the computer, little could challenge the supremacy of the library as the principal resource of an educational institution. Even the faculty could be put into second place, since it was difficult to attract high quality faculty without good library resources, and the faculty were to a greater degree transient, for the library was considered "permanent," an investment for all time. The computer represents a new and challenging force in the arena where shrinking resources are allocated among competing academic users. Both the library and the computer facility have experienced exceedingly rapid growth in the recent past, concurrent with an expanded demand for services which can easily outstrip available resources. Among some of the larger academic libraries, the staff of the computer center may be half or greater than half that of the library.

Important differences between the two services have recently come into focus. First, most of the services and benefits of the library are intangible. Because of this it has always been difficult to measure the cost benefit of the library as an institution, and it is well known that counts of the number of people entering the door or the number of circulations are far from true measures of the library's functional success. The computer, on the other hand, is a relentless accounting engine; computer facilities can produce endless statistics on the number of jobs run, lines printed, terminal hours provided to users, turnaround time, cards punched, etc. The computer's output is extremely tangible and can be more directly and easily related to academic achievement than can library use.

A second major difference lies in apparently different financial roles within the institution. In most organizations, the library is run as an overhead expense, without any attempt to charge back to users or departments proportional costs of utilization.
Like air, the library resource is there for anyone to use as much or as little as he pleases; the library gets a "free ride," but the computer center is expected to pay its own way. This dichotomy is often explicitly designated as the "library-bookstore" duo model. Furthermore, since the library does not generate much in the way of research grants and contracts, it is looked upon as a consumer rather than a producer of financial resources. In fact, those who support computing in preference to books point to the fact that overhead income generated by computer-related research grants and contracts is shared with the library, which may have done little to contribute toward the acquisition of such income! In some institutions the situation has become critical indeed because of the recent substantial reductions in federal support. Much political in-fighting has been necessary to maintain current levels of computer activity, and not all such efforts have been successful. Some institutions have been forced to cut back on computing power, merge facilities, or combine resources with other institutions.

Several years ago when the National Science Foundation imposed an expenditure ceiling on grants, associated overhead income was correspondingly reduced. One computer center director was reported to have suggested that the effect of this overhead cut could be nullified by a simple, internal reallocation of funds, say by taking the needed amount from the budget of another agency on campus of less significance to researchers and scientists, such as the library. This attitude is clear evidence that the library has lost its sacred cow status as a "good thing" on the campus. It too must justify itself.

Close examination of the library and the computer facility gives clear evidence that both deal with the same commodity: information.
Within the recent past several computer facilities have changed their designations to "information processing" facilities or centers. Several institutions, notably the University of Pittsburgh and Columbia University, have coalesced the library and the computer center organizationally or have both units reporting to a vice-president for information services. The recognition and furtherance of this natural linkage may do much to reduce the potentially destructive competition which can characterize the relationship between the two units.

There are remarkable growth parallels between the two facilities: the library acquiring and processing more and more books in response to expanded publication patterns, more users, and the growth of new disciplines and interdisciplinary research, while the computation facility moves rapidly from one generation of software and hardware to the next. The expansion of both organizations produces seemingly equal capital-intensive and labor-intensive pressures: library processing staff doubles and triples, while the newly acquired books demand more in the way of housing, whether of the traditional library type or warehouse space; the computer center moves toward more sophisticated hardware, especially terminals and communications, which need to be supported by greater numbers of still more highly qualified systems programmers, communication experts, and user services staff. Both services have a marketing problem; but the computation facility, being relatively more dynamic and more interactive (because of terminal services), can be more sensitive and responsive, financially and technically, to its clientele than can the library. Only now, with the emphasis upon computerized bibliographic networking, has the library as an institution begun to approach the marketing strategies and the effective user feedback already well developed in computation facilities.
Service Capacity, Resource Utilization and Sharing

Differences both in service capacity and resource utilization represent a key political issue affecting the future of both libraries and computer facilities. In major universities, the budget for the computer facility is now not far from the library budget in size, and in a few institutions it exceeds the library budget. With the diminution of external grants and contracts, the two organizations compete for the same hard dollars. This economic competition can either drive the two facilities apart, dividing the campus, or cause them to coalesce, as has been the case at Columbia and Pittsburgh.

Despite its high operating costs, from the viewpoint of resource utilization the well-managed computer facility can almost always point to an excellent record.§ No matter how well managed, the research library can never make this claim in the context of its current materials and processing expenditures, much of which by definition is aimed at filling future needs. The library and its patrons cannot "use" all the resources at their command; the library could not even service all the patrons should they demand the use of "all" the resources. In contrast, the computer facility (particularly large on-line systems with interactive capabilities) can be very efficiently utilized even when demand is heavy. Thus, to the "objective" eye, it would appear that in the computer facility both the institution and the individual patron get more value for their dollar than they do in the library, which in comparison resembles a bottomless financial pit. One may counter that apples and oranges are being compared, but the institution which pays their bills nevertheless makes the comparison.

Flexibility, Inflexibility, and the Future

Besides better resource utilization, the computer facility offers the patron far greater flexibility of resource use than can the library.
There is no way a large collection of books on the Celtic language or the military history of the Austro-Hungarian Empire can help a professor of structural engineering, a student of marine biology, or a researcher in modern urban problems. Even the books these people actually need and use cannot easily assist others, as relevant data in them is not indexed or readily available for computer manipulation.

The point is that, unlike the library, the computer is a highly elastic universal tool, one that each user can temporarily shape to his own need, replicate the shape later, or, if he wishes, change the shape at will. The traditional library has no such flexibility; its main bibliographic retrieval device, the card catalog, is especially noted for its high maintenance cost, its limited ability to respond to complex queries, and a general fixity of organization and structure that is ever at variance with changing patron expectations and interests. (If computers can be flexible, why can't the library?)

There is much in the library that is not used because it is inaccessible: locked up in an inflexible retrieval tool, or unavailable because the state-of-the-art (both in bibliography and computer science) or staffing does not yet permit far deeper access via "librarian-negotiators" and patrons at terminals interacting with large and deeply indexed data bases. As long as major portions of the library budget and staff are devoted to housekeeping and internal technical processing, the library will look less good, less "cost-beneficial," to the academic community than does the computer facility. But there is growing recognition that both institutions deal with information processing which covers a wide spectrum of time.

§ In fact, if a computer resource is not much used and isn't "carrying its weight," it can be disposed of, by sale if purchased, or by cancellation if leased.
True, the storage formats differ, but this may be a temporary phenomenon. As progress is made on improved, less expensive conversion of data from analog to digital form and vice versa, the day may arrive when the library and the computer facility are indistinguishable.

Will the Library Become an Information Utility?

Computer utilities are an important developing trend, and it is sometimes suggested that library services could be delivered within the utility model. Utilities and libraries as they exist today have very different characteristics.

A utility can be defined as a system providing a relatively undifferentiated but tangible service to a mass consumer group, with use charges in accordance with a pricing structure designed for load leveling (i.e., optimization of resource utilization). Typically, a utility both wholesales and retails its services. Within this definition, a conventional library cannot be construed as a utility: its services are generally intangible and very highly differentiated (indeed, chiefly unique, for rarely is one book "just as good as another"); its clientele is not the general public but a highly select group which itself contains highly unequal concentrations of users; almost no libraries impose user charges in the interest of cost recovery; and, practically speaking, there is only one United States wholesaler of bibliographic data: the Library of Congress.

This situation is changing in several respects. First, the establishment of practical, computerized bibliographic networks has introduced among participating institutions cost sharing schemes closely resembling the load leveling or rate averaging algorithms prevalent among utilities.‖ These new ideas have been readily accepted by libraries and could even become the basis for balancing more equitably the costs of interlibrary loan traffic. Second, specialized "information centers" have evolved in certain fields, partially as a consequence of lack of responsiveness (or slow turnaround) by conventional library services, and "for profit" commercial services have been set up. Examples of the latter include the European S'il Vous Plait and its American counterpart, F.I.N.D. (Often such commercial services do not hire librarians, as they are considered too tradition bound.)

A third force, which is rather inchoate at the moment, may soon take on a recognizable shape: facilities management. Under such a scheme, the complete management responsibility for all or part of a function is contracted to an outside vendor. For instance, it is conceivable that some libraries in the near future may have no in-house staff for technical processing. Services would be purchased totally from a vendor or obtained from his resident staff, much as computer centers buy specialized expertise through the "resident s.e." (systems engineer). The gradual buildup of computerized bibliographic services offers an excellent opportunity for commercial ventures into turnkey bibliographic operations for libraries. This would bring the libraries one step closer to the utility concept, as they buy a complete package from a wholesaler who probably services many customers. The traditional library service concepts we know today may undergo drastic changes in financing and in methods of delivery.

‖ An example of rate averaging is the practice of the Ohio College Library Center to lump total telecommunication cost and prorate it into the membership fee, in effect creating a distance-independent tariff. (This arrangement does not hold outside of Ohio.)
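The rate-averaging idea described above can be sketched in a few lines of arithmetic. This is only a hypothetical illustration of pooling a distance-dependent cost and prorating it equally across members; the function names, distances, and per-mile cost are all invented and do not represent OCLC's actual fee schedule.

```python
def distance_based_fees(cost_per_mile, distances):
    """Each member pays for its own line: charge proportional to distance."""
    return [cost_per_mile * d for d in distances]

def rate_averaged_fees(cost_per_mile, distances):
    """Total line cost is pooled and split evenly: a distance-independent tariff."""
    total = cost_per_mile * sum(distances)
    share = total / len(distances)
    return [share] * len(distances)

if __name__ == "__main__":
    distances = [10, 40, 250]   # miles from the center (invented figures)
    per_mile = 2.0              # annual line cost per mile (invented figure)

    print(distance_based_fees(per_mile, distances))   # [20.0, 80.0, 500.0]
    print(rate_averaged_fees(per_mile, distances))    # [200.0, 200.0, 200.0]
```

The total collected is identical under both schemes; rate averaging simply shifts cost from distant members to nearby ones, which is what makes the tariff distance independent.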
Beyond the commercialized or contractual arrangement for technical processing, which is only one component of the total information flow, lie unknown territory and little explored concepts: use charges for library services (the bookstore model), the "for profit" library, the complete information delivery system integrated with computers, communication satellites, and cable TV.

If the computer-based library is to become an information utility, a major accommodation will be needed in the financing arrangements, perhaps in the form of user charges, for no utility can survive without regulated demand. An unlimited, uncontrolled demand for any product or service is untenable, for without regulation (i.e., pricing) demand rapidly outruns supply. In the traditional library, where theoretically every user has the "right" to unlimited demand, this never happens for several reasons: (1) not all potential patrons elect to use the resource; (2) the users must usually go to the library to access the bibliographic apparatus and obtain the materials held by the library; (3) every item in a library collection does not have an equal probability of use; and (4) there is a finite rate at which human beings can "use the resource," i.e., people can read just so fast. None of these self-limiting factors applies to, say, electric power, radio and TV broadcasting, telecommunication services, or similar utilities.

The library picture could become quite different if these limitations were removed or mitigated. Suppose the patron could access the bibliographic apparatus through his home computer terminal attached to his TV in the "wired city." Further suppose that he could receive selected, short items (where time of delivery is important to him) directly at his TV set, or longer items having less time value as microforms or hard copy delivered by mail or private delivery systems.
Given such possibilities, the collecting policies of individual "libraries" (if they continue to be called by that name) might well change drastically, so that nationally, collections might become much more standardized or "homogenized," increasing the likelihood that individual holdings will have more nearly equal use probabilities. This would imply the need for one or more national and/or regional centers for servicing the less used materials, along with appropriate delivery systems and pricing schedules.

CONCLUSION

Work on library automation has proceeded during a highly developmental period in the history of computing. In this sense, librarianship has suffered no worse than any other computer application, nearly all of which have gone through traumas of design, installation, redesign, reprogramming, etc. The main distinction is that in many of these other applications (government, military, industrial, or commercial) there have been far greater resources available to the task and vastly greater experience with the development process. Despite the obstacles, progress in computerized bibliographic work has been far more significant and has achieved far more than many librarians, especially those unaccustomed to the development cycle, can appreciate. The snowballing growth of practical consortia and networks, along with the successful installation and operation of several on-line bibliographic systems, has already changed the face of librarianship in a very short time. Like the breaking of the sonic barrier, once the initial difficulty is overcome, further progress is easier.

The computer has successfully achieved what librarians have until recently only paid lip service to: cooperation and wide sharing of an expensive and large resource. Though the linear growth model in libraries has been dead for some time, the recognition of this fact has not yet penetrated the entire profession.
If libraries are to survive as viable institutions throughout this century and into the next, their leaders must solve the financial, space, and human communication problems inherent in growth. Local autonomy, local self-sufficiency, and the "freedom" to avoid, evade, and even undermine national standards now show up as expensive and dangerous luxuries, potentially self-destructive. Only through the computer will true library cooperation be possible. Only the development of regional and national bibliographic networks, with the assistance of substantial federal funding, can really "save" the library. The computer is actually the library's life insurance and blood plasma. A failure to respond to the challenge of the computer could be fatal, for it is increasingly apparent that patrons growing up in the computer era will not patiently interact with library systems geared to nineteenth-century methods. Nothing in the educational system exists to force people to use a given resource; people use the resources which are effective, responsive, and economical. If the computer is a better performer than the library, patrons will go to the computer. This will be particularly the case as computer services become broader in coverage, simpler to use, and unit prices continue to decline. Despite the serious and irritating problems associated with learning to use the computer, librarians must continue aggressively to support computer applications; indeed, library leaders can impart no more important message than this to their community leaders.

ACKNOWLEDGMENTS

I wish to thank the following persons for their support: Dr. E. Howard Brooks, who was vice-provost for academic affairs in 1971, and David C. Weber, director of libraries, respectively, Stanford University, for granting the leave of absence which enabled me to undertake this project.
I acknowledge with thanks the contributions of the following persons who reviewed early drafts of the paper, in many cases making valuable suggestions and in other instances helping me ward off errors: Mrs. Henriette D. Avram, head, MARC Development Office, Library of Congress; Hank Epstein, director of Project BALLOTS and associate director for library and administrative computing, Stanford Center for Information Processing; Frederick G. Kilgour, executive director, Ohio College Library Center; Peter Simmons, professor of library science, University of British Columbia; Carl M. Spaulding, program officer, Council on Library Resources, Inc.; David C. Weber, director of libraries, Stanford University.

REFERENCES

1. Barbara Evans Markuson, ed., Libraries and Automation; Conference on Libraries and Automation, Warrenton, Va., 1963 (Washington, D.C.: Library of Congress, 1964).
2. U.S. Library of Congress, Automation and the Library of Congress; a survey sponsored by the Council on Library Resources, Inc. (Washington, D.C.: Library of Congress, 1963).
3. Basil Stuart-Stubbs, "Trial by Computer: A Punched Card Parable for Library Administrators," Library Journal 92:4471-4 (15 Dec. 1967).
4. Dan Mather, "Data Processing in an Academic Library: Some Conclusions and Observations," PNLA Quarterly 32:4-21 (July 1968).
5. Libraries and Information Technology: A National Systems Challenge; a Report to the Council on Library Resources, Inc., by the Information Systems Panel, Computer Science and Engineering Board (Washington: National Academy of Sciences, 1972).
6. Anthony Oettinger, Run, Computer, Run (Cambridge, Mass.: Harvard University Press, 1969), p.196. (These same comments were cited in Allen B. Veaner's earlier article, "Major Decision Points in Library Automation," College & Research Libraries :299-312.)
APPENDIX 1
List of Institutions Visited

University of Alberta
University of British Columbia
University of Chicago
Cleveland Public Library
The College Bibliocentre, Ontario
University of Colorado
Columbia University
Cornell University
Harvard University
University of Illinois
Indiana University
Massachusetts Institute of Technology
University of Michigan
New York Public Library
Northwestern University
Ohio College Library Center
University of Pennsylvania
Pennsylvania State University
University of Pittsburgh
Purdue University
Simon Fraser University
Syracuse University
University of Toronto
Yale University

Automatic Format Recognition of MARC Bibliographic Elements: A Review and Projection

Brett BUTLER: Butler Associates, Stanford, California.

A review and discussion of the technique of automatic format recognition (AFR) of bibliographic data are presented. A comparison is made of the record-building facilities of the Library of Congress, the University of California (both AFR techniques), and the Ohio College Library Center (non-AFR). A projection of a next logical generation is described.

INTRODUCTION

The technique commonly identified as "format recognition" has more potential for radically changing the automation programs of libraries than any other technical issue today. While the development of MARC has provided an international standard, and various computer developments provide increasingly lower operating costs, the investment in converting a catalog into machine-readable form has kept most libraries from integrating automated systems into their operations.

The most expensive part of the conversion to machine-readable form has been the human editing required (generally by a cataloger) to identify the many variable portions of the MARC-format cataloging record. A full cataloging record contains several hundred possible sections (or fields) in the MARC format.
Research at the Library of Congress (LC) into this problem resulted in the concept of "format recognition" to reduce cataloging input costs.

With the automatic format recognition (AFR) approach, an unedited cataloging entry is prepared (keypunched or otherwise converted to machine-readable form). Then the AFR computer program provides identification of the various elements of the catalog record through sophisticated computer editing. A degree of human post-editing is generally assumed, but the computer basically is assigned the responsibility of editing an unidentified block of text into a MARC-format cataloging record.

The pioneering AFR work at the Library of Congress is presently in use for original cataloging input to the MARC Distribution Service. This system is quite sophisticated because its output goal is a complete MARC record with all fields, subfields, tags, and delimiters identified almost entirely through computer editing.

The Institute of Library Research (ILR) at the University of California, faced with the need to convert 800,000 catalog records to MARC format, has developed a less ambitious AFR program which provides a level of identification sufficient to provide the desired book catalog bibliographic output, or to print catalog cards.

The aim of this paper is to examine these two AFR strategies and consider their implications for input of two major classes of cataloging records: (1) LC or other cataloging records in standard card format; and (2) original cataloging not yet in card format. Comparing the two AFR strategies to an essentially non-AFR format used at the Ohio College Library Center for on-line cataloging input, we will propose a median strategy for original cataloging, original format recognition (OFR).
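The recognition step described above can be sketched in miniature. The heuristics, field names, and sample entry below are illustrative assumptions for the sketch, not the Library of Congress program's actual rules:

```python
import re

# A drastically simplified illustration of the AFR idea: apply keyword and
# punctuation heuristics to an unedited catalog entry. The field names and
# rules here are hypothetical stand-ins, not the LC algorithms.
def recognize(entry: str) -> dict:
    record = {}
    # Imprint clue: take the last four-digit year as the publication date.
    years = re.findall(r"\b1[6-9]\d\d\b", entry)
    if years:
        record["date"] = years[-1]
    # Collation clue: a pagination statement such as "347 p."
    pages = re.search(r"\b(\d+\s*p)\.", entry)
    if pages:
        record["pagination"] = pages.group(1) + "."
    # Title clue: the text up to the first period.
    record["title"] = entry.split(".")[0].strip()
    return record

rec = recognize("The pilot project. New York, Wiley, 1970. 347 p.")
# rec["date"] == "1970", rec["pagination"] == "347 p.",
# rec["title"] == "The pilot project"
```

The production programs apply hundreds of such clues and derive full MARC content designators; the sketch only shows the flavor of keyword-and-punctuation testing on an unedited data string.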
The thesis is that differing strategies of input should be used for records already formatted into catalog card images and for those original cataloging items being input prior to card production.

AUTOMATIC FORMAT RECOGNITION

An examination of the Library of Congress (LC), University of California (UC), Ohio College Library Center (OCLC), and original format recognition (OFR) strategies will show the operating differences. A detailed field-by-field comparison of the nearly 500 distinct codes which can be identified in creation of a MARC record is attached as Appendix I. General comparisons can be made in several areas: input documents, manual coding, level of identification, input and processing costs, error correction, and flexibility in use.

Input Documents: The LC/AFR program operates from an uncoded typescript converted to a machine-readable record through MT/ST magnetic tape input. This typescript is, however, prepared from an LC cataloger's Manuscript Worksheet, in which there is some inherent bibliographic order. The LC/AFR program does not rely on this inherent order, although its design takes advantage of the probable order in search strategies. LC/AFR could operate with keying of catalog cards, book catalog entries, or any structure of bibliographic data.

The UC program is designed more specifically to handle input of formatted catalog cards, and some of its AFR strategy is based on the sequence and physical indentation pattern on standard catalog cards. It would not work effectively on noncard-format input without special recognition of some tagging conventions.

The OCLC program allows direct input to a CRT screen from any input document; it requires complete identification of each cataloging field or subelement input.

Manual Coding: LC/AFR requires minimal input coding.
Within the title paragraph, the title proper, the edition statement, and the imprint are explicitly separated at input. Series, subject, and other added entries are recognized initially from the Roman and Arabic numerals preceding them. Aside from these items, virtually all MARC fields are recognized by the computer editing program.

UC/AFR inserts a code after the call number input, thus providing explicit identification at input. It also identifies each new indentation on the catalog card explicitly, thus implicitly identifying main entry, title, and certain other major cataloging blocks on the card.

The OCLC input specifications require explicit coding, some of which is prompted by the CRT screen.

Level of Identification: LC/AFR provides the highest possible level of MARC record identification, deriving practically every field, subfield, and other code if it is present in an individual cataloging record.* In evaluating this element of LC/AFR it should be realized that the needs of the Library of Congress in creating original MARC records for nationwide distribution (and its own use) are much more sophisticated and complex than those of any individual user library or system.

The UC/AFR approach reflects a more task-oriented approach, deriving a sufficient level of identification to separate major bibliographic elements. This technique is clearly sufficient to produce computer-generated catalog cards or similar output in a standard manner. However, UC/AFR lacks several identifiers, such as specific delimitation of information in the imprint field, which would make feasible the use of its records for further computer-generated processes.

The OCLC input format is of variable level; many elements are optional and are noted with an asterisk in Appendix I. At its most complete, the OCLC format specifically excludes only a very few MARC fields, most notably Geographic Area and Bibliographic Price.
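The numeral clue lends itself to a simple test. A minimal sketch, assuming the printed-card convention that Arabic numerals precede subject tracings and Roman numerals precede other added entries (the labels and regular expressions are illustrative, not the program's actual vocabulary):

```python
import re

# Classify a card tracing by the numeral that precedes it.
# Assumed convention: "1. ..." (Arabic) marks a subject heading,
# "I. ..." (Roman) marks a title, series, or other added entry.
def classify_tracing(tracing: str) -> str:
    if re.match(r"[IVXLCDM]+\.\s", tracing):
        return "added entry"      # Roman numeral prefix
    if re.match(r"\d+\.\s", tracing):
        return "subject heading"  # Arabic numeral prefix
    return "unrecognized"

kinds = [classify_tracing(t)
         for t in ("1. Libraries--Automation.", "I. Title.", "II. Series.")]
# kinds == ["subject heading", "added entry", "added entry"]
```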
Input and Processing Costs: Direct cost information has not been published for production costs of any of the format recognition systems. The Library of Congress has reported that ". . . the format recognition technique is of considerable value in speeding up input and lowering the cost per record for processing."3 While formal reports have not been published, informed opinion has placed the cost of creation of a MARC record at a level of $3.00 ± $.50. Format recognition is credited with an increase in productivity of about one-third on input keying and an increase of over 80 percent in human editing/proofreading, and actual computer processing times approximate those achieved with earlier Library of Congress MARC processing programs.4 It would seem that AFR may have lowered Library of Congress MARC processing costs to the level of $2.00 ± $.50. In the final report of the RECON Pilot Project, cost simulation projections for full editing and format recognition editing were given as $3.46 and $3.06 per record, respectively.5

While full cost information has not been derived for the UC/AFR program itself, figures have been informally reported at library automation meetings indicating that the cost of record creation was approximately $1.00 per entry. Included in this figure is computer editing of name and other entries against a computerized authority file, which is done manually in the LC/AFR system.

* A number of standard subdivisions of various fields were first announced as part of the MARC format in the 5th edition of Books: A MARC Format, which was published in 1972.1 Consequently they are not specified in Format Recognition Process for MARC Records, published in 1970, which was used as the reference for this paper.2 They are, however, clearly subfields which could be identified by expansions of AFR. These elements are marked with a lower-case "r" in Appendix I.
This program is undeniably the least-cost effort to date providing a MARC-format bibliographic record.

No cost data are provided on the OCLC on-line input system. It can be observed that the coding required is quite similar to the pre-AFR system in use at the Library of Congress itself, and that on-line CRT input had been evaluated at LC as a higher-cost input technique than the magnetic-tape typewriters currently providing MARC input. LC is considering, though, on-line CRT access for subsequent human editing of the MARC record created through off-line input and AFR editing.

Error Rate and Correction: Any AFR strategy, with the present state of the art, generates some error above the normal keying rate observed with edited records. The strategy aims for lowest overall cost by catching these errors in a postprocessing edit which must be performed even for records edited prior to input. The Library of Congress reports, "The format recognition production rate of 8.4 records per hour (proofing only) . . . is slightly less than that (about 9.2 per hour) for proofing edited records. With format recognition records, the editors must be aware of the errors made by the program . . . as well as keying mistakes."6 The savings in pre-keyboard editing and increased keying rates more than make up for this slight decrease in postprocessing editing.

At the Library of Congress, where AFR is used for production of MARC records, a full editing process aims at 100 percent accuracy of input. While such a goal is statistically unreachable, considerable effort is expended by the MARC Distribution Service to provide the most accurate output possible. From a systems perspective, errors existing in MARC records are perhaps less reprehensible than errors in printed bibliographic output, simply because the distributed MARC record can be updated by subsequent distribution of a "correction" record.
It should be noted that some MARC subscribers have voiced concern about the increased percentage of "correction" records, which the Library of Congress indicates come primarily from cataloging changes rather than input edit errors.

The UC/AFR program clearly takes a statistical approach to bibliographic element input and processing. Shoffner has indicated that the scale of the 1,000,000-record input project caused a reevaluation of the feasibility of traditional procedures.7 The result is, in the UC/AFR implementation, a MARC record essentially devoid of human editing. For a smaller scale of production, the UC approach could be combined with post-editing such as that used at LC to increase overall file accuracy. In passing, however, it should be noted that rather sophisticated verification techniques are used in the UC/AFR approach which could be of value in future approaches. These include, for instance, comparison of all words against a machine-readable English-language dictionary; words not found in the dictionary are output for manual editing as suspected keypunch errors.

Little information is available on the error rates and corrections in the OCLC system. However, most records keyed to the OCLC system are for a local member's catalog card production, so feedback is provided and presumably errors are corrected through re-inputting to obtain a proper set of catalog cards. There is no central control on the quality of locally entered OCLC records at present, except for the encoding standards developed by OCLC.

Flexibility in Use: A number of considerations are appropriate here: how many types of format (catalog cards, worksheets, etc.) can be used as input, how many possible outputs can be developed from the derived MARC format, how adaptable is the system to remote and multiple input locations, and how many special equipment restrictions are there?
The LC/AFR program is clearly the most flexible in ability to accept varying inputs and provide a flexible output. It is, however, not capable of any authority-file editing at present (this is done manually against LC's master files before input). While the input form could be used rather easily at remote locations, the MARC AFR programs themselves are not available for use outside the Library of Congress.

The UC/AFR program provides a rather minimal set of cataloging element subfields but does provide more sophisticated textual editing within the program. It is quite adaptable to remote input as long as the original "worksheet" is in catalog card format, a restriction which in effect requires a preinput human editing step for original cataloging input. The MARC format provided would not be sufficient for some currently operating programs using the full MARC format, but is quite sufficient for most bibliographic outputs.

The OCLC input program is dependent on visual editing at the time of CRT keying. Its flexibility in input is considerable, and outputs can approach a full MARC record if all optional fields are identified.

ORIGINAL FORMAT RECOGNITION

A working conclusion of this review is that an AFR program developed according to the strategy of the University of California will deliver a satisfactory MARC-format record at a lower cost than other AFR or non-AFR alternatives. However, much of the efficiency of the UC/AFR is based on the presence of an already existing LC-format catalog card from which to keyboard machine-readable data.

For original cataloging to be keyboarded from a cataloger's worksheet, an original format recognition strategy is proposed which provides a somewhat more detailed format than the UC/AFR MARC while retaining a generally flexible system and low input costs.
Several system considerations also guide the design of an OFR system intended for relatively general-purpose user input and multiple output functions:

• no special equipment requirements for input keying;
• no special knowledge of the MARC format required;
• minimal table-lookup or text searching in processing;
• flexible options for depth of coding provided; and
• sufficient depth of format derived for most applications.

The OFR input strategy outlined in Appendix I provides a much greater degree of explicit field coding at input than the AFR programs outlined above. The basis for this decision is the judgment that this cataloging, being done originally by a professional, can readily be coded by element name prior to input. No effort is made to identify MARC field elements which occur with very low frequency, or which are of limited utility for most applications. For instance, the "MEETING" type of entry occurs, in all combinations, in only 1.8 percent of all records studied by the Library of Congress in its format recognition study.8

MARC elements requiring either extensive human editing or complex computer processing are likewise excluded from input, on a cost-utility basis. An example is the Geographic Area Code, which must either be assigned by a knowledgeable editor or derived through extensive computer searching for the city/county of publication. However, where little penalty is attached to allowing input of coded information, the OFR format allows input for inclusion in the derived MARC-format record.

CONCLUSION

It is clear that the AFR programs developed for specific needs by the Library of Congress and the University of California can be great factors for change in library automation strategies over the next decade. Striking benefits in cost savings, ease of input, and subsequent processing are to be gained.
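The cost savings referred to above can be bounded with the per-record figures quoted earlier; a quick arithmetic check using only those numbers:

```python
# Per-record cost figures quoted in the text (RECON Pilot Project simulation).
full_editing = 3.46        # dollars per record, full manual editing
format_recognition = 3.06  # dollars per record, format recognition editing

savings = full_editing - format_recognition
pct = 100 * savings / full_editing
# About $0.40 per record, roughly a 12 percent saving in the simulation.
# The informally reported UC figure of about $1.00 per entry suggests that
# much larger savings follow from demanding less depth of identification.
```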
The abbreviated outline of an original cataloging (OFR) input strategy is simply a suggestion of a second generation of format recognition programs which will undoubtedly develop to serve more general needs for MARC-format bibliographic input.

REFERENCES

1. U.S. Library of Congress, MARC Development Office, Books: A MARC Format. 5th ed. (Washington, D.C.: U.S. Government Printing Office, 1972).
2. Format Recognition Process for MARC Records: A Logical Design (Chicago: American Library Association, 1970).
3. Henriette D. Avram, Lenore S. Maruyama, and John C. Rather, "Automation Activities in the Processing Department of the Library of Congress," Library Resources & Technical Services 16:195-239 (Spring 1972).
4. Ibid., p.204, 206.
5. RECON Pilot Project, RECON Pilot Project Final Report, prepared by Henriette D. Avram (Washington, D.C.: Library of Congress, 1972).
6. Avram, et al., "Automation Activities," p.206.
7. Ralph M. Shoffner, Some Implications of Automatic Recognition of Bibliographic Elements, Technical Paper No. 22 (Berkeley, Calif.: Institute of Library Research, University of California, April 1971).
8. Format Recognition Process, p.48.

APPENDIX I
Format Recognition Input Specifications Code Outline

FIELD TAG
The number listed is the field tag number of that bibliographic element in the MARC format. Each general field is listed first. Following it are notes indicating areas within the field. Fixed-field indicators within the field are listed first; each one's code number follows a slash after the field code (041/1 = field 41, indicator code 1). If there is more than one group of indicators, an additional code describes group 1 (I1) or group 2 (I2). Subfields within the field are alphabetic codes following a "+" sign after the field code (070+b = field 070, subfield b).

FIELD NAME
The overall field name is listed first.
Fixed-field indicator names are listed at the first indention under the Field Name. Subfield names are listed at the second indention under the Field Name.

TREATMENT BY PROGRAM
These codes indicate the processing provided for each field and subelement by the four computer processing systems considered. Codes are slightly different for each column considered:

LC    The Library of Congress system. "R" indicates that the element described is recognized by the program, rather than explicitly identified at input. "I" indicates the element is keyed and not recognized by the format recognition process. A small "r" denotes elements introduced to the MARC format since AFR documentation was published, but presumably treated by the AFR program just as "R" elements. "0" indicates that the element marked is omitted from input altogether.

UC    The University of California system. Codes are identical to those above, but the "r" code is not used.

OCLC  The Ohio College Library Center system. In addition to the above codes, "*" following any item denotes that input is optional. The "I" code is used wherever an element is tagged, even though the OCLC programs create the MARC format from these tags.

OFR   Original Format Recognition proposals. Codes are similar to those described in the previous paragraphs.

FORMAT RECOGNITION INPUT SPECIFICATIONS

FIELD TAG,                                TREATMENT BY PROGRAM
INDICATOR  FIELD NAME                     LC  UC  OCLC  OFR

015        National Bibliography No.      R 0 0 I*
015+a
025        Overseas Acquisition No.
           R 0 0 0
025+a
041        Languages                      R R I I*
041/0        Multilanguage indicator      R
041/1        Translation indicator        R
041+a        Text/translation code        R I
041+b        Summary language code        R I
043        Geographic Area Code           R 0 0 0
043+a
049        Holdings Information
049+a        Holding library code         0 I I I
050        LC Call Number                 R R* I R*
050/0        Book is in LC                R
050/1        Book not in LC               R
050+a        LC Class Number              R I R
050+b        Book Number                  R I R
051        LC Copy Statement              R 0 0 0
051+a
051+b
051+c
060        Natl. Lib. Medicine Call No.   R 0 R*
060+a        NLM Class Number             R I*
060+b        NLM Book Number              R I*
070        N.A.L. Call Number             R 0 R*
070+a        NAL Class Number             R 0
070+b        NAL Book Number              R 0
082        Dewey Decimal Classif. No.     R 0 I R*
082+a        DDC Number                   R I
086        Su. Docs. Classif. No.         r 0 I 0
086+a        Su. Docs. Number             r I
090        Local Call Number (LC)         0 R R
090+a        LC Class Number              I* R
090+b        Book Number                  I* R
092        Local Call Number (Dewey)      0 0 I*
092+a        Dewey Class Number           I* I*
092+b        Book Number                  I* I*
100        Personal Name                  R R R
100/0,I1     Forename                     R
100/1,I1     Single Surname               R
100/2,I1     Multiple Surname             R
100/3,I1     Name of Family               R
100/0,I2     Main Entry not Subject       R
100/1,I2     Main Entry is Subject        R
100+a        Name                         R I
100+b        Numeration                   R* I
100+c        Title assoc. w/name          R I
100+d        Date                         R I
100+e        Relator                      R I
100+k        Form Subheading              R I
100+t        Title of Book                R I
100+l        Language                     r I*
100+f        Date of Work                 r I*
100+p        Part of Work                 r I*
110        Corporate Name                 R 0 I*
110/0,I1     Inverted Surname             R I*
110/1,I1     Place or Place + Name        R
110/2,I1     Direct-order Name            R I*
110/0,I2     Main Entry not Subject       R
110/1,I2     Main Entry is Subject        R
110+a        Name                         R I
110+b        Subordinate Unit             R I
110+c        Relator                      R I
110+k        Form Subheading              R I
110+t        Title of Book                R I
110+u        Nonprinting Element          R 0
110+l        Language                     r I*
110+p        Part Code                    r I*
110+f        Date of Work                 r I*
110+g        Miscellaneous                r I*
111        Conference or Meeting, M.E.
           R 0 I 0
111/0,I1     Inverted Surname             R
111/1,I1     Place or Place + Name        R
111/2,I1     Direct-order Name            R
111/0,I2     Main Entry not Subject       R
111/1,I2     Main Entry is Subject        R
111+a        Name                         R I
111+b        Number                       R I
111+c        Place                        R I
111+d        Date                         R I
111+e        Subordinate Unit             R I
111+f        Date of Publication          r I*
111+g        Miscellaneous                R I*
111+k        Form Subheading              R I
111+l        Language                     r I*
111+p        Part                         r I*
111+t        Title of Book                R I
130        Uniform Title Heading, M.E.    R 0 I I*
130,I1       Blank
130/0,I2     Main Entry is not Subject    R
130/1,I2     Main Entry is Subject        R
130+a        Uniform Title Heading        R I
130+f        Date of Work                 r I*
130+g        Miscellaneous                r I*
130+h        Media Qualifier              r I*
130+k        Form Subheading              r I
130+l        Language                     r I*
130+p        Part                         r I*
130+s        Alternate Version            r I*
130+t        Title of Book                R I
240        Uniform Title, Supplied        R R I I*
240/0,I1     Not Printed on LC Cards      R
240/1,I1     Printed on LC Cards          R R
240+a        Uniform Title                R R I
240+f        Date of Work                 r I*
240+k        Form Subheading              r I
240+p        Part of Work                 r I*
240+s        Version                      r I*
241        Romanized Title                R 0 I*
241/0,I1     Not Printed on LC Cards      R
241/1,I1     Printed on LC Cards          R
241+a        Romanized Title              R I*
245        Title                          R R I I
245/0,I1     No Title Added Entry         R R R
245/1,I1     Title Added Entry            R R R
245/0,I2     Nonfiling Field              R 0
245+a        Short Title                  R R I R
245+b        Subtitle                     R R I R
245+c        Title Page Transcription     R R I R
250        Edition Statement              R 0 I R
250+a        Edition                      R I 0
250+b        Additional Information       R I
260        Imprint Statement              R 0 I I
260/0        Publisher not M.E.           R I R
260/1        Publisher is M.E.
           R I R
260+a        Place of Publication         R I R
260+b        Publisher                    R I R
260+c        Date of Publication          R I R
300        Collation                      R R I R
300+a        Pagination or Volume         R R I R
300+b        Illustration                 R 0 I 0
300+c        Height                       R 0 I 0
350+a        Bibliographic Price          R 0 0 I
400        Series, Personal Name          R (R) I R
400/0,I1     Forename                     R
400/1,I1     Single Surname               R
400/2,I1     Multiple Surname             R
400/3,I1     Name of Family               R
400/0,I2     Author not Main Entry        R
400/1,I2     Author is Main Entry         R
400+a        Name                         R I R
400+b        Numeration                   R I
400+c        Title Associated             R I
400+d        Dates                        R I
400+e        Relator                      R I
400+k        Form Subheading              R I
400+f        Date of Work                 r I*
400+l        Language                     r I*
400+p        Part of Work                 r I*
400+t        Title of Book                R I
400+v        Volume or Number             R I
410        Series, Corporate Name         R (R) I I
410/0,I1     Inverted Surname             R R I*
410/1,I1     Place, Place + Name          R R
410/2,I1     Direct-order Name            R R I*
410/0,I2     Author not Main Entry        R R
410/1,I2     Author is Main Entry         R R
410+a        Name                         R I
410+b        Subordinate Unit             R I
410+e        Relator                      R I
410+f        Date of Work                 r I*
410+g        Miscellaneous                r I*
410+k        Form Subheading              R I
410+l        Language                     r I*
410+p        Part                         r I*
410+t        Title of Book                R I
410+u        Nonprinting Element          R 0
410+v        Volume                       R I
411        Series, Conference Title       R 0 I I*
411/0,I1     Inverted Surname             R
411/1,I1     Place, Place + Name          R
411/2,I1     Direct-order Name            R
411/0,I2     Author not Main Entry        R
411/1,I2     Author is Main Entry         R
411+a        Name                         R I
411+b        Number                       R I
411+c        Place                        R I
411+d        Date                         R I
411+e        Name Subordinates            R I
411+f        Publication Date             r I*
411+g        Miscellaneous                r I*
411+k        Form Subdivision             r I
411+l        Language                     r I*
411+p        Part                         r I*
411+t        Title of Book                R I
411+v        Volume                       R I
440        Series, Title                  R R I I
440+a        Title                        R R I R
440+v        Volume or Number             R I R
490        Series, Untraced or Traced Differently   R R R I
490/0        Series Not Traced            I
490/1        Series Traced Diff.
           R I R
490+a        Series Name                  R R I R
500        Bibliographic Notes            R R R
500+a        General Note                 R R I*
501+a        "Bound With"                 R 0
502+a        Dissertation                 R I*
503+a        Bibliography History         0 0
504+a        Bibliography Note            R I
505        Contents Note                  R R
505/0        Contents Complete            R
505/1        Contents Incomplete          R
505/2        Partial Contents             R
505+a        Contents Note                R I*
520+a        Abstract or Annotation       R I
600        Subject A.E., Personal         R R I I
600/0,I1     Forename                     R
600/1,I1     Single Surname               R
600/2,I1     Multiple Surname             R
600/3,I1     Name of Family               R
600/0,I2     LC Subject Heading Code      R I
600/1,I2     Annotated Card Heading       R I
600/2,I2     NLM Subject Heading Code     R I
600/3,I2     NAL Subject Heading Code     R 0
600/4,I2     Other Subject Heading        R I I
600+a        Name                         R I
600+b        Numeration                   R I
600+c        Associated Title             R I
600+d        Date                         R I
600+e        Relator                      R I
600+f        Date of Work                 r I*
600+k        Form Subheading              R I
600+l        Language                     r I*
600+t        Title of Book                R I
600+p        Part of Book                 r I*
600+x        General Subdivision          R I
600+y        Period Subdivision           R I
600+z        Place Subdivision            R I
610        Subject A.E., Corporate        R 0 I I
610/0,I1     Inverted Surname             R
610/1,I1     Place, Place + Name          R
610/2,I1     Direct-order Name            R
610/0,I2     LC Subject Heading Code      R I
610/1,I2     Annotated Card Heading       R I
610/2,I2     NLM Subject Heading Code     R I
610/3,I2     NAL Subject Heading Code     R 0
610/4,I2     Other Subject Heading        R I I
610+a        Name                         R 0 I I
610+b        Subordinate Unit             R I
610+e        Relator                      R I
610+f        Date of Work                 r I*
610+k        Form Subheading              R I
610+l        Language                     r I*
610+g        Miscellaneous                r I*
610+p        Part                         r I*
610+t        Title of Book                R I
610+u        Nonprinting Element          R 0
610+x        General Subdivision          R I
610+y        Period Subdivision           R I
610+z        Place Subdivision            R I
611        Subject A.E., Conference       R 0 I 0
611/0,I1     Inverted Surname             R
611/1,I1     Place, Place + Name          R
611/2,I1     Direct-order Name            R
611/0,I2     LC Subject Heading Code      R I
611/1,I2     Annotated Card Heading       R I
611/2,I2     NLM Subject Heading
             Code                         R I
611/3,I2     NAL Subject Heading Code     R 0
611/4,I2     Other Subject Heading        R I
611+a        Name                         R I
611+b        Number                       R I
611+c        Place                        R I
611+d        Date                         R I
611+e        Subordinate Unit             R I
611+f        Publication Date             r I*
611+g        Miscellaneous                R I*
611+k        Form Subheading              R I
611+l        Language                     r I*
611+p        Part                         r I*
611+t        Title of Book                R I
611+x        General Subdivision          R I
611+y        Period Subdivision           R I
611+z        Place Subdivision            R I
630        Subject A.E., Uniform Title    R 0 I 0
630/0,I2     LC Subject Heading Code      R I
630/1,I2     Annotated Card Heading       R I
630/2,I2     NLM Subject Heading Code     R I
630/3,I2     NAL Subject Heading Code     R 0
630/4,I2     Other Subject Heading        R I
630+a        Uniform Title Heading        R I R
630+f        Date of Work                 r I*
630+g        Miscellaneous                r I*
630+h        Media Qualifier              r I*
630+k        Form Subdivision             r I
630+l        Language                     r I*
630+p        Part                         r I*
630+s        Alternate Version            r I*
630+t        Title                        R I
630+x        General Subdivision          R I
630+y        Period Subdivision           R I
630+z        Place Subdivision            R I
650        Subject A.E., Topical          R R I R
650/0,I2     LC Subject Heading Code      R I
650/1,I2     Annotated Card Heading       R I
650/2,I2     NLM Subject Heading Code     R I
650/3,I2     NAL Subject Heading Code     R 0
650/4,I2     Other Subject Heading        R I I
650+a        Topical Subject, Place       R I
650+b        Element after Place          R I
650+x        General Subdivision          R I
650+y        Period Subdivision           R I
650+z        Place Subdivision            R I
651        Subject A.E., Geographic       R 0 I 0
651/0,I2     LC Subject Heading Code      R I
651/1,I2     Annotated Card Heading       R I
651/2,I2     NLM Subject Heading Code     R I
651/3,I2     NAL Subject Heading Code     R 0
651/4,I2     Other Subject Heading        R I
651+a        Geographic Name, Place       R I
651+b        Element After Place          R I
651+x        General Subdivision          R I
651+y        Period Subdivision           R I
651+z        Place Subdivision            R I
690        Subject A.E., Local Topical    0 0 I* 0
690+a        Topical Subject, Place       0 I
690+b        Element After Place          0 I
690+x        General Subdivision          0 I
690+y        Period Subdivision           0 I
690+z        Place Subdivision            0
Journal of Library Automation  Vol. 7/1  March 1974

APPENDIX 1 (continued)

691         Subject A.E., Local Geogr.      O   O   I°  O
691+a       Geographic Name, Place          O   I
691+b       Element After Place             O   I
691+x       General Subdivision             O   I
691+y       Period Subdivision              O   I
691+z       Place Subdivision               O   I
700         Other A.E., Personal Name       R   R   I   R
700/0, I1   Forename                        R
700/1, I1   Single Surname                  R
700/2, I1   Multiple Surname                R
700/3, I1   Name of Family                  R
700/0, I2   Alternate Entry                 R
700/1, I2   Secondary Entry                 R
700/2, I2   Analytical Entry                R
700+a       Name                            R   I
700+b       Numeration                      R   I
700+c       Title Associated                R   I
700+d       Date                            R   I
700+e       Relator                         R   I
700+f       Publication Date                R   I°
700+k       Form Subheading                 R   I
700+l       Language                        r   I°
700+p       Part of Work                    r   I°
700+t       Title of Book                   R   I
710         Other A.E., Corporate Name      R   O   I   I°
710/0, I1   Inverted Surname                R
710/1, I1   Place, Place + Name             R
710/2, I1   Direct-order Name               R
710/0, I2   Alternate Entry                 R
710/1, I2   Secondary Entry                 R
710/2, I2   Analytical Entry                R
710+a       Name                            R   I
710+b       Subordinate Unit                R   I
710+e       Relator                         R   I
710+f       Date of Work                    r   I°
710+g       Miscellaneous                   r   I°
710+k       Form Subheading                 R   I
710+l       Language                        r   I°
710+p       Part of Work                    r   I°
710+t       Title of Work                   R   I
710+u       Nonprinting Element             R   O
711         Other A.E., Conference          R   O   I   I°
711/0, I1   Inverted Surname                R
711/1, I1   Place, Place + Name             R
711/2, I1   Direct-order Name               R
711/0, I2   Alternate Entry                 R
711/1, I2   Secondary Entry                 R
711/2, I2   Analytical Entry                R
711+a       Name                            R   I
711+b       Number                          R   I
711+c       Place                           R   I
711+d       Date                            R   I
711+e       Subordinate Units               R   I
711+f       Date of Work                    r   I°
711+g       Miscellaneous                   R   I°
711+k       Form Subheading                 R   I
711+l       Language                        r   I°
711+p       Part of Work                    r   I°
711+t       Title of Book                   R   I
730         Other A.E., Uniform Title       R   O   I   R
730/0, I2   Alternate Entry                 R
730/1, I2   Secondary Entry                 R
730/2, I2   Analytical Entry                R
730+a       Uniform Title                   R   I
730+f       Date of Work                    r   I°
730+g       Miscellaneous                   r   I°
730+h       Media Qualifier                 r   I°
730+k       Form Subdivision                r   I°
730+l       Language                        r   I°
730+p       Part of Work                    r   I°
730+s       Alternate Version               r   I°
730+t       Title of Work                   R   I
740         Other A.E., Title Traced Differently   R   R   I   R
740/0, I2   Alternate Entry                 R
740/1, I2   Secondary Entry                 R
740/2, I2   Analytical Entry                R
740+a       Title Different                 R   I
800         Series A.E., Personal           R   R   I°  I
800/0       Forename                        R
800/1       Single Surname                  R
800/2       Multiple Surname                R
800/3       Name of Family                  R
800+a       Name                            R   I
800+b       Numeration                      R   I
800+c       Title Associated                R   I
800+d       Dates                           R   I
800+e       Relator                         R   I
800+f       Date of Work                    r   I°
800+k       Form Subheading                 R   I
800+l       Language                        r   I°
800+p       Part of Work                    r   I°
800+t       Title of Work                   R   I
800+v       Volume or Number                R   I
810         Series A.E., Corporate          R   R   I°  I°
810/0       Inverted Surname                R
810/1       Place, Place + Name             R
810/2       Direct-order Name               R
810+a       Name                            R   I
810+b       Subordinate Unit                R   I
810+e       Relator                         R   I
810+f       Date of Work                    r   I°
810+g       Miscellaneous                   r   I°
810+k       Form Subheading                 R   I
810+l       Language                        r   I°
810+p       Part of Work                    r   I°
810+t       Title of Work                   R   I
810+u       Nonprinting Element             R   O
810+v       Volume or Number                R   I
811         Series A.E., Conference         R   O   I°  O
811/0       Inverted Surname                R
811/1       Place, Place + Name             R
811/2       Direct-order Name               R
811+a       Name                            R   I
811+b       Number                          R   I
811+c       Place                           R   I
811+d       Date                            R   I
811+e       Subordinate Unit                R   I
811+f       Date of Work                    r   I°
811+g       Miscellaneous                   R   I°
811+k       Form Heading                    R   I
811+l       Language                        r   I°
811+p       Part of Work                    r   I°
811+t       Title of Book                   R   I
811+v       Volume or Number                R   I
840         Series A.E., Title              R   O   I°  O
840+a       Title                           R   I
840+v       Volume or Number                R   I
590+a       Local Notes Field               O   O   I°  O
910+a       User Option Data Field          O   O   I°* O

HIGHLIGHTS OF ISAD BOARD MEETING
1974 Midwinter Meeting
Chicago, Illinois
Monday, January 21, 1974

The meeting was called to order at 10:15 a.m. by President Frederick Kilgour. Those present were: BOARD-Frederick G. Kilgour, Lawrence W. S. Auld, Paul J. Fasana, Donald P. Hammer (ISAD Executive Secretary), Susan K. Martin, Ralph M. Shoffner, and Berniece Coulter, Secretary, ISAD. GUEST-Brett Butler.

MIDWINTER 1973 MINUTES APPROVED. MOTION. Mr. Shoffner moved to approve the minutes of the Midwinter 1973 Board Meetings. SECONDED by Mr. Fasana. CARRIED.

LAS VEGAS ANNUAL MEETING MINUTES ACCEPTED. A correction on page one of the Las Vegas Annual Meeting Minutes was noted: Mr. Auld's name should be added to the list of guests present. MOTION. Mr. Fasana moved that the minutes of the ISAD Board meetings at the Las Vegas Annual Conference be accepted as corrected. SECONDED by Mrs. Martin. CARRIED.

ISAD HISTORY COMMITTEE. The matter of appointing members to the ISAD History Committee, whose function is to prepare a history of ISAD for ALA's Centennial celebration in 1976, was considered. Mr. Shoffner said that during the time he was president, he had rendered the ISAD History Committee inactive. It was suggested by Mr. Kilgour that a historian would serve the purpose better than a committee. Mr.
Shoffner remarked that he anticipated the chairman would be a historian. Mrs. Martin asked whether a check could first be made on whether ALA is planning to publish any document for the Centennial celebration that would make preparation by an ISAD committee or historian worthwhile. Mr. Kilgour remarked that ISAD definitely should be included if ALA did plan to publish any document and asked the board to give an "OK" to appoint a historian.

MOTION. Mr. Fasana moved that the ad hoc ISAD History Committee be abolished and recommended that the president be given the right to appoint a historian if ALA planned to publish a Centennial document. SECONDED by Mr. Auld. CARRIED.

ALA DUES STRUCTURE. Mr. Hammer explained the information submitted to the board concerning the proposed ALA dues structure. The basic fee for ALA membership under this proposed dues structure would be $35. Membership in each division would be an additional $15. In essence, each division would be on its own financially. If there are not enough memberships to support a division, as could be the case, the division would cease to exist. ISAD could support itself with its present membership, but there is no way of knowing how many ISAD members would still select ISAD if the choice of two divisions included in the dues were removed. The divisions that publish a journal would attract membership much more easily than those that do not provide a journal. Mr. Hammer further remarked that the proposed dues schedule indicates that the divisions must prove themselves with membership dues as their only support, but this does not apply to ALA Committees, SCMAI, units such as the Office for Intellectual Freedom and the Office for Library Service to the Disadvantaged, and the administrative and support units of ALA.
These units may be of great value to ALA, but if one unit is forced to prove its value financially, then it seems that all should have to prove themselves. The divisions would be expected to depend on their own resources, e.g., if a division runs out of postage money, there would be no further mailings. The divisions would be expected to pay for their support services. The idea is very close to the federation plan which has been circulated for some time. In answer to the question of how a new division would get started, Mr. Hammer replied that he assumed there would have to be enough memberships to provide for it financially.

Mr. Shoffner suggested that the discussion be divided into two parts: (1) the principle involved; and (2) the financial aspect.

The following points were brought up in the ensuing discussion by the board regarding the proposed dues structure:

Starting a new division could be a problem; perhaps it could be subsidized for a stated time, after which the division would be self-sufficient. The proposed separation of dues, however, would force a clarity in the expenditures of ALA with respect to how the divisions would benefit.

Some divisions could not be self-supporting and yet are producing important contributions for ALA.

A division would be at the mercy of the ALA supporting units. If a support unit was not efficient, the divisions would be handicapped in the services to their members.

Would a division be able to know enough in advance how much money could be counted on for program planning? The answer was "yes" based on past membership, except in the first year. The income would be predicted on the basis of the previous year's income. An excess of income would remain in the division's funds. If the division income fell short of the anticipated amount, it would have no back-up from ALA as it has presently.

A person could not join one or more of the divisions without joining ALA.
Some divisions could become part of a stronger division, e.g., a division could be broken up and absorbed into several other divisions with related interests. Was there any plan to absorb or redirect those divisions which obviously could not be self-supporting? Nothing has been announced so far.

If a division got into financial difficulties, it could not cut down on its professional staff, as a professional staff is needed to maintain ALA's status with the Internal Revenue Service. It was noted that there were more important reasons than this for maintaining a professional staff.

This proposal was drafted by the then Deputy Director Ruth Warncke in 1970. The board was informed that a cost study of ALA was recently discussed by staff members, but the reply has been that it would take five years to make such a study. The ISAD Board disagreed with the period of five years, but stated that it could take a year.

A division should be allowed to set up its own budget under this proposal as well as have a voice in ALA policy.

The proposal appeared to be unfair in some points: (1) some divisions would have about twice their present income through memberships, while ISAD would break about even; (2) life members would be entitled to membership in all divisions; (3) apparently institutions without a group insurance plan of their own could join ALA for $35 and be entitled to the group insurance for their staffs; at some point an examination of the privileges in each category of membership should be made; and (4) if the $35 ALA membership fee were increased in the future, this would directly affect membership in the divisions.

The ISAD budget for the 1973/74 year is approximately $47,000 and the Journal of Library Automation $23,000, or a total of approximately $70,000. If ISAD membership should fall back to 3,000 members and the membership fee were $25, ISAD could still be viable. Mr.
Kilgour's poll of the board revealed all were in favor of the principle of more or less independent divisions, but with reservations. The following was therefore moved:

MOTION. Mr. Shoffner moved that the ISAD Board favors the principle of divided annual fees for ALA and for its divisions subject to: (1) division determination of the fee structure for division memberships and publications; (2) division participation in the governance of ALA Headquarters activities. SECONDED by Mr. Fasana. MOTION CARRIED.

SELECTIVE DISSEMINATION OF INFORMATION SYSTEM. Mr. Hammer presented a proposal for establishing on a subscription basis a Selective Dissemination of Information system for ALA members (see Exhibit 1). After discussion it was decided that Mr. Hammer would contact the Ohio State University library and obtain information on the exact procedure as to how this would be run, how it would be publicized, who would develop the profiles, who would handle the subscriptions, the cost to the division, etc., and then report to the board.

CO-SPONSORSHIP OF BASIC DATA PROCESSING SEMINARS. Mr. Hammer presented a proposal to the board regarding co-sponsorship of basic data processing seminars with organizations outside ISAD, such as IBM and Dataflow Systems, Inc. of Bethesda, Maryland. In the past ISAD seminars have generally been on library applications, but what he had in mind, Mr. Hammer said, was primarily the basics of data processing, systems analysis, and other fundamentals that would be of interest to administrators. The intent would be to give administrators enough knowledge so that they could evaluate the results that they should be gaining from their data processing systems. These institutes would be a package deal in that the personnel and materials would be commercially supplied. Dataflow has conducted seminars for the United States Civil Service Commission.
IBM has some seminars which are free, but there is a charge if it has to develop a special program. Comment was made regarding seminars conducted several years ago where problems developed as to the commercial aspects.

MOTION. It was moved by Mrs. Martin that the matter of ISAD's co-sponsoring basic data processing seminars with outside organizations be referred to the ISAD Program Planning Committee for discussion and evaluation. SECONDED by Mr. Fasana. CARRIED.

Tuesday, January 22, 1974

The meeting was called to order by the president, Mr. Kilgour, at 2:25 p.m. Those present were: BOARD-Frederick G. Kilgour, Lawrence W. S. Auld, Paul J. Fasana, Donald P. Hammer (ISAD Executive Secretary), Susan K. Martin, Ralph M. Shoffner, and Berniece Coulter, Secretary, ISAD. GUESTS-Alex Allain, Brigitte Kenney, Ron Miller, and Velma Veneziano.

DRAFT ON ALA GOALS AND OBJECTIVES. Mrs. Brigitte Kenney sought feedback from the board on the paper previously distributed on the ALA Committee on Planning's Draft Statement on ALA's Goals and Objectives. Several changes were suggested. Mrs. Kenney expressed her appreciation for their input.

FREEDOM TO READ FOUNDATION. Mr. Alex Allain of the foundation presented the cause of the Freedom to Read Foundation in regard to the current problem of censorship. He stressed the desire to keep channels open with the divisions of ALA and with systems and networks across the nation.

MARBI AND ISAD STANDARDS COMMITTEE (TESLA). Velma Veneziano, chairman of the MARBI Interdivisional Committee, appeared before the ISAD Board requesting clarification of the functions of MARBI and the ISAD Standards Committee (TESLA). She said that her committee would like discrepancies cleared up and duplications eliminated. Mrs. Martin suggested that the charges to both MARBI and TESLA be reworded to clarify their functions.

ISAD BYLAWS COMMITTEE.
In response to discussions concerning the establishment of several committees, Mr. Shoffner MOVED to establish an Organization Committee. SECONDED by Mrs. Martin. Mr. Fasana pointed out that the mechanism for establishing a Bylaws Committee was already spelled out in the ISAD constitution; the president can appoint the committee. MOTION WITHDRAWN. Mr. Shoffner withdrew his motion. Mr. Fasana suggested that the Bylaws Committee also be charged with the organizational and review function. The matter of the Standards Committee's function was also made the charge of the Bylaws Committee.

Wednesday, January 23, 1974

President Kilgour called the meeting to order at 10:15 a.m. Those present were: BOARD-Frederick G. Kilgour, Lawrence W. S. Auld, Paul J. Fasana, Donald P. Hammer (ISAD Executive Secretary), Susan K. Martin, Ralph M. Shoffner, and Berniece Coulter, Secretary, ISAD. GUESTS-Brett Butler, John Kountz, Ann Painter, Charles Payne, James Rizzolo, Richard Utman, Velma Veneziano, and David Waite.

REPORT OF THE NOMINATING COMMITTEE. The chairman, Charles Payne, announced the nominees for the 1974/75 slate of ISAD candidates: Vice-President/President-Elect: Henriette Avram, Allen Veaner. Board Member-at-Large: Ruth Tighe, Maurice Freedman. The board members extended a vote of thanks to the Nominating Committee for their work.

REPORT OF MARC USERS' DISCUSSION GROUP. Mr. James Rizzolo, chairman, said most of the discussion in the group revolved around ALA, CLR, and the change in CLR's status, which was moved in August from one IRS classification to another. It is now an "operating foundation," i.e., it is active in programs rather than waiting for a reaction to a request using the funds it has as a "carrot." Also discussed was whether CLR should fund and pick the participants, or CLR should do the funding and ALA pick the participants.
The group also considered the question of standards and how one arrives at them. There are a number of groups in ALA dealing with standards, but there is a need to work out a systematic method of developing standards. A routine mechanism needs to be set up for going from an initial formulation of an idea for a standard to a standard that the profession can live with.

REPORT OF PROGRAM PLANNING COMMITTEE. The committee met at the ASIS meeting in Los Angeles prior to meeting at the ALA Midwinter Meeting.

Mr. Brett Butler, chairman, announced that three European librarians had been invited to participate in the 1974 Annual Program at New York City. Mr. Kilgour was handling all arrangements. Mr. Kilgour informed the board that the travel expenses of all three librarians were being provided for by sources outside ALA.

Linda Crismond is the local planning person for the 1975 San Francisco Annual Conference program, which will be sponsored jointly with ASIS. Joshua Smith had suggested Mark Radwin of Lockheed as liaison, and he had agreed to serve in this capacity.

The New Orleans institute on "Alternatives in Bibliographic Networking" had enough registrants by Midwinter to confirm it. There had been some difficulty concerning contact with speakers, but the details had been straightened out. Copies of the program for the New Orleans institute were distributed.

Mr. Butler also informed the board that his committee was looking into the details of cooperating with other institutions and state schools which might be interested in working with ISAD in a seminar or institute. The committee was also considering what type of programs should be presented, subcontracting to outside companies, and how to control these. The members of the committee were working on a procedure manual for use in conducting institutes.

TELECOMMUNICATIONS COMMITTEE REPORT. The activities of the Telecommunications Committee are highly organizational at present.
The committee has swung away from cable TV as its primary interest and toward telecommunications as applied to bibliographic networks. The chairman, David Waite, said there was a need to set up a simple guide for carrying out the committee's charge for educational activities and its legislation advisory responsibilities to the ALA Committee on Legislation. More people would probably be appointed to the Telecommunications Committee, as there was a need for more expertise to assign to the areas identified by the committee.

He further said that the need now is to determine what existing apparatus may be utilized to fulfill the committee's responsibility to disseminate information regarding telecommunications as applied to the library community, so that the committee could put most of its effort into technical work.

One project discussed was to gather background information on bibliographic data centers and network activities and their needs for telecommunication facilities in order to draft a requirements statement. The purpose of such a statement is that the committee could communicate with new telecommunications systems. The committee was not aware of an adequate statement of library requirements that is readily available for the commercial services that are steadily increasing. Assignments have been given to Gordon Randall, Maryann Duggan, and Ron Miller to gather this information.

Mr. Waite remarked that the committee would be interested in any report on the proposed ISAD networks committee when available. Brett Butler, chairman of the Program Planning Committee, suggested that a telecommunications institute should be in the future plans, and any ideas about such an institute from Mr. Waite or his committee members would be appreciated.

REPORT OF THE INTERDIVISIONAL COMMITTEE ON MACHINE-READABLE BIBLIOGRAPHIC INFORMATION (MARBI). (See Exhibit 2.) Mr.
Kilgour appointed Velma Veneziano to serve as liaison to the ISAD Standards Committee from MARBI. Her term as chairman of MARBI will conclude in June 1974.

REPORT OF COLA DISCUSSION GROUP. (See Exhibit 3.)

REPORT OF COMMITTEE ON TECHNICAL STANDARDS FOR LIBRARY AUTOMATION (TESLA). (See Exhibit 4.) Report of Chairman John Kountz.

TECHNOLOGICAL UNEMPLOYMENT. President Kilgour felt ALA should do something about the spread of unemployment due to increased use of technological development. Mr. Auld suggested that someone be appointed to study the potential and existing problems in this area. This could be funded either (1) under a fellowship by CLR, or (2) by application for the J. Morris Jones Goals Award.

Mr. Fasana thought an interdivisional committee might be set up between the four most directly affected divisions: ISAD, LAD, LED, and RTSD.

Mr. Shoffner expressed his view that as efficiency is increased, productivity is increased, which could therefore possibly increase employment. Mr. Kilgour said that history had proved the contrary. Mr. Shoffner stated he felt the problem was one of education and training: a specification of what is expected of a person and what training he would receive during a technical changeover was needed.

Mr. Fasana's suggestion was that the four divisions be asked for papers of their views, or that a program at the San Francisco Annual Conference be prepared on the subject of technological unemployment. Mr. Auld asked if it could not rather be introduced at the New York Annual Conference, to which Ann Painter volunteered the use of the ISAD/LED Education Committee's two-hour time slot for the program at New York.

MOTION. Mr. Fasana moved that Mr.
Kilgour phrase a statement of the problem of technological unemployment as he sees it and present it to the ISAD/LED Education Committee for consideration as the program theme at the New York conference. SECONDED by Mrs. Martin. CARRIED.

PROPOSED STANDARDS IN JOLA TC. Mr. John Kountz brought up the subject of using JOLA TC as the interactive mechanism for presenting a proposed standard to the ISAD members for comment, with a form included to be filled out and returned. The board agreed that this was a good idea.

ISAD/LED EDUCATION COMMITTEE REPORT. Ann Painter, chairman, asked for clarification of the appointment of new members to the committee. Roger Greer is the only member whose term continues past this year. Mr. Hammer was asked to find out who appoints members to the above committee.

The committee is working on a series of papers defining educational "modules" and has sent out a revised questionnaire to identify appropriate subject areas. It is planning to send the questionnaires to associated institutions as well as to the ALA accredited schools.

The need for funding the modules rather than depending upon volunteer or "slave labor" was considered by the committee. Volunteers have little preparation time, so the modules often lack depth and consistency. The committee would also like to set up a file of modules available to people across the country; a problem of copyright could be involved. Mr. Kilgour asked Miss Painter for suggestions of people who might be interested in serving on the committee.

JOLA MANUSCRIPTS. Mrs. Martin, editor of JOLA, asked the board for its feeling on whether it would be appropriate or desirable to put the date of acceptance on published manuscripts in JOLA. The board decided that should be the editor's decision.

VOTE OF THANKS TO MRS. MARTIN. The board gave Mrs.
Susan Martin a unanimous vote of thanks for her work in getting the issues of JOLA caught up to date in time to meet the Post Office deadline of December 31, 1973, in order to retain the second-class permit.

REPORT OF THE MEMBERSHIP SURVEY COMMITTEE. (See Exhibit 5.)

BOARD MINUTES IN JOLA. The board suggested that minutes published in JOLA be entitled "Highlights of ISAD Board Meeting" rather than minutes.

The meeting was adjourned at 12:30 p.m.

EXHIBIT 1

PROPOSAL FOR ESTABLISHING ON A SUBSCRIPTION BASIS A SELECTIVE DISSEMINATION OF INFORMATION SYSTEM FOR ALA MEMBERS

The original proposal for an SDI system was intended for ISAD members only, but interest has grown at ALA Headquarters to the extent that it is being considered as a service to be provided for all ALA members. The proposal therefore does not require any action on the part of the ISAD Board. It is presented here for information and to give the board members an opportunity to comment on the idea and make suggestions toward developing the best possible procedure.

It is hoped that a presently operating system can be found that would enable ALA members to subscribe to a system using multisubject data banks, that would automatically adjust profiles according to past output results, and that would supply, as requested, copies of articles and documents whenever possible. Such documents would of course be supplied at a fee additional to the basic subscription fee. It is also hoped that the operators of the system would be responsive to subscriber feedback and would improve the system as warranted.

At present the only existing data banks in the library and information science fields are ERIC and MARC, but hopefully as time goes on others will be developed. It would seem prudent, for example, for the H. W. Wilson Company to consider the sale of Library Literature in machine-readable form.
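The central mechanism the exhibit asks for, a subscriber interest profile that is matched against incoming records and automatically adjusted according to past output results, can be sketched roughly as follows. This is an illustrative sketch only, not the procedure of any system mentioned in the exhibit; the weighted-keyword scheme and all names are hypothetical.

```python
# Hypothetical SDI sketch: a profile is a dict of keyword weights; a record
# is delivered when its terms score above a threshold, and the weights are
# then nudged by the subscriber's relevance feedback on past output.

def match(profile, record_terms, threshold=1.0):
    """Score a record's terms against the profile; deliver if at or above threshold."""
    score = sum(profile.get(term, 0.0) for term in record_terms)
    return score >= threshold

def adjust(profile, record_terms, relevant, step=0.25):
    """Raise or lower the weight of each matched term based on feedback."""
    for term in record_terms:
        if term in profile:
            profile[term] += step if relevant else -step
            profile[term] = max(profile[term], 0.0)  # weights never go negative
    return profile

profile = {"cataloging": 1.0, "MARC": 1.0, "circulation": 0.5}
record = ["MARC", "serials", "cataloging"]
if match(profile, record):
    profile = adjust(profile, record, relevant=True)  # subscriber found it useful
```

In an operating service the profile terms would come from the subscriber's stated interests, and the feedback loop would run on the subscriber's relevance judgments of each delivery.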
In any event, there is no reason to limit subscriptions to the service to information science data banks. If interested, members of ALA could subscribe to other subject fields depending upon the data banks made available by the operating service. Chemistry librarians could, if useful to them, subscribe to Chemical Abstracts Condensates, engineering librarians to Engineering Index, etc. Only time and the availability of SDI can determine the interest of librarians in such services.

At the time of writing, only one of the two agencies contacted for information has provided descriptive data on its system. A copy of one of the papers sent by the UCLA Center for Information Services is attached. Ohio State University libraries had not as yet responded. Enquiries will be made with other operating systems so that a basis for comparison will be available for decision at ALA Headquarters.

Comments and suggestions from ISAD Board members would be appreciated. Information regarding presently operating systems would also be of great value.

December 13, 1973

EXHIBIT 2

REPORTS OF THE MEETINGS OF THE MARBI COMMITTEE (Interdivisional Committee on Representation in Machine Readable Form of Bibliographic Information)
January 19 and 20, 1974

Number one priority was the resolution of the relationship between the Library of Congress and MARBI in its capacity as the MARC Advisory Group.

There was discussion of the position paper which was presented at the Las Vegas meeting (copy attached), entitled "The Library of Congress View on Its Relation to the ALA MARC Advisory Committee." LC had revised certain portions of this paper to conform with MARBI's wishes. These revisions were acceptable to the committee. There was concern, however, over an addition which pertained to MARBI's role with regard to formats other than books and serials (namely films, maps, music, etc.).
Alternate wording to LC's proposal was worked out by Paul Fasana and John Knapp.

Several documents were submitted by Henriette Avram: (1) a proposed document numbering scheme for communications between LC and the committee and vice versa, and (2) a proposed format for presenting changes to MARC formats (copies attached). These documents and proposals were acceptable to the committee. (Note: Incidental to this discussion, the committee officially adopted "MARBI" as its official acronym.)

1. The LC liaison presented two proposed MARC format changes for the committee's consideration, entitled:

LC/MARBI 2-Addition of $x subfield for 4xx fields to allow for ISSN.
LC/MARBI 3-Specification of the 830 field.

The committee decided that the following plan of action would be followed with regard to these two changes: They would be announced and distributed to the ISAD MARC Users' Discussion Group at its January 21, 1974 meeting. The proposed changes would be sent to all on MUDG's mailing list, asking for replies to the MARBI chairman by February 16, 1974. The chairman would summarize responses and poll MARBI committee members, who would respond by March 16, 1974. The MARBI committee chairman would respond to LC by March 16, 1974. MARBI will request publication of the changes in JOLA Technical Communications.

2. Henriette Avram presented to the committee a CLR statement which had been presented to ARL, entitled "A Composite Effort to Build an On-Line National Serials Data Base." The committee took note of the presentation with interest and voted to take no action on the matter at the January 19 meeting.

3. The Character Set Subcommittee of MARBI reported that it had issued a written report which will be used in support of the United States position concerning development of standards within the International Organization for Standardization (ISO). MARBI issued thanks to the subcommittee and requested that it remain convened pending review of further developments coming from activities within ISO.

4.
There was a report on the activities of the ad hoc committee convened by CLR to discuss use of the MARC format in a network environment. A paper entitled "Sharing Machine Readable Bibliographic Data: A Progress Report on a Series of Meetings Sponsored by the Council on Library Resources" was discussed. The committee took note of these activities with interest and will wait for formal submission of format changes from the Library of Congress.

5. MARBI discussed the apparent overlap of charge between MARBI and the new ISAD Committee on Technical Standards. MARBI passed a resolution that the ISAD representatives should bring to the attention of the ISAD Board its concern over the similarity of the function statements of the two committees, and asked that these apparent discrepancies be considered and any duplication be eliminated.

6. The proposed MARBI Serials task force was discussed. It was felt that MARBI committee members needed to keep up on developments, and that the chairman should continue to collect and distribute as much documentation as possible to the committee members. It was decided that there was no need at this time to set up a separate subcommittee to perform this function.

7. The proposed amendments to ISO 2709-1973(E) were discussed. It appears that there are several proposals circulating to change this standard. MARBI formed a subcommittee to study these proposals and respond, and possibly to make counterproposals. The position of MARBI will be reported to the chairman of ANSI Z39, SC/2 and will be used in support of the U.S. position within ISO. Any committee member or interested professional may reply individually. The subcommittee appointed consists of Charles Payne, John Knapp, Mike Malinconico, and Charles Husbands. Response will be made by April 1, 1974.

At its regularly scheduled meeting, on January 20, all members were present. (John Byrum was unable to attend the unofficial meeting on January 19.)
The distribution of the RTSD and ISAD manual material was discussed. The discussion of the previous day was summarized for purposes of review and for the benefit of the nonmembers attending the meeting.
1. MARBI and LC: The alternative wording to the LC position paper was presented by Paul Fasana. It was passed. Henriette Avram will have it published in LCIB and will submit it to JOLA TC. LRTS will also receive a copy. The paper will be submitted to each divisional board.
2. The national on-line union file of serials was discussed. Larry Livingston answered questions.
3. The Character Set Subcommittee will see that ISAD has a copy of its report. Interested professionals should ask them for a copy.
4. The activities of the ad hoc CLR committee were again reviewed.
5. The ISAD Standards Committee was discussed.
6. The Serials Task Force for MARBI was reported on.
7. The proposed changes to ISO 2709-1973(E) were reviewed.
New Business:
8. The activity of the IFLA working group on content designators was discussed. It was reported that there is an attempt to standardize content designators across national boundaries, for purposes of international exchange. There are problems in the areas of cataloging rules, incomplete library participation, and language. No action was needed, as this is only for informational purposes at this time.
9. Location codes were discussed, but the issue was tabled pending the report of the ad hoc CLR committee.
10. Language and geographic area codes were brought up, but it was not considered necessary to become involved.
11. The Z39 Standard Account Number (SAN) was reported on by Emery Koltay.
12. Progress in regard to the publication of the ISBD-M and ISBD-S was discussed.
EXHIBIT 3
COLA REPORT-MIDWINTER '74
About fifty people were in attendance at portions of the four-hour meeting.
The first half was taken up by a series of informal presentations about activity:

AT:                      BY:
Stanford                 Allen Veaner
CSUC                     John Kountz
Berkeley & ULAP          Sue Martin (ULAP)
CIS Project at UCLA      Peter Watson
NYPL-RLG & SUNY plans    Mike Malinconico
University of Chicago    Charles Payne, Rob McGee
LC                       Mary Kay Daniels

Questions were entertained at the end of each presentation. The second half was opened by a few announcements by Maryann Duggan about the New Orleans Institute and by Henriette Avram about the serials proposals. The major portion of the second half consisted of a panel discussion by John Kountz, Emery Koltay, Tom Brady, and John Knapp on the communication of orders, claim reports, and ILL requests and responses in machine-readable form. John Kountz addressed general system design aspects; Emery Koltay discussed the ISBN, ISSN, and standard account numbers; Tom Brady discussed B&T's experiences with BATAB; and John Knapp addressed the nature of the data elements and the record structure itself. Considerable discussion followed the presentations, centering heavily on the ISBN and its good points and failings. Both parts of the meeting seemed to be well received. The major value of COLA seems to be as an occasion for a wide variety of automation-oriented people to discuss a similarly wide variety of topics in an informal environment. There was some feeling that the presentations in the first half could have been more tightly controlled. The presentation in the second half was quite useful, I feel. I would like to suggest COLA as a good sounding board for proposals and a place for announcements and distribution of handouts or written position papers. John Kountz and I have discussed setting aside a portion of it for TESLA reports.
Respectfully submitted, Brian Aveney
EXHIBIT 4
TO: Board of Directors, Information Science and Automation Division
FROM: John Kountz, Chairman, Committee on Technical Standards for Library Automation
SUBJECT: Report of Committee's Activities, ALA Midwinter Meeting, 1974
The Committee on Technical Standards for Library Automation (TESLA) held its inaugural meetings on Tuesday, January 1974 (4:30-6:00 p.m. and 8:30-11:00 p.m.). These were icebreaker meetings for a new group. In view of the interest that had been expressed in various quarters, several interested observers attended, as well as six of the seven committee members (for membership attendance see attached list). In addition, the following individuals were invited to meet with the committee and present their review of standards activities in other areas; establish a working perspective for the committee within the American Library Association; and delineate the constraints of the committee's charge: Mr. Fred Kilgour, Mr. Don Hammer, Ms. Velma Veneziano, and Mr. Emery Koltay. While the specific discussion that ensued covered a variety of topics, the central objectives for these two meetings (establishing/defining action areas, constraints, and roles, and reviewing in some detail the committee's charge) were met. In addition, stress was placed throughout the discussion on differentiating between professional, service, bibliographic, and similar library standards, and the communications/clearinghouse function to be served by the committee in its dealings with technical standards impacting library automation. At its next meeting, the committee can be expected to complete its deliberations on the charge, complete a proposed pilot procedure for the handling of initiative/reactive requirements for standards, and recommend a shakedown of the proposed procedure.
Committee on Technical Standards for Library Automation
ALA Midwinter Meeting 1974
Attendees of Meetings held 21 January 1974
Dr.
Edmund A. Bowles, IBM
Mr. Arthur Brody, Bro-Dart Industries
Mr. Jay Cunningham, University of California
Mr. John Kountz, Chairman, California State University and Colleges
Mr. Tony Miele, Illinois State Library
Mr. Richard Utman, Princeton University
Absent: Ms. Madeline Henderson, National Bureau of Standards
EXHIBIT 5
REPORT OF THE MEMBERSHIP SURVEY COMMITTEE
We mailed out 4,337 questionnaires as of November 3. As of last week, we had received 1,666 replies. They have now dwindled down to about five or six a day, so I feel we have probably received the majority of responses from our mailing. I hope for about a 40 percent response. The returns are presently being coded by my graduate assistant, and the University of South Carolina Computer Center will keypunch them for us. I am hopeful that we can start analyzing the results by the end of February and have the report ready for you by April. The expenses to date have been:

preliminary mailing       $346.95
printing of envelopes      164.32
return postage             166.60
Total                     $677.88

The bill for printing the questionnaire hasn't been received yet but should be a very minor one. Jim Williams will write the program for the data, and the library school has computer time which we can use. I expect when all the expenses are in that the total will be more than the budgeted $700, but not very much more.
Submitted by: Elspeth Pope, Chairman; Jim Williams; Bill Summers; Martha Manheimer
TECHNICAL COMMUNICATIONS
ANNOUNCEMENTS
New COLA Chairman
Brian Aveney, of the Richard Abel Co., has been elected Chairman of the COLA Discussion Group, effective January 1974. Prior to his present position with the Design Group at Richard Abel, Mr. Aveney was head of the Systems Office at the University of Pennsylvania libraries. The COLA Discussion Group traditionally meets on the Sunday afternoon preceding each ALA conference. Meetings are open, and all are invited to attend.
And A Book Review Editor
A member of the University of British Columbia Graduate School of Library Science faculty, Peter Simmons, has been appointed Book Review Editor of the Journal of Library Automation. Mr. Simmons is the author of the "Library Automation" chapter in the Annual Review of Information Science and Technology, volume 8, the most recent of his publications. Authors and publishers are requested to send relevant literature for review to Mr. Simmons at the Graduate School of Library Science, University of British Columbia, Vancouver, British Columbia.
Missing Issues?
The rapid publication sequence of the 1972 and 1973 volumes of the Journal of Library Automation has created problems for some ISAD members and subscribers. If your address changed during 1973, or if your ALA membership suffered any quirk, you are especially likely to have missed one or more of the issues due you. If this is the case, please write to the Membership and Subscription Records Department of the American Library Association, 50 E. Huron St., Chicago, IL 60611. Indicate which issues you are missing, and every attempt will be made to forward them to you as quickly as possible.
New ERIC Clearinghouse
Stanford University's School of Education has been awarded a one-year contract by the National Institute of Education (NIE) to operate the newly formed ERIC Clearinghouse on Information Resources under the direction of Dr. Richard Clark. The new Clearinghouse will be part of the Stanford Center for Research and Development in Teaching. The Clearinghouse on Information Resources is the result of a merger of two previous Clearinghouses: the one on Media and Technology, formerly located at the Stanford Center for Research and Development in Teaching, and the one on Library and Information Sciences, formerly located at the American Society for Information Science in Washington, D.C.
The new Clearinghouse is responsible for collecting information concerning print and nonprint learning resources, including those traditionally provided by school and community libraries and those provided by the growing number of technology-based media centers. The Clearinghouse collects and processes noncopyright documents on the management, operation, and use of libraries, the technology to improve their operation, and the education, training, and professional activities of librarians and information specialists. In addition, the Clearinghouse is collecting material on educational media such as television, computers, films, radio, and microforms, as well as techniques which are an outgrowth of technology: systems analysis, individualized instruction, and microteaching.
LIBRARY AUTOMATION ACTIVITIES-INTERNATIONAL
Computerized System at the James Cook University of North Queensland Library
The system design phase of an integrated acquisitions/cataloging system for the library at the James Cook University of North Queensland has been completed by a firm of computer consultants, Ian Oliver and Associates, and programming has commenced.
History
The system, known as CATALIST, is a batch system to be operated on the university's central computer, a PDP-10. It will be programmed in FORTRAN and MACRO, the assembly language of the PDP-10.
Description
The system will cover all aspects of cataloging/acquisitions procedures for all library material apart from serials, including:
(a) production of orders, followups, reports
(b) budget control
(c) fund accounting
(d) routing slips
(e) accessions lists
(f) in-process and catalog supplements (author/title and added entry) and subject catalog supplement; shelflist and supplement
(g) catalogs (author/title and subject)
(h) union catalog cards.
Some features of the system include the maintenance of average book prices in all subject areas.
These are continually updated by the system to reflect current fluctuations in the trade. This information will be used together with machine-based arrival predictions to control the budget and fund allocations. MARC data will be used as much as possible, with records for individual items being supplied from external sources on request. The In-Process Catalogues, which will contain items on order, items arrived, and items cataloged since the previous edition of the catalog, will contain added entries for all material where such information is available. The catalogs will be produced on COM. Roll film will be used for public catalogs and fiche for in-house use. Data for the National Union Catalogue will be submitted on minimally formatted computer-produced cards. For further information contact Ms. C. E. Kenchington, Systems Librarian, Post Office, James Cook University of North Queensland, Australia 4811.
TECHNICAL EXCHANGES
Editor's Note: The two following articles, prepared by the Library of Congress and the Council on Library Resources, respectively, have been distributed through various LC publications. Due to the importance of the two documents, however, and to the fact that they may not have reached the entire library community, it seemed appropriate to publish the papers again in the Journal of Library Automation.
Sharing Machine-Readable Bibliographic Data: A Progress Report on a Series of Meetings Sponsored by the Council on Library Resources
Beginning in December 1972 and continuing since that date, the Council on Library Resources has convened a series of meetings of representatives of several organizations to discuss the implications of bibliographic data bases being built around the country and the possibilities of sharing these resources.
Although the deliberations are not yet completed, the Council, as well as all participants in the meetings, felt that it was timely to make the progress to date known to the community. Since publication in the open literature implies a long waiting period between completion of a paper and the actual publication date, it was decided that this paper should be written and distributed as expeditiously as possible. Since the Library of Congress has vehicles for dissemination of information in its MARC Distribution Service, Information Bulletin, and Cataloging Service Bulletin, LC was asked to assume the responsibility for the preparation of a paper to be distributed via the above-mentioned channels as well as sending copies to relevant associations. The institutions participating in the deliberations have been included as an appendix to this paper. The bibliographic data bases under consideration at individual institutions contain both MARC records from LC as well as records locally encoded and transcribed. These local records represent: (1) titles in languages not yet within the scope of MARC; (2) titles in languages cataloged by LC prior to the onset of the MARC service; (3) titles not cataloged by LC; and (4) titles cataloged by LC and recataloged when the LC record cannot be found locally. The first two categories, in many instances, are being encoded and transcribed by institutions using LC data as the source, i.e., proofsheets, NUC records, and catalog cards. These are referred to for the remainder of this paper as LC source data, and the third and fourth categories as original cataloging. All participants agreed that the structure of the format for the interchange of bibliographic data would be MARC, but several participants questioned if a subset of LC MARC could not be established for interchange for all transcribing libraries other than LC.
1, 2 Although LC had reported its survey regarding levels of completeness of MARC records and the conclusions reached by the RECON Working Task Force, namely, "To satisfy the needs of diverse installations and applications, records for general distribution should be in the full MARC format," it appeared worthwhile to once more make a survey to see if agreement could be reached on a subset of data elements.3 The survey included only those institutions participating in the CLR meetings. The result of the survey again demonstrated that, considered collectively, institutions need the complete MARC set of data elements. The decision was made that the LC MARC format was to be the basis of the further deliberations of the participants. Attention was then turned to any additional elements of the format or modifications to present elements that may be required in order to interchange bibliographic data among institutions. All concerned recognized that although networks of libraries, in the true sense, still do not exist today, much has been learned since the development of the MARC format in 1968. Certain ground rules were established and are given below:
1. The material under consideration is to be limited to monographs.
2. The medium considered for the transmission of data is magnetic tape.
3. Data recorded at one institution and transmitted to another in machine-readable form is not to be retransmitted by the receiving institution as part of the receiving institution's data base to still another institution.4
4. Any additions or changes required to the MARC format for "networking" arrangements are not to substantially impact LC procedures.
5. Any additions or changes required to the MARC format for "networking" arrangements are not to substantially affect MARC users.
Long discussions took place concerning modifications to LC source data by a transcribing library and the complexity involved in transmitting information as to which particular data elements were modified. Ground rule 6 was established, stating that if any change is made to the bibliographic content of a record copied from an LC source document (other than the LC call number), the transcribing library would be considered the cataloging source, i.e., the machine-readable record would no longer be considered an LC cataloging record. Any errors detected in LC MARC records are to be reported to LC for correction. A subcommittee was formed to study what MARC format additions and modifications were required. The subcommittee met on several occasions and made the following proposals to the parent committee:
1. Fixed field position 39 and variable field 040, cataloging source, should be expanded to include information defining the cataloging library, i.e., the library responsible for the cataloging of the item, and the transcribing library, i.e., the library actually doing the input keying of the cataloging data.
2. LC should include the LC card number in field 010 as well as in field 001. When the LC card number is known by an agency transcribing cataloging data, field 001 should contain that agency's control number and field 010 should contain the LC card number.
3. Variable field 050 should not be used for any call number other than the LC call number. Transcribing agencies should always put the LC call number in this field if known.
4. A new variable field 059, contributed classification, should be defined to allow agencies other than LC to record classification numbers such as LC classification, Dewey, National Agricultural Library classification, etc., with indicators assigned to provide the information as to what classification system was recorded and whether the cataloging or transcribing agency provided this data.
5.
Variable field 090, local call number, should follow the same indicator system as defined in field 059. (090 contains the actual call number used by either the cataloging or transcribing library, while 059 would contain additional classification numbers assigned by the cataloging or transcribing library.)
6. LC would assume the responsibility of distributing any agreed-upon additions or modifications as either an addendum to or a new edition of Books: A MARC Format.
Discussions following the presentation of these proposals indicated concern regarding three principal areas:
1. The modification of any data element in an LC source document other than the addition of a local call number dictated that the institution performing the modification of the record assume the position of the cataloging source. This resulted in the possibility that a large number of records would undergo minor changes, and consequently the knowledge that the record was actually an LC record would be lost. This loss was considered a critical problem.
2. The creation of a MARC record implied that each fixed field and all content designators should be present if applicable for any one record. During the LC RECON project, it was recognized that certain fixed fields could not be coded explicitly because the basic premise in the RECON effort was the encoding of existing cataloging records without inspecting the book. Consequently, the value of certain fixed fields, such as one indicating the presence or absence of an index in the work, could not be known. Participants felt that a "fill" character was needed to describe to the recipient of machine-readable cataloging data that a particular fixed field, tag, indicator, or subfield code could not be properly encoded due to uncertainty. The "fill" character will be a character in the present library character set but one not used for any purpose up to this time.
3.
Although networking is not clearly defined at this time, participants felt that the MARC format should have the capability to include location symbols to satisfy any future requirement to transmit this information in order to expedite the sharing of library resources.
Majority opinion indicated there was a need to guarantee the recognition of an LC source record, that a "fill" character could serve a useful function, and that a method of transmitting location symbols was required. Three position papers were written on the topics outlined above, giving the rationale for the requirement and describing a proposed methodology for implementation. These papers were reviewed at a meeting of the participants and are presently undergoing modification, taking into account the recommendations made. The revised papers are to be distributed prior to the next meeting in January 1974. Following this meeting, another paper will be prepared for publication which will include a definitive account of the modifications and additions recommended for the MARC format, as well as describing the rationale for the additions and modifications. At that time the proposals will be submitted to the library community for its review and acceptance. If the additions and changes are approved by the MARBI5 Committee of the American Library Association, LC will proceed to amend or rewrite the publication Books: A MARC Format. However, the points elaborated below deserve emphasis toward the understanding of the issues described in this paper.
1. The meetings were concerned with a national exchange of data, not international.
2. The additions and modifications recommended for the MARC format, with one exception, affect organizations other than the Library of Congress exchanging machine-readable cataloging data. Except for distributing records with the LC card number in field 010 as well as 001, the MARC format at LC will remain intact.
3. LC will investigate the use of the fill character in its own records, both retrospective and current, and for records representing all types of materials.
Henriette D. Avram
MARC Development Office
Library of Congress
REFERENCES
1. The MARC format has been adopted as both a national and international format by ANSI and ISO respectively.
2. Subset in this context includes both the data content of the record (fixed and variable fields) and content designators (tags, indicators, and subfield codes).
3. RECON Working Task Force, "Levels of Machine-Readable Records," in its National Aspects of Creating and Using MARC/RECON Records (Washington, D.C.: Library of Congress, 1973), p.4-6.
4. This rule did not extend to a subscriber to the LC MARC service duplicating an LC tape for another institution. One can readily see the chaos that would result if institution A sent its records to institutions B and C, B then selected all or part of A's records for inclusion in its data base, and then transmitted its records to A and C. The result of the multitransmission of the same records, modified or not, would create useless duplication and confusion.
5. RTSD/ISAD/RASD Representation in Machine-Readable Form of Bibliographic Information Committee.
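The field conventions in proposals 1-3 and the fill-character concern can be pictured in a brief sketch. This is a modern illustration in Python, not period software; the function name, the dictionary layout, and the use of "|" as the fill character are assumptions for illustration only.

```python
# Illustrative sketch of the proposed conventions for a transcribing library:
# field 001 carries the transcribing agency's own control number, field 010
# carries the LC card number when known, and field 050 is reserved for the
# LC call number. The fill character marks fixed fields that cannot be coded
# without inspecting the book itself.

FILL = "|"  # assumed fill character: "could not be determined from the source"

def make_transcribed_record(local_id, lc_card_number, lc_call_number,
                            index_present=None):
    """Build a minimal MARC-like record following the subcommittee's
    proposals 1-3 (hypothetical helper, not a real MARC library)."""
    record = {
        "001": local_id,        # transcribing agency's control number
        "010": lc_card_number,  # LC card number preserved here
        "050": lc_call_number,  # LC call number only; local numbers go in 090
    }
    # A fixed field that is unknowable from the catalog card (e.g. presence
    # of an index) gets the fill character rather than a guessed value.
    if index_present is None:
        record["index"] = FILL
    else:
        record["index"] = "1" if index_present else "0"
    return record

rec = make_transcribed_record("OSU-000123", "73-12345", "Z699.A1")
print(rec["010"])    # the LC card number survives even though 001 is local
print(rec["index"])  # fill character: value could not be encoded
```

The point of the sketch is that an LC record copied by another agency remains identifiable as LC cataloging: the card number moves to 010 rather than being overwritten, and unknowable fixed fields are explicitly marked rather than silently mis-coded.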
APPENDIX 1
List of Organizations Participating in the CLR-Sponsored Meetings
Library of Congress
National Agricultural Library
National Library of Medicine
National Serials Data Program
New England Library Information Network
New York Public Library
The Ohio College Library Center
Stanford University Libraries
University of Chicago Libraries
Washington State Library
University of Western Ontario Library
A Composite Effort to Build an On-Line National Serials Data Base (A Paper for Presentation at the ARL Midwinter Meeting, Chicago, 19 January 1974)
An urgent requirement exists for a concerted effort to create a comprehensive national serials data base in machine-readable form. Neither the National Serials Data Program nor the MARC Serials Distribution Service, at their current rate of data base building, will solve the problem quickly enough. Because of the absence of a sufficient effort at the national level, several concerted efforts by other groups are under way to construct serials data bases. These institutions have been holding in abeyance the development of their automated serials systems, some for several years, waiting for sufficient development at the national level to provide a base and guidance for the development of their individual and regional systems. This has not been forthcoming, and local pressures from their users, their administrators, and their own developing systems are forcing these librarians to act without waiting for the national effort. These efforts are exemplified by the work of one group of librarians, described below. What has now come to be known as the "Ad Hoc Discussion Group on Serials" had its beginnings in an informal meeting during the American Library Association's conference in Las Vegas last June. You will also hear this discussion group referred to as the "Toronto Group."
This is because its prime mover has been Richard Anable of York University, Toronto, and because the first formal meeting occurred in that city. The expenses of the Toronto and subsequent meetings have been borne by the Council on Library Resources, and Council staff have been involved in each meeting. A fuller exposition of the origins, purposes, and plans of the Toronto Group has been written by Mr. Anable for the Journal of Library Automation. It appeared in the December 1973 issue. Quoting from Anable: "At the meeting [in Las Vegas] there was a great deal of concern expressed about: 1. The lack of communication among the generators of machine-readable serials files. 2. The incompatibility of format and/or bibliographic data among existing files. 3. The apparent confusion about the existing and proposed bibliographic description and format 'standards'." End of quote. The Toronto Group agreed that something could and should be done about these problems. If nothing else, better communications among those libraries and systems creating machine-readable files would allow each to enhance its own systems development by taking advantage of what others were doing. As the discussions progressed, several points of consensus emerged. Among them were:
1. The MARC Serials Distribution Service of the Library of Congress and the National Serials Data Program together were not building a national serials data base in machine-readable form fast enough to satisfy the requirements of developing library systems. This systems development was, in several places, at the point where it could no longer wait on serials data base development at the national level as long as progress remained at the current rate.
2. The MARC serials format developed at LC offered the only hope for machine format capability. Every system represented planned to use it.
For the purpose of building a composite data base outside LC, the MARC serials format would probably require minor modification, principally by extension. These extensions could and should be added on so as to do no violence to software already developed to handle MARC serials.
3. There existed some differences between the LC MARC serials format and that used by the National Serials Data Program. These differences arose from several circumstances. For example, the MARC serials format predated the International Serials Data System (ISDS), the National Serials Data Program, and the key title concept. When these three came along, the requirement existed that the NSDP abide by the conventions of the ISDS. Since the key title is not yet a cataloging title, but is the title to which the International Standard Serial Number is assigned, it is natural that the approach to serial record creation by NSDP should be different from that of a library cataloging serials by conventional methods. A working group under the auspices of the IFLA Cataloguing Secretariat has devised an International Standard Bibliographic Description for Serials. The working group's recommendations are to be distributed for trial, discussion, and recommendation for change in February. When the ISBD(S) is accepted into cataloging practice, some of the differences in MARC usage and NSDP procedure will disappear. Others will still remain, and they must be reconciled. We cannot continue with two serial records, both of which claim to be national in purpose but which are incompatible with each other. A good exposition of the differences in these serials records from the point of view of the MARC Development Office is in an article by Mrs. Josephine Pulsifer in the December 1973 issue of the Journal of Library Automation.
4.
Major Canadian libraries are active in cooperative work on serials, and these two national efforts should be coordinated.
Several other circumstances bear on the problem. For example, the National Serials Data Program is a national commitment of the three national libraries. In addition to the funding from the three national libraries, there are excellent chances that the NSDP will receive funds from other sources to expedite its activities. The NSDP is responsible for the ISSN and key title and for relationships with the International Serials Data System. Ultimately, the ISSN and key title will be of great importance to serials handling in all libraries. For all of these reasons it is imperative that the activities of the NSDP be channeled into the comprehensive data base building effort described in this paper. When it was realized at the Council on Library Resources that the Toronto Group was serious and that a data base building effort would result, it was obvious that this had enormous significance for the Library of Congress and other library systems, because the result would be a de facto national serials data base. Accordingly, a paper was prepared and sent to LC, urging that an effort be made in Washington to coordinate the efforts of the MARC Serials Distribution Service, the National Serials Data Program, and this external effort. In addition, it was felt that LC should take a hard look at its own several serials processing flows and attempt to reconcile them better with each other and with the external effort. To do this, LC was urged to do a brief study of LC serials systems, using LC staff and one person from CLR. LC agreed, and the study is now very nearly complete. The written guidance given the study group members was quite specific. They were to study all serials flow at LC and make their recommendations based on what LC should be doing, rather than being constrained by what LC is doing.
The overall objectives of the study were to aim for the creation of serials records as near the source as possible and one-time conversion of each record to machine-readable form to serve multiple uses. Specifically to be examined were the serials processing flows of the Copyright Office, the Order Division, the Serial Record Division, New Serial Titles, and the National Serials Data Program. While all of this was going forward, the Toronto Group had some more meetings. OCLC was tentatively selected as the site for the data base building effort. It is understood by everyone that this is a temporary solution; eventually a national-level effort must be mounted which will provide a post-edit capability to bring the composite data base up to nationally acceptable standards. A permanent update capability is also required. This permanent activity, hopefully, will be based at the Library of Congress. OCLC was chosen as the interim site for several reasons, but especially for its proven capability to produce network software and support which will work. Within a very short time OCLC will have on-line serials cataloging and input capability which will extend to some two hundred libraries. No other system is nearly so far advanced. The Toronto Group has assured itself that the data record OCLC intends to use is adequate and is now working on the conventions required to ensure consistency in input and content, to include some recommendations for minor additions to the MARC serials format. During their deliberations, the Toronto Group realized that, to be effective, their efforts needed formal sponsorship, and discussions to this end were begun. Initially, several agencies were considered to be candidates for this management role.
Vari- ous considerations quickly narrowed the list down to the Library of Congress, the Association of Research Libraries, and the Council on Library Resources, and repre- sentatives of these three met to discuss the matter further. During the discussions, CLR was asked to assume the interim management responsibility until a perma- nent arrangement could be worked out. CLR was selected because, as an operat- ing foundation under the tax laws, it can act expeditiously in matters of this kind. CLR can also deal with all kinds of li- braries and has no vested interest in any particular course of action. Meanwhile, certain institutions in the Toronto Group had indicated that they were ready to pledge $10,000 among themselves for the specific purpose of hir- ing Mr. Anable as a consultant to continue his coordinating activities. The group asked CLR to act as agent to collect and disburse these funds. CLR is ready to assume the initial re- sponsibility for the management of this cooperative data base building effort, if that is the will of the leadership in the li- brary community. CLR is prepared to commit one staff member full time to the project who is well versed in the machine handling of MARC serials records. This is Mr. George Parsons, and other staff members will assist as appropriate. Mr. Anable has agreed to act as a consultant to help coordinate these activities. CLR would aim for the most complete, accu- Technical C01nmunications 63 rate, and consistent serial record in the LC MARC serials format which can be had under the circumstances. During the effort, CLR will act as the point of con- tact between OCLC and the participating libraries, assisting in negotiating contracts and other agreements as required. The composite data base will be made avail- able to all other libraries at the least pos- sible cost for copying. 
Initially at least, the costs of this effort will have to be shared by the participating libraries, since no additional funds are presently available. The goal is to build 100,000 serial records the first year, another 100,000 the second year, and design and implement the permanent mechanism the third year, while file-building continues.

As the project gets under way, it will work like this: a set of detailed written guidelines for establishing the record and creating the input will be promulgated, and agreement to abide by them will be a prerequisite to participation. Selected libraries with known excellence in serial records will be asked to participate; others may request participation. Those selected who already have or can arrange for terminals on the OCLC system will participate on line. This is the preferred method, but it may be possible to permit record creation off line, such records to be added to the data base in a batch mode. It is very difficult to merge serial files from different sources in this way, so an attempt will be made to find a large serials data base in machine-readable form for use as a starting point. This file would be read into the OCLC system. A participating library wishing to enter a record would first search to see whether it existed in the initial data base. If a record is found, it would be updated insofar as this is possible, within the standards chosen for the system. It may be further updated by other participants, still within the system standards, but at some point update on a record in the system will reach a point of diminishing returns and the record will remain static until a post-edit at the national level can be performed. These records will be for use as their recipients see fit, but their prime purpose is to support the development of automated serials systems while eliminating duplication of effort.
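The search-first contribution workflow just described can be summarized in a short sketch. This is purely an illustration of the stated policy, not OCLC's actual system: the record structure, the `meets_standards` check, and the `frozen` flag are invented names.

```python
from dataclasses import dataclass, field

@dataclass
class SerialRecord:
    key: str                  # e.g. a key title or ISSN
    fields: dict = field(default_factory=dict)
    frozen: bool = False      # update has reached the point of diminishing returns

def contribute(record, data_base, meets_standards):
    """Search first; update an existing record within system standards,
    or create a new one; frozen records wait for the national post-edit."""
    if not meets_standards(record):
        return "rejected"                      # input conventions are a prerequisite
    existing = data_base.get(record.key)
    if existing is None:
        data_base[record.key] = record         # new record enters the composite base
        return "created"
    if existing.frozen:
        return "held for national post-edit"
    existing.fields.update(record.fields)      # update insofar as this is possible
    return "updated"
```

A batch-mode participant would, under this sketch, simply call `contribute` once for each off-line record merged into the composite data base.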
Details of how to flag these records in the OCLC data base as they are being created by this effort will be worked out, as will be the relationship between this effort and the rest of OCLC activities. CLR will, from time to time, report progress to the community. It would be the hope of CLR that the Toronto Group will continue to assist in the technical and detailed aspects of the project. In addition, and after consultation with the appropriate people, an advisory group will be appointed to advise CLR in this effort.

Lawrence Livingston
Council on Library Resources

INPUT

To the Editor:

Re: file conversion using optical scanning at Berkeley and the University of Minnesota discussed by Stephen Silberstein, JOLA Technical Communications, December 1973.

It is rewarding to find someone who has actually read in detail one's published work (Grosch, A. N. "Computer-Based Subject Authority Files at the University of Minnesota Libraries"). I generally agree with Mr. Silberstein's observations regarding the use of optical scanning for library file conversion. However, several points were raised by Mr. Silberstein on which I feel further comment is needed.

Perhaps in my article I should have cautioned the reader that when developing procedures and programs for the CDC 915 page reader, there is a great variance in these machines depending upon:
1. How early a serial number unit, i.e., vintage of machine,
2. What version of the software system GRASP is being used,
3. What degree of machine maintenance is performed, and
4. What kinds of other customers are using the scanner.

It was our misfortune to have a CDC 915 page reader that had many peculiarities about it which could or would not be resolved by a maintenance engineer. In addition it was not heavily used and what use it did receive was mostly nonrepetitive conversion jobs dealing mostly with mailing address file creation and freight billing.

In our initial testing we tried to use various stock bond paper and had various reading difficulties. In talking with others who had used this particular machine we found that the choice of paper stock was critical on this scanner. I might add that we did not actually use $400 worth of paper on this as I sold half of the stock we had ordered to another user locally who was going to use this device.

It might be worth mentioning that we had a failure of a potentially large conversion project reported to us. This project tried to use this equipment but could not create a suitable input format because of a specific uncorrected peculiarity of not being able to read lines of greater than six inches without repeated rejects. We were aware of this from our experience, which is why we kept our line short using the ro to terminate reading of the line at the last character position. Also our input was double spaced, not single spaced as you seem to infer in your comments.

With this particular device we also found that the format recognition line was easily lost, necessitating greater time spent in re-running the job. Therefore, even though this was a great commission of sin on our part according to Mr. Silberstein, I then must plead guilty to using expedient methods to turn a bad situation into an acceptable one. I might also point out that this solution had been employed at various times by some past users we contacted. In fact, I have later found out that occasionally such a technique has been resorted to in one of our other local user installations on a much newer machine.

I do not wish to imply that our conversion achieved maximum throughput but that in any case it was a cost effective way to proceed. With a small file conversion such as this one which is to be done on a one-shot basis, it seemed foolish to me to spend much time optimizing, but rather to find a way that worked as our difficulties were encountered. If this had to be a continuing job we would have had to get a better maintained scanner and invested more time and money into the project. I take the view that we wish to couple modest human costs with modest projects and reserve for greater projects of a continuing nature more optimized procedures.

I agree that file cleansing is undoubtedly the most costly operation but I cannot say by just what amount since my responsibilities did not include such work. This was later performed by our Technical Services Department.

Our general point in writing about this project was to convey our broad experiences using this technique on a subject authority system as we had not seen such use reported in the literature previously. I would hope your comments and mine here serve to illustrate that one's systems problems must be solved in light of the conditions and not always according to what we term the best theory or practice. To this end I hope others will profit from both of our comments.

Audrey N. Grosch
University of Minnesota Libraries

BOOK REVIEWS

Computer Systems in the Library: A Handbook for Managers and Designers. By Stanley J. Swihart and Beryl F. Hefley. A Wiley-Becker & Hayes Series Book. Los Angeles: Melville Publishing Company, 1973. 388p.

Once every year or two, either in England or the United States, a book appears attempting to explain computer systems to librarians. This book, Computer Systems in the Library, is the most recent of the introductory texts. It starts off with a chapter entitled "Why Automate?" which skims very lightly and uncritically over the often-repeated reasons for using computers. In this instance, money is included as a reason to automate, for we are told that "When properly planned, unit operating costs are normally reduced when a function is automated."
Automation's impact on the library's research and development budget is not discussed.

The book then proceeds to the six chapters which occupy the bulk of the book. They cover the automation of six major library functions: catalog publication, circulation, acquisitions, cataloging, catalog reference services, and serials. Each chapter consists of a description of one or two apparently existing automated systems, with a complete discussion of how the system functions, what files are involved, the data in each file, coding and formats used in the files, and reproductions of various output products from each file. Unfortunately, we are not told where each of these systems exists, and the systems often appear to use techniques that are suitable only for very small libraries. For example, in the circulation system that is described, a packet of prepunched book cards is to be carried in the book; each time the book is charged or discharged one of the cards is removed, with the last card serving as a signal to create a new deck of cards. Little mention is made of the data collection terminals that are so commonly used in automated circulation systems, with the result that the description is very closely linked to a single system, with little opportunity for the reader to compare various methods or techniques of information handling.

The latter part of the book addresses itself to some general problems, including the interlibrary sharing of data and programs; the planning, implementation, and control of automation projects; and brief discussions of input and output problems, the protection of records, and some considerations in choosing hardware. Three appendixes offer a 2,500-word exclusion list for KWIC indexes, a set of model keypunching rules for a corporate library, and a thirty-three-item bibliography in which the majority of works listed were published between 1964 and 1968.

A major weakness of the book seems to be its lack of critical focus. Library automation problems are treated as being not particularly difficult; in fact, "the authors can see no serious or major disadvantages to automation in libraries. The situation," we are told, "can be compared with the disadvantages of using typewriters or telephones." This reviewer finds it difficult to know what sort of audience these words, and the entire book, are addressed to. Though subtitled "A Handbook for Managers and Designers," it would be an inexperienced manager indeed who needed to be told that "In its mode of operation, a keypunch is quite similar to a typewriter. A key must be struck for each character . . . ," or that "The catalog master file may be stored on magnetic tape reels or on magnetic disks." The experienced librarian, on the other hand, will not be pleased to learn that "many libraries with computer systems have given up the Library of Congress [filing] system for Mc/Mac and have placed Mac in order between Mab and Mad, and Mc between Mb and Md." Nor will anyone associated with libraries be pleased to discover that "computer centers not only can, but frequently do, lose information. From time to time complete files are erased. There is almost no way to ensure that information will not be inadvertently erased."

The librarian who is already involved in automated systems will not need this book; the librarian who wishes to learn about automation and the systems analyst who needs to understand library systems will do well to read other sources in addition to this one.

Peter Simmons
University of British Columbia

The Metropolitan Library. Edited by Ralph W. Conant and Kathleen Molz. Cambridge, Mass.: M.I.T. Press, 1972. 333p. $10.00.

The editors describe this book as a sequel to the important Public Library and the City (1965), also published by M.I.T. Press. The focus again is on the concerns of metropolitan public librarians, combining the viewpoints of specialists from library and social science disciplines. Of the eighteen papers included, only three, by John Trebbel, John Bystrom, and Kathleen Molz, concentrate on the implications of present and future technology on public library service. Their papers offer a general, if hard-nosed, approach to the need for specific research into the economic, behavioral, professional, and technological barriers impeding the advent of the automated millennium. Micrographics, reprography, computers, facsimile transmission, telecommunications hardware, and technology are considered essential components of information transfer with which libraries must become compatible, and comfortable.

The imperative need for and conduct of long-range research in telecommunications is outlined by Bystrom, including aspects of research necessary for both a national telecommunications network linking all types of libraries and the local use of community cablevision by individual library outlets.

The three authors devote considerable head-shaking to the chilling reality of financing technological adaptations and innovations in libraries, the "snake in Eden" according to Trebbel. Governments, specifically national governments, are cited as the logical sources of the enormous sums required for automated library and information services of whatever kind.

Molz warns repeatedly and forcefully that libraries, while not discarding the book, must change their priorities. Continued dependence on print as the prime information transfer medium is insupportable. The public library must adapt to a multimedia world.

None of the foregoing is new to information scientists or specialists in automation, but as concerned participants in the knowledge business they should find these papers of general interest.

Lois M.
Bewley
University of British Columbia

Who Will Steer the Ship?

During 1973, the existence of two study groups sponsored by the Council on Library Resources became informally known in the library automation community. By the time of the 1974 ALA Midwinter Meeting, the lack of formal identification of these groups, their goals, and their relation to CLR provoked some spontaneous and possibly faulty responses from the ALA members present. In the March 1974 issue of JOLA, Ms. Ruth Tighe analyzed with perception and accuracy the behavior of information scientists attending that meeting. However, we feel that further attention should be paid to the precise situation in which we find ourselves.

The Council on Library Resources has, for eighteen months, funded a small group with the acronym of CEMBI. Informal communication had it that this group of library automation experts originally was to devise a standardized subset of the MARC monograph format; however, a full year passed without public announcement of this work. Unable to come to an agreement, the group seems to have turned to specific strategies for interchange of machine-readable bibliographic data. These goals are, of course, valid and worthy of pursuit.

CLR also announced at Midwinter the intent to administer a system plan for large-scale serials conversion to create a national serials data base. This project draws upon the considerable efforts of the ad hoc "Toronto" group, which provided a status report of its efforts in the December 1973 issue of JOLA. In addition, an invitational conference was sponsored in April 1974 by CLR, to discuss national bibliographic control, with a small number of conference attendees and with a total absence of publicity.

There are three major problems inherent in the situation described above, affecting not only the library automation community but libraries as a whole.

First, there is no apparent justification for the air of secrecy which has surrounded CLR's direction of these worthy efforts. Surely it must be abundantly clear to all administrators these days that in order to implement a far-reaching program it is necessary to inform if not consult with the target population. Those librarians who are not associated with CLR do not necessarily have axes to grind or home-grown systems to foist upon the world. They do wish to be kept informed of discussion and developments which may eventually have a direct effect upon their work. While it is perfectly reasonable to foster technical progress in a difficult area of study by forming a closed working group of skilled professionals, there seems to be little gained by avoiding recognition of such a group.

Second, the approach of these projects has sidestepped all the existing channels of operation and communication which we have been striving for over a decade to create. Should CLR wish a certain task performed, it should be able to contract with the Library of Congress or to fund an existing ALA committee to carry out the work. Under the present circumstances, these established channels are likely to find their deliberations bypassed and superseded by these ad hoc groups.

Third, when determining issues (ad hoc standards for local input of MARC-like monograph and serials records) which are of long-range concern to many libraries, it is particularly important not to bias a development effort toward the needs of one type of library. The approach of the large research library, while important, is not the only vantage point from which to perceive the problems of nationwide bibliographic systems.

We recommend that the Council on Library Resources find an alternate method of accomplishing its goals: a method which includes provision for adequate communication and which takes advantage of existing channels. Such a method, for the CEMBI group, might be to declare its deliberations to be a user-group standards proposal for submission to the RTSD/ISAD/RASD Representation in Machine-Readable Form of Bibliographic Information Committee (MARBI), the appropriate ALA committee. If CLR wishes more intensive review of this or any other proposal from MARBI, it could fund the necessary expenses for more frequent meetings of MARBI. An analogous method would establish the serials project as a funded program, with the desired task goals, within the Library of Congress, the National Serials Data Program, or an appropriate library union serials organization.

CLR has done many good deeds for the library world in its lifetime; it would indeed be unfortunate were it to inadvertently allow the growth of professional confusion and resentment that were evident at the Midwinter Meeting.

Susan K. Martin

A Simulation Model for Purchasing Duplicate Copies in a Library

W. Y. ARMS: The Open University, and T. P. WALTER: Unilever Limited. At the time this study was undertaken the authors were at the University of Sussex.

Provision of duplicate copies in a library requires knowledge of the demand for each title. Since direct measurement of demand is difficult a simulation model has been developed to estimate the demand for a book from the number of times it has been loaned and hence to determine the number of copies required. Special attention has been given to accurate calibration of the model.

INTRODUCTION

A common difficulty in library management is deciding when to buy duplicate copies of a given book and how many copies to buy. A typical research library has several hundred thousand different works; many are lightly used but all are potential candidates for duplication. The problem which we faced at Sussex University was how to obtain reliable forecasts of the demand for each title and to translate this into a purchasing policy.
At present Sussex spends between £10,000 and £20,000 ($22,000-$44,000) per year on duplicate copies, and as the university grows this amount is increasing steadily.

Because of the large number of books in a library relatively little data are available about each title. Records are kept of books on loan or removed from the library, but frequently these are the only routine data collected. Few large libraries even manage inventory checks. We therefore looked for a system that could be implemented with the minimum of data collection, preferably one based on existing records.

FORECASTS OF DEMAND

If the demand for a particular book is known, it is possible, though not necessarily easy, to determine how many copies of that book are needed to achieve a specified level of service, such as a copy being available on 80 percent of the occasions that a reader requires the book. Unfortunately demand cannot be measured directly, even retrospectively. Records of the number of times that a book is issued from the library contain no information about how many times the book was used within the library, nor how many readers failed to find a copy and went away unsatisfied. Since both these factors are extremely difficult to measure, one of the central parts of our work was to develop a method of estimating them from data readily available.

To forecast demand two lines of approach seemed reasonable: subjective estimation based on faculty reading lists; and forecasts based on the number of loans in previous years. In the past, Sussex Library has made extensive use of reading lists provided by faculty to decide how many copies to buy of each title. As the books most in demand are those recommended for undergraduate courses this seemed a sensible approach, though the number of copies required is not obvious even if the demand is known.

Webster analysed the effectiveness of these lists in predicting demand for specific titles and evaluated the purchasing rule being used, one copy for every ten students taking a course.1 Restricting his attention to books known to be in demand and marked in the catalog, he drew a random sample of 673 titles, about 4 percent of the books falling into this category. He compared the number of loans of each of these titles over a term with data from the reading lists supplied at the beginning of the term. As the library had made a special effort to obtain reading lists for all courses taught that term, he had data on the number and type of students taking each course, the importance given to each text, and the subject areas involved. Yet despite a thorough analysis of these data Webster was able to find very little relationship between observed demand and reading list information. His work shows that faculty at the university have remarkably little knowledge of the books that their students read. In the sample some books strongly recommended to large groups of students were hardly used and some of the most heavily used works appeared on no reading list. The results of this study are fascinating from an educational viewpoint but less satisfying as operational research.

The failure of this approach led us to predicting demand from records of the number of past loans. This divides into two parts: using the number of loans over a period to estimate what the total demand was during that period; and using this estimate of the demand in one period to forecast the demand in another. Various evidence suggests that the latter is a sensible thing to do. The main demand for heavily used books comes from undergraduate courses. Most faculty are loyal in their reading habits, recommending books they know rather than new ones, and each course tends to be repeated year after year with a syllabus that changes only gradually.
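As a concrete illustration of turning loan counts into a purchase decision, the sketch below applies the simple threshold rule examined in the next section, buying enough copies that roughly 95 percent of demand is met. The formula is from the source; the function name and all figures are invented for illustration.

```python
import math

def copies_required(n_loans, mean_off_shelf_days, sd_off_shelf_days, period_days):
    """Rule-of-thumb number of copies to satisfy ~95 percent of demand:
    n * (mu_s + 2 * sigma_s) / t, rounded up, where n loans were recorded
    in t days and each loan keeps the book off the shelf for mu_s days
    on average with standard deviation sigma_s."""
    return math.ceil(n_loans * (mean_off_shelf_days + 2 * sd_off_shelf_days)
                     / period_days)

# Invented figures: 30 loans in a 90-day term; each loan keeps the book off
# the shelf for 14 days on average, with a standard deviation of 7 days.
print(copies_required(30, 14.0, 7.0, 90))  # prints 10
```

A periodic batch run of such a function over every circulation record is exactly the kind of straightforward procedure described below, and exactly the kind whose simplifying assumptions the rest of the article questions.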
The use of past circulation to forecast future use is fundamental to a Markov model of book usage developed by Morse and Elston and tested with data from the M.I.T. Engineering Library.2 For our work we have used the number of loans in a given term to predict the demand in the corresponding term a year later.

Estimating the total demand in a period from the number of loans in that period is more difficult. This requires a model of the circulation system.

MATHEMATICAL APPROACH

Several attempts have been made to apply the methods of inventory control or queueing theory to the problem of buying duplicates. For example, Grant has recently described an operational system using the simple rule that the number of copies required to satisfy 95 percent of the demand is n(μs + 2σs)/t, where n is the number of times that the book is issued during a period of t days and μs and σs are the mean and standard deviation of the time that each book is off the shelf when on loan.3 This type of approach has the advantage of being straightforward to use. Periodically a simple computer program analyzes the circulation history of each book in the library and prints a list of books requiring duplication. However, the method suffers from difficulties both mathematical and practical. To obtain the simple mathematical expression given above, several simplifying assumptions have to be made. For example, the expression ignores use of a book within the library, and identifies demand in a period with the number of loans within that period. Practical difficulties in arriving at a more exact mathematical expression are discussed in the next section.

DIFFICULTIES IN CONSTRUCTING A MODEL

The following are the main difficulties that we found in constructing a model, either mathematical or using simulation:

1.
The most useful measure of the effectiveness of a duplication policy is satisfaction level, the proportion of readers who on approaching the shelves find a copy of the book there, but satisfaction level is almost impossible to measure directly since, although some unsatisfied readers ask that the book be held for them, most go away without comment. More or less equivalent is the percentage time on shelf, the proportion of time that at least one copy of the book is available. This can be measured directly, though a visit to the shelves is needed, and was found useful in validating our model. If the underlying demand is random these two measures of effectiveness have the same value.

2. Use of books within the library is also difficult to measure. At Sussex, as in most libraries, data are available only on the number of times that a book is lent out of the library. If a reader does not find a copy on the shelves or if he uses a book within the library but does not take it away then no record is generated. Since various studies, notably that of Fussler and Simon, suggest that the amount of use within libraries often exceeds the number of loans recorded by a factor of three or more, if the number of loans is used to estimate demand a reasonable knowledge of within-library use is essential.4

3. The number of copies required to achieve a specified satisfaction level does not go up linearly with demand. Since a reader is satisfied if he finds a single copy on the shelves, proportionately fewer duplicates are needed of the books most in demand. At Sussex more than twenty copies are provided of several books and this nonlinearity is very noticeable.

4. The demand for a title is erratic, changing from term to term, from week to week, and from day to day, even if the mean demand is constant.
Over a period such as a term three different effects might be ex- pected: a background random demand independent of university courses; sudden peaks when a book is required for a course taken by several students; and feedback caused by previously unsatisfied read- ers returning. 5. The circulation of books is surprisingly complicated. At Sussex some books are designated short term loan and can be borrowed for up to four days only; the remainder are long term loan books and can be borrowed for up to six weeks. Circulation data show that the time for which a book is off the shelf is not the same as the period for which it is lent, but has a heavily skewed distribution. Few books are returned until near the due date; just before the book is due back there is a peak when most books are returned but many become over- due and the tail of the distribution dies away slowly. SIMULATION As these various factors seemed too complex to derive usable mathe- matical results, we decided to use computer simulation of the book circula- tion. Simulation of book circulation is not new. In particular it has been used at Lancaster University by Mackenzie et al. to decide loan periods.5 Their report includes a good description of the general approach. The object of our simulation was to model the circulation process so that we could study the relationship between three groups of parameters: 1. 0 bserved data Number of copies available Number of loans 2. Total underlying demand 3. Measures of effectiveness Satisfaction of level Percentage time on shelf. The results obtained from any simulation are only as accurate as the values given to the variables used to calibrate the model. As several of these values were not known at all accurately when the work was begun, special efforts were put into careful validation and calibration of the mod- 76 ]oumal of Libmry Automation Vol. 
7/2 June 1974 braries often exceeds the number of loans recorded by a factor of three or more, if the number of loans is used to estimate demand a reasonable knowledge of within-library use is essentiaJ.4 3. The number of copies required to achieve a specified satisfaction lev- el does not go up linearly with demand. Since a reader is satisfied if he finds a single copy on the shelves, proportionately fewer duplicates are needed of the books most in demand. At Sussex more than twenty copies are provided of several books and this nonlinearity is very no- ticeable. 4. The demand for a title is erratic, changing from term to term, from week to week, and from day to day, even if the mean demand is con- stant. Over a period such as a term three different effects might be ex- pected: a background random demand independent of university courses; sudden peaks when a book is required for a course taken by several students; and feedback caused by previously unsatisfied read- ers returning. 5. The circulation of books is surprisingly complicated. At Sussex some books are designated short term loan and can be borrowed for up to four days only; the remainder are long term loan books and can be borrowed for up to six weeks. Circulation data show that the time for which a book is off the shelf is not the same as the period for which it is lent, but has a heavily skewed distribution. Few books are returned until near the due date; just before the book is due back there is a peak when most books are returned but many become over- due and the tail of the distribution dies away slowly. SIMULATION As these various factors seemed too complex to derive usable mathe- matical results, we decided to use computer simulation of the book circula- tion. Simulation of book circulation is not new. In particular it has been used at Lancaster University by Mackenzie et al. to decide loan periods.5 Their report includes a good description of the general approach. 
The object of our simulation was to model the circulation process so that we could study the relationship between three groups of parameters:

1. Observed data: number of copies available; number of loans.
2. Total underlying demand.
3. Measures of effectiveness: satisfaction level; percentage time on shelf.

The results obtained from any simulation are only as accurate as the values given to the variables used to calibrate the model. As several of these values were not known at all accurately when the work was begun, special efforts were put into careful validation and calibration of the model.

Simulation Model / ARMS and WALTER

A separate study was made for a small sample of books, to compare the percentage time on shelf estimated by the simulation with the actual time for which a copy was available, found by looking at the shelves. The results of this study were used to check the amount of use within the library. By this means we were able to verify the simulation model and calibrate it to a highly satisfactory level of accuracy.

DESCRIPTION OF PROGRAM

The basic layout of the simulation is shown in Figure 1. This is a time advance model with a period of one day. The program has been coded in FORTRAN and, running on the ICL 1904A computer at Sussex, takes about one second of machine time to simulate two years. This speed has enabled us to try a wide range of values for most parameters and to experiment with a variety of distributions of arrival times and book return dates.

1. Satisfaction level
At the beginning of each day the number of demands for that day is generated. The satisfaction level is taken as the proportion of these requests which can be satisfied from the books left on the shelf from the previous day and those returned during the simulated day.

2. Within-library use
The proportion of use that takes place within the library was a key parameter in calibrating the model.
The first version of the simulation program assumed a figure of 25 percent use within the library. This was based on a small survey of the type of books being studied, standard texts used for undergraduate courses. The weakness of this survey was that it used a count of those books that were left lying in the library at the end of the day and did not make sufficient allowance for books reshelved by readers or by library staff during the day. The validation experiment showed a consistent difference between predicted and observed percentage time on shelf which could be corrected by changing the value of the within-library use parameter to 60 percent.

3. Distribution of demand
Two distributions of demand have been used: Poisson arrivals with a specified mean, and a step demand superimposed on a Poisson process. In both cases provision is made for a proportion of unsatisfied readers to return later. As the effect of this feedback is to introduce sharp peaks of demand, the two distributions have proved surprisingly similar in the results produced, and most of the runs of the program have been done with random demand.

A recent survey showed that 69 percent of readers who fail to find a book intend to return, but we do not know how many actually come back nor what the time interval is before they return.6 The simulation proved to be insensitive to moderate changes of these parameters, and for most runs 25 percent of unsatisfied readers were deemed to return after a delay which averaged two days.

Journal of Library Automation Vol. 7/2, June 1974

Fig. 1. Outline flowchart of simulation program (advance clock one day; add returned books; generate requests; generate return date; reader return date)

4. Period for which the book is off the shelf
The simulation allows for a book to be borrowed within the library, in which case it is available again the next day, or to be lent from the library.
If the book is lent, the return date is generated from one of two histograms which respectively refer to books available on short and long term loan. These histograms were derived from an analysis of all books returned during one week in autumn 1970, modified to reflect changes in the circulation system.

VALIDATION EXPERIMENT

Although the structure of the simulation is fairly straightforward, several parameters used in the model have been estimated indirectly. Validation of the model took two forms. Firstly, we ran the program with a wide range of values for the main parameters to see which most influence the results. Secondly, a small study was set up to measure the percentage time on shelf of a number of books. For each book, the availability was estimated by the simulation from the number of loans during the same period.

Twenty-eight books known to be in heavy demand were selected, half in physics and half in sociology. Over a period of eight weeks the shelves were inspected once per day, at random times during the day, to see if a copy was available. The number of loans of each copy of each book during the period was noted, and the library staff carried out a thorough check to determine whether any copies shown in the catalog had been lost, stolen, or had their loan category altered. The simulation was used to estimate the percentage time on shelf and this was plotted on a graph against the observed percentage.

Figure 2 shows the graph for the original values of the parameters. In this graph the x axis shows the percentage time on shelf predicted by the simulation; the y axis shows the percentage observed. If the model were perfect the points would lie near the line y = x, deviations being caused by y being a random variable. The graph in Figure 2 is clearly convex downwards, showing a consistent error in the model with these values of the parameters.
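The daily loop just described (add returned books, generate the day's requests, satisfy what the shelf allows, and let a fraction of unsatisfied readers try again later) can be sketched in a few lines. The article's program was written in FORTRAN; the following is a minimal Python stand-in with function names of our own choosing, and the loan-period histogram values are illustrative placeholders rather than the histograms derived from the Sussex circulation data.

```python
import math
import random

def poisson(rng, lam):
    """Poisson sample by Knuth's method (adequate for the small means used here)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate(days, copies, mean_demand, p_within=0.60,
             p_return=0.25, return_delay=2, loan_hist=None, seed=0):
    """One run of the daily time-advance loop; returns the satisfaction level."""
    rng = random.Random(seed)
    # Illustrative stand-in for the article's return-date histograms
    # (number of days a lent copy stays off the shelf).
    loan_hist = loan_hist or [7] * 5 + [14] * 3 + [28] * 2
    on_shelf = copies
    returns = {}    # day -> copies coming back on the shelf that day
    deferred = {}   # day -> previously unsatisfied readers trying again
    satisfied = requested = 0
    for day in range(days):
        on_shelf += returns.pop(day, 0)              # add returned books
        demands = poisson(rng, mean_demand) + deferred.pop(day, 0)
        for _ in range(demands):                     # generate requests
            requested += 1
            if on_shelf > 0:
                satisfied += 1
                on_shelf -= 1
                # Within-library use: copy is back on the shelf the next day.
                out = 1 if rng.random() < p_within else rng.choice(loan_hist)
                returns[day + out] = returns.get(day + out, 0) + 1
            elif rng.random() < p_return:            # reader will come back
                back = day + return_delay
                deferred[back] = deferred.get(back, 0) + 1
    return satisfied / requested if requested else 1.0
```

With five copies and a light demand the satisfaction level comes out high; with one copy and heavy demand it collapses, which is the behavior the validation experiment probes.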
Knowing that the simulation is sensitive to the parameter giving the proportion of use that takes place within the library and that our esti- mate of its value was not precise, a series of graphs were prepared varying this parameter. Figure 3 shows the same observations plotted against pre- dictions assuming 60 percent use within the library, the value which best predicts the observations. This graph is much closer to being linear than Figure2. The next question is whether the nonlinearities in Figure 3 are the type to be expected from y being a random variable. A very rough calculation helps to answer this question. If we make the dubious assumption that 80 I ournal of Lihm1'y Automation Vol. 7/2 June 197 4 Observed availability (percent time on shelf) 100 50 25 o~----------~~----------~50~----------~75 ____________ -JlOO Predicted availability (percent time on shelf) Fig. 2. Observed percentage time on shelf against predicted ( 25 percent use within library) availability of a copy on a given day is independent of the days before and afterwards, then, for x given, y should be approximately normally distributed with mean x and variance x( 1 - : ) , where n is the number of days in the study (forty). If this calculation were exact, 95 percent of the observations of y would lie within two standard deviations of x, but, since the assumption of independence is definitely false, we would expect the number of observations which fall within the range to be less than 95 per- cent. The curves y = x ± 2 { x(l- x)/n} ¥. Observed availability (percent time on shelf) 100 75 50 25 Simulation Model/ ARMS and WALTER 81 Predicted availability (percent time on shelf) Fig. 3. Observed percentage time on shelf against predicted ( 60 percent use within library) with 95 percent probability curves have been added to Figure 3. Two points lie well off all graphs and cannot be explained except as the result of books being stolen or lost during the period of the study. 
Of the remaining twenty-six, all but three lie within the curves. This shows that the simulation model as finally calibrated gives a very reasonable description of the situation.

OPERATIONAL EXPERIENCE

The results of this simulation have been used by library staff since the middle of 1971, initially on an experimental basis. A two-stage process is involved. From the computer based circulation system can be found the number of times that each short term loan copy has been circulated. From these figures the library staff can estimate the demand for a title over a given period. Once the demand has been estimated, the staff can use the simulation again to determine how many copies would have been required to have achieved a specified satisfaction level, perhaps 80 percent. If fewer copies are held by the library, orders are placed for extra copies. At present these procedures are done manually using tables, but the possibility exists of modifying the computer system to identify those titles which need extra duplication. The actual decision to purchase needs to be made by library staff who can take account of factors not included in the simulation, such as price and changes of undergraduate courses.

CONCLUSION

Although this work was carried out during 1971, we shall have little operational experience of the method in action until the computer circulation system is reorganized. In the past, different copies of the same book have been processed entirely independently, meaning that the total number of loans of a given title can only be found by manually adding up the number of loans of each copy. In the revised computer system this will be done automatically. Experience will probably show that the best procedure combines use of the simulation model with reading lists and the skill of a librarian.
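The two-stage procedure described under OPERATIONAL EXPERIENCE (estimate demand from loan counts, then find the smallest number of copies reaching a target satisfaction level such as 80 percent) can be illustrated with a closed-form stand-in for the simulation tables. The sketch below uses the Erlang loss formula, which assumes unsatisfied readers simply leave (no feedback), so it only approximates the tables the staff actually used; the function names and example figures are ours.

```python
def erlang_b(copies, load):
    """Blocking probability of a loss system: the chance every copy is off
    the shelf when a reader arrives.  load = demand rate x mean time a
    borrowed copy is off the shelf (same time units)."""
    b = 1.0
    for k in range(1, copies + 1):
        b = load * b / (k + load * b)
    return b

def copies_needed(demand_per_day, days_off_shelf, target=0.80):
    """Smallest number of copies whose satisfaction level reaches target."""
    load = demand_per_day * days_off_shelf
    c = 1
    while 1.0 - erlang_b(c, load) < target:
        c += 1
    return c

# E.g. one request per day, copies off the shelf six days on average:
copies_needed(1.0, 6.0, target=0.80)
```

With these example numbers, doubling the demand rate raises the copies needed from 7 to 12, less than double, which illustrates the nonlinearity noted in point 3 above.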
One possible feature of a computer based system is that it could automatically indicate which books appear to require duplication.

The method used here would seem to apply equally well to other libraries. Naturally the circulation patterns of other libraries are different, which means that a different simulation would be needed, but this work has shown that it is possible to calibrate a simulation accurately enough to examine the circulation of individual books.

ACKNOWLEDGMENTS

We would like to thank the many members of the University of Sussex library staff who have helped at various stages, particularly P. T. Stone who was closely involved throughout.

REFERENCES

1. P. F. Webster, Provision of Duplicate Copies in the University Library, Final year project report (University of Sussex, 1971).
2. P. M. Morse and C. R. Elston, "A Probability Model for Obsolescence," Operations Research 17:36-47 (1969).
3. R. S. Grant, "Predicting the Need for Multiple Copies of Books," Journal of Library Automation 4:64-71 (June 1971).
4. H. H. Fussler and J. L. Simon, Patterns in the Use of Books in Large Research Libraries (Chicago: Univ. of Chicago Pr., 1969).
5. A. G. Mackenzie et al., Systems Analysis of a University Library, Report to OSTI on Project Sl/52/02, 1969.
6. J. Urquhart, private discussion, 1971.

Automated Periodicals System at a Community College Library

Vivian HARP: Staff Analyst, Illinois Bell Telephone Company, Chicago, and Gertrude HEARD: Serials Technician, Moraine Valley Community College Library, Palos Hills, Illinois. At the time of writing, Ms. Harp was Assistant Librarian at the Moraine Valley Community College Library.

Automated systems need not be extensive to save time and improve efficiency. Moraine Valley's off-line operation, based on a file of 715 periodical titles, generates renewal orders, sends claims, and records subscription histories.
BACKGROUND

Moraine Valley Community College (MVCC) is a two-year institution serving southwest Cook County, Illinois. It opened in September 1968 and now has an enrollment of 3,468 students. The library maintains 715 paid and free periodical subscriptions.

Because of the small staff size, periodicals had originally been handled by the cataloger. Two subscription agencies were tried and found unsatisfactory. Problems with overlapping subscriptions and lapsed subscriptions which were never picked up became quite severe, and time spent tracking down problems approached that needed to handle orders and renewals independently. Periodicals were transferred to the public service librarian when the staff was expanded. As untangling of agency problems proved more and more time-consuming, a serials technician was assigned to maintain subscriptions, straighten out old problems, check in periodicals, and handle claims.

For each subscription, bibliographic and order information and MVCC holdings were entered on a three-by-five-inch history card; on the verso were records for each renewal of purchase order number, subscription length, cost, and subscription dates. Magazines were and are checked in on Kardex files; the Kardex card also holds the latest publisher's mailing label. A checklist is used to ensure that each new title has a Kardex card and storage box prepared and a listing in the public holdings record, plus any special instructions for routing. Form letters are used for original enquiries to the publisher regarding availability and cost.

When a subscription was renewed, data from the "current" section of the history cards had to be transferred to the back and updated information entered on the front. A worksheet was made up to give all the necessary renewal information to the typist and the actual purchase order typed from that worksheet.
Once the purchase order was completed, its number was marked on the worksheet and the history card. Worksheets were kept on file to serve as easily accessible copies of the purchase orders for use in correspondence, since the library copy of the purchase order was tied up in the accounting process.

Since many publishers do not provide renewal notification (in our case these renewals amounted to over 40 percent of our orders), various methods to provide ourselves with notification of approaching expirations were attempted, including the use of colored plastic jackets in the history card file and division of the file by date. Failure in this area was the chief weakness of the manual system. The cards were bulky and required much handling. Creating a holdings list destroyed any semblance of the color-coded order. If a card were removed for use in correspondence, it could be misplaced or misfiled and therefore not be considered for renewal at the proper time. Duplication of paperwork and repeated erasures and transfers of information on the cards were other drawbacks of this operation.

INTRODUCTION TO AUTOMATED SYSTEM

It was hoped that an automated system would indicate approaching expirations and simplify the actual renewal procedure. The following specific objectives were set up:

1. To provide advance notice of subscriptions due for renewal even if a renewal notice were not received.
2. To produce a purchase order, or a replica providing on a single sheet all data needed for renewal.
3. To produce a list of periodical holdings that included the history of all renewals.
4. To claim missing issues of paid and free subscriptions.
5. To produce fiscal and subject area cost reports that would facilitate budget evaluation.
Two special problems had to be given consideration: (1) the college has a complicated check approval system requiring initiation of purchase orders two months before the check is needed; and (2) the automated system needed the capability to handle standing orders, government documents (depository and agency items), free materials, and titles held only on microfilm as well as ordinary renewals. These special items make up almost 30 percent of the total subscriptions, and to maintain a parallel manual system for them would be unsatisfactory.

METHOD AND MATERIALS

Selecting data elements for inclusion was based as much as possible on the types of output reports desired. A simple holdings list as an end in itself was felt to be wasted effort, but as a by-product of the master file we wanted to generate public holdings lists twice a year. Necessary data were readily available from the three-by-five-inch cards, with one addition: a unique number was assigned to each title.

The data necessary would require more than one input card to produce the type of reports we wanted; therefore, as data elements were being considered, card codes and item numbers were also assigned to identify information for programming purposes. Space was allocated to each field, using information recorded on the history cards, and the coding of certain fixed and variable fields was decided upon. The card formats are outlined in Figure 1. Figures 2 and 2A list the codes and their meanings.
Fig. 1. Field Descriptions

All cards: columns 1-5, Unique number; column 6, Card code.

Card 1
cc 7      Type of material (coded)
cc 8      X (cancel) or H (hold) (coded)
cc 9-66   Title
cc 67     Type of subscription (coded)
cc 68     How to pay (coded)
cc 69     Years subscription runs
cc 70-71  Account charged (coded)
cc 72-76  Cost
cc 77-80  Renewal date

Card 2
cc 7-15   Invoice number
cc 16-24  Periodical holdings
cc 25-33  Microfilm holdings
cc 34-39  Purchase order number
cc 40-48  Subscription length data
cc 49-51  Frequency (coded)
cc 52-74  Indexing (coded)
cc 75-79  Blank
cc 80     Method of payment

Card 3
cc 7-80   Publisher's name

Card 4
cc 7-40   Publisher's street address
cc 41-43  Subject code (coded)
cc 44-49  Purchase order date
cc 50     Blank
cc 51-75  Publisher's city, state
cc 76-80  Publisher zip code

Cards 5, 6, 7, 8 & 9
cc 7-80   Publisher's mailing label data

Card A
cc 7-80   Claim information

Card B (History)
cc 7-12   Purchase order number
cc 13-17  Date history transferred
cc 18-26  Dates subscription ran
cc 27-31  Cost
cc 32     No. years subscription ran
cc 33-80  Invoice number

Card C
cc 7-80   Comments

Card 1
CC 7, Type of material: P-Periodicals; I-Index; V-Vertical file; M-Microfilm; N-Newspapers; A-Membership (Assoc.); L-Librarians file.
CC 67, Type of subscription: N-New; R-Renewal.
CC 68, How to pay: R-Check request; I-Imprest fund; T-Invoice in triplicate.
CC 70-71, Account to be charged: L-Library; B-Biology; H-Humanities; P-Physics; SS-Social Science; HS-Health Science; T-Technology; BU-Business; E-Economics.

Card 2
CC 49-51, Frequency: S-Sunday only; D-Daily; D&S-Daily & Sunday; W-Weekly; Q-Quarterly; A-Annually; BIW-Every 2 weeks;

Fig. 2.
Coding Symbols and Meanings

(CC 49-51, Frequency, continued:) BIM-Every 2 months; SMO-Semimonthly; SAN-Semiannually; IRR-Irregular; 3/y-Three times a year; 5/y-Five times a year; 7/y-Seven times a year; 9/y-Nine times a year; 10/y-Ten times a year; 11/y-Eleven times a year.

CC 52-73, Indexing: 01-Index Medicus; 02-Applied Science and Technology; 03-Business Periodicals Index; 04-Education Index; 05-International Nursing Index; 06-Library Literature; 07-MLA Bibliography; 08-Nursing Literature Index; 09-Public Affairs Information Service; 10-Readers Guide to Periodical Literature; 11-Social Science and Humanities Index; 12-New York Times Index; 13-Art Index; 00-No Index.

CC 80, Method of payment: A-Payment enclosed; B-Payment and notice enclosed; C-Please invoice in triplicate; D-Payment and invoice copy enclosed.

Input transmittal forms were designed with the aid of the Information Systems staff to record information for use in keypunching. Forms shown in Figures 3, 4, and 5 illustrate the transmittal of information for keypunching. While customized data transmittal forms are available commercially, it was found just as satisfactory and much more economical to design our own forms and have them reproduced by college facilities.

Since our main consideration in choosing information for inclusion was

001 Accounting; 002 Aeronautics; 003 African Studies; 004 Afro-American; 005 Agriculture; 006 Anthropology and Archaeology; 007 Architecture; 008 Art; 009 Astronomy; 010 Automation; 011 Banking and Finance; 012 Bibliography; 013 Biological Sciences; 014 Boats and Boating; 015 Book Reviews; 016 Business and Industry; 017 Chemistry; 018 Cities and Towns; 019 Conservation; 020 Crafts and Hobbies; 021 Criminology and Law Enforcement; 022 Dance; 023 Dissident Magazines; 024 Economics; 025 Education; 026 Engineering; 027 English; 028 Entertainment; 029 Fire Fighting; 030 Fishing and Hunting; 031 Folklore; 032 Games and Sports; 033 General; 034 Geography; 035 Geriatrics; 036 German Language;

Fig. 2A.
Codes Used for Periodical Subject List

037 Government; 038 Health Science; 039 History; 040 Home; 041 Indexes and Abstracts; 042 Journalism; 043 Labor and Industrial Relations; 044 Library Periodicals; 045 Linguistics and Philology; 046 Literary and Political Reviews; 047 Literature; 048 Mathematics; 049 Men's Magazines; 050 Military; 051 Motion Pictures; 052 Music; 053 Newspapers; 054 Ornithology; 055 Philosophy; 056 Photography; 057 Physics; 058 Political Science; 059 Psychology; 060 Radio, TV, and Electronics; 061 Religion and Theology; 062 Romance Languages; 063 Science-General; 064 Slavic Languages; 065 Sociology; 066 Theatre; 067 Travel; 068 Traffic and Transportation; 069 Vocations and Vocational Guidance; 070 Women's Magazines; 071 Memberships; 072 Social Work

to overcome our renewal problems, we had to determine what information was needed for this purpose. If purchase orders were to be generated, a field to be used as a key would be required. The program would check the contents of this key to determine whether or not the subscription was due for renewal. Since the logical key seemed to be the expiration date, it was allowed a separate field (Figure 3, Item 009), even though this partially duplicated the subscription length field. (The subscription length field itself was to be kept intact for transfer to the history record.) A one-position field (Figure 3, Item 002) could be programmed to suppress printing of a purchase order, as in the case of a canceled subscription, or to keep the order in "hold" if a budget problem arose near the end of the fiscal year. Hold status would cause the order to be printed with the tag "Pay when authorized" to call attention to this status. Other fields, shown in Figures 4 and 5, were needed to maintain a complete file on the history of each subscription. Data necessary to maintain this file were the subscription length and cost fields (described above) and the addition of fields for the purchase order number (Figure 3, Item 013) plus the invoice number (Figure 3, Item 010).

Fig. 3. Library Periodical Data Transmittal Form (Sheet #1; sample entry for DUN'S REVIEW)
Fig. 4. Library Periodical Data Transmittal Form (Sheet #2)
Fig. 5. Historical Record Form
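Records coded in the fixed 80-column layouts of Figure 1 can be unpacked by simple column slicing. The sketch below covers Card 1 only; the field names are our own (the college's actual programs were written in BAL), and the sample card image is a reconstruction patterned on the DUN'S REVIEW entry visible in Figure 7, not an actual card from the file.

```python
# Column ranges from Figure 1 (1-based, inclusive), Card 1 only.
CARD1_FIELDS = {
    "unique_number": (1, 5),
    "card_code":     (6, 6),
    "material":      (7, 7),     # P, I, V, M, N, A, L (Figure 2)
    "hold_cancel":   (8, 8),     # X = cancel, H = hold
    "title":         (9, 66),
    "sub_type":      (67, 67),   # N = new, R = renewal
    "how_to_pay":    (68, 68),   # R, I, T
    "years":         (69, 69),
    "account":       (70, 71),
    "cost":          (72, 76),
    "renewal_date":  (77, 80),
}

def parse_card(card, fields=CARD1_FIELDS):
    """Slice one 80-column card image into named fields."""
    card = card.ljust(80)
    return {name: card[lo - 1:hi].strip() for name, (lo, hi) in fields.items()}

# Reconstructed sample card image (values patterned on Figure 7):
rec = parse_card("00097" "1" "P" " " + "DUN'S REVIEW".ljust(58)
                 + "R" "R" "2" "BU" "01200" "0875")
```

The same dictionary-of-ranges approach extends to Cards 2 through C by swapping in the other column tables from Figure 1.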
The computer program was written so these data could be automatically transferred to the history record card at renewal time.

Fig. 6. Claims Letter (sample: to the Superintendent of Documents, Government Printing Office, re: Our Public Lands, claiming Volume 22, Issue No. 3, Summer 1972)

Claims data are transmitted as needed by providing the unique number of the title and the information concerning the missing issue(s). Claims letters (Figure 6) are then mailed in window envelopes, so no typing is required.

When working with periodicals or serials one becomes accustomed to sudden or unusual changes that occur with or without notice. A few examples could be changes in title, frequency, or general publishing patterns. We wanted to provide our system with the ability to notify us that an investigative procedure had been completed and thus avoid many of the "why's" that recur. Accordingly, we included a comment card (Figure 4, Card C) which can be updated as circumstances require.

From the transmittal forms for the initial batch of titles, cards were keypunched and built into a magnetic tape file. The serials technician now submits updates or additions (e.g., for new titles) within the schedule provided by Information Systems, and the tape is updated each month.

The main printed report is run monthly (Figure 7). This master list includes all bibliographic, holding, and renewal information. Titles due for renewal in three months are flagged with asterisks.

Fig. 7. Master Periodical Printout (sample entries for DUN'S REVIEW, EARLY YEARS, EBONY, and ECOLOGY TODAY, with history records)

The technician determines the current subscription price and number of years to renew each flagged title and updates these fields.
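The renewal cycle hinges on the expiration-date key (Figure 3, Item 009): the monthly run flags titles whose date falls within three months, all-zero dates (free and depository items) are never flagged, and renewing a title moves the expiring order's data into the history record, as the text describes. A Python sketch of that logic follows; the original programs were written in BAL, and the MMYY reading of the date field, the dictionary record layout, and all names here are our own interpretation.

```python
def months_until(expir_mmyy, now_month, now_year):
    """Months from now until an expiration field coded MMYY
    ("0875" read as August 1975).  Returns None for the all-zero
    field used for free and depository items."""
    if expir_mmyy == "0000":
        return None
    month, year = int(expir_mmyy[:2]), 1900 + int(expir_mmyy[2:])
    return (year - now_year) * 12 + (month - now_month)

def flag_for_renewal(records, now_month, now_year, horizon=3):
    """Unique numbers of titles due within `horizon` months
    (the ones the master list marks with asterisks)."""
    out = []
    for rec in records:
        m = months_until(rec["renewal_date"], now_month, now_year)
        if m is not None and m <= horizon:
            out.append(rec["unique_number"])
    return out

def renew(rec, new_cost, new_years, new_expiration):
    """On renewal, transfer the expiring order's data to the history
    record (Card B) and update the current fields."""
    rec.setdefault("history", []).append({
        "po_number": rec.get("po_number"),
        "dates": rec.get("sub_dates"),
        "cost": rec.get("cost"),
        "years": rec.get("years"),
        "invoice": rec.get("invoice"),
    })
    rec.update(cost=new_cost, years=new_years, renewal_date=new_expiration)
    return rec
```

Because the check is driven entirely by the key field, standing orders and free titles with zeroed dates simply never surface in the flag list, exactly as the text's zero-fill convention intends.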
At the next monthly run, facsimile purchase orders (containing all revised data except the purchase order number) are printed (Figure 8). The technician types up numbered purchase orders from these and forwards them to the business office in time for payment. We intended our system to utilize purchase order forms to be run directly on the computer. Therefore, our present method of typing from facsimiles does seem wasted effort, but it is looked on as a stopgap measure for the present and the inconvenience is tolerated while waiting for the more desirable method. If the computer forms are adopted, we may have increased conflict in price updates because there will be less opportunity for last minute corrections. However, we do plan to avoid as much conflict as possible by plans to run actual purchase orders closer to the actual expiration date.

Fig. 8. Facsimile Purchase Order

The renewal procedure followed involves these steps:

1. Check purchase order facsimiles for accuracy and match with renewal notices received.
2. Check Kardex for material arrival regularity.
3. Type and forward purchase orders to business office.
4. Update forms and send to Information Systems.
5. Scan master list for flagged items and record their unique numbers and titles on update sheets.
6. Update flagged items with renewal notices as follows:
   a. Price.
   b. Number of years for renewal.
   c. New subscription dates.
   d. Method of payment.
   e.
Any changed information concerning publisher and mailing label.
7. Update flagged items without renewal notices in the same manner, using the latest issue received.
8. As additional renewal notices for flagged items come in, make necessary updates.
9. Send all updates to Information Systems at least three days before the master list and facsimiles are due to be run.

Price changes do occur between the time the item is flagged and the check is mailed. With most, though, notification is received from the publisher before the purchase order is actually typed, and corrections are made at that time.

Since the renewal process is linked to the expiration field, updating that field also causes transfer of data for the year just expired into the history record, as explained earlier. Free materials, government depository items, and standing orders for which invoicing is known to be automatic are handled by filling the expiration date field with zeros. If a purchase order history record is needed, as with standing orders, these fields are updated at the time the invoice arrives.

Our master list does not contain headings to explain field descriptions. We place our master list in a binder; a legend describing placement of field descriptions is attached to the inside of the front of the binder cover and is readily available for reference. We felt headings on each record would be clumsy, confusing, and would waste valuable printing space. Codes and their explanations are attached to the inside of the back of the binder cover.

To date, two revisions have been installed into the system: (1) In 1972 we decided to classify our holdings by subject. Space was "found" for three digits, and we then proceeded to code our subjects (Figure 3, Item 027). Our subject codes and their meanings are explained in Figure 2A. (2) Correspondence was assisted by having all necessary information in one location.
The cost, purchase order number, and problem explanation were available by merely flipping the printout pages to the title in question on the master list. However, the date the purchase order was typed had to be looked up in order to effect an intelligent solution. Six spaces were again "found" to provide this purchase order date (Figure 3, Item 028). Actual computer programming was performed by Information Systems staff in BAL, and programs are run on the college IBM 370-135 computer. Automated Pet·iodicals System/HARP and HEARD 95 RESULTS It has not been possible to figure actual monetary costs for the library portion and maintenance of this system, nor to compare these costs with the manual system. Libraries have traditionally been weak in figuring op- eration costs, and we confess to not having been very innovative in this area. We do not have specific itemized costs for our manual routines, so actual comparisons are not possible. A few figures concerning library time can be given. From October through December of 1971, when the initial phase was set up, the serials technician and public services librarian each contributed about 20 percent of their time, and a student aide worked 10 to 15 hours per week on the clerical part of the data transmittal. Since that time the system has been operational for over two years, and some time approximations concerning updating, adding to the file, etc., are now available. With development behind us, time contributed by the serials technician, who is now solely responsible for the maintenance of the sys- tem, has dropped from 20 percent to between 5 and 15 percent. Exact costs are difficult to extract, since this varies during the year according to the number of renewals due in particularly heavy expiration months as com- pared with those due in light expiration months. The library as part of the college is not charged for use of computer fa- cilities. 
Figures for machine time and keypunching are available and are as follows:

Program                                Machine Time (hr.)   Keypunch Time (hr.)
Periodical additions per 100 titles           .1                   8.0
Periodical updates per 100 titles             .1                   2.0
Purchase order printing                       .5                    .5
Claim disbursements                           .1                    .2
Miscellaneous reports                        3.0                    .0

Information Systems has given their monetary cost in developing this system as $5,970 for programming time. They also figure program maintenance at $215 per year and the cost to run programs per year at $256. We can list important benefits we have derived. Renewal problems have been eliminated. The few duplicate problems can be handled now as soon as they occur. Our system handles all types of live subscriptions and the "dead file" as well. There is no more fussing with cards since we have a one-stop, clear record of holdings and histories, including the entire invoice and payment record for each subscription. At renewal time all the information for purchase orders is listed on a single-sheet facsimile. Claim letters are done for us and we can call for various listings as they are needed. Reports we receive are: master listing once a month, purchase order facsimiles once a month, claim letters as needed, fiscal year total cost reports, fiscal year area cost reports, subject lists as needed, holdings lists as needed, unique number lists as needed.
96 Journal of Library Automation Vol. 7/2 June 1974
CONCLUSIONS
Many librarians having access to sophisticated computer facilities content themselves with producing a more or less elaborate holdings list. Subscription placements and renewals are handled manually, often through a commercial agency. Common agency problems such as overlapping and lapsed subscriptions are simply tolerated. We feel from our experience that if enough effort is expended to create a successfully operating holdings list, a small library does not require much further effort to add renewal, history record, and claiming functions.
This eliminates agency problems, provides the ability to manipulate files for producing various reports, and, in our opinion, results in more efficient and convenient record-keeping. The size of our operation falls at the lower end of a range of libraries having holdings large enough to require at least one individual's time. Translated into figures, we feel that any automated system would be wasted on holdings of under 150 periodicals. The crucial factor in relation to size is not really any magic number of holdings but the ratio of available staff time to the size of the holdings. This factor must be evaluated by libraries considering any type of automated system. We feel much of the success of our system has been dependent upon our initial planning, our staff availability, and our conviction that a change was necessary to eliminate the problems we were encountering with our manual system. Also the availability of the computer facilities, the encouragement provided by our superiors, and adequate library staff and Information Systems staff all contributed to an efficient changeover.
ACKNOWLEDGMENTS
Gratitude is due Moraine Valley Community College for its permission and support of this innovation. Particular gratitude is due Anabel Sproat, head librarian, for her permission, support, and constant encouragement. The excellent work and friendly attitude of Linda Nemeth and the entire Information Systems staff who made this project a reality have been deeply appreciated. Also, the capable assistance of student aide Barbara Hart (Goeske) in the recording process proved to be a very valuable asset.

A Hybrid Access Method for Bibliographic Records
Abraham BOOKSTEIN: The University of Chicago Graduate Library School, Chicago, Illinois.
This paper defines an access method for bibliographic records that combines features of the search key approach and the inverted file approach.
It is a refinement of the search key technique that permits its extension to large files. A method by which this approach can be efficiently implemented is suggested.
INTRODUCTION
A major problem in the development of computerized files of bibliographic records is the creation of a convenient and economical mechanism to access the records. As the problem of organizing a file for efficient access is a general one, a number of structural devices have been suggested. Hsiao and Harary propose an abstract model for file structure that encompasses those that are discussed most frequently.1 Lefkovitz discusses these techniques in more detail and considers the advantages of each for implementation, while Dodd and Knuth describe the data structures needed in implementing such files.2-4 These works reveal the interrelation between a file's organization and its retrieval capability, but the determination of which routes of access to provide must be the task of those responsible for creating the file. Such a determination may involve consideration of both the intrinsic structure of the items represented by the file and the conditions under which the file is to be used. They will influence which file organization should be chosen. Because of the complexity inherent in collections of bibliographic items, the problem of determining suitable access routes to library files has been a challenging one. Almost any datum may, on some occasion, be a useful means of entering the file. Dimsdale and Heaps, in their discussion of a file structure for an on-line catalog, explicitly propose words from the title, authors, and Library of Congress call numbers.5 In this paper we shall consider the problem of accessing a known item by means of information contained in the author and title field. We shall concentrate on two approaches that have received much attention: the use of a truncated search key, referred to simply as search key, and the use of Boolean expressions of key words from the title. Both of these are intended to allow a user simple entry into the file when the full field of information is long, complicated, or, perhaps, incompletely known by the user. The authors and titles of books often share these characteristics. Each of the approaches, taken by itself, has its strengths and its weaknesses. We will discuss each technique in turn, and then suggest an elaboration of the search key technique that incorporates some features of the Boolean search technique; this combination of techniques should enable systems that are committed to the use of search keys as a primary access route to extend this technique to large files. It introduces into the search key approach some of the flexibility of the key word approach.
SEARCH KEYS
This approach defines at least one special field, the search key, for each item represented in the file, and allows retrieval of the record for an item by inputting the value of its search key.6-8 The search key should be constructed so as to allow its evaluation from data that are available at the time of access. The main advantage of this approach as it is usually implemented has been its great simplicity: for a broad variety of materials, the key can be readily evaluated and quickly entered into the system. The most heavily discussed defect of this approach is that it will sometimes retrieve a considerable number of records in response to a single request. Consider, for example, these works:
1. Ramsay, Blanche Margaret. Relation of various climactic factors to the growth and development of sugar beets, and
2. Ramsey, Ian Thomas. Religious Language.
The popular (3, 3) search key, constructed by concatenating the first three letters of the author's name and the first three letters of the first significant word of the title, would represent each of these by the key RAM, REL.
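The (3, 3) key derivation just described can be sketched in a few lines of code. The stopword list used to decide which title word is "significant" is our own assumption, not part of the original scheme:

```python
# A minimal sketch of (3, 3) search key derivation. The stopword set
# that defines a "significant" title word is an assumed detail.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and"}

def search_key(author, title):
    """First 3 letters of the surname + first 3 of the first significant title word."""
    surname = author.split(",")[0]
    first_word = next(w for w in title.split() if w.lower() not in STOPWORDS)
    return f"{surname[:3].upper()},{first_word[:3].upper()}"
```

Applied to the two sample works, both derive the same key, RAM,REL, illustrating the collision just described.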
This defect becomes particularly severe with certain corporate entries and works such as conference proceedings. Furthermore, this difficulty can be expected to become aggravated as the file increases in size or, equivalently, as some items are given multiple search key values; the latter may be required in order to alleviate the problems inherent in having to access items with ambiguous or multiple forms of titles. Attempts to remedy the difficulty of multiple retrievals have resulted in increasingly complex keys, defeating the purpose for which this technique was originally proposed. A more complex key makes greater demands on the user, encourages mistakes on entry, and also might increase the likelihood of two individuals deriving different keys for the same item.
INVERTED FILES
In this approach, a user attempts to retrieve a record by forming a Boolean expression of key words taken from various fields of the desired record.9,10 Stanford University's BALLOTS, for example, allows the user to enter the file by means of words taken from the title of a book. Two advantages of this approach as compared to the search key are that: (a) the user need not know the information required to form a search key, for example the first word of the title; and (b) the user is able to enter the system by what appear to him to be the most distinctive terms in the title, thereby minimizing false drops. Users of BALLOTS have found that because of the speed at which computers operate, usually the indexes can be manipulated and a record retrieved immediately, or in a very short period of time. Fayollat gives an estimate of two to five seconds as the response time. The most direct way to implement this approach would be to access each record in the file and compare it to the request. For any but the smallest files this would be unreasonably costly in computer time.
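The direct implementation just mentioned, scanning every record in the file against a conjunction of key words, can be illustrated with a short sketch; the record layout and the word normalization are our own assumptions:

```python
def boolean_scan(records, words):
    # Direct implementation: examine every record in the file and keep
    # those whose title contains all of the requested key words (an AND query).
    hits = []
    for rid, title in records.items():
        title_words = {w.lower().strip(".,") for w in title.split()}
        if all(w.lower() in title_words for w in words):
            hits.append(rid)
    return hits
```

Its cost grows linearly with file size, which is why maintained indexes of key words are normally substituted.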
An alternative, and customary, implementation involves maintenance of indexes of key words. While experience with this approach, as at the BALLOTS project, recommends it as a workable implementation, it can be costly in terms of the computer costs involved in maintaining the indexes.
HYBRID APPROACH
We offer for consideration an elaboration of the search key approach that incorporates aspects of the key word approach. It is intended as an alternative to developing increasingly complex keys for systems adopting a search key approach, but for which a simple search key retrieves too many items; possibly this approach can be selectively applied to the more troublesome parts of the file, such as to items with corporate authors. This approach associates a search key with each record, hopefully one that is simple and easily derived. A user would begin by entering the search key into the system. If the system finds that the number of items that would be retrieved exceeds a preset threshold, it would output a message requesting that the user enter a set of key words taken from various fields in the records; the title would be very useful in this regard. The system first generates a subfile of records having the desired search key. If a hashing technique is used, constructing this subfile can be accomplished quickly and at relatively little cost in space for tables.11 Once the smaller file is formed, a complete search of the full records can be made for the key words. Since the system operates in two phases, it is less sensitive to the number of records the search key retrieves as far as user considerations are concerned. Ease of use becomes the dominating objective in designing the search key. Experience to date suggests that even a very simple search key will almost always produce fewer than thirty records with files on the order of 100,000 records.
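The two-phase procedure described above can be sketched as follows; the dictionary standing in for the hash table, and the record layout, are assumptions made for illustration:

```python
from collections import defaultdict

def build_buckets(records):
    # Phase-1 structure: a table mapping each search key to the
    # addresses of the records that share it.
    buckets = defaultdict(list)
    for rid, rec in records.items():
        buckets[rec["key"]].append(rid)
    return buckets

def hybrid_search(records, buckets, key, keywords=()):
    # Phase 1: the search key reduces the file to a small subfile.
    subfile = buckets.get(key, [])
    # Phase 2: a complete scan of the subfile's full records for the
    # user-supplied key words.
    return [rid for rid in subfile
            if all(k.lower() in records[rid]["title"].lower() for k in keywords)]
```

With the two sample works, the key RAM,REL alone returns both records, while adding the key word "language" narrows the result to the Ramsey book.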
However, a complete search of a reduced file of thirty records should be feasible; in fact, usually the subfile will be no larger than two or three records. From one point of view, in the hybrid system, we can think of the search key not as an access mechanism, as earlier, but rather as a file reduction mechanism. This system trades the cost of maintaining and storing large indexes for an increase in the cost of computer processing; only relatively easily maintained hash tables for fixed-length search keys need be kept. An accurate assessment of these costs can be made only after the statistical characteristics of various search keys have been explored.
OBSERVATIONS
If it should be desired to implement a hybrid system, the following observations would be in order:
1. Among the current concerns of facilities with large bibliographic files is file compaction. If records will have to be searched for key words, this consideration will influence planning of compaction techniques. For example, a technique such as COPACK, which completely scrambles the bits in a record, would not be permissible.12 Use of variable-length codes for characters, as in Huffman coding, would allow searches for key words; most likely such a search would be implemented by attempting to match substrings of bits rather than matching on the full word level.13 Another common compaction technique, bigram coding, would also complicate the separation of words unless the blank were prevented from combining with other characters; because of the frequency with which the blank occurs with other characters, this restriction would interfere considerably with the efficacy of the technique.14 A different approach would be to recognize that each word could have only two "spellings," depending on what happened to the blank preceding the word, and both spellings could be tested.
(A brief survey of the above compaction techniques has been conducted by Fouty.15)
2. Though a complete search for key words would be feasible on a small file, it is possible to expedite the search considerably by means of a technique devised by Malcolm Harrison, which involves adding a fixed number of bits, or signatures, to each field on which a search can take place; these additional bits are derived in a well-defined way from the original field.16,17 This subfield is a fixed-size representation of the full field in a form that can be used to very rapidly eliminate most records which would not pass the key word matching test. It is stored in the index to the file along with the address of the record. Though this preliminary test is not foolproof, it could considerably reduce the size of the subfile that requires a more costly complete search, thereby reducing the number of disc accesses. If this procedure is adopted, a possible sequence of events would be as follows:
(a) A user inputs a search key and, perhaps, a couple of key words. These may be words he is certain are in the title, although the name of a series, the author, or subject headings would also represent candidates.
(b) On the basis of the search key the system creates a subfile of record addresses and signatures taken from the index; if the user is unfortunate the subfile will have a large number of records.
(c) A rapid preliminary search of the signatures using the Harrison technique is made of the reduced file to test whether the key words could possibly be part of a record. This pass eliminates a number of records; how efficient this technique is will depend on the number of bits the system associates with each representative field.
(d) Finally, the full records of the remaining items are retrieved and a full search is made.
At any point, if the subfile is too large, the system may request additional key words.
EXAMPLE OF TECHNIQUE IMPLEMENTATION
How to create a signature for a record is best explained by means of an example. Many variants are possible, and we have chosen a simple one for the purposes of illustration. The signature we shall create will consist of one word of thirty-two bits. We proceed as follows:
1. List all the substantive words of the title, e.g., Relation, various, climactic, ..., beets, if we consider one of the titles mentioned above.
2. Truncate each word to, say, the first four characters: Rela, vari, ..., beet. Other truncation sizes, or no truncation at all, may be elected.
3. For each string of characters produced in this way, form the two consecutive strings of three characters. For example, "vari" contributes "var" and "ari." Since the first word is already represented in the search key, we may use only the second three-letter string for that word; here "Rela" is represented only by "ela." Implicit in this implementation is the assumption that if a user remembers anything about a word, he will correctly remember at least its first three characters, and that the first four characters go a long way toward giving the word away.
4. Finally, we turn on a bit in the signature for each three-character string, essentially creating a hash code of thirty-two bits. The code should incorporate information from all three characters. For purposes of illustration, the following method will suffice: (a) for each letter in a three-letter string, substitute the rank of that letter in the alphabet, beginning with 01 for a; thus "ela" becomes 05,12,01; (b) consider the string of digits as a single six-digit number, and multiply that number by 1111; thus "ela" becomes 51201 and then 56884311; (c) divide by 32 and use the remainder as the address of the bit which is to be turned on. The string "ela" is thus associated with bit number 23, where the leftmost bit is the 0th bit.
As the algorithm is applied to each three-character string, the signature is formed. The book by Blanche Margaret Ramsay is accordingly represented by:
01000011100100011000010100100101
Similarly the book by Ian Thomas Ramsey is represented by:
00000000000000010000000001000010
Suppose a patron, or a cataloger, wishes to see the record associated with Mr. Ramsey's book on religious language. He would enter the search key, RAM,REL, and, say, the word "language." Among the index entries retrieved by the search key will be the desired book, and also the book by Ramsay, dealing with sugar beets. The signature for the word "language" has bits numbered 30 and 25 turned on. Since the Ramsay book does not have both of these turned on (in this case neither bit is turned on), it is immediately eliminated; the actual records retrieved from the file will be only those for which both bits are on. Though it is quite possible that false drops can be incurred in this way, clearly many incorrect records are easily eliminated. Note also that the user need input only as much of the word as he has confidence in, provided that at least three characters are produced.
Use of the above technique leaves a number of decisions that still must be made by the system designer. Among these are:
1. Should a signature be associated with each item, or only a part of them, for example, with corporate authors?
2. How much truncation is appropriate, if any? If no truncation is used, then the user can input fragments of words, including fragments taken from the middle of a word, as well as full words. On the other hand, as the signature fills up, the probability of a false drop increases.
Earlier research contains a formula that allows us to estimate this effect.18 Consider a title with six significant words.
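The four steps and the hashing rule above can be turned into working code, and doing so reproduces both thirty-two-bit signatures printed in the text. The stopword list used to pick out "substantive" words is our own assumption:

```python
# Sketch of the 32-bit title signature described above. The stopword
# set defining "substantive" words is an assumed detail.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and"}

def trigram_bit(s):
    # Letter ranks 01..26 form a six-digit number; multiply by 1111 and
    # take the remainder mod 32 as the bit address ("ela" -> bit 23).
    n = int("".join(f"{ord(c) - ord('a') + 1:02d}" for c in s))
    return (n * 1111) % 32

def make_signature(title):
    sig = 0
    words = [w.lower() for w in title.split() if w.lower() not in STOPWORDS]
    for i, word in enumerate(words):
        stem = word[:4]                    # truncate to four characters
        grams = [stem[0:3], stem[1:4]]     # the two consecutive trigrams
        if i == 0:
            grams = grams[1:]              # first word is in the search key
        for g in grams:
            if len(g) == 3:
                sig |= 1 << (31 - trigram_bit(g))  # leftmost bit is bit 0
    return f"{sig:032b}"
```

Run on the Ramsay and Ramsey titles, this yields exactly the two bit strings shown above, and the word "language" contributes bits 30 ("lan") and 25 ("ang").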
Fayollat has found that in a file of biomedical serials, about 83 percent of all items will be of this size or less.19 Similarly, let us assume that the average word in the title is made up of eight characters, a modal number of characters in Fayollat's data base. If the user requests a term composed also of eight characters, then Table 1 estimates the probability of a false drop as a function of the truncation size.

Table 1. Probability of False Drops as Function of Truncation Size.

Truncation length    Probability of false drop
3                    .17
4                    .10
5                    .08
6                    .08
7                    .08
8                    .09

It is seen that for this typical case, the method eliminates about 90 percent of the false drops. It must be understood that longer titles, or titles made up of longer words, will be more likely to be erroneously retrieved; on the other hand, the user can increase his precision by inputting a larger number of terms. The above calculation assumes that terms in the request and in the title are independent; of course, all items having the same search key as the request and sharing the discriminant word will be retrieved; presumably the user will minimize this effect by choosing distinctive words. Fayollat finds that 50 percent of the words appearing in his titles occur only once.
CONCLUSION
In conclusion, we propose a technique for entering a bibliographic data base that retains the simplicity of search keys while also including some of the flexibility that Boolean expressions of key words have for uniquely defining an item. In such a system, the only indexes that must be maintained are the hash tables; the other indexes, such as title words, are replaced by the search algorithms. If a signature, the supplementary field described above, is also stored in the index, this approach reduces the number of disc accesses.
A major limitation of this approach is that a user must be able to provide a search key; this limitation is shared, however, with systems depending exclusively on search keys. Furthermore, since the system is capable of handling larger numbers of returns on the search key, there is greater inducement to associate more search key values with each item. Thus such a hybrid system allows groups that find search keys an attractive access technique to extend this approach to file sizes which strain the capacities of the direct approach.
REFERENCES
1. D. Hsiao and F. Harary, "A Formal System for Information Retrieval from Files," Communications of the ACM 13:67-73 (Feb. 1970).
2. D. Lefkovitz, File Structures for On-Line Systems (New York: Spartan Books, 1969).
3. G. Dodd, "Elements of Data Management Systems," Computing Surveys 1:117-35 (June 1969).
4. D. Knuth, Fundamental Algorithms, the Art of Computer Programming, Vol. 1 (New York: Addison-Wesley, 1968).
5. J. J. Dimsdale and H. S. Heaps, "File Structure for an On-Line Catalog of One Million Titles," Journal of Library Automation 6:37-55 (March 1973).
6. F. G. Kilgour, P. L. Long, and E. B. Leiderman, "Retrieval of Bibliographic Entries for a Name-Title Catalog by Use of Truncated Search Keys," Proceedings of the ASIS 7:79-82 (1970).
7. P. L. Long and F. G. Kilgour, "A Truncated Search Key Title Index," Journal of Library Automation 5:17-20 (March 1972).
8. A. Landgraf, K. Rastogi, and P. Long, "Corporate Author Entry Records Retrieved by Use of Derived Truncated Search Keys," Journal of Library Automation 6:156-61 (Sept. 1973).
9. James Fayollat, "On-Line Serials Control System in a Large Bio-Medical Library. Part II. Evaluation of Retrieval Features," Journal of the ASIS 23:353-58 (Nov.-Dec. 1972).
10. A. H. Epstein et al., articles in Proceedings of the ASIS 10 (1973).
11. A. Bookstein, "Double Hashing," Journal of the ASIS 23:402-5 (Nov.-Dec. 1972).
12. B. A. Marron and P. A. D. De Maine, "Automatic Data Compression," Communications of the ACM 10:711-15 (Nov. 1967).
13. W. D. Maurer, "File Compression Using Huffman Coding," in Computing Methods in Optimization Problems 2, from Second International Conference on Computing Methods in Optimization Problems (New York: Academic Press, 1969), p.242-56.
14. W. D. Schieber and G. W. Thomas, "Compaction of Alphanumeric Data," Journal of Library Automation 4:198-206 (Dec. 1971).
15. Gary Fouty, unpublished master's thesis, University of Chicago.
16. M. Harrison, "Implementation of the Substring Test by Hashing," Communications of the ACM 14:777-79 (Dec. 1971).
17. A. Bookstein, "On Malcolm Harrison's Substring Testing Technique," Communications of the ACM 16:180-81 (March 1973).
18. Ibid.
19. Fayollat, "On-Line Serials Control System."

Application of the Variety-Generator Approach to Searches of Personal Names in Bibliographic Data Bases-Part 1. Microstructure of Personal Authors' Names
Dirk W. FOKKER and Michael F. LYNCH: Postgraduate School of Librarianship and Information Science, University of Sheffield, England.
Conventional approaches to processing records of linguistic origin for storage and retrieval tend to regard the data as immutable. The data generally exhibit great variety and disparate frequency distributions, which are largely ignored and which entail either the storage of extensive lists of items or the use of complex numerical algorithms such as hash coding. The results in each case are far from ideal. The variety-generator approach seeks to reflect the microstructure of data elements in their description for storage and search, and takes advantage of the consistency of statistical characteristics of data elements in homogeneous data bases. In this paper, the application of the variety-generator approach to the description of personal author names from the INSPEC data base by means of small sets of keys is detailed.
It is shown that high degrees of partitioning of names can be obtained by key-sets generated from the initial characters of surnames, from the terminal characters of surnames, and from the initials. The implications of the findings for computer-based bibliographical information systems are discussed.
INTRODUCTION
The application of computer technology to the storage of bibliographic data bases and to the selection of items from them on the basis of the content of specified data elements poses considerable problems. Among the most important of these, from the viewpoint of the efficiency of computer use, is the fact that many of the individual data elements exhibit great variety (i.e., lists of their contents are extensive), and show relatively disparate distributions. This behavior is encountered in different degrees in regard to items such as words in the titles of monograph or periodical articles, assigned subject headings, authors' names, and citations.1-4 Such distributions have been extensively studied in various contexts by Bradford, Zipf, and Mandelbrot.4-6 In general, the distributions are approximately hyperbolic, so that a small proportion of items may account for a substantial proportion of occurrences, while the majority of items occur only infrequently. The studies have been well reviewed by Fairthorne.7 Of all the data elements, personal author names exhibit a distribution which is at its most extreme in one direction. As is shown later in this paper, the most frequent author name in a file of 50,000 names occurred only sixteen times, while over 35,000 of the names, or over 70 percent of the file, occurred once only.
A simple and general strategy for dealing with searches of data elements, the contents of which show large variety and disparate distributions, is under development by the Research Unit at the Sheffield School, and has thus far been elaborated in regard to searches of chemical structures and of natural-language data bases.8,9 Based on information-theoretic principles, it involves a two-stage search procedure in which in the first and rapid stage the majority of items which cannot possibly fulfill the search criteria are eliminated, while those which meet the criteria are examined for an exact match at the second stage. The criteria (or attributes) are selected on the basis of an examination of the microstructure of the items in the data base, and are chosen so that their frequencies are approximately equal. The number of criteria or attributes chosen for description of the items is variable within a wide range; with their aid, the variety of items can be described so as to facilitate discrimination among them. In the context of substructure searching, the attributes are representations of fragments of chemical structures,10 while in the case of text, they are strings of characters which are variable in length. These strings are long when the characters comprising them represent frequent combinations, and short when the characters are infrequent.11 Since the sets of attributes can generate, in an approximate manner, the variety of items encountered in the data base, they are termed variety generators. They are intermediate in number between the primitive set of symbols (alphanumeric characters in the case of text, atoms and bonds in that of chemical structures) and the actual variety of items in the collection (words or word fragments in text in the first instance, and molecules in the second).
The variety-generator approach involves recognition of the fact that the statistical properties of specific data elements within homogeneous data bases are relatively constant, and that the primitive symbols of the data elements themselves usually show hyperbolic distributions. New symbol sets can therefore be defined, consisting of sequences of primitive symbols such that their frequencies of occurrence become comparable. The new symbol sets then constitute the attributes which are employed, singly or in combination, to represent the items within a search file. These symbol sets approximate to the ideal of equifrequency postulated by Shannon for optimal efficiency in communication.12 Only an approximation can be obtained, however, since the distributions of the newly defined symbols still cover a relatively wide range, and since they are seldom entirely independent of one another in statistical terms, and may often be strongly associated. The variety-generator concept is not entirely novel. Indeed, it was anticipated most closely in precisely the present context by Merrill and by Cutter with a view to subdividing a library's holdings into equal groups of items.13,14 However, the greater flexibility of computer techniques would appear to make its use today even more attractive. This paper thus describes a study of a large file of authors' names with a view to identifying attributes of the names which can be used for efficient retrieval purposes. Assessment of the effectiveness of the attributes in retrieval is described in Part 2 of this series.* The main terms used here are n-gram, key, and key-set, where an n-gram is a string of n adjacent characters. A key consists of an n-gram, and keys are chosen so that the frequencies of a set of keys (or key-set) are approximately equivalent in a given file.
The measures used in assessing frequency distributions are Shannon's expression for the entropy of a sequence of symbols:

    H = - Σ (i = 1 to n) p_i log2 p_i

and the relative entropy:

    H_r = H_actual / H_maximum

H_maximum is reached when the probabilities of occurrence of the symbols of the sequence are equal; its value is the binary logarithm of the variety of symbols, since

    H = - n ((1/n) log2 (1/n)) = log2 n

The value of the relative entropy is thus a measure of the degree of equifrequency of a set of symbols, and is independent of their variety.
CHARACTERISTICS OF NAME FILE
The file studied was a collection of 100,000 personal names taken from ten issues of the INSPEC data base dating from the period 1969 to 1972. The names are represented in variable-length format, surname followed by a comma, space, and initials each followed by a period. For the present purpose, case and diacritic shift symbols were ignored.
*To appear in the September 1974 issue of the Journal of Library Automation.
Subsets of the file were first sorted into sequence on the basis of the full names, and distributions determined both for surnames and initials, and for surnames alone, as shown in Table 1 for the subset of 50,000 names. Since the great majority of full names occur once only, the relative entropy of this distribution, at 0.975 (computed with respect to the 50,000 names, i.e., H_max = log2 50,000), is high, while that for surnames alone is lower, at 0.904. An analysis of the ratio of unique surnames to the total number of entries in files of 25,000, 50,000, 75,000 and 100,000 names showed that the proportion of different surnames added to the file as it increases in size is predictable. The relationship between the number of different surnames (D) and the total number of entries (N) conforms to the expression:

    D = aN^β

where a = 5.89 and β = 0.78.
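Shannon's measures above are straightforward to compute, and the fitted growth law D = aN^β can be checked against the figures quoted for the 50,000-name subset; the tolerance used in that check is our own assumption:

```python
from math import log2

def entropy(probs):
    # H = -sum(p_i * log2(p_i)), skipping zero-probability symbols
    return -sum(p * log2(p) for p in probs if p > 0)

def relative_entropy(probs):
    # H_r = H_actual / H_maximum, with H_maximum = log2(number of symbols)
    return entropy(probs) / log2(len(probs))

def distinct_surnames(n, a=5.89, beta=0.78):
    # Fitted growth law for distinct surnames: D = a * N**beta
    return a * n ** beta
```

An equifrequent distribution gives a relative entropy of exactly 1, and any skew pushes it below 1; the growth law predicts roughly the 27,803 distinct surnames reported for 50,000 entries.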
Table 1. Distribution of full names and surnames alone in a file of 50,000 INSPEC names.

                      Full Names                  Surnames
    Frequency f   No. (%) of Names         No. (%) of Surnames
    1             35,187 (70.37)           19,894 (39.79)
    2              4,768 (19.07)            4,258 (17.03)
    3              1,060  (6.36)            1,597  (9.58)
    4                302  (2.42)              706  (5.65)
    5                 88  (0.88)              395  (3.75)
    6                 34  (0.41)              235  (2.82)
    7                 16  (0.22)              134  (1.88)
    8                  7  (0.11)              104  (1.66)
    9                  3  (0.05)               68  (1.22)
    10                 —                       54  (1.08)
    11                 —                       36  (0.79)
    12                 —                       39  (0.94)
    13                 —                       36  (0.94)
    14                 —                       28  (0.78)
    15                 —                       24  (0.72)
    16                 —                       24  (0.77)
    17                 —                       15  (0.51)
    18                 —                       19  (0.68)
    19                 —                       16  (0.61)
    20                 —                        9  (0.36)
    > 20               —                      112  (8.44)

    Total number of different full names = 41,469
    H = 15.22    Hmax = 15.61 (log2 50,000)    Hr = 0.9753
    (A further four full names occurred more than nine times each, 0.11 percent of names.)

    Total number of different surnames = 27,803
    H = 14.11    Hmax = 15.61 (log2 50,000)    Hr = 0.9042

Next, the frequencies of characters at different positions in the surnames and of the initials were determined. The most important positions in the surname are the first and last characters, as will be seen shortly. The distributions of these characters and of the first and second initials are shown in Table 2. The relative entropy of the first initial is, interestingly, the highest of the four; the highest-ranking initial is J, which is one of the least frequent characters in English text. Thereafter follow the first and last letters of the surname, and the second initial. The low relative entropy of the last is partly accounted for by the fact that a single initial occurred in 37 percent of the entries.

Table 2. Distributions of first and last characters of surname and of initials in the 50,000 INSPEC name file.

    First Character   Last Character    First           Second
    of Surname        of Surname        Initial         Initial
    S  0.113          N  0.164          J  0.100        Space 0.371
    B  0.083          R  0.102          A  0.083        A  0.066
    M  0.080          A  0.084          R  0.081        M  0.045
    K  0.076          S  0.082          M  0.064        J  0.043
    H  0.056          I  0.074          G  0.058        S  0.035
    G  0.055          E  0.068          V  0.051        L  0.033
    P  0.053          V  0.067          D  0.050        E  0.033
    C  0.052          Y  0.043          H  0.050        R  0.031
    R  0.047          T  0.042          S  0.047        P  0.031
    L  0.047          O  0.041          E  0.043        G  0.030
    D  0.044          L  0.040          P  0.042        C  0.030
    T  0.040          H  0.037          W  0.038        W  0.028
    W  0.040          K  0.033          K  0.036        V  0.028
    A  0.036          D  0.030          L  0.036        H  0.027
    F  0.034          G  0.026          C  0.035        D  0.026
    N  0.025          Z  0.013          T  0.033        I  0.026
    V  0.025          M  0.013          B  0.032        F  0.024
    E  0.018          U  0.013          N  0.026        N  0.024
    J  0.017          F  0.006          F  0.026        K  0.022
    O  0.016          C  0.005          I  0.023        B  0.020
    Z  0.013          W  0.005          Y  0.023        T  0.013
    I  0.013          P  0.004          O  0.010        Y  0.007
    Y  0.011          X  0.004          Space 0.005     O  0.005
    U  0.005          B  0.003          Z  0.005        Z  0.002
    Q  0.001          J  0.001          U  0.004        U  0.001
    X  —              Q  0.0002         Q  0.0002       Q  0.0002
                                        X  0.0001       X  0.0001

    H = 4.309         H = 4.039         H = 4.374       H = 3.688
    Hmax = 4.700      Hmax = 4.700      Hmax = 4.755    Hmax = 4.755
      (log2 26)         (log2 26)         (log2 27)       (log2 27)
    Hr = 0.917        Hr = 0.859        Hr = 0.920      Hr = 0.776

Distributions were also obtained for the second and subsequent characters of the surname. These, and also the distributions of the first character, are in general agreement with the results of earlier studies by Bourne and Ford, and by Ohlman, and indicate that consonants predominate in the first position, vowels in the second position, while thereafter the distributions become less disparate (15, 16). However, due to the variable lengths of names, the dominant character at the sixth and subsequent positions of the surname is the space character.

KEY-SET GENERATION TECHNIQUE

The basic key-set generation technique involves creating fixed-length n-grams from some point or points of reference within each record, the strings generated being initially of length greater than those anticipated within the key-set. These strings are sorted into lexicographic order and counted.
(The resultant distribution of the fixed-length strings is again hyperbolic.) The frequencies are compared with a predetermined threshold frequency; at the first stage none of the string frequencies should exceed this value. The strings are then shortened by truncation of the right-hand character, and the frequencies of the strings which have become identical through truncation are accumulated. The new n-gram frequencies are compared with the threshold value; any strings which exceed the value are noted. The procedure is repeated until the single characters are reached.

Two types of analysis are possible, redundant and nonredundant. In the latter, any string exceeding the threshold value is removed from the list and not processed further, while in the former such strings continue to the next processing stage. While redundant analysis is valuable at the exploratory stage, the nonredundant type is preferred for key-set generation.

The procedure was first applied to strings of characters starting with the first character of each surname, as illustrated in Figure 1.

    n-gram      FOREMAN  FOREMA  FOREM  FORE  FOR  FO     F
    Frequency        11      13     24    98  143  214  1685

Fig. 1. Successive right-hand truncations of a surname during key-set generation.

Here the frequency of the surname FOREMAN in a file of 50,000 names is eleven. When successively shortened, other surnames with the same initial n-gram are included in the count. Comparison of the count with a threshold value results in selection of a key. Here, if the threshold were 100, the key selected would be FOR.

Application of the procedure to the surnames of the 50,000 name file (the name records had a maximum of eighteen characters, left-justified and space-filled if less than this length), with a threshold frequency of 300 (i.e., a probability of 0.006), gave a key-set consisting of eighty-seven keys, including all the alphabetic characters. The key-set is shown, in alphabetic order, together with the probabilities, in Table 3.
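The truncation procedure lends itself to a compact implementation. The sketch below (Python; a simplified nonredundant version under our own naming, not the authors' program) counts fixed-length prefixes, truncates from the right, accumulates the counts of strings made identical, and selects as keys those whose counts exceed the threshold:

```python
from collections import Counter

def generate_key_set(names, threshold, max_len=6):
    """Nonredundant key-set generation by successive right-hand truncation.

    Strings exceeding the threshold are selected as keys and removed from
    further processing; surviving single characters all become keys, so the
    key-set covers every name in the file.
    """
    counts = Counter(name[:max_len] for name in names)
    keys = {}
    for length in range(max_len - 1, 0, -1):
        shortened = Counter()
        for s, c in counts.items():
            shortened[s[:length]] += c   # accumulate strings made identical
        counts = Counter()
        for s, c in shortened.items():
            if c > threshold and length > 1:
                keys[s] = c              # selected: not truncated further
            else:
                counts[s] = c
    keys.update(counts)                  # remaining single characters
    return keys
```

On a toy file with a threshold of 1, the prefixes FO and BA exceed the threshold and are selected as digram keys, while the rarer prefixes fall through to the single characters F and B.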
It is clear that the most frequent characters at the beginning of the surname have produced most keys: S and M with eight keys each, B with seven, K with six, and H, G, P, and R each with five keys. Whereas the relative entropy of the initial surname letter was 0.917, that of the key-set is 0.977. The probabilities of no less than seventy of the eighty-seven keys now lie between 0.005 and 0.015. The key-set itself consists of the twenty-six alphabetic characters (one of these, X, is not represented in the collection), fifty-eight digram keys, and the three trigram keys BAR, MAR, and SCH.

Table 3. Key-set of 87 keys produced from 50,000 surnames from INSPEC files.

    Key  Prob.    Key  Prob.    Key  Prob.    Key  Prob.
    A    .023     GA   .009     M    .001     RO   .016
    AL   .007     GO   .011     MA   .022     S    .027
    AN   .006     GR   .012     MAR  .008     SA   .016
    B    .012     GU   .007     MC   .007     SCH  .014
    BA   .013     H    .006     ME   .010     SE   .008
    BAR  .006     HA   .021     MI   .012     SH   .016
    BE   .017     HE   .010     MO   .012     SI   .010
    BO   .014     HO   .012     MU   .008     SO   .007
    BR   .014     HU   .007     N    .011     ST   .016
    BU   .009     I    .013     NA   .008     T    .030
    C    .013     J    .010     NI   .006     TA   .010
    CA   .011     JO   .007     O    .017     U    .005
    CH   .016     K    .015     P    .011     V    .015
    CO   .013     KA   .018     PA   .014     VA   .010
    D    .015     KI   .008     PE   .011     W    .011
    DA   .009     KO   .017     PO   .010     WA   .011
    DE   .013     KR   .008     PR   .006     WE   .008
    DO   .007     KU   .010     Q    .001     WI   .010
    E    .018     L    .013     R    .007     X    —
    F    .025     LA   .012     RA   .011     Y    .011
    FR   .008     LE   .014     RE   .008     Z    .013
    G    .015     LI   .009     RI   .006

    H = 6.2952    Hmax = 6.443 (log2 87)    Hr = 0.977

The predominance of vowels as the second character of keys is noticeable; forty-nine of the sixty-one n-grams have a vowel in the second position.

The size of the key-set produced from a given data base can be varied arbitrarily by changing the threshold value. An approximately hyperbolic relation obtains between the value of the threshold and the number of keys selected.
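Once a key-set of this kind exists, a surname can be represented by its longest prefix present in the set (the assignment rule is our assumption here; the paper defers retrieval to Part 2 of the series). Because all twenty-six single characters are in the key-set, every name receives a key. A sketch, using a small fragment of Table 3:

```python
def key_for(surname, key_set):
    """Longest prefix of surname that appears in the key-set."""
    for length in range(len(surname), 0, -1):
        if surname[:length] in key_set:
            return surname[:length]
    return None  # cannot occur when all single characters are keys

# Illustrative fragment of the Table 3 key-set.
table3_fragment = {"S", "SA", "SCH", "M", "MA", "MAR", "MC", "F", "FR", "B", "BAR"}
```

SCHMIDT thus maps to SCH, MARTIN to MAR, and FOREMAN falls through to the single-character key F, since neither FO nor FOR is in the 87-key set.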
As the size of the key-set increases, the length of the longest n-gram in the key-set increases, and the distribution of n-grams shifts toward higher values, as shown in Figure 2.

Stability of the key-sets with increase in file size is clearly an important factor. To determine the extent of this, successive portions of the entire file of 100,000 surnames were subjected to the analysis at a threshold value of 0.005. As illustrated in Table 4, the key-sets are remarkably stable in regard to total key-set size, the number of keys of each length, and to the actual keys.

Table 4. Stability of size and composition of keys with increasing file size.

    Number of             Number of    Number of   Number of   Total Size
    Entries in File       Characters   Digrams     Trigrams    of Key-set
    25,000                26           76          10          112
    50,000                26           74           9          109
    75,000                26           74          10          110
    100,000               26           75          10          111
    No. of keys common
    to key-sets           26           73           9          108

Fig. 2. Distribution characteristics of n-grams generated from 10,000 surnames from INSPEC for four different threshold values (number of n-grams against n-gram length, 1 to 13; key-set sizes: A = 184 at threshold probability 0.0025, B = 332 at 0.0015, C = 572 at 0.0010, D = 1034 at 0.0007).

As the size of the key-set increases, the range of probabilities represented among the keys narrows, and the relative entropy of the distribution increases, becoming eventually asymptotic with the value of one. This is illustrated in Figure 3, for the surnames in a file of 50,000 entries. Beyond a key-set size of about 100, increases in the relative entropy of the resultant distribution are marginal. Furthermore, with increasing key-set size, the shorter and more frequent surnames begin to appear in their entirety as keys.

As an alternative to increasing the variety of the keys, the production of keys from character positions after the first letter of the surname was considered.
The problem of variations in name length, as well as the very different distributions of the characters at these positions, was not encouraging, and instead the production of key-sets from the last letter of the surname was investigated. This proved much more attractive, since the last letter is largely independent of surname length.

Fig. 3. Increase in relative entropy with increase in key-set size; keys generated from 50,000 surnames (Hr, 0.86 to 1.00, against total number of keys for the front of surnames, 0 to 100).

KEY-SETS FROM THE END OF THE SURNAME

For this purpose, each surname in the file was reversed within a record and subjected to key-generation. The relative entropy of the last character of the surname is substantially lower than that of the first character, at 0.860. Accordingly, the key-sets have a higher proportion of longer keys than those produced from the front of the surname, as shown in Table 5. This key-set consists of the twenty-six characters, seventy-eight digrams, forty trigrams, ten tetragrams, and a single pentagram.

Table 5. Key-set of 155 n-grams produced from the last letter of 50,000 INSPEC surnames at a threshold of 0.003.

    Key   Prob.    Key    Prob.    Key   Prob.    Key   Prob.
    A     .012     VICH   .005     EIN   .005     IS    .012
    CA    .003     GH     .003     KIN   .007     NS    .006
    DA    .008     SH     .003     LIN   .005     INS   .003
    KA    .006     TH     .005     TIN   .003     OS    .004
    MA    .007     ITH    .004     NN    .010     RS    .006
    NA    .003     I      .014     ON    .009     SS    .005
    INA   .004     AI     .004     SON   .013     TS    .004
    RA    .010     HI     .007     LSON  .004     US    .004
    TA    .008     II     .009     NSON  .006     T     .012
    VA    .004     VSKII  .005     RSON  .004     DT    .003
    OVA   .010     KI     .006     TON   .009     ET    .004
    WA    .004     SKI    .005     O     .017     NT    .004
    YA    .005     WSKI   .004     KO    .003     RT    .003
    B     .003     LI     .005     NKO   .010     ERT   .004
    C     .005     NI     .007     NO    .004     ST    .004
    D     .009     RI     .005     TO    .007     TT    .005
    LD    .005     TI     .004     P     .004     ETT   .003
    ND    .006     J      .001     Q     .001     U     .013
    RD    .009     K      .010     R     .005     V     .001
    E     .020     AK     .006     AR    .006     EV    .018
    DE    .003     CK     .009     ER    .016     OV    .012
    EE    .004     EK     .004     BER   .003     KOV   .008
    GE    .004     IK     .004     DER   .006     IKOV  .004
    KE    .006     L      .007     GER   .005     LOV   .005
    LE    .008     AL     .006     NGER  .003     NOV   .006
    NE    .008     EL     .012     HER   .006     ANOV  .006
    RE    .006     LL     .004     IER   .005     ROV   .006
    SE    .005     ALL    .004     KER   .007     SOV   .003
    TE    .004     ELL    .008     LER   .007     W     .005
    F     .003     M      .008     LLER  .005     X     .004
    FF    .003     AM     .005     MER   .003     Y     .017
    G     .004     N      .009     NER   .010     AY    .004
    NG    .004     AN     .017     SER   .003     EY    .006
    ANG   .003     MAN    .014     TER   .008     LEY   .007
    ING   .007     RMAN   .003     OR    .004     KY    .004
    RG    .007     YAN    .003     S     .016     RY    .005
    H     .004     EN     .018     AS    .007     Z     .007
    CH    .009     SEN    .007     ES    .011     TZ    .006
    ICH   .003     IN     .019     NES   .004

    H = 7.059    Hmax = 7.276 (log2 155)    Hr = 0.970

The breakdown of the individual terminal characters of the surname is also more extreme, since the distribution is more skew. Thus N, the most frequent last character, has no fewer than nineteen different keys in this set, closely followed by R, with seventeen keys.

Fig. 4. Increase in relative entropy with increase in key-set size; keys generated from 50,000 surnames (Hr, 0.86 to 1.00, against total number of keys for the end of surnames, 0 to 200).
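Keys generated from reversed surnames are, equivalently, suffixes of the original names, so assignment becomes a longest-suffix match. A sketch (our illustration, not from the paper; the fragment is drawn from Table 5):

```python
def suffix_key_for(surname, key_set):
    """Longest suffix of surname that appears in a key-set generated
    from the reversed names (e.g. the -OV, -SON, -VSKII endings)."""
    for length in range(len(surname), 0, -1):
        if surname[-length:] in key_set:
            return surname[-length:]
    return None

# Illustrative fragment of the Table 5 key-set.
table5_fragment = {"N", "ON", "SON", "NSON", "RSON", "V", "OV", "KOV", "IKOV", "VICH"}
```

JOHNSON thus maps to the tetragram NSON rather than the shorter SON, reflecting the higher proportion of long keys that the skewed terminal-character distribution produces.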
The relative entropy of the distribution is again high, at 0.970 for this key-set. Figure 4 shows the relation between key-set size and relative entropy, and indicates that a larger number of keys from the last character of the surname is required to reach the same relative entropy as keys from the first character. There is an anomalous section of the curve, which may well derive from the much greater prevalence of suffixes than prefixes in personal names.

CONCLUSIONS

This study has demonstrated the feasibility of devising partial representations of author names by applying the variety-generator approach to overcome the substantial frequency variations encountered in their distributions. It has also been shown that within a homogeneous file, i.e., one of consistent provenance, there exists a substantial level of consistency in terms of character distributions, as illustrated in Table 4. The characteristics may vary substantially between data bases of different provenance, e.g., as between INSPEC and MARC files (17).

Conventional approaches to processing records comprising linguistic data tend to disregard the statistical properties of the items, and attempt to overcome the resultant problems either by storage of extensive lists of items or by using complex numerical algorithms. Typical of this latter approach, in the present context, is the use of truncated search keys for access to bibliographical files in direct-access stores, in which fixed-length character strings are the keys, as, for instance, in the system in operation at the Ohio College Library Center (18). The problems encountered in the use of fixed-length truncated author and title search keys for monograph data are indicated by the fact that the search files using hash-addressing are operated, on average, at a density of only 62.5 percent.
Once the density reaches 75 percent, the proportion of collisions and the resultant degradation in performance are such that the files are recreated at a density of only 50 percent.

Fixed-length keys from author and title entries are demonstrably inefficient in performance since the information content is low. The distribution of the initial trigrams of 50,000 names from the INSPEC file provides corroboration of this fact. The number of possible combinations of three characters is 17,576 (26^3), yet only 3,285 trigrams were represented in the file, or 18.7 percent of the total variety. Moreover, the relative entropy of the trigrams is much lower than that of the initial characters of the surnames, at 0.73. Performance figures for precision illustrate this point (19).

The present work, together with other studies of the scope for application of the variety-generator approach, thus stands in considerable contrast to prior work, and must be viewed as a means whereby the microstructure of particular data elements is fully reflected in their manipulation, affording substantial advantages (20). Part 2 of this paper illustrates this in regard to searches of personal names.

ACKNOWLEDGMENTS

We thank M. D. Martin of the Institution of Electrical Engineers for provision of a part of the INSPEC data base and of file-handling software, and the Potchefstroom University for C.H.E. (South Africa) for awarding a National Grant to D. Fokker to pursue this work. We also thank Dr. I. J. Barton and Dr. G. W. Adamson for valuable discussions, and the former for n-gram generation programs.

REFERENCES

1. P. B. Schipma, Term Fragment Analysis for Inversion of Large Files (Chicago: Illinois Institute of Technology Research Institute, 1971).
2. J. C. Costello and E. Wall, "Recent Improvements in Techniques for Storing and Retrieving Information," in Studies in Co-ordinate Indexing, vol.
5 (Washington, D.C.: Documentation Inc., 1959).
3. L. H. Thiel and H. S. Heaps, "Program Design for Retrospective Searches on Large Data Bases," Information Storage and Retrieval 8:1-20 (Feb. 1972).
4. S. C. Bradford, Documentation (London: Crosby-Lockwood, 1948).
5. G. K. Zipf, Human Behaviour and the Principle of Least Effort (Cambridge, Mass.: Addison-Wesley, 1949).
6. B. Mandelbrot, "An Informational Theory of the Statistical Structure of Language," in W. Jackson, ed., Communication Theory (London: Butterworth, 1953), p.486-501.
7. R. A. Fairthorne, "Empirical Hyperbolic Distributions (Bradford-Zipf-Mandelbrot) for Bibliometric Description and Prediction," Journal of Documentation 25:319-43 (Dec. 1969).
8. M. F. Lynch, "The Microstructure of Chemical Data-bases, and Their Representation for Retrieval," Proceedings, NATO Advanced Study Institute on Computer Representation and Manipulation of Chemical Information (in press).
9. I. J. Barton, S. E. Creasey, M. F. Lynch, and M. J. Snell, "An Information-Theoretic Approach to Text Searching in Direct-Access Systems," Communications of the ACM (in press).
10. G. W. Adamson, J. Cowell, M. F. Lynch, A. H. W. McLure, W. G. Town, and A. M. Yapp, "Strategic Considerations in the Design of Screening Systems for Substructure Searches of Chemical Structure Files," Journal of Chemical Documentation 13:153-57 (Aug. 1973).
11. A. C. Clare, E. M. Cook, and M. F. Lynch, "The Identification of Variable-Length, Equifrequent Character Strings in a Natural Language Data Base," Computer Journal 15:259-62 (Aug. 1972).
12. C. E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal 27:398-403 (1948).
13. W. C. B. Sayers, A Manual of Classification for Librarians and Bibliographers (London: Grafton, 1926).
14. C. A. Cutter, C. A. Cutter's Alphabetic Order Table ... Altered and Fitted with Three Figures by Kate E. Sanborn (Boston: Boston Library Bureau, 1896).
15. C. P. Bourne and D. F.
Ford, "A Study of the Statistics of Letters in English Words," Information & Control 4:48-67 (1961).
16. H. Ohlman, "Subject Word Letter Frequencies; Applications to Superimposed Coding," Proceedings of the International Conference of Scientific Information, Vol. 2 (Washington, D.C.: National Academy of Science, 1959), p.903-16.
17. D. W. Fokker and M. F. Lynch, "A Comparison of the Microstructure of Author Names in the INSPEC, Chemical Titles and B.N.B. MARC Data-bases" (in preparation).
18. F. G. Kilgour, P. L. Long, A. L. Landgraf, and J. A. Wyckoff, "The Shared Cataloging System of the Ohio College Library Center," Journal of Library Automation 5:157-83 (Sept. 1972).
19. F. G. Kilgour, P. L. Long, and E. B. Leiderman, "Retrieval of Bibliographic Entries from a Name-Title Catalog by Use of Truncated Search Keys," Proceedings of the ASIS 7:79-82 (1970).
20. I. J. Barton, M. F. Lynch, J. H. Petrie, and M. J. Snell, "Variable-Length Character String Analysis of Three Data-Bases, and Their Application for File Compression," Proceedings, 1st Informatics Conf., Durham, 1973 (in press).

The Library of Congress View on Its Relation to the ALA MARC Advisory Committee

Henriette D. AVRAM: MARC Development Office, Library of Congress.

This paper is a statement of the Library of Congress' recommendation that a MARC advisory committee be appointed within the present structure of the RTSD/ISAD/RASD Committee on Representation in Machine-Readable Form of Bibliographic Information (MARBI) and describes the Library's proposed relation to such a committee. The proposals and recommendations suggested were adopted by the MARBI Committee during its deliberations at ALA Midwinter, January 1974, and are now in effect.
INTRODUCTION

During ALA Midwinter, January 1973, the Library of Congress (LC) suggested to the RTSD/ISAD/RASD Committee on Representation in Machine-Readable Form of Bibliographic Information that a MARC advisory committee be formed to work with the MARC Development Office regarding changes made to the various MARC formats. The primary interest of the committee would be the serial and monograph formats, though the committee should have interest in and responsibility for reviewing changes in any of the MARC formats to insure that the integrity and compatibility of MARC content designators are preserved. The MARBI Committee decided that it would be the MARC advisory committee and asked that a paper be prepared proposing how such a committee would operate in relationship to the MARC Development Office.

Prior to a discussion of MARC changes, it appears appropriate to make certain basic statements regarding MARC changes and the difficulties experienced by the MARC Development Office in evaluating the significance of a change for the MARC subscriber.

It would be naive to assume, in a dynamic situation, that even in the best of all worlds a MARC subscriber would never have to do any reprogramming. Changes in procedures, changes in cataloging, experience providing the knowledge for more efficient ways to process information, additional requirements from users, etc., have always been factors creating the need to modify and/or expand an automated system. Programming installations always require personnel to maintain ongoing systems. Situations creating changes exist locally at subscriber installations and, likewise, they exist at LC.

Staff of the MARC Development Office give serious consideration to every proposed MARC change and its impact on the MARC subscribers.
However, it must be realized that it is not possible to evaluate fully the impact of each change, because the significance of a change is directly dependent on the use made of the elements of the record and the programming techniques used by each subscriber. MARC staff cannot possibly know the details of use and programming techniques and capabilities at every user installation.

Each MARC subscriber evaluates a change in light of his operational requirements. Since the uses made of the data are varied among users, there is rarely a consensus as to the pros and cons of a change. MARC staff are aware of the expenses imposed by changes to software and have made an attempt to solicit preferences in some cases for one technique over another from MARC subscribers when changes were required. In the case of the ISBD implementation, ten replies were received from questions submitted to the then sixty-two MARC users.

The remainder of this paper describes what is included in the term "change," the various stimuli that initiate changes, and recommendations of how LC and the MARC advisory committee should interact in regard to changes. The appendix summarizes in chart form the addenda to Books: A MARC Format since the initiation of the MARC service. An examination of the chart will reveal that the number and the types of changes have not been too significant.

MARC CHANGES

The term "change" is used throughout this paper in the broad sense, i.e., the term includes additions, modifications, and deletions of content data (in both fixed and variable fields) and content designators (tags, indicators, and subfield codes) made to the format, as well as additions, modifications, and deletions made to the tape labels. The concern is with changes made to all records where applicable, or to groups of records, but not with the correction or updating of individual records as part of the MARC Distribution Service.

Changes as described above fall into several broad types:

1.
Addition of new fields, indicators, or subfield codes to the format.
2. Implementation of already defined but unused tags, indicators, subfield codes, or fixed fields.
3. Modification of content data of fields (fixed and variable).
4. Changes in style of content in records, e.g., punctuation.
5. Cessation in use of existing fields, indicators, and subfield codes.

Library of Congress View / AVRAM 121

The following paragraphs are divided into two sections. Section "a" describes the stimulus for a change and the rationale for making it. Section "b" describes the LC position regarding the change and, where applicable, a recommendation to the MARC advisory committee.

Changes made to MARC records may be divided into the following categories:

Category 1: Changes resulting from a change in cataloging rules or systems.
a. Cataloging rules or systems fall into two distinct types: those made in consultation with ALA (Resources & Technical Services Division/Cataloging & Classification Section/Descriptive Cataloging Committee), and those made by the Subject Cataloging Division to the subject cataloging system without consultation with ALA. LC follows AACR. Since the MARC record is the record used for LC bibliographical control as well as the source record for the LC printed card and LC book catalogs (for those items presently within the scope of MARC), cataloging changes (descriptive and subject) are necessarily reflected in MARC. If the cataloging change is such that the retrospective records can reasonably be modified by automated techniques, these records are modified to reflect the change. Prior to MARC, this updating could not be provided to subscribers to LC bibliographic products; it is one of the advantages of a machine-readable service. It has the effect of maintaining a consistent data base for all MARC users.
b. Changes made in cataloging rules or systems will be made by the appropriate agencies.
Once changes in cataloging rules have been made by the ALA (RTSD/CCS/DCC) committee, LC will consult with the MARC advisory committee with respect to their implementation in those cases affecting the MARC format.* Wherever possible, depending upon resources available, the number of records affected, and the type of change, the retrospective files will be updated and made available in one of two ways: if the number of records is small (to be decided by LC), the records will be distributed as corrections through the normal channels of the MARC Distribution Service. If the number of records is large, the records will be sold by the LC Card Division.

Category 2: Changes made to satisfy a requirement of the Library of Congress.
a. Since LC uses the MARC records for its own purposes, situations do arise in which LC has a requirement for a change. In most cases, LC feels that the change would also be beneficial to the users. Under these circumstances LC has carefully evaluated the implication of the change to the MARC subscribers and, in some cases, solicited their preferences and advice.
b. If LC has a requirement to make a change to MARC, the proposed change and the reason for the change will be referred to the MARC advisory committee. The MARC advisory committee will solicit opinions from MARC users as to whether or not to include the change in the MARC Distribution Service, and LC will abide by the committee's recommendation. If this decision is not to include the change, LC will implement the change only in its own data base.†

Category 3: Changes made to satisfy subscribers' requests.
a. Subscribers sometimes request that a change be made to a MARC record.

* Format change is used in this context to mean a change affecting the tags, indicators, subfield codes, addition or deletion of fixed fields, or change to the leader.
Where possible, within the limitation of LC resources, these requests are complied with. LC, when considering such a request, has sought the opinion of the MARC subscribers, and if sufficient numbers of users were interested in the change, the change was implemented.
b. Changes requested by subscribers will be evaluated by LC, and if considered possible to implement, the proposed change will be submitted by LC to the MARC advisory committee to solicit opinions from MARC users. If the committee so recommends, LC will implement the change.

† An exception to this statement will be those changes to LC practice which must be reflected on cards and in the MARC record and which cannot exist in optional form. An example of the above would be abolition of the check digit in the LC card number.

Category 4: Changes made to support international standardization.
a. LC plays a significant role in international activities in the area of machine-readable cataloging records. Much of the future expansion of MARC depends upon standards in formats, data content, and cataloging. In all these activities, LC firmly supports AACR and current MARC formats. Occasionally, in order to arrive at complete agreement with agencies in other countries, it becomes necessary for all to compromise. However, in all cases LC does not agree to changes in cataloging rules until the recommendation has been approved by the appropriate ALA committee.
b. Changes resulting from international meetings will fall principally into two areas:
1. Cataloging: if the change required is the result of a change in cataloging rules and the ALA (RTSD/CCS/DCC) has approved the AACR modifications, the MARC change falls into Category 1.
2. All other changes affecting the format: since LC is the agency in the U.S. that will exchange machine-readable bibliographic records with other national agencies, LC will consider these
changes an internal LC requirement; therefore, they can be considered under the proposal described in Category 2. LC will submit the proposed changes to the MARC advisory committee.

Category 5: Changes made to expand the MARC program to include additional services.
a. If the MARC service were static, changes to expand the service would not be possible. An example of an additional service is the Cataloging in Publication data available on MARC tapes. Since these cataloging data are available four to six months prior to the publication of the item, it was determined to be of value to MARC subscribers, and changes were made to the MARC record to make these data available in machine-readable form.
b. If a new service is under consideration at LC that will cause a change to MARC records, e.g., Cataloging in Publication, LC will submit the proposal to the MARC advisory committee for their action as described in Category 2.

OTHER LC RECOMMENDATIONS FOR THE MARC ADVISORY COMMITTEE

1. Time frame for changes. In order to prevent consultation on changes from taking an inordinate length of time, LC proposes that the MARC advisory committee be given two months to solicit comments from MARC users, to arrive at a consensus, and to respond to proposed changes. If there is no response during that time, LC will implement the proposed change. LC will notify the MARC subscribers two months prior to including the change in the MARC Distribution Service.
2. Consultation with the MARC advisory committee. The MARC Development Office will submit the recommendation for change and any other information required to evaluate the change to the MARC advisory committee. The MARC advisory committee will be responsible for submitting the proposal to the MARC users and notifying the MARC Development Office of the committee's recommendation.
3. Test tapes.
The MARC advisory committee, on consultation with the MARC Development Office, will consider the requirement for a test tape to reflect the change made to the MARC record (the requirement for a test tape is dependent on the type of change made).

APPENDIX A
Addenda to Books: A MARC Format

1. Cataloging Rules and Cataloging System Changes
   1972  U.S./Gt. Brit. changed to United States and Great Britain. (Change made to facilitate machine filing.)
   1972  ISBD. (Cataloging change based on an international agreement.)
   1973  ISBD: additional information.

2. Subscribers' Requests
   1972  Government Publication Code added to fixed field.

3. Initiated at LC
   a. Addition or Deletion of Fields
      1969  Abolishment of 653, Political Jurisdiction (Subject), and 750, Proper Name Not Capable of Authorship. (These little-used fields proved difficult to define and of little value.)
      1970  Addition of Encoding Level to Leader. (Implemented for use for RECON records.)
      1970  Addition of Geographic Area Code field, tag 043. (This field has been widely used by LC and subscriber libraries.)
      1971  Addition of Superintendent of Documents field, tag 086. (Information added to LC catalog cards, and thus to MARC records, at the request of outside libraries.)
   b. Additions of Indicators or Subfields
      1971  Addition of filing indicators. (Information needed to allow LC to ignore initial articles in arranging its computer-produced book catalog.)
   c. Addition or Change of Codes or Data to Existing Fields
      1972  Addition of "q" subfield to fields for conferences entered under place. (Subfield needed to enable LC to file conferences entered under place correctly.)
      1969  Code added to Modified Record Indicator in fixed field to indicate shortened records.
      1969  Code for phonodiscs added to Illustration fixed field.
      1970  Code added to Modified Record Indicator in fixed field to indicate that the dashed-on entry on the original LC card was not carried in the MARC record.
      1971  "Questionable Condition" codes deleted from Country of Publication code.
      1971  Geographic Area Code: guidelines for implementation modified slightly and 23 new codes added.
      1971  Microfilm call numbers carried in LC call number field. (Description of what such call numbers looked like.)
      1971  Abolished LC card number check digit. (Numbers available using check digit too limited.)
1971 "Questionable Condition" codes deleted from Country of Publication code. 1971 Geographic Area Code. Guidelines for implementation modified slightly and 23 new codes added. Subfield needed to enable LC to file conferences entered under place correctly. 1971 Microfilm call numbers Description of what such call carried in LC call number field. numbers looked like. 1971 Abolished LC card number check digit. Numbers available using check digit too limited. Library of Congress ViewjAVRAM 125 APPENDIX A-Continued Stimulus for Change Date Change Comments d. Explanations or 1970 Use of "b" subfield with Subfield and its use inad- Corrections Topical Subjects (Field 650) vertently omitted from Books: and Geographic Subjects A MARC Format. It occurs (Field 651). rarely in MARC records. 1971 Use of "Revision date" as Explanation of what this in- suffix to LC card number. formation means at LC and how subscribers use it. 1971 Indicators used with Explanation of use of indi- Romanized title. cators with this field omitted from Books: A MARC Format. e. Changes to labels 1972 Change to label to reflect new computer system at LC. 4. National and 1970 Standard Book Number (9 International Agreement digits ) changed to Inter- national Standard Book Number ( 10 digits) to con- form to an international standard. 1971 Entry Map added to Leader to Adoption of ANSI Z39 Format conform to national standard. for Exchange of Bibliographic Information Interchange. 1971 Change to label to conform to ANSI standard. 5. New Services at LC 1969 Changes to label and status To provide for cumulative codes for cumulated tapes. quarterly and semiannual tapes. 1971 CIP records-addition of codes to Encoding Level and Record Status. 8937 ---- 126 Standards for Library Automation and ISAD's Committee on Technical Standards for Library Automation (TESLA) The 1'0le of ISAD's Committee on Technical Standards for Library Auto- mation is examined and discussed. 
A procedure for the reaction to and initiation of standards is described, with reference to relevant standards organizations.

The development, implementation, and maintenance of standards might best be characterized as the complexity of simplification: complex insofar as a standard represents a universally applicable ideal which is usually the result of arduous negotiation and compromise; simple insofar as a standard, once recommended and followed in practice, forms a firm reference point for the achievement of specified objectives. Thus, if a standard exists, it can be referenced or used immediately, and variant wheels do not have to be invented.

Unfortunately, to use, reference, or advocate a standard requires an awareness of available standards or of the process whereby standards evolve. It is at this point that standards again become complex; in fact, they become a maze, which perhaps can be characterized by questions such as: Is there a standard already? Who is responsible for it? Where are copies of this standard available? And so on (a maze familiar, certainly, to all of us).

It is precisely to address the mazelike aspect of the standards "game" that the Committee on Technical Standards for Library Automation (TESLA) has been established. In short, TESLA intends to act as a two-way clearinghouse, hopefully bringing user and supplier into a meaningful dialogue wherein the requirements of both might be satisfied.

TECHNICAL STANDARDS AND DATA DEPENDENCE

Within this context the emphasis by TESLA shall be on technical standards for library automation (e.g., standards relating to electronic data processing devices and techniques). Concurrently, however, there are instances where device and data become inseparably linked.
For example, consider the relationship between the physical dimensions of a machine-readable patron badge and the amount and, therefore, type of data which can be mechanically encoded in it; or the character set used by a terminal and the minimum processing potential and, thus, hardware which must be internal to the terminal to receive, transmit, and display that character set. Because it would be foolish to ignore this relationship, TESLA in its clearinghouse function will stress and foster the involvement of individuals or organizational units within the American Library Association wherever data-dependent technical standards are involved.

ALA-ORIGINATED AND MAINTAINED TECHNICAL STANDARDS

Though certainly no mystery, there is little evidence that the direct cost and personal involvement required for a published and practiced standard are popularly known. For example, it has been indicated by those within the standards business that an adopted standard might culminate an investment of over a million dollars and represent the expenditure of tens of man-years. The cost leading to and including the final publication by the American National Standards Institute (ANSI) of the standard for bibliographic information interchange on magnetic tape (ANSI Z39.2-1971), more popularly known as MARC, has not been published. It is suspected, however, that the cost of the MARC standard was monumental. In short, and by way of this example, it can be safely assumed that neither the American Library Association nor ISAD nor TESLA will become standards organizations in the strict sense of the word. In fact, such a capability is not desirable, since organizations such as ANSI, the Electronic Industries Association (EIA), the Institute of Electrical and Electronics Engineers (IEEE), the National Microfilm Association (NMA), etc., exist and are geared specifically to this activity.
Rather, the American Library Association and ISAD should, and must, participate actively in the standards processes available to them to insure a meaningful user voice in the development of standards by those organizations. To provide for participation in the standards process at the membership level is precisely TESLA's role. Thus, when placed in operation, such standards will reflect the library community's requirements, contributing to and fostering library automation rather than hindering it. At least one of the anticipated results would be the development of equipment addressing library needs directly, precluding the custom fabrication of specialty devices which, while satisfying the needs of a few libraries expensively, cannot satisfy libraries in general economically.

WHAT IS PROVIDED BY TESLA?

TESLA specifically has established a procedure whereby the membership of the American Library Association might either react to proposals for standards, regardless of origin, or initiate proposals for standards for membership reaction. The results of this procedure, whether reactive or initiative, would be communicated to the membership in terms of the status and position taken for each proposal, and to the originator and to ALA's official representative in full detail for subsequent application.

THE TESLA PROCEDURE

The procedure is geared to handle both reactive (originating from outside ALA) and initiative (originating from within ALA) standards proposals to provide recommendations to ALA's representatives on existing, recognized standards organizations.
To enter the procedure for an initiative standards proposal the member must complete an "Initiative Standards Proposal" using the outline which follows:

Initiative Standard Proposal Outline

The following outline is designed to facilitate review by both the committee and the membership of initiative standards requirements and to expedite the handling of Initiative Standards Proposals through the procedure. Since the outline will be used for the review process, it is to be followed explicitly. Where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: VI. Existing Standards. Not Applicable).

Note that the parenthetical statements following most of the outline entry descriptions relate to the ANSI Standards Proposal section headings, to facilitate the translation from this outline to the ANSI format.

All Initiative Standards Proposals are to be typed, double spaced, on 8½" x 11" white paper (typing on one side only). Each page is to be numbered consecutively in the upper right-hand corner. The initiator's last name followed by the key word from the title is to appear one line below each page number.

I. Title of Initiative Standard Proposal (Title).
II. Initiator Information (Foreword).
   A. Name
   B. Title
   C. Organization
   D. Address
   E. City, State, Zip
   F. Telephone: Area Code, Number, Extension
III. Technical Area. Describe the area of library technology as understood by the initiator. Be as precise as possible, since in large measure the information given here will help determine which ALA official representative might best handle this proposal once it has been reviewed and which ALA organizational component might best be engaged in the review process.
IV. Purpose. State the purpose of the Standard Proposal (Scope and Qualifications).
V. Description. Briefly describe the Standard Proposal (Specification of the Standard).
VI. Relationship to Other Standards. If existing standards have been identified which relate to, or are felt to influence, this Standard Proposal, cite them here (Expository Remarks).
VII. Background. Describe the research or historical review performed relating to this Standard Proposal (if applicable, provide a bibliography) and your findings (Justification).
VIII. Specifications. Specify the Standard Proposal using record layouts, mechanical drawings, and such related documentation aids as required, in addition to text exposition where applicable (Specification of the Standard).

Please note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review. In addition, the outline permits the presentation of background and descriptive information which, while important during any evaluation, is a prerequisite to the development of a standard. The Reactor Ballot (Figure 1) is to be used by members to voice their recommendations relative to initiative standards proposals. The Reactor Ballot permits both "for" and "against" votes to be explained, permitting the capture of additional information which is necessary to document and communicate formal standards proposals to standards organizations outside the American Library Association.

TESLA REACTOR BALLOT
   Reactor Information: Name / Title / Organization / Address / City, State, Zip / Telephone
   Identification Number for Standard Requirement: ______
   For: ______   Against: ______
   Reason for Position: (Use additional pages if required)
Fig. 1. TESLA Reactor Ballot

As you, the members, use the outline to present your standards proposals, TESLA will publish them in JOLA-TC and solicit membership reaction via the Reactor Ballot. Throughout the process TESLA will insure that standards proposals are drawn to the attention of the applicable American Library Association division or committee. Thus, internal review usually will proceed concurrently with membership review. From the review and the Reactor Ballot TESLA will prepare a majority recommendation and a minority report on each standards proposal. The majority recommendation and minority report so developed will then be transmitted to the originator and to the official American Library Association representative on the appropriate standards organization, where it should prove a source of guidance as official votes are cast. In addition, the status of each standards proposal will be reported by TESLA in JOLA-TC via the Standards Scoreboard (Figure 2).

Fig. 2. TESLA Standards Scoreboard. (For each proposal the scoreboard lists the Title/I.D. Number and dates for Receipt, Screen, Division review, Reject/Accept, Publish, Tally, and Representative, together with a Target date.)

The committee (TESLA) itself will be nonpartisan with regard to the proposals handled by it. However, the committee does reserve the right to reject proposals which after review are not found to relate to library automation.

TESLA'S COMPOSITION

TESLA is comprised of representatives both from the library community and from library suppliers to insure a mix of both users and producers for its review of standards proposals. In addition, rotating membership on TESLA will insure a continuing movement of voices from different segments of the library and library supplier communities to shortstop the pressing of vested interests. At this time, the members of TESLA and the term for each are:

Ms. Madeline Henderson, Chairperson, Task Force on Automation of Library Operations, Federal Library Committee; U.S. Department of Commerce/NBS, Washington, DC 20234. Term ends: 1974.
Mr. Arthur Brody, Chairman of the Board, Bro-Dart Industries, 1609 Memorial Ave., Williamsport, PA 17701. Term ends: 1975.
Dr. Edmund A. Bowles, Data Processing Division, International Business Machines Corporation, 10401 Fernwood Rd., Bethesda, MD 20034. Term ends: 1974.
Mr. Anthony W. Miele, Assistant Director, Technical Services, Illinois State Library, Centennial Building, Springfield, IL 62706. Term ends: 1975.
Mr. Jay L. Cunningham, Director, University-wide Library Automation Program, University of California, South Hall Annex, Berkeley, CA 94720. Term ends: 1976.
Mr. Richard E. Uttman, P.O. Box 200, Princeton, NJ 08540. Term ends: 1976.
Mr. Leonard L. Johnson, Director of Media Services, Greensboro Public Schools, Drawer V, Greensboro, NC 27402. Term ends: 1975.
Mr. John C. Kountz (Chairman), Associate for Library Automation, Office of the Chancellor, The California State University and Colleges, 5670 Wilshire Blvd., Suite 900, Los Angeles, CA 90036. Term ends: 1976.

STANDARDS LIBRARY

In addition to acting as a clearinghouse for standards for ISAD and maintaining the standard proposal and Reactor Ballot procedure, TESLA intends to urge the establishment of an ALA collection of standards applicable to libraries to handle requests for information from the library community. Thus, while currently each member is left to "do it himself," there appear to be definite economies in the centralization of such a collection and the periodic publication of indices to relevant standards.

SOURCES OF INFORMATION RELATING TO STANDARDS

To provide a source of guidance at this time for the types of available standards, to list the many existing standards, and to index the originating standards organizations would consume several issues of JOLA. Therefore, the following are very brief recapitulations of the more relevant organizations impacting library automation standards. The list is, as might be expected, very incomplete.
For those interested in a comprehensive review of standards, Global Engineering, 3950 Campus Drive, Newport Beach, CA 92660, maintains and annually publishes its Directory of Engineering Document Sources. This directory, now in its second edition, lists over 2,000 standards organizations and the prefixes used by those organizations in publishing their standards, to permit Global Engineering's customers to specify standards for purchase. (Global Engineering's primary function is the sale of original copies of standards and specifications.)

American National Standards Institute, Inc. (ANSI)-The American National Standards Institute, Inc. (1430 Broadway, New York, NY 10019) does not write standards. Rather, ANSI has established the procedures and guidelines to be followed in the development of standards that will be labeled American National Standards. The actual work is done by ANSI members and other interested groups and individuals, using the ANSI procedures in the development of standards. Only after these groups have demonstrated to ANSI's satisfaction that the proposed standard has been developed in accordance with the procedure established by ANSI will it be approved and published as an American National Standard. In addition, ANSI publishes materials relating to standards, of which the ANSI Reporter, a journal dedicated to standards, probably represents the single best source for information relating to current standards issues. However, the scope of ANSI is very broad. Thus for library and library automation activities specific committees of ANSI, rather than ANSI itself, are relevant; specifically, ANSI committees:

PH5 (Microfilm). See National Microfilm Association below.

X3 (Computers and Information Processing); X4 (Business Machines and Supplies). Both of these committees are sponsored by the Computer and Business Equipment Manufacturers' Association (CBEMA), which is also their secretariat. CBEMA (1828 L St. NW, Washington, DC 20036) periodically provides indices to the published standards of X3. An insight into the breadth of X3's activities can be gained from Figure 3. X3 currently has over fifty member organizations. The ALA representative on X3 is Mr. James Rizzolo of the New York Public Library.

Fig. 3. ANSI X3 Standards Committee Organization. (Chart showing the X3 committee under the CBEMA secretariat and the ISO/TC 97 advisory structure, with hardware, software, and systems groups divided into sections for character recognition, programming languages, data communications, physical media, documentation, systems technology, and data representation.)

An excellent overview of X4's scope and activities was published in The Secretary (Nov. 1973) under the title "What's Being Done About Office Equipment Standards."
From the library viewpoint, X4's activities in credit cards, typewriter keyboards, and forms are of interest. At this writing X4 has nine user, fifteen producer, and nine general interest members. The ALA is not currently represented on X4.

Z39 (Library Work, Documentation and Related Publishing Practices), sponsored by the Council of National Library Associations. With thirty-six subcommittees (SC), Z39 covers library-related activities from machine input records (SC/2) through standard order forms (SC/36). It was through Z39 that MARC became an American National Standard (Z39.2-1971). Z39 publishes a quarterly entitled News About Z39. Z39 is located at the School of Library Science, University of North Carolina, Chapel Hill, NC 27514. Fifteen standards have been published by Z39. The ALA representative to Z39 is Fred Blum of Eastern Michigan University, Ypsilanti, whose excellent summary of Z39 appears in the Winter 1974 issue of Library Resources & Technical Services.

National Microfilm Association (NMA)-The National Microfilm Association has an organization of standards committees and a standards board as shown in Figure 4. Information relating to their standards is published from time to time in the Journal of Micrographics. A recent article, entitled "Standards: NMA Standards Committee Scope of Work" (Vol. 7, No. 1 [Sept. 1973]), briefly describes the subcommittees internal to NMA's standards organization and the scope of each. Of particular interest to libraries is the sponsorship by NMA of ANSI PH5. Micrographic standards are listed in an NMA publication (RR1-1974 Resource Report). Copies of this Resource Report may be obtained by contacting the NMA at Suite 1101, 8728 Colesville Rd., Silver Spring, MD 20910.

International Organization for Standardization (ISO)-This organization is truly international, with representatives from thirty-five nations. The secretariat of ISO is ANSI (see above).
While ISO parallels ANSI in its coverage, it differs organizationally. Thus, the committees and subcommittees of ANSI have in large measure their equivalent technical committees, subcommittees, and working groups in the ISO. Standards developed by the ISO and published by them are reported regularly in the ANSI Reporter (referred to above). Most recently, the January 11, 1974 ANSI Reporter contained an article outlining ISO publications and describing five ISO titles. MARC, by the way, is also ISO Standard 2709. The ISO Technical Committees (TC) of immediate interest to library automation are TC 37 (terminology), TC 46 (documentation), TC 95 (office machines), and TC 97 (computers and associated information processing systems, peripheral equipment and devices, and media related thereto). As an indication of the technical areas covered by TC 97, its organization is shown in Figure 5.

Fig. 4. NMA Standards Organization. (Chart showing the NMA Standards Board and committees for microfiche, inspection and quality control, materials and supplies, operational practices, public records, equipment, reduction ratios, drafting, terminology, information storage and retrieval, rotary cameras, facsimile, flowchart symbols, newspapers, and COM format, coding, quality, and software.)

Electronic Industries Association (EIA)-The Electronic Industries Association maintains a broad variety of standards for hardware and related peripheral equipment. Such areas as cathode ray tube (CRT) terminals, the luminescence of cathode ray tubes themselves, television transmission, and data communications are dealt with by EIA standards. An excellent source of EIA standards is the publication produced by the EIA entitled Index of EIA and JEDEC Standards and Engineering Publications (1973 Revision and No. 2). Copies are available through the Electronic Industries Association, Engineering Department, 2001 I St. NW, Washington, DC 20006.
The ALA is not a member of the EIA, by definition.

The Institute of Electrical and Electronics Engineers (IEEE)-The Institute of Electrical and Electronics Engineers, Inc. is a professional organization which, in addition to its professional activities, maintains standards. Many of these standards relate to library automation in such areas as keyboards for terminals and transmission types for data communications. While each monthly issue of the IEEE publication Spectrum contains annotated lists of new standards, a full index to the IEEE standards is available by contacting IEEE Headquarters, 345 E. 47 St., New York, NY 10017.

National Bureau of Standards (NBS)-The National Bureau of Standards, U.S. Department of Commerce, has the responsibility within the federal government for monitoring and coordinating the development of information processing standards and publishing approved data standards for data elements and codes in data systems. Thus, the National Bureau of Standards works closely with federal departments and agencies, the American National Standards Institute (ANSI), and the International Organization for Standardization (ISO). Of specific interest to library automation are the Federal Information Processing Standards (FIPS) and the FIPS Index published by NBS. The annual FIPS Index (FIPS PUB 12-1) is a veritable gold mine of information relating to ANSI, ISO, federal government participation and representation in the standards process, and the role of NBS itself. FIPS 12-1 is available from the Superintendent of Documents, U.S. G.P.O., Washington, DC 20402 (SD Catalog No. C 13:52:12-1).

While the material above should provide a brief overview of the standards arena in which TESLA will function and some insight into the scope of standards activities, it is not to be construed as a definitive compilation of standards organizations. As indicated earlier, over 2,000 such organizations are known to be active currently.
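Several of the standards surveyed in this section, ANSI Z39.2-1971 and ISO 2709, define the MARC communications record discussed earlier. As a concrete taste of what such a standard specifies, the following is a minimal, illustrative sketch of that record structure: a 24-character leader, a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting position), and data fields set off by field and record terminators. The sample record and its single title field are fabricated for the example; real leaders carry status, type, and encoding-level codes in the bytes shown here as blanks.

```python
# Minimal sketch of the Z39.2 / ISO 2709 communications-record structure.
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator

def parse_directory(record: bytes):
    """Return (tag, field length, starting position) for each directory entry."""
    base = int(record[12:17])          # base address of data, leader characters 12-16
    entries = []
    for i in range(24, base - 1, 12):  # directory runs from the leader to its terminator
        entry = record[i:i + 12]
        entries.append((entry[:3].decode(), int(entry[3:7]), int(entry[7:12])))
    return entries

def field_data(record: bytes, length: int, start: int) -> bytes:
    """Slice one variable field out of the data area, dropping its terminator."""
    base = int(record[12:17])
    return record[base + start : base + start + length - 1]

# A fabricated one-field record: 24-character leader, one directory entry,
# and one title-like field containing a subfield delimiter (0x1F) and code "a".
record = (b"00048" + b" " * 7 + b"00037" + b" " * 7  # leader: length 48, base address 37
          + b"245001000000" + FT                     # directory: tag 245, length 10, start 0
          + b"\x1faHamlet." + FT + RT)               # data field, then record terminator
```

The content designators carried in the directory and subfield delimiters are exactly the elements that the MARC format addenda in Appendix A adjust from year to year.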
Fig. 5. ISO/TC 97 Organization Chart. (ISO/TC 97, Computers and Information Processing, with subcommittees SC1-SC8 and working groups covering vocabulary; character sets and coding; character recognition; programming languages; digital data transmission; problem definition and analysis; numerical control of machines; data elements and their coded representations; and input/output media and equipment such as magnetic tape, punched cards, punched tape, instrumentation tape, and disk packs.)

FINALLY, AN INVITATION

During the formative period of TESLA a list of potential standards areas for library automation was developed.

Potential Technical Standards Areas
1. Codes for libraries and library networks, including network hierarchy structures.
2. Documentation for systems design, development, implementation, operation, and post-implementation review.
3. Minimum display requirements for library CRTs, keyboards for terminals, and machine-readable character or code set to be used as a label printed in the book.
4. Patron or user badge physical dimension(s) and minimum data elements.
5. Book catalog layout (physical and minimum data elements):
   a. Off-line print
   b. Photocomposed
   c. Microform
6. Communication formats for inventory control (absorptive of interlibrary loan and local circulation).
7. Data element dictionary content, format, and minimum vocabulary, and inventory identification minimum content.
8. Inventory labels or identifiers (punched cards, labels, badges, or ...) physical dimensions and minimum data elements.
9. Model/minimum specifications relating to hardware, software, and services procurement for library applications.
10. Communication formats for library material procurement (absorptive of order, bid, invoice, and related follow-up).

You are invited to review this list and voice your opinion of any or all areas indicated by means of the Reactor Ballot in JOLA-TC in this issue. Or, if you have a requirement for a standard not included in this list, use the Initiative Standard Proposal Outline to collect and present your thoughts. Henceforth, future issues of JOLA-TC will contain a Reactor Ballot and the Scoreboard. The ball is in your court! Send ballots and/or initiative standard proposals to: John Kountz, Chairman, ISAD-TESLA, 5670 Wilshire Blvd., Suite 900, Los Angeles, CA 90036.

TECHNICAL COMMUNICATIONS

ANNOUNCEMENTS

Panel Discussion on "Government Publications in Machine-Readable Form"

This meeting will be held on July 10 from 8:30 to 10:30 p.m. as a part of the American Library Association's 1974 New York Conference. The meeting is cosponsored by the Government Documents Round Table's (GODORT) Machine-Readable Data File Committee, the Federal Librarians Round Table (FLIRT), the RASD Information Retrieval Committee, and the RASD/RTSD/ASLA Public Documents Committee.

The moderator is Gretchen DeWitt of Columbus Public Library, and the panelists are Peter Watson of UCLA, Mary Pensyl of MIT, Judith Rowe of Princeton, and Billie Salter of Yale. Mr. Watson will discuss the general issues concerning the acquisition and use of bibliographic data files and provide a brief description of some of the files now publicly available; Miss Pensyl will describe the workings of the project now underway to make these files available to MIT users. Mrs. Rowe will discuss the ways in which government-produced statistical files supplement the related printed reports and will indicate some of the types and sources of files now being released; Miss Salter will discuss a program for integrating these and other research files into Yale's social science reference service. Representatives of several federal agencies will display materials describing and documenting both bibliographic and statistical data files.

The purpose of the program is to acquaint reference librarians, particularly those now handling printed documents, with the uses of both types of files, the advantages and disadvantages of these reference tools, and the techniques and policy changes necessary for their use in a library environment. The recent release of the draft proposal produced by the National Commission on Libraries and Information Science makes more timely than ever an open discussion of the place of bibliographic and numeric data files in a reference collection. All librarians must be acquainted with these growing resources in order to continue to provide full service to their patrons. For further information, contact Judith Rowe, Computer Center, Princeton University, 87 Prospect Ave., Princeton, NJ 08540.

Ninth Annual Educational Media and Technology Conference to be Hosted by University of Wisconsin-Stout, July 22-24, 1974

AECT past president Dr. Jerry Kemp, coordinator of instructional development services for San Jose State University (California), and film consultant Ralph J. Amelio, media coordinator and English instructor at Willowbrook High School, Villa Park, Illinois, will headline the University of Wisconsin-Stout's 9th Annual Educational Media and Technology Conference, to be held in Menomonie, Wisconsin, on July 22-24, 1974. "Educational Technology: Can We Realize Its Potential?" will be the subject of Kemp's presentation on Monday evening, while Amelio, speaking on Tuesday, July 23, will challenge participants with the subject "Visual Literacy: What Can You Do?"

Seven concurrent workshops will be held on Monday afternoon: Library Automation; Sound for Visuals; Making the Time-Sharing Computer Work for You; New Developments in Photography; What's New in Graphics; Selecting and Evaluating Educational Media; and Instructional Development: How to Make It Work! Individuals leading the three-hour workshops will include: Alfred Baker, vice-president of Science Press; John Lord, technical service manager for the DuKane Corporation; William Daehling, Weber State College, Ogden, Utah; and several media specialists from Learning Resources, University of Wisconsin-Stout.

About fifty exhibitors will show and demonstrate both hardware and software during the conference. Six case studies will be given of exemplary media programs at the public school, vocational-technical, and college levels. Further information may be obtained by contacting Dr. David P. Bernard, Dean of Learning Resources, University of Wisconsin-Stout, Menomonie, WI 54751.

Report of RECON Project Published

The Library of Congress has published in RECON Pilot Project (vii, 49p.) the final report of a project sponsored by LC, the Council on Library Resources, Inc., and the U.S. Office of Education to determine the problems associated with centralized conversion of retrospective catalog records and distribution of these records from a central source.
In the MARC Pilot Project, begun in November 1966, the Library of Congress distributed machine-readable catalog records for English-language monographs, and the success of that project led to the implementation in March 1969 of the MARC Distribution Service, in which over fifty subscribers have by now received more than 300,000 MARC records representing the current English-language monograph cataloging at the Library of Congress. As coverage is extended to catalog records for foreign-language monographs and for other forms of material, libraries will be able to obtain machine records for a large number of their current titles.

More research was needed, however, on the problems of obtaining machine-readable data for retrospective cataloging, and the Council on Library Resources made it possible for LC to engage in November 1968 a task force to study the feasibility of converting retrospective catalog records. The final report of the RECON (for REtrospective CONversion) Working Task Force was published in June 1969. One of the report's recommendations was that a pilot project test various conversion techniques, ideally covering the highest-priority materials, English-language monograph records from 1960-68; and with funds from the sponsoring agencies LC initiated a two-year project in August 1969. The present report covers five major areas examined in that period:

1. Testing of techniques postulated in the RECON report in an operational environment by converting English-language monographs cataloged in 1968 and 1969 but not included in the MARC Distribution Service.

2. Development of format recognition, a computer program which can process unedited catalog records and supply all the necessary content designators required for the full MARC record.

3. Analysis of techniques for the conversion of older English-language materials and titles in foreign languages using the roman alphabet.

4.
Monitoring the state-of-the-art of input devices that would facilitate conversion of a large data base.

5. A study of microfilming techniques and their associated costs.

RECON Pilot Project is available for $1.50 from the Superintendent of Documents, U.S. Government Printing Office, Washington, DC 20402. Stock No. 3000-00061.

Library of Congress Issues RECON Working Task Force Report

National Aspects of Creating and Using MARC/RECON Records (v, 48p.) reports on studies conducted at the Library of Congress by the RECON Working Task Force under the chairmanship of Henriette D. Avram. They were made concurrently with a pilot project by the library to test the feasibility of the plan outlined in the task force's first report entitled Conversion of Retrospective Records to Machine-Readable Form (Library of Congress, 1969) and in RECON Pilot Project (Library of Congress, 1972). Both the pilot project and the new studies received financial support from the Council on Library Resources, Inc., and the U.S. Office of Education.

The present volume describes four investigations: (1) the feasibility of determining a level or subset of the established MARC content designators (tags, indicators, and subfield codes) that would still allow a library using it to be part of a future national network; (2) the practicality of the Library of Congress using other machine-readable data bases to build a national bibliographic store; (3) implications of a national union catalog in machine-readable form; and (4) alternative strategies for undertaking a large-scale conversion project. The appendices include an explanation of the problems of achieving a cooperatively produced bibliographic data base, a description of the characteristics of the present National Union Catalog, and an analysis of Library of Congress card orders for one year.
Although the findings and recommendations of this report are less optimistic than those of the original RECON study, they reaffirm the need for coordinated activity in the conversion of retrospective catalog records and suggest ways in which a large-scale project might be undertaken. The report provides a basis for realistic planning in a critical area of library automation.

National Aspects of Creating and Using MARC/RECON Records is available for $2.75 from the Superintendent of Documents, U.S. Government Printing Office, Washington, DC 20402. Stock No. 3000-00062.

Technical Communications 141

ISAD OFFICIAL ACTIVITIES

TESLA Information

Editor's Note: Use of the following guidelines and forms is described in the article by John Kountz in this issue of JOLA. The TESLA Reactor Ballot will also appear in subsequent issues of Technical Communications for reader use, and the TESLA Standards Scoreboard will be presented as cumulated results warrant its publication. To use, photocopy or otherwise duplicate the forms presented in JOLA-TC, fill out these copies, and mail them to the TESLA chairman, Mr. John C. Kountz, Associate for Library Automation, Office of The Chancellor, The California State University and Colleges, 5670 Wilshire Blvd., Suite 900, Los Angeles, CA 90036.

Initiative Standard Proposal Outline: The following outline and forms are designed to facilitate review of initiative standards requirements by both the ISAD Committee on Technical Standards for Library Automation (TESLA) and the membership, and to expedite the handling of the Initiative Standard Proposal through the procedure.

Since the outline will be used for the review process, it is to be followed explicitly.
TESLA REACTOR BALLOT

Reactor Information
    Name ____________________
    Title ____________________
    Organization ____________________
    Address ____________________
    City ____________ State ____ Zip ______
    Telephone ____________________

Identification Number for Standard Requirement ________
    For ____    Against ____
Reason for Position: (Use additional pages if required)

TESLA STANDARDS SCOREBOARD

Title/I.D. Number | Receipt Date | Screen Date | Division Date | REJ/ACP Date | Publish Date | Tally Date | Representative Target Date

Where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: VI. Existing Standards. Not Applicable).

Note that the parenthetical statements following most of the outline entry descriptions relate to the ANSI Standards Proposal section headings, to facilitate the translation from this outline to the ANSI format.

All Initiative Standards Proposals are to be typed, double-spaced, on 8½" x 11" white paper (typing on one side only). Each page is to be numbered consecutively in the upper right-hand corner. The initiator's last name followed by the key word from the title is to appear one line below each page number.

I. Title of Initiative Standard Proposal (Title).
II. Initiator Information (Forward).
    A. Name
    B. Title
    C. Organization
    D. Address
    E. City, State, Zip
    F. Telephone: Area Code, Number, Extension
III. Technical area. Describe the area of library technology as understood by the initiator. Be as precise as possible, since in large measure the information given here will help determine which ALA official representative might best handle this proposal once it has been reviewed and which ALA organizational component might best be engaged in the review process.
IV. Purpose. State the purpose of the Standard Proposal (Scope and Qualifications).
V. Description. Briefly describe the Standard Proposal (Specification of the Standard).
VI.
Relationship of other standards. If existing standards have been identified which relate to, or are felt to influence, this Standard Proposal, cite them here (Expository Remarks).
VII. Background. Describe the research or historical review performed relating to this Standard Proposal (if applicable, provide a bibliography) and your findings (Justification).
VIII. Specifications. Specify the Standard Proposal using record layouts, mechanical drawings, and such related documentation aids as required, in addition to text exposition where applicable (Specification of the Standard).

RESEARCH AND DEVELOPMENT

System Development Corporation Awarded National Science Foundation Grant to Study Interactive Searching of Large Literature Data Bases

Santa Monica, California. The National Science Foundation has awarded System Development Corporation $98,500 for a study of man-machine system communication in on-line retrieval systems. The study will focus on interactive searching of very large literature data bases, which has become a major area of interest and activity in the field of information science. At least seven major systems of national or international scope are in operation within the federal government and private industry, and more systems are on the drawing boards or in experimental operation.

The principal investigator for the project will be Dr. Carlos Cuadra, manager of SDC's Education and Library Systems Department. The project manager, who will be responsible for the day-to-day operation of the fifteen-month effort, is Judy Wanger, an information systems analyst and project leader with extensive experience in the establishment and use of interactive bibliographic retrieval services. Ms. Wanger is currently responsible for user training and customer support on SDC's on-line information service.
The study will use questionnaire and interview techniques to collect data related to: (1) the impact of on-line retrieval usage on the terminal user; (2) the impact of on-line service on the sponsoring institution; and (3) the impact of on-line service on the information-utilization habits of the information consumer. Attention will also be given to reliability problems in the transmission chain from the user to the computer and back. The major elements in this chain include: the user; the terminal; the telephone instrument; local telephone lines and switchboards; long-haul communications; the communications-computer interface hardware; the computer itself; and various programs in the computer, including the retrieval program.

REPORTS ON REGIONAL PROJECTS AND ACTIVITIES

California State University and Colleges System Union List System

The Library Systems Project of the California State University and Colleges has recently completed a production Union List System. This system, comprising eight processing programs to be run in a very modest environment (currently a CDC 3300), is written in ANSI COBOL and is fully documented. Included in the documentation package are user worksheets for bibliographic and holding data, copies of all reports, file layouts, program descriptions, etc. Output from this system is a set of files designed to drive graphic-quality photocomposition or COM devices. The system is available for the price of duplicating the documentation package. And, for those so desiring, the master file containing some 25,000 titles and titles with references is also available for the cost of duplication. Interested parties (bona fides only, please) should contact John C. Kountz, Associate for Library Automation, California State University and Colleges, 5670 Wilshire Blvd., Suite 900, Los Angeles, CA 90036, for further details.
SOLINET Membership Meeting

The annual membership meeting of the Southeastern Library Network (SOLINET) was held at the Georgia Institute of Technology in Atlanta, March 14. It was announced that Charles H. Stevens, executive director of the National Commission on Libraries and Information Science, has been named director of SOLINET effective July 1. John H. Gribbin, chairman of the board, will serve as interim director.

It was also announced that SOLINET will be affiliated with the Southern Regional Education Board. SREB will provide office space, act as financial disbursing agent, and will be available at all times in an advisory capacity.

Negotiations are under way for a tie-in to the Ohio College Library Center (OCLC), and a proposed contract is in the hands of the OCLC legal counsel. It is anticipated that a contract will soon be signed. In addition to the tie-in, SOLINET will proceed with the development of its own permanent computer center in Atlanta. This center will eventually provide a variety of services and will be coordinated carefully with other developing networks, looking toward a national library network system.

Elected to fill three vacancies on the Board of Directors were James F. Govan (University of North Carolina), Gustave A. Harrar (University of Florida), and Robert H. Simmons (West Georgia College). They will assume office on July 1. Anyone desiring information about SOLINET should write to 130 Sixth St., NW, Atlanta, GA 30313.

REPORTS-LIBRARY PROJECTS AND ACTIVITIES

New Book Catalog for Junior College District of St. Louis

The three community college libraries of the Junior College District of St. Louis have been using computerized union book catalogs since 1964. Formerly maintained and produced by an outside contractor, the catalogs are now one product of a new catalog system recently designed and implemented by instructional resources and data processing staff of the district.
Known as "ir catalog," the system presently has a data base of approximately 65,000 records describing the print and nonprint collections of the district's three college instructional resource centers. In addition to photocomposed author, subject, and title indexes, the system also produces weekly cumulative printouts which supplement the phototypeset "base" catalog. Other output includes three-by-five-inch shelflist cards (which include union holdings information), a motion picture film catalog, subject and cross-reference authority lists, and various statistical reports.

Hawaii State Library System to Automate Processing

The State Board of Education in Hawaii has approved a proposal for a computerized data processing system for the Hawaii State Library. The decision allows for the purchase of computer equipment for automating library operations. The state library centrally processes library materials for all public and school libraries in the state.

Teichiro Hirata, acting state education superintendent, told board members a computerized system will speed book selection, ordering, and processing, and will improve interlibrary loan and reference services. He also pointed out that it would facilitate a general streamlining of all technical administrative operations.

The system's total cost will be $187,000, of which $58,000 will be spent for computer software. The "BIBLIOS" system, designed and developed at Orange County Public Library in California and marketed by Information Design, Inc., was selected as the software package.

The Caltech Science Library Catalog Supplement

The use of catalog supplements during the necessary maturation period required to take full advantage of the National Program for Acquisitions and Cataloging is obviously an idea whose time has come.
The program developed at the California Institute of Technology, however, differs in several important respects from that previously described by Nixon and Bell at U.C.L.A.1 For reasons based primarily on faculty pressure, holding books in anticipation of the cataloging copy has never been the practice at the Institute. The solution, while hardly unique, is to assign the classification number (Dewey) and depend on a temporary main entry card to suffice until the LC copy is available. While this procedure has the distinct advantage of not requiring the presence of the book to complete the cataloging process, it does prevent the user from finding the newest books through a search of the subject added entry cards. The use of computer-based systems is an obvious solution to this aspect of the program but raises several additional problems which formerly seemed to defy solution.

As has been pointed out by Mason, library-based computer systems can rarely be justified in terms of cost effectiveness, and computer-based library catalogs are no exception.2 Part of this problem arises from the natural inclination to repeat in machine language what has been standard practice in the library catalog. This reaction overlooks the very different nature of catalogs and catalog supplements. Because catalogs serve as the basis for the permanent record, and their cost can be prorated over several decades, the need for a careful description of the many facets of a book is quite properly justified. In the case of catalog supplements, however, where the record will quite likely serve for only a few months, any attempt at detailed description of the book cannot be justified.

One solution to this dilemma that has been developed here at Caltech is a brief-listing supplement which allows searching for a given book by either the first author or editor's last name, a key word from the title, or the first word of a series entry.
These elements form the basis of a simple KWOC index (see Figure 1) which supplements the bibliographic listing (shown in Figure 2). All books received in the chemistry, physics, and biology libraries are represented in the catalog supplement. Weekly lists of newly added books (shown in Figure 3) are annotated to show the index terms prior to keypunching.

Fig. 1. Sample entries from the KWOC index
Fig. 2. Sample entries from the bibliographic listing
Fig. 3. Sample entries from the weekly list of newly added books

The unit record consists of a "title" card or cards (which contain the full title, author/editor, call number, library designation, and series information) and an "author" card (which contains the index terms). Edited material is added accessionally to the card file data base and batch processed on the campus IBM 370/155 computer. The catalog supplement is currently published on 8½-by-11-inch sheets as a result of reducing the computer printout on a Xerox 7000 copier. Lists are given a Velo-bind and delivered to the respective libraries. Weeding the catalog supplement is still unresolved.
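The mechanics of such a KWOC (Key Word Out of Context) index are simple enough to sketch in a few lines of modern code: each annotated index term (author or editor surname, a title keyword, the first word of a series) becomes a heading under which the full one-line listing is repeated. The sketch below is illustrative only; the sample listings are drawn loosely from the article's figures, and the original system was a batch card-file process, not a program like this.

```python
# A minimal sketch of a KWOC index: every record is filed under each of
# its annotated index terms, with the complete listing shown out of context.
from collections import defaultdict

def build_kwoc(records):
    """records: dicts with 'listing' (the one-line bibliographic entry)
    and 'index_terms' (the terms annotated on the weekly list)."""
    index = defaultdict(list)
    for rec in records:
        for term in rec["index_terms"]:
            index[term.upper()].append(rec["listing"])
    return index

# Sample data loosely following Figures 1-3 above.
records = [
    {"listing": "CHEMISORPTION AND CATALYSIS  HEPPLE  541.395 HE 1970 CH",
     "index_terms": ["Hepple", "Chemisorption", "Catalysis"]},
    {"listing": "TECHNIQUES IN PARTIAL DIFFERENTIAL EQUATIONS  CHESTER  517.6 CH 1971 CH",
     "index_terms": ["Chester", "Differential"]},
    {"listing": "PROTEIN TURNOVER  612.39 PR 1972 BI  (CIBA FOUNDATION SYMPOSIUM, 9)",
     "index_terms": ["Protein", "CIBA"]},
]

kwoc = build_kwoc(records)
# Print the index in term order, one heading per term, KWOC-style.
for term in sorted(kwoc):
    print(term)
    for line in kwoc[term]:
        print("   ", line)
```

A reader looking up CHESTER, DIFFERENTIAL, or CIBA thus lands on the same full entry, which is the point of the brief-listing approach: many access points at almost no cataloging cost.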
At the present time additions are fewer than 1,000 per year, so it may be possible after five years to replace the subject sections of the respective divisional catalogs with the catalog supplement. The "library" at Caltech consists of several divisional libraries, each with its own card catalog. These divisional card catalogs are supplemented by a union catalog, which serves all libraries on campus and, because of the strong interdisciplinary nature of the divisional libraries, is much the better source for subject searches.

The project is so facile and the costs so minimal that this approach might be of value to many small libraries. It is particularly applicable to the problems recently discussed by Patterson.3 Books in series, even if they are distinct monographs, are often lost to the user from a subject approach. With this system each physical volume added to the library can be analyzed for possible inclusion in the catalog supplement.

1. Roberta Nixon and Ray Bell, "The U.C.L.A. Library Catalog Supplement," Library Resources & Technical Services 17:59 (Winter 1973).
2. Ellsworth Mason, "Along the Academic Way," Library Journal 96:1671 (1971).
3. Kelly Patterson, "Library Think vs Library User," RQ 12:364 (Summer 1973).

Dana L. Roth
Millikan Library
California Institute of Technology

COMMERCIAL ACTIVITIES

Richard Abel & Company to Sponsor Workshops in Library Automation and Management

One of the most effective forms of continuing education is state-of-the-art reporting. Recognizing the need for more such communication, the international library service firm of Richard Abel & Company plans to sponsor two workshops for the library and information science community. The first workshop will deal with the latest techniques in library automation. It will precede the 1974 American Library Association Conference in New York City, July 7-13.
The second will present advances in library management and will be scheduled to precede the 1975 ALA Midwinter Meeting, January 19-25. The workshops will include forums, lectures, and open discussions. They will be presented by recognized leaders in the fields of library automation, management, and consulting. Each workshop will probably be one or two days long.

There will be no charge to attend either of the workshops, but attendance will be limited to provide a good discussion atmosphere. For the Management Workshop, attendance will be limited to librarians active in library management. Similarly, the Automation Workshop is intended for librarians working in library automation.

Maintaining the theme of state-of-the-art reporting, the basic content of the workshops will consist of what is happening in library management and automation today. Looking to the future, there will also be discussions and forecasts of what is to come.

Persons interested in further information or in participating in either workshop should contact Abel Workshop Director, Richard Abel & Company, Inc., P.O. Box 4245, Portland, OR 97208.

IDC Introduces BIBNET On-Line Services

The introduction of BIBNET on-line systems, a centralized computer-based bibliographic data service for libraries, has been announced by Information Dynamics Corporation. Demonstrations are planned for the ALA Annual Conference in New York, July 7-13.

According to David P. Waite, IDC president, "During 1973, BIBNET service modules were interconnected over thousands of miles and tested for on-line use with IDC's centralized computer-based cataloging data files. This is the culmination of a program that began two years ago. It is patterned after advanced technological developments similar to those recently applied to airline reservation systems and other large-scale nationwide computing networks used in industry."
IDC, a New England-based library systems supplier, will provide a computer-stored cataloging data base of more than 1.2 million Library of Congress and contributed entries. Initially it will consist of all Library of Congress MARC records (now numbering over 430,000 titles), plus another 800,000 partial LC catalog records containing full titles, main entries, LC card numbers, and other selected data elements. As a result, BIBNET will provide on-line bibliographic searching for all 1,250,000 catalog records produced by the Library of Congress since 1969. To enable users to produce library cards from those non-MARC records for which only partial entries are kept in the computer, IDC will mail card sets from its headquarters and add the full records to the data base for future reference.

Subscribing libraries will have access to the data base using a minicomputer cathode ray tube (CRT) terminal. With this technique of dispersed computing, each BIBNET terminal has programmable computer power built in. This in-house processing power, independent of the central computer, allows computer processes like library card production to be performed in the library, which also eliminates waiting for catalog cards to arrive in the mail. BIBNET terminals communicate with the central computer over regular telephone lines, eliminating the high costs of dedicated communication lines; thousands of libraries throughout the United States and Canada can therefore avail themselves of on-line services at low cost.

BIBNET users will have several methods of extracting information from the IDC data base. The computer can search for individual records by title, main entry, ISBN, or keywords. Here is how it works: the operator types in any one of the search items; if a complete title is not known, a keyword from the title may be used.
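The search step just described, matching a typed title, main entry, ISBN, or title keyword against stored records, can be sketched as follows. The record layout, field names, and sample data are invented for illustration (the ISBNs are placeholders); the announcement does not describe IDC's actual file design in this detail.

```python
# Illustrative sketch of searching a small catalog by exact title, main
# entry, or ISBN, or by a keyword drawn from the title. Hypothetical data.
records = [
    {"title": "Techniques in Partial Differential Equations",
     "main_entry": "Chester, C. R.", "isbn": "1111111111"},
    {"title": "Protein Turnover",
     "main_entry": "Ciba Foundation", "isbn": "2222222222"},
]

def search(query, catalog):
    """Return records matching the query as an exact title, main entry,
    or ISBN, or as a keyword appearing anywhere in the title."""
    q = query.lower()
    hits = []
    for rec in catalog:
        if q in (rec["title"].lower(), rec["main_entry"].lower(), rec["isbn"]):
            hits.append(rec)            # exact match on one of the fields
        elif q in rec["title"].lower().split():
            hits.append(rec)            # keyword from the title
    return hits

# A keyword suffices when the full title is not known:
for rec in search("differential", records):
    print(rec["title"])   # prints: Techniques in Partial Differential Equations
```

The same lookup then drives the verification and card-production steps described next: the matched record is shown to the operator before anything is committed to tape.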
The cataloging information is then displayed on the CRT, where the operator may verify the record. At the push of a button, the data is stored on a magnetic cassette tape which is later used for editing and production of catalog cards by the user library.

The BIBNET demonstration in New York will highlight one of many bibliographic service modules available from IDC and stress the fact that these services can be utilized by individual libraries and organized groups of libraries.

License for New Information Retrieval Concept Awarded to Boeing by Xynetics

An exclusive license for manufacture and marketing to the government sector of systems incorporating a completely new concept in information storage and retrieval has been awarded to The Boeing Company, Seattle, Washington, by Xynetics, Inc., Canoga Park, California, it was announced jointly by Dr. R. V. Hanks, Boeing program manager, and Burton Cohn, Xynetics board chairman.

The system is said to be the first image storage and retrieval system which offers response times and costs comparable to those of digital systems. The heart of the system is a device of proprietary design, the Flat Plane Memory, which provides rapid access to massive amounts of data stored in high-resolution photographic media. The photographic medium enables low-cost storage of virtually any type of source material (documents, correspondence, drawings, multitone images, computer output, etc.) while eliminating the need for time-consuming, costly conversion of pre-existing information into a specialized (e.g., digital) format. By virtue of its extremely rapid random access capability, the data needs of as many as several thousand users can be served at remote video terminals from a single memory with near real-time response (1-3 seconds, typically).
The high speed, high accuracy, and high reliability of the Flat Plane Memory are accomplished primarily through the use of the patented Xynetics positioner, which generates direct linear motion at high speeds and with great precision and reliability instead of converting rotary motion. As a result, the positioners eliminate the gears, lead screws, and other mechanical devices previously utilized, and thus achieve the requisite speed, accuracy, and reliability. The Xynetics positioners are already being used in automated drafting systems produced by the firm, and in a wide variety of other applications, including the apparel industry and integrated circuit test systems.

The new approach could eliminate many of the problems associated with multiple reproductions and distribution of large data files. In addition to many government applications, the system is expected to have major applications in the commercial marketplace.

APPOINTMENTS

Charles H. Stevens Appointed SOLINET Director

Charles H. Stevens, executive director, National Commission on Libraries and Information Science, has been appointed director of the Southeastern Library Network (SOLINET), effective July 1. The announcement was made at a meeting of SOLINET in Atlanta, March 14, by John H. Gribbin, board chairman. Composed of ninety-nine institutional members, SOLINET is headquartered in Atlanta.

A librarian of acknowledged national stature and an expert on the technical aspects of information retrieval systems, Mr. Stevens brings to SOLINET a valuable combination of experience and abilities. Concerned with national problems of libraries and information services, he will develop a regional network and move toward a cohesive national program to meet the evolving needs of U.S. libraries.

A forerunner in library automation, Mr. Stevens served for six years as associate director for library development, Project Intrex, at Massachusetts Institute of Technology.
From 1959 to 1965 he was director of library and publications at MIT's Lincoln Laboratory, Lexington, Massachusetts. At Purdue University, he was aeronautical engineering librarian and later director of documentation of the Thermophysical Properties Research Center.

Mr. Stevens is a member of the Council of the American Library Association, the American Society for Information Science, the Special Libraries Association, and other professional organizations. He is the author of approximately forty papers in the field, lectures widely, and consults on library activities for a number of universities.

Mr. Stevens holds a B.A. in English from Principia College, Elsah, Illinois, and master's degrees in English and in Library Science from the University of North Carolina. He has done further study in engineering at Brooklyn Polytechnic Institute. Mr. Stevens is married and has three sons.

INPUT

To the Editor:

International scuttlebutt informs us that those in the bibliothecal stratosphere are attempting to formulate a communications format for bibliographical records acceptable on a worldwide basis. We on the local scene unite in wishing them "Huzzah!" and "Godspeed!"
Nomenclature must be provided, of course, to designate particular applications; and the following suggestions are offered as possible subspecies of the genus SUPERMARC:

DEUTSCHMARC: for records distributed from Bonn and/or Wiesbaden
RHEEMARC: for South Korean records, named in honor of the late president of that country
BISMARC: for records of stage productions which have been produced by popular demand from the top balcony; especially pertinent for Wagnerian operas
BENCHMARC: for records of generally unsuccessful football plays
MINSKMARC: for Byelorussian records
SACHERMARC: for Austrian records, usually representing extremely tasteful concoctions
TRADEMARC: for records pertaining to manufactured products, especially patent medicines
GOLDMARC: for records representing Hungarian musical compositions (v. Karl Goldmark, 1830-1915)
ECTOMARC, ENDOMARC, MESOMARC (from the Italian, MEZZOMARC): for skinny, fat, and medium-sized records, respectively
LANDMARC: for records of historic edifices; sometimes (erroneously) applied to records for local geographical regions
FEUERMARC: for records representing charred or burned documents
MONTMARC: 1. for records representing works by or about Parisian artists; 2. for records representing publications of the French Academy
WATERMARC: for records representing documents contained in bottles washed up on the beach

Joseph A. Rosenthal
University of California, Berkeley

BOOK REVIEWS

Networks and Disciplines; Proceedings of the EDUCOM Fall Conference, October 11-13, 1972, Ann Arbor, Michigan. Princeton: EDUCOM, 1973. 209p. $6.00.

As with so many conferences, the principal beneficiaries of this one are those who attended the sessions, and not those who will read the proceedings. Except for a few prepared papers, the text is the somewhat edited version of verbatim, ad lib summaries of a number of workshop sessions and two panels that purport to summarize common themes and consensus.
Since few people are profound in ad lib commentaries, the result is shallow and repetitive. The forest of themes is completely lost among a bewildering array of trees.

The conference was, I am sure, exciting and thought-provoking for the participants. It was simply organized, starting with statements of networking activities in a number of disciplines, i.e., chemistry, language studies, economics, libraries, museums, and social research. The paper on economics is by far the best-organized presentation of the problems and potential of computers in any of the fields considered, and perhaps the best short presentation yet published for economics. The paper on libraries was short, that on chemistry lacking in analytical quality, that on language provocative, that on social research highly personal, and that on museums a neat mixture of reporting and interpreting. Much of the information is conditional; that is, it described what might or could be in the realm of the application of computers to the various subjects. The speakers all directed their papers to the concept of networks, interpreted chiefly as widespread remote access to computational facilities.

The papers are followed by very brief transcripts of the summaries of workshops in which the application of computers to each of the disciplines was presumably discussed in detail. Much of each summary is indicative and not really informative about the discussions. The concluding text again is the transcript of two final panels on themes and relationships among computer centers. The only apt description for this portion of the text is "turgid." In the midst of all this is the banquet paper presented by Ed Parker, who as usual was thoughtful and insightful, and several presentations by National Science Foundation officials that must have been useful at the time to guide those relying on federal funding for computer networks in developing proposals.
I can't think of another reference that touches on the potential of computers in so many different disciplines, but it is apparent from the breadth of ideas and the range of suggested or tested applications that a coherent and analytical review should be done. This volume isn't it.

Russell Shank
Smithsonian Institution

The Analysis of Information Systems, by Charles T. Meadow. Second Edition. Los Angeles: Melville Publishing Co., 1973. A Wiley-Becker & Hayes Series Book.

This is a revised edition of a book first published in 1967. The earlier edition was written from the viewpoint of the programmer interested in the application of computers to information retrieval and related problems. The second edition claims to be "more of a textbook for information science graduate students and users" (although it is not clear who these "users" are). Elsewhere the author indicates that his emphasis is on "software technology of information systems" and that the book is intended "to bridge the communications gap among information users, librarians and data processors."

The book is divided into four parts: Language and Communication (dealing largely with indexing techniques and the properties of index languages), Retrieval of Information (including retrieval strategies and the evaluation of system performance), The Organization of Information (organization of records, of files, file sets), and Computer Processing of Information (basic file processes, data access systems, interactive information retrieval, programming languages, generalized data management systems). The second two sections are, I feel, much better than the first. These are the areas in which the author has had the most direct experience, and the topics covered, at least in their information retrieval applications, are not discussed particularly well or particularly fully elsewhere. It is these sections of the book that make it of most value to the student of information science.
I am less happy about Meadow's discussion of indexing and index languages, which I find unclear, incomplete, and inaccurate in places.

The distinction drawn between pre-coordinate and post-coordinate systems is inaccurate; Meadow tends to refer to such systems simply as keyword systems, although it is perfectly possible to have a post-coordinate system based on, say, class numbers, which can hardly be considered keywords, while it is also possible to have keyword systems that are essentially pre-coordinate. In fact, Meadow relates the characteristic of being post-coordinate to the number of terms an indexer may use ("... permit their users to select several descriptors for an index, as many as are needed to describe a particular document"), but this is not an accurate distinction between the two types of system. The real difference is related to how the terms are used (not how many are used), including how they are used at the time of searching. The references to faceted classification are also confusing, and a number of statements are made throughout the discussion on index languages that are completely untrue. For example, Meadow states (p. 51) that "a hierarchical classification language has no syntax to combine descriptors into terms." This is not at all accurate, since several hierarchical classification schemes, including UDC, do have synthetic elements which allow combination of descriptors, and some of these are highly synthetic. In fact, Meadow himself gives an example (p. 38-39) of this synthetic feature in the UDC. It is also perhaps unfortunate that the student could read all through Meadow's discussion of index languages without getting any clear idea of the structure of a thesaurus for information retrieval and how this thesaurus is applied in practice. Moreover, Meadow used Medical Subject Headings as his example of a thesaurus (p.
33-34), although this is not at all a conventional thesaurus and does not follow the usual thesaurus structure.

My other criticism is that the book is too selective in its discussion of various aspects of information retrieval. For example, the discussion on automatic indexing is by no means a complete review of techniques that have been used in this field. Likewise, the discussion of interactive systems is very limited, because it is based solely on NASA's system, RECON. The student who relied only on Meadow's coverage of these topics would get a very incomplete and one-sided view of what exists and what has been done in the way of research.

In short, I would recommend this book for those sections (p. 183-412) that deal with the organization of records and files and with related programming considerations. The author has handled these topics well and perhaps more completely, in the information retrieval context, than anyone else. Indexing and index languages, on the other hand, are subjects that have been covered more completely, clearly, and accurately by various other writers. I would not recommend the discussion on index languages to a student unless read in conjunction with other texts.

F. W. Lancaster
University of Illinois

Journal of Library Automation Vol. 7/2 June 1974

Application of Computer Technology to Library Processes, A Syllabus, by Joseph Becker and Josephine S. Pulsifer. Metuchen, N.J.: Scarecrow Press, 1973. 173p. $5.00.

Despite the large number of institutions offering courses related to library automation, including just about every library school in North America, accredited or not, there is a remarkable shortage of published material to assist in this instruction. With the publication of this small volume a light has been kindled; let us hope it will be only the first of many, for larger numbers of better educated librarians must surely result in higher standards in the field.

This syllabus covers eight topics related to the use of computers in libraries, titled as follows: Bridging the Gap (librarians and automation); Computer Technology; Systems Analysis and Implementation; MARC Program; Library Clerical Processes (which encompasses acquisitions, cataloging, serials, circulation, and management information); Reference Services; Related Technologies; and Library Networks. Each topic is treated as a unit of instruction, and each receives the identical treatment as follows.

The units each start with an introductory paragraph, explaining what the field encompasses, and indicating the purpose of teaching that topic. The purpose of systems analysis, for example, is "To develop the sequence of steps essential to the introduction of automated systems into the library." A series of behavioral objectives are then listed, to show what the student will be able to do (after he has learned the material) that he presumably was unable to do before. For example, there are seven behavioral objectives in the unit on Computer Technology, of which the first four are: "1) the student will be able to discuss the two-fold requirement to represent data by codes and data structures for purposes of machine manipulation, 2) the student will be able to identify the basic components of computer systems and describe their purposes, 3) the student will be able to differentiate hardware and software and describe briefly the part that programming plays in the overall computer processing operation, 4) the student will be able to define the various modes of computer operation and indicate the utility of each in library operations." The remaining three objectives refer to the student's ability to enumerate and compare types of input, output, and storage devices. Then an outline of the instructional material is presented, followed by the detailed and well-organized material for instruction.
In no case can the material presented here be considered all that an instructor would need to know about the field, but a surprising amount of specific detail is included, along with a carefully organized framework within which to place other knowledge. The end result is to present to the instructor a series of outlines that would encompass much of the material included in a basic introductory course in library automation. Every instructor would, presumably, want to add other topics of his own in addition to adding other material to the topics treated in this volume, but he has here an extremely helpful guide to a basic course, and the only work of its kind to be published to date.

Peter Simmons
School of Librarianship
University of British Columbia

The Larc Reports, Vol. 6, Issue 1. On-Line Cataloging and Circulation at Western Kentucky University: An Approach to Automated Instructional Resources Management. 1973. 78p.

This is a detailed account of the design, development, and implementation of on-line cataloging and circulation which have been in operation at Western Kentucky University for several years. The library's reasons for using computers are similar to those of many college and university libraries that experienced rapid growth during the 1960s.

The faculty of the Division of Library Services first prepared a detailed proposal with appropriate feasibility studies and cost analyses to reclassify the collection from Dewey Decimal to Library of Congress classification. The proposal was approved by the administration of the university, and the decision was made to utilize campus computer facilities via on-line input techniques for reclassification, cataloging, and circulation. "Project Reclass" was accomplished during 1970-71 using IBM 2741 ATS/360 terminals. A circulation file was subsequently generated from the master record file.
The main library is housed in a new building and has excellent computer facilities within the library that are connected to the University Computer Center. Cataloging information is input directly into the system via ATS terminals; IBM 2260 visual display terminals are used for inquiry into the status of books and patrons; and IBM 1031/1033 data collection terminals are used to charge out and check in books. Catalog cards and book catalogs in upper/lower case are produced in batch mode on regular schedule. The on-line circulation book record file is used in conjunction with the on-line student master record and payroll master record files for preparation of overdue and fine notices.

Apparently the communication between library staff and computer personnel has been well above average, and cooperation of the administration and other interested parties has been outstanding. The attention given to planning, scheduling, training, and implementation is impressive. What has been accomplished to date is considered very successful, and plans are underway to develop on-line acquisitions ordering and receiving procedures.

The report has some annoying shortcomings, such as referring to the Library of Congress as "National Library"; frequent use of the word "xeroxing," which the Xerox Corporation is attempting to correct; "inputing" for "inputting"; and several other misspelled words. Some parts are poorly organized and unclear, but the report does provide many useful details for those considering a similar undertaking.

LaVahn Overmyer
School of Library Science
Case Western Reserve University

The LC/MARC Record As a National Standard

The desire to promote exchange of bibliographic data has given rise to a rather cacophonous debate concerning MARC as a "standard," and the definition of a MARC compatible record.
Much of the confusion has arisen out of a failure to carefully separate the intellectual content of a bibliographic record, the specific analysis to which it is subjected in an LC/MARC format, and its physical representation on magnetic tape. In addition, there has been a tendency to obscure the different requirements of users and creators of machine-readable bibliographic data. In general, the standards making process attempts to find a consensus among both groups based on existing practice. The process of standardization is rarely one which relies on enlightened legislation. Rather, a more pragmatic approach is taken based on an evaluation of the costs to manufacturers weighed against costs to consumers. Even this modest approach is not invested with lasting wisdom. ANSI standards, for example, are subject to quinquennial review.

Standards, as already pointed out, have as their basis common acceptance of conventions. Thus, it might prove useful to examine the conventions employed in an LC/MARC record. The most important of these is the Anglo-American Cataloging Rules as interpreted by LC. The use of these rules for descriptive cataloging and choice of entry is universal enough that they may safely be considered a standard. Similar comments may be made concerning the subject headings used in the dictionary catalog of the Library of Congress.

The physical format within which machine-readable bibliographic data may be transmitted is accepted as a codified national and international standard (ANSI Z39.2-1971 and ISO 2709-1973(E)). This standard, which is only seven pages in length, should be carefully read by anyone seriously concerned with the problems of bibliographic data interchange. ANSI Z39.2 is quite different from the published LC/MARC formats. It defines little more than the structure of a variable length record.
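That variable-length record structure can be sketched in a few lines of Python. The sketch below is illustrative only: it follows the published field widths of the standard (a 24-character leader, 12-character directory entries of tag, length, and starting position, and field/record terminator characters), but it leaves the tag values undefined and most leader positions blank, just as the standard leaves them to particular applications.

```python
FT = "\x1e"  # field terminator
RT = "\x1d"  # record terminator

def build_record(fields):
    """fields: list of (3-char tag, data string) pairs."""
    directory = ""
    data = ""
    for tag, value in fields:
        entry_data = value + FT
        # Directory entry: 3-char tag, 4-char length, 5-char start position.
        directory += f"{tag}{len(entry_data):04d}{len(data):05d}"
        data += entry_data
    directory += FT
    base_address = 24 + len(directory)          # leader + directory
    record_body = directory + data + RT
    record_length = 24 + len(record_body)
    # Leader: record length in positions 0-4, base address of data in 12-16;
    # the other positions are left blank here for simplicity.
    leader = f"{record_length:05d}" + " " * 7 + f"{base_address:05d}" + " " * 7
    return leader + record_body

def read_field(record, wanted_tag):
    """Locate a field by directory lookup, as a receiving system would."""
    base = int(record[12:17])
    directory = record[24:base - 1]             # strip the trailing FT
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        if tag == wanted_tag:
            return record[base + start:base + start + length].rstrip(FT)
    return None
```

A receiving system needs nothing beyond the leader and directory to locate any field, which is why the same structure can carry book orders, bibliographic records, abstracts, or authority records once conventions for the tags are agreed.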
Simply stated, ANSI Z39.2 specifies only that a record shall contain a leader specifying its physical attributes, a directory for identifying elements within the record by numeric tag (the values of the tags are not defined), and optionally, additional designators which may be used to provide further information regarding fields and subfields. This structure is completely general. Within this same structure one could transmit book orders, a bibliographic record, an abstract, or an authority record by adopting specific conventions regarding the interpretation of numeric tags.

Thus, we come to the crux of the problem, the meanings of the content designators. Content designators (numeric tags, subfields, delimiters, etc.) are not synonymous with elements of bibliographic description; rather, they represent the level of explicitness we wish to achieve in encoding a record. It might safely be said that in the most common use of a MARC record, card production, scarcely more than the paragraph distinctions on an LC card are really necessary. If we accept such an argument, then we can simply define compatibility with LC/MARC by defining compatibility in terms of a particular class of applications, e.g., card, book, or CRT catalog creation. A record may be said to be compatible with LC/MARC if a system which accepts a record as created by LC produces from the compatible record products not discernibly different from those created from an LC/MARC record.

Thus, what is called for is a family of standards all downwardly compatible with LC/MARC, employing ANSI Z39.2 as a structural base. This represents the only rational approach. The alternative is to accept LC/MARC conventions as worthy of veneration as artistic expression.

S. Michael Malinconico

Journal of Library Automation Vol. 7/3 September 1974

Principles of Format Design

Henriette D. AVRAM and Lucia J.
RATHER: MARC Development Office, Library of Congress

This paper is a summary of several working papers prepared for the International Federation of Library Associations (IFLA) Working Group on Content Designators. The first working paper, January 1973, discussed the obstacles confronting the Working Group, stated the scope of responsibility for the Working Group, and gave definitions of the terms tag, indicator, and data element identifier, as well as a statement of the function of each.1 The first paper was submitted to the Working Group for comments and was subsequently modified (revised April 1973) to reflect those comments that were applicable to the scope of the Working Group and to the definition and function of content designators. The present paper makes the basic assumption that there will be a SUPERMARC and discusses principles of format design.

This series of papers is being published in the interest of alerting the library community to international activities. All individual working papers are submitted to the MARBI interdivisional committee of ALA by the chairman of the IFLA Working Group for comments by that committee.

INTRODUCTION

In order to have this paper stand alone, the scope and the definition and functions of the content designators as agreed to by the Working Group are summarized below:

1. The scope of responsibility for the IFLA Working Group is to arrive at a standard list of content designators for different forms of material for the international interchange of bibliographic data.

2. The definition and function of each content designator are given as:

a. A tag is a string of characters used to identify or name the main content of an associated data field. The designation of main content does not require that a data field contain all possible data elements all the time.

b.
An indicator is a character associated with a tag to supply additional information about the data field or parameters for the processing of the data field. There may be more than one indicator per data field.

c. A data element identifier is a code consisting of one or more characters used to identify individual data elements within a data field. The data element identifier precedes the data element which it identifies.

d. A fixed field is one in which every occurrence of the field has a length of the same fixed value regardless of changes in the contents of the fixed field from occurrence to occurrence. The content of the fixed field can actually be data content, or a code representing data content, or a code representing information about the record.

BASIC ASSUMPTION-SUPERMARC

There appears to be little doubt that the format used for international exchange will not be the format presently in use in any national system. The first working paper addressed the obstacles that preclude complete agreement on any single national format, and a study of the matrix of the content designators assigned by various national agencies substantiates the above conclusion. Consequently, we are concerned with the development of a SUPERMARC whereby national agencies would translate their local format into that of the SUPERMARC format and conversely, each agency would accept the SUPERMARC format and translate it into a format for local processing.2,3 SUPERMARC, therefore, is an international exchange format with the principal function that of transferring data across national boundaries. It is not a processing format (although if desired, it could be used as such) and in no way dictates the record organization, character bit configuration, coding schemes, etc., to be used within processing agencies.
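The tag, indicator, and data element identifier defined above can be made concrete with a small example. The sketch below is hypothetical: it borrows the LC/MARC title field tag 245 and a "$" convention for data element identifiers purely as illustrations, not as values agreed by the Working Group.

```python
# A field carries a tag naming its main content, indicator characters
# supplying additional information, and a body in which each data element
# is preceded by the identifier ('$' plus one letter) that names it.

def split_data_elements(field):
    tag, indicators, body = field["tag"], field["ind"], field["data"]
    # Each data element identifier precedes the element it identifies,
    # so splitting on '$' yields (identifier, element) pairs.
    elements = [(e[0], e[1:]) for e in body.split("$")[1:]]
    return tag, indicators, elements

field = {"tag": "245", "ind": "10",
         "data": "$aThe MARC II format :$ba communications format"}
tag, ind, elems = split_data_elements(field)
# elems → [('a', 'The MARC II format :'), ('b', 'a communications format')]
```

The point of the separation is that a processing program can act on the designators (tag, indicators, identifiers) without any knowledge of the bibliographic text they label.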
The SUPERMARC format, however, should conform to certain conventions, namely the format structure should be ISO 2709 and the character representation should be an eight-bit extension of ISO 646.* The latter convention means that data cannot be in any other configuration than a character-by-character representation.

SUPERMARC assumes not only agreement on the value of content designators but, equally as important, on the level of application of these content designators. Whatever the agreed upon level of content designation is, those agencies with formats more detailed will be able to translate to SUPERMARC but will be in the position of having to upgrade all records entered into their local system from other agencies. Likewise, local formats consisting of less detailed content designation than SUPERMARC must upgrade to the SUPERMARC level for communication purposes. Where the actual content of the record is concerned, i.e., the fields and/or data elements to be included, it is highly probable that the decision of the Content Designator Working Group will be that data, if included in the record, are assigned SUPERMARC content designators, but that not all data will always be present. This permits the flexibility required to bypass some of the substantive problems of different cataloging rules and cataloging systems. For example, one agency may supply printer and place of printing while another may not. It may be assumed, however, that all agencies will conform to the specifications prescribed by the ISBD and other such standard descriptions as they become available.

* ISO/TC 46/SC4 WG1 is presently engaged in the definition of extended characters for Roman, Cyrillic, and Greek alphabets and mathematics and control symbols.
PRINCIPLES OF FORMAT DESIGN

Prior to any deliberation regarding the actual value of content designators, the Working Group realized it must agree on a set of basic principles for the design of the international format. The first working paper set forth, in the form of questions, some of the issues that must be taken into account in arriving at the principles. Several members of the Working Group expressed their opinions and these were considered in the formulation of the principles. The principles were discussed at the Grenoble meeting in August 1973. Five of the principles were adopted and the sixth was deferred for further analysis based on working papers to be written by some of the members. The sixth principle was adopted at the Brussels meeting in February 1974. The six basic principles are stated below with a discussion following each principle:

1. The international format should be designed to handle all media.

It would be ideal if at this time all forms of material had been fully analyzed. This is currently not the case. Agreement on data fields and the assignment of content designators can realistically only be accomplished if there is a foundation upon which to build. Therefore, the forms of material have been limited to those listed below because, to the best of our knowledge, these are the only forms where either experience has been gained in the actual conversion to machine-readable form or in-depth analysis has been performed to define the elements of information for the material.

Books: all monographic printed language materials.
Serials: all printed language materials in serial form.
Maps: printed maps, single maps, serial maps, and map collections.
Films: all media intended for projection in monographic or serial form.
Music and Sound Recordings: music scores and music and nonmusic sound recordings.

At the meeting in Brussels, the decision was made to use the ISBD as the foundation for the definition of functional areas for the formats.
Since at the present time an ISBD exists only for monographs and serials, these materials will receive first priority by the IFLA Working Group. Still under consideration is the question whether manuscripts should be included in the forms of material within the scope of the Working Group. Pictorial representations and computer mediums have not as yet been analyzed. When these forms have been analyzed, they should be added to the generalized list.

2. The international format should accept single-level and multilevel structures.

There is a requirement to express the relationship of one bibliographic entity to another. This relationship may take many forms. A hierarchical relation is expressed for works which are part of a larger bibliographic entity (such as the chapter of a book, a single volume of a multivolume set, a book within a series). A linear relation is expressed for works which are related to other works such as a book in translation. This discussion is concerned with hierarchical relationships and the need to describe this relationship in machine-readable records. There are a number of ways in which hierarchical relationships may be expressed. One method is to place the information on the related work in a single field within the record. For example, the different volumes of a multivolume set may be carried in a contents field. When a book is in a series, the series may be carried in a series field. This may be termed using a single-level record to show a hierarchical relationship. Another method is to use a multilevel record made up of subrecords.†

The concept of a subrecord directory and a subrecord relationship field was discussed in Appendix II to the ANSI standard Z39.2-1971.4 The appendix illustrated a possible method of handling subrecords and expressing relationships within a bibliographic record but was not part of the American standard.
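The multilevel alternative, a record made up of subrecords, can be pictured as nested groups of fields. The sketch below is a conceptual illustration only: ISO 2709 defines a subrecord technique without prescribing how relationships are expressed, and the level names and field contents here are invented for the example.

```python
# Each subrecord is a group of fields treated as a logical entity; nesting
# stands in for whatever relationship mechanism a format might adopt.

def make_subrecord(level, fields, children=None):
    return {"level": level, "fields": fields, "children": children or []}

record = make_subrecord("collection", {"title": "Collected works"}, [
    make_subrecord("document", {"title": "Volume 1"}, [
        make_subrecord("analytical", {"title": "Chapter 3"}),
    ]),
    make_subrecord("document", {"title": "Volume 2"}),
])

def levels(sub, depth=0):
    """Flatten the hierarchy, preserving depth, as a directory might."""
    yield depth, sub["level"], sub["fields"]["title"]
    for child in sub["children"]:
        yield from levels(child, depth + 1)
```

The same bibliographic content could instead be carried in a single-level record, with the volumes in a contents field; the multilevel form simply makes the hierarchy explicit and machine-addressable.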
Similarly, in 1968 the Library of Congress published as part of its MARC II format a proposal to provide for the bibliographic descriptions of more than one item in a single record, and represented this capability as "levels" of bibliographic description.5 The international standard (ISO 2709) defines a subrecord technique without an explicit statement of a method to describe relationships.6 More recently, a level structure was proposed in a document by John E. Linford,7 and an informal paper by Richard Coward8 gave the following example of a level structure:

Level            Record
Collection       1 subrecord
Sub-collection   1 subrecord
Document         1 subrecord
Analytical       1 subrecord

† A subrecord is a "group of fields within a bibliographic record which may be treated as a logical entity." When a bibliographic record describes more than one bibliographic unit, the descriptions of the individual bibliographic units may be treated as subrecords.

Several national agencies have expressed concern regarding the efficiency of the ISO 2709 subrecord technique and have suggested that a modification be made to the subrecord statement.

There are alternative techniques which could be incorporated in the international exchange format to build in level capability. Methods have been suggested that would cause a revision (specifically the number of characters in each directory entry) to the ISO standard; other alternatives might not. Regardless of the final technique agreed upon, national agencies should maintain the authority to record their cataloging data to reflect their catalog practices, i.e., either describing the items related to an item cataloged as fields within a single-level record or as subrecords of a multilevel record.

3. Tags should identify a field by type of entry as well as function by assigning specific values to the character positions.
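The retrieval this principle makes possible can be sketched by giving tag characters positional meaning. The convention below, that a tag ending in "00" marks a personal name whatever the field's function, is modeled loosely on LC/MARC practice but is offered purely as a hypothetical illustration.

```python
# With positional tag values, all personal names can be found by inspecting
# tags alone, without knowing each field's function in the record.

fields = [
    ("100", "Avram, Henriette D."),            # principal author
    ("245", "Principles of format design"),    # title
    ("600", "Coward, Richard"),                # personal name as subject
    ("700", "Rather, Lucia J."),               # secondary author
]

personal_names = [value for tag, value in fields if tag.endswith("00")]
# → the three personal names, regardless of function
```

A tagging scheme without this positional structure would force the program to enumerate every tag that might carry a personal name, and to revise that list whenever a new function were added.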
Assigning values to the characters of the tags allows the flexibility to derive more than a single kind of information from the tag. For example, it should be possible by an inspection of the tags to retrieve all personal names from a machine-readable record regardless of the function of the name in the record, i.e., principal author, secondary author, name used as subject, etc.

4. Indicators should be tag dependent and used as consistently as possible across all fields.

Indicators should be tag dependent because they provide both descriptive and processing information about a data field. If the value assigned to an indicator is used as consistently as possible across all fields, where the situation warrants this equality, the machine coding is simplified to process different functional fields containing the same type of entry.

5. Data element identifiers should be tag dependent, but, as far as possible, common data elements should be identified by the same data element identifiers across fields.

The principle has been adopted that the format will handle all types of media and consequently the projected number of unique tags may be quite large. In addition, since all types of media are not yet fully analyzed, the number of unique fields is an unknown factor. While it is undeniable that making data element identifiers tag independent would be desirable, the limited number of alphabetic, numeric, and symbolic characters would restrict the number of data elements to the number of unique characters. This constraint on future expansion seems to be more important than any advantages gained from making data element identifiers tag independent.

If data element identifiers are tag dependent, then additional refinements could be added in one of two ways: (1) the principle of identifying common data elements by the same identifiers across fields could be followed as far as possible, or (2) the identifiers could be given a value to aid in filing. The two refinements appear to be mutually exclusive, since a data element in one field may have a different filing value from the same data element in another field. Since the first refinement should be useful for many types of processing, and the second would be useful only in filing, the former seems to be the better option.

6. The fields in a bibliographic record are primarily related to broad categories of information relating to "subject," "description," "intellectual responsibility," etc., and should be grouped according to these fundamental categories.

The first working paper discussed as an obstacle the lack of agreement on the organization of data content in machine-readable records in different bibliographic communities. A subsequent paper consisting of comments made by staff of the Library of Congress on the proposed EUDISED format discussed in greater detail the analytic versus traditional arrangement.9† The majority of the national formats designed to date are arranged by using the function as the primary grouping and the type of entry as the secondary grouping. Several working papers produced by committee members supported the arrangement by function on the grounds that it followed the traditional order of elements in the bibliographic record and therefore simplified input procedures. Grouping of the fields first by function and then by type of entry was agreed to at the Brussels meeting.

REFERENCES

1. Henriette D. Avram and Kay D. Guiles, "Content Designators for Machine Readable Records," Journal of Library Automation 5:207-16 (Dec. 1972).
2. R. E. Coward, "MARC: National and International Cooperation," in International Seminar on the MARC Format and the Exchange of Bibliographic Data in Machine-Readable Form, Berlin, 1971, The Exchange of Bibliographic Data and the MARC Format (Munich: Pullach, 1972), p.17-23.
3. Roderick M.
Duchesne, "MARC: National and International Cooperation," in International Seminar on the MARC Format and the Exchange of Bibliographic Data in Machine-Readable Form, Berlin, 1971, The Exchange of Bibliographic Data and the MARC Format (Munich: Pullach, 1972), p.37-56.
4. American National Standards Institute, American National Standard for Bibliographic Information Interchange on Magnetic Tape (Washington, D.C.: 1971) (ANSI Z39.2-1971). Appendix, p.15-34.
5. Henriette D. Avram, John F. Knapp, and Lucia J. Rather, The MARC II Format; A Communications Format for Bibliographic Data (Washington, D.C.: Library of Congress, 1968), Appendix IV, p.147-49.
6. International Organization for Standardization, Documentation-Format for Bibliographic Information Interchange on Magnetic Tape. 1st ed. International standard ISO 2709-1973(E). 4p.
7. Council for Cultural Cooperation. Ad Hoc Committee for Educational Documentation and Information. Working Party on EUDISED Formats and Standards, 3d Meeting, Luxembourg, 26-27 April 1973, Draft EUDISED Format (Second Revision). Prepared by John E. Linford.
8. Paper sent from Richard Coward to Henriette D. Avram, "Notes on MARC Subrecord Directory Mechanism."
9. Henriette D. Avram, "Comments on Draft EUDISED Format (Second Revision)," unpublished paper.

† In an analytic tagging scheme, the first character of the tag describes the type of entry and subsequent characters describe function; in a traditional tagging scheme, the first character describes function and subsequent characters describe type of entry.

Techniques for Special Processing of Data within Bibliographic Text

Paula GOOSSENS: Royal Library Albert I, Brussels, Belgium

An analysis of the codification practices of bibliographic descriptions reveals a multiplicity of ways to solve the problem of the special processing of certain characters within a bibliographic element.
To obtain a clear insight into this subject, a review of the techniques used in different systems is given. The basic principles of each technique are stated, examples are given, and advantages and disadvantages are weighed. Simple local applications as well as more ambitious shared cataloging projects are considered.

INTRODUCTION

Effective library automation should be based on a one-time manual input of the bibliographic descriptions, with multiple output functions. These objectives may be met by introducing a logical coding technique. The higher the requirements of the output, the more sophisticated the storage coding has to be. In most cases a simple identification of the bibliographic elements is not sufficient. The requirement of a minimum of flexibility in filing and printing operations necessitates the ability to locate certain groups of characters within these elements. It is our aim, in this article, to give a review of the techniques solving this last problem.

As an introduction, the basic bibliographic element coding methods are roughly schematized in the first section. According to the precision in the element identification, a distinction is made between two groups, called respectively field level and subfield level systems. The second section contains discussions on the techniques for special processing of data within bibliographic text. Three basic groups are treated: the duplication method, the internal coding techniques, and the automatic handling techniques. The different studies are illustrated with examples of existing systems. For the field level projects we confined ourselves to some important German and Belgian applications. In the choice of the subfield level systems, which are MARC II based, we tried to be more complete. Most of the cited applications, for practical reasons, only concern the treatment of monographs.
This cannot be seen as a limitation because the methods discussed are very general by nature and may be used for other material. Each system which has recourse to different special processing techniques is discussed in terms of each of these techniques, enabling one to get a realistic overview of the problem. In the last section, a table of the systems versus the techniques used is given. The material studied in this paper provided us with the necessary background for building an internal coding technique in our internal processing format.

BIBLIOGRAPHIC ELEMENT CODIFICATION METHODS

Field Level Systems

The most rudimentary projects of catalog automation are limited to a coarse division of the bibliographic description into broad fields. These are marked by special supplied codes and cover the basic elements of author, title, imprint, collation, etc. In some of the field level systems, a bibliographic element may be further differentiated according to a more specific content designation, or according to a function identification. For instance, the author element can be split up into personal name and corporate name, or a distinction can be made between a main entry, an added entry, a reference, etc.

This approach supports only the treatment of each identified bibliographic element as a whole for all necessary processing operations, filing and printing included. This explains why, in certain applications, some of the bibliographic elements are duplicated, under a variant form, according to the subsequent treatments reflected in the output functions. Details on this will be discussed later. Here we only mention as an example the Deutsche Bibliographie and the project developed at the University of Bochum.1-4 It is evident that these procedures are limited in their possibilities and are not economical if applied to very voluminous bibliographic files.
For this reason, at the same time, more sophisticated systems, using internal coding techniques, came into existence. These allow one to perform separate operations within a bibliographic element, based on a special indication of certain character strings within the text. As there is an overlap in the types of internal coding techniques used in the field level systems and in the subfield level systems, this problem will later be studied as a whole. We limit ourselves to citing some projects falling under this heading. As German applications we have the Deutsche Bibliographie and the BIKAS system.5 In Belgium the programs of the Quetelet Fonds may be mentioned.6, 7

Subfield Level Systems

In a subfield level system the basic bibliographic elements, separated into fields, are further subdivided into smaller logical units called subfields. For instance, a personal name is broken into a surname, a forename, a numeration, a title, etc. Such a working method provides access to smaller logical units and will greatly facilitate the functions of extraction, suppression, and transposition. Thus, more flexibility in the processing of the bibliographic records is obtained.

As is well known, the Library of Congress accomplished the pioneering work in developing the MARC II format: the communications format and the internal processing format.8-11 These will be called MARC LC and a distinction between the two will only be made if necessary. The MARC LC project originated in the context of a shared cataloging program and immediately served as a model in different national bibliographies and in public and university libraries.
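The contrast between the two coding levels can be sketched with small data structures. This is an illustrative sketch only: the labels and record layout are assumptions of mine, not the storage format of any of the cited systems.

```python
# Sketch: field-level vs. subfield-level coding of the same description.
# All tags and labels here are illustrative, not from any cited format.

# Field level: each broad element is one undifferentiated string.
field_level = {
    "author": "Martin du Gard, Roger",
    "title": "Les Thibault",
    "imprint": "Paris: Gallimard, 1922",
}

# Subfield level: fields are broken into smaller logical units, so
# extraction, suppression, and transposition can work on single units.
subfield_level = {
    "100": {"a": "Martin du Gard", "b": "Roger"},          # surname / forename
    "245": {"a": "Les Thibault"},                          # short title
    "260": {"a": "Paris", "b": "Gallimard", "c": "1922"},  # place / publisher / date
}

# At field level only whole-element operations are possible; at subfield
# level the surname alone can be pulled out, e.g. for filing.
surname = subfield_level["100"]["a"]
print(surname)  # Martin du Gard
```

At field level, extracting the surname alone would require parsing the author string by convention; at subfield level it is a direct access.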
In this paper we will discuss BNB MARC of the British National Bibliography, the NYPL automated bibliographic system of the New York Public Library, MONOCLE of the library of the University of Grenoble, Canadian MARC, and FBR (Forma Bibliothecae Regiae), the internal processing format of the Royal Library of Belgium.12-21

In order to further optimize the coding of a bibliographic description, the Library of Congress also provided for each field two special codes, called indicators. The function of these indicators differs from field to field. For example, in a personal name one of the indicators describes the type of name, to wit: forename, single surname, multiple surname, and name of family. Some of the indicators may act as an internal code.

In spite of the well-considered structuring of the bibliographic data in the subfield level systems, not all library objectives may yet be satisfied. To reduce the remaining limitations, some approaches similar to those elaborated in field level systems are supplied. Some (NYPL, MARC LC internal format, and Canadian MARC) have, or will have, in a very limited way, recourse to a procedure of duplication of subfields or fields. All cited systems, except NYPL, use to a greater or lesser degree internal coding techniques. Finally some subfield level systems automatically solve certain filing problems by computer algorithms. This option was taken by NYPL, MARC LC, and BNB MARC. Each of these methods will be discussed in detail in the next section.

TECHNIQUES FOR SPECIAL PROCESSING OF DATA

Methods for special treatment of words or characters within bibliographic text were for the most part introduced to support exact file arrangement procedures and printing operations. In order to give concrete form to the following explanation, we will illustrate some complex cases. Each example contains the printing form and the filing form according to specific cataloging practices for some bibliographic elements.
Consider the titles in examples 1, 2, and 3, and the surnames in examples 4, 5, and 6.

Example 1: L'Automation des bibliotheques
           AUTOMATION BIBLIOTHEQUES
Example 2: Bulletino della R. Accademia Medica di Roma
           BOLLETINO ACCADEMIA MEDICA ROMA
Example 3: IBM 360 Assembler language
           I B M THREE HUNDRED SIXTY ASSEMBLER LANGUAGE
Example 4: Mc Kelvy
           MACKELVY
Example 5: Van de Castele
           VANDECASTELE
Example 6: Martin du Gard
           MARTIN DUGARD

We do not intend, in this paper, to review the well-known basic rules for building a sort key (the translation of lowercase characters to uppercase, the completion of numerics, etc.). Our attention is directed to the character strings that file differently than they are spelled in the printing form. The methods developed to meet these problems are of a very different nature. For reasons of space, not all the examples will be reconsidered in every case; only those most meaningful for the specific application will be chosen.

Duplication Methods

We briefly repeat that this method consists of the duplication of certain bibliographic elements in variant forms, each of them exactly corresponding to a certain type of treatment. In Bochum, the title data are handled in this way. One field, called "Sachtitel," contains the filing form of the title followed by the year of edition. Another field, named "Titelbeschreibung," includes the printing form of the title and the other elements necessary for the identification of a work (statements of authorship, edition statement, imprint, series statement, etc.). To apply this procedure to examples 1, 2, and 3, the different forms of each title respectively have to be stored in a printing field and in a sorting field. Analogous procedures are, in a more limited way, employed in the Deutsche Bibliographie.
For instance, in addition to the imprint, the name of the publisher is stored in a separate field to facilitate the creation of publisher indexes. The technique of the duplication of bibliographic elements has also been considered in subfield level systems. The NYPL format furnishes a filing subfield in those fields needed for the creation of the sort key. This special subfield is generally created by program, although in exceptional cases manual input may be necessary. In the filing subfield the text is preceded by a special character indicating whether or not the subfield has been introduced manually.

MARC LC (internal format) and Canadian MARC opt for a more flexible approach in which the filing information is specified with the same precision as the other information. The sorting data are stored in complete fields containing, among others, the same subfields as the corresponding original field. Because in most subfield level systems the number of different fields is much higher than in field level systems, the duplication method becomes more intricate. Provision of a separately coded field for each normal field which may need filing information is excluded. Only one filing field is supplied, which is repeatable and stored after the other fields. In order to link the sorting fields with the original fields, specific procedures have been devised. MARC LC, for instance, reserves one byte per field, the sorting field code, to announce the presence or the absence of a related sorting field. The link between the fields themselves is placed in a special subfield of the filing field.
In the supposition that examples 3 and 4 originate from the same bibliographic description, this method may be illustrated schematically as follows:22

tag   sorting field code   sequence number   data
100   x                    1                 $a$Mc Kelvy
245   x                    1                 $a$IBM 360 Assembler Language
880                        1                 $ja$1001$MacKelvy
880                        2                 $ja$2451$I B M Three hundred sixty Assembler Language

As is well known, the personal author and title fields are coded respectively as tag 100 and tag 245. Tag 880 defines a filing field. In the second column, the letter x identifies the presence of a related sorting field. The third column contains a tag sequence number needed for the unequivocal identification of a field. In the last column the sign $ is a delimiter. The first $ is followed by the different subfield codes. The other delimiters initiate the subsequent subfields. In tag 100 and 245, the first subfields contain the surname and the short title respectively. In tag 880 the first subfield gives the identification number of the related original field. The further subfield subdivision is exactly the same as in the original fields. In Canadian MARC a slightly different approach has been worked out. Note that in neither of the last two projects has this technique been implemented yet.

For an evaluation of the duplication method different means of application must be considered. If not systematically used for several bibliographic elements, the method is very easy at input. The cataloger can fill in the data exactly as they are; no special codes must be embedded in the text. But it is easy to understand that a more frequent need of duplicated data renders the cataloging work very cumbersome. In regard to information processing, this method consumes much storage space. First, a certain percentage of the data is repeated; second, in the most complete approach of the subfield level systems, space is needed for identifying and linking information.
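A minimal sketch of the linking mechanism between an original field and its filing field can be written as follows. The tuple layout and the function name are assumptions of mine; the actual MARC LC internal format packs this information into bytes rather than Python structures.

```python
# Sketch of MARC-LC-style sorting-field linkage (simplified assumption,
# not the actual internal byte layout).
record = [
    # (tag, has_sorting_field, sequence_number, subfields)
    ("100", True,  1, {"a": "Mc Kelvy"}),
    ("245", True,  1, {"a": "IBM 360 Assembler Language"}),
    ("880", False, 1, {"j": "1001", "a": "MacKelvy"}),
    ("880", False, 2, {"j": "2451", "a": "I B M Three hundred sixty Assembler Language"}),
]

def filing_form(record, tag, seq):
    """Return the filing form linked to (tag, seq), falling back to the
    field's own text when no sorting field is present."""
    target = tag + str(seq)  # e.g. "1001": tag 100, sequence 1
    for t, _, _, sub in record:
        if t == "880" and sub.get("j") == target:
            return sub["a"]
    for t, _, s, sub in record:
        if t == tag and s == seq:
            return sub["a"]
    return None

print(filing_form(record, "100", 1))  # MacKelvy
```

The look-up first consults the linking subfield of the 880 fields; only when no link matches is the original field text used, mirroring the idea that a sorting field is supplied only where the filing form differs.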
For instance, in MARC LC, one byte per field is provided containing the sorting field code, even if no filing information at all is present. Finally, programming efforts are also burdened by the need for special linking procedures.

In order to minimize the use of the duplication technique, the cited systems reduce their application in different ways. Bochum simplified its cataloging rules in order to limit its use to title information. As will be explained further, the Deutsche Bibliographie also has recourse to internal coding techniques. NYPL, MARC LC, and Canadian MARC only call on it if other more efficient methods (see later) fail. They also make an attempt to adapt existing cataloging practices to an unmodified machine handling of nonduplicated and minimally coded data.

Internal Coding Techniques

Separators

Separators are special codes introduced within the text, identifying the characters to be treated in a special way. A distinction can be made among four procedures.

1. Simple separators. With this method, each special action to be performed on a limited character string is indicated by a group of two identical separators, each represented as a single special sign. Illustration on examples 2, 3, 4, and 6 gives:

Example 2: £Bolletino£ ¢Bulletino della R.¢ Accademia Medica ¢di¢ Roma
Example 3: £I B M three hundred sixty£ ¢IBM 360¢ Assembler Language
Example 4: M£a£c¢ ¢Kelvy
Example 6: Martin du¢ ¢Gard

The characters enclosed between each group of two corresponding codes £ must be omitted for printing operations. In the same way the characters enclosed between two corresponding codes ¢ are to be ignored in the process of filing. In the case that only the starting position of a special action has to be indicated, one separator is sufficient.
For instance, if in example 1 we limit ourselves to coding the first character to be taken into account for filing operations, we have:

Example 1: L'/Automation des bibliotheques

where a slash is used as sorting instruction code.

The simple separator method has tempting positive aspects. Occupying a minimum of storage space (maximum two bytes for each instruction), the technique gives a large range of processing possibilities. Indeed, excluding the limitation on the number of special signs available as separators, no other restrictions are imposed. This argument will be rated at its true worth only after evaluation of the multiple function separators method and of the indicator techniques. The major disadvantage of the simple separator method lies in its slowness of exploitation. In fact, for every treatment to be performed, each data element which may contain special codes has to be scanned, character by character, to localize the separators within the text and to enable the execution of the appropriate instructions. For example, in the case of a printing operation, the program has to identify the parts of the text to be considered and to remove all separators. The sluggishness of execution was for some, as for Canadian MARC, a reason to disapprove this method.23 As already mentioned, another handicap with cataloging applications is the loss of a number of characters caused by their use as special codes. It is self-evident that each character needed as a separator cannot be used as an ordinary character in the text. For Bochum this was a motive to reject this method.

Many of the field level systems with internal codes have recourse to simple separators. We mention the Deutsche Bibliographie, in which some separators indicate the keywords serving for automatic creation of indexes and others give the necessary commands for font changes in photocomposition applications.
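The character-by-character scan that this method requires can be sketched as follows. The £ and ¢ signs are those of the examples above; the function names are mine, and sort-key details such as uppercasing are left out.

```python
def strip_spans(text, sep, remove_enclosed):
    """Scan for pairs of a simple separator; either drop the enclosed
    characters (remove_enclosed=True) or keep them and drop only the
    separator signs themselves."""
    out, inside = [], False
    for ch in text:
        if ch == sep:
            inside = not inside  # a pair of identical signs opens/closes a span
            continue
        if inside and remove_enclosed:
            continue
        out.append(ch)
    return "".join(out)

def printing_form(stored):
    # characters between two £ signs are omitted for printing
    return strip_spans(strip_spans(stored, "£", True), "¢", False)

def filing_form(stored):
    # characters between two ¢ signs are ignored for filing
    return strip_spans(strip_spans(stored, "¢", True), "£", False)

stored = "M£a£c¢ ¢Kelvy"       # example 4
print(printing_form(stored))   # Mc Kelvy
print(filing_form(stored))     # MacKelvy
```

Note that every output operation must make this full pass over the text, which is exactly the slowness of exploitation criticized above.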
In order to reduce the number of special signs, the Deutsche Bibliographie also duplicates certain bibliographic data. BIKAS uses simple separators for filing purposes. The technique is also employed in subfield level systems. In MONOCLE each title field contains a slash, indicating the first character to be taken into account for filing.

2. Multiple function separators. Designed by the British, the technique of the multiple function separators was adopted in MONOCLE. The basic idea consists of the use of one separator characteristic for instructing multiple actions. In the case of MONOCLE these actions are printing only, filing only, and both printing and filing. In order to give concrete form to this method we apply it to examples 3, 4, and 6, using a vertical bar as special code.

Example 3: |IBM 360 |I B M three hundred sixty |Assembler Language
Example 4: M|c |ac|Kelvy
Example 6: Martin du| ||Gard

The so-called three-bar filing system divides a data element into the following parts:

data to be filed and printed | data to be printed only | data to be filed only | data to be filed and printed

In comparison with the simple separator technique, this method has the advantage of needing fewer special characters. A gain of storage space cannot be assumed directly. As is the case in example 6, if only one special instruction is needed, the set of three separators must still be used. On the other hand, one must note that a repetition of identical groups of multiple function separators within one data element must be avoided. Subsequent use of these codes leads to very unclear representations of the text and may cause faulty data storage. This can well be proved if the necessary groups of three bars are inserted in examples 1 and 2. Of the studied systems, MONOCLE is the only one to use this method.

3. Separators with indicators. As mentioned in the description of subfield level systems, two indicators are added for each field present.
In order to speed up the processing time in separator applications, indicators may be exploited. In MONOCLE the presence or the absence of three bars in a subfield is signalled by an indicator at the beginning of the corresponding field. This avoids the systematic search for separators within all the subfields that may contain special codes. The number of indicators being limited, it is self-evident that in certain fields they may already be used for other purposes. As a result, some of the separators will be identified at the beginning of the field and others not. This leads to a certain heterogeneity in the general system concept which complicates the programming efforts.

Under this heading, we have mentioned the use of indicators only in connection with multiple function separators. Note that this procedure could be applied as well in simple separator methods. Nevertheless, none of the subfield level systems performs in this fashion because it is not necessary for the particular applications. This method is not followed in the field level systems as no indicators are provided.

4. Compound separators. A means of avoiding the second disadvantage of the simple separator technique is to represent each separator by a two-character code: the first one, a delimiter, identifies the presence of the separator and is common to each of them; the second one, a normal character, identifies the separator's characteristic. Taking the sign £ as delimiter and indicating the functions of nonprinting and nonfiling respectively by the characters a and b, examples 2 and 4 give in this case:

Example 2: £aBolletino£a £bBulletino della R.£b Accademia Medica £bdi£b Roma
Example 4: M£aa£ac£b £bKelvy

Thus the number of reserved special characters is reduced to one, independent of the number of different types of separators needed.
In none of the considered projects is this technique used, probably because of the amount of storage space wasted.

Indicators

As the concept of adding indicators in a bibliographic record format is an innovation of MARC LC, the methods described under this heading concern only subfield level systems. Although at the moment of the creation of MARC LC one did not anticipate the systematic use of indicators for filing, its adherents made good use of them for this purpose.

1. Personal name type indicator. As mentioned earlier, in MARC LC one of the indicators, in the field of a personal name, provides information on the name type. This enables one to realize special file arrangements. For example, in the case of homonyms, the names consisting only of a forename can be filed before identical surnames. Using the same indicator, an exact sort sequence can be obtained for single surnames, including prefixes. Knowing that the printing form of example 5 is a single surname, the program for building the sort key can ignore the two spaces. The systems derived from MARC LC developed analogous indicator codifications adapted to their own requirements. This seems to be an elegant method for solving particular filing problems in personal names. Nevertheless, its possibilities are not large enough to give full satisfaction. For instance, example 6 gives a multiple surname with prefix in the second part of the name. The statement of multiple surname in the indicator does not give enough information to create the exact sort form. Because of this shortcoming, MONOCLE had recourse to the technique called "separators with indicators."

2. Indicators identifying the beginning of filing text. BNB MARC reserves one indicator in the title field for identification of the first character of the title to be considered for filing.
This indicator is a digit between zero and nine, giving the number of characters to be skipped at the beginning of the text. Applying this technique to example 1, the corresponding filing indicator must have the value two. Without having recourse to other working methods, this title sorts as:

Example 1: AUTOMATION DES BIBLIOTHEQUES

Notice that the article des still remains in the filing form. This procedure has the advantage of being very economical in storage space and in processing time. Moreover the text is not cluttered with extraneous characters. On the other hand we must disapprove of the limitation of this technique to the indication of nonfiling words at the beginning of a field. The possibility of identifying certain character strings within the text is not provided for. Taking examples 2 and 3 we observe that the stated conditions cannot be fulfilled. Another negative side is the number of characters to be ignored, which may not exceed nine. Also one indicator must be available for this filing indication. After BNB MARC, MARC LC and Canadian MARC also introduced this technique.

3. Separators with indicators. The use of indicators in combination with separators has been treated above.

Pointers

A final internal coding technique which seems worth studying is the one developed at the Royal Library of Belgium for the creation of the catalogs of the library of the Quetelet Fonds, a field level system. The pointer technique is rather intricate at input but has many advantages at output. Because there is inadequate documentation of this working method, we will try to give an insight into it by schematizing the procedures to be followed to create the final storage structure. At input, the cataloger inserts the necessary internal codes as simple separators within the text. These codes are extracted by program from the text and placed before it, at the beginning of each field.
Each separator, now called pointer characteristic, is supplemented with the absolute beginning address and the length of its action area within the text. In the Quetelet Fonds the pointer characteristic is represented by one character; the address and length occupy two bytes each. The complete set of pointers (pointer characteristics, lengths, and addresses) is named pointer field. This field is incorporated in a sort of directory, starting with the sign "&" identifying the beginning of the field, followed by the length of the directory, the length of the text, and the pointer field itself. This is illustrated in Figure 1. Note that each field contains the five first bytes, even if no pointers are present. In the Quetelet Fonds, pointers are used for the following purposes: nonfiling, nonprinting, KWIC index, indication of a corporate name in the title of a periodical, etc. Examples 2, 3, and 4 should be stored in this system as represented in Figure 2.

Fig. 1. Structure of Directory with Pointer Technique. Representation of the structure of a field in the internal processing format of the Quetelet Fonds system: a directory followed by the text. The codes respectively represent: &: field delimiter; Ld: length of directory; Lt: length of text; x, y, ...: pointer characteristics; Ax, Ay, ...: addresses of the beginning of the related action area inside the text; Lx, Ly, ...: length of these action areas.

The advantages of the pointer technique are numerous. First, we must mention the relative rapidity of the processing of the records. In fact, in order to detect a specific pointer, only the directory has to be consulted. All subsequent instructions can be executed immediately. In contrast with most of the other methods discussed, there is no objection to using pointers for all internal coding purposes needed. This enables one to pursue homogeneity in the storage format, facilitating the development of programs.
Further, the physical separation of the internal codes and the text allows, in most cases, a direct clean text representation without any reformatting. Finally, unpredictable expansions of internal coding processes can easily be added without adaptation of the existing software.

A great disadvantage of the pointer technique lies in the creation of the directory. The storage space occupied by the pointers is also great in comparison with the place occupied by internal codes in other methods. A further handicap is the limitation imposed at input due to the use of simple separators.

Fig. 2. Pointer Technique as Applied to Bibliographic Data. Representation of examples 2, 3, and 4 in the Quetelet Fonds format. A represents the pointer characteristic for nonprinting data; B is the pointer characteristic for nonfiling data.

In spite of these negative arguments, we see a great interest in this method, and wish to give some suggestions in order to relieve or to eliminate some of them. Initially we must realize that the creation of a record takes place only once, while the applications are innumerable. The possibility of automatically adding some of the codes may also be considered. Data needing special treatment expressed in a consistent set of logical rules can be coded by program. Only exceptions have to be treated manually.
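A pointer directory of this kind might be sketched as follows, with Python tuples standing in for the packed byte layout of Figure 1, and the characteristics A (nonprinting) and B (nonfiling) as in Figure 2. The function names are mine.

```python
# Sketch of the pointer technique: internal codes are extracted from the
# separator-coded input and held as (characteristic, start, length)
# pointers, physically separated from the clean text.

def parse_separators(coded, np="£", nf="¢"):
    """Turn simple-separator input into (clean_text, pointers)."""
    pointers, text, opens = [], [], {}
    for ch in coded:
        if ch in (np, nf):
            kind = "A" if ch == np else "B"   # A: nonprinting, B: nonfiling
            if kind in opens:                 # closing sign of a pair
                start = opens.pop(kind)
                pointers.append((kind, start, len(text) - start))
            else:                             # opening sign of a pair
                opens[kind] = len(text)
        else:
            text.append(ch)
    return "".join(text), pointers

def apply_pointers(text, pointers, drop_kind):
    """Reproduce the text with the action areas of one kind removed;
    only the directory is consulted, never scanned character by character."""
    skip = [False] * len(text)
    for kind, start, length in pointers:
        if kind == drop_kind:
            for i in range(start, start + length):
                skip[i] = True
    return "".join(ch for i, ch in enumerate(text) if not skip[i])

text, ptrs = parse_separators("M£a£c¢ ¢Kelvy")   # example 4
print(text)                          # Mac Kelvy  (clean text, codes removed)
print(apply_pointers(text, ptrs, "A"))  # Mc Kelvy  (printing form)
print(apply_pointers(text, ptrs, "B"))  # MacKelvy  (filing form)
```

The clean text can be delivered directly for any output needing the full form, which is the reformatting-free representation claimed above.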
In considering the space occupied by the directory, some profit could be imagined by trying to reduce the storage space occupied by the addresses and the lengths. There is also a solution to be found by not having systematically to provide pointer field information. One must realize that only a small percentage of the fields may contain such codes. Finally, the restrictions at input may be removed by using compound separators. Such a change does not have any repercussion on the directory. As far as we know, the pointer technique has not been used in a subfield level system. At our library an internal processing format of the subfield level type, called FBR, is under development, in which a pointer technique based on the foregoing is incorporated.

Automatic Handling Techniques

In order to give a complete review of the methods of handling data within bibliographic text, we must also treat the methods in which both the identification and the special treatment of these data are done during the execution of the output programs. The working method can easily be demonstrated with example 1. Only the printing form must be recorded. The program for building the sort key processes a look-up table of nonfiling words including the articles L' and des. The program checks every word of the printing form for a match with one of the words of the nonfiling list. The sort key is built up with all the words which are not present in this table. To treat example 4, an analogous procedure can be worked through. An equivalence list of words for which the filing form differs from the printing form is needed. If, during the construction of the sort key, a match is found with a word in the equivalence list, the correct filing form, stored in this list, is placed in the sort key. The other words are taken in their printing form. In our case, using the equivalence list, Mc should be replaced by MAC.
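The look-up procedure just described can be sketched as follows. The word lists are illustrative fragments, not the actual NYPL or BNB tables, and the sketch keeps the replaced word as a separate token rather than joining it to the following one as the filing form MACKELVY would require.

```python
# Sketch of automatic handling with look-up tables (word lists are
# illustrative, not taken from any cited system).
NONFILING = {"L'", "DES", "THE", "LA", "LE"}   # stop list of nonfiling words
EQUIVALENCE = {"MC": "MAC", "M'": "MAC"}       # printing form -> filing form

def sort_key(printing_form):
    """Build a sort key: uppercase, drop nonfiling words, and substitute
    words whose filing form differs from their printing form."""
    # detach the elided article so it becomes a word of its own
    words = printing_form.upper().replace("L'", "L' ").split()
    key = []
    for w in words:
        if w in NONFILING:
            continue                       # word ignored in filing
        key.append(EQUIVALENCE.get(w, w))  # substituted or kept as is
    return " ".join(key)

print(sort_key("L'Automation des bibliotheques"))  # AUTOMATION BIBLIOTHEQUES
print(sort_key("Mc Kelvy"))                        # MAC KELVY
```

The tables must be consulted for every word each time a sort key is built, which is the repeated-execution cost noted below; and cases such as examples 2 and 3 cannot be expressed as word-by-word table look-ups at all.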
In order to speed up the look-up procedures, different methods of organization of the look-up tables can be devised. Other types of automatic processing techniques can be illustrated by the special filing algorithms constructed for a correct sort of dates. For instance, in order to be able to sort B.C. and A.D. dates in a chronological order, the year 0 is replaced by the year 5000. B.C. and A.D. dates are respectively subtracted from or added to this number. Thus dates back to 5000 B.C. can be correctly treated. This technique, introduced by NYPL, is also used at LC.

The advantages of automatic handling techniques are many. No special arrangements must be made at input. Only the bibliographic elements must be introduced under the printing form and no special codes have to be added. There is no storage space wasted for storing internal codes. As negative aspects we ascertain that not all cataloging rules may be expressed in rigid systematic process steps. Examples 2 and 3 illustrate this point. One must also recognize that the special automatic handling programs must be executed repeatedly when a sort key is built up, increasing the processing time. This procedure may give some help for filing purposes, but we can hardly imagine that it really may solve all internal coding problems. Think of the instructions to be given for the choice of character type while working with a typesetting machine.

The automatic handling technique is very extensively applied in the NYPL programs; MARC LC has recourse to it for treating dates, and BNB MARC for personal names.24 None of the field level systems considered here uses this method.

SUMMARY AND CONCLUSIONS

Table 1 presents, for the discussed systems, a summary of the methods used for treating data in a bibliographic text. The duplication and indicator techniques have the most adherents.
Review of the Techniques for Special Processing of Data within Bibliographic Text Used or Planned in the Discussed Systems

[Rotated table, not fully recoverable from the scan. Rows (systems): Deutsche Bibliographie, Bochum, BIKAS, Quetelet Fonds, MARC LC, BNB MARC, NYPL, MONOCLE, Canadian MARC, FBR. Columns (techniques): Duplication; Internal Codes (Simple Separators; Separators with Function Indicators; Indicators for Multiple Personal Name Type and Beginning of Filing Text; Pointers); Automatic Handling.]

that in most of the systems the duplication of data only represents an extreme solution. On the other hand, indicators are very limited in their possibilities. As far as flexibility and application possibilities are concerned, the simple separators and the pointers present the most interesting prospects. Automatic handling techniques may produce good results in well-defined fields or subfields. From the evaluations given for the different methods, we conclude that for a special application the choice of a method depends greatly on the objectives, namely the sort of special processing facilities needed, the volume of data to be treated, and the frequency of execution.

REFERENCES
1. Rudolf Blum, "Die maschinelle Herstellung der Deutschen Bibliographie in bibliothekarischer Sicht," Zeitschrift für Bibliothekswesen und Bibliographie 13:303-21 (1966).
2. Die ZMD in Frankfurt am Main; Herausgegeben von Klaus Schneider (Berlin: Beuth-Vertrieb GmbH, 1969), p.133-37, 162-67.
3. Magnetbanddienst Deutsche Bibliographie, Beschreibung für 7-Spur-Magnetbänder (Frankfurt on the Main: Zentralstelle für maschinelle Dokumentation, 1972).
4.
Ingeborg Sobottke, "Rationalisierung der alphabetischen Katalogisierung," in Elektronische Datenverarbeitung in der Universitätsbibliothek Bochum; Herausgegeben in Verbindung mit der Pressestelle der Ruhr-Universität Bochum von Günther Pflug und Bernhard Adams (Bochum: Druck- und Verlagshaus Schürmann & Klagges, 1968), p.24-32.
5. Datenerfassung und Datenverarbeitung in der Universitätsbibliothek Bielefeld: Eine Materialsammlung; Hrsg. von Elke Bonness und Harro Heim (Munich: Pullach, 1972).
6. Michel Bartholomeus, L'aspect informatique de la catalographie automatique (Brussels: Bibliothèque royale Albert Ier, 1970).
7. M. Bartholomeus and M. Hansart, Lecture des entrées bibliographiques sous format 80 colonnes et création de l'enregistrement standard; publication interne: Mecono B015A (Brussels: Bibliothèque royale Albert Ier, 1969).
8. Henriette D. Avram, John F. Knapp, and Lucia J. Rather, The MARC II Format: A Communications Format for Bibliographic Data (Washington, D.C.: Library of Congress, 1968).
9. Books, a MARC Format: Specifications for Magnetic Tapes Containing Catalog Records for Books (5th ed.; Washington, D.C.: Library of Congress, 1972).
10. "Automation Activities in the Processing Department of the Library of Congress," Library Resources & Technical Services 16:195-239 (Spring 1972).
11. L. E. Leonard and L. J. Rather, Internal MARC Format Specifications for Books (3d ed.; Washington, D.C.: Library of Congress, 1972).
12. MARC Record Service Proposals (BNB Documentation Service Publications no.1 [London: Council of the British National Bibliography, Ltd., 1968]).
13. MARC II Specifications (BNB Documentation Service Publications no.2 [London: Council of the British National Bibliography, Ltd., 1969]).
14. Michael Gorman and John E. Linford, Description of the BNB MARC Record: A Manual of Practice (London: Council of the British National Bibliography, Ltd., 1971).

182 Journal of Library Automation Vol. 7/3 September 1974

15.
Edward Duncan, "Computer Filing at the New York Public Library," in LARC Reports vol.3, no.3 (1970), p.66-72.
16. NYPL Automated Bibliographic System Overview, Internal Report (New York: New York Public Library, 1972).
17. Marc Chauveinc, Monocle: Projet de mise en ordinateur d'une notice catalographique de livre. Deuxième édition (Grenoble: Bibliothèque universitaire, 1972).
18. Marc Chauveinc, "Monocle," Journal of Library Automation 4:113-28 (Sept. 1971).
19. Canadian MARC (Ottawa: National Library of Canada, 1972).
20. Format de communication du MARC Canadien: Monographies (Ottawa: Bibliothèque nationale du Canada, 1973).
21. To be published.
22. Private communications (1973).
23. Private communications (1972).
24. Private communications (1973).

MARCIVE: A COOPERATIVE AUTOMATED LIBRARY SYSTEM

Virginia M. BOWDEN: Systems Analyst, The University of Texas Health Science Center at San Antonio, and Ruby B. MILLER: Head Cataloger, Trinity University, San Antonio, Texas. 183

The MARCIVE Library System is a batch computer system utilizing both the MARC tapes and local cataloging to provide catalog cards, book catalogs, and selective bibliographies for five academic libraries in San Antonio, Texas. The development of the system is traced and present procedures are described. Batch retrieval from the MARC records plus the modification of these records costs less than twenty cents per title. Computer costs for retrieval, modification, and card production average sixty-six cents per title, between seven and ten cents per card. The attributes and limitations of the MARCIVE system are compared with those of the OCLC system.

In San Antonio, Texas, a unique cooperative effort in library automation has developed, involving the libraries of five diverse institutions: Trinity University, The University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio College (SAC), The University of Texas at San Antonio (UTSA), and St.
Mary's University. These institutions are utilizing the MARCIVE Library System, which was developed by and for one library, that of Trinity University. The MARCIVE system is a batch, disc oriented computer system utilizing both local cataloging and the MARC tapes to produce catalog cards, book catalogs, selective bibliographies, and other products.

DEVELOPMENT

The Trinity University Library has been involved in library automation since 1966 (1). When the library reclassified its collection from Dewey to the Library of Congress classification in 1966, a simplified machine-readable format was developed and used for storage on computer. This format contained the following bibliographic elements: accession number, call number, author, title, and imprint date. In 1969 the library decided to reformat the computer data base into a MARC II compatible format in order to build a data base of bibliographic records that could be the basis for all future automated systems within the library. The resulting system, MARCIVE, was designed jointly by the head cataloger, Ruby B. Miller, and the library programmer, Paul Jackson, a graduate student in Trinity's Department of Computer Science. Since in 1969 literature on completed library automation projects was sparse, no other system was used as a guide. The MARCIVE format was based on the designers' interpretation of the 1969 edition of the MARC manual. The name, MARCIVE, evolved when the programmer facetiously claimed that his format was so advanced he would call it the MARC IV format. The computer room operating staff, ignoring the space between the MARC and IV, combined the two, producing MARCIV. An E was added later for ease of pronunciation. The MARCIVE system was designed initially as a system for data storage and retrieval. The UPDATE, SELECT, and ACQUISITIONS LIST programs were operative in September 1970.
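The simplified 1966 record described above can be pictured as a five-field structure. The class below is purely illustrative; the field names follow the five bibliographic elements listed in the text, and the sample values are invented, not drawn from Trinity's actual files.

```python
from dataclasses import dataclass

@dataclass
class SimpleRecord:
    """The five elements of the simplified pre-MARC format (illustrative)."""
    accession_number: str
    call_number: str
    author: str
    title: str
    imprint_date: int

# A hypothetical record in the 1966 format
rec = SimpleRecord("A000123", "PA3877 .A1", "Aristophanes", "Plays", 1970)
print(rec.call_number)  # -> "PA3877 .A1"
```

A record this small could not carry subfields, indicators, or notes, which is why the 1969 move to a MARC II compatible format was needed.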
The next month UTHSCSA inquired as to the possibility of producing catalog cards as part of the MARCIVE system. Within the brief span of three months, by January 1971, Trinity University Library produced 4,289 catalog cards and UTHSCSA produced 1,719 catalog cards via MARCIVE. In February 1974, the five participating libraries produced a total of 29,000 catalog cards, with Trinity accounting for 10,740 cards. Continued development of the MARCIVE system was delayed in 1971 by changes in Computer Center personnel and equipment. In 1972 new programs were developed to incorporate the MARC tapes into the MARCIVE system. The size of the MARC data base, which is now held on three discs, was a major problem. Modifications were included to accept input from magnetic tape and typewriter terminals using the APL language as well as keypunched cards. The original restriction of the system to classifications with one to three alphabetic letters followed by numbers, such as those used by LC and NLM, was modified to accept the Dewey Decimal Classification to accommodate San Antonio College. This restriction had been incorporated in an attempt to insure that the call number would be properly formatted, thus simplifying retrieval in the SELECT program and grouping in the ACQUISITIONS LIST and UPDATE programs.

COMPUTER CONFIGURATION

The MARCIVE system is a disc oriented system which was programmed for an IBM 360/44 using the MFT operating system. This computer model was designed for scientific programming and was manufactured in limited quantities. The programs were written in basic assembly language since adequate higher level language compilers for the 360/44 were not available at the Trinity Computer Center. In 1971 the programs were converted to run under DOS, and in 1972 they were converted for processing on the IBM 370/155 using the OS operating system.
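The call-number restriction described above (one to three alphabetic letters followed by digits, as in LC or NLM class numbers) can be expressed as a simple pattern check. The regular expressions here are assumptions about how such a check might look, not the actual MARCIVE assembly-language validation.

```python
import re

# Original restriction: one to three letters, then digits (LC/NLM style)
LC_STYLE = re.compile(r"^[A-Z]{1,3}\d")
# Relaxation added for San Antonio College: Dewey numbers start with digits
DEWEY_STYLE = re.compile(r"^\d{3}(\.\d+)?")

def call_number_ok(call_number):
    """Accept either an LC/NLM-style or a Dewey-style call number."""
    return bool(LC_STYLE.match(call_number) or DEWEY_STYLE.match(call_number))

print(call_number_ok("PA3877.A1"))  # LC-style: True
print(call_number_ok("882.01"))     # Dewey-style: True
```

A Dewey number such as 882.01 fails the original letters-first pattern, which is why the check had to be relaxed rather than merely extended.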
Since the initial programs were written in basic assembly language, the subsequent programs have also been written this way.

MARCIVE FORMAT

The MARCIVE format is an adaptation of the MARC II format. The MARC II format is defined as a "... format which is intended for the interchange of bibliographic records on magnetic tape. It has not been designed as a record format for retention within the files of any specific organization ... [it is] a generalized structure which can be used to transmit between systems records describing all forms of material capable of bibliographic descriptions ... the methods of recording and identifying data should provide for maximum manipulability leading to ease of conversion to other formats for various uses" (2). Adaptation of the MARC II format is common among users. An analysis by the RECON Task Force found much variation in the use of the fixed fields, tags, indicators, and subfields (3). The OCLC system can regenerate MARC II records from OCLC records although they contain only 78 percent of the number of characters in the original MARC II record (4). The developers of the MARCIVE system studied the MARC manual and decided that the leader and directory were not necessary for program manipulation. Such information can be generated by a conversion program. The MARC mnemonic codes were chosen instead of the numeric ones because all bibliographic data were being coded locally and it was felt that mnemonics would be easier to work with. The mnemonic codes are the ones designated in the MARC manuals except that "SI" was substituted for "SE." Rules for assigning indicators, subfields, and delimiters are those described by MARC. The basic structure of the MARCIVE format is illustrated in Figure 1. The differences between MARCIVE and MARC are as follows:
1. MARCIVE's leader consists of three fields: length of disc space, status code, and length of record.
In converting MARC, the following elements of the MARC leader are incorporated in the MARCIVE leader fields: length of disc space, status code, and length of record.
2. MARCIVE does not contain the MARC record directory, but rather places the tags and subfield codes in front of the actual data.
3. In the conversion from MARC II to MARCIVE, fixed fields such as date of publication are omitted.
4. All data elements in MARCIVE are treated as variable tags even though they contain fixed field data.
5. MARCIVE uses the mnemonic code names for the input of data rather than the numeric MARC codes. For example, "MEP" is used for coding a person as main entry rather than "100." The mnemonic tag names are stored in the machine format and not the numeric MARC tags.

[Figure 1 diagrams the record layout: length of disc space, blank, length of record, the FIN tag and its subfields, then repeated groups of tag name, subfield codes, and data elements.]

Length of disc space. This identifies the number of seventy-two byte blocks a record uses. The MARCIVE records average 350 characters, or three to six blocks.
Blank. This field is used by the UPDATE program.
Length of record. Identifies the actual number of characters a record contains.
FIN tag. This is the MARCIVE control tag and must precede each record. It contains four subfields: accession number, type of material, location of material, and call number.
Tag name. After the FIN tag, any of the MARCIVE tags may be input as long as they conform to the proper sequence (i.e., main entry must precede title). Each tag is followed by its subfield codes and the data elements.

Fig. 1. MARCIVE Format Structure.

6. All first indicators are input except for the first indicator in the contents note.
7.
Most of the second indicators are not input, except for the filing indicators, which are included in the MARCIVE format.
8. MARCIVE adds one variable tag to the MARC format called "FIN." It serves the function of the MARC 090 local holdings tag. The FIN tag must be the first variable tag in each MARCIVE record and must contain four data elements: (1) accession number; (2) type of material code (monograph, serial, etc.); (3) location of material within the library (reference, reserve, etc.); (4) local call number.

Even though MARCIVE is not a pure MARC format, there has been an attempt to code most of the data elements into MARCIVE. A MARCIVE to MARC conversion is being written by one of the MARCIVE libraries in order to merge its MARCIVE data base with a purchased MARC data base.

MARCIVE MASTER DATA BASES

Each of the MARCIVE users maintains a separate data base of its holdings, which is called its MARCIVE master. This master file contains a complete bibliographic record for each title cataloged by the library, including MARC cataloging and local cataloging. When a library modifies a MARC record, the modified record is recorded in that library's MARCIVE master. The various libraries' MARCIVE masters have not been merged, although this is being considered. Each library has prefaced all of its accession numbers with a unique library code just in case a merged data base is desired.

MARC-CON DATA BASE

The largest data base in the system is the MARC-Converted data base, hereafter referred to as MARC-CON. This data base contains only pure MARC data that have been converted into MARCIVE machine format. No original cataloging or local modifications of MARC are contained in the MARC-CON data base.

MARCIVE PROGRAMS

CONVERT-This program reformats the weekly MARC tapes into the MARCIVE machine format.
MARC-UPDATE-This program merges the weekly converted MARC tape with the MARC-CON disc file.
An index sequential (ISAM) file containing the LC card number, fifty characters of the title, and the disc address of the MARC record is generated. The ISAM file is in LC card number order. In 1974 the MARC-CON data base filled three 3330 disc packs. There are three tape back-up files: one file consisting of original MARC records, one of the MARC-CON records, and a third with the ISAM file. Deleted records and replaced records are annually purged from the MARC-CON files. A new set of back-up tapes for the disc packs is created every three months in order to facilitate regeneration of the disc packs should damage occur.
MARC-LIST-This program lists MARC records in title sequence from the tape. Once every six to eight weeks the list is cumulated and printed. These lists are used for searching until the annual cumulation of the NUC is received. This provides current listings of records on the MARC tapes that are not easily available in the National Union Catalog. This listing will be eliminated in 1974, when access by title to the MARC-CON data base is available.
MARC-SEARCH-This program searches for LC numbers on the MARC-CON file using the ISAM file. A file of the matched records is produced on tape or disc as specified, along with a listing of these records. This listing contains the complete MARC-CON bibliographic entry (Figure 2). Although access is currently only by LC card number, access by a title algorithm (3,1,1) is expected in 1974.
REPLACE-The purpose of this program is to modify MARC-CON records to fit the needs of the individual library. These modifications can be done automatically to all records or on a single record basis by the library. The automatic changes are specified on a control card and include twenty-two options, such as assignment of accession number, usage of Dewey class number instead of LC, and changing "U.S." in subject headings to "United States."
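One of the automatic changes just listed can be sketched as a pass over the record. The (tag, data) record representation and the single substitution rule are simplified assumptions based on the description in the text, not the REPLACE program's actual logic; "SUT" follows the subject-heading tag shown in the article's sample listing.

```python
def replace_auto(record):
    """Apply an automatic REPLACE-style change to every subject heading."""
    out = []
    for tag, data in record:
        if tag.startswith("SUT"):  # subject heading tags (assumed convention)
            data = data.replace("U.S.", "United States")
        out.append((tag, data))
    return out

rec = [("TIL", "African masks"), ("SUT", "U.S. - History.")]
print(replace_auto(rec))
# -> [('TIL', 'African masks'), ('SUT', 'United States - History.')]
```

Because the rule is applied uniformly, one control-card option can rewrite every record in a batch without any per-title editing.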
An example of a single modification would be the changing of a series entry from traced to untraced. Most MARCIVE participants use a combination of automatic and single changes. The output from the REPLACE program may be input to all other MARCIVE programs, such as EDIT, CATALOG CARD, UPDATE, etc.
EDIT-This program verifies the format of the input. Valid tags and subfields as well as the correct sequence of tags are checked. Multiple spaces are compressed to one, implied subfields are added, and a limited number of punctuation marks are generated. Actual bibliographic data are not checked, so spelling errors are not detected by the program. Those titles which do not conform to specifications are rejected and an explanatory message is generated.

Fig. 2. SEARCH listing of MARC-CON data. [Sample listing of two records, an Aristophanes edition and a translated work on African masks, showing the mnemonic tags and subfield delimiters; the sample data are not legible in the scan.]

A library may choose one of three forms of listings of output: (1) Full-Edit, (2) Mini-Edit, or (3) Error-Edit.
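A rough sketch of the kinds of checks EDIT performs follows. The tag set, the ordering rule, and the record representation are simplified assumptions based on the description in the text, not the actual program logic.

```python
import re

# Illustrative subset of MARCIVE tags and a required ordering
# (FIN first, main entry before title, as the text describes).
VALID_TAGS = {"FIN", "MEP", "MEPS", "TIL", "IMP", "COL", "NOG", "SUT"}
ORDER = {"FIN": 0, "MEP": 1, "MEPS": 1, "TIL": 2,
         "IMP": 3, "COL": 4, "NOG": 5, "SUT": 6}

def edit_check(record):
    """Return explanatory messages for tag and sequence violations."""
    errors = []
    if not record or record[0][0] != "FIN":
        errors.append("record must begin with FIN tag")
    last = -1
    for tag, data in record:
        if tag not in VALID_TAGS:
            errors.append(f"invalid tag {tag}")
            continue
        if ORDER[tag] < last:
            errors.append(f"tag {tag} out of sequence")
        last = ORDER[tag]
    return errors

def compress_spaces(data):
    """Compress runs of spaces to one, as EDIT does."""
    return re.sub(r" {2,}", " ", data)

rec = [("FIN", "PA3877"), ("MEPS", "Aristophanes."), ("TIL", "Plays;")]
print(edit_check(rec))          # -> []
print(compress_spaces("a  b"))  # -> "a b"
```

A record failing either check would be rejected with the accumulated messages, matching the program's behavior of rejecting nonconforming titles.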
The Full-Edit

[A sample Full-Edit listing follows in the original (accession 950564: FIN, MEPS, and TIL tags for Kimber's Anatomy and Physiology); the remainder of the article is not included here.]

Improved Delivery/HERLING, et al. 279

Fig. 3. Utility Curves for the Timeliness of Library Materials Delivery. [Utility values from 0 to 1.0 plotted against time in days for: supplies; mending and binding; newly processed contract materials; reciprocal returns; photoduplicated materials; interlibrary loans.]

engineering techniques to the solution of management and systems problems, generally but not necessarily with the use of the computer. The operations research approach requires a valid unit of measurement. If an existing system is to be evaluated for comparison with alternative systems other than subjectively, some quantitative basis must be derived. We believe that one of the most important products of this project was the development of a measure of effectiveness, or "objective function." This measure was a composite of numerical values (weights) assigned to the types of materials to be delivered, as shown in Table 1; the frequency of delivery within a week; timeliness value (utility), as shown in Figures 3 and 4; and the number of units to be delivered. To illustrate: ten interlibrary loan items (a weight of .135) delivered in less than one day (a utility of 1) have an effectiveness value of 13.5. On the other hand, ten items delivered in five days (a utility of .5) have an effectiveness value of 0.5 × .135 × 10 = 6.75. A system designed to accomplish the latter would have 50 percent less effectiveness than a system that accomplished the former. No librarian needs to be told that it is generally more important to deliver interlibrary loans promptly than it is to deliver supplies.
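The composite measure just illustrated multiplies the weight of a material type by its timeliness utility and by the number of units delivered. The weight (.135) and the two utilities below come from the article's own interlibrary-loan example; the function itself is an assumed reading of the "objective function," not the project's code.

```python
def effectiveness(weight, utility, units):
    """Effectiveness contribution of one material type on one route."""
    return weight * utility * units

same_day = effectiveness(0.135, 1.0, 10)  # ten ILL items in under a day
five_day = effectiveness(0.135, 0.5, 10)  # the same ten items in five days

# The slower configuration scores 50 percent lower, as the text notes.
print(same_day, five_day)
```

Summing these contributions over all material types and routes yields the total-effectiveness figures reported for the existing and improved systems.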
But in order to use operations research methods, quantitative values, as we have said, are required. The values shown in Table 1 and the sensitivity to timeliness of delivery, i.e., Figures 3 and 4, were established by the use of a technique

280 Journal of Library Automation Vol. 7/4 December 1974

Fig. 4. Utility Curves for the Timeliness of Library Materials Delivery. [Utility values from 0 to 1.0 plotted against time in days for: gifts; intralibrary bulk shipments; newly processed intralibrary materials; correspondence; intralibrary loans.]

known as the Delphi Method. Our application of the method in this project has been described elsewhere (4). Essentially the method seeks out a consensus from a panel of knowledgeable people, in this case experts from academic, school, special, and public libraries, and a trustee. The methodology has three characteristics: anonymity, controlled feedback, and statistical group response. Anonymity is used to minimize the impacts of dominant individuals in the panel. This is achieved by eliciting separate and individual responses to previously prepared questions. In this case, the responses were made in writing on preprinted forms. Controlled feedback reduces the variance in parameter estimates. After the first and all remaining rounds, the results of the previous round are fed back to the panel in summarized form, showing the vote distribution along with various justifications for votes after the second round. Since the panel is asked to reevaluate their position based on the feedback provided, but with no particular attempt to arrive at unanimity, the spread of votes will usually be much smaller after several rounds than during the earlier rounds. This is known as statistical group response. In each case consensus was reached within five rounds. In addition to the need for evaluating system effectiveness in relation to service, there is the need to relate effectiveness to costs.
Systems Description I provided the data on all fixed and variable costs of the existing system. Because of the prevailing use in libraries of line accounting, all the associated costs were not readily available; hence present costs were probably underestimated. For purposes of computer processing, cost per minute of driving and cost per mile of truck operation were identified.

Fig. 5. Weekly Mileage Versus Time for CCPL Trucks. [Weekly mileage plotted for headquarters and trucks A through D.]

SOFTWARE

To repeat: the general approach was to study the characteristics of the existing system, then design an improved system. Using the elements described above, a computer program was written to emulate the system, introducing, however, the measure of effectiveness to make it possible to establish values representing the existing level of performance. Entered in the program were 1. the nodes, 2. demand and frequency of delivery at each node, 3. geographic coordinates of each node, 4. unit costs, and 5. weights and utilities for each type of material. The program was run to compute, for each driver, the costs, distances traveled, volume delivered, time utilization, the effectiveness as discussed earlier, and then the cost/effectiveness ratios. Figures 5 through 10 show the hard data inputs to the program. Table 2 depicts a sample of

Table 2. Statistical Analysis of the CPL Driver Collection Cards, Summer Schedule 8/17/70-9/4/70
[Table 2 data: mean, variance, and standard deviation of the daily mileage, the number of stops, and the counts of telescopes, packages, bindery boxes, and audiovisuals delivered and picked up; the column alignment is not recoverable from the scan.]

Fig. 6. Number of Stops per Week Versus Time for CCPL Trucks. [Weekly counts plotted for headquarters and trucks A through D.]

the statistical analysis performed on the hard data. Four sets of computer runs were made, first using data for the same week for all drivers, then data for several weeks for different drivers. Total effectiveness of the existing system, as measured by the sum of the multiples of the importance of each material type (weight), their timeliness values (utilities), and the total amounts of materials delivered, ranged from 8,110 to 9,950. Costs ranged from $3,801 to $3,934 per week. A second program, incorporating the tools of operations research known as simulation and optimization, was then used to design an improved system. This program (SIMOPT) included a routing algorithm (a set of instructions to the computer) to determine the best routes for each of the drivers on a daily basis. Figure 11 describes the basic logic of this program. The procedures to operate the methodology require the following steps (see Figure 11):
1. Based on the library hierarchy, contractual arrangements, or any extraneous but agreed-upon reasons, the librarians assign frequencies of delivery to each group of nodes or individual nodes.
2.
Using the maps (Figure 2) and other information, librarians group nodes and assign them to a driver along with the frequency as derived from step 1 above. This constitutes the input necessary for a computer production run.

Fig. 7. Number of Telescopes Delivered per Week Versus Time for CCPL Trucks. [Weekly counts plotted for CCPL headquarters and trucks A through D.]

3. In a production run, the computer calculates: a. its best route for each driver, day by day; b. the effectiveness of the route; c. the cost of the route, cumulative by day for one week; d. the distance traveled by each driver; e. the time spent working by each driver; and f. capacity, time, and/or distance constraint violations.
4. If the results of step 3 are not satisfactory or a better variant is synthesized, librarians can iterate through steps 1 or 2.

In order to maintain the information basis of the procedure, the following input must be updated for computer files.
Ad Hoc Basis
• Node changes: new nodes; nodes to be dropped; changes of location; changes of hierarchical status and category
• Changes in vehicle capacity
• Changes in cost parameters
Periodic (intermediate range) Basis
• Evaluate demands, by season, by node

Fig. 8. Number of Packages Delivered per Week Versus Time for CCPL Trucks. [Weekly counts plotted for headquarters and trucks A through D.]
(once every two or three years, or ad hoc if major shifts have been established)
• Evaluate driver time data (as above)
Periodic (long range) Basis
• Reevaluate the material types
• Reestablish sensitivity curves

Maintaining the same frequency of delivery as used earlier, but with routes generated by the computer, results showed a potential cost reduction of 5 percent and an increased effectiveness of 37,930, or a 400 to 500 percent improvement. The simulation-optimization program also has the capability of processing changes in the elements of the system. Effects of two types of changes were tested: 1. configurations which included an increase of frequency of delivery to daily delivery for most libraries and twice-daily delivery to some; and 2. configurations which included one or two trucks dedicated to transshipment delivery among key distribution centers.

Fig. 9. Number of Bindery Boxes Delivered per Week Versus Time for CCPL Trucks. [Weekly counts plotted for headquarters and trucks A through D.]

Effectiveness again increased 400 to 500 percent; costs, however, also increased, between 3 and 39 percent.

DISCUSSION

Essentially, these results provided the means by which Cleveland libraries could maintain the existing delivery system at a slight reduction in cost, but with a four- to fivefold increase in effectiveness; or could improve the frequency of delivery at a known increase in cost and with the same four- to fivefold improvement in effectiveness. At the same time, a realistic basis for evaluating bids from commercial delivery services was made available, should this alternative be explored. Last, but by no means least, a method for the analysis and/or design of a delivery system that could be used by other library networks was developed.
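The article does not disclose the routing algorithm inside SIMOPT. As one simple stand-in, a nearest-neighbor heuristic over node coordinates shows the flavor of generating a driver's daily route by computer; the node names and coordinates are invented for the example.

```python
import math

def nearest_neighbor_route(start_xy, stops):
    """Visit all stops, always moving to the closest unvisited node.

    stops: dict mapping node name -> (x, y) coordinate pair.
    Returns the visit order as a list of node names.
    """
    order = []
    current = start_xy
    remaining = dict(stops)  # copy so the caller's dict is untouched
    while remaining:
        name = min(remaining, key=lambda n: math.dist(current, remaining[n]))
        order.append(name)
        current = remaining.pop(name)
    return order

stops = {"A": (1, 0), "B": (5, 5), "C": (2, 1)}
print(nearest_neighbor_route((0, 0), stops))  # -> ['A', 'C', 'B']
```

A production routing algorithm would also respect the capacity, time, and distance constraints that SIMOPT checks after each route is built.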
No study, as is true of most human endeavors, is perfect: ours is no exception. The original intent of the proposal, to study the entire distribution system, and especially its reference network aspects, was narrowed to the delivery subsystem because of inadequate funding. Underestimation of the complexity of the problem, which mandated the expenditure of more time than was anticipated on data collection and systems description, limited the time that could be devoted to study of the delivery subsystem.

Fig. 10. Number of Audiovisuals Delivered per Week Versus Time for CCPL Trucks. [Weekly counts plotted for headquarters and trucks A through D.]

We could not, as we had intended, consider the question of optimum truck size or alternative types of vehicles; hypothetically, a combination of motorcycles and large trucks would produce a more cost-effective system. Acceptance of the location of facilities such as garages as fixed was a further limiting factor: their relocation might have a significant effect. Finally, the method of approach, in concert with the realities of library budgets, ruled out the design of an ideal system unrelated to the existing system.

Enough has been written recently to denigrate the usefulness of the computer in library applications. Nevertheless, we must acknowledge that a greater amount of human intervention than anticipated was employed as a corrective in the generation of computer-produced routes and must also be used for their implementation. Consider: each of 700 geographical locations is a potential successor in a route to any other of the remaining 699.
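The arithmetic behind the data-reduction step just described is easy to check: with 700 nodes, every ordered pair of distinct nodes is a candidate route segment. The group sizes of 50 and 80 nodes below are assumptions chosen only to show how the quoted 2,500 to 6,400 range can arise from human preselection of contiguous sets.

```python
def ordered_pairs(n):
    """Ordered pairs of distinct nodes: each node may follow any other."""
    return n * (n - 1)

print(ordered_pairs(700))  # 489,300 -- the "nearly 500,000" pairs in the text
print(ordered_pairs(50))   # 2,450 pairs for a 50-node contiguous group
print(ordered_pairs(80))   # 6,320 pairs for an 80-node contiguous group
```

Preselecting contiguous groups thus cuts the coordinate-pair workload by roughly two orders of magnitude before the computer ever runs.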
To process these for computer routing would require obtaining nearly 500,000 pairs of geographical coordinates, their keypunching, and their verification. By human selection from a map, reasonable sets of contiguous nodes were fed into the computer; the pairs of geographical coordinates were thus reduced to the not unmanageable number of 2,500 to 6,400 pairs. Further, once computer routes have been generated, human intervention is required to adjust these to road and traffic patterns that the computer cannot know. This does not imply that the multitude of calculations that need be performed in a study such as this could ever have been attempted without the computer.

[Figure omitted.] Fig. 11. A General Methodology for the Simulation-Optimization.

CONCLUSION

Despite its imperfections, the project discussed here has convinced us that the approach and methodology are of value to the library community, not only in application to library delivery systems but also in application to a multitude of library service problems, particularly those involving several libraries or library systems, although, because of changes in top administrative positions within the key library systems, the results of this study are still awaiting implementation.

REFERENCES

1. Library of Congress Information Bulletin 31:A72 (June 9, 1972).
2. A related study relatively limited in scope is J. C. Hsiao and F. J.
Heinritz, "Optimum Distribution of Centrally Processed Material: Multiple Routing Solutions Utilizing the Lock-Set Method of Sequential Programming," Library Resources & Technical Services 13:537-44 (Fall 1969).
3. Full documentation of the project is available in the following: An Operations Research Study and Design of an Optimal Distribution Network for Selected Public, Academic, and Special Libraries in Greater Cleveland: Technical Report (Cleveland, Ohio: The Task Force, LSCA Title III Distribution Project, 1972); Systems Description I (Cleveland, Ohio: The Task Force, LSCA Title III Distribution Project, 1972). These are available on loan through the State Library of Ohio.
4. A. Reisman, G. Kaminski, S. Srinivasan, J. Herling, and M. G. Fancher, "Timeliness of Library Material Delivery: A Set of Priorities," Socio-Economic Planning Sciences 6:145-52 (1972).

A Computer-Accessed Microfiche Library

R. G. J. ZIMMERMANN: Department of Engineering-Economic Systems, Stanford University, Stanford, California. At the time this article was written, the author was a member of the Technical Staff, Space Photography Laboratory, California Institute of Technology, Pasadena, California.

This paper describes a user-interactive system for the selection and display of pictorial information stored on microfiche cards in a computer-controlled viewer. The system is designed to provide rapid access to photographic and graphical data. It is intended to provide a library of photographs of planetary bodies and is currently being used to store selected Martian and lunar photography.

INTRODUCTION

Information is often most usefully stored in pictorial form. Photography, for example, has become an important means of recording data, especially in the sciences. A major reason for this importance is that photographs can be used to record information collected by instruments and not normally observable by the unaided eye.
Such photographs, especially in large quantities, may present a barrier to their use because of the inconvenience of reproducing and handling them. It is apparent that a system to compactly store and to speed access to these photographs would be very useful. Such a system, utilizing a microfiche viewer directly controlled by a user-interactive computer program, has been developed to support a library of photographs taken from space.

In the past fifteen years, the National Aeronautics and Space Administration has conducted many missions to photograph planetary bodies. These missions have provided millions of pictures of the earth, moon, and Mars. A large number of additional pictures are expected to be taken in the near future. The Space Photography Laboratory of the California Institute of Technology is establishing, under NASA auspices, a microfiche library of a selection of these photographs. The library currently contains the photographs of Mars taken by the Mariner 9 spacecraft as well as lunar photographs taken by the Lunar Orbiter series. The library is expected to be expanded as time and resources permit. It has been operating, with various versions of the control program, since June 1972. The program is currently being further developed by Mr. David Neff and Miss Laura Horner of the Space Photography Laboratory at the California Institute of Technology.

Microfiche Library/ZIMMERMANN 291

HARDWARE

The photographs are kept on 105-by-148mm microfiche cards, sixty frames to a card. This format provides the least reduction of any standard microfiche format and was used to retain the highest possible resolution. The cards are displayed by a microfiche viewer (Image Systems, Culver City, California) which can store up to about 700 cards and has the capability of selecting a card and displaying any frame on it within a maximum of about four seconds. (Throughout this paper, "viewer" will be used to refer to the microfiche viewing device.
) The viewer can be equipped with a computer interface which allows the picture display to be directly computer controlled. An installation consists of the viewer with interface, any standard input/output (I/O) terminal, and the control program, running, in this case, on a time-shared computer. The terminal is used for communication with the control program. The user enters all commands by typing on the terminal keyboard. The viewer is designed to be plugged in between the computer and I/O terminal. The computer transmits all information on the circuit to which normally (without the viewer) only the terminal is attached. This information includes the viewer picture display control codes, which are recognized and intercepted by the viewer. All other information is passed on to the terminal. No further special equipment is necessary.

The system described has been implemented on a Digital Equipment Corporation System 10 medium-scale computer with a time-sharing operating system. The program is written mainly in FORTRAN with some assembly language subroutines. It runs in 12K words (36 bits/word) of core memory. The program will not run without conversion on any computer other than the DEC System 10.

SOFTWARE

The control program is user-interactive, that is, it accepts information and commands from the user. These commands allow him to indicate what he desires and to control the action taken by the program. The program permits the user to indicate what characteristics he wishes the pictures to have, selects the pictures that satisfy his criteria, and then allows him to control the display of the selected pictures and to obtain any additional information he may need to interpret the pictures. To guide the user, instructions for use of the system, as well as other information the user may need, are displayed on the viewer as they are required. All user responses are extensively checked for validity.
Any uninterpretable response is rejected with a message indicating the source of the trouble, and may be reentered in corrected form. It is always possible to return to a previous state, so it is impossible to make a "catastrophic" error. In designing the system, particular attention was paid to integrating the viewer and computer to utilize the unique capabilities of each. For example, most instructions are presented on the viewer where they can be shown quickly and can be scanned easily by the user. Only short messages need to be sent and received by the I/O terminal.

Data Base

A picture is described by a number of characteristics, called parameters. For every picture stored in the viewer, the value for each of these parameters is stored in a disc file. In this application, parameters are mainly used to describe characteristics that are available without analyzing the picture for content. In science, these are the experimental conditions, such as viewing and lighting conditions for space photography.

Because space photographs are taken by missions with different objectives and equipment, it was necessary to design a library system to include pictures with widely varying selection characteristics. In order to accommodate sets of pictures with widely differing characteristics, without wasting storage space or requiring the elimination of useful descriptors, the computer storage has been structured to allow pictures to be grouped into picture sets, each of which is described by its own set of parameters. Conversely, any group of pictures for which the same selection parameters are used forms a picture set. The characteristics of each such set of pictures are also stored and the program reconfigures itself to these characteristics whenever a new picture set is encountered. Such an organization allows the control program to be used on groups of totally different kinds of pictures.
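The picture-set idea described above can be sketched in a few lines. The layout is a minimal sketch, not the program's actual FORTRAN data structures; the file names MMIX and ORBIT and their parameter names are taken from the sample session in Appendix 1, while the pictures shown are hypothetical.

```python
# Each picture set carries its own list of selection parameters; the control
# program "reconfigures itself" by reading whichever set the user selects.
mariner9 = {
    "name": "MMIX",
    "parameters": ["ORBIT", "CAMERA", "LATITUDE"],  # this set's descriptors
    "pictures": [
        {"ORBIT": 222, "CAMERA": "A", "LATITUDE": 36.70},   # made-up records
        {"ORBIT": 222, "CAMERA": "B", "LATITUDE": -12.40},
    ],
}
orbit_file = {
    "name": "ORBIT",
    "parameters": ["LATITUDE", "LONGITUDE", "RESOLUTION"],  # different descriptors
    "pictures": [{"LATITUDE": 24.48, "LONGITUDE": -47.27, "RESOLUTION": 2.90}],
}

for pic_set in (mariner9, orbit_file):
    print(pic_set["name"], pic_set["parameters"])
```

No storage is wasted on descriptors a set does not use, which is the design motive the text gives for grouping pictures this way.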
Operation

In selecting a picture set the user is guided along a series of decisions presented on the viewer. At each step the control program directs the viewer to display a frame with a set of possible choices. The user enters his response on the I/O terminal and the control program uses this response to determine which frame the viewer should be commanded to display next. When the user has selected a set, he is shown the available parameters and appropriate values for these parameters. After he has specified acceptable values for the parameters he is interested in, the computer program compares these values with the known values in its records for the picture set. The pictures selected by the program are then available for display. As will be described, the user may, at any time, select another picture set or change his parameter specifications. He may also indicate which pictures of those selected by the computer during the comparison search he wishes to have remain available after the next comparison search. This allows comparison of pictures in different picture sets. Appendix 1 shows an example of a typical search.

The action of the control program can be separated into five phases of operation, each with a distinct function. The functions of three of these phases involve user interaction. Transfer between phases may also be accomplished by user command. A different group of commands is employed for each of the user-interactive phases. In addition, there is a group of commands which may be used any time a user response is requested; they are listed in Appendixes 3 and 4. There are no required commands or sequences of commands. The user proceeds from one phase to another as he desires. In each phase allowing user interaction, the user can enter any valid command at any time. Figure 1 shows the phases and possible transfers between phases.
A more detailed description of what occurs in each phase will be given after the data organization is described.

Fig. 1. Phases and Control Transfers: Picture Set Selection, Parameter Specification, Search Optimization, Comparison Search, Picture Display and Information Access. Bold lines enclose user-interactive phases. Arrows indicate possible directions of control transfer; bold arrows are control transfers made by user commands.

DESCRIPTION OF SOFTWARE

Data Base Organization

As has been stated, the pictures of the library are grouped into picture sets. The data base may contain any number of picture sets. Each such set has a picture file associated with it. This picture file is on disc storage and contains all the known information stored for a set of pictures. Each picture in the set has an associated picture record in the file. In addition, the first record in a picture file, known as the format record, contains all the file-specific information about that file. Whenever a new picture file is called for, the format record for that file is read from disc storage into main memory and kept for reference. Figure 2 shows the organizational structure of the data base.

Fig. 2. Picture File Organization: picture files (as many as required), each consisting of a format record followed by its picture records.

Picture records consist of a fixed- and a variable-length portion. The variable-length portion contains the known values, for the associated picture, of the specification parameters. Since the number of parameters can vary from file to file, the length of this portion varies from file to file. (However, all picture records within a particular file have the same length and form.) The maximum number of parameters for a system is determined by array dimensions set when the program is compiled. Currently these dimensions are set for a maximum of fifty parameters for any file in the system.
The fixed-length portion contains (generally) the same type of information for all files. It includes the information needed to display a picture and to obtain interpretive information. When, during the comparison search, a picture is selected on the basis of information in the variable data, the fixed-length portion is copied into a table and kept for use during the picture display phase. Each selected picture is represented by an entry in this table. The contents of the fixed-length portion are presented in Table 1. As an example, the contents of a picture record for the Mariner 9 photographs are given in Appendix 5.

A picture file's format record describes the file by all characteristics that are allowed to vary from file to file. The format records for all picture files have the same form; each is divided into a number of fields supplying information for a particular function. These fields can be separated into two categories: those which describe the picture records and those which apply to the file as a whole. For fields of the first type, each parameter has an entry in the field. For example, one such field contains the location, in a picture record, of the value for each of the parameters. Another field has a ten-letter description of each parameter. See Appendix 2 for a description of the format field.

Table 1. The Fixed-Length Portion of a Picture Record

Fiche Code: Control code output by the control program to the viewer to display the frame associated with this picture record.
File Name: The file name of the picture file; this and the picture number uniquely identify the picture record and allow it, and specifically the contents of the variable portion, to be refound.
Picture Number: A sequence number assigned each picture record in the file in increasing order.
Unit Number: The viewer that the picture associated with this picture record is stored in.
ID Number: The identification number referred to by the user. If the picture has been given an ID number by which it is commonly known, it will be kept in this field.
Auxiliary Codes (3 fields): Viewer control codes for frames containing different versions of, or auxiliary data for, the picture. The actual contents of these fields vary with the picture file as determined from the contents of the format record of that file.

Operation of the Control Program

The following is a brief technical description of the control program; detailed documentation is available. The control program is modularly constructed. Each phase consists of a major subroutine and its subsidiary subroutines. At the completion of a phase, control is transferred to a main program which determines which phase is to be performed next and transfers control to it. The user-interactive (interrogation) subroutines ask for a user response, attempt to interpret the response and perform the desired function, then ask for another response.

An important subroutine used by all the interrogation subroutines collects the characters of the user response into groups of similar characters to form alphabetic keywords, numbers, punctuation marks, relational operators, etc. When an interrogation subroutine is ready for a user request, it calls this "scanning" subroutine. The scanning subroutine outputs an asterisk, indicating it is ready, to the user I/O terminal. The scanning subroutine supplies the groups of characters, along with a description of the group, to the interrogation subroutine. The interrogation subroutine then attempts to interpret the character groups by comparing them with acceptable responses. If the response is not in one of the acceptable forms, an error message is given to the user and he can try again. The error message includes an indication of where the error was found and describes the error.
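The fixed- plus variable-length record layout of Table 1 can be sketched as follows. This is a hypothetical illustration, not the actual file layout; the field values are loosely modeled on the Mariner 9 example in Appendix 1, and the fiche and auxiliary codes are invented.

```python
# Fixed-length portion: the same six kinds of fields for every file (Table 1).
fixed_portion = {
    "fiche_code": 4721,           # viewer control code for this frame (made up)
    "file_name": "MMIX",          # with picture_number, uniquely identifies the record
    "picture_number": 2,          # sequence number within the file
    "unit_number": 0,             # which viewer holds the card
    "id_number": 9557769,         # ID the user refers to (DAS time for Mariner 9)
    "auxiliary_codes": (0, 0, 0), # frames holding other versions / auxiliary data
}
# Variable-length portion: one value per selection parameter, so its length
# differs from file to file but is constant within a file.
variable_portion = {"ORBIT": 222, "CAMERA": "A", "LATITUDE": 36.70}

record = {**fixed_portion, **variable_portion}
print(record["file_name"], record["picture_number"])  # MMIX 2
```

During the comparison search only the fixed portion of a selected record would be copied into the picture table, which is why it is kept to a compact, uniform shape.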
Some commands do not need to be interpreted by the interrogation subroutines; the function they request is the same throughout the program. These are called immediate commands and are listed in Appendix 3. These commands are interpreted, and their functions performed, by the scanning subroutine.

Picture Set Selection

In selecting a picture set the user is asked to make a series of decisions. For each decision, a frame listing the possible choices is displayed on the viewer. All possible decisions form an inverted tree structure (see Figure 3). The user may also return to a previous decision point. The tree structure is implemented in a table in computer storage. There is an entry in this table corresponding to each decision point in the tree. When a decision is made, the entry corresponding to the new decision point is obtained. An entry at the bottom of the tree identifies the picture file associated with the picture set selected. In general, an entry contains: (1) the viewer control code of the frame displaying the choices; (2) a pointer to the entry from which this node was reached; (3) the number of possible decisions which can be made at this decision point (to check for valid decisions); and (4) pointers to the entries for the decision points reached.

Fig. 3. Example of a Tree:

A Martian
   AA Orbital
      AAA Mariner IV (flyby)
      AAB Mariner VI, VII (flyby)
      AAC Mariner IX
   AB Surface (Viking)
B Lunar
   BA Orbital/Approach
      BAA Apollo Hand Held
      BAB Apollo Metric
      BAC Apollo Pan
      BAD Lunar Orbiter
      BAE Ranger
   BB Surface
      BBA Apollo
      BBB Surveyor
C Venus (flyby)
D Mercury (flyby)

Parameter Specification

Once the user has made a decision selecting a set of pictures, he is presented with a list of the available parameters and acceptable values for them.
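The decision-table entries described above (display frame code, parent pointer, decision count, child pointers, and a picture file at the leaves) can be sketched as follows. The node names echo Figure 3, but the frame codes and the leaf file names other than MMIX and ORBIT are hypothetical.

```python
# Each entry corresponds to one decision point; leaves name a picture file.
tree = {
    "ROOT": {"frame": 100, "parent": None,   "children": ["A", "B"]},
    "A":    {"frame": 101, "parent": "ROOT", "children": ["AA", "AB"]},
    "AA":   {"frame": 102, "parent": "A",    "children": [], "file": "MMIX"},
    "AB":   {"frame": 103, "parent": "A",    "children": [], "file": "VIKING"},
    "B":    {"frame": 104, "parent": "ROOT", "children": [], "file": "ORBIT"},
}

def decide(node, choice):
    """Validate a numeric choice against the entry's decision count, then descend."""
    entry = tree[node]
    if not 1 <= choice <= len(entry["children"]):  # check for a valid decision
        raise ValueError("no such choice")
    return entry["children"][choice - 1]

node = decide("ROOT", 1)       # Martian branch
node = decide(node, 1)         # orbital branch
print(tree[node].get("file"))  # MMIX -- the leaf identifies the picture file
```

The parent pointer is what lets the user "return to a previous decision point" with a single table lookup.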
For each parameter in which the user is interested, he specifies the parameter number and the values or range of values acceptable to him. This information is stored in two tables which are referred to when the comparison search is made. One table, the parameter table, contains an entry for each parameter specified. This table is cleared whenever a new picture set is called for. An entry in the table includes: (1) the parameter number; (2) a code indicating which of several methods is to be used in processing the parameter; (3) a code providing information on how the user-specified values are to be interpreted; and (4) a pointer to the location in a second table, the values table, where the first of the specified values is stored. All additional values are placed in the values table following the addressed value. The processing code (number (2) above) allows each parameter to be processed by a unique method. A standard method for a given parameter is kept in a field of the format record. The user can also specify a method other than the standard one. If an entry already exists for a just-entered parameter, the old entry is updated rather than a new one created.

Search Optimization

This phase determines the most efficient way to conduct the comparison search from among a set of alternatives. Whenever possible, the search is restricted to only a part of the picture file. For each picture file there is a number of parameters for which additional information is available. Specifically, if a list of pictures ordered by increasing value of a parameter is available, the pictures which have a particular value of that parameter can be found more quickly through this list than by searching through the whole file for that value of the parameter.
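The two specification tables described above can be sketched as follows. The layout and the method/interpretation codes are assumed for illustration; the sample specifications mirror the ORBIT 222 and LATITUDE -45 TO 45 entries of Appendix 1.

```python
# A flat pool of user-specified values, plus one entry per specified parameter.
values_table = []
param_table = {}

def specify(param_no, values, method="standard", interp="range"):
    """Record a user specification: values go into the pool, the entry points at them."""
    ptr = len(values_table)
    values_table.extend(values)        # additional values follow the addressed one
    param_table[param_no] = {
        "method": method,              # code selecting the processing method
        "interp": interp,              # how the specified values are interpreted
        "ptr": ptr,                    # location of the first value in values_table
        "count": len(values),
    }

specify(1, [222], interp="exact")      # ORBIT 222
specify(3, [-45.0, 45.0])              # LATITUDE -45 TO 45
print(param_table[3]["ptr"], values_table[1:3])   # 1 [-45.0, 45.0]
```

Re-specifying a parameter would overwrite its `param_table` entry, matching the text's rule that an existing entry is updated rather than duplicated (the value-pool compaction that would require is omitted here).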
If the position, in this ordered list, of the picture at the low end of a range of values (of the parameter it is ordered on) can be found easily, the search can be started at this point and need only be continued until the picture at the high end has been reached. Note that the picture records for the intervening pictures must nonetheless be compared with the user specifications, since the restriction is only made on the basis of one parameter whereas more than one may have been specified.

A binary search is the method used to search the list for the first picture in a range of values. To use this method, of a set of n picture records the n/2th is chosen and its value of the parameter is compared with the desired one. Since the list of records is in order of the value of this parameter, it is clear in which half of the list a picture with the desired value of the parameter would have to be. This interval can then be divided and the process continued until the remaining interval consists of only one picture.

The main picture file is itself usually arranged in order of at least one parameter. For other parameters, control lists of picture numbers ordered by value of these parameters can be used for binary searches. However, it is not practical to create these lists for all parameters, as they require a fair amount of storage. An entry in such a list contains two words, the value of the parameter and the picture number of the corresponding picture. Picture number is a sequence number which determines the position of the picture record relative to the beginning of the picture file. Each picture file has a table in its format record containing identifiers for the parameters for which the binary search technique can be used. If more than one of these has been specified (as stored in the parameter table), it must be determined which parameter restricts the search the most.
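The interval-halving search just described can be sketched with a control list of (value, picture number) pairs, as the text defines them; the data are made up, and Python's `bisect_left` stands in for the hand-coded halving loop.

```python
from bisect import bisect_left

# Control list ordered by parameter value, each entry two "words":
# (parameter value, picture number).
control_list = [(10, 5), (20, 2), (20, 7), (35, 1), (50, 4)]

def first_in_range(control_list, low):
    """Binary-search for the first entry whose value is >= the low end of a range."""
    values = [value for value, _ in control_list]
    return bisect_left(values, low)    # halves the interval until one spot remains

start = first_in_range(control_list, 20)
print(start)   # 1: the scan can begin here and stop at the range's high end
```

As the text notes, the records between the low and high ends must still be compared against all the user's specifications; the binary search only bounds which records are visited.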
To do this the upper and lower limits of the specified values of each such parameter are found (from the values table), and from this the expected number of picture records to be compared is computed. This number is multiplied by a factor indicating the speed of the type of search to be used relative to the speed of the simplest type of search. The parameter with the lowest expected elapsed time of search is selected for the search.

Comparison Search

For each picture to be compared, the appropriate picture record is found and specified parameter values are compared with those in the picture record. A control list, selected in the search optimization phase, may be used to determine which picture records are to be compared. For each selected picture an entry containing a portion of the picture record is made in a picture table. The picture table has a limited capacity which is set when the program is compiled. For our application there is currently room for up to 100 entries. If the picture table is filled before the search is finished, the search is suspended and can be continued by a command in the display phase.

Picture Display, Information Access

This phase accepts commands to control display of the selected pictures and provide access to interpretive information. The picture table entries provide the information needed, either directly or by referring back to the picture record. Any of the selected pictures can be viewed at any time. In addition, the user can "mark" preferred pictures to differentiate them from the others. These marked pictures are set apart in the sense that many viewing and information access commands refer optionally to only these pictures. The pictures themselves are the primary source of information, but the user will often want information that is not available from the picture in order to interpret the picture.
There are commands that request the control program to type out on the I/O terminal the information in a picture record. These commands optionally refer to the picture currently displayed, the marked pictures, or all the selected pictures. Other commands call for the display of data frames associated with a picture. These frames can contain large volumes of data that need not be kept in computer storage. The viewer control codes for these frames are kept in the picture table. The keyword commands to display data frames can vary from file to file. The valid commands for a file are kept in the file's format record. There are other commands to transfer control to other phases and to keep desired pictures available for display with those selected by the next comparison search. There is also a provision for adding file-specific commands to perform any other function. The commands and their functions are listed in Appendix 3.

PERFORMANCE AND COSTS

A typical simple search consisting of logging in, picture selection, parameter specification, search, and display might take five to ten minutes and cost one to two dollars for compute time. Most of this is time spent by the user in entering commands. Command execution is usually almost immediate as it does not involve a major amount of computation. Most of the compute time is accumulated during the comparison search phase. To search through the entire Mariner 9 picture file of around 7,000 pictures (about 200,000 words) takes about forty seconds elapsed time and costs about two dollars. A more typical search, however, will allow some search optimization and cost about thirty cents with an elapsed time of ten seconds. Of course, these figures should only be used as estimates, even for other DEC System 10 systems, as elapsed time depends on system load and this, as well as the rates charged, varies considerably. Total monthly compute costs for a system depend entirely on use.
Likewise, storage costs depend on actual storage space used. For the 200,000-word Mariner 9 file our cost is about seventy-five dollars per month. Only the most-used picture files actually need be kept on disc; the rest can be copied from magnetic tape if they are needed. All files are backed up on magnetic tape in any case. The rates listed in this paper are those charged by our campus time-sharing system. DEC System 10 computer time is available from commercial firms at somewhat higher rates.

The cost for a microfiche viewer with computer interface (Image Systems, Culver City, California, Model 201) is around $7,000. A thirty-characters-per-second I/O terminal sells for $1,500 and leases for $90 per month. In addition, an installation may require a microfiche camera and other photographic equipment and supplies. Photographic services are also available from the viewer manufacturer. The hardware cost for an independent system implemented on a minicomputer with 12K to 20K of core and five million words of disc memory is estimated at an additional $30,000 (exclusive of development and photographic costs).

IMPLEMENTING A LIBRARY SYSTEM

In implementing a library system to use the hardware and software described in this paper, two major areas of effort are required. First, the pictorial information must be converted to microfiche format; that is, it must be photographed, or possibly rephotographed if already in photographic form. In addition, a computer data base must be created. If information about the photographs is already available in computer-readable form, this involves writing a program to convert the data to the structure required by the control program. If this type of information is not available, the pictures may need to be investigated and the information coded, and presumably punched onto computer cards, for further processing.
The major difficulties we encountered were coordinating the photographic and data base generation tasks, achieving the high resolution we required to retain the detail of the original photographs, and using early versions of the microfiche viewer (which had a tendency to jam cards).

CONCLUSION

A system for rapid access to pictorial information, the Computer Accessed Microfiche Library (CAML), has been described. CAML has been designed to integrate, in an easy-to-use system, the storage capacity and capability for fast retrieval of a special microfiche viewer with the manipulating ability and speed of a computer. It is believed that this system will help overcome the barriers to the full utilization of photographs in large quantities, as well as have applications in the retrieval of other types of pictorial information.

ACKNOWLEDGMENTS

The work described in this paper was supported by NASA grant #NGR 05-002-117. The author is grateful to Dr. Bruce Murray and the staff of the Space Photography Laboratory at Caltech for their support and advice; he also wishes to acknowledge the efforts of Mr. James Fuhrman, who assisted in the programming task and contributed many valuable ideas.

APPENDIX 1

The following is an example of a typical search. Numbers in the left margin indicate when a new frame is displayed on the viewer. These were added later to clarify the interaction between viewer and terminal. User responses and commands are identified by lines beginning with an asterisk. (The control program types asterisks when it is ready for input.) In this demonstration, most keywords were completely typed out. It is possible, however, to abbreviate any keyword to the shortest form that will be unique among the acceptable keywords. After the user enters a standard "log in" procedure to identify his account number and verify that he is an authorized user of this account, the control program is automatically initiated.
The viewer displays a picture (1) of the installation and the user is asked to enter his name. The name, charges, and time of use will later be added to an accounting file. The user now enters the picture set selection phase. In the current system, only two files (picture sets) are stored and the user is simply presented with a frame (2) listing the file names and giving a short description of what is contained in each. The user types the desired file name (MMIX, the Mariner 9 Mars photographs) and thus enters the parameter specification phase. The available selection parameters and acceptable values are now shown (3). The user specifies some parameters.

[Sample terminal session, heavily damaged in this copy: after logging in, the user selects file MMIX; specifies ORBIT 222, CAMERA A, and LATITUDE -45 TO 45; types DONE to run the search (2 pictures selected from 7,382 processed); then displays the pictures, MARKs one, and TYPEs its parameter values.]
The user specifies some parameters.

      *EXAMINE
      1 MARKED PICTURES HAVE BEEN SELECTED
 6    THIS IS THE FIRST MARKED PICTURE
      THIS IS THE LAST MARKED PICTURE
      THE FOLLOWING PICTURES ARE FROM FILE MMIX
      2
      *RESPECIFY
      %WARNING--ORIGINAL SEARCH PARAMETERS ARE STILL IN EFFECT
 7    PLEASE TYPE IN PARAMETERS AND THEIR VALUES
      TYPE "DONE" WHEN YOU HAVE FINISHED
      *RESTART
 8    ENTER NAME OF FILE DESIRED
      *ORBIT
 9    PLEASE TYPE IN PARAMETERS AND THEIR VALUES
      TYPE "DONE" WHEN YOU HAVE FINISHED
      *CHARGES
      $0.53
10    *HELP
11    *IDEN > 5196
      *THIS IS AN ERROR
      ++ERROR++: NO SUCH KEYWORD--PLEASE RETYPE LINE
      *DONE
      1022 PICTURES TO PROCESS, PLEASE WAIT
      22 PICTURES HAVE BEEN SELECTED
12    THIS IS THE FIRST PICTURE
      THE FOLLOWING PICTURES ARE FROM FILE ORBIT
      1
      PLEASE ENTER COMMANDS
      *TYPE PARAMETERS SPECIFIED
      PARAMETER KEY FOR FILE ORBIT
      IDEN
      # 1, FILE ORBIT, ID = 5196
      PARAMETER VALUES:
      5196
13    *TYPE PARAMETERS LATITUDE, LONGITUDE, RESOLUTION
      PARAMETER KEY FOR FILE ORBIT
      LATITUDE   LONGITUDE   RESOLUTION
      # 6, FILE ORBIT, ID = 5201
      PARAMETER VALUES:
      24.48   -47.27   2.90
      PLEASE TURN OFF VIEWER, TERMINAL, AND COUPLER
      JOB 13 [98394,MMM] LOGGED OFF TTY77 1948 27-AUG-74

If not used, file name is assumed to refer to the file last searched. If the parameters are not enumerated, those specified for the picture selection are typed out. The parameters to be typed out can be enumerated or the specification parameters called for. If neither of these is done, the values of all parameters are typed out. Parameters typed out are identified by column headings.

Phase Transfer Commands

Commands     Function
RESPECIFY    Allows respecification of selection parameters; only those parameters which are reentered are changed; previously specified parameters retain their values.
SEARCH       Similar to RESPECIFY, except only those pictures in the present list are candidates for selection.
This is more efficient than again searching through all the pictures.

CONTINUE     If the search was terminated before all pictures had been processed, the search is continued from where it had been suspended.
RESTART      To view another set of pictures (all specified parameter values are deleted).

APPENDIX 5

Mariner 9 Picture Records

Field Number   Field
               Fixed-Length Portion
1              Fiche Code
2              Data Code
3              File Name
4              ID Number (DAS)
5              Unit #
6              Picture Number
7              Footprint Code
8              Unused
               Variable Portion
9              DAS Time
10             Orbit
11             Latitude
12             Longitude
13             Solar Lighting Angle
14             Phase Angle
15             Viewing Angle
16             Slant Range
17             Camera
18             Resolution
19             Local Time
20             Filter
21             Exposure Time
22             Roll and File of Filter Version on Roll Film
23-28          Comments (Content Descriptors)

The Binary Vector as the Basis of an Inverted Index File

Donald R. KING: Rutgers University, New Brunswick, New Jersey.

The inverted index file is a frequently used file structure for the storage of indexing information in a document retrieval system. This paper describes a novel method for the computer storage of such an index. The method not only offers the possibility of reducing storage requirements for an index but also affords more rapid processing of query statements expressed in Boolean logic.

INTRODUCTION

The inverted index file is a frequently used file structure for the storage of indexing information in document retrieval systems. An inverted index file may be used by itself or with a direct file in a so-called combined file system. The inverted index file contains a logical record for each of the subject headings or index terms which may be used to describe documents in the system. Within each logical record there is a list of pointers to those documents which have been indexed by the subject heading in question. The individual pointers are usually in the form of document numbers stored in fixed-length digital form. Obviously, the length of the lists will vary from record to record.
The purpose of this paper is the presentation of a new technique for the storage of the lists of pointers to documents. It will be shown that this technique not only reduces storage requirements, but that in many cases the time required to search the index is reduced. The technique is useful in systems which use Boolean searches. The relative merits of Boolean and weighted term searches are beyond the scope of this paper, as are the relative merits of the various possible file structures.

THE BINARY VECTOR AS A STORAGE DEVICE

The exact form of each document pointer is immaterial to the user of a document retrieval system as long as he is able to obtain the document he desires. The standard form for these pointers in most automated systems is a document number. Note that each pointer is by itself a piece of information. However, if one thinks of a "peek-a-boo" system, the document pointer becomes simply a hole punched in a card. In this case the position of the pointer, not the pointer itself, conveys the information. The new technique presented in this paper is an extension of the "peek-a-boo" concept. A vector or string of binary zeroes is constructed equal in length to the number of documents expected in the system. The position of each vector element corresponds to a document number. That is, the first position in a vector corresponds to document number one and the tenth vector position corresponds to document number ten. A vector is constructed for each subject heading in the system. As a document enters the system, ones are inserted in place of the zeroes in the positions corresponding to the new document number in the vectors for the subject headings used to describe the document.
As an example, assume the following document descriptions are presented to a system using binary vectors:

Document Number   Subject Headings
1                 A, B, D
2                 C, E
3                 A, C

The binary vectors for terms A, B, C, D, and E before the insertion of the indexing data would be as follows:

Subject Heading   Vector
A                 000 ... 0
B                 000 ... 0
C                 000 ... 0
D                 000 ... 0
E                 000 ... 0

After the insertion of the indexing information, the same vectors would appear as follows:

Subject Heading   Vector
A                 101 ... 0
B                 100 ... 0
C                 011 ... 0
D                 100 ... 0
E                 010 ... 0

The binary vector seems to have several advantages over the standard form of storage of document numbers in an inverted file. First, the records are of fixed length since the vectors are all equal in length to the expected number of documents in the system. Space may be left at the end of each vector for the addition of new documents. Periodic copying of the file may be used to expand the index records with additional zeroes added at the end of each record during the process. Consequently, unless there are limitations of size imposed by the equipment, only one access to the storage device will be needed to retrieve the index record for a term. The second advantage offered by the binary vector method appears in the search process. Most modern computers have a built-in capability of performing Boolean logical manipulations on binary digit vectors or strings. Thus, when Boolean operations are specified as part of a query, the implementation of the operations within the computer is considerably easier and faster for binary vectors than for the standard form of inverted files. Other investigators of the use of binary digit patterns or vectors have not fully explored its advantages and disadvantages.
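The vector-building step just described is easy to sketch in modern code. The following fragment is illustrative only (the systems in this paper were implemented in PL/1 on an IBM 360; the function and variable names here are invented), using Python lists of bits in place of machine words:

```python
def build_vectors(doc_descriptions, num_docs):
    """Build one bit vector per subject heading.

    doc_descriptions maps a 1-based document number to its list of
    subject headings; num_docs is the expected number of documents.
    """
    vectors = {}
    for doc_num, headings in doc_descriptions.items():
        for heading in headings:
            # each heading starts as a vector of zeroes
            vec = vectors.setdefault(heading, [0] * num_docs)
            # position i corresponds to document number i + 1
            vec[doc_num - 1] = 1
    return vectors

# The example from the text: three documents described by headings A-E.
docs = {1: ["A", "B", "D"], 2: ["C", "E"], 3: ["A", "C"]}
vectors = build_vectors(docs, 8)
# vectors["A"] begins 1 0 1, vectors["C"] begins 0 1 1, as in the table above
```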
Bloom suggests, without an explanation or evaluation, the use of bit patterns as the storage technique for inverted files in large data bases in the area of management information systems.1 Davis and Lin, again in the area of management information systems, propose bit patterns as the means of locating pertinent records in a master file.2 They do not compare the method with other possible techniques. Sammon discusses briefly the use of binary vectors as a storage technique, but dismisses it on the basis that the two-valued approach obviates the possible assignment of weights to index terms in describing documents.3 Gorokhov discusses the use of a modified binary vector approach in a document retrieval system implemented on a small Soviet computer.4 Faced with the need to minimize storage requirements for his inverted file, Gorokhov concentrated on developing a technique for locating and removing strings of zeroes occurring in the binary vectors used within the system. Since these zeroes represent the absence of information they could be removed if there were a way to indicate the position in the original vector of the ones that remained. He proposed the removal of strings of zeroes and the inclusion of numeric place values with the remaining vector elements. His result is a file with variable-length index records. Abandoning the pure binary vector complicates the search process, however, and Gorokhov found it necessary to expand the vector elements into the original vector before logical operations could be applied. Even though he does not state so explicitly, Gorokhov seems to have found his method more efficient than the standard inverted file. Gorokhov's suggestion has led to the development of an algorithm for the compression of binary vectors. Heaps and Thiel have also discussed the use of compressed binary vectors as the basis of an inverted index file.5,6 Aside from a brief description of the method for implementing the concept, they offer no comparison of the binary vector with the standard inverted file.

STORAGE REQUIREMENTS

An immediate reaction to the concept of binary vectors is to state that they will obviously take more storage space than the standard inverted file. A closer study shows that this is not always the case. The storage requirements for the two types of files may be calculated as follows:

1. MBV = (D · N) / 8 bytes     (binary vector file)
2. MSI = D · I · K bytes       (standard inverted file)

where:
M = storage requirements in bytes
D = number of documents in the system
N = number of index terms in the system
I = average depth of indexing in the system
K = size in bytes of a document number stored in the file

Using equations 1 and 2 we find that the storage requirements for the binary vector file are, in fact, less than the requirements for the standard inverted file if N < 8 · I · K. It is well known that the distribution of the use of index terms follows a logarithmic curve. In simple terms, one might say that a few terms are used very frequently and many terms are used infrequently. This condition implies that in a binary vector file the records for many terms will contain segments in which there are no "ones" in any byte. A method for removing these "zero" bytes is called compression.

COMPRESSION ALGORITHM

The technique for the compression of binary vectors as described here is designed specifically for the IBM 360 family of computers and similar machines. The extension to other machines should be obvious. Within the IBM 360 the byte, which contains eight binary digits, is the basic storage unit, and with the eight binary digits it is possible to store a maximum integer value of 255.
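The two storage formulas make the tradeoff concrete. As a quick numerical check (illustrative code only; the figures are those of the model systems described later in the paper):

```python
def m_bv(d, n):
    """Uncompressed binary vector file: one bit per document per term."""
    return d * n / 8

def m_si(postings, k):
    """Standard inverted file: one k-byte document number per posting."""
    return postings * k

# Model-system figures: 6,121 documents, 1,484 terms, 94,542 postings.
uncompressed_bv = m_bv(6121, 1484)   # 1,135,445.5 bytes
standard_two = m_si(94542, 2)        # 189,084 bytes
standard_three = m_si(94542, 3)      # 283,626 bytes
# Here N = 1,484 far exceeds 8 * I * K = 8 * 15.4 * 3 = 369.6, so the
# uncompressed binary vector file is larger; hence the interest in
# compressing away the zero bytes.
```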
For the purpose of describing a proposed compression algorithm for the binary vector in the IBM 360, the term subvector will be defined as a string of contiguous bytes chosen from within the binary vector. A zero subvector will be a subvector each of whose bytes contains eight binary zeroes. A nonzero subvector will be a subvector each of whose bytes contains at least one binary one. To compress a binary vector in the IBM 360 the following steps may be taken:
1. Divide the binary vector into a series of zero subvectors and nonzero subvectors. Subvectors of either type may have a maximum length of 255 bytes. For zero subvectors longer than 255 bytes, the 256th byte is to be treated as a nonzero byte, thus dividing the long zero subvector.
2. Each nonzero subvector is prefixed with two bytes. The first of the prefix bytes contains the count of zero bytes which precede the nonzero subvector in the uncompressed vector. The second prefix byte contains a count of the bytes in the nonzero subvector.
3. The compressed vector then consists of only the nonzero subvectors together with their prefix bytes.
4. A two-byte field of binary zeroes will end the compressed vector.

The compression of the vectors creates variable-length records and removes the advantage of having records which are directly amenable to Boolean manipulation. The effect of file compression on such manipulation in the search process is not as severe as it might appear. For the search process, the compressed vector may be expanded into its original form. The process of expansion of the binary vectors is relatively simple, and since only those index term records which are used in a query need to be expanded at search time, the search time is not significantly affected. As an example of the use of the compression algorithm consider the following binary vector:

01100000/10000000/ seven zero bytes /00000001/10000000/ ...
The slashes indicate the division of the vector into bytes. The vector might be read as indicating the following list of document numbers: 2, 3, 9, 80, and 81. In a standard inverted file with each document number assigned three bytes of storage, fifteen bytes would be required to store these numbers. The compressed vector which results from the application of the algorithm is the following:

00000000/00000010/01100000/10000000/00000111/00000010/00000001/10000000/ ...

Again the slashes separate the vector into bytes. For the purpose of the following discussion consider each byte in a vector to be numbered sequentially beginning with byte one at the left. In the uncompressed vector bytes one and two form a nonzero subvector. Consequently, the first four bytes in the compressed vector can be interpreted as follows:

Byte one. Binary zero, indicating that no zero bytes were removed preceding this subvector.
Byte two. Binary two, indicating that the following nonzero subvector is two bytes long.
Bytes three, four. Bytes one and two of the original vector.

Bytes three through nine of the original vector are a zero subvector, and bytes ten and eleven form a second nonzero subvector. Consequently, the second four bytes of the compressed vector are interpreted as follows:

Byte five. Binary seven, indicating that a zero subvector of seven bytes has been removed.
Byte six. Binary two, indicating that the following two bytes are a nonzero subvector.
Bytes seven, eight. Bytes ten and eleven of the original vector.

Thus the binary vector has been reduced from eleven bytes to eight bytes while the space required to record the document numbers in the standard inverted file remains fifteen bytes.
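The four-step algorithm and its inverse can be sketched compactly. This is an illustrative Python rendering, not the IBM 360 implementation the paper describes (the names and the byte-string representation are my own); it includes the 255-byte limits from step 1:

```python
def compress(vec: bytes) -> bytes:
    """Compress a vector as [zero count][run length][nonzero run]... 00 00."""
    out = bytearray()
    i, n = 0, len(vec)
    while i < n:
        zeros = 0
        while i < n and vec[i] == 0 and zeros < 255:  # step 1: zero run, max 255
            zeros += 1
            i += 1
        if i == n:
            break          # trailing zeros carry no postings and are dropped
        start = i
        i += 1             # the 256th zero byte is kept as if nonzero (step 1)
        while i < n and vec[i] != 0 and i - start < 255:
            i += 1
        out += bytes([zeros, i - start]) + vec[start:i]  # steps 2 and 3
    out += b"\x00\x00"     # step 4: two-byte terminator
    return bytes(out)

def expand(comp: bytes) -> bytes:
    """Rebuild the original vector (minus any trailing zero bytes)."""
    out = bytearray()
    i = 0
    while not (comp[i] == 0 and comp[i + 1] == 0):
        zeros, length = comp[i], comp[i + 1]
        out += b"\x00" * zeros + comp[i + 2:i + 2 + length]
        i += 2 + length
    return bytes(out)

# The worked example: bytes 1-2 nonzero, seven zero bytes, bytes 10-11 nonzero.
vec = bytes([0b01100000, 0b10000000]) + bytes(7) + bytes([0b00000001, 0b10000000])
comp = compress(vec)
# comp holds the eight data bytes shown in the text, plus the terminator,
# versus fifteen bytes for three-byte postings in the standard inverted file.
```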
MEMORY REQUIREMENTS FOR THE STANDARD INVERTED FILE AND THE BINARY VECTOR FILE

To compare memory requirements for the standard inverted file and the compressed binary vector file, we base our comparison on the total number of postings in the file. In the standard inverted file the storage space for the postings is equal to the number of postings times the length of a single posting, which is usually two, three, or five bytes. Memory requirements for the compressed binary vector file are more difficult to estimate because the distribution of document numbers within the record for each index term is not known. The fact that a single byte in the binary vector file may contain between zero and eight postings is extremely important. The worst possible case occurs if the postings in the binary vector are spaced in such a way that each nonzero byte contains only one posting, and these bytes are separated by zero bytes. Consider the following example:

... /00000000/00010000/00000000/00000100/ ...

In this case the compression algorithm will remove the zero bytes, but will add two bytes (the prefix bytes) for each nonzero byte. The resulting compressed vector will be essentially the same length as the standard inverted file record if each posting is three bytes long in the standard inverted file. It might seem that the distribution of one posting per byte for the entire vector represents an even worse situation. It is clear that the compression algorithm will, in this case, not reduce the size of the vector. However, it must be remembered that in the standard inverted file each posting will require at least two bytes and perhaps three bytes. Thus, the record in the standard inverted file is two or three times the length of the corresponding binary vector regardless of compression. In data used in two model retrieval systems prepared to compare the standard inverted file and the binary vector file there are 6,121 documents with a total of 94,542 postings.
An examination of the binary inverted file for the model systems discloses that there are only 55,311 nonzero bytes in the binary vector file. Thus there seems to be some form of clustering of the document numbers in each index term record. If each nonzero byte in this binary vector were isolated by zero bytes, two prefix bytes would be added for each byte. Thus the total memory requirements for the postings in the compressed file would be 165,933 bytes. Less storage space is required if some nonzero bytes are contiguous. On the other hand, the standard inverted file will require 189,084 bytes if a two-byte posting is used, or 283,626 bytes if a three-byte posting is used. Further study of the clustering phenomenon is needed.

MODEL RETRIEVAL SYSTEMS

To test some of the conjectures about the differences between the standard inverted file and the binary vector file, two model systems were prepared for operation on an IBM 360/67. Details of the systems and PL/1 program listings are available elsewhere.7 The data base used was obtained from the Institute of Animal Behavior at Rutgers University. In the data base 6,121 documents were indexed by 1,484 index terms. A total of 94,542 postings in the system gives an average depth of indexing of 15.4 terms per document. Both inverted files were stored on IBM 2314 disc storage devices. To ease the problem of handling variable-length records in both files, the logical records for each index term were divided into chains of fixed-length physical records. For the standard inverted file a physical record size of 331 bytes was chosen. The entire file required 702,713 bytes including record overhead. For the uncompressed binary vector file a physical record size of 1,286 bytes was chosen to include overhead and space for up to 10,216 document numbers.
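The byte totals above follow from simple arithmetic on the posting counts; a quick check of the paper's figures (illustrative only):

```python
postings = 94542        # total postings in the model data base
nonzero_bytes = 55311   # nonzero bytes observed in the binary vector file

# Worst case for compression: every nonzero byte isolated by zero bytes,
# so each one costs its own byte plus two prefix bytes.
worst_case_compressed = nonzero_bytes * 3   # 165,933 bytes
standard_two_byte = postings * 2            # 189,084 bytes
standard_three_byte = postings * 3          # 283,626 bytes
# Even the worst-case compressed file undercuts the standard inverted file.
```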
When the compression algorithm was applied, with a physical record length of 130 bytes, the memory requirements for the binary vector file were reduced to 281,450 bytes, or 41 percent of the space required to store the standard inverted file. A series of forty searches of varying complexities were run against both files. The "TIME" function of PL/1 made it possible to accumulate timing statistics which excluded input/output functions. Search times for the binary vector file include expansion of the compressed vectors, Boolean manipulation of the vectors, and conversion of the resultant vector into digital document numbers. The times for the standard inverted file are for the Boolean manipulation of the lists. The following points were noted in the analysis of the times:
1. In twenty-two of the forty queries for which comparative timings were obtained, the search of the binary vector file was faster, in one case by a factor of thirty-five. In the eighteen cases in which the search of the standard inverted file was faster, the search of the standard inverted file was at most 6.17 times faster.
2. The range of the total times for the binary vector file was .79 seconds to 9.72 seconds. The range for searching the standard inverted file was .15 seconds to 202.98 seconds. The fact that the search times for the binary vector file are within a fairly narrow range, in contrast to the wider range of times for searching the standard inverted file, has important implications for the design of an on-line interactive document retrieval system. In such a system it is important that the computer respond to users' requests not only rapidly but consistently. The narrower range of the search times provided by the binary vector file will assist in producing consistent times.
3. The search times for the binary vector file, exclusive of expansion and conversion times, are unaffected by the number of postings contained in the index terms used in a query. On the other hand, the number of postings in the records used from the standard inverted file appears to cause the differences in search times for that file.

To test the conjectures that
1. search times for the binary vector file are related to the number of index terms in the query, and
2. search times for the standard inverted file are related to the number of postings in the index terms in the query,
a correlation analysis was performed. The following correlation coefficients were obtained:

Variables                                                      r
Number of terms in query and search times
for the binary vector file.                                  .960
Number of postings in query terms and search
times for the standard inverted file.                        .979

The relationships indicated above are significant at the .001 level. No attempt was made to compute an average search time per term for the binary vector file or average search time per posting for the standard inverted file. Such times would have meaning only for the model systems.

SUMMARY

The binary vector is suggested as an alternative to the usual method of storing document pointers in an inverted index file. The binary vector file can provide savings in storage space, search times, and programming effort.

REFERENCES

1. Burton H. Bloom, "Some Techniques and Trade-Offs Affecting Large Data Base Retrieval Times," Proceedings of the ACM 24 (1969).
2. D. R. Davis and A. D. Lin, "Secondary Key Retrieval Using an IBM 7090-1310 System," Communications of the ACM 8:243-46 (April 1965).
3. John W. Sammon, Some Mathematics of Information Storage and Retrieval (Technical Report RADC-TR-68-178 [Rome, New York: Rome Air Development Center, 1968]).
4. S. A.
Gorokhov, "The 'Setka-3' Automated IRS on the 'Minsk-22' with the Use of the Socket Associative-Address Method of Organization of Information" (Paper presented at the All-Union Conference on Information Retrieval Systems and Automatic Processing of Scientific and Technical Information, Moscow, 1967. Translated and published as part of AD 697 687, National Technical Information Service).
5. H. S. Heaps and L. H. Thiel, "Optimum Procedures for Economic Information Retrieval," Information Storage & Retrieval 6:131-53 (1970).
6. L. H. Thiel and H. S. Heaps, "Program Design for Retrospective Searches on Large Data Bases," Information Storage & Retrieval 8:1-20 (1972).
7. D. R. King, "An Inverted File Structure for an Interactive Document Retrieval System" (Ph.D. dissertation, Rutgers University, 1971).

TECHNICAL COMMUNICATIONS

ISAD/SOLINET TO SPONSOR INSTITUTE

"Networks and Networking II: The Present and Potential" is the theme of an ISAD Institute to be held at the Braniff Place Hotel on February 27-28, 1975, in New Orleans. The sponsors are the Information Science and Automation Division of ALA and the Southeastern Library Network (SOLINET). This second institute on networking will be an extension of the previous one held in New Orleans a year ago. The ground covered in that previous institute will be the point of departure for "Networks II." The purpose of the previous institute was to review the options available in networking, to provide a framework for identifying problems, and to suggest evaluation strategies to aid in choosing alternative systems. While the topics covered in the previous institute will be briefly reviewed in this one, some speakers will take different approaches to the subject of networking, while other speakers will discuss totally new aspects.
In addition to the papers given and the resultant questions and answers from the floor, a period of round table discussions will be held during which the speakers can be questioned on a person-to-person basis. A new feature of ISAD institutes now being planned will be the presence of vendors' exhibits. Arrangements are being made with the many vendors and manufacturers whose services are applicable to networking to exhibit their products and systems. It is hoped that many of them will be interested in responding to this opportunity. The program will include:

"A Systems Approach to Selection of Alternatives": Resource sharing; components; communications options; planning strategy. Joseph A. Rosenthal, University of California, Berkeley.
"State of the Nation": Review of current developments and an evaluation. Brett Butler, Butler Associates.
"The Library of Congress, MARC, and Future Developments." Henriette D. Avram, Library of Congress.
"Data Bases, Standards and Data Conversions": Existing data bases; characteristics; standardization; problems. John F. Knapp, Richard Abel & Co.
"User Products": Possibilities for product creation; the role of user products. Maurice Freedman, New York Public Library.
"On-Line Technology": Hardware and software considerations; library requirements; standards; cost considerations of alternatives. Philip Long, State University of New York, Albany.
"Publishers' View of Networks": Copyright; effect on publishers; effect on authorship; impact on jobbers; facsimile transmission. Carol Nemeyer, Association of American Publishers.
"National Library of Canada": Current and anticipated developments; cooperative plans in Canada; international cooperation. Rodney Duchesne, National Library of Canada.
"Administrative, Legal, Financial, Organizational and Political Considerations": Actual and potential problems; organizational options; financial commitment; governance. Fred Kilgour, OCLC.
Registration will be $75.00 for members of ALA and staff members of SOLINET institutions, $90.00 for nonmembers, and $10.00 for library school students. For hotel reservation information and registration blanks, contact Donald P. Hammer, ISAD, American Library Association, 50 E. Huron St., Chicago, IL 60611; 312-944-6780.

REGIONAL PROJECTS AND ACTIVITIES

Indiana Cooperative Library Services Authority

The first official meeting of the board of directors of the Indiana Cooperative Library Services Authority (InCoLSA) was held June 4, 1974, at the Indiana State Library in Indianapolis. A direct outgrowth of the Cooperative Bibliographic Center for Indiana Libraries (CoBiCIL) Feasibility Study Project sponsored by the Indiana State Library and directed by Mrs. Barbara Evans Markuson, InCoLSA has been organized as an independent not-for-profit organization "to encourage the development and improvement of all types of library service." To date, contracts have been signed by sixty-one public, thirteen academic, fourteen school, and five special libraries, a total of ninety-three libraries. InCoLSA is being funded initially by a three-year establishment grant from the U.S. Office of Education, Library Services and Construction Act (LSCA) Title I funds. Officers are: president, Harold Baker, head of library systems development, Indiana State University; vice-president, Dr. Michael Buckland, assistant director for technical services, Purdue University Libraries; secretary, Mary Hartzler, head of Catalog Division, Indiana State Library; treasurer, Mary Bishop, director of the Crawfordsville Book Processing Center; three directors-at-large: Phil Hamilton, director of the Kokomo Public Library; Edward A. Howard, director of the Evansville-Vanderburgh County Public Library; and Sena Kautz, director of Media Services, Duneland School Corporation.
Stanford's BALLOTS On-Line Files Publicly Available through SPIRES

September 16, 1974

The Stanford University Libraries automated technical processing system, BALLOTS (Bibliographic Automation of Large Library Operations using a Time-sharing System), has been in operation for twenty-two months and supports the acquisition and cataloging of nearly 90 percent of all materials processed. Important components of the BALLOTS operations are several on-line files accessible through an unusually powerful set of indexes. Currently available are: a file of Library of Congress MARC data starting from January 1, 1972 (with a gap from May to August 1972); an in-process file of individual items being purchased by Stanford; an on-line catalog (the catalog data file) of all items cataloged through the system, whether copy was derived from Library of Congress MARC data, was input from non-MARC cataloging copy, or resulted from Stanford's own original cataloging efforts; and a file of see, see also, and explanatory references (the reference file) to the catalog data file. In addition, during September and October 1974, the 85,000 bibliographic and holdings records (already in machine-readable form on magnetic tape) representing the entire J. Henry Meyer Memorial Undergraduate Library were converted to on-line Meyer catalog data and Meyer reference files in BALLOTS. These files are publicly available through SPIRES (Stanford Public Information Retrieval System) to any person with a terminal that can dial up the Stanford Center for Information Processing's Academic Computer Services computer (an IBM 360 Model 67) and who has a valid computer account.
The MARC file can be searched through the following index points:

LC Card Number
Personal Name
Corporate/Conference Name
Title

The in-process, catalog data, and reference files for Stanford and for Meyer can also be searched as SPIRES public subfiles through the following index points:

BALLOTS Unique Record Identification Number
Personal Name
Corporate/Conference Name
Title
Subject Heading (catalog data and reference file records only)
Call Number (catalog data and reference file records only)
LC Card Number

The title and corporate/conference name indexes are word indexes; this means that each word is indexed individually. Search requests may draw on more than one index at a time by using the logical operators "and," "or," and "and not" to combine index values sought. If you plan to use SPIRES to search these files, or if you would like more information, a publication called Guide to BALLOTS Files may be ordered by writing to: Editor, Library Computing Services, S.C.I.P.-Willow, Stanford University, Stanford, CA 94305. This document contains complete information about the BALLOTS files and data elements, how to open an account number, and how to use SPIRES to search BALLOTS files. A list of BALLOTS publications and prices is also available on request. As additional libraries create on-line files using BALLOTS in a network environment, these files will also be available. These additions will be announced in JOLA Technical Communications.

DATA BASE NEWS

Interchange of AIP and Ei Data Bases

A National Science Foundation grant (GN-42062) for $128,700 has been awarded to the American Institute of Physics (AIP), in cooperation with Engineering Index (Ei), for a project entitled "Interchange of Data Bases." The grant became effective on May 1, 1974, for a period of fifteen months.
The project is intended to develop methods by which Ei and AIP can reduce their input costs by eliminating duplication of intellectual effort and processing. Through sharing of the resources of the two organizations and an interchange of their respective data bases, AIP and Ei expect to improve the utilization of these computer-readable data bases. The basic requirement for the development of the interchange capability for computer-readable data bases is the establishment of a compatible set of data elements. Each organization has unique data elements in its data base. It will therefore be necessary to determine which of the data elements are absolutely essential to each organization's services, which elements can be modified, and what other elements must be added. After the list of data elements has been established, it will be possible to write the specifications and programs for format conversions from AIP to Ei tape format and vice versa. Simultaneously, there will be the development of language conversion facilities between Ei's indexing vocabulary and AIP's Physics and Astronomy Classification Scheme (PACS). It is also planned to investigate the possibility of establishing a computer program which can convert AIP's indexing to Ei's terms and vice versa. With the accomplishment of the above tasks, it will be possible to create new services and repackage existing services to satisfy the information demands in areas of mutual interest to engineers and physicists, such as acoustics and optics.

ERIC Data Base Users Conference

The Educational Resources Information Center (ERIC) held an ERIC Data Base Users Conference in conjunction with the 37th Annual Meeting of the American Society for Information Science (ASIS) in Atlanta, Georgia, October 13-17, 1974.
The ERIC Data Base Users Conference provided a forum for present and potential ERIC users to discuss common problems and concerns as well as interact with other components of the ERIC network: Central ERIC, the ERIC Processing and Reference Facility, ERIC Clearinghouse personnel, and information dissemination centers. Although attendees have in the past been primarily oriented toward machine use of the ERIC files, all patterns of usage were represented at this conference, from manual users of printed indexes to operators of national on-line retrieval systems.

Journal of Library Automation Vol. 7/4 December 1974

A number of invited papers were presented dealing with subjects such as:

• The current state and future directions of educational information dissemination. Sam Rosenfeld (NIE), Lee Burchinal (NSF).
• What services, systems, and data bases are available? Marvin Gechman (Information General), Harvey Marron (NIE).
• The roles of libraries and industry, respectively, in disseminating educational information. Richard De Gennaro (University of Pennsylvania), Paul Zurkowski (Information Industry Association).

Several organizations (National Library of Canada, University of Georgia, Wisconsin State Department of Education) were invited to participate in "Show and Tell" sessions to describe in detail how they are using the ERIC system and data base. A status report covering ERIC on-line services for educators was presented by Dr. Carlos Cuadra (System Development Corporation) and Dr. Roger Summit (Lockheed). Interactive discussion groups covered a number of subjects including:

• Computer techniques: programming methods, use of utilities, file maintenance, search system selection, installation, and operation.
• Serving the end user of educational information.
• Introduction to the ERIC system: what tools, systems, and services are available and how are they used?
• Beginning and advanced sessions on computer searching the ERIC files. On-line terminals were used to demonstrate and explain use of machine capabilities.

COMMERCIAL SERVICES AND DEVELOPMENTS

SCOPE DATA Inc. ALA Train Compatible Terminal Printers

SCOPE DATA Inc. currently is offering a high-speed, nonimpact terminal printer for use in various interactive printing applications. Capability can be included in the Series 200 printer as an extra-cost feature to print the eight-bit ASCII character set or the ALA character set with 176 characters. For further information contact Alan G. Smith, Director of Marketing, SCOPE DATA Inc., 3728 Silver Star Rd., Orlando, FL 32808.

Institute for Scientific Information Puts Life Sciences Data Base On-Line through System Development Corporation

The Institute for Scientific Information (ISI) has announced that it will collaborate with System Development Corporation (SDC) to provide on-line, interactive, computer searches of the life sciences journal literature. Scheduled to be fully operational by July 1, 1974, the ISI-SDC service is called SCISEARCH® and is designed to give quick, easy, and economical access to a large life sciences literature file.

Stressing ease of access, the SDC retrieval program, ORBIT, permits subscribers to conduct extremely rapid literature searches through two-way communications terminals located in their own facilities. After examining the preliminary results of their inquiries, searchers are able to further refine their questions to make them broader or narrower. This dialog between the searcher and the computer (located in SDC's headquarters in Santa Monica, California) is conducted with simple English-language statements. Because this system is tied in to a nationwide communications network, most subscribers will be able to link their terminals to the computer through the equivalent of a local phone call.
Covering every editorial item from about 1,100 of the world's most important life sciences journals, the service will initially offer a searchable file of over 400,000 items published between April 1972 and the present. Each month approximately 16,000 new items will be added until the average size of the file totals about one-half million items and represents two-and-one-half years of coverage.

To assure subscribers maximum retrieval effectiveness when dealing with this massive amount of information, the data base can be searched in several ways. Included are searches by keywords, word stems, word phrases, authors, and organizations. One of the search techniques utilized, citation searching, is an exclusive feature of the ISI data base.

For every item retrieved through a search, subscribers can receive a complete bibliographic description that includes all authors, journal citation, full title, a language indicator, a code for the type of item (article, note, review, etc.), an ISI accession number, and all the cited references contained in the retrieved article. The accession number is used to order full-text copies of relevant items through ISI's Original Article Tear Sheet service (OATS®). This ability to provide copies of every item in the data base distinguishes the ISI service from many others.

Current Library of Congress Catalog On-Line for Reference Searches

Information Dynamics Corporation (IDC) has agreed to collaborate with System Development Corporation (SDC) to provide reference librarians, researchers, and scholars with on-line interactive computer searches of all library materials being cataloged by the Library of Congress. Scheduled to be fully operational as of October 1, 1974, the SDC-IDC service is called SDC-IDC/LIBCON and is designed to give quick, easy, and economical access to a large portion of the world's scholarly library materials.
As in the ISI service described above, the data base can be searched in several ways. Included are compound logic searches by keywords, word stems, word phrases, authors, organizations, and subject headings for most English materials. One of the search techniques utilized, string searching, is an exclusive feature of SDC's ORBIT system. Keyword searching of cataloged items, including all foreign materials processed by the Library of Congress, is an exclusive feature of the IDC data base not currently available in other on-line MARC files.

For individual items retrieved through a search, subscribers can receive a bibliographic description that includes authors, full title, an IDC accession number, the LC classification number, and publisher information.

STANDARDS

The ISAD Committee on Technical Standards for Library Automation Invites Your Participation in the Standards Game

Editor's Note: The TESLA Reactor Ballot will be provided in forthcoming issues. To use, photocopy the ballot form, fill out, and mail to: John C. Kountz, Associate for Library Automation, Office of the Chancellor, The California State University and Colleges, 5670 Wilshire Blvd., Suite 900, Los Angeles, CA 90036.

THE PROCEDURE

This procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ALA) standards proposals to provide recommendations to ALA's representatives to existing, recognized standards organizations. To enter the procedure for an initiative standards proposal you must complete an "Initiative Standards Proposal" using the outline which follows:

Initiative Standard Proposal Outline: The following outline is designed to facilitate review by both the committee and the membership of initiative standards proposals and to expedite the handling of the Initiative Standard Proposal through the procedure.
Since the outline will be used for the review process, it is to be followed explicitly. Where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: VI. Existing Standards. Not Applicable).

Note that the parenthetical statements following most of the outline entry descriptions relate to the ANSI Standards Proposal section headings to facilitate the translation from this outline to the ANSI format.

All Initiative Standards Proposals are to be typed, double spaced on 8½" x 11" white paper (typing on one side only). Each page is to be numbered consecutively in the upper right-hand corner. The initiator's last name followed by the key word from the title is to appear one line below each page number.

I. Title of Initiative Standard Proposal (Title).
II. Initiator Information (Foreword).
   A. Name
   B. Title
   C. Organization
   D. Address
   E. City, State, Zip
   F. Telephone: Area Code, Number, Extension
III. Technical Area. Describe the area of library technology as understood by the initiator. Be as precise as possible, since in large measure the information given here will help determine which ALA official representative might best handle this proposal once it has been reviewed and which ALA organizational component might best be engaged in the review process.
IV. Purpose. State the purpose of the Standard Proposal (Scope and Qualifications).
V. Description. Briefly describe the Standard Proposal (Specification of the Standard).
VI. Relationship to other standards. If existing standards have been identified which relate to, or are felt to influence, this Standard Proposal, cite them here (Expository Remarks).
VII. Background.
Describe the research or historical review performed relating to this Standard Proposal (if applicable, provide a bibliography) and your findings (Justification).
VIII. Specifications. (Optional) Specify the Standard Proposal using record layouts, mechanical drawings, and such related documentation aids as required in addition to text exposition where applicable (Specifications of the Standard).

Kindly note that the outline is designed to enable Standards Proposals to be written following a generalized format which will facilitate their review. In addition, the outline permits the presentation of background and descriptive information which, while important during any evaluation, is a prerequisite to the development of a standard.

TESLA REACTOR BALLOT

Identification Number for Standing Requirement: ______

Reactor Information
Name: ______
Title: ______
Organization: ______
Address: ______
City: ______ State: ______ Zip: ______
Telephone: Area Code ______ Number ______ Ext. ______

Need (for This Standard): For [ ] Against [ ]
Specification (As Presented in This Requirement): For [ ] Against [ ]
Can You Participate in the Development of This Standard? Yes [ ] No [ ]
Reason for Position: (Use format of proposal. Additional pages can be used if required.)

The Reactor Ballot is to be used by members to voice their recommendations relative to Initiative Standards Proposals. The Reactor Ballot permits both "for" and "against" votes to be explained, permitting the capture of additional information which is necessary to document and communicate formal Standards Proposals to standards organizations outside of the American Library Association. As you, the members, use the outline to present your Standards Proposals, TESLA will publish them in JOLA-TC and solicit membership reaction via the Reactor Ballot.
Throughout the process TESLA will insure that Standards Proposals are drawn to the attention of the applicable American Library Association division or committee. Thus, internal review usually will proceed concurrently with membership review. From the review and the Reactor Ballot TESLA will prepare a "majority recommendation" and a "minority report" on each Standards Proposal. The majority recommendation and minority report so developed will then be transmitted to the originator, and to the official American Library Association representative on the appropriate standards organization, where it should prove a source of guidance as official votes are cast. In addition, the status of each Standards Proposal will be reported by TESLA in JOLA-TC via the Standards Scoreboard. The committee (TESLA) itself will be nonpartisan with regard to the proposals handled by it. However, the committee does reserve the right to reject proposals which after review are not found to relate to library automation.

INPUT

To the Editor:

We have been asked by the members of the ALA Interdivisional Committee on Representation in Machine Readable Form of Bibliographic Information (MARBI) to respond to your editorial in the June 1974 issue of the Journal of Library Automation. This editorial dealt with the Council on Library Resources' involvement in a wide range of projects, ranging from the sponsorship of a group which is attempting to develop a subset of MARC for use in interlibrary exchange of bibliographic data (CEMBI), to management of a project which has as its goal the creation of a national serials data base (CONSER), and, more recently, to the convening of a conference of library and A&I organizations to discuss the outlook for comprehensive national bibliographic control.
You raised several legitimate questions: 1) Has sufficient publicity been given to these activities of the Council so that all, not just a few, libraries are aware of what is happening and have an opportunity to exert an influence on developments? and 2) Is the Council bypassing existing channels of operation and communication? You also suggest that proposals from groups such as CEMBI be channeled through an official ALA committee such as MARBI for intensive review and evaluation.

It should be pointed out that MARBI is not charged with the development of standards. It acts to monitor and review proposals affecting the format and content of machine readable bibliographic data, where that data has implications for national or international use. This applies to proposals emanating from CEMBI and CONSER as well as from other concerned groups. All indications to date are that the Council is fully aware of MARBI's role and will not bypass MARBI. A number of members of MARBI are also members of CEMBI, and MARBI is represented on the CONSER project.

Also reassuring is the fact that, unless we allow LC to fall by the wayside in its role as the primary creator and distributor of machine readable data, any standards for format or content developed by a Council-sponsored group will eventually be reflected in the MARC records distributed by LC. The Library of Congress has issued a statement, published in the June 1974 issue of JOLA, to the effect that it will not implement any changes in the MARC distribution system which are not acceptable to MARBI. MARBI and LC have worked out a procedure whereby all proposed changes to MARC are submitted to MARBI. They are then published in JOLA and distributed to members of the MARC Users Discussion Group for comments. Comments are collected and evaluated by MARBI and a report submitted to LC, with its recommendations.
The MARBI review process does not guarantee perfection, and there is no assurance that everyone will be satisfied. Compromise and expediency are the name of the game in this extremely complicated and uncharted area of standards for machine readable bibliographic data. However, the Council has undoubtedly learned from the ISBD(M) experience that it cannot make decisions which affect libraries without the greatest possible involvement of librarians. It is the feeling of the MARBI committee members that the Council intends to work with MARBI in future projects which fall into MARBI's area of concern.

Velma Veneziano
MARBI Past Chairperson
Ruth Tighe
Chairperson

Editor's note: It is gratifying to note that MARBI's response reflects the opinions expressed in the June 1974 editorial. The library community will doubtless be pleased to learn of CLR's intention to work closely with MARBI.-SKM

To the Editor:

As briefly discussed with you, your editorial in the June 1974 issue of JOLA is both admirable and disturbing (to me, at least). The problem of national leadership in the area of library automation is a critical problem indeed. Being in the "boondocks" and far removed from the scene of action, I can only express to you my perception as events and activities filter through to me. I can remember as far back as 1957, when ADI had a series of meetings in Washington, D.C. trying to establish a national program for bibliographic automation. I have been through eighteen years of meetings, committees, conferences, etc. concerned with trying to develop a national plan for bibliographic automation and information storage and retrieval systems. I have worked with NSF, USOE, the Department of Commerce, the U.S. Patent Office, engineering and technical societies, and DOD agencies; in short, the entire spectrum. I spent a good many years working in ADI and ASIS, SLA, and most recently ALA. At no time were we able to make significant progress towards a national system.
Even the great Airlie House Conference did not produce any significant changes in the fragmented, competitive "non-system." It has only been in the recent past, since CLR has taken an aggressive posture, that I am able to see the beginning of orderly development of a national automated bibliographic system. I certainly agree that any topic as critical as those being discussed by CEMBI should be in the public domain, but I also believe that the progress made by CEMBI would not have been possible without CLR taking the initiative in getting these key agencies together. Thank goodness someone quit talking and started doing something at the national level!

I sincerely believe that in the absence of a national library, and with the current lack of legally derived authority in this arena, CLR provides a genuine service to the total library community in establishing CEMBI. Hopefully, your very excellent article (in the same issue of JOLA) on "Standards for Library Automation ..." will help to put the entire issue of bibliographic record standards into perspective. As a former chemist and corrosion engineer, I am fully aware of the absolute necessity for technical standards. I am also fully aware of the necessity of developing technical standards through the process you outlined in your article. Hopefully, CLR action with CEMBI will expedite this laborious process and help to push our profession forward into the twentieth century. Since we ourselves have not been able to do it through all these years, I am personally grateful that some group such as CLR took the initiative and forced us to do what we should have done years ago.

Maryann Duggan
SLICE Office Director

Editor's note: Positive action and progressive movement are, of course, desirable and are often lacking in large organizations. However, positive action without communication of this action to the affected population can only be detrimental.
On issues of the complexity of those addressed by CEMBI and CONSER, review by the library community is always useful, even though action may be temporarily delayed.-SKM

To the Editor:

On page 233 of the September issue of JOLA there is a report from the Information Industry Association's Micropublishing Committee Chairman (Henry Powell). He states that ". . . the committee spelled out several areas of concern to micropublishers which will be the subject of committee action. . . ." One of the concerns of the committee is that a Z39 standards committee has recommended "standards covering what micropublishers can say about their products." (Emphasis mine.)

As Chairman of the Z39 Standards Subcommittee which is developing the advertising standard referred to, I wish to point out that there is no intention on the part of the Subcommittee to tell micropublishers what they can say nor what they may say about their products. The Subcommittee, which is composed of representatives from three micropublishing concerns, two librarians, and myself, has from the beginning taken the view that the purpose of the standard would be to provide guidance for micropublishers and librarians alike. We are most anxious that no one feel that the Subcommittee has any intention of attempting to use the standards mechanism to tell any micropublisher how he must design his advertisements. In addition, it should be noted that no ANSI standard is compulsory.

Carl M. Spaulding
Program Officer
Council on Library Resources

Book Reviews

Current Awareness and the Chemist: A Study of the Use of CA Condensates by Chemists, by Elizabeth E. Duncan. Metuchen, N.J.: Scarecrow Press, 1972. 150p. $5.00.

This book starts with a five-page foreword by Allen Kent entitled "KWIC indexes have come a long way, or have they?"
Kent is always interesting, but when one detects that his foreword is becoming almost an apologia, one wonders just what is to come. The remainder of the book (apart from the index) appears already to have been presented as Dr. Duncan's Ph.D. thesis at the University of Pittsburgh. The first two chapters are the usual sort of stuff, taking us from Alexandria in the third century to Columbus, Ohio in 1970, with undistinguished reviews of user studies and the history of the Chemical Abstracts Service.

The remaining sixty-four pages of text report and discuss a study of the use of CA Condensates by quite a small sample of academic and industrial chemists in the Pittsburgh area. The objective appears to have been to compare profile hits with periodical holdings and interlibrary loan requests at the client's library so that a decision model for the acquisition of periodicals could be developed. On the author's own admission, this objective was not achieved. A certain amount of data is presented, but it is difficult to draw many conclusions from it, other than the fact that chemists do not appear to follow up the majority of profile hits that they receive, nor do they use the current issues of Chemical Abstracts very frequently.

It is difficult to understand why this material was published in book form. It could have been condensed to one or possibly two papers for J.Chem.Doc., or perhaps even left for the really diligent seeker to find on the shelves of University Microfilms; but, as the Old Testament scribe bemoaned, "Of making many books there is no end." At the bottom of page 118 a reference is made to the paper by Abbott et al. in Aslib Proceedings (Feb. 1968); at the top of page 119 the same paper's date is given as January 1968. Other errors are less obvious, but one really questions whether the provision of a short foreword and an index makes even a good thesis worth publishing in hard covers.

R. T. Bottle
The City University
London, U.K.

Computer-Based Reference Service, by M. Lorraine Mathies and Peter G. Watson. Chicago: American Library Assn., 1973. 200p. $9.95.

The archetypal title and model for all works of explication is "....... without Tears." Lorraine Mathies and Peter Watson have attempted the praiseworthy task of explaining computer-produced indexes to the ordinary reference librarian, but for a number of reasons, some of them probably beyond the control of the authors, the tears will remain.

Perhaps one difficulty is that this book was, in its beginnings at least, the product of a committee. Back in 1968 the Information Retrieval Committee of the Reference Services Division of the ALA wanted to present to "working reference librarians the essentials of the reference potential of computers and the machine-readable data they produce" (p.xxix). The proposal worked its way (not untouched, of course) through several other groups and eventually resulted in a preconference workshop on computer-based reference service being given at the Dallas convention of 1971. The present book is based on the tutor's manual which Mathies and Watson prepared for that workshop, but incorporates revisions suggested by the ALA Publishing Services as well as changes initiated by the authors themselves.

With so many people getting into the planning act, it is not surprising that the various parts of the book should end up by working at cross purposes to each other. Unfortunately, the principal conflicts come at just those points where a volume of exposition needs to be most definite and precise: just what is the book trying to do and for whom? At the original workshop, the ERIC data base was chosen as a "model system" since educational terminology was more likely to be understood than that of the sciences. And because the participants were to learn by doing, they were told a great deal about ERIC so as to be able to "practice" on it.
The trouble is that these objectives do not translate well from workshop to print. The details about ERIC, which may have been necessary as tutors' instructions, seem misplaced in book form. Almost half the present book is devoted to a laborious explanation of how ERIC works, and this is a great deal more than most workaday reference librarians will want to know about it. Moreover, it is no longer clear whether Mathies and Watson aim to train "producers" or "consumers." The welter of detail suggests that they expect their readers to learn hereby to construct profiles and to program searches, but it is highly doubtful that skills of this kind can or should be imparted on a "teach yourself" basis.

Once Mathies and Watson leave ERIC behind, they seem on surer ground. Part II (Computer Searching: Principles and Strategies) begins with a fairly routine chapter on binary numeration which is perhaps unnecessary, since this material is easily available elsewhere. However, the section quickly moves on to an excellent explanation of Boolean logic and weighting, describes their application in the formulation of search strategies, and ends with an admirably succinct and demystifying account of how one evaluates the output (principles of relevance and recall). The reader might well have been better served if the book had indeed begun with this part.

The last section (Part III: Other Machine Readable Data Bases) is also very useful, particularly for the "critical bibliography" (p.153) in which the authors describe and evaluate ten of the major bibliographic data bases. This critical bibliography is apparently a first of its kind, which makes the authors' perceptive and frank comments all the more welcome. Part III also contains chapters on MARC and the 1970 census but, strangely enough, does not include a final resume and conclusions.
It is true that in each chapter there is a paragraph or so of summary, but this is hardly a satisfactory substitute for the overall recapitulation one would expect.

In the final analysis, indeed, one's view of the book will depend on just that: what one expects of it. If "working reference librarians" expect to read this book in order to be no longer "intimidated by these electronic tools" (p.ix), they are apt to be disappointed. The inordinate emphasis on ERIC, the rather dense language, and the fact that the main ideas are never pulled together at the end will all prevent easy enlightenment. However, if our workaday reference librarians are willing to work their way through a fairly difficult manual on computer-based indexing as in effect a substitute for a workshop on the subject, they will find this book a worthwhile investment of their time, and tears.

Samuel Rothstein
School of Librarianship
University of British Columbia

The Circulation System at the University of Missouri-Columbia Library: An Evolutionary Approach. Sue McCollum and Charles R. Sievert, issue eds. The Larc Reports, vol. 5, issue 2, 1972. 101p.

In 1958 the University of Missouri-Columbia Library was one of the first libraries to mechanize circulation by punching a portion of the charge slip with book and borrower and/or loan information. In 1964 an IBM 357 data collection system utilizing a modified 026 keypunch was installed, but not until 1966 was 026 output processed on the library owned and operated IBM 1440 computer. However, budgetary constraints forced a transfer of operations in 1970 to the Data Processing Center, which undertook rewriting of library programs in 1971.
After explanation of hardware changes and an overview of the circulation department organization and Data Processing Center operation, this report deals in depth with the major files of the circulation system (circulation master file and location master file) and the main components of the circulation system: edit, update, overdues, fines, interlibrary loans, address file, location file, reserve book, listing of files, special requests, and utility programs. Many examples of report layouts are included, particularly those accomplished by utilizing data gathered from main collection and reserve book loans.

Although this off-line batch processing circulation system is limited in that it does not handle any borrower reserve or look-up (tracer) routines, both of which are possible in off-line systems, the University of Missouri-Columbia system has merit as a pioneer system which influenced other university library circulation system designs in the 1960s. Detailed reference given throughout the report to changes in the original library programs not only makes it of value as a case history for any library interested in circulation automation but also indicates the important fact that library programs do change and evolve in response to new demands and technological capabilities.

Lois M. Kershner
University of Pennsylvania Libraries

National Science Information Systems: A Guide to Science Information Systems in Bulgaria, Czechoslovakia, Hungary, Poland, Rumania, and Yugoslavia, by David H. Kraus, Pranas Zunde, and Vladimir Slamecka. (National Science Information Series) Cambridge, Mass.: The M.I.T. Press, 1972. 325p. $12.50.
As indicated by the title, this volume provides a comparative description and analysis of the various organizational or political structures which have been adopted by six countries of central and eastern Europe in their attempts to develop effective national systems for the dissemination of scientific and technical information. For each country there is a detailed account of the national information system now existing, with a brief outline of its antecedents, a directory of information or documentation centers, a list of serials published by these centers, and a bibliography of recent papers dealing with the development of information systems in that country.

This main section of the book is preceded by a brief review of the common characteristics of the six national systems and an outline of steps being taken to achieve international cooperation for the exchange of information in specific subjects. Of particular interest is the description of the International Center of Scientific and Technical Information, established in Moscow in 1969 and now linked to five of these national systems. No attempt is made to describe the techniques being used to store, retrieve, and disseminate information.

The authors point out that the six countries being examined "have experimented intensely with organizational variants of national science information systems." Unfortunately, they do not attempt to indicate which of these organizational structures was most effective in bringing about the desired results. Undoubtedly, this would have been an impossible task and probably not worth the effort, since a successful type of organization in a socialist country would not necessarily be effective in a democracy.

The book will be of interest to political scientists and to those seeking the most effective ways of coordinating the information processing efforts of all types of government bodies.
It will be of only academic interest to the information specialist concerned primarily with information processing techniques.

Jack E. Brown
National Science Library of Canada
Ottawa

Information Retrieval: On-Line, by F. W. Lancaster and E. G. Fayen. Los Angeles: Melville Publishing Co., 1973. 597p. LC: 73-9697. ISBN: 0-471-51235-4.

Have you been reading the ASIS Annual Review of Information Science and Technology year after year and wishing for a compendium of the best information and examples of the latest systems, user manuals, cost data, and other facts, so that you would not have to go searching in a library for the interesting reports, journal articles, and books? Well, if you have (and who hasn't), your prayers have been answered if you are interested in on-line bibliographic retrieval systems. The authors of this handy reference book have collected and reprinted, among other things, the complete DIALOG Terminal Users Reference Manual, the SUPARS User Manual, and the user instructions for AIM-TWX, OBAR, and the CARUSO Tutorial Program. Each of these systems, and several others (arranged alphabetically from AIM-TWX [MEDLINE] to TOXICON [TOXLINE]), is described and illustrated. Features and functions of on-line systems, such as vocabulary control and indexing, cataloging, instruction of users, equipment, and file design, are all covered in a straightforward manner, simply enough for the uninformed and carefully enough so that a system operator could compare his system's features and functions with the data provided. Richly illustrated with tables, charts, graphs, and figures, up-to-date bibliographies (the only serious omission noticed was the AFIPS conference proceedings edited by D. Walker), and subject and author indexes, this volume will stand as another landmark in the state-of-the-art review series which the Wiley-Becker & Hayes Information Science series has come to represent.
Emphasis has been placed on the design, evaluation, and use of on-line retrieval systems rather than the hardware or programming aspects. Several of the chapters have a broader base of interest than on-line systems, covering as they do performance criteria of retrieval systems, evaluating effectiveness, human factors, and cost-performance-benefits factors. Easy to use, and as up to date and balanced a book as any in a rapidly changing field can be, Lancaster and Fayen's volume gives students of information studies and planners and managers of information services a very valuable reference aid.

Pauline A. Atherton
School of Information Studies
Syracuse University

National Library of Australia. Australian MARC Specification. Canberra: National Library of Australia, 1973. 83p. $2.50. ISBN: 0-642-99014-X.

For those readers who are familiar with the Library of Congress MARC format, the Australian MARC specification will be, for the most part, self-explanatory. The intent of the document is to describe the basic format structure and to list the various content designators that are used in the format. No effort was made to include any background information or explanation of data elements. Because of this, the reviewer found it necessary to refer to other documents, e.g., PRECIS: A Rotated Subject Index System, by Derek Austin and Peter Butcher, in order to complete a comparative analysis of the Australian format with other similar formats. Perhaps the value of reviewing a descriptive document of this type lies in discovering how the format it describes compares to other existing formats developed for the same purpose.

The International Organization for Standardization published a format for bibliographic information interchange on magnetic tape in 1973, International Standard ISO 2709. The Australian format structure is the same throughout as the international standard.
The only variance is in character positions 20 and 21 of the leader, which the Australian format left undefined. A comparison of content designators cannot be made with the international standard, because it specifies only the position and length of the identifiers in the structure of the format, not the actual identifiers (except for the three-digit tags 001-999 that identify the data fields). The best comparison of content designators can be made with the LC MARC format, since the Australian format uses many of the same tags, indicators, and subfield codes for the same purposes.

The Australian format has assigned the same fixed-length data elements to the same character positions as the LC format, except for position 38, which is the Periodical Code in the Australian format and the Modified Record Code in the LC format. In the fixed-length character positions for Form of Contents, Publisher (Government Publication in LC MARC), and Literary Text (Fiction in LC MARC), the Australian format assigned different codes than LC.

In general, the Australian format uses the same three-digit tags as LC to identify the primary access fields in its records, e.g., 100, 110, 111 for main entries; 400, 410, 411, 440, 490 for series notes; 600, 610, 611, 650, 651 for subject headings; and 700, 710, 711 for added entries. For the remaining bibliographic fields there are some variations in tagging between the two formats. The Australian MARC has chosen a different method of identifying uniform titles, and has identified five more note fields in the 5XX series of tags than has LC. The Australians have also added some manufactured fields to their record. These fields do not contain actual data from the bibliographic record, but rather consist of data created by program for control and manipulation purposes, or taken from lists such as the PRECIS subject index.
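The leader variance noted above can be made concrete. In ISO 2709 (and in LC MARC, which follows it), the leader is the first 24 characters of each record, with each element at a fixed position. The following is a minimal sketch of slicing a leader into its named parts; the field names follow the LC MARC leader layout, and the sample leader string is invented for illustration:

```python
# Sketch: splitting a 24-character ISO 2709 / MARC leader into named parts.
# The sample leader below is an invented example, not a real record.

def parse_leader(leader: str) -> dict:
    """Slice a 24-character leader into its fixed-position elements."""
    if len(leader) != 24:
        raise ValueError("an ISO 2709 leader is exactly 24 characters")
    return {
        "record_length": int(leader[0:5]),           # positions 00-04
        "record_status": leader[5],                  # position 05
        "type_of_record": leader[6],                 # position 06
        "indicator_count": leader[10],               # position 10
        "subfield_code_count": leader[11],           # position 11
        "base_address_of_data": int(leader[12:17]),  # positions 12-16
        # Positions 20-23 form the entry map; positions 20 and 21 (lengths of
        # the directory's length-of-field and starting-character-position
        # parts) are the ones the review says the Australian format left
        # undefined.
        "entry_map": leader[20:24],
    }

sample = "00123nam a2200061   4500"   # invented example leader
parsed = parse_leader(sample)
print(parsed["record_length"])        # 123
print(parsed["entry_map"])            # 4500
```

The fixed-position slicing is why the two formats can share a structure while differing in content designators: the leader and directory describe only where fields sit, not what their tags mean.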
The Australian format has also included, as part of its record, a series of cross-reference fields identified by 9XX tags; LC has reserved the 9XX block of tags for local use.

The use of indicators differs in most instances between the two formats. Both allow for two indicator positions in each field, as specified by the international standard format structure. However, the information conveyed by the indicators differs, except where the first indicator conveys form of name for personal and corporate name headings. Within each block of tags, LC has made an effort to remain consistent in the use of indicators, e.g., in the 6XX block for subject headings, the first indicator specifies form of name where a form of name can be discerned. Where no form of name is discernible, such as in a topical subject heading (tag 650), a null indicator or blank is used, which means no intelligence is carried in this position. In the Australian format the indicators in the 6XX block of tags have three different patterns. Inconsistency of this kind does not tend to destroy compatibility with other coding systems using the same format structure, as long as sufficient explanation and examples are given from which conversion tables may be developed by the institutions with whom one wants to exchange, or interchange, bibliographic data.

An even greater degree of difference exists between the two formats in the subfield codes used to identify data elements. The Australian MARC has identified some data elements that LC has not, e.g., in personal name main entries, the Australian record identifies first names with subfield code "h," whereas LC does not identify parts of a personal name, only the form of the name, i.e., forename form, single surname, family name, etc. In most of the fields the two formats have defined some of the same data elements, but each uses a different subfield code to represent the element. In the Australian document, under each field heading, the subfield codes are listed alphabetically with a data element following each code. This arrangement causes the data elements to fall out of their normal order of occurrence in the field, i.e., name, numeration, titles, dates, relator, etc. For example, for a personal name main entry (tag 100):

Subfield Code   Australian MARC                          LC MARC
a               Entry element (name)                     Entry element (name)
b               Relator                                  Numeration
c               Dates                                    Titles (Honorary)
d               Second or subsequent additions to name   Dates
e               Numeration                               Relator
f               Additions to name other than date        Date (of a work)

The example demonstrates the need for precise definition and documentation of data elements for the purpose of conversion or translation when interchanging data with other institutions.

The Australian format has included the capability of identifying analytical entries by using an additional digit (called the level digit) placed between the tag and the indicators. A subrecord directory (tag 002) is present in each record containing data for analytical entries.

The Australian document includes appendixes for the Country of Publication Codes, Language Codes, and Geographical Area Codes that were developed by the Library of Congress. Their only deviation from LC MARC usage is in the Country of Publication Codes, where the Australians have added entities and codes for Australian first-level administrative subdivisions.

Patricia E. Parker
MARC Development Office
Library of Congress
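The conversion table the reviewer calls for can be sketched in code. The mapping below is an illustrative reading of the review's tag 100 comparison (where the same data element carries different codes in the two formats), not an official crosswalk; the field values are invented:

```python
# Sketch: re-coding Australian MARC subfield codes as LC MARC codes for a
# personal name main entry (tag 100). The mapping is built from the
# review's comparison table and is illustrative only, not an official
# crosswalk. Australian codes "d" and "f" (additions to name) have no
# single LC equivalent here, so unmapped codes pass through unchanged.

AUS_TO_LC_100 = {
    "a": "a",  # entry element (name) -> entry element (name)
    "b": "e",  # relator             -> relator
    "c": "d",  # dates               -> dates
    "e": "b",  # numeration          -> numeration
}

def translate_subfields(tag, subfields):
    """Re-code each (code, value) pair; only tag 100 is sketched here."""
    if tag != "100":
        return list(subfields)
    return [(AUS_TO_LC_100.get(code, code), value) for code, value in subfields]

aus_field = [("a", "Smith, John"), ("c", "1920-1975"), ("b", "ed.")]
print(translate_subfields("100", aus_field))
# [('a', 'Smith, John'), ('d', '1920-1975'), ('e', 'ed.')]
```

A full exchange program would need one such table per field, which is exactly why the reviewer stresses precise definition and documentation of data elements.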