lib-MOCS-KMC364-20140106083504


THE RECON PILOT PROJECT: A PROGRESS REPORT 
OCTOBER 1970-MAY 1971 

159 

Henriette D. AVRAM and Lenore S. MARUYAMA: MARC Development 
Office, Library of Congress, Washington, D. C. 

Synopsis of three progress reports on the RECON Pilot Project submitted 
by the Library of Congress to the Council on Library Resources covering 
the period October 1970-May 1971. Progress w reported in the following 
areas: RECON production, foreign language editing test, format recogni-
tion, microfilming, input devices, and tasks assigned to the RECON Working 
Task Force. 

INTRODUCTION 

With the implementation of the MARC Distribution Service in March 1969, 
the Library of Congress and the library community have had available in 
machine readable form the catalog records for English language mono-
graphs cataloged since 1969. Most libraries, however, also need to convert 
their older cataloging records, and the Library of Congress attempted to 
meet these needs by establishing the RECON Pilot Project in August 1969. 
During the two-year period of the pilot project, various techniques for 
conversion of retrospective bibliographic records have been tested, and a 
useful body of catalog records is being converted to machine readable form. 

The pilot project is being supported with funds from the Library of 
Congress, the Council on Library Resources, and the U.S. Office of Educa-
tion. Earlier articles in the Journal of Library Automation have described 
the progress through September 1970 ( 1, 2, 3 ). This article covers the 
period October 1970 through May 1971. 


160 Journal of Library Automation Vol. 4/3 September, 1971 

PROGRESS-OCTOBER 1970 THROUGH MAY 1971 

RECON Production 

The conversion of 8476 records in the 1969 and 7-series of card numbers 
that had not been included in the MARC Distribution Service was com-
pleted, and these records were sent to 47 subscribers of the MARC Distri-
bution Service. The subscribers were not charged for these records but 
were asked to send a tape reel to the Library for the duplication process. 

At present, the RECON data base consists of 25,206 records in the 7, 
1969, and 1968 series of card numbers. Records in the 1968 series that 
were part of the data base for the MARC Pilot Project are being converted 
by program from the MARC I format to the MARC II format, proofed, 
and updated. To date, 7551 out of 7583 MARC I records have been 
processed. Prior to the implementation of the MARC Distribution Service, 
records were input for test purposes, and the resulting practice tapes 
contain data requiring correction or updating to correspond with the present 
specifications of the MARC II format. Of the 8340 titles on the practice 
tapes, 3460 have been updated and reside on the RECON master file. These 
updated machine readable records will be distributed with the RECON 
titles in the 1968 card series. 

Foreign Languages Editing Experiment 

A foreign language editing experiment was conducted to test the 
accuracy of MARC/RECON editors in editing French and German lan-
guage records. Records used for this test included 1180 of the 5000 RECON 
research titles. At least 50 percent accuracy was expected since half of the 
task of editing a MARC record involves being able to read the language 
of the record. The other half involves identifying the data elements by 
their location in the record. The three editors used in the experiment had 
studied French in high school, one having had an additional year in 
college; none had studied German. Each editor was required to edit 
approximately 200 records in each language. 

Statistics on the number of records edited per hour and the number of 
errors made, when compared with the same editors' statistics for editing 
English language records, showed that each editor maintained an approx-
imately equal rate of speed in editing foreign language records as in 
editing English. The error rate for each editor, however, was more than 
tripled on foreign records, and each made approximately as many errors 
in French (the language studied) as in German. Each editor averaged 
more than 12 errors per batch in French and 12 in German. Since the 
MARC Editorial Office has established a standard of 2.5 errors per batch 
( 20 records comprising a batch ) as being acceptable for trained MARC 
editors, this error rate would have to be lowered in a production 
environment. 

The majority of errors occurred in the title field, which is a portion of 


The RECON Pilot ProjectjAVRAM and MARUYAMA 161 

the record that must be read for content in order to be edited correctly. 
The second largest number of errors occurred in the fixed fields, which are 
also dependent upon a reading knowledge of the language of the record 
for accurate coding. 

The number of errors made in each batch of records by each editor was 
tabulated to determine if any improvement was made during the course 
of the experiment. In no case was improvement noted. Statistics were 
also kept on the number of times an editor consulted various sources for 
help: e.g., dictionaries, the editing manual, the LC Official Catalog, the 
reviser, or a language specialist. Dictionaries were consulted frequently, 
and the reviser and language specialists rarely. 

Typing statistics (number of errors) were also recorded for 181 French 
and 185 German records. The error rate for typing foreign language material 
was lower than for typing English. The English language statistics, how-
ever, were combined for several typists, and the foreign language statistics 
were for one typist only. Charts showed that there was no improvement 
in the number of typing errors made at the end of the test. 

The primary conclusion drawn from the results of the experiment is that 
in order to edit foreign language records with an acceptable degree of 
accuracy, it would be necessary for the editor to have a good knowledge 
of the language as well as the editing procedures. 

F orrnat Recognition 

Format recognition is a technique that allows the computer to process 
unedited bibliographic records by analyzing data strings for certain key-
words, significant punctuation, and other clues to determine proper identifi-
cation of data fields. The Library of Congress has been developing this 
technique since early 1969 in order to eliminate substantial portions of the 
manual editing process, which in turn should represent a considerable 
savings in the cost of creating machine readable records. 

The RECON report, which was written prior to the completion of the 
first format recognition feasibility study, concluded that "partial editing 
combined with format recognition processing is a promising alternative to 
full editing." ( 4) Since that time, the emphasis in the deve1opment of the 
programs has been shifted to no editing prior to format recognition pro-
cessing. The programs are in the final stages of acceptance testing, and 
it is expected that 75% of the records can be processed without errors 
created by the format recognition programs. Preliminary estimates show 
that it takes approximately half a second of machine time to process one 
record by format recognition ; the manual editing process, on the other 
hand, takes approximately six minutes per record. The total amount of 
core storage required is approximately 120K: 80K for the programs and 
40K for the keyword lists. Although the keyword lists are maintained as 
a separate data set on a 2314 disk pack, they are loaded into memory 
during processing. The format recognition programs have been written 


162 Journal of Library Automation Vol. 4/3 September, 1971 

in Assembler Language for the Library's IBM 360/40 under DOS. 
The logical design of the format recognition process, with detailed flow 

charts needed for implementation of computer programming, has been 
published as a worki~;tg document by the American Library Association 
so that the technical content would be available to assist librarians in 
their automation projects ( 5). 

Workflow for format recognition begins with the input of unedited 
catalog records via the MT /ST following the typing specifications created 
for format recognition. Mter being processed by the format recognition 
programs, these records are proofed by the editors (the first instance in 
which they see the records), and the necessary corrections or verifications 
made. Correction procedures for format recognition records are the same 
as those used for regular MARC records. Figures 1, 2, and 3 are examples 
of the printed card used for input, the MT /ST hard copy, anq the proofsheet 
of the record created by format recognition. 

Initial use of the format recognition programs is for input of approx-
imately 16,000 RECON records in the 1968 card series. Input of current 
MARC records via format recognition will begin at a later date. RECON 
records were chosen for large-scale testing because they are not required 
for an actual production operation such as the MARC Distribution Service. 
In addition, work has begun on the expansion of format recognition to 
foreign languages. Analysis is being done on German and French mono-
graph records, and eventually Spanish, for new or expanded keyword lists 
and some changes to the algorithms. 

Ewart, Andrew. 
The world's greatest Ion' n If airs. London. Odhu m~. Hl(;j 

ti. e. 19681• 

287 p. 8 plates, lllus .. ports. 2~ em . 20/-
( n 68-<m.•r) 

1. Love. 2. Biography. r. TitlE>. 

Library of Congress 0 
301.41'4'0922 liR-97457 HQ80l.A2EO 

Fig. 1. Input for Format Recognition. 


The RECON Pilot Pro;ect/AVRAM and MARUYAMA 163 

HQ80l.A2E9 
Ewart, Andrew 
The world's greatest love affairs.#London, Odhams, 1967 [i. e. 1968]. 
287 p. 8 plates, illus., ports. 22 em. 25/-
(B68-03757) 
l.L 

Love. 
2. Biography. 
I. Title. 
301.41/4/0922 
68-97457 
Library of Congress 

Fig. 2. MT j ST Hard Copy. 

050/ 1 

100/1 

68-97457 

CAL :$ab 
---·- -------- -- --·---. ----

MEPS :ta *Ewnrt , 1\ndrew. 
--------- ------ ---·--------- ----------------
245/ 1 TILA~ *The world's greatest love affair s. 

260/ 1 I MP *abc *London , *Odhams, *1 967 [i.e. 1 968) . 
---- ·--- ---------------- -------·--------

300/1 COL *abc *287 p . *8 p lates , illus., ports , 22*cm . 

350/1 PRI *e. 
015/ 1 :mrHa *B68-03757 

650/ 1 SUT-L*a *Love . 
----------- -- --------·- -- -----·---

650/2 SUT-L*a *Biography. 

0 

- - --- ------- --- - -----------------------
08 2/1 DDC*a *301.41 /4/0 922 

--;o'Bft~c--=-~ --==~-- ~--~--- - 1 ·~-_--2 -~--~-=i-;_ ~-:-_---;---_ ~ . ~~== 
c. c. 

~- -r11r..~-~1Tl~.b~·-+13~.--~1*~~.--~1~5~.~etmtyr-

- -------------·----- - -- -- M-;-s-- 21-.-l%&-H.-------r-3-;-en-!r-Z*;-aef'-2-5.--- ---

------ ----- -- . - - -- -2-6-; --~.-m--~--T9~--'*l-;-----7r.-----

Fig. 3. Proofsheet of Format Recognition R ecord. 

Microfilming 

For a full-scale retrospective conversion project at the Library of Con-
gress, it is likely that records for input would be microfilmed from the 
Card Division record set and updated from the corresponding records in 
the Library's Official Catalog. A subset of the record set, such as the 
catalog cards for a given year, would be microfilmed and then the appro-
priate records, i.e., English language monographs, German monographs, 
etc., would be selected after filming. Costs were calculated for a base 
figure of 100,000 records for the year 1965, and four different methods of 


164 Journal of Library Automation Vol. 4/3 September, 1971 

microfilming have been estimated as follows by the Library's Photodupli-
cation Service: 1) microfilming for a direct-read optical character reader 
( $2000); 2) microfilming for reader/ printer specifications ( $2350); 3) 
microfilming for reader specifications ( $400); and 4) microfilming for a 
Xerox Copyflo printout of a card overlaid on a 8 x 10)~ worksheet ( $7000). 

The differences in cost are primarily attributable to the type of camera 
used (rotary or planetary) and the kind of feed mechanism (manual or 
automatic). Other factors need to be considered, such as the fact that 
film suitable for OCR requirements could not be used on Xerox Copyflo 
or even for contact printing to positive film. Since a readable copy of the 
original printed card is necessary for updating and proofing, microfilming 
for direct-read OCR would not be a viable alternative. 

Input Devices 

The monitoring of existent input devices was continued with an investi-
gation of Dissly Systems' Scan Data optical character reader. Scan Data 
has been modified, via software, to read 55 different type fonts which are 
recognized by a "best compare" technique using six stored fonts to match 
against the remaining 49. According to the manufacturer, direct-reading 
is accomplished with approximately 95% level of accuracy. Errors are 
recorded during a proofing cycle and corrected in the machine readable 
data base. 

The Scan Data equipment does not have a transport for a 3 x 5 document, 
so that a number of 3 x 5 cards must be attached to an 8 x 14 document 
for scanning, and therefore these cards would not be returned to the 
Library by the manufacturer. Under these conditions, cards to be read 
by Scan Data equipment would have to be obtained from stock rather 
than from the Card Division record set. Unfortunately, many cards are 
out of stock; and of those that are in stock many may be cards reprinted 
several times by photo-offset methods and consequently have a poor image. 
Therefore the use of this device would be severely hampered. 

Fifty good quality cards were submitted to Dissly Systems for an experi-
ment that was run without any modifications to the existing machine and 
software. Five of the 50 cards were returned to the Library with a matching 
printout. The results were not encouraging because many lines of text 
were missed and many characters misread. 

RECON Working Task Force 

The RECON Working Task Force has compiled work statements for 
contractual support for two of its research projects. These projects involve 
investigations on the implications of a national union catalog in machine 
readable form and the possible utilization of machine readable data bases 
other than that of the Library of Congress for use in a national bibliographic 
store. Preliminary tasks related to these projects have been described in 
earlier progress reports ( 6, 7). 


The RECON Pilot Project/ AVRAM and MARUYAMA 165 

The first part of the work statement deals with the products that could 
be derived from the machine readable national union catalog: a biblio-
graphic register, indexes by name, title, and subject, and a register of 
locations. These indexes would provide multiple access points to the 
records in the National Union Catalog. 

The bibliographic register will contain a full bibliographic record on 
each title covered. The indexes will contain partial records which are 
associated with the full records in the register, and a given index file will 
carry one or more partial records for every record in the register. For each 
title in the register, the register of locations lists those libraries where copies 
of the title are held. 

The assumption is made that the indexes under consideration will contain 
the following data elements (the numeric designations and subfield codes 
are those used in the MARC format fields): 

Name Index 
Name ( 100, 110, 111, 400, 410, 411, 600, 610, 611, 700, 710, 711, 800, 

810, 811) 
Short title ( 245) 
Main entry in abbreviated form 
Date (fixed field Date 1) 
Language (fixed field language code) 
LC card number 
Register number 

Title Index 
Short title ( 130, 240, 241, 245, 440, 630, 730, 7 40, 840) 
Main entry in abbreviated form 
Date (fixed field Date 1, or may be omitted if in heading) 
Language (fixed field language code, or may be omitted if in heading) 
LC card number 
Register number 

Subject Index 
Subject heading ( 650, 651) 
Main entry ( 100, 110, or 111) 
Short title (245) 
Date (fixed field Date 1) 
Language (fixed field language code) 
LC card number 
Register number 

The abbreviated form of main entry noted above is to be included in the 
record of the name or title index unless the name itself is carried in the 
main entry of that record. It is defined as follows: 1) for a personal name, 
a conference, or a uniform title heading-subfield "$a" is appended in 
brackets after the title; and 2) for a corporate name-subfield "$a" plus 
the first "$b" subfield are appended, within a single set of brackets, after 
the title. 


166 Journal of Library Automation Vol. 4/3 September, 1971 

The specific objective of this project is to define and investigate alterna-
tive processing schemes associated with an automated National Union 
Catalog. This study will explore and examine these processing schemes 
and the following components: 

1) Techniques for introducing the necessary input into the automated 
NUC svstem. The considerations to be covered include the relation-
ship to' MARC input, use of the format recognition programs, and 
the problems of language in terms of selection of input. 

2) Techniques for structuring or organizing the data contained in the 
register and the various indexes to establish and maintain the rela-
tionships among the records contained in these data bases. 

3) Techniques and procedures connected with the production of the 
products listed above. This investigation will also cover any selection 
and sorting procedures necessary. 

4) Analysis of the format, i.e., graphic design and printing, size, style, 
typographic variation, condensation, etc. 

5) Examination of alternative cumulation patterns associated with the 
products of the system. In this connection, items such as number 
of characters in an average entry, average number of entries on a 
page, expected rate of increase of number of entries in catalog, and 
segmentation of catalog are to be taken into consideration. 

6) Feasibility of producing a register through automation techniques. 
If this can be accomplished, further investigation will be directed 
toward the feasibility and cost of segmenting the register into three 
sections: one produced from machine readable records (English and 
whatever roman alphabet language records are in machine readable 
form); one produced from roman alphabet language records which 
are only in printed form; and one produced from non-roman alphabet 
language records which are only in printed form. 

The costs associated with the various techniques and procedures enumer-
ated above as well as with their components will be calculated. From these 
figures an average total cost per title cataloged is to be determined for each 
alternative processing scheme. These cost values (one per alternative 
scheme ) are to be compared with those associated with a purely manual 
processing scheme. Included in this cost analysis will be the associated 
costs for different forms of hard copy as well as for the use of COM 
(Computer Output Microfilm). 

From any one index and the register of locations, the maximum number 
of alphabetic and numeric lists (registers of location ordered by register 
number) will be determined, taking into account ease of usage and 
technical and economic feasibility. The intent is to have as few lists as 
possible and still keep the cost within reasonable bounds. Supplements 
to the indexes should be issued monthly; supplements to the register of 
locations may be issued monthly or quarterly. 


The RECON Pilot Pro;ectfAVRAM and MARUYAMA 167 

The second project is a continuation of a previous investigation on the 
possible utilization of machine readable data bases other than that pro-
duced by the Library of Congress for use in a national bibliographic store. 
The results of this project should determine if the use of other data bases 
is economically and technically feasible. Using three or four data bases 
selected by the RECON Working Task Force, the study will determine 
the following: 

1) Method and cost of acquiring these other data bases in machine 
readable form. 

2) Analysis of the kinds of programs capable of converting records from 
a number of these data bases into the MARC format. Different level 
data bases might require different kinds of programs. If such an 
effort is deemed feasible, a cost estimate for such a program or array 
of programs will be calculated. 

3) Method and cost of printing the records for examination, corrections, 
etc. 

4) Method and cost of eliminating records already in the MARC data 
base. 

5) Method and cost of comparing these records against the LC Official 
Catalog and making the necessary changes in the data or content 
designators. 

6) Cost for input of additions and corrections. 
7) Method and cost of incorporating the additions and corrections in 

the machine readable file. 
8) Cost of providing means by which these records would not be input 

again by any future LC retrospective conversion effort. 
A result of this project should be a determination as to whether high 
potential or medium potential files, or both, are suitable for conversion. 
A determination will be made of the minimum yield or the minimum 
number of titles needed to justify writing the programs to convert these 
data bases. A factor to be considered is that the number of unique titles 
will decrease as more data bases are converted for this pool of records. 

It was decided that the research tasks to study the problems in dis-
tributing name and subject cross reference control files would be dropped 
because of limitations of time and funds. An additional task, however, 
has been added that can be performed within the time limits of the pilot 
project. During the past year, the Library of Congress Card Division has 
recorded information about card orders in machine readable form. This 
information will be analyzed as to the year and language of the most 
frequent orders because it is assumed that the most popular card orders 
bear a relationship to the potential use of a data base in machine readable 
form by libraries in the field. This study involves the following: 

1) Analysis of a frequency count of LC card orders for a one-year period 
and preparation of a distribution curve for card series. 


168 Journal of Libmry Automation Vol. 4/3 September, 1971 

2) Analysis of a sample of frequently ordered cards to determine with 
fair reliability the proportion of English language titles in this group. 
The sample will be large enough to give an indication of other 
language groups that might be significant for any RECON effort. 

3) Preparation of distribution curves for English language and non-
English titles by card series. 

4) Mathematical analysis of the results of 1) -3) above to arrive at a 
table to show the anticipated utility of converting specified subsets 
of the LC card set. 

OUTLOOK 
Research in input devices has not uncovered any equipment that offers 

a significant technical and cost improvement over the MT /ST currently 
used in the Library of Congress. On-line correction and verification of 
MARC/RECON records will, however, speed conversion and will offer 
relief in the flow of documents and paper work required in a purely batch 
operation. Since MARC/RECON records will be corrected and verified 
in one operation rather than by the cyclic process of the present system, 

· cost savings should be realized. The Library of Congress will have this 
on-line capability through the Multiple Use MARC System. This new 
system is still in the design phase, and a projected date for implementation 
has not yet been set. 

To date investigations in the use of direct-read optical character readers 
have demonstrated that there are no devices currently available capable 
of scanning the LC printed card. 

The format recognition programs are operational, and RECON titles 
in the 1968 card series are being converted without any prior editing of 
the records. Procedures are being implemented to gather the necessary 
data to compare costs of the format recognition technique with costs of 
conversion with human editing. 

Production statistics have shown that retrospective records are more 
costly to convert than current records. This higher cost is attributed to 
the additional tasks in RECON of selecting the subset for input from the 
LC record set and comparing the records with the LC Official Catalog 
for updating. Since cards in the LC record set do not necessarily reflect 
the latest changes made to the cards in the LC Official Catalog, the Official 
Catalog comparison is necessary to ensure that RECON records are as 
up-to-date as the cards in the Official Catalog. 

Although the RECON report ( 8) recommended conversion in reverse 
chronological order with highest priority given to the last ten years of 
English language monograph cataloging, the Working Task Force study 
on the Card Division popular titles may reveal that selective conversion 
is a more practical approach. The orderliness of chronological conversion 
by language does mean that records in machine readable form can be 
ascertained easily. It is interesting, however, to speculate on the use of 


The RECON Pilot Project/ AVRAM and MARUYAMA 169 

these records compared with popular titles which may cross many years 
and languages. 

The MARC/RECON titles constitute the data base for the Phase II 
Card Division Mechanization Project, and close liaison continues to be 
maintained between both projects. It is recognized that the distribution 
of cards and MARC records requires the same computer based bibliographic 
files and has similar hardware and software requirements. Plans are pres-
ently underway to transfer the duplication of tapes for ~.iARC subscribers 
from the Library's IBM 360/40 to the Card Division's Spectra 70 when the 
Phase II system is operational. 

The RECON Pilot Project does not officially end until August 1971. 
In an attempt to make information available as rapidly as possible, the 
preparation of the final report will begin this summer, since several aspects 
of the project are complete enough to be documented. The final report 
will be published by the Library of Congress, and its availability will be 
announced in the LC Information Bulletin and in professional journals. 

ACKNOWLEDGMENTS 

The authors wish to thank the staff members associated with the RECON 
Pilot Project in the MARC Development Office, the MARC Editorial Office, 
the Technical Processes Research Office, and the Photoduplication Service 
of the Library of Congress for their contributions to the project and, 
therefore, to this report. Special thanks are due to Patricia E. Parker of 
the MARC Development Office for her work on the foreign language editing 
experiment and for writing that section of this article. 

REFERENCES 

1. Avram, Henriette D.: "The RECON Pilot Project: A Progress Report," 
Journal of Library Automation, 3 (June 1970), 102-114. 

2. Avram, Henriette D.; Guiles, Kay D.; Maruyama, Lenore S.: "The 
RECON Pilot Project: A Progress Report, November 1969-April 1970," 
Journal of Librm·y Automation, 3 (September 1970), 230-251. 

3. Avram, Henriette D.; Maruyama, Lenore S.: "RECON Pilot Project: 
A Progress Report, April-September 1970," Jow·nal of Library Auto-
mation, 4 ( March 1971 ) , 38-51. 

4. RECON Working Task Force: Conversion of Retrospective Catalog 
Records to Machine-Readable Form: A Study of the Feasibility of a 
National Bibliographic Service (Washington, D.C.: Library of Congress, 
1969 ), 179. 

5. U. S. Library of Congress. Information Systems Office. Format Recog-
nition Process for MARC Records: A Logical Design (Chicago, Ameri-
can Library Association, 1970 ). 

6. Avram , Guiles, Maruyama, op. cit., 248-249. 
7. Avram, Maruyama, op. cit., 49-51. 
8. RECON Working Task Force, op. cit., 11.