CREATION OF COMPUTER INPUT IN AN EXPANDED CHARACTER SET

Donald V. BLACK: System Development Corporation, Santa Monica, California (Formerly, University of California, Santa Cruz, Calif.)

Keypunching of an expanded character set for library catalog data is described. The set included 101 different characters. Source documents were shelf list cards, the master record at the University of California Library, Santa Cruz. At the end of February, 1967, some 50 million characters, representing more than 110,000 separate titles, had been punched. Some of the considerations leading to the adoption of this method for the creation of machine readable input are given, together with details on costs and production rates.

For manipulation by a computer, data must be converted to machine readable form. There are still only a few reasonably flexible means of creating machine readable records, especially if the data include an expanded character set. Five possible methods utilize one of the following: standard keypunch, paper tape-producing typewriter, optical character reader, keyboard device that encodes directly onto magnetic tape, or keyboard terminal that inputs directly into a computer. Descriptions of some of these methods are available in the literature. The Johns Hopkins University (1) used optical character recognition, which can handle a full alphanumeric representation, whereas Southern Illinois (2) used mark sense scanning to convert only a limited amount of information. Cartwright (3) and IBM (4) discuss direct computer input from a keyboard terminal. Buckland (5) discusses the use of the paper tape-producing typewriter. Hammer (6) and Kilgour (7) discuss keypunching. Patrick (8) discusses several methods of conversion, but only in the abstract. Chapin (9) presents the results of a comparison test of the first three methods above. This paper does not discuss the relative merits of these methods, but
presents the details of a system that has converted approximately 50 million characters of library catalog data in more than 20 languages to a set of 101 characters.

The University of California at Santa Cruz is one of three university campuses recently established by the State. It opened for business in the fall of 1965 with a core collection of some 55,000 titles in approximately [...] volumes. Early in the operation of the Library, it was decided to mechanize as much as possible; therefore the existing catalog records had to be converted if the original collection were to be a part of the future machine system. The creation of the core collection for the three new campuses of the University of California has been described in the literature (10).

METHODS

Bids were sought to convert the catalog records during the summer of 1965. The shelf list record produced by the new campuses' project was the master record and was to be the source for conversion. Unfortunately, the shelf list consisted of both printed Library of Congress cards and cards produced at the new campuses' project from typewritten multilith masters. No editing was to be done on the shelf list cards. The only addition was the stamping of an arbitrary number using a five-digit automatic numbering machine, the purpose of the number being to keep individual punch cards together for each entry.

Weighing the responses to the request for bids was a disheartening experience. Only four responses were received from a total of 15 requests sent out. The bid request did not specify the method to be used to convert to machine readable form, but only the resulting machine readable record. Since the specifications had used punch cards as an example, perhaps this limited the thinking of some of the organizations involved, with the result that they did not choose to bid.

Three bids were based on keypunching.
One was from Florida, and the complexities of the task made the choice of such a distant company impossible. If problems had arisen during the course of the conversion, travel costs would have been excessive.

Another response estimated the cost to be about $1.50 per record. Clearly, this was too costly, and since bids of this nature are apt to be conservative in the matter of ultimate total costs, we felt the choice of such an organization to do the job would, indeed, result in a target figure that would be too high.

Only one bid used optical scanning as the method of conversion. Unfortunately, the bid was for the scanning only, and Library staff members would have had to retype the records for the scanner. Since the cost of the scanning alone was close to 30¢ a title, that bid was also rejected as being ultimately more costly.

The choice of a keypunching service in San Francisco was made on the basis of its proximity to Santa Cruz, on the enthusiasm of the bidder for the task to be undertaken, and on a reasonable cost estimate.

[...] because it was available on an IBM/1401 computer at the Los Angeles campus of the University (UCLA). At that time, it was the only [...] with such a printer on the West Coast. The character set had been selected by librarians at UCLA from characters offered by IBM in 1964 for the 1403 printer.

[...] for the special characters is described in the tables. There are
[...] of the character; obtaining a centered minus requires a multiple punch (11. ). The underscore prints in a space by itself, just as do other characters. It requires special programming to overprint this character by suppressing paper spacing. The virgule overprint requires two columns to punch. Sharp-eyed readers will notice that the virgule appears twice in Figure 1, and it has been counted twice for the total of 101 characters. The blank has also been counted as a character, but the black square, which was not used at Santa Cruz, was not counted.

All data elements were encoded in fixed card fields; that is, the field for each type of information had a fixed length, generally 300 characters. It was not necessary, however, to use the entire field or to fill it with zeros or other codes. No terminating characters were used to separate the fields. Each type of information was included on one or more cards bearing a code which would tell the computer precisely what type of [...]
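The fixed-field scheme described above can be illustrated with a minimal sketch. The text confirms only the five-digit entry number that keeps the cards of one entry together, a code on each card identifying the type of information, the nominal 300-character field length, and the absence of padding codes and terminators. Everything else here is an assumption made for illustration: the column positions, the two-character type code, the one-digit continuation counter, and the names `punch_field` and `read_field` are hypothetical and are not taken from the Santa Cruz card layout.

```python
CARD_WIDTH = 80      # columns on a standard punch card
ID_WIDTH = 5         # five-digit entry number stamped on the shelf list card
TYPE_WIDTH = 2       # ASSUMED width of the code naming the type of information
SEQ_WIDTH = 1        # ASSUMED continuation counter for multi-card fields
DATA_WIDTH = CARD_WIDTH - ID_WIDTH - TYPE_WIDTH - SEQ_WIDTH  # 72 data columns
MAX_FIELD = 300      # nominal fixed field length given in the text


def punch_field(entry_no, type_code, text):
    """Split one data element into 80-column card images (hypothetical layout)."""
    if not 0 <= entry_no < 10 ** ID_WIDTH:
        raise ValueError("entry number must fit in five digits")
    if len(type_code) != TYPE_WIDTH:
        raise ValueError("type code must be exactly %d characters" % TYPE_WIDTH)
    if len(text) > MAX_FIELD:
        raise ValueError("field exceeds %d characters" % MAX_FIELD)

    # Cut the text into card-sized pieces; an empty field still yields one card.
    chunks = [text[i:i + DATA_WIDTH] for i in range(0, len(text), DATA_WIDTH)] or [""]

    cards = []
    for seq, chunk in enumerate(chunks, start=1):
        prefix = f"{entry_no:0{ID_WIDTH}d}{type_code}{seq:0{SEQ_WIDTH}d}"
        # Unused columns stay blank: no zero fill and no terminating character.
        cards.append((prefix + chunk).ljust(CARD_WIDTH))
    return cards


def read_field(cards):
    """Reassemble a data element from its cards, in continuation order."""
    seq_start = ID_WIDTH + TYPE_WIDTH
    ordered = sorted(cards, key=lambda c: c[seq_start:seq_start + SEQ_WIDTH])
    # Trailing blanks on the last card are unused columns, so they are dropped.
    return "".join(c[CARD_WIDTH - DATA_WIDTH:] for c in ordered).rstrip()
```

Under these assumptions a 100-character title punched with `punch_field(123, "TI", title)` occupies two cards, both beginning with the entry number `00123`, and `read_field` recovers the title without needing any delimiter, because the type code and the fixed column positions say where the data begins.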
There are basically two ways that information can be encoded into cards. This is discussed in references (3) and (6) especially. To use a completely variable format it is necessary to have field delimiting codes. [...] fixed sequence of data elements is established (e.g., author, title, pub[...]. If a number of individual codes are to be used to delimit fields, [...]