
Corporate Author Entry Records 
Retrieved by Use of Derived Truncated 
Search Keys 

Alan L. LANDGRAF, Kunj B. RASTOGI, and Philip L. LONG, 
The Ohio College Library Center. 

An experiment was conducted to design a corporate author index to a large 
bibliographic file. The nature of corporate entries necessitates a different 
search key construction from that of personal names or titles. Derivation 
of a search key to select distinct corporate entry records is discussed. 

INTRODUCTION 

This paper describes the findings of an experiment conducted to design a 
corporate author index to entries in a large file of catalog records at the 
Ohio College Library Center; a companion paper describes findings of a 
similar investigation into retrieval employing a personal author index.1 

The center has operated an on-line, shared cataloging system since August 
1971. In addition to a Library of Congress card number index, the system 
maintains truncated name-title and title index files. The user is thus able 
to retrieve entries employing truncated search keys. Three previous papers 
report results of experiments which led to the design of the name-title and 
title indexes.2-4 

For monographs having personal names as main entries, a truncated 3,3 
search key consisting of the first three letters of the author's name plus the 
first three letters of the first non-English-article word of the title was 
judged to be satisfactory in that this key yielded five or fewer entries per 
query in more than 99 percent of the cases when keys were selected at ran-
dom.5 However, a recent study by Guthrie and Slifko reveals that a model 
which employs random selection of entries yields results closer to actual ex-
perience, and with a higher average number of entries per reply.6 

A search key composed of the first five or four characters of the
surname and the first or first and second initials makes efficient
retrieval possible.7 However, the situation is different in the case of corporate entries 
because many corporate names begin with the same or similar words. For 
example, in the records examined, the initial words of more than 1,300 
publications are "U.S. Congress, House Committee On ...." Obviously a 
type of search key different from that which proved efficient for retrieving 
personal authors is required for retrieval of corporate entries. 

MATERIAL AND METHODS 

The experiment used a file of approximately 200,000 MARC II records 
having a total of 68,169 corporate name entries. Corporate entries were 
extracted from the 110, 111, 410, 411, 710, 711, 810, and 811 fields in the 
records. A program edited the file to extract keys: initial English-language 
articles were removed from each entry; the forms "United States," "U.S.," 
and "U. S." appearing anywhere in the entry were replaced with "US"; and 
"Great Brit." and "Great Britain" were replaced with "Gt Brit". A blank was 
substituted for each subfield delimiter and associated code, and unwanted 
characters such as punctuation, diacritics, and special symbols were 
removed; the program also closed up the space that the unwanted character 
had occupied. One blank replaced multiple blanks. The elements extracted 
consisted of five segments of eight characters each, representing the initial 
eight characters of the first five words of the corporate entry. Segments 
containing fewer than eight characters were padded out with blanks. If a 
corporate name had fewer than five words, the remaining segments were 
blank. 
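
The editing steps described above can be illustrated with a short Python sketch. It is not the program used in the experiment. The article list, the `$x` notation for subfield delimiters, the decision to leave letter case unchanged, and the function name `extract_segments` are all assumptions made for illustration.

```python
import re
import unicodedata

# Assumed list of English articles; the paper does not enumerate them.
ARTICLES = {"a", "an", "the"}

# Country-name forms named in the paper, replaced before punctuation is removed.
REPLACEMENTS = [
    (re.compile(r"United States|U\.\s?S\.", re.IGNORECASE), "US"),
    (re.compile(r"Great Britain|Great Brit\.", re.IGNORECASE), "Gt Brit"),
]

# Hypothetical notation: a subfield delimiter written as "$" plus a one-character code.
SUBFIELD = re.compile(r"\$[a-z0-9]")


def extract_segments(entry, n_segments=5, width=8):
    """Return n_segments blank-padded segments of `width` characters each."""
    text = SUBFIELD.sub(" ", entry)          # blank for delimiter and code
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    # Strip diacritics by decomposing characters and dropping combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove remaining unwanted characters, closing up the space they occupied.
    text = re.sub(r"[^A-Za-z0-9 ]", "", text)
    words = text.split()                     # also collapses multiple blanks
    if words and words[0].lower() in ARTICLES:
        words = words[1:]                    # drop an initial article
    segments = [w[:width].ljust(width) for w in words[:n_segments]]
    segments += [" " * width] * (n_segments - len(segments))
    return segments


print(extract_segments("U.S. Congress. House. Committee on Agriculture"))
```

Replacing the country-name forms before stripping punctuation matters in this sketch: otherwise "U. S." would be split into two one-letter words rather than becoming "US".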

To study a given type of key, the file was sorted on a specified number 
of initial characters of each segment; these initial characters were then 
employed as search keys by a program which sequentially compared the 
characters in the key, counting distinct and identical keys. 
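
A minimal sketch of this counting step follows. The experiment sorted the file and compared adjacent keys sequentially; the sketch substitutes a hash-based tally (`collections.Counter`), which yields the same counts. The function names, the `pad` helper, and the four sample entries are hypothetical.

```python
from collections import Counter


def derive_key(segments, structure):
    """Take the first n characters of each segment, e.g. structure (2, 2, 2, 2, 2)."""
    return tuple(seg[:n] for seg, n in zip(segments, structure))


def key_statistics(entries, structure):
    """Return (distinct keys, distinct keys as percent of entries, max entries per key)."""
    counts = Counter(derive_key(segs, structure) for segs in entries)
    distinct = len(counts)
    return distinct, 100.0 * distinct / len(entries), max(counts.values())


def pad(words, n_segments=5, width=8):
    """Build five blank-padded eight-character segments from a list of words."""
    segs = [w[:width].ljust(width) for w in words[:n_segments]]
    return segs + [" " * width] * (n_segments - len(segs))


# Hypothetical sample entries, already edited into words.
sample = [
    pad(["US", "Congress", "House", "Committee", "on"]),
    pad(["US", "Congress", "House", "Committee", "on"]),
    pad(["US", "Congress", "Senate", "Committee", "on"]),
    pad(["Ohio", "College", "Library", "Center"]),
]

print(key_statistics(sample, (2, 2, 2, 2, 2)))  # (3, 75.0, 2)
print(key_statistics(sample, (2, 2, 0, 0, 0)))  # (2, 50.0, 3)
```

In the sample, shortening the key from five segments to two merges the House and Senate entries under one key, which is the effect the experiment measures on the full file.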

RESULTS AND DISCUSSION 

Table 1 presents the number of distinct keys and the maximum number 
of occurrences of identical keys for the structures studied in the experi-
ment. The larger the number of distinct keys for a fixed number of en-
tries in the file, the better the key will be for retrieval purposes. Given two 
search keys which are more or less equally specific, the one which is sim-
pler to use is preferable. 

The peculiarity of corporate-entry keys can be observed from Table 1. 
Even for the 8,8,8,8,8 key structure the percentage of distinct keys (33.7 
percent) is low, and the maximum number of occurrences of an identical key 
(1304) is high. Table 1 also shows that as the key 
structure goes from five to three segments, there is a steady decrease in the 
percentage of distinct keys and consequently an increase in the maximum 
number of entries per key. However, a reduction in the number of 
characters in a segment does not cause a great deal of deterioration. For 
example, for 8,8,8,8,8 keys, the percentage of distinct keys and the maximum 
number of entries per key are respectively 33.7 percent and 1304, while for 
2,2,2,2,2 keys, the corresponding figures are 32.3 percent and 1307. 

Thus, the 2,2,2,2,2 key structure seemed a good candidate for a corporate 
entries index and therefore the number of entries per reply for this key 
structure was more intensely studied. 

Journal of Library Automation Vol. 6/3 September 1973 

Table 1. Number of Distinct Keys and Maximum Number of Identical 
Entries Per Key for Different Key Structures in 68,169 MARC II 
Records. 

                                Number of Distinct Keys
Key          Number of          as a Percent of Total      Maximum Number
Structure    Distinct Keys      Number of Records          of Entries Per Key

8,8,8,8,8    22982              33.7                       1304
8,8,8,8,0    20476              30.0                       1305
8,8,8,0,0    16283              23.9                       1802
4,2,2,2,2    22411              32.9                       1307
4,2,2,2,1    22120              32.4                       1308
4,2,2,2,0    19513              28.6                       1311
4,2,2,1,0    18589              27.3                       1311
4,2,2,0,0    14801              21.7                       1807
3,3,2,2,2    22417              32.9                       1307
3,3,2,2,1    22132              32.5                       1308
3,3,2,2,0    19560              28.7                       1311
3,3,2,1,0    18654              27.4                       1311
3,3,2,0,0    14922              21.9                       1806
2,2,2,2,2    22053              32.3                       1307
2,2,2,2,1    21743              31.9                       1308
2,2,2,2,0    19034              27.9                       1311
2,2,2,1,0    18036              26.5                       1311
2,2,2,0,0    13842              20.3                       1807
1,1,1,1,1    19028              27.9                       1308

On the average it is desirable that the number of replies per query be 
such that information by which the user can choose among the possible 
replies can be displayed on a single CRT screen. This maximizes the utility 
of a computer system, since it minimizes the amount of system activity to 
promptly satisfy a user's request. Since some query keys produce but one 
reply while others produce hundreds of candidate records, it is necessary 
to use the mathematics of probability to determine the likely long-term 
effect of a given choice of system parameters. 

Table 2. Average Number of Entries Per Reply for Key Structure 2,2,2,2,2 
for Various Multiplicity of Entries. 

Maximum Frequency                                     Number of        Average Number
of Any Entries       Total Records    Percent of      Distinct Keys    of Entries
in File              in File          Total Records   Eliminated       Per Reply

19                   44174            64.8            389              5.0
29                   48127            70.6            223              6.6
39                   50854            74.6            142              8.1
49                   52422            76.9            107              9.1
59                   53513            78.5            87               10.1

Using the approach indicated as useful by Guthrie and Slifko, the analysis 
of the effect of various choices of search key becomes the following. 

Assume that every entry has an equal probability of being accessed. 
Then, in attempting to retrieve each entry once, a key having $i$ entries 
will cause a total of $i^2$ entries to be accessed, since each of its $i$ 
entries is retrieved once and each retrieval returns all $i$ entries. If 
$f_i$ denotes the frequency of keys having $i$ entries and $M$ denotes the 
maximum allowable occurrences of any key in the file, the average number of 
entries per reply, $\bar{y}$, is given by: 

    $$\bar{y} = \frac{\sum_{i=1}^{M} i^2 f_i}{\sum_{i=1}^{M} i f_i}$$

where $\sum_{i=1}^{M} i f_i$ is the number of entries in the file whose 
derived keys have a frequency of $M$ or less. 

The above formula yields the average number of entries per reply for 
the 2,2,2,2,2 key to be much larger than 20 for M > 100; but some 2,2,2,2,2 
keys corresponded to more than 500 file entries. A typical CRT display ter-
minal can accommodate only ten or fewer entries per screen. Therefore, 
if the average number of entries per reply is desired to be ten or fewer, 
it is necessary either to ignore entries with high multiplicity or to adopt a 
different scheme of storing and retrieving such items, in which case the 
mathematical result would be the same as ignoring high-frequency items. 
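
The averaging formula can be checked numerically with a short sketch. The function name `average_entries_per_reply` and the small frequency distribution are hypothetical and are not data from the experiment. Keys with more than `max_allowed` entries are simply left out, which the text notes is mathematically equivalent to storing them elsewhere.

```python
def average_entries_per_reply(freq, max_allowed):
    """Average reply size when every retained entry is equally likely to be sought.

    freq maps i (number of entries sharing one key) to f_i (number of such keys).
    Keys with more than max_allowed entries are ignored.
    """
    kept = {i: f for i, f in freq.items() if i <= max_allowed}
    total_entries = sum(i * f for i, f in kept.items())      # sum of i * f_i
    if total_entries == 0:
        raise ValueError("no entries remain under this limit")
    entries_accessed = sum(i * i * f for i, f in kept.items())  # sum of i^2 * f_i
    return entries_accessed / total_entries


# Hypothetical distribution: three keys with 1 entry, one with 2,
# one with 3, and one heavily used key with 50 entries.
freq = {1: 3, 2: 1, 3: 1, 50: 1}

print(average_entries_per_reply(freq, 3))   # 2.0
print(average_entries_per_reply(freq, 50))  # about 43.4
```

A single key with 50 entries raises the average from 2.0 to about 43.4 in this example, which shows why a few high-multiplicity keys dominate the result.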

The average number of entries per reply was computed for five different 
values of M (19, 29, 39, 49, and 59); the results of these computations are 
in Table 2, which reveals that if keys in the file are allowed a maximum 
recurrence of 39 entries per key, it would be possible to have keys in the 
main index for about 75 percent of total records, while entries for only 
142 high-frequency keys would have to be shunted to a secondary index. 
In this case, the average number of entries per reply would be about eight. 
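
The trade-off between the cutoff M and the share of the file left in the main index can be tabulated with a sketch such as the following. The function name `index_coverage` and the frequency distribution are hypothetical; the experiment's full distribution is not reproduced here.

```python
def index_coverage(freq, max_allowed):
    """Return (entries retained, percent of all entries retained, keys eliminated).

    freq maps i (number of entries sharing one key) to f_i (number of such keys).
    """
    total = sum(i * f for i, f in freq.items())
    retained = sum(i * f for i, f in freq.items() if i <= max_allowed)
    eliminated_keys = sum(f for i, f in freq.items() if i > max_allowed)
    return retained, 100.0 * retained / total, eliminated_keys


# Hypothetical distribution, not taken from the experiment.
freq = {1: 3, 2: 1, 3: 1, 50: 1}

for m in (1, 3, 50):
    retained, percent, eliminated = index_coverage(freq, m)
    print(m, retained, round(percent, 1), eliminated)
```

Each printed row corresponds to the first four columns of a Table 2 row: the cutoff, the records kept in the main index, their share of the whole file, and the number of distinct keys moved to a secondary index.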

Table 3 gives the probability of number of entries per reply for the 
index file consisting of 50,854 (out of a total of 68,169) records with the 
maximum frequency of any key in the file being 39. For preparing this 
table the assumption is made that each entry in the file has an equal 
probability of being accessed. Thus the probability of obtaining $i$ entries 
per reply is given by: 

    $$P(i) = \frac{i f_i}{\sum_{j=1}^{M} j f_j}$$

where $f_i$ is the frequency of keys occurring exactly $i$ times in the 
index file and $M$ is the maximum frequency of any key in that file (here 
39). An inspection of this table shows that in 87.7 percent of the 
time there would be 20 or fewer replies. This represents two screensful of 
information on a typical CRT display. 

Table 3. Probability of Number of Entries Per Reply for an Index File 
Using 2,2,2,2,2 Key. 

Number of                  Probability    Cumulative Probability
Entries       Frequency    Percentage     Percentage

 1            14820        29.1            29.1
 2             2893        11.4            40.5
 3             1276         7.5            48.0
 4              726         5.7            53.7
 5              427         4.2            57.9
 6              312         3.7            61.6
 7              248         3.4            65.0
 8              195         3.1            68.1
 9              150         2.6            70.7
10              120         2.4            73.1
11               78         1.7            74.8
12               88         2.1            76.9
13               56         1.4            78.3
14               71         1.9            80.2
15               62         1.9            82.1
16               48         1.5            83.6
17               41         1.3            84.9
18               28         1.0            85.9
19               24         0.9            86.8
20               22         0.9            87.7
21               18         0.7            88.4
22               16         0.7            89.1
23               23         1.1            90.2
24               25         1.1            91.3
25               13         0.7            92.0
26                9         0.4            92.4
27               12         0.7            93.1
28               18         1.0            94.1
29               10         0.5            94.6
30               11         0.7            95.3
31               11         0.7            96.0
32               13         0.8            96.8
33                6         0.4            97.2
34                9         0.6            97.8
35                7         0.4            98.2
36                6         0.5            98.7
37               11         0.8            99.5
38                5         0.3            99.8
39                2         0.2           100.0
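
The probability calculation behind Table 3 can be sketched as follows, under the same assumption that every entry is equally likely to be sought. The function name `reply_size_distribution` and the three-value frequency distribution are hypothetical and serve only to make the arithmetic easy to follow.

```python
from itertools import accumulate


def reply_size_distribution(freq):
    """Return rows of (i, P(i), cumulative probability) for each reply size i.

    freq maps i (number of entries sharing one key) to f_i (number of such keys).
    P(i) = i * f_i / sum_j (j * f_j): the share of all entries that lie under
    keys returning exactly i entries.
    """
    total_entries = sum(i * f for i, f in freq.items())
    sizes = sorted(freq)
    probabilities = [i * freq[i] / total_entries for i in sizes]
    cumulative = list(accumulate(probabilities))
    return list(zip(sizes, probabilities, cumulative))


# Hypothetical distribution: three keys with 1 entry, one with 2, one with 3.
for size, p, cum in reply_size_distribution({1: 3, 2: 1, 3: 1}):
    print(size, p, cum)
```

Note that the weighting is by entries, not by keys: in the example only one key in five has three entries, yet a query returns three entries 37.5 percent of the time, because three of the eight entries sit under that key.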

CONCLUSION 

A file containing only those entries for which the frequency of the 
2,2,2,2,2 search key is 39 or fewer would produce 20 or fewer entries per 
reply approximately 88 percent of the time, but such a file excludes the 142 
high-frequency keys, which account for 17,315 of the total of 68,169 entries. 
Therefore, a special technique for handling corporate-entry derived keys of 
high multiplicity is desirable. 

REFERENCES 

1. A. L. Landgraf and F. G. Kilgour, "Catalog Records Retrieved by Personal Author 
Using Derived Search Keys," Journal of Library Automation 6:103-8 (June 1973). 

2. F. G. Kilgour, P. L. Long, and E. B. Leiderman, "Retrieval of Bibliographic 
Entries from a Name-Title Catalog by Use of Truncated Search Keys," Proceedings 
of the American Society for Information Science 7:79-82 (1970). 

3. F. G. Kilgour, P. L. Long, E. B. Leiderman, and A. L. Landgraf, "Title-Only 
Entries Retrieved by the Use of Truncated Search Keys," Journal of Library 
Automation 4:207-10 (Dec. 1971). 

4. P. L. Long and F. G. Kilgour, "A Truncated Search Key Title Index," Journal of 
Library Automation 5:17-20 (March 1972). 

5. Kilgour, Long, Leiderman, "Retrieval of Bibliographic Entries." 

6. G. D. Guthrie and S. D. Slifko, "Analysis of Search Key Retrieval on a Large 
Bibliographic File," Journal of Library Automation 5:96-100 (June 1972). 

7. Landgraf and Kilgour, "Catalog Records Retrieved."