AN ALGORITHM FOR VARIABLE-LENGTH 
PROPER-NAME COMPRESSION 

257 

James L. DOLBY: R & D Consultants Company, Los Altos, California 

Viable on-line search systems require reasonable capabilities to automa-
tically detect (and hopefully correct) variations between request format 
and stored format. An important requirement is the solution of the prob-
lem of matching proper names, not only because both input specificatiof.I,S 
and storage specifications are subject to error, but also because various 
transliteration schemes exist and can provide variant proper name forms in 
the same data base. This paper reviews several proper name matching 
schemes and provides an updated version of these schemes which tests out 
nicely on the proper name equivalence classes of a suburban telephone 
book. An appendix lists the corpus of names used for algorithm test. 

A viable on-line search system cannot reasonably assume that each user 
will invariably provide the proper input information without error. Human 
beings not only make errors, but also expect their correspondents, be they 
human or mechanical, to be able to cope with these errors, at least at some 
reasonable error-rate level. Many of the difficulties in implementing com-
puter systems in many areas of human activity stem from failure to rec-
ognize, and plan for, routine acceptance of errors in the systems. Indeed, 
computing did not become the widespread activity it is now until the so-
called higher-level languages came into being. Although it is customary 
to think of higher-level languages as being "more English-like," the height 
of their level is better measured by the brevity with which various jobs 
can be expressed (for brevity tends to reduce errors) and the degree of 
sophistication of their automatic error detection and correction procedures. 

The processing of catalog information for the purposes of exposing and 
retrieving information presents at least two major areas for research in 
automatic error detection and correction. At the first stage, the data bank 
must be created, updated and maintained. Methods for dealing with input 
errors at this level have been derived by a number of groups and it seems 
reasonable to assert that something in the order of 60% of the input errors 
can be detected automatically ( 1,2,3 ). With the possibility of human proof-


258 Journal of Library Automation Vol. 3/4 December, 1970 

reading and error detection through actual use, it is reasonable to expect 
a mature data base to have a very low over-all error rate. 

At the second stage, however, when a user approaches the data base 
through a terminal or other on-line device, the errors will be of a recurring 
nature. Each user will generate his own error set and, though experience 
will tend to minimize the error rate for a particular user, there will be 
an essentially irreducible minimum error rate even for an experienced 
user. If the system is to attract users other than professional interrogators, 
it must respond intelligently at this minimal error level. 

This paper explores certain problems associated with making "noisy 
matches" in catalog searches. Because preliminary information indicates 
that the most likely source of input errors is in the keyboarding of proper 
names, the main emphasis of the paper is on the problem of algorithmically 
compressing proper names in such a way as to identify similar names (and 
likely misspellings) without over-identifying the list of possible authors. 

EXISTING NAME-COMPRESSION ALGORITHMS 
The problem of providing equivalence classes of proper names is hardly 

new. Library catalogs, telephone directories and other major data bases 
have made use of "see-also"-type references for many years. Some years 
ago Remington-Rand derived an alphanumeric name compression algor-
ithm, SOUNDEX, that could be applied either by hand or by machine 
for such purposes ( 4). Perhaps the most widely used on-line retrieval 
system presently in existence, the airline reservation system (such as 
SABRE), makes use of such an algorithm (5). The closely related problem 
of compressing English words (either to establish noisy matches, to elimi-
nate misspelled words, or simply to achieve data bank compression) has 
also received some attention ( 6, 7, 8). Implementation of such algorithms 
has been described ( 9, 10, 11, 12, 13). 

Although English word structure differs from proper-name structure in 
some important respects (e.g., the existence of suffixes), three of the 
algorithms are constructed by giving varying degrees of attention to the 
following five areas of word structure: 

1 ) The character in word initial position; 
2) The character set: (A, E, I, 0, U, Y, H, W); 
3) Doubled characters (e.g., tt); 
4) Transformation of consonants (i.e., all alphabetic characters other 

than those in 2 above) into equivalence classes; 
5) Truncation of the residual character string. 
The word-initial character receives varying attention. SOUNDEX places 

the initial consonant in the initial position of the compressed form and 
then transforms all other consonants into equivalence classes with numeric 
titles. SABRE maintains the word-initial character even if it is a vowel. 
In the Armour Research Foundation scheme (ARF), the word-initial 
character is also retained as is. 


Algorithm for Name CompressionjDOLBY 259 

Both SOUNDEX and SABRE eliminate all characters in the set 2) 
above. The ARF scheme retains all characters in shorter words and deletes 
vowels only, to reduce the compressed form to four characters, deleting 
the "U" after "Q," the second vowel in a vowel string, and then all re-
maining vowels. 

All three systems delete the second letter of a double-letter string. 
SABRE goes a step further and deletes the second letter of a double-
letter string occurring after the vowels have been deleted. Thus, the 
second "R" of "BEARER" would be deleted. 

SOUNDEX maps the eighteen consonants into six equivalence classes: 
1) B, F, P, V 
2) C, G, J, K, Q, S, X, Z 
3) D, T 
4) L 
5) M, N 
6) R 

SABRE and ARF do not perform any transformations on these eighteen 
consonants. 

Finally, all three systems truncate the remaining string of characters 
to four characters. For shorter forms, padding in the form of zeros 
(SOUNDEX), blanks (SABRE), or hyphens (ARF) is added so that all 
codes are precisely four characters long. 

Variable-length coding schemes have been considered but generally 
rejected for implementation on major systems because of the attendant 
difficulties of programming and the fact that code compression is en-
hanced by fixed-length codes where no interword space is necessary. 
Although fixed-length schemes of length greater than four have been 
considered, no definitive data appears to be available as to the enhanced 
ability of compressed codes to discriminate by introduction of more 
characters. The SABRE system does add a fifth character but makes use 
of the person's first initial for added discrimination. 

Tukey ( 14) has constructed a personal author code for his citation 
indexing and permuted title studies on an extensive corpus of the statistical 
literature. In this situation the author code is a semi-mnemonic code in a 
tag form to assist the user in identification rather than to be used as a 
basic entry point. However, Tukey does note that in his corpus a three-
character code of the surname, plus two initials, is superior to a five-
character surname code for purposes of unique identification. 

MEASURING ALGORITHMIC PERFORMANCE 

One of the main problems in constructing linguistic algorithms is to 
decide on appropriate measures of performance and to obtain data bases 
for implementing such measures. In this case it is clear that certain 
improvements in existing algorithms can be made- particularly by using 
more sophisticated b·ansformation rules for the consonants - and that 


260 Journal of Librat·y Automation Vol. 3/4 December, 1970 

the problems of implementing such changes are not so great in today's 
context as they were when the systems noted above were originally derived. 
Improvements in processing speeds and programming languages, how-
ever, do not remove the need for keeping "linguistic frills" to a minimum. 

Ideally, it would be desirable to have a list of common errors in key-
boarding names as a test basis for any proposed algorithms. Unfortunately, 
no such list of sufficient size appears to be available. Lacking this, one can 
speculate that certain formal properties of the predictability of language 
might be useful in deriving an algorithm. At the English word level, some 
effort has been made to exploit measures of entropy as developed by 
Shannon in this direction (6, 7). However, there is good reason to question 
whether entropy, at least when measured in the usual way, is strongly 
correlated with actually occurring errors ( 15). 

As an alternative, one can study existing lists of personal-name equiva-
lence classes to derive such algorithms and then test the algorithm against 
such classes, measuring both the degree of over-identification and the de-
gree of under-identification. Clearly, such tests will carry more weight if 
they are conducted under economic forcing conditions where weaknesses 
in the test set will lead to real and measurable expense to the organization 
publishing the list. The SABRE system operates under strong economic 
forcing conditions in the sense that airline passengers frequently have a 
number of competitive alternatives available to them and lost reservations 
can cause sufficient inconvenience for them to consider these alternatives. 
However, the main application of the SABRE system is to rather small 
groups of persons (at least when compared to the number of personal 
authors in a typical library catalog), so that errors of over-identification 
are essentially trivial in cost to the airlines. 

A readily available source of "see-also"-type equivalence classes of 
proper names is given in the telephone directory system. Here, the eco-
nomic forcing system is not so strong as in the airline situation, but it is 
measurable in that failure to provide an adequate list will lead to increased 
user dependence on the Information Operator, with consequent increased 
cost to the telephone company. As a test of the feasibility of using such a 
set of equivalence classes, the 451 classes found in the Palo Alto-Los Altos 
(California) telephone directory were copied out by hand and used in 
deriving and testing the algorithm given in the next section and the 
SOUNDEX algorithm. 

There remains the question of deciding what is to constitute proper 
agreement between any algorithm and the set of equivalence classes chosen 
as a data base. At the grossest level it seems reasonable to argue that over-
identification is less serious than under-identification. False drops only tend 
to clog the line. Lost reference points, on the other hand, lead to lost in-
formation. Investigation of other applications of linguistic algorithms, such 
as algorithms to hyphenate words, identify semantically similar words 
through cutting off of suffixes, and so forth, indicates that it is usually 


Algorithm for Name CompressionjDOLBY 261 

possible to reduce crucial error (in this case under-identification) to some-
thing under 5%, while preserving something in the order of 80% of the 
original distinctions (or efficiency) of the system. Efforts to improve 
materially on the "five-and-eighty" rule generally lead to solutions involv-
ing larger context and/or extensive exception dictionaries. In this study 
efforts are directed at achieving a "five-and-eighty" solution. 

A VARIABLE-LENGTH NAME-COMPRESSION SCHEME 
In light of the fact that no definitive information is available on the 

problems of truncating errors in name-compression algorithms, it is con-
venient to break the problem into two pieces. First is derivation of a 
variable-length algorithm of the required accuracy and efficiency and 
then determination of the errors induced by truncation. 

A studying of the set of equivalence classes given in the Palo Alto-Los 
Altos telephone directory made fairly clear that with minor modifications 
of the basic five steps used in the other algorithms noted above, it would 
not be too difficult to provide a reasonably accurate match without requir-
ing too much over-identification. The main modifications made consisted 
of maintaining the position of the first vowel and using local context to 
make transformations on the consonants. The algorithm is given below. 
(The rules given must be applied in the order given both with respect to 
the rules themselves and to the order of the lists within the rules, as the 
precedence relations are important to the performance of the algorithm.) 

A Spelling Equivalent Abbreviation Algorithm For Personal Names 

1) Transform: "MeG" to "Mk", "Mag" to "Mk", "Mac" to "Mk", "Me" to 
"Mk". 

2) Working from the right, recursively delete the second letter from 
th f II . I tt · "dt" "ld" " d" " t" " " " d" " t" " '' e o owmg e er parrs: , , n , n , rc , r , r , sc , 
"sk", "st''. 

3) T f ,, , t "k ,, (( , t 1.( , " ., t " ., " , t " ,~ ,, rans orm: x o s , ce o se , c1 o s1 , cy o sy , con-
sonant-ch" to "consonant-sh"; all other occurrences of "c" to "k", "z" 
to "s", "wr" to "r", "dg" to "g", "qu" to "k'', "t" to "d", "ph" to 
"f' (after the first letter). 

4) Delete all consonants other than "1", "n", and Y' which precede the 
letter "k" (after the first letter). 

5) Delete one letter from any doubled consonant. 
6) Transform "pf#" to "p#", "#pf" to "#f", "vowel-gh#" to "vowel-£#", 

"consonant-gh" to "consonant-g#", and delete all other occurrences 
of "gh". ("#"is the word-beginning and word-ending marker.) 

7) Replace the first vowel in the name by the symbol "•". 
8) Delete all remaining vowels. 
9) Delete all occurrences of "w" or "h" after the first letter in the word. 

The vowels are taken to be (A, E, I, 0, U, Y) . The remaining literal 
characters are treated as consonants. 


262 Journal of Library Automation Vol. 3/4 December, 1970 

The algorithm splits 22 ( 4.9%) of the 451 equivalence classes given by 
the phone directory. On the other hand, the algorithm provides 349 dis-
tinct classes (not counting those classes that were broken off in error) or 
77.4% of the 451 classes in the telephone directory data base. Thus has 
been achieved a reasonable approximation to the "five-and-eighty" per-
formance found in other linguistic problem areas. 

To give a proper appreciation of the nature of these underidentification 
errors, they are discussed below individually. 

1) The name Bryer is put in the same equivalence class with a variety 
of spellings of the name Bear. The algorithm fails to make this 
identification. 

2) Blagburn is not equated to Blackburn. 
3) The name Davison is equated to Davidson in its various forms. 

The algorithm fails to make this identification and this appears 
to be one of a modest class of difficulties that occur prior to the 
-son, -sen names. 

4) The class of names Dickinson, Dickerson, Dickison, and Dickenson 
are all equated by the directory but kept separate, except for the 
two forms of Dickinson, by the algorithm. 

5) The name Holm is not equated with the name Home. 
6) The name Holmes is not equated with the name Homes. 
7) The algorithm fails to equate Jaeger with two forms of Yaeger. 
8) The algorithm fails to equate Lamb with Lamn. 
9) The algorithm incorrectly assumes that the final "gh" of Leigh 

should be treated as an "f." Treating final "gh" either as a null 
sound or an "f' leads to about the same number of errors in either 
direction. 

10) The algorithm fails on the pairing of Leicester and Lester. The 
difficulty is an intervening vowel. 

11) The algorithm fails to equate the various forms of Lindsay with 
the forms of Lindsley. 

12) The algorithm fails to equate the various forms of McLaughlin 
with McLachlan. 

13) The algorithm fails to equate McCullogh with McCullah. This is 
again the final "gh" problem. 

14) The algorithm fails to equate McCue with McHugh (again the 
final "gh" problem) . 

15) The algorithm fails to equate Moretton with Morton. This is an 
intervening vowel problem. 

16) The algorithm fails to equate Rauch with Roush. 
17) The algorithm fails to equate Robinson with Robison (another -son 

type problem). 
18) The algorithm incorrectly assumes that the interior "ph" of Shep-

herd is an "£." 
19) The algorithm fails to equate Speer with Speier. 


Algorithm for Name CompressionjDOLBY 263 

20) The algorithm fails to equate Stevens with Stephens. 
21) The algorithm fails to equate Stevenson with Stephenson. 
22) The algorithm fails to equate the various forms of the word Thomp-

son (an -son problem.) 
In several of the errors noted above it may be questioned whether the 

telephone directory is following its own procedures with complete rigor. 
Setting these aside, the primary errors occur with the final "gh," the words 
ending in "son," and the words with the extraneous interior vowels. Each 
of these problems can be resolved to any desired degree of accuracy, but 
only at the expense of noticeable ·increases in the degree of complexity 
of the algorithm. 

THE TRUNCATION PROBLEM 
Simple truncation does not introduce errors of under-identification; it 

can only lead to further over-identification. Examination of the results of 
applying the algorithm to the telephone directory data base shows that 
no new over-identification is introduced if the compressed codes are all 
reduced to the leftmost seven characters. Further truncation leads to the 
following results: 
Code Length 

7 
6 
5 
4 

Cumulative Over-Identification Losses 

0 
1 
6 

45 
Thus there is a strong argument for maintaining at least five characters 
in the compressed code. 

However, there is no real need for restriction to simple truncation. 
Following the procedures used in the ARF system, further truncation can 
be obtained by selectively removing some of the remaining characters. 
The natural candidate for such removal is the vowel marker. If the vowel 
marker is removed from all the five character codes, only six more over-
identification errors are introduced. Removal of the vowel markers from 
all of the codes would have introduced 17 more errors of over-identification. 
The utility of the vowel marker is in the short codes. This in turn suggests 
that introduction of a second vowel marker in the very short codes may 
have some utility, and this is indeed the case. If the conception of vowel 
marker is generalized as marking the position of a vowel-string (i.e., a 
string of consecutive vowels), where for these purposes a vowel is any 
of the characters (A, E, I, 0, U, Y, H, W), and these markers are main-
tained as "padding" in the very short words, 18 errors of over-identification 
are eliminated at the cost of two new errors of under-identification. In this 
way the following modification to the variable length algorithm is derived: 

1) Mark the position of each of the first two vowel strings with an 
"o ," if there is more than one vowel. 


264 Journal of Library Automation Vol. 3/4 December, 1970 

2) Truncate to six characters. 
3) If the six-character code has two vowel markers, remove the right-

hand vowel marker. Otherwise, truncate the sixth character. 
4) If the resulting five-character code has a vowel marker, remove it. 

Otherwise remove the fifth character. 
5) For all codes having less than four characters in the variable-length 

fonn, pad to four characters by adding blanks to the right. 
Measured against the telephone directory data base, this fixed-length 
compression code provides 361 distinct classes (not counting improper 
class splits as separate classes) or 80% of the 451 given classes. Twenty-
four ( 5.3 %) of the classes are improperly split. By way of comparison, 
the SOUND EX system improperly splits 135 classes ( 30%) and provides 
only 287 distinct classes (not counting improperly split classes), or 63.8% 
of the telephone directory data base. 

ACKNOWLEDGMENTS 
This research was carried out for the Institute of Library Research, 

University of California, under the sponsorship of the Office of Education, 
Research Grant No. OEG-1-7-071083-5068. 

The author would like to thank Ralph M. Shoffner and Kelley L. Cart-
wright for suggesting the problem and for a number of useful comments 
on existing systems. Allan J. Humphrey was kind enough to program the 
variable-length version of the algorithm for test purposes. 

APPENDIX: CORPUS OF NAMES USED FOR ALGORITHM TEST 
A list of personal-name equivalence classes from the Palo Alto-Los Altos 

Telephone Directory is arranged according to the variable-length compres-
sion code (with the vowel marked "•" treated as an "A" for ordering) . 

Names whose compressed codes do not match the one given in the first 
column (and hence represent weaknesses in the algorithm and/ or the 
directory groupings) are given in italics. 

A small number of directory entries that do not bear on the immediate 
problem have been deleted from the list : Bell's see also Bells; Co-op 
see also Co-operative; St. see also Saint; etc. 
0 BL Abel, Abele, Abell, Able 
0 BRMS Abrahams, Abrams 
0 BRMSN Abrahamson, Abramson 
•D Eddy, Eddie 
0 DMNS Edmonds, Edmunds 
0 DMNSN Edmondson, Edmundson 
0 DMS Adams, Addems 
0 GN Eagen, Egan, Eggen 
0 GR Jaeger, Yaeger, Yeager 
°KN Aiken, Aikin, Aitken 
°KNS Adkins, Akins 


°KR 
OKR 
·Ks 
0 LBRD 
·Ln 
0 LN 
0 LSN 
0 LVR 
•Ms 
0 NGL 
0 NL 
0 NRS 
0 NRSN 
•Ns 
0 RKSN 
0 RL 
0 RN 
•RNs 
•Rs 
0 RVN 
0 RVNG 
0 SBRN 
B•n 
B•ns 
B°KMN 
B0 L 
B0 L 
B0 L 
B0 L 
B.L 
B 0 LN 
B·M 
B 0 MN 
B•N 
B0 ND 
B·R 

B0 R 
B•R 
B•R 
B 0 RBR 
B•Rc 
B 0 RGR 
B 0 RK 
B 0 RN 

Algorithm for Name CompressionjDOLBY 265 

Acker, Aker 
Eckard, Eckardt, Eckart, Eckert, Eckhardt 
Oakes, Oaks, Ochs 
Albright, Allbright 
Elliot, Elliott 
Allan, Allen, Allyn 
Ohlsen, Olesen, Olsen, Olson, Olsson 
Oliveira, Olivera, Olivero 
Ames, Eames 
Engel, Engle, Ingle 
O'Neal, O'Neil, O'Neill 
Andrews, Andrus 
Andersen, Anderson, Andreasen 
Ennis, Enos 
Enrichsen, Erickson, Ericson, Ericsson, Eriksen 
Earley, Early 
Erwin, Irwin 
Aarons, Ahrends, Ahrens, Arens, Arentz, Arons 
Ayers, Ayres 
Ervin, Ervine, Irvin, Irvine 
Erving, Irving 
Osborn, Osborne, Osbourne, Osburn 
Beatie, Beattie, Beatty, Beaty, Beedie 
Betts, Betz 
Bachman, Bachmann, Backman 
Bailey, Baillie, Bailly, Baily, Bayley 
Beal, Beale, Beall, Biehl 
Belew, Ballou, Bellew 
Buhl, Buell 
Belle, Bell 
Bolton, Boulton 
Baum, Bohm, Bohme 
Bauman, Bowman 
Bain, Bane, Bayne 
Bennet, Bennett 
Baer, Bahr, Baier, Bair, Bare, Bear, Beare, Behr, Beier, 

Bier, Bryer 
Barry, Beare, Beery, Berry 
Bauer, Baur, Bower 
Bird, Burd, Byrd 
Barbour, Barber 
Berg, Bergh, Burge 
Berger, Burger 
Boerke, Birk, Bourke, Burk, Burke 
Burn, Byrne 


266 Journal of Library Automation Vol. 3/4 December, 1970 

B 0 RNR 
B 0 RNS 
B 0 RNSN 
B0 RS 
BL°KBRN 
BL 0 M 
BR 0 D 
BR 0 N 
BR 0 N 
D 0 DS 
D°F 
D 0 GN 
D°K 
n•KNSN 
n•KsN 
n•L 
n•L 
n•L 
D 0 MN 
n•N 
n•N 
n•N 

n•N 
n•N 
D0 NL 
D.R 
n•R 
D 0 RM 
D 0 VDSN 
n•vs 
DR0 SL 
F• 
F°FR 
F 0 GN 
F 0 L 
F 0 L 
F 0 LKNR 
F 0 LPS 
F 0 NGN 
F 0 NL 
F0 RL 
F 0 RR 
F 0 RR 
F 0 RS 

Bernard, Bernhard, Bernhardt, Bernhart 
Berns, Bims, Burns, Byrns, Byrnes 
Bernstein, Bornstein 
Bertsch, Birch, Burch 
Blackburn, Blagburn 
Blom, Bloom, Bluhm, Blum, Blume 
Brode, Brodie, Brody 
Braun, Brown, Browne 
Brand, Brandt, Brant 
Diezt, Ditz 
Duffie, Duffy 
Dougan, Dugan, Duggan 
Dickey, Dicke 
Dickenson, Dickerson, Dickinson, Dickison 
Dickson, Dixon, Dixson 
Dailey, Daily, Daley, Daly 
Dahl, Dahle, Dall, Doll 
Deahl, Deal, Diehl 
Diamond, Dimond, Dymond 
Dean, Deane, Deen 
Denney, Denny 
Donahoo, Donahue, Donoho, Donohoe, Donohoo, 

Donohue, Dunnahoo 
Downey, Downie 
Dunn, Dunne 
Donley, Donnelley, Donnelly 
Daugherty, Doherty, Dougherty 
Dyar, Dyer 
Derham, Durham 
Davidsen, Davidson, Davison 
Davies, Davis 
Driscoll, Driskell 
Fay, Fahay, Fahey 
Fifer, Pfeffer, Pfeiffer 
Fagan, Feigan, Fegan 
Feil; Pfeil 
Feld, Feldt, Felt 
Faulkner, Falconer 
Philips, Phillips 
Finnegan, Finnigan 
Finlay, Finley 
Farrell, Ferrell 
Ferrara, Ferreira, Ferriera 
Foerster, Forester, Forrester, Forster 
Forrest, Forest 


F 0 RS 
F 0 RS 
F 0 SR 
FL 0 N 
FL 0 NGN 
FR0 

FR0 DMN 
FR0 DRKSN 
FR°K 
FR0 NS 
FR0 NS 
FR 0 S 
FR0 SR 
G0 D 
G0 DS 
G°F 
G0 L 
G0 LMR 
G0 LR 
G0 MS 
G0 NR 
G 0 NSLS 
G0 NSLVS 
G0 RD 
c•Rn 
G 0 RN 
G 0 RNR 
c•RR 
G 0 S 
GR 0 
GR.FD 
GR0 N 
GR•s 
H•n 
H°F 
H°FMN 
H0 G 
H 0 GN 
H°K 
H°KSN 
H 0 L 
H•L 
H•L 
H0 L 
H 0 LD 

Algorithm for Name CompressionjDOLBY 267 

Faris, Farriss, Ferris, Ferriss 
First, Fuerst, Furst 
Fischer, Fisher 
Flinn, Flynn 
Flanagan, Flanigan, Flannigan 
Frei, Frey, Fry, Frye 
Freedman, Friedman 
Frederickson, Frederiksen, Fredickson, Fredriksson 
Franck, Frank 
France, Frantz, Franz 
Frances, Francis 
Freeze, Freese, Fries 
Fraser, Frasier, Frazer, Frazier 
Good, Goode 
Getz, Goetz, Goetze 
Goff, Gough 
Gold, Goold, Gould 
Gilmer, Gilmore, Gilmour 
Gallagher, Gallaher, Galleher 
Gomes, Gomez 
Guenther, Gunther 
Gonzales, Gonzalez 
Consalves, Gonzalves 
Garratt, Garrett 
Garrity, Geraghty, Geraty, Gerrity 
Gorden, Gordohn, Gordon 
Gardiner, Gardner, Gartner 
Garrard, Gerard, Gerrard, Girard 
Gauss, Goss 
Gray, Grey 
Griffeth, Griffith 
Green, Greene 
Gros, Grose, Gross 
Hyde, Heidt 
Hoff, Hough, Huff 
Hoffman, Hoffmann, Hofman, Hofmann, Huffman 
Hoag, Hoge, Hogue 
Hagan, Hagen 
Hauch, Hauck, Hauk, Hauke 
Hutcheson, Hutchison 
Holley, Holly 
Holl, Hall 
Halley, Haley 
Haile, Hale 
Holiday, Halliday, Holladay, Holliday 

I 


268 Journal of Libra1·y Automation Vol. 3/4 December, 1970 

H 0 LG 
H 0 LM 
H 0 LMS 
H 0 LN 
H0 M 
H 0 MR 
H 0 N 
H 0 N 
H0 NN 
H 0 NRKS 
H 0 NRKSN 
H0 NS 
H0 NS 
I-JONSN 
H 0 R 
H 0 R 
H 0 R 
H 0 R 
H 0 RMN 
H 0 RMN 
H 0 RMN 
H0 RN 
H 0 RN 
H 0 RN 
H 0 RNGDN 
H 0 S 
H 0 S 
H 0 S 
H 0 SN 
H 0 VR 
r 
tFR 
rFRS 
tKB 
rKBsN 
rKs 
rL 
rMs 
rMSN 
rNsN 

rs 
Ko 
K°F 
K°FMN 

Helwig, Hellwig 
Holm, Home 
Holmes, Homes 
Highland, Hyland 
Ham, Hamm 
Hammar, Hammer 
Hanna, Hannah 
Hahn, Hahne, Harm, Haun 
Hanan, Hannan, Hannon 
Hendricks, Hendrix, Henriques 
Hendrickson, Henriksen, Henrikson 
Heintz, Heinz, Heinze, Hindes, Hinds, Hines, Hinze 
Haines, Haynes 
Henson, Hansen, Hanson, Hanssen, Hansson, Hanszen 
Herd, Heard, Hird, Hurd 
Hart, Hardt, Harte, Heart 
Hare, Hair 
Hardey, Hardie, Hardy 
Hartman, Hardmen, Hardman, Hartmann 
Herman, Hermann, Herrmann 
Harman, Harmon 
Heron, Herrin, Herron 
Hardin, Harden 
Hom, Horne 
Herrington, Harrington 
Haas, Haase, Hasse 
Howes, House, Howse 
Hays, Hayes 
Houston, Huston 
Hoover, Hover 
Jew, Jue 
Jeffery, Jeffrey 
Jefferies, Jefferis, Jefferys, Jeffreys 
Jacobi, Jacoby 
Jacobsen, Jacobson, Jackobsen 
Jacques, Jacks, Jaques 
Jewell, Juhl 
Jaimes, James 
Jameson, Jamieson, Jamison 
Jahnsen, Jansen, Jansohn, Janssen, Jansson, 

Janzen, Jensen, Jenson 
Joice, Joyce 
Kay, Kaye 
Coffee, Coffey 
Coffman, Kauffman, Kaufman, Kaufmann 


K°K 
K0 L 
K0 L 
K0 LMN 
K0 LR 
K0 MBRLN 
K 0 MBS 
K0 MP 
K0 MPS 
K0 N 
K0 N 
K0 N 
K0 N 
K0 N 
K0 N 
K0 N 
K 0 NL 
K 0 NR 
K0 NS 
K0 P 
K0 PL 
K0 R 
K0 R 
K0 R 
K0 R 
K0 R 
K 0 RD 
K0 RLN 
K 0 RN 
K0 RSNR 
K0 S 
K0 S 
K0 S 
K0 SL 
K0 SLR 
K 0 SR 
KL 0 N 
KL.,RK 
KL 0 SN 
KR 0 

KR 0 GR 
KR.,MR 
KR 0 N 
KR 0 S 
KR 0 S 

Algor·ithm. for Name CompressionfDOLBY 269 

Cook, Cooke, Koch, Koche 
Cole, Kohl, Koll 
Kelley, Kelly 
Coleman, Cohnan 
Koehler, Koeller, Kohler, Koller 
Chamberlain, Chamberlin 
Combs, Coombes, Coombs 
Camp, Kampe, Kampf 
Campos, Campus 
Cahn, Conn, Kahn 
Cahen, Cain, Caine, Cane, Kain, Kane 
Chin, Chinn 
Chaney, Cheney 
Coen, Cohan, Cohen, Cohn, Cone, Koehn, Kahn 
Coon, Kuhn, Kuhne 
Kenney, Kenny, Kinney 
Conley, Conly, Connelly, Connolly 
Conner, Connor 
Coons, Koontz, Kuhns, Kuns, Kuntz, Kunz 
Coop, Co-op, Coope, Coupe, Koop 
Chapel, Chapell, Chappel, Chappell, Chappelle, Chapple 
Carrie, Carey, Cary 
Corey, Cory 
Carr, Kar, Karr 
Kurtz, Kurz 
Kehr, Ker, Kerr 
Cartwright, Cortright 
Carleton, Carlton 
Carney, Cerney, Kearney 
Kirschner, Kirchner 
Chace, Chase 
Cass, Kass 
Kees, Keyes, Keys 
Cassel, Cassell, Castle 
Kesler, Kessler, Kestler 
Kaiser, Kayser, Keizer, Keyser, Kieser, Kiser, Kizer 
Cline, Klein, Kleine, Kline 
Clark, Clarke 
Claussen, Clausen, Clawson, Closson 
Crow, Crowe 
Krieger, Kroeger, Krueger, Kruger 
Creamer, Cramer, Kraemer, Kl·amer, Kremer 
Craine, Crane 
Christie, Christy, Kristee 
Crouss, Kraus, Krausch, Krause, Krouse 


270 Journal of Library Automation Vol. 3/4 December, 1970 

KR 0 S 
KR 0 S 
KR 0 SNSN 
Lo 
Lo 
L 0 D 
L 0 DL 
L 0 DRMN 
L°K 
L°KS 
L 0 LN 
L 0 LR 
L 0 MB 
L 0 MN 
L 0 MN 
L0 N 
L0 N 
L0 N 
L0 N 
L 0 NG 
L 0 NN 
L 0 NS 
L 0 R 
L 0 RNS 
L 0 RNS 
L 0 RSN 
L 0 S 
L 0 S 
L 0 SR 
L0 V 
L 0 VD 
L 0 VL 
L 0 VN 
M 0 D 
M 0 DN 
M0 DS 
M 0 DSN 
M°KL 
M°KM 
M°KS 
M°KS 
M 0 LN 
M 0 LN 
M 0 LR 
M 0 LR 

Cross, Krost 
Crews, Cruz, Kruse 
Christensen, Christiansen, Christianson 
Loe, Loewe, Low, Lowe 
Lea, Lee, Leigh 
Lloyd, Loyd 
Litle, Littell, Little, Lytle 
Ledterman, Letterman 
Leach, Leech, Leitch 
Lucas, Lukas 
Laughlin, Loughlin 
Lawler, Lawlor 
Lamb, Lamm 
Lemen, Lemmon, Lemon 
Layman, Lehman, Lehmann 
Lind, Lynd, Lynde 
Lion, Lyon 
Lin, Linn, Lynn, Lynne 
Lain, Laine, Laing, Lane, Layne 
Lang, Lange 
London, Lundin 
Lindsay, Lindsey, Lindsley, Linsley 
Lawry, Lowery, Lowrey, Lowry 
Lawrence, Lowrance 
Laurence, Lawrance, Lawrence, Lorence, Lorenz 
Larsen, Larson 
Lewis, Louis, Luis, Luiz 
Lacey, Lacy 
Leicester, Lester 
Levey, Levi, Levy 
Leavett, Leavitt, Levit 
Lavell, Lavelle, Leavelle, Loveall, Lovell 
Lavin, Levin, Levine 
Mead, Meade 
M oretton, Morton 
Mathews, Matthews 
Madison, Madsen, Matson, Matteson, Mattison, Mattson 
Michael, Michel 
Meacham, Mechem 
Marques, Marquez, Marquis, Marquiss 
Marcks, Marks, Marx 
Maloney, Moloney, Molony 
Mullan, Mullen, Mullin 
Mallery, Mallory 
Moeller, Moller, Mueller, Muller 


M0 LR 
M 0 LS 
M 0 N 
M0 NR 
M0 NR 
M0 NSN 
M 0 R 
M 0 R 
M0 R 
M0 R 
M0 R 
M0 RF 
M0 RL 
M 0 RN 
M 0 RS 
M0 RS 
MK0 

MK0 

MK0 

MK 0 

MK 0 L 
MK 0 LF 
MK 0 LM 
MK 0 N 
MK 0 NR 
MK 0 NS 
MK0 NS 
MK0 R 
MK0 R 
MKD 0 NL 
MKF 0 RLN 
MKF 0 RSN 
MKL 0 D 
MKL 0 KLN 

MKL 0 LN 
MKL 0 N 
MKL•N 
MKL 0 S 
MKM 0 LN 
MKN°L 
MKR•o 
N°KL 
N°KLS 
N°KLS 

Algorithm for Name CompressionjDOLBY 271 

Millar, Miller 
Miles, Myles 
Mahan, Mann 
Miner, Minor 
Monroe, Munro 
Monson, Munson 
Murray, Murrey 
Maher, Maier, Mayer 
Mohr, Moor, Moore 
Meyers, Myers 
Meier, Meyer, Mieir, Myhre 
Murphey, Murphy 
Merrell, Merrill 
Marten, Martin, Martine, Martyn 
Meyers, Myers 
Maurice, Morris, Morse 
McCoy, McCaughey 
Magee, McGee, McGehee, McGhie 
Mackey, MacKay, Mackie, McKay 
McCue, McHugh 
Magill, McGill 
McCollough, McCullah, McCullough 
McCallum, McCollum, McColm 
McKenney, McKinney 
Macintyre, McEntire, Mcintire, Mcintyre 
MacKenzie, McKenzie 
Maginnis, McGinnis, McGuinness, Mcinnes, Mcinnis 
Maguire, McGuire 
McCarthy, McCarty 
MacDonald, McDonald, McDonnell 
MacFarland, MacFarlane, McFarland, McFarlane 
MacPherson, McPherson 
MacLeod, McCloud, McLeod 
MacLachlan, Maclachlin, McLachlan, McLaughlin, 

McLoughlin 
McClellan, McClelland, McLellan 
McClain, McClaine, McLain, McLane 
MacLean, McClean, McLean 
McCloskey, McClosky, McCluskey 
MacMillan, McMillan, McMillin 
MacNeal, McNeal, McNeil, McNeill 
Magrath, McGrath 
Nichol, Nicholl, Nickel, Nickle, Nicol, Nicoll 
Nicholls, Nichols, Nickels, Nickles, Nicols 
Nicholas, Nicolas 


272 Journal of Library Automation Vol. 3/4 D ecember, 1970 

N°KLSN 
N°KSN 
N°L 
N°LSN 

N°MN 
N°RS 
N°SBD 
p•n 
P 0 DRSN 
p•c 
P 0 LK 
P0 LSN 
p•N 
p•R 
p•R 
P0 RK 
P 0 RKS 
p•Rs 
r•Rs 
p•Rs 
P 0 RSN 
PR°KR 
PR 0 NS 
PR 0 R 
R• 
R• 
R 0 BNSN 
R•n 
R•n 
R 0 D 
R 0 DR 
R•ns 
R 0 GN 
R•GR 
R°K 
R°K 
R°KR 
n•L 
R0 MNGTN 
R0 MR 
n•Ms 
n•N 
R0 NR 
R•s 

Nicholsen, Nicholson, Nicolaisen, Nicolson 
Nickson, Nixon 
Neal, Neale, Neall, Neel, Neil, Neill 
Neilsen, Neilson, Nelsen, Nelson, Nielsen, Nielson, 

Nilson, Nilssen, Nilsson 
Neumann, Newman 
Norris, Nourse 
Nesbit, Nesbitt, Nisbet 
Pettee, Petty 
Peterson, Pederson, Pedersen, Petersen, Petterson 
Page, Paige 
Polak, Pollack, Pollak, Pollock 
Polson, Paulsen, Paulson, Poulsen, Poulsson 
Paine, Payn, Payne 
Parry, Perry 
Parr, Paar 
Park, Parke 
Parks, Parkes 
Pierce, Pearce, Peirce, Piers 
Parish, Parrish 
Paris, Parris 
Pierson, Pearson, Pehrson, Peirson 
Prichard, Pritchard 
Prince, Prinz 
Prior, Pryor 
Roe, Rowe 
Rae, Ray, Raye, Rea, Rey, Wray 
Robinson, Robison 
Rothe, Roth 
Rudd, Rood, Rude 
Reed, Read, Reade, Reid 
Rider, Ryder 
Rhoades, Rhoads, Rhodes 
Regan, Ragon, Reagan 
Rodgers, Rogers 
Richey, Ritchey, Ritchie 
Reich, Reiche 
Reichardt, Richert, Rickard 
Reilley, Reilly, Reilli, Riley 
Remington, Rimington 
Reamer, Reimer, Riemer, Rimmer 
Ramsay, Ramsey 
Rhein, Rhine, Ryan 
Reinhard, Reinhardt, Reinhart, Rhinehart, Rinehart 
Reas, Reece, Rees, Reese, Reis, Reiss, Ries 


R0 S 
R0 S 
R0 S 
R•vs 
s•BR 
S°FL 
s•FN 
S°FNS 
S°FNSN 
S°FR 

S°FR 
s•cL 
S 0 GLR 
s•K 
s•Ks 
s•L 
s•L 
s•LR 
s•Ls 
s•Lv 
s•LvR 
S 0 MKR 

S 0 MN 
S 0 MN 
s•MRS 
s·Ms 
s•N 
S 0 N 
S 0 NR 
S0 NRS 
S 0 PR 
s·R 
s·R 
s·R 
S 0 R 
S0 R 
s•RL 
S 0 RLNG 
s•RMN 
S0 RN 
s•RR 
sos 
SM 0 D 

Algorithm for Name CompressionjDOLBY 273 

Rauch, Rausch, Roach, Roche, Roush 
Rush, Rusch 
Russ, Rus 
Reaves, Reeves 
Seibert, Siebert 
Schofield, Scofield 
Stefan, Steffan, Steffen, Stephan, Stephen 
Steffens, Stephens, Stevens 
Steffensen, Steffenson, Stephenson, Stevenson 
Schaefer, Schaeffer, Schafer, Schaffer, Schafer, 

Shaffer, Sheaffer 
Stauffer, Stouffer 
Siegal, Sigal 
Sigler, Ziegler 
Schuck, Shuck 
Sachs, Sacks, Saks, Sax, Saxe 
Seeley, Seely, Seley 
Schell, Shell 
Schuler, Schuller 
Schultz, Schultze, Schulz, Schulze, Shults, Shultz 
Silva, Sylva 
Silveira, Silvera, Silveria 
Schomaker, Schumacher, Schumaker, Shoemaker, 

Shumaker 
Simon, Symon 
Seaman, Seemann, Semon 
Somers, Sommars, Sommers, Summers 
Simms, Sims 
Stein, Stine 
Sweeney, Sweeny, Sweney 
Senter, Center 
Sanders, Saunders 
Shepard, Shephard, Shepheard, Shepherd, Sheppard 
Stahr, Star, Starr 
Stewart, Stuart 
Storey, Story 
Saier, Sayre 
Schwartz, Schwarz, Schwarze, Swartz 
Schirle, Shirley 
Sterling, Stirling 
Scheuermann, Schurman, Sherman 
Stearn, Stem 
Scherer, Shearer, Sharer, Sherer, Sheerer 
Sousa, Souza 
Smith, Smyth, Smythe 


274 Journal of Library Automation Vol. 3/4 December, 1970 

SM 0 D 
SN°DR 
SN°L 
SP 0 LNG 
SP 0 R 
SP 0 R 
SR 0 DR 
SR0 DR 
T0 D 
T 0 MSN 
T0 RL 
TR 0 S 
v·L 
v·L 
v·R 
w•o 
W 0 DKR 
w·nL 
w·nMN 
W 0 DR 
W 0 DRS 
W 0 GNR 

W 0 L 
W 0 L 
W 0 L 
W 0 LBR 
W 0 LF 
W 0 LKNS 
W 0 LKS 
W 0 LN 
W 0 LR 
W 0 LRS 
W 0 LS 
W 0 LS 
W 0 LS 
W 0 LSN 
W 0 N 
W 0 R 
W 0 R 
W 0 RL 
W 0 RNR 
W 0 S 
w·sMN 

Schmid, Schmidt, Schmit, Schmitt, Smit 
Schneider, Schnieder, Snaider, Snider, Snyder 
Schnell, Snell 
Spalding, Spaulding 
Spear, Speer, Speirer 
Spears, Speers 
Schroder, Schroeder, Schroeter 
Schrader, Shrader 
Tait, Tate 
Thomason, Thompson, Thomsen, Thomson, Tomson 
Terrel, Terrell, Terrill 
Tracey, Tracy 
Vail, Vaile, Vale 
Valley, Valle 
Vieira, Vierra 
White, Wight 
Whitacre, Whitaker, Whiteaker, Whittaker 
Whiteley, Whitley 
Whitman, Wittman 
Woodard, Woodward 
Waters, Watters 
Wagener, Waggener, Wagoner, Wagner, Wegner, 

Waggoner 
Willey, Willi 
Wiley, Wylie 
Wahl, Wall 
Wilber, Wilbur 
Wolf, Wolfe, Wolff, Woolf, Woulfe, Wulf, Wulff 
Wilkens, Wilkins 
Wilkes, Wilks 
Whalen, Whelan 
Walter, Walther, Wolter 
Walters, Walthers, Wolters 
Wallace, Wallis 
Welch, Welsh 
Welles, Wells 
Willson, Wilson 
Winn, Wynn, Wynne 
Worth, Wirth 
Ware, Wear, Weir, Wier 
Wehrle, Wehrlie, Werle, Worley 
Warner, Werner 
Weis, Weiss, Wiese, Wise, Wyss 
Weismann, Weissman, Weseman, Wiseman, 

Wismonn, Wissman 


Algorithm for Name CompressionjDOLBY 275 

REFERENCES 

1. Cox, N.S.M.; Dolby, J. L.: "Structured Linguistic Data and the 
Automatic Detection of Errors." In Advances in Computer Type-
setting (London: Institute of Printing, 1966), pp. 122-125. 

2. Cox, N.S.M.; Dews, J. D.; Dolby, J. L.,: The Computer and the 
Library (Hamden, Conn.: Archon Press, 1967). 

3. Dolby, J. L.; Forsyth, V. J.; Resnikoff, H. L.: Computerized Library 
Catalogs: Their Growth, Cost and Utility (Cambridge, Massachu-
setts: The M.I.T. Press, 1969) . 

4. Becker, Joseph; Hayes, Robert M. : Information Storage and Re-
trieval (New York: Wiley, 1963 ), p. 143. 

5. Davidson, Leon: "Retrieval of Misspelled Names in Airlines Pas-
senger Record System," Communications of the ACM, 5 (1962), 
169-171. 

6. Blair, C. R.: "A Program for Correcting Spelling Errors," Informa-
tion & Control, 3 ( 1960), 60-67. 

7. Schwartz, E. S.: An Adaptive Information Transmission System 
Employing Minimum Redundancy Word Codes (Armour Research 
Foundation Report, April 1962). (AD 274-135). 

8. Bourne, C. P.; Ford, D.: "A Study of Methods for Systematically 
Abbreviating English Words and Names," Journal of the ACM, 
8 ( 1961), 538-552. 

9. Kessler, M. M., "The "On-Line" Technical Information System at 
M.I.T.", in 1967 IEEE International Convention Record. (New York: 
Institute of Electrical and Electronic Engineers, 1967), pp. 40-43. 

10. Kilgour, F. G.: "Retrieval of Single Entries from a Computerized 
Library Catalog File," American Society for Information Science, 
Proceedings, 5 ( 1968), 133-136. 

11. Nugent, W. R.: "Compression Word Coding Techniques for In-
formation Retrieval," Journal of Library Automation, 1 (December 
1968), 250-260. 

12. Rothrock, H. I.: Computer-Assisted Directory Search; A Dissertation 
in Electrical Engineering. (Philadelphia: University of Pennsylvania, 
1968). 

13. Ruecking, F. H.: "Bibliographic Retrieval from Bibliographic In-
put; The Hypothesis and construction of a Test," Journal of Library 
Automation, 1 (December 1968), 227-238. 

14. Tukey, J. W.: A Tagging System for Journal Articles and Other 
Citable Items: A Status Report (Princeton, N.J.: Statistical Tech-
niques Research Group, Princeton University, 1963). 

15. Resnikoff, H. L.; Dolby, J. L.: A Proposal to Construct a Linguistic 
and Statistical Programming System, (Los Altos, Cal.: R & D Con-
sultants Company, 1967).