id sid tid token lemma pos 8935 1 1 105 105 CD 8935 1 2 Application application NN 8935 1 3 of of IN 8935 1 4 the the DT 8935 1 5 Variety Variety NNP 8935 1 6 - - HYPH 8935 1 7 Generator Generator NNP 8935 1 8 Approach Approach NNP 8935 1 9 to to IN 8935 1 10 Searches search NNS 8935 1 11 of of IN 8935 1 12 Personal Personal NNP 8935 1 13 Names Names NNPS 8935 1 14 in in IN 8935 1 15 Bibliographic Bibliographic NNP 8935 1 16 Data Data NNP 8935 1 17 Bases Bases NNPS 8935 1 18 - - HYPH 8935 1 19 Part Part NNP 8935 1 20 1 1 CD 8935 1 21 . . . 8935 2 1 Microstructure microstructure NN 8935 2 2 of of IN 8935 2 3 Personal Personal NNP 8935 2 4 Authors Authors NNPS 8935 2 5 ' ' POS 8935 2 6 Names name NNS 8935 2 7 Dirk Dirk NNP 8935 2 8 W. W. NNP 8935 2 9 FOKKER FOKKER NNP 8935 2 10 and and CC 8935 2 11 Michael Michael NNP 8935 2 12 F. F. NNP 8935 2 13 LYNCH LYNCH NNP 8935 2 14 : : : 8935 2 15 Postgraduate Postgraduate NNP 8935 2 16 School School NNP 8935 2 17 of of IN 8935 2 18 Librarianship Librarianship NNP 8935 2 19 and and CC 8935 2 20 Information Information NNP 8935 2 21 Science Science NNP 8935 2 22 , , , 8935 2 23 University University NNP 8935 2 24 of of IN 8935 2 25 Sheffield Sheffield NNP 8935 2 26 , , , 8935 2 27 England England NNP 8935 2 28 . . . 8935 3 1 Conventional conventional JJ 8935 3 2 approaches approach NNS 8935 3 3 to to IN 8935 3 4 processing processing NN 8935 3 5 records record NNS 8935 3 6 of of IN 8935 3 7 linguistic linguistic JJ 8935 3 8 origin origin NN 8935 3 9 for for IN 8935 3 10 storage storage NN 8935 3 11 and and CC 8935 3 12 retrieval retrieval NN 8935 3 13 tend tend VBP 8935 3 14 to to TO 8935 3 15 regard regard VB 8935 3 16 the the DT 8935 3 17 data datum NNS 8935 3 18 as as IN 8935 3 19 immutable immutable JJ 8935 3 20 . . . 
8935 4 1 The the DT 8935 4 2 data datum NNS 8935 4 3 gen- gen- NN 8935 4 4 erally erally RB 8935 4 5 exhibit exhibit VB 8935 4 6 great great JJ 8935 4 7 variety variety NN 8935 4 8 and and CC 8935 4 9 disparate disparate JJ 8935 4 10 frequency frequency NN 8935 4 11 distributions distribution NNS 8935 4 12 , , , 8935 4 13 which which WDT 8935 4 14 are be VBP 8935 4 15 largely largely RB 8935 4 16 ignored ignore VBN 8935 4 17 and and CC 8935 4 18 which which WDT 8935 4 19 entail entail VBP 8935 4 20 either either CC 8935 4 21 the the DT 8935 4 22 storage storage NN 8935 4 23 of of IN 8935 4 24 extensive extensive JJ 8935 4 25 lists list NNS 8935 4 26 of of IN 8935 4 27 items item NNS 8935 4 28 or or CC 8935 4 29 the the DT 8935 4 30 use use NN 8935 4 31 of of IN 8935 4 32 complex complex JJ 8935 4 33 numerical numerical JJ 8935 4 34 algorithms algorithm NNS 8935 4 35 such such JJ 8935 4 36 as as IN 8935 4 37 hash hash NN 8935 4 38 coding coding NN 8935 4 39 . . . 8935 5 1 The the DT 8935 5 2 results result NNS 8935 5 3 in in IN 8935 5 4 each each DT 8935 5 5 case case NN 8935 5 6 are be VBP 8935 5 7 far far RB 8935 5 8 from from IN 8935 5 9 ideal ideal NN 8935 5 10 . . . 
8935 6 1 The the DT 8935 6 2 variety variety NN 8935 6 3 - - HYPH 8935 6 4 generator generator NN 8935 6 5 approach approach NN 8935 6 6 seeks seek VBZ 8935 6 7 to to TO 8935 6 8 reflect reflect VB 8935 6 9 the the DT 8935 6 10 microstructure microstructure NN 8935 6 11 of of IN 8935 6 12 data datum NNS 8935 6 13 elements element NNS 8935 6 14 in in IN 8935 6 15 their -PRON- PRP$ 8935 6 16 description description NN 8935 6 17 for for IN 8935 6 18 storage storage NN 8935 6 19 and and CC 8935 6 20 search search NN 8935 6 21 , , , 8935 6 22 and and CC 8935 6 23 takes take VBZ 8935 6 24 advan- advan- NN 8935 6 25 tage tage NN 8935 6 26 of of IN 8935 6 27 the the DT 8935 6 28 consistency consistency NN 8935 6 29 of of IN 8935 6 30 statistical statistical JJ 8935 6 31 characteristics characteristic NNS 8935 6 32 of of IN 8935 6 33 data datum NNS 8935 6 34 elements element NNS 8935 6 35 in in IN 8935 6 36 homogeneous homogeneous JJ 8935 6 37 data datum NNS 8935 6 38 bases basis NNS 8935 6 39 . . . 8935 7 1 In in IN 8935 7 2 this this DT 8935 7 3 paper paper NN 8935 7 4 , , , 8935 7 5 the the DT 8935 7 6 application application NN 8935 7 7 of of IN 8935 7 8 the the DT 8935 7 9 variety variety NN 8935 7 10 - - HYPH 8935 7 11 generator generator NN 8935 7 12 approach approach NN 8935 7 13 to to IN 8935 7 14 the the DT 8935 7 15 description description NN 8935 7 16 of of IN 8935 7 17 personal personal JJ 8935 7 18 author author NN 8935 7 19 names name NNS 8935 7 20 from from IN 8935 7 21 the the DT 8935 7 22 INSPEC INSPEC NNP 8935 7 23 data data NN 8935 7 24 base base NN 8935 7 25 by by IN 8935 7 26 means mean NNS 8935 7 27 of of IN 8935 7 28 small small JJ 8935 7 29 sets set NNS 8935 7 30 of of IN 8935 7 31 keys key NNS 8935 7 32 is be VBZ 8935 7 33 detailed detailed JJ 8935 7 34 . . . 
8935 8 1 It -PRON- PRP 8935 8 2 is be VBZ 8935 8 3 shown show VBN 8935 8 4 that that IN 8935 8 5 high high JJ 8935 8 6 degrees degree NNS 8935 8 7 of of IN 8935 8 8 partitioning partitioning NN 8935 8 9 of of IN 8935 8 10 names name NNS 8935 8 11 can can MD 8935 8 12 be be VB 8935 8 13 obtained obtain VBN 8935 8 14 by by IN 8935 8 15 key key NN 8935 8 16 - - HYPH 8935 8 17 sets set NNS 8935 8 18 generated generate VBN 8935 8 19 from from IN 8935 8 20 the the DT 8935 8 21 ini- ini- NN 8935 8 22 tial tial JJ 8935 8 23 characters character NNS 8935 8 24 of of IN 8935 8 25 surnames surname NNS 8935 8 26 , , , 8935 8 27 from from IN 8935 8 28 the the DT 8935 8 29 terminal terminal JJ 8935 8 30 characters character NNS 8935 8 31 of of IN 8935 8 32 surnames surname NNS 8935 8 33 , , , 8935 8 34 and and CC 8935 8 35 from from IN 8935 8 36 the the DT 8935 8 37 initials initial NNS 8935 8 38 . . . 8935 9 1 The the DT 8935 9 2 implications implication NNS 8935 9 3 of of IN 8935 9 4 the the DT 8935 9 5 findings finding NNS 8935 9 6 for for IN 8935 9 7 computer computer NN 8935 9 8 - - HYPH 8935 9 9 based base VBN 8935 9 10 bibliographical bibliographical JJ 8935 9 11 in- in- NN 8935 9 12 formation formation NN 8935 9 13 systems system NNS 8935 9 14 are be VBP 8935 9 15 discussed discuss VBN 8935 9 16 . . . 
8935 10 1 INTRODUCTION introduction VB 8935 10 2 The the DT 8935 10 3 application application NN 8935 10 4 of of IN 8935 10 5 computer computer NN 8935 10 6 technology technology NN 8935 10 7 to to IN 8935 10 8 the the DT 8935 10 9 storage storage NN 8935 10 10 of of IN 8935 10 11 bibliographic bibliographic JJ 8935 10 12 data data NN 8935 10 13 bases basis NNS 8935 10 14 and and CC 8935 10 15 to to IN 8935 10 16 the the DT 8935 10 17 selection selection NN 8935 10 18 of of IN 8935 10 19 items item NNS 8935 10 20 from from IN 8935 10 21 them -PRON- PRP 8935 10 22 on on IN 8935 10 23 the the DT 8935 10 24 basis basis NN 8935 10 25 of of IN 8935 10 26 the the DT 8935 10 27 con- con- NNP 8935 10 28 tent tent NNP 8935 10 29 of of IN 8935 10 30 specified specify VBN 8935 10 31 data datum NNS 8935 10 32 elements element NNS 8935 10 33 poses pose VBZ 8935 10 34 considerable considerable JJ 8935 10 35 problems problem NNS 8935 10 36 . . . 8935 11 1 Among among IN 8935 11 2 the the DT 8935 11 3 most most RBS 8935 11 4 important important JJ 8935 11 5 of of IN 8935 11 6 these these DT 8935 11 7 , , , 8935 11 8 from from IN 8935 11 9 the the DT 8935 11 10 viewpoint viewpoint NN 8935 11 11 of of IN 8935 11 12 the the DT 8935 11 13 efficiency efficiency NN 8935 11 14 of of IN 8935 11 15 computer computer NN 8935 11 16 use use NN 8935 11 17 , , , 8935 11 18 is be VBZ 8935 11 19 the the DT 8935 11 20 fact fact NN 8935 11 21 that that IN 8935 11 22 many many JJ 8935 11 23 of of IN 8935 11 24 the the DT 8935 11 25 individual individual JJ 8935 11 26 data data NN 8935 11 27 elements element NNS 8935 11 28 exhibit exhibit VBP 8935 11 29 great great JJ 8935 11 30 variety variety NN 8935 11 31 ( ( -LRB- 8935 11 32 i.e. i.e. 
FW 8935 11 33 , , , 8935 11 34 lists list NNS 8935 11 35 of of IN 8935 11 36 their -PRON- PRP$ 8935 11 37 contents content NNS 8935 11 38 are be VBP 8935 11 39 extensive extensive JJ 8935 11 40 ) ) -RRB- 8935 11 41 , , , 8935 11 42 and and CC 8935 11 43 show show VBP 8935 11 44 relatively relatively RB 8935 11 45 dis- dis- IN 8935 11 46 parate parate NN 8935 11 47 distributions distribution NNS 8935 11 48 . . . 8935 12 1 This this DT 8935 12 2 behavior behavior NN 8935 12 3 is be VBZ 8935 12 4 encountered encounter VBN 8935 12 5 in in IN 8935 12 6 different different JJ 8935 12 7 degrees degree NNS 8935 12 8 in in IN 8935 12 9 regard regard NN 8935 12 10 to to IN 8935 12 11 items item NNS 8935 12 12 such such JJ 8935 12 13 as as IN 8935 12 14 words word NNS 8935 12 15 in in IN 8935 12 16 the the DT 8935 12 17 titles title NNS 8935 12 18 of of IN 8935 12 19 monograph monograph NNP 8935 12 20 or or CC 8935 12 21 periodical periodical NNP 8935 12 22 ar- ar- CD 8935 12 23 106 106 CD 8935 12 24 ] ] -RRB- 8935 12 25 oumal oumal JJ 8935 12 26 of of IN 8935 12 27 Library Library NNP 8935 12 28 Automation Automation NNP 8935 12 29 Vol Vol NNP 8935 12 30 . . . 
8935 13 1 7/2 7/2 CD 8935 13 2 June June NNP 8935 13 3 1974 1974 CD 8935 13 4 ticles ticle NNS 8935 13 5 , , , 8935 13 6 assigned assign VBD 8935 13 7 subject subject JJ 8935 13 8 headings heading NNS 8935 13 9 , , , 8935 13 10 authors author NNS 8935 13 11 ' ' POS 8935 13 12 names name NNS 8935 13 13 , , , 8935 13 14 and and CC 8935 13 15 citations.1- citations.1- NNP 8935 13 16 4 4 CD 8935 13 17 Such such JJ 8935 13 18 dis- dis- NN 8935 13 19 tributions tribution NNS 8935 13 20 have have VBP 8935 13 21 been be VBN 8935 13 22 extensively extensively RB 8935 13 23 studied study VBN 8935 13 24 in in IN 8935 13 25 various various JJ 8935 13 26 contexts contexts NN 8935 13 27 by by IN 8935 13 28 Bradford Bradford NNP 8935 13 29 , , , 8935 13 30 Zipf Zipf NNP 8935 13 31 , , , 8935 13 32 and and CC 8935 13 33 Mandelbrot.4 Mandelbrot.4 NNP 8935 13 34 - - HYPH 8935 13 35 6 6 CD 8935 13 36 In in IN 8935 13 37 general general JJ 8935 13 38 , , , 8935 13 39 the the DT 8935 13 40 distributions distribution NNS 8935 13 41 are be VBP 8935 13 42 approximately approximately RB 8935 13 43 hyperbolic hyperbolic JJ 8935 13 44 , , , 8935 13 45 so so IN 8935 13 46 that that IN 8935 13 47 a a DT 8935 13 48 small small JJ 8935 13 49 proportion proportion NN 8935 13 50 of of IN 8935 13 51 items item NNS 8935 13 52 may may MD 8935 13 53 account account VB 8935 13 54 for for IN 8935 13 55 a a DT 8935 13 56 substan- substan- JJ 8935 13 57 tial tial JJ 8935 13 58 proportion proportion NN 8935 13 59 of of IN 8935 13 60 occurrences occurrence NNS 8935 13 61 , , , 8935 13 62 while while IN 8935 13 63 the the DT 8935 13 64 majority majority NN 8935 13 65 of of IN 8935 13 66 items item NNS 8935 13 67 occur occur VBP 8935 13 68 only only RB 8935 13 69 in- in- RB 8935 13 70 frequently frequently RB 8935 13 71 . . . 
8935 14 1 The the DT 8935 14 2 studies study NNS 8935 14 3 have have VBP 8935 14 4 been be VBN 8935 14 5 well well RB 8935 14 6 reviewed review VBN 8935 14 7 by by IN 8935 14 8 Fairthorne.7 Fairthorne.7 NNP 8935 14 9 Of of IN 8935 14 10 all all PDT 8935 14 11 the the DT 8935 14 12 data data NN 8935 14 13 elements element NNS 8935 14 14 , , , 8935 14 15 personal personal JJ 8935 14 16 author author NN 8935 14 17 names name NNS 8935 14 18 exhibit exhibit VBP 8935 14 19 a a DT 8935 14 20 distribution distribution NN 8935 14 21 which which WDT 8935 14 22 is be VBZ 8935 14 23 at at IN 8935 14 24 its -PRON- PRP$ 8935 14 25 most most JJS 8935 14 26 extreme extreme JJ 8935 14 27 in in IN 8935 14 28 one one CD 8935 14 29 direction direction NN 8935 14 30 . . . 8935 15 1 As as IN 8935 15 2 is be VBZ 8935 15 3 shown show VBN 8935 15 4 later later RB 8935 15 5 in in IN 8935 15 6 this this DT 8935 15 7 pa- pa- XX 8935 15 8 per per IN 8935 15 9 , , , 8935 15 10 the the DT 8935 15 11 most most RBS 8935 15 12 frequent frequent JJ 8935 15 13 author author NN 8935 15 14 name name NN 8935 15 15 in in IN 8935 15 16 a a DT 8935 15 17 file file NN 8935 15 18 of of IN 8935 15 19 50,000 50,000 CD 8935 15 20 names name NNS 8935 15 21 occurred occur VBD 8935 15 22 only only RB 8935 15 23 sixteen sixteen CD 8935 15 24 times time NNS 8935 15 25 , , , 8935 15 26 while while IN 8935 15 27 over over IN 8935 15 28 35,000 35,000 CD 8935 15 29 of of IN 8935 15 30 the the DT 8935 15 31 names name NNS 8935 15 32 , , , 8935 15 33 or or CC 8935 15 34 over over IN 8935 15 35 70 70 CD 8935 15 36 percent percent NN 8935 15 37 of of IN 8935 15 38 the the DT 8935 15 39 file file NN 8935 15 40 , , , 8935 15 41 occurred occur VBD 8935 15 42 once once RB 8935 15 43 only only RB 8935 15 44 . . . 
8935 16 1 A a DT 8935 16 2 simple simple JJ 8935 16 3 and and CC 8935 16 4 general general JJ 8935 16 5 strategy strategy NN 8935 16 6 for for IN 8935 16 7 dealing deal VBG 8935 16 8 with with IN 8935 16 9 searches search NNS 8935 16 10 of of IN 8935 16 11 data datum NNS 8935 16 12 ele- ele- NNS 8935 16 13 ments ment NNS 8935 16 14 , , , 8935 16 15 the the DT 8935 16 16 contents content NNS 8935 16 17 of of IN 8935 16 18 which which WDT 8935 16 19 show show VBP 8935 16 20 large large JJ 8935 16 21 variety variety NN 8935 16 22 and and CC 8935 16 23 disparate disparate JJ 8935 16 24 distribu- distribu- JJ 8935 16 25 tions tion NNS 8935 16 26 , , , 8935 16 27 is be VBZ 8935 16 28 under under IN 8935 16 29 development development NN 8935 16 30 by by IN 8935 16 31 the the DT 8935 16 32 Research Research NNP 8935 16 33 Unit Unit NNP 8935 16 34 at at IN 8935 16 35 the the DT 8935 16 36 Sheffield Sheffield NNP 8935 16 37 School School NNP 8935 16 38 , , , 8935 16 39 and and CC 8935 16 40 has have VBZ 8935 16 41 thus thus RB 8935 16 42 far far RB 8935 16 43 been be VBN 8935 16 44 elaborated elaborate VBN 8935 16 45 in in IN 8935 16 46 regard regard NN 8935 16 47 to to IN 8935 16 48 searches search NNS 8935 16 49 of of IN 8935 16 50 chemical chemical NN 8935 16 51 struc- struc- NN 8935 16 52 tures ture NNS 8935 16 53 and and CC 8935 16 54 of of IN 8935 16 55 natural natural JJ 8935 16 56 - - HYPH 8935 16 57 language language NN 8935 16 58 data datum NNS 8935 16 59 bases basis NNS 8935 16 60 . . . 
8935 17 1 8• 8• CD 8935 17 2 9 9 CD 8935 17 3 Based base VBN 8935 17 4 on on IN 8935 17 5 information information NN 8935 17 6 - - HYPH 8935 17 7 theoret- theoret- NN 8935 17 8 ic ic NNP 8935 17 9 principles principle NNS 8935 17 10 , , , 8935 17 11 it -PRON- PRP 8935 17 12 involves involve VBZ 8935 17 13 a a DT 8935 17 14 two two CD 8935 17 15 - - HYPH 8935 17 16 stage stage NN 8935 17 17 search search NN 8935 17 18 procedure procedure NN 8935 17 19 in in IN 8935 17 20 which which WDT 8935 17 21 in in IN 8935 17 22 the the DT 8935 17 23 first first JJ 8935 17 24 and and CC 8935 17 25 rapid rapid JJ 8935 17 26 stage stage NN 8935 17 27 the the DT 8935 17 28 majority majority NN 8935 17 29 of of IN 8935 17 30 items item NNS 8935 17 31 which which WDT 8935 17 32 can can MD 8935 17 33 not not RB 8935 17 34 possibly possibly RB 8935 17 35 fulfill fulfill VB 8935 17 36 the the DT 8935 17 37 search search NN 8935 17 38 criteria criterion NNS 8935 17 39 are be VBP 8935 17 40 eliminated eliminate VBN 8935 17 41 , , , 8935 17 42 while while IN 8935 17 43 those those DT 8935 17 44 which which WDT 8935 17 45 meet meet VBP 8935 17 46 the the DT 8935 17 47 criteria criterion NNS 8935 17 48 are be VBP 8935 17 49 ex- ex- RB 8935 17 50 amined amine VBN 8935 17 51 for for IN 8935 17 52 an an DT 8935 17 53 exact exact JJ 8935 17 54 match match NN 8935 17 55 at at IN 8935 17 56 the the DT 8935 17 57 second second JJ 8935 17 58 stage stage NN 8935 17 59 . . . 
8935 18 1 The the DT 8935 18 2 criteria criterion NNS 8935 18 3 ( ( -LRB- 8935 18 4 or or CC 8935 18 5 attributes attribute NNS 8935 18 6 ) ) -RRB- 8935 18 7 are be VBP 8935 18 8 selected select VBN 8935 18 9 on on IN 8935 18 10 the the DT 8935 18 11 basis basis NN 8935 18 12 of of IN 8935 18 13 an an DT 8935 18 14 examination examination NN 8935 18 15 of of IN 8935 18 16 the the DT 8935 18 17 microstructure microstructure NN 8935 18 18 of of IN 8935 18 19 the the DT 8935 18 20 items item NNS 8935 18 21 in in IN 8935 18 22 the the DT 8935 18 23 data data NN 8935 18 24 base base NN 8935 18 25 , , , 8935 18 26 and and CC 8935 18 27 are be VBP 8935 18 28 chosen choose VBN 8935 18 29 so so IN 8935 18 30 that that IN 8935 18 31 their -PRON- PRP$ 8935 18 32 frequencies frequency NNS 8935 18 33 are be VBP 8935 18 34 ap- ap- RB 8935 18 35 proximately proximately RB 8935 18 36 equal equal JJ 8935 18 37 . . . 8935 19 1 The the DT 8935 19 2 number number NN 8935 19 3 of of IN 8935 19 4 criteria criterion NNS 8935 19 5 or or CC 8935 19 6 attributes attribute NNS 8935 19 7 chosen choose VBN 8935 19 8 for for IN 8935 19 9 de- de- DT 8935 19 10 scription scription NN 8935 19 11 of of IN 8935 19 12 the the DT 8935 19 13 items item NNS 8935 19 14 is be VBZ 8935 19 15 variable variable JJ 8935 19 16 within within IN 8935 19 17 a a DT 8935 19 18 wide wide JJ 8935 19 19 range range NN 8935 19 20 ; ; : 8935 19 21 with with IN 8935 19 22 their -PRON- PRP$ 8935 19 23 aid aid NN 8935 19 24 , , , 8935 19 25 the the DT 8935 19 26 variety variety NN 8935 19 27 of of IN 8935 19 28 items item NNS 8935 19 29 can can MD 8935 19 30 be be VB 8935 19 31 described describe VBN 8935 19 32 so so IN 8935 19 33 as as IN 8935 19 34 to to TO 8935 19 35 facilitate facilitate VB 8935 19 36 discrimination discrimination NN 8935 19 37 among among IN 8935 19 38 them -PRON- PRP 8935 19 39 . . . 
8935 20 1 In in IN 8935 20 2 the the DT 8935 20 3 context context NN 8935 20 4 of of IN 8935 20 5 substructure substructure NN 8935 20 6 searching search VBG 8935 20 7 , , , 8935 20 8 the the DT 8935 20 9 attributes attribute NNS 8935 20 10 are be VBP 8935 20 11 representa- representa- JJ 8935 20 12 tions tion NNS 8935 20 13 of of IN 8935 20 14 fragments fragment NNS 8935 20 15 of of IN 8935 20 16 chemical chemical NN 8935 20 17 structures,10 structures,10 NNP 8935 20 18 while while IN 8935 20 19 in in IN 8935 20 20 the the DT 8935 20 21 case case NN 8935 20 22 of of IN 8935 20 23 text text NN 8935 20 24 , , , 8935 20 25 they -PRON- PRP 8935 20 26 are be VBP 8935 20 27 strings string NNS 8935 20 28 of of IN 8935 20 29 characters character NNS 8935 20 30 which which WDT 8935 20 31 are be VBP 8935 20 32 variable variable JJ 8935 20 33 in in IN 8935 20 34 length length NN 8935 20 35 . . . 8935 21 1 These these DT 8935 21 2 strings string NNS 8935 21 3 are be VBP 8935 21 4 long long JJ 8935 21 5 when when WRB 8935 21 6 the the DT 8935 21 7 characters character NNS 8935 21 8 comprising comprise VBG 8935 21 9 them -PRON- PRP 8935 21 10 represent represent VBP 8935 21 11 frequent frequent JJ 8935 21 12 combina- combina- JJ 8935 21 13 tions tion NNS 8935 21 14 , , , 8935 21 15 and and CC 8935 21 16 short short JJ 8935 21 17 when when WRB 8935 21 18 the the DT 8935 21 19 characters character NNS 8935 21 20 are be VBP 8935 21 21 infrequent.11 infrequent.11 NNP 8935 21 22 Since since IN 8935 21 23 the the DT 8935 21 24 sets set NNS 8935 21 25 of of IN 8935 21 26 at- at- DT 8935 21 27 tributes tribute NNS 8935 21 28 can can MD 8935 21 29 generate generate VB 8935 21 30 , , , 8935 21 31 in in IN 8935 21 32 an an DT 8935 21 33 approximate approximate JJ 8935 21 34 manner manner NN 8935 21 35 , , , 8935 21 36 the the DT 8935 21 37 variety variety NN 8935 21 38 of of IN 8935 21 39 items item NNS 8935 21 40 en- en- RB 8935 21 41 countered counter VBN 8935 21 42 in in IN 8935 21 43 
the the DT 8935 21 44 data data NN 8935 21 45 base base NN 8935 21 46 , , , 8935 21 47 they -PRON- PRP 8935 21 48 are be VBP 8935 21 49 termed term VBN 8935 21 50 variety variety NN 8935 21 51 generators generator NNS 8935 21 52 . . . 8935 22 1 They -PRON- PRP 8935 22 2 are be VBP 8935 22 3 in- in- CD 8935 22 4 termediate termediate NN 8935 22 5 in in IN 8935 22 6 number number NN 8935 22 7 between between IN 8935 22 8 the the DT 8935 22 9 primitive primitive JJ 8935 22 10 set set NN 8935 22 11 of of IN 8935 22 12 symbols symbol NNS 8935 22 13 ( ( -LRB- 8935 22 14 alphanumer- alphanumer- NNP 8935 22 15 ic ic NNP 8935 22 16 characters character NNS 8935 22 17 in in IN 8935 22 18 the the DT 8935 22 19 case case NN 8935 22 20 of of IN 8935 22 21 text text NN 8935 22 22 , , , 8935 22 23 atoms atom NNS 8935 22 24 and and CC 8935 22 25 bonds bond NNS 8935 22 26 in in IN 8935 22 27 that that DT 8935 22 28 of of IN 8935 22 29 chemical chemical NN 8935 22 30 struc- struc- NNP 8935 22 31 tures ture NNS 8935 22 32 ) ) -RRB- 8935 22 33 and and CC 8935 22 34 the the DT 8935 22 35 actual actual JJ 8935 22 36 variety variety NN 8935 22 37 of of IN 8935 22 38 items item NNS 8935 22 39 in in IN 8935 22 40 the the DT 8935 22 41 collection collection NN 8935 22 42 ( ( -LRB- 8935 22 43 words word NNS 8935 22 44 or or CC 8935 22 45 word word NN 8935 22 46 fragments fragment NNS 8935 22 47 in in IN 8935 22 48 text text NN 8935 22 49 in in IN 8935 22 50 the the DT 8935 22 51 first first JJ 8935 22 52 instance instance NN 8935 22 53 , , , 8935 22 54 and and CC 8935 22 55 molecules molecule NNS 8935 22 56 in in IN 8935 22 57 the the DT 8935 22 58 second second JJ 8935 22 59 ) ) -RRB- 8935 22 60 . . . 
8935 23 1 The the DT 8935 23 2 variety variety NN 8935 23 3 - - HYPH 8935 23 4 generator generator NN 8935 23 5 approach approach NN 8935 23 6 involves involve VBZ 8935 23 7 recognition recognition NN 8935 23 8 of of IN 8935 23 9 the the DT 8935 23 10 fact fact NN 8935 23 11 that that IN 8935 23 12 the the DT 8935 23 13 statistical statistical JJ 8935 23 14 properties property NNS 8935 23 15 of of IN 8935 23 16 specific specific JJ 8935 23 17 data data NN 8935 23 18 elements element NNS 8935 23 19 within within IN 8935 23 20 homogeneous homogeneous JJ 8935 23 21 data data NN 8935 23 22 bases basis NNS 8935 23 23 are be VBP 8935 23 24 relatively relatively RB 8935 23 25 constant constant JJ 8935 23 26 , , , 8935 23 27 and and CC 8935 23 28 that that IN 8935 23 29 the the DT 8935 23 30 primitive primitive JJ 8935 23 31 symbols symbol NNS 8935 23 32 of of IN 8935 23 33 the the DT 8935 23 34 data data NN 8935 23 35 elements element NNS 8935 23 36 themselves -PRON- PRP 8935 23 37 usually usually RB 8935 23 38 show show VBP 8935 23 39 hyperbolic hyperbolic JJ 8935 23 40 distributions distribution NNS 8935 23 41 . . . 8935 24 1 New new JJ 8935 24 2 symbol symbol NN 8935 24 3 sets set NNS 8935 24 4 can can MD 8935 24 5 therefore therefore RB 8935 24 6 be be VB 8935 24 7 defined define VBN 8935 24 8 , , , 8935 24 9 consisting consist VBG 8935 24 10 of of IN 8935 24 11 sequences sequence NNS 8935 24 12 of of IN 8935 24 13 primitive primitive JJ 8935 24 14 symbols symbol NNS 8935 24 15 such such JJ 8935 24 16 that that IN 8935 24 17 their -PRON- PRP$ 8935 24 18 frequencies frequency NNS 8935 24 19 of of IN 8935 24 20 occurrence occurrence NN 8935 24 21 become become VBP 8935 24 22 comparable comparable JJ 8935 24 23 . . . 
8935 25 1 The the DT 8935 25 2 new new JJ 8935 25 3 symbol symbol NN 8935 25 4 sets set NNS 8935 25 5 then then RB 8935 25 6 constitute constitute VBP 8935 25 7 the the DT 8935 25 8 attributes attribute NNS 8935 25 9 which which WDT 8935 25 10 are be VBP 8935 25 11 employed employ VBN 8935 25 12 , , , 8935 25 13 singly singly RB 8935 25 14 or or CC 8935 25 15 in in IN 8935 25 16 combination combination NN 8935 25 17 , , , 8935 25 18 to to TO 8935 25 19 represent represent VB 8935 25 20 the the DT 8935 25 21 items item NNS 8935 25 22 within within IN 8935 25 23 a a DT 8935 25 24 search search NN 8935 25 25 file file NN 8935 25 26 . . . 8935 26 1 These these DT 8935 26 2 symbol symbol NN 8935 26 3 sets set VBZ 8935 26 4 Variety Variety NNP 8935 26 5 - - HYPH 8935 26 6 Generator Generator NNP 8935 26 7 ApproachjFOKKER ApproachjFOKKER NNP 8935 26 8 and and CC 8935 26 9 LYNCH LYNCH NNP 8935 26 10 107 107 CD 8935 26 11 approximate approximate JJ 8935 26 12 to to IN 8935 26 13 the the DT 8935 26 14 ideal ideal NN 8935 26 15 of of IN 8935 26 16 equifrequency equifrequency NN 8935 26 17 postulated postulate VBN 8935 26 18 by by IN 8935 26 19 Shannon Shannon NNP 8935 26 20 for for IN 8935 26 21 op- op- JJ 8935 26 22 timal timal JJ 8935 26 23 efficiency efficiency NN 8935 26 24 in in IN 8935 26 25 communication communication NN 8935 26 26 . . . 
8935 27 1 12 12 CD 8935 27 2 Only only RB 8935 27 3 an an DT 8935 27 4 approximation approximation NN 8935 27 5 can can MD 8935 27 6 be be VB 8935 27 7 ob- ob- RB 8935 27 8 tained taine VBN 8935 27 9 , , , 8935 27 10 however however RB 8935 27 11 , , , 8935 27 12 since since IN 8935 27 13 the the DT 8935 27 14 distributions distribution NNS 8935 27 15 of of IN 8935 27 16 the the DT 8935 27 17 newly newly RB 8935 27 18 defined define VBN 8935 27 19 symbols symbol NNS 8935 27 20 still still RB 8935 27 21 cover cover VBP 8935 27 22 a a DT 8935 27 23 relatively relatively RB 8935 27 24 wide wide JJ 8935 27 25 range range NN 8935 27 26 , , , 8935 27 27 and and CC 8935 27 28 since since IN 8935 27 29 they -PRON- PRP 8935 27 30 are be VBP 8935 27 31 seldom seldom RB 8935 27 32 entirely entirely RB 8935 27 33 indepen- indepen- JJ 8935 27 34 dent dent NN 8935 27 35 of of IN 8935 27 36 one one CD 8935 27 37 another another DT 8935 27 38 in in IN 8935 27 39 statistical statistical JJ 8935 27 40 terms term NNS 8935 27 41 , , , 8935 27 42 and and CC 8935 27 43 may may MD 8935 27 44 often often RB 8935 27 45 be be VB 8935 27 46 strongly strongly RB 8935 27 47 asso- asso- RB 8935 27 48 ciated ciate VBN 8935 27 49 . . . 8935 28 1 The the DT 8935 28 2 variety variety NN 8935 28 3 - - HYPH 8935 28 4 generator generator NN 8935 28 5 concept concept NN 8935 28 6 is be VBZ 8935 28 7 not not RB 8935 28 8 entirely entirely RB 8935 28 9 novel novel JJ 8935 28 10 . . . 
8935 29 1 Indeed indeed RB 8935 29 2 , , , 8935 29 3 it -PRON- PRP 8935 29 4 was be VBD 8935 29 5 antici- antici- RB 8935 29 6 pated pat VBN 8935 29 7 most most RBS 8935 29 8 closely closely RB 8935 29 9 in in IN 8935 29 10 precisely precisely RB 8935 29 11 the the DT 8935 29 12 present present JJ 8935 29 13 context context NN 8935 29 14 by by IN 8935 29 15 Merrill Merrill NNP 8935 29 16 and and CC 8935 29 17 by by IN 8935 29 18 Cutter Cutter NNP 8935 29 19 with with IN 8935 29 20 a a DT 8935 29 21 view view NN 8935 29 22 to to IN 8935 29 23 subdividing subdivide VBG 8935 29 24 a a DT 8935 29 25 library library NN 8935 29 26 's 's POS 8935 29 27 holdings holding NNS 8935 29 28 into into IN 8935 29 29 equal equal JJ 8935 29 30 groups group NNS 8935 29 31 of of IN 8935 29 32 items.13 items.13 NNP 8935 29 33 • • NNP 8935 29 34 14 14 CD 8935 29 35 However however RB 8935 29 36 , , , 8935 29 37 the the DT 8935 29 38 greater great JJR 8935 29 39 flexibility flexibility NN 8935 29 40 of of IN 8935 29 41 computer computer NN 8935 29 42 techniques technique NNS 8935 29 43 would would MD 8935 29 44 appear appear VB 8935 29 45 to to TO 8935 29 46 make make VB 8935 29 47 its -PRON- PRP$ 8935 29 48 use use NN 8935 29 49 today today NN 8935 29 50 even even RB 8935 29 51 more more RBR 8935 29 52 attractive attractive JJ 8935 29 53 . . . 
8935 30 1 This this DT 8935 30 2 paper paper NN 8935 30 3 thus thus RB 8935 30 4 describes describe VBZ 8935 30 5 a a DT 8935 30 6 study study NN 8935 30 7 of of IN 8935 30 8 a a DT 8935 30 9 large large JJ 8935 30 10 file file NN 8935 30 11 of of IN 8935 30 12 authors author NNS 8935 30 13 ' ' POS 8935 30 14 names name NNS 8935 30 15 with with IN 8935 30 16 a a DT 8935 30 17 view view NN 8935 30 18 to to IN 8935 30 19 identifying identify VBG 8935 30 20 attributes attribute NNS 8935 30 21 of of IN 8935 30 22 the the DT 8935 30 23 names name NNS 8935 30 24 which which WDT 8935 30 25 can can MD 8935 30 26 be be VB 8935 30 27 used use VBN 8935 30 28 for for IN 8935 30 29 effi- effi- NNP 8935 30 30 cient cient NNP 8935 30 31 retrieval retrieval NN 8935 30 32 purposes purpose NNS 8935 30 33 . . . 8935 31 1 Assessment assessment NN 8935 31 2 of of IN 8935 31 3 the the DT 8935 31 4 effectiveness effectiveness NN 8935 31 5 of of IN 8935 31 6 the the DT 8935 31 7 attributes attribute NNS 8935 31 8 in in IN 8935 31 9 retrieval retrieval NN 8935 31 10 is be VBZ 8935 31 11 described describe VBN 8935 31 12 in in IN 8935 31 13 Part Part NNP 8935 31 14 2 2 CD 8935 31 15 of of IN 8935 31 16 this this DT 8935 31 17 series series NN 8935 31 18 . . . 8935 32 1 ( ( -LRB- 8935 32 2 t t NN 8935 32 3 The the DT 8935 32 4 main main JJ 8935 32 5 terms term NNS 8935 32 6 used use VBN 8935 32 7 here here RB 8935 32 8 are be VBP 8935 32 9 n n NN 8935 32 10 - - HYPH 8935 32 11 gram gram NN 8935 32 12 , , , 8935 32 13 key key NN 8935 32 14 , , , 8935 32 15 and and CC 8935 32 16 key key NN 8935 32 17 - - HYPH 8935 32 18 set set NN 8935 32 19 , , , 8935 32 20 where where WRB 8935 32 21 an an DT 8935 32 22 n n NN 8935 32 23 - - HYPH 8935 32 24 gram gram NN 8935 32 25 is be VBZ 8935 32 26 a a DT 8935 32 27 string string NN 8935 32 28 of of IN 8935 32 29 n n JJ 8935 32 30 adjacent adjacent JJ 8935 32 31 char- char- JJ 8935 32 32 acters acter NNS 8935 32 33 . . . 
8935 33 1 A a DT 8935 33 2 key key JJ 8935 33 3 consists consist VBZ 8935 33 4 of of IN 8935 33 5 an an DT 8935 33 6 n n NN 8935 33 7 - - HYPH 8935 33 8 gram gram NN 8935 33 9 , , , 8935 33 10 and and CC 8935 33 11 keys key NNS 8935 33 12 are be VBP 8935 33 13 chosen choose VBN 8935 33 14 so so IN 8935 33 15 that that IN 8935 33 16 the the DT 8935 33 17 fre- fre- XX 8935 33 18 quencies quencie NNS 8935 33 19 of of IN 8935 33 20 a a DT 8935 33 21 set set NN 8935 33 22 of of IN 8935 33 23 keys key NNS 8935 33 24 ( ( -LRB- 8935 33 25 or or CC 8935 33 26 key key NN 8935 33 27 - - HYPH 8935 33 28 set set NN 8935 33 29 ) ) -RRB- 8935 33 30 are be VBP 8935 33 31 approximately approximately RB 8935 33 32 equivalent equivalent JJ 8935 33 33 in in IN 8935 33 34 a a DT 8935 33 35 given give VBN 8935 33 36 file file NN 8935 33 37 . . . 8935 34 1 The the DT 8935 34 2 measures measure NNS 8935 34 3 used use VBN 8935 34 4 in in IN 8935 34 5 assessing assess VBG 8935 34 6 frequency frequency NN 8935 34 7 distributions distribution NNS 8935 34 8 are be VBP 8935 34 9 Shannon Shannon NNP 8935 34 10 's 's POS 8935 34 11 ex- ex- JJ 8935 34 12 pressions pression NNS 8935 34 13 for for IN 8935 34 14 the the DT 8935 34 15 entropy entropy NN 8935 34 16 of of IN 8935 34 17 a a DT 8935 34 18 sequence sequence NN 8935 34 19 of of IN 8935 34 20 symbols symbol NNS 8935 34 21 : : : 8935 34 22 and and CC 8935 34 23 relative relative JJ 8935 34 24 entropy entropy RB 8935 34 25 : : : 8935 34 26 i i PRP 8935 34 27 H h NN 8935 34 28 = = : 8935 34 29 - - : 8935 34 30 I i NN 8935 34 31 p1log2pi p1log2pi NNP 8935 34 32 i= i= NN 8935 34 33 1 1 CD 8935 34 34 H H NNP 8935 34 35 _ _ NNP 8935 34 36 Hactual Hactual NNP 8935 34 37 r- r- XX 8935 34 38 Hmaximum Hmaximum NNP 8935 34 39 Hmaxlmum Hmaxlmum NNP 8935 34 40 is be VBZ 8935 34 41 reached reach VBN 8935 34 42 when when WRB 8935 34 43 the the DT 8935 34 44 probabilities probability NNS 8935 34 45 of of IN 8935 34 46 occurrence occurrence NN 8935 34 47 of 
of IN 8935 34 48 the the DT 8935 34 49 symbols symbol NNS 8935 34 50 of of IN 8935 34 51 the the DT 8935 34 52 sequence sequence NN 8935 34 53 are be VBP 8935 34 54 equal equal JJ 8935 34 55 ; ; : 8935 34 56 its -PRON- PRP$ 8935 34 57 value value NN 8935 34 58 is be VBZ 8935 34 59 the the DT 8935 34 60 binary binary JJ 8935 34 61 logarithm logarithm NN 8935 34 62 of of IN 8935 34 63 the the DT 8935 34 64 variety variety NN 8935 34 65 of of IN 8935 34 66 symbols symbol NNS 8935 34 67 , , , 8935 34 68 since since IN 8935 34 69 1 1 CD 8935 34 70 1 1 CD 8935 34 71 H H NNP 8935 34 72 = = SYM 8935 34 73 - - : 8935 34 74 n(-log2- n(-log2- NN 8935 34 75 ) ) -RRB- 8935 34 76 = = NFP 8935 34 77 log2n log2n NNP 8935 34 78 n n CC 8935 34 79 n n CC 8935 34 80 The the DT 8935 34 81 value value NN 8935 34 82 of of IN 8935 34 83 the the DT 8935 34 84 relative relative JJ 8935 34 85 entropy entropy NN 8935 34 86 is be VBZ 8935 34 87 thus thus RB 8935 34 88 a a DT 8935 34 89 measure measure NN 8935 34 90 of of IN 8935 34 91 the the DT 8935 34 92 degree degree NN 8935 34 93 of of IN 8935 34 94 equi- equi- JJ 8935 34 95 frequency frequency NN 8935 34 96 of of IN 8935 34 97 a a DT 8935 34 98 set set NN 8935 34 99 of of IN 8935 34 100 symbols symbol NNS 8935 34 101 , , , 8935 34 102 and and CC 8935 34 103 is be VBZ 8935 34 104 independent independent JJ 8935 34 105 of of IN 8935 34 106 their -PRON- PRP$ 8935 34 107 variety variety NN 8935 34 108 . . . 
8935 35 1 CHARACTERISTICS CHARACTERISTICS NNP 8935 35 2 OF of IN 8935 35 3 NAME NAME NNP 8935 35 4 FILE FILE VBD 8935 35 5 The the DT 8935 35 6 file file NN 8935 35 7 studied study VBD 8935 35 8 was be VBD 8935 35 9 a a DT 8935 35 10 collection collection NN 8935 35 11 of of IN 8935 35 12 100,000 100,000 CD 8935 35 13 personal personal JJ 8935 35 14 names name NNS 8935 35 15 taken take VBN 8935 35 16 from from IN 8935 35 17 ten ten CD 8935 35 18 issues issue NNS 8935 35 19 of of IN 8935 35 20 the the DT 8935 35 21 INSPEC INSPEC NNP 8935 35 22 data data NN 8935 35 23 base base NN 8935 35 24 dating date VBG 8935 35 25 from from IN 8935 35 26 the the DT 8935 35 27 period period NN 8935 35 28 1969 1969 CD 8935 35 29 to to IN 8935 35 30 1972 1972 CD 8935 35 31 . . . 8935 36 1 The the DT 8935 36 2 names name NNS 8935 36 3 are be VBP 8935 36 4 represented represent VBN 8935 36 5 in in IN 8935 36 6 variable variable JJ 8935 36 7 - - HYPH 8935 36 8 length length NN 8935 36 9 format format NN 8935 36 10 , , , 8935 36 11 surname surname NN 8935 36 12 followed follow VBN 8935 36 13 by by IN 8935 36 14 a a DT 8935 36 15 comma comma NN 8935 36 16 , , , 8935 36 17 space space NN 8935 36 18 and and CC 8935 36 19 initials initial NNS 8935 36 20 each each DT 8935 36 21 followed follow VBN 8935 36 22 by by IN 8935 36 23 a a DT 8935 36 24 period period NN 8935 36 25 . . . 8935 37 1 For for IN 8935 37 2 the the DT 8935 37 3 present present JJ 8935 37 4 purpose purpose NN 8935 37 5 , , , 8935 37 6 case case NN 8935 37 7 and and CC 8935 37 8 diacritic diacritic JJ 8935 37 9 shift shift NN 8935 37 10 symbols symbol NNS 8935 37 11 were be VBD 8935 37 12 ignored ignore VBN 8935 37 13 . . . 
8935 38 1 < < XX 8935 38 2 ~>To ~>To NFP 8935 38 3 appear appear VB 8935 38 4 in in IN 8935 38 5 the the DT 8935 38 6 September September NNP 8935 38 7 1974 1974 CD 8935 38 8 issue issue NN 8935 38 9 of of IN 8935 38 10 the the DT 8935 38 11 Journal Journal NNP 8935 38 12 of of IN 8935 38 13 Library Library NNP 8935 38 14 Automation Automation NNP 8935 38 15 . . . 8935 39 1 108 108 CD 8935 39 2 Journal Journal NNP 8935 39 3 of of IN 8935 39 4 Library Library NNP 8935 39 5 Automation Automation NNP 8935 39 6 Vol Vol NNP 8935 39 7 . . . 8935 40 1 7/2 7/2 CD 8935 40 2 June June NNP 8935 40 3 1974 1974 CD 8935 40 4 Subsets Subsets NNPS 8935 40 5 of of IN 8935 40 6 the the DT 8935 40 7 file file NN 8935 40 8 were be VBD 8935 40 9 first first RB 8935 40 10 sorted sort VBN 8935 40 11 into into IN 8935 40 12 sequence sequence NN 8935 40 13 on on IN 8935 40 14 the the DT 8935 40 15 basis basis NN 8935 40 16 of of IN 8935 40 17 the the DT 8935 40 18 full full JJ 8935 40 19 names name NNS 8935 40 20 , , , 8935 40 21 and and CC 8935 40 22 distributions distribution NNS 8935 40 23 determined determine VBD 8935 40 24 both both DT 8935 40 25 for for IN 8935 40 26 surnames surname NNS 8935 40 27 and and CC 8935 40 28 initials initial NNS 8935 40 29 , , , 8935 40 30 and and CC 8935 40 31 for for IN 8935 40 32 surnames surname NNS 8935 40 33 alone alone RB 8935 40 34 , , , 8935 40 35 as as IN 8935 40 36 shown show VBN 8935 40 37 in in IN 8935 40 38 Table table NN 8935 40 39 1 1 CD 8935 40 40 for for IN 8935 40 41 the the DT 8935 40 42 subset subset NN 8935 40 43 of of IN 8935 40 44 50,000 50,000 CD 8935 40 45 names name NNS 8935 40 46 . . . 
8935 41 1 Since since IN 8935 41 2 the the DT 8935 41 3 great great JJ 8935 41 4 majority majority NN 8935 41 5 of of IN 8935 41 6 full full JJ 8935 41 7 names name NNS 8935 41 8 occur occur VBP 8935 41 9 once once RB 8935 41 10 only only RB 8935 41 11 , , , 8935 41 12 the the DT 8935 41 13 relative relative NN 8935 41 14 en- en- XX 8935 41 15 tropy tropy NN 8935 41 16 of of IN 8935 41 17 this this DT 8935 41 18 distribution distribution NN 8935 41 19 , , , 8935 41 20 at at IN 8935 41 21 0.975 0.975 CD 8935 41 22 ( ( -LRB- 8935 41 23 computed compute VBN 8935 41 24 with with IN 8935 41 25 respect respect NN 8935 41 26 to to IN 8935 41 27 the the DT 8935 41 28 50,000 50,000 CD 8935 41 29 names name NNS 8935 41 30 , , , 8935 41 31 i.e. i.e. FW 8935 41 32 , , , 8935 41 33 Hmax= Hmax= NNP 8935 41 34 log250,000 log250,000 NNP 8935 41 35 ) ) -RRB- 8935 41 36 , , , 8935 41 37 is be VBZ 8935 41 38 high high JJ 8935 41 39 , , , 8935 41 40 while while IN 8935 41 41 that that IN 8935 41 42 for for IN 8935 41 43 surnames surname NNS 8935 41 44 alone alone RB 8935 41 45 is be VBZ 8935 41 46 lower low JJR 8935 41 47 , , , 8935 41 48 at at IN 8935 41 49 0.904 0.904 CD 8935 41 50 . . . 
8935 42 1 An an DT 8935 42 2 analysis analysis NN 8935 42 3 of of IN 8935 42 4 the the DT 8935 42 5 ratio ratio NN 8935 42 6 of of IN 8935 42 7 unique unique JJ 8935 42 8 surnames surname NNS 8935 42 9 to to IN 8935 42 10 the the DT 8935 42 11 total total JJ 8935 42 12 number number NN 8935 42 13 of of IN 8935 42 14 entries entry NNS 8935 42 15 in in IN 8935 42 16 files file NNS 8935 42 17 of of IN 8935 42 18 25,000 25,000 CD 8935 42 19 , , , 8935 42 20 50,000 50,000 CD 8935 42 21 , , , 8935 42 22 75,000 75,000 CD 8935 42 23 and and CC 8935 42 24 100,000 100,000 CD 8935 42 25 names name NNS 8935 42 26 showed show VBD 8935 42 27 that that IN 8935 42 28 the the DT 8935 42 29 proportion proportion NN 8935 42 30 of of IN 8935 42 31 different different JJ 8935 42 32 surnames surname NNS 8935 42 33 added add VBD 8935 42 34 to to IN 8935 42 35 the the DT 8935 42 36 file file NN 8935 42 37 as as IN 8935 42 38 it -PRON- PRP 8935 42 39 in- in- XX 8935 42 40 creases crease VBZ 8935 42 41 in in IN 8935 42 42 size size NN 8935 42 43 is be VBZ 8935 42 44 predictable predictable JJ 8935 42 45 . . . 8935 43 1 The the DT 8935 43 2 relationship relationship NN 8935 43 3 between between IN 8935 43 4 the the DT 8935 43 5 number number NN 8935 43 6 of of IN 8935 43 7 dif- dif- JJ 8935 43 8 ferent ferent JJ 8935 43 9 surnames surname NNS 8935 43 10 ( ( -LRB- 8935 43 11 D d NN 8935 43 12 ) ) -RRB- 8935 43 13 and and CC 8935 43 14 the the DT 8935 43 15 total total JJ 8935 43 16 number number NN 8935 43 17 of of IN 8935 43 18 entries entry NNS 8935 43 19 ( ( -LRB- 8935 43 20 N n NN 8935 43 21 ) ) -RRB- 8935 43 22 conforms conform VBZ 8935 43 23 to to IN 8935 43 24 the the DT 8935 43 25 expression expression NN 8935 43 26 : : : 8935 43 27 D D NNP 8935 43 28 = = SYM 8935 43 29 aN^β aN^β NNP 8935 43 30 where where WRB 8935 43 31 a a DT 8935 43 32 = = SYM 8935 43 33 5.89 5.89 CD 8935 43 34 and and CC 8935 43 35 β β NN 8935 43 36 = = SYM 8935 43 37 0.78 0.78 CD 8935 43 38 . 
. . 8935 44 1 Next next RB 8935 44 2 , , , 8935 44 3 the the DT 8935 44 4 frequencies frequency NNS 8935 44 5 of of IN 8935 44 6 characters character NNS 8935 44 7 at at IN 8935 44 8 different different JJ 8935 44 9 positions position NNS 8935 44 10 in in IN 8935 44 11 the the DT 8935 44 12 sur- sur- NN 8935 44 13 names name NNS 8935 44 14 and and CC 8935 44 15 of of IN 8935 44 16 the the DT 8935 44 17 initials initial NNS 8935 44 18 were be VBD 8935 44 19 determined determine VBN 8935 44 20 . . . 8935 45 1 The the DT 8935 45 2 most most RBS 8935 45 3 important important JJ 8935 45 4 positions position NNS 8935 45 5 in in IN 8935 45 6 the the DT 8935 45 7 surname surname NN 8935 45 8 are be VBP 8935 45 9 the the DT 8935 45 10 first first JJ 8935 45 11 and and CC 8935 45 12 last last JJ 8935 45 13 characters character NNS 8935 45 14 , , , 8935 45 15 as as IN 8935 45 16 will will MD 8935 45 17 be be VB 8935 45 18 seen see VBN 8935 45 19 shortly shortly RB 8935 45 20 . . . 8935 46 1 The the DT 8935 46 2 distributions distribution NNS 8935 46 3 of of IN 8935 46 4 these these DT 8935 46 5 characters character NNS 8935 46 6 and and CC 8935 46 7 of of IN 8935 46 8 the the DT 8935 46 9 first first JJ 8935 46 10 and and CC 8935 46 11 second second JJ 8935 46 12 initials initial NNS 8935 46 13 are be VBP 8935 46 14 shown show VBN 8935 46 15 in in IN 8935 46 16 Table Table NNP 8935 46 17 2 2 CD 8935 46 18 . . . 8935 47 1 The the DT 8935 47 2 relative relative JJ 8935 47 3 entropy entropy RB 8935 47 4 of of IN 8935 47 5 the the DT 8935 47 6 first first JJ 8935 47 7 initial initial JJ 8935 47 8 is be VBZ 8935 47 9 , , , 8935 47 10 interestingly interestingly RB 8935 47 11 , , , 8935 47 12 Table table NN 8935 47 13 1 1 CD 8935 47 14 . . . 
8935 48 1 Distribution distribution NN 8935 48 2 of of IN 8935 48 3 full full JJ 8935 48 4 names name NNS 8935 48 5 and and CC 8935 48 6 surnames surname NNS 8935 48 7 alone alone RB 8935 48 8 in in IN 8935 48 9 a a DT 8935 48 10 file file NN 8935 48 11 of of IN 8935 48 12 50,000 50,000 CD 8935 48 13 INSPEC INSPEC NNP 8935 48 14 names name NNS 8935 48 15 . . . 8935 49 1 Frequency Frequency NNP 8935 49 2 f f NNP 8935 49 3 1 1 CD 8935 49 4 2 2 CD 8935 49 5 3 3 CD 8935 49 6 4 4 CD 8935 49 7 5 5 CD 8935 49 8 6 6 CD 8935 49 9 7 7 CD 8935 49 10 8 8 CD 8935 49 11 9 9 CD 8935 49 12 10 10 CD 8935 49 13 11 11 CD 8935 49 14 12 12 CD 8935 49 15 13 13 CD 8935 49 16 14 14 CD 8935 49 17 15 15 CD 8935 49 18 16 16 CD 8935 49 19 17 17 CD 8935 49 20 18 18 CD 8935 49 21 19 19 CD 8935 49 22 20 20 CD 8935 49 23 > > CD 8935 49 24 20 20 CD 8935 49 25 Full full JJ 8935 49 26 Names name NNS 8935 49 27 No no UH 8935 49 28 . . . 8935 50 1 of of IN 8935 50 2 Names name NNS 8935 50 3 with with IN 8935 50 4 % % NN 8935 50 5 of of IN 8935 50 6 Names Names NNPS 8935 50 7 with with IN 8935 50 8 Frequency Frequency NNP 8935 50 9 f f NNP 8935 50 10 Frequency Frequency NNP 8935 50 11 f f NNP 8935 50 12 35,187 35,187 CD 8935 50 13 70.37 70.37 CD 8935 50 14 4,768 4,768 CD 8935 50 15 19.07 19.07 CD 8935 50 16 1,060 1,060 CD 8935 50 17 6.36 6.36 CD 8935 50 18 302 302 CD 8935 50 19 2.42 2.42 CD 8935 50 20 88 88 CD 8935 50 21 0.88 0.88 CD 8935 50 22 34 34 CD 8935 50 23 0.41 0.41 CD 8935 50 24 16 16 CD 8935 50 25 0.22 0.22 CD 8935 50 26 7 7 CD 8935 50 27 0.11 0.11 CD 8935 50 28 3 3 CD 8935 50 29 0.05 0.05 CD 8935 50 30 1 1 CD 8935 50 31 0.03 0.03 CD 8935 50 32 2 2 CD 8935 50 33 0.05 0.05 CD 8935 50 34 1 1 CD 8935 50 35 0.03 0.03 CD 8935 50 36 Total total JJ 8935 50 37 number number NN 8935 50 38 of of IN 8935 50 39 different different JJ 8935 50 40 full full JJ 8935 50 41 names name NNS 8935 50 42 = = SYM 8935 50 43 41,469 41,469 CD 8935 50 44 H h NN 8935 50 45 = = NFP 8935 50 46 15.22 15.22 CD 8935 50 47 
Hmax hmax NN 8935 50 48 = = SYM 8935 50 49 15.61 15.61 CD 8935 50 50 ( ( -LRB- 8935 50 51 log250,000 log250,000 NNP 8935 50 52 ) ) -RRB- 8935 50 53 Hr Hr NNP 8935 50 54 = = SYM 8935 50 55 0.9753 0.9753 LS 8935 50 56 Surnames surname NNS 8935 50 57 No no UH 8935 50 58 . . . 8935 51 1 of of IN 8935 51 2 Surnames Surnames NNP 8935 51 3 with with IN 8935 51 4 Frequencyf Frequencyf NNP 8935 51 5 % % NN 8935 51 6 of of IN 8935 51 7 Surnames Surnames NNP 8935 51 8 with with IN 8935 51 9 Frequencyf Frequencyf NNP 8935 51 10 19,894 19,894 CD 8935 51 11 39.79 39.79 CD 8935 51 12 4,258 4,258 CD 8935 51 13 17.03 17.03 CD 8935 51 14 1,597 1,597 CD 8935 51 15 706 706 CD 8935 51 16 395 395 CD 8935 51 17 235 235 CD 8935 51 18 134 134 CD 8935 51 19 104 104 CD 8935 51 20 68 68 CD 8935 51 21 54 54 CD 8935 51 22 36 36 CD 8935 51 23 39 39 CD 8935 51 24 36 36 CD 8935 51 25 28 28 CD 8935 51 26 24 24 CD 8935 51 27 24 24 CD 8935 51 28 15 15 CD 8935 51 29 19 19 CD 8935 51 30 16 16 CD 8935 51 31 9 9 CD 8935 51 32 112 112 CD 8935 51 33 9.58 9.58 CD 8935 51 34 5.65 5.65 CD 8935 51 35 3.75 3.75 CD 8935 51 36 2.82 2.82 CD 8935 51 37 1.88 1.88 CD 8935 51 38 1.66 1.66 CD 8935 51 39 1.22 1.22 CD 8935 51 40 1.08 1.08 CD 8935 51 41 0.79 0.79 CD 8935 51 42 0.94 0.94 CD 8935 51 43 0.94 0.94 CD 8935 51 44 0.78 0.78 CD 8935 51 45 0.72 0.72 CD 8935 51 46 0.77 0.77 CD 8935 51 47 0.51 0.51 CD 8935 51 48 0.68 0.68 CD 8935 51 49 0.61 0.61 CD 8935 51 50 0.36 0.36 CD 8935 51 51 8.44 8.44 CD 8935 51 52 Total total JJ 8935 51 53 number number NN 8935 51 54 of of IN 8935 51 55 different different JJ 8935 51 56 surnames surname NNS 8935 51 57 = = SYM 8935 51 58 27,803 27,803 CD 8935 51 59 H h NN 8935 51 60 = = NFP 8935 51 61 14.11 14.11 CD 8935 51 62 Hmax Hmax NNP 8935 51 63 = = SYM 8935 51 64 15.61 15.61 CD 8935 51 65 ( ( -LRB- 8935 51 66 log250,000 log250,000 NNP 8935 51 67 ) ) -RRB- 8935 51 68 Hr Hr NNP 8935 51 69 = = SYM 8935 51 70 0.9042 0.9042 CD 8935 51 71 Variety Variety NNP 8935 51 72 - - HYPH 8935 51 73 
Generator Generator NNP 8935 51 74 Approach/FOKKER Approach/FOKKER NNP 8935 51 75 and and CC 8935 51 76 LYNCH LYNCH NNP 8935 51 77 109 109 CD 8935 51 78 Table Table NNP 8935 51 79 2 2 CD 8935 51 80 . . . 8935 52 1 Distributions distribution NNS 8935 52 2 of of IN 8935 52 3 first first JJ 8935 52 4 and and CC 8935 52 5 last last JJ 8935 52 6 characters character NNS 8935 52 7 of of IN 8935 52 8 surname surname NN 8935 52 9 and and CC 8935 52 10 of of IN 8935 52 11 initials initial NNS 8935 52 12 in in IN 8935 52 13 50,000 50,000 CD 8935 52 14 INSPEC INSPEC NNP 8935 52 15 name name NN 8935 52 16 file file NN 8935 52 17 . . . 8935 53 1 First First NNP 8935 53 2 Character Character NNP 8935 53 3 Last Last NNP 8935 53 4 Character Character NNP 8935 53 5 First First NNP 8935 53 6 Second Second NNP 8935 53 7 of of IN 8935 53 8 Surname Surname NNP 8935 53 9 of of IN 8935 53 10 Surname Surname NNP 8935 53 11 Initial Initial NNP 8935 53 12 Initial Initial NNP 8935 53 13 s s NNP 8935 53 14 0.113 0.113 CD 8935 53 15 N n CD 8935 53 16 0.164 0.164 CD 8935 53 17 J J NNP 8935 53 18 0.100 0.100 CD 8935 53 19 Space space NN 8935 53 20 0.371 0.371 CD 8935 53 21 B b NN 8935 53 22 0.083 0.083 CD 8935 53 23 R r NN 8935 53 24 0.102 0.102 CD 8935 53 25 A a NN 8935 53 26 0.083 0.083 CD 8935 53 27 A a NN 8935 53 28 0.066 0.066 CD 8935 53 29 M m CD 8935 53 30 0.080 0.080 CD 8935 53 31 A a DT 8935 53 32 0.084 0.084 CD 8935 53 33 R r NN 8935 53 34 0.081 0.081 CD 8935 53 35 M m NN 8935 53 36 0.045 0.045 CD 8935 53 37 K K NNP 8935 53 38 0.076 0.076 CD 8935 53 39 s s NN 8935 53 40 0.082 0.082 CD 8935 53 41 M m NN 8935 53 42 0.064 0.064 CD 8935 53 43 J J NNP 8935 53 44 0.043 0.043 CD 8935 53 45 H h NN 8935 53 46 0.056 0.056 CD 8935 53 47 I -PRON- PRP 8935 53 48 0.074 0.074 CD 8935 53 49 G g NN 8935 53 50 0.058 0.058 CD 8935 53 51 s s NN 8935 53 52 0.035 0.035 CD 8935 53 53 G g NN 8935 53 54 0.055 0.055 CD 8935 53 55 E e NN 8935 53 56 0.068 0.068 CD 8935 53 57 v v NN 8935 53 58 0.051 0.051 CD 
8935 53 59 L l NN 8935 53 60 0.033 0.033 CD 8935 53 61 p p NN 8935 53 62 0.053 0.053 CD 8935 53 63 v v NN 8935 53 64 0.067 0.067 CD 8935 53 65 D d NN 8935 53 66 0.050 0.050 CD 8935 53 67 E e NN 8935 53 68 0.033 0.033 CD 8935 53 69 c c NN 8935 53 70 0.052 0.052 CD 8935 53 71 y y NN 8935 53 72 0.043 0.043 CD 8935 53 73 H h NN 8935 53 74 0.050 0.050 CD 8935 53 75 R r NN 8935 53 76 0.031 0.031 CD 8935 53 77 R r NN 8935 53 78 0.047 0.047 CD 8935 53 79 T t NN 8935 53 80 0.042 0.042 CD 8935 53 81 s s NN 8935 53 82 0.047 0.047 CD 8935 53 83 p p NN 8935 53 84 0.031 0.031 CD 8935 53 85 L l NN 8935 53 86 0.047 0.047 CD 8935 53 87 0 0 CD 8935 53 88 0.041 0.041 CD 8935 53 89 E e NN 8935 53 90 0.043 0.043 CD 8935 53 91 G g NN 8935 53 92 . . . 8935 54 1 0.030 0.030 CD 8935 54 2 D d NN 8935 54 3 0.044 0.044 CD 8935 54 4 L l NN 8935 54 5 0.040 0.040 CD 8935 54 6 p p NN 8935 54 7 0.042 0.042 CD 8935 54 8 c c NN 8935 54 9 0.030 0.030 CD 8935 54 10 T t NN 8935 54 11 0.040 0.040 CD 8935 54 12 H h NN 8935 54 13 0.037 0.037 CD 8935 54 14 w w NN 8935 54 15 0.038 0.038 CD 8935 54 16 w w NN 8935 54 17 0.028 0.028 CD 8935 54 18 w w NN 8935 54 19 0.040 0.040 CD 8935 54 20 K k NN 8935 54 21 0.033 0.033 CD 8935 54 22 K k NN 8935 54 23 0.036 0.036 CD 8935 54 24 v v NN 8935 54 25 0.028 0.028 CD 8935 54 26 A a NN 8935 54 27 0.036 0.036 CD 8935 54 28 D d NN 8935 54 29 0.030 0.030 CD 8935 54 30 L l NN 8935 54 31 0.036 0.036 CD 8935 54 32 H h NN 8935 54 33 0.027 0.027 CD 8935 54 34 F f NN 8935 54 35 0.034 0.034 CD 8935 54 36 G g NN 8935 54 37 0.026 0.026 CD 8935 54 38 c c NN 8935 54 39 0.035 0.035 CD 8935 54 40 D d NN 8935 54 41 0.026 0.026 CD 8935 54 42 N n CD 8935 54 43 0.025 0.025 CD 8935 54 44 z z NN 8935 54 45 0.013 0.013 CD 8935 54 46 T t NN 8935 54 47 0.033 0.033 CD 8935 54 48 I -PRON- PRP 8935 54 49 0.026 0.026 CD 8935 54 50 v v NN 8935 54 51 0.025 0.025 CD 8935 54 52 M m NN 8935 54 53 0.013 0.013 CD 8935 54 54 B b NN 8935 54 55 0.032 0.032 CD 8935 54 56 F f NN 8935 54 57 0.024 0.024 CD 8935 
54 58 E E NNP 8935 54 59 0.018 0.018 CD 8935 54 60 u u CD 8935 54 61 0.013 0.013 CD 8935 54 62 N n CD 8935 54 63 0.026 0.026 CD 8935 54 64 N n CD 8935 54 65 0.024 0.024 CD 8935 54 66 J j NN 8935 54 67 0.017 0.017 CD 8935 54 68 F f NN 8935 54 69 0.006 0.006 CD 8935 54 70 F f NN 8935 54 71 0.026 0.026 CD 8935 54 72 K K NNP 8935 54 73 0.022 0.022 CD 8935 54 74 0 0 CD 8935 54 75 0.016 0.016 CD 8935 54 76 c c NN 8935 54 77 0.005 0.005 CD 8935 54 78 I -PRON- PRP 8935 54 79 0.023 0.023 CD 8935 54 80 B b NN 8935 54 81 0.020 0.020 CD 8935 54 82 z z NNP 8935 54 83 0.013 0.013 CD 8935 54 84 w w NN 8935 54 85 0.005 0.005 CD 8935 54 86 y y NNP 8935 54 87 0.023 0.023 CD 8935 54 88 T t NN 8935 54 89 0.013 0.013 CD 8935 54 90 I -PRON- PRP 8935 54 91 0.013 0.013 CD 8935 54 92 p p NN 8935 54 93 0.004 0.004 CD 8935 54 94 0 0 CD 8935 54 95 0.010 0.010 CD 8935 54 96 y y NNP 8935 54 97 0.007 0.007 CD 8935 54 98 y y NN 8935 54 99 0.011 0.011 CD 8935 54 100 X x NN 8935 54 101 0.004 0.004 CD 8935 54 102 Space space NN 8935 54 103 0.005 0.005 CD 8935 54 104 0 0 CD 8935 54 105 0.005 0.005 CD 8935 54 106 u u NN 8935 54 107 0.005 0.005 CD 8935 54 108 B B NNP 8935 54 109 0.003 0.003 CD 8935 54 110 z z NN 8935 54 111 0.005 0.005 CD 8935 54 112 z z NN 8935 54 113 0.002 0.002 CD 8935 54 114 Q q NN 8935 54 115 0.001 0.001 CD 8935 54 116 J J NNP 8935 54 117 0.001 0.001 CD 8935 54 118 u u CD 8935 54 119 0.004 0.004 CD 8935 54 120 u u NN 8935 54 121 0.001 0.001 CD 8935 54 122 X x NN 8935 54 123 Q q NN 8935 54 124 0.0002 0.0002 CD 8935 54 125 Q q NN 8935 54 126 0.0002 0.0002 CD 8935 54 127 Q q NN 8935 54 128 0.0002 0.0002 CD 8935 54 129 X x NN 8935 54 130 0.0001 0.0001 CD 8935 54 131 X x NN 8935 54 132 0.0001 0.0001 CD 8935 54 133 H h NN 8935 54 134 = = SYM 8935 54 135 4.309 4.309 CD 8935 54 136 H h NN 8935 54 137 = = SYM 8935 54 138 4.039 4.039 CD 8935 54 139 H h NN 8935 54 140 = = SYM 8935 54 141 4.374 4.374 CD 8935 54 142 H h NN 8935 54 143 = = SYM 8935 54 144 3.688 3.688 CD 8935 54 145 Hmax Hmax 
NNP 8935 54 146 = = SYM 8935 54 147 4 4 CD 8935 54 148 . . . 8935 55 1 700 700 CD 8935 55 2 ( ( -LRB- 8935 55 3 log,26 log,26 . 8935 55 4 ) ) -RRB- 8935 55 5 Hmax Hmax NNP 8935 55 6 = = SYM 8935 55 7 4 4 CD 8935 55 8 . . . 8935 56 1 700 700 CD 8935 56 2 ( ( -LRB- 8935 56 3 log,26 log,26 . 8935 56 4 ) ) -RRB- 8935 56 5 Hmax hmax VB 8935 56 6 = = SYM 8935 56 7 4.755 4.755 CD 8935 56 8 ( ( -LRB- 8935 56 9 Iog.27 iog.27 CD 8935 56 10 ) ) -RRB- 8935 56 11 Hmax Hmax NNP 8935 56 12 = = SYM 8935 56 13 4 4 CD 8935 56 14 , , , 8935 56 15 755 755 CD 8935 56 16 ( ( -LRB- 8935 56 17 log,27 log,27 NNP 8935 56 18 ) ) -RRB- 8935 56 19 Hr Hr NNP 8935 56 20 = = NFP 8935 56 21 0.917 0.917 CD 8935 56 22 Hr Hr NNP 8935 56 23 = = SYM 8935 56 24 0.859 0.859 CD 8935 56 25 H. h. NN 8935 56 26 = = SYM 8935 56 27 0.920 0.920 CD 8935 56 28 H. h. NN 8935 56 29 = = SYM 8935 56 30 0.776 0.776 CD 8935 56 31 the the DT 8935 56 32 highest high JJS 8935 56 33 of of IN 8935 56 34 the the DT 8935 56 35 four four CD 8935 56 36 ; ; : 8935 56 37 the the DT 8935 56 38 highest highest RBS 8935 56 39 ranking ranking JJ 8935 56 40 initial initial NN 8935 56 41 is be VBZ 8935 56 42 J J NNP 8935 56 43 , , , 8935 56 44 which which WDT 8935 56 45 is be VBZ 8935 56 46 one one CD 8935 56 47 of of IN 8935 56 48 the the DT 8935 56 49 least least JJS 8935 56 50 frequent frequent JJ 8935 56 51 characters character NNS 8935 56 52 in in IN 8935 56 53 English english JJ 8935 56 54 text text NN 8935 56 55 . . . 8935 57 1 Thereafter thereafter RB 8935 57 2 follow follow VB 8935 57 3 the the DT 8935 57 4 first first JJ 8935 57 5 and and CC 8935 57 6 last last JJ 8935 57 7 letters letter NNS 8935 57 8 of of IN 8935 57 9 the the DT 8935 57 10 surname surname NN 8935 57 11 , , , 8935 57 12 and and CC 8935 57 13 the the DT 8935 57 14 second second JJ 8935 57 15 initial initial NN 8935 57 16 . . . 
8935 58 1 The the DT 8935 58 2 low low JJ 8935 58 3 relative relative JJ 8935 58 4 entropy entropy RB 8935 58 5 of of IN 8935 58 6 the the DT 8935 58 7 last last JJ 8935 58 8 is be VBZ 8935 58 9 partly partly RB 8935 58 10 accounted account VBN 8935 58 11 for for IN 8935 58 12 by by IN 8935 58 13 the the DT 8935 58 14 fact fact NN 8935 58 15 that that IN 8935 58 16 a a DT 8935 58 17 single single JJ 8935 58 18 initial initial NN 8935 58 19 occurred occur VBD 8935 58 20 in in IN 8935 58 21 37 37 CD 8935 58 22 percent percent NN 8935 58 23 of of IN 8935 58 24 the the DT 8935 58 25 entries entry NNS 8935 58 26 . . . 8935 59 1 Distributions distribution NNS 8935 59 2 were be VBD 8935 59 3 also also RB 8935 59 4 obtained obtain VBN 8935 59 5 for for IN 8935 59 6 the the DT 8935 59 7 second second JJ 8935 59 8 and and CC 8935 59 9 subsequent subsequent JJ 8935 59 10 char- char- JJ 8935 59 11 acters acter NNS 8935 59 12 of of IN 8935 59 13 the the DT 8935 59 14 surname surname NN 8935 59 15 . . . 
8935 60 1 These these DT 8935 60 2 , , , 8935 60 3 and and CC 8935 60 4 also also RB 8935 60 5 the the DT 8935 60 6 distributions distribution NNS 8935 60 7 of of IN 8935 60 8 the the DT 8935 60 9 first first JJ 8935 60 10 char- char- XX 8935 60 11 acter acter NN 8935 60 12 , , , 8935 60 13 are be VBP 8935 60 14 in in IN 8935 60 15 general general JJ 8935 60 16 agreement agreement NN 8935 60 17 with with IN 8935 60 18 the the DT 8935 60 19 results result NNS 8935 60 20 of of IN 8935 60 21 earlier early JJR 8935 60 22 studies study NNS 8935 60 23 by by IN 8935 60 24 Bourne Bourne NNP 8935 60 25 and and CC 8935 60 26 Ford Ford NNP 8935 60 27 , , , 8935 60 28 and and CC 8935 60 29 by by IN 8935 60 30 Ohlman Ohlman NNP 8935 60 31 , , , 8935 60 32 and and CC 8935 60 33 indicate indicate VBP 8935 60 34 that that IN 8935 60 35 consonants consonant NNS 8935 60 36 predom- predom- NNP 8935 60 37 inate inate NN 8935 60 38 in in IN 8935 60 39 the the DT 8935 60 40 first first JJ 8935 60 41 position position NN 8935 60 42 , , , 8935 60 43 vowels vowel NNS 8935 60 44 in in IN 8935 60 45 the the DT 8935 60 46 second second JJ 8935 60 47 position position NN 8935 60 48 , , , 8935 60 49 while while IN 8935 60 50 thereafter thereafter RB 8935 60 51 the the DT 8935 60 52 distributions distribution NNS 8935 60 53 become become VBP 8935 60 54 less less RBR 8935 60 55 disparate disparate JJ 8935 60 56 . . . 
8935 61 1 15• 15• NNP 8935 61 2 16 16 CD 8935 61 3 However however RB 8935 61 4 , , , 8935 61 5 due due IN 8935 61 6 to to IN 8935 61 7 the the DT 8935 61 8 variable variable JJ 8935 61 9 lengths length NNS 8935 61 10 of of IN 8935 61 11 names name NNS 8935 61 12 , , , 8935 61 13 the the DT 8935 61 14 dominant dominant JJ 8935 61 15 character character NN 8935 61 16 at at IN 8935 61 17 the the DT 8935 61 18 sixth sixth JJ 8935 61 19 and and CC 8935 61 20 subsequent subsequent JJ 8935 61 21 po- po- JJ 8935 61 22 sitions sition NNS 8935 61 23 of of IN 8935 61 24 the the DT 8935 61 25 surname surname NN 8935 61 26 is be VBZ 8935 61 27 the the DT 8935 61 28 space space NN 8935 61 29 character character NN 8935 61 30 . . . 8935 62 1 KEY KEY NNP 8935 62 2 - - HYPH 8935 62 3 SET set NN 8935 62 4 GENERATION GENERATION NNP 8935 62 5 TECHNIQUE TECHNIQUE VBD 8935 62 6 The the DT 8935 62 7 basic basic JJ 8935 62 8 key key JJ 8935 62 9 - - HYPH 8935 62 10 set set NN 8935 62 11 generation generation NN 8935 62 12 technique technique NN 8935 62 13 involves involve VBZ 8935 62 14 creating create VBG 8935 62 15 fixed fix VBN 8935 62 16 - - HYPH 8935 62 17 length length NN 8935 62 18 110 110 CD 8935 62 19 Journal Journal NNP 8935 62 20 of of IN 8935 62 21 Library Library NNP 8935 62 22 Automation Automation NNP 8935 62 23 Vol Vol NNP 8935 62 24 . . . 
8935 63 1 7/2 7/2 CD 8935 63 2 June June NNP 8935 63 3 1974 1974 CD 8935 63 4 n n CD 8935 63 5 - - : 8935 63 6 grams gram NNS 8935 63 7 from from IN 8935 63 8 some some DT 8935 63 9 point point NN 8935 63 10 or or CC 8935 63 11 points point NNS 8935 63 12 of of IN 8935 63 13 reference reference NN 8935 63 14 within within IN 8935 63 15 each each DT 8935 63 16 record record NN 8935 63 17 , , , 8935 63 18 the the DT 8935 63 19 strings string NNS 8935 63 20 generated generate VBD 8935 63 21 being be VBG 8935 63 22 initially initially RB 8935 63 23 of of IN 8935 63 24 length length NN 8935 63 25 greater great JJR 8935 63 26 than than IN 8935 63 27 those those DT 8935 63 28 anticipated anticipate VBN 8935 63 29 within within IN 8935 63 30 the the DT 8935 63 31 key key NN 8935 63 32 - - HYPH 8935 63 33 set set NN 8935 63 34 . . . 8935 64 1 These these DT 8935 64 2 strings string NNS 8935 64 3 are be VBP 8935 64 4 sorted sort VBN 8935 64 5 into into IN 8935 64 6 lexicographic lexicographic JJ 8935 64 7 order order NN 8935 64 8 and and CC 8935 64 9 counted count VBD 8935 64 10 . . . 8935 65 1 ( ( -LRB- 8935 65 2 The the DT 8935 65 3 resultant resultant JJ 8935 65 4 distribution distribution NN 8935 65 5 of of IN 8935 65 6 the the DT 8935 65 7 fixed fix VBN 8935 65 8 - - HYPH 8935 65 9 length length NN 8935 65 10 strings string NNS 8935 65 11 is be VBZ 8935 65 12 again again RB 8935 65 13 hy- hy- JJ 8935 65 14 perbolic perbolic NN 8935 65 15 . . . 
8935 65 16 ) ) -RRB- 8935 66 1 The the DT 8935 66 2 frequencies frequency NNS 8935 66 3 are be VBP 8935 66 4 compared compare VBN 8935 66 5 with with IN 8935 66 6 a a DT 8935 66 7 predetermined predetermined JJ 8935 66 8 threshold threshold NN 8935 66 9 frequency frequency NN 8935 66 10 - - : 8935 66 11 at at IN 8935 66 12 the the DT 8935 66 13 first first JJ 8935 66 14 stage stage NN 8935 66 15 none none NN 8935 66 16 of of IN 8935 66 17 the the DT 8935 66 18 string string NN 8935 66 19 frequencies frequency NNS 8935 66 20 should should MD 8935 66 21 exceed exceed VB 8935 66 22 this this DT 8935 66 23 value value NN 8935 66 24 . . . 8935 67 1 The the DT 8935 67 2 strings string NNS 8935 67 3 are be VBP 8935 67 4 then then RB 8935 67 5 shortened shorten VBN 8935 67 6 by by IN 8935 67 7 truncation truncation NN 8935 67 8 of of IN 8935 67 9 the the DT 8935 67 10 right right JJ 8935 67 11 - - HYPH 8935 67 12 hand hand NN 8935 67 13 character character NN 8935 67 14 , , , 8935 67 15 and and CC 8935 67 16 the the DT 8935 67 17 frequencies frequency NNS 8935 67 18 of of IN 8935 67 19 the the DT 8935 67 20 strings string NNS 8935 67 21 which which WDT 8935 67 22 have have VBP 8935 67 23 become become VBN 8935 67 24 identical identical JJ 8935 67 25 through through IN 8935 67 26 truncation truncation NN 8935 67 27 are be VBP 8935 67 28 accumulated accumulate VBN 8935 67 29 . . . 8935 68 1 The the DT 8935 68 2 new new JJ 8935 68 3 n n JJ 8935 68 4 - - HYPH 8935 68 5 gram gram NN 8935 68 6 frequencies frequency NNS 8935 68 7 are be VBP 8935 68 8 compared compare VBN 8935 68 9 with with IN 8935 68 10 the the DT 8935 68 11 threshold threshold NN 8935 68 12 value value NN 8935 68 13 ; ; : 8935 68 14 any any DT 8935 68 15 strings string NNS 8935 68 16 which which WDT 8935 68 17 exceed exceed VBP 8935 68 18 the the DT 8935 68 19 value value NN 8935 68 20 are be VBP 8935 68 21 noted note VBN 8935 68 22 . . . 
8935 69 1 The the DT 8935 69 2 procedure procedure NN 8935 69 3 is be VBZ 8935 69 4 repeated repeat VBN 8935 69 5 until until IN 8935 69 6 the the DT 8935 69 7 single single JJ 8935 69 8 characters character NNS 8935 69 9 are be VBP 8935 69 10 reached reach VBN 8935 69 11 . . . 8935 70 1 Two two CD 8935 70 2 types type NNS 8935 70 3 of of IN 8935 70 4 analysis analysis NN 8935 70 5 are be VBP 8935 70 6 possible possible JJ 8935 70 7 , , , 8935 70 8 redundant redundant JJ 8935 70 9 and and CC 8935 70 10 nonredundant nonredundant JJ 8935 70 11 . . . 8935 71 1 · · NFP 8935 71 2 In in IN 8935 71 3 the the DT 8935 71 4 latter latter JJ 8935 71 5 , , , 8935 71 6 any any DT 8935 71 7 string string NN 8935 71 8 exceeding exceed VBG 8935 71 9 the the DT 8935 71 10 threshold threshold NN 8935 71 11 value value NN 8935 71 12 is be VBZ 8935 71 13 removed remove VBN 8935 71 14 from from IN 8935 71 15 the the DT 8935 71 16 list list NN 8935 71 17 and and CC 8935 71 18 not not RB 8935 71 19 processed process VBN 8935 71 20 further further RB 8935 71 21 , , , 8935 71 22 while while IN 8935 71 23 in in IN 8935 71 24 the the DT 8935 71 25 former former JJ 8935 71 26 they -PRON- PRP 8935 71 27 continue continue VBP 8935 71 28 to to IN 8935 71 29 the the DT 8935 71 30 next next JJ 8935 71 31 processing processing NN 8935 71 32 stage stage NN 8935 71 33 . . . 8935 72 1 While while IN 8935 72 2 redundant redundant JJ 8935 72 3 analysis analysis NN 8935 72 4 is be VBZ 8935 72 5 valuable valuable JJ 8935 72 6 at at IN 8935 72 7 the the DT 8935 72 8 exploratory exploratory JJ 8935 72 9 stage stage NN 8935 72 10 , , , 8935 72 11 the the DT 8935 72 12 nonredundant nonredundant JJ 8935 72 13 type type NN 8935 72 14 is be VBZ 8935 72 15 preferred prefer VBN 8935 72 16 for for IN 8935 72 17 key key JJ 8935 72 18 - - HYPH 8935 72 19 set set NN 8935 72 20 generation generation NN 8935 72 21 . . . 
8935 73 1 The the DT 8935 73 2 procedure procedure NN 8935 73 3 was be VBD 8935 73 4 first first RB 8935 73 5 applied apply VBN 8935 73 6 to to IN 8935 73 7 strings string NNS 8935 73 8 of of IN 8935 73 9 characters character NNS 8935 73 10 starting start VBG 8935 73 11 with with IN 8935 73 12 the the DT 8935 73 13 first first JJ 8935 73 14 character character NN 8935 73 15 of of IN 8935 73 16 each each DT 8935 73 17 surname surname NN 8935 73 18 , , , 8935 73 19 as as IN 8935 73 20 illustrated illustrate VBN 8935 73 21 in in IN 8935 73 22 Figure Figure NNP 8935 73 23 1 1 CD 8935 73 24 . . . 8935 73 25 n n NN 8935 73 26 - - HYPH 8935 73 27 gram gram NN 8935 73 28 FOREMAN FOREMAN NNP 8935 73 29 FOREMA FOREMA NNP 8935 73 30 FOREM FOREM NNP 8935 73 31 FORE FORE NNP 8935 73 32 FOR for IN 8935 73 33 FO FO NNP 8935 73 34 F F NNP 8935 73 35 Frequency Frequency NNP 8935 73 36 11 11 CD 8935 73 37 13 13 CD 8935 73 38 24 24 CD 8935 73 39 98 98 CD 8935 73 40 143 143 CD 8935 73 41 214 214 CD 8935 73 42 1685 1685 CD 8935 73 43 Fig Fig NNP 8935 73 44 . . . 8935 74 1 1 1 LS 8935 74 2 . . . 8935 75 1 Successive successive JJ 8935 75 2 right right JJ 8935 75 3 - - HYPH 8935 75 4 hand hand NN 8935 75 5 truncations truncation NNS 8935 75 6 of of IN 8935 75 7 a a DT 8935 75 8 surname surname NN 8935 75 9 during during IN 8935 75 10 key key JJ 8935 75 11 - - HYPH 8935 75 12 set set NN 8935 75 13 generation generation NN 8935 75 14 Here here RB 8935 75 15 the the DT 8935 75 16 frequency frequency NN 8935 75 17 of of IN 8935 75 18 the the DT 8935 75 19 surname surname NN 8935 75 20 FOREMAN FOREMAN NNP 8935 75 21 in in IN 8935 75 22 a a DT 8935 75 23 _ _ NNP 8935 75 24 file file NN 8935 75 25 of of IN 8935 75 26 50,000 50,000 CD 8935 75 27 names name NNS 8935 75 28 is be VBZ 8935 75 29 eleven eleven CD 8935 75 30 . . . 
8935 76 1 When when WRB 8935 76 2 successively successively RB 8935 76 3 shortened shorten VBN 8935 76 4 , , , 8935 76 5 other other JJ 8935 76 6 surnames surname NNS 8935 76 7 with with IN 8935 76 8 the the DT 8935 76 9 same same JJ 8935 76 10 ini- ini- DT 8935 76 11 tial tial JJ 8935 76 12 n n NN 8935 76 13 - - HYPH 8935 76 14 gram gram NN 8935 76 15 are be VBP 8935 76 16 included include VBN 8935 76 17 in in IN 8935 76 18 the the DT 8935 76 19 count count NN 8935 76 20 . . . 8935 77 1 Comparison comparison NN 8935 77 2 of of IN 8935 77 3 the the DT 8935 77 4 count count NN 8935 77 5 with with IN 8935 77 6 a a DT 8935 77 7 threshold threshold NN 8935 77 8 value value NN 8935 77 9 results result NNS 8935 77 10 in in IN 8935 77 11 selection selection NN 8935 77 12 of of IN 8935 77 13 a a DT 8935 77 14 key key NN 8935 77 15 . . . 8935 78 1 Here here RB 8935 78 2 , , , 8935 78 3 if if IN 8935 78 4 the the DT 8935 78 5 threshold threshold NN 8935 78 6 were be VBD 8935 78 7 100 100 CD 8935 78 8 , , , 8935 78 9 the the DT 8935 78 10 key key NN 8935 78 11 selected select VBN 8935 78 12 would would MD 8935 78 13 be be VB 8935 78 14 FOR for IN 8935 78 15 . . . 
8935 79 1 Application application NN 8935 79 2 of of IN 8935 79 3 the the DT 8935 79 4 procedure procedure NN 8935 79 5 to to IN 8935 79 6 the the DT 8935 79 7 surnames surname NNS 8935 79 8 of of IN 8935 79 9 the the DT 8935 79 10 50,000 50,000 CD 8935 79 11 name name NN 8935 79 12 file file NN 8935 79 13 ( ( -LRB- 8935 79 14 the the DT 8935 79 15 name name NN 8935 79 16 records record NNS 8935 79 17 had have VBD 8935 79 18 a a DT 8935 79 19 maximum maximum NN 8935 79 20 of of IN 8935 79 21 eighteen eighteen CD 8935 79 22 characters character NNS 8935 79 23 , , , 8935 79 24 left leave VBN 8935 79 25 - - HYPH 8935 79 26 justified justified JJ 8935 79 27 and and CC 8935 79 28 space space NN 8935 79 29 - - HYPH 8935 79 30 filled fill VBN 8935 79 31 if if IN 8935 79 32 less less JJR 8935 79 33 than than IN 8935 79 34 this this DT 8935 79 35 length length NN 8935 79 36 ) ) -RRB- 8935 79 37 , , , 8935 79 38 with with IN 8935 79 39 a a DT 8935 79 40 threshold threshold NN 8935 79 41 frequency frequency NN 8935 79 42 of of IN 8935 79 43 300 300 CD 8935 79 44 ( ( -LRB- 8935 79 45 i.e. i.e. FW 8935 79 46 , , , 8935 79 47 a a DT 8935 79 48 probability probability NN 8935 79 49 of of IN 8935 79 50 0.006 0.006 CD 8935 79 51 ) ) -RRB- 8935 79 52 , , , 8935 79 53 gave give VBD 8935 79 54 a a DT 8935 79 55 key key NN 8935 79 56 - - HYPH 8935 79 57 set set NN 8935 79 58 consisting consisting NN 8935 79 59 of of IN 8935 79 60 eighty eighty CD 8935 79 61 - - HYPH 8935 79 62 seven seven CD 8935 79 63 keys key NNS 8935 79 64 , , , 8935 79 65 including include VBG 8935 79 66 all all PDT 8935 79 67 the the DT 8935 79 68 alphabetic alphabetic JJ 8935 79 69 characters character NNS 8935 79 70 . . . 
8935 80 1 The the DT 8935 80 2 key key NN 8935 80 3 - - HYPH 8935 80 4 set set NN 8935 80 5 is be VBZ 8935 80 6 shown show VBN 8935 80 7 , , , 8935 80 8 in in IN 8935 80 9 al- al- JJ 8935 80 10 phabetic phabetic JJ 8935 80 11 order order NN 8935 80 12 , , , 8935 80 13 together together RB 8935 80 14 with with IN 8935 80 15 the the DT 8935 80 16 probabilities probability NNS 8935 80 17 , , , 8935 80 18 in in IN 8935 80 19 Table table NN 8935 80 20 3 3 CD 8935 80 21 . . . 8935 81 1 It -PRON- PRP 8935 81 2 is be VBZ 8935 81 3 clear clear JJ 8935 81 4 that that IN 8935 81 5 the the DT 8935 81 6 most most RBS 8935 81 7 frequent frequent JJ 8935 81 8 characters character NNS 8935 81 9 at at IN 8935 81 10 the the DT 8935 81 11 beginning beginning NN 8935 81 12 of of IN 8935 81 13 the the DT 8935 81 14 surname surname NN 8935 81 15 have have VBP 8935 81 16 pro- pro- NN 8935 81 17 duced duced JJ 8935 81 18 most most JJS 8935 81 19 keys key NNS 8935 81 20 , , , 8935 81 21 S S NNP 8935 81 22 and and CC 8935 81 23 M M NNP 8935 81 24 with with IN 8935 81 25 eight eight CD 8935 81 26 keys key NNS 8935 81 27 each each DT 8935 81 28 , , , 8935 81 29 B b NN 8935 81 30 with with IN 8935 81 31 seven seven CD 8935 81 32 , , , 8935 81 33 K k NN 8935 81 34 with with IN 8935 81 35 six six CD 8935 81 36 , , , 8935 81 37 and and CC 8935 81 38 H h NN 8935 81 39 , , , 8935 81 40 G g NN 8935 81 41 , , , 8935 81 42 P p NN 8935 81 43 , , , 8935 81 44 and and CC 8935 81 45 R r NN 8935 81 46 each each DT 8935 81 47 with with IN 8935 81 48 five five CD 8935 81 49 keys key NNS 8935 81 50 . . . 
8935 82 1 Whereas whereas IN 8935 82 2 the the DT 8935 82 3 relative relative JJ 8935 82 4 entropy entropy RB 8935 82 5 of of IN 8935 82 6 the the DT 8935 82 7 initial initial JJ 8935 82 8 surname surname NN 8935 82 9 letter letter NN 8935 82 10 was be VBD 8935 82 11 0.917 0.917 CD 8935 82 12 , , , 8935 82 13 that that DT 8935 82 14 of of IN 8935 82 15 the the DT 8935 82 16 key key NN 8935 82 17 - - HYPH 8935 82 18 set set NN 8935 82 19 is be VBZ 8935 82 20 0.977 0.977 CD 8935 82 21 . . . 8935 83 1 The the DT 8935 83 2 prob- prob- JJ 8935 83 3 abilities ability NNS 8935 83 4 of of IN 8935 83 5 no no DT 8935 83 6 less less JJR 8935 83 7 than than IN 8935 83 8 seventy seventy CD 8935 83 9 of of IN 8935 83 10 the the DT 8935 83 11 eighty eighty CD 8935 83 12 - - HYPH 8935 83 13 seven seven CD 8935 83 14 keys key NNS 8935 83 15 now now RB 8935 83 16 lie lie VBP 8935 83 17 between between IN 8935 83 18 0.005 0.005 CD 8935 83 19 and and CC 8935 83 20 0.015 0.015 CD 8935 83 21 . . . 8935 84 1 The the DT 8935 84 2 key key NN 8935 84 3 - - HYPH 8935 84 4 set set NN 8935 84 5 itself -PRON- PRP 8935 84 6 consists consist VBZ 8935 84 7 of of IN 8935 84 8 the the DT 8935 84 9 twenty twenty CD 8935 84 10 - - HYPH 8935 84 11 six six CD 8935 84 12 alphabetic alphabetic JJ 8935 84 13 characters character NNS 8935 84 14 ( ( -LRB- 8935 84 15 one one CD 8935 84 16 of of IN 8935 84 17 these these DT 8935 84 18 , , , 8935 84 19 X X NNS 8935 84 20 , , , 8935 84 21 is be VBZ 8935 84 22 not not RB 8935 84 23 represented represent VBN 8935 84 24 in in IN 8935 84 25 the the DT 8935 84 26 collection collection NN 8935 84 27 ) ) -RRB- 8935 84 28 , , , 8935 84 29 fifty- fifty- JJ 8935 84 30 Variety Variety NNP 8935 84 31 - - HYPH 8935 84 32 Generator Generator NNP 8935 84 33 ApproachjFOKKER ApproachjFOKKER NNP 8935 84 34 and and CC 8935 84 35 LYNCH LYNCH NNP 8935 84 36 111 111 CD 8935 84 37 Table Table NNP 8935 84 38 3 3 CD 8935 84 39 . . . 
8935 85 1 Key key NN 8935 85 2 - - HYPH 8935 85 3 set set NN 8935 85 4 of of IN 8935 85 5 87 87 CD 8935 85 6 keys key NNS 8935 85 7 produced produce VBN 8935 85 8 from from IN 8935 85 9 50,000 50,000 CD 8935 85 10 surnames surname NNS 8935 85 11 from from IN 8935 85 12 INSPEC INSPEC NNP 8935 85 13 files file NNS 8935 85 14 . . . 8935 86 1 Key key JJ 8935 86 2 P1'0bability p1'0bability NN 8935 86 3 Key key JJ 8935 86 4 Probability Probability NNP 8935 86 5 Key Key NNP 8935 86 6 Probability Probability NNP 8935 86 7 Key Key NNP 8935 86 8 Probability Probability NNP 8935 86 9 A A NNP 8935 86 10 .023 .023 CD 8935 86 11 GA GA NNP 8935 86 12 .009 .009 CD 8935 86 13 M m CD 8935 86 14 .001 .001 CD 8935 86 15 RO ro NN 8935 86 16 .016 .016 CD 8935 86 17 AL AL NNP 8935 86 18 .007 .007 CD 8935 86 19 GO go NN 8935 86 20 .011 .011 CD 8935 86 21 MA MA NNP 8935 86 22 .022 .022 CD 8935 86 23 s s NN 8935 86 24 .027 .027 CD 8935 86 25 AN an NN 8935 86 26 .006 .006 CD 8935 86 27 GR GR NNP 8935 86 28 .012 .012 CD 8935 86 29 MAR MAR NNP 8935 86 30 .008 .008 CD 8935 86 31 SA sa NN 8935 86 32 .016 .016 CD 8935 86 33 B b NN 8935 86 34 .012 .012 CD 8935 86 35 GU GU NNP 8935 86 36 .007 .007 CD 8935 86 37 MC MC NNP 8935 86 38 .007 .007 CD 8935 86 39 SCH SCH NNP 8935 86 40 .014 .014 CD 8935 86 41 BA ba NN 8935 86 42 .013 .013 CD 8935 86 43 H h NN 8935 86 44 , , , 8935 86 45 006 006 CD 8935 86 46 ME me $ 8935 86 47 .010 .010 CD 8935 86 48 SE SE NNP 8935 86 49 .008 .008 CD 8935 86 50 BAR BAR NNP 8935 86 51 .006 .006 CD 8935 86 52 HA HA NNP 8935 86 53 .021 .021 CD 8935 86 54 MI MI NNP 8935 86 55 .012 .012 CD 8935 86 56 SH sh NN 8935 86 57 .016 .016 CD 8935 86 58 BE be VB 8935 86 59 .017 .017 CD 8935 86 60 HE he PRP 8935 86 61 .010 .010 CD 8935 86 62 MO mo NN 8935 86 63 .012 .012 CD 8935 86 64 SI SI NNP 8935 86 65 .010 .010 CD 8935 86 66 BO BO NNP 8935 86 67 .014 .014 CD 8935 86 68 HO HO NNP 8935 86 69 .012 .012 CD 8935 86 70 MU mu NN 8935 86 71 .008 .008 CD 8935 86 72 so so RB 8935 86 73 .007 
.007 CD 8935 86 74 BR BR NNP 8935 86 75 .014 .014 CD 8935 86 76 HU hu NN 8935 86 77 .007 .007 CD 8935 86 78 N n CD 8935 86 79 .011 .011 CD 8935 86 80 ST ST NNP 8935 86 81 .016 .016 CD 8935 86 82 BU BU NNP 8935 86 83 .009 .009 CD 8935 86 84 I -PRON- PRP 8935 86 85 .013 .013 CD 8935 86 86 NA na CD 8935 86 87 .008 .008 CD 8935 86 88 T t NN 8935 86 89 .030 .030 CD 8935 86 90 c c NN 8935 86 91 .013 .013 CD 8935 86 92 J J NNP 8935 86 93 .010 .010 CD 8935 86 94 NI NI NNP 8935 86 95 .006 .006 CD 8935 86 96 TA TA NNP 8935 86 97 .010 .010 CD 8935 86 98 CA ca NN 8935 86 99 .011 .011 CD 8935 86 100 JO JO NNP 8935 86 101 .007 .007 CD 8935 86 102 0 0 CD 8935 86 103 .017 .017 CD 8935 86 104 u u CD 8935 86 105 .005 .005 CD 8935 86 106 CH CH NNP 8935 86 107 .016 .016 CD 8935 86 108 K K NNP 8935 86 109 .015 .015 CD 8935 86 110 p p NN 8935 86 111 .011 .011 CD 8935 86 112 v v NN 8935 86 113 .015 .015 CD 8935 86 114 co co NN 8935 86 115 .013 .013 CD 8935 86 116 KA KA NNP 8935 86 117 .018 .018 CD 8935 86 118 PA PA NNP 8935 86 119 .014 .014 CD 8935 86 120 VA VA NNP 8935 86 121 .010 .010 CD 8935 86 122 D d NN 8935 86 123 .015 .015 CD 8935 86 124 KI KI NNP 8935 86 125 .008 .008 CD 8935 86 126 PE PE NNP 8935 86 127 .011 .011 CD 8935 86 128 w w NN 8935 86 129 .011 .011 CD 8935 86 130 DA da NN 8935 86 131 .009 .009 CD 8935 86 132 KO KO NNP 8935 86 133 .017 .017 CD 8935 86 134 PO po NN 8935 86 135 .010 .010 CD 8935 86 136 WA wa NN 8935 86 137 .011 .011 CD 8935 86 138 DE de NN 8935 86 139 .013 .013 CD 8935 86 140 KR kr NN 8935 86 141 .008 .008 CD 8935 86 142 PR pr NN 8935 86 143 .006 .006 CD 8935 86 144 WE we NN 8935 86 145 .008 .008 CD 8935 86 146 DO do VBP 8935 86 147 .007 .007 CD 8935 86 148 KU KU NNP 8935 86 149 .010 .010 CD 8935 86 150 Q q NN 8935 86 151 .001 .001 CD 8935 86 152 WI wi NN 8935 86 153 .010 .010 CD 8935 86 154 E e NN 8935 86 155 .018 .018 CD 8935 86 156 L l NN 8935 86 157 .013 .013 CD 8935 86 158 R r NN 8935 86 159 .007 .007 CD 8935 86 160 X x NN 8935 86 161 F F NNP 8935 86 
162 .025 .025 CD 8935 86 163 LA LA NNP 8935 86 164 .012 .012 CD 8935 86 165 RA RA NNP 8935 86 166 .011 .011 CD 8935 86 167 y y UH 8935 86 168 .011 .011 CD 8935 86 169 FR FR NNP 8935 86 170 .008 .008 CD 8935 86 171 LE LE NNP 8935 86 172 .014 .014 CD 8935 86 173 RE re NN 8935 86 174 .008 .008 CD 8935 86 175 z z NN 8935 86 176 .013 .013 CD 8935 86 177 G g NN 8935 86 178 .015 .015 CD 8935 86 179 LI li NN 8935 86 180 .009 .009 CD 8935 86 181 RI RI NNP 8935 86 182 .006 .006 CD 8935 86 183 H=6.2952 h=6.2952 NN 8935 86 184 Hmax hmax NN 8935 86 185 = = SYM 8935 86 186 6.443 6.443 CD 8935 86 187 ( ( -LRB- 8935 86 188 log,87 log,87 NNP 8935 86 189 ) ) -RRB- 8935 86 190 H h NN 8935 86 191 , , , 8935 86 192 = = NFP 8935 86 193 0.977 0.977 CD 8935 86 194 eight eight CD 8935 86 195 digram digram NN 8935 86 196 keys key NNS 8935 86 197 , , , 8935 86 198 and and CC 8935 86 199 the the DT 8935 86 200 three three CD 8935 86 201 trigram trigram NNP 8935 86 202 keys key NNS 8935 86 203 BAR BAR NNP 8935 86 204 , , , 8935 86 205 MAR MAR NNP 8935 86 206 , , , 8935 86 207 and and CC 8935 86 208 SCH SCH NNP 8935 86 209 . . . 8935 87 1 The the DT 8935 87 2 predominance predominance NN 8935 87 3 of of IN 8935 87 4 vowels vowel NNS 8935 87 5 as as IN 8935 87 6 the the DT 8935 87 7 second second JJ 8935 87 8 character character NN 8935 87 9 of of IN 8935 87 10 keys key NNS 8935 87 11 is be VBZ 8935 87 12 noticeable noticeable JJ 8935 87 13 ; ; : 8935 87 14 for- for- XX 8935 87 15 ty ty NN 8935 87 16 - - HYPH 8935 87 17 nine nine CD 8935 87 18 of of IN 8935 87 19 the the DT 8935 87 20 sixty sixty CD 8935 87 21 - - HYPH 8935 87 22 one one CD 8935 87 23 n n NN 8935 87 24 - - HYPH 8935 87 25 grams gram NNS 8935 87 26 have have VBP 8935 87 27 a a DT 8935 87 28 vowel vowel NN 8935 87 29 in in IN 8935 87 30 the the DT 8935 87 31 second second JJ 8935 87 32 position position NN 8935 87 33 . . . 
8935 88 1 The the DT 8935 88 2 size size NN 8935 88 3 of of IN 8935 88 4 the the DT 8935 88 5 key key NN 8935 88 6 - - HYPH 8935 88 7 set set NN 8935 88 8 produced produce VBN 8935 88 9 from from IN 8935 88 10 a a DT 8935 88 11 given give VBN 8935 88 12 data data NN 8935 88 13 base base NN 8935 88 14 can can MD 8935 88 15 be be VB 8935 88 16 varied vary VBN 8935 88 17 arbitrarily arbitrarily RB 8935 88 18 by by IN 8935 88 19 changing change VBG 8935 88 20 the the DT 8935 88 21 threshold threshold NN 8935 88 22 value value NN 8935 88 23 . . . 8935 89 1 An an DT 8935 89 2 approximately approximately RB 8935 89 3 hyperbolic hyperbolic JJ 8935 89 4 relation relation NN 8935 89 5 obtains obtain NNS 8935 89 6 between between IN 8935 89 7 the the DT 8935 89 8 value value NN 8935 89 9 of of IN 8935 89 10 the the DT 8935 89 11 threshold threshold NN 8935 89 12 and and CC 8935 89 13 the the DT 8935 89 14 number number NN 8935 89 15 of of IN 8935 89 16 keys key NNS 8935 89 17 selected select VBN 8935 89 18 . . . 
8935 90 1 As as IN 8935 90 2 the the DT 8935 90 3 size size NN 8935 90 4 of of IN 8935 90 5 the the DT 8935 90 6 key key JJ 8935 90 7 - - HYPH 8935 90 8 set set VBN 8935 90 9 increases increase NNS 8935 90 10 , , , 8935 90 11 the the DT 8935 90 12 length length NN 8935 90 13 of of IN 8935 90 14 the the DT 8935 90 15 longest long JJS 8935 90 16 n n NN 8935 90 17 - - HYPH 8935 90 18 gram gram NN 8935 90 19 in in IN 8935 90 20 the the DT 8935 90 21 key key JJ 8935 90 22 - - HYPH 8935 90 23 set set VBN 8935 90 24 increases increase NNS 8935 90 25 , , , 8935 90 26 and and CC 8935 90 27 the the DT 8935 90 28 distribution distribution NN 8935 90 29 of of IN 8935 90 30 n n NN 8935 90 31 - - HYPH 8935 90 32 grams gram NNS 8935 90 33 shifts shift NNS 8935 90 34 to- to- IN 8935 90 35 ward ward NNP 8935 90 36 higher high JJR 8935 90 37 values value NNS 8935 90 38 , , , 8935 90 39 as as IN 8935 90 40 shown show VBN 8935 90 41 in in IN 8935 90 42 Figure Figure NNP 8935 90 43 2 2 CD 8935 90 44 . . . 8935 91 1 Stability stability NN 8935 91 2 of of IN 8935 91 3 the the DT 8935 91 4 key key NN 8935 91 5 - - HYPH 8935 91 6 sets set NNS 8935 91 7 with with IN 8935 91 8 increase increase NN 8935 91 9 in in IN 8935 91 10 file file NN 8935 91 11 size size NN 8935 91 12 is be VBZ 8935 91 13 clearly clearly RB 8935 91 14 an an DT 8935 91 15 important important JJ 8935 91 16 factor factor NN 8935 91 17 . . . 
8935 92 1 To to TO 8935 92 2 determine determine VB 8935 92 3 the the DT 8935 92 4 extent extent NN 8935 92 5 of of IN 8935 92 6 this this DT 8935 92 7 , , , 8935 92 8 successive successive JJ 8935 92 9 portions portion NNS 8935 92 10 of of IN 8935 92 11 the the DT 8935 92 12 entire entire JJ 8935 92 13 file file NN 8935 92 14 of of IN 8935 92 15 100,000 100,000 CD 8935 92 16 surnames surname NNS 8935 92 17 were be VBD 8935 92 18 subjected subject VBN 8935 92 19 to to IN 8935 92 20 the the DT 8935 92 21 analysis analysis NN 8935 92 22 at at IN 8935 92 23 a a DT 8935 92 24 threshold threshold NN 8935 92 25 value value NN 8935 92 26 of of IN 8935 92 27 0.005 0.005 CD 8935 92 28 . . . 8935 93 1 As as IN 8935 93 2 illustrated illustrate VBN 8935 93 3 in in IN 8935 93 4 Table table NN 8935 93 5 4 4 CD 8935 93 6 , , , 8935 93 7 the the DT 8935 93 8 key key NN 8935 93 9 - - HYPH 8935 93 10 sets set NNS 8935 93 11 are be VBP 8935 93 12 remarkably remarkably RB 8935 93 13 stable stable JJ 8935 93 14 in in IN 8935 93 15 re- re- JJ 8935 93 16 gard gard NN 8935 93 17 to to TO 8935 93 18 total total JJ 8935 93 19 key key JJ 8935 93 20 - - HYPH 8935 93 21 set set NN 8935 93 22 size size NN 8935 93 23 , , , 8935 93 24 the the DT 8935 93 25 number number NN 8935 93 26 of of IN 8935 93 27 keys key NNS 8935 93 28 of of IN 8935 93 29 each each DT 8935 93 30 length length NN 8935 93 31 , , , 8935 93 32 and and CC 8935 93 33 to to IN 8935 93 34 the the DT 8935 93 35 actual actual JJ 8935 93 36 keys key NNS 8935 93 37 . . . 8935 94 1 Table table NN 8935 94 2 4 4 CD 8935 94 3 . . . 8935 95 1 Stability stability NN 8935 95 2 of of IN 8935 95 3 size size NN 8935 95 4 and and CC 8935 95 5 composition composition NN 8935 95 6 of of IN 8935 95 7 keys key NNS 8935 95 8 with with IN 8935 95 9 increasing increase VBG 8935 95 10 file file NN 8935 95 11 size size NN 8935 95 12 . . . 
8935 96 1 Number number NN 8935 96 2 of of IN 8935 96 3 Number Number NNP 8935 96 4 of of IN 8935 96 5 Number Number NNP 8935 96 6 of of IN 8935 96 7 Number Number NNP 8935 96 8 of of IN 8935 96 9 Total Total NNP 8935 96 10 Size Size NNP 8935 96 11 Entries Entries NNP 8935 96 12 in in IN 8935 96 13 File file NN 8935 96 14 Characters character NNS 8935 96 15 Digrams Digrams NNP 8935 96 16 Trigrams trigram NNS 8935 96 17 of of IN 8935 96 18 Key Key NNP 8935 96 19 - - HYPH 8935 96 20 set set VBN 8935 96 21 25,000 25,000 CD 8935 96 22 26 26 CD 8935 96 23 76 76 CD 8935 96 24 10 10 CD 8935 96 25 112 112 CD 8935 96 26 50,000 50,000 CD 8935 96 27 26 26 CD 8935 96 28 74 74 CD 8935 96 29 9 9 CD 8935 96 30 109 109 CD 8935 96 31 75,000 75,000 CD 8935 96 32 26 26 CD 8935 96 33 74 74 CD 8935 96 34 10 10 CD 8935 96 35 110 110 CD 8935 96 36 100,000 100,000 CD 8935 96 37 26 26 CD 8935 96 38 75 75 CD 8935 96 39 10 10 CD 8935 96 40 111 111 CD 8935 96 41 No no UH 8935 96 42 , , , 8935 96 43 of of IN 8935 96 44 keys key NNS 8935 96 45 common common JJ 8935 96 46 to to IN 8935 96 47 key key NN 8935 96 48 - - HYPH 8935 96 49 sets set NNS 8935 96 50 26 26 CD 8935 96 51 73 73 CD 8935 96 52 9 9 CD 8935 96 53 108 108 CD 8935 96 54 112 112 CD 8935 96 55 ] ] -RRB- 8935 96 56 oU1'nal oU1'nal NNP 8935 96 57 of of IN 8935 96 58 Library Library NNP 8935 96 59 Automation Automation NNP 8935 96 60 Vol Vol NNP 8935 96 61 . . . 
8935 97 1 7/2 7/2 CD 8935 97 2 June June NNP 8935 97 3 1974 1974 CD 8935 97 4 400 400 CD 8935 97 5 300 300 CD 8935 97 6 Number number NN 8935 97 7 of of IN 8935 97 8 n n NN 8935 97 9 - - HYPH 8935 97 10 grams gram NNS 8935 97 11 200 200 CD 8935 97 12 100 100 CD 8935 97 13 1 1 CD 8935 97 14 2 2 CD 8935 97 15 3 3 CD 8935 97 16 4 4 CD 8935 97 17 5 5 CD 8935 97 18 6 6 CD 8935 97 19 7 7 CD 8935 97 20 8 8 CD 8935 97 21 9 9 CD 8935 97 22 Length length NN 8935 97 23 of of IN 8935 97 24 n n NN 8935 97 25 - - HYPH 8935 97 26 grams gram NNS 8935 97 27 Key key NN 8935 97 28 - - HYPH 8935 97 29 set set NN 8935 97 30 size size NN 8935 97 31 A a NN 8935 97 32 184 184 CD 8935 97 33 B b NN 8935 97 34 332 332 CD 8935 97 35 c c NN 8935 97 36 572 572 CD 8935 97 37 D d NN 8935 97 38 1034 1034 CD 8935 97 39 Threshold Threshold NNP 8935 97 40 probability probability NN 8935 97 41 0.0025 0.0025 CD 8935 97 42 0.0015 0.0015 CD 8935 97 43 0.0010 0.0010 CD 8935 97 44 0.0007 0.0007 CD 8935 97 45 10 10 CD 8935 97 46 11 11 CD 8935 97 47 12 12 CD 8935 97 48 13 13 CD 8935 97 49 Fig fig NN 8935 97 50 . . . 8935 98 1 2 2 LS 8935 98 2 . . . 
8935 99 1 Distribution distribution NN 8935 99 2 characteristics characteristic NNS 8935 99 3 of of IN 8935 99 4 n n NN 8935 99 5 - - HYPH 8935 99 6 grams gram NNS 8935 99 7 generated generate VBN 8935 99 8 from from IN 8935 99 9 10,000 10,000 CD 8935 99 10 surnames surname NNS 8935 99 11 from from IN 8935 99 12 INSPEC INSPEC NNP 8935 99 13 for for IN 8935 99 14 four four CD 8935 99 15 different different JJ 8935 99 16 threshold threshold NN 8935 99 17 values value NNS 8935 99 18 As as IN 8935 99 19 the the DT 8935 99 20 size size NN 8935 99 21 of of IN 8935 99 22 the the DT 8935 99 23 key key JJ 8935 99 24 - - HYPH 8935 99 25 set set VBN 8935 99 26 increases increase NNS 8935 99 27 , , , 8935 99 28 the the DT 8935 99 29 range range NN 8935 99 30 of of IN 8935 99 31 probabilities probability NNS 8935 99 32 represent- represent- VBN 8935 99 33 ed ed NNP 8935 99 34 among among IN 8935 99 35 the the DT 8935 99 36 keys key NNS 8935 99 37 narrows narrow NNS 8935 99 38 , , , 8935 99 39 and and CC 8935 99 40 the the DT 8935 99 41 relative relative JJ 8935 99 42 entropy entropy RB 8935 99 43 of of IN 8935 99 44 the the DT 8935 99 45 distribution distribution NN 8935 99 46 in- in- NN 8935 99 47 creases crease VBZ 8935 99 48 , , , 8935 99 49 becoming become VBG 8935 99 50 eventually eventually RB 8935 99 51 asymptotic asymptotic JJ 8935 99 52 with with IN 8935 99 53 the the DT 8935 99 54 value value NN 8935 99 55 of of IN 8935 99 56 one one CD 8935 99 57 . . . 8935 100 1 This this DT 8935 100 2 is be VBZ 8935 100 3 illus- illus- NN 8935 100 4 trated trate VBD 8935 100 5 in in IN 8935 100 6 Figure Figure NNP 8935 100 7 3 3 CD 8935 100 8 , , , 8935 100 9 for for IN 8935 100 10 the the DT 8935 100 11 surnames surname NNS 8935 100 12 in in IN 8935 100 13 a a DT 8935 100 14 file file NN 8935 100 15 of of IN 8935 100 16 50,000 50,000 CD 8935 100 17 entries entry NNS 8935 100 18 . . . 
8935 101 1 Beyond beyond IN 8935 101 2 a a DT 8935 101 3 key key JJ 8935 101 4 - - HYPH 8935 101 5 set set NN 8935 101 6 size size NN 8935 101 7 of of IN 8935 101 8 about about IN 8935 101 9 100 100 CD 8935 101 10 , , , 8935 101 11 increases increase NNS 8935 101 12 in in IN 8935 101 13 the the DT 8935 101 14 relative relative JJ 8935 101 15 entropy entropy RB 8935 101 16 of of IN 8935 101 17 the the DT 8935 101 18 resultant resultant JJ 8935 101 19 distribution distribution NN 8935 101 20 are be VBP 8935 101 21 marginal marginal JJ 8935 101 22 . . . 8935 102 1 Furthermore furthermore RB 8935 102 2 , , , 8935 102 3 with with IN 8935 102 4 increasing increase VBG 8935 102 5 key key NN 8935 102 6 - - HYPH 8935 102 7 set set NN 8935 102 8 size size NN 8935 102 9 , , , 8935 102 10 the the DT 8935 102 11 Va1'iety Va1'iety NNP 8935 102 12 - - HYPH 8935 102 13 Gene1'ato1 Gene1'ato1 NNP 8935 102 14 ' ' POS 8935 102 15 AppmachjFOKKER appmachjfokker CD 8935 102 16 and and CC 8935 102 17 LYNCH LYNCH NNP 8935 102 18 113 113 CD 8935 102 19 shorter short JJR 8935 102 20 and and CC 8935 102 21 more more RBR 8935 102 22 frequent frequent JJ 8935 102 23 surnames surname NNS 8935 102 24 begin begin VBP 8935 102 25 to to TO 8935 102 26 appear appear VB 8935 102 27 in in IN 8935 102 28 their -PRON- PRP$ 8935 102 29 entirety entirety NN 8935 102 30 as as IN 8935 102 31 keys key NNS 8935 102 32 . . . 
8935 103 1 As as IN 8935 103 2 an an DT 8935 103 3 alternative alternative NN 8935 103 4 to to IN 8935 103 5 increasing increase VBG 8935 103 6 the the DT 8935 103 7 variety variety NN 8935 103 8 of of IN 8935 103 9 the the DT 8935 103 10 keys key NNS 8935 103 11 , , , 8935 103 12 the the DT 8935 103 13 production production NN 8935 103 14 of of IN 8935 103 15 keys key NNS 8935 103 16 from from IN 8935 103 17 character character NN 8935 103 18 positions position NNS 8935 103 19 after after IN 8935 103 20 the the DT 8935 103 21 first first JJ 8935 103 22 letter letter NN 8935 103 23 of of IN 8935 103 24 the the DT 8935 103 25 surname surname NN 8935 103 26 was be VBD 8935 103 27 con- con- NNP 8935 103 28 sidered sidere VBD 8935 103 29 . . . 8935 104 1 The the DT 8935 104 2 problem problem NN 8935 104 3 of of IN 8935 104 4 variations variation NNS 8935 104 5 in in IN 8935 104 6 name name NN 8935 104 7 length length NN 8935 104 8 , , , 8935 104 9 as as RB 8935 104 10 well well RB 8935 104 11 as as IN 8935 104 12 the the DT 8935 104 13 very very RB 8935 104 14 dif- dif- RB 8935 104 15 ferent ferent JJ 8935 104 16 distributions distribution NNS 8935 104 17 of of IN 8935 104 18 the the DT 8935 104 19 characters character NNS 8935 104 20 at at IN 8935 104 21 these these DT 8935 104 22 positions position NNS 8935 104 23 , , , 8935 104 24 were be VBD 8935 104 25 not not RB 8935 104 26 encourag- encourag- NN 8935 104 27 ing ing NNP 8935 104 28 , , , 8935 104 29 and and CC 8935 104 30 instead instead RB 8935 104 31 the the DT 8935 104 32 production production NN 8935 104 33 of of IN 8935 104 34 key key NN 8935 104 35 - - HYPH 8935 104 36 sets set NNS 8935 104 37 from from IN 8935 104 38 the the DT 8935 104 39 last last JJ 8935 104 40 letter letter NN 8935 104 41 of of IN 8935 104 42 the the DT 8935 104 43 sur- sur- NN 8935 104 44 1 1 CD 8935 104 45 .99 .99 CD 8935 104 46 .98 .98 CD 8935 104 47 .97 .97 CD 8935 104 48 .96 .96 CD 8935 104 49 .95 .95 CD 8935 104 50 .94 .94 NFP 
8935 104 51 .93 .93 NFP 8935 104 52 Hr Hr NNP 8935 104 53 .92 .92 NNP 8935 104 54 .91 .91 CD 8935 104 55 .90 .90 CD 8935 104 56 .89 .89 CD 8935 104 57 .88 .88 CD 8935 104 58 .87 .87 CD 8935 104 59 .86 .86 CD 8935 104 60 0 0 CD 8935 104 61 20 20 CD 8935 104 62 40 40 CD 8935 104 63 60 60 CD 8935 104 64 80 80 CD 8935 104 65 100 100 CD 8935 104 66 Total total JJ 8935 104 67 number number NN 8935 104 68 of of IN 8935 104 69 keys key NNS 8935 104 70 for for IN 8935 104 71 the the DT 8935 104 72 front front NN 8935 104 73 of of IN 8935 104 74 surnames surname NNS 8935 104 75 Fig Fig NNP 8935 104 76 . . . 8935 105 1 3 3 LS 8935 105 2 . . . 8935 106 1 Increase increase VB 8935 106 2 in in IN 8935 106 3 relative relative JJ 8935 106 4 entropy entropy RB 8935 106 5 with with IN 8935 106 6 increase increase NN 8935 106 7 in in IN 8935 106 8 key key JJ 8935 106 9 - - HYPH 8935 106 10 set set NN 8935 106 11 size size NN 8935 106 12 ; ; : 8935 106 13 keys key NNS 8935 106 14 generated generate VBD 8935 106 15 from from IN 8935 106 16 50,000 50,000 CD 8935 106 17 surnames surname NNS 8935 106 18 114 114 CD 8935 106 19 J j NN 8935 106 20 oumal oumal JJ 8935 106 21 of of IN 8935 106 22 Library Library NNP 8935 106 23 Automation Automation NNP 8935 106 24 Vol Vol NNP 8935 106 25 . . . 8935 107 1 7/2 7/2 CD 8935 107 2 June June NNP 8935 107 3 1974 1974 CD 8935 107 4 name name NN 8935 107 5 was be VBD 8935 107 6 investigated investigate VBN 8935 107 7 , , , 8935 107 8 and and CC 8935 107 9 proved prove VBD 8935 107 10 much much RB 8935 107 11 more more RBR 8935 107 12 attractive attractive JJ 8935 107 13 , , , 8935 107 14 since since IN 8935 107 15 it -PRON- PRP 8935 107 16 is be VBZ 8935 107 17 largely largely RB 8935 107 18 independent independent JJ 8935 107 19 of of IN 8935 107 20 surname surname NN 8935 107 21 length length NN 8935 107 22 . . . 
8935 108 1 KEY KEY NNP 8935 108 2 - - HYPH 8935 108 3 SETS SETS NNP 8935 108 4 FROM from IN 8935 108 5 THE the DT 8935 108 6 END END NNP 8935 108 7 OF of IN 8935 108 8 THE the DT 8935 108 9 SURNAME SURNAME NNP 8935 108 10 For for IN 8935 108 11 this this DT 8935 108 12 purpose purpose NN 8935 108 13 , , , 8935 108 14 each each DT 8935 108 15 surname surname NN 8935 108 16 in in IN 8935 108 17 the the DT 8935 108 18 file file NN 8935 108 19 was be VBD 8935 108 20 reversed reverse VBN 8935 108 21 within within IN 8935 108 22 a a DT 8935 108 23 record record NN 8935 108 24 and and CC 8935 108 25 subjected subject VBN 8935 108 26 to to IN 8935 108 27 key key JJ 8935 108 28 - - HYPH 8935 108 29 generation generation NN 8935 108 30 . . . 8935 109 1 The the DT 8935 109 2 relative relative JJ 8935 109 3 entropy entropy RB 8935 109 4 of of IN 8935 109 5 the the DT 8935 109 6 last last JJ 8935 109 7 character character NN 8935 109 8 of of IN 8935 109 9 the the DT 8935 109 10 surname surname NN 8935 109 11 is be VBZ 8935 109 12 substantially substantially RB 8935 109 13 lower low JJR 8935 109 14 than than IN 8935 109 15 that that DT 8935 109 16 of of IN 8935 109 17 the the DT 8935 109 18 first first JJ 8935 109 19 character character NN 8935 109 20 , , , 8935 109 21 at at IN 8935 109 22 0.860 0.860 CD 8935 109 23 . . . 
8935 110 1 Accordingly accordingly RB 8935 110 2 , , , 8935 110 3 the the DT 8935 110 4 key key NN 8935 110 5 - - HYPH 8935 110 6 sets set NNS 8935 110 7 have have VBP 8935 110 8 a a DT 8935 110 9 higher high JJR 8935 110 10 proportion proportion NN 8935 110 11 of of IN 8935 110 12 longer long JJR 8935 110 13 keys key NNS 8935 110 14 than than IN 8935 110 15 those those DT 8935 110 16 produced produce VBN 8935 110 17 from from IN 8935 110 18 the the DT 8935 110 19 front front NN 8935 110 20 of of IN 8935 110 21 the the DT 8935 110 22 surname surname NN 8935 110 23 , , , 8935 110 24 as as IN 8935 110 25 shown show VBN 8935 110 26 in in IN 8935 110 27 Table Table NNP 8935 110 28 5 5 CD 8935 110 29 . . . 8935 111 1 This this DT 8935 111 2 key key NN 8935 111 3 - - HYPH 8935 111 4 set set NN 8935 111 5 consists consist NNS 8935 111 6 of of IN 8935 111 7 the the DT 8935 111 8 twenty twenty CD 8935 111 9 - - HYPH 8935 111 10 six six CD 8935 111 11 characters character NNS 8935 111 12 , , , 8935 111 13 seventy seventy CD 8935 111 14 - - HYPH 8935 111 15 eight eight CD 8935 111 16 digrams digrams NN 8935 111 17 , , , 8935 111 18 Table table NN 8935 111 19 5 5 CD 8935 111 20 . . . 8935 112 1 Key key NN 8935 112 2 - - HYPH 8935 112 3 set set NN 8935 112 4 of of IN 8935 112 5 155 155 CD 8935 112 6 n n CD 8935 112 7 - - HYPH 8935 112 8 grams gram NNS 8935 112 9 produced produce VBN 8935 112 10 from from IN 8935 112 11 last last JJ 8935 112 12 letter letter NN 8935 112 13 of of IN 8935 112 14 50,000 50,000 CD 8935 112 15 INSPEC INSPEC NNP 8935 112 16 surnames surname NNS 8935 112 17 at at IN 8935 112 18 threshold threshold NN 8935 112 19 of of IN 8935 112 20 0.003 0.003 CD 8935 112 21 . . . 8935 113 1 Key key JJ 8935 113 2 Probability Probability NNP 8935 113 3 Key Key NNP 8935 113 4 P P NNP 8935 113 5 ! ! . 
8935 114 1 ' ' `` 8935 114 2 obability obability NN 8935 114 3 Key Key NNP 8935 114 4 Probability Probability NNP 8935 114 5 Key Key NNP 8935 114 6 Probability Probability NNP 8935 114 7 A A NNP 8935 114 8 .012 .012 CD 8935 114 9 VICH vich NN 8935 114 10 , , , 8935 114 11 005 005 CD 8935 114 12 EIN EIN NNP 8935 114 13 .005 .005 CD 8935 114 14 IS be VBZ 8935 114 15 .012 .012 CD 8935 114 16 CA CA NNP 8935 114 17 .003 .003 CD 8935 114 18 GH GH NNP 8935 114 19 .003 .003 CD 8935 114 20 KIN KIN NNP 8935 114 21 .007 .007 CD 8935 114 22 NS ns NN 8935 114 23 .006 .006 CD 8935 114 24 DA da NN 8935 114 25 .008 .008 CD 8935 114 26 SH sh NN 8935 114 27 .003 .003 CD 8935 114 28 LIN LIN NNP 8935 114 29 .005 .005 CD 8935 114 30 INS INS NNP 8935 114 31 .003 .003 CD 8935 114 32 KA KA NNP 8935 114 33 .006 .006 CD 8935 114 34 TH th NN 8935 114 35 .005 .005 CD 8935 114 36 TIN tin NN 8935 114 37 .003 .003 CD 8935 114 38 OS os NN 8935 114 39 .004 .004 CD 8935 114 40 MA MA NNP 8935 114 41 .007 .007 CD 8935 114 42 ITH ITH NNP 8935 114 43 .004 .004 CD 8935 114 44 NN NN NNP 8935 114 45 .010 .010 CD 8935 114 46 RS RS NNP 8935 114 47 .006 .006 CD 8935 114 48 NA na NN 8935 114 49 .003 .003 CD 8935 114 50 I i NN 8935 114 51 .014 .014 CD 8935 114 52 ON on IN 8935 114 53 .009 .009 CD 8935 114 54 ss ss NN 8935 114 55 .005 .005 CD 8935 114 56 INA ina NN 8935 114 57 .004 .004 CD 8935 114 58 AI ai NN 8935 114 59 .004 .004 CD 8935 114 60 SON son NN 8935 114 61 .013 .013 CD 8935 114 62 TS TS NNP 8935 114 63 .004 .004 CD 8935 114 64 RA RA NNP 8935 114 65 .010 .010 CD 8935 114 66 HI hi UH 8935 114 67 .007 .007 CD 8935 114 68 LSON lson NN 8935 114 69 .004 .004 CD 8935 114 70 us -PRON- PRP 8935 114 71 .004 .004 CD 8935 114 72 TA TA NNP 8935 114 73 .008 .008 CD 8935 114 74 II ii CD 8935 114 75 .009 .009 CD 8935 114 76 NSON nson NN 8935 114 77 .006 .006 CD 8935 114 78 T t NN 8935 114 79 .012 .012 CD 8935 114 80 VA VA NNP 8935 114 81 .004 .004 CD 8935 114 82 VSKII VSKII NNS 8935 114 83 .005 .005 CD 8935 114 84 
RSON rson NN 8935 114 85 .004 .004 CD 8935 114 86 DT dt NN 8935 114 87 .003 .003 CD 8935 114 88 OVA OVA NNP 8935 114 89 .010 .010 CD 8935 114 90 KI KI NNP 8935 114 91 .006 .006 CD 8935 114 92 TON ton NN 8935 114 93 .009 .009 CD 8935 114 94 ET et NN 8935 114 95 .004 .004 CD 8935 114 96 WA wa NN 8935 114 97 .004 .004 CD 8935 114 98 SKI ski NN 8935 114 99 .005 .005 CD 8935 114 100 0 0 CD 8935 114 101 .017 .017 CD 8935 114 102 NT NT NNP 8935 114 103 .004 .004 CD 8935 114 104 YA ya NN 8935 114 105 .005 .005 CD 8935 114 106 WSKI wski NN 8935 114 107 .004 .004 CD 8935 114 108 KO KO NNP 8935 114 109 .003 .003 CD 8935 114 110 RT rt NN 8935 114 111 .003 .003 CD 8935 114 112 B B NNP 8935 114 113 .003 .003 CD 8935 114 114 LI LI NNP 8935 114 115 .005 .005 CD 8935 114 116 NKO NKO NNP 8935 114 117 .010 .010 CD 8935 114 118 ERT ERT NNP 8935 114 119 .004 .004 CD 8935 114 120 c c NN 8935 114 121 .005 .005 CD 8935 114 122 NI NI NNP 8935 114 123 .007 .007 CD 8935 114 124 NO no UH 8935 114 125 .004 .004 CD 8935 114 126 ST ST NNP 8935 114 127 .004 .004 CD 8935 114 128 D D NNP 8935 114 129 .009 .009 CD 8935 114 130 RI RI NNP 8935 114 131 .005 .005 CD 8935 114 132 TO to IN 8935 114 133 .007 .007 CD 8935 114 134 TT TT NNP 8935 114 135 .005 .005 CD 8935 114 136 LD LD NNP 8935 114 137 .005 .005 CD 8935 114 138 TI TI NNP 8935 114 139 .004 .004 CD 8935 114 140 p p NN 8935 114 141 .004 .004 CD 8935 114 142 ETT ETT NNP 8935 114 143 .003 .003 CD 8935 114 144 ND ND NNP 8935 114 145 .006 .006 CD 8935 114 146 J J NNP 8935 114 147 .001 .001 CD 8935 114 148 Q q NN 8935 114 149 .001 .001 CD 8935 114 150 u u CD 8935 114 151 .013 .013 CD 8935 114 152 RD rd NN 8935 114 153 .009 .009 CD 8935 114 154 K k NN 8935 114 155 .010 .010 . 
8935 114 156 R r NN 8935 114 157 .005 .005 CD 8935 114 158 v v NN 8935 114 159 .001 .001 CD 8935 114 160 E e NN 8935 114 161 .020 .020 CD 8935 114 162 AK AK NNP 8935 114 163 .006 .006 CD 8935 114 164 AR AR NNP 8935 114 165 .006 .006 CD 8935 114 166 EV ev NN 8935 114 167 .018 .018 CD 8935 114 168 DE de NN 8935 114 169 .003 .003 CD 8935 114 170 CK CK NNP 8935 114 171 .009 .009 CD 8935 114 172 ER ER NNP 8935 114 173 .016 .016 CD 8935 114 174 ov ov IN 8935 114 175 .012 .012 CD 8935 114 176 EE ee NN 8935 114 177 .004 .004 CD 8935 114 178 EK EK NNP 8935 114 179 .004 .004 CD 8935 114 180 BER BER NNP 8935 114 181 .003 .003 CD 8935 114 182 KOV kov NN 8935 114 183 .008 .008 CD 8935 114 184 GE GE NNP 8935 114 185 .004 .004 CD 8935 114 186 IK IK NNP 8935 114 187 .004 .004 CD 8935 114 188 DER der RB 8935 114 189 .006 .006 CD 8935 114 190 IKOV IKOV NNP 8935 114 191 .004 .004 CD 8935 114 192 KE KE NNP 8935 114 193 .006 .006 CD 8935 114 194 L l NN 8935 114 195 .007 .007 CD 8935 114 196 GER GER NNP 8935 114 197 .005 .005 CD 8935 114 198 LOV LOV NNP 8935 114 199 .005 .005 CD 8935 114 200 LE LE NNP 8935 114 201 .008 .008 CD 8935 114 202 AL AL NNP 8935 114 203 .006 .006 CD 8935 114 204 NGER NGER NNP 8935 114 205 .003 .003 CD 8935 114 206 NOV NOV NNP 8935 114 207 .006 .006 CD 8935 114 208 NE NE NNP 8935 114 209 .008 .008 CD 8935 114 210 EL el NN 8935 114 211 .012 .012 CD 8935 114 212 HER her PRP$ 8935 114 213 .006 .006 CD 8935 114 214 ANOV anov NN 8935 114 215 .006 .006 CD 8935 114 216 RE re NN 8935 114 217 .006 .006 CD 8935 114 218 LL ll NN 8935 114 219 .004 .004 CD 8935 114 220 IER IER NNP 8935 114 221 .005 .005 CD 8935 114 222 ROV ROV NNP 8935 114 223 .006 .006 CD 8935 114 224 SE SE NNP 8935 114 225 .005 .005 CD 8935 114 226 ALL ALL NNP 8935 114 227 .004 .004 CD 8935 114 228 KER KER NNP 8935 114 229 .007 .007 CD 8935 114 230 sov sov NN 8935 114 231 .003 .003 CD 8935 114 232 TE TE NNP 8935 114 233 .004 .004 CD 8935 114 234 ELL ell NN 8935 114 235 .008 .008 CD 8935 114 236 LER ler NN 
8935 114 237 .007 .007 CD 8935 114 238 w w NN 8935 114 239 .005 .005 CD 8935 114 240 F f NN 8935 114 241 .003 .003 CD 8935 114 242 M m CD 8935 114 243 .008 .008 CD 8935 114 244 LLER LLER NNS 8935 114 245 .005 .005 CD 8935 114 246 X x NN 8935 114 247 .004 .004 CD 8935 114 248 FF ff NN 8935 114 249 .003 .003 CD 8935 114 250 AM am VBP 8935 114 251 .005 .005 CD 8935 114 252 MER MER NNP 8935 114 253 .003 .003 CD 8935 114 254 y y NN 8935 114 255 .017 .017 CD 8935 114 256 G g NN 8935 114 257 .004 .004 CD 8935 114 258 N n CD 8935 114 259 .009 .009 CD 8935 114 260 NER NER NNP 8935 114 261 .010 .010 CD 8935 114 262 AY ay NN 8935 114 263 .004 .004 CD 8935 114 264 NG NG NNP 8935 114 265 .004 .004 CD 8935 114 266 AN an NN 8935 114 267 .017 .017 CD 8935 114 268 SER ser NN 8935 114 269 .003 .003 CD 8935 114 270 EY ey NN 8935 114 271 .006 .006 CD 8935 114 272 ANG ANG NNP 8935 114 273 .003 .003 CD 8935 114 274 MAN MAN NNP 8935 114 275 .014 .014 CD 8935 114 276 TER ter NN 8935 114 277 .008 .008 CD 8935 114 278 LEY LEY NNP 8935 114 279 .007 .007 CD 8935 114 280 ING ING NNP 8935 114 281 .007 .007 CD 8935 114 282 RMAN RMAN NNP 8935 114 283 .003 .003 CD 8935 114 284 OR or CC 8935 114 285 .004 .004 CD 8935 114 286 KY KY NNP 8935 114 287 .004 .004 CD 8935 114 288 RG rg NN 8935 114 289 .007 .007 CD 8935 114 290 YAN YAN NNP 8935 114 291 .003 .003 CD 8935 114 292 s s CD 8935 114 293 .016 .016 CD 8935 114 294 RY RY NNP 8935 114 295 .005 .005 CD 8935 114 296 H h NN 8935 114 297 .004 .004 CD 8935 114 298 EN en NN 8935 114 299 .018 .018 CD 8935 114 300 AS as IN 8935 114 301 .007 .007 CD 8935 114 302 z z NN 8935 114 303 .007 .007 CD 8935 114 304 CH CH NNP 8935 114 305 .009 .009 CD 8935 114 306 SEN SEN NNP 8935 114 307 .007 .007 CD 8935 114 308 ES es NN 8935 114 309 .011 .011 CD 8935 114 310 TZ tz NN 8935 114 311 .006 .006 CD 8935 114 312 ICH ICH NNP 8935 114 313 .003 .003 CD 8935 114 314 IN in IN 8935 114 315 .019 .019 CD 8935 114 316 NES NES NNP 8935 114 317 .004 .004 CD 8935 114 318 H=7.059 
H=7.059 NNP 8935 114 319 Hmax Hmax NNP 8935 114 320 = = SYM 8935 114 321 7.276(log.155 7.276(log.155 LS 8935 114 322 ) ) -RRB- 8935 114 323 Hr Hr NNP 8935 114 324 = = NFP 8935 114 325 0.970 0.970 CD 8935 114 326 Variety Variety NNP 8935 114 327 - - HYPH 8935 114 328 Generator Generator NNP 8935 114 329 Approach/FOKKER Approach/FOKKER NNP 8935 114 330 and and CC 8935 114 331 LYNCH LYNCH NNP 8935 114 332 115 115 CD 8935 114 333 1 1 CD 8935 114 334 .99 .99 CD 8935 114 335 .98 .98 CD 8935 114 336 .97 .97 CD 8935 114 337 .96 .96 CD 8935 114 338 .95 .95 CD 8935 114 339 .94 .94 NFP 8935 114 340 .93 .93 NFP 8935 114 341 .92 .92 NFP 8935 114 342 Hr Hr NNP 8935 114 343 .91 .91 CD 8935 114 344 .90 .90 CD 8935 114 345 .89 .89 CD 8935 114 346 .88 .88 CD 8935 114 347 .87 .87 CD 8935 114 348 .86 .86 CD 8935 114 349 0 0 CD 8935 114 350 40 40 CD 8935 114 351 80 80 CD 8935 114 352 120 120 CD 8935 114 353 160 160 CD 8935 114 354 200 200 CD 8935 114 355 Total total JJ 8935 114 356 number number NN 8935 114 357 of of IN 8935 114 358 keys key NNS 8935 114 359 for for IN 8935 114 360 the the DT 8935 114 361 end end NN 8935 114 362 of of IN 8935 114 363 surnames surname NNS 8935 114 364 Fig fig NN 8935 114 365 . . . 8935 115 1 4 4 LS 8935 115 2 . . . 8935 116 1 Increase increase VB 8935 116 2 in in IN 8935 116 3 relative relative JJ 8935 116 4 entropy entropy RB 8935 116 5 with with IN 8935 116 6 increase increase NN 8935 116 7 in in IN 8935 116 8 key key JJ 8935 116 9 - - HYPH 8935 116 10 set set NN 8935 116 11 size size NN 8935 116 12 ; ; : 8935 116 13 keys key NNS 8935 116 14 generated generate VBD 8935 116 15 from from IN 8935 116 16 50,000 50,000 CD 8935 116 17 surnames surname NNS 8935 116 18 forty forty CD 8935 116 19 trigrams trigram NNS 8935 116 20 , , , 8935 116 21 ten ten CD 8935 116 22 tetragrams tetragram NNS 8935 116 23 , , , 8935 116 24 and and CC 8935 116 25 a a DT 8935 116 26 single single JJ 8935 116 27 pentagram pentagram NN 8935 116 28 . . . 
8935 117 1 The the DT 8935 117 2 breakdown breakdown NN 8935 117 3 of of IN 8935 117 4 the the DT 8935 117 5 individual individual JJ 8935 117 6 terminal terminal JJ 8935 117 7 characters character NNS 8935 117 8 of of IN 8935 117 9 the the DT 8935 117 10 surname surname NN 8935 117 11 is be VBZ 8935 117 12 also also RB 8935 117 13 more more RBR 8935 117 14 extreme extreme JJ 8935 117 15 , , , 8935 117 16 since since IN 8935 117 17 the the DT 8935 117 18 distribution distribution NN 8935 117 19 is be VBZ 8935 117 20 more more RBR 8935 117 21 skew skew JJ 8935 117 22 . . . 8935 118 1 Thus thus RB 8935 118 2 N N NNP 8935 118 3 , , , 8935 118 4 the the DT 8935 118 5 most most RBS 8935 118 6 frequent frequent JJ 8935 118 7 last last JJ 8935 118 8 char- char- XX 8935 118 9 acter acter NN 8935 118 10 , , , 8935 118 11 has have VBZ 8935 118 12 no no DT 8935 118 13 fewer few JJR 8935 118 14 than than IN 8935 118 15 nineteen nineteen CD 8935 118 16 different different JJ 8935 118 17 keys key NNS 8935 118 18 in in IN 8935 118 19 this this DT 8935 118 20 set set NN 8935 118 21 , , , 8935 118 22 closely closely RB 8935 118 23 followed follow VBN 8935 118 24 by by IN 8935 118 25 R r NN 8935 118 26 , , , 8935 118 27 with with IN 8935 118 28 seventeen seventeen CD 8935 118 29 keys key NNS 8935 118 30 . . . 8935 119 1 The the DT 8935 119 2 relative relative JJ 8935 119 3 entropy entropy JJ 8935 119 4 of of IN 8935 119 5 the the DT 8935 119 6 distribution distribution NN 8935 119 7 is be VBZ 8935 119 8 again again RB 8935 119 9 high high JJ 8935 119 10 , , , 8935 119 11 at at IN 8935 119 12 0.970 0.970 CD 8935 119 13 for for IN 8935 119 14 this this DT 8935 119 15 key key NN 8935 119 16 - - HYPH 8935 119 17 set set NN 8935 119 18 . . . 
8935 120 1 Figure figure NN 8935 120 2 4 4 CD 8935 120 3 shows show VBZ 8935 120 4 the the DT 8935 120 5 relation relation NN 8935 120 6 between between IN 8935 120 7 key key JJ 8935 120 8 - - HYPH 8935 120 9 set set NN 8935 120 10 size size NN 8935 120 11 and and CC 8935 120 12 relative relative JJ 8935 120 13 entropy entropy RB 8935 120 14 , , , 8935 120 15 and and CC 8935 120 16 indicates indicate VBZ 8935 120 17 that that IN 8935 120 18 a a DT 8935 120 19 larger large JJR 8935 120 20 number number NN 8935 120 21 of of IN 8935 120 22 keys key NNS 8935 120 23 from from IN 8935 120 24 the the DT 8935 120 25 last last JJ 8935 120 26 character character NN 8935 120 27 of of IN 8935 120 28 the the DT 8935 120 29 surname surname NN 8935 120 30 is be VBZ 8935 120 31 required require VBN 8935 120 32 to to TO 8935 120 33 reach reach VB 8935 120 34 the the DT 8935 120 35 same same JJ 8935 120 36 relative relative JJ 8935 120 37 en- en- CD 8935 120 38 116 116 CD 8935 120 39 Journal Journal NNP 8935 120 40 of of IN 8935 120 41 Library Library NNP 8935 120 42 Automation Automation NNP 8935 120 43 Vol Vol NNP 8935 120 44 . . . 8935 121 1 7/2 7/2 CD 8935 121 2 June June NNP 8935 121 3 197 197 CD 8935 121 4 4 4 CD 8935 121 5 tropy tropy NN 8935 121 6 as as IN 8935 121 7 keys key NNS 8935 121 8 from from IN 8935 121 9 the the DT 8935 121 10 first first JJ 8935 121 11 character character NN 8935 121 12 . . . 
8935 122 1 There there EX 8935 122 2 is be VBZ 8935 122 3 an an DT 8935 122 4 anomalous anomalous JJ 8935 122 5 section section NN 8935 122 6 of of IN 8935 122 7 the the DT 8935 122 8 curve curve NN 8935 122 9 , , , 8935 122 10 which which WDT 8935 122 11 may may MD 8935 122 12 well well RB 8935 122 13 derive derive VB 8935 122 14 from from IN 8935 122 15 the the DT 8935 122 16 much much RB 8935 122 17 greater great JJR 8935 122 18 prevalence prevalence NN 8935 122 19 of of IN 8935 122 20 suffixes suffix NNS 8935 122 21 than than IN 8935 122 22 prefixes prefix NNS 8935 122 23 in in IN 8935 122 24 personal personal JJ 8935 122 25 names name NNS 8935 122 26 . . . 8935 123 1 CONCLUSIONS conclusion NNS 8935 123 2 This this DT 8935 123 3 study study NN 8935 123 4 has have VBZ 8935 123 5 demonstrated demonstrate VBN 8935 123 6 the the DT 8935 123 7 feasibility feasibility NN 8935 123 8 of of IN 8935 123 9 devising devise VBG 8935 123 10 partial partial JJ 8935 123 11 represen- represen- JJ 8935 123 12 tations tation NNS 8935 123 13 of of IN 8935 123 14 author author NN 8935 123 15 names name NNS 8935 123 16 by by IN 8935 123 17 applying apply VBG 8935 123 18 the the DT 8935 123 19 variety variety NN 8935 123 20 - - HYPH 8935 123 21 generator generator NN 8935 123 22 approach approach NN 8935 123 23 to to TO 8935 123 24 overcome overcome VB 8935 123 25 the the DT 8935 123 26 substantial substantial JJ 8935 123 27 frequency frequency NN 8935 123 28 variations variation NNS 8935 123 29 encountered encounter VBN 8935 123 30 in in IN 8935 123 31 their -PRON- PRP$ 8935 123 32 dis- dis- NN 8935 123 33 tributions tribution NNS 8935 123 34 . . . 8935 124 1 It -PRON- PRP 8935 124 2 has have VBZ 8935 124 3 also also RB 8935 124 4 been be VBN 8935 124 5 shown show VBN 8935 124 6 that that IN 8935 124 7 within within IN 8935 124 8 a a DT 8935 124 9 homogeneous homogeneous JJ 8935 124 10 file file NN 8935 124 11 , , , 8935 124 12 i.e. i.e. 
FW 8935 124 13 , , , 8935 124 14 one one CD 8935 124 15 of of IN 8935 124 16 consistent consistent JJ 8935 124 17 provenance provenance NN 8935 124 18 , , , 8935 124 19 there there EX 8935 124 20 exists exist VBZ 8935 124 21 a a DT 8935 124 22 substantial substantial JJ 8935 124 23 level level NN 8935 124 24 of of IN 8935 124 25 consistency consistency NN 8935 124 26 in in IN 8935 124 27 terms term NNS 8935 124 28 of of IN 8935 124 29 character character NN 8935 124 30 distributions distribution NNS 8935 124 31 , , , 8935 124 32 as as IN 8935 124 33 illustrated illustrate VBN 8935 124 34 in in IN 8935 124 35 Table Table NNP 8935 124 36 4 4 CD 8935 124 37 . . . 8935 125 1 The the DT 8935 125 2 character- character- JJ 8935 125 3 istics istic NNS 8935 125 4 may may MD 8935 125 5 vary vary VB 8935 125 6 substantially substantially RB 8935 125 7 between between IN 8935 125 8 data data NN 8935 125 9 bases basis NNS 8935 125 10 of of IN 8935 125 11 different different JJ 8935 125 12 provenance provenance NN 8935 125 13 , , , 8935 125 14 e.g. e.g. RB 8935 125 15 , , , 8935 125 16 as as IN 8935 125 17 between between IN 8935 125 18 INSPEC INSPEC NNP 8935 125 19 and and CC 8935 125 20 MARC MARC NNP 8935 125 21 files file NNS 8935 125 22 . . . 
8935 126 1 17 17 CD 8935 126 2 Conventional conventional JJ 8935 126 3 approaches approach NNS 8935 126 4 to to IN 8935 126 5 processing processing NN 8935 126 6 records record NNS 8935 126 7 comprising comprise VBG 8935 126 8 linguistic linguistic JJ 8935 126 9 data datum NNS 8935 126 10 tend tend VBP 8935 126 11 to to TO 8935 126 12 disregard disregard VB 8935 126 13 the the DT 8935 126 14 statistical statistical JJ 8935 126 15 properties property NNS 8935 126 16 of of IN 8935 126 17 the the DT 8935 126 18 items item NNS 8935 126 19 , , , 8935 126 20 and and CC 8935 126 21 attempt attempt VB 8935 126 22 to to TO 8935 126 23 overcome overcome VB 8935 126 24 the the DT 8935 126 25 resultant resultant JJ 8935 126 26 problems problem NNS 8935 126 27 either either CC 8935 126 28 by by IN 8935 126 29 storage storage NN 8935 126 30 of of IN 8935 126 31 extensive extensive JJ 8935 126 32 lists list NNS 8935 126 33 of of IN 8935 126 34 items item NNS 8935 126 35 or or CC 8935 126 36 by by IN 8935 126 37 using use VBG 8935 126 38 complex complex JJ 8935 126 39 numerical numerical JJ 8935 126 40 algorithms algorithm NNS 8935 126 41 . . . 
8935 127 1 Typical typical JJ 8935 127 2 of of IN 8935 127 3 this this DT 8935 127 4 latter latter JJ 8935 127 5 ap- ap- NNP 8935 127 6 proach proach NN 8935 127 7 , , , 8935 127 8 in in IN 8935 127 9 the the DT 8935 127 10 present present JJ 8935 127 11 context context NN 8935 127 12 , , , 8935 127 13 is be VBZ 8935 127 14 the the DT 8935 127 15 use use NN 8935 127 16 of of IN 8935 127 17 truncated truncate VBN 8935 127 18 search search NN 8935 127 19 keys key NNS 8935 127 20 for for IN 8935 127 21 ac- ac- RB 8935 127 22 cess cess NN 8935 127 23 to to IN 8935 127 24 bibliographical bibliographical JJ 8935 127 25 files file NNS 8935 127 26 in in IN 8935 127 27 direct direct JJ 8935 127 28 access access NN 8935 127 29 stores store NNS 8935 127 30 , , , 8935 127 31 in in IN 8935 127 32 which which WDT 8935 127 33 fixed fix VBN 8935 127 34 - - HYPH 8935 127 35 length length NN 8935 127 36 character character NN 8935 127 37 strings string NNS 8935 127 38 are be VBP 8935 127 39 the the DT 8935 127 40 keys key NNS 8935 127 41 , , , 8935 127 42 as as IN 8935 127 43 , , , 8935 127 44 for for IN 8935 127 45 instance instance NN 8935 127 46 , , , 8935 127 47 in in IN 8935 127 48 the the DT 8935 127 49 system system NN 8935 127 50 in in IN 8935 127 51 operation operation NN 8935 127 52 at at IN 8935 127 53 the the DT 8935 127 54 Ohio Ohio NNP 8935 127 55 College College NNP 8935 127 56 Library Library NNP 8935 127 57 Center.18 Center.18 . 
8935 127 58 The the DT 8935 127 59 problems problem NNS 8935 127 60 encountered encounter VBN 8935 127 61 in in IN 8935 127 62 the the DT 8935 127 63 use use NN 8935 127 64 of of IN 8935 127 65 fixed fix VBN 8935 127 66 - - HYPH 8935 127 67 length length NN 8935 127 68 truncated truncate VBN 8935 127 69 author author NN 8935 127 70 and and CC 8935 127 71 title title NN 8935 127 72 search search NN 8935 127 73 keys key NNS 8935 127 74 for for IN 8935 127 75 monograph monograph NNP 8935 127 76 data datum NNS 8935 127 77 are be VBP 8935 127 78 indicated indicate VBN 8935 127 79 by by IN 8935 127 80 the the DT 8935 127 81 fact fact NN 8935 127 82 that that IN 8935 127 83 the the DT 8935 127 84 search search NN 8935 127 85 files file NNS 8935 127 86 using use VBG 8935 127 87 hash hash NN 8935 127 88 - - HYPH 8935 127 89 addressing addressing NN 8935 127 90 are be VBP 8935 127 91 operated operate VBN 8935 127 92 , , , 8935 127 93 on on IN 8935 127 94 average average JJ 8935 127 95 , , , 8935 127 96 at at IN 8935 127 97 a a DT 8935 127 98 density density NN 8935 127 99 of of IN 8935 127 100 only only RB 8935 127 101 62.5 62.5 CD 8935 127 102 percent percent NN 8935 127 103 . . . 
8935 128 1 Once once IN 8935 128 2 the the DT 8935 128 3 density density NN 8935 128 4 reaches reach VBZ 8935 128 5 75 75 CD 8935 128 6 percent percent NN 8935 128 7 , , , 8935 128 8 the the DT 8935 128 9 proportion proportion NN 8935 128 10 of of IN 8935 128 11 collisions collision NNS 8935 128 12 and and CC 8935 128 13 the the DT 8935 128 14 resultant resultant JJ 8935 128 15 degrada- degrada- NN 8935 128 16 tion tion NN 8935 128 17 in in IN 8935 128 18 performance performance NN 8935 128 19 are be VBP 8935 128 20 such such JJ 8935 128 21 that that IN 8935 128 22 the the DT 8935 128 23 files file NNS 8935 128 24 are be VBP 8935 128 25 recreated recreate VBN 8935 128 26 at at IN 8935 128 27 a a DT 8935 128 28 density density NN 8935 128 29 of of IN 8935 128 30 only only RB 8935 128 31 50 50 CD 8935 128 32 percent percent NN 8935 128 33 . . . 8935 129 1 Fixed fix VBN 8935 129 2 - - HYPH 8935 129 3 length length NN 8935 129 4 keys key NNS 8935 129 5 from from IN 8935 129 6 author author NN 8935 129 7 and and CC 8935 129 8 title title NN 8935 129 9 entries entry NNS 8935 129 10 are be VBP 8935 129 11 demonstrably demonstrably RB 8935 129 12 ineffi- ineffi- VBG 8935 129 13 cient cient NN 8935 129 14 in in IN 8935 129 15 performance performance NN 8935 129 16 since since IN 8935 129 17 the the DT 8935 129 18 information information NN 8935 129 19 content content NN 8935 129 20 is be VBZ 8935 129 21 low low JJ 8935 129 22 . . . 8935 130 1 The the DT 8935 130 2 distribu- distribu- JJ 8935 130 3 tion tion NN 8935 130 4 of of IN 8935 130 5 the the DT 8935 130 6 initial initial JJ 8935 130 7 trigrams trigram NNS 8935 130 8 of of IN 8935 130 9 50,000 50,000 CD 8935 130 10 names name NNS 8935 130 11 from from IN 8935 130 12 the the DT 8935 130 13 INSPEC INSPEC NNP 8935 130 14 file file NN 8935 130 15 pro- pro- NN 8935 130 16 vides vide VBZ 8935 130 17 corroboration corroboration NN 8935 130 18 of of IN 8935 130 19 this this DT 8935 130 20 fact fact NN 8935 130 21 . . . 
8935 131 1 The the DT 8935 131 2 number number NN 8935 131 3 of of IN 8935 131 4 possible possible JJ 8935 131 5 combinations combination NNS 8935 131 6 of of IN 8935 131 7 three three CD 8935 131 8 characters character NNS 8935 131 9 is be VBZ 8935 131 10 17,576 17,576 CD 8935 131 11 ( ( -LRB- 8935 131 12 26^3 26^3 CD 8935 131 13 ) ) -RRB- 8935 131 14 , , , 8935 131 15 yet yet CC 8935 131 16 only only RB 8935 131 17 3,285 3,285 CD 8935 131 18 trigrams trigram NNS 8935 131 19 were be VBD 8935 131 20 represented represent VBN 8935 131 21 in in IN 8935 131 22 the the DT 8935 131 23 file file NN 8935 131 24 , , , 8935 131 25 or or CC 8935 131 26 18.7 18.7 CD 8935 131 27 percent percent NN 8935 131 28 of of IN 8935 131 29 the the DT 8935 131 30 total total JJ 8935 131 31 variety variety NN 8935 131 32 . . . 8935 132 1 Moreover moreover RB 8935 132 2 , , , 8935 132 3 the the DT 8935 132 4 relative relative NN 8935 132 5 en- en- XX 8935 132 6 tropy tropy NN 8935 132 7 of of IN 8935 132 8 the the DT 8935 132 9 trigrams trigram NNS 8935 132 10 is be VBZ 8935 132 11 much much RB 8935 132 12 lower low JJR 8935 132 13 than than IN 8935 132 14 that that DT 8935 132 15 of of IN 8935 132 16 the the DT 8935 132 17 initial initial JJ 8935 132 18 characters character NNS 8935 132 19 of of IN 8935 132 20 the the DT 8935 132 21 surnames surname NNS 8935 132 22 , , , 8935 132 23 at at IN 8935 132 24 0.73 0.73 CD 8935 132 25 . . . 
8935 133 1 Performance performance NN 8935 133 2 figures figure VBZ 8935 133 3 for for IN 8935 133 4 precision precision NN 8935 133 5 illustrate illustrate VBP 8935 133 6 this this DT 8935 133 7 point.19 point.19 NNP 8935 133 8 The the DT 8935 133 9 present present JJ 8935 133 10 work work NN 8935 133 11 , , , 8935 133 12 together together RB 8935 133 13 with with IN 8935 133 14 other other JJ 8935 133 15 studies study NNS 8935 133 16 of of IN 8935 133 17 the the DT 8935 133 18 scope scope NN 8935 133 19 for for IN 8935 133 20 applica- applica- NNP 8935 133 21 tion tion NN 8935 133 22 of of IN 8935 133 23 the the DT 8935 133 24 variety variety NN 8935 133 25 - - HYPH 8935 133 26 generator generator NN 8935 133 27 approach approach NN 8935 133 28 , , , 8935 133 29 thus thus RB 8935 133 30 stands stand VBZ 8935 133 31 in in IN 8935 133 32 considerable considerable JJ 8935 133 33 con- con- NN 8935 133 34 trast trast NN 8935 133 35 to to TO 8935 133 36 prior prior RB 8935 133 37 work work VB 8935 133 38 , , , 8935 133 39 and and CC 8935 133 40 must must MD 8935 133 41 be be VB 8935 133 42 viewed view VBN 8935 133 43 as as IN 8935 133 44 a a DT 8935 133 45 means means NN 8935 133 46 whereby whereby WRB 8935 133 47 the the DT 8935 133 48 microstruc- microstruc- NN 8935 133 49 ture ture NN 8935 133 50 of of IN 8935 133 51 particular particular JJ 8935 133 52 data data NN 8935 133 53 elements element NNS 8935 133 54 is be VBZ 8935 133 55 fully fully RB 8935 133 56 reflected reflect VBN 8935 133 57 in in IN 8935 133 58 their -PRON- PRP$ 8935 133 59 manipulation manipulation NN 8935 133 60 , , , 8935 133 61 affording afford VBG 8935 133 62 substantial substantial JJ 8935 133 63 advantages advantage NNS 8935 133 64 . . . 
8935 134 1 20 20 CD 8935 134 2 Part part NN 8935 134 3 2 2 CD 8935 134 4 of of IN 8935 134 5 this this DT 8935 134 6 paper paper NN 8935 134 7 illustrates illustrate VBZ 8935 134 8 this this DT 8935 134 9 in in IN 8935 134 10 re- re- JJ 8935 134 11 gard gard NN 8935 134 12 to to IN 8935 134 13 searches search NNS 8935 134 14 of of IN 8935 134 15 personal personal JJ 8935 134 16 names name NNS 8935 134 17 . . . 8935 135 1 ACKNOWLEDGMENTS acknowledgment NNS 8935 135 2 We -PRON- PRP 8935 135 3 thank thank VBP 8935 135 4 M. M. NNP 8935 135 5 D. D. NNP 8935 135 6 Martin Martin NNP 8935 135 7 of of IN 8935 135 8 the the DT 8935 135 9 Institution Institution NNP 8935 135 10 of of IN 8935 135 11 Electrical Electrical NNP 8935 135 12 Engineers Engineers NNP 8935 135 13 for for IN 8935 135 14 Variety Variety NNP 8935 135 15 - - HYPH 8935 135 16 Generator Generator NNP 8935 135 17 Approach/FOKKER Approach/FOKKER NNP 8935 135 18 and and CC 8935 135 19 LYNCH LYNCH NNP 8935 135 20 117 117 CD 8935 135 21 provision provision NN 8935 135 22 of of IN 8935 135 23 a a DT 8935 135 24 part part NN 8935 135 25 of of IN 8935 135 26 the the DT 8935 135 27 INSPEC INSPEC NNP 8935 135 28 data data NN 8935 135 29 base base NN 8935 135 30 and and CC 8935 135 31 of of IN 8935 135 32 file file NN 8935 135 33 - - HYPH 8935 135 34 handling handle VBG 8935 135 35 soft- soft- NN 8935 135 36 ware ware NN 8935 135 37 , , , 8935 135 38 and and CC 8935 135 39 the the DT 8935 135 40 Potchefstroom Potchefstroom NNP 8935 135 41 University University NNP 8935 135 42 for for IN 8935 135 43 C.H.E. C.H.E. NNP 8935 136 1 ( ( -LRB- 8935 136 2 South South NNP 8935 136 3 Africa Africa NNP 8935 136 4 ) ) -RRB- 8935 136 5 for for IN 8935 136 6 awarding award VBG 8935 136 7 a a DT 8935 136 8 National National NNP 8935 136 9 Grant Grant NNP 8935 136 10 to to IN 8935 136 11 D. D. 
NNP 8935 136 12 Fokker Fokker NNP 8935 136 13 to to TO 8935 136 14 pursue pursue VB 8935 136 15 this this DT 8935 136 16 work work NN 8935 136 17 . . . 8935 137 1 We -PRON- PRP 8935 137 2 also also RB 8935 137 3 thank thank VBP 8935 137 4 Dr. Dr. NNP 8935 138 1 I. I. NNP 8935 138 2 J. J. NNP 8935 138 3 Barton Barton NNP 8935 138 4 and and CC 8935 138 5 Dr. Dr. NNP 8935 138 6 G. G. NNP 8935 138 7 W. W. NNP 8935 138 8 Adamson Adamson NNP 8935 138 9 for for IN 8935 138 10 valuable valuable JJ 8935 138 11 discussions discussion NNS 8935 138 12 , , , 8935 138 13 and and CC 8935 138 14 the the DT 8935 138 15 former former JJ 8935 138 16 for for IN 8935 138 17 n n JJ 8935 138 18 - - HYPH 8935 138 19 gram gram NN 8935 138 20 generation generation NN 8935 138 21 programs program NNS 8935 138 22 . . . 8935 139 1 REFERENCES REFERENCES NNP 8935 139 2 I. I. NNP 8935 139 3 P. P. NNP 8935 139 4 B. B. NNP 8935 139 5 Schipma Schipma NNP 8935 139 6 , , , 8935 139 7 Term Term NNP 8935 139 8 Fragment Fragment NNP 8935 139 9 Analysis Analysis NNP 8935 139 10 for for IN 8935 139 11 Inversion inversion NN 8935 139 12 of of IN 8935 139 13 Large large JJ 8935 139 14 Files file NNS 8935 139 15 ( ( -LRB- 8935 139 16 Chicago Chicago NNP 8935 139 17 : : : 8935 139 18 Illi- Illi- NNP 8935 139 19 nois nois NNP 8935 139 20 Institute Institute NNP 8935 139 21 of of IN 8935 139 22 Technology Technology NNP 8935 139 23 Research Research NNP 8935 139 24 Institute Institute NNP 8935 139 25 , , , 8935 139 26 1971 1971 CD 8935 139 27 ) ) -RRB- 8935 139 28 . . . 8935 140 1 2 2 LS 8935 140 2 . . . 8935 141 1 J. J. NNP 8935 141 2 C. C. NNP 8935 141 3 Costello Costello NNP 8935 141 4 and and CC 8935 141 5 E. E. 
NNP 8935 141 6 Wall Wall NNP 8935 141 7 , , , 8935 141 8 " " `` 8935 141 9 Recent recent JJ 8935 141 10 Improvements improvement NNS 8935 141 11 in in IN 8935 141 12 Techniques Techniques NNP 8935 141 13 for for IN 8935 141 14 Storing store VBG 8935 141 15 and and CC 8935 141 16 Retrieving Retrieving NNP 8935 141 17 Information information NN 8935 141 18 , , , 8935 141 19 " " '' 8935 141 20 in in IN 8935 141 21 Studies study NNS 8935 141 22 in in IN 8935 141 23 Co co NN 8935 141 24 - - NN 8935 141 25 ordinate ordinate JJ 8935 141 26 Indexing Indexing NNP 8935 141 27 , , , 8935 141 28 vol vol NNP 8935 141 29 . . . 8935 142 1 5 5 LS 8935 142 2 ( ( -LRB- 8935 142 3 Washington Washington NNP 8935 142 4 , , , 8935 142 5 D.C. D.C. NNP 8935 142 6 : : : 8935 142 7 Documentation Documentation NNP 8935 142 8 Inc. Inc. NNP 8935 142 9 , , , 8935 142 10 1959 1959 CD 8935 142 11 ) ) -RRB- 8935 142 12 . . . 8935 143 1 3 3 LS 8935 143 2 . . . 8935 144 1 L. L. NNP 8935 144 2 H. H. NNP 8935 144 3 Thiel Thiel NNP 8935 144 4 and and CC 8935 144 5 H. H. NNP 8935 144 6 S. S. NNP 8935 144 7 Heaps Heaps NNP 8935 144 8 , , , 8935 144 9 " " '' 8935 144 10 Program Program NNP 8935 144 11 Design Design NNP 8935 144 12 for for IN 8935 144 13 Retrospective Retrospective NNP 8935 144 14 Searches Searches NNPS 8935 144 15 on on IN 8935 144 16 Large Large NNP 8935 144 17 Data datum NNS 8935 144 18 Bases basis NNS 8935 144 19 , , , 8935 144 20 " " `` 8935 144 21 Information Information NNP 8935 144 22 Storage Storage NNP 8935 144 23 and and CC 8935 144 24 Retrieval8:1 retrieval8:1 NN 8935 144 25 - - HYPH 8935 144 26 20 20 CD 8935 144 27 ( ( -LRB- 8935 144 28 Feb. February NNP 8935 144 29 1972 1972 CD 8935 144 30 ) ) -RRB- 8935 144 31 . . . 8935 145 1 4 4 LS 8935 145 2 . . . 8935 146 1 S.C. 
S.C. NNP 8935 146 2 Bradford Bradford NNP 8935 146 3 , , , 8935 146 4 Documentation Documentation NNP 8935 146 5 ( ( -LRB- 8935 146 6 London London NNP 8935 146 7 : : : 8935 146 8 Crosby Crosby NNP 8935 146 9 - - HYPH 8935 146 10 Lockwood Lockwood NNP 8935 146 11 , , , 8935 146 12 1948 1948 CD 8935 146 13 ) ) -RRB- 8935 146 14 . . . 8935 147 1 5 5 CD 8935 147 2 . . . 8935 148 1 G. G. NNP 8935 148 2 K. K. NNP 8935 148 3 Zipf Zipf NNP 8935 148 4 , , , 8935 148 5 Human Human NNP 8935 148 6 Behaviour Behaviour NNP 8935 148 7 and and CC 8935 148 8 the the DT 8935 148 9 Principle Principle NNP 8935 148 10 of of IN 8935 148 11 Least Least JJS 8935 148 12 Effort Effort NNP 8935 148 13 ( ( -LRB- 8935 148 14 Cambridge Cambridge NNP 8935 148 15 , , , 8935 148 16 Mass Mass NNP 8935 148 17 : : : 8935 148 18 Addison Addison NNP 8935 148 19 - - HYPH 8935 148 20 Wesley Wesley NNP 8935 148 21 , , , 8935 148 22 1949 1949 CD 8935 148 23 ) ) -RRB- 8935 148 24 . . . 8935 149 1 6 6 CD 8935 149 2 . . . 8935 150 1 B. B. NNP 8935 150 2 Mandelbrot Mandelbrot NNP 8935 150 3 , , , 8935 150 4 " " '' 8935 150 5 An an DT 8935 150 6 Informational informational JJ 8935 150 7 Theory Theory NNP 8935 150 8 of of IN 8935 150 9 the the DT 8935 150 10 Statistical Statistical NNP 8935 150 11 Structure Structure NNP 8935 150 12 of of IN 8935 150 13 Language Language NNP 8935 150 14 , , , 8935 150 15 " " '' 8935 150 16 in in IN 8935 150 17 W. W. NNP 8935 150 18 Jackson Jackson NNP 8935 150 19 , , , 8935 150 20 ed ed NNP 8935 150 21 . . NNP 8935 150 22 , , , 8935 150 23 Communication Communication NNP 8935 150 24 Theory Theory NNP 8935 150 25 ( ( -LRB- 8935 150 26 London London NNP 8935 150 27 : : : 8935 150 28 Butterworth Butterworth NNP 8935 150 29 , , , 8935 150 30 1953 1953 CD 8935 150 31 ) ) -RRB- 8935 150 32 , , , 8935 150 33 p.486- p.486- NNP 8935 150 34 501 501 CD 8935 150 35 . . . 8935 151 1 7 7 LS 8935 151 2 . . . 8935 152 1 R. R. NNP 8935 152 2 A. A. 
NNP 8935 152 3 Fairthorne Fairthorne NNP 8935 152 4 , , , 8935 152 5 " " `` 8935 152 6 Empirical Empirical NNP 8935 152 7 Hyperbolic Hyperbolic NNP 8935 152 8 Distributions Distributions NNPS 8935 152 9 ( ( -LRB- 8935 152 10 Bradford Bradford NNP 8935 152 11 - - HYPH 8935 152 12 Zipf Zipf NNP 8935 152 13 - - HYPH 8935 152 14 Mandelbrot Mandelbrot NNP 8935 152 15 ) ) -RRB- 8935 152 16 for for IN 8935 152 17 Bibliometric Bibliometric NNP 8935 152 18 Description Description NNP 8935 152 19 and and CC 8935 152 20 Prediction Prediction NNP 8935 152 21 , , , 8935 152 22 " " '' 8935 152 23 ] ] -RRB- 8935 152 24 oumal oumal JJ 8935 152 25 of of IN 8935 152 26 Documentation Documentation NNP 8935 152 27 25:319 25:319 CD 8935 152 28 - - SYM 8935 152 29 43 43 CD 8935 152 30 ( ( -LRB- 8935 152 31 Dec. December NNP 8935 152 32 1969 1969 CD 8935 152 33 ) ) -RRB- 8935 152 34 . . . 8935 153 1 8 8 LS 8935 153 2 . . . 8935 154 1 M. M. NNP 8935 154 2 F. F. NNP 8935 154 3 Lynch Lynch NNP 8935 154 4 , , , 8935 154 5 " " `` 8935 154 6 The the DT 8935 154 7 Microstructure Microstructure NNP 8935 154 8 of of IN 8935 154 9 Chemical Chemical NNP 8935 154 10 Data Data NNPS 8935 154 11 - - HYPH 8935 154 12 bases basis NNS 8935 154 13 , , , 8935 154 14 and and CC 8935 154 15 Their -PRON- PRP$ 8935 154 16 Repre- Repre- NNP 8935 154 17 sentation sentation NN 8935 154 18 for for IN 8935 154 19 Retrieval Retrieval NNP 8935 154 20 , , , 8935 154 21 " " `` 8935 154 22 Proceedings Proceedings NNP 8935 154 23 , , , 8935 154 24 CN CN NNP 8935 154 25 AI AI NNP 8935 154 26 NATO NATO NNP 8935 154 27 Advanced Advanced NNP 8935 154 28 Study Study NNP 8935 154 29 Institute Institute NNP 8935 154 30 on on IN 8935 154 31 Computer Computer NNP 8935 154 32 Representation Representation NNP 8935 154 33 and and CC 8935 154 34 Manipulation Manipulation NNP 8935 154 35 of of IN 8935 154 36 Chemical Chemical NNP 8935 154 37 Information Information NNP 8935 154 38 ( ( -LRB- 8935 154 39 in in IN 8935 154 40 press press 
NN 8935 154 41 ) ) -RRB- 8935 154 42 . . . 8935 155 1 9 9 CD 8935 155 2 . . . 8935 156 1 I. I. NNP 8935 156 2 J. J. NNP 8935 156 3 Barton Barton NNP 8935 156 4 , , , 8935 156 5 S. S. NNP 8935 156 6 E. E. NNP 8935 156 7 Creasey Creasey NNP 8935 156 8 , , , 8935 156 9 M. M. NNP 8935 156 10 F. F. NNP 8935 156 11 Lynch Lynch NNP 8935 156 12 , , , 8935 156 13 and and CC 8935 156 14 M. M. NNP 8935 156 15 J. J. NNP 8935 156 16 Snell Snell NNP 8935 156 17 , , , 8935 156 18 " " `` 8935 156 19 An an DT 8935 156 20 Information Information NNP 8935 156 21 - - HYPH 8935 156 22 Theo- theo- NN 8935 156 23 retic retic JJ 8935 156 24 Approach Approach NNP 8935 156 25 to to IN 8935 156 26 Text Text NNP 8935 156 27 Searching Searching NNP 8935 156 28 in in IN 8935 156 29 Direct Direct NNP 8935 156 30 - - HYPH 8935 156 31 Access Access NNP 8935 156 32 Systems Systems NNPS 8935 156 33 , , , 8935 156 34 " " '' 8935 156 35 Communications communication NNS 8935 156 36 of of IN 8935 156 37 the the DT 8935 156 38 ACM ACM NNP 8935 156 39 ( ( -LRB- 8935 156 40 in in IN 8935 156 41 press press NN 8935 156 42 ) ) -RRB- 8935 156 43 . . . 8935 157 1 10 10 CD 8935 157 2 . . . 8935 158 1 G. G. NNP 8935 158 2 W. W. NNP 8935 158 3 Adamson Adamson NNP 8935 158 4 , , , 8935 158 5 J. J. NNP 8935 158 6 Cowell Cowell NNP 8935 158 7 , , , 8935 158 8 M. M. NNP 8935 158 9 F. F. NNP 8935 158 10 Lynch Lynch NNP 8935 158 11 , , , 8935 158 12 A. A. NNP 8935 158 13 H. H. NNP 8935 158 14 W. W. NNP 8935 158 15 McLure McLure NNP 8935 158 16 , , , 8935 158 17 W. W. NNP 8935 158 18 G. G. NNP 8935 158 19 Town Town NNP 8935 158 20 , , , 8935 158 21 and and CC 8935 158 22 A. A. NNP 8935 158 23 M. M. 
NNP 8935 158 24 Yapp Yapp NNP 8935 158 25 , , , 8935 158 26 " " `` 8935 158 27 Strategic Strategic NNP 8935 158 28 Considerations Considerations NNPS 8935 158 29 in in IN 8935 158 30 the the DT 8935 158 31 Design Design NNP 8935 158 32 of of IN 8935 158 33 Screening Screening NNP 8935 158 34 Systems Systems NNP 8935 158 35 for for IN 8935 158 36 Substructure Substructure NNP 8935 158 37 Searches Searches NNPS 8935 158 38 of of IN 8935 158 39 Chemical Chemical NNP 8935 158 40 Structure Structure NNP 8935 158 41 Files Files NNPS 8935 158 42 , , , 8935 158 43 " " '' 8935 158 44 ] ] -RRB- 8935 158 45 oumal oumal JJ 8935 158 46 of of IN 8935 158 47 Chemical Chemical NNP 8935 158 48 Docu- Docu- NNP 8935 158 49 mentation mentation NN 8935 158 50 13:153 13:153 CD 8935 158 51 - - HYPH 8935 158 52 57 57 CD 8935 158 53 ( ( -LRB- 8935 158 54 Aug. August NNP 8935 158 55 1973 1973 CD 8935 158 56 ) ) -RRB- 8935 158 57 . . . 8935 159 1 11 11 CD 8935 159 2 . . . 8935 160 1 A. A. NNP 8935 160 2 C. C. NNP 8935 160 3 Clare Clare NNP 8935 160 4 , , , 8935 160 5 E. E. NNP 8935 160 6 M. M. NNP 8935 160 7 Cook Cook NNP 8935 160 8 , , , 8935 160 9 and and CC 8935 160 10 M. M. NNP 8935 160 11 F. F. NNP 8935 160 12 Lynch Lynch NNP 8935 160 13 , , , 8935 160 14 " " `` 8935 160 15 The the DT 8935 160 16 Identification Identification NNP 8935 160 17 of of IN 8935 160 18 Variable Variable NNP 8935 160 19 - - HYPH 8935 160 20 Length Length NNP 8935 160 21 , , , 8935 160 22 Equifrequent Equifrequent NNP 8935 160 23 Character Character NNP 8935 160 24 Strings Strings NNP 8935 160 25 in in IN 8935 160 26 a a DT 8935 160 27 Natural Natural NNP 8935 160 28 Language Language NNP 8935 160 29 Data Data NNP 8935 160 30 Base Base NNP 8935 160 31 , , , 8935 160 32 " " `` 8935 160 33 Computer Computer NNP 8935 160 34 Journal15:259 Journal15:259 NNP 8935 160 35 - - HYPH 8935 160 36 62 62 CD 8935 160 37 ( ( -LRB- 8935 160 38 Aug. August NNP 8935 160 39 1972 1972 CD 8935 160 40 ) ) -RRB- 8935 160 41 . . . 
8935 161 1 12 12 CD 8935 161 2 . . . 8935 162 1 C. C. NNP 8935 162 2 E. E. NNP 8935 162 3 Shannon Shannon NNP 8935 162 4 , , , 8935 162 5 " " `` 8935 162 6 A a DT 8935 162 7 Mathematical mathematical JJ 8935 162 8 Theory Theory NNP 8935 162 9 of of IN 8935 162 10 Communication Communication NNP 8935 162 11 , , , 8935 162 12 " " '' 8935 162 13 Bell Bell NNP 8935 162 14 System System NNP 8935 162 15 Technical Technical NNP 8935 162 16 Journal Journal NNP 8935 162 17 27 27 CD 8935 162 18 : : : 8935 162 19 398 398 CD 8935 162 20 - - CD 8935 162 21 403 403 CD 8935 162 22 ( ( -LRB- 8935 162 23 1948 1948 CD 8935 162 24 ) ) -RRB- 8935 162 25 . . . 8935 163 1 13 13 CD 8935 163 2 . . . 8935 164 1 W. W. NNP 8935 164 2 C. C. NNP 8935 164 3 B. B. NNP 8935 164 4 Sayers Sayers NNP 8935 164 5 , , , 8935 164 6 A a DT 8935 164 7 Manual Manual NNP 8935 164 8 of of IN 8935 164 9 Classification Classification NNP 8935 164 10 for for IN 8935 164 11 Librarians Librarians NNPS 8935 164 12 and and CC 8935 164 13 Bibliographers Bibliographers NNPS 8935 164 14 ( ( -LRB- 8935 164 15 London London NNP 8935 164 16 : : : 8935 164 17 Grafton Grafton NNP 8935 164 18 , , , 8935 164 19 1926 1926 CD 8935 164 20 ) ) -RRB- 8935 164 21 , , , 8935 164 22 14 14 CD 8935 164 23 . . . 8935 165 1 C. C. NNP 8935 165 2 A. A. NNP 8935 165 3 Cutter Cutter NNP 8935 165 4 , , , 8935 165 5 C. C. NNP 8935 165 6 A. A. NNP 8935 165 7 Cutter Cutter NNP 8935 165 8 's 's POS 8935 165 9 Alphabetic alphabetic JJ 8935 165 10 Order order NN 8935 165 11 Table table NN 8935 165 12 ... ... : 8935 165 13 Altered Altered NNP 8935 165 14 and and CC 8935 165 15 Fitted fit VBN 8935 165 16 with with IN 8935 165 17 Three three CD 8935 165 18 Figures figure NNS 8935 165 19 by by IN 8935 165 20 Kate Kate NNP 8935 165 21 E. E. 
NNP 8935 165 22 Sanborn Sanborn NNP 8935 165 23 ( ( -LRB- 8935 165 24 Boston Boston NNP 8935 165 25 : : : 8935 165 26 Boston Boston NNP 8935 165 27 Library Library NNP 8935 165 28 Bureau Bureau NNP 8935 165 29 , , , 8935 165 30 1896 1896 CD 8935 165 31 ) ) -RRB- 8935 165 32 . . . 8935 166 1 15 15 CD 8935 166 2 . . . 8935 167 1 C. C. NNP 8935 167 2 P. P. NNP 8935 167 3 Bourne Bourne NNP 8935 167 4 and and CC 8935 167 5 D. D. NNP 8935 167 6 F. F. NNP 8935 167 7 Ford Ford NNP 8935 167 8 , , , 8935 167 9 " " '' 8935 167 10 A a DT 8935 167 11 Study Study NNP 8935 167 12 of of IN 8935 167 13 the the DT 8935 167 14 Statistics Statistics NNPS 8935 167 15 of of IN 8935 167 16 Letters Letters NNPS 8935 167 17 in in IN 8935 167 18 English English NNP 8935 167 19 Words Words NNPS 8935 167 20 , , , 8935 167 21 " " '' 8935 167 22 Information Information NNP 8935 167 23 & & CC 8935 167 24 Control4:48 Control4:48 NNP 8935 167 25 - - HYPH 8935 167 26 67 67 CD 8935 167 27 ( ( -LRB- 8935 167 28 1961 1961 CD 8935 167 29 ) ) -RRB- 8935 167 30 . . . 8935 168 1 16 16 CD 8935 168 2 . . . 8935 169 1 H. H. NNP 8935 169 2 Ohlman Ohlman NNP 8935 169 3 , , , 8935 169 4 " " `` 8935 169 5 Subject subject JJ 8935 169 6 Word Word NNP 8935 169 7 Letter Letter NNP 8935 169 8 Frequencies Frequencies NNPS 8935 169 9 ; ; : 8935 169 10 Applications application NNS 8935 169 11 to to IN 8935 169 12 Superimposed superimpose VBN 8935 169 13 Coding coding NN 8935 169 14 , , , 8935 169 15 " " '' 8935 169 16 Proceedings proceeding NNS 8935 169 17 of of IN 8935 169 18 the the DT 8935 169 19 International international JJ 8935 169 20 Conference Conference NNP 8935 169 21 of of IN 8935 169 22 Scientific Scientific NNP 8935 169 23 Information Information NNP 8935 169 24 , , , 8935 169 25 Vol Vol NNP 8935 169 26 . . . 8935 170 1 2 2 LS 8935 170 2 ( ( -LRB- 8935 170 3 Washington Washington NNP 8935 170 4 , , , 8935 170 5 D.C. D.C. 
NNP 8935 170 6 : : : 8935 170 7 National National NNP 8935 170 8 Academy Academy NNP 8935 170 9 of of IN 8935 170 10 Science Science NNP 8935 170 11 , , , 8935 170 12 1959 1959 CD 8935 170 13 ) ) -RRB- 8935 170 14 , , , 8935 170 15 p.903 p.903 NNP 8935 170 16 - - HYPH 8935 170 17 16 16 CD 8935 170 18 . . . 8935 171 1 17 17 CD 8935 171 2 . . . 8935 172 1 D. D. NNP 8935 172 2 W. W. NNP 8935 172 3 Fokker Fokker NNP 8935 172 4 and and CC 8935 172 5 M. M. NNP 8935 172 6 F. F. NNP 8935 172 7 Lynch Lynch NNP 8935 172 8 , , , 8935 172 9 " " `` 8935 172 10 A a DT 8935 172 11 Comparison Comparison NNP 8935 172 12 of of IN 8935 172 13 the the DT 8935 172 14 Microstructure Microstructure NNP 8935 172 15 of of IN 8935 172 16 Author Author NNP 8935 172 17 Names Names NNPS 8935 172 18 in in IN 8935 172 19 the the DT 8935 172 20 INSPEC INSPEC NNP 8935 172 21 , , , 8935 172 22 Chemical Chemical NNP 8935 172 23 Titles Titles NNPS 8935 172 24 and and CC 8935 172 25 B.N.B. B.N.B. NNP 8935 173 1 MARC MARC NNP 8935 173 2 Data Data NNP 8935 173 3 - - HYPH 8935 173 4 bases basis NNS 8935 173 5 " " '' 8935 173 6 ( ( -LRB- 8935 173 7 in in IN 8935 173 8 preparation preparation NN 8935 173 9 ) ) -RRB- 8935 173 10 . . . 8935 174 1 118 118 CD 8935 174 2 ] ] -RRB- 8935 174 3 oumalof oumalof NN 8935 174 4 Library Library NNP 8935 174 5 Automation Automation NNP 8935 174 6 Vol Vol NNP 8935 174 7 . . . 8935 175 1 7/2 7/2 CD 8935 175 2 June June NNP 8935 175 3 1974 1974 CD 8935 175 4 18 18 CD 8935 175 5 . . . 8935 176 1 F. F. NNP 8935 176 2 G. G. NNP 8935 176 3 Kilgour Kilgour NNP 8935 176 4 , , , 8935 176 5 P. P. NNP 8935 176 6 L. L. NNP 8935 176 7 Long Long NNP 8935 176 8 , , , 8935 176 9 A. A. NNP 8935 176 10 L. L. NNP 8935 176 11 Landgraf Landgraf NNP 8935 176 12 , , , 8935 176 13 and and CC 8935 176 14 J. J. NNP 8935 177 1 A. A. 
NNP 8935 177 2 Wyckoff Wyckoff NNP 8935 177 3 , , , 8935 177 4 " " `` 8935 177 5 The the DT 8935 177 6 Shared Shared NNP 8935 177 7 Cata- cata- NN 8935 177 8 loging log VBG 8935 177 9 System system NN 8935 177 10 of of IN 8935 177 11 the the DT 8935 177 12 Ohio Ohio NNP 8935 177 13 College College NNP 8935 177 14 Library Library NNP 8935 177 15 Center Center NNP 8935 177 16 , , , 8935 177 17 " " '' 8935 177 18 Journal Journal NNP 8935 177 19 of of IN 8935 177 20 Library Library NNP 8935 177 21 Automation Automation NNP 8935 177 22 5:157 5:157 CD 8935 177 23 - - SYM 8935 177 24 83 83 CD 8935 177 25 ( ( -LRB- 8935 177 26 Sept. September NNP 8935 177 27 1972 1972 CD 8935 177 28 ) ) -RRB- 8935 177 29 . . . 8935 178 1 19 19 CD 8935 178 2 . . . 8935 179 1 F. F. NNP 8935 179 2 G. G. NNP 8935 179 3 Kilgour Kilgour NNP 8935 179 4 , , , 8935 179 5 P. P. NNP 8935 179 6 L. L. NNP 8935 179 7 Long Long NNP 8935 179 8 , , , 8935 179 9 and and CC 8935 179 10 E. E. NNP 8935 179 11 B. B. NNP 8935 179 12 Leiderman Leiderman NNP 8935 179 13 , , , 8935 179 14 " " `` 8935 179 15 Retrieval retrieval NN 8935 179 16 of of IN 8935 179 17 Bibliographic Bibliographic NNP 8935 179 18 Entries Entries NNP 8935 179 19 from from IN 8935 179 20 a a DT 8935 179 21 Name Name NNP 8935 179 22 - - HYPH 8935 179 23 Title Title NNP 8935 179 24 Catalog Catalog NNP 8935 179 25 by by IN 8935 179 26 Use Use NNP 8935 179 27 of of IN 8935 179 28 Truncated Truncated NNP 8935 179 29 Search Search NNP 8935 179 30 Keys Keys NNPS 8935 179 31 , , , 8935 179 32 " " `` 8935 179 33 Proceedings proceeding NNS 8935 179 34 of of IN 8935 179 35 the the DT 8935 179 36 ASIS ASIS NNP 8935 179 37 7:79 7:79 CD 8935 179 38 - - HYPH 8935 179 39 82 82 CD 8935 179 40 ( ( -LRB- 8935 179 41 1970 1970 CD 8935 179 42 ) ) -RRB- 8935 179 43 . . . 8935 180 1 20 20 CD 8935 180 2 . . . 8935 181 1 I. I. NNP 8935 181 2 J. J. NNP 8935 181 3 Barton Barton NNP 8935 181 4 , , , 8935 181 5 M. M. NNP 8935 181 6 F. F. 
NNP 8935 181 7 Lynch Lynch NNP 8935 181 8 , , , 8935 181 9 J. J. NNP 8935 181 10 H. H. NNP 8935 181 11 Petrie Petrie NNP 8935 181 12 , , , 8935 181 13 and and CC 8935 181 14 M. M. NNP 8935 181 15 J. J. NNP 8935 181 16 Snell Snell NNP 8935 181 17 , , , 8935 181 18 " " `` 8935 181 19 Variable variable JJ 8935 181 20 - - HYPH 8935 181 21 Length length NN 8935 181 22 Character Character NNP 8935 181 23 String String NNP 8935 181 24 Analysis Analysis NNP 8935 181 25 of of IN 8935 181 26 Three Three NNP 8935 181 27 Data Data NNPS 8935 181 28 - - HYPH 8935 181 29 Bases Bases NNPS 8935 181 30 , , , 8935 181 31 and and CC 8935 181 32 Their -PRON- PRP$ 8935 181 33 Application application NN 8935 181 34 for for IN 8935 181 35 File File NNP 8935 181 36 Compression Compression NNP 8935 181 37 , , , 8935 181 38 " " `` 8935 181 39 Proceedings Proceedings NNP 8935 181 40 , , , 8935 181 41 1st 1st NNP 8935 181 42 Informatics Informatics NNP 8935 181 43 Conf. Conf. NNP 8935 181 44 , , , 8935 181 45 Durham Durham NNP 8935 181 46 , , , 8935 181 47 1973 1973 CD 8935 181 48 ( ( -LRB- 8935 181 49 in in IN 8935 181 50 press press NN 8935 181 51 ) ) -RRB- 8935 181 52 . . .