id sid tid token lemma pos
2163 1 1 Reference reference NN 2163 1 2 Information Information NNP 2163 1 3 Extraction Extraction NNP 2163 1 4 and and CC 2163 1 5 Processing Processing NNP 2163 1 6 Using Using NNP 2163 1 7 Conditional Conditional NNP 2163 1 8 Random Random NNP 2163 1 9 Fields Fields NNPS 2163 1 10 Tudor Tudor NNP 2163 1 11 Groza Groza NNP 2163 1 12 , , , 2163 1 13 Gunnar Gunnar NNP 2163 1 14 AAstrand AAstrand NNP 2163 1 15 Grimnes Grimnes NNPS 2163 1 16 , , , 2163 1 17 and and CC 2163 1 18 Siegfried Siegfried NNP 2163 1 19 Handschuh Handschuh NNP 2163 1 20 REFERENCE REFERENCE NNP 2163 1 21 INFORMATION INFORMATION VBD 2163 1 22 EXTRACTION extraction NN 2163 1 23 AND and CC 2163 1 24 PROCESSING processing NN 2163 1 25 |GROZA |groza NN 2163 1 26 , , , 2163 1 27 GRIMNES GRIMNES NNP 2163 1 28 , , , 2163 1 29 AND and CC 2163 1 30 HANDSCHUH HANDSCHUH NNP 2163 1 31 6 6 CD 2163 1 32 ABSTRACT ABSTRACT NNP 2163 1 33 Fostering foster VBG 2163 1 34 both both CC 2163 1 35 the the DT 2163 1 36 creation creation NN 2163 1 37 and and CC 2163 1 38 the the DT 2163 1 39 linking linking NN 2163 1 40 of of IN 2163 1 41 data datum NNS 2163 1 42 with with IN 2163 1 43 the the DT 2163 1 44 scope scope NN 2163 1 45 of of IN 2163 1 46 supporting support VBG 2163 1 47 the the DT 2163 1 48 growth growth NN 2163 1 49 of of IN 2163 1 50 the the DT 2163 1 51 Linked Linked NNP 2163 1 52 Data Data NNP 2163 1 53 Web Web NNP 2163 1 54 requires require VBZ 2163 1 55 us -PRON- PRP 2163 1 56 to to TO 2163 1 57 improve improve VB 2163 1 58 the the DT 2163 1 59 acquisition acquisition NN 2163 1 60 and and CC 2163 1 61 extraction extraction NN 2163 1 62 mechanisms mechanism NNS 2163 1 63 of of IN 2163 1 64 the the DT 2163 1 65 underlying underlie VBG 2163 1 66 semantic semantic JJ 2163 1 67 metadata metadata NN 2163 1 68 . . .
2163 2 1 This this DT 2163 2 2 is be VBZ 2163 2 3 particularly particularly RB 2163 2 4 important important JJ 2163 2 5 for for IN 2163 2 6 the the DT 2163 2 7 scientific scientific JJ 2163 2 8 publishing publishing NN 2163 2 9 domain domain NN 2163 2 10 , , , 2163 2 11 where where WRB 2163 2 12 currently currently RB 2163 2 13 most most JJS 2163 2 14 of of IN 2163 2 15 the the DT 2163 2 16 datasets dataset NNS 2163 2 17 are be VBP 2163 2 18 being be VBG 2163 2 19 created create VBN 2163 2 20 in in IN 2163 2 21 an an DT 2163 2 22 author author NN 2163 2 23 - - HYPH 2163 2 24 driven drive VBN 2163 2 25 , , , 2163 2 26 manual manual JJ 2163 2 27 manner manner NN 2163 2 28 . . . 2163 3 1 In in IN 2163 3 2 addition addition NN 2163 3 3 , , , 2163 3 4 such such JJ 2163 3 5 datasets dataset NNS 2163 3 6 capture capture VBP 2163 3 7 only only RB 2163 3 8 fragments fragment NNS 2163 3 9 of of IN 2163 3 10 the the DT 2163 3 11 complete complete JJ 2163 3 12 metadata metadata NN 2163 3 13 , , , 2163 3 14 omitting omit VBG 2163 3 15 usually usually RB 2163 3 16 , , , 2163 3 17 important important JJ 2163 3 18 elements element NNS 2163 3 19 such such JJ 2163 3 20 as as IN 2163 3 21 the the DT 2163 3 22 references reference NNS 2163 3 23 , , , 2163 3 24 although although IN 2163 3 25 they -PRON- PRP 2163 3 26 represent represent VBP 2163 3 27 valuable valuable JJ 2163 3 28 information information NN 2163 3 29 . . . 2163 4 1 In in IN 2163 4 2 this this DT 2163 4 3 paper paper NN 2163 4 4 we -PRON- PRP 2163 4 5 present present VBP 2163 4 6 an an DT 2163 4 7 approach approach NN 2163 4 8 that that WDT 2163 4 9 aims aim VBZ 2163 4 10 at at IN 2163 4 11 dealing deal VBG 2163 4 12 with with IN 2163 4 13 this this DT 2163 4 14 aspect aspect NN 2163 4 15 of of IN 2163 4 16 extraction extraction NN 2163 4 17 and and CC 2163 4 18 processing processing NN 2163 4 19 of of IN 2163 4 20 reference reference NN 2163 4 21 information information NN 2163 4 22 . . . 
2163 5 1 The the DT 2163 5 2 experimental experimental JJ 2163 5 3 evaluation evaluation NN 2163 5 4 shows show VBZ 2163 5 5 that that IN 2163 5 6 , , , 2163 5 7 currently currently RB 2163 5 8 , , , 2163 5 9 our -PRON- PRP$ 2163 5 10 solution solution NN 2163 5 11 handles handle VBZ 2163 5 12 very very RB 2163 5 13 well well RB 2163 5 14 diverse diverse JJ 2163 5 15 types type NNS 2163 5 16 of of IN 2163 5 17 reference reference NN 2163 5 18 format format NN 2163 5 19 , , , 2163 5 20 thus thus RB 2163 5 21 making make VBG 2163 5 22 it -PRON- PRP 2163 5 23 usable usable JJ 2163 5 24 for for IN 2163 5 25 , , , 2163 5 26 or or CC 2163 5 27 adaptable adaptable JJ 2163 5 28 to to IN 2163 5 29 , , , 2163 5 30 any any DT 2163 5 31 area area NN 2163 5 32 of of IN 2163 5 33 scientific scientific JJ 2163 5 34 publishing publishing NN 2163 5 35 . . . 2163 6 1 1 1 LS 2163 6 2 . . . 2163 7 1 INTRODUCTION introduction VB 2163 7 2 The the DT 2163 7 3 progressive progressive JJ 2163 7 4 adoption adoption NN 2163 7 5 of of IN 2163 7 6 Semantic Semantic NNP 2163 7 7 Web web NN 2163 7 8 1 1 CD 2163 7 9 techniques technique NNS 2163 7 10 resulted result VBN 2163 7 11 in in IN 2163 7 12 the the DT 2163 7 13 creation creation NN 2163 7 14 of of IN 2163 7 15 a a DT 2163 7 16 series series NN 2163 7 17 of of IN 2163 7 18 datasets dataset NNS 2163 7 19 connected connect VBN 2163 7 20 by by IN 2163 7 21 the the DT 2163 7 22 Linked Linked NNP 2163 7 23 Data Data NNP 2163 7 24 2 2 CD 2163 7 25 initiative initiative NN 2163 7 26 , , , 2163 7 27 and and CC 2163 7 28 via via IN 2163 7 29 the the DT 2163 7 30 Linked Linked NNP 2163 7 31 Data Data NNP 2163 7 32 principles principle NNS 2163 7 33 , , , 2163 7 34 into into IN 2163 7 35 a a DT 2163 7 36 universal universal JJ 2163 7 37 Web web NN 2163 7 38 of of IN 2163 7 39 Linked Linked NNP 2163 7 40 Data Data NNP 2163 7 41 . . . 
2163 8 1 In in IN 2163 8 2 order order NN 2163 8 3 to to TO 2163 8 4 foster foster VB 2163 8 5 the the DT 2163 8 6 continuous continuous JJ 2163 8 7 growth growth NN 2163 8 8 of of IN 2163 8 9 this this DT 2163 8 10 Linked Linked NNP 2163 8 11 Data Data NNP 2163 8 12 Web Web NNP 2163 8 13 , , , 2163 8 14 we -PRON- PRP 2163 8 15 need need VBP 2163 8 16 to to TO 2163 8 17 improve improve VB 2163 8 18 the the DT 2163 8 19 acquisition acquisition NN 2163 8 20 and and CC 2163 8 21 extraction extraction NN 2163 8 22 mechanisms mechanism NNS 2163 8 23 of of IN 2163 8 24 the the DT 2163 8 25 underlying underlie VBG 2163 8 26 semantic semantic JJ 2163 8 27 metadata metadata NN 2163 8 28 . . . 2163 9 1 Unfortunately unfortunately RB 2163 9 2 , , , 2163 9 3 the the DT 2163 9 4 scientific scientific JJ 2163 9 5 publishing publishing NN 2163 9 6 domain domain NN 2163 9 7 , , , 2163 9 8 a a DT 2163 9 9 domain domain NN 2163 9 10 with with IN 2163 9 11 an an DT 2163 9 12 enormous enormous JJ 2163 9 13 potential potential NN 2163 9 14 for for IN 2163 9 15 generating generate VBG 2163 9 16 large large JJ 2163 9 17 amounts amount NNS 2163 9 18 of of IN 2163 9 19 Linked Linked NNP 2163 9 20 Data Data NNP 2163 9 21 , , , 2163 9 22 still still RB 2163 9 23 promotes promote VBZ 2163 9 24 trivial trivial JJ 2163 9 25 mechanisms mechanism NNS 2163 9 26 for for IN 2163 9 27 producing produce VBG 2163 9 28 semantic semantic JJ 2163 9 29 metadata metadata NN 2163 9 30 . . . 
2163 10 1 3 3 LS 2163 10 2 As as IN 2163 10 3 an an DT 2163 10 4 illustration illustration NN 2163 10 5 , , , 2163 10 6 the the DT 2163 10 7 metadata metadata NN 2163 10 8 acquisition acquisition NN 2163 10 9 process process NN 2163 10 10 of of IN 2163 10 11 the the DT 2163 10 12 Semantic Semantic NNP 2163 10 13 Web Web NNP 2163 10 14 Dog Dog NNP 2163 10 15 Food Food NNP 2163 10 16 Server Server NNP 2163 10 17 , , , 2163 10 18 4 4 CD 2163 10 19 the the DT 2163 10 20 main main JJ 2163 10 21 Linked Linked NNP 2163 10 22 Data Data NNP 2163 10 23 publication publication NN 2163 10 24 repository repository NN 2163 10 25 available available JJ 2163 10 26 on on IN 2163 10 27 the the DT 2163 10 28 Web web NN 2163 10 29 , , , 2163 10 30 consists consist VBZ 2163 10 31 of of IN 2163 10 32 two two CD 2163 10 33 steps step NNS 2163 10 34 : : : 2163 10 35   . 2163 10 36 the the DT 2163 10 37 authors author NNS 2163 10 38 manually manually RB 2163 10 39 fill fill VBP 2163 10 40 - - HYPH 2163 10 41 in in RP 2163 10 42 submission submission NN 2163 10 43 forms form NNS 2163 10 44 corresponding correspond VBG 2163 10 45 to to IN 2163 10 46 different different JJ 2163 10 47 publishing publishing NN 2163 10 48 venues venue NNS 2163 10 49 ( ( -LRB- 2163 10 50 e.g. e.g. 
RB 2163 10 51 , , , 2163 10 52 conferences conference NNS 2163 10 53 or or CC 2163 10 54 workshops workshop NNS 2163 10 55 ) ) -RRB- 2163 10 56 , , , 2163 10 57 with with IN 2163 10 58 the the DT 2163 10 59 resulting resulting NN 2163 10 60 ( ( -LRB- 2163 10 61 usually usually RB 2163 10 62 XML xml NN 2163 10 63 ) ) -RRB- 2163 10 64 information information NN 2163 10 65 being be VBG 2163 10 66 transformed transform VBN 2163 10 67 via via IN 2163 10 68 scripts script NNS 2163 10 69 into into IN 2163 10 70 semantic semantic JJ 2163 10 71 metadata metadata NN 2163 10 72 , , , 2163 10 73 and and CC 2163 10 74   NFP 2163 10 75 the the DT 2163 10 76 entity entity NN 2163 10 77 URIs uri NNS 2163 10 78 ( ( -LRB- 2163 10 79 i.e. i.e. FW 2163 10 80 , , , 2163 10 81 authors author NNS 2163 10 82 and and CC 2163 10 83 publications publication NNS 2163 10 84 ) ) -RRB- 2163 10 85 present present JJ 2163 10 86 in in IN 2163 10 87 this this DT 2163 10 88 semantic semantic JJ 2163 10 89 metadata metadata NN 2163 10 90 are be VBP 2163 10 91 then then RB 2163 10 92 manually manually RB 2163 10 93 mapped map VBN 2163 10 94 to to IN 2163 10 95 existing exist VBG 2163 10 96 Web web NN 2163 10 97 URIs uri NNS 2163 10 98 for for IN 2163 10 99 linking linking NN 2163 10 100 / / SYM 2163 10 101 consolidation consolidation NN 2163 10 102 purposes purpose NNS 2163 10 103 . . . 
2163 11 1 Tudor Tudor NNP 2163 11 2 Groza Groza NNP 2163 11 3 ( ( -LRB- 2163 11 4 tudor.groza@uq.edu.au tudor.groza@uq.edu.au NN 2163 11 5 ) ) -RRB- 2163 11 6 is be VBZ 2163 11 7 Postdoctoral Postdoctoral NNP 2163 11 8 Research Research NNP 2163 11 9 Fellow Fellow NNP 2163 11 10 , , , 2163 11 11 School School NNP 2163 11 12 of of IN 2163 11 13 Information Information NNP 2163 11 14 Technology Technology NNP 2163 11 15 and and CC 2163 11 16 Electrical Electrical NNP 2163 11 17 Engineering Engineering NNP 2163 11 18 , , , 2163 11 19 University University NNP 2163 11 20 of of IN 2163 11 21 Queensland Queensland NNP 2163 11 22 , , , 2163 11 23 Gunnar Gunnar NNP 2163 11 24 AAstrand AAstrand NNP 2163 11 25 Grimnes Grimnes NNPS 2163 11 26 ( ( -LRB- 2163 11 27 grimnes@dfki.uni-kl.de grimnes@dfki.uni-kl.de NNP 2163 11 28 ) ) -RRB- 2163 11 29 is be VBZ 2163 11 30 Researcher Researcher NNP 2163 11 31 , , , 2163 11 32 German German NNP 2163 11 33 Research Research NNP 2163 11 34 Center Center NNP 2163 11 35 for for IN 2163 11 36 Artificial Artificial NNP 2163 11 37 Intelligence Intelligence NNP 2163 11 38 ( ( -LRB- 2163 11 39 DFKI DFKI NNP 2163 11 40 ) ) -RRB- 2163 11 41 GmbH GmbH NNP 2163 11 42 , , , 2163 11 43 Kaiserslautern Kaiserslautern NNP 2163 11 44 , , , 2163 11 45 Germany Germany NNP 2163 11 46 , , , 2163 11 47 Siegfried Siegfried NNP 2163 11 48 Handschuh Handschuh NNP 2163 11 49 ( ( -LRB- 2163 11 50 msiegfried.handschuh@deri.org msiegfried.handschuh@deri.org NNP 2163 11 51 ) ) -RRB- 2163 11 52 is be VBZ 2163 11 53 Senior Senior NNP 2163 11 54 Lecturer Lecturer NNP 2163 11 55 / / SYM 2163 11 56 Associate Associate NNP 2163 11 57 Professor Professor NNP 2163 11 58 , , , 2163 11 59 National National NNP 2163 11 60 University University NNP 2163 11 61 of of IN 2163 11 62 Ireland Ireland NNP 2163 11 63 , , , 2163 11 64 Galway Galway NNP 2163 11 65 , , , 2163 11 66 Ireland Ireland NNP 2163 11 67 . . . 
2163 12 1 mailto:tudor.groza@uq.edu.au mailto:tudor.groza@uq.edu.au NNP 2163 12 2 mailto:grimnes@dfki.uni-kl.de mailto:grimnes@dfki.uni-kl.de NNP 2163 12 3 mailto:msiegfried.handschuh@deri.org mailto:msiegfried.handschuh@deri.org NNP 2163 12 4 INFORMATION INFORMATION VBD 2163 12 5 TECHNOLOGY technology NN 2163 12 6 AND and CC 2163 12 7 LIBRARIES LIBRARIES NNP 2163 12 8 | | NNP 2163 12 9 JUNE JUNE NNP 2163 12 10 2012 2012 CD 2163 12 11 7 7 CD 2163 12 12 Moreover moreover RB 2163 12 13 , , , 2163 12 14 independent independent JJ 2163 12 15 of of IN 2163 12 16 the the DT 2163 12 17 creation creation NN 2163 12 18 / / SYM 2163 12 19 acquisition acquisition NN 2163 12 20 process process NN 2163 12 21 , , , 2163 12 22 one one CD 2163 12 23 particular particular JJ 2163 12 24 component component NN 2163 12 25 of of IN 2163 12 26 the the DT 2163 12 27 publication publication NN 2163 12 28 metadata metadata NN 2163 12 29 , , , 2163 12 30 i.e. i.e. FW 2163 12 31 , , , 2163 12 32 the the DT 2163 12 33 reference reference NN 2163 12 34 information information NN 2163 12 35 , , , 2163 12 36 is be VBZ 2163 12 37 almost almost RB 2163 12 38 constantly constantly RB 2163 12 39 neglected neglect VBN 2163 12 40 . . . 2163 13 1 The the DT 2163 13 2 reason reason NN 2163 13 3 is be VBZ 2163 13 4 mainly mainly RB 2163 13 5 the the DT 2163 13 6 amount amount NN 2163 13 7 of of IN 2163 13 8 work work NN 2163 13 9 required require VBN 2163 13 10 to to TO 2163 13 11 manually manually RB 2163 13 12 create create VB 2163 13 13 it -PRON- PRP 2163 13 14 , , , 2163 13 15 or or CC 2163 13 16 the the DT 2163 13 17 complexity complexity NN 2163 13 18 of of IN 2163 13 19 the the DT 2163 13 20 task task NN 2163 13 21 , , , 2163 13 22 in in IN 2163 13 23 the the DT 2163 13 24 case case NN 2163 13 25 of of IN 2163 13 26 automatic automatic JJ 2163 13 27 extraction extraction NN 2163 13 28 . . . 
2163 14 1 As as IN 2163 14 2 a a DT 2163 14 3 result result NN 2163 14 4 , , , 2163 14 5 currently currently RB 2163 14 6 , , , 2163 14 7 there there EX 2163 14 8 are be VBP 2163 14 9 no no DT 2163 14 10 datasets dataset NNS 2163 14 11 in in IN 2163 14 12 the the DT 2163 14 13 Linked Linked NNP 2163 14 14 Data Data NNP 2163 14 15 Web web NN 2163 14 16 exposing expose VBG 2163 14 17 reference reference NN 2163 14 18 information information NN 2163 14 19 , , , 2163 14 20 while while IN 2163 14 21 the the DT 2163 14 22 number number NN 2163 14 23 of of IN 2163 14 24 digital digital JJ 2163 14 25 libraries library NNS 2163 14 26 providing provide VBG 2163 14 27 search search NN 2163 14 28 and and CC 2163 14 29 link link VBP 2163 14 30 functionality functionality NN 2163 14 31 over over IN 2163 14 32 references reference NNS 2163 14 33 is be VBZ 2163 14 34 rather rather RB 2163 14 35 limited limited JJ 2163 14 36 . . . 2163 15 1 This this DT 2163 15 2 is be VBZ 2163 15 3 quite quite PDT 2163 15 4 a a DT 2163 15 5 problematic problematic JJ 2163 15 6 gap gap NN 2163 15 7 if if IN 2163 15 8 we -PRON- PRP 2163 15 9 consider consider VBP 2163 15 10 the the DT 2163 15 11 amount amount NN 2163 15 12 of of IN 2163 15 13 information information NN 2163 15 14 provided provide VBN 2163 15 15 by by IN 2163 15 16 references reference NNS 2163 15 17 and and CC 2163 15 18 their -PRON- PRP$ 2163 15 19 foundational foundational JJ 2163 15 20 support support NN 2163 15 21 for for IN 2163 15 22 other other JJ 2163 15 23 application application NN 2163 15 24 techniques technique NNS 2163 15 25 that that WDT 2163 15 26 bring bring VBP 2163 15 27 value value NN 2163 15 28 to to IN 2163 15 29 researchers researcher NNS 2163 15 30 and and CC 2163 15 31 librarians librarian NNS 2163 15 32 , , , 2163 15 33 such such JJ 2163 15 34 as as IN 2163 15 35 citation citation NN 2163 15 36 analysis analysis NN 2163 15 37 and and CC 2163 15 38 citation citation NN 2163 15 39 metrics metric NNS 2163 15 40 
, , , 2163 15 41 tracking track VBG 2163 15 42 temporal temporal JJ 2163 15 43 author author NN 2163 15 44 - - HYPH 2163 15 45 topic topic NN 2163 15 46 evolution evolution NN 2163 15 47 5 5 CD 2163 15 48 or or CC 2163 15 49 co co NN 2163 15 50 - - JJ 2163 15 51 authorship authorship JJ 2163 15 52 graph graph NN 2163 15 53 analysis analysis NN 2163 15 54 . . . 2163 16 1 6,7 6,7 LS 2163 16 2 In in IN 2163 16 3 this this DT 2163 16 4 paper paper NN 2163 16 5 we -PRON- PRP 2163 16 6 focus focus VBP 2163 16 7 on on IN 2163 16 8 the the DT 2163 16 9 first first JJ 2163 16 10 of of IN 2163 16 11 the the DT 2163 16 12 above above RB 2163 16 13 - - HYPH 2163 16 14 mentioned mention VBN 2163 16 15 steps step NNS 2163 16 16 , , , 2163 16 17 i.e. i.e. FW 2163 16 18 , , , 2163 16 19 providing provide VBG 2163 16 20 the the DT 2163 16 21 underlying underlie VBG 2163 16 22 mechanisms mechanism NNS 2163 16 23 for for IN 2163 16 24 automatic automatic JJ 2163 16 25 extraction extraction NN 2163 16 26 of of IN 2163 16 27 reference reference NN 2163 16 28 metadata metadata NN 2163 16 29 . . . 2163 17 1 We -PRON- PRP 2163 17 2 devise devise VBP 2163 17 3 a a DT 2163 17 4 solution solution NN 2163 17 5 that that WDT 2163 17 6 enables enable VBZ 2163 17 7 extraction extraction NN 2163 17 8 and and CC 2163 17 9 chunking chunking NN 2163 17 10 of of IN 2163 17 11 references reference NNS 2163 17 12 using use VBG 2163 17 13 Conditional Conditional NNP 2163 17 14 Random Random NNP 2163 17 15 Fields Fields NNPS 2163 17 16 ( ( -LRB- 2163 17 17 CRF CRF NNP 2163 17 18 ) ) -RRB- 2163 17 19 . . . 
2163 18 1 8 8 CD 2163 18 2 The the DT 2163 18 3 resulting result VBG 2163 18 4 metadata metadata NN 2163 18 5 can can MD 2163 18 6 then then RB 2163 18 7 be be VB 2163 18 8 easily easily RB 2163 18 9 transformed transform VBN 2163 18 10 into into IN 2163 18 11 semantic semantic JJ 2163 18 12 metadata metadata NN 2163 18 13 adhering adhere VBG 2163 18 14 to to IN 2163 18 15 particular particular JJ 2163 18 16 schemas schema NNS 2163 18 17 via via IN 2163 18 18 scripts script NNS 2163 18 19 , , , 2163 18 20 the the DT 2163 18 21 added add VBN 2163 18 22 value value NN 2163 18 23 being be VBG 2163 18 24 the the DT 2163 18 25 exclusion exclusion NN 2163 18 26 of of IN 2163 18 27 the the DT 2163 18 28 manual manual JJ 2163 18 29 author author NN 2163 18 30 - - HYPH 2163 18 31 driven drive VBN 2163 18 32 creation creation NN 2163 18 33 step step NN 2163 18 34 from from IN 2163 18 35 the the DT 2163 18 36 process process NN 2163 18 37 . . . 2163 19 1 From from IN 2163 19 2 the the DT 2163 19 3 domain domain NN 2163 19 4 perspective perspective NN 2163 19 5 , , , 2163 19 6 we -PRON- PRP 2163 19 7 focus focus VBP 2163 19 8 on on IN 2163 19 9 computer computer NN 2163 19 10 science science NN 2163 19 11 and and CC 2163 19 12 health health NN 2163 19 13 sciences science NNS 2163 19 14 only only RB 2163 19 15 because because IN 2163 19 16 these these DT 2163 19 17 domains domain NNS 2163 19 18 have have VBP 2163 19 19 representative representative JJ 2163 19 20 datasets dataset NNS 2163 19 21 that that WDT 2163 19 22 can can MD 2163 19 23 be be VB 2163 19 24 used use VBN 2163 19 25 for for IN 2163 19 26 evaluation evaluation NN 2163 19 27 and and CC 2163 19 28 hence hence RB 2163 19 29 enable enable VB 2163 19 30 comparison comparison NN 2163 19 31 against against IN 2163 19 32 similar similar JJ 2163 19 33 approaches approach NNS 2163 19 34 . . . 
2163 20 1 However however RB 2163 20 2 , , , 2163 20 3 we -PRON- PRP 2163 20 4 believe believe VBP 2163 20 5 that that IN 2163 20 6 our -PRON- PRP$ 2163 20 7 model model NN 2163 20 8 can can MD 2163 20 9 be be VB 2163 20 10 applied apply VBN 2163 20 11 also also RB 2163 20 12 in in IN 2163 20 13 domains domain NNS 2163 20 14 such such JJ 2163 20 15 as as IN 2163 20 16 digital digital JJ 2163 20 17 humanities humanity NNS 2163 20 18 or or CC 2163 20 19 social social JJ 2163 20 20 sciences science NNS 2163 20 21 , , , 2163 20 22 and and CC 2163 20 23 we -PRON- PRP 2163 20 24 intend intend VBP 2163 20 25 , , , 2163 20 26 in in IN 2163 20 27 the the DT 2163 20 28 near near JJ 2163 20 29 future future NN 2163 20 30 , , , 2163 20 31 to to TO 2163 20 32 build build VB 2163 20 33 a a DT 2163 20 34 corresponding corresponding JJ 2163 20 35 corpus corpus NN 2163 20 36 that that WDT 2163 20 37 would would MD 2163 20 38 allow allow VB 2163 20 39 us -PRON- PRP 2163 20 40 to to TO 2163 20 41 test test VB 2163 20 42 and and CC 2163 20 43 adapt adapt VB 2163 20 44 ( ( -LRB- 2163 20 45 if if IN 2163 20 46 necessary necessary JJ 2163 20 47 ) ) -RRB- 2163 20 48 our -PRON- PRP$ 2163 20 49 solution solution NN 2163 20 50 to to IN 2163 20 51 these these DT 2163 20 52 domains domain NNS 2163 20 53 . . . 2163 21 1 Figure figure NN 2163 21 2 1 1 CD 2163 21 3 . . . 2163 22 1 Examples example NNS 2163 22 2 of of IN 2163 22 3 Chunked Chunked NNP 2163 22 4 and and CC 2163 22 5 Labeled Labeled NNP 2163 22 6 Reference Reference NNP 2163 22 7 Strings Strings NNP 2163 22 8 Reference Reference NNP 2163 22 9 chunking chunking NN 2163 22 10 represents represent VBZ 2163 22 11 the the DT 2163 22 12 process process NN 2163 22 13 of of IN 2163 22 14 label label NNP 2163 22 15 sequencing sequence VBG 2163 22 16 a a DT 2163 22 17 reference reference NN 2163 22 18 string string NN 2163 22 19 , , , 2163 22 20 i.e. i.e. 
FW 2163 22 21 , , , 2163 22 22 tagging tag VBG 2163 22 23 the the DT 2163 22 24 parts part NNS 2163 22 25 of of IN 2163 22 26 the the DT 2163 22 27 reference reference NN 2163 22 28 containing contain VBG 2163 22 29 the the DT 2163 22 30 authors author NNS 2163 22 31 , , , 2163 22 32 the the DT 2163 22 33 title title NN 2163 22 34 , , , 2163 22 35 the the DT 2163 22 36 publication publication NN 2163 22 37 venue venue NN 2163 22 38 , , , 2163 22 39 etc etc FW 2163 22 40 . . . 2163 23 1 The the DT 2163 23 2 main main JJ 2163 23 3 issue issue NN 2163 23 4 associated associate VBN 2163 23 5 with with IN 2163 23 6 this this DT 2163 23 7 task task NN 2163 23 8 is be VBZ 2163 23 9 the the DT 2163 23 10 lack lack NN 2163 23 11 of of IN 2163 23 12 uniformity uniformity NN 2163 23 13 in in IN 2163 23 14 the the DT 2163 23 15 reference reference NN 2163 23 16 representation representation NN 2163 23 17 . . . 2163 24 1 Figure figure NN 2163 24 2 1 1 CD 2163 24 3 presents present VBZ 2163 24 4 three three CD 2163 24 5 examples example NNS 2163 24 6 of of IN 2163 24 7 chunked chunked JJ 2163 24 8 and and CC 2163 24 9 labeled label VBN 2163 24 10 reference reference NN 2163 24 11 strings string NNS 2163 24 12 . . . 2163 25 1 One one PRP 2163 25 2 can can MD 2163 25 3 not not RB 2163 25 4 infer infer VB 2163 25 5 generic generic JJ 2163 25 6 patterns pattern NNS 2163 25 7 for for IN 2163 25 8 all all DT 2163 25 9 types type NNS 2163 25 10 of of IN 2163 25 11 references reference NNS 2163 25 12 . . . 
2163 26 1 For for IN 2163 26 2 example example NN 2163 26 3 , , , 2163 26 4 the the DT 2163 26 5 year year NN 2163 26 6 ( ( -LRB- 2163 26 7 or or CC 2163 26 8 date date NN 2163 26 9 ) ) -RRB- 2163 26 10 of of IN 2163 26 11 some some DT 2163 26 12 of of IN 2163 26 13 the the DT 2163 26 14 references reference NNS 2163 26 15 of of IN 2163 26 16 this this DT 2163 26 17 paper paper NN 2163 26 18 are be VBP 2163 26 19 similar similar JJ 2163 26 20 to to IN 2163 26 21 example example VB 2163 26 22 2 2 CD 2163 26 23 from from IN 2163 26 24 the the DT 2163 26 25 figure figure NN 2163 26 26 , , , 2163 26 27 i.e. i.e. FW 2163 26 28 , , , 2163 26 29 they -PRON- PRP 2163 26 30 are be VBP 2163 26 31 located locate VBN 2163 26 32 at at IN 2163 26 33 the the DT 2163 26 34 very very JJ 2163 26 35 end end NN 2163 26 36 of of IN 2163 26 37 the the DT 2163 26 38 reference reference NN 2163 26 39 string string NN 2163 26 40 . . . 2163 27 1 Unfortunately unfortunately RB 2163 27 2 , , , 2163 27 3 this this DT 2163 27 4 does do VBZ 2163 27 5 not not RB 2163 27 6 hold hold VB 2163 27 7 for for IN 2163 27 8 some some DT 2163 27 9 journal journal NN 2163 27 10 reference reference NN 2163 27 11 formats format NNS 2163 27 12 , , , 2163 27 13 such such JJ 2163 27 14 as as IN 2163 27 15 the the DT 2163 27 16 one one CD 2163 27 17 presented present VBN 2163 27 18 in in IN 2163 27 19 example example NN 2163 27 20 1 1 CD 2163 27 21 . . . 
2163 28 1 And and CC 2163 28 2 at at IN 2163 28 3 the the DT 2163 28 4 same same JJ 2163 28 5 time time NN 2163 28 6 , , , 2163 28 7 the the DT 2163 28 8 actual actual JJ 2163 28 9 date date NN 2163 28 10 might may MD 2163 28 11 not not RB 2163 28 12 comprise comprise VB 2163 28 13 only only RB 2163 28 14 the the DT 2163 28 15 year year NN 2163 28 16 , , , 2163 28 17 but but CC 2163 28 18 also also RB 2163 28 19 the the DT 2163 28 20 month month NN 2163 28 21 ( ( -LRB- 2163 28 22 and and CC 2163 28 23 even even RB 2163 28 24 day day NN 2163 28 25 ) ) -RRB- 2163 28 26 . . . 2163 29 1 In in IN 2163 29 2 addition addition NN 2163 29 3 to to IN 2163 29 4 the the DT 2163 29 5 placement placement NN 2163 29 6 of of IN 2163 29 7 the the DT 2163 29 8 particular particular JJ 2163 29 9 types type NNS 2163 29 10 of of IN 2163 29 11 tokens token NNS 2163 29 12 within within IN 2163 29 13 the the DT 2163 29 14 reference reference NN 2163 29 15 string string NN 2163 29 16 , , , 2163 29 17 one one CD 2163 29 18 of of IN 2163 29 19 the the DT 2163 29 20 major major JJ 2163 29 21 concerns concern NNS 2163 29 22 when when WRB 2163 29 23 labeling label VBG 2163 29 24 these these DT 2163 29 25 types type NNS 2163 29 26 of of IN 2163 29 27 tokens token NNS 2163 29 28 is be VBZ 2163 29 29 disambiguation disambiguation NN 2163 29 30 . . . 
2163 30 1 Generally generally RB 2163 30 2 , , , 2163 30 3 there there EX 2163 30 4 are be VBP 2163 30 5 three three CD 2163 30 6 categories category NNS 2163 30 7 of of IN 2163 30 8 ambiguous ambiguous JJ 2163 30 9 elements element NNS 2163 30 10 : : : 2163 30 11 REFERENCE reference NN 2163 30 12 INFORMATION INFORMATION NNS 2163 30 13 EXTRACTION extraction NN 2163 30 14 AND and CC 2163 30 15 PROCESSING processing NN 2163 30 16 |GROZA |groza NN 2163 30 17 , , , 2163 30 18 GRIMNES GRIMNES NNP 2163 30 19 , , , 2163 30 20 AND and CC 2163 30 21 HANDSCHUH handschuh NN 2163 30 22 8 8 CD 2163 30 23   NN 2163 30 24 names name NNS 2163 30 25 — — : 2163 30 26 can can MD 2163 30 27 act act VB 2163 30 28 as as IN 2163 30 29 authors author NNS 2163 30 30 , , , 2163 30 31 editors editor NNS 2163 30 32 , , , 2163 30 33 or or CC 2163 30 34 even even RB 2163 30 35 part part NN 2163 30 36 of of IN 2163 30 37 organization organization NN 2163 30 38 names name NNS 2163 30 39 ( ( -LRB- 2163 30 40 e.g. e.g. 
RB 2163 30 41 , , , 2163 30 42 Max Max NNP 2163 30 43 Planck Planck NNP 2163 30 44 Institute Institute NNP 2163 30 45 ) ) -RRB- 2163 30 46 ; ; : 2163 30 47 in in IN 2163 30 48 example example NN 2163 30 49 1 1 CD 2163 30 50 a a DT 2163 30 51 name name NN 2163 30 52 is be VBZ 2163 30 53 used use VBN 2163 30 54 as as IN 2163 30 55 part part NN 2163 30 56 of of IN 2163 30 57 the the DT 2163 30 58 title title NN 2163 30 59 ; ; : 2163 30 60   : 2163 30 61 numbers number NNS 2163 30 62 — — : 2163 30 63 can can MD 2163 30 64 act act VB 2163 30 65 as as IN 2163 30 66 pages page NNS 2163 30 67 , , , 2163 30 68 years year NNS 2163 30 69 , , , 2163 30 70 days day NNS 2163 30 71 , , , 2163 30 72 volume volume NN 2163 30 73 numbers number NNS 2163 30 74 , , , 2163 30 75 or or CC 2163 30 76 just just RB 2163 30 77 numbers number NNS 2163 30 78 within within IN 2163 30 79 the the DT 2163 30 80 title title NN 2163 30 81 ; ; : 2163 30 82   NFP 2163 30 83 locations location NNS 2163 30 84 — — : 2163 30 85 can can MD 2163 30 86 act act VB 2163 30 87 as as IN 2163 30 88 actual actual JJ 2163 30 89 locations location NNS 2163 30 90 or or CC 2163 30 91 part part NN 2163 30 92 of of IN 2163 30 93 organization organization NN 2163 30 94 names name NNS 2163 30 95 ( ( -LRB- 2163 30 96 e.g. e.g. RB 2163 30 97 , , , 2163 30 98 Univ Univ NNP 2163 30 99 . . . 2163 31 1 of of IN 2163 31 2 Wisconsin Wisconsin NNP 2163 31 3 ) ) -RRB- 2163 31 4 To to TO 2163 31 5 help help VB 2163 31 6 the the DT 2163 31 7 chunker chunker NN 2163 31 8 in in IN 2163 31 9 performing perform VBG 2163 31 10 disambiguation disambiguation NN 2163 31 11 , , , 2163 31 12 one one PRP 2163 31 13 can can MD 2163 31 14 use use VB 2163 31 15 a a DT 2163 31 16 series series NN 2163 31 17 of of IN 2163 31 18 markers marker NNS 2163 31 19 , , , 2163 31 20 such such JJ 2163 31 21 as as IN 2163 31 22 , , , 2163 31 23 pp pp NNP 2163 31 24 . . . 
2163 32 1 for for IN 2163 32 2 pages page NNS 2163 32 3 , , , 2163 32 4 TR TR NNP 2163 32 5 for for IN 2163 32 6 technical technical JJ 2163 32 7 reports report NNS 2163 32 8 , , , 2163 32 9 Univ Univ NNP 2163 32 10 . . . 2163 33 1 or or CC 2163 33 2 Institute Institute NNP 2163 33 3 for for IN 2163 33 4 organization organization NN 2163 33 5 . . . 2163 34 1 However however RB 2163 34 2 , , , 2163 34 3 there there EX 2163 34 4 are be VBP 2163 34 5 cases case NNS 2163 34 6 where where WRB 2163 34 7 such such JJ 2163 34 8 markers marker NNS 2163 34 9 help help VBP 2163 34 10 in in IN 2163 34 11 detecting detect VBG 2163 34 12 the the DT 2163 34 13 general general JJ 2163 34 14 category category NN 2163 34 15 of of IN 2163 34 16 the the DT 2163 34 17 token token NN 2163 34 18 , , , 2163 34 19 e.g. e.g. RB 2163 34 20 , , , 2163 34 21 publication publication NN 2163 34 22 venue venue NN 2163 34 23 , , , 2163 34 24 but but CC 2163 34 25 a a DT 2163 34 26 more more RBR 2163 34 27 detailed detailed JJ 2163 34 28 disambiguation disambiguation NN 2163 34 29 is be VBZ 2163 34 30 required require VBN 2163 34 31 . . . 2163 35 1 For for IN 2163 35 2 example example NN 2163 35 3 , , , 2163 35 4 the the DT 2163 35 5 Proc Proc NNP 2163 35 6 . . . 
2163 36 1 marker marker NN 2163 36 2 generally generally RB 2163 36 3 signals signal VBZ 2163 36 4 the the DT 2163 36 5 publication publication NN 2163 36 6 venue venue NN 2163 36 7 of of IN 2163 36 8 the the DT 2163 36 9 reference reference NN 2163 36 10 , , , 2163 36 11 without without IN 2163 36 12 knowing know VBG 2163 36 13 exactly exactly RB 2163 36 14 whether whether IN 2163 36 15 it -PRON- PRP 2163 36 16 represents represent VBZ 2163 36 17 a a DT 2163 36 18 workshop workshop NN 2163 36 19 , , , 2163 36 20 conference conference NN 2163 36 21 or or CC 2163 36 22 even even RB 2163 36 23 journal journal NNP 2163 36 24 ( ( -LRB- 2163 36 25 as as IN 2163 36 26 in in IN 2163 36 27 the the DT 2163 36 28 case case NN 2163 36 29 of of IN 2163 36 30 Proc Proc NNP 2163 36 31 . . . 2163 37 1 Natl Natl NNP 2163 37 2 . . . 2163 38 1 Acad Acad NNS 2163 38 2 . . . 2163 39 1 Sci.—Proceedings Sci.—Proceedings NNP 2163 39 2 of of IN 2163 39 3 the the DT 2163 39 4 National National NNP 2163 39 5 Academy Academy NNP 2163 39 6 of of IN 2163 39 7 Sciences Sciences NNPS 2163 39 8 ) ) -RRB- 2163 39 9 . . . 2163 40 1 The the DT 2163 40 2 solution solution NN 2163 40 3 we -PRON- PRP 2163 40 4 have have VBP 2163 40 5 devised devise VBN 2163 40 6 was be VBD 2163 40 7 built build VBN 2163 40 8 to to TO 2163 40 9 properly properly RB 2163 40 10 handle handle VB 2163 40 11 such such JJ 2163 40 12 disambiguation disambiguation NN 2163 40 13 issues issue NNS 2163 40 14 and and CC 2163 40 15 the the DT 2163 40 16 intrinsic intrinsic JJ 2163 40 17 heterogeneous heterogeneous JJ 2163 40 18 nature nature NN 2163 40 19 of of IN 2163 40 20 references reference NNS 2163 40 21 . . . 
2163 41 1 The the DT 2163 41 2 features feature NNS 2163 41 3 of of IN 2163 41 4 the the DT 2163 41 5 CRF CRF NNP 2163 41 6 chunker chunker NN 2163 41 7 model model NN 2163 41 8 were be VBD 2163 41 9 chosen choose VBN 2163 41 10 to to TO 2163 41 11 provide provide VB 2163 41 12 a a DT 2163 41 13 representative representative JJ 2163 41 14 discrimination discrimination NN 2163 41 15 between between IN 2163 41 16 the the DT 2163 41 17 different different JJ 2163 41 18 fields field NNS 2163 41 19 of of IN 2163 41 20 the the DT 2163 41 21 reference reference NN 2163 41 22 string string NN 2163 41 23 . . . 2163 42 1 Consequently consequently RB 2163 42 2 , , , 2163 42 3 as as IN 2163 42 4 the the DT 2163 42 5 experimental experimental JJ 2163 42 6 results result NNS 2163 42 7 show show VBP 2163 42 8 , , , 2163 42 9 the the DT 2163 42 10 resulting result VBG 2163 42 11 chunker chunker NN 2163 42 12 has have VBZ 2163 42 13 a a DT 2163 42 14 superior superior JJ 2163 42 15 efficiency efficiency NN 2163 42 16 , , , 2163 42 17 while while IN 2163 42 18 at at IN 2163 42 19 the the DT 2163 42 20 same same JJ 2163 42 21 time time NN 2163 42 22 maintaining maintain VBG 2163 42 23 an an DT 2163 42 24 increased increase VBN 2163 42 25 versatility versatility NN 2163 42 26 . . . 2163 43 1 The the DT 2163 43 2 rest rest NN 2163 43 3 of of IN 2163 43 4 the the DT 2163 43 5 paper paper NN 2163 43 6 is be VBZ 2163 43 7 structured structure VBN 2163 43 8 as as IN 2163 43 9 follows follow VBZ 2163 43 10 : : : 2163 43 11 in in IN 2163 43 12 section section NN 2163 43 13 2 2 CD 2163 43 14 we -PRON- PRP 2163 43 15 briefly briefly RB 2163 43 16 describe describe VBP 2163 43 17 Conditional Conditional NNP 2163 43 18 Random Random NNP 2163 43 19 Fields Fields NNPS 2163 43 20 and and CC 2163 43 21 analyze analyze VB 2163 43 22 the the DT 2163 43 23 existing exist VBG 2163 43 24 related related JJ 2163 43 25 work work NN 2163 43 26 . . . 
2163 44 1 Section section NN 2163 44 2 3 3 CD 2163 44 3 details detail VBZ 2163 44 4 the the DT 2163 44 5 CRF CRF NNP 2163 44 6 - - HYPH 2163 44 7 based base VBN 2163 44 8 reference reference NN 2163 44 9 chunker chunker NN 2163 44 10 and and CC 2163 44 11 before before IN 2163 44 12 concluding conclude VBG 2163 44 13 in in IN 2163 44 14 section section NN 2163 44 15 5 5 CD 2163 44 16 , , , 2163 44 17 section section NN 2163 44 18 4 4 CD 2163 44 19 presents present VBZ 2163 44 20 our -PRON- PRP$ 2163 44 21 experimental experimental JJ 2163 44 22 results result NNS 2163 44 23 . . . 2163 45 1 2 2 LS 2163 45 2 . . . 2163 46 1 BACKGROUND BACKGROUND NNP 2163 46 2 2.1 2.1 CD 2163 46 3 Conditional Conditional NNP 2163 46 4 Random Random NNP 2163 46 5 Fields Fields NNPS 2163 46 6 To to TO 2163 46 7 have have VB 2163 46 8 a a DT 2163 46 9 better well JJR 2163 46 10 understanding understanding NN 2163 46 11 of of IN 2163 46 12 the the DT 2163 46 13 Machine Machine NNP 2163 46 14 Learning Learning NNP 2163 46 15 technique technique NN 2163 46 16 used use VBN 2163 46 17 by by IN 2163 46 18 our -PRON- PRP$ 2163 46 19 solution solution NN 2163 46 20 , , , 2163 46 21 in in IN 2163 46 22 the the DT 2163 46 23 following following NN 2163 46 24 we -PRON- PRP 2163 46 25 give give VBP 2163 46 26 a a DT 2163 46 27 brief brief JJ 2163 46 28 description description NN 2163 46 29 of of IN 2163 46 30 the the DT 2163 46 31 Conditional Conditional NNP 2163 46 32 Random Random NNP 2163 46 33 Fields Fields NNPS 2163 46 34 paradigm paradigm NN 2163 46 35 . . . 2163 47 1 Figure figure NN 2163 47 2 2 2 CD 2163 47 3 . . . 
2163 48 1 Example Example NNP 2163 48 2 Linear Linear NNP 2163 48 3 CRF CRF NNP 2163 48 4 — — : 2163 48 5 Showing show VBG 2163 48 6 Dependencies dependency NNS 2163 48 7 Between between IN 2163 48 8 Features Features NNP 2163 48 9 X X NNPS 2163 48 10 and and CC 2163 48 11 Classes Classes NNPS 2163 48 12 Y Y NNP 2163 48 13 INFORMATION information NN 2163 48 14 TECHNOLOGY technology NN 2163 48 15 AND and CC 2163 48 16 LIBRARIES LIBRARIES NNP 2163 48 17 | | NNP 2163 48 18 JUNE JUNE NNP 2163 48 19 2012 2012 CD 2163 48 20 9 9 CD 2163 48 21 Conditional Conditional NNP 2163 48 22 Random Random NNP 2163 48 23 Fields Fields NNPS 2163 48 24 ( ( -LRB- 2163 48 25 CRF CRF NNP 2163 48 26 ) ) -RRB- 2163 48 27 is be VBZ 2163 48 28 a a DT 2163 48 29 probabilistic probabilistic JJ 2163 48 30 graphical graphical JJ 2163 48 31 model model NN 2163 48 32 for for IN 2163 48 33 classification classification NN 2163 48 34 . . . 2163 49 1 CRF CRF NNP 2163 49 2 , , , 2163 49 3 in in IN 2163 49 4 general general JJ 2163 49 5 , , , 2163 49 6 can can MD 2163 49 7 represent represent VB 2163 49 8 many many JJ 2163 49 9 different different JJ 2163 49 10 types type NNS 2163 49 11 of of IN 2163 49 12 graphical graphical JJ 2163 49 13 models model NNS 2163 49 14 , , , 2163 49 15 however however RB 2163 49 16 in in IN 2163 49 17 the the DT 2163 49 18 scope scope NN 2163 49 19 of of IN 2163 49 20 this this DT 2163 49 21 paper paper NN 2163 49 22 , , , 2163 49 23 we -PRON- PRP 2163 49 24 use use VBP 2163 49 25 the the DT 2163 49 26 so so RB 2163 49 27 - - HYPH 2163 49 28 called call VBN 2163 49 29 linear linear NN 2163 49 30 - - HYPH 2163 49 31 chain chain NN 2163 49 32 CRFs crf NNS 2163 49 33 . . . 
2163 50 1 A a DT 2163 50 2 simple simple JJ 2163 50 3 example example NN 2163 50 4 of of IN 2163 50 5 a a DT 2163 50 6 linear linear JJ 2163 50 7 dependency dependency NN 2163 50 8 graph graph NN 2163 50 9 is be VBZ 2163 50 10 shown show VBN 2163 50 11 in in IN 2163 50 12 Figure Figure NNP 2163 50 13 2 2 CD 2163 50 14 , , , 2163 50 15 here here RB 2163 50 16 only only RB 2163 50 17 the the DT 2163 50 18 features feature NNS 2163 50 19 X X NNS 2163 50 20 of of IN 2163 50 21 the the DT 2163 50 22 previous previous JJ 2163 50 23 item item NN 2163 50 24 influences influence VBZ 2163 50 25 the the DT 2163 50 26 class class NN 2163 50 27 of of IN 2163 50 28 the the DT 2163 50 29 current current JJ 2163 50 30 item item NN 2163 50 31 Y. y. NN 2163 51 1 The the DT 2163 51 2 conditional conditional JJ 2163 51 3 probability probability NN 2163 51 4 is be VBZ 2163 51 5 defined define VBN 2163 51 6 as as IN 2163 51 7 : : : 2163 51 8 ( ( -LRB- 2163 51 9 | | NNP 2163 51 10 ) ) -RRB- 2163 51 11 ( ( -LRB- 2163 51 12 ) ) -RRB- 2163 51 13 ( ( -LRB- 2163 51 14 ∑ ∑ . 2163 51 15 ( ( -LRB- 2163 51 16 ) ) -RRB- 2163 51 17 ) ) -RRB- 2163 51 18 where where WRB 2163 51 19 ( ( -LRB- 2163 51 20 ) ) -RRB- 2163 51 21 ∑ ∑ . 2163 51 22 ( ( -LRB- 2163 51 23 ) ) -RRB- 2163 51 24 and and CC 2163 51 25 ( ( -LRB- 2163 51 26 ) ) -RRB- 2163 51 27 ∑ ∑ . 2163 51 28 ( ( -LRB- 2163 51 29 ∑ ∑ . 2163 51 30 ( ( -LRB- 2163 51 31 ) ) -RRB- 2163 51 32 ) ) -RRB- 2163 51 33 . . . 2163 52 1 The the DT 2163 52 2 model model NN 2163 52 3 is be VBZ 2163 52 4 usually usually RB 2163 52 5 trained train VBN 2163 52 6 by by IN 2163 52 7 maximizing maximize VBG 2163 52 8 the the DT 2163 52 9 log log NN 2163 52 10 - - HYPH 2163 52 11 likelihood likelihood NN 2163 52 12 of of IN 2163 52 13 the the DT 2163 52 14 training training NN 2163 52 15 data datum NNS 2163 52 16 by by IN 2163 52 17 gradient gradient JJ 2163 52 18 methods method NNS 2163 52 19 . . . 
2163 53 1 A a DT 2163 53 2 dynamic dynamic JJ 2163 53 3 algorithm algorithm NN 2163 53 4 is be VBZ 2163 53 5 used use VBN 2163 53 6 to to TO 2163 53 7 compute compute VB 2163 53 8 all all PDT 2163 53 9 the the DT 2163 53 10 required require VBN 2163 53 11 probabilities probability NNS 2163 53 12 p p NNP 2163 53 13 ⍬ ⍬ NN 2163 53 14 (yi (yi NNP 2163 53 15 , , , 2163 53 16 yi+1 yi+1 NNP 2163 53 17 ) ) -RRB- 2163 53 18 for for IN 2163 53 19 calculating calculate VBG 2163 53 20 the the DT 2163 53 21 gradient gradient NN 2163 53 22 of of IN 2163 53 23 the the DT 2163 53 24 likelihood likelihood NN 2163 53 25 . . . 2163 54 1 This this DT 2163 54 2 means mean VBZ 2163 54 3 that that IN 2163 54 4 in in IN 2163 54 5 contrast contrast NN 2163 54 6 to to IN 2163 54 7 traditional traditional JJ 2163 54 8 classification classification NN 2163 54 9 algorithms algorithm NNS 2163 54 10 in in IN 2163 54 11 Machine Machine NNP 2163 54 12 Learning Learning NNP 2163 54 13 ( ( -LRB- 2163 54 14 e.g. e.g. RB 2163 54 15 , , , 2163 54 16 Support Support NNP 2163 54 17 Vector Vector NNP 2163 54 18 Machines Machines NNP 2163 54 19 9 9 CD 2163 54 20 ) ) -RRB- 2163 54 21 , , , 2163 54 22 it -PRON- PRP 2163 54 23 not not RB 2163 54 24 only only RB 2163 54 25 considers consider VBZ 2163 54 26 the the DT 2163 54 27 attributes attribute NNS 2163 54 28 of of IN 2163 54 29 the the DT 2163 54 30 current current JJ 2163 54 31 element element NN 2163 54 32 when when WRB 2163 54 33 determining determine VBG 2163 54 34 the the DT 2163 54 35 class class NN 2163 54 36 , , , 2163 54 37 but but CC 2163 54 38 also also RB 2163 54 39 attributes attribute VBZ 2163 54 40 of of IN 2163 54 41 preceding precede VBG 2163 54 42 and and CC 2163 54 43 succeeding succeed VBG 2163 54 44 items item NNS 2163 54 45 . . . 
2163 55 1 This this DT 2163 55 2 makes make VBZ 2163 55 3 it -PRON- PRP 2163 55 4 ideal ideal JJ 2163 55 5 for for IN 2163 55 6 tagging tag VBG 2163 55 7 sequences sequence NNS 2163 55 8 , , , 2163 55 9 such such JJ 2163 55 10 as as IN 2163 55 11 chunking chunk VBG 2163 55 12 of of IN 2163 55 13 parts part NNS 2163 55 14 of of IN 2163 55 15 speech speech NN 2163 55 16 or or CC 2163 55 17 parts part NNS 2163 55 18 of of IN 2163 55 19 references reference NNS 2163 55 20 , , , 2163 55 21 which which WDT 2163 55 22 is be VBZ 2163 55 23 what what WP 2163 55 24 we -PRON- PRP 2163 55 25 require require VBP 2163 55 26 for for IN 2163 55 27 our -PRON- PRP$ 2163 55 28 chunking chunk VBG 2163 55 29 task task NN 2163 55 30 . . . 2163 56 1 2.2 2.2 CD 2163 56 2 Related related JJ 2163 56 3 Work work NN 2163 56 4 In in IN 2163 56 5 recent recent JJ 2163 56 6 years year NNS 2163 56 7 , , , 2163 56 8 extensive extensive JJ 2163 56 9 research research NN 2163 56 10 has have VBZ 2163 56 11 been be VBN 2163 56 12 performed perform VBN 2163 56 13 in in IN 2163 56 14 the the DT 2163 56 15 area area NN 2163 56 16 of of IN 2163 56 17 automatic automatic JJ 2163 56 18 metadata metadata NN 2163 56 19 extraction extraction NN 2163 56 20 from from IN 2163 56 21 scientific scientific JJ 2163 56 22 publications publication NNS 2163 56 23 . . . 2163 57 1 Most Most JJS 2163 57 2 of of IN 2163 57 3 the the DT 2163 57 4 approaches approaches NNP 2163 57 5 focus focus VBP 2163 57 6 on on IN 2163 57 7 one one CD 2163 57 8 of of IN 2163 57 9 the the DT 2163 57 10 two two CD 2163 57 11 main main JJ 2163 57 12 metadata metadata NN 2163 57 13 components component NNS 2163 57 14 , , , 2163 57 15 i.e. i.e. 
FW 2163 57 16 , , , 2163 57 17 on on IN 2163 57 18 the the DT 2163 57 19 heading heading NN 2163 57 20 / / SYM 2163 57 21 bibliographic bibliographic JJ 2163 57 22 metadata metadata NN 2163 57 23 or or CC 2163 57 24 on on IN 2163 57 25 the the DT 2163 57 26 reference reference NN 2163 57 27 metadata metadata NN 2163 57 28 , , , 2163 57 29 but but CC 2163 57 30 there there EX 2163 57 31 are be VBP 2163 57 32 also also RB 2163 57 33 cases case NNS 2163 57 34 when when WRB 2163 57 35 the the DT 2163 57 36 entire entire JJ 2163 57 37 set set NN 2163 57 38 is be VBZ 2163 57 39 targeted target VBN 2163 57 40 . . . 2163 58 1 As as IN 2163 58 2 this this DT 2163 58 3 paper paper NN 2163 58 4 focuses focus VBZ 2163 58 5 only only RB 2163 58 6 on on IN 2163 58 7 the the DT 2163 58 8 second second JJ 2163 58 9 component component NN 2163 58 10 , , , 2163 58 11 within within IN 2163 58 12 this this DT 2163 58 13 section section NN 2163 58 14 we -PRON- PRP 2163 58 15 present present VBP 2163 58 16 and and CC 2163 58 17 discuss discuss VB 2163 58 18 those those DT 2163 58 19 applications application NNS 2163 58 20 that that WDT 2163 58 21 deal deal VBP 2163 58 22 strictly strictly RB 2163 58 23 with with IN 2163 58 24 reference reference NN 2163 58 25 chunking chunk VBG 2163 58 26 . . . 2163 59 1 The the DT 2163 59 2 ParsCit ParsCit NNP 2163 59 3 framework framework NN 2163 59 4 is be VBZ 2163 59 5 the the DT 2163 59 6 closest close JJS 2163 59 7 technique technique NN 2163 59 8 mapping mapping NN 2163 59 9 to to IN 2163 59 10 our -PRON- PRP$ 2163 59 11 goals goal NNS 2163 59 12 and and CC 2163 59 13 methodology methodology NN 2163 59 14 . . . 2163 60 1 10 10 CD 2163 60 2 ParsCit ParsCit NNP 2163 60 3 is be VBZ 2163 60 4 an an DT 2163 60 5 open open JJ 2163 60 6 - - HYPH 2163 60 7 source source NN 2163 60 8 reference reference NN 2163 60 9 - - HYPH 2163 60 10 parsing parse VBG 2163 60 11 package package NN 2163 60 12 . . . 
2163 61 1 While while IN 2163 61 2 its -PRON- PRP$ 2163 61 3 first first JJ 2163 61 4 version version NN 2163 61 5 used use VBD 2163 61 6 a a DT 2163 61 7 Maximum Maximum NNP 2163 61 8 Entropy Entropy NNP 2163 61 9 model model NN 2163 61 10 to to TO 2163 61 11 perform perform VB 2163 61 12 reference reference NN 2163 61 13 chunking chunk VBG 2163 61 14 , , , 2163 61 15 11 11 CD 2163 61 16 currently currently RB 2163 61 17 , , , 2163 61 18 inspired inspire VBN 2163 61 19 by by IN 2163 61 20 the the DT 2163 61 21 work work NN 2163 61 22 of of IN 2163 61 23 Peng Peng NNP 2163 61 24 et et NNP 2163 61 25 al al NNP 2163 61 26 . . . 2163 62 1 , , , 2163 62 2 12 12 CD 2163 62 3 it -PRON- PRP 2163 62 4 uses use VBZ 2163 62 5 a a DT 2163 62 6 trained train VBN 2163 62 7 CRF CRF NNP 2163 62 8 model model NN 2163 62 9 for for IN 2163 62 10 label label NNP 2163 62 11 sequencing sequencing NN 2163 62 12 . . . 2163 63 1 The the DT 2163 63 2 model model NN 2163 63 3 was be VBD 2163 63 4 obtained obtain VBN 2163 63 5 based base VBN 2163 63 6 on on IN 2163 63 7 a a DT 2163 63 8 set set NN 2163 63 9 of of IN 2163 63 10 twenty twenty CD 2163 63 11 - - HYPH 2163 63 12 three three CD 2163 63 13 token token NN 2163 63 14 - - HYPH 2163 63 15 oriented orient VBN 2163 63 16 features feature NNS 2163 63 17 tailored tailor VBN 2163 63 18 towards towards IN 2163 63 19 correcting correct VBG 2163 63 20 the the DT 2163 63 21 errors error NNS 2163 63 22 that that WDT 2163 63 23 Peng Peng NNP 2163 63 24 's 's POS 2163 63 25 CRF CRF NNP 2163 63 26 model model NN 2163 63 27 produced produce VBD 2163 63 28 . . . 2163 64 1 Our -PRON- PRP$ 2163 64 2 CRF CRF NNP 2163 64 3 chunker chunker NN 2163 64 4 builds build VBZ 2163 64 5 on on IN 2163 64 6 the the DT 2163 64 7 work work NN 2163 64 8 of of IN 2163 64 9 ParsCit ParsCit NNP 2163 64 10 . . . 
2163 65 1 However however RB 2163 65 2 , , , 2163 65 3 as as IN 2163 65 4 we -PRON- PRP 2163 65 5 aimed aim VBD 2163 65 6 at at IN 2163 65 7 improving improve VBG 2163 65 8 the the DT 2163 65 9 chunking chunk VBG 2163 65 10 performance performance NN 2163 65 11 , , , 2163 65 12 we -PRON- PRP 2163 65 13 altered alter VBD 2163 65 14 some some DT 2163 65 15 of of IN 2163 65 16 the the DT 2163 65 17 existing exist VBG 2163 65 18 features feature NNS 2163 65 19 and and CC 2163 65 20 introduced introduce VBD 2163 65 21 additional additional JJ 2163 65 22 ones one NNS 2163 65 23 . . . 2163 66 1 Moreover moreover RB 2163 66 2 , , , 2163 66 3 we -PRON- PRP 2163 66 4 have have VBP 2163 66 5 compiled compile VBN 2163 66 6 significantly significantly RB 2163 66 7 larger large JJR 2163 66 8 gazetteers gazetteer NNS 2163 66 9 required require VBN 2163 66 10 for for IN 2163 66 11 detecting detect VBG 2163 66 12 different different JJ 2163 66 13 aspects aspect NNS 2163 66 14 , , , 2163 66 15 such such JJ 2163 66 16 as as IN 2163 66 17 names name NNS 2163 66 18 , , , 2163 66 19 places place NNS 2163 66 20 , , , 2163 66 21 organizations organization NNS 2163 66 22 , , , 2163 66 23 journals journal NNS 2163 66 24 , , , 2163 66 25 or or CC 2163 66 26 publishers publisher NNS 2163 66 27 . . . 2163 67 1 One one CD 2163 67 2 of of IN 2163 67 3 the the DT 2163 67 4 first first JJ 2163 67 5 attempts attempt NNS 2163 67 6 to to TO 2163 67 7 extract extract VB 2163 67 8 and and CC 2163 67 9 index index NN 2163 67 10 reference reference NN 2163 67 11 information information NN 2163 67 12 led lead VBD 2163 67 13 to to IN 2163 67 14 the the DT 2163 67 15 currently currently RB 2163 67 16 well- well- XX 2163 67 17 known know VBN 2163 67 18 system system NN 2163 67 19 , , , 2163 67 20 CiteSeer CiteSeer NNP 2163 67 21 . . . 
2163 68 1 13 13 CD 2163 68 2 Around around IN 2163 68 3 the the DT 2163 68 4 same same JJ 2163 68 5 period period NN 2163 68 6 , , , 2163 68 7 Seymore Seymore NNP 2163 68 8 et et NNP 2163 68 9 al al NNP 2163 68 10 . . . 2163 69 1 developed develop VBD 2163 69 2 one one CD 2163 69 3 of of IN 2163 69 4 the the DT 2163 69 5 first first JJ 2163 69 6 reference reference NN 2163 69 7 chunking chunk VBG 2163 69 8 approaches approach NNS 2163 69 9 that that WDT 2163 69 10 used use VBD 2163 69 11 Machine Machine NNP 2163 69 12 Learning Learning NNP 2163 69 13 techniques technique NNS 2163 69 14 . . . 2163 70 1 14 14 CD 2163 70 2 The the DT 2163 70 3 authors author NNS 2163 70 4 trained train VBD 2163 70 5 a a DT 2163 70 6 Hidden Hidden NNP 2163 70 7 Markov Markov NNP 2163 70 8 Model Model NNP 2163 70 9 ( ( -LRB- 2163 70 10 HMM HMM NNP 2163 70 11 ) ) -RRB- 2163 70 12 to to TO 2163 70 13 build build VB 2163 70 14 a a DT 2163 70 15 reference reference NN 2163 70 16 sequence sequence NN 2163 70 17 labeler labeler NN 2163 70 18 using use VBG 2163 70 19 internal internal JJ 2163 70 20 states state NNS 2163 70 21 for for IN 2163 70 22 different different JJ 2163 70 23 parts part NNS 2163 70 24 of of IN 2163 70 25 the the DT 2163 70 26 fields field NNS 2163 70 27 . . . 2163 71 1 As as IN 2163 71 2 it -PRON- PRP 2163 71 3 represented represent VBD 2163 71 4 pioneering pioneer VBG 2163 71 5 work work NN 2163 71 6 , , , 2163 71 7 it -PRON- PRP 2163 71 8 also also RB 2163 71 9 resulted result VBD 2163 71 10 in in IN 2163 71 11 the the DT 2163 71 12 first first JJ 2163 71 13 gold gold JJ 2163 71 14 standard standard JJ 2163 71 15 set set NN 2163 71 16 , , , 2163 71 17 the the DT 2163 71 18 CORA CORA NNP 2163 71 19 dataset dataset NN 2163 71 20 . . . 
2163 72 1 At at IN 2163 72 2 a a DT 2163 72 3 later later JJ 2163 72 4 stage stage NN 2163 72 5 , , , 2163 72 6 the the DT 2163 72 7 same same JJ 2163 72 8 group group NN 2163 72 9 applied apply VBD 2163 72 10 CRF CRF NNP 2163 72 11 for for IN 2163 72 12 the the DT 2163 72 13 first first JJ 2163 72 14 time time NN 2163 72 15 to to TO 2163 72 16 perform perform VB 2163 72 17 reference reference NN 2163 72 18 chunking chunking NN 2163 72 19 , , , 2163 72 20 which which WDT 2163 72 21 later later RB 2163 72 22 inspired inspire VBD 2163 72 23 ParsCit ParsCit NNP 2163 72 24 . . . 2163 73 1 15 15 CD 2163 73 2 REFERENCE reference NN 2163 73 3 INFORMATION INFORMATION VBN 2163 73 4 EXTRACTION extraction NN 2163 73 5 AND and CC 2163 73 6 PROCESSING processing NN 2163 73 7 |GROZA |groza NN 2163 73 8 , , , 2163 73 9 GRIMNES GRIMNES NNP 2163 73 10 , , , 2163 73 11 AND and CC 2163 73 12 HANDSCHUH handschuh NN 2163 73 13 10 10 CD 2163 73 14 In in IN 2163 73 15 the the DT 2163 73 16 same same JJ 2163 73 17 learning learning NN 2163 73 18 - - HYPH 2163 73 19 driven drive VBN 2163 73 20 category category NN 2163 73 21 is be VBZ 2163 73 22 the the DT 2163 73 23 work work NN 2163 73 24 of of IN 2163 73 25 Han Han NNP 2163 73 26 et et NNP 2163 73 27 al al NNP 2163 73 28 . . . 
2163 74 1 16 16 CD 2163 74 2 The the DT 2163 74 3 authors author NNS 2163 74 4 proposed propose VBD 2163 74 5 an an DT 2163 74 6 effective effective JJ 2163 74 7 word word NN 2163 74 8 clustering cluster VBG 2163 74 9 approach approach NN 2163 74 10 with with IN 2163 74 11 the the DT 2163 74 12 goal goal NN 2163 74 13 of of IN 2163 74 14 reducing reduce VBG 2163 74 15 feature feature NN 2163 74 16 dimensionality dimensionality NN 2163 74 17 when when WRB 2163 74 18 compared compare VBN 2163 74 19 to to IN 2163 74 20 HMM HMM NNP 2163 74 21 , , , 2163 74 22 while while IN 2163 74 23 at at IN 2163 74 24 the the DT 2163 74 25 same same JJ 2163 74 26 time time NN 2163 74 27 improving improve VBG 2163 74 28 the the DT 2163 74 29 overall overall JJ 2163 74 30 chunking chunk VBG 2163 74 31 performance performance NN 2163 74 32 . . . 2163 75 1 The the DT 2163 75 2 resultant resultant JJ 2163 75 3 domain domain NN 2163 75 4 , , , 2163 75 5 rule rule NN 2163 75 6 - - HYPH 2163 75 7 based base VBN 2163 75 8 word word NN 2163 75 9 clustering clustering NN 2163 75 10 method method NN 2163 75 11 for for IN 2163 75 12 cluster cluster NN 2163 75 13 feature feature NN 2163 75 14 representation representation NN 2163 75 15 used use VBD 2163 75 16 clusters cluster NNS 2163 75 17 formed form VBN 2163 75 18 from from IN 2163 75 19 various various JJ 2163 75 20 domain domain NN 2163 75 21 databases database NNS 2163 75 22 and and CC 2163 75 23 word word NN 2163 75 24 orthographic orthographic JJ 2163 75 25 properties property NNS 2163 75 26 . . . 
2163 76 1 Consequently consequently RB 2163 76 2 , , , 2163 76 3 they -PRON- PRP 2163 76 4 achieved achieve VBD 2163 76 5 an an DT 2163 76 6 8.5 8.5 CD 2163 76 7 percent percent NN 2163 76 8 improvement improvement NN 2163 76 9 on on IN 2163 76 10 the the DT 2163 76 11 overall overall JJ 2163 76 12 accuracy accuracy NN 2163 76 13 of of IN 2163 76 14 reference reference NN 2163 76 15 fields field VBZ 2163 76 16 classification classification NN 2163 76 17 combined combine VBN 2163 76 18 with with IN 2163 76 19 a a DT 2163 76 20 significant significant JJ 2163 76 21 dimensionality dimensionality NN 2163 76 22 reduction reduction NN 2163 76 23 . . . 2163 77 1 FLUX FLUX NNP 2163 77 2 - - HYPH 2163 77 3 CIM CIM NNP 2163 77 4 17 17 CD 2163 77 5 is be VBZ 2163 77 6 the the DT 2163 77 7 only only JJ 2163 77 8 unsupervised unsupervise VBN 2163 77 9 18 18 CD 2163 77 10 approach approach NN 2163 77 11 that that WDT 2163 77 12 targets target VBZ 2163 77 13 reference reference NN 2163 77 14 chunking chunk VBG 2163 77 15 . . . 2163 78 1 The the DT 2163 78 2 system system NN 2163 78 3 uses use VBZ 2163 78 4 automatically automatically RB 2163 78 5 constructed construct VBN 2163 78 6 knowledge knowledge NN 2163 78 7 bases basis NNS 2163 78 8 from from IN 2163 78 9 an an DT 2163 78 10 existing exist VBG 2163 78 11 set set NN 2163 78 12 of of IN 2163 78 13 sample sample NN 2163 78 14 references reference NNS 2163 78 15 for for IN 2163 78 16 recognizing recognize VBG 2163 78 17 the the DT 2163 78 18 component component NN 2163 78 19 fields field NNS 2163 78 20 of of IN 2163 78 21 a a DT 2163 78 22 reference reference NN 2163 78 23 . . . 
2163 79 1 The the DT 2163 79 2 chunking chunking NN 2163 79 3 process process NN 2163 79 4 features feature VBZ 2163 79 5 two two CD 2163 79 6 steps step NNS 2163 79 7 : : : 2163 79 8   : 2163 79 9 a a DT 2163 79 10 probability probability NN 2163 79 11 estimation estimation NN 2163 79 12 of of IN 2163 79 13 a a DT 2163 79 14 given give VBN 2163 79 15 term term NN 2163 79 16 within within IN 2163 79 17 a a DT 2163 79 18 reference reference NN 2163 79 19 which which WDT 2163 79 20 is be VBZ 2163 79 21 a a DT 2163 79 22 value value NN 2163 79 23 for for IN 2163 79 24 a a DT 2163 79 25 given give VBN 2163 79 26 reference reference NN 2163 79 27 field field NN 2163 79 28 based base VBN 2163 79 29 on on IN 2163 79 30 the the DT 2163 79 31 information information NN 2163 79 32 encoded encode VBN 2163 79 33 in in IN 2163 79 34 their -PRON- PRP$ 2163 79 35 knowledge knowledge NN 2163 79 36 bases basis NNS 2163 79 37 , , , 2163 79 38 and and CC 2163 79 39   NFP 2163 79 40 the the DT 2163 79 41 use use NN 2163 79 42 of of IN 2163 79 43 generic generic JJ 2163 79 44 structural structural JJ 2163 79 45 properties property NNS 2163 79 46 of of IN 2163 79 47 references reference NNS 2163 79 48 . . . 2163 80 1 Similarly similarly RB 2163 80 2 to to IN 2163 80 3 Seymore Seymore NNP 2163 80 4 et et NNP 2163 80 5 al al NNP 2163 80 6 . . 
NNP 2163 80 7 , , , 2163 80 8 19 19 CD 2163 80 9 the the DT 2163 80 10 authors author NNS 2163 80 11 have have VBP 2163 80 12 also also RB 2163 80 13 created create VBN 2163 80 14 two two CD 2163 80 15 datasets dataset NNS 2163 80 16 ( ( -LRB- 2163 80 17 specifically specifically RB 2163 80 18 for for IN 2163 80 19 the the DT 2163 80 20 computer computer NN 2163 80 21 science science NN 2163 80 22 and and CC 2163 80 23 health health NN 2163 80 24 science science NN 2163 80 25 areas area NNS 2163 80 26 ) ) -RRB- 2163 80 27 to to TO 2163 80 28 be be VB 2163 80 29 used use VBN 2163 80 30 for for IN 2163 80 31 comparing compare VBG 2163 80 32 the the DT 2163 80 33 achieved achieve VBN 2163 80 34 accuracies accuracy NNS 2163 80 35 . . . 2163 81 1 A a DT 2163 81 2 completely completely RB 2163 81 3 different different JJ 2163 81 4 , , , 2163 81 5 and and CC 2163 81 6 novel novel NN 2163 81 7 , , , 2163 81 8 direction direction NN 2163 81 9 was be VBD 2163 81 10 developed develop VBN 2163 81 11 by by IN 2163 81 12 Poon Poon NNP 2163 81 13 and and CC 2163 81 14 Domingos Domingos NNP 2163 81 15 . . . 
2163 82 1 20 20 CD 2163 82 2 Unlike unlike IN 2163 82 3 all all PDT 2163 82 4 the the DT 2163 82 5 other other JJ 2163 82 6 approaches approach NNS 2163 82 7 , , , 2163 82 8 they -PRON- PRP 2163 82 9 propose propose VBP 2163 82 10 a a DT 2163 82 11 solution solution NN 2163 82 12 where where WRB 2163 82 13 the the DT 2163 82 14 segmentation segmentation NN 2163 82 15 ( ( -LRB- 2163 82 16 chunking chunking NN 2163 82 17 ) ) -RRB- 2163 82 18 of of IN 2163 82 19 the the DT 2163 82 20 reference reference NN 2163 82 21 fields field NNS 2163 82 22 is be VBZ 2163 82 23 performed perform VBN 2163 82 24 together together RB 2163 82 25 with with IN 2163 82 26 the the DT 2163 82 27 entity entity NN 2163 82 28 resolution resolution NN 2163 82 29 in in IN 2163 82 30 a a DT 2163 82 31 single single JJ 2163 82 32 integrated integrated JJ 2163 82 33 inference inference NN 2163 82 34 process process NN 2163 82 35 . . . 2163 83 1 They -PRON- PRP 2163 83 2 , , , 2163 83 3 thus thus RB 2163 83 4 , , , 2163 83 5 help help NN 2163 83 6 in in IN 2163 83 7 disambiguating disambiguate VBG 2163 83 8 the the DT 2163 83 9 boundaries boundary NNS 2163 83 10 of of IN 2163 83 11 less less RBR 2163 83 12 - - HYPH 2163 83 13 clear clear RB 2163 83 14 chunked chunk VBN 2163 83 15 fields field NNS 2163 83 16 , , , 2163 83 17 using use VBG 2163 83 18 the the DT 2163 83 19 already already RB 2163 83 20 well well RB 2163 83 21 - - HYPH 2163 83 22 segmented segment VBN 2163 83 23 ones one NNS 2163 83 24 . . . 
2163 84 1 Although although IN 2163 84 2 the the DT 2163 84 3 results result NNS 2163 84 4 achieved achieve VBN 2163 84 5 are be VBP 2163 84 6 similar similar JJ 2163 84 7 to to IN 2163 84 8 , , , 2163 84 9 and and CC 2163 84 10 even even RB 2163 84 11 better well JJR 2163 84 12 than than IN 2163 84 13 some some DT 2163 84 14 of of IN 2163 84 15 , , , 2163 84 16 the the DT 2163 84 17 above above RB 2163 84 18 - - HYPH 2163 84 19 mentioned mention VBN 2163 84 20 approaches approach NNS 2163 84 21 , , , 2163 84 22 this this DT 2163 84 23 is be VBZ 2163 84 24 suboptimal suboptimal JJ 2163 84 25 from from IN 2163 84 26 the the DT 2163 84 27 computational computational JJ 2163 84 28 perspective perspective NN 2163 84 29 : : : 2163 84 30 the the DT 2163 84 31 chunking chunking NN 2163 84 32 / / SYM 2163 84 33 resolution resolution NN 2163 84 34 time time NN 2163 84 35 reported report VBN 2163 84 36 by by IN 2163 84 37 the the DT 2163 84 38 authors author NNS 2163 84 39 measured measure VBN 2163 84 40 around around RB 2163 84 41 thirty thirty CD 2163 84 42 minutes minute NNS 2163 84 43 . . . 2163 85 1 In in IN 2163 85 2 addition addition NN 2163 85 3 to to IN 2163 85 4 the the DT 2163 85 5 previously previously RB 2163 85 6 described describe VBN 2163 85 7 works work NNS 2163 85 8 , , , 2163 85 9 which which WDT 2163 85 10 were be VBD 2163 85 11 specifically specifically RB 2163 85 12 tailored tailor VBN 2163 85 13 for for IN 2163 85 14 bibliographic bibliographic JJ 2163 85 15 metadata metadata NN 2163 85 16 extraction extraction NN 2163 85 17 , , , 2163 85 18 there there EX 2163 85 19 are be VBP 2163 85 20 a a DT 2163 85 21 series series NN 2163 85 22 of of IN 2163 85 23 other other JJ 2163 85 24 approaches approach NNS 2163 85 25 that that WDT 2163 85 26 could could MD 2163 85 27 be be VB 2163 85 28 used use VBN 2163 85 29 for for IN 2163 85 30 the the DT 2163 85 31 same same JJ 2163 85 32 purpose purpose NN 2163 85 33 . . . 
2163 86 1 For for IN 2163 86 2 example example NN 2163 86 3 , , , 2163 86 4 Cesario Cesario NNP 2163 86 5 et et NNP 2163 86 6 al al NNP 2163 86 7 . . . 2163 87 1 propose propose VB 2163 87 2 an an DT 2163 87 3 innovative innovative JJ 2163 87 4 recursive recursive JJ 2163 87 5 boosting boosting NN 2163 87 6 strategy strategy NN 2163 87 7 , , , 2163 87 8 with with IN 2163 87 9 progressive progressive JJ 2163 87 10 classification classification NN 2163 87 11 , , , 2163 87 12 to to TO 2163 87 13 reconcile reconcile VB 2163 87 14 textual textual JJ 2163 87 15 elements element NNS 2163 87 16 to to IN 2163 87 17 an an DT 2163 87 18 existing exist VBG 2163 87 19 attribute attribute NN 2163 87 20 schema schema NN 2163 87 21 . . . 2163 88 1 21 21 CD 2163 88 2 In in IN 2163 88 3 the the DT 2163 88 4 case case NN 2163 88 5 of of IN 2163 88 6 bibliographic bibliographic JJ 2163 88 7 metadata metadata NN 2163 88 8 segmentation segmentation NN 2163 88 9 , , , 2163 88 10 the the DT 2163 88 11 metadata metadata NN 2163 88 12 fields field NNS 2163 88 13 would would MD 2163 88 14 correspond correspond VB 2163 88 15 to to IN 2163 88 16 the the DT 2163 88 17 textual textual JJ 2163 88 18 elements element NNS 2163 88 19 , , , 2163 88 20 while while IN 2163 88 21 an an DT 2163 88 22 ontology ontology NN 2163 88 23 describing describe VBG 2163 88 24 them -PRON- PRP 2163 88 25 ( ( -LRB- 2163 88 26 e.g. e.g. RB 2163 88 27 , , , 2163 88 28 DublinCore DublinCore NNP 2163 88 29 22 22 CD 2163 88 30 or or CC 2163 88 31 SWRC SWRC NNP 2163 88 32 23 23 CD 2163 88 33 ) ) -RRB- 2163 88 34 would would MD 2163 88 35 have have VB 2163 88 36 the the DT 2163 88 37 schema schema NNP 2163 88 38 role role NN 2163 88 39 . . . 
2163 89 1 The the DT 2163 89 2 authors author NNS 2163 89 3 even even RB 2163 89 4 describe describe VBP 2163 89 5 an an DT 2163 89 6 evaluation evaluation NN 2163 89 7 of of IN 2163 89 8 the the DT 2163 89 9 method method NN 2163 89 10 using use VBG 2163 89 11 the the DT 2163 89 12 DBLP DBLP NNP 2163 89 13 citation citation NN 2163 89 14 dataset dataset NN 2163 89 15 , , , 2163 89 16 however however RB 2163 89 17 , , , 2163 89 18 without without IN 2163 89 19 giving give VBG 2163 89 20 precise precise JJ 2163 89 21 details detail NNS 2163 89 22 on on IN 2163 89 23 the the DT 2163 89 24 fields field NNS 2163 89 25 considered consider VBN 2163 89 26 for for IN 2163 89 27 segmentation segmentation NN 2163 89 28 . . . 2163 90 1 Some some DT 2163 90 2 other other JJ 2163 90 3 approaches approach NNS 2163 90 4 include include VBP 2163 90 5 , , , 2163 90 6 in in IN 2163 90 7 general general JJ 2163 90 8 , , , 2163 90 9 any any DT 2163 90 10 sequence sequence NN 2163 90 11 labeling labeling NN 2163 90 12 techniques technique NNS 2163 90 13 , , , 2163 90 14 e.g. e.g. 
RB 2163 90 15 , , , 2163 90 16 SLF SLF NNP 2163 90 17 , , , 2163 90 18 24 24 CD 2163 90 19 named name VBN 2163 90 20 entity entity NN 2163 90 21 recognition recognition NN 2163 90 22 techniques technique NNS 2163 90 23 , , , 2163 90 24 25 25 CD 2163 90 25 or or CC 2163 90 26 even even RB 2163 90 27 Field Field NNP 2163 90 28 Association Association NNP 2163 90 29 ( ( -LRB- 2163 90 30 FA FA NNP 2163 90 31 ) ) -RRB- 2163 90 32 terms term NNS 2163 90 33 extraction extraction NN 2163 90 34 , , , 2163 90 35 26 26 CD 2163 90 36 the the DT 2163 90 37 latter latter JJ 2163 90 38 working work VBG 2163 90 39 on on IN 2163 90 40 bibliographic bibliographic JJ 2163 90 41 metadata metadata NN 2163 90 42 fields field NNS 2163 90 43 in in IN 2163 90 44 a a DT 2163 90 45 quasi quasi JJ 2163 90 46 - - JJ 2163 90 47 similar similar JJ 2163 90 48 manner manner NN 2163 90 49 as as IN 2163 90 50 the the DT 2163 90 51 recursive recursive JJ 2163 90 52 boosting boosting NN 2163 90 53 strategy strategy NN 2163 90 54 . . . 2163 91 1 In in IN 2163 91 2 conclusion conclusion NN 2163 91 3 , , , 2163 91 4 it -PRON- PRP 2163 91 5 is be VBZ 2163 91 6 worth worth JJ 2163 91 7 mentioning mention VBG 2163 91 8 that that IN 2163 91 9 retrieving retrieve VBG 2163 91 10 citation citation NNP 2163 91 11 contexts contexts NNP 2163 91 12 is be VBZ 2163 91 13 an an DT 2163 91 14 interesting interesting JJ 2163 91 15 research research NN 2163 91 16 area area NN 2163 91 17 especially especially RB 2163 91 18 in in IN 2163 91 19 the the DT 2163 91 20 context context NN 2163 91 21 of of IN 2163 91 22 digital digital JJ 2163 91 23 libraries library NNS 2163 91 24 . . . 
2163 92 1 Our -PRON- PRP$ 2163 92 2 current current JJ 2163 92 3 work work NN 2163 92 4 does do VBZ 2163 92 5 not not RB 2163 92 6 feature feature VB 2163 92 7 this this DT 2163 92 8 aspect aspect NN 2163 92 9 , , , 2163 92 10 but but CC 2163 92 11 we -PRON- PRP 2163 92 12 regard regard VBP 2163 92 13 it -PRON- PRP 2163 92 14 as as IN 2163 92 15 one one CD 2163 92 16 of of IN 2163 92 17 the the DT 2163 92 18 key key JJ 2163 92 19 next next JJ 2163 92 20 steps step NNS 2163 92 21 to to TO 2163 92 22 be be VB 2163 92 23 tackled tackle VBN 2163 92 24 . . . 2163 93 1 Consequently consequently RB 2163 93 2 , , , 2163 93 3 we -PRON- PRP 2163 93 4 mention mention VBP 2163 93 5 the the DT 2163 93 6 research research NN 2163 93 7 performed perform VBN 2163 93 8 by by IN 2163 93 9 Schwartz Schwartz NNP 2163 93 10 et et FW 2163 93 11 al al NNP 2163 93 12 . . . 2163 94 1 27 27 CD 2163 94 2 Teufel Teufel NNP 2163 94 3 et et NNP 2163 94 4 al al NNP 2163 94 5 . . NNP 2163 94 6 , , , 2163 94 7 28 28 CD 2163 94 8 or or CC 2163 94 9 Wu Wu NNP 2163 94 10 et et NNP 2163 94 11 al al NNP 2163 94 12 . . . 2163 95 1 29 29 CD 2163 95 2 that that DT 2163 95 3 deal deal VBP 2163 95 4 with with IN 2163 95 5 using use VBG 2163 95 6 citation citation NN 2163 95 7 contexts contexts NN 2163 95 8 for for IN 2163 95 9 discerning discern VBG 2163 95 10 a a DT 2163 95 11 citation citation NN 2163 95 12 's 's POS 2163 95 13 function function NN 2163 95 14 and and CC 2163 95 15 analyzing analyze VBG 2163 95 16 how how WRB 2163 95 17 this this DT 2163 95 18 influences influence NNS 2163 95 19 or or CC 2163 95 20 is be VBZ 2163 95 21 influenced influence VBN 2163 95 22 by by IN 2163 95 23 the the DT 2163 95 24 work work NN 2163 95 25 it -PRON- PRP 2163 95 26 points point VBZ 2163 95 27 to to TO 2163 95 28 . . . 
2163 96 1 INFORMATION INFORMATION NNP 2163 96 2 TECHNOLOGY technology NN 2163 96 3 AND and CC 2163 96 4 LIBRARIES LIBRARIES NNP 2163 96 5 | | NNP 2163 96 6 JUNE JUNE NNP 2163 96 7 2012 2012 CD 2163 96 8 11 11 CD 2163 96 9 3 3 CD 2163 96 10 . . . 2163 97 1 METHOD METHOD NNP 2163 97 2 This this DT 2163 97 3 section section NN 2163 97 4 presents present VBZ 2163 97 5 the the DT 2163 97 6 CRF CRF NNP 2163 97 7 chunker chunker NN 2163 97 8 model model NN 2163 97 9 . . . 2163 98 1 We -PRON- PRP 2163 98 2 start start VBP 2163 98 3 by by IN 2163 98 4 defining define VBG 2163 98 5 the the DT 2163 98 6 preprocessing preprocesse VBG 2163 98 7 steps step NNS 2163 98 8 that that WDT 2163 98 9 deal deal VBP 2163 98 10 with with IN 2163 98 11 the the DT 2163 98 12 extraction extraction NN 2163 98 13 of of IN 2163 98 14 the the DT 2163 98 15 references reference NNS 2163 98 16 block block NN 2163 98 17 , , , 2163 98 18 dividing divide VBG 2163 98 19 the the DT 2163 98 20 block block NN 2163 98 21 into into IN 2163 98 22 actual actual JJ 2163 98 23 reference reference NN 2163 98 24 entries entry NNS 2163 98 25 and and CC 2163 98 26 cleaning clean VBG 2163 98 27 the the DT 2163 98 28 reference reference NN 2163 98 29 strings string NNS 2163 98 30 , , , 2163 98 31 and and CC 2163 98 32 then then RB 2163 98 33 detail detail VB 2163 98 34 the the DT 2163 98 35 CRF CRF NNP 2163 98 36 reference reference NN 2163 98 37 chunker chunker NN 2163 98 38 features feature NNS 2163 98 39 . . . 2163 99 1 3.1 3.1 CD 2163 99 2 Prerequisites prerequisite NNS 2163 99 3 Most Most JJS 2163 99 4 of of IN 2163 99 5 the the DT 2163 99 6 features feature NNS 2163 99 7 used use VBN 2163 99 8 by by IN 2163 99 9 the the DT 2163 99 10 CRF CRF NNP 2163 99 11 chunker chunker NN 2163 99 12 require require VBP 2163 99 13 some some DT 2163 99 14 forms form NNS 2163 99 15 of of IN 2163 99 16 vocabulary vocabulary JJ 2163 99 17 entries entry NNS 2163 99 18 . . . 
2163 100 1 Therefore therefore RB 2163 100 2 , , , 2163 100 3 we -PRON- PRP 2163 100 4 have have VBP 2163 100 5 manually manually RB 2163 100 6 compiled compile VBN 2163 100 7 a a DT 2163 100 8 comprehensive comprehensive JJ 2163 100 9 list list NN 2163 100 10 of of IN 2163 100 11 gazetteers gazetteer NNS 2163 100 12 ( ( -LRB- 2163 100 13 only only RB 2163 100 14 for for IN 2163 100 15 English English NNP 2163 100 16 , , , 2163 100 17 except except IN 2163 100 18 for for IN 2163 100 19 the the DT 2163 100 20 names name NNS 2163 100 21 ) ) -RRB- 2163 100 22 , , , 2163 100 23 explained explain VBD 2163 100 24 as as IN 2163 100 25 follows follow VBZ 2163 100 26 : : : 2163 100 27   CD 2163 100 28 FirstName—25,155 FirstName—25,155 NNP 2163 100 29 entries entry NNS 2163 100 30 gazetteer gazetteer NN 2163 100 31 of of IN 2163 100 32 the the DT 2163 100 33 most most RBS 2163 100 34 common common JJ 2163 100 35 first first JJ 2163 100 36 names name NNS 2163 100 37 ( ( -LRB- 2163 100 38 independent independent JJ 2163 100 39 of of IN 2163 100 40 gender gender NN 2163 100 41 ) ) -RRB- 2163 100 42 ; ; : 2163 100 43   : 2163 100 44 LastName—48,378 LastName—48,378 NNP 2163 100 45 entries entries NNP 2163 100 46 list list NN 2163 100 47 of of IN 2163 100 48 the the DT 2163 100 49 most most RBS 2163 100 50 common common JJ 2163 100 51 surnames surname NNS 2163 100 52 ; ; : 2163 100 53   : 2163 100 54 Month month NN 2163 100 55 — — : 2163 100 56 month month NN 2163 100 57 names name NNS 2163 100 58 gazetteer gazetteer NN 2163 100 59 and and CC 2163 100 60 associated associated NNP 2163 100 61 abbreviations abbreviation NNS 2163 100 62 ; ; , 2163 100 63   : 2163 100 64 VenueType VenueType NNP 2163 100 65 — — : 2163 100 66 a a DT 2163 100 67 structured structured JJ 2163 100 68 gazetteer gazetteer NN 2163 100 69 with with IN 2163 100 70 five five CD 2163 100 71 categories category NNS 2163 100 72 : : : 2163 100 73 Conference Conference NNP 2163 100 74 , , , 2163 100 75 
Workshop Workshop NNP 2163 100 76 , , , 2163 100 77 Journal Journal NNP 2163 100 78 , , , 2163 100 79 TechReport TechReport NNP 2163 100 80 , , , 2163 100 81 and and CC 2163 100 82 Website website NN 2163 100 83 . . . 2163 101 1 Each each DT 2163 101 2 category category NN 2163 101 3 has have VBZ 2163 101 4 attached attach VBN 2163 101 5 its -PRON- PRP$ 2163 101 6 own own JJ 2163 101 7 gazetteer gazetteer NN 2163 101 8 , , , 2163 101 9 containing contain VBG 2163 101 10 specific specific JJ 2163 101 11 keywords keyword NNS 2163 101 12 and and CC 2163 101 13 not not RB 2163 101 14 actual actual JJ 2163 101 15 titles title NNS 2163 101 16 . . . 2163 102 1 For for IN 2163 102 2 example example NN 2163 102 3 , , , 2163 102 4 the the DT 2163 102 5 Conference Conference NNP 2163 102 6 gazetteer gazetteer NN 2163 102 7 features feature VBZ 2163 102 8 ten ten CD 2163 102 9 unigrams unigram NNS 2163 102 10 signaling signal VBG 2163 102 11 conferences conference NNS 2163 102 12 , , , 2163 102 13 such such JJ 2163 102 14 as as IN 2163 102 15 Conference Conference NNP 2163 102 16 , , , 2163 102 17 Conf Conf NNP 2163 102 18 , , , 2163 102 19 or or CC 2163 102 20 Symposium Symposium NNP 2163 102 21 ; ; : 2163 102 22   NFP 2163 102 23 Location location NN 2163 102 24 — — : 2163 102 25 places place NNS 2163 102 26 , , , 2163 102 27 cities city NNS 2163 102 28 , , , 2163 102 29 and and CC 2163 102 30 countries country NNS 2163 102 31 gazetteer gazetteer NN 2163 102 32 comprising comprise VBG 2163 102 33 17,336 17,336 CD 2163 102 34 entries entry NNS 2163 102 35 ; ; : 2163 102 36   NFP 2163 102 37 Organization—150 organization—150 ADD 2163 102 38 entries entries NNP 2163 102 39 gazetteer gazetteer NNP 2163 102 40 listing list VBG 2163 102 41 organization organization NN 2163 102 42 prefixes prefix NNS 2163 102 43 and and CC 2163 102 44 suffixes suffix NNS 2163 102 45 ( ( -LRB- 2163 102 46 e.g. e.g. RB 2163 102 47 , , , 2163 102 48 e e NNP 2163 102 49 . . . 2163 102 50 V. V. 
NNP 2163 103 1 or or CC 2163 103 2 KGaA KGaA NNP 2163 103 3 ) ) -RRB- 2163 103 4 ; ; : 2163 103 5   NFP 2163 103 6 Proceedings proceeding NNS 2163 103 7 — — : 2163 103 8 simple simple JJ 2163 103 9 list list NN 2163 103 10 of of IN 2163 103 11 all all DT 2163 103 12 possible possible JJ 2163 103 13 appearances appearance NNS 2163 103 14 of of IN 2163 103 15 the the DT 2163 103 16 Proceedings Proceedings NNP 2163 103 17 marker marker NN 2163 103 18 ; ; : 2163 103 19   NN 2163 103 20 Publisher—564 Publisher—564 NNP 2163 103 21 entries entries NNP 2163 103 22 gazetteer gazetteer NNP 2163 103 23 comprising comprise VBG 2163 103 24 publisher publisher NN 2163 103 25 unigrams unigrams NNP 2163 103 26 ( ( -LRB- 2163 103 27 produced produce VBN 2163 103 28 from from IN 2163 103 29 around around IN 2163 103 30 150 150 CD 2163 103 31 publisher publisher NN 2163 103 32 names name NNS 2163 103 33 ) ) -RRB- 2163 103 34 ; ; : 2163 103 35   : 2163 103 36 JTitle—12,101 jtitle—12,101 CD 2163 103 37 entries entry NNS 2163 103 38 list list NN 2163 103 39 of of IN 2163 103 40 journal journal NNP 2163 103 41 title title NNP 2163 103 42 unigrams unigrams NNP 2163 103 43 ( ( -LRB- 2163 103 44 produced produce VBN 2163 103 45 from from IN 2163 103 46 around around IN 2163 103 47 1600 1600 CD 2163 103 48 journal journal NNP 2163 103 49 titles title NNS 2163 103 50 ) ) -RRB- 2163 103 51 ; ; : 2163 103 52   : 2163 103 53 Connection connection NN 2163 103 54 — — : 2163 103 55 a a DT 2163 103 56 42 42 CD 2163 103 57 entries entry NNS 2163 103 58 stop stop NN 2163 103 59 - - HYPH 2163 103 60 word word NN 2163 103 61 gazetteer gazetteer NN 2163 103 62 ( ( -LRB- 2163 103 63 e.g. e.g. RB 2163 103 64 , , , 2163 103 65 to to IN 2163 103 66 , , , 2163 103 67 and and CC 2163 103 68 , , , 2163 103 69 as as IN 2163 103 70 ) ) -RRB- 2163 103 71 . . . 
2163 104 1 3.2 3.2 CD 2163 104 2 Preprocessing Preprocessing NNP 2163 104 3 In in IN 2163 104 4 the the DT 2163 104 5 preprocessing preprocessing NN 2163 104 6 stage stage NN 2163 104 7 we -PRON- PRP 2163 104 8 deal deal VBP 2163 104 9 with with IN 2163 104 10 three three CD 2163 104 11 aspects aspect NNS 2163 104 12 : : : 2163 104 13   : 2163 104 14 cleaning clean VBG 2163 104 15 the the DT 2163 104 16 provided provide VBN 2163 104 17 input input NN 2163 104 18 , , , 2163 104 19   : 2163 104 20 extracting extract VBG 2163 104 21 the the DT 2163 104 22 reference reference NN 2163 104 23 block block NN 2163 104 24 , , , 2163 104 25 and and CC 2163 104 26   NFP 2163 104 27 the the DT 2163 104 28 division division NN 2163 104 29 of of IN 2163 104 30 the the DT 2163 104 31 reference reference NN 2163 104 32 block block NN 2163 104 33 into into IN 2163 104 34 reference reference NN 2163 104 35 entries entry NNS 2163 104 36 . . . 2163 105 1 The the DT 2163 105 2 first first JJ 2163 105 3 step step NN 2163 105 4 aims aim VBZ 2163 105 5 to to TO 2163 105 6 clean clean VB 2163 105 7 the the DT 2163 105 8 raw raw JJ 2163 105 9 textual textual JJ 2163 105 10 input input NN 2163 105 11 received receive VBN 2163 105 12 by by IN 2163 105 13 the the DT 2163 105 14 chunker chunker NN 2163 105 15 of of IN 2163 105 16 unwanted unwanted JJ 2163 105 17 spacing space VBG 2163 105 18 characters character NNS 2163 105 19 while while IN 2163 105 20 at at IN 2163 105 21 the the DT 2163 105 22 same same JJ 2163 105 23 time time NN 2163 105 24 ensuring ensure VBG 2163 105 25 proper proper JJ 2163 105 26 spacing spacing NN 2163 105 27 where where WRB 2163 105 28 necessary necessary JJ 2163 105 29 . . . 
2163 106 1 Since since IN 2163 106 2 the the DT 2163 106 3 source source NN 2163 106 4 of of IN 2163 106 5 the the DT 2163 106 6 textual textual JJ 2163 106 7 input input NN 2163 106 8 is be VBZ 2163 106 9 unknown unknown JJ 2163 106 10 to to IN 2163 106 11 the the DT 2163 106 12 chunker chunker NN 2163 106 13 , , , 2163 106 14 we -PRON- PRP 2163 106 15 make make VBP 2163 106 16 no no DT 2163 106 17 assumptions assumption NNS 2163 106 18 with with IN 2163 106 19 regard regard NN 2163 106 20 to to IN 2163 106 21 its -PRON- PRP$ 2163 106 22 structure structure NN 2163 106 23 or or CC 2163 106 24 content content NN 2163 106 25 . . . 2163 107 1 30 30 CD 2163 107 2 Thus thus RB 2163 107 3 , , , 2163 107 4 in in IN 2163 107 5 order order NN 2163 107 6 to to TO 2163 107 7 avoid avoid VB 2163 107 8 inherent inherent JJ 2163 107 9 errors error NNS 2163 107 10 that that WDT 2163 107 11 might may MD 2163 107 12 appear appear VB 2163 107 13 as as IN 2163 107 14 a a DT 2163 107 15 result result NN 2163 107 16 of of IN 2163 107 17 extracting extract VBG 2163 107 18 the the DT 2163 107 19 raw raw JJ 2163 107 20 text text NN 2163 107 21 from from IN 2163 107 22 the the DT 2163 107 23 original original JJ 2163 107 24 document document NN 2163 107 25 , , , 2163 107 26 we -PRON- PRP 2163 107 27 perform perform VBP 2163 107 28 the the DT 2163 107 29 following follow VBG 2163 107 30 cleaning cleaning NN 2163 107 31 steps step NNS 2163 107 32 : : : 2163 107 33   NFP 2163 107 34 we -PRON- PRP 2163 107 35 compress compress VBP 2163 107 36 the the DT 2163 107 37 text text NN 2163 107 38 by by IN 2163 107 39 eliminating eliminate VBG 2163 107 40 unnecessary unnecessary JJ 2163 107 41 carriage carriage NN 2163 107 42 returns return NNS 2163 107 43 , , , 2163 107 44 such such JJ 2163 107 45 that that IN 2163 107 46 the the DT 2163 107 47 lines line NNS 2163 107 48 containing contain VBG 2163 107 49 less less JJR 2163 107 50 than than IN 2163 107 51 15 15 CD 2163 107 52 characters character 
NNS 2163 107 53 are be VBP 2163 107 54 merged merge VBN 2163 107 55 with with IN 2163 107 56 previous previous JJ 2163 107 57 ones one NNS 2163 107 58 , , , 2163 107 59 31 31 CD 2163 107 60   : 2163 107 61 we -PRON- PRP 2163 107 62 introduce introduce VBP 2163 107 63 spaces space NNS 2163 107 64 after after IN 2163 107 65 some some DT 2163 107 66 punctuation punctuation NN 2163 107 67 characters character NNS 2163 107 68 , , , 2163 107 69 such such JJ 2163 107 70 as as IN 2163 107 71 “ " `` 2163 107 72 , , , 2163 107 73 , , , 2163 107 74 ” " '' 2163 107 75 “ " `` 2163 107 76 . . . 2163 107 77 ” " '' 2163 107 78 or or CC 2163 107 79 “ " `` 2163 107 80 - - : 2163 107 81 ” " '' 2163 107 82 , , , 2163 107 83 and and CC 2163 107 84 finally finally RB 2163 107 85 , , , 2163 107 86   NFP 2163 107 87 we -PRON- PRP 2163 107 88 split split VBD 2163 107 89 the the DT 2163 107 90 camel camel NN 2163 107 91 - - HYPH 2163 107 92 cased case VBN 2163 107 93 strings string NNS 2163 107 94 , , , 2163 107 95 such such JJ 2163 107 96 as as IN 2163 107 97 JohnDoe JohnDoe NNP 2163 107 98 . . . 2163 108 1 REFERENCE reference NN 2163 108 2 INFORMATION INFORMATION NNS 2163 108 3 EXTRACTION extraction NN 2163 108 4 AND and CC 2163 108 5 PROCESSING processing NN 2163 108 6 |GROZA |groza NN 2163 108 7 , , , 2163 108 8 GRIMNES GRIMNES NNP 2163 108 9 , , , 2163 108 10 AND and CC 2163 108 11 HANDSCHUH handschuh NN 2163 108 12 12 12 CD 2163 108 13 The the DT 2163 108 14 result result NN 2163 108 15 will will MD 2163 108 16 be be VB 2163 108 17 a a DT 2163 108 18 compact compact JJ 2163 108 19 and and CC 2163 108 20 clean clean JJ 2163 108 21 version version NN 2163 108 22 of of IN 2163 108 23 the the DT 2163 108 24 input input NN 2163 108 25 . . . 
2163 109 1 Also also RB 2163 109 2 , , , 2163 109 3 if if IN 2163 109 4 the the DT 2163 109 5 raw raw JJ 2163 109 6 input input NN 2163 109 7 is be VBZ 2163 109 8 already already RB 2163 109 9 compact compact JJ 2163 109 10 and and CC 2163 109 11 clean clean JJ 2163 109 12 , , , 2163 109 13 this this DT 2163 109 14 preprocessing preprocesse VBG 2163 109 15 step step NN 2163 109 16 will will MD 2163 109 17 not not RB 2163 109 18 affect affect VB 2163 109 19 it -PRON- PRP 2163 109 20 . . . 2163 110 1 The the DT 2163 110 2 extraction extraction NN 2163 110 3 of of IN 2163 110 4 the the DT 2163 110 5 reference reference NN 2163 110 6 block block NN 2163 110 7 is be VBZ 2163 110 8 done do VBN 2163 110 9 using use VBG 2163 110 10 regular regular JJ 2163 110 11 expressions expression NNS 2163 110 12 . . . 2163 111 1 Generally generally RB 2163 111 2 , , , 2163 111 3 we -PRON- PRP 2163 111 4 search search VBP 2163 111 5 in in IN 2163 111 6 the the DT 2163 111 7 compacted compact VBN 2163 111 8 and and CC 2163 111 9 cleaned clean VBN 2163 111 10 input input NN 2163 111 11 for for IN 2163 111 12 specific specific JJ 2163 111 13 markers marker NNS 2163 111 14 , , , 2163 111 15 like like IN 2163 111 16 References References NNPS 2163 111 17 or or CC 2163 111 18 Bibliography Bibliography NNP 2163 111 19 , , , 2163 111 20 located locate VBN 2163 111 21 mainly mainly RB 2163 111 22 at at IN 2163 111 23 the the DT 2163 111 24 beginning beginning NN 2163 111 25 of of IN 2163 111 26 a a DT 2163 111 27 line line NN 2163 111 28 . . . 
2163 112 1 If if IN 2163 112 2 these these DT 2163 112 3 are be VBP 2163 112 4 not not RB 2163 112 5 directly directly RB 2163 112 6 found find VBN 2163 112 7 , , , 2163 112 8 we -PRON- PRP 2163 112 9 try try VBP 2163 112 10 different different JJ 2163 112 11 variations variation NNS 2163 112 12 , , , 2163 112 13 such such JJ 2163 112 14 as as IN 2163 112 15 , , , 2163 112 16 looking look VBG 2163 112 17 for for IN 2163 112 18 the the DT 2163 112 19 markers marker NNS 2163 112 20 at at IN 2163 112 21 the the DT 2163 112 22 end end NN 2163 112 23 of of IN 2163 112 24 a a DT 2163 112 25 line line NN 2163 112 26 , , , 2163 112 27 or or CC 2163 112 28 looking look VBG 2163 112 29 for for IN 2163 112 30 split split NN 2163 112 31 markers marker NNS 2163 112 32 onto onto IN 2163 112 33 two two CD 2163 112 34 lines line NNS 2163 112 35 ( ( -LRB- 2163 112 36 e.g. e.g. RB 2163 112 37 , , , 2163 112 38 Ref Ref NNP 2163 112 39 – – : 2163 112 40 erences erence NNS 2163 112 41 , , , 2163 112 42 or or CC 2163 112 43 Refer refer VB 2163 112 44 – – : 2163 112 45 ences ence NNS 2163 112 46 ) ) -RRB- 2163 112 47 . . . 2163 113 1 This this DT 2163 113 2 latter latter JJ 2163 113 3 case case NN 2163 113 4 is be VBZ 2163 113 5 a a DT 2163 113 6 typical typical JJ 2163 113 7 consequence consequence NN 2163 113 8 of of IN 2163 113 9 the the DT 2163 113 10 above above RB 2163 113 11 - - HYPH 2163 113 12 described describe VBN 2163 113 13 compacting compact VBG 2163 113 14 step step NN 2163 113 15 if if IN 2163 113 16 the the DT 2163 113 17 initial initial JJ 2163 113 18 input input NN 2163 113 19 was be VBD 2163 113 20 erroneously erroneously RB 2163 113 21 extracted extract VBN 2163 113 22 . . . 
2163 114 1 The the DT 2163 114 2 text text NN 2163 114 3 following follow VBG 2163 114 4 the the DT 2163 114 5 markers marker NNS 2163 114 6 is be VBZ 2163 114 7 considered consider VBN 2163 114 8 for for IN 2163 114 9 division division NN 2163 114 10 , , , 2163 114 11 although although IN 2163 114 12 it -PRON- PRP 2163 114 13 may may MD 2163 114 14 contain contain VB 2163 114 15 unwanted unwanted JJ 2163 114 16 parts part NNS 2163 114 17 such such JJ 2163 114 18 as as IN 2163 114 19 appendices appendix NNS 2163 114 20 or or CC 2163 114 21 tables table NNS 2163 114 22 . . . 2163 115 1 The the DT 2163 115 2 division division NN 2163 115 3 into into IN 2163 115 4 individual individual JJ 2163 115 5 reference reference NN 2163 115 6 entries entry NNS 2163 115 7 is be VBZ 2163 115 8 performed perform VBN 2163 115 9 on on IN 2163 115 10 a a DT 2163 115 11 case case NN 2163 115 12 basis basis NN 2163 115 13 . . . 2163 116 1 After after IN 2163 116 2 splitting split VBG 2163 116 3 the the DT 2163 116 4 reference reference NN 2163 116 5 block block NN 2163 116 6 based base VBN 2163 116 7 on on IN 2163 116 8 new new JJ 2163 116 9 lines line NNS 2163 116 10 , , , 2163 116 11 we -PRON- PRP 2163 116 12 look look VBP 2163 116 13 for for IN 2163 116 14 prefix prefix JJ 2163 116 15 patterns pattern NNS 2163 116 16 at at IN 2163 116 17 the the DT 2163 116 18 beginning beginning NN 2163 116 19 of of IN 2163 116 20 each each DT 2163 116 21 line line NN 2163 116 22 . . . 
2163 117 1 As as IN 2163 117 2 an an DT 2163 117 3 example example NN 2163 117 4 , , , 2163 117 5 we -PRON- PRP 2163 117 6 analyze analyze VBP 2163 117 7 which which WDT 2163 117 8 lines line NNS 2163 117 9 start start VBP 2163 117 10 with with IN 2163 117 11 “ " `` 2163 117 12 [ [ -LRB- 2163 117 13 ” " '' 2163 117 14 , , , 2163 117 15 “ " `` 2163 117 16 ( ( -LRB- 2163 117 17 ” " '' 2163 117 18 , , , 2163 117 19 or or CC 2163 117 20 a a DT 2163 117 21 number number NN 2163 117 22 followed follow VBN 2163 117 23 by by IN 2163 117 24 “ " `` 2163 117 25 . . . 2163 117 26 ” " '' 2163 117 27 or or CC 2163 117 28 space space NN 2163 117 29 , , , 2163 117 30 and and CC 2163 117 31 we -PRON- PRP 2163 117 32 record record VBP 2163 117 33 the the DT 2163 117 34 positions position NNS 2163 117 35 of of IN 2163 117 36 these these DT 2163 117 37 lines line NNS 2163 117 38 in in IN 2163 117 39 the the DT 2163 117 40 list list NN 2163 117 41 of of IN 2163 117 42 all all DT 2163 117 43 lines line NNS 2163 117 44 . . . 2163 118 1 To to TO 2163 118 2 ensure ensure VB 2163 118 3 that that IN 2163 118 4 we -PRON- PRP 2163 118 5 do do VBP 2163 118 6 n't not RB 2163 118 7 consider consider VB 2163 118 8 any any DT 2163 118 9 false false JJ 2163 118 10 positives positive NNS 2163 118 11 when when WRB 2163 118 12 merging merge VBG 2163 118 13 the the DT 2163 118 14 adjacent adjacent JJ 2163 118 15 lines line NNS 2163 118 16 into into IN 2163 118 17 a a DT 2163 118 18 reference reference NN 2163 118 19 entry entry NN 2163 118 20 , , , 2163 118 21 we -PRON- PRP 2163 118 22 compute compute VBP 2163 118 23 a a DT 2163 118 24 global global JJ 2163 118 25 average average NN 2163 118 26 of of IN 2163 118 27 the the DT 2163 118 28 differences difference NNS 2163 118 29 between between IN 2163 118 30 positions position NNS 2163 118 31 . . . 
2163 119 1 Assuming assume VBG 2163 119 2 that that IN 2163 119 3 a a DT 2163 119 4 reference reference NN 2163 119 5 does do VBZ 2163 119 6 not not RB 2163 119 7 span span VB 2163 119 8 on on IN 2163 119 9 more more JJR 2163 119 10 than than IN 2163 119 11 four four CD 2163 119 12 lines line NNS 2163 119 13 , , , 2163 119 14 if if IN 2163 119 15 this this DT 2163 119 16 average average NN 2163 119 17 is be VBZ 2163 119 18 between between IN 2163 119 19 one one CD 2163 119 20 and and CC 2163 119 21 four four CD 2163 119 22 , , , 2163 119 23 a a DT 2163 119 24 reference reference NN 2163 119 25 entry entry NN 2163 119 26 is be VBZ 2163 119 27 created create VBN 2163 119 28 . . . 2163 120 1 The the DT 2163 120 2 same same JJ 2163 120 3 average average NN 2163 120 4 is be VBZ 2163 120 5 also also RB 2163 120 6 used use VBN 2163 120 7 to to TO 2163 120 8 extract extract VB 2163 120 9 the the DT 2163 120 10 last last JJ 2163 120 11 reference reference NN 2163 120 12 in in IN 2163 120 13 the the DT 2163 120 14 list list NN 2163 120 15 , , , 2163 120 16 thus thus RB 2163 120 17 detaching detach VBG 2163 120 18 it -PRON- PRP 2163 120 19 from from IN 2163 120 20 eventual eventual JJ 2163 120 21 appendices appendix NNS 2163 120 22 or or CC 2163 120 23 tables table NNS 2163 120 24 . . . 
2163 121 1 3.3 3.3 CD 2163 121 2 The the DT 2163 121 3 reference reference NN 2163 121 4 chunking chunk VBG 2163 121 5 model model NN 2163 121 6 We -PRON- PRP 2163 121 7 have have VBP 2163 121 8 built build VBN 2163 121 9 the the DT 2163 121 10 CRF CRF NNP 2163 121 11 learning learn VBG 2163 121 12 model model NN 2163 121 13 based base VBN 2163 121 14 on on IN 2163 121 15 a a DT 2163 121 16 series series NN 2163 121 17 of of IN 2163 121 18 features feature NNS 2163 121 19 used use VBN 2163 121 20 in in IN 2163 121 21 principle principle NN 2163 121 22 also also RB 2163 121 23 by by IN 2163 121 24 the the DT 2163 121 25 other other JJ 2163 121 26 CRF CRF NNP 2163 121 27 reference reference NN 2163 121 28 chunking chunk VBG 2163 121 29 approaches approach NNS 2163 121 30 such such JJ 2163 121 31 as as IN 2163 121 32 ParsCit ParsCit NNP 2163 121 33 32 32 CD 2163 121 34 or or CC 2163 121 35 Peng Peng NNP 2163 121 36 and and CC 2163 121 37 McCallum McCallum NNP 2163 121 38 33 33 CD 2163 121 39 . . . 2163 122 1 A a DT 2163 122 2 set set NN 2163 122 3 of of IN 2163 122 4 feature feature NN 2163 122 5 values value NNS 2163 122 6 is be VBZ 2163 122 7 used use VBN 2163 122 8 to to TO 2163 122 9 characterize characterize VB 2163 122 10 each each DT 2163 122 11 token token JJ 2163 122 12 present present JJ 2163 122 13 in in IN 2163 122 14 the the DT 2163 122 15 reference reference NN 2163 122 16 string string NN 2163 122 17 , , , 2163 122 18 where where WRB 2163 122 19 the the DT 2163 122 20 reference reference NN 2163 122 21 's 's POS 2163 122 22 token token JJ 2163 122 23 list list NN 2163 122 24 is be VBZ 2163 122 25 obtained obtain VBN 2163 122 26 by by IN 2163 122 27 dividing divide VBG 2163 122 28 the the DT 2163 122 29 reference reference NN 2163 122 30 string string NN 2163 122 31 into into IN 2163 122 32 space space NN 2163 122 33 - - HYPH 2163 122 34 separated separate VBN 2163 122 35 pieces piece NNS 2163 122 36 . . . 
2163 123 1 The the DT 2163 123 2 complete complete JJ 2163 123 3 list list NN 2163 123 4 of of IN 2163 123 5 features feature NNS 2163 123 6 is be VBZ 2163 123 7 detailed detail VBN 2163 123 8 as as IN 2163 123 9 follows follow VBZ 2163 123 10 . . . 2163 124 1 We -PRON- PRP 2163 124 2 use use VBP 2163 124 3 example example NN 2163 124 4 1 1 CD 2163 124 5 from from IN 2163 124 6 figure figure NN 2163 124 7 1 1 CD 2163 124 8 to to TO 2163 124 9 exemplify exemplify VB 2163 124 10 the the DT 2163 124 11 feature feature NN 2163 124 12 values value NNS 2163 124 13 . . . 2163 125 1   : 2163 125 2 Token Token NNP 2163 125 3 — — : 2163 125 4 the the DT 2163 125 5 original original JJ 2163 125 6 reference reference NN 2163 125 7 token token NN 2163 125 8 : : : 2163 125 9 Bronzwaer Bronzwaer NNP 2163 125 10 , , , 2163 125 11   : 2163 125 12 Clean clean JJ 2163 125 13 token token NN 2163 125 14 — — : 2163 125 15 the the DT 2163 125 16 original original JJ 2163 125 17 token token NN 2163 125 18 , , , 2163 125 19 stripped strip VBN 2163 125 20 of of IN 2163 125 21 any any DT 2163 125 22 punctuation punctuation NN 2163 125 23 and and CC 2163 125 24 lower low JJR 2163 125 25 cased case VBN 2163 125 26 : : : 2163 125 27 bronzwaer bronzwaer NNP 2163 125 28   : 2163 125 29 Token token JJ 2163 125 30 ending ending NN 2163 125 31 — — : 2163 125 32 a a DT 2163 125 33 flag flag NN 2163 125 34 signaling signal VBG 2163 125 35 the the DT 2163 125 36 type type NN 2163 125 37 of of IN 2163 125 38 ending end VBG 2163 125 39 ( ( -LRB- 2163 125 40 possible possible JJ 2163 125 41 values value NNS 2163 125 42 : : : 2163 125 43 lower low JJR 2163 125 44 cap cap NN 2163 125 45 – – : 2163 125 46 c c NN 2163 125 47 / / SYM 2163 125 48 upper upper JJ 2163 125 49 cap cap NN 2163 125 50 – – : 2163 125 51 C c NN 2163 125 52 / / SYM 2163 125 53 digit digit NN 2163 125 54 – – : 2163 125 55 0 0 CD 2163 125 56 / / SYM 2163 125 57 punctuation punctuation NN 2163 125 58 character character NN 2163 125 59 : : : 2163 125 
60 , , , 2163 125 61   : 2163 125 62 Token token JJ 2163 125 63 decomposition decomposition NN 2163 125 64 – – : 2163 125 65 start start VB 2163 125 66 — — : 2163 125 67 five five CD 2163 125 68 individual individual JJ 2163 125 69 values value NNS 2163 125 70 corresponding correspond VBG 2163 125 71 to to IN 2163 125 72 token token VB 2163 125 73 's 's POS 2163 125 74 first first JJ 2163 125 75 five five CD 2163 125 76 characters character NNS 2163 125 77 , , , 2163 125 78 taken take VBN 2163 125 79 gradually gradually RB 2163 125 80 : : : 2163 125 81 B b UH 2163 125 82 , , , 2163 125 83 Br br UH 2163 125 84 , , , 2163 125 85 Bro Bro NNP 2163 125 86 , , , 2163 125 87 Bron Bron NNP 2163 125 88 , , , 2163 125 89 Bronz Bronz NNP 2163 125 90   JJ 2163 125 91 Token token JJ 2163 125 92 decomposition decomposition NN 2163 125 93 – – : 2163 125 94 end end NN 2163 125 95 — — : 2163 125 96 five five CD 2163 125 97 individual individual JJ 2163 125 98 values value NNS 2163 125 99 corresponding correspond VBG 2163 125 100 to to IN 2163 125 101 the the DT 2163 125 102 token token NN 2163 125 103 's 's POS 2163 125 104 last last JJ 2163 125 105 five five CD 2163 125 106 characters character NNS 2163 125 107 , , , 2163 125 108 taken take VBN 2163 125 109 gradually gradually RB 2163 125 110 : : : 2163 125 111 r r LS 2163 125 112 , , , 2163 125 113 er er UH 2163 125 114 , , , 2163 125 115 aer aer NNP 2163 125 116 , , , 2163 125 117 waer waer NNP 2163 125 118 , , , 2163 125 119 zwaer zwaer NNP 2163 125 120 , , , 2163 125 121   : 2163 125 122 POS POS NNP 2163 125 123 Tag tag NN 2163 125 124 — — : 2163 125 125 the the DT 2163 125 126 token token NN 2163 125 127 's 's POS 2163 125 128 part part NN 2163 125 129 of of IN 2163 125 130 speech speech NN 2163 125 131 tag tag NN 2163 125 132 ( ( -LRB- 2163 125 133 possible possible JJ 2163 125 134 values value NNS 2163 125 135 : : : 2163 125 136 proper proper JJ 2163 125 137 noun noun JJ 2163 125 138 phrase phrase NN 2163 125 139 – – 
: 2163 125 140 NNP NNP NNP 2163 125 141 , , , 2163 125 142   : 2163 125 143 noun noun NNP 2163 125 144 phrase phrase NN 2163 125 145 – – : 2163 125 146 NP NP NNP 2163 125 147 , , , 2163 125 148 adjective adjective NN 2163 125 149 – – : 2163 125 150 JJ jj JJ 2163 125 151 , , , 2163 125 152 cardinal cardinal JJ 2163 125 153 number number NN 2163 125 154 – – : 2163 125 155 CD cd NN 2163 125 156 , , , 2163 125 157 etc etc FW 2163 125 158 ) ) -RRB- 2163 125 159 : : : 2163 125 160 NNP NNP NNP 2163 125 161   : 2163 125 162 Orthographic orthographic JJ 2163 125 163 case case NN 2163 125 164 — — : 2163 125 165 a a DT 2163 125 166 flag flag NN 2163 125 167 signaling signal VBG 2163 125 168 the the DT 2163 125 169 token token NN 2163 125 170 's 's POS 2163 125 171 orthographic orthographic JJ 2163 125 172 case case NN 2163 125 173 ( ( -LRB- 2163 125 174 possible possible JJ 2163 125 175 values value NNS 2163 125 176 : : : 2163 125 177   NFP 2163 125 178 initialCap initialCap NNP 2163 125 179 , , , 2163 125 180 singleCap singleCap NNP 2163 125 181 , , , 2163 125 182 lowercase lowercase NN 2163 125 183 , , , 2163 125 184 mixedCaps mixedCaps NNP 2163 125 185 , , , 2163 125 186 allCaps allCaps NNP 2163 125 187 ) ) -RRB- 2163 125 188 : : : 2163 125 189 singleCap singleCap NNP 2163 125 190   : 2163 125 191 Punctuation punctuation NN 2163 125 192 type type NN 2163 125 193 — — : 2163 125 194 a a DT 2163 125 195 flag flag NN 2163 125 196 signaling signal VBG 2163 125 197 the the DT 2163 125 198 presence presence NN 2163 125 199 and and CC 2163 125 200 type type NN 2163 125 201 of of IN 2163 125 202 a a DT 2163 125 203 trailing trail VBG 2163 125 204 punctuation punctuation NN 2163 125 205 character character NN 2163 125 206 ( ( -LRB- 2163 125 207 possible possible JJ 2163 125 208 values value NNS 2163 125 209 : : : 2163 125 210 cont cont NNP 2163 125 211 , , , 2163 125 212 stop stop VB 2163 125 213 , , , 2163 125 214 other other JJ 2163 125 215 ) ) -RRB- 2163 125 216 : : : 
2163 125 217 cont cont NN 2163 125 218   : 2163 125 219 Number number NN 2163 125 220 type type NN 2163 125 221 — — : 2163 125 222 a a DT 2163 125 223 flag flag NN 2163 125 224 signaling signal VBG 2163 125 225 the the DT 2163 125 226 presence presence NN 2163 125 227 and and CC 2163 125 228 type type NN 2163 125 229 of of IN 2163 125 230 a a DT 2163 125 231 number number NN 2163 125 232 in in IN 2163 125 233 the the DT 2163 125 234 token token JJ 2163 125 235 ( ( -LRB- 2163 125 236 possible possible JJ 2163 125 237 values value NNS 2163 125 238 : : : 2163 125 239 year year NN 2163 125 240 , , , 2163 125 241 ordinal ordinal JJ 2163 125 242 , , , 2163 125 243 1dig 1dig CD 2163 125 244 , , , 2163 125 245 2dig 2dig CD 2163 125 246 , , , 2163 125 247 3dig 3dig CD 2163 125 248 , , , 2163 125 249 4dig 4dig CD 2163 125 250 , , , 2163 125 251 4dig+ 4dig+ CD 2163 125 252 , , , 2163 125 253 noNumber noNumber NNP 2163 125 254 ) ) -RRB- 2163 125 255 : : : 2163 125 256 noNumber noNumber NNP 2163 125 257 INFORMATION INFORMATION VBD 2163 125 258 TECHNOLOGY TECHNOLOGY NNP 2163 125 259 AND and CC 2163 125 260 LIBRARIES LIBRARIES NNP 2163 125 261 | | NNP 2163 125 262 JUNE JUNE NNP 2163 125 263 2012 2012 CD 2163 125 264 13 13 CD 2163 125 265   : 2163 125 266 Dictionary Dictionary NNP 2163 125 267 entries entry NNS 2163 125 268 — — : 2163 125 269 a a DT 2163 125 270 set set NN 2163 125 271 of of IN 2163 125 272 ten ten CD 2163 125 273 flags flag NNS 2163 125 274 signaling signal VBG 2163 125 275 the the DT 2163 125 276 presence presence NN 2163 125 277 of of IN 2163 125 278 the the DT 2163 125 279 token token NN 2163 125 280 in in IN 2163 125 281 the the DT 2163 125 282 set set NN 2163 125 283 of of IN 2163 125 284 individual individual JJ 2163 125 285 gazetteers gazetteer NNS 2163 125 286 listed list VBN 2163 125 287 in in IN 2163 125 288 Sect Sect NNP 2163 125 289 . . . 2163 126 1 3.1 3.1 CD 2163 126 2 . . . 
2163 127 1 For for IN 2163 127 2 our -PRON- PRP$ 2163 127 3 example example NN 2163 127 4 the the DT 2163 127 5 dictionary dictionary JJ 2163 127 6 feature feature NN 2163 127 7 set set VBN 2163 127 8 would would MD 2163 127 9 be be VB 2163 127 10 : : : 2163 127 11 no no DT 2163 127 12 LastName lastname NN 2163 127 13 no no UH 2163 127 14 no no UH 2163 127 15 no no UH 2163 127 16 no no UH 2163 127 17 no no UH 2163 127 18 no no UH 2163 127 19 no no UH 2163 127 20 no no UH 2163 127 21   NN 2163 127 22 Date date NN 2163 127 23 check check NN 2163 127 24 — — : 2163 127 25 a a DT 2163 127 26 flag flag NN 2163 127 27 checking checking NN 2163 127 28 whether whether IN 2163 127 29 the the DT 2163 127 30 token token NN 2163 127 31 may may MD 2163 127 32 contain contain VB 2163 127 33 a a DT 2163 127 34 date date NN 2163 127 35 in in IN 2163 127 36 form form NN 2163 127 37 of of IN 2163 127 38 a a DT 2163 127 39 period period NN 2163 127 40 of of IN 2163 127 41 days day NNS 2163 127 42 , , , 2163 127 43 e.g. e.g. RB 2163 127 44 , , , 2163 127 45 12 12 CD 2163 127 46 - - SYM 2163 127 47 14 14 CD 2163 127 48 ( ( -LRB- 2163 127 49 possible possible JJ 2163 127 50 values value NNS 2163 127 51 : : : 2163 127 52 possDate possdate NN 2163 127 53 , , , 2163 127 54 no no UH 2163 127 55 ) ) -RRB- 2163 127 56 : : : 2163 127 57 no no DT 2163 127 58   : 2163 127 59 Pages page NNS 2163 127 60 check check VBP 2163 127 61 — — : 2163 127 62 a a DT 2163 127 63 flag flag NN 2163 127 64 checking checking NN 2163 127 65 whether whether IN 2163 127 66 the the DT 2163 127 67 token token NN 2163 127 68 may may MD 2163 127 69 contain contain VB 2163 127 70 pages page NNS 2163 127 71 , , , 2163 127 72 e.g. e.g. 
RB 2163 127 73 , , , 2163 127 74 234–238 234–238 CD 2163 127 75 ( ( -LRB- 2163 127 76 possible possible JJ 2163 127 77 values value NNS 2163 127 78 : : : 2163 127 79 possPages posspage NNS 2163 127 80 , , , 2163 127 81 no no UH 2163 127 82 ) ) -RRB- 2163 127 83 : : : 2163 127 84 no no DT 2163 127 85   JJ 2163 127 86 Token token JJ 2163 127 87 placement placement NN 2163 127 88 — — : 2163 127 89 the the DT 2163 127 90 token token JJ 2163 127 91 placement placement NN 2163 127 92 in in IN 2163 127 93 the the DT 2163 127 94 reference reference NN 2163 127 95 string string NN 2163 127 96 , , , 2163 127 97 based base VBN 2163 127 98 on on IN 2163 127 99 its -PRON- PRP$ 2163 127 100 division division NN 2163 127 101 into into IN 2163 127 102 nine nine CD 2163 127 103 equal equal JJ 2163 127 104 consecutive consecutive JJ 2163 127 105 buckets bucket NNS 2163 127 106 . . . 2163 128 1 This this DT 2163 128 2 feature feature NN 2163 128 3 indicates indicate VBZ 2163 128 4 the the DT 2163 128 5 bucket bucket NN 2163 128 6 number number NN 2163 128 7 : : : 2163 128 8 0 0 NFP 2163 128 9 For for IN 2163 128 10 training training NN 2163 128 11 purposes purpose NNS 2163 128 12 we -PRON- PRP 2163 128 13 compiled compile VBD 2163 128 14 and and CC 2163 128 15 manually manually RB 2163 128 16 tagged tag VBD 2163 128 17 a a DT 2163 128 18 set set NN 2163 128 19 of of IN 2163 128 20 830 830 CD 2163 128 21 randomly randomly RB 2163 128 22 chosen choose VBN 2163 128 23 references reference NNS 2163 128 24 . . . 
2163 129 1 These these DT 2163 129 2 were be VBD 2163 129 3 extracted extract VBN 2163 129 4 from from IN 2163 129 5 random random JJ 2163 129 6 publications publication NNS 2163 129 7 from from IN 2163 129 8 diverse diverse JJ 2163 129 9 conferences conference NNS 2163 129 10 and and CC 2163 129 11 journals journal NNS 2163 129 12 from from IN 2163 129 13 the the DT 2163 129 14 computer computer NN 2163 129 15 science science NN 2163 129 16 field field NN 2163 129 17 ( ( -LRB- 2163 129 18 collected collect VBN 2163 129 19 from from IN 2163 129 20 IEEE IEEE NNP 2163 129 21 Explorer Explorer NNP 2163 129 22 , , , 2163 129 23 Springer Springer NNP 2163 129 24 Link Link NNP 2163 129 25 or or CC 2163 129 26 the the DT 2163 129 27 ACM ACM NNP 2163 129 28 Portal Portal NNP 2163 129 29 ) ) -RRB- 2163 129 30 , , , 2163 129 31 manually manually RB 2163 129 32 cleaned clean VBN 2163 129 33 , , , 2163 129 34 tagged tag VBN 2163 129 35 , , , 2163 129 36 and and CC 2163 129 37 categorized categorize VBD 2163 129 38 according accord VBG 2163 129 39 to to IN 2163 129 40 their -PRON- PRP$ 2163 129 41 type type NN 2163 129 42 of of IN 2163 129 43 publication publication NN 2163 129 44 venue venue NN 2163 129 45 . . . 
2163 130 1 34 34 CD 2163 130 2 To to TO 2163 130 3 achieve achieve VB 2163 130 4 an an DT 2163 130 5 increased increase VBN 2163 130 6 versatility versatility NN 2163 130 7 , , , 2163 130 8 instead instead RB 2163 130 9 of of IN 2163 130 10 performing perform VBG 2163 130 11 cross- cross- JJ 2163 130 12 validation validation NN 2163 130 13 , , , 2163 130 14 35 35 CD 2163 130 15 which which WDT 2163 130 16 would would MD 2163 130 17 result result VB 2163 130 18 in in IN 2163 130 19 a a DT 2163 130 20 dataset- dataset- JJ 2163 130 21 tailored tailored JJ 2163 130 22 model model NN 2163 130 23 with with IN 2163 130 24 limited limited JJ 2163 130 25 or or CC 2163 130 26 no no DT 2163 130 27 versatility versatility NN 2163 130 28 , , , 2163 130 29 we -PRON- PRP 2163 130 30 opted opt VBD 2163 130 31 for for IN 2163 130 32 sampling sample VBG 2163 130 33 the the DT 2163 130 34 test test NN 2163 130 35 data datum NNS 2163 130 36 . . . 2163 131 1 Hence hence RB 2163 131 2 , , , 2163 131 3 we -PRON- PRP 2163 131 4 included include VBD 2163 131 5 in in IN 2163 131 6 the the DT 2163 131 7 training training NN 2163 131 8 corpus corpus NN 2163 131 9 some some DT 2163 131 10 samples sample NNS 2163 131 11 from from IN 2163 131 12 the the DT 2163 131 13 testing testing NN 2163 131 14 datasets dataset NNS 2163 131 15 as as IN 2163 131 16 follows follow VBZ 2163 131 17 : : : 2163 131 18 10 10 CD 2163 131 19 percent percent NN 2163 131 20 of of IN 2163 131 21 the the DT 2163 131 22 CORA CORA NNP 2163 131 23 dataset dataset NN 2163 131 24 ( ( -LRB- 2163 131 25 i.e. i.e. FW 2163 131 26 , , , 2163 131 27 20 20 CD 2163 131 28 entries entry NNS 2163 131 29 ) ) -RRB- 2163 131 30 , , , 2163 131 31 36 36 CD 2163 131 32 10 10 CD 2163 131 33 percent percent NN 2163 131 34 of of IN 2163 131 35 the the DT 2163 131 36 FLUX FLUX NNP 2163 131 37 - - HYPH 2163 131 38 CIM CIM NNP 2163 131 39 CS CS NNP 2163 131 40 dataset dataset NN 2163 131 41 ( ( -LRB- 2163 131 42 i.e. i.e. 
FW 2163 131 43 , , , 2163 131 44 30 30 CD 2163 131 45 entries entry NNS 2163 131 46 ) ) -RRB- 2163 131 47 , , , 2163 131 48 37 37 CD 2163 131 49 and and CC 2163 131 50 1 1 CD 2163 131 51 % % NN 2163 131 52 of of IN 2163 131 53 the the DT 2163 131 54 FLUX FLUX NNP 2163 131 55 - - HYPH 2163 131 56 CIM CIM NNP 2163 131 57 HS HS NNP 2163 131 58 dataset dataset NN 2163 131 59 ( ( -LRB- 2163 131 60 i.e. i.e. FW 2163 131 61 , , , 2163 131 62 20 20 CD 2163 131 63 entries entry NNS 2163 131 64 ) ) -RRB- 2163 131 65 . . . 2163 132 1 Consequently consequently RB 2163 132 2 , , , 2163 132 3 the the DT 2163 132 4 final final JJ 2163 132 5 training training NN 2163 132 6 corpus corpus NNP 2163 132 7 consisted consist VBD 2163 132 8 of of IN 2163 132 9 a a DT 2163 132 10 total total NN 2163 132 11 of of IN 2163 132 12 900 900 CD 2163 132 13 reference reference NN 2163 132 14 strings string NNS 2163 132 15 . . . 2163 133 1 To to TO 2163 133 2 clarify clarify VB 2163 133 3 , , , 2163 133 4 this this DT 2163 133 5 is be VBZ 2163 133 6 , , , 2163 133 7 to to IN 2163 133 8 some some DT 2163 133 9 extent extent NN 2163 133 10 , , , 2163 133 11 similar similar JJ 2163 133 12 to to IN 2163 133 13 the the DT 2163 133 14 dataset dataset NN 2163 133 15 - - HYPH 2163 133 16 specific specific JJ 2163 133 17 cross cross NN 2163 133 18 - - NN 2163 133 19 validation validation NN 2163 133 20 , , , 2163 133 21 but but CC 2163 133 22 instead instead RB 2163 133 23 of of IN 2163 133 24 considering consider VBG 2163 133 25 , , , 2163 133 26 for for IN 2163 133 27 example example NN 2163 133 28 , , , 2163 133 29 a a DT 2163 133 30 60–40 60–40 CD 2163 133 31 ratio ratio NN 2163 133 32 for for IN 2163 133 33 training training NN 2163 133 34 / / SYM 2163 133 35 testing testing NN 2163 133 36 , , , 2163 133 37 we -PRON- PRP 2163 133 38 used use VBD 2163 133 39 only only RB 2163 133 40 10 10 CD 2163 133 41 percent percent NN 2163 133 42 for for IN 2163 133 43 training training NN 2163 133 44 , , , 2163 
133 45 while while IN 2163 133 46 the the DT 2163 133 47 testing testing NN 2163 133 48 ( ( -LRB- 2163 133 49 described describe VBN 2163 133 50 in in IN 2163 133 51 section section NN 2163 133 52 4 4 CD 2163 133 53 ) ) -RRB- 2163 133 54 was be VBD 2163 133 55 performed perform VBN 2163 133 56 as as IN 2163 133 57 a a DT 2163 133 58 direct direct JJ 2163 133 59 application application NN 2163 133 60 of of IN 2163 133 61 the the DT 2163 133 62 chunker chunker NN 2163 133 63 on on IN 2163 133 64 the the DT 2163 133 65 entire entire JJ 2163 133 66 dataset dataset NN 2163 133 67 . . . 2163 134 1 As as IN 2163 134 2 already already RB 2163 134 3 mentioned mention VBN 2163 134 4 , , , 2163 134 5 our -PRON- PRP$ 2163 134 6 focus focus NN 2163 134 7 on on IN 2163 134 8 computer computer NN 2163 134 9 science science NN 2163 134 10 and and CC 2163 134 11 health health NN 2163 134 12 sciences science NNS 2163 134 13 is be VBZ 2163 134 14 strictly strictly RB 2163 134 15 due due JJ 2163 134 16 to to IN 2163 134 17 evaluation evaluation NN 2163 134 18 purposes purpose NNS 2163 134 19 . . . 2163 135 1 Our -PRON- PRP$ 2163 135 2 proposed propose VBN 2163 135 3 model model NN 2163 135 4 is be VBZ 2163 135 5 domain domain NN 2163 135 6 - - HYPH 2163 135 7 agnostic agnostic JJ 2163 135 8 , , , 2163 135 9 and and CC 2163 135 10 hence hence RB 2163 135 11 , , , 2163 135 12 the the DT 2163 135 13 steps step NNS 2163 135 14 described describe VBN 2163 135 15 here here RB 2163 135 16 can can MD 2163 135 17 be be VB 2163 135 18 easily easily RB 2163 135 19 performed perform VBN 2163 135 20 on on IN 2163 135 21 datasets dataset NNS 2163 135 22 emerged emerge VBD 2163 135 23 from from IN 2163 135 24 other other JJ 2163 135 25 domains domain NNS 2163 135 26 , , , 2163 135 27 if if IN 2163 135 28 at at RB 2163 135 29 all all RB 2163 135 30 necessary necessary JJ 2163 135 31 . . . 
2163 136 1 In in IN 2163 136 2 reality reality NN 2163 136 3 , , , 2163 136 4 the the DT 2163 136 5 chunker chunker NN 2163 136 6 ’s ’s POS 2163 136 7 performance performance NN 2163 136 8 on on IN 2163 136 9 references reference NNS 2163 136 10 from from IN 2163 136 11 a a DT 2163 136 12 domain domain NN 2163 136 13 not not RB 2163 136 14 covered cover VBN 2163 136 15 above above RB 2163 136 16 can can MD 2163 136 17 be be VB 2163 136 18 easily easily RB 2163 136 19 boosted boost VBN 2163 136 20 simply simply RB 2163 136 21 by by IN 2163 136 22 including include VBG 2163 136 23 a a DT 2163 136 24 sample sample NN 2163 136 25 of of IN 2163 136 26 references reference NNS 2163 136 27 in in IN 2163 136 28 the the DT 2163 136 29 training training NN 2163 136 30 set set VBN 2163 136 31 and and CC 2163 136 32 then then RB 2163 136 33 retraining retrain VBG 2163 136 34 the the DT 2163 136 35 chunker chunker NN 2163 136 36 . . . 2163 137 1 The the DT 2163 137 2 list list NN 2163 137 3 of of IN 2163 137 4 labels label NNS 2163 137 5 used use VBN 2163 137 6 for for IN 2163 137 7 training training NN 2163 137 8 and and CC 2163 137 9 then then RB 2163 137 10 testing test VBG 2163 137 11 consists consist NNS 2163 137 12 of of IN 2163 137 13 Author author NN 2163 137 14 , , , 2163 137 15 Title Title NNP 2163 137 16 , , , 2163 137 17 Journal Journal NNP 2163 137 18 , , , 2163 137 19 Conference Conference NNP 2163 137 20 , , , 2163 137 21 Workshop Workshop NNP 2163 137 22 , , , 2163 137 23 Website Website NNP 2163 137 24 , , , 2163 137 25 Technicalrep Technicalrep NNP 2163 137 26 , , , 2163 137 27 Date Date NNP 2163 137 28 , , , 2163 137 29 Publisher publisher NN 2163 137 30 , , , 2163 137 31 Location Location NNP 2163 137 32 , , , 2163 137 33 Volnum Volnum NNP 2163 137 34 , , , 2163 137 35 Pages Pages NNPS 2163 137 36 , , , 2163 137 37 Etal Etal NNP 2163 137 38 , , , 2163 137 39 Note Note NNP 2163 137 40 , , , 2163 137 41 Editors Editors NNP 2163 137 42 , , , 2163 137 43 
Organization Organization NNP 2163 137 44 . . . 2163 138 1 As as IN 2163 138 2 we -PRON- PRP 2163 138 3 will will MD 2163 138 4 see see VB 2163 138 5 in in IN 2163 138 6 the the DT 2163 138 7 evaluation evaluation NN 2163 138 8 , , , 2163 138 9 not not RB 2163 138 10 all all DT 2163 138 11 labels label NNS 2163 138 12 were be VBD 2163 138 13 actually actually RB 2163 138 14 used use VBN 2163 138 15 for for IN 2163 138 16 testing testing NN 2163 138 17 ( ( -LRB- 2163 138 18 e.g. e.g. RB 2163 138 19 , , , 2163 138 20 Note Note NNP 2163 138 21 or or CC 2163 138 22 Editors Editors NNP 2163 138 23 ) ) -RRB- 2163 138 24 , , , 2163 138 25 some some DT 2163 138 26 of of IN 2163 138 27 them -PRON- PRP 2163 138 28 being be VBG 2163 138 29 present present JJ 2163 138 30 in in IN 2163 138 31 the the DT 2163 138 32 model model NN 2163 138 33 for for IN 2163 138 34 the the DT 2163 138 35 sake sake NN 2163 138 36 of of IN 2163 138 37 disambiguation disambiguation NN 2163 138 38 . . . 2163 139 1 Also also RB 2163 139 2 , , , 2163 139 3 as as IN 2163 139 4 opposed oppose VBN 2163 139 5 to to IN 2163 139 6 the the DT 2163 139 7 other other JJ 2163 139 8 approaches approach NNS 2163 139 9 , , , 2163 139 10 we -PRON- PRP 2163 139 11 made make VBD 2163 139 12 a a DT 2163 139 13 clear clear JJ 2163 139 14 distinction distinction NN 2163 139 15 between between IN 2163 139 16 Workshop Workshop NNP 2163 139 17 and and CC 2163 139 18 Conference Conference NNP 2163 139 19 , , , 2163 139 20 which which WDT 2163 139 21 adds add VBZ 2163 139 22 an an DT 2163 139 23 extra extra JJ 2163 139 24 degree degree NN 2163 139 25 to to IN 2163 139 26 the the DT 2163 139 27 complexity complexity NN 2163 139 28 of of IN 2163 139 29 the the DT 2163 139 30 disambiguation disambiguation NN 2163 139 31 . . . 
2163 140 1 The the DT 2163 140 2 CRF CRF NNP 2163 140 3 model model NN 2163 140 4 was be VBD 2163 140 5 trained train VBN 2163 140 6 using use VBG 2163 140 7 the the DT 2163 140 8 MALLET MALLET NNP 2163 140 9 ( ( -LRB- 2163 140 10 A a DT 2163 140 11 Machine Machine NNP 2163 140 12 Learning Learning NNP 2163 140 13 for for IN 2163 140 14 Language Language NNP 2163 140 15 Toolkit Toolkit NNP 2163 140 16 ) ) -RRB- 2163 140 17 implementation implementation NN 2163 140 18 . . . 2163 141 1 38 38 CD 2163 141 2 The the DT 2163 141 3 output output NN 2163 141 4 of of IN 2163 141 5 the the DT 2163 141 6 chunker chunker NN 2163 141 7 is be VBZ 2163 141 8 post post JJ 2163 141 9 - - JJ 2163 141 10 processed process VBN 2163 141 11 to to TO 2163 141 12 expose expose VB 2163 141 13 a a DT 2163 141 14 series series NN 2163 141 15 of of IN 2163 141 16 fine fine RB 2163 141 17 - - HYPH 2163 141 18 grained grain VBN 2163 141 19 details detail NNS 2163 141 20 . . . 2163 142 1 As as IN 2163 142 2 shown show VBN 2163 142 3 in in IN 2163 142 4 figure figure NN 2163 142 5 1 1 CD 2163 142 6 in in IN 2163 142 7 all all PDT 2163 142 8 the the DT 2163 142 9 examples example NNS 2163 142 10 , , , 2163 142 11 the the DT 2163 142 12 chunking chunking NN 2163 142 13 provides provide VBZ 2163 142 14 a a DT 2163 142 15 blocked block VBN 2163 142 16 partition partition NN 2163 142 17 of of IN 2163 142 18 the the DT 2163 142 19 reference reference NN 2163 142 20 string string NN 2163 142 21 , , , 2163 142 22 but but CC 2163 142 23 we -PRON- PRP 2163 142 24 require require VBP 2163 142 25 for for IN 2163 142 26 the the DT 2163 142 27 Author author NN 2163 142 28 field field NN 2163 142 29 an an DT 2163 142 30 even even RB 2163 142 31 deeper deep JJR 2163 142 32 partition partition NN 2163 142 33 . . . 
2163 143 1 Consequently consequently RB 2163 143 2 , , , 2163 143 3 following follow VBG 2163 143 4 a a DT 2163 143 5 rule rule NN 2163 143 6 - - HYPH 2163 143 7 based base VBN 2163 143 8 approach approach NN 2163 143 9 we -PRON- PRP 2163 143 10 extract extract VBP 2163 143 11 the the DT 2163 143 12 individual individual JJ 2163 143 13 author author NN 2163 143 14 names name NNS 2163 143 15 from from IN 2163 143 16 the the DT 2163 143 17 Author author NN 2163 143 18 block block NN 2163 143 19 making make VBG 2163 143 20 use use NN 2163 143 21 of of IN 2163 143 22 the the DT 2163 143 23 punctuation punctuation NN 2163 143 24 marks mark NNS 2163 143 25 , , , 2163 143 26 the the DT 2163 143 27 orthographic orthographic JJ 2163 143 28 case case NN 2163 143 29 , , , 2163 143 30 and and CC 2163 143 31 the the DT 2163 143 32 alternation alternation NN 2163 143 33 between between IN 2163 143 34 initials initial NNS 2163 143 35 and and CC 2163 143 36 actual actual JJ 2163 143 37 names name NNS 2163 143 38 . . . 2163 144 1 When when WRB 2163 144 2 no no DT 2163 144 3 initials initial NNS 2163 144 4 , , , 2163 144 5 subject subject JJ 2163 144 6 to to IN 2163 144 7 the the DT 2163 144 8 existing exist VBG 2163 144 9 punctuation punctuation NN 2163 144 10 marks mark NNS 2163 144 11 , , , 2163 144 12 we -PRON- PRP 2163 144 13 consider consider VBP 2163 144 14 as as IN 2163 144 15 a a DT 2163 144 16 rule rule NN 2163 144 17 - - HYPH 2163 144 18 of of IN 2163 144 19 - - HYPH 2163 144 20 thumb thumb NN 2163 144 21 that that WDT 2163 144 22 each each DT 2163 144 23 name name NN 2163 144 24 generally generally RB 2163 144 25 comprises comprise VBZ 2163 144 26 one one CD 2163 144 27 first first JJ 2163 144 28 name name NN 2163 144 29 and and CC 2163 144 30 one one CD 2163 144 31 surname surname NN 2163 144 32 ( ( -LRB- 2163 144 33 in in IN 2163 144 34 this this DT 2163 144 35 order order NN 2163 144 36 , , , 2163 144 37 i.e. i.e. 
FW 2163 144 38 , , , 2163 144 39 John John NNP 2163 144 40 Doe Doe NNP 2163 144 41 ) ) -RRB- 2163 144 42 . . . 2163 145 1 The the DT 2163 145 2 result result NN 2163 145 3 of of IN 2163 145 4 the the DT 2163 145 5 post post NN 2163 145 6 - - JJ 2163 145 7 processing processing NN 2163 145 8 is be VBZ 2163 145 9 used use VBN 2163 145 10 in in IN 2163 145 11 the the DT 2163 145 12 linking linking NN 2163 145 13 process process NN 2163 145 14 . . . 2163 146 1 REFERENCE reference NN 2163 146 2 INFORMATION INFORMATION NNS 2163 146 3 EXTRACTION extraction NN 2163 146 4 AND and CC 2163 146 5 PROCESSING processing NN 2163 146 6 |GROZA |groza NN 2163 146 7 , , , 2163 146 8 GRIMNES GRIMNES NNP 2163 146 9 , , , 2163 146 10 AND and CC 2163 146 11 HANDSCHUH handschuh NN 2163 146 12 14 14 CD 2163 146 13 4 4 CD 2163 146 14 . . . 2163 147 1 EXPERIMENTAL experimental NN 2163 147 2 RESULTS results VBP 2163 147 3 We -PRON- PRP 2163 147 4 have have VBP 2163 147 5 performed perform VBN 2163 147 6 an an DT 2163 147 7 extensive extensive JJ 2163 147 8 evaluation evaluation NN 2163 147 9 of of IN 2163 147 10 the the DT 2163 147 11 proposed propose VBN 2163 147 12 reference reference NN 2163 147 13 chunking chunk VBG 2163 147 14 approach approach NN 2163 147 15 . . . 2163 148 1 In in IN 2163 148 2 general general JJ 2163 148 3 , , , 2163 148 4 all all PDT 2163 148 5 the the DT 2163 148 6 previous previous JJ 2163 148 7 work work NN 2163 148 8 in in IN 2163 148 9 reference reference NN 2163 148 10 chunking chunk VBG 2163 148 11 focuses focus NNS 2163 148 12 on on IN 2163 148 13 raw raw JJ 2163 148 14 reference reference NN 2163 148 15 chunking chunking NN 2163 148 16 , , , 2163 148 17 i.e. i.e. FW 2163 148 18 , , , 2163 148 19 label label NNP 2163 148 20 sequencing sequence VBG 2163 148 21 at at IN 2163 148 22 the the DT 2163 148 23 macro macro JJ 2163 148 24 level level NN 2163 148 25 . . . 
2163 149 1 More more RBR 2163 149 2 concretely concretely JJ 2163 149 3 , , , 2163 149 4 the the DT 2163 149 5 other other JJ 2163 149 6 approaches approach NNS 2163 149 7 split split VBP 2163 149 8 and and CC 2163 149 9 tag tag VB 2163 149 10 the the DT 2163 149 11 reference reference NN 2163 149 12 strings string NNS 2163 149 13 using use VBG 2163 149 14 blocks block NNS 2163 149 15 of of IN 2163 149 16 complete complete JJ 2163 149 17 references reference NNS 2163 149 18 , , , 2163 149 19 without without IN 2163 149 20 going go VBG 2163 149 21 into into IN 2163 149 22 details detail NNS 2163 149 23 such such JJ 2163 149 24 as as IN 2163 149 25 chunking chunk VBG 2163 149 26 individual individual JJ 2163 149 27 authors author NNS 2163 149 28 . . . 2163 150 1 The the DT 2163 150 2 only only JJ 2163 150 3 exception exception NN 2163 150 4 is be VBZ 2163 150 5 the the DT 2163 150 6 ParsCit ParsCit NNP 2163 150 7 package package NN 2163 150 8 that that WDT 2163 150 9 does do VBZ 2163 150 10 perform perform VB 2163 150 11 complete complete JJ 2163 150 12 reference reference NN 2163 150 13 chunking chunk VBG 2163 150 14 in in IN 2163 150 15 a a DT 2163 150 16 similar similar JJ 2163 150 17 fashion fashion NN 2163 150 18 as as IN 2163 150 19 we -PRON- PRP 2163 150 20 do do VBP 2163 150 21 . . . 
2163 151 1 The the DT 2163 151 2 evaluation evaluation NN 2163 151 3 results result NNS 2163 151 4 presented present VBN 2163 151 5 in in IN 2163 151 6 this this DT 2163 151 7 section section NN 2163 151 8 , , , 2163 151 9 will will MD 2163 151 10 feature feature VB 2163 151 11 complete complete JJ 2163 151 12 chunking chunk VBG 2163 151 13 only only RB 2163 151 14 for for IN 2163 151 15 our -PRON- PRP$ 2163 151 16 solution solution NN 2163 151 17 and and CC 2163 151 18 for for IN 2163 151 19 ParsCit ParsCit NNP 2163 151 20 , , , 2163 151 21 and and CC 2163 151 22 raw raw JJ 2163 151 23 chunking chunk VBG 2163 151 24 for for IN 2163 151 25 the the DT 2163 151 26 rest rest NN 2163 151 27 of of IN 2163 151 28 the the DT 2163 151 29 approaches approach NNS 2163 151 30 . . . 2163 152 1 Field Field NNP 2163 152 2 ParsCit ParsCit NNP 2163 152 3 Peng Peng NNP 2163 152 4 Han Han NNP 2163 152 5 et et NNP 2163 152 6 al al NNP 2163 152 7 . . . 2163 153 1 Our -PRON- PRP$ 2163 153 2 approach approach NN 2163 153 3 P P NNP 2163 153 4 R r NN 2163 153 5 F1 f1 NN 2163 153 6 F1 f1 NN 2163 153 7 P P NNP 2163 153 8 R R NNP 2163 153 9 F1 f1 NN 2163 153 10 P P NNP 2163 153 11 R R NNP 2163 153 12 F1 F1 NNP 2163 153 13 Author author NN 2163 153 14 98.7 98.7 CD 2163 153 15 99.3 99.3 CD 2163 153 16 98.99 98.99 CD 2163 153 17 99.4 99.4 CD 2163 153 18 92.6 92.6 CD 2163 153 19 99.1 99.1 CD 2163 153 20 97.6 97.6 CD 2163 153 21 99.08 99.08 CD 2163 153 22 99.6 99.6 CD 2163 153 23 99.30 99.30 CD 2163 153 24 Title Title NNP 2163 153 25 96.0 96.0 CD 2163 153 26 98.4 98.4 CD 2163 153 27 97.18 97.18 CD 2163 153 28 98.3 98.3 CD 2163 153 29 92.2 92.2 CD 2163 153 30 93.0 93.0 CD 2163 153 31 92.6 92.6 CD 2163 153 32 95.64 95.64 CD 2163 153 33 95.64 95.64 CD 2163 153 34 95.64 95.64 CD 2163 153 35 Date date NN 2163 153 36 100 100 CD 2163 153 37 98.4 98.4 CD 2163 153 38 99.19 99.19 CD 2163 153 39 98.9 98.9 CD 2163 153 40 98.5 98.5 CD 2163 153 41 95.9 95.9 CD 2163 153 42 97.2 97.2 CD 2163 153 43 99.33 99.33 
CD 2163 153 44 98.67 98.67 CD 2163 153 45 98.99 98.99 CD 2163 153 46 Pages Pages NNPS 2163 153 47 97.7 97.7 CD 2163 153 48 98.4 98.4 CD 2163 153 49 98.04 98.04 CD 2163 153 50 98.6 98.6 CD 2163 153 51 95.6 95.6 CD 2163 153 52 96.9 96.9 CD 2163 153 53 96.2 96.2 CD 2163 153 54 99.28 99.28 CD 2163 153 55 99.22 99.22 CD 2163 153 56 99.24 99.24 CD 2163 153 57 Location Location NNP 2163 153 58 95.6 95.6 CD 2163 153 59 90.0 90.0 CD 2163 153 60 92.71 92.71 CD 2163 153 61 87.2 87.2 CD 2163 153 62 77.7 77.7 CD 2163 153 63 71.5 71.5 CD 2163 153 64 74.5 74.5 CD 2163 153 65 93.45 93.45 CD 2163 153 66 92.59 92.59 CD 2163 153 67 93.01 93.01 CD 2163 153 68 Organization Organization NNP 2163 153 69 90.9 90.9 CD 2163 153 70 87.9 87.9 CD 2163 153 71 89.37 89.37 CD 2163 153 72 94.0 94.0 CD 2163 153 73 76.5 76.5 CD 2163 153 74 77.3 77.3 CD 2163 153 75 76.9 76.9 CD 2163 153 76 100 100 CD 2163 153 77 87.87 87.87 CD 2163 153 78 93.54 93.54 CD 2163 153 79 Journal Journal NNP 2163 153 80 90.8 90.8 CD 2163 153 81 91.2 91.2 CD 2163 153 82 90.99 90.99 CD 2163 153 83 91.3 91.3 CD 2163 153 84 77.1 77.1 CD 2163 153 85 78.7 78.7 CD 2163 153 86 77.9 77.9 CD 2163 153 87 94.02 94.02 CD 2163 153 88 97.42 97.42 CD 2163 153 89 95.68 95.68 CD 2163 153 90 Booktitle Booktitle NNP 2163 153 91 92.7 92.7 CD 2163 153 92 94.2 94.2 CD 2163 153 93 93.44 93.44 CD 2163 153 94 93.7 93.7 CD 2163 153 95 88.7 88.7 CD 2163 153 96 88.9 88.9 CD 2163 153 97 88.88 88.88 CD 2163 153 98 97.77 97.77 CD 2163 153 99 98.44 98.44 CD 2163 153 100 98.10 98.10 CD 2163 153 101 Publisher Publisher NNP 2163 153 102 95.2 95.2 CD 2163 153 103 88.7 88.7 CD 2163 153 104 91.83 91.83 CD 2163 153 105 76.1 76.1 CD 2163 153 106 56.0 56.0 CD 2163 153 107 64.1 64.1 CD 2163 153 108 59.9 59.9 CD 2163 153 109 94.84 94.84 CD 2163 153 110 95.83 95.83 CD 2163 153 111 95.33 95.33 CD 2163 153 112 Tech Tech NNP 2163 153 113 . . . 2163 154 1 rep rep NNP 2163 154 2 . . 
NNP 2163 154 3 94.0 94.0 CD 2163 154 4 79.6 79.6 CD 2163 154 5 86.2 86.2 CD 2163 154 6 86.7 86.7 CD 2163 154 7 56.2 56.2 CD 2163 154 8 64.1 64.1 CD 2163 154 9 59.9 59.9 CD 2163 154 10 100 100 CD 2163 154 11 90.90 90.90 CD 2163 154 12 95.23 95.23 CD 2163 154 13 Website Website NNP 2163 154 14 - - HYPH 2163 154 15 - - HYPH 2163 154 16 - - HYPH 2163 154 17 - - HYPH 2163 154 18 - - HYPH 2163 154 19 - - HYPH 2163 154 20 - - HYPH 2163 154 21 100 100 CD 2163 154 22 100 100 CD 2163 154 23 100 100 CD 2163 154 24 Table table NN 2163 154 25 1 1 CD 2163 154 26 . . . 2163 155 1 Evaluation Evaluation NNP 2163 155 2 Results Results NNP 2163 155 3 on on IN 2163 155 4 the the DT 2163 155 5 CORA CORA NNP 2163 155 6 Dataset Dataset NNP 2163 155 7 An an DT 2163 155 8 additional additional JJ 2163 155 9 observation observation NN 2163 155 10 we -PRON- PRP 2163 155 11 need need VBP 2163 155 12 to to TO 2163 155 13 make make VB 2163 155 14 is be VBZ 2163 155 15 related relate VBN 2163 155 16 to to IN 2163 155 17 the the DT 2163 155 18 reference reference NN 2163 155 19 fields field NNS 2163 155 20 taken take VBN 2163 155 21 into into IN 2163 155 22 account account NN 2163 155 23 . . . 2163 156 1 Most Most JJS 2163 156 2 of of IN 2163 156 3 the the DT 2163 156 4 fields field NNS 2163 156 5 we -PRON- PRP 2163 156 6 have have VBP 2163 156 7 focused focus VBN 2163 156 8 on on IN 2163 156 9 coincide coincide NN 2163 156 10 with with IN 2163 156 11 the the DT 2163 156 12 fields field NNS 2163 156 13 considered consider VBN 2163 156 14 by by IN 2163 156 15 all all PDT 2163 156 16 the the DT 2163 156 17 existing exist VBG 2163 156 18 relevant relevant JJ 2163 156 19 approaches approach NNS 2163 156 20 . . . 
2163 157 1 Nevertheless nevertheless RB 2163 157 2 , , , 2163 157 3 there there EX 2163 157 4 are be VBP 2163 157 5 also also RB 2163 157 6 some some DT 2163 157 7 discrepancies discrepancy NNS 2163 157 8 , , , 2163 157 9 listed list VBN 2163 157 10 as as IN 2163 157 11 follows follow VBZ 2163 157 12 : : : 2163 157 13   NFP 2163 157 14 the the DT 2163 157 15 fields field NNS 2163 157 16 : : : 2163 157 17 Volume volume NN 2163 157 18 , , , 2163 157 19 Number Number NNP 2163 157 20 , , , 2163 157 21 Editors Editors NNP 2163 157 22 , , , 2163 157 23 or or CC 2163 157 24 Note Note NNP 2163 157 25 were be VBD 2163 157 26 used use VBN 2163 157 27 in in IN 2163 157 28 the the DT 2163 157 29 chunking chunking NN 2163 157 30 process process NN 2163 157 31 b b NNP 2163 157 32 u u NNP 2163 157 33 t t NNP 2163 157 34 are be VBP 2163 157 35 not not RB 2163 157 36 considered consider VBN 2163 157 37 for for IN 2163 157 38 evaluation evaluation NN 2163 157 39   NN 2163 157 40 unlike unlike IN 2163 157 41 all all PDT 2163 157 42 the the DT 2163 157 43 other other JJ 2163 157 44 approaches approach NNS 2163 157 45 , , , 2163 157 46 we -PRON- PRP 2163 157 47 make make VBP 2163 157 48 the the DT 2163 157 49 distinction distinction NN 2163 157 50 between between IN 2163 157 51 Conference Conference NNP 2163 157 52 and and CC 2163 157 53 Workshop Workshop NNP 2163 157 54 as as IN 2163 157 55 publication publication NN 2163 157 56 venues venue NNS 2163 157 57 . . . 2163 158 1 However however RB 2163 158 2 , , , 2163 158 3 for for IN 2163 158 4 alignment alignment NN 2163 158 5 purposes purpose NNS 2163 158 6 ( ( -LRB- 2163 158 7 i.e. i.e. 
FW 2163 158 8 , , , 2163 158 9 to to TO 2163 158 10 be be VB 2163 158 11 able able JJ 2163 158 12 to to TO 2163 158 13 compare compare VB 2163 158 14 our -PRON- PRP$ 2163 158 15 results result NNS 2163 158 16 with with IN 2163 158 17 the the DT 2163 158 18 other other JJ 2163 158 19 approaches approach NNS 2163 158 20 ) ) -RRB- 2163 158 21 , , , 2163 158 22 in in IN 2163 158 23 the the DT 2163 158 24 evaluation evaluation NN 2163 158 25 results result NNS 2163 158 26 these these DT 2163 158 27 are be VBP 2163 158 28 merged merge VBN 2163 158 29 into into IN 2163 158 30 the the DT 2163 158 31 Booktitle Booktitle NNP 2163 158 32 field field NN 2163 158 33 . . . 2163 159 1 The the DT 2163 159 2 actual actual JJ 2163 159 3 tests test NNS 2163 159 4 were be VBD 2163 159 5 performed perform VBN 2163 159 6 on on IN 2163 159 7 four four CD 2163 159 8 different different JJ 2163 159 9 datasets dataset NNS 2163 159 10 , , , 2163 159 11 three three CD 2163 159 12 of of IN 2163 159 13 them -PRON- PRP 2163 159 14 used use VBN 2163 159 15 also also RB 2163 159 16 for for IN 2163 159 17 evaluating evaluate VBG 2163 159 18 the the DT 2163 159 19 other other JJ 2163 159 20 approaches approach NNS 2163 159 21 , , , 2163 159 22 and and CC 2163 159 23 a a DT 2163 159 24 fourth fourth JJ 2163 159 25 one one NN 2163 159 26 compiled compile VBN 2163 159 27 by by IN 2163 159 28 us -PRON- PRP 2163 159 29 . . . 
2163 160 1 In in IN 2163 160 2 the the DT 2163 160 3 case case NN 2163 160 4 of of IN 2163 160 5 the the DT 2163 160 6 three three CD 2163 160 7 existing exist VBG 2163 160 8 datasets dataset NNS 2163 160 9 , , , 2163 160 10 during during IN 2163 160 11 the the DT 2163 160 12 experimental experimental JJ 2163 160 13 evaluation evaluation NN 2163 160 14 we -PRON- PRP 2163 160 15 did do VBD 2163 160 16 not not RB 2163 160 17 make make VB 2163 160 18 use use NN 2163 160 19 of of IN 2163 160 20 the the DT 2163 160 21 preprocessing preprocesse VBG 2163 160 22 step step NN 2163 160 23 as as IN 2163 160 24 they -PRON- PRP 2163 160 25 were be VBD 2163 160 26 already already RB 2163 160 27 clean clean JJ 2163 160 28 . . . 2163 161 1 As as IN 2163 161 2 evaluation evaluation NN 2163 161 3 metric metric NN 2163 161 4 , , , 2163 161 5 we -PRON- PRP 2163 161 6 used use VBD 2163 161 7 the the DT 2163 161 8 F1 F1 NNP 2163 161 9 score score NN 2163 161 10 , , , 2163 161 11 39 39 CD 2163 161 12 i.e. i.e. 
FW 2163 161 13 , , , 2163 161 14 the the DT 2163 161 15 harmonic harmonic JJ 2163 161 16 mean mean NN 2163 161 17 of of IN 2163 161 18 precision precision NNP 2163 161 19 and and CC 2163 161 20 recall recall NN 2163 161 21 , , , 2163 161 22 using use VBG 2163 161 23 the the DT 2163 161 24 following follow VBG 2163 161 25 formula formula NN 2163 161 26 : : : 2163 161 27 INFORMATION INFORMATION NNP 2163 161 28 TECHNOLOGY TECHNOLOGY NNP 2163 161 29 AND and CC 2163 161 30 LIBRARIES LIBRARIES NNP 2163 161 31 | | NNP 2163 161 32 JUNE JUNE NNP 2163 161 33 2012 2012 CD 2163 161 34 15 15 CD 2163 161 35 In in IN 2163 161 36 the the DT 2163 161 37 following following NN 2163 161 38 , , , 2163 161 39 we -PRON- PRP 2163 161 40 iterate iterate VBP 2163 161 41 over over IN 2163 161 42 each each DT 2163 161 43 dataset dataset NN 2163 161 44 , , , 2163 161 45 by by IN 2163 161 46 providing provide VBG 2163 161 47 a a DT 2163 161 48 short short JJ 2163 161 49 description description NN 2163 161 50 and and CC 2163 161 51 the the DT 2163 161 52 experimental experimental JJ 2163 161 53 results result NNS 2163 161 54 . . . 2163 162 1 It -PRON- PRP 2163 162 2 is be VBZ 2163 162 3 worth worth JJ 2163 162 4 mentioning mention VBG 2163 162 5 that that IN 2163 162 6 our -PRON- PRP$ 2163 162 7 CRF CRF NNP 2163 162 8 reference reference NN 2163 162 9 chunker chunker NN 2163 162 10 was be VBD 2163 162 11 trained train VBN 2163 162 12 only only RB 2163 162 13 once once RB 2163 162 14 , , , 2163 162 15 as as IN 2163 162 16 described describe VBN 2163 162 17 earlier early RBR 2163 162 18 , , , 2163 162 19 and and CC 2163 162 20 not not RB 2163 162 21 specifically specifically RB 2163 162 22 for for IN 2163 162 23 each each DT 2163 162 24 dataset dataset NN 2163 162 25 . . . 
2163 163 1 4.1 4.1 CD 2163 163 2 Dataset dataset NN 2163 163 3 : : : 2163 163 4 CORA cora VB 2163 163 5 The the DT 2163 163 6 CORA CORA NNP 2163 163 7 dataset dataset NN 2163 163 8 is be VBZ 2163 163 9 the the DT 2163 163 10 first first JJ 2163 163 11 gold gold NN 2163 163 12 standard standard NN 2163 163 13 created create VBN 2163 163 14 for for IN 2163 163 15 automatic automatic JJ 2163 163 16 reference reference NN 2163 163 17 chunking chunk VBG 2163 163 18 . . . 2163 164 1 40 40 CD 2163 164 2 It -PRON- PRP 2163 164 3 comprises comprise VBZ 2163 164 4 two two CD 2163 164 5 hundred hundred CD 2163 164 6 reference reference NN 2163 164 7 strings string NNS 2163 164 8 and and CC 2163 164 9 focuses focus VBZ 2163 164 10 on on IN 2163 164 11 the the DT 2163 164 12 computer computer NN 2163 164 13 science science NN 2163 164 14 area area NN 2163 164 15 . . . 2163 165 1 Each each DT 2163 165 2 entry entry NN 2163 165 3 is be VBZ 2163 165 4 segmented segment VBN 2163 165 5 into into IN 2163 165 6 thirteen thirteen CD 2163 165 7 different different JJ 2163 165 8 fields field NNS 2163 165 9 : : : 2163 165 10 Author author NN 2163 165 11 , , , 2163 165 12 Editor Editor NNP 2163 165 13 , , , 2163 165 14 Title Title NNP 2163 165 15 , , , 2163 165 16 Booktitle Booktitle NNP 2163 165 17 , , , 2163 165 18 Journal Journal NNP 2163 165 19 , , , 2163 165 20 Volume volume NN 2163 165 21 , , , 2163 165 22 Publisher publisher NN 2163 165 23 , , , 2163 165 24 Date date NN 2163 165 25 , , , 2163 165 26 Pages Pages NNPS 2163 165 27 , , , 2163 165 28 Location Location NNP 2163 165 29 , , , 2163 165 30 Tech Tech NNP 2163 165 31 , , , 2163 165 32 Institution Institution NNP 2163 165 33 and and CC 2163 165 34 Note Note NNP 2163 165 35 . . . 
2163 166 1 Table table NN 2163 166 2 1 1 CD 2163 166 3 shows show VBZ 2163 166 4 the the DT 2163 166 5 comparative comparative JJ 2163 166 6 evaluation evaluation NN 2163 166 7 results result NNS 2163 166 8 on on IN 2163 166 9 the the DT 2163 166 10 CORA CORA NNP 2163 166 11 dataset dataset NN 2163 166 12 of of IN 2163 166 13 ParsCit ParsCit NNP 2163 166 14 , , , 2163 166 15 Peng Peng NNP 2163 166 16 et et NNP 2163 166 17 al al NNP 2163 166 18 . . NNP 2163 166 19 , , , 2163 166 20 41 41 CD 2163 166 21 Han Han NNP 2163 166 22 et et NNP 2163 166 23 al al NNP 2163 166 24 . . NNP 2163 166 25 , , , 2163 166 26 42 42 CD 2163 166 27 and and CC 2163 166 28 our -PRON- PRP$ 2163 166 29 approach approach NN 2163 166 30 . . . 2163 167 1 We -PRON- PRP 2163 167 2 observe observe VBP 2163 167 3 that that IN 2163 167 4 our -PRON- PRP$ 2163 167 5 chunker chunker NN 2163 167 6 outperforms outperform VBZ 2163 167 7 the the DT 2163 167 8 other other JJ 2163 167 9 chunkers chunker NNS 2163 167 10 on on IN 2163 167 11 most most JJS 2163 167 12 of of IN 2163 167 13 the the DT 2163 167 14 fields field NNS 2163 167 15 , , , 2163 167 16 with with IN 2163 167 17 some some DT 2163 167 18 of of IN 2163 167 19 them -PRON- PRP 2163 167 20 presenting present VBG 2163 167 21 a a DT 2163 167 22 significant significant JJ 2163 167 23 increase increase NN 2163 167 24 in in IN 2163 167 25 performance performance NN 2163 167 26 ( ( -LRB- 2163 167 27 looking look VBG 2163 167 28 at at IN 2163 167 29 the the DT 2163 167 30 F1 F1 NNP 2163 167 31 score score NN 2163 167 32 ) ) -RRB- 2163 167 33 : : : 2163 167 34 Journal Journal NNP 2163 167 35 from from IN 2163 167 36 91.3 91.3 CD 2163 167 37 percent percent NN 2163 167 38 to to IN 2163 167 39 95.68 95.68 CD 2163 167 40 percent percent NN 2163 167 41 , , , 2163 167 42 Booktitle Booktitle NNP 2163 167 43 from from IN 2163 167 44 93.44 93.44 CD 2163 167 45 percent percent NN 2163 167 46 to to IN 2163 167 47 98.10 98.10 CD 2163 167 48 percent percent NN 2163 
167 49 , , , 2163 167 50 Publisher publisher NN 2163 167 51 from from IN 2163 167 52 91.83 91.83 CD 2163 167 53 percent percent NN 2163 167 54 to to IN 2163 167 55 95.33 95.33 CD 2163 167 56 percent percent NN 2163 167 57 , , , 2163 167 58 and and CC 2163 167 59 especially especially RB 2163 167 60 Tech Tech NNP 2163 167 61 . . . 2163 168 1 rep rep NNP 2163 168 2 . . NNP 2163 168 3 from from IN 2163 168 4 86.7 86.7 CD 2163 168 5 percent percent NN 2163 168 6 to to IN 2163 168 7 95.23 95.23 CD 2163 168 8 percent percent NN 2163 168 9 . . . 2163 169 1 In in IN 2163 169 2 the the DT 2163 169 3 case case NN 2163 169 4 of of IN 2163 169 5 the the DT 2163 169 6 fields field NNS 2163 169 7 where where WRB 2163 169 8 our -PRON- PRP$ 2163 169 9 chunker chunker NN 2163 169 10 was be VBD 2163 169 11 outperformed outperform VBN 2163 169 12 , , , 2163 169 13 the the DT 2163 169 14 F1 f1 NN 2163 169 15 score score NN 2163 169 16 is be VBZ 2163 169 17 very very RB 2163 169 18 close close JJ 2163 169 19 to to IN 2163 169 20 the the DT 2163 169 21 best good JJS 2163 169 22 of of IN 2163 169 23 the the DT 2163 169 24 approaches approach NNS 2163 169 25 and and CC 2163 169 26 includes include VBZ 2163 169 27 an an DT 2163 169 28 increase increase NN 2163 169 29 in in IN 2163 169 30 one one CD 2163 169 31 of of IN 2163 169 32 its -PRON- PRP$ 2163 169 33 two two CD 2163 169 34 components component NNS 2163 169 35 ( ( -LRB- 2163 169 36 i.e. i.e. FW 2163 169 37 , , , 2163 169 38 precision precision NN 2163 169 39 or or CC 2163 169 40 recall recall NN 2163 169 41 ) ) -RRB- 2163 169 42 . . . 
2163 170 1 For for IN 2163 170 2 example example NN 2163 170 3 , , , 2163 170 4 on on IN 2163 170 5 the the DT 2163 170 6 Organization Organization NNP 2163 170 7 field field NN 2163 170 8 , , , 2163 170 9 we -PRON- PRP 2163 170 10 scored score VBD 2163 170 11 93.54percent 93.54percent CD 2163 170 12 , , , 2163 170 13 the the DT 2163 170 14 best good JJS 2163 170 15 being be VBG 2163 170 16 Peng Peng NNP 2163 170 17 's 's POS 2163 170 18 94 94 CD 2163 170 19 percent percent NN 2163 170 20 . . . 2163 171 1 However however RB 2163 171 2 , , , 2163 171 3 we -PRON- PRP 2163 171 4 achieved achieve VBD 2163 171 5 a a DT 2163 171 6 gain gain NN 2163 171 7 of of IN 2163 171 8 almost almost RB 2163 171 9 10 10 CD 2163 171 10 percent percent NN 2163 171 11 in in IN 2163 171 12 precision precision NN 2163 171 13 when when WRB 2163 171 14 compared compare VBN 2163 171 15 with with IN 2163 171 16 ParsCit ParsCit NNP 2163 171 17 ( ( -LRB- 2163 171 18 100 100 CD 2163 171 19 percent percent NN 2163 171 20 vs. vs. IN 2163 171 21 90.9 90.9 CD 2163 171 22 percent percent NN 2163 171 23 precision precision NN 2163 171 24 ) ) -RRB- 2163 171 25 . . . 2163 172 1 Similarly similarly RB 2163 172 2 , , , 2163 172 3 on on IN 2163 172 4 the the DT 2163 172 5 Date Date NNP 2163 172 6 field field NN 2163 172 7 , , , 2163 172 8 our -PRON- PRP$ 2163 172 9 F1 f1 NN 2163 172 10 was be VBD 2163 172 11 98.99 98.99 CD 2163 172 12 percent percent NN 2163 172 13 , , , 2163 172 14 opposed oppose VBN 2163 172 15 to to IN 2163 172 16 ParsCit ParsCit NNP 2163 172 17 's 's POS 2163 172 18 99.19 99.19 CD 2163 172 19 percent percent NN 2163 172 20 , , , 2163 172 21 but but CC 2163 172 22 with with IN 2163 172 23 a a DT 2163 172 24 better well JJR 2163 172 25 recall recall NN 2163 172 26 of of IN 2163 172 27 98.67 98.67 CD 2163 172 28 percent percent NN 2163 172 29 . . . 
2163 173 1 Field Field NNP 2163 173 2 ParsCit ParsCit NNP 2163 173 3 FLUX FLUX NNP 2163 173 4 - - HYPH 2163 173 5 CIM cim VB 2163 173 6 Our -PRON- PRP$ 2163 173 7 approach approach NN 2163 173 8 P p NN 2163 173 9 R r NN 2163 173 10 F1 f1 NN 2163 173 11 P p NN 2163 173 12 R r NN 2163 173 13 F1 f1 NN 2163 173 14 P P NNP 2163 173 15 R R NNP 2163 173 16 F1 F1 NNP 2163 173 17 Author author NN 2163 173 18 98.8 98.8 CD 2163 173 19 99.0 99.0 CD 2163 173 20 98.89 98.89 CD 2163 173 21 93.59 93.59 CD 2163 173 22 95.58 95.58 CD 2163 173 23 94.57 94.57 CD 2163 173 24 99.08 99.08 CD 2163 173 25 99.08 99.08 CD 2163 173 26 99.08 99.08 CD 2163 173 27 Title Title NNP 2163 173 28 98.8 98.8 CD 2163 173 29 98.3 98.3 CD 2163 173 30 98.54 98.54 CD 2163 173 31 93.0 93.0 CD 2163 173 32 93.0 93.0 CD 2163 173 33 93.0 93.0 CD 2163 173 34 99.65 99.65 CD 2163 173 35 99.65 99.65 CD 2163 173 36 99.65 99.65 CD 2163 173 37 Date Date NNP 2163 173 38 99.8 99.8 CD 2163 173 39 94.5 94.5 CD 2163 173 40 97.07 97.07 CD 2163 173 41 97.75 97.75 CD 2163 173 42 97.44 97.44 CD 2163 173 43 97.59 97.59 CD 2163 173 44 98.55 98.55 CD 2163 173 45 98.19 98.19 CD 2163 173 46 98.36 98.36 CD 2163 173 47 Pages Pages NNPS 2163 173 48 94.7 94.7 CD 2163 173 49 99.3 99.3 CD 2163 173 50 96.94 96.94 CD 2163 173 51 97.0 97.0 CD 2163 173 52 97.84 97.84 CD 2163 173 53 97.41 97.41 CD 2163 173 54 97.28 97.28 CD 2163 173 55 97.72 97.72 CD 2163 173 56 97.49 97.49 CD 2163 173 57 Location Location NNP 2163 173 58 96.9 96.9 CD 2163 173 59 88.4 88.4 CD 2163 173 60 92.45 92.45 CD 2163 173 61 96.83 96.83 CD 2163 173 62 97.6 97.6 CD 2163 173 63 97.21 97.21 CD 2163 173 64 95.55 95.55 CD 2163 173 65 94.5 94.5 CD 2163 173 66 95.02 95.02 CD 2163 173 67 Journal Journal NNP 2163 173 68 97.1 97.1 CD 2163 173 69 82.9 82.9 CD 2163 173 70 89.43 89.43 CD 2163 173 71 95.71 95.71 CD 2163 173 72 97.81 97.81 CD 2163 173 73 96.75 96.75 CD 2163 173 74 94.0 94.0 CD 2163 173 75 97.91 97.91 CD 2163 173 76 95.91 95.91 CD 2163 173 77 Booktitle Booktitle NNP 
2163 173 78 95.7 95.7 CD 2163 173 79 99.3 99.3 CD 2163 173 80 97.46 97.46 CD 2163 173 81 97.47 97.47 CD 2163 173 82 95.45 95.45 CD 2163 173 83 96.45 96.45 CD 2163 173 84 99.13 99.13 CD 2163 173 85 99.13 99.13 CD 2163 173 86 99.13 99.13 CD 2163 173 87 Publisher publisher NN 2163 173 88 98.8 98.8 CD 2163 173 89 75.9 75.9 CD 2163 173 90 85.84 85.84 CD 2163 173 91 100 100 CD 2163 173 92 100 100 CD 2163 173 93 100 100 CD 2163 173 94 98.59 98.59 CD 2163 173 95 98.59 98.59 CD 2163 173 96 98.59 98.59 CD 2163 173 97 Table table NN 2163 173 98 2 2 CD 2163 173 99 . . . 2163 174 1 Evaluation Evaluation NNP 2163 174 2 Results Results NNP 2163 174 3 on on IN 2163 174 4 the the DT 2163 174 5 FLUX FLUX NNP 2163 174 6 - - HYPH 2163 174 7 CIM CIM NNP 2163 174 8 Dataset Dataset NNP 2163 174 9 — — : 2163 174 10 CS CS NNP 2163 174 11 Domain Domain NNP 2163 174 12 Field Field NNP 2163 174 13 FLUX FLUX NNP 2163 174 14 - - HYPH 2163 174 15 CIM cim VB 2163 174 16 Our -PRON- PRP$ 2163 174 17 approach approach NN 2163 174 18 p p NN 2163 174 19 R r NN 2163 174 20 F1 f1 NN 2163 174 21 p p NN 2163 174 22 R R NNP 2163 174 23 F1 F1 NNP 2163 174 24 Author author NN 2163 174 25 98.57 98.57 CD 2163 174 26 99.04 99.04 CD 2163 174 27 98.81 98.81 CD 2163 174 28 99.8 99.8 CD 2163 174 29 99.36 99.36 CD 2163 174 30 99.57 99.57 CD 2163 174 31 Title Title NNP 2163 174 32 84.88 84.88 CD 2163 174 33 85.14 85.14 CD 2163 174 34 85.01 85.01 CD 2163 174 35 91.39 91.39 CD 2163 174 36 91.39 91.39 CD 2163 174 37 97.39 97.39 CD 2163 174 38 Date Date NNP 2163 174 39 99.85 99.85 CD 2163 174 40 99.5 99.5 CD 2163 174 41 99.61 99.61 CD 2163 174 42 99.89 99.89 CD 2163 174 43 99.69 99.69 CD 2163 174 44 99.78 99.78 CD 2163 174 45 Pages Pages NNPS 2163 174 46 99.1 99.1 CD 2163 174 47 99.2 99.2 CD 2163 174 48 99.45 99.45 CD 2163 174 49 99.94 99.94 CD 2163 174 50 99.59 99.59 CD 2163 174 51 99.76 99.76 CD 2163 174 52 Journal Journal NNP 2163 174 53 97.23 97.23 CD 2163 174 54 89.35 89.35 CD 2163 174 55 93.13 93.13 CD 2163 174 56 
99.42 99.42 CD 2163 174 57 99.16 99.16 CD 2163 174 58 99.28 99.28 CD 2163 174 59 Table Table NNP 2163 174 60 3 3 CD 2163 174 61 . . . 2163 175 1 Evaluation Evaluation NNP 2163 175 2 Results Results NNP 2163 175 3 on on IN 2163 175 4 the the DT 2163 175 5 FLUX FLUX NNP 2163 175 6 - - HYPH 2163 175 7 CIM CIM NNP 2163 175 8 Dataset Dataset NNP 2163 175 9 — — : 2163 175 10 HS HS NNP 2163 175 11 Domain Domain NNP 2163 175 12 REFERENCE REFERENCE NNP 2163 175 13 INFORMATION INFORMATION VBD 2163 175 14 EXTRACTION extraction NN 2163 175 15 AND and CC 2163 175 16 PROCESSING processing NN 2163 175 17 |GROZA |groza NN 2163 175 18 , , , 2163 175 19 GRIMNES GRIMNES NNP 2163 175 20 , , , 2163 175 21 AND and CC 2163 175 22 HANDSCHUH handschuh NN 2163 175 23 16 16 CD 2163 175 24 4.1 4.1 CD 2163 175 25 Dataset dataset NN 2163 175 26 : : : 2163 175 27 FLUX FLUX NNP 2163 175 28 - - HYPH 2163 175 29 CIM CIM NNP 2163 175 30 FLUX FLUX NNP 2163 175 31 - - HYPH 2163 175 32 CIM CIM NNP 2163 175 33 43 43 CD 2163 175 34 is be VBZ 2163 175 35 an an DT 2163 175 36 unsupervised unsupervise VBN 2163 175 37 44 44 CD 2163 175 38 reference reference NN 2163 175 39 extraction extraction NN 2163 175 40 and and CC 2163 175 41 chunking chunk VBG 2163 175 42 system system NN 2163 175 43 . . . 
2163 176 1 In in IN 2163 176 2 order order NN 2163 176 3 to to TO 2163 176 4 evaluate evaluate VB 2163 176 5 its -PRON- PRP$ 2163 176 6 performance performance NN 2163 176 7 , , , 2163 176 8 the the DT 2163 176 9 authors author NNS 2163 176 10 of of IN 2163 176 11 FLUX FLUX NNP 2163 176 12 - - HYPH 2163 176 13 CIM CIM NNP 2163 176 14 created create VBD 2163 176 15 two two CD 2163 176 16 separate separate JJ 2163 176 17 datasets dataset NNS 2163 176 18 : : : 2163 176 19   : 2163 176 20 the the DT 2163 176 21 FLUX FLUX NNP 2163 176 22 - - HYPH 2163 176 23 CIM CIM NNP 2163 176 24 CS CS NNP 2163 176 25 dataset dataset NN 2163 176 26 , , , 2163 176 27 composed compose VBN 2163 176 28 on on IN 2163 176 29 a a DT 2163 176 30 collection collection NN 2163 176 31 of of IN 2163 176 32 heterogeneous heterogeneous JJ 2163 176 33 references reference NNS 2163 176 34 from from IN 2163 176 35 the the DT 2163 176 36 Computer Computer NNP 2163 176 37 Science Science NNP 2163 176 38 field field NN 2163 176 39 , , , 2163 176 40 and and CC 2163 176 41   : 2163 176 42 the the DT 2163 176 43 FLUX FLUX NNP 2163 176 44 - - HYPH 2163 176 45 CIM CIM NNP 2163 176 46 HS HS NNP 2163 176 47 dataset dataset NN 2163 176 48 is be VBZ 2163 176 49 comprised comprise VBN 2163 176 50 of of IN 2163 176 51 an an DT 2163 176 52 organized organize VBN 2163 176 53 and and CC 2163 176 54 controlled controlled JJ 2163 176 55 collection collection NN 2163 176 56 of of IN 2163 176 57 references reference NNS 2163 176 58 from from IN 2163 176 59 PubMed PubMed NNP 2163 176 60 . . . 
2163 177 1 The the DT 2163 177 2 FLUX FLUX NNP 2163 177 3 - - HYPH 2163 177 4 CIM CIM NNP 2163 177 5 CS CS NNP 2163 177 6 dataset dataset NN 2163 177 7 contains contain VBZ 2163 177 8 three three CD 2163 177 9 hundred hundred CD 2163 177 10 reference reference NN 2163 177 11 strings string NNS 2163 177 12 randomly randomly RB 2163 177 13 selected select VBN 2163 177 14 from from IN 2163 177 15 the the DT 2163 177 16 ACM ACM NNP 2163 177 17 Digital Digital NNP 2163 177 18 Library Library NNP 2163 177 19 . . . 2163 178 1 Each each DT 2163 178 2 string string NN 2163 178 3 is be VBZ 2163 178 4 segmented segment VBN 2163 178 5 into into IN 2163 178 6 ten ten CD 2163 178 7 fields field NNS 2163 178 8 : : : 2163 178 9 Author author NN 2163 178 10 , , , 2163 178 11 Title Title NNP 2163 178 12 , , , 2163 178 13 Conf Conf NNP 2163 178 14 , , , 2163 178 15 Journal Journal NNP 2163 178 16 , , , 2163 178 17 Volume volume NN 2163 178 18 , , , 2163 178 19 Number Number NNP 2163 178 20 , , , 2163 178 21 Pub Pub NNP 2163 178 22 , , , 2163 178 23 Date date NN 2163 178 24 , , , 2163 178 25 Pages Pages NNPS 2163 178 26 and and CC 2163 178 27 Place Place NNP 2163 178 28 . . . 2163 179 1 The the DT 2163 179 2 FLUX FLUX NNP 2163 179 3 - - HYPH 2163 179 4 CIM CIM NNP 2163 179 5 HS HS NNP 2163 179 6 dataset dataset NN 2163 179 7 contains contain VBZ 2163 179 8 2000 2000 CD 2163 179 9 entries entry NNS 2163 179 10 , , , 2163 179 11 with with IN 2163 179 12 each each DT 2163 179 13 entry entry NN 2163 179 14 segmented segment VBD 2163 179 15 into into IN 2163 179 16 six six CD 2163 179 17 fields field NNS 2163 179 18 : : : 2163 179 19 Author author NN 2163 179 20 , , , 2163 179 21 Title Title NNP 2163 179 22 , , , 2163 179 23 Journal Journal NNP 2163 179 24 , , , 2163 179 25 Volume volume NN 2163 179 26 , , , 2163 179 27 Date Date NNP 2163 179 28 and and CC 2163 179 29 Pages Pages NNPS 2163 179 30 . . . 
2163 180 1 Table table NN 2163 180 2 2 2 CD 2163 180 3 presents present VBZ 2163 180 4 the the DT 2163 180 5 comparative comparative JJ 2163 180 6 test test NN 2163 180 7 results result NNS 2163 180 8 achieved achieve VBN 2163 180 9 by by IN 2163 180 10 ParsCit ParsCit NNP 2163 180 11 , , , 2163 180 12 FLUX FLUX NNP 2163 180 13 - - HYPH 2163 180 14 CIM CIM NNP 2163 180 15 , , , 2163 180 16 and and CC 2163 180 17 our -PRON- PRP$ 2163 180 18 approach approach NN 2163 180 19 on on IN 2163 180 20 the the DT 2163 180 21 CS CS NNP 2163 180 22 dataset dataset NN 2163 180 23 . . . 2163 181 1 Similar similar JJ 2163 181 2 to to IN 2163 181 3 the the DT 2163 181 4 CORA CORA NNP 2163 181 5 dataset dataset NN 2163 181 6 , , , 2163 181 7 our -PRON- PRP$ 2163 181 8 chunker chunker NN 2163 181 9 outperformed outperform VBD 2163 181 10 the the DT 2163 181 11 other other JJ 2163 181 12 chunkers chunker NNS 2163 181 13 on on IN 2163 181 14 the the DT 2163 181 15 majority majority NN 2163 181 16 of of IN 2163 181 17 the the DT 2163 181 18 fields field NNS 2163 181 19 , , , 2163 181 20 exceptions exception NNS 2163 181 21 being be VBG 2163 181 22 the the DT 2163 181 23 Location Location NNP 2163 181 24 , , , 2163 181 25 Journal Journal NNP 2163 181 26 , , , 2163 181 27 and and CC 2163 181 28 Publisher Publisher NNP 2163 181 29 fields field NNS 2163 181 30 . . . 2163 182 1 The the DT 2163 182 2 test test NN 2163 182 3 results result VBZ 2163 182 4 on on IN 2163 182 5 the the DT 2163 182 6 HS HS NNP 2163 182 7 dataset dataset NN 2163 182 8 are be VBP 2163 182 9 presented present VBN 2163 182 10 in in IN 2163 182 11 table table NN 2163 182 12 3 3 CD 2163 182 13 . . . 
2163 183 1 Here here RB 2163 183 2 we -PRON- PRP 2163 183 3 can can MD 2163 183 4 observe observe VB 2163 183 5 a a DT 2163 183 6 clear clear JJ 2163 183 7 performance performance NN 2163 183 8 improvement improvement NN 2163 183 9 on on IN 2163 183 10 all all DT 2163 183 11 fields field NNS 2163 183 12 , , , 2163 183 13 in in IN 2163 183 14 some some DT 2163 183 15 cases case NNS 2163 183 16 the the DT 2163 183 17 difference difference NN 2163 183 18 being be VBG 2163 183 19 significant significant JJ 2163 183 20 , , , 2163 183 21 e.g. e.g. RB 2163 183 22 , , , 2163 183 23 the the DT 2163 183 24 Title title NN 2163 183 25 field field NN 2163 183 26 , , , 2163 183 27 from from IN 2163 183 28 85.01 85.01 CD 2163 183 29 percent percent NN 2163 183 30 to to IN 2163 183 31 97.39 97.39 CD 2163 183 32 percent percent NN 2163 183 33 , , , 2163 183 34 or or CC 2163 183 35 the the DT 2163 183 36 Journal Journal NNP 2163 183 37 field field NN 2163 183 38 , , , 2163 183 39 from from IN 2163 183 40 93.12 93.12 CD 2163 183 41 percent percent NN 2163 183 42 to to IN 2163 183 43 99.28 99.28 CD 2163 183 44 percent percent NN 2163 183 45 . . . 2163 184 1 This this DT 2163 184 2 increase increase NN 2163 184 3 is be VBZ 2163 184 4 even even RB 2163 184 5 more more RBR 2163 184 6 relevant relevant JJ 2163 184 7 considering consider VBG 2163 184 8 the the DT 2163 184 9 size size NN 2163 184 10 of of IN 2163 184 11 the the DT 2163 184 12 dataset dataset NN 2163 184 13 , , , 2163 184 14 each each DT 2163 184 15 1percent 1percent CD 2163 184 16 representing represent VBG 2163 184 17 twenty twenty CD 2163 184 18 references reference NNS 2163 184 19 . . . 
2163 185 1 4.3 4.3 CD 2163 185 2 Dataset dataset NN 2163 185 3 : : : 2163 185 4 CS CS NNP 2163 185 5 - - HYPH 2163 185 6 SW SW NNP 2163 185 7 While while IN 2163 185 8 the the DT 2163 185 9 CORA CORA NNP 2163 185 10 and and CC 2163 185 11 FLUX FLUX NNP 2163 185 12 - - HYPH 2163 185 13 CIM CIM NNP 2163 185 14 CS CS NNP 2163 185 15 datasets dataset NNS 2163 185 16 do do VBP 2163 185 17 focus focus VB 2163 185 18 on on IN 2163 185 19 the the DT 2163 185 20 computer computer NN 2163 185 21 science science NN 2163 185 22 field field NN 2163 185 23 , , , 2163 185 24 they -PRON- PRP 2163 185 25 do do VBP 2163 185 26 not not RB 2163 185 27 cover cover VB 2163 185 28 the the DT 2163 185 29 slight slight JJ 2163 185 30 differences difference NNS 2163 185 31 in in IN 2163 185 32 reference reference NN 2163 185 33 format format NN 2163 185 34 that that WDT 2163 185 35 can can MD 2163 185 36 be be VB 2163 185 37 found find VBN 2163 185 38 nowadays nowadays RB 2163 185 39 in in IN 2163 185 40 the the DT 2163 185 41 Semantic semantic JJ 2163 185 42 Web web NN 2163 185 43 community community NN 2163 185 44 . . . 
2163 186 1 Consequently consequently RB 2163 186 2 , , , 2163 186 3 to to TO 2163 186 4 show show VB 2163 186 5 the the DT 2163 186 6 even even RB 2163 186 7 broader broad JJR 2163 186 8 application application NN 2163 186 9 of of IN 2163 186 10 our -PRON- PRP$ 2163 186 11 approach approach NN 2163 186 12 , , , 2163 186 13 we -PRON- PRP 2163 186 14 have have VBP 2163 186 15 compiled compile VBN 2163 186 16 a a DT 2163 186 17 dataset dataset NN 2163 186 18 named name VBN 2163 186 19 CS CS NNP 2163 186 20 - - HYPH 2163 186 21 SW SW NNP 2163 186 22 comprising comprise VBG 2163 186 23 576 576 CD 2163 186 24 reference reference NN 2163 186 25 strings string NNS 2163 186 26 randomly randomly RB 2163 186 27 selected select VBN 2163 186 28 from from IN 2163 186 29 publications publication NNS 2163 186 30 in in IN 2163 186 31 the the DT 2163 186 32 Semantic Semantic NNP 2163 186 33 Web Web NNP 2163 186 34 area area NN 2163 186 35 , , , 2163 186 36 from from IN 2163 186 37 conferences conference NNS 2163 186 38 such such JJ 2163 186 39 as as IN 2163 186 40 International International NNP 2163 186 41 Semantic Semantic NNP 2163 186 42 Web Web NNP 2163 186 43 Conference Conference NNP 2163 186 44 ( ( -LRB- 2163 186 45 ISWC ISWC NNP 2163 186 46 ) ) -RRB- 2163 186 47 , , , 2163 186 48 the the DT 2163 186 49 European European NNP 2163 186 50 Semantic Semantic NNP 2163 186 51 Web Web NNP 2163 186 52 Conference Conference NNP 2163 186 53 ( ( -LRB- 2163 186 54 ESWC ESWC NNP 2163 186 55 ) ) -RRB- 2163 186 56 , , , 2163 186 57 the the DT 2163 186 58 World World NNP 2163 186 59 Wide Wide NNP 2163 186 60 Web Web NNP 2163 186 61 Conference Conference NNP 2163 186 62 ( ( -LRB- 2163 186 63 WWW WWW NNP 2163 186 64 ) ) -RRB- 2163 186 65 , , , 2163 186 66 or or CC 2163 186 67 the the DT 2163 186 68 European European NNP 2163 186 69 Conference Conference NNP 2163 186 70 on on IN 2163 186 71 Knowledge Knowledge NNP 2163 186 72 Acquisition acquisition NN 2163 186 73 ( ( -LRB- 2163 186 74 and and 
CC 2163 186 75 co co JJ 2163 186 76 - - JJ 2163 186 77 located locate VBN 2163 186 78 workshops workshop NNS 2163 186 79 ) ) -RRB- 2163 186 80 . . . 2163 187 1 45 45 CD 2163 187 2 Each each DT 2163 187 3 reference reference NN 2163 187 4 entry entry NN 2163 187 5 is be VBZ 2163 187 6 segmented segment VBN 2163 187 7 into into IN 2163 187 8 twelve twelve CD 2163 187 9 fields field NNS 2163 187 10 : : : 2163 187 11 Author author NN 2163 187 12 , , , 2163 187 13 Title Title NNP 2163 187 14 , , , 2163 187 15 Conference Conference NNP 2163 187 16 , , , 2163 187 17 Workshop Workshop NNP 2163 187 18 , , , 2163 187 19 Journal Journal NNP 2163 187 20 , , , 2163 187 21 Techrep Techrep NNP 2163 187 22 , , , 2163 187 23 Organization Organization NNP 2163 187 24 , , , 2163 187 25 Publisher Publisher NNP 2163 187 26 , , , 2163 187 27 Date date NN 2163 187 28 , , , 2163 187 29 Pages Pages NNPS 2163 187 30 , , , 2163 187 31 Website Website NNP 2163 187 32 and and CC 2163 187 33 Location Location NNP 2163 187 34 . . . 2163 188 1 Table table NN 2163 188 2 4 4 CD 2163 188 3 shows show VBZ 2163 188 4 the the DT 2163 188 5 results result NNS 2163 188 6 of of IN 2163 188 7 the the DT 2163 188 8 tests test NNS 2163 188 9 carried carry VBN 2163 188 10 out out RP 2163 188 11 on on IN 2163 188 12 this this DT 2163 188 13 dataset dataset NN 2163 188 14 . . . 
2163 189 1 One one PRP 2163 189 2 can can MD 2163 189 3 easily easily RB 2163 189 4 observe observe VB 2163 189 5 that that IN 2163 189 6 the the DT 2163 189 7 chunker chunker NN 2163 189 8 performed perform VBN 2163 189 9 in in IN 2163 189 10 a a DT 2163 189 11 similar similar JJ 2163 189 12 manner manner NN 2163 189 13 as as IN 2163 189 14 on on IN 2163 189 15 the the DT 2163 189 16 CORA CORA NNP 2163 189 17 dataset dataset NN 2163 189 18 , , , 2163 189 19 with with IN 2163 189 20 emphasis emphasis NN 2163 189 21 on on IN 2163 189 22 the the DT 2163 189 23 Author author NN 2163 189 24 , , , 2163 189 25 Date Date NNP 2163 189 26 , , , 2163 189 27 Pages Pages NNPS 2163 189 28 and and CC 2163 189 29 Publisher Publisher NNP 2163 189 30 fields field NNS 2163 189 31 . . . 2163 190 1 Field field VB 2163 190 2 Our -PRON- PRP$ 2163 190 3 approach approach NN 2163 190 4 P P NNP 2163 190 5 R R NNP 2163 190 6 F1 F1 NNP 2163 190 7 Author author NN 2163 190 8 98.61 98.61 CD 2163 190 9 99.27 99.27 CD 2163 190 10 98.93 98.93 CD 2163 190 11 Title Title NNP 2163 190 12 94.91 94.91 CD 2163 190 13 93.29 93.29 CD 2163 190 14 94.09 94.09 CD 2163 190 15 Date date NN 2163 190 16 98.89 98.89 CD 2163 190 17 98.34 98.34 CD 2163 190 18 98.61 98.61 CD 2163 190 19 Pages Pages NNPS 2163 190 20 98.94 98.94 CD 2163 190 21 97.24 97.24 CD 2163 190 22 98.08 98.08 CD 2163 190 23 Location Location NNP 2163 190 24 93.9 93.9 CD 2163 190 25 92.77 92.77 CD 2163 190 26 93.33 93.33 CD 2163 190 27 Organization Organization NNP 2163 190 28 85.71 85.71 CD 2163 190 29 80 80 CD 2163 190 30 00 00 CD 2163 190 31 82.75 82.75 CD 2163 190 32 Journal Journal NNP 2163 190 33 94.59 94.59 CD 2163 190 34 93.33 93.33 CD 2163 190 35 93.95 93.95 CD 2163 190 36 INFORMATION INFORMATION NNP 2163 190 37 TECHNOLOGY technology NN 2163 190 38 AND and CC 2163 190 39 LIBRARIES LIBRARIES NNP 2163 190 40 | | NNP 2163 190 41 JUNE JUNE NNP 2163 190 42 2012 2012 CD 2163 190 43 17 17 CD 2163 190 44 Conference Conference NNP 2163 190 45 
96.66 96.66 CD 2163 190 46 95.08 95.08 CD 2163 190 47 95.86 95.86 CD 2163 190 48 Workshop Workshop NNP 2163 190 49 83.33 83.33 CD 2163 190 50 88.23 88.23 CD 2163 190 51 85.71 85.71 CD 2163 190 52 Publisher Publisher NNP 2163 190 53 96.61 96.61 CD 2163 190 54 97.43 97.43 CD 2163 190 55 97.01 97.01 CD 2163 190 56 Tech Tech NNP 2163 190 57 . . . 2163 191 1 rep rep NNP 2163 191 2 . . NNP 2163 191 3 100 100 CD 2163 191 4 80 80 CD 2163 191 5 88.88 88.88 CD 2163 191 6 Website Website NNP 2163 191 7 98.14 98.14 CD 2163 191 8 94.64 94.64 CD 2163 191 9 96.35 96.35 CD 2163 191 10 Table Table NNP 2163 191 11 4 4 CD 2163 191 12 . . . 2163 192 1 Evaluation Evaluation NNP 2163 192 2 Results Results NNP 2163 192 3 on on IN 2163 192 4 the the DT 2163 192 5 CS CS NNP 2163 192 6 - - HYPH 2163 192 7 SW SW NNP 2163 192 8 Dataset Dataset NNP 2163 192 9 5 5 CD 2163 192 10 . . . 2163 193 1 CONCLUSION CONCLUSION NNP 2163 193 2 In in IN 2163 193 3 this this DT 2163 193 4 paper paper NN 2163 193 5 we -PRON- PRP 2163 193 6 presented present VBD 2163 193 7 a a DT 2163 193 8 novel novel JJ 2163 193 9 approach approach NN 2163 193 10 for for IN 2163 193 11 extracting extract VBG 2163 193 12 and and CC 2163 193 13 chunking chunk VBG 2163 193 14 reference reference NN 2163 193 15 information information NN 2163 193 16 from from IN 2163 193 17 scientific scientific JJ 2163 193 18 publications publication NNS 2163 193 19 . . . 
2163 194 1 The the DT 2163 194 2 solution solution NN 2163 194 3 , , , 2163 194 4 realized realize VBD 2163 194 5 using use VBG 2163 194 6 a a DT 2163 194 7 CRF CRF NNP 2163 194 8 trained train VBN 2163 194 9 chunker chunker NN 2163 194 10 , , , 2163 194 11 achieved achieve VBD 2163 194 12 good good JJ 2163 194 13 results result NNS 2163 194 14 in in IN 2163 194 15 the the DT 2163 194 16 experimental experimental JJ 2163 194 17 evaluation evaluation NN 2163 194 18 , , , 2163 194 19 in in IN 2163 194 20 addition addition NN 2163 194 21 to to IN 2163 194 22 an an DT 2163 194 23 increased increase VBN 2163 194 24 versatility versatility NN 2163 194 25 shown show VBN 2163 194 26 by by IN 2163 194 27 applying apply VBG 2163 194 28 the the DT 2163 194 29 one one CD 2163 194 30 - - HYPH 2163 194 31 time time NN 2163 194 32 trained train VBN 2163 194 33 chunker chunker NN 2163 194 34 on on IN 2163 194 35 multiple multiple JJ 2163 194 36 testing testing NN 2163 194 37 datasets dataset NNS 2163 194 38 . . . 2163 195 1 This this DT 2163 195 2 enables enable VBZ 2163 195 3 a a DT 2163 195 4 straightforward straightforward JJ 2163 195 5 adoption adoption NN 2163 195 6 and and CC 2163 195 7 reuse reuse NN 2163 195 8 of of IN 2163 195 9 our -PRON- PRP$ 2163 195 10 solution solution NN 2163 195 11 for for IN 2163 195 12 generating generate VBG 2163 195 13 semantic semantic JJ 2163 195 14 metadata metadata NN 2163 195 15 in in IN 2163 195 16 any any DT 2163 195 17 digital digital JJ 2163 195 18 library library NN 2163 195 19 or or CC 2163 195 20 publication publication NN 2163 195 21 repository repository NN 2163 195 22 focused focus VBD 2163 195 23 on on IN 2163 195 24 scientific scientific JJ 2163 195 25 publishing publishing NN 2163 195 26 . . . 
2163 196 1 As as IN 2163 196 2 next next JJ 2163 196 3 steps step NNS 2163 196 4 , , , 2163 196 5 we -PRON- PRP 2163 196 6 plan plan VBP 2163 196 7 to to TO 2163 196 8 create create VB 2163 196 9 a a DT 2163 196 10 comprehensive comprehensive JJ 2163 196 11 dataset dataset NN 2163 196 12 covering cover VBG 2163 196 13 multiple multiple JJ 2163 196 14 heterogeneous heterogeneous JJ 2163 196 15 domains domain NNS 2163 196 16 ( ( -LRB- 2163 196 17 e.g. e.g. RB 2163 196 18 , , , 2163 196 19 social social JJ 2163 196 20 sciences sciences NNPS 2163 196 21 or or CC 2163 196 22 digital digital JJ 2163 196 23 humanities humanity NNS 2163 196 24 ) ) -RRB- 2163 196 25 and and CC 2163 196 26 evaluate evaluate VB 2163 196 27 the the DT 2163 196 28 chunker chunker NN 2163 196 29 ’s ’s POS 2163 196 30 performance performance NN 2163 196 31 on on IN 2163 196 32 it -PRON- PRP 2163 196 33 . . . 2163 197 1 Then then RB 2163 197 2 we -PRON- PRP 2163 197 3 will will MD 2163 197 4 focus focus VB 2163 197 5 on on IN 2163 197 6 developing develop VBG 2163 197 7 an an DT 2163 197 8 accurate accurate JJ 2163 197 9 reference reference NN 2163 197 10 consolidation consolidation NN 2163 197 11 and and CC 2163 197 12 linking linking NN 2163 197 13 technique technique NN 2163 197 14 , , , 2163 197 15 to to TO 2163 197 16 address address VB 2163 197 17 the the DT 2163 197 18 second second JJ 2163 197 19 step step NN 2163 197 20 mentioned mention VBN 2163 197 21 in in IN 2163 197 22 section section NN 2163 197 23 1 1 CD 2163 197 24 , , , 2163 197 25 i.e. i.e. FW 2163 197 26 , , , 2163 197 27 aligning align VBG 2163 197 28 the the DT 2163 197 29 resulting result VBG 2163 197 30 metadata metadata NN 2163 197 31 to to IN 2163 197 32 the the DT 2163 197 33 existing exist VBG 2163 197 34 Linked Linked NNP 2163 197 35 Data Data NNP 2163 197 36 on on IN 2163 197 37 the the DT 2163 197 38 Web web NN 2163 197 39 . . . 
2163 198 1 We -PRON- PRP 2163 198 2 plan plan VBP 2163 198 3 to to TO 2163 198 4 develop develop VB 2163 198 5 a a DT 2163 198 6 flexible flexible JJ 2163 198 7 consolidation consolidation NN 2163 198 8 mechanism mechanism NN 2163 198 9 by by IN 2163 198 10 dynamically dynamically RB 2163 198 11 generating generate VBG 2163 198 12 and and CC 2163 198 13 executing execute VBG 2163 198 14 SPARQL sparql JJ 2163 198 15 queries query NNS 2163 198 16 from from IN 2163 198 17 chunked chunk VBN 2163 198 18 reference reference NN 2163 198 19 fields field NNS 2163 198 20 and and CC 2163 198 21 filtering filter VBG 2163 198 22 the the DT 2163 198 23 results result NNS 2163 198 24 via via IN 2163 198 25 two two CD 2163 198 26 string string NN 2163 198 27 approximation approximation NN 2163 198 28 metrics metric NNS 2163 198 29 ( ( -LRB- 2163 198 30 a a DT 2163 198 31 combination combination NN 2163 198 32 of of IN 2163 198 33 Monge Monge NNP 2163 198 34 - - HYPH 2163 198 35 Elkan Elkan NNP 2163 198 36 and and CC 2163 198 37 Chapman Chapman NNP 2163 198 38 Soundex Soundex NNP 2163 198 39 algorithms algorithm NNS 2163 198 40 ) ) -RRB- 2163 198 41 . . . 2163 199 1 The the DT 2163 199 2 SPARQL sparql NN 2163 199 3 queries query NNS 2163 199 4 generation generation NN 2163 199 5 will will MD 2163 199 6 be be VB 2163 199 7 implemented implement VBN 2163 199 8 in in IN 2163 199 9 an an DT 2163 199 10 extensible extensible JJ 2163 199 11 manner manner NN 2163 199 12 , , , 2163 199 13 via via IN 2163 199 14 customizable customizable JJ 2163 199 15 query query NN 2163 199 16 modules module NNS 2163 199 17 , , , 2163 199 18 to to TO 2163 199 19 accommodate accommodate VB 2163 199 20 the the DT 2163 199 21 heterogeneous heterogeneous JJ 2163 199 22 nature nature NN 2163 199 23 of of IN 2163 199 24 the the DT 2163 199 25 diverse diverse JJ 2163 199 26 Linked Linked NNP 2163 199 27 Data Data NNP 2163 199 28 sources source NNS 2163 199 29 . . . 
2163 200 1 Finally finally RB 2163 200 2 , , , 2163 200 3 we -PRON- PRP 2163 200 4 intend intend VBP 2163 200 5 to to TO 2163 200 6 develop develop VB 2163 200 7 an an DT 2163 200 8 overlay overlay NN 2163 200 9 interface interface NN 2163 200 10 for for IN 2163 200 11 arbitrary arbitrary JJ 2163 200 12 online online JJ 2163 200 13 publication publication NN 2163 200 14 repositories repository NNS 2163 200 15 , , , 2163 200 16 to to TO 2163 200 17 enable enable VB 2163 200 18 on on IN 2163 200 19 - - HYPH 2163 200 20 the the DT 2163 200 21 - - HYPH 2163 200 22 fly fly NN 2163 200 23 creation creation NN 2163 200 24 , , , 2163 200 25 visualization visualization NN 2163 200 26 , , , 2163 200 27 and and CC 2163 200 28 linking linking NN 2163 200 29 of of IN 2163 200 30 semantic semantic JJ 2163 200 31 metadata metadata NN 2163 200 32 from from IN 2163 200 33 repositories repository NNS 2163 200 34 that that WDT 2163 200 35 currently currently RB 2163 200 36 do do VBP 2163 200 37 not not RB 2163 200 38 expose expose VB 2163 200 39 their -PRON- PRP$ 2163 200 40 datasets dataset NNS 2163 200 41 in in IN 2163 200 42 a a DT 2163 200 43 semantic semantic JJ 2163 200 44 / / SYM 2163 200 45 linked link VBN 2163 200 46 manner manner NN 2163 200 47 . . . 2163 201 1 ACKNOWLEDGEMENTS ACKNOWLEDGEMENTS NNP 2163 201 2 The the DT 2163 201 3 work work NN 2163 201 4 presented present VBN 2163 201 5 in in IN 2163 201 6 this this DT 2163 201 7 paper paper NN 2163 201 8 has have VBZ 2163 201 9 been be VBN 2163 201 10 funded fund VBN 2163 201 11 by by IN 2163 201 12 Science Science NNP 2163 201 13 Foundation Foundation NNP 2163 201 14 Ireland Ireland NNP 2163 201 15 under under IN 2163 201 16 Grant Grant NNP 2163 201 17 No No NNP 2163 201 18 . . . 2163 202 1 SFI/08 sfi/08 JJ 2163 202 2 / / SYM 2163 202 3 CE CE NNP 2163 202 4 / / SYM 2163 202 5 I1380 I1380 NNP 2163 202 6 ( ( -LRB- 2163 202 7 Lion-2 Lion-2 NNP 2163 202 8 ) ) -RRB- 2163 202 9 . . . 
2163 203 1 REFERENCES reference NNS 2163 203 2 AND and CC 2163 203 3 NOTES note NNS 2163 203 4 1 1 CD 2163 203 5 . . . 2163 204 1 Tim Tim NNP 2163 204 2 Berners Berners NNP 2163 204 3 - - HYPH 2163 204 4 Lee Lee NNP 2163 204 5 et et NNP 2163 204 6 al al NNP 2163 204 7 . . NNP 2163 204 8 , , , 2163 204 9 “ " `` 2163 204 10 The the DT 2163 204 11 Semantic semantic JJ 2163 204 12 Web web NN 2163 204 13 , , , 2163 204 14 ” " '' 2163 204 15 Scientific scientific JJ 2163 204 16 American American NNP 2163 204 17 284 284 CD 2163 204 18 ( ( -LRB- 2163 204 19 2001 2001 CD 2163 204 20 ) ) -RRB- 2163 204 21 : : : 2163 204 22 35–43 35–43 NN 2163 204 23 . . . 2163 205 1 2 2 LS 2163 205 2 . . . 2163 206 1 Christian Christian NNP 2163 206 2 Bizer Bizer NNP 2163 206 3 et et NNP 2163 206 4 al al NNP 2163 206 5 . . NNP 2163 206 6 , , , 2163 206 7 “ " `` 2163 206 8 Linked link VBN 2163 206 9 Data Data NNP 2163 206 10 — — : 2163 206 11 The The NNP 2163 206 12 Story Story NNP 2163 206 13 So so RB 2163 206 14 Far far RB 2163 206 15 , , , 2163 206 16 ” " '' 2163 206 17 International International NNP 2163 206 18 Journal Journal NNP 2163 206 19 on on IN 2163 206 20 Semantic Semantic NNP 2163 206 21 Web Web NNP 2163 206 22 and and CC 2163 206 23 Information Information NNP 2163 206 24 Systems Systems NNP 2163 206 25 5 5 CD 2163 206 26 ( ( -LRB- 2163 206 27 2009 2009 CD 2163 206 28 ) ) -RRB- 2163 206 29 : : : 2163 206 30 1–22 1–22 CD 2163 206 31 . . . 2163 207 1 3 3 LS 2163 207 2 . . . 
2163 208 1 Generating generate VBG 2163 208 2 computer computer NN 2163 208 3 - - HYPH 2163 208 4 understandable understandable JJ 2163 208 5 metadata metadata NN 2163 208 6 represents represent VBZ 2163 208 7 an an DT 2163 208 8 issue issue NN 2163 208 9 , , , 2163 208 10 in in IN 2163 208 11 general general JJ 2163 208 12 , , , 2163 208 13 in in IN 2163 208 14 the the DT 2163 208 15 publishing publishing NN 2163 208 16 domain domain NN 2163 208 17 , , , 2163 208 18 and and CC 2163 208 19 not not RB 2163 208 20 necessarily necessarily RB 2163 208 21 only only RB 2163 208 22 in in IN 2163 208 23 its -PRON- PRP$ 2163 208 24 scientific scientific JJ 2163 208 25 area area NN 2163 208 26 . . . 2163 209 1 However however RB 2163 209 2 , , , 2163 209 3 the the DT 2163 209 4 relevant relevant JJ 2163 209 5 literature literature NN 2163 209 6 dealing deal VBG 2163 209 7 with with IN 2163 209 8 metadata metadata NN 2163 209 9 extraction extraction NN 2163 209 10 / / SYM 2163 209 11 generation generation NN 2163 209 12 has have VBZ 2163 209 13 focused focus VBN 2163 209 14 on on IN 2163 209 15 scientific scientific JJ 2163 209 16 publishing publishing NN 2163 209 17 , , , 2163 209 18 because because IN 2163 209 19 of of IN 2163 209 20 its -PRON- PRP$ 2163 209 21 accelerated accelerated JJ 2163 209 22 growing grow VBG 2163 209 23 rate rate NN 2163 209 24 , , , 2163 209 25 especially especially RB 2163 209 26 with with IN 2163 209 27 the the DT 2163 209 28 increasing increase VBG 2163 209 29 use use NN 2163 209 30 of of IN 2163 209 31 the the DT 2163 209 32 World World NNP 2163 209 33 Wide Wide NNP 2163 209 34 Web Web NNP 2163 209 35 as as IN 2163 209 36 a a DT 2163 209 37 dissemination dissemination NN 2163 209 38 mechanism mechanism NN 2163 209 39 . . . 
2163 210 1 4 4 CD 2163 210 2 . . . 2163 211 1 Knud Knud NNP 2163 211 2 Moeller Moeller NNP 2163 211 3 et et FW 2163 211 4 al al NNP 2163 211 5 . . NNP 2163 211 6 , , , 2163 211 7 “ " `` 2163 211 8 Recipes recipe NNS 2163 211 9 for for IN 2163 211 10 Semantic Semantic NNP 2163 211 11 Web Web NNP 2163 211 12 Dog Dog NNP 2163 211 13 Food Food NNP 2163 211 14 – – : 2163 211 15 The the DT 2163 211 16 ESWC eswc NN 2163 211 17 and and CC 2163 211 18 ISWC ISWC NNP 2163 211 19 Metadata Metadata NNP 2163 211 20 Projects Projects NNPS 2163 211 21 , , , 2163 211 22 ” " '' 2163 211 23 Proceedings proceeding NNS 2163 211 24 of of IN 2163 211 25 the the DT 2163 211 26 6th 6th JJ 2163 211 27 International International NNP 2163 211 28 Semantic Semantic NNP 2163 211 29 Web Web NNP 2163 211 30 Conference Conference NNP 2163 211 31 ( ( -LRB- 2163 211 32 Busan Busan NNP 2163 211 33 , , , 2163 211 34 Korea Korea NNP 2163 211 35 , , , 2163 211 36 2007 2007 CD 2163 211 37 ) ) -RRB- 2163 211 38 . . . 2163 212 1 5 5 CD 2163 212 2 . . . 
2163 213 1 Wei Wei NNP 2163 213 2 Peng Peng NNP 2163 213 3 and and CC 2163 213 4 Tao Tao NNP 2163 213 5 Li Li NNP 2163 213 6 , , , 2163 213 7 “ " `` 2163 213 8 Temporal temporal JJ 2163 213 9 relation relation NN 2163 213 10 co co JJ 2163 213 11 - - JJ 2163 213 12 clustering clustering NN 2163 213 13 on on IN 2163 213 14 directional directional JJ 2163 213 15 social social JJ 2163 213 16 network network NN 2163 213 17 and and CC 2163 213 18 author author NN 2163 213 19 - - HYPH 2163 213 20 topic topic NN 2163 213 21 evolution evolution NN 2163 213 22 , , , 2163 213 23 ” " '' 2163 213 24 Knowledge Knowledge NNP 2163 213 25 and and CC 2163 213 26 Information Information NNP 2163 213 27 Systems Systems NNP 2163 213 28 26 26 CD 2163 213 29 ( ( -LRB- 2163 213 30 2011 2011 CD 2163 213 31 ) ) -RRB- 2163 213 32 : : : 2163 213 33 467–86 467–86 CD 2163 213 34 . . . 2163 214 1 6 6 CD 2163 214 2 . . . 2163 215 1 Laszlo Laszlo NNP 2163 215 2 Barabasi Barabasi NNP 2163 215 3 et et FW 2163 215 4 al al NNP 2163 215 5 . . NNP 2163 215 6 , , , 2163 215 7 “ " `` 2163 215 8 Evolution evolution NN 2163 215 9 of of IN 2163 215 10 the the DT 2163 215 11 social social JJ 2163 215 12 network network NN 2163 215 13 of of IN 2163 215 14 scientific scientific JJ 2163 215 15 collaborations collaboration NNS 2163 215 16 , , , 2163 215 17 ” " '' 2163 215 18 Physica Physica NNP 2163 215 19 A A NNP 2163 215 20 : : : 2163 215 21 Statistical Statistical NNP 2163 215 22 Mechanics Mechanics NNPS 2163 215 23 and and CC 2163 215 24 its -PRON- PRP$ 2163 215 25 Applications Applications NNPS 2163 215 26 311 311 CD 2163 215 27 ( ( -LRB- 2163 215 28 2002 2002 CD 2163 215 29 ) ) -RRB- 2163 215 30 : : : 2163 215 31 590–614 590–614 CD 2163 215 32 . . . 2163 216 1 7 7 LS 2163 216 2 . . . 2163 217 1 Xiaoming Xiaoming NNP 2163 217 2 Liu Liu NNP 2163 217 3 et et NNP 2163 217 4 al al NNP 2163 217 5 . . 
NNP 2163 217 6 , , , 2163 217 7 “ " `` 2163 217 8 Co co NN 2163 217 9 - - NN 2163 217 10 authorship authorship NN 2163 217 11 networks network NNS 2163 217 12 in in IN 2163 217 13 the the DT 2163 217 14 digital digital JJ 2163 217 15 library library NN 2163 217 16 research research NN 2163 217 17 community community NN 2163 217 18 , , , 2163 217 19 ” " '' 2163 217 20 Information Information NNP 2163 217 21 Processing Processing NNP 2163 217 22 & & CC 2163 217 23 Management Management NNP 2163 217 24 41 41 CD 2163 217 25 ( ( -LRB- 2163 217 26 2005 2005 CD 2163 217 27 ) ) -RRB- 2163 217 28 : : : 2163 217 29 1462–80 1462–80 LS 2163 217 30 . . . 2163 218 1 8 8 LS 2163 218 2 . . . 2163 219 1 John John NNP 2163 219 2 D. D. NNP 2163 219 3 Lafferty Lafferty NNP 2163 219 4 et et NNP 2163 219 5 al al NNP 2163 219 6 . . NNP 2163 219 7 , , , 2163 219 8 “ " `` 2163 219 9 Conditional Conditional NNP 2163 219 10 Random Random NNP 2163 219 11 Fields Fields NNPS 2163 219 12 : : : 2163 219 13 Probabilistic Probabilistic NNP 2163 219 14 Models Models NNPS 2163 219 15 for for IN 2163 219 16 Segmenting segmenting NN 2163 219 17 and and CC 2163 219 18 Labeling Labeling NNP 2163 219 19 Sequence Sequence NNP 2163 219 20 Data Data NNP 2163 219 21 , , , 2163 219 22 ” " '' 2163 219 23 Proceedings proceeding NNS 2163 219 24 of of IN 2163 219 25 the the DT 2163 219 26 18th 18th JJ 2163 219 27 International International NNP 2163 219 28 Conference Conference NNP 2163 219 29 on on IN 2163 219 30 Machine Machine NNP 2163 219 31 Learning Learning NNP 2163 219 32 ( ( -LRB- 2163 219 33 San San NNP 2163 219 34 Francisco Francisco NNP 2163 219 35 , , , 2163 219 36 CA CA NNP 2163 219 37 , , , 2163 219 38 USA USA NNP 2163 219 39 , , , 2163 219 40 2001 2001 CD 2163 219 41 ) ) -RRB- 2163 219 42 : : : 2163 219 43 282–89 282–89 CD 2163 219 44 . . . 2163 220 1 9 9 CD 2163 220 2 . . . 
2163 221 1 Vladimir Vladimir NNP 2163 221 2 Vapnik Vapnik NNP 2163 221 3 , , , 2163 221 4 The the DT 2163 221 5 Nature Nature NNP 2163 221 6 of of IN 2163 221 7 Statistical statistical JJ 2163 221 8 Learning Learning NNP 2163 221 9 Theory Theory NNP 2163 221 10 ( ( -LRB- 2163 221 11 New New NNP 2163 221 12 York York NNP 2163 221 13 : : : 2163 221 14 Springer Springer NNP 2163 221 15 , , , 2163 221 16 1995 1995 CD 2163 221 17 ) ) -RRB- 2163 221 18 . . . 2163 222 1 10 10 CD 2163 222 2 . . . 2163 223 1 Isaac Isaac NNP 2163 223 2 G. G. NNP 2163 223 3 Councill Councill NNP 2163 223 4 et et FW 2163 223 5 al al NNP 2163 223 6 . . NNP 2163 223 7 , , , 2163 223 8 “ " `` 2163 223 9 ParsCit parscit NN 2163 223 10 : : : 2163 223 11 An an DT 2163 223 12 Open open JJ 2163 223 13 - - HYPH 2163 223 14 source source NN 2163 223 15 CRF CRF NNP 2163 223 16 Reference Reference NNP 2163 223 17 String String NNP 2163 223 18 Parsing Parsing NNP 2163 223 19 Package package NN 2163 223 20 , , , 2163 223 21 ” " '' 2163 223 22 Proceedings proceeding NNS 2163 223 23 of of IN 2163 223 24 the the DT 2163 223 25 Sixth Sixth NNP 2163 223 26 International International NNP 2163 223 27 Language Language NNP 2163 223 28 Resources Resources NNPS 2163 223 29 and and CC 2163 223 30 Evaluation Evaluation NNP 2163 223 31 ( ( -LRB- 2163 223 32 Marrakech Marrakech NNP 2163 223 33 , , , 2163 223 34 Morocco Morocco NNP 2163 223 35 , , , 2163 223 36 2008 2008 CD 2163 223 37 ) ) -RRB- 2163 223 38 . . . 2163 224 1 11 11 CD 2163 224 2 . . . 
2163 225 1 Yong Yong NNP 2163 225 2 Kiat Kiat NNP 2163 225 3 Ng Ng NNP 2163 225 4 , , , 2163 225 5 “ " `` 2163 225 6 Citation Citation NNP 2163 225 7 Parsing Parsing NNP 2163 225 8 Using use VBG 2163 225 9 Maximum Maximum NNP 2163 225 10 Entropy Entropy NNP 2163 225 11 and and CC 2163 225 12 Repairs Repairs NNPS 2163 225 13 ” " '' 2163 225 14 ( ( -LRB- 2163 225 15 master master NNP 2163 225 16 's 's POS 2163 225 17 thesis thesis NN 2163 225 18 , , , 2163 225 19 National National NNP 2163 225 20 University University NNP 2163 225 21 of of IN 2163 225 22 Singapore Singapore NNP 2163 225 23 , , , 2163 225 24 2004 2004 CD 2163 225 25 ) ) -RRB- 2163 225 26 . . . 2163 226 1 12 12 CD 2163 226 2 . . . 2163 227 1 Fuchun Fuchun NNP 2163 227 2 Peng Peng NNP 2163 227 3 and and CC 2163 227 4 Andrew Andrew NNP 2163 227 5 McCallum McCallum NNP 2163 227 6 , , , 2163 227 7 “ " `` 2163 227 8 Information Information NNP 2163 227 9 Extraction Extraction NNP 2163 227 10 from from IN 2163 227 11 Research Research NNP 2163 227 12 Papers Papers NNPS 2163 227 13 Using use VBG 2163 227 14 Conditional Conditional NNP 2163 227 15 Random Random NNP 2163 227 16 Fields Fields NNPS 2163 227 17 , , , 2163 227 18 ” " '' 2163 227 19 Information Information NNP 2163 227 20 Processing Processing NNP 2163 227 21 & & CC 2163 227 22 Management Management NNP 2163 227 23 42 42 CD 2163 227 24 ( ( -LRB- 2163 227 25 2006 2006 CD 2163 227 26 ) ) -RRB- 2163 227 27 : : : 2163 227 28 963–79 963–79 CD 2163 227 29 . . . 2163 228 1 13 13 CD 2163 228 2 . . . 2163 229 1 C. C. NNP 2163 229 2 Lee Lee NNP 2163 229 3 Giles Giles NNP 2163 229 4 et et NNP 2163 229 5 al al NNP 2163 229 6 . . 
NNP 2163 229 7 , , , 2163 229 8 “ " `` 2163 229 9 CiteSeer CiteSeer NNP 2163 229 10 : : : 2163 229 11 An an DT 2163 229 12 Automatic Automatic NNP 2163 229 13 Citation Citation NNP 2163 229 14 Indexing Indexing NNP 2163 229 15 System System NNP 2163 229 16 , , , 2163 229 17 ” " '' 2163 229 18 Proceedings proceeding NNS 2163 229 19 of of IN 2163 229 20 the the DT 2163 229 21 Third Third NNP 2163 229 22 ACM ACM NNP 2163 229 23 Conference Conference NNP 2163 229 24 on on IN 2163 229 25 Digital Digital NNP 2163 229 26 Libraries Libraries NNP 2163 229 27 ( ( -LRB- 2163 229 28 Pittsburgh Pittsburgh NNP 2163 229 29 , , , 2163 229 30 PA PA NNP 2163 229 31 , , , 2163 229 32 1998 1998 CD 2163 229 33 ) ) -RRB- 2163 229 34 : : : 2163 229 35 89–98 89–98 LS 2163 229 36 . . . 2163 230 1 14 14 CD 2163 230 2 . . . 2163 231 1 Kristie Kristie NNP 2163 231 2 Seymore Seymore NNP 2163 231 3 et et NNP 2163 231 4 al al NNP 2163 231 5 . . NNP 2163 231 6 , , , 2163 231 7 “ " `` 2163 231 8 Learning learn VBG 2163 231 9 Hidden Hidden NNP 2163 231 10 Markov Markov NNP 2163 231 11 Model Model NNP 2163 231 12 Structure Structure NNP 2163 231 13 for for IN 2163 231 14 Information Information NNP 2163 231 15 Extraction Extraction NNP 2163 231 16 , , , 2163 231 17 ” " '' 2163 231 18 Proceedings proceeding NNS 2163 231 19 of of IN 2163 231 20 the the DT 2163 231 21 AAAI AAAI NNP 2163 231 22 Workshop Workshop NNP 2163 231 23 on on IN 2163 231 24 Machine Machine NNP 2163 231 25 Learning Learning NNP 2163 231 26 for for IN 2163 231 27 Information Information NNP 2163 231 28 Extraction Extraction NNP 2163 231 29 ( ( -LRB- 2163 231 30 1999 1999 CD 2163 231 31 ) ) -RRB- 2163 231 32 : : : 2163 231 33 37 37 CD 2163 231 34 – – : 2163 231 35 42 42 CD 2163 231 36 . . . 2163 232 1 15 15 CD 2163 232 2 . . . 2163 233 1 Isaac Isaac NNP 2163 233 2 G. G. NNP 2163 233 3 Councill Councill NNP 2163 233 4 et et FW 2163 233 5 al al NNP 2163 233 6 . . 
NNP 2163 233 7 , , , 2163 233 8 “ " `` 2163 233 9 ParsCit parscit NN 2163 233 10 : : : 2163 233 11 An an DT 2163 233 12 Open open JJ 2163 233 13 - - HYPH 2163 233 14 source source NN 2163 233 15 CRF CRF NNP 2163 233 16 Reference Reference NNP 2163 233 17 String String NNP 2163 233 18 Parsing Parsing NNP 2163 233 19 Package package NN 2163 233 20 , , , 2163 233 21 ” " '' 2163 233 22 Proceedings proceeding NNS 2163 233 23 of of IN 2163 233 24 the the DT 2163 233 25 Sixth Sixth NNP 2163 233 26 International International NNP 2163 233 27 Language Language NNP 2163 233 28 Resources Resources NNPS 2163 233 29 and and CC 2163 233 30 Evaluation Evaluation NNP 2163 233 31 ( ( -LRB- 2163 233 32 Marrakech Marrakech NNP 2163 233 33 , , , 2163 233 34 Morocco Morocco NNP 2163 233 35 , , , 2163 233 36 2008 2008 CD 2163 233 37 ) ) -RRB- 2163 233 38 . . . 2163 234 1 16 16 CD 2163 234 2 . . . 2163 235 1 Hui Hui NNP 2163 235 2 Han Han NNP 2163 235 3 et et NNP 2163 235 4 al al NNP 2163 235 5 . . NNP 2163 235 6 , , , 2163 235 7 “ " `` 2163 235 8 Rule rule NN 2163 235 9 - - HYPH 2163 235 10 based base VBN 2163 235 11 Word Word NNP 2163 235 12 Clustering Clustering NNP 2163 235 13 for for IN 2163 235 14 Document document NN 2163 235 15 Metadata Metadata NNP 2163 235 16 Extraction Extraction NNP 2163 235 17 , , , 2163 235 18 ” " '' 2163 235 19 Proceedings proceeding NNS 2163 235 20 of of IN 2163 235 21 the the DT 2163 235 22 Symposium Symposium NNP 2163 235 23 on on IN 2163 235 24 Applied Applied NNP 2163 235 25 Computing Computing NNP 2163 235 26 ( ( -LRB- 2163 235 27 Santa Santa NNP 2163 235 28 Fe Fe NNP 2163 235 29 , , , 2163 235 30 New New NNP 2163 235 31 Mexico Mexico NNP 2163 235 32 , , , 2163 235 33 2005 2005 CD 2163 235 34 ) ) -RRB- 2163 235 35 . . . 2163 236 1 17 17 CD 2163 236 2 . . . 2163 237 1 Eli Eli NNP 2163 237 2 Cortez Cortez NNPS 2163 237 3 et et NNP 2163 237 4 al al NNP 2163 237 5 . . 
NNP 2163 237 6 , , , 2163 237 7 “ " `` 2163 237 8 FLUX FLUX NNP 2163 237 9 - - HYPH 2163 237 10 CIM CIM NNP 2163 237 11 : : : 2163 237 12 Flexible flexible JJ 2163 237 13 Unsupervised unsupervised JJ 2163 237 14 Extraction Extraction NNP 2163 237 15 of of IN 2163 237 16 Citation Citation NNP 2163 237 17 Metadata Metadata NNP 2163 237 18 , , , 2163 237 19 ” " '' 2163 237 20 Proceedings proceeding NNS 2163 237 21 of of IN 2163 237 22 the the DT 2163 237 23 2007 2007 CD 2163 237 24 Conference Conference NNP 2163 237 25 on on IN 2163 237 26 Digital Digital NNP 2163 237 27 Libraries Libraries NNPS 2163 237 28 ( ( -LRB- 2163 237 29 New New NNP 2163 237 30 York York NNP 2163 237 31 , , , 2163 237 32 2007 2007 CD 2163 237 33 ) ) -RRB- 2163 237 34 : : : 2163 237 35 215–24 215–24 CD 2163 237 36 . . . 2163 238 1 18 18 CD 2163 238 2 . . . 2163 239 1 Machine Machine NNP 2163 239 2 Learning Learning NNP 2163 239 3 methods method NNS 2163 239 4 can can MD 2163 239 5 be be VB 2163 239 6 broadly broadly RB 2163 239 7 classified classify VBN 2163 239 8 into into IN 2163 239 9 two two CD 2163 239 10 categories category NNS 2163 239 11 : : : 2163 239 12 supervised supervise VBN 2163 239 13 and and CC 2163 239 14 unsupervised unsupervised JJ 2163 239 15 . . . 2163 240 1 Supervised supervise VBN 2163 240 2 methods method NNS 2163 240 3 require require VBP 2163 240 4 training training NN 2163 240 5 on on IN 2163 240 6 specific specific JJ 2163 240 7 datasets dataset NNS 2163 240 8 that that WDT 2163 240 9 exhibit exhibit VBP 2163 240 10 the the DT 2163 240 11 characteristics characteristic NNS 2163 240 12 of of IN 2163 240 13 the the DT 2163 240 14 target target NN 2163 240 15 domain domain NN 2163 240 16 . . . 
2163 241 1 To to TO 2163 241 2 achieve achieve VB 2163 241 3 high high JJ 2163 241 4 accuracy accuracy NN 2163 241 5 levels level NNS 2163 241 6 , , , 2163 241 7 the the DT 2163 241 8 training training NN 2163 241 9 dataset dataset NN 2163 241 10 needs need VBZ 2163 241 11 to to TO 2163 241 12 be be VB 2163 241 13 reasonably reasonably RB 2163 241 14 large large JJ 2163 241 15 , , , 2163 241 16 and and CC 2163 241 17 more more RBR 2163 241 18 importantly importantly RB 2163 241 19 , , , 2163 241 20 it -PRON- PRP 2163 241 21 has have VBZ 2163 241 22 to to TO 2163 241 23 cover cover VB 2163 241 24 most most JJS 2163 241 25 of of IN 2163 241 26 the the DT 2163 241 27 possible possible JJ 2163 241 28 INFORMATION information JJ 2163 241 29 TECHNOLOGY technology NN 2163 241 30 AND and CC 2163 241 31 LIBRARIES LIBRARIES NNP 2163 241 32 | | NNP 2163 241 33 JUNE JUNE NNP 2163 241 34 2012 2012 CD 2163 241 35 19 19 CD 2163 241 36 exceptions exception NNS 2163 241 37 from from IN 2163 241 38 the the DT 2163 241 39 intrinsic intrinsic JJ 2163 241 40 data datum NNS 2163 241 41 patterns pattern NNS 2163 241 42 . . . 2163 242 1 Unlike unlike IN 2163 242 2 supervised supervise VBN 2163 242 3 methods method NNS 2163 242 4 , , , 2163 242 5 unsupervised unsupervised JJ 2163 242 6 methods method NNS 2163 242 7 do do VBP 2163 242 8 not not RB 2163 242 9 require require VB 2163 242 10 training training NN 2163 242 11 , , , 2163 242 12 and and CC 2163 242 13 in in IN 2163 242 14 principle principle NN 2163 242 15 , , , 2163 242 16 use use VB 2163 242 17 generic generic JJ 2163 242 18 rules rule NNS 2163 242 19 to to TO 2163 242 20 encode encode VB 2163 242 21 both both DT 2163 242 22 the the DT 2163 242 23 expected expected JJ 2163 242 24 patterns pattern NNS 2163 242 25 and and CC 2163 242 26 the the DT 2163 242 27 possible possible JJ 2163 242 28 exceptions exception NNS 2163 242 29 of of IN 2163 242 30 the the DT 2163 242 31 target target NN 2163 242 32 data datum NNS 2163 242 33 . . . 
2163 243 1 19 19 CD 2163 243 2 . . . 2163 244 1 Peng Peng NNP 2163 244 2 and and CC 2163 244 3 McCallum McCallum NNP 2163 244 4 , , , 2163 244 5 “ " `` 2163 244 6 Information Information NNP 2163 244 7 Extraction Extraction NNP 2163 244 8 from from IN 2163 244 9 Research Research NNP 2163 244 10 Papers Papers NNPS 2163 244 11 Using use VBG 2163 244 12 Conditional Conditional NNP 2163 244 13 Random Random NNP 2163 244 14 Fields Fields NNPS 2163 244 15 . . . 2163 244 16 ” " '' 2163 244 17 20 20 CD 2163 244 18 . . . 2163 245 1 Hoifung Hoifung NNP 2163 245 2 Poon Poon NNP 2163 245 3 and and CC 2163 245 4 Pedro Pedro NNP 2163 245 5 Domingos Domingos NNP 2163 245 6 , , , 2163 245 7 “ " `` 2163 245 8 Joint joint JJ 2163 245 9 inference inference NN 2163 245 10 in in IN 2163 245 11 information information NN 2163 245 12 extraction extraction NN 2163 245 13 , , , 2163 245 14 ” " '' 2163 245 15 Proceedings proceeding NNS 2163 245 16 of of IN 2163 245 17 the the DT 2163 245 18 22nd 22nd JJ 2163 245 19 National National NNP 2163 245 20 Conference Conference NNP 2163 245 21 on on IN 2163 245 22 Artificial Artificial NNP 2163 245 23 Intelligence Intelligence NNP 2163 245 24 ( ( -LRB- 2163 245 25 Vancouver Vancouver NNP 2163 245 26 , , , 2163 245 27 British British NNP 2163 245 28 Columbia Columbia NNP 2163 245 29 , , , 2163 245 30 Canada Canada NNP 2163 245 31 , , , 2163 245 32 2007 2007 CD 2163 245 33 ) ) -RRB- 2163 245 34 : : : 2163 245 35 913–18 913–18 CD 2163 245 36 . . . 2163 246 1 21 21 CD 2163 246 2 . . . 2163 247 1 Ariel Ariel NNP 2163 247 2 Schwartz Schwartz NNP 2163 247 3 et et FW 2163 247 4 al al NNP 2163 247 5 . . 
NNP 2163 247 6 , , , 2163 247 7 “ " `` 2163 247 8 Multiple Multiple NNP 2163 247 9 Alignment Alignment NNP 2163 247 10 of of IN 2163 247 11 Citation Citation NNP 2163 247 12 Sentences Sentences NNPS 2163 247 13 with with IN 2163 247 14 Conditional Conditional NNP 2163 247 15 Random Random NNP 2163 247 16 Fields Fields NNPS 2163 247 17 and and CC 2163 247 18 Posterior Posterior NNP 2163 247 19 Decoding decode VBG 2163 247 20 , , , 2163 247 21 ” " '' 2163 247 22 Proceedings proceeding NNS 2163 247 23 of of IN 2163 247 24 the the DT 2163 247 25 2007 2007 CD 2163 247 26 Joint Joint NNP 2163 247 27 Conference Conference NNP 2163 247 28 on on IN 2163 247 29 Empirical Empirical NNP 2163 247 30 Methods Methods NNPS 2163 247 31 in in IN 2163 247 32 Natural Natural NNP 2163 247 33 Language Language NNP 2163 247 34 Processing Processing NNP 2163 247 35 and and CC 2163 247 36 Computational Computational NNP 2163 247 37 Natural Natural NNP 2163 247 38 Language Language NNP 2163 247 39 Learning Learning NNP 2163 247 40 ( ( -LRB- 2163 247 41 Prague Prague NNP 2163 247 42 , , , 2163 247 43 Czech Czech NNP 2163 247 44 Republic Republic NNP 2163 247 45 , , , 2163 247 46 2007 2007 CD 2163 247 47 ) ) -RRB- 2163 247 48 : : : 2163 247 49 847–57 847–57 CD 2163 247 50 . . . 2163 248 1 22 22 CD 2163 248 2 . . . 2163 249 1 Simone Simone NNP 2163 249 2 Teufel Teufel NNP 2163 249 3 et et NNP 2163 249 4 al al NNP 2163 249 5 . . 
NNP 2163 249 6 , , , 2163 249 7 “ " `` 2163 249 8 Automatic automatic JJ 2163 249 9 Classification Classification NNP 2163 249 10 of of IN 2163 249 11 Citation Citation NNP 2163 249 12 Function Function NNP 2163 249 13 , , , 2163 249 14 ” " '' 2163 249 15 Proceedings proceeding NNS 2163 249 16 of of IN 2163 249 17 the the DT 2163 249 18 2006 2006 CD 2163 249 19 Conference Conference NNP 2163 249 20 on on IN 2163 249 21 Empirical Empirical NNP 2163 249 22 Methods Methods NNPS 2163 249 23 in in IN 2163 249 24 Natural Natural NNP 2163 249 25 Language Language NNP 2163 249 26 Processing Processing NNP 2163 249 27 ( ( -LRB- 2163 249 28 Sydney Sydney NNP 2163 249 29 , , , 2163 249 30 Australia Australia NNP 2163 249 31 , , , 2163 249 32 2006 2006 CD 2163 249 33 ) ) -RRB- 2163 249 34 : : : 2163 249 35 103–10 103–10 CD 2163 249 36 . . . 2163 250 1 23 23 CD 2163 250 2 . . . 2163 251 1 Jien Jien NNP 2163 251 2 - - HYPH 2163 251 3 Chen Chen NNP 2163 251 4 Wu Wu NNP 2163 251 5 et et NNP 2163 251 6 al al NNP 2163 251 7 . . NNP 2163 251 8 , , , 2163 251 9 “ " `` 2163 251 10 Computational Computational NNP 2163 251 11 Analysis Analysis NNP 2163 251 12 of of IN 2163 251 13 Move Move NNP 2163 251 14 Structures structure NNS 2163 251 15 in in IN 2163 251 16 Academic Academic NNP 2163 251 17 Abstracts Abstracts NNPS 2163 251 18 , , , 2163 251 19 ” " '' 2163 251 20 COLING COLING NNP 2163 251 21 / / SYM 2163 251 22 ACL ACL NNP 2163 251 23 Interactive Interactive NNP 2163 251 24 Presentation Presentation NNP 2163 251 25 Sessions Sessions NNP 2163 251 26 ( ( -LRB- 2163 251 27 Sydney Sydney NNP 2163 251 28 , , , 2163 251 29 Australia Australia NNP 2163 251 30 , , , 2163 251 31 2006 2006 CD 2163 251 32 ) ) -RRB- 2163 251 33 : : : 2163 251 34 41–44 41–44 CD 2163 251 35 . . . 2163 252 1 24 24 CD 2163 252 2 . . . 2163 253 1 Eugenio Eugenio NNP 2163 253 2 Cesario Cesario NNP 2163 253 3 et et NNP 2163 253 4 al al NNP 2163 253 5 . . 
NNP 2163 253 6 , , , 2163 253 7 “ " `` 2163 253 8 Boosting boost VBG 2163 253 9 text text NN 2163 253 10 segmentation segmentation NN 2163 253 11 via via IN 2163 253 12 progressive progressive JJ 2163 253 13 classification classification NN 2163 253 14 , , , 2163 253 15 ” " '' 2163 253 16 Knowledge knowledge NN 2163 253 17 and and CC 2163 253 18 Information Information NNP 2163 253 19 Systems Systems NNP 2163 253 20 15 15 CD 2163 253 21 ( ( -LRB- 2163 253 22 2008 2008 CD 2163 253 23 ) ) -RRB- 2163 253 24 : : : 2163 253 25 285–320 285–320 CD 2163 253 26 . . . 2163 254 1 25 25 CD 2163 254 2 . . . 2163 255 1 Dublin Dublin NNP 2163 255 2 Core Core NNP 2163 255 3 website website NN 2163 255 4 , , , 2163 255 5 http://dublincore.org http://dublincore.org NNP 2163 255 6 ( ( -LRB- 2163 255 7 accessed access VBN 2163 255 8 May May NNP 2163 255 9 4 4 CD 2163 255 10 , , , 2163 255 11 2011 2011 CD 2163 255 12 ) ) -RRB- 2163 255 13 . . . 2163 256 1 26 26 CD 2163 256 2 . . . 2163 257 1 York York NNP 2163 257 2 Sure sure UH 2163 257 3 et et FW 2163 257 4 al al NNP 2163 257 5 . . NNP 2163 257 6 , , , 2163 257 7 “ " `` 2163 257 8 The the DT 2163 257 9 SWRC SWRC NNP 2163 257 10 ontology ontology NN 2163 257 11 – – : 2163 257 12 Semantic Semantic NNP 2163 257 13 Web Web NNP 2163 257 14 for for IN 2163 257 15 research research NN 2163 257 16 communities community NNS 2163 257 17 , , , 2163 257 18 ” " '' 2163 257 19 Proceedings proceeding NNS 2163 257 20 of of IN 2163 257 21 the the DT 2163 257 22 12th 12th JJ 2163 257 23 Portuguese Portuguese NNP 2163 257 24 Conference Conference NNP 2163 257 25 on on IN 2163 257 26 Artificial Artificial NNP 2163 257 27 Intelligence Intelligence NNP 2163 257 28 ( ( -LRB- 2163 257 29 Covilha Covilha NNP 2163 257 30 , , , 2163 257 31 Portugal Portugal NNP 2163 257 32 , , , 2163 257 33 2005 2005 CD 2163 257 34 ) ) -RRB- 2163 257 35 . . . 2163 258 1 27 27 CD 2163 258 2 . . . 
2163 259 1 Yanjun Yanjun NNP 2163 259 2 Qi Qi NNP 2163 259 3 et et NNP 2163 259 4 al al NNP 2163 259 5 . . NNP 2163 259 6 , , , 2163 259 7 “ " `` 2163 259 8 Semi Semi NNP 2163 259 9 - - HYPH 2163 259 10 Supervised supervised JJ 2163 259 11 Sequence Sequence NNP 2163 259 12 Labeling labeling NN 2163 259 13 with with IN 2163 259 14 Self self NN 2163 259 15 - - HYPH 2163 259 16 Learned learn VBN 2163 259 17 Features feature NNS 2163 259 18 , , , 2163 259 19 ” " '' 2163 259 20 Proceedings proceeding NNS 2163 259 21 of of IN 2163 259 22 IEEE IEEE NNP 2163 259 23 International International NNP 2163 259 24 Conference Conference NNP 2163 259 25 on on IN 2163 259 26 Data Data NNP 2163 259 27 Mining Mining NNP 2163 259 28 ( ( -LRB- 2163 259 29 Miami Miami NNP 2163 259 30 , , , 2163 259 31 FL FL NNP 2163 259 32 , , , 2163 259 33 USA USA NNP 2163 259 34 , , , 2163 259 35 2009 2009 CD 2163 259 36 ) ) -RRB- 2163 259 37 . . . 2163 260 1 28 28 CD 2163 260 2 . . . 2163 261 1 David David NNP 2163 261 2 Sanchez Sanchez NNP 2163 261 3 et et NNP 2163 261 4 al al NNP 2163 261 5 . . NNP 2163 261 6 , , , 2163 261 7 “ " `` 2163 261 8 Content content NN 2163 261 9 Annotation annotation NN 2163 261 10 for for IN 2163 261 11 the the DT 2163 261 12 Semantic semantic JJ 2163 261 13 Web web NN 2163 261 14 : : : 2163 261 15 An an DT 2163 261 16 Automatic Automatic NNP 2163 261 17 Web web NN 2163 261 18 - - HYPH 2163 261 19 Based base VBN 2163 261 20 Approach Approach NNP 2163 261 21 , , , 2163 261 22 ” " '' 2163 261 23 Knowledge knowledge NN 2163 261 24 and and CC 2163 261 25 Information Information NNP 2163 261 26 Systems Systems NNP 2163 261 27 27 27 CD 2163 261 28 ( ( -LRB- 2163 261 29 2011 2011 CD 2163 261 30 ) ) -RRB- 2163 261 31 : : : 2163 261 32 393 393 CD 2163 261 33 - - SYM 2163 261 34 418 418 CD 2163 261 35 . . . 2163 262 1 29 29 CD 2163 262 2 . . . 
2163 263 1 Tshering tshere VBG 2163 263 2 Cigay Cigay NNP 2163 263 3 Dorji Dorji NNP 2163 263 4 et et NNP 2163 263 5 al al NNP 2163 263 6 . . NNP 2163 263 7 , , , 2163 263 8 “ " `` 2163 263 9 Extraction extraction NN 2163 263 10 , , , 2163 263 11 selection selection NN 2163 263 12 and and CC 2163 263 13 ranking ranking NN 2163 263 14 of of IN 2163 263 15 Field Field NNP 2163 263 16 Association Association NNP 2163 263 17 ( ( -LRB- 2163 263 18 FA FA NNP 2163 263 19 ) ) -RRB- 2163 263 20 Terms term NNS 2163 263 21 from from IN 2163 263 22 domain domain NN 2163 263 23 - - HYPH 2163 263 24 specific specific JJ 2163 263 25 corpora corpora NN 2163 263 26 for for IN 2163 263 27 building build VBG 2163 263 28 a a DT 2163 263 29 comprehensive comprehensive JJ 2163 263 30 FA FA NNP 2163 263 31 terms term NNS 2163 263 32 dictionary dictionary JJ 2163 263 33 , , , 2163 263 34 ” " '' 2163 263 35 Knowledge Knowledge NNP 2163 263 36 and and CC 2163 263 37 Information Information NNP 2163 263 38 Systems Systems NNP 2163 263 39 27 27 CD 2163 263 40 ( ( -LRB- 2163 263 41 2011 2011 CD 2163 263 42 ) ) -RRB- 2163 263 43 : : : 2163 263 44 141–61 141–61 CD 2163 263 45 . . . 2163 264 1 30 30 CD 2163 264 2 . . . 2163 265 1 Please please UH 2163 265 2 note note VB 2163 265 3 that that IN 2163 265 4 the the DT 2163 265 5 chunker chunker NN 2163 265 6 is be VBZ 2163 265 7 document document NN 2163 265 8 - - HYPH 2163 265 9 format format NN 2163 265 10 agnostic agnostic JJ 2163 265 11 and and CC 2163 265 12 takes take VBZ 2163 265 13 as as RB 2163 265 14 input input NN 2163 265 15 only only JJ 2163 265 16 raw raw JJ 2163 265 17 text text NN 2163 265 18 . . . 
2163 266 1 The the DT 2163 266 2 actual actual JJ 2163 266 3 extraction extraction NN 2163 266 4 of of IN 2163 266 5 this this DT 2163 266 6 raw raw JJ 2163 266 7 text text NN 2163 266 8 from from IN 2163 266 9 the the DT 2163 266 10 original original JJ 2163 266 11 document document NN 2163 266 12 ( ( -LRB- 2163 266 13 PDF PDF NNP 2163 266 14 , , , 2163 266 15 DOC DOC NNP 2163 266 16 or or CC 2163 266 17 some some DT 2163 266 18 other other JJ 2163 266 19 format format NN 2163 266 20 ) ) -RRB- 2163 266 21 is be VBZ 2163 266 22 the the DT 2163 266 23 user user NN 2163 266 24 ’s ’s POS 2163 266 25 responsibility responsibility NN 2163 266 26 . . . 2163 267 1 31 31 CD 2163 267 2 . . . 2163 268 1 As as IN 2163 268 2 a a DT 2163 268 3 note note NN 2163 268 4 , , , 2163 268 5 we -PRON- PRP 2163 268 6 chose choose VBD 2163 268 7 this this DT 2163 268 8 length length NN 2163 268 9 of of IN 2163 268 10 fifteen fifteen CD 2163 268 11 characters character NNS 2163 268 12 empirically empirically RB 2163 268 13 , , , 2163 268 14 and and CC 2163 268 15 based base VBN 2163 268 16 on on IN 2163 268 17 the the DT 2163 268 18 assumption assumption NN 2163 268 19 that that IN 2163 268 20 in in IN 2163 268 21 any any DT 2163 268 22 format format NN 2163 268 23 the the DT 2163 268 24 publication publication NN 2163 268 25 content content NN 2163 268 26 lines line NNS 2163 268 27 usually usually RB 2163 268 28 have have VBP 2163 268 29 more more JJR 2163 268 30 than than IN 2163 268 31 fifteen fifteen CD 2163 268 32 characters character NNS 2163 268 33 . . . 2163 269 1 32 32 CD 2163 269 2 . . . 
2163 270 1 Lafferty Lafferty NNP 2163 270 2 et et NNP 2163 270 3 al al NNP 2163 270 4 . . NNP 2163 270 5 , , , 2163 270 6 “ " `` 2163 270 7 Conditional Conditional NNP 2163 270 8 Random Random NNP 2163 270 9 Fields Fields NNPS 2163 270 10 : : : 2163 270 11 Probabilistic Probabilistic NNP 2163 270 12 Models Models NNPS 2163 270 13 for for IN 2163 270 14 Segmenting segmenting NN 2163 270 15 and and CC 2163 270 16 Labeling Labeling NNP 2163 270 17 Sequence Sequence NNP 2163 270 18 Data Data NNP 2163 270 19 . . . 2163 270 20 ” " '' 2163 270 21 33 33 CD 2163 270 22 . . . 2163 271 1 Councill Councill NNP 2163 271 2 et et FW 2163 271 3 al al NNP 2163 271 4 . . NNP 2163 271 5 , , , 2163 271 6 “ " `` 2163 271 7 ParsCit parscit NN 2163 271 8 : : : 2163 271 9 An an DT 2163 271 10 Open open JJ 2163 271 11 - - HYPH 2163 271 12 source source NN 2163 271 13 CRF CRF NNP 2163 271 14 Reference Reference NNP 2163 271 15 String String NNP 2163 271 16 Parsing Parsing NNP 2163 271 17 Package Package NNP 2163 271 18 . . . 2163 271 19 ” " '' 2163 271 20 34 34 CD 2163 271 21 . . . 2163 272 1 The the DT 2163 272 2 manual manual JJ 2163 272 3 tagging tagging NN 2163 272 4 was be VBD 2163 272 5 performed perform VBN 2163 272 6 by by IN 2163 272 7 a a DT 2163 272 8 single single JJ 2163 272 9 person person NN 2163 272 10 and and CC 2163 272 11 since since IN 2163 272 12 the the DT 2163 272 13 reference reference NN 2163 272 14 chunks chunk NNS 2163 272 15 have have VBP 2163 272 16 no no DT 2163 272 17 ambiguity ambiguity NN 2163 272 18 attached attach VBN 2163 272 19 , , , 2163 272 20 we -PRON- PRP 2163 272 21 did do VBD 2163 272 22 not not RB 2163 272 23 see see VB 2163 272 24 the the DT 2163 272 25 need need NN 2163 272 26 for for IN 2163 272 27 running run VBG 2163 272 28 any any DT 2163 272 29 data data NN 2163 272 30 reliability reliability NN 2163 272 31 tests test NNS 2163 272 32 . . . 2163 273 1 35 35 CD 2163 273 2 . . . 
2163 274 1 Ron Ron NNP 2163 274 2 Kohavi Kohavi NNP 2163 274 3 , , , 2163 274 4 “ " `` 2163 274 5 A a DT 2163 274 6 Study Study NNP 2163 274 7 of of IN 2163 274 8 Cross Cross NNP 2163 274 9 - - NNP 2163 274 10 Validation Validation NNP 2163 274 11 and and CC 2163 274 12 Bootstrap Bootstrap NNP 2163 274 13 for for IN 2163 274 14 Accuracy Accuracy NNP 2163 274 15 Estimation Estimation NNP 2163 274 16 and and CC 2163 274 17 Model Model NNP 2163 274 18 Selection Selection NNP 2163 274 19 , , , 2163 274 20 ” " '' 2163 274 21 Proceedings proceeding NNS 2163 274 22 of of IN 2163 274 23 the the DT 2163 274 24 14th 14th JJ 2163 274 25 International International NNP 2163 274 26 Joint Joint NNP 2163 274 27 Conference Conference NNP 2163 274 28 on on IN 2163 274 29 Artificial Artificial NNP 2163 274 30 Intelligence Intelligence NNP 2163 274 31 ( ( -LRB- 2163 274 32 Montreal Montreal NNP 2163 274 33 , , , 2163 274 34 Quebec Quebec NNP 2163 274 35 , , , 2163 274 36 1995 1995 CD 2163 274 37 ) ) -RRB- 2163 274 38 : : : 2163 274 39 1137–43 1137–43 LS 2163 274 40 . . . 2163 275 1 36 36 CD 2163 275 2 . . . 2163 276 1 Peng Peng NNP 2163 276 2 and and CC 2163 276 3 McCallum McCallum NNP 2163 276 4 , , , 2163 276 5 “ " `` 2163 276 6 Information Information NNP 2163 276 7 Extraction Extraction NNP 2163 276 8 from from IN 2163 276 9 Research Research NNP 2163 276 10 Papers Papers NNPS 2163 276 11 Using use VBG 2163 276 12 Conditional Conditional NNP 2163 276 13 Random Random NNP 2163 276 14 Fields Fields NNPS 2163 276 15 . . . 2163 276 16 ” " '' 2163 276 17 37 37 CD 2163 276 18 . . . 2163 277 1 Councill Councill NNP 2163 277 2 et et FW 2163 277 3 al al NNP 2163 277 4 . . 
NNP 2163 277 5 , , , 2163 277 6 “ " `` 2163 277 7 ParsCit parscit NN 2163 277 8 : : : 2163 277 9 An an DT 2163 277 10 Open open JJ 2163 277 11 - - HYPH 2163 277 12 source source NN 2163 277 13 CRF CRF NNP 2163 277 14 Reference Reference NNP 2163 277 15 String String NNP 2163 277 16 Parsing Parsing NNP 2163 277 17 Package Package NNP 2163 277 18 . . . 2163 277 19 ” " '' 2163 277 20 38 38 CD 2163 277 21 . . . 2163 278 1 Mallet mallet NN 2163 278 2 : : : 2163 278 3 MAchine MAchine NNP 2163 278 4 Learning Learning NNP 2163 278 5 for for IN 2163 278 6 LanguagE LanguagE NNP 2163 278 7 Toolkit Toolkit NNP 2163 278 8 , , , 2163 278 9 http://mallet.cs.umass.edu http://mallet.cs.umass.edu NNS 2163 278 10 ( ( -LRB- 2163 278 11 accessed access VBN 2163 278 12 May May NNP 2163 278 13 4 4 CD 2163 278 14 , , , 2163 278 15 2011 2011 CD 2163 278 16 ) ) -RRB- 2163 278 17 . . . 2163 279 1 39 39 CD 2163 279 2 . . . 2163 280 1 William William NNP 2163 280 2 M. M. NNP 2163 280 3 Shaw Shaw NNP 2163 280 4 et et FW 2163 280 5 al al NNP 2163 280 6 . . NNP 2163 280 7 , , , 2163 280 8 “ " `` 2163 280 9 Performance performance NN 2163 280 10 standards standard NNS 2163 280 11 and and CC 2163 280 12 evaluations evaluation NNS 2163 280 13 in in IN 2163 280 14 IR IR NNP 2163 280 15 test test NN 2163 280 16 collections collection NNS 2163 280 17 : : : 2163 280 18 Cluster- Cluster- NNP 2163 280 19 based base VBN 2163 280 20 retrieval retrieval NN 2163 280 21 models model NNS 2163 280 22 , , , 2163 280 23 ” " '' 2163 280 24 Information Information NNP 2163 280 25 Processing Processing NNP 2163 280 26 & & CC 2163 280 27 Management Management NNP 2163 280 28 33 33 CD 2163 280 29 ( ( -LRB- 2163 280 30 1997 1997 CD 2163 280 31 ) ) -RRB- 2163 280 32 : : : 2163 280 33 1–14 1–14 CD 2163 280 34 . . . 2163 281 1 40 40 CD 2163 281 2 . . . 
2163 282 1 Peng Peng NNP 2163 282 2 and and CC 2163 282 3 McCallum McCallum NNP 2163 282 4 , , , 2163 282 5 “ " `` 2163 282 6 Information Information NNP 2163 282 7 Extraction Extraction NNP 2163 282 8 from from IN 2163 282 9 Research Research NNP 2163 282 10 Papers Papers NNPS 2163 282 11 Using use VBG 2163 282 12 Conditional Conditional NNP 2163 282 13 Random Random NNP 2163 282 14 Fields Fields NNPS 2163 282 15 . . . 2163 282 16 ” " '' 2163 282 17 41 41 CD 2163 282 18 . . . 2163 283 1 Councill Councill NNP 2163 283 2 et et FW 2163 283 3 al al NNP 2163 283 4 . . NNP 2163 283 5 , , , 2163 283 6 “ " `` 2163 283 7 ParsCit parscit NN 2163 283 8 : : : 2163 283 9 An an DT 2163 283 10 Open open JJ 2163 283 11 - - HYPH 2163 283 12 source source NN 2163 283 13 CRF CRF NNP 2163 283 14 Reference Reference NNP 2163 283 15 String String NNP 2163 283 16 Parsing Parsing NNP 2163 283 17 Package Package NNP 2163 283 18 . . . 2163 283 19 ” " '' 2163 283 20 42 42 CD 2163 283 21 . . . 2163 284 1 Seymore Seymore NNP 2163 284 2 et et NNP 2163 284 3 al al NNP 2163 284 4 . . NNP 2163 284 5 , , , 2163 284 6 “ " `` 2163 284 7 Learning learn VBG 2163 284 8 Hidden Hidden NNP 2163 284 9 Markov Markov NNP 2163 284 10 Model Model NNP 2163 284 11 Structure Structure NNP 2163 284 12 for for IN 2163 284 13 Information Information NNP 2163 284 14 Extraction Extraction NNP 2163 284 15 . . . 2163 284 16 ” " '' 2163 284 17 43 43 CD 2163 284 18 . . . 2163 285 1 Han Han NNP 2163 285 2 et et NNP 2163 285 3 al al NNP 2163 285 4 . . NNP 2163 285 5 , , , 2163 285 6 “ " `` 2163 285 7 Rule rule NN 2163 285 8 - - HYPH 2163 285 9 based base VBN 2163 285 10 Word Word NNP 2163 285 11 Clustering Clustering NNP 2163 285 12 for for IN 2163 285 13 Document document NN 2163 285 14 Metadata Metadata NNP 2163 285 15 Extraction Extraction NNP 2163 285 16 . . . 2163 285 17 ” " '' 2163 285 18 44 44 CD 2163 285 19 . . . 2163 286 1 Cortez Cortez NNP 2163 286 2 et et NNP 2163 286 3 al al NNP 2163 286 4 . . 
NNP 2163 286 5 , , , 2163 286 6 “ " `` 2163 286 7 FLUX FLUX NNP 2163 286 8 - - HYPH 2163 286 9 CIM CIM NNP 2163 286 10 : : : 2163 286 11 Flexible flexible JJ 2163 286 12 Unsupervised unsupervised JJ 2163 286 13 Extraction Extraction NNP 2163 286 14 of of IN 2163 286 15 Citation Citation NNP 2163 286 16 Metadata Metadata NNP 2163 286 17 . . . 2163 286 18 ” " '' 2163 286 19 45 45 CD 2163 286 20 . . . 2163 287 1 The the DT 2163 287 2 CS CS NNP 2163 287 3 - - HYPH 2163 287 4 SW SW NNP 2163 287 5 dataset dataset NN 2163 287 6 is be VBZ 2163 287 7 available available JJ 2163 287 8 at at IN 2163 287 9 http://resources.smile.deri.ie/corpora/cs-sw http://resources.smile.deri.ie/corpora/cs-sw ADD 2163 287 10 ( ( -LRB- 2163 287 11 accessed access VBN 2163 287 12 May May NNP 2163 287 13 4 4 CD 2163 287 14 , , , 2163 287 15 2011 2011 CD 2163 287 16 ) ) -RRB- 2163 287 17 . . .