Large-Scale Information Extraction from Textual Definitions through Deep Syntactic and Semantic Analysis

Claudio Delli Bovi, Luca Telesca and Roberto Navigli
Department of Computer Science
Sapienza University of Rome
{dellibovi,navigli}@di.uniroma1.it
luca.telesca@gmail.com

Abstract

We present DEFIE, an approach to large-scale Information Extraction (IE) based on a syntactic-semantic analysis of textual definitions. Given a large corpus of definitions, we leverage syntactic dependencies to reduce data sparsity, then disambiguate the arguments and content words of the relation strings, and finally exploit the resulting information to organize the acquired relations hierarchically. The output of DEFIE is a high-quality knowledge base consisting of several million automatically acquired semantic relations.¹

¹ http://lcl.uniroma1.it/defie

1 Introduction

The problem of knowledge acquisition lies at the core of Natural Language Processing. Recent years have witnessed the massive exploitation of collaborative, semi-structured information as the ideal middle ground between high-quality, fully-structured resources and the larger amount of cheaper (but noisy) unstructured text (Hovy et al., 2013). Collaborative projects, like Freebase (Bollacker et al., 2008) and Wikidata (Vrandečić, 2012), have been under development for many years and are continuously being improved. A great deal of research also focuses on enriching available semi-structured resources, most notably Wikipedia, thereby creating taxonomies (Ponzetto and Strube, 2011; Flati et al., 2014), ontologies (Mahdisoltani et al., 2015) and semantic networks (Navigli and Ponzetto, 2012; Nastase and Strube, 2013). These solutions, however, are inherently constrained to small and often pre-specified sets of relations.
A more radical approach is adopted in systems like TEXTRUNNER (Etzioni et al., 2008) and REVERB (Fader et al., 2011), which developed from the Open Information Extraction (OIE) paradigm (Etzioni et al., 2008) and focused on the unconstrained extraction of a large number of relations from massive unstructured corpora. Ultimately, all these endeavors were geared towards addressing the knowledge acquisition problem and tackling long-standing challenges in the field, such as Machine Reading (Mitchell, 2005).

While earlier OIE approaches relied mostly on dependencies at the level of surface text (Etzioni et al., 2008; Fader et al., 2011), more recent work has focused on deeper language understanding at the level of both syntax and semantics (Nakashole et al., 2012; Moro and Navigli, 2013) and tackled challenging linguistic phenomena like synonymy and polysemy. However, these issues have not yet been addressed in their entirety. Relation strings are still bound to surface text, lacking actual semantic content. Furthermore, most OIE systems do not have a clear and unified ontological structure and require additional processing steps, such as statistical inference mappings (Dutta et al., 2014), graph-based alignments of relational phrases (Grycner and Weikum, 2014), or knowledge base unification procedures (Delli Bovi et al., 2015), in order for their potential to be exploitable in real applications.

In DEFIE the key idea is to leverage the linguistic analysis of recent semantically-enhanced OIE techniques while moving from open text to smaller corpora of dense prescriptive knowledge. The aim is then to extract as much information as possible by unifying syntactic analysis and state-of-the-art disambiguation and entity linking. Using this strategy, from an input corpus of textual definitions (short and concise descriptions of a given concept or entity) we are able to harvest fully disambiguated relation instances on a large scale, and integrate them automatically into a high-quality taxonomy of semantic relations. As a result, a large knowledge base is produced that shows competitive accuracy and coverage against state-of-the-art OIE systems based on much larger corpora.

Transactions of the Association for Computational Linguistics, vol. 3, pp. 529–543, 2015. Action Editor: Sebastian Riedel. Submission batch: 5/2015; Revision batch: 8/2015; Published 10/2015. ©2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Figure 1: Syntactic-semantic graph construction from a textual definition

Our contributions can be summarized as follows:
• We propose an approach to IE that ties together syntactic dependencies and unified entity linking/word sense disambiguation, designed to discover semantic relations from a relatively small corpus of textual definitions;
• We create a large knowledge base of fully disambiguated relation instances, ranging over named entities and concepts from available resources like WordNet and Wikipedia;
• We exploit our semantified relation patterns to automatically build a rich, high-quality relation taxonomy, showing competitive results against state-of-the-art approaches.

Our approach comprises three stages. First, we extract from our input corpus an initial set of semantic relations (Section 2); each relation is then scored and augmented with semantic type signatures (Section 3); finally, the augmented relations are used to build a relation taxonomy (Section 4).

2 Relation Extraction

Here we describe the first stage of our approach, where a set of semantic relations is extracted from the input corpus. In the following, we refer to a relation instance as a triple t = 〈a_i, r, a_j〉 with a_i and a_j being the arguments and r the relation pattern.
From each relation pattern r_k the associated relation R_k is identified by the set of all relation instances where r = r_k. In order to extract a large set of fully disambiguated relation instances we bring together syntactic and semantic analysis on a corpus of plain textual definitions. Each definition is first parsed and disambiguated (Figure 1a-b, Section 2.1); syntactic and semantic information is combined into a structured graph representation (Figure 1c, Section 2.2) and relation patterns are then extracted as shortest paths between concept pairs (Section 2.3).

The semantics of our relations draws on BabelNet (Navigli and Ponzetto, 2012), a wide-coverage multilingual semantic network obtained from the automatic integration of WordNet, Wikipedia and other resources. This choice is not mandatory; however, inasmuch as it is a superset of these resources, BabelNet brings together lexicographic and encyclopedic knowledge, enabling us to reach higher coverage while still being able to accommodate different disambiguation strategies. For each relation instance t extracted, both a_i, a_j and the content words appearing in r are linked to the BabelNet inventory. In the remainder of the paper we identify BabelNet concepts or entities using a subscript-superscript notation where, for instance, band^i_bn refers to the i-th BabelNet sense for the English word band.

2.1 Textual Definition Processing

The first step of the process is the automatic extraction of syntactic information (typed dependencies) and semantic information (word senses and named entity mentions) from each textual definition. Each definition undergoes the following steps:

Syntactic Analysis. Each textual definition d is parsed to obtain a dependency graph G_d (Figure 1a). Parsing is carried out using C&C (Clark and Curran, 2007), a log-linear parser based on Combinatory Categorial Grammar (CCG).
Although our algorithm seamlessly works with any syntactic formalism, CCG rules are especially suited to longer definitions and linguistic phenomena like coordinating conjunctions (Steedman, 2000).

Semantic Analysis. Semantic analysis is based on Babelfy (Moro et al., 2014), a joint, state-of-the-art approach to entity linking and word sense disambiguation. Given a lexicalized semantic network as underlying structure, Babelfy uses a dense subgraph algorithm to identify high-coherence semantic interpretations of words and multi-word expressions across an input text. We apply Babelfy to each definition d, obtaining a sense mapping S_d from surface text (words and entity mentions) to word senses and named entities (Figure 1b).

As a matter of fact, any disambiguation or entity linking strategy can be used at this stage. However, a knowledge-based unified approach like Babelfy is best suited to our setting, where context is limited and exploiting definitional knowledge as much as possible is key to attaining high-coverage results (as we show in Section 6.4).

2.2 Syntactic-Semantic Graph Construction

The information extracted by parsing and disambiguating a given definition d is unified into a syntactic-semantic graph G^sem_d where concepts and entities identified in d are arranged in a graph structure encoding their syntactic dependencies (Figure 1c). We start from the dependency graph G_d, as provided by the syntactic analysis of d in Section 2.1. Semantic information from the sense mappings S_d can be incorporated directly in the vertices of G_d by attaching available matches between words and senses to the corresponding vertices. Dependency graphs, however, encode dependencies solely on a word basis, while our sense mappings may include multi-word expressions (e.g. Pink Floyd^1_bn).
In order to extract consistent information, subsets of vertices referring to the same concept or entity are merged into a single semantic node, which replaces the subgraph covered in the original dependency structure. Consider the example in Figure 1: an entity like Pink Floyd^1_bn covers two distinct and connected vertices in the dependency graph G_d, one for the noun Floyd and one for its modifier Pink. In the actual semantics of the sentence, as encoded in G^sem_d (Figure 1c), these two vertices are merged into a single node referring to the entity Pink Floyd^1_bn (the English rock band), instead of being assigned individual word interpretations.

Our procedure for building G^sem_d takes as input a typed dependency graph G_d and a sense mapping S_d, both extracted from a given definition d. G^sem_d is first populated with the vertices of G_d referring to disambiguated content words, merging those vertices covered by the same sense s ∈ S_d into a single node (like Pink Floyd^1_bn and Atom Heart Mother^1_bn in Figure 1c). Then, the remaining vertices and edges are added as in G_d, discarding non-disambiguated adjuncts and modifiers (like the and fifth in Figure 1).

2.3 Relation Pattern Identification

At this stage, all the information in a given definition d has been extracted and encoded in the corresponding graph G^sem_d (Section 2.2). We now consider those paths connecting entity pairs across the graph and extract the relation pattern r between two entities and/or concepts as the shortest path between the two corresponding vertices in G^sem_d. This enables us to exclude less relevant information (typically carried by adjuncts or modifiers) and reduce data sparsity in the overall extraction process.

Our algorithm works as follows: given a textual definition d, we consider every pair of identified concepts or entities and compute the corresponding shortest path in G^sem_d using the Floyd-Warshall algorithm (Floyd, 1962).
The only constraint we enforce is that resulting paths must include at least one verb node. This condition filters out meaningless single-node patterns (e.g. two concepts connected with a preposition) and, given the prescriptive nature of d, is unlikely to discard semantically relevant attributes compacted in noun phrases.

Algorithm 1 Relation Extraction
procedure EXTRACTRELATIONSFROM(D)
    ℛ := ∅
    for each d in D do
        G_d := dependencyParse(d)
        S_d := disambiguate(d)
        G^sem_d := buildSemanticGraph(G_d, S_d)
        for each 〈s_i, s_j〉 in S_d do
            〈s_i, r_ij, s_j〉 := shortestPath(s_i, s_j)
            ℛ := ℛ ∪ {〈s_i, r_ij, s_j〉}
    filterPatterns(ℛ, ρ)
    return ℛ

As an example, consider the two sentences “Mutter is the third album by German band Rammstein” and “Atom Heart Mother is the fifth album by English band Pink Floyd”. In both cases, two valid shortest-path patterns are extracted. The first extracted shortest-path pattern is:

X → is → album^1_bn → by → Y

with a_i = Mutter^3_bn, a_j = Rammstein^1_bn for the first sentence and a_i = Atom Heart Mother^1_bn, a_j = Pink Floyd^1_bn for the second one. The second extracted shortest-path pattern is:

X → is → Y

with a_i = Mutter^3_bn, a_j = album^1_bn for the first sentence and a_i = Atom Heart Mother^1_bn, a_j = album^1_bn for the second one. In fact, our extraction process seamlessly discovers general knowledge (e.g. that Mutter^3_bn and Atom Heart Mother^1_bn are instances of the concept album^1_bn) and facts (e.g. that the entities Rammstein^1_bn and Pink Floyd^1_bn have an isAlbumBy relation with the two recordings).

Pseudo-code for the entire extraction algorithm is shown in Algorithm 1: given a set of textual definitions D, a set of relations is generated over extractions ℛ, with each relation R ⊂ ℛ comprising relation instances extracted from D. Each d ∈ D is first parsed and disambiguated to produce a syntactic-semantic graph G^sem_d (Sections 2.1-2.2); then all the concept pairs 〈s_i, s_j〉 are examined to detect relation instances as shortest paths.
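The shortest-path step with the verb constraint can be illustrated with a minimal sketch. It is not the implementation used for DEFIE: the toy graph, the names `EDGES`, `VERBS` and `SENSES`, and the treatment of dependency edges as undirected are assumptions made for illustration, and a plain breadth-first search replaces Floyd-Warshall, since both return shortest paths on an unweighted graph.

```python
from collections import deque

# Hypothetical, hand-built stand-in for the graph of Figure 1c:
# nodes are words or already-merged sense nodes, edges are dependency
# links treated as undirected (an assumption of this sketch).
EDGES = [
    ("Atom Heart Mother", "is"),
    ("is", "album"),
    ("album", "by"),
    ("by", "Pink Floyd"),
]
VERBS = {"is"}
# Disambiguated nodes, listed in textual order (left argument first).
SENSES = ["Atom Heart Mother", "album", "Pink Floyd"]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns the node list from source to target, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def extract_relations(edges, senses, verbs):
    """Return (a_i, pattern, a_j) triples whose shortest path contains a verb."""
    adj = build_adjacency(edges)
    triples = []
    for i, a_i in enumerate(senses):
        for a_j in senses[i + 1:]:
            path = shortest_path(adj, a_i, a_j)
            if path is None:
                continue
            inner = path[1:-1]
            # Constraint from Section 2.3: at least one verb node on the path.
            if not any(node in verbs for node in inner):
                continue
            triples.append((a_i, " ".join(["X"] + inner + ["Y"]), a_j))
    return triples


for triple in extract_relations(EDGES, SENSES, VERBS):
    print(triple)
```

On this toy graph the pair (album, Pink Floyd) is connected only through the preposition by, so it is dropped by the verb constraint, while the two pairs involving Atom Heart Mother yield the two patterns discussed in the example above.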
Finally, we filter out from the resulting set all relations for which the number of extracted instances is below a fixed threshold ρ.² The overall algorithm extracts over 20 million relation instances in our experimental setup (Section 5) with almost 256,000 distinct relations.

3 Relation Type Signatures and Scoring

We further characterize the semantics of our relations by computing semantic type signatures for each R ⊂ ℛ, i.e. by attaching a proper semantic class to both its domain and range (the sets of arguments occurring on the left and right of the pattern). As every element in the domain and range of R is disambiguated, we retrieve the corresponding senses and collect their direct hypernyms. Then we select the hypernym covering the largest subset of arguments as the representative semantic class for the domain (or range) of R. We extract hypernyms using BabelNet, where taxonomic information covers both general concepts (from the WordNet taxonomy (Fellbaum, 1998)) and named entities (from the Wikipedia Bitaxonomy (Flati et al., 2014)).

From the distribution of direct hypernyms over domain and range arguments of R we estimate the quality of R and associate a confidence value with its relation pattern r. Intuitively we want to assign higher confidence to relations where the corresponding distributions have low entropy. For instance, if both sets have a single hypernym covering all arguments, then R arguably captures a well-defined semantic relation and should be assigned high confidence. For each relation R, we compute:

$$H_R = -\sum_{i=1}^{n} p(h_i) \log_2 p(h_i) \tag{1}$$

where h_i (i = 1, ..., n) are all the distinct argument hypernyms over the domain and range of R and probabilities p(h_i) are estimated from the proportion of arguments covered in such sets. The lower H_R, the better defined the semantic types of R are. However, some valid but over-general relations (e.g. X is a Y, X is used for Y) have inherently high values of H_R.
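The type signature selection and the entropy of Equation 1 can be sketched in a few lines. This is only one possible reading of the description above: the function names are hypothetical, each argument is assumed to contribute exactly one direct hypernym, and the hypernyms of domain and range are pooled into a single distribution.

```python
import math
from collections import Counter


def type_signature(hypernyms):
    """Return the hypernym covering the largest number of arguments."""
    return Counter(hypernyms).most_common(1)[0][0]


def hypernym_entropy(domain_hypernyms, range_hypernyms):
    """Entropy H_R (Equation 1) over the pooled hypernym distribution."""
    counts = Counter(domain_hypernyms) + Counter(range_hypernyms)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy + 0.0  # turns a possible -0.0 into 0.0


# Hypothetical direct hypernyms of the arguments of one relation.
domain = ["album", "album", "album", "single"]
range_ = ["band", "band", "band", "singer"]

print(type_signature(domain), type_signature(range_))
print(hypernym_entropy(domain, range_))
```

A relation whose arguments all share one hypernym gets an entropy of zero, matching the intuition that such a relation is well defined.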
To obtain a balanced score, we therefore consider two additional factors, i.e. the number of extracted instances for R and the length of the associated pattern r, obtaining the following empirical measure:

$$\mathrm{score}(R) = \frac{|S_R|}{(H_R + 1)\,\mathrm{length}(r)} \tag{2}$$

with S_R being the set of extracted relation instances for R. The +1 term accounts for cases where H_R = 0. As shown in the examples of Table 1, relations with rather general patterns (such as X known for Y) achieve higher scores compared to very specific ones (like X is village^2_bn founded in 1912 in Y) despite higher entropy values. We validated our measure on the samples of Section 6.1, computing the overall precision for different score thresholds. The monotonic decrease of sample precision in Figure 2a shows that our measure captures the quality of extracted patterns better than H_R (Figure 2b).

| Pattern | Score | Entropy |
|---|---|---|
| X directed by Y | 4 025.80 | 1.74 |
| X known for Y | 2 590.70 | 3.65 |
| X is election district^1_bn of Y | 110.49 | 0.83 |
| X is composer^1_bn from Y | 39.92 | 2.08 |
| X is street^1_bn named after Y | 1.91 | 2.24 |
| X is village^2_bn founded in 1912 in Y | 0.91 | 0.18 |

Table 1: Examples of relation scores

Figure 2: Precision against score(R) (a) and H_R (b)

² In all the experiments of Section 6 we set ρ = 10.

4 Relation Taxonomization

In the last stage of our approach our set of extracted relations is arranged automatically in a relation taxonomy. The process is carried out by comparing relations pairwise, looking for hypernymy-hyponymy relationships between the corresponding relation patterns; we then build our taxonomy by connecting with an edge those relation pairs for which such a relationship is found.
Both the relation taxonomization procedures described here examine noun nodes across each relation pattern r, and consider for taxonomization only those relations whose patterns are identical except for a single noun node.³

Figure 3: Hypernym (a) and substring (b) generalizations

4.1 Hypernym Generalization

A direct way of identifying hypernym/hyponym noun nodes across relation patterns is to analyze the semantic information attached to them. Given two relation patterns r_i and r_j, differing only in respect of the noun nodes n_i and n_j, we first look at the associated concepts or entities, c_i and c_j, and retrieve the corresponding hypernym sets, H(c_i) and H(c_j). Hypernym sets are obtained by iteratively collecting the superclasses of c_i and c_j from the semantic network of BabelNet, up to a fixed height. For instance, given c_i = album^1_bn, H(c_i) = {work of art^1_bn, creation^2_bn, artifact^1_bn}, and given c_j = Rammstein^1_bn, H(c_j) = {band^2_bn, musical ensemble^1_bn, organization^1_bn}. Once we have H(c_i) and H(c_j), we just check whether c_j ∈ H(c_i) or c_i ∈ H(c_j) (Figure 3a). In the former case we conclude that r_j is a generalization of r_i; in the latter, that r_i is a generalization of r_j.

4.2 Substring Generalization

The second procedure focuses on the noun (or compound) represented by the node. Given two relation patterns, r_i and r_j, we apply the following heuristic: from one of the two nouns, be it n_i, any adjunct or modifier is removed, retaining the sole head word n̂_i. Then, n̂_i is compared with n_j and, if n̂_i = n_j, we assume that the relation r_j is a generalization of r_i (Figure 3b).

³ The simplifying assumption here is that two given relation patterns may be in a hypernymy-hyponymy relationship only when their plain syntactic structure is equivalent (e.g. is N1 by and is N2 by, with N1 and N2 being two distinct noun nodes).
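Both generalization heuristics can be sketched on toy data. The sketch below is illustrative only: `HYPERNYMS` is a hypothetical stand-in for BabelNet's taxonomy, patterns are represented as tuples of tokens, the set of noun nodes is passed in explicitly, and the head of a compound is assumed to be its last token.

```python
# Hypothetical hypernym edges standing in for BabelNet's taxonomy.
HYPERNYMS = {
    "studio album": ["album"],
    "album": ["work of art"],
    "work of art": ["creation"],
    "creation": ["artifact"],
}


def hypernym_set(concept, max_height=3):
    """Collect the superclasses of `concept` up to `max_height` levels."""
    found, frontier = set(), [concept]
    for _ in range(max_height):
        frontier = [h for c in frontier for h in HYPERNYMS.get(c, [])]
        found.update(frontier)
    return found


def single_noun_difference(p_i, p_j, nouns):
    """Return (n_i, n_j) if the patterns differ in exactly one noun node."""
    if len(p_i) != len(p_j):
        return None
    diffs = [(a, b) for a, b in zip(p_i, p_j) if a != b]
    if len(diffs) == 1 and diffs[0][0] in nouns and diffs[0][1] in nouns:
        return diffs[0]
    return None


def hypernym_generalization(p_i, p_j, nouns):
    """Return the more general of the two patterns, or None (Section 4.1)."""
    diff = single_noun_difference(p_i, p_j, nouns)
    if diff is None:
        return None
    c_i, c_j = diff
    if c_j in hypernym_set(c_i):
        return p_j
    if c_i in hypernym_set(c_j):
        return p_i
    return None


def substring_generalization(p_i, p_j, nouns):
    """Return p_j if the head word of n_i equals n_j, else None (Section 4.2)."""
    diff = single_noun_difference(p_i, p_j, nouns)
    if diff is None:
        return None
    n_i, n_j = diff
    head = n_i.split()[-1]  # assumption: the head is the last token
    return p_j if head == n_j else None


NOUNS = {"studio album", "album"}
specific = ("X", "is", "studio album", "by", "Y")
general = ("X", "is", "album", "by", "Y")
print(hypernym_generalization(specific, general, NOUNS))
print(substring_generalization(specific, general, NOUNS))
```

Since the substring heuristic strips modifiers from only one of the two nouns, a caller would typically try both argument orders.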
| | DEFIE | NELL | PATTY | REVERB | WISENET | Freebase | DBpedia |
|---|---|---|---|---|---|---|---|
| Distinct relations | 255 881 | 298 | 1 631 531 | 664 746 | 245 935 | 1 894 | 1 368 |
| Distinct relations (disambiguated) | 240 618 | - | - | - | - | - | - |
| Average extractions per relation | 81.68 | 7 013.03 | 9.68 | 22.16 | 9.24 | 127 727.99 | 24 451.48 |
| Distinct relation instances | 20 352 903 | 2 089 883 | 15 802 946 | 14 728 268 | 2 271 807 | 241 897 882 | 33 449 631 |
| Distinct concepts/entities involved | 2 398 982 | 1 996 021 | 1 087 907 | 3 327 425 | 1 636 307 | 66 988 232 | 10 338 501 |

Table 2: Comparative statistics on the relation extraction process

5 Experimental Setup

Input. The input corpus used for the relation extraction procedure is the full set of English textual definitions in BabelNet 2.5 (Navigli and Ponzetto, 2012).⁴ In fact, any set of textual definitions can be provided as input to DEFIE, ranging from existing dictionaries (like WordNet or Wiktionary) to the set of first sentences of Wikipedia articles.⁵ As it merges various resources of this kind, BabelNet provides a large heterogeneous set comprising definitions from WordNet, Wikipedia, Wiktionary, Wikidata and OmegaWiki. To the best of our knowledge, this set constitutes the largest available corpus of definitional knowledge. We therefore worked on a total of 4,357,327 textual definitions from the English synsets of BabelNet’s knowledge base. We then used the same version of BabelNet as the underlying semantic network structure for disambiguating with Babelfy.⁶

Statistics. Comparative statistics are shown in Table 2. DEFIE extracts 20,352,903 relation instances, out of which 13,753,133 feature a fully disambiguated pattern, yielding an average of 3.15 disambiguated relation instances extracted from each definition. After the extraction process, our knowledge base comprises 255,881 distinct semantic relations, 94% of which also have disambiguated content words in their patterns.
DEFIE extracts a considerably larger number of relation instances compared to similar approaches, despite the much smaller amount of text used. For example, we managed to harvest over 5 million relation instances more than PATTY, using a much smaller corpus (single sentences as opposed to full Wikipedia articles) and generating a number of distinct relations that was six times less than PATTY’s. As a result, we obtained an average number of extractions that was substantially higher than those of our OIE competitors. This suggests that DEFIE is able to exploit the nature of textual definitions effectively and generalize over relation patterns. Furthermore, our semantic analysis captured 2,398,982 distinct arguments (either concepts or named entities), outperforming almost all open-text systems examined.

⁴ babelnet.org
⁵ According to the Wikipedia guidelines, an article should begin with a short declarative sentence, defining what (or who) is the subject and why it is notable.
⁶ babelfy.org

Evaluation. All the evaluations carried out in Section 6 were based on manual assessment by two human judges, with an inter-annotator agreement, as measured by Cohen’s kappa coefficient, above 70% in all cases. In these evaluations we compared DEFIE with the following OIE approaches:
• NELL (Carlson et al., 2010) with knowledge base beliefs updated to November 2014;
• PATTY (Nakashole et al., 2012) with Freebase types and pattern synsets from the English Wikipedia dump of June 2011;
• REVERB (Fader et al., 2011), using the set of normalized relation instances from the ClueWeb09 dataset;
• WISENET (Moro and Navigli, 2012; Moro and Navigli, 2013) with relational phrases from the English Wikipedia dump of December 2012.

In addition, we also compared our knowledge base with up-to-date human-contributed resources, namely Freebase (Bollacker et al., 2008) and DBpedia (Lehmann et al., 2014), both from the dumps of April/May 2014.
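For reference, the inter-annotator agreement mentioned in the Evaluation paragraph can be computed as in the following sketch of Cohen's kappa for two judges. The function name and the toy labels are hypothetical; the actual judgments collected for the experiments are not reproduced here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Toy judgments (1 = correct relation, 0 = incorrect), purely illustrative.
judge_1 = [1, 1, 1, 0, 1, 0, 1, 1]
judge_2 = [1, 1, 0, 0, 1, 0, 1, 1]
print(cohens_kappa(judge_1, judge_2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when most items in a sample are judged correct.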
| | Top 100 | Top 250 | Rand 100 | Rand 250 |
|---|---|---|---|---|
| DEFIE | 0.93±0.01 | 0.91±0.02 | 0.79±0.02 | 0.81±0.08 |
| PATTY | 0.93±0.05 | N/A | 0.80±0.08 | N/A |

Table 3: Precision of relation patterns

| | NELL | PATTY | REVERB | WISENET | Freebase | DBpedia |
|---|---|---|---|---|---|---|
| Top 100 | .571 | .238 | .214 | .155 | .571 | .461 |
| Rand 100 | .942 | .711 | .596 | .635 | .904 | .880 |

Table 4: Novelty of the extracted information

6 Experiments

6.1 Quality of Relations

We first assessed the quality and the semantic consistency of our relations using manual evaluation. We ranked our relations according to their score (Section 3) and then created two samples (of size 100 and 250 respectively) of the top scoring relations. In order to evaluate the long tail of less confident relations, we created another two samples of the same size with randomly extracted relations. We presented these samples to our human judges, accompanying each relation with a set of 50 argument pairs and the corresponding textual definitions from BabelNet. For each item in the sample we asked whether it represented a meaningful relation and whether the extracted argument pairs were consistent with this relation and the corresponding definitions. If the answer was positive, the relation was considered correct. Finally we estimated the overall precision of the sample as the proportion of correct items. Results are reported in Table 3 and compared to those obtained by our closest competitor, PATTY, in the setting of Section 5. In PATTY the confidence of a given pattern was estimated from its statistical strength (Nakashole et al., 2012). As shown in Table 3, DEFIE achieved a comparable level of accuracy in every sample. An error analysis identified most errors as related to the vagueness of some short and general patterns, e.g. X take Y, X make Y. Others were related to parsing (e.g. in labeling the head word of complex noun phrases) or disambiguation.
In addition, we used the same samples to estimate the novelty of the extracted information in comparison to currently available resources. We examined each correct relation pattern and looked manually for an equivalent relation in the knowledge bases of both our OIE competitors and human-contributed resources. For instance, given the relation X born in Y, NELL and REVERB have the equivalent relations personborninlocation and is born in, while Freebase and DBpedia have Place of birth and birthPlace respectively. We then computed the proportion of ‘new’ relations among those previously labeled as correct by our human judges. Results are shown in Table 4 for both the top 100 sample and the random sample. The high proportion of relations not appearing in existing resources (especially across the random samples) suggests that DEFIE is capable of discovering information not obtainable from available knowledge bases, including very specific relations (X is blizzard in Y, X is Mayan language spoken by Y, X is government-owned corporation in Y), as well as general but unusual ones (X used by writer of Y).

| Gold Standard | DEFIE | WISENET | PATTY | REVERB | Freebase | DBpedia |
|---|---|---|---|---|---|---|
| 163 | 131 | 129 | 126 | 122 | 69 | 39 |

Table 5: Coverage of semantic relations

6.2 Coverage of Relations

To assess the coverage of DEFIE we first tested our extracted relations on a public dataset described by Nakashole et al. (2012) and consisting of 163 semantic relations manually annotated from five Wikipedia pages about musicians. Following previous work (Nakashole et al., 2012; Moro and Navigli, 2013), for each annotation we sought a relation in our knowledge base carrying the same semantics. Results are reported in Table 5. Consistently with the results in Table 4, the proportion of novel information places DEFIE in line with its closest competitors, achieving a coverage of 80.3% with respect to the gold standard.
Examples of relations not covered by our competitors are hasFatherInLaw and hasDaughterInLaw. Furthermore, relations holding between entities and general concepts (e.g. critizedFor, praisedFor, sentencedTo), are captured only by DEFIE and REVERB (which, however, lacks any argument semantics).

We also assessed the coverage of resources based on human-defined semantic relations: we extracted three random samples of 100 relations from Freebase, DBpedia and NELL and looked for semantically equivalent relations in our knowledge base. As shown in Table 6, DEFIE reports a coverage between 81% and 89% depending on the resource, mostly failing to cover relations that refer to numerical properties (e.g. numberOfMembers).

| | Freebase | DBpedia | NELL |
|---|---|---|---|
| Random 100 | 83% | 81% | 89% |

Table 6: Coverage of manually curated resources

| | PATTY | WISENET |
|---|---|---|
| Random 100 | 66% | 69% |

Table 7: Coverage of individual relation instances

| | Hyp. Gen. | Substr. Gen. | PATTY (Top) | PATTY (Rand) |
|---|---|---|---|---|
| Precision | 0.87±0.03 | 0.90±0.02 | 0.85±0.07 | 0.62±0.09 |
| # Edges | 44 412 | | 20 339 | |
| Density | 1.89×10⁻⁶ | | 7.64×10⁻⁹ | |

Table 8: Precision and coverage of the relation taxonomy (# Edges and Density are system-level figures: the first value refers to DEFIE over both generalization procedures, the second to PATTY)

Finally, we tested the coverage of DEFIE over individual relation instances. We selected a random sample of 100 triples from the two closest competitors exploiting textual corpora, i.e. PATTY and WISENET. For each selected triple 〈a_i, r, a_j〉, we sought an equivalent relation instance in our knowledge base, i.e. one comprising a_i and a_j and a relation pattern expressing the same semantic relation of r. Results in Table 7 show a coverage greater than 65% over both samples. Given the dramatic reduction of corpus size and the high precision of the items extracted, these figures demonstrate that definitional knowledge is extremely valuable for relation extraction approaches.
This might suggest that, even in large-scale OIE-based resources, a substantial amount of knowledge is likely to come from a rather smaller subset of definitional sentences within the source corpus.

6.3 Quality of Relation Taxonomization

We evaluated our relation taxonomy by manually assessing the accuracy of our taxonomization heuristics. Then we compared our results against PATTY, the only system among our closest competitors that generates a taxonomy of relations. The setting for this evaluation was the same as that of Section 6.1. However, as we lacked a confidence measure in this case, we just extracted a random sample of 200 hypernym edges for each generalization procedure. We presented these samples to our human judges and, for each hypernym edge, we asked whether the corresponding pair of relations represented a correct generalization. We then estimated the overall precision as the proportion of edges regarded as correct.

Results are reported in Table 8, along with PATTY’s results in the setting of Section 5; as PATTY’s edges are ranked by confidence, we considered both its 100 most confident subsumptions and a random sample of the same size. As shown in Table 8, DEFIE outperforms PATTY in terms of precision, and generates more than twice the number of edges overall. HARPY (Grycner and Weikum, 2014) enriches PATTY’s taxonomy with 616,792 hypernym edges, but its alignment algorithm, in the setting of Section 5, also includes transitive edges and still yields a sparser taxonomy compared to ours, with a graph density of 2.32×10⁻⁷. Generalization errors in our taxonomy are mostly related to disambiguation errors or flaws in the Wikipedia Bitaxonomy (e.g. the concept Titular Church^1_bn marked as hyponym of Cardinal^1_bn).

6.4 Quality of Entity Linking and Disambiguation

We evaluated the disambiguation stage of DEFIE (Section 2.1) by comparing Babelfy against other state-of-the-art entity linking systems.
In order to compare different disambiguation outputs we selected a random sample of 60,000 glosses from the input corpus of textual definitions (Section 5) and ran the relation extraction algorithm (Sections 2.1-2.3) using a different competitor in the disambiguation step each time. Finally, we used the mappings in BabelNet to express each output using a common dictionary and sense inventory.

| | # Relations | # Triples | # Entities | Average Sem. Nodes |
|---|---|---|---|---|
| Babelfy | 96 434 | 233 517 | 79 998 | 2.37 |
| TagME 2.0 | 88 638 | 226 905 | 89 318 | 1.67 |
| WAT | 24 083 | 56 503 | 38 147 | 0.39 |
| DBpedia Spotlight | 67 377 | 140 711 | 38 254 | 1.45 |
| Wikipedia Miner | 39 547 | 88 777 | 37 036 | 0.96 |

Table 9: Coverage for different disambiguation systems

| | Relations | Relation instances |
|---|---|---|
| Babelfy | 82.3% | 76.6% |
| TagME 2.0 | 76.0% | 62.0% |
| WAT | 84.6% | 72.6% |
| DBpedia Spotlight | 70.5% | 62.6% |
| Wikipedia Miner | 71.7% | 56.0% |

Table 10: Precision for different disambiguation systems

The coverage obtained by each competitor was assessed by looking at the number of distinct relations extracted in the process, the total number of relation instances extracted, the number of distinct concepts or entities involved, and the average number of semantic nodes within the relation patterns. For each competitor, we also assessed the precision obtained by evaluating the quality and semantic consistency of the relation patterns, in the same manner as in Section 6.1, both at the level of semantic relations (on the top 150 relation patterns) and at the level of individual relation instances (on a randomly extracted sample of 150 triples).
Results are shown in Tables 9 and 10 for Babelfy and the following systems:
• TagME 2.0⁷ (Ferragina and Scaiella, 2012), which links text fragments to Wikipedia based on measures like sense commonness and keyphraseness (Mihalcea and Csomai, 2007);
• WAT (Piccinno and Ferragina, 2014), an entity annotator that improves over TagME and features a re-designed spotting, disambiguation and pruning pipeline;
• DBpedia Spotlight⁸ (Mendes et al., 2011), which annotates text documents with DBpedia URIs using scores such as prominence, topical relevance and contextual ambiguity;
• Wikipedia Miner⁹ (Milne and Witten, 2013), which combines parallelized processing of Wikipedia dumps, relatedness measures and annotation features.

⁷ tagme.di.unipi.it
⁸ spotlight.dbpedia.org
⁹ wikipediadataminer.cms.waikato.ac.nz

| | # Definitions | Proportion (%) |
|---|---|---|
| Wikipedia | 3 899 087 | 89.50 |
| Wikidata | 364 484 | 8.35 |
| WordNet | 41 356 | 0.95 |
| Wiktionary | 39 383 | 0.90 |
| OmegaWiki | 13 017 | 0.30 |

Table 11: Composition of the input corpus by source

| | # Relations | # Relation instances | Avg. Extractions |
|---|---|---|---|
| Wikipedia | 251 954 | 19 455 992 | 77.58 |
| Wikidata | 5 414 | 1 033 732 | 191.01 |
| WordNet | 2 260 | 128 200 | 56.73 |
| Wiktionary | 2 863 | 143 990 | 50.52 |
| OmegaWiki | 1 168 | 45 818 | 39.45 |

Table 12: Impact of each source on the extraction step

As shown in Table 9, Babelfy outperforms all its competitors in terms of coverage and, due to its unified word sense disambiguation and entity linking approach, extracts semantically richer patterns with 2.37 semantic nodes on average per sentence. This is reflected in the quality of semantic relations, reported in Table 10, with an overall increase of precision both in terms of relations and in terms of individual instances; even though WAT shows slightly higher precision over relations, its considerably lower coverage yields semantically poor patterns (0.39 semantic nodes on average) and affects the overall quality of relations, where some ambiguity is necessarily retained.
As an example, the pattern X is station in Y, extracted from WAT’s disambiguation output, covers both railway stations and radio broadcasts. Babelfy produces, instead, two distinct relation patterns for each sense, tag- ging station as railway station1bn for the for- mer and station5bn for the latter. 6.5 Impact of Definition Sources We carried out an empirical analysis over the input corpus in our experimental setup, studying the impact of each source of textual definitions in isolation. In fact, as explained in Section 5, BabelNet’s textual definitions come from various resources: WordNet, Wikipedia, Wikidata, Wik- tionary and OmegaWiki. Table 11 shows the com- position of the input corpus with respect to each of these definition sources. The distribution is rather skewed, with the vast majority of definitions coming from Wikipedia (almost 90% of the input corpus). We ran the relation extraction algorithm (Sections 2.1-2.3) on each subset of the input corpus. As in previous experiments, we report the number of re- lation instances extracted, the number of distinct re- 537 # Wikipages # Sentences # Extractions Precision All 14 072 225 867 39 684 61.8% Top 100 10 334 161 769 13 687 59.0% Table 13: Extraction results over non-definitional text # Relation instances # Relations # Edges PATTY (definitions) 3 212 065 41 593 4 785 PATTY (Wikipedia) 15 802 946 1 631 531 20 339 Our system 20 807 732 255 881 44 412 Table 14: Performance of PATTY on definitional data lations, and the average number of extractions for each relation. Results, as shown in Table 12, are consistent with the composition of the input cor- pus in Table 11: by relying solely on Wikipedia’s first sentences, the extraction algorithm discovered 98% of all the distinct relations identified across the whole input corpus, and 93% of the total num- ber of extracted instances. 
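The two Wikipedia shares can be re-derived from the reported counts. The check below is a sanity calculation rather than part of the system; it assumes that the whole-corpus totals are those given in the "Our system" row of Table 14, and the names `share`, `TOTAL_RELATIONS` and `TOTAL_INSTANCES` are introduced here for illustration.

```python
def share(part: int, total: int) -> float:
    """Fraction of a whole-corpus total contributed by a single source."""
    return part / total


# Whole-corpus totals (assumed to be the "Our system" row of Table 14).
TOTAL_RELATIONS = 255_881
TOTAL_INSTANCES = 20_807_732

# Wikipedia row of Table 12.
WIKIPEDIA_RELATIONS = 251_954
WIKIPEDIA_INSTANCES = 19_455_992

relation_share = share(WIKIPEDIA_RELATIONS, TOTAL_RELATIONS)
instance_share = share(WIKIPEDIA_INSTANCES, TOTAL_INSTANCES)

print(f"Wikipedia share of distinct relations: {relation_share:.1%}")
print(f"Wikipedia share of relation instances: {instance_share:.1%}")
```

Under that assumption the shares come out at roughly 98.5% and 93.5%, in line with the rounded-down 98% and 93% quoted in the text. Per-source relation counts are not expected to sum to the total, since the same relation can be extracted from more than one source.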
Wikidata provides more than 1 million extractions (5% of the total) but definitions are rather short and most of them (44.2%) generate only is-a relation instances. The remaining sources (WordNet, Wiktionary, OmegaWiki) account for less than 2% of the extractions.

6.6 Impact of the Approach vs. Impact of the Data

DEFIE's relation extraction algorithm is explicitly designed to target textual definitions. Hence, the result it achieves is due to the mutual contribution of two key features: an OIE approach and the use of definitional data. In order to decouple these two factors and study their respective impacts, we carried out two experiments: first we applied DEFIE to a sample of non-definitional text; then we applied our closest competitor, PATTY, on the same definition corpus described in Section 5.

Source                   Label                         Target
enzyme^1_bn              catalyzes reaction^1_bn of    chemical^1_bn
album^1_bn               recorded by                   rock group^1_bn
officer^1_bn             commanded brigade^1_bn of     army unit^1_bn
bridge^1_bn              crosses over                  river^1_bn
academic journal^1_bn    covers research^1_bn in       science^1_bn
organization^1_bn        has headquarters^3_bn in      city^1_bn
Table 15: Examples of augmented semantic edges

Extraction from non-definitional text. We selected a random sample of Wikipedia pages from the English Wikipedia dump of October 2012. We processed each sentence as in Sections 2.1-2.2 and extracted instances of those relations produced by DEFIE in the original definitional setting (Section 5); we then automatically filtered out those instances where the arguments' hypernyms did not agree with the semantic types of the relation. We manually evaluated the quality of extractions on a sample of 100 items (as in Section 6.1) for both the full set of extracted instances and for the subset of extractions from the top 100 scoring relations.
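The automatic filtering step can be pictured as a hypernym check against the relation's type signature. The sketch below is a minimal illustration under stated assumptions: the toy `HYPERNYMS` map stands in for a real hypernym hierarchy such as BabelNet's, and `ancestors`, `agrees` and `keep_instance` are hypothetical helper names, not part of DEFIE.

```python
# Toy hypernym map (concept -> direct hypernyms); a stand-in for a real taxonomy.
HYPERNYMS: dict[str, list[str]] = {
    "Tower Bridge": ["bridge"],
    "bridge": ["structure"],
    "Thames": ["river"],
    "river": ["body of water"],
    "London": ["city"],
}


def ancestors(concept: str) -> set[str]:
    """Return the concept together with all of its transitive hypernyms."""
    seen: set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(HYPERNYMS.get(node, []))
    return seen


def agrees(argument: str, semantic_type: str) -> bool:
    """True if the semantic type is the argument itself or one of its hypernyms."""
    return semantic_type in ancestors(argument)


def keep_instance(left: str, right: str, signature: tuple[str, str]) -> bool:
    """Keep an instance only if both arguments agree with the type signature."""
    left_type, right_type = signature
    return agrees(left, left_type) and agrees(right, right_type)


if __name__ == "__main__":
    crosses_over = ("bridge", "river")  # signature in the spirit of Table 15
    print(keep_instance("Tower Bridge", "Thames", crosses_over))
    print(keep_instance("Tower Bridge", "London", crosses_over))
```

With this toy hierarchy the first candidate is kept and the second is discarded, because London has no hypernym path to river.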
Results are reported in Table 13: in both cases, precision figures (61.8% and 59.0%) show that extraction quality drops consistently in comparison to Section 6.1, suggesting that our extraction approach by itself is less accurate when moving to more complex sentences (with, e.g., subordinate clauses or coreferences).

PATTY on textual definitions. Since no open-source implementation of PATTY is available, we implemented a version of the algorithm which uses Babelfy for named entity disambiguation. We then ran it on our corpus of BabelNet definitions and compared the results against those originally obtained by PATTY (on the entire Wikipedia corpus) and those obtained by DEFIE. Figures are reported in Table 14 in terms of number of extracted relation instances, distinct relations and hypernym edges in the relation taxonomy. Results show that the dramatic reduction of corpus size affects the support sets of PATTY's relations, worsening both coverage and generalization capability.

6.7 Preliminary Study: Resource Enrichment

To further investigate the potential of our approach, we explored the application of DEFIE to the enrichment of existing resources. We focused on BabelNet as a case study. In BabelNet's semantic network, nodes representing concepts and entities are only connected via lexicographic relationships from WordNet (hypernymy, meronymy, etc.) or unlabeled edges derived from Wikipedia hyperlinks. Our extraction algorithm has the potential to provide useful information to both augment unlabeled edges with labels and explicit semantic content, and create additional connections based on semantic relations. Examples are shown in Table 15.

                     # Concept pairs   # Unlabeled   # Labeled
Type signatures                1,403           299          90
Relation instances         8,493,588     3,401,677     551,331
Table 16: Concept pairs and associated edges in BabelNet

We carried out a preliminary analysis over all disambiguated relations with at least 10 extracted instances.
For each relation pattern r, we first examined the concept pairs associated with its type signatures and looked in BabelNet for an unlabeled edge connecting the pair. Then we examined the whole set of extracted relation instances in R and looked in BabelNet for an unlabeled edge connecting the arguments a_i and a_j. Results in Table 16 show that only 27.7% of the concept pairs representing relation type signatures are connected in BabelNet, and most of these connections are unlabeled. By the same token, more than 4 million distinct argument pairs (53.5%) do not share any edge in the semantic network and, among those that do, less than 14% have a labeled relationship. These proportions suggest that our relations provide a potential enrichment of the underlying knowledge base in terms of both connectivity and labeling of existing edges. In BabelNet, our case study, cross-resource mappings might also propagate this information across other knowledge bases and rephrase semantic relations in terms of, e.g., automatically generated Wikipedia hyperlinks.

7 Related Work

From the earliest days, OIE systems had to cope with the size and heterogeneity of huge unstructured sources of text. The first systems employed statistical techniques and relied heavily on information redundancy. Then, as soon as semi-structured resources came into play (Hovy et al., 2013), researchers started developing learning systems based on self-supervision (Wu and Weld, 2007) and distant supervision (Mintz et al., 2009; Krause et al., 2012). Crucial issues in distant supervision, like noisy training data, have been addressed in various ways: probabilistic graphical models (Riedel et al., 2010; Hoffmann et al., 2011), sophisticated multi-instance learning algorithms (Surdeanu et al., 2012), matrix factorization techniques (Riedel et al., 2013), labeled data infusion (Pershina et al., 2014) or crowd-based human computing (Kondreddi et al., 2014).

A different strategy consists of moving from open text extraction to more constrained settings. For instance, the KNOWLEDGE VAULT (Dong et al., 2014) combines Web-scale extraction with prior knowledge from existing knowledge bases; BIPERPEDIA (Gupta et al., 2014) relies on schema-level attributes from the query stream in order to create an ontology of class-attribute pairs; RENOUN (Yahya et al., 2014) in turn exploits BIPERPEDIA to extract facts expressed as noun phrases. DEFIE focuses, instead, on smaller and denser corpora of prescriptive knowledge. Although early works, such as MindNet (Richardson et al., 1998), had already highlighted the potential of textual definitions for extracting reliable semantic information, no OIE approach, to the best of our knowledge, has exploited definitional data to extract and disambiguate a large knowledge base of semantic relations. The direction of most papers (especially in the recent OIE literature) seems rather the opposite, namely, to target Web-scale corpora. In contrast, we manage to extract a large amount of high-quality information by combining an unsupervised OIE approach with definitional data.

A deeper linguistic analysis constitutes the focus of many OIE approaches. Syntactic dependencies are used to construct general relation patterns (Nakashole et al., 2012), or to improve the quality of surface pattern realizations (Moro and Navigli, 2013). Phenomena like synonymy and polysemy have been addressed with kernel-based similarity measures and soft clustering techniques (Min et al., 2012; Moro and Navigli, 2013), or by exploiting the semantic types of relation arguments (Nakashole et al., 2012; Moro and Navigli, 2012). An appropriate modeling of semantic types (e.g. selectional preferences) constitutes a line of research by itself, rooted in earlier works like Resnik (1996) and focused on either class-based (Clark and Weir, 2002) or similarity-based (Erk, 2007) approaches.
However, these methods are used to model the semantics of verbs rather than arbitrary patterns. More recently some strategies based on topic modeling have been proposed, either to infer latent relation semantic types from OIE relations (Ritter et al., 2010), or to directly learn an ontological structure from a starting set of relation instances (Movshovitz-Attias and Cohen, 2015). However, the knowledge generated is often hard to interpret and integrate with existing knowledge bases without human intervention (Ritter et al., 2010). In this respect, the semantic predicates proposed by Flati and Navigli (2013) seem to be more promising.

A novelty in our approach is that issues like polysemy and synonymy are explicitly addressed with a unified entity linking and disambiguation algorithm. By incorporating explicit semantic content in our relation patterns, not only do we make relations less ambiguous, but we also abstract away from specific lexicalizations of the content words and merge together many patterns conveying the same semantics. Rather than using plain dependencies we also inject explicit semantic content into the dependency graph to generate a unified syntactic-semantic representation. Previous works (Moro et al., 2013) used similar semantic graph representations to produce filtering rules for relation extraction, but they required a starting set of relation patterns and did not exploit syntactic information. A joint approach of syntactic-semantic analysis of text was used in works such as Lao et al. (2012), but they addressed a substantially different task (inference for knowledge base completion) and assumed a radically different setting, with a predefined starting set of semantic relations from a given knowledge base. As we enforce an OIE approach, we do not have such requirements and directly process the input text via parsing and disambiguation.
This enables DEFIE to generate relations already integrated with resources like WordNet and Wikipedia, without additional alignment steps (Grycner and Weikum, 2014) or semantic type propagations (Lin et al., 2012). As shown in Section 6.3, explicit semantic content within relation patterns underpins a rich and high-quality relation taxonomy, whereas generalization in Nakashole et al. (2012) is limited to support set inclusion and leads to sparser and less accurate results.

8 Conclusion and Future Work

We presented DEFIE, an approach to OIE that, thanks to a novel unified syntactic-semantic analysis of text, harvests instances of semantic relations from a corpus of textual definitions. DEFIE extracts knowledge on a large scale, reducing data sparsity and disambiguating both arguments and relation patterns at the same time. Unlike previous semantically-enhanced approaches, mostly relying on the semantics of argument types, DEFIE is able to semantify relation phrases as well, by providing explicit links to the underlying knowledge base. We leveraged an input corpus of 4.3 million definitions and extracted over 20 million relation instances, with more than 250,000 distinct relations and almost 2.4 million concepts and entities involved. From these relations we automatically constructed a high-quality relation taxonomy by exploiting the explicit semantic content of the relation patterns. In the resulting knowledge base concepts and entities are linked to existing resources, such as WordNet and Wikipedia, via the BabelNet semantic network. We evaluated DEFIE in terms of precision, coverage, novelty of information in comparison to existing resources and quality of disambiguation, and we compared our relation taxonomy against state-of-the-art systems, obtaining highly competitive results.

A key feature of our approach is its deep syntactic-semantic analysis targeted to textual definitions.
In contrast to our competitors, where syntactic constraints are necessary in order to keep precision high when dealing with noisy data, DEFIE shows comparable (or greater) performance by exploiting a dense, noise-free definitional setting. DEFIE generates a large knowledge base, in line with collaboratively-built resources and state-of-the-art OIE systems, but uses a much smaller amount of input data: our corpus of definitions comprises fewer than 83 million tokens overall, while other OIE systems exploit massive corpora like Wikipedia (typically more than 1.5 billion tokens), ClueWeb (more than 33 billion tokens), or the Web itself. Furthermore, our semantic analysis based on Babelfy enables the discovery of semantic connections between both general concepts and named entities, with the potential to enrich existing structured and semi-structured resources, as we showed in a preliminary study on BabelNet (cf. Section 6.7).

As the next step, we plan to apply DEFIE to open text and integrate it with definition extraction and automatic gloss finding algorithms (Navigli and Velardi, 2010; Dalvi et al., 2015). Also, by further exploiting the underlying knowledge base, inference and learning techniques (Lao et al., 2012; Wang et al., 2015) can be applied to complement our model, generating new triples or correcting wrong ones. Finally, another future perspective is to leverage the increasingly large variety of multilingual resources, like BabelNet, and move towards the modeling of language-independent relations.

Acknowledgments

The authors gratefully acknowledge the support of the ERC Starting Grant MultiJEDI No. 259234. This research was also partially supported by Google through a Faculty Research Award granted in July 2012.

References

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge. In Proceedings of SIGMOD, pages 1247–1250.
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an Architecture for Never-Ending Language Learning. In Proceedings of AAAI, pages 1306–1313.
Stephen Clark and James R. Curran. 2007. Wide-coverage Efficient Statistical Parsing with CCG and Log-Linear Models. Computational Linguistics, 33(4):493–552.
Stephen Clark and David Weir. 2002. Class-Based Probability Estimation Using a Semantic Hierarchy. Computational Linguistics, 28(2):187–206.
Bhavana Dalvi, Einat Minkov, Partha P. Talukdar, and William W. Cohen. 2015. Automatic Gloss Finding for a Knowledge Base using Ontological Constraints. In Proceedings of WSDM, pages 369–378.
Claudio Delli Bovi, Luis Espinosa Anke, and Roberto Navigli. 2015. Knowledge Base Unification via Sense Embeddings and Disambiguation. In Proceedings of EMNLP, pages 726–736.
Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge Vault: a Web-Scale Approach to Probabilistic Knowledge Fusion. In Proceedings of KDD, pages 601–610.
Arnab Dutta, Christian Meilicke, and Simone Paolo Ponzetto. 2014. A Probabilistic Approach for Integrating Heterogeneous Knowledge Sources. In Proceedings of ESWC, pages 286–301.
Katrin Erk. 2007. A Simple, Similarity-based Model for Selectional Preferences. In Proceedings of ACL, pages 216–223.
Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. 2008. Open Information Extraction from the Web. Communications of the ACM, 51(12):68–74.
Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying Relations for Open Information Extraction. In Proceedings of EMNLP, pages 1535–1545.
Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. Bradford Books.
Paolo Ferragina and Ugo Scaiella. 2012. Fast and Accurate Annotation of Short Texts with Wikipedia Pages. IEEE Software, 29(1):70–75.
Tiziano Flati and Roberto Navigli. 2013. SPred: Large-scale Harvesting of Semantic Predicates. In Proceedings of ACL, pages 1222–1232.
Tiziano Flati, Daniele Vannella, Tommaso Pasini, and Roberto Navigli. 2014. Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project. In Proceedings of ACL, pages 945–955.
Robert W. Floyd. 1962. Algorithm 97: Shortest Path. Communications of the ACM, 5(6):345–345.
Adam Grycner and Gerhard Weikum. 2014. HARPY: Hypernyms and Alignment of Relational Paraphrases. In Proceedings of COLING, pages 2195–2204.
Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Euijong Whang, and Fei Wu. 2014. Biperpedia: An Ontology for Search Applications. In Proceedings of VLDB, pages 505–516.
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based Weak Supervision for Information Extraction of Overlapping Relations. In Proceedings of NAACL HLT, pages 541–550.
Eduard Hovy, Roberto Navigli, and Simone Paolo Ponzetto. 2013. Collaboratively built semi-structured content and Artificial Intelligence: The story so far. Artificial Intelligence, 194:2–27.
Sarath Kumar Kondreddi, Peter Triantafillou, and Gerhard Weikum. 2014. Combining Information Extraction and Human Computing for Crowdsourced Knowledge Acquisition. In Proceedings of ICDE, pages 988–999.
Sebastian Krause, Hong Li, Hans Uszkoreit, and Feiyu Xu. 2012. Large-Scale Learning of Relation-Extraction Rules with Distant Supervision from the Web. In Proceedings of ISWC.
Ni Lao, Amarnag Subramanya, Fernando Pereira, and William W. Cohen. 2012. Reading the Web with Learned Syntactic-Semantic Inference Rules. In Proceedings of EMNLP-CoNLL, pages 1017–1026.
Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer. 2014. DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web Journal, pages 1–29.
Thomas Lin, Mausam, and Oren Etzioni. 2012. No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities. In Proceedings of EMNLP-CoNLL, pages 893–903.
Farzaneh Mahdisoltani, Joanna Biega, and Fabian M. Suchanek. 2015. YAGO3: A Knowledge Base from Multilingual Wikipedias. In CIDR.
Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. 2011. DBPedia Spotlight: Shedding Light on the Web of Documents. In Proceedings of I-Semantics, pages 1–8.
Rada Mihalcea and Andras Csomai. 2007. Wikify!: Linking Documents to Encyclopedic Knowledge. In Proceedings of CIKM, pages 233–242.
David Milne and Ian H. Witten. 2013. An Open-Source Toolkit for Mining Wikipedia. Artificial Intelligence, 194:222–239.
Bonan Min, Shuming Shi, Ralph Grishman, and Chin-Yew Lin. 2012. Ensemble Semantics for Large-scale Unsupervised Relation Extraction. In Proceedings of EMNLP-CoNLL, pages 1027–1037.
Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant Supervision for Relation Extraction Without Labeled Data. In Proceedings of ACL-IJCNLP, pages 1003–1011.
Tom M. Mitchell. 2005. Reading the Web: A Breakthrough Goal for AI. AI Magazine.
Andrea Moro and Roberto Navigli. 2012. WiSeNet: Building a Wikipedia-based Semantic Network with Ontologized Relations. In Proceedings of CIKM, pages 1672–1676.
Andrea Moro and Roberto Navigli. 2013. Integrating Syntactic and Semantic Analysis into the Open Information Extraction Paradigm. In Proceedings of IJCAI, pages 2148–2154.
Andrea Moro, Hong Li, Sebastian Krause, Feiyu Xu, Roberto Navigli, and Hans Uszkoreit. 2013. Semantic Rule Filtering for Web-Scale Relation Extraction. In Proceedings of ISWC, pages 347–362.
Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity Linking meets Word Sense Disambiguation: a Unified Approach. TACL, 2:231–244.
Dana Movshovitz-Attias and William W. Cohen. 2015. KB-LDA: Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts. In Proceedings of ACL.
Ndapandula Nakashole, Gerhard Weikum, and Fabian M. Suchanek. 2012. PATTY: A Taxonomy of Relational Patterns with Semantic Types. In Proceedings of EMNLP-CoNLL, pages 1135–1145.
Vivi Nastase and Michael Strube. 2013. Transforming Wikipedia into a Large Scale Multilingual Concept Network. Artificial Intelligence, 194:62–85.
Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 193:217–250.
Roberto Navigli and Paola Velardi. 2010. Learning Word-class Lattices for Definition and Hypernym Extraction. In Proceedings of ACL, pages 1318–1327.
Maria Pershina, Bonan Min, Wei Xu, and Ralph Grishman. 2014. Infusion of Labeled Data into Distant Supervision for Relation Extraction. In Proceedings of ACL, pages 732–738.
Francesco Piccinno and Paolo Ferragina. 2014. From TagME to WAT: a New Entity Annotator. In Proceedings of ERD, pages 55–62.
Simone Paolo Ponzetto and Michael Strube. 2011. Taxonomy Induction Based on a Collaboratively Built Knowledge Repository. Artificial Intelligence, 175(9-10):1737–1756.
Philip Resnik. 1996. Selectional Constraints: An Information-Theoretic Model and its Computational Realization. Cognition, 61(1-2):127–159.
Stephen D. Richardson, William B. Dolan, and Lucy Vanderwende. 1998. MindNet: Acquiring and Structuring Semantic Information from Text. In Proceedings of ACL, pages 1098–1102.
Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling Relations and Their Mentions without Labeled Text. In Proceedings of ECML-PKDD, pages 148–163.
Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation Extraction with Matrix Factorization and Universal Schemas. In Proceedings of NAACL HLT, pages 74–84.
Alan Ritter, Mausam, and Oren Etzioni. 2010. A Latent Dirichlet Allocation Method for Selectional Preferences. In Proceedings of ACL, pages 424–434.
Mark Steedman. 2000. The Syntactic Process. MIT Press, Cambridge, MA, USA.
Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance Multi-label Learning for Relation Extraction. In Proceedings of EMNLP-CoNLL, pages 455–465.
Denny Vrandečić. 2012. Wikidata: A New Platform for Collaborative Data Collection. In Proceedings of WWW, pages 1063–1064.
William Yang Wang, Kathryn Mazaitis, Ni Lao, Tom M. Mitchell, and William W. Cohen. 2015. Efficient Inference and Learning in a Large Knowledge Base - Reasoning with Extracted Information using a Locally Groundable First-Order Probabilistic Logic. Machine Learning, 100(1):101–126.
Fei Wu and Daniel S. Weld. 2007. Autonomously Semantifying Wikipedia. In Proceedings of CIKM, pages 41–50.
Mohamed Yahya, Steven Euijong Whang, Rahul Gupta, and Alon Halevy. 2014. ReNoun: Fact Extraction for Nominal Attributes. In Proceedings of EMNLP, pages 325–335.