Large-Scale Information Extraction from Textual Definitions
through Deep Syntactic and Semantic Analysis

Claudio Delli Bovi, Luca Telesca and Roberto Navigli
Department of Computer Science

Sapienza University of Rome
{dellibovi,navigli}@di.uniroma1.it

luca.telesca@gmail.com

Abstract

We present DEFIE, an approach to large-
scale Information Extraction (IE) based on a
syntactic-semantic analysis of textual defini-
tions. Given a large corpus of definitions we
leverage syntactic dependencies to reduce data
sparsity, then disambiguate the arguments and
content words of the relation strings, and fi-
nally exploit the resulting information to orga-
nize the acquired relations hierarchically. The
output of DEFIE is a high-quality knowledge
base consisting of several million automati-
cally acquired semantic relations.1

1 Introduction

The problem of knowledge acquisition lies at the
core of Natural Language Processing. Recent years
have witnessed the massive exploitation of collabo-
rative, semi-structured information as the ideal mid-
dle ground between high-quality, fully-structured
resources and the larger amount of cheaper (but
noisy) unstructured text (Hovy et al., 2013). Collaborative
projects, like Freebase (Bollacker et al., 2008) and
Wikidata (Vrandečić, 2012), have been under development
for many years and are continuously being improved.
A great deal of research also
focuses on enriching available semi-structured re-
sources, most notably Wikipedia, thereby creating
taxonomies (Ponzetto and Strube, 2011; Flati et al.,
2014), ontologies (Mahdisoltani et al., 2015) and se-
mantic networks (Navigli and Ponzetto, 2012; Nas-
tase and Strube, 2013). These solutions, however,

1http://lcl.uniroma1.it/defie

are inherently constrained to small and often pre-
specified sets of relations. A more radical approach
is adopted in systems like TEXTRUNNER (Etzioni et
al., 2008) and REVERB (Fader et al., 2011), which
developed from the Open Information Extraction
(OIE) paradigm (Etzioni et al., 2008) and focused
on the unconstrained extraction of a large number
of relations from massive unstructured corpora. Ul-
timately, all these endeavors were geared towards
addressing the knowledge acquisition problem and
tackling long-standing challenges in the field, such
as Machine Reading (Mitchell, 2005).

While earlier OIE approaches relied mostly on
dependencies at the level of surface text (Etzioni
et al., 2008; Fader et al., 2011), more recent work
has focused on deeper language understanding at the
level of both syntax and semantics (Nakashole et al.,
2012; Moro and Navigli, 2013) and tackled chal-
lenging linguistic phenomena like synonymy and
polysemy. However, these issues have not yet been
addressed in their entirety. Relation strings are still
bound to surface text, lacking actual semantic con-
tent. Furthermore, most OIE systems do not have
a clear and unified ontological structure and re-
quire additional processing steps, such as statisti-
cal inference mappings (Dutta et al., 2014), graph-
based alignments of relational phrases (Grycner and
Weikum, 2014), or knowledge base unification pro-
cedures (Delli Bovi et al., 2015), in order for their
potential to be exploitable in real applications.

In DEFIE the key idea is to leverage the linguistic
analysis of recent semantically-enhanced OIE tech-
niques while moving from open text to smaller cor-
pora of dense prescriptive knowledge. The aim is


Transactions of the Association for Computational Linguistics, vol. 3, pp. 529–543, 2015. Action Editor: Sebastian Riedel.
Submission batch: 5/2015; Revision batch: 8/2015; Published 10/2015.

c©2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.


Figure 1: Syntactic-semantic graph construction from a textual definition

then to extract as much information as possible by
unifying syntactic analysis and state-of-the-art dis-
ambiguation and entity linking. Using this strategy,
from an input corpus of textual definitions (short and
concise descriptions of a given concept or entity) we
are able to harvest fully disambiguated relation in-
stances on a large scale, and integrate them auto-
matically into a high-quality taxonomy of seman-
tic relations. As a result a large knowledge base is
produced that shows competitive accuracy and cov-
erage against state-of-the-art OIE systems based on
much larger corpora. Our contributions can be sum-
marized as follows:

• We propose an approach to IE that ties together
syntactic dependencies and unified entity link-
ing/word sense disambiguation, designed to
discover semantic relations from a relatively
small corpus of textual definitions;

• We create a large knowledge base of fully
disambiguated relation instances, ranging over
named entities and concepts from available re-
sources like WordNet and Wikipedia;

• We exploit our semantified relation patterns to
automatically build a rich, high-quality relation
taxonomy, showing competitive results against
state-of-the-art approaches.

Our approach comprises three stages. First, we
extract from our input corpus an initial set of seman-
tic relations (Section 2); each relation is then scored
and augmented with semantic type signatures (Sec-
tion 3); finally, the augmented relations are used to
build a relation taxonomy (Section 4).

2 Relation Extraction

Here we describe the first stage of our approach,
where a set of semantic relations is extracted from
the input corpus. In the following, we refer to a re-
lation instance as a triple t = 〈ai, r, aj〉 with ai and
aj being the arguments and r the relation pattern.
From each relation pattern rk the associated relation
Rk is identified by the set of all relation instances
where r = rk. In order to extract a large set of fully
disambiguated relation instances we bring together
syntactic and semantic analysis on a corpus of plain
textual definitions. Each definition is first parsed and
disambiguated (Figure 1a-b, Section 2.1); syntactic
and semantic information is combined into a struc-
tured graph representation (Figure 1c, Section 2.2)
and relation patterns are then extracted as shortest
paths between concept pairs (Section 2.3).

The semantics of our relations draws on BabelNet
(Navigli and Ponzetto, 2012), a wide-coverage mul-
tilingual semantic network obtained from the auto-
matic integration of WordNet, Wikipedia and other
resources. This choice is not mandatory; however,
inasmuch as it is a superset of these resources, Ba-
belNet brings together lexicographic and encyclope-
dic knowledge, enabling us to reach higher coverage
while still being able to accommodate different dis-
ambiguation strategies. For each relation instance t
extracted, both ai,aj and the content words appear-
ing in r are linked to the BabelNet inventory. In the
remainder of the paper we identify BabelNet concepts
or entities using a subscript-superscript notation
where, for instance, band^i_bn refers to the i-th
BabelNet sense for the English word band.



2.1 Textual Definition Processing

The first step of the process is the automatic
extraction of syntactic information (typed depen-
dencies) and semantic information (word senses and
named entity mentions) from each textual definition.
Each definition undergoes the following steps:

Syntactic Analysis. Each textual defini-
tion d is parsed to obtain a dependency graph Gd
(Figure 1a). Parsing is carried out using C&C
(Clark and Curran, 2007), a log-linear parser based
on Combinatory Categorial Grammar (CCG).
Although our algorithm seamlessly works with any
syntactic formalism, CCG rules are especially suited
to longer definitions and linguistic phenomena like
coordinating conjunctions (Steedman, 2000).

Semantic Analysis. Semantic analysis is
based on Babelfy (Moro et al., 2014), a joint, state-
of-the-art approach to entity linking and word sense
disambiguation. Given a lexicalized semantic net-
work as underlying structure, Babelfy uses a dense
subgraph algorithm to identify high-coherence
semantic interpretations of words and multi-word
expressions across an input text. We apply Babelfy
to each definition d, obtaining a sense mapping Sd
from surface text (words and entity mentions) to
word senses and named entities (Figure 1b).

As a matter of fact, any disambiguation or entity
linking strategy can be used at this stage. However,
a knowledge-based unified approach like Babelfy is
best suited to our setting, where context is limited
and exploiting definitional knowledge as much as
possible is key to attaining high-coverage results (as
we show in Section 6.4).

2.2 Syntactic-Semantic Graph Construction

The information extracted by parsing and disambiguating
a given definition d is unified into a syntactic-semantic
graph G^sem_d where concepts and entities identified in d
are arranged in a graph structure encoding their syntactic
dependencies (Figure 1c). We start from the dependency
graph Gd, as provided by the syntactic analysis of d in
Section 2.1. Semantic information from the sense mappings
Sd can be incorporated directly in the vertices of Gd by
attaching available matches between words and senses to
the corresponding vertices. Dependency graphs, however,
encode dependencies solely on a word basis, while our
sense mappings may include multi-word expressions
(e.g. Pink Floyd^1_bn). In order to extract consistent
information, subsets of vertices referring to the same
concept or entity are merged into a single semantic node,
which replaces the subgraph covered in the original
dependency structure. Consider the example in Figure 1:
an entity like Pink Floyd^1_bn covers two distinct and
connected vertices in the dependency graph Gd, one for
the noun Floyd and one for its modifier Pink. In the
actual semantics of the sentence, as encoded in G^sem_d
(Figure 1c), these two vertices are merged into a single
node referring to the entity Pink Floyd^1_bn (the English
rock band), instead of being assigned individual word
interpretations.

Our procedure for building G^sem_d takes as input a typed
dependency graph Gd and a sense mapping Sd, both extracted
from a given definition d. G^sem_d is first populated with
the vertices of Gd referring to disambiguated content
words, merging those vertices covered by the same sense
s ∈ Sd into a single node (like Pink Floyd^1_bn and
Atom Heart Mother^1_bn in Figure 1c). Then, the remaining
vertices and edges are added as in Gd, discarding
non-disambiguated adjuncts and modifiers (like the and
fifth in Figure 1).
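
As an illustration, the merging step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the DEFIE implementation: the function name build_semantic_graph, the edge list, the explicit discard set and the simplified example sentence (without "English band") are all hypothetical, and the dependency edges are written by hand rather than produced by C&C.

```python
from collections import defaultdict

def build_semantic_graph(dep_edges, sense_map, discard):
    """Merge dependency vertices that share a sense into one semantic node.

    dep_edges: (head_word, dependent_word) pairs (hand-written toy input).
    sense_map: word -> sense identifier; all words of a multi-word
               expression map to the same identifier.
    discard:   non-disambiguated adjuncts/modifiers to leave out.
    Returns an undirected adjacency dict over the merged nodes.
    """
    graph = defaultdict(set)
    for head, dep in dep_edges:
        if head in discard or dep in discard:
            continue
        u = sense_map.get(head, head)
        v = sense_map.get(dep, dep)
        if u != v:  # edges internal to a merged expression disappear
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)

# Toy edges for "Atom Heart Mother is the fifth album by Pink Floyd".
DEP_EDGES = [
    ("is", "Mother"), ("Mother", "Atom"), ("Mother", "Heart"),
    ("is", "album"), ("album", "the"), ("album", "fifth"),
    ("album", "by"), ("by", "Floyd"), ("Floyd", "Pink"),
]
SENSES = {
    "Atom": "Atom Heart Mother^1_bn",
    "Heart": "Atom Heart Mother^1_bn",
    "Mother": "Atom Heart Mother^1_bn",
    "album": "album^1_bn",
    "Pink": "Pink Floyd^1_bn",
    "Floyd": "Pink Floyd^1_bn",
}
graph = build_semantic_graph(DEP_EDGES, SENSES, discard={"the", "fifth"})
```

In this sketch the three vertices of Atom Heart Mother collapse into one node, so the verb is ends up adjacent to the entity as a whole rather than to its head word.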

2.3 Relation Pattern Identification

At this stage, all the information in a given definition
d has been extracted and encoded in the corresponding
graph G^sem_d (Section 2.2). We now consider those paths
connecting entity pairs across the graph and extract the
relation pattern r between two entities and/or concepts
as the shortest path between the two corresponding
vertices in G^sem_d. This enables us to exclude less
relevant information (typically carried by adjuncts or
modifiers) and reduce data sparsity in the overall
extraction process.

Our algorithm works as follows: given a textual
definition d, we consider every pair of identified
concepts or entities and compute the corresponding
shortest path in G^sem_d using the Floyd-Warshall
algorithm (Floyd, 1962). The only constraint we enforce
is that resulting paths must include at least one verb
node. This condition filters out meaningless single-node
patterns (e.g. two concepts connected


Algorithm 1 Relation Extraction

procedure EXTRACTRELATIONSFROM(D)
    ℛ := ∅
    for each d in D do
        Gd := dependencyParse(d)
        Sd := disambiguate(d)
        G^sem_d := buildSemanticGraph(Gd, Sd)
        for each 〈si, sj〉 in Sd do
            〈si, rij, sj〉 := shortestPath(si, sj)
            ℛ := ℛ ∪ {〈si, rij, sj〉}
    filterPatterns(ℛ, ρ)
    return ℛ

with a preposition) and, given the prescriptive nature
of d, is unlikely to discard semantically relevant at-
tributes compacted in noun phrases. As an example,
consider the two sentences “Mutter is the third al-
bum by German band Rammstein” and “Atom Heart
Mother is the fifth album by English band Pink
Floyd”. In both cases, two valid shortest-path pat-
terns are extracted. The first extracted shortest-path
pattern is:

X → is → album^1_bn → by → Y

with ai = Mutter^3_bn, aj = Rammstein^1_bn for the first
sentence and ai = Atom Heart Mother^1_bn,
aj = Pink Floyd^1_bn for the second one. The second
extracted shortest-path pattern is:

X → is → Y

with ai = Mutter^3_bn, aj = album^1_bn for the first
sentence and ai = Atom Heart Mother^1_bn, aj = album^1_bn
for the second one. In fact, our extraction process
seamlessly discovers general knowledge (e.g. that
Mutter^3_bn and Atom Heart Mother^1_bn are instances of
the concept album^1_bn) and facts (e.g. that the entities
Rammstein^1_bn and Pink Floyd^1_bn have an isAlbumBy
relation with the two recordings).
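
The two patterns above can be reproduced on a toy graph with the following Python sketch. It is only an illustration under stated assumptions: it uses a breadth-first search for a single pair instead of the Floyd-Warshall algorithm named in the text (both yield shortest paths on an unweighted graph), and the graph, the verb set and the function names are hypothetical.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Breadth-first search; returns the node list from source to target."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None

def relation_pattern(graph, verbs, source, target):
    """Return the pattern between two nodes, or None if it has no verb node."""
    path = shortest_path(graph, source, target)
    if path is None:
        return None
    inner = path[1:-1]
    if not any(node in verbs for node in inner):
        return None  # constraint from the text: at least one verb node
    return " -> ".join(["X"] + inner + ["Y"])

# Hand-built syntactic-semantic graph for the Pink Floyd example.
GRAPH = {
    "Atom Heart Mother^1_bn": {"is"},
    "is": {"Atom Heart Mother^1_bn", "album^1_bn"},
    "album^1_bn": {"is", "by"},
    "by": {"album^1_bn", "Pink Floyd^1_bn"},
    "Pink Floyd^1_bn": {"by"},
}
VERBS = {"is"}

long_pattern = relation_pattern(
    GRAPH, VERBS, "Atom Heart Mother^1_bn", "Pink Floyd^1_bn")
short_pattern = relation_pattern(
    GRAPH, VERBS, "Atom Heart Mother^1_bn", "album^1_bn")
no_verb = relation_pattern(GRAPH, VERBS, "album^1_bn", "Pink Floyd^1_bn")
```

The third call shows the verb constraint at work: album^1_bn and Pink Floyd^1_bn are connected only through the preposition by, so no pattern is returned for that pair.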

Pseudocode for the entire extraction algorithm is shown
in Algorithm 1: given a set of textual definitions D, a
set of relations is generated over the extractions ℛ,
with each relation R ⊂ ℛ comprising relation instances
extracted from D. Each d ∈ D is first parsed and
disambiguated to produce a syntactic-semantic graph
G^sem_d (Sections 2.1-2.2); then all the concept pairs
〈si,sj〉 are examined to detect relation instances as
shortest paths. Finally, we filter out from the resulting
set all relations for which the number of extracted
instances is below a fixed threshold ρ.2 The overall
algorithm extracts over 20 million relation instances in
our experimental setup (Section 5) with almost 256,000
distinct relations.
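
The final filtering step can be sketched as follows. This is a minimal illustration, assuming that relation instances are stored as (argument, pattern, argument) tuples and that a relation is kept when it has at least ρ distinct instances; the function name group_and_filter, the string form of the patterns and the third toy pattern are hypothetical.

```python
from collections import defaultdict

def group_and_filter(triples, rho):
    """Group triples by pattern; drop relations with fewer than rho instances."""
    relations = defaultdict(set)
    for left, pattern, right in triples:
        relations[pattern].add((left, right))
    return {pattern: pairs for pattern, pairs in relations.items()
            if len(pairs) >= rho}

# Toy triples; the last pattern and its arguments are made up.
TRIPLES = [
    ("Mutter^3_bn", "X is album^1_bn by Y", "Rammstein^1_bn"),
    ("Atom Heart Mother^1_bn", "X is album^1_bn by Y", "Pink Floyd^1_bn"),
    ("Mutter^3_bn", "X is Y", "album^1_bn"),
    ("Atom Heart Mother^1_bn", "X is Y", "album^1_bn"),
    ("Mutter^3_bn", "X is recorded in Y", "studio"),
]
kept = group_and_filter(TRIPLES, rho=2)
```

With ρ = 2 the two album patterns survive and the single-instance pattern is discarded; the experiments in the paper use a larger threshold.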

3 Relation Type Signatures and Scoring

We further characterize the semantics of our relations
by computing semantic type signatures for each R ⊂ ℛ,
i.e. by attaching a proper semantic class to both its
domain and range (the sets of arguments occurring on the
left and right of the pattern). As every element in the
domain and range of
R is disambiguated, we retrieve the corresponding
senses and collect their direct hypernyms. Then we
select the hypernym covering the largest subset of
arguments as the representative semantic class for
the domain (or range) of R. We extract hypernyms
using BabelNet, where taxonomic information cov-
ers both general concepts (from the WordNet taxon-
omy (Fellbaum, 1998)) and named entities (from the
Wikipedia Bitaxonomy (Flati et al., 2014)).
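
The selection of a representative semantic class can be sketched in Python as below. This is an illustration only: the hypernym table is a hand-written stand-in for a BabelNet lookup, the entry for Pink Floyd^1_bn is an assumption, and ties are broken alphabetically, which the text does not specify.

```python
from collections import Counter

def semantic_class(arguments, direct_hypernyms):
    """Return the direct hypernym covering the largest number of arguments."""
    counts = Counter()
    for argument in set(arguments):
        counts.update(set(direct_hypernyms.get(argument, ())))
    if not counts:
        return None
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(counts), key=counts.get)

# Hand-written stand-in for BabelNet's taxonomic information.
DIRECT_HYPERNYMS = {
    "Mutter^3_bn": ["album^1_bn"],
    "Atom Heart Mother^1_bn": ["album^1_bn"],
    "Rammstein^1_bn": ["band^2_bn"],
    "Pink Floyd^1_bn": ["band^2_bn"],  # assumed for the example
}
domain = ["Mutter^3_bn", "Atom Heart Mother^1_bn"]
range_ = ["Rammstein^1_bn", "Pink Floyd^1_bn"]
signature = (semantic_class(domain, DIRECT_HYPERNYMS),
             semantic_class(range_, DIRECT_HYPERNYMS))
```

For the pattern X is album^1_bn by Y, this toy input yields album^1_bn as the domain class and band^2_bn as the range class.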

From the distribution of direct hypernyms over
domain and range arguments of R we estimate the
quality of R and associate a confidence value with
its relation pattern r. Intuitively we want to assign
higher confidence to relations where the correspond-
ing distributions have low entropy. For instance, if
both sets have a single hypernym covering all argu-
ments, then R arguably captures a well-defined se-
mantic relation and should be assigned high confi-
dence. For each relation R, we compute:

$$ H_R = -\sum_{i=1}^{n} p(h_i) \log_2 p(h_i) \tag{1} $$

where hi (i = 1, ..., n) are all the distinct argument
hypernyms over the domain and range of R and
probabilities p(hi) are estimated from the proportion of
arguments covered in such sets. The lower HR, the better
defined the semantic types of R are. As a matter of fact,
however, some valid but over-general relations (e.g.
X is a Y, X is used for Y) have inherently high values
of HR. To obtain a balanced score,

2In all the experiments of Section 6 we set ρ = 10.



Pattern                                   Score      Entropy
X directed by Y                           4,025.80   1.74
X known for Y                             2,590.70   3.65
X is election district^1_bn of Y          110.49     0.83
X is composer^1_bn from Y                 39.92      2.08
X is street^1_bn named after Y            1.91       2.24
X is village^2_bn founded in 1912 in Y    0.91       0.18

Table 1: Examples of relation scores

Figure 2: Precision against score(R) (a) and HR (b)

we therefore consider two additional factors, i.e. the
number of extracted instances for R and the length
of the associated pattern r, obtaining the following
empirical measure:

$$ \mathrm{score}(R) = \frac{|S_R|}{(H_R + 1)\,\mathrm{length}(r)} \tag{2} $$

with SR being the set of extracted relation instances
for R. The +1 term keeps the denominator non-zero in
cases where HR = 0. As shown in the examples of Table 1,
relations with rather general patterns (such as
X known for Y) achieve higher scores compared to very
specific ones (like X is village^2_bn founded in 1912
in Y) despite higher entropy values. We validated our
measure on the samples of Section 6.1, computing the
overall precision for different score thresholds. The
monotonic decrease of sample precision in Figure
2a shows that our measure captures the quality of
extracted patterns better than HR (Figure 2b).
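
Equations (1) and (2) can be computed with a few lines of Python. The sketch below rests on assumptions that the text leaves open: p(hi) is estimated as the relative frequency of hi in a flat list holding one hypernym per argument, and length(r) is passed in as a precomputed integer. The function names and the toy numbers are hypothetical.

```python
import math
from collections import Counter

def hypernym_entropy(hypernyms):
    """Equation (1): entropy of the hypernym distribution, in bits."""
    counts = Counter(hypernyms)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def score(num_instances, entropy, pattern_length):
    """Equation (2): |S_R| / ((H_R + 1) * length(r))."""
    return num_instances / ((entropy + 1) * pattern_length)

# One hypernym for all arguments: zero entropy, best-defined types.
h_single = hypernym_entropy(["album^1_bn"] * 4)
# Two equally frequent hypernyms: one bit of entropy.
h_mixed = hypernym_entropy(["album^1_bn", "album^1_bn",
                            "band^2_bn", "band^2_bn"])
# Same number of instances and pattern length, different entropy.
s_single = score(100, h_single, 2)
s_mixed = score(100, h_mixed, 2)
```

With 100 instances and a pattern of length 2, the zero-entropy relation scores twice as high as the one-bit relation, which is the intended effect of the (HR + 1) term in the denominator.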

4 Relation Taxonomization

In the last stage of our approach our set of ex-
tracted relations is arranged automatically in a rela-
tion taxonomy. The process is carried out by com-
paring relations pairwise, looking for hypernymy-
hyponymy relationships between the corresponding
relation patterns; we then build our taxonomy by
connecting with an edge those relation pairs for
which such a relationship is found. Both the relation

Figure 3: Hypernym (a) and substring (b) generalizations

taxonomization procedures described here examine
noun nodes across each relation pattern r, and con-
sider for taxonomization only those relations whose
patterns are identical except for a single noun node.3

4.1 Hypernym Generalization

A direct way of identifying hypernym/hyponym noun nodes
across relation patterns is to analyze the semantic
information attached to them. Given two relation patterns
ri and rj, differing only in respect of the noun nodes ni
and nj, we first look at the associated concepts or
entities, ci and cj, and retrieve the corresponding
hypernym sets, H(ci) and H(cj). Hypernym sets are
obtained by iteratively collecting the superclasses of ci
and cj from the semantic network of BabelNet, up to a
fixed height. For instance, given ci = album^1_bn,
H(ci) = {work of art^1_bn, creation^2_bn, artifact^1_bn},
and given cj = Rammstein^1_bn, H(cj) = {band^2_bn,
musical ensemble^1_bn, organization^1_bn}. Once we have
H(ci) and H(cj), we just check whether cj ∈ H(ci) or
ci ∈ H(cj) (Figure 3a). According to which is the case,
we conclude that rj is a generalization of ri, or that
ri is a generalization of rj.
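
The check described above can be sketched in Python as follows. This is a minimal illustration, not the actual procedure: the PARENTS table is a hand-written stand-in for BabelNet, the assumption that the two example hypernym sets form simple chains is ours, and the function names and the default height of 3 are hypothetical.

```python
def hypernym_set(concept, parents, max_height):
    """Collect the superclasses of a concept, up to max_height levels."""
    found, frontier = set(), {concept}
    for _ in range(max_height):
        frontier = {p for c in frontier for p in parents.get(c, ())} - found
        if not frontier:
            break
        found |= frontier
    return found

def generalization(c_i, c_j, parents, max_height=3):
    """Say which pattern generalizes the other, or None if neither does."""
    if c_j in hypernym_set(c_i, parents, max_height):
        return "r_j generalizes r_i"
    if c_i in hypernym_set(c_j, parents, max_height):
        return "r_i generalizes r_j"
    return None

# Toy taxonomy; the chain ordering is an assumption.
PARENTS = {
    "album^1_bn": ["work of art^1_bn"],
    "work of art^1_bn": ["creation^2_bn"],
    "creation^2_bn": ["artifact^1_bn"],
    "Rammstein^1_bn": ["band^2_bn"],
    "band^2_bn": ["musical ensemble^1_bn"],
    "musical ensemble^1_bn": ["organization^1_bn"],
}
h_album = hypernym_set("album^1_bn", PARENTS, 3)
upward = generalization("album^1_bn", "work of art^1_bn", PARENTS)
downward = generalization("band^2_bn", "Rammstein^1_bn", PARENTS)
unrelated = generalization("album^1_bn", "band^2_bn", PARENTS)
```

Bounding the traversal by max_height mirrors the fixed height mentioned in the text and keeps very general superclasses from linking otherwise unrelated patterns.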

4.2 Substring Generalization

The second procedure focuses on the noun (or compound)
represented by the node. Given two relation patterns, ri
and rj, we apply the following heuristic: from one of the
two nouns, say ni, any adjunct or modifier is removed,
retaining only the head word n̂i. Then, n̂i is compared
with nj and, if n̂i = nj, we assume that the relation rj
is a generalization of ri (Figure 3b).
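
A rough Python sketch of this heuristic is given below. It assumes that the head word of an English compound is its last whitespace-separated token, which is a simplification: in the approach described here the head comes from the syntactic analysis. The function names are hypothetical.

```python
def head_word(noun_phrase):
    """Approximate the head of an English compound by its last token."""
    tokens = noun_phrase.split()
    return tokens[-1] if tokens else ""

def substring_generalizes(n_i, n_j):
    """True if the pattern with noun n_j generalizes the one with n_i."""
    return n_i != n_j and head_word(n_i) == n_j

specific_to_general = substring_generalizes(
    "government-owned corporation", "corporation")
same_noun = substring_generalizes("corporation", "corporation")
different_heads = substring_generalizes("radio station", "railway station")
```

Under this approximation, a pattern such as X is government-owned corporation in Y would be placed below X is corporation in Y.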

3The simplifying assumption here is that two given relation
patterns may be in a hypernymy-hyponymy relationship only
when their plain syntactic structure is equivalent (e.g. is N1 by
and is N2 by, with N1 and N2 being two distinct noun nodes).



                                     DEFIE       NELL       PATTY       REVERB      WISENET    Freebase     DBpedia
Distinct relations                   255,881     298        1,631,531   664,746     245,935    1,894        1,368
Distinct relations (disambiguated)   240,618     -          -           -           -          -            -
Average extractions per relation     81.68       7,013.03   9.68        22.16       9.24       127,727.99   24,451.48
Distinct relation instances          20,352,903  2,089,883  15,802,946  14,728,268  2,271,807  241,897,882  33,449,631
Distinct concepts/entities involved  2,398,982   1,996,021  1,087,907   3,327,425   1,636,307  66,988,232   10,338,501

Table 2: Comparative statistics on the relation extraction process

5 Experimental Setup

Input. The input corpus used for the relation
extraction procedure is the full set of English textual
definitions in BabelNet 2.5 (Navigli and Ponzetto,
2012).4 In fact, any set of textual definitions can be
provided as input to DEFIE, ranging from existing
dictionaries (like WordNet or Wiktionary) to the set
of first sentences of Wikipedia articles.5 As it merges
various resources of this kind, BabelNet provides a
large heterogeneous set comprising definitions from
WordNet, Wikipedia, Wiktionary, Wikidata and
OmegaWiki. To the best of
our knowledge, this set constitutes the largest avail-
able corpus of definitional knowledge. We therefore
worked on a total of 4,357,327 textual definitions
from the English synsets of BabelNet’s knowledge
base. We then used the same version of BabelNet as
the underlying semantic network structure for dis-
ambiguating with Babelfy.6

Statistics. Comparative statistics are shown in
Table 2. DEFIE extracts 20,352,903 relation in-
stances, out of which 13,753,133 feature a fully dis-
ambiguated pattern, yielding an average of 3.15 dis-
ambiguated relation instances extracted from each
definition. After the extraction process, our knowl-
edge base comprises 255,881 distinct semantic re-
lations, 94% of which also have disambiguated
content words in their patterns. DEFIE extracts a
considerably larger number of relation instances than
similar approaches, despite the much smaller amount of
text used. For example, we managed to harvest about 4.5
million relation instances more than PATTY, using a much
smaller corpus (single sentences as opposed to full
Wikipedia articles) and generating a number of distinct
relations that was roughly six times smaller than
PATTY’s. As a result, we obtained an average number of
extractions per relation that was substantially higher
than those of our OIE competitors. This suggests that
DEFIE is able to exploit the nature of textual
definitions effectively and generalize over relation
patterns. Furthermore, our semantic analysis captured
2,398,982 distinct arguments (either concepts or named
entities), outperforming almost all open-text systems
examined.

4babelnet.org
5According to the Wikipedia guidelines, an article should
begin with a short declarative sentence, defining what (or who)
is the subject and why it is notable.
6babelfy.org

Evaluation. All the evaluations carried out in
Section 6 were based on manual assessment by two
human judges, with an inter-annotator agreement, as
measured by Cohen’s kappa coefficient, above 70%
in all cases. In these evaluations we compared DE-
FIE with the following OIE approaches:

• NELL (Carlson et al., 2010) with knowledge
base beliefs updated to November 2014;

• PATTY (Nakashole et al., 2012) with Free-
base types and pattern synsets from the English
Wikipedia dump of June 2011;

• REVERB (Fader et al., 2011), using the set
of normalized relation instances from the
ClueWeb09 dataset;

• WISENET (Moro and Navigli, 2012; Moro and
Navigli, 2013) with relational phrases from the
English Wikipedia dump of December 2012.

In addition, we also compared our knowledge
base with up-to-date human-contributed resources,
namely Freebase (Bollacker et al., 2008) and DBpe-
dia (Lehmann et al., 2014), both from the dumps of
April/May 2014.
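
For reference, the inter-annotator agreement measure mentioned in the Evaluation paragraph, Cohen's kappa, can be computed as in the sketch below. The judgments in the example are invented toy labels (1 = correct, 0 = incorrect) and do not come from the actual evaluation; the function name is hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)

# Invented toy judgments for four relations.
judge_a = [1, 1, 0, 0]
judge_b = [1, 0, 0, 0]
kappa = cohen_kappa(judge_a, judge_b)
```

Here the judges agree on three items out of four (observed agreement 0.75) while chance agreement is 0.5, so kappa corrects the raw agreement downwards.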



         Top 100     Top 250     Rand 100    Rand 250
DEFIE    0.93±0.01   0.91±0.02   0.79±0.02   0.81±0.08
PATTY    0.93±0.05   N/A         0.80±0.08   N/A

Table 3: Precision of relation patterns

           NELL   PATTY   REVERB   WISENET   Freebase   DBpedia
Top 100    .571   .238    .214     .155      .571       .461
Rand 100   .942   .711    .596     .635      .904       .880

Table 4: Novelty of the extracted information

6 Experiments

6.1 Quality of Relations

We first assessed the quality and the semantic
consistency of our relations using manual evaluation.
We ranked our relations according to their
score (Section 3) and then created two samples (of
size 100 and 250 respectively) of the top scoring
relations. In order to evaluate the long tail of less
confident relations, we created another two sam-
ples of the same size with randomly extracted re-
lations. We presented these samples to our human
judges, accompanying each relation with a set of 50
argument pairs and the corresponding textual defini-
tions from BabelNet. For each item in the sample
we asked whether it represented a meaningful rela-
tion and whether the extracted argument pairs were
consistent with this relation and the corresponding
definitions. If the answer was positive, the rela-
tion was considered as correct. Finally we esti-
mated the overall precision of the sample as the
proportion of correct items. Results are reported
in Table 3 and compared to those obtained by our
closest competitor, PATTY, in the setting of Sec-
tion 5. In PATTY the confidence of a given pattern
was estimated from its statistical strength (Nakas-
hole et al., 2012). As shown in Table 3, DEFIE
achieved a comparable level of accuracy in every
sample. An error analysis identified most errors as
related to the vagueness of some short and general
patterns, e.g. X take Y, X make Y. Others were re-
lated to parsing (e.g. in labeling the head word of
complex noun phrases) or disambiguation. In ad-
dition, we used the same samples to estimate the
novelty of the extracted information in compari-
son to currently available resources. We examined
each correct relation pattern and looked manually
for an equivalent relation in the knowledge bases

Gold Standard   DEFIE   WISENET   PATTY   REVERB   Freebase   DBpedia
163             131     129       126     122      69         39

Table 5: Coverage of semantic relations

of both our OIE competitors and human-contributed
resources. For instance, given the relation X born
in Y, NELL and REVERB have the equivalent rela-
tions personborninlocation and is born
in, while Freebase and DBpedia have Place of
birth and birthPlace respectively. We then
computed the proportion of ‘new’ relations among
those previously labeled as correct by our human
judges. Results are shown in Table 4 for both the
top 100 sample and the random sample. The high
proportion of relations not appearing in existing re-
sources (especially across the random samples) sug-
gests that DEFIE is capable of discovering informa-
tion not obtainable from available knowledge bases,
including very specific relations (X is blizzard in Y,
X is Mayan language spoken by Y, X is government-
owned corporation in Y ), as well as general but un-
usual ones (X used by writer of Y ).

6.2 Coverage of Relations

To assess the coverage of DEFIE we first tested
our extracted relations on a public dataset de-
scribed in (Nakashole et al., 2012) and consist-
ing of 163 semantic relations manually annotated
from five Wikipedia pages about musicians. Fol-
lowing the line of previous works (Nakashole et
al., 2012; Moro and Navigli, 2013), for each an-
notation we sought a relation in our knowledge
base carrying the same semantics. Results are re-
ported in Table 5. Consistently with the results
in Table 4, the proportion of novel information
places DEFIE in line with its closest competitors,
achieving a coverage of 80.3% with respect to the
gold standard. Examples of relations not cov-
ered by our competitors are hasFatherInLaw
and hasDaughterInLaw. Furthermore, relations
holding between entities and general concepts (e.g.
critizedFor, praisedFor, sentencedTo),
are captured only by DEFIE and REVERB (which,
however, lacks any argument semantics).

We also assessed the coverage of resources based



Freebase DBpedia NELL
Random 100 83% 81% 89%

Table 6: Coverage of manually curated resources

PATTY WISENET
Random 100 66% 69%

Table 7: Coverage of individual relation instances

            Hyp. Gen.   Substr. Gen.   PATTY (Top)   PATTY (Rand)
Precision   0.87±0.03   0.90±0.02      0.85±0.07     0.62±0.09
# Edges     44,412 (DEFIE overall)     20,339 (PATTY overall)
Density     1.89×10^−6 (DEFIE)         7.64×10^−9 (PATTY)

Table 8: Precision and coverage of the relation taxonomy

on human-defined semantic relations: we extracted
three random samples of 100 relations from Free-
base, DBpedia and NELL and looked for seman-
tically equivalent relations in our knowledge base.
As shown in Table 6, DEFIE reports a coverage be-
tween 81% and 89% depending on the resource, fail-
ing to cover mostly relations that refer to numerical
properties (e.g. numberOfMembers).

Finally, we tested the coverage of DEFIE over in-
dividual relation instances. We selected a random
sample of 100 triples from the two closest com-
petitors exploiting textual corpora, i.e. PATTY and
WISENET. For each selected triple 〈ai, r, aj〉, we
sought an equivalent relation instance in our knowl-
edge base, i.e. one comprising ai and aj and a re-
lation pattern expressing the same semantic relation
of r. Results in Table 7 show a coverage greater
than 65% over both samples. Given the dramatic re-
duction of corpus size and the high precision of the
items extracted, these figures demonstrate that def-
initional knowledge is extremely valuable for rela-
tion extraction approaches. This might suggest that,
even in large-scale OIE-based resources, a substan-
tial amount of knowledge is likely to come from a
rather smaller subset of definitional sentences within
the source corpus.

6.3 Quality of Relation Taxonomization

We evaluated our relation taxonomy by manually
assessing the accuracy of our taxonomization heuris-
tics. Then we compared our results against PATTY,
the only system among our closest competitors that
generates a taxonomy of relations. The setting for
this evaluation was the same as that of Section 6.1.

However, as we lacked a confidence measure in this
case, we just extracted a random sample of 200 hy-
pernym edges for each generalization procedure. We
presented these samples to our human judges and,
for each hypernym edge, we asked whether the cor-
responding pair of relations represented a correct
generalization. We then estimated the overall preci-
sion as the proportion of edges regarded as correct.

Results are reported in Table 8, along with
PATTY’s results in the setting of Section 5; as
PATTY’s edges are ranked by confidence, we consid-
ered both its top confident 100 subsumptions and a
random sample of the same size. As shown in Table
8, DEFIE outperforms PATTY in terms of precision,
and generates more than twice the number of edges
overall. HARPY (Grycner and Weikum, 2014) en-
riches PATTY’s taxonomy with 616,792 hypernym
edges, but its alignment algorithm, in the setting
of Section 5, also includes transitive edges and still
yields a sparser taxonomy compared to ours, with a
graph density of 2.32×10^−7. Generalization errors in our
taxonomy are mostly related to disambiguation errors or
flaws in the Wikipedia Bitaxonomy (e.g. the concept
Titular Church^1_bn marked as hyponym of Cardinal^1_bn).

6.4 Quality of Entity Linking and
Disambiguation

We evaluated the disambiguation stage of DEFIE
(Section 2.1) by comparing Babelfy against other
state-of-the-art entity linking systems. In order to
compare different disambiguation outputs we se-
lected a random sample of 60,000 glosses from the
input corpus of textual definitions (Section 5) and
ran the relation extraction algorithm (Sections 2.1-
2.3) using a different competitor in the disambigua-
tion step each time. We eventually used the map-
pings in BabelNet to express each output using a
common dictionary and sense inventory.

The coverage obtained by each competitor was as-
sessed by looking at the number of distinct relations
extracted in the process, the total number of relation
instances extracted, the number of distinct concepts
or entities involved, and the average number of se-
mantic nodes within the relation patterns. For each
competitor, we also assessed the precision obtained
by evaluating the quality and semantic consistency
of the relation patterns, in the same manner as in



                    # Relations   # Triples   # Entities   Average Sem. Nodes
Babelfy             96,434        233,517     79,998       2.37
TagME 2.0           88,638        226,905     89,318       1.67
WAT                 24,083        56,503      38,147       0.39
DBpedia Spotlight   67,377        140,711     38,254       1.45
Wikipedia Miner     39,547        88,777      37,036       0.96

Table 9: Coverage for different disambiguation systems

Relations Relation instances
Babelfy 82.3% 76.6%
TagME 2.0 76.0% 62.0%
WAT 84.6% 72.6%
DBpedia Spotlight 70.5% 62.6%
Wikipedia Miner 71.7% 56.0%

Table 10: Precision for different disambiguation systems

Section 6.1, both at the level of semantic relations
(on the top 150 relation patterns) and at the level
of individual relation instances (on a randomly ex-
tracted sample of 150 triples). Results are shown in
Tables 9 and 10 for Babelfy and the following sys-
tems:

• TagME 2.0⁷ (Ferragina and Scaiella, 2012),
which links text fragments to Wikipedia based
on measures like sense commonness and
keyphraseness (Mihalcea and Csomai, 2007);

• WAT (Piccinno and Ferragina, 2014), an en-
tity annotator that improves over TagME and
features a re-designed spotting, disambiguation
and pruning pipeline;

• DBpedia Spotlight⁸ (Mendes et al., 2011),
which annotates text documents with DBpedia
URIs using scores such as prominence, topical
relevance and contextual ambiguity;

• Wikipedia Miner⁹ (Milne and Witten, 2013),
which combines parallelized processing of
Wikipedia dumps, relatedness measures and
annotation features.
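
The four coverage figures described above can be
computed directly from a list of extracted triples.
The following is a minimal sketch, not the evaluation
code used in this work: the Triple class, the "bn:"
prefix used to recognize disambiguated tokens and the
toy sense identifiers are all assumptions made for
illustration, and the average is taken over distinct
patterns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str    # disambiguated left argument (hypothetical sense id)
    pattern: tuple  # relation pattern as a tuple of tokens
    obj: str        # disambiguated right argument (hypothetical sense id)


def is_semantic_node(token: str) -> bool:
    # Assumption: disambiguated content words carry a "bn:" prefix.
    return token.startswith("bn:")


def coverage(triples):
    """Compute the four coverage figures for a list of triples."""
    relations = {t.pattern for t in triples}
    entities = {t.subject for t in triples} | {t.obj for t in triples}
    # Number of semantic nodes in each distinct relation pattern.
    nodes = [sum(is_semantic_node(tok) for tok in p) for p in relations]
    return {
        "relations": len(relations),
        "triples": len(triples),
        "entities": len(entities),
        "avg_sem_nodes": sum(nodes) / len(nodes) if nodes else 0.0,
    }


sample = [
    Triple("bn:a", ("is", "bn:station_rail", "in"), "bn:x"),
    Triple("bn:b", ("is", "bn:station_rail", "in"), "bn:y"),
    Triple("bn:c", ("is", "station", "in"), "bn:x"),
]
stats = coverage(sample)
```

In this toy sample the pattern with an undisambiguated
"station" contributes zero semantic nodes, which is the
effect that lowers the average for systems linking
fewer content words.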

As shown in Table 9, Babelfy outperforms all its
competitors in terms of coverage and, due to its
unified word sense disambiguation and entity linking
approach, extracts semantically richer patterns
with 2.37 semantic nodes on average per sentence.
This is reflected in the quality of semantic relations,
reported in Table 10, with an overall increase
of precision both in terms of relations and in terms
of individual instances. Even though WAT shows
slightly higher precision over relations, its
considerably lower coverage yields semantically poor
patterns (0.39 semantic nodes on average) and
affects the overall quality of relations, where some
ambiguity is necessarily retained. As an example,
the pattern X is station in Y, extracted from WAT’s
disambiguation output, covers both railway stations
and radio broadcasts. Babelfy produces, instead,
two distinct relation patterns, one for each sense,
tagging station as railway station^1_bn for the
former and station^5_bn for the latter.

⁷ tagme.di.unipi.it
⁸ spotlight.dbpedia.org
⁹ wikipediadataminer.cms.waikato.ac.nz

            # Definitions  Proportion (%)
Wikipedia       3 899 087           89.50
Wikidata          364 484            8.35
WordNet            41 356            0.95
Wiktionary         39 383            0.90
OmegaWiki          13 017            0.30

Table 11: Composition of the input corpus by source

            # Relations  # Relation instances  Avg. Extractions
Wikipedia       251 954            19 455 992             77.58
Wikidata          5 414             1 033 732            191.01
WordNet           2 260               128 200             56.73
Wiktionary        2 863               143 990             50.52
OmegaWiki         1 168                45 818             39.45

Table 12: Impact of each source on the extraction step

6.5 Impact of Definition Sources

We carried out an empirical analysis over the
input corpus in our experimental setup, studying
the impact of each source of textual definitions
in isolation. In fact, as explained in Section 5,
BabelNet’s textual definitions come from various
resources: WordNet, Wikipedia, Wikidata, Wik-
tionary and OmegaWiki. Table 11 shows the com-
position of the input corpus with respect to each of
these definition sources. The distribution is rather
skewed, with the vast majority of definitions coming
from Wikipedia (almost 90% of the input corpus).

           # Wikipages  # Sentences  # Extractions  Precision
All             14 072      225 867         39 684      61.8%
Top 100         10 334      161 769         13 687      59.0%

Table 13: Extraction results over non-definitional text

                     # Relation instances  # Relations  # Edges
PATTY (definitions)             3 212 065       41 593    4 785
PATTY (Wikipedia)              15 802 946    1 631 531   20 339
Our system                     20 807 732      255 881   44 412

Table 14: Performance of PATTY on definitional data

We ran the relation extraction algorithm (Sections
2.1-2.3) on each subset of the input corpus. As in
previous experiments, we report the number of
relation instances extracted, the number of distinct
relations, and the average number of extractions for
each relation. Results, as shown in Table 12, are
consistent with the composition of the input corpus
in Table 11: by relying solely on Wikipedia’s
first sentences, the extraction algorithm discovered
98% of all the distinct relations identified across
the whole input corpus, and 93% of the total number
of extracted instances. Wikidata provides more
than 1 million extractions (5% of the total) but
definitions are rather short and many of them (44.2%)
generate only is-a relation instances. The remaining
sources (WordNet, Wiktionary, OmegaWiki) account
for less than 2% of the extractions.
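
The shares quoted above follow from the counts in
Table 12. The short sketch below recomputes them; the
variable and function names are ours, and the total
number of distinct relations is taken from the "Our
system" row of Table 14, since per-source relation
counts cannot simply be summed (the same relation may
be extracted from several sources).

```python
# (distinct relations, relation instances) per source, copied from Table 12.
TABLE_12 = {
    "Wikipedia": (251_954, 19_455_992),
    "Wikidata": (5_414, 1_033_732),
    "WordNet": (2_260, 128_200),
    "Wiktionary": (2_863, 143_990),
    "OmegaWiki": (1_168, 45_818),
}
# Distinct relations over the whole corpus ("Our system" row of Table 14).
TOTAL_RELATIONS = 255_881

# Instances are disjoint across sources, so they can be summed.
total_instances = sum(instances for _, instances in TABLE_12.values())


def instance_share(source: str) -> float:
    """Percentage of all relation instances extracted from `source`."""
    return 100.0 * TABLE_12[source][1] / total_instances


def relation_share(source: str) -> float:
    """Percentage of all distinct relations found in `source` alone."""
    return 100.0 * TABLE_12[source][0] / TOTAL_RELATIONS


minor_share = sum(
    instance_share(s) for s in ("WordNet", "Wiktionary", "OmegaWiki")
)

for name in TABLE_12:
    print(f"{name:<12}{instance_share(name):6.2f}% of instances")
```

The per-source instance counts add up to the total
reported for our system in Table 14.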

6.6 Impact of the Approach vs. Impact of the Data

DEFIE’s relation extraction algorithm is explicitly
designed to target textual definitions. Hence, the
results it achieves stem from the joint contribution
of two key features: an OIE approach and the
use of definitional data. In order to decouple
these two factors and study their respective
impacts, we carried out two experiments: first we
applied DEFIE to a sample of non-definitional text;
then we applied our closest competitor, PATTY,
to the same definition corpus described in Section 5.

Source                 Label                       Target
enzyme^1_bn            catalyzes reaction^1_bn of  chemical^1_bn
album^1_bn             recorded by                 rock group^1_bn
officer^1_bn           commanded brigade^1_bn of   army unit^1_bn
bridge^1_bn            crosses over                river^1_bn
academic journal^1_bn  covers research^1_bn in     science^1_bn
organization^1_bn      has headquarters^3_bn in    city^1_bn

Table 15: Examples of augmented semantic edges

Extraction from non-definitional text. We
selected a random sample of Wikipedia pages from
the English Wikipedia dump of October 2012. We
processed each sentence as in Sections 2.1-2.2 and
extracted instances of those relations produced by
DEFIE in the original definitional setting (Section
5); we then automatically filtered out those instances
where the arguments’ hypernyms did not agree with
the semantic types of the relation. We manually
evaluated the quality of extractions on a sample of
100 items (as in Section 6.1) for both the full set of
extracted instances and for the subset of extractions
from the top 100 scoring relations. Results are
reported in Table 13: in both cases, precision figures
show that extraction quality drops consistently in
comparison to Section 6.1, suggesting that our
extraction approach by itself is less accurate when
moving to more complex sentences (with, e.g.,
subordinate clauses or coreferences).
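
The automatic filtering step mentioned above can be
pictured as a type check against the hypernym
hierarchy. The sketch below is only an illustration
under simplifying assumptions: the hypernym map, the
concept names and the function names are hypothetical,
and a relation is assumed to have a single type
signature made of one semantic type per argument.

```python
# Toy hypernym map: concept -> direct hypernyms (hypothetical identifiers).
HYPERNYMS = {
    "Tower_Bridge": {"bridge"},
    "Thames": {"river"},
    "London": {"city"},
    "bridge": {"structure"},
    "river": {"stream"},
}


def ancestors(concept):
    """Return the concept together with all of its transitive hypernyms."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in HYPERNYMS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def agrees(instance, signature):
    """True if both arguments fall under the relation's semantic types."""
    (left, right), (left_type, right_type) = instance, signature
    return left_type in ancestors(left) and right_type in ancestors(right)


# Assumed type signature for a pattern such as "X crosses over Y".
signature = ("bridge", "river")
candidates = [("Tower_Bridge", "Thames"), ("Tower_Bridge", "London")]
kept = [c for c in candidates if agrees(c, signature)]
```

Here only the first candidate survives, because the
second argument of the other one has no hypernym
matching the expected type.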

PATTY on textual definitions. Since no open-
source implementation of PATTY is available, we
implemented a version of the algorithm which uses
Babelfy for named entity disambiguation. We
then ran it on our corpus of BabelNet definitions
and compared the results against those originally ob-
tained by PATTY (on the entire Wikipedia corpus)
and those obtained by DEFIE. Figures are reported
in Table 14 in terms of number of extracted relation
instances, distinct relations and hypernym edges in
the relation taxonomy. Results show that the dra-
matic reduction of corpus size affects the support
sets of PATTY’s relations, worsening both coverage
and generalization capability.
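
To see why smaller support sets hurt generalization,
consider a deliberately simplified sketch in which a
relation is placed under another one whenever its
support set (the argument pairs it was observed with)
is a proper subset of the other's. This is not PATTY's
actual algorithm, which is more elaborate; the relation
strings and argument pairs are invented for
illustration.

```python
def inclusion_edges(support):
    """Return (specific, general) pairs based on strict set inclusion."""
    edges = []
    for specific, pairs_a in support.items():
        for general, pairs_b in support.items():
            # `<` on sets tests for a proper subset.
            if specific != general and pairs_a < pairs_b:
                edges.append((specific, general))
    return sorted(edges)


# Support sets observed on a large corpus (hypothetical data).
large = {
    "X sings Y": {("p1", "s1"), ("p2", "s2")},
    "X performs Y": {("p1", "s1"), ("p2", "s2"), ("p3", "s3")},
}
# The same relations on a much smaller corpus: fewer shared pairs.
small = {
    "X sings Y": {("p1", "s1")},
    "X performs Y": {("p3", "s3")},
}

edges_large = inclusion_edges(large)
edges_small = inclusion_edges(small)
```

With the large corpus one hypernym edge is found; with
the small one the support sets no longer overlap and
the edge disappears.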

6.7 Preliminary Study: Resource Enrichment

To further investigate the potential of our ap-
proach, we explored the application of DEFIE to
the enrichment of existing resources. We focused
on BabelNet as a case study. In BabelNet’s semantic
network, nodes representing concepts and entities
are only connected via lexicographic relationships
from WordNet (hypernymy, meronymy, etc.)
or unlabeled edges derived from Wikipedia hyper-
links. Our extraction algorithm has the potential to
provide useful information to both augment unla-
beled edges with labels and explicit semantic con-
tent, and create additional connections based on se-
mantic relations. Examples are shown in Table 15.

                    # Concept pairs  # Unlabeled  # Labeled
Type signatures               1 403          299         90
Relation instances        8 493 588    3 401 677    551 331

Table 16: Concept pairs and associated edges in BabelNet

We carried out a preliminary analysis over all dis-
ambiguated relations with at least 10 extracted in-
stances. For each relation pattern r, we first exam-
ined the concept pairs associated with its type signa-
tures and looked in BabelNet for an unlabeled edge
connecting the pair. Then we examined the whole
set of extracted relation instances in R and looked in
BabelNet for an unlabeled edge connecting the argu-
ments ai and aj. Results in Table 16 show that only
27.7% of the concept pairs representing relation type
signatures are connected in BabelNet, and most of
these connections are unlabeled. By the same token,
more than 4 million distinct argument pairs (53.5%)
do not share any edge in the semantic network and,
among those that do, less than 14% have a labeled
relationship. These proportions suggest that our re-
lations provide a potential enrichment of the under-
lying knowledge base in terms of both connectivity
and labeling of existing edges. In BabelNet, our case
study, cross-resource mappings might also propa-
gate this information across other knowledge bases
and rephrase semantic relations in terms of, e.g., au-
tomatically generated Wikipedia hyperlinks.
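
The proportions discussed above can be recomputed from
the counts in Table 16. The sketch below also shows
one possible way of classifying a concept pair against
a set of edges; the function names, the toy edges and
the choice of ignoring edge direction are assumptions
made for illustration, not a description of how
BabelNet was actually queried.

```python
def edge_status(pair, unlabeled_edges, labeled_edges):
    """Classify a concept pair as 'labeled', 'unlabeled' or 'missing'."""
    key = frozenset(pair)  # assumption: edge direction is ignored
    if key in labeled_edges:
        return "labeled"
    if key in unlabeled_edges:
        return "unlabeled"
    return "missing"


def summarize(pairs, unlabeled, labeled):
    """Derive connectivity percentages from raw counts."""
    connected = unlabeled + labeled
    return {
        "connected_pct": 100.0 * connected / pairs,
        "unconnected": pairs - connected,
        "unconnected_pct": 100.0 * (pairs - connected) / pairs,
        "labeled_among_connected_pct": 100.0 * labeled / connected,
    }


# Toy graph with hypothetical concept names.
unlabeled_edges = {frozenset(("bridge", "river"))}
labeled_edges = {frozenset(("enzyme", "protein"))}
statuses = [
    edge_status(p, unlabeled_edges, labeled_edges)
    for p in [("river", "bridge"), ("enzyme", "protein"), ("album", "city")]
]

# Counts copied from Table 16: (# concept pairs, # unlabeled, # labeled).
signatures = summarize(1_403, 299, 90)
instances = summarize(8_493_588, 3_401_677, 551_331)
```

Pairs classified as missing are candidates for new
connections, while pairs with an unlabeled edge are
candidates for labeling.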

7 Related Work

From the earliest days, OIE systems had to cope
with the size and heterogeneity of huge unstructured
sources of text. The first systems employed
statistical techniques and relied heavily on
information redundancy. Then, as soon as semi-
structured resources came into play (Hovy et al.,
2013), researchers started developing learning sys-
tems based on self-supervision (Wu and Weld, 2007)
and distant supervision (Mintz et al., 2009; Krause
et al., 2012). Crucial issues in distant supervision,
like noisy training data, have been addressed in var-
ious ways: probabilistic graphical models (Riedel
et al., 2010; Hoffmann et al., 2011), sophisticated
multi-instance learning algorithms (Surdeanu et al.,
2012), matrix factorization techniques (Riedel et al.,
2013), labeled data infusion (Pershina et al., 2014)
or crowd-based human computing (Kondreddi et al.,
2014). A different strategy consists of moving from
open text extraction to more constrained settings.
For instance, the KNOWLEDGE VAULT (Dong et
al., 2014) combines Web-scale extraction with prior
knowledge from existing knowledge bases; BIPER-
PEDIA (Gupta et al., 2014) relies on schema-level
attributes from the query stream in order to create an
ontology of class-attribute pairs; RENOUN (Yahya
et al., 2014) in turn exploits BIPERPEDIA to extract
facts expressed as noun phrases. DEFIE focuses, in-
stead, on smaller and denser corpora of prescriptive
knowledge. Although early works, such as MindNet
(Richardson et al., 1998), had already highlighted
the potential of textual definitions for extracting re-
liable semantic information, no OIE approach to the
best of our knowledge has exploited definitional data
to extract and disambiguate a large knowledge base
of semantic relations. The direction of most papers
(especially in the recent OIE literature) seems rather
the opposite, namely, to target Web-scale corpora.
In contrast, we manage to extract a large amount of
high-quality information by combining an unsupervised
OIE approach with definitional data.

A deeper linguistic analysis constitutes the fo-
cus of many OIE approaches. Syntactic dependen-
cies are used to construct general relation patterns
(Nakashole et al., 2012), or to improve the qual-
ity of surface pattern realizations (Moro and Nav-
igli, 2013). Phenomena like synonymy and poly-
semy have been addressed with kernel-based simi-
larity measures and soft clustering techniques (Min
et al., 2012; Moro and Navigli, 2013), or exploiting
the semantic types of relation arguments (Nakashole
et al., 2012; Moro and Navigli, 2012). An appro-
priate modeling of semantic types (e.g. selectional
preferences) constitutes a line of research by itself,
rooted in earlier works like Resnik (1996) and
focused on either class-based (Clark and Weir, 2002)
or similarity-based (Erk, 2007) approaches. However,
these methods are used to model the semantics
of verbs rather than arbitrary patterns. More
recently, some strategies based on topic modeling have
been proposed, either to infer latent relation seman-
tic types from OIE relations (Ritter et al., 2010), or
to directly learn an ontological structure from a start-
ing set of relation instances (Movshovitz-Attias and
Cohen, 2015). However, the knowledge generated
is often hard to interpret and integrate with existing
knowledge bases without human intervention (Ritter
et al., 2010). In this respect, the semantic
predicates proposed by Flati and Navigli (2013) seem to
be more promising.

A novelty in our approach is that issues like poly-
semy and synonymy are explicitly addressed with a
unified entity linking and disambiguation algorithm.
By incorporating explicit semantic content in our re-
lation patterns, not only do we make relations less
ambiguous, but we also abstract away from specific
lexicalizations of the content words and merge to-
gether many patterns conveying the same semantics.
Rather than using plain dependencies we also inject
explicit semantic content into the dependency graph
to generate a unified syntactic-semantic representa-
tion. Previous works (Moro et al., 2013) used simi-
lar semantic graph representations to produce filter-
ing rules for relation extraction, but they required
a starting set of relation patterns and did not exploit
syntactic information. A joint syntactic-semantic
analysis of text was used in works such as
Lao et al. (2012), but they addressed a substantially
different task (inference for knowledge base
completion) and assumed a radically different
setting, with a predefined starting set of semantic
relations from a given knowledge base. As we adopt
an OIE approach, we do not have such requirements
and directly process the input text via parsing
and disambiguation. This enables DEFIE to gener-
ate relations already integrated with resources like
WordNet and Wikipedia, without additional align-
ment steps (Grycner and Weikum, 2014), or seman-
tic type propagations (Lin et al., 2012). As shown
in Section 6.3, explicit semantic content within re-
lation patterns underpins a rich and high-quality re-
lation taxonomy, whereas generalization in (Nakas-
hole et al., 2012) is limited to support set inclusion
and leads to sparser and less accurate results.

8 Conclusion and Future Work

We presented DEFIE, an approach to OIE that,
thanks to a novel unified syntactic-semantic analy-
sis of text, harvests instances of semantic relations
from a corpus of textual definitions. DEFIE ex-
tracts knowledge on a large scale, reducing data
sparsity and disambiguating both arguments and
relation patterns at the same time. Unlike previous
semantically-enhanced approaches, mostly relying
on the semantics of argument types, DEFIE is able
to semantify relation phrases as well, by providing
explicit links to the underlying knowledge base. We
leveraged an input corpus of 4.3 million definitions
and extracted over 20 million relation instances,
with more than 250,000 distinct relations and almost
2.4 million concepts and entities involved. From
these relations we automatically constructed a high-
quality relation taxonomy by exploiting the explicit
semantic content of the relation patterns. In the
resulting knowledge base concepts and entities are
linked to existing resources, such as WordNet and
Wikipedia, via the BabelNet semantic network. We
evaluated DEFIE in terms of precision, coverage,
novelty of information in comparison to existing re-
sources and quality of disambiguation, and we com-
pared our relation taxonomy against state-of-the-art
systems obtaining highly competitive results.

A key feature of our approach is its deep
syntactic-semantic analysis targeted to textual def-
initions. In contrast to our competitors, where syn-
tactic constraints are necessary in order to keep pre-
cision high when dealing with noisy data, DEFIE
shows comparable (or greater) performance by
exploiting a dense, noise-free definitional setting.
DEFIE generates a large knowledge base, in line with
collaboratively-built resources and state-of-the-art
OIE systems, but uses a much smaller amount of
input data: our corpus of definitions comprises fewer
than 83 million tokens overall, while other OIE
systems exploit massive corpora like Wikipedia
(typically more than 1.5 billion tokens), ClueWeb (more
than 33 billion tokens), or the Web itself. Fur-
thermore, our semantic analysis based on Babelfy
enables the discovery of semantic connections be-
tween both general concepts and named entities,
with the potential to enrich existing structured and
semi-structured resources, as we showed in a pre-
liminary study on BabelNet (cf. Section 6.7).

As the next step, we plan to apply DEFIE to open
text and integrate it with definition extraction and
automatic gloss finding algorithms (Navigli and Ve-
lardi, 2010; Dalvi et al., 2015). Also, by further ex-
ploiting the underlying knowledge base, inference
and learning techniques (Lao et al., 2012; Wang et
al., 2015) can be applied to complement our model,
generating new triples or correcting wrong ones.
Finally, another future perspective is to leverage the
increasingly large variety of multilingual resources,
like BabelNet, and move towards the modeling of
language-independent relations.

Acknowledgments

The authors gratefully acknowledge the support of
the ERC Starting Grant MultiJEDI No. 259234.

This research was also partially supported by
Google through a Faculty Research Award granted
in July 2012.

References
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim
Sturge, and Jamie Taylor. 2008. Freebase: A
Collaboratively Created Graph Database For Structuring
Human Knowledge. In Proceedings of SIGMOD, pages
1247–1250.

Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr
Settles, Estevam R. Hruschka Jr., and Tom M.
Mitchell. 2010. Toward an Architecture for Never-
Ending Language Learning. In Proceedings of AAAI,
pages 1306–1313.

Stephen Clark and James R. Curran. 2007. Wide-
coverage Efficient Statistical Parsing with CCG and
Log-Linear Models. Computational Linguistics,
33(4):493–552.

Stephen Clark and David Weir. 2002. Class-Based Prob-
ability Estimation Using a Semantic Hierarchy. Com-
putational Linguistics, 28(2):187–206.

Bhavana Dalvi, Einat Minkov, Partha P. Talukdar, and
William W. Cohen. 2015. Automatic Gloss Finding
for a Knowledge Base using Ontological Constraints.
In Proceedings of WSDM, pages 369–378.

Claudio Delli Bovi, Luis Espinosa Anke, and Roberto
Navigli. 2015. Knowledge Base Unification via Sense
Embeddings and Disambiguation. In Proceedings of
EMNLP, pages 726–736.

Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko
Horn, Ni Lao, Kevin Murphy, Thomas Strohmann,
Shaohua Sun, and Wei Zhang. 2014. Knowledge
Vault: a Web-Scale Approach to Probabilistic Knowl-
edge Fusion. In Proceedings of KDD, pages 601–610.

Arnab Dutta, Christian Meilicke, and Simone Paolo
Ponzetto. 2014. A Probabilistic Approach for Inte-
grating Heterogeneous Knowledge Sources. In Pro-
ceedings of ESWC, pages 286–301.

Katrin Erk. 2007. A Simple, Similarity-based Model for
Selectional Preferences. In Proceedings of ACL, pages
216–223.

Oren Etzioni, Michele Banko, Stephen Soderland, and
Daniel S. Weld. 2008. Open Information Extraction
from the Web. Commun. ACM, 51(12):68–74.

Anthony Fader, Stephen Soderland, and Oren Etzioni.
2011. Identifying Relations for Open Information
Extraction. In Proceedings of EMNLP, pages 1535–
1545.

Christiane Fellbaum. 1998. WordNet: An Electronic
Lexical Database. Bradford Books.

Paolo Ferragina and Ugo Scaiella. 2012. Fast and Accu-
rate Annotation of Short Texts with Wikipedia Pages.
IEEE Software, 29(1):70–75.

Tiziano Flati and Roberto Navigli. 2013. SPred: Large-
scale Harvesting of Semantic Predicates. In Proceed-
ings of ACL, pages 1222–1232.

Tiziano Flati, Daniele Vannella, Tommaso Pasini, and
Roberto Navigli. 2014. Two Is Bigger (and Better)
Than One: the Wikipedia Bitaxonomy Project. In Pro-
ceedings of ACL, pages 945–955.

Robert W. Floyd. 1962. Algorithm 97: Shortest Path.
Communications of the ACM, 5(6):345–345.

Adam Grycner and Gerhard Weikum. 2014. HARPY:
Hypernyms and Alignment of Relational Paraphrases.
In Proceedings of COLING, pages 2195–2204.

Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Eui-
jong Whang, and Fei Wu. 2014. Biperpedia: An
Ontology for Search Applications. In Proceedings of
VLDB, pages 505–516.

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke
Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-
based Weak Supervision for Information Extraction
of Overlapping Relations. In Proceedings of NAACL
HLT, pages 541–550.

Eduard Hovy, Roberto Navigli, and Simone Paolo
Ponzetto. 2013. Collaboratively built semi-structured
content and Artificial Intelligence: The story so far.
Artificial Intelligence, 194:2–27.

Sarath Kumar Kondreddi, Peter Triantafillou, and Ger-
hard Weikum. 2014. Combining Information Extrac-
tion and Human Computing for Crowdsourced Knowl-
edge Acquisition. In Proceedings of ICDE, pages
988–999.

Sebastian Krause, Hong Li, Hans Uszkoreit, and Feiyu
Xu. 2012. Large-Scale Learning of Relation-
Extraction Rules with Distant Supervision from the
Web. In Proceedings of ISWC.

Ni Lao, Amarnag Subramanya, Fernando Pereira, and
William W. Cohen. 2012. Reading the Web with
Learned Syntactic-Semantic Inference Rules. In Pro-
ceedings of EMNLP-CoNLL, pages 1017–1026.

Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch,
Dimitris Kontokostas, Pablo N. Mendes, Sebastian
Hellmann, Mohamed Morsey, Patrick van Kleef,
Sören Auer, and Christian Bizer. 2014. DBpedia - A
Large-scale, Multilingual Knowledge Base Extracted
from Wikipedia. Semantic Web Journal, pages 1–29.

Thomas Lin, Mausam, and Oren Etzioni. 2012. No
Noun Phrase Left Behind: Detecting and Typing Un-
linkable Entities. In Proceedings of EMNLP-CoNLL,
pages 893–903.

Farzaneh Mahdisoltani, Joanna Biega, and Fabian M.
Suchanek. 2015. YAGO3: A Knowledge Base from
Multilingual Wikipedias. In CIDR.

Pablo N. Mendes, Max Jakob, Andrés García-Silva, and
Christian Bizer. 2011. DBPedia Spotlight: Shedding
Light on the Web of Documents. In Proceedings of
I-Semantics, pages 1–8.

Rada Mihalcea and Andras Csomai. 2007. Wikify!:
Linking Documents to Encyclopedic Knowledge. In
Proceedings of CIKM, pages 233–242.

David Milne and Ian H. Witten. 2013. An Open-Source
Toolkit for Mining Wikipedia. Artificial Intelligence,
194:222–239.

Bonan Min, Shuming Shi, Ralph Grishman, and Chin-
Yew Lin. 2012. Ensemble Semantics for Large-scale
Unsupervised Relation Extraction. In Proceedings of
EMNLP-CoNLL, pages 1027–1037.

Mike Mintz, Steven Bills, Rion Snow, and Dan Juraf-
sky. 2009. Distant Supervision for Relation Extrac-
tion Without Labeled Data. In Proceedings of ACL-
IJCNLP, pages 1003–1011.

Tom M. Mitchell. 2005. Reading the Web: A Break-
through Goal for AI. AI Magazine.

Andrea Moro and Roberto Navigli. 2012. WiSeNet:
Building a Wikipedia-based Semantic Network with
Ontologized Relations. In Proceedings of CIKM,
pages 1672–1676.

Andrea Moro and Roberto Navigli. 2013. Integrating
Syntactic and Semantic Analysis into the Open Infor-
mation Extraction Paradigm. In Proceedings of IJCAI,
pages 2148–2154.

Andrea Moro, Hong Li, Sebastian Krause, Feiyu Xu,
Roberto Navigli, and Hans Uszkoreit. 2013. Semantic
Rule Filtering for Web-Scale Relation Extraction. In
Proceedings of ISWC, pages 347–362.

Andrea Moro, Alessandro Raganato, and Roberto Nav-
igli. 2014. Entity Linking meets Word Sense Disam-
biguation: a Unified Approach. TACL, 2:231–244.

Dana Movshovitz-Attias and William W. Cohen. 2015.
KB-LDA: Jointly Learning a Knowledge Base of Hi-
erarchy, Relations, and Facts. In Proceedings of ACL.

Ndapandula Nakashole, Gerhard Weikum, and Fabian M.
Suchanek. 2012. PATTY: A Taxonomy of Rela-
tional Patterns with Semantic Types. In Proceedings
of EMNLP-CoNLL, pages 1135–1145.

Vivi Nastase and Michael Strube. 2013. Transform-
ing Wikipedia into a Large Scale Multilingual Concept
Network. Artificial Intelligence, 194:62–85.

Roberto Navigli and Simone Paolo Ponzetto. 2012. Ba-
belNet: The Automatic Construction, Evaluation and
Application of a Wide-Coverage Multilingual Seman-
tic Network. Artificial Intelligence, 193:217–250.

Roberto Navigli and Paola Velardi. 2010. Learning
Word-class Lattices for Definition and Hypernym Ex-
traction. In Proceedings of ACL, pages 1318–1327.

Maria Pershina, Bonan Min, Wei Xu, and Ralph Grish-
man. 2014. Infusion of Labeled Data into Distant Su-
pervision for Relation Extraction. In Proceedings of
ACL, pages 732–738.

Francesco Piccinno and Paolo Ferragina. 2014. From
TagME to WAT: a New Entity Annotator. In Proceed-
ings of ERD, pages 55–62.

Simone Paolo Ponzetto and Michael Strube. 2011. Tax-
onomy Induction Based on a Collaboratively Built
Knowledge Repository. Artificial Intelligence, 175(9-
10):1737–1756.

Philip Resnik. 1996. Selectional Constraints: An
Information-Theoretic Model and its Computational
Realization. Cognition, 61(1-2):127–159.

Stephen D. Richardson, William B. Dolan, and Lucy Van-
derwende. 1998. MindNet: Acquiring and Structur-
ing Semantic Information from Text. In Proceedings
of ACL, pages 1098–1102.

Sebastian Riedel, Limin Yao, and Andrew McCallum.
2010. Modeling Relations and Their Mentions with-
out Labeled Text. In Proceedings of ECML-PKDD,
pages 148–163.

Sebastian Riedel, Limin Yao, Andrew McCallum, and
Benjamin M. Marlin. 2013. Relation Extraction with
Matrix Factorization and Universal Schemas. In Pro-
ceedings of NAACL HLT, pages 74–84.

Alan Ritter, Mausam, and Oren Etzioni. 2010. A La-
tent Dirichlet Allocation Method for Selectional Pref-
erences. In Proceedings of ACL, pages 424–434.

Mark Steedman. 2000. The Syntactic Process. MIT
Press, Cambridge, MA, USA.

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and
Christopher D. Manning. 2012. Multi-instance Multi-
label Learning for Relation Extraction. In Proceedings
of EMNLP-CoNLL, pages 455–465.

Denny Vrandečić. 2012. Wikidata: A New Platform
for Collaborative Data Collection. In Proceedings of
WWW, pages 1063–1064.

William Yang Wang, Kathryn Mazaitis, Ni Lao, Tom M.
Mitchell, and William W. Cohen. 2015. Efficient In-
ference and Learning in a Large Knowledge Base -
Reasoning with Extracted Information using a Locally
Groundable First-Order Probabilistic Logic. Machine
Learning, 100(1):101–126.

Fei Wu and Daniel S. Weld. 2007. Autonomously
Semantifying Wikipedia. In Proceedings of CIKM,
pages 41–50.

Mohamed Yahya, Steven Euijong Whang, Rahul Gupta,
and Alon Halevy. 2014. ReNoun: Fact Extraction for
Nominal Attributes. In Proceedings of EMNLP, pages
325–335.
