Entity Linking meets Word Sense Disambiguation: a Unified Approach


Entity Linking meets Word Sense Disambiguation: a Unified Approach

Andrea Moro, Alessandro Raganato, Roberto Navigli
Dipartimento di Informatica,
Sapienza Università di Roma,

Viale Regina Elena 295, 00161 Roma, Italy
{moro,navigli}@di.uniroma1.it

ale.raganato@gmail.com

Abstract

Entity Linking (EL) and Word Sense Disam-
biguation (WSD) both address the lexical am-
biguity of language. But while the two tasks
are pretty similar, they differ in a fundamen-
tal respect: in EL the textual mention can be
linked to a named entity which may or may not
contain the exact mention, while in WSD there
is a perfect match between the word form (bet-
ter, its lemma) and a suitable word sense.
In this paper we present Babelfy, a unified
graph-based approach to EL and WSD based
on a loose identification of candidate mean-
ings coupled with a densest subgraph heuris-
tic which selects high-coherence semantic in-
terpretations. Our experiments show state-of-
the-art performances on both tasks on 6 differ-
ent datasets, including a multilingual setting.
Babelfy is online at http://babelfy.org

1 Introduction

The automatic understanding of the meaning of text
has been a major goal of research in computational
linguistics and related areas for several decades,
with ambitious challenges, such as Machine Read-
ing (Etzioni et al., 2006) and the quest for knowl-
edge (Schubert, 2006). Word Sense Disambiguation
(WSD) (Navigli, 2009; Navigli, 2012) is a historical
task aimed at assigning meanings to single-word and
multi-word occurrences within text, a task which is
more alive than ever in the research community.

Recently, the collaborative creation of large semi-
structured resources, such as Wikipedia, and knowl-
edge resources built from them (Hovy et al., 2013),

such as BabelNet (Navigli and Ponzetto, 2012a),
DBpedia (Auer et al., 2007) and YAGO2 (Hoffart
et al., 2013), has favoured the emergence of new
tasks, such as Entity Linking (EL) (Rao et al., 2013),
and opened up new possibilities for tasks such as
Named Entity Disambiguation (NED) and Wikifi-
cation. The aim of EL is to discover mentions of
entities within a text and to link them to the most
suitable entry in a reference knowledge base. How-
ever, in contrast to WSD, a mention may be partial
while still being unambiguous thanks to the context.
For instance, consider the following sentence:

(1) Thomas and Mario are strikers playing in Munich.

This example makes it clear how intertwined the
two tasks of WSD and EL are. In fact, on the one
hand, striker and play are polysemous words which
can be disambiguated by selecting the game/soccer
playing senses of the two words in a dictionary; on
the other hand, Thomas and Mario are partial men-
tions which have to be linked to the appropriate en-
tries of a knowledge base, that is, Thomas Müller
and Mario Gomez, two well-known soccer players.

The two main differences between WSD and EL
lie, on the one hand, in the kind of inventory used,
i.e., dictionary vs. encyclopedia, and, on the other
hand, in the assumption that the mention is complete
or potentially partial. Notwithstanding these differ-
ences, the tasks are similar in nature, in that they
both involve the disambiguation of textual fragments
according to a reference inventory. However, the re-
search community has so far tackled the two tasks
separately, often duplicating efforts and solutions.

In contrast to this trend, research in knowledge
acquisition is now heading towards the seamless in-

231

Transactions of the Association for Computational Linguistics, 2 (2014) 231–244. Action Editor: Noah Smith.
Submitted 9/2013; Revised 1/2014; Published 5/2014. c©2014 Association for Computational Linguistics.


tegration of encyclopedic and lexicographic knowl-
edge into structured language resources (Hovy et al.,
2013), and the main representative of this new direc-
tion is undoubtedly BabelNet (Navigli and Ponzetto,
2012a). Given such structured language resources it
seems natural to suppose that they might provide a
common ground for the two tasks of WSD and EL.

More precisely, in this paper we explore the hy-
pothesis that the lexicographic knowledge used in
WSD is also useful for tackling the EL task, and,
vice versa, that the encyclopedic information uti-
lized in EL helps disambiguate nominal mentions in
a WSD setting. We propose Babelfy, a novel, uni-
fied graph-based approach to WSD and EL, which
performs two main steps: i) it exploits random walks
with restart, and triangles as a support for reweight-
ing the edges of a large semantic network; ii) it uses
a densest subgraph heuristic on the available seman-
tic interpretations of the input text to perform a joint
disambiguation with both concepts and named enti-
ties. Our experiments show the benefits of our syn-
ergistic approach on six gold-standard datasets.

2 Related Work

2.1 Word Sense Disambiguation

Word Sense Disambiguation (WSD) is the task of
choosing the right sense for a word within a given
context. Typical approaches for this task can be clas-
sified as i) supervised, ii) knowledge-based, and iii)
unsupervised. However, supervised approaches re-
quire huge amounts of annotated data (Zhong and
Ng, 2010; Shen et al., 2013; Pilehvar and Navigli,
2014), an effort which cannot easily be repeated
for new domains and languages, while unsupervised
ones suffer from data sparsity and an intrinsic diffi-
culty in their evaluation (Agirre et al., 2006; Brody
and Lapata, 2009; Manandhar et al., 2010; Van de
Cruys and Apidianaki, 2011; Di Marco and Nav-
igli, 2013). On the other hand, knowledge-based
approaches are able to obtain good performance us-
ing readily-available structured knowledge (Agirre
et al., 2010; Guo and Diab, 2010; Ponzetto and Nav-
igli, 2010; Miller et al., 2012; Agirre et al., 2014).
Some of these approaches marginally take into ac-
count the structural properties of the knowledge base
(Mihalcea, 2005). Other approaches, instead, lever-
age the structural properties of the knowledge base

by exploiting centrality and connectivity measures
(Sinha and Mihalcea, 2007; Tsatsaronis et al., 2007;
Agirre and Soroa, 2009; Navigli and Lapata, 2010).

One of the key steps of many knowledge-based
WSD algorithms is the creation of a graph repre-
senting the semantic interpretations of the input text.
Two main strategies to build this graph have been
proposed: i) exploiting the direct connections, i.e.,
edges, between the considered sense candidates; ii)
populating the graph according to (shortest) paths
between them. In our approach we manage to unify
these two strategies by automatically creating edges
between sense candidates performing Random Walk
with Restart (Tong et al., 2006).

The recent upsurge of interest in multilinguality
has led to the development of cross-lingual and mul-
tilingual approaches to WSD (Lefever and Hoste,
2010; Lefever and Hoste, 2013; Navigli et al., 2013).
Multilinguality has been exploited in different ways,
e.g., by using parallel corpora to build multilingual
contexts (Guo and Diab, 2010; Banea and Mihalcea,
2011; Lefever et al., 2011) or by means of ensemble
methods which exploit complementary sense evi-
dence from translations in different languages (Nav-
igli and Ponzetto, 2012b). In this work, we present
a novel exploitation of the structural properties of a
multilingual semantic network.

2.2 Entity Linking

Entity Linking (Erbs et al., 2011; Rao et al., 2013;
Cornolti et al., 2013) encompasses a set of similar
tasks, which include Named Entity Disambiguation
(NED), that is the task of linking entity mentions
in a text to a knowledge base (Bunescu and Pasca,
2006; Cucerzan, 2007), and Wikification, i.e., the
automatic annotation of text by linking its relevant
fragments of text to the appropriate Wikipedia arti-
cles. Mihalcea and Csomai (2007) were the first to
tackle the Wikification task. In their approach they
disambiguate each word in a sentence independently
by exploiting the context in which it occurs. How-
ever, this approach is local in that it lacks a collective
notion of coherence between the selected Wikipedia
pages. To overcome this problem, Cucerzan (2007)
introduced a global approach based on the simulta-
neous disambiguation of all the terms in a text and
the use of lexical context to disambiguate the men-
tions. To maximize the semantic agreement Milne

232


and Witten (2008) introduced the analysis of the se-
mantic relations between the candidate senses and
the unambiguous context, i.e., words with a single
sense candidate. However, the performance of this
algorithm depends heavily on the number of links
incident to the target senses and on the availabil-
ity of unambiguous words within the input text. To
overcome this issue a novel class of approaches have
been proposed (Kulkarni et al., 2009; Ratinov et al.,
2011; Hoffart et al., 2011) that exploit global and
local features. However, these systems either rely
on a difficult NP-hard formalization of the problem
which is infeasible for long text, or exploit popular-
ity measures which are domain-dependent. In con-
trast, we show that the semantic network structure
can be leveraged to obtain state-of-the-art perfor-
mance by synergistically disambiguating both word
senses and named entities at the same time.

Recently, the explosion of on-line social network-
ing services, such as Twitter and Facebook, have
contributed to the development of new methods for
the efficient disambiguation of short texts (Ferrag-
ina and Scaiella, 2010; Hoffart et al., 2012; Böhm et
al., 2012). Thanks to a loose candidate identification
technique coupled with a densest subgraph heuristic,
we show that our approach is particularly suited for
short and highly ambiguous text disambiguation.

2.3 The Best of Two Worlds

Our main goal is to bring together the two worlds of
WSD and EL. On the one hand, this implies relaxing
the constraint of a perfect association between men-
tions and meanings, which is, instead, assumed in
WSD. On the other hand, this relaxation leads to the
inherent difficulty of encoding a full-fledged sense
inventory for EL. Our solution to this problem is to
keep the set of candidate meanings for a given men-
tion as open as possible (see Section 6), so as to en-
able high recall in linking partial mentions, while
providing an effective method for handling this high
ambiguity (see Section 7).

A key assumption of our work is that the lexico-
graphic knowledge used in WSD is also useful for
tackling the EL task, and vice versa the encyclopedic
information utilized in EL helps disambiguate nom-
inal mentions in a WSD setting. We enable the joint
treatment of concepts and named entities by enforc-
ing high coherence in our semantic interpretations.

3 WSD and Entity Linking Together

Task. Our task is to disambiguate and link all
nominal and named entity mentions occurring
within a text. The linking task is performed by asso-
ciating each mention with the most suitable entry of
a given knowledge base.1

We point out that our definition is unconstrained
in terms of what to link, i.e., unlike Wikification and
WSD, we can link overlapping fragments of text.
For instance, given the text fragment Major League
Soccer, we identify and disambiguate several dif-
ferent nominal and entity mentions: Major League
Soccer, major league, league and soccer. In contrast
to EL, we link not only named entity mentions, such
as Major League Soccer, but also nominal mentions,
e.g., major league, to their corresponding meanings
in the knowledge base.

Babelfy. We provide a unified approach to WSD
and entity linking in three steps:

1. Given a lexicalized semantic network, we as-
sociate with each vertex, i.e., either concept or
named entity, a semantic signature, that is, a set
of related vertices (Section 5). This is a prelim-
inary step which needs to be performed only
once, independently of the input text.

2. Given a text, we extract all the linkable frag-
ments from this text and, for each of them, list
the possible meanings according to the seman-
tic network (Section 6).

3. We create a graph-based semantic interpreta-
tion of the whole text by linking the candidate
meanings of the extracted fragments using the
previously-computed semantic signatures. We
then extract a dense subgraph of this represen-
tation and select the best candidate meaning for
each fragment (Section 7).

4 Semantic Network

Our approach requires the availability of a wide-
coverage semantic network which encodes struc-
tural and lexical information both of an encyclope-
dic and of a lexicographic kind. Although in prin-
ciple any semantic network with these properties

1Mentions which are not contained in the reference knowl-
edge base are not taken into account.

233


could be utilized, in our work we used the Babel-
Net2 1.1.1 semantic network (Navigli and Ponzetto,
2012a) since it is the largest multilingual knowl-
edge base, obtained from the automatic seamless
integration of Wikipedia3 and WordNet (Fellbaum,
1998). We consider BabelNet as a directed multi-
graph which contains both concepts and named en-
tities as its vertices and a multiset of semantic rela-
tions as its edges. We leverage the multilingual lex-
icalizations of the vertices of BabelNet to identify
mentions in the input text. For example, the entity
FC Bayern Munich can be lexicalized in different
languages, e.g., F.C. Bayern de Múnich in Spanish,
Die Roten in English and Bayern München in Ger-
man, among others. As regards semantic relations,
the only information we use is that of the end points,
i.e., vertices, that these relations connect, while ne-
glecting the relation type.

5 Building Semantic Signatures

One of the major issues affecting both manually-
curated and automatically constructed semantic net-
works is data sparsity. For instance, we calculated
that the average number of incident edges is roughly
10 in WordNet, 50 in BabelNet and 80 in YAGO2,
to mention a few. Although automatically-built re-
sources typically provide larger amounts of edges,
two issues have to be taken into account: concepts
which should be related might not be directly con-
nected despite being structurally close within the
network, and, vice versa, weakly-related or even un-
related concepts can be erroneously connected by an
edge. For instance, in BabelNet we do not have an
edge between playmaker and Thomas Müller, while
we have an incorrect edge connecting FC Bayern
Munich and Yellow Submarine (song). However,
this crisp notion of relatedness can be overcome by
exploiting the global structure of the semantic net-
work, thereby obtaining a more precise and higher-
coverage measure of relatedness. We address this
issue in two steps: first, we provide a structural
weighting of the network’s edges; second, for each
vertex we create a set of related vertices using ran-
dom walks with restart.

2http://babelnet.org
3http://www.wikipedia.org

Structural weighting. Our first objective is to as-
sign higher weights to edges which are involved in
more densely connected areas of the directed net-
work. To this end, inspired by the local cluster-
ing coefficient measure (Watts and Strogatz, 1998)
and its recent success in Word Sense Induction
(Di Marco and Navigli, 2013), we use directed tri-
angles, i.e., directed cycles of length 3, and weight
each edge (v,v′) by the number of directed triangles
it occurs in:

weight(v,v′) := |{(v,v′,v′′) : (v,v′), (1)
(v′,v′′),(v′′,v) ∈ E}|+ 1

We add one to each weight to ensure the highest de-
gree of reachability in the network.

Random Walk with Restart. Our goal is to cre-
ate a semantic signature (i.e., a set of highly related
vertices) for each concept and named entity of the
semantic network. To do this, we perform a Random
Walk with Restart (RWR) (Tong et al., 2006), that is,
a stochastic process that starts from an initial vertex
of the graph4 and then, for a fixed number n of steps
or until convergence, explores the graph by choos-
ing the next vertex within the current neighborhood
or by restarting from the initial vertex with a given,
fixed restart probability α. For each edge (v,v′) in
the network, we model the conditional probability
P(v′|v) as the normalized weight of the edge:

P(v′|v) = weight(v,v
′)∑

v′′∈V weight(v,v
′′)

where V is the set of vertices of the semantic net-
work and weight(v,v′) is the function defined in
Equation 1. We then run the RWR from each ver-
tex v of the semantic network for a fixed number n
of steps (we show in Algorithm 1 our RWR pseu-
docode). We keep track of the encountered ver-
tices using the map counts, i.e., we increase the
counter associated with vertex v′ in counts every
time we hit v′ during a RWR started from v (see
line 11). As a result, we obtain a frequency distri-
bution over the whole set of concepts and entities.
To eliminate weakly-related vertices we keep only
those items that were hit at least η times (see lines
16–18). Finally, we save the remaining vertices in
the set semSignv which is the semantic signature
of v (see line 19).

4RWR can be used with an initial set of vertices, however in
this paper we use a single initial vertex.

234


Algorithm 1 Random walk with restart.
1: input: v, the starting vertex;

α, the restart probability;
n, the number of steps to be executed;
P , the transition probabilities;
η, the frequency threshold.

2: output: semSignv, set of related vertices for v.
3: function RWR(v,α,n,P,η)
4: v′ := v
5: counts := new Map < Synset,Integer >
6: while n > 0 do
7: if random() > α then
8: given the transition probabilities P(·|v′)
9: of v′, choose a random neighbor v′′

10: v′ := v′′

11: counts[v′] + +
12: else
13: restart the walk
14: v′ := v
15: n := n−1
16: for each v′ in counts.keys() do
17: if counts[v′] < η then
18: remove v′ from counts.keys()
19: return semSignv = counts.keys()

The creation of our set of semantic signatures, one
for each vertex in the semantic network, is a prelim-
inary step carried out once only before starting pro-
cessing any input text. We now turn to the candidate
identification and disambiguation steps.

6 Candidate Identification

Given a text as input, we apply part-of-speech tag-
ging and identify the set F of all the textual frag-
ments, i.e., all the sequences of words of maximum
length five, which contain at least one noun and that
are substrings of lexicalizations in BabelNet, i.e.,
those fragments that can potentially be linked to an
entry in BabelNet. For each textual fragment f ∈ F ,
i.e., a single- or multi-word expression of the input
text, we look up the semantic network for candidate
meanings, i.e., vertices that contain f or, only for
named entities, a superstring of f as their lexical-
ization. For instance, for sentence (1) in the intro-
duction, we identify the following textual fragments:
Thomas, Mario, strikers, Munich. This output is ob-
tained thanks to our loose candidate identification
routine, i.e., based on superstring matching instead
of exact matching, which, for instance, enables us to
recognize the right candidate Mario Gomez for the

mention Mario even if this named entity does not
have Mario as one of its lexicalizations (for an anal-
ysis of the impact of this routine against the exact
matching approach see the discussion in Section 9).

Moreover, as we stated in Section 3, we allow
overlapping fragments, e.g., for major league we
recognize league and major league. We denote with
cand(f) the set of all the candidate meanings of
fragment f. For instance, for the noun league we
have that cand(league) contains among others the
sport word sense and the TV series named entity.

7 Candidate Disambiguation

Semantic interpretation graph. After the identi-
fication of fragments (F ) and their candidate mean-
ings (cand(·)), we create a directed graph GI =
(VI,EI) of the semantic interpretations of the input
text. We show the pseudocode in Algorithm 2. VI
contains all the candidate meanings of all fragments,
that is, VI := {(v,f) : v ∈ cand(f),f ∈ F}, where
f is a fragment of the input text and v is a candidate
Babel synset that has a lexicalization which is equal
to or is a superstring of f (see lines 4–8). The set
of edges EI connects related meanings and is pop-
ulated as follows: we add an edge from (v,f) to
(v′,f ′) if and only if f 6= f ′ and v′ ∈ semSignv
(see lines 9–11). In other words, we connect two
candidate meanings of different fragments if one is
in the semantic signature of the other. For instance,
we add an edge between (Mario Gomez, Mario) and
(Thomas Müller, Thomas), while we do not add one
between (Mario Gomez, Mario) and (Mario Basler,
Mario) since these are two candidate meanings of
the same fragment, i.e., Mario. In Figure 1, we show
an excerpt of our graph for sentence (1).

At this point we have a graph-based representa-
tion of all the possible interpretations of the input
text. In order to drastically reduce the degree of am-
biguity while keeping the interpretation coherence
as high as possible, we apply a novel densest sub-
graph heuristic (see line 12), whose description we
defer to the next paragraph. The result is a sub-
graph which contains those semantic interpretations
that are most coherent to each other. However, this
subgraph might still contain multiple interpretations
for the same fragment, and even unambiguous frag-
ments which are not correct. Therefore, the final

235


(Tomás Milián, Thomas)

(Thomas Müller, Thomas)

(forward, striker)

(striker, striker) (FC Bayern Munich, Munich)

(Munich, Munich)

(Mario Adorf, Mario)
(Mario Basler, Mario)

(Mario Gomez, Mario)

Figure 1: An excerpt of the semantic interpretation graph automatically built for the sentence Thomas and Mario are
strikers playing in Munich (the edges connecting the correct meanings are in bold).

step is the selection of the most suitable candidate
meaning for each fragment f given a threshold θ to
discard semantically unrelated candidate meanings.
We score each meaning v ∈ cand(f) with its nor-
malized weighted degree5 in the densest subgraph:

score((v,f)) =
w(v,f) ·deg((v,f))∑

v′ ∈ cand(f)
w(v′,f) ·deg((v′,f))

(2)

where w(v,f) is the fraction of fragments the candi-
date meaning v connects to:

w(v,f) :=

|{f′ ∈ F : ∃v′ s.t. ((v,f),(v′,f′))
or ((v′,f′),(v,f)) ∈ EI}|
|F |−1

The rationale behind this scoring function is to
take into account both the semantic coherence, us-
ing a graph centrality measure among the candidate
meanings, and the lexical coherence, in terms of the
number of fragments a candidate relates to.

Finally, we link each f to the highest ranking can-
didate meaning v? if score((v?,f)) ≥ θ, where θ is
a fixed threshold (see lines 14–18 of Algorithm 2).
For instance, in sentence (1) and for the fragment
Mario we select Mario Gomez as our final candidate
meaning and link it to the fragment.

Linking by densest subgraph. We now illustrate
our novel densest subgraph heuristic, used in line 12
of Algorithm 2, for reducing the level of ambiguity
of the initial semantic interpretation graph GI . The
main idea here is that the most suitable meanings of
each text fragment will belong to the densest area of
the graph. For instance, in Figure 1 the (candidate,
fragment) pairs (Thomas Müller, Thomas), (Mario
Gomez, Mario), (striker, striker) and (FC Bayern

5We denote with deg(v) the overall number of incoming and
outgoing edges, i.e., deg(v) := deg+(v) + deg−(v).

Algorithm 2 Candidate Disambiguation.
1: input: F , the fragments in the input text;

semSign, the semantic signatures;
µ, ambiguity level to be reached;
cand, fragments to candidate meanings.

2: output: selected, disambiguated fragments.
3: function DISAMB(F,semSign,µ,cand)
4: VI := ∅;EI := ∅
5: GI := (VI,EI)
6: for each fragment f ∈ F do
7: for each candidate v ∈ cand(f) do
8: VI := VI ∪{(v,f)}
9: for each ((v,f),(v′,f′)) ∈ VI ×VI do

10: if f 6= f′ and v′ ∈ semSignv then
11: EI := EI ∪{((v,f),(v′,f′))}
12: G?I := DENSSUB(F,cand,GI,µ)
13: selected := new Map < String,Synset >
14: for each f ∈ F s.t. ∃(v,f) ∈ V ?I do
15: cand?(f) := {v : (v,f) ∈ V ?I }
16: v? := arg maxv∈cand?(f) score((v,f))
17: if score((v?,f)) ≥ θ then
18: selected(f) := v?

19: return selected

Munich, Munich) form a dense subgraph supporting
their relevance for sentence (1).

The problem of identifying the densest subgraph
of size at least k is NP-hard (Feige et al., 1999).
Therefore, we define a heuristic for k-partite graphs
inspired by a 2-approximation greedy algorithm for
arbitrary graphs (Charikar, 2000; Khuller and Saha,
2009). Our adapted strategy for selecting a dense
subgraph of GI is based on the iterative removal
of low-coherence vertices, i.e., fragment interpreta-
tions. We show the pseudocode in Algorithm 3.

We start with the initial graph G(0)I at step t = 0
(see line 5). For each step t (lines 7–16), first, we
identify the most ambiguous fragment fmax, i.e., the
one with the maximum number of candidate mean-

236


Algorithm 3 Densest Subgraph.
1: input: F , the set of all fragments in the input text;

cand, from fragments to candidate meanings;
G

(0)
I , the full semantic interpretation graph;

µ, ambiguity level to be reached.
2: output: G?I , a dense subgraph.
3: function DENSSUB(F,cand,G(0)I ,µ)
4: t := 0
5: G?I := G

(0)
I

6: while true do
7: fmax := arg maxf∈F |{v : ∃(v,f) ∈ V

(t)
I }|

8: if |{v : ∃(v,fmax) ∈ V (t)I }|≤ µ then
9: break;

10: vmin:= argmin
v ∈ cand(fmax)

score((v,fmax))

11: V
(t+1)
I := V

(t)
I \{(vmin,fmax)}

12: E
(t+1)
I := E

(t)
I ∩V

(t+1)
I ×V

(t+1)
I

13: G
(t+1)
I := (V

(t+1)
I ,E

(t+1)
I )

14: if avgdeg(G(t+1)I ) > avgdeg(G
?
I) then

15: G?I := G
(t+1)
I

16: t := t + 1
17: return G?I

ings in the graph (see line 7). Next, we discard
the weakest interpretation of the current fragment
fmax. To do so, we determine the lexical and seman-
tic coherence of each candidate meaning (v,fmax)
using Formula 2 (see line 10). We then remove
from our graph G(t)I the lowest-coherence vertex
(vmin,fmax), i.e., the one whose score is minimum
(see lines 11–13). For instance, in Figure 1, fmax
is the fragment Mario and we have: score((Mario
Gomez, Mario)) ∝ 3

3
· 5 = 5, score((Mario Basler,

Mario)) ∝ 1
3
· 1 = 0.3 and score((Mario Adorf,

Mario)) ∝ 2
3
·2 = 1.3, so we remove (Mario Basler,

Mario) from the graph since its score is minimum.

We then move to the next step, i.e., we set t :=
t + 1 (see line 16) and repeat the low-coherence re-
moval step. We stop when the number of remaining
candidates for each fragment is below a threshold
µ, i.e., |{v : ∃(v,f) ∈ V (t)I }| ≤ µ ∀f ∈ F (see
lines 8–9). During each iteration step t we com-
pute the average degree of the current graph G(t)I ,

i.e., avgdeg(G(t)I ) =
2|E(t)

I
|

|V (t)
I
|

. Finally, we select as

the densest subgraph of the initial semantic interpre-
tation graph GI the graph G?I that maximizes the av-
erage degree (see lines 14–15).

8 Experimental Setup

Datasets. We carried out our experiments on six
datasets, four for WSD and two for EL:
• The SemEval-2013 task 12 dataset for multilin-

gual WSD (Navigli et al., 2013), which consists
of 13 documents in different domains, available
in 5 languages. For each language, all noun
occurrences were annotated using BabelNet,
thereby providing Wikipedia and WordNet an-
notations wherever applicable. The number of
mentions to be disambiguated roughly ranges
from 1K to 2K per language in the different se-
tups.

• The SemEval-2007 task 7 dataset for coarse-
grained English all-words WSD (Navigli et al.,
2007). We take into account only nominal men-
tions obtaining a dataset containing 1107 nouns
to be disambiguated using WordNet.

• The SemEval-2007 task 17 dataset for fine-
grained English all-words WSD (Pradhan et al.,
2007). We considered only nominal mentions
resulting in 158 nouns annotated with WordNet
synsets.

• The Senseval-3 dataset for English all-words
WSD (Snyder and Palmer, 2004), which con-
tains 899 nouns to be disambiguated using
WordNet.

• KORE50 (Hoffart et al., 2012), which consists
of 50 short English sentences (mean length of
14 words) with a total number of 144 mentions
manually annotated using YAGO2, for which a
Wikipedia mapping is available. This dataset
was built with the idea of testing against a high
level of ambiguity for the EL task.

• AIDA-CoNLL6 (Hoffart et al., 2011), which
consists of 1392 English articles, for a total
of roughly 35K named entity mentions anno-
tated with YAGO concepts separated in devel-
opment, training and test sets.

We exploited the POS tags already available in the
SemEval and Senseval datasets, while we used the
Stanford POS tagger (Toutanova et al., 2003) for the
English sentences in the last two datasets.

6We used AIDA-CoNLL as it is the most recent and largest
available dataset for EL (Hachey et al., 2013). The TAC KBP
datasets are available only to participants.

237


Parameters. We fixed the parameters of RWR
(Section 5) to the values α = .85, η = 100 and
n = 1M which maximize F1 on a manually cre-
ated tuning set made up of 10 gold-standard seman-
tic signatures. We tuned our two disambiguation pa-
rameters µ = 10 and θ = 0.8 by optimizing F1 on
the trial dataset of the SemEval-2013 task on mul-
tilingual WSD (Navigli et al., 2013). We used the
same parameters on all the other WSD datasets. As
for EL, we used the training part of AIDA-CoNLL
(Hoffart et al., 2011) to set µ = 5 and θ = 0.0.

8.1 Systems
Multilingual WSD. We evaluated our system on
the SemEval-2013 task 12 by comparing it with the
participating systems:
• UMCC-DLSI (Gutiérrez et al., 2013) a state-

of-the-art Personalized PageRank-based ap-
proach that exploits the integration of different
sources of knowledge, such as WordNet Do-
mains/Affect (Strapparava and Valitutti, 2004),
SUMO (Zouaq et al., 2009) and the eXtended
WordNet (Mihalcea and Moldovan, 2001);

• DAEBAK! (Manion and Sainudiin, 2013)
which performs WSD on the basis of periph-
eral diversity within subgraphs of BabelNet;

• GETALP (Schwab et al., 2013) which uses an
Ant Colony Optimization technique together
with the classical measure of Lesk (1986).

We also compared with UKB w2w (Agirre
and Soroa, 2009), a state-of-the-art approach for
knowledge-based WSD, based on Personalized
PageRank (Haveliwala, 2002). We used the same
mapping from words to senses that we used in our
approach, default parameters7 and BabelNet as the
input graph. Moreover, we compared our system
with IMS (Zhong and Ng, 2010), a state-of-the-
art supervised English WSD system which uses an
SVM trained on sense-annotated corpora, such as
SemCor (Miller et al., 1993) and DSO (Ng and
Lee, 1996), among others. We used the IMS model
out-of-the-box with Most Frequent Sense (MFS) as
backoff routine since the model obtained using the
task trial data performed worse.

We followed the original task formulation and
evaluated the synsets in three different settings, i.e.,

7./ukb wsd -D dict.txt -K kb.bin --ppr w2w ctx.txt

when using BabelNet senses, Wikipedia senses and
WordNet senses, thanks to BabelNet being a super-
set of the other two inventories. We ran our sys-
tem on a document-by-document basis, i.e., disam-
biguating each document at once, so as to test its
effectiveness on long coherent texts. Performance
was calculated in terms of F1 score. We also com-
pared the systems with the MFS baseline computed
for the three inventories (Navigli et al., 2013).

Coarse-grained WSD. For the SemEval-2007
task 7 we compared our system with the two top-
ranked approaches, i.e., NUS-PT (Chan et al., 2007)
and UoR-SSI (Navigli, 2008), which respectively
exploited parallel texts and enriched semantic paths
in a semantic network, the previously described
UKB w2w system,8 a knowledge-based WSD ap-
proach (Ponzetto and Navigli, 2010) which exploits
an automatic extension of WordNet, and, as base-
line, the MFS.

Fine-grained WSD. For the remaining fine-
grained WSD datasets, i.e., Senseval-3 and
SemEval-2007 task 17, we compared our approach
with the previously described state-of-the-art
systems UKB and IMS, and, as baseline, the MFS.

KORE50 and AIDA-CoNLL. For the KORE50
and AIDA-CoNLL datasets we compared our sys-
tem with six approaches, including state-of-the-art
ones (Hoffart et al., 2012; Cornolti et al., 2013):
• MW, i.e., the Normalized Google Distance as

defined by Milne and Witten (2008);

• KPCS (Hoffart et al., 2012), which calcu-
lates a Mutual Information weighted vector of
keyphrases for each candidate and then uses the
cosine similarity to obtain candidates’ scores;

• KORE and its variants KORELSH−G and
KORELSH−F (Hoffart et al., 2012), based on
similarity measures that exploit the overlap be-
tween phrases associated with the considered
entities (KORE) and a hashing technique to re-
duce the space needed by the keyphrases asso-
ciated with the entities (LSH-G, LSH-F);

• Tagme 2.09 (Ferragina and Scaiella, 2012)
which uses the relatedness measure defined

8We report the results as given by Agirre et al. (2014).
9We used the out-of-the-box RESTful API available at

http://tagme.di.unipi.it

238


Sens3 Sem07 SemEval-2013 English French German Italian Spanish
System WN WN WN Wiki BN Wiki BN Wiki BN Wiki BN Wiki BN
Babelfy 68.3 62.7 65.9 87.4 69.2 71.6 ?56.9 81.6 69.4 84.3 66.6 83.8 69.5
IMS 71.2 63.3 65.7 – – – – – – – – – –
UKB w2w ?65.3 ?56.0 61.3 – 60.8 – 60.8 – 66.2 – 67.3 – 70.0
UMCC-DLSI – – 64.7 54.8 68.5 ?60.5 60.5 ?58.1 62.8 ?58.3 65.8 ?61.0 71.0
DAEBAK! – – – – 60.4 – 53.8 – 59.1 – ?61.3 – 60.0
GETALP-BN – – 51.4 – 58.3 – 48.3 – 52.3 – 52.8 – 57.8
MFS 70.3 65.8 ?63.0 ?80.3 ?66.5 69.4 45.3 83.1 ?67.4 82.3 57.5 82.4 ?64.4
Babelfy unif. weights 67.0 65.2 65.0 87.0 68.5 71.9 57.2 81.2 69.8 83.7 66.8 83.8 70.8
Babelfy w/o dens. sub. 68.3 63.3 65.4 87.3 68.7 71.6 57.0 81.7 69.1 84.4 66.5 83.9 69.5
Babelfy only concepts 68.2 62.7 65.5 83.0 68.7 70.2 56.6 79.3 69.3 83.0 66.3 84.0 69.7
Babelfy on sentences 66.0 65.2 63.5 84.0 67.1 70.7 53.6 82.3 68.1 83.8 64.2 83.5 68.7

Table 1: F1 scores (percentages) of the participating systems of SemEval-2013 task 12 together with MFS, UKB w2w,
IMS, our system and its ablated versions on the Senseval-3, SemEval-2007 task 17 and SemEval-2013 datasets. The
first system which has a statistically significant difference from the top system is marked with ? (χ2, p < 0.05).

by Milne and Witten (2008) weighted with
the commonness of a sense together with
the keyphraseness measure defined by Mihal-
cea and Csomai (2007) to exploit the context
around the target word;

• Illinois Wikifier10 (Cheng and Roth, 2013)
which combines local features, such as com-
monness and TF-IDF between mentions and
Wikipedia pages, with global coherence fea-
tures based on Wikipedia links and relational
inference;

• DBpedia Spotlight11 (Mendes et al., 2011)
which uses LingPipe’s string matching algo-
rithm implementation together with a weighted
cosine similarity measure to recognize and dis-
ambiguate mentions.

We also compared with UKB w2w, introduced
above. Note that we could not use supervised sys-
tems, as the training data of AIDA-CoNLL covers
less than half of the mentions used in the testing
part and less than 10% of the entities considered in
KORE50. To enable a fair comparison, we ran our
system by restricting the BabelNet sense inventory
of the target mentions to the English Wikipedia. As
is customary in the literature, we calculated the sys-
tems’ accuracy for both Entity Linking datasets.

10We used the out-of-the-box Java API available from
http://cogcomp.cs.illinois.edu/page/download view/Wikifier

11We used the 2011 version of DBpedia Spotlight as it ob-
tains better scores on the considered datasets in comparison to
the new version (Daiber et al., 2013). We used the out-of-the-
box RESTful API available at http://spotlight.dbpedia.org

9 Results

Multilingual WSD. In Table 1 we show the F1
performance on the SemEval-2013 task 12 for the
three setups: WordNet, Wikipedia and BabelNet.
Using BabelNet we surpass all systems on English
and German and obtain performance comparable
with the best systems on two other languages (UKB
on Italian and UMCC-DLSI on Spanish). Using the
WordNet sense inventory, our results are on a par
with the best system, i.e., IMS. On Wikipedia our
results range between 71.6% (French) and 87.4% F1
(English), i.e., more than 10 points higher than the
current state of the art (UMCC-DLSI) in all 5 lan-
guages. As for the MFS baseline, which is known
to be very competitive in WSD (Navigli, 2009), we
beat it in all setups except for German on Wikipedia.
Interestingly, we surpass the WordNet MFS by 2.9
points, a significant result for a knowledge-based
system (see also (Pilehvar and Navigli, 2014)).

Coarse- and fine-grained WSD. In Table 2, we
show the results of the systems on the SemEval-
2007 coarse-grained WSD dataset. As can be seen,
we obtain the second best result after Ponzetto and
Navigli (2010). In Table 1 (first two columns), we
show the results of IMS and UKB on the Senseval-3
and SemEval-2007 task 17 datasets. We rank second
on both datasets after IMS. However, the differences
are not statistically significant. Moreover, Agirre et
al. (2014, Table 5) note that using WordNet 3.0, in-
stead of 1.7 or 2.1, to annotate these datasets can
cause a more than one percent drop in performance.

239


System F1
(Ponzetto and Navigli, 2010) 85.5
Babelfy 84.6
UoR-SSI 84.1
UKB w2w 83.6
NUS-PT ?82.3
MFS 77.4
Babelfy unif. weights 85.7
Babelfy w/o dens. sub. 84.9
Babelfy only concepts 85.3
Babelfy on sentences 82.3

Table 2: F1 score (percentages) on the SemEval-2007
task 7. The first system which has a statistically signifi-
cant difference from the top system is marked with ? (χ2,
p < 0.05).

Entity Linking. In Table 3 we show the results on
the two Entity Linking datasets, i.e., KORE50 and
AIDA-CoNLL. Our system outperforms all other
approaches, with KORE-LSH-G getting closest, and
Tagme and Wikifier lagging behind on the KORE50
dataset. For the AIDA-CoNLL dataset we obtain the
third best performance after MW and KPCS, how-
ever the difference is not statistically significant.

We note the low performance of DBpedia Spot-
light which, even if it achieves almost 100% preci-
sion on the identified mentions on both datasets, suf-
fers from low recall due to its candidate identifica-
tion step, confirming previous evaluations (Derczyn-
ski et al., 2013; Hakimov et al., 2012; Ludwig and
Sack, 2011). This problem becomes even more ac-
centuated in the latest version of this system (Daiber
et al., 2013). Finally, UKB using BabelNet obtains
low performance on EL, i.e., 19.4-10.5 points below
the state of the art. This result is discussed below.

Discussion. The results obtained by UKB show
that the high performance of our unified approach
to EL and WSD is not just a mere artifact of the use
of a rich multilingual semantic network, that is, Ba-
belNet. In other words, it is not true that any graph-
based algorithm could be applied to perform both
EL and WSD at the same time equally well. This
also shows that BabelNet by itself is not sufficient
for achieving high performances for both tasks and
that, instead, an appropriate processing of the struc-
tural and lexical information of the semantic net-
work is needed. A manual analysis revealed that the
main cause of error for UKB in the EL setup stems

System KORE50 CoNLL
Babelfy 71.5 82.1
KORE-LSH-G 64.6 81.8
KORE 63.9 ?80.7
MW ?57.6 82.3
Tagme 56.3 70.1
KPCS 55.6 82.2
KORE-LSH-F 53.2 81.2
UKB w2w (on BabelNet) 52.1 71.8
Illinois Wikifier 41.7 72.4
DBpedia Spotlight 35.4 34.0
Babelfy unif. weights 69.4 81.7
Babelfy w/o dens. sub. 62.5 78.1
Babelfy only NE 68.1 78.8

Table 3: Accuracy (percentages) of state-of-the-art EL
systems and our system on KORE50 and AIDA-CoNLL.
The first system with a statistically significant difference
from the top system is marked with ? (χ2, p < 0.05).

from its inability to enforce high coherence, e.g., by
jointly disambiguating all the words, which is in-
stead needed when considering the high level of am-
biguity that we have in our semantic interpretation
graph (Cucerzan, 2007). For instance, for sentence
(1) in the introduction, UKB disambiguates Thomas
as a cricket player and Mario as the popular video
game rather than the two well-known soccer play-
ers, and Munich as the German city, rather than the
soccer team in which they play. Our approach, in-
stead, by enforcing highly coherent semantic inter-
pretations, correctly identifies all the soccer-related
entities.

In order to determine the need of our loose candi-
date identification heuristic (see Section 6), we com-
pared the percentage of times a candidate set con-
tains the correct entity against that obtained by an
exact string matching between the mention and the
sense inventory. On KORE50, our heuristic retrieves
the correct entity 98.6% of the time vs. 42.4% when
exact matching is used. This demonstrates the inad-
equacy of exact matching for EL, and the need for
a comprehensive sense inventory, as is done in our
approach.

We also performed different ablation tests by ex-
perimenting with the following variants of our sys-
tem (reported at the bottom of Tables 1, 2 and 3):
• Babelfy using uniform distribution during the

RWR to obtain the concepts’ semantic sig-
natures; this test assesses the impact of our
weighting and edge creation strategy.

240


• Babelfy without performing the densest sub-
graph heuristic, i.e., when line 12 in Algorithm
2 is G?I = GI , so as to verify the impact of
identifying the most coherent interpretations.

• Babelfy applied to the BabelNet subgraph in-
duced by the entire set of named entity ver-
tices, for the EL task, and that induced by word
senses only, for the WSD task; this test aims to
stress the impact of our unified approach.

• Babelfy applied on sentences instead of on
whole documents.

The component which has a smaller impact on
the performance is our triangle-based weighting
scheme. The main exception is on the smallest
dataset, i.e., SemEval-2007 task 17, for which this
version attains an improvement of 2.5 percentage
points.

Babelfy without the densest subgraph algorithm
is the version which attains the lowest performances
on the EL task, with a 9% performance drop on the
KORE50 dataset, showing the need for a specially
designed approach to cope with the high level of am-
biguity that is encountered on this task. On the other
hand, in the WSD datasets this version attains almost
the same results as the full version, due to the lower
number of candidate word senses.

Babelfy applied on sentences instead of on whole
documents shows a lower performance, confirm-
ing the significance of higher semantic coherence
on whole documents (notwithstanding the two ex-
ceptions on the SemEval-2007 task 17 and on the
SemEval-2013 German Wikipedia datasets).

Finally, the version in which we restrict our
system to named entities only (for EL) and con-
cepts only (for WSD) consistently obtains lower re-
sults (notwithstanding the three exceptions on the
Spanish SemEval-2013 task 12 using BabelNet and
Wikipedia, and on the SemEval 2007 coarse-grained
task). This highlights the benefit of our joint use
of lexicographic and encyclopedic structured knowl-
edge, on each of the two tasks. The 3.4% per-
formance drop attained on KORE50 is of particu-
lar interest, since this dataset aims at testing perfor-
mance on highly ambiguous mentions within short
sentences. This indicates that the semantic analysis
of small contexts can be improved by leveraging the
coherence between concepts and named entities.

10 Conclusion

In this paper we presented Babelfy, a novel,
integrated approach to Entity Linking and
Word Sense Disambiguation, available at
http://babelfy.org. Our joint solution is
based on three key steps: i) the automatic creation
of semantic signatures, i.e., related concepts and
named entities, for each node in the reference
semantic network; ii) the unconstrained identifica-
tion of candidate meanings for all possible textual
fragments; iii) linking based on a high-coherence
densest subgraph algorithm. We used BabelNet
1.1.1 as our multilingual semantic network.

Our graph-based approach exploits the semantic
network structure to its advantage: two key features
of BabelNet, that is, its multilinguality and its in-
tegration of lexicographic and encyclopedic knowl-
edge, make it possible to run our general, unified ap-
proach on the two tasks of Entity Linking and WSD
in any of the languages covered by the semantic net-
work. However, we also demonstrated that Babel-
Net in itself does not lead to state-of-the-art accu-
racy on both tasks, even when used in conjunction
with a high-performance graph-based algorithm like
Personalized PageRank. This shows the need for our
novel unified approach to EL and WSD.

At the core of our approach lies the effective treat-
ment of the high degree of ambiguity of partial tex-
tual mentions by means of a 2-approximation algo-
rithm for the densest subgraph problem, which en-
ables us to output a semantic interpretation of the
input text with drastically reduced ambiguity, as was
previously done with SSI (Navigli, 2008).

Our experiments on six gold-standard datasets
show the state-of-the-art performance of our ap-
proach, as well as its robustness across languages.
Our evaluation also demonstrates that our approach
fares well both on long texts, such as those of the
WSD tasks, and short and highly-ambiguous sen-
tences, such as the ones in KORE50. Finally, abla-
tion tests and further analysis demonstrate that each
component of our system is needed to contribute
state-of-the-art performances on both EL and WSD.

As future work, we plan to use Babelfy for in-
formation extraction, where semantics is taking the
lead (Moro and Navigli, 2013), and for the valida-
tion of semantic annotations (Vannella et al., 2014).

241


Acknowledgments

The authors gratefully acknowl-
edge the support of the ERC Start-
ing Grant MultiJEDI No. 259234.

References
Eneko Agirre and Aitor Soroa. 2009. Personalizing

PageRank for Word Sense Disambiguation. In Proc.
of EACL, pages 33–41.

Eneko Agirre, David Martı́nez, Oier López de Lacalle,
and Aitor Soroa. 2006. Two graph-based algorithms
for state-of-the-art WSD. In Proc. of EMNLP, pages
585–593.

Eneko Agirre, Aitor Soroa, and Mark Stevenson. 2010.
Graph-based Word Sense Disambiguation of biomedi-
cal documents. Bioinformatics, 26(22):2889–2896.

Eneko Agirre, Oier Lopez de Lacalle, and Aitor Soroa.
2014. Random Walks for Knowledge-Based Word
Sense Disambiguation. Computational Linguistics,
40(1):57–84.

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens
Lehmann, Richard Cyganiak, and Zachary Ives. 2007.
DBpedia: A Nucleus for a Web of Open Data. In Proc.
of ISWC/ASWC, pages 722–735.

Carmen Banea and Rada Mihalcea. 2011. Word Sense
Disambiguation with multilingual features. In Proc.
of IWCS, pages 25–34.

Christoph Böhm, Gerard de Melo, Felix Naumann, and
Gerhard Weikum. 2012. LINDA: distributed web-of-
data-scale entity matching. In Proc. of CIKM, pages
2104–2108.

Samuel Brody and Mirella Lapata. 2009. Bayesian Word
Sense Induction. In Proc. of EACL, pages 103–111.

Razvan C. Bunescu and Marius Pasca. 2006. Using en-
cyclopedic knowledge for named entity disambigua-
tion. In Proc. of EACL, pages 9–16.

Yee Seng Chan, Hwee Tou Ng, and Zhi Zhong. 2007.
NUS-PT: Exploiting Parallel Texts for Word Sense
Disambiguation in the English All-Words Tasks. In
Proc. of SemEval-2007, pages 253–256.

Moses Charikar. 2000. Greedy approximation algo-
rithms for finding dense components in a graph. In
Proc. of APPROX, pages 84–95.

Xiao Cheng and Dan Roth. 2013. Relational Inference
for Wikification. In Proc. of EMNLP, pages 1787–
1796.

Marco Cornolti, Paolo Ferragina, and Massimiliano Cia-
ramita. 2013. A framework for benchmarking entity-
annotation systems. In Proc. of WWW, pages 249–
260.

Silviu Cucerzan. 2007. Large-Scale Named Entity Dis-
ambiguation Based on Wikipedia Data. In Proc. of
EMNLP-CoNLL, pages 708–716.

Joachim Daiber, Max Jakob, Chris Hokamp, and
Pablo N. Mendes. 2013. Improving efficiency and
accuracy in multilingual entity extraction. In Proc. of
I-Semantics, pages 121–124.

Leon Derczynski, Diana Maynard, Niraj Aswani, and
Kalina Bontcheva. 2013. Microblog-genre noise and
impact on semantic annotation accuracy. In Proc. of
Hypertext, pages 21–30.

Antonio Di Marco and Roberto Navigli. 2013. Cluster-
ing and Diversifying Web Search Results with Graph-
Based Word Sense Induction. Computational Linguis-
tics, 39(3):709–754.

Nicolai Erbs, Torsten Zesch, and Iryna Gurevych. 2011.
Link discovery: A comprehensive analysis. In Proc.
of ICSC, pages 83–86.

Oren Etzioni, Michele Banko, and Michael J Cafarella.
2006. Machine Reading. In Proc. of AAAI, pages
1517–1519.

Uriel Feige, Guy Kortsarz, and David Peleg. 1999. The
dense k-subgraph problem. Algorithmica, 29:2001.

Christiane Fellbaum. 1998. WordNet: An Electronic
Lexical Database. MIT Press.

Paolo Ferragina and Ugo Scaiella. 2010. TAGME:
On-the-fly Annotation of Short Text Fragments (by
Wikipedia Entities). In Proc. of CIKM, pages 1625–
1628.

Paolo Ferragina and Ugo Scaiella. 2012. Fast and Accu-
rate Annotation of Short Texts with Wikipedia Pages.
IEEE Software, 29(1):70–75.

Weiwei Guo and Mona T. Diab. 2010. Combining
Orthogonal Monolingual and Multilingual Sources of
Evidence for All Words WSD. In Proc. of ACL, pages
1542–1551.

Yoan Gutiérrez, Yenier Castañeda, Andy González,
Rainel Estrada, Dennys D. Piug, Jose I. Abreu,
Roger Pérez, Antonio Fernández Orquı́n, Andrés
Montoyo, Rafael Muñoz, and Franc Camara. 2013.
UMCC DLSI: Reinforcing a Ranking Algorithm with
Sense Frequencies and Multidimensional Semantic
Resources to solve Multilingual Word Sense Disam-
biguation. In Proc. of SemEval-2013, pages 241–249.

Ben Hachey, Will Radford, Joel Nothman, Matthew Hon-
nibal, and James R. Curran. 2013. Evaluating En-
tity Linking with Wikipedia. Artificial Intelligence,
194:130–150.

Sherzod Hakimov, Salih Atilay Oto, and Erdogan Dogdu.
2012. Named entity recognition and disambiguation
using linked data and graph-based centrality scoring.
In Proc. of SWIM, pages 4:1–4:7.

Taher H. Haveliwala. 2002. Topic-sensitive PageRank.
In Proc. of WWW, pages 517–526.

242


Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino,
Hagen Fürstenau, Manfred Pinkal, Marc Spaniol,
Bilyana Taneva, Stefan Thater, and Gerhard Weikum.
2011. Robust disambiguation of named entities in text.
In Proc. of EMNLP, pages 782–792.

Johannes Hoffart, Stephan Seufert, Dat Ba Nguyen, Mar-
tin Theobald, and Gerhard Weikum. 2012. KORE:
keyphrase overlap relatedness for entity disambigua-
tion. In Proc. of CIKM, pages 545–554.

Johannes Hoffart, Fabian M Suchanek, Klaus Berberich,
and Gerhard Weikum. 2013. YAGO2: A spatially and
temporally enhanced knowledge base from Wikipedia.
Artificial Intelligence, 194:28–61.

Eduard H. Hovy, Roberto Navigli, and Simone P.
Ponzetto. 2013. Collaboratively built semi-structured
content and Artificial Intelligence: The story so far.
Artificial Intelligence, 194:2–27.

Samir Khuller and Barna Saha. 2009. On finding dense
subgraphs. In Proc. of ICALP, pages 597–608.

Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and
Soumen Chakrabarti. 2009. Collective Annotation of
Wikipedia Entities in Web Text. In Proc. of KDD,
pages 457–466.

Els Lefever and Véronique Hoste. 2010. Semeval-2010
task 3: Cross-lingual Word Sense Disambiguation. In
Proc. of SemEval-2010, pages 15–20.

Els Lefever and Véronique Hoste. 2013. SemEval-2013
Task 10: Cross-lingual Word Sense Disambiguation.
In Proc. of SemEval-2013, pages 158–166.

Els Lefever, Véronique Hoste, and Martine De Cock.
2011. Parasense or how to use parallel corpora for
Word Sense Disambiguation. In Proc. of ACL-HLT,
pages 317–322.

Michael E. Lesk. 1986. Automatic Sense Disambigua-
tion Using Machine Readable Dictionaries: How to
Tell a Pine Cone from an Ice Cream Cone. In Proc.
of the International Conference on Systems Documen-
tation, pages 24–26.

Nadine Ludwig and Harald Sack. 2011. Named entity
recognition for user-generated tags. In Proc. of DEXA,
pages 177–181.

Suresh Manandhar, Ioannis P. Klapaftis, Dmitriy Dli-
gach, and Sameer S. Pradhan. 2010. SemEval-2010
task 14: Word sense induction & disambiguation. In
Proc. of SemEval-2010, pages 63–68.

Steve L. Manion and Raazesh Sainudiin. 2013. DAE-
BAK!: Peripheral Diversity for Multilingual Word
Sense Disambiguation. In Proc. of SemEval-2013,
pages 250–254.

Pablo N. Mendes, Max Jakob, Andrés Garcı́a-Silva, and
Christian Bizer. 2011. DBpedia spotlight: shed-
ding light on the web of documents. In Proc. of I-
Semantics, pages 1–8.

Rada Mihalcea and Andras Csomai. 2007. Wikify!: link-
ing documents to encyclopedic knowledge. In Proc. of
CIKM, pages 233–242.

Rada Mihalcea and Dan I Moldovan. 2001. Extended
WordNet: Progress report. In Proc. of NAACL Work-
shop on WordNet and Other Lexical Resources, pages
95–100.

Rada Mihalcea. 2005. Unsupervised large-vocabulary
word sense disambiguation with graph-based algo-
rithms for sequence data labeling. In Proc. of
HLT/EMNLP, pages 411–418.

George A. Miller, Claudia Leacock, Randee Tengi, and
Ross T. Bunker. 1993. A semantic concordance. In
Proc. of HLT, pages 303–308.

Tristan Miller, Chris Biemann, Torsten Zesch, and Iryna
Gurevych. 2012. Using Distributional Similarity for
Lexical Expansion in Knowledge-based Word Sense
Disambiguation. In Proc. of COLING, pages 1781–
1796.

David Milne and Ian H. Witten. 2008. Learning to link
with Wikipedia. In Proc. of CIKM, pages 509–518.

Andrea Moro and Roberto Navigli. 2013. Integrating
Syntactic and Semantic Analysis into the Open Infor-
mation Extraction Paradigm. In Proc. of IJCAI, pages
2148–2154.

Roberto Navigli and Mirella Lapata. 2010. An Experi-
mental Study of Graph Connectivity for Unsupervised
Word Sense Disambiguation. TPAMI, 32(4):678–692.

Roberto Navigli and Simone Paolo Ponzetto. 2012a. Ba-
belNet: The automatic construction, evaluation and
application of a wide-coverage multilingual semantic
network. Artificial Intelligence, 193:217–250.

Roberto Navigli and Simone Paolo Ponzetto. 2012b.
Joining forces pays off: Multilingual Joint Word Sense
Disambiguation. In Proc. of EMNLP, pages 1399–
1410.

Roberto Navigli, Kenneth C. Litkowski, and Orin Har-
graves. 2007. SemEval-2007 Task 07: Coarse-
Grained English All-Words Task. In Proc. of
SemEval-2007, pages 30–35.

Roberto Navigli, David Jurgens, and Daniele Vannella.
2013. SemEval-2013 Task 12: Multilingual Word
Sense Disambiguation. In Proc. of SemEval-2013,
pages 222–231.

Roberto Navigli. 2008. A structural approach to the
automatic adjudication of word sense disagreements.
Natural Language Engineering, 14(4):293–310.

Roberto Navigli. 2009. Word Sense Disambiguation: A
survey. ACM Computing Surveys, 41(2):1–69.

Roberto Navigli. 2012. A Quick Tour of Word Sense
Disambiguation, Induction and Related Approaches.
In Proc. of SOFSEM, pages 115–129.

243


Hwee Tou Ng and Hian Beng Lee. 1996. Integrat-
ing multiple knowledge sources to disambiguate word
sense: An exemplar-based approach. In Proc. of ACL,
pages 40–47.

Mohammad Taher Pilehvar and Roberto Navigli. 2014.
A Large-scale Pseudoword-based Evaluation Frame-
work for State-of-the-Art Word Sense Disambigua-
tion. Computational Linguistics.

Simone P. Ponzetto and Roberto Navigli. 2010.
Knowledge-rich Word Sense Disambiguation rivaling
supervised system. In Proc. of ACL, pages 1522–1531.

Sameer S. Pradhan, Edward Loper, Dmitriy Dligach, and
Martha Palmer. 2007. SemEval-2007 task 17: En-
glish lexical sample, SRL and all words. In Proc. of
SemEval-2007, pages 87–92. Association for Compu-
tational Linguistics.

Delip Rao, Paul McNamee, and Mark Dredze. 2013. En-
tity Linking: Finding Extracted Entities in a Knowl-
edge Base. In Multi-source, Multilingual Information
Extraction and Summarization, Theory and Applica-
tions of Natural Language Processing, pages 93–115.
Springer Berlin Heidelberg.

Lev-Arie Ratinov, Dan Roth, Doug Downey, and Mike
Anderson. 2011. Local and Global Algorithms for
Disambiguation to Wikipedia. In Proc. of ACL, pages
1375–1384.

Lenhart K. Schubert. 2006. Turing’s dream and the
knowledge challenge. In Proc. of NCAI, pages 1534–
1538.

Didier Schwab, Andon Tchechmedjiev, Jérôme Goulian,
Mohammad Nasiruddin, Gilles Sérasset, and Hervé
Blanchon. 2013. GETALP System: Propagation of
a Lesk Measure through an Ant Colony Algorithm. In
Proc. of SemEval-2013, pages 232–240.

Hui Shen, Razvan Bunescu, and Rada Mihalcea. 2013.
Coarse to Fine Grained Sense Disambiguation in
Wikipedia. In Proc. of *SEM, pages 22–31.

Ravi Sinha and Rada Mihalcea. 2007. Unsupervised
Graph-based Word Sense Disambiguation Using Mea-
sures of Word Semantic Similarity. In Proc. of ICSC,
pages 363–369.

Benjamin Snyder and Martha Palmer. 2004. The English
all-words task. In Proc. of Senseval-3, pages 41–43.

Carlo Strapparava and Alessandro Valitutti. 2004. Word-
Net Affect: an Affective Extension of WordNet. In
Proc. of LREC, pages 1083–1086.

Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan.
2006. Fast Random Walk with Restart and Its Appli-
cations. In Proc. of ICDM, pages 613–622.

Kristina Toutanova, Dan Klein, Christopher D. Manning,
and Yoram Singer. 2003. Feature-rich part-of-speech
tagging with a cyclic dependency network. In Proc. of
NAACL-HLT, pages 173–180.

George Tsatsaronis, Michalis Vazirgiannis, and Ion An-
droutsopoulos. 2007. Word Sense Disambiguation
with Spreading Activation Networks Generated from
Thesauri. In Proc. of IJCAI, pages 1725–1730.

Tim Van de Cruys and Marianna Apidianaki. 2011. La-
tent Semantic Word Sense Induction and Disambigua-
tion. In Proc. of ACL, pages 1476–1485.

Daniele Vannella, David Jurgens, Daniele Scarfini,
Domenico Toscani, and Roberto Navigli. 2014. Vali-
dating and Extending Semantic Knowledge Bases us-
ing Video Games with a Purpose. In Proc. of ACL.

Duncan J. Watts and Steven H. Strogatz. 1998. Col-
lective dynamics of ’small-world’ networks. Nature,
393(6684):409–10.

Zhi Zhong and Hwee Tou Ng. 2010. It Makes Sense: A
Wide-Coverage Word Sense Disambiguation System
for Free Text. In Proc. of ACL (Demo), pages 78–83.

Amal Zouaq, Michel Gagnon, and Benoit Ozell. 2009. A
SUMO-based Semantic Analysis for Knowledge Ex-
traction. In Proc of LTC.

244