A semantic framework for textual data enrichment

Yoan Gutiérrez, Sonia Vázquez and Andrés Montoyo
Department of Software and Computing Systems, University of Alicante, Spain.
ygutierrez, svazquez, montoyo@dlsi.ua.es

Expert Systems With Applications (2016). doi: 10.1016/j.eswa.2016.03.048

Highlights
- A semantic framework for recommender systems is presented
- An in-depth analysis of different Natural Language Processing resources is shown
- A description of different Natural Language Processing approaches is provided
- Related research works are described
- A case study evaluating our proposal with real data is presented

Abstract: In this work we present a semantic framework suitable for use as a support tool for recommender systems. Our purpose is to use the semantic information provided by a set of integrated resources to enrich texts by conducting different NLP tasks: WSD, domain classification, semantic similarity and sentiment analysis. After the textual semantic enrichment has been obtained, we are able to recommend similar content or even to rate texts according to different dimensions. First, we describe the main characteristics of the integrated semantic resources together with an exhaustive evaluation. Next, we demonstrate the usefulness of our resource in different NLP tasks and campaigns. Moreover, we present a combination of different NLP approaches that provides enough knowledge to be used as a support tool for recommender systems. Finally, we illustrate a case study with information related to movies and TV series to demonstrate that our framework works properly.

Keywords: Recommender Systems, Framework, Integrated Semantic Resources, Sentiment Analysis, Word Sense Disambiguation, Content Categorization

1. Introduction

Recent advances in modern technologies have motivated the development of different techniques to improve human-machine communication. The Internet and new communication trends such as short messages, forum participation and social networks have led to a revolution in the way in which people work, communicate and manage their free time. As a consequence of this technological revolution, a huge quantity of information is generated in different social contexts via diverse sources such as forums, blogs, microblogs and social networks.
As a result, people are able to share their knowledge, expectations and emotions through the Internet, and they may also influence political, economic or social behaviour. At this point, governments, enterprises or even celebrities need to manage this information in order to extract relevant knowledge, social tendencies, etc. Because of this new context, the Natural Language Processing (NLP) research community has developed different tools with which to analyse news and opinions in order to discover what people think or how they perceive past, present and future.

At present, personalization and recommender systems have gained popularity. In fact, recommender systems began to appear in the market in 1996 (Udi et al., 2000). Since then, several approaches have been developed (Gediminas and Alexander, 2005):

- Content-based: these systems try to find products, services or contents that are similar to those already evaluated by the user. In this kind of system, users' feedback (which can be collected in many ways) is essential to support and accomplish recommendations (Marco de et al., 2008). A minimal sketch of this strategy is given below.
- Knowledge-based: these systems model the user profile in order to identify, through inference algorithms, the correlation between user preferences and existing products, services or content (Walter et al., 2012).
- Collaborative filtering: these systems create/classify groups of users that share similar profiles/behaviours in order to recommend products, services or content that has been well evaluated by the group to which a user belongs (Perner et al., 2007).
- Hybrid: these systems combine two or more of the previously mentioned techniques to improve the "quality" of recommendations (Shinde and Kulkarni, 2012).

Dealing with textual information and obtaining valuable knowledge requires advanced natural language techniques to solve different kinds of problems: document correction, automatic translation, summary elaboration, opinion extraction, word sense disambiguation, etc. Solving all of these problems requires considerable linguistic knowledge and, even more importantly, entails a high computational cost. In the vast majority of NLP tasks it is necessary to use external resources such as machine-readable dictionaries (dictionaries of words available in electronic format), thesauri (which provide relationships among words, i.e. synonyms, antonyms and others), ontologies (conceptualizations of a domain used to share information among different agents) and others. These resources have different internal structures, interfaces, concept relations and other characteristics. One of the most frequently used resources, in its different versions, is WordNet (WN) (Miller et al., 1990; http://wordnet.princeton.edu/). Various semantic resources related to WordNet have consequently been developed in different domains or by using semantic integration. But it is still difficult to find resources that provide semantic integration in different domains and which are useful for specific NLP tasks.

In this work we present a new semantic resource (ISR-WN) and a set of different methods to take advantage of it with the aim of enriching texts with semantic information. As a result, we provide a semantic framework suitable for use as a support tool for content-based recommender systems, by annotating texts with different features such as sentiments, polarities or domain labels. In order to analyse the results of the semantic enrichment process, we have carried out a comprehensive case study using texts from movie and TV series reviews obtained from IMDb (http://www.imdb.com). Finally, we have evaluated how our proposed framework performs by comparing our results with real ratings.
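To make the content-based strategy outlined above more concrete, the following illustrative sketch compares a text a user has already liked with candidate texts using TF-IDF cosine similarity. It is only a toy example with invented review snippets, not the framework proposed in this paper, and it assumes scikit-learn is available.

```python
# Toy content-based recommendation sketch (invented data, not the proposed framework):
# candidates are ranked by TF-IDF cosine similarity to a text the user already liked.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

liked = ["a dark psychological thriller about memory and identity"]
candidates = [
    "a light romantic comedy set in Paris",
    "a psychological thriller exploring lost memories",
    "a documentary about deep sea creatures",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(liked + candidates)
scores = cosine_similarity(matrix[:1], matrix[1:]).ravel()

# Recommend the candidates most similar to the liked text
for text, score in sorted(zip(candidates, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {text}")
```

A content-based system built on the framework described in this paper would replace such plain bag-of-words vectors with the semantically enriched annotations (senses, domains, polarities) introduced in the following sections.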
To summarize, we point out the main contributions of this work:

- Taking advantage of a previously developed semantic resource with different dimensions (ISR-WN)
- The use of a set of NLP methods based on ISR-WN to take advantage of each one of its semantic dimensions
- Providing a new semantic framework that is able to enrich texts in several dimensions with the aim of obtaining a support tool for content-based recommender systems
- An exhaustive evaluation with real datasets to demonstrate how it works

The document is structured as follows. After this introduction, each semantic resource used in ISR-WN is described, and an in-depth analysis of the different approaches for semantic integration resources in NLP is also presented. Having evaluated previous proposals, in Section 3 we go on to show how ISR-WN was developed. An evaluation according to its integration effectiveness is then provided in Section 4. In Section 5 we provide a brief description of the different NLP tasks selected to enrich texts. Section 6 describes the characteristics of a case study to illustrate how our framework works with real data obtained from IMDb. In Section 7 we show some examples of how the semantic enrichment approaches are used to annotate texts. Section 8 provides the experimental results of the case study and Section 9 presents a discussion about the results obtained. Finally, the conclusions and further work are presented in Section 10.

2. Related Work

This section presents the different semantic resources that are integrated into ISR-WN and a comparison with other semantic integration resources.

2.1 WordNet

As mentioned in the previous section, WN is one of the most frequently used semantic resources in computational linguistics (Navigli, 2009). WN is a lexical database for the English language that is also considered an ontology. It was created at Princeton University (http://wordnet.princeton.edu/) and it represents a semantic, conceptual and structured network of nouns, verbs, adjectives and adverbs. The basic unit of knowledge is the synset (synonym set), which represents a lexical concept (Ševčenko, 2003). A synset is associated with a unique eight-digit number called an offset (this number is the position in the data file). Each synset is related to other synsets through semantic, conceptual or lexical connections. The result of this set of connections is a wide navigable network with a high number of interrelations among different word senses. The semantic relations among synsets are:

- Synonymy
- Antonymy
- Hyponymy / Hyperonymy
- Meronymy / Holonymy
- Entailment and Cause
- and others (more details at http://wordnet.princeton.edu/man/wninput.5WN.html)

WN records the frequency of usage of each word sense (synset) in its internal files (https://wordnet.princeton.edu/man/cntlist.5WN.html). For example, the word image has eight senses in WN 2.0 (see Table 1). As can be observed, one word has different senses; each sense is described by a sentence (gloss) and has a set of synonyms, and senses are ordered by their frequency of usage.
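As a quick illustration of how such a sense inventory can be queried programmatically, the sketch below lists the offset, part of speech, lemmas and gloss of each synset of "image" using NLTK's WordNet interface. This is only a convenience example and is not part of ISR-WN; note that NLTK ships WN 3.0, so the offsets and the number of senses may differ slightly from the WN 2.0 entries shown in Table 1.

```python
# Minimal sketch: querying the WN sense inventory for "image" with NLTK
# (requires the NLTK wordnet corpus; NLTK uses WN 3.0, not WN 2.0 as in Table 1).
from nltk.corpus import wordnet as wn

for synset in wn.synsets("image"):
    lemmas = ", ".join(lemma.name() for lemma in synset.lemmas())
    # offset, part of speech, synonym lemmas and gloss, as described above
    print(f"{synset.offset():08d} [{synset.pos()}] {lemmas} -- {synset.definition()}")
```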
POS | Offset | Lemmas | Gloss
Noun | 00039630 | effigy, image, simulacrum | a representation of a person (especially in the form of sculpture); "the coin bears an effigy of Lincoln"; "the emperor's tomb had his image carved in stone"
Noun | 00043454 | picture, image, icon, ikon | a visual representation (of an object or scene or person or abstraction) produced on a surface; "they showed us the pictures of their wedding"; "a movie is a series of images projected so rapidly that the eye integrates them"
Noun | 00047709 | persona, image | (Jungian psychology) a personal facade that one presents to the world; "a public image is as fragile as Humpty Dumpty"
Noun | 00053779 | image, mental_image | an iconic mental representation; "her imagination forced images upon her too awful to contemplate"
Noun | 00053832 | prototype, paradigm, epitome, image | a standard or typical example; "he is the prototype of good breeding"; "he provided America with an image of the good father"
Noun | 00059547 | trope, figure_of_speech, figure, image | language used in a figurative or non-literal sense
Noun | 00074537 | double, image, look-alike | someone who closely resembles a famous person (especially an actor); "he could be Gingrich's double"; "she's the very image of her mother"
Verb | 00109926 | visualize, visualise, envision, project, fancy, see, figure, picture, image | imagine; conceive of; see in one's mind; "I can't see him on horseback!"; "I can see what will happen"; "I can see a risk in this strategy"
Table 1. Word senses of "image" in WN 2.0

It is important to emphasize that WN has been adapted to different languages: English, Spanish, Dutch, Italian, German, French, Czech, Estonian, Swedish, Norwegian, Danish, Greek, Portuguese, Basque, Catalan, Romanian, Lithuanian, Russian, Bulgarian, Slovenian and others that are under development. These versions have been developed under the supervision of Princeton University and later under that of the Global WordNet Association (http://www.globalwordnet.org/). This research work is based on two versions of WN: WN 1.6, with 99,643 synsets, of which 66,025 are nouns, 17,915 are adjectives, 3,575 are adverbs and 12,127 are verbs, and WN 2.0, with 115,424 synsets, of which 79,689 are nouns, 18,563 are adjectives, 3,664 are adverbs and 13,508 are verbs.

2.2 Semantic resources aligned to WordNet

Owing to the fact that WN has been used in many NLP research works, a set of different semantic resources aligned to WN synsets has been developed with the aim of obtaining more knowledge. Some of these resources were created from WN, such as WordNet Domains (http://wndomains.fbk.eu/) (Magnini and Cavaglia, 2000), WordNet Affect (http://wndomains.fbk.eu/wnaffect.html) (Magnini and Cavaglia, 2000, Sara and Daniele, 2009) and Semantic Classes (http://rua.ua.es/dspace/bitstream/10045/2522/1/ranlp07BLC2.pdf) (Izquierdo et al., 2007). Others emerged from the association of pre-produced tags, e.g. SUMO (http://www.ontologyportal.org/). The resources used in our proposed semantic integration resource (ISR-WN) are described in detail below.

2.2.1 WordNet Domains

This is a resource for the English language. WordNet Domains (WND) includes a set of Subject Field Codes (SFC) (Magnini and Cavaglia, 2000) with which to enrich WN synsets. Each SFC groups a set of words related to the same domain. On the one hand, these domains identify the context of the definition and, on the other, they allow concepts to be found quickly.
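A hypothetical sketch of this domain-driven lookup is shown below: a tiny hand-written mapping from synset identifiers to domain labels stands in for the real WND data, and a filter keeps only the senses belonging to the domain of interest. Offsets and labels are illustrative only; the prose example that follows shows the same idea for the word disc.

```python
# Minimal sketch of domain-based sense filtering (Section 2.2.1).
# The mapping below is a tiny invented stand-in for the WordNet Domains data;
# offsets and domain labels are illustrative only.
senses_of_disc = {
    "03129289-n": ["computer_science"],   # e.g. magnetic disk sense
    "03129456-n": ["music"],              # e.g. phonograph record sense
    "13875392-n": ["factotum"],           # e.g. round flat shape sense
}

def senses_in_domain(sense_domains, target_domain):
    """Keep only the senses labelled with the target domain."""
    return [sense for sense, domains in sense_domains.items()
            if target_domain in domains]

print(senses_in_domain(senses_of_disc, "computer_science"))  # -> ['03129289-n']
```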
For example, if we are searching for the meaning of disc in the Computer Science context, we need only check the domain label preceding each definition (in this case, Computer Science) until we find the correct definition ([magnetic_disk, magnetic_disc, disk, disc]). This resource therefore provides the integration of domain labels in WN with the aim of reducing WN sense granularity by grouping different senses in the same domain or semantic category.

In WND, WN has been annotated by using a semi-automatic process that assigns one or more domain labels to each synset. Domain labels are selected from a set of 200 hierarchically organized labels. In this research we use 172 of these labels. The main purposes of annotating WN with SFC are:

- To create new relations among words. Domain labels allow us to relate words that pertain to different grammatical categories.
- Semantic annotation. Domain labels are associated with synsets, signifying that the annotation takes place at the semantic level rather than at the word level.
- Synsets pertaining to different syntactic categories can be included in the same domain label.
- Word senses pertaining to different sub-hierarchies of WN can be included in the same domain label.
- To reduce word sense granularity. Grouping different senses of the same word in the same domain label reduces the polysemy of words.

Sense | Domain | Gloss
man#1 | person | an adult male person (as opposed to a woman); "there were two women and six men on the bus"
man#2 | military | someone who serves in the armed forces; "two men stood sentry duty"
man#3 | person | the generic use of the word to refer to any human being; "it was every man for himself"
man#4 | factotum | all of the inhabitants of the earth; "all the world loves a lover"
man#5 | biology, person | any living or extinct member of the family Hominidae
man#6 | person | a male subordinate; "the chief stationed two men outside the building"; "he awaited word from his man in Havana"
man#7 | person | an adult male person who has a manly character (virile and courageous competent); "the army will make a man of you"
man#8 | person | (informal) a male person who plays a significant role (husband or lover or boyfriend) in the life of a particular woman; "she takes good care of her man"
man#9 | person | a manservant who acts as a personal attendant to his employer; "Jeeves was Bertie Wooster's man"
man#10 | play | a small object used in playing certain board games; "he taught me to set up the men on the chess board"; "he sacrificed a piece to get a strategic advantage"
Table 2. Domain labels and senses for man

For example, the word man has ten senses in WN (see Table 2). However, in WND we can group different senses according to their domain labels, and the number of senses is thus reduced from ten to four (Magnini et al., July 2002). Moreover, there are different specification levels in the domain hierarchy: the deeper the level, the greater the specialisation. Fig 1 shows a brief excerpt of the WND hierarchy from (Luisa Bentivogli, 2005).

Fig 1. WordNet Domains hierarchy

Among all the SFC domain labels there is a special one called Factotum. This domain label was created in order to group two types of synsets (Magnini and Cavaglia, 2000):

- Generic synsets: those that are difficult to classify into a particular domain label.
- Stop senses: those that frequently appear in different contexts, such as numbers, days of the week, colours, etc.

2.2.2 WordNet Affect

WordNet Affect (WNA) is an extension of WND (Magnini and Cavaglia, 2000, Sara and Daniele, 2009). It contains different subsets of affective concepts that group together synsets which denote emotional states. This resource was labelled by following a similar process to that of WND. Some of the represented concepts are moods, situations eliciting emotions and emotional responses. This resource was extended with a set of additional labels called emotional categories. It has a hierarchical structure in which hyperonymy is used to relate the affective concepts of WN (Valitutti et al., 2004). In a second revision, some modifications were made in order to differentiate those senses that are closer to emotional labels, and new labels were also included: positive, negative, ambiguous and neutral:

- Positive pertains to positive emotions. For example, it includes synsets such as joy#1 or enthusiasm#1.
- Negative defines negative states such as anger#1 or sadness#1.
- Ambiguous represents synsets whose semantics depends on the contexts in which they appear, e.g. surprise#1.
- Neutral represents synsets that refer to mental states but which are not characterised by valence.

One important property of WNA labels is that they associate the nouns and adjectives involved in emotional states. In this case, the adjective modifies the state of the noun, and may in some situations determine the modified noun state, e.g. cheerful / happy boy (Strapparava and Valitutti, 2004). In other words, if the adjective pertains to an emotional state it could indicate how the noun is related. Table 3 shows a list of affective labels associated with synsets. 300 labels are integrated into our research work.

Affective label | Examples
emotion | noun anger#1, verb fear#1
mood | noun animosity#1, adjective amiable#1
trait | noun aggressiveness#1, adjective competitive#1
cognitive state | noun confusion#2, adjective dazed#2
physical state | noun illness#1, adjective all in#1
hedonic signal | noun hurt#3, noun suffering#4
emotion-eliciting situation | noun awkwardness#3, adjective out of danger#1
emotional response | noun cold sweat#1, verb tremble#2
behaviour | noun offense#1, adjective inhibited#1
attitude | noun intolerance#1, noun defensive#1
sensation | noun coldness#1, verb feel#3
Table 3. WordNet Affect labels and their associated synsets.

2.2.3 SUMO

SUMO (Suggested Upper Merged Ontology; http://suo.ieee.org/SUO/SUMO/index.html) is considered to be an upper level ontology. It provides definitions for general terms and can be used as a basis for domain-specific ontologies. It was created from the combination of different ontological contents in one single cohesive structure. It currently contains around 1,000 terms and 4,000 assertions (Niles and Pease, 2003). Our research work only uses 568 of the concepts that are aligned with WN. SUMO was obtained from the information of Ontolingua (http://www.ksl.stanford.edu/software/ontolingua/) and the ontologies developed by ITBM-CNR (http://www.ontologyportal.org/SUMOhistory/): Unrestricted-Time, Representation, Anatomy, Biologic-Functions and Biologic-Substances.
It uses a standard representation language called SUO-KIF (Pease, 2007), obtained from KIF (Knowledge Interchange Format) (Genesereth and Fikes, 1992). SUMO was built by dividing the concepts into two groups: high level concepts and low level concepts. For the first group, the ontologies of John Sowa (Sowa, 1999) and of Russell and Norvig (Russell and Norvig, 1994) were considered, while the remaining concepts were included in the second group. Finally, a unique conceptual structure combining the two high level ontologies was created. The remaining low level class contents were included after the combination. Fig 2 shows the high level concepts.

Fig 2. SUMO high level concepts (Ševčenko, 2003): Entity, Abstract, Physical, Quantity, Attribute, SetOrClass, Proposition, Relation, Process and Object.

The highest level concept is the Entity category, as occurs in most hierarchies. The Entity concept groups the rest of the concepts, and the Physical and Abstract concepts are closest to it. An example of the word sense bank#1 is shown in Fig 3, along with the SUMO hierarchy.

Fig 3. SUMO hierarchy for bank#1 (depository financial institution, bank, banking concern, banking company), involving the categories Corporation, Organization, Group, Collection, Agent, Object, Physical and Entity.

2.2.4 Semantic classes

A semantic class is a sense conceptualisation that can be manually or semi-automatically created at different abstraction levels and in different domains. WN is composed of a set of related and connected synsets with different semantic relations. Each of these synsets represents a concept and contains a set of words referring to the concept it describes (synonyms). Synsets are classified into forty-five groups which include lexical categories (nouns, verbs, adjectives and adverbs) and semantic groupings (person, phenomenon, sentiment, place, etc.) (Fellbaum, 1998). There are twenty-six categories for nouns, fifteen for verbs, three for adjectives and one for adverbs in WN (Izquierdo, 2010). The organisational design of WN helps lexicographers to obtain a structure with which to create and edit a set of words and senses in the same semantic class. These semantic categories are considered to be Semantic Classes that are more general than senses, in which different WN senses are grouped in a semantic class. It is also possible to use this sense conceptualisation in different languages because all the WN senses (in English) are linked to EuroWordNet, which contains WNs in different languages.

It is important to note that, despite the fact that WN was generated by following a top-down process, it is difficult to distinguish between those synsets that were originally semantic classes and those that were not. If we wish to obtain the sense categorisation it is therefore necessary to apply a method with which to generate the Semantic Classes. The main goal of Semantic Class (SC) generation (Izquierdo et al., 2007) is to reduce polysemy, and various techniques with which to group senses have therefore been developed. In all cases, senses of the same word have been grouped together, thus reducing polysemy and improving WSD system results.

The SC resource consists of a set of Base Level Concepts (BLC) obtained from WN using a bottom-up process with hyperonymy relations. For each synset of WN, its BLC is obtained from the first local maximum according to its relative number of relations. As a result, semantic classes have a set of BLCs that are linked to different synsets.
The process follows an ascendant path by using the hyperonymy relations in WN. In the case of one synset having several hyperonyms, the path with the maximum number of relations is selected. The process ends when a set of initial concepts is obtained whose synsets have been selected as BLCs for other synsets. In some cases, there are BLCs that do not represent an adequate number of concepts or synsets. This situation is avoided by applying a final filtering process for these false BLCs, in which those BLCs that do not represent a minimum number of concepts are eliminated. Each BLC therefore has a minimum threshold associated with it, as described in (Izquierdo et al., 2010). A set of different BLCs is obtained by combining different thresholds and types of relations (all relations, or only hypo/hyperonyms). Those synsets that do not have a BLC associated with them (because their number of relations does not reach the minimum threshold required) are processed again to select another BLC from their hyperonymy relations. This eventually results in a new number of labels with which to categorise a set of senses. SCs are applied with the use of different repositories; those currently being used are WN1.6 and WN2.0.

2.2.5 SentiWordNet

SentiWordNet (SWN; http://gandalf.aksis.uib.no/lrec2006/pdf/384_pdf.pdf) (Esuli and Sebastiani, 2006, Baccianella et al., 2010) is a lexical resource in which each synset is associated with three different sentiment categories: objectivity, positivity and negativity. Each category has a score in the interval [0..1] and the sum of the three scores is always 1. This means that one synset could have scores that are different from 0 for each of the categories; for example, atrocious#3 has Pos: 0, Neg: 0.625 and Obj: 0.375.

In our proposal we use SentiWordNet 3.0. This version was obtained in two steps: first a semi-supervised learning phase and then a random walk. The first step consists of four sub-steps, as in the first version of SWN:

- Semi-supervised learning
  - First sub-step. Two small sets of seeds are used, one in which all the synsets contain seven paradigmatically positive terms and the other in which all the synsets contain seven paradigmatically negative terms (Turney and Littman, 2003). Both are expanded by using WN binary relations in order to connect synsets with the same polarity. The expansion is carried out with a specific radius k.
  - Second sub-step. The previous set of synsets is used with another set of synsets from the Objectivity category in order to build a set of training synsets with which to create a ternary classifier (one synset is classified as Pos, Neg or Obj). The classifier uses synset glosses to conduct the process. In SWN 1.0, a bag-of-words model is used in which the bag of words is obtained from gloss words (those frequent words that are most important). In SWN 3.0, rather than using words from glosses (with the ambiguity problem), a bag of synsets is used. This sub-step is improved by varying the displacement radius.
  - Third sub-step. All WN synsets are classified as being either Pos, Neg or Obj via the classifier generated in the second sub-step.
  - Fourth sub-step. The second sub-step can be performed by using different values of the radius k and different supervised learning technologies.
- Step 2 (random walk). WN 3.0 is considered to be a graph and an iterative process is conducted.
In this random walk, Pos(s) and Neg(s) (and consequently Obj(s)) are determined starting from the values obtained in the previous step. The process ends when the iterations have converged. There are 117,659 WN synset descriptions in our proposal.

2.2.6 eXtendedWordNet

eXtendedWordNet (XWN; http://xwn.hlt.utdallas.edu/) (Sanda M. Harabagiu, 1999) is a lexical resource created at the University of Texas. This resource was developed to improve the semantic information in the different versions of WN. The goal is to add semantic information to glosses and establish new relations among words (now labelled with their senses) from glosses and synsets. The new annotated version of WN only uses information from gloss definitions; gloss examples and other information are discarded. eXtendedWordNet was created by applying three processes:

- Syntactic analysis. A voting process with two syntactic analysers was applied using the outputs of (Brill, 1995) as inputs. The content of the glosses was extended as follows:
  - Adverbs. Glosses of adverbs were extended by adding adverb + is at the beginning of the gloss and a full stop at the end of the definition. For example, the gloss for automatically would be: automatically is in a reflex manner. A direct semantic annotation is thus obtained between word and gloss.
  - Adjectives. Glosses of adjectives were extended by adding adjective + is something at the beginning of the gloss and a full stop at the end of the definition. For example, the gloss for pure would be: pure is something not mixed.
  - Verbs. Glosses of verbs were extended by adding to + verb + is to at the beginning of the gloss and a full stop at the end of the definition. For example, the gloss for shed would be: to shed is to cast off hair, skin, horn, or feathers.
  - Nouns. Glosses of nouns were extended by adding noun + is at the beginning of the gloss and a full stop at the end of the definition. For example, the gloss for play would be: play is the act of using a sword (or other weapon) vigorously and skilfully.
- Logical analysis. After the syntactic analysis, a logical form transformation is applied, covering syntactic relations, syntactic objects, prepositional links, complex nominalizations, etc.
- Semantic analysis. Two versions were created: automatic and manual. Automatic annotation was applied by using one specific system to disambiguate WN glosses (XWN WSD) and another system to disambiguate free text. The decision process was a voting system that selected those senses on which both systems coincided, with a precision of 90%. There were three different categories for semantic annotation (the verbs to be and to have were managed in a special way):
  - GOLD: manual annotation.
  - SILVER: both WSD methods returned the same value.
  - NORMAL: only the XWN WSD value is used.
  This annotation obtained 100% coverage and 70% precision. Words labelled with the same sense by both systems obtained 90% precision.

In our approach, 551,551 relations were integrated using XWN1.7 and 419,387 using XWN3.0.

2.3 Semantic integration resources

Additional information with which to solve different NLP problems was obtained by using different semantic integration resources (ISR). However, one of the main problems is decentralisation. WN can solve this problem because it has been used as a basic resource to build others. Different resources currently use WN as a basis to build their structures.
For example:

- MultiWordNet (MWN; http://multiwordnet.itc.it) (Pianta et al., 2002) is a project that is being carried out by the ITC-IRST group in Trento, Italy, in order to build an Italian WN that is aligned with the English (Princeton) WN. Its first version contains around 37,000 Italian words that are organised in 28,000 synsets and their connections to English synsets. MultiWordNet is different from EuroWordNet because at least two methods can be used to build a multilingual WN. The first method was used in the EuroWordNet project and consists of building the WNs of specific languages independently, while correspondences are found in a second phase (Vossen, 1998). The second method was used in MultiWordNet and consists of building the WNs of specific languages from the semantic relations in the English WN; new synsets are obtained from English synsets. In MWN, the domain information has been automatically transferred from English to Italian, thus obtaining WND (Bentivogli et al., 2004).
- EuroWordNet (EWN) (Dorr and Castellón, 1997, Vossen, 1998) was developed to align English, Spanish, Dutch, Italian, German, French, Czech and Estonian. New versions included Swedish, Norwegian, Danish, Greek, Portuguese, Basque, Catalan, Romanian, Lithuanian, Russian, Bulgarian and Slovenian. The ILI (Inter-Lingual-Index) is used to connect each language (Vossen et al., 1999). The use of the ILI allows closer senses to be obtained among the languages, thus reducing polysemy and obtaining a large number of connections among different languages.
- Meaning: the Multilingual Central Repository (Meaning project, MCR) (Atserias et al., 2004) is integrated into the EWN framework with five local WNs, including the English WN, and uses an improved version of the Superior Concept ontology of EWN, MWN Domains, SUMO (Suggested Upper Merged Ontology) (Zouaq et al., 2009) and new semantic relations from corpora. The first version of MCR includes only conceptual knowledge with semantic relations among synsets from local WNs. The latest version of MCR integrates:
  - the ILI based on WN1.6, including base concepts from EWN, the Superior Concept ontology of EWN, MultiWordNet Domains and SUMO;
  - local WNs (Basque, Catalan, Italian and Spanish) related to the ILI, including WN English versions 1.5, 1.6, 1.7 and 1.7.1;
  - semantic preference collections, from SemCor and the BNC, and nominal entities.
- UBY: this is a large-scale lexical-semantic resource for NLP based on the ISO standard Lexical Markup Framework (LMF). UBY combines a wide range of information from expert and collaboratively constructed resources for English and German. It presents a web browser and an API that allow this tool to be used. UBY currently holds structurally and semantically interoperable versions of ten resources in two languages:
  - English WordNet, Wiktionary, Wikipedia, FrameNet and VerbNet;
  - German Wikipedia, Wiktionary, GermaNet and IMSLex-Subcat, and the multilingual Omega Wiki.
- BabelNet: this is a very large multilingual semantic network with millions of concepts obtained from:
  - an integration of WordNet and Wikipedia based on an automatic mapping algorithm, and
  - translations of the concepts (i.e., English Wikipedia pages and WordNet synsets) based on Wikipedia cross-language links and the output of a machine translation system.

As will be noted, these examples attempt to build semantic networks with a common interface.
In most cases, ISRs apply lexical integration with a few conceptual resources, but our proposal, ISR-WN, provides semantic information and other types of relations. ISRs are resources that help to improve the results of tasks such as document classification, entity discrimination, author detection, etc. The improvement is owing to the fact that ISRs enrich the contexts analysed with additional information (for example, subjectivity, contextual domain, etc.). In this work we present a resource that provides navigability from any word sense in WN to domain labels (and affective labels from WNA), SUMO categories or Semantic Classes through the use of semantic relations, as shown in Fig 4. As a result, we can extract the multidimensionality of each sentence through the use of all the concepts and words related in a semantic network. We can also detect sentiment polarities in each sentence from SWN and relate them with WND, WNA, SUMO and SC.

Fig 4. Sentence semantic characteristics extraction (only a few labels) for the sentence "But it is unfair to dump on teachers as distinct from the educational establishment", showing related WND, WNA, SUMO and SC concepts and SWN polarities.

After using the WN interface to study different lexical and conceptual resources, our goal is to develop a tool with which to align different WN-based resources and exploit all their relations: hyperonymy, meronymy, synonymy, etc. The result will be a graph with a set of nodes that represent WN synsets, WND concepts, SUMO categories, WNA concepts, SC labels or SWN sentiment polarities.

3. New integrated resource: ISR-WN

Our semantic integration proposal is denominated ISR-WN (Integrated Semantic Resource aligned with WordNet). It consists of the integration of different resources that are isolated but which can be aligned with WN. Considering that WordNet provides synsets (S) that contain sets of synonym words, and that WND, WNA, SUMO, SC and SWN provide sets of concepts (C), we can relate each concept with different synsets from WordNet. Therefore, ISR-WN is a Lexical Knowledge Base (LKB) that is represented as a non-directed graph G = (V, E). Each vertex v_i ∈ V represents a concept or a sense, that is, V = C ∪ S. A relation between two vertices i and j is represented with an edge e_ij ∈ E. The graph is non-directed because each relation connecting two vertices is mirrored by an inverse semantic relation in the opposite direction (the internal relations of WN are described at http://wordnet.princeton.edu/man/wninput.5WN.html).

According to the different resources taken into account, we have developed the integration knowledge base of ISR-WN based on the following resources: WN, WND, SUMO, WNA, Semantic Classes (SC), SWN and eXtended WordNet (XWN) 1.7 and 3.0. Note that XWN provides additional semantic relations among WordNet synsets. Below, we describe each version and the integration process in each case.

3.1 Integration process

The integration process takes into account the above-mentioned resources.
The aim is to obtain a new environment from which to retrieve semantic information using a unified set of resources. It is necessary to mention that WN is one of the most frequently used resources in NLP, since it provides a sense inventory. Its possibilities have motivated several researchers to develop taxonomies with new semantic information (Magnini and Cavaglia, 2000, Valitutti et al., 2004, Niles and Pease, 2003, Niles, 2001, Sara and Daniele, 2009, Forner, 2005, Strapparava and Valitutti, 2004). With regard to the resulting resources, we can mention SUMO (Niles and Pease, 2001), WND (Sara and Daniele, 2009), WNA (Strapparava and Valitutti, 2004), SC (Izquierdo et al., 2007), SWN (Esuli and Sebastiani, 2006), eXtended WordNet (Sanda M. Harabagiu, 1999) and others. Several authors, such as (Gliozzo et al., 2004, Magnini et al., 2002, Magnini et al., July 2002, Vázquez, 2009, Vázquez et al., 2004), have developed methods and systems based on these resources, and have also demonstrated improvements in several tasks: Information Extraction, Automatic Summarization, Document Indexing and Lexical Disambiguation. However, these authors have developed their approaches using one or two semantic resources, since there is no tool or resource with which to integrate all the semantic resources that are mapped onto WN. Bearing in mind both this fact and researchers' current need to use a single tool or resource to obtain semantic information from different resources, this work is focused on integrating the greatest possible quantity of semantic resources mapped onto WN into a single tool.

ISR-WN includes WN as its lexical nucleus because its internal structure and relations provide relevant information for many NLP tasks. Fig 5 represents the conceptual model, in which each dimension (resource) is aligned with WN through the use of semantic interconnections (Gutiérrez et al., 2010a) (Gutiérrez et al., 2011a). The main challenge as regards the integration consists of dealing with different versions of WN and different versions of the rest of the resources involved in this proposal. It is therefore necessary to match the mappings to the WN versions used by each resource (see Fig 6).

This integration resource involves a set of semantic resources with the following element distribution: 99,643 synsets for WN1.6 (66,025 nouns, 17,915 adjectives, 3,575 adverbs and 12,127 verbs) and 115,424 synsets for WN2.0 (79,689 nouns, 18,563 adjectives, 3,664 adverbs and 13,508 verbs); WND (with 172 labels), WNA (with 300 labels), SUMO (with 568 labels), SWN (with 117,659 labels), SC (with 1,231 labels), and XWN (551,551 relations for XWN1.7 and 419,387 new synset relations for XWN3.0). This is known as "Integration of Semantic Resources based on WordNet" (ISR-WN) (Gutiérrez et al., 2010a) (Gutiérrez et al., 2011a). It is important to note that all the labels and relations included in both versions have been reused from the resources cited. The elements of which ISR-WN is composed are described in Section 2.

Fig 6. Logic Model of the Integration of Semantic Resources (ISR-WN).
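A minimal sketch of the kind of labelled, bidirectional graph that this integration produces is given below. It is only an illustration of the data structure, not the actual ISR-WN implementation; the node identifiers, resource names and the relations chosen are examples based on the atrocious exploration discussed in Section 3.2.

```python
# Illustrative sketch of an LKB graph with typed vertices and labelled,
# bidirectional relations (not the actual ISR-WN implementation).
from collections import defaultdict

class LKB:
    def __init__(self):
        self.vertices = {}                # vertex id -> resource type
        self.edges = defaultdict(list)    # vertex id -> [(relation, vertex id)]

    def add_vertex(self, vertex_id, resource):
        self.vertices[vertex_id] = resource

    def add_relation(self, source, relation, target, inverse):
        # every relation is stored together with its inverse (cf. Table 5)
        self.edges[source].append((relation, target))
        self.edges[target].append((inverse, source))

lkb = LKB()
lkb.add_vertex("wn:00193347-a", "WordNet")                      # atrocious#3
lkb.add_vertex("wnd:Psychological_Feature", "WordNet Domains")
lkb.add_vertex("sumo:Subjective_Assessment_Attribute", "SUMO")
lkb.add_relation("wn:00193347-a", "Pertainym",
                 "wnd:Psychological_Feature", "Pertainym")
lkb.add_relation("wn:00193347-a", "Hypernym",
                 "sumo:Subjective_Assessment_Attribute", "Hyponym")
print(lkb.edges["wn:00193347-a"])
```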
3.2 Integration architecture

This section describes some of the features of the process used to integrate all the resources by using WN as an interlinking nucleus. This process takes into account each of their particularities and the resource annotations based on different WN versions, in this case versions 1.6 and 2.0. Despite the fact that this version only covers the English language, it can be extended to other languages if the Inter-Lingual-Index (ILI) of EWN (Dorr and Castellón, 1997, Vossen, 1998) or a similar technology is used to align it to them.

Fig 7 shows WN synsets linked to several taxonomies (SUMO, WND and WNA) and to the descriptions of SC and SWN, respectively. In many cases the links are established through the use of mapping files, thus allowing the resources tagged with different WN versions to be interlinked. This results in the creation of an enriched semantic graph that is highly suitable for NLP applications. Fig 7 shows how all the semantic resources (the taxonomies of SUMO (in green), WND (in purple) and WNA (in red), the SC labels (in orange) and the SWN descriptions (in grey)) are linked to the WN synsets. Particular aspects of each of the resources involved have therefore been taken into consideration.

Fig 7. Architecture.

With regard to integrating all the resources mentioned, mapping files are suitable for interlinking different WN versions, and are used by the semantic resources to map their labels onto WN definitions. Owing to the word sense granularity of the WN versions (word senses that exist in one version may not exist in another), many word senses cannot be taken into account when a mapping occurs. A suitable solution to this problem is to navigate all the necessary mapping files in order to take into account the greatest quantity of possible interlinks. For instance, WNA has been used in this proposal and has been tagged with WN 2.0, while SWN has been tagged with 1.6 and 2.0 and SC with 2.0; there is, therefore, no need to use mapping files to integrate them. It is worth noting that those semantic resources that are annotated with both WN 1.6 and 2.0 reduce the lost interlinks to 0%.

Fig 7 shows a model that allows one of the two nuclei, WN 1.6 or 2.0, to be chosen depending on the user's needs. This design does not preclude the inclusion of other WN versions in further ISR-WN versions. When comparing the second version with the first, the former has new semantic labels and the option of selecting the WN nucleus in order to build the semantic graph on demand. This integration model therefore reuses all the semantic relations included in the Princeton WN. Furthermore, in ISR-WN the SUMO categories also keep their relations with their respective WN mappings; moreover, the semantic relations of WND and WNA are now hypernym (owing to the is_a relation) and hyponym (owing to the is_child relation) in order for their hierarchical taxonomy to make sense. Pertainym is, correspondingly, the semantic relation used to link the WN synset to each label of both taxonomies (e.g. a synset can pertain to one or many WND and WNA labels). It is important to stress that WNA version 1.1 has been populated with new affective relations, which do not exist in WNA1.0 (e.g. entailment and cause). New relations in WNA1.1 can therefore link verbs, adjectives and adverbs with the nouns from which they are derived. All these considerations have been taken into account when developing the ISR-WN resource.
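A small sketch of how such a mapping file can be applied is shown below. The file name and the simple two-column "source_offset target_offset" format are assumptions made for illustration; the real mapping files used to align WN versions may carry additional columns (e.g. confidence scores), in which case only the relevant columns would be read.

```python
# Hypothetical sketch: remapping resource annotations from one WN version to
# another using a mapping file (file name and two-column format are assumed).
def load_mapping(path):
    mapping = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) >= 2:
                source_offset, target_offset = parts[0], parts[1]
                mapping[source_offset] = target_offset
    return mapping

def remap(offsets, mapping):
    """Offsets without an entry in the mapping become lost interlinks."""
    kept = [mapping[o] for o in offsets if o in mapping]
    lost = [o for o in offsets if o not in mapping]
    return kept, lost

# Example (hypothetical file name): align XWN1.7 annotations to a WN2.0 nucleus.
# mapping_17_to_20 = load_mapping("wn17-to-wn20.noun")
# kept, lost = remap(["00043454", "00047709"], mapping_17_to_20)
```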
The resulting semantic interconnections of this knowledge base permit navigation among all the resources involved. For instance, if the English word atrocious is explored using WN 1.6 as a nucleus, a list of WN synsets can be obtained for this word. Once a synset (e.g. offset 00193347) has been selected from its representative list, the following information is retrieved:

- 00193347 atrocious [Adjective]
  - Similar-To (00192906 alarming [Adjective])
  - Pertainym (Psychological_Feature [Domain])
  - Hypernym (Subjective_Assessment_Attribute [SUMO])
  - Pertainym (Emotion [Affect])
  - Pertainym (Horror [Affect])
  - SentiWN-Description (atrocious#3 Pos: 0 | Neg: 0.625 | Obj: 0.375 [SentiWordNet])
  - Cause (05591212 horror [Noun])

As will be noted, the offset, part-of-speech, synonym list and gloss are retrieved for each WN synset. The description of each semantic label from each particular resource linked to the synset being explored is also retrieved. As a search tool, ISR-WN also allows labels to be discovered in all the resources included, thus providing their associated information and establishing semantic navigation in order to find interesting semantic paths.

An interesting case may be discussed as regards the new features introduced in this integration resource for WNA. For instance, when exploring the label Horror from WNA, we find that this label is not linked directly with the third synset of atrocious (i.e. atrocious#3 with offset 00193347). However, it has been assumed that if a synset (e.g. atrocious#3) is linked with a noun (e.g. horror#1) by an affective relation (e.g. entailment, cause) obtained from WNA1.1, the synset will be linked to all the WNA labels linked to that noun. Upon applying this procedure, the synset horror#1, which is linked with the WNA label Horror, causes atrocious#3 to be additionally linked to Horror. Fig 8 shows the aforementioned example after applying this procedure, focusing on the synset atrocious#3 when linked to two WNA labels (e.g. Emotion and Horror).

Fig 8. ISR-WN including new affective relations from WNA1.1.

As part of the semantic improvement strategy for ISR-WN, the semantic links suggested by XWN1.7 and XWN3.0 have also been considered. Their semantic relations to the synsets of ISR-WN have been tagged as follows:

- XWN17_Relation_as_Synset,
- XWN17_Relation_as_Gloss,
- XWN30_Relation_as_Synset,
- XWN30_Relation_as_Gloss.

Different features are involved in the resulting semantic graph depending on the nucleus selected (WN 1.6 or 2.0) in order to load the semantic knowledge base on demand. It is important to highlight that Relation_as_Synset and Relation_as_Gloss represent a bidirectional relation between synset and gloss. When WN1.6 is used as a nucleus, 505,755 and 119,123 relations will be involved, from XWN1.7 and XWN3.0, respectively. However, when WN2.0 is used to load ISR-WN, 551,065 relations from XWN1.7 will be involved, in addition to 358,843 relations from XWN3.0. Table 4 shows all the semantic relations included in this integration resource. Note that XWN1.7 is annotated with WN1.7, and it is therefore necessary to apply a conversion of WN versions in order to map its offsets of version 1.7 onto the WN version used as nucleus (1.6 or 2.0). The integration of XWN3.0, which is annotated with WN3.0, is performed in the same way.
These mappings result in some relations being missed owing to the conversion issues mentioned above. An in-depth analysis of this can be found in the evaluation section.

An exploration of ISR-WN is presented as follows, in which WN1.6 is used to load the knowledge base. This exploration reuses the example of the word love, in which an increase in the semantic information is evident as regards the target synset:

- 01211759 love [Verb]
  - Antonym (01211167 hate, detest [Verb])
  - Hyponym (01212004 love [Verb])
  - Hyponym (012125039 care_for, cherish, hold_dear, treasure [Verb])
  - Hyponym (01213640 dote [Verb])
  - Hyponym (01213998 adore [Verb])
  - Pertainym (Factotum [Domain])
  - Hypernym (Intentional_Psychological_Process [SUMO])
  - Pertainym (Emotion [Affect])
  - Pertainym (Love [Affect])
  - Stative (05607724 love [Noun])
  - Hypernym (love.v.01 [SemanticClass])
  - SentiWN-Description (love#1 Pos: 0.5 | Neg: 0 | Obj: 0.5 [SentiWordNet])
  - XWN17_Relation_as_Synset (05608483 affection, affectionateness, fondness, tenderness, heart, warmheartedness [Noun])
  - XWN17_Relation_as_Synset (05573285 liking [Noun])
  - XWN17_Relation_as_Synset (01508689 have, have_got, hold [Verb])
  - XWN17_Relation_as_Synset (01332909 great [Adjective])
  - XWN17_Relation_as_Gloss (07626109 sweetheart, sweetie, steady, truelove [Noun])
  - XWN17_Relation_as_Gloss (07462325 patriot, nationalist [Noun])
  - XWN17_Relation_as_Gloss (07111212 bibliophile, booklover, book_lover [Noun])
  - XWN17_Relation_as_Gloss (07073765 amorist [Noun])
  - XWN17_Relation_as_Gloss (06950629 lover [Noun])
  - XWN17_Relation_as_Gloss (06930637 Brunnhilde, Brynhild [Noun])
  - XWN17_Relation_as_Gloss (06921123 Psyche [Noun])
  - XWN17_Relation_as_Gloss (06897084 Adonis [Noun])
  - XWN17_Relation_as_Gloss (01402650 unloved, not loved [Adjective])
  - XWN30_Relation_as_Gloss (05315655 hyperbaton [Noun])
  - XWN30_Relation_as_Gloss (06950629 lover [Noun])
  - XWN30_Relation_as_Gloss (07111212 bibliophile, booklover, book_lover [Noun])
  - XWN30_Relation_as_Gloss (08287863 strawflower, golden_everlasting, yellow_paper_daisy, Helichrysum_bracteatum [Noun])
  - XWN30_Relation_as_Synset (05608483 affection, affectionateness, fondness, tenderness, heart, warmheartedness [Noun])
  - XWN30_Relation_as_Synset (05573285 liking [Noun])
  - XWN30_Relation_as_Synset (01213998 adore [Verb])
  - XWN30_Relation_as_Synset (01214144 idolize, worship, hero-worship, revere [Verb])

As can be appreciated in this example, the exploration of the same word love (offset 01211759 for the verb love) has been considerably enriched. The semantic information obtained combines many labels from different resources and establishes new relations among WN synsets. This integration of resources allows the discovery of new semantic interconnections never before shown in a single tool. Interesting features such as WND or SUMO concepts with positive or negative tendencies could also be mentioned. To return to the example of the word love, an exploration of one level in depth has been applied; however, if we were to navigate in greater depth, many more semantic elements would be discovered. Table 4 shows all the semantic relations; this table is used as a matrix to relate each resource by rows. Note that, as shown in Table 5, each relation in ISR-WN has one inverse relation. Most of the relation pairs have been taken from WN (http://wordnet.princeton.edu/man/wninput.5WN.html) and the other resources involved, while the rest (those marked with *) have been created in this research work.
The aim of creating bilateral relations has been to allow forwards and backwards navigation through the semantic links.

- WN-WN: Also see, Antonym, Attribute, Cause, Derivationally related form, Derived from adjective, Domain of synset (REGION, TOPIC, USAGE), Entailment, Hypernym, Hyponym, Instance Hypernym, Instance Hyponym, Member holonym, Member meronym, Member of this domain (REGION, TOPIC, USAGE), Part holonym, Part meronym, Participle of verb, Pertainym (pertains to noun), Similar to, Substance holonym, Substance meronym, Verb Group, XWN17_Relation_as_Gloss*, XWN17_Relation_as_Synset*, XWN30_Relation_as_Gloss*, XWN30_Relation_as_Synset*
- WN-WND: Pertainym; WN-WNA: Pertainym; WN-SUMO: Hypernym, Hyponym; WN-SC: Hypernym, Hyponym; WN-SWN: SentiWN_Description*
- WND-WN: Pertainym; WND-WND: Hypernym, Hyponym
- WNA-WN: Pertainym; WNA-WNA: Hypernym, Hyponym
- SUMO-WN: Hypernym, Hyponym; SUMO-SUMO: Hypernym, Hyponym
- SC-WN: Hypernym, Hyponym; SC-SC: Hypernym, Hyponym
- SWN-WN: SentiWN_Description*
Table 4. Relations between semantic labels on ISR-WN.

- Bidirectional relations on WordNet (pointer ↔ reflected relation): Antonym ↔ Antonym; Hyponym ↔ Hypernym; Hypernym ↔ Hyponym; Instance Hyponym ↔ Instance Hypernym; Instance Hypernym ↔ Instance Hyponym; Holonym ↔ Meronym; Meronym ↔ Holonym; Similar to ↔ Similar to; Attribute ↔ Attribute; Verb Group ↔ Verb Group; Derivationally Related ↔ Derivationally Related; Domain of synset ↔ Member of Domain.
- Bidirectional relations added by ISR-WN (pointer ↔ reflected relation): Entailment ↔ Cause; Participle ↔ Participle; Pertainym ↔ Pertainym; XWN17_Relation_as_Gloss* ↔ XWN17_Relation_as_Synset*; XWN30_Relation_as_Gloss* ↔ XWN30_Relation_as_Synset*; SentiWN_Description* ↔ SentiWN_Description*; Stative ↔ Cause; Synonymy ↔ Synonymy.
Table 5. Bidirectional relations on ISR-WN.

4. Integration process evaluation

In this section the results obtained are analysed by taking into consideration the semantic elements that are available for the integration and those eventually integrated. Once ISR-WN had been developed following the aforementioned process, an evaluation took place. This was carried out in order to obtain a statistical assessment of the quantity of WN synsets that should be linked with regard to those actually established. Two evaluations were applied, based on the semantic knowledge interlinked depending on the nucleus (WN1.6 or WN2.0) selected for loading the semantic graph.

4.1 Evaluation based on WN1.6 as a nucleus

Table 6 shows a distribution of the semantic labels involved in ISR-WN with WN1.6 as a nucleus. In this analysis the greatest possible quantity of labels has been extracted for inclusion, and the labels which have not been included are justified below. An important factor that supports the 100% alignment of WND, SUMO and SC has been the reduction in the usage of mapping files. This was possible since these three resources were built based on WN1.6 and the currently evaluated nucleus is WN1.6.
Resource | WND 2.0 | SUMO | WNA 1.0-1.1 | SC | SWN 3.0 | XWN 1.7 | XWN 3.0
# Labels | 172 | 568 | 300 | 1,231 | 117,659 | - | -
Synsets to link (n) | 86,901 | 67,923 | 1,256 | 66,025 | 82,114 | - | -
Synsets to link (a) | 19,322 | 18,531 | 2,418 | - | 18,157 | - | -
Synsets to link (v) | 12,843 | 12,469 | 801 | 12,127 | 13,767 | - | -
Synsets to link (r) | 3,735 | 3,627 | 614 | - | 3,621 | - | -
Total of synsets to link | 122,801 | 102,550 | 5,089 | 78,152 | 117,659 | 551,551 | 419,387
Linked synsets (n) | 86,901 | 67,923 | 1,096 | 66,025 | 56,563 | - | -
Linked synsets (a) | 19,322 | 18,531 | 2,125 | - | 8,757 | - | -
Linked synsets (v) | 12,843 | 12,469 | 474 | 12,127 | 9,223 | - | -
Linked synsets (r) | 3,735 | 3,627 | 549 | - | 2,101 | - | -
Total of linked synsets | 122,801 | 102,550 | 4,244 | 78,152 | 76,644 | 505,755 | 119,123
Difference | 0 | 0 | 845 | 0 | 41,015 | 45,796 | 300,264
Linked % | 100.00 | 100.00 | 83.40 | 100.00 | 65.14 | 91.70 | 28.40
Table 6. Synsets linked to each resource by using WN1.6 as a nucleus (mean linked percentage: 81.23%).

In this integration resource we have considered both WNA versions (1.0 and 1.1) to be a single resource and have mixed them. This mixture has been developed by involving the taxonomy of WNA1.1 and the WN mappings from both versions (1.0 and 1.1). However, a difference in labels (as regards all linked synsets with regard to those synsets that should be linked) between WNA1.0 and WNA1.1 has caused 845 missing links (see Table 6). This special case is owing to the fact that most of the WNA1.0 labels are included in WNA1.1, but several WNA1.1 labels are not included in WNA1.0. Some links have therefore been removed during the integration. The affective labels not considered by ISR-WN, because they are not represented in the WNA1.1 taxonomy, are the following: attitude, emotional response, psy, man, sympathy, sta, softheartedness, joy-pride, identification, levity-gaiety, general-gaiety, empathy, positive-concern, compatibility, kindheartedness and buck-fever. Note that the taxonomies used are the most recent for WND and WNA (i.e. WND 3.2 and WNA1.1).

WN mapping files have also been considered for the alignment of WNA to WN. WNA1.1 has a peculiar feature: only nouns are linked to affective labels. Adjectives, verbs and adverbs are therefore linked to these nouns at sense level by means of derived relations (i.e. entailment and cause). The description of this issue can be found in the section "3.1 Integration process".

Furthermore, when evaluating the SWN integration based on WN1.6 as a nucleus, it was found that the mismatching of SWN3.0 alignments occurred because this resource was developed on the basis of a very different WN version, WN3.0. It is therefore possible to find several SWN features annotated with WN3.0 that do not exist for WN1.6. This justifies the lost links shown in Table 6.

Table 6 also shows how the integration of XWN1.7 reached 91.70%, while XWN3.0 reached a lower integration of 28.40%. A similar phenomenon occurred with SWN3.0, in which several WN mapping versions were involved. This is responsible for the low enrichment of ISR-WN as regards XWN3.0 when WN1.6 is used as a nucleus. Graph-based approaches, such as (Gutiérrez, 2012), which uses ISR-WN with XWN3.0, cannot therefore obtain the same relevant results as those obtained using XWN1.7.

Once ISR-WN is loaded with WN1.6, it contains a total of 178,558 vertices, of which 99,643 represent WN synsets, 172 WND domains, 568 SUMO categories, 300 WNA affective labels, 1,231 SC semantic classes and, finally, 76,644 SWN descriptions. ISR-WN therefore includes a total of 2,326,211 semantic relations with WN1.6 as a nucleus.
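As a quick sanity check of the figures in Table 6, the short sketch below recomputes the "Linked %" column and its mean directly from the "Total of synsets to link" and "Total of linked synsets" rows; the counts are simply copied from the table.

```python
# Recompute the "Linked %" column and its mean for Table 6 (WN1.6 nucleus);
# the counts below are copied from the table.
to_link = {"WND": 122_801, "SUMO": 102_550, "WNA": 5_089, "SC": 78_152,
           "SWN": 117_659, "XWN1.7": 551_551, "XWN3.0": 419_387}
linked = {"WND": 122_801, "SUMO": 102_550, "WNA": 4_244, "SC": 78_152,
          "SWN": 76_644, "XWN1.7": 505_755, "XWN3.0": 119_123}

linked_pct = {name: 100.0 * linked[name] / to_link[name] for name in to_link}
for name, pct in linked_pct.items():
    print(f"{name}: {pct:.2f}%")
print(f"Mean: {sum(linked_pct.values()) / len(linked_pct):.2f}%")  # ~81.23%
```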
4.2 Evaluation based on WN2.0 as a nucleus
Table 7 shows a comparative study similar to Table 6, but here WN2.0 is considered as the nucleus.

| | WND 3.2 | SUMO | WNA 1.0-1.1 | SC | SWN 3.0 | XWN 1.7 | XWN 3.0 | Mean |
| # Labels | 172 | 568 | 300 | 1,231 | 117,659 | - | - | |
| Synsets to link (n) | 103,504 | 79,688 | 1,256 | 66,025 | 82,114 | - | - | |
| Synsets to link (a) | 19,398 | 18,564 | 2,418 | - | 18,157 | - | - | |
| Synsets to link (v) | 19,398 | 13,507 | 801 | 12,127 | 13,767 | - | - | |
| Synsets to link (r) | 3,835 | 3,663 | 614 | - | 3,621 | - | - | |
| Total of synsets to link | 146,135 | 115,422 | 5,089 | 78,152 | 117,659 | 551,551 | 419,387 | |
| Linked synsets (n) | 103,504 | 79,688 | 1,089 | 65,904 | 78,061 | - | - | |
| Linked synsets (a) | 19,398 | 18,564 | 2,118 | - | 11,052 | - | - | |
| Linked synsets (v) | 19,398 | 13,507 | 473 | 12,064 | 13,207 | - | - | |
| Linked synsets (r) | 3,835 | 3,663 | 580 | - | 3,428 | - | - | |
| Total of linked synsets | 146,135 | 115,422 | 4,260 | 77,968 | 105,748 | 551,065 | 358,843 | |
| Difference | 0 | 0 | 829 | 184 | 11,911 | 486 | 60,544 | |
| Linked % | 100.00 | 100.00 | 83.71 | 99.76 | 89.88 | 99.91 | 85.56 | 94.11 |
Table 7. Synsets linked to each resource by using WN2.0 as a nucleus.

As can be seen in Table 7, SC has some missing links in ISR-WN, since it was developed using WN1.6 and has to be connected to WN2.0 through mapping files. When this occurs, some synsets are not considered, since they do not exist in both versions (1.6 and 2.0). It will, however, be noted that the SWN links increase in comparison with the previous distribution table, because fewer mapping versions are involved. WNA 1.0 and WNA 1.1 have been presented together in both tables, since they are treated as a single resource. All the WNA1.1 labels and the synset-label relations from WNA 1.0 and 1.1 have therefore been considered; only those WNA1.0 labels that do not exist in the WNA1.1 taxonomy have been left out. Both versions of WNA are thus considered to be a single resource.

As Table 7 shows, when WN2.0 is used as a nucleus the incorporation of XWN increases significantly, particularly in the case of XWN3.0. It is most appropriate to use ISR-WN with WN2.0, since the mean integration percentage is 94.11%, whereas for WN1.6 it is 81.23%. The best integration of WN-based resources therefore occurs when WN2.0 is taken into account. In this case, 223,448 vertices have been computed, of which 115,424 are WN synsets, 172 are WND domains, 568 are SUMO categories, 300 are WNA affective labels, 1,231 are SC labels and 105,748 are SWN descriptions. All these vertices are connected by 2,893,838 semantic relations.

5. NLP tasks description
As mentioned before, our goal in this work is to provide a semantic framework able to enrich texts with semantic information and to use this new information to classify texts, obtain opinions or recommend similar texts in open domains. To do this, we use ISR-WN as the knowledge source for different NLP tasks. Basically, we have conducted four different tasks to add new information from different points of view:
1- Word Sense Disambiguation (WSD). We need to obtain the correct sense of each word in order to obtain accurate results. Since a word has different meanings depending on the context, it is necessary to decide which word sense is the correct one, because this determines a proper understanding of the text.
2- Semantic similarity. The goal of this task is to discover similar content. By considering the words, senses, content structure, etc., we are able to find similar opinions that may be expressed with different words while keeping a similar meaning.
3- Domain classification. This task provides the most relevant domains of a text or a set of texts.
This information will be useful to annotate texts in order to recommend similar content.
4- Sentiment analysis. The goal of this task is to automatically decide whether an opinion is positive or negative.
Moreover, many researchers have used the semantic features that ISR-WN provides to deal with different NLP tasks. Some of these tasks and their contributions are shown below. Notice that most semantic NLP challenges are based on these WN versions, so this integrated resource is very useful for dealing with such tasks (see http://senseval.org/).

Word Sense Disambiguation (WSD) approaches: In order to create effective NLP systems it is necessary to transform the information extracted from the words in plain text into a conceptual level so as to detect meaningful word senses. In WSD the goal is to determine the senses of the words in a text, where a word may have different senses depending on the context in which it appears. The main purpose of the approaches listed below is therefore to automatically choose the intended sense (meaning) of a word in a particular context. Two types of approaches can be used to this end: knowledge-based approaches (Zhibiao and Martha, 1994), (Leacock and Chodorow, 1998) and corpus-based approaches (Peter, 2001). The former require lexical resources such as WordNet, Roget's Thesaurus 24, BabelNet 25, DBpedia 26, etc. to obtain semantic similarities, while the latter use co-occurrences to measure the similarity between words. In our case, by conducting knowledge-based approaches through ISR-WN, we find all the possible synsets for a word in WordNet together with their links to concepts of the aforementioned resources (i.e. WND, WNA, SUMO, SC), which makes it possible to group word synsets under semantic concepts and to distinguish between them depending on the context. For example, for the WND concept "Sport" we can find three word senses of the word "exercise", two for "School", one for "Sociology" and one for "Pedagogy". Hence, if we determine the context of the sentence in which "exercise" is being used, we can reduce the number of possible meanings to assign. The advantage of using ISR-WN is that we can apply this procedure not only with WND; we can also use the rest of the conceptual resources on the same platform.

WSD can also be considered a text semantic enrichment task, since by applying this technique, and depending on the knowledge base used, the processed text can be tagged or linked with different sources. This phenomenon can be associated with the Entity Linking task, in which a textual entity mention is matched against a knowledge base. This entity mention can be a Wikipedia page, a DBpedia entry, or a specific URI that identifies an entity, that is, a canonical entry for that entity (Rao et al., 2013). The reason why we introduce the Entity Linking task here is that BabelNet can also be integrated with ISR-WN in order to deal with it, even for different languages. Such is the case of (Gutiérrez et al., 2013) in Semeval-2013 Task 12 "Multilingual Word Sense Disambiguation" 27. A list of WSD approaches that have been supported by the conceptual integration of ISR-WN is described below.
Among them, the second approach does not just describe a word sense disambiguation method; it also describes how, by using ISR-WN, relevant conceptual trees of every conceptual resource integrated in ISR-WN can be obtained. These trees can be interpreted as a way of obtaining the concepts that classify a text. For example, for the sentence taken from an IMDb 28 user review "Grey's Anatomy stars Ellen Pompeo (who has starred in a few movies but was never really noticeable) as the narrator - Meredith Grey", this approach based on WND classifies the text with the following concepts: ["Cinema", "Art", "Humanities"]. The following research works, which are based on ISR-WN, support the tasks mentioned:
 Participation of the UMCC-DLSI research team in Semeval-2010 Task 17: All-words Word Sense Disambiguation on a Specific Domain (Agirre et al., 2010). A description of this approach can be found in (Gutiérrez et al., 2010b).
 Research paper (Gutiérrez et al., 2011b), presented at the RANLP 2011 conference 29, which combined ISR-WN (considering WN, WND, WNA and SUMO) and word sense frequencies to solve Word Sense Disambiguation. This proposal attained higher results in comparison with the top proposals at a world level.
 Research paper (Gutiérrez et al., 2011d), presented at the NLDB 2011 conference 30, which used ISR-WN (considering WN, WND, WNA and SUMO) in a graph-based approach to deal with Word Sense Disambiguation. This proposal attained important results in comparison with the reported submissions of Senseval-2 (Cotton et al., 2001).
 Participation of the UMCC-DLSI research team in Semeval-2013 Task 12 "Multilingual Word Sense Disambiguation" 31. This approach attained the first position in the ranking. A description of it can be found in (Gutiérrez et al., 2013).

In order to show the results obtained by WSD approaches supported by ISR-WN, we describe the two types used in this work: graph-based (Gutiérrez et al., 2013) and tree-based (Gutiérrez et al., 2011b). The graph-based approach takes the whole ISR-WN graph as its configuration; it thus includes all ISR-WN resources and their links when applying WSD. As part of the evaluation of this approach we participated in the Semeval-2013 Task 12 "Multilingual Word Sense Disambiguation" campaign. For example, for evaluating WSD in English, 1,931 instances of BabelNet, 1,242 of Wikipedia and 1,644 of WordNet were considered. Note that in this case ISR-WN and BabelNet were aligned through WordNet, since WordNet is common to both. Based on the corpus 32 provided by Task 12 "Multilingual Word Sense Disambiguation", our WSD approach was placed at the top of the campaign ranking, reaching an F1 of around 68.5%, 54.6% and 64.7% for BabelNet, Wikipedia and WN respectively. For more details about the approach and its experiments in the Semeval-2013 campaign see (Gutiérrez et al., 2013).

24 http://www.thesaurus.com/roget
25 http://babelnet.org/
26 http://dbpedia.org
27 http://www.cs.york.ac.uk/semeval-2013/task12/
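As a rough illustration of the alignment step mentioned above, the sketch below shows how BabelNet entries could be connected to ISR-WN through the WordNet senses they contain, since WordNet is the common element of both resources. The identifiers and the dictionary-based representation are hypothetical toy data; the actual alignment used in (Gutiérrez et al., 2013) operates over the full resources.

```python
from typing import Dict, List

# Hypothetical miniature inventories: BabelNet ids -> WordNet sense keys they group,
# and ISR-WN nodes indexed by WordNet sense key (all ids are invented for the example).
babelnet = {
    "bn:mother-example-n": ["mother%1:18:00::"],
    "bn:surgeon-example-n": ["surgeon%1:18:00::"],
}
isr_wn_nodes = {"mother%1:18:00::": "WN:mother.n.01", "surgeon%1:18:00::": "WN:surgeon.n.01"}

def align_babelnet_to_isrwn(babelnet: Dict[str, List[str]],
                            isr_wn_nodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Link each BabelNet entry to the ISR-WN nodes of the WordNet senses it contains."""
    alignment: Dict[str, List[str]] = {}
    for bn_id, senses in babelnet.items():
        targets = [isr_wn_nodes[s] for s in senses if s in isr_wn_nodes]
        if targets:
            alignment[bn_id] = targets
    return alignment

print(align_babelnet_to_isrwn(babelnet, isr_wn_nodes))
```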
| Experiment | Configuration | Precision | Recall | F1 | F1 difference with the best system |
| Corpus d00.txt [648 instances] | | | | | |
| Exp 1 | MFS | 0.565 | 0.564 | 0.564 | |
| Exp 2 | individual resource (WN, WNA, SUMO or WND) | 0.572 | 0.572 | 0.572 | |
| Exp 3 | individual resource (WN, WNA, SUMO or WND) | 0.561 | 0.560 | 0.560 | |
| Exp 4 | individual resource (WN, WNA, SUMO or WND) | 0.555 | 0.554 | 0.554 | |
| Exp 5 | individual resource (WN, WNA, SUMO or WND) | 0.572 | 0.572 | 0.572 | |
| Exp 6 | Voting, all ISR-WN as configuration (WN, WNA, SUMO, WND and MFS) | 0.575 | 0.575 | 0.575 | |
| Whole corpus (d00.txt [648 instances], d01.txt [1032 instances], d02.txt [757 instances]) | | | | | |
| Exp 7 | MFS | 0.601 | 0.599 | 0.600 | 0.090 |
| Exp 8 | Voting, all ISR-WN as configuration (WN, WNA, SUMO, WND and MFS) | 0.610 | 0.609 | 0.609 | 0.081 |
| Best system of Senseval-2 | | 0.690 | 0.690 | 0.690 | |
| Average of the Senseval-2 results | | 0.499 | 0.360 | 0.391 | |
| Worst system of Senseval-2 | | 0.370 | 0.345 | 0.357 | |
Table 8. Evaluation results of the tree-based WSD approach with Senseval-2 corpora.

28 http://www.imdb.com/
29 http://lml.bas.bg/ranlp2011/
30 http://gplsi.dlsi.ua.es/congresos/nldb11/
31 http://www.cs.york.ac.uk/semeval-2013/
32 https://www.cs.york.ac.uk/semeval-2013/task12/index.php%3Fid=data.html

In order to show how each resource can be used individually (i.e. WN, WNA, SUMO and WND), we describe another WSD approach: the tree-based one (Gutiérrez et al., 2011b). Table 8 shows its evaluation, using the corpora provided by Rada Mihalcea's public repository. This repository includes 2,447 instances from the Senseval-2 competition "The English All-Words Task" (Cotton et al., 2001). Table 8 shows the results obtained with different configurations: each resource individually or all of them together. Note that the most frequent word sense (MFS) is also considered as a dimension for the voting. More details can be found in (Gutiérrez et al., 2011b).

Opinion Mining approaches: Nowadays, Opinion Mining (OM), also known as Sentiment Analysis (SA), has become a popular NLP discipline owing to its close relation to studies of social media behaviour. OM is commonly used to analyse the comments that people post on social networks; it also allows the preferences and criteria of users regarding situations, events, products, brands, etc. to be identified. In order to exploit the potential that ISR-WN provides through the integration of semantic, conceptual, affective and sentiment-scoring resources, several approaches have been produced for the detection of opinions and relevance, for polarity classification, and also for measuring the impact of sentiment polarity detection on Textual Entailment tasks. These approaches are the following:
 The multidimensional point of view of ISR-WN has proved useful in the Opinion Mining area. In this case it was presented at the WASSA'11 Workshop 33, dealing with the three opinion mining tasks of the MOAT (NTCIR Multilingual Opinion Analysis Task 34) competition. The first task consists of identifying whether or not a sentence represents an opinion, another involves identifying the polarity of the opinion sentence, while the last one consists of aligning questions with topics. This proposal used ISR-WN to extract the relevant concepts that represent the sentences analysed. The Opinion Mining tasks were carried out by taking these concepts and linking them with SWN (in Enriched ISR-WN) sentiment polarities. This research attained relevant results, comparable with the first places attained in the MOAT campaign (Gutiérrez et al., 2011c).
 The research paper "Approaching Textual Entailment with Sentiment Polarity" (Fernández et al., 2012b), discussed at the ICAI'12 conference (The 2012 International Conference on Artificial Intelligence).
This work takes ISR-WN as a knowledge base and uses the (Gutiérrez et al., 2011c) approach to determine Textual Entailment with Opinion Mining techniques.

Semantic Textual Similarity (STS) approaches: STS is related to the Textual Entailment 35 (TE) and Paraphrase 36 tasks. The main difference is that STS assumes bidirectional graded equivalence between the pair of textual snippets. In the case of TE, the equivalence is directional (e.g. a student is a person, but a person is not necessarily a student). In addition, STS differs from TE and Paraphrase in that, rather than being a binary yes/no decision, STS is a graded notion of similarity (e.g. a student is more similar to a person than a dog is to a person). This graded bidirectional notion is useful for NLP tasks such as Machine Translation (MT), Information Extraction (IE), Question Answering (QA) and Summarization. Several semantic tasks could be added as modules in the STS framework, such as WSD and Induction, Lexical Substitution, Semantic Role Labelling, Multiword Expression detection and handling, Anaphora and Co-reference resolution, Time and Date resolution and Named Entity recognition, among others. Below we list different participations in international competitions in which STS systems made use of ISR-WN as a semantic nucleus, allowing textual similarities to be judged from multidimensional perspectives, such as domain and affective points of view, among others. The following research works support STS tasks based on ISR-WN:
 Participation of the UMCC-DLSI research team in Semeval-2012 Task 6 "Semantic Textual Similarity" (Agirre et al., 2012). A description of this approach can be found in (Fernández et al., 2012a).
 Participation of the UMCC-DLSI research team in Semeval-2013 Task 5 37. "UMCC_DLSI-(EPS): Paraphrases Detection Based on Semantic Distance" (Dávila et al., 2013), discussed at the Second Joint Conference on Lexical and Computational Semantics (*SEM). It took ISR-WN as a knowledge base to determine the semantic similarity of different short texts.
 Participation of the UMCC-DLSI research team in Semeval-2012 Task 6 "Semantic Textual Similarity" 38. A description of this approach can be found in (Chávez et al., 2013).

33 http://gplsi.dlsi.ua.es/congresos/wassa2011/
34 http://research.nii.ac.jp/ntcir/ntcir-ws8/ws-en.html
35 http://aclweb.org/aclwiki/index.php?title=Recognizing_Textual_Entailment
36 The adaptation or alteration of a text or quotation to serve a different purpose from that of the original.
37 http://www.cs.york.ac.uk/semeval-2013/task5/
38 http://ixa2.si.ehu.es/sts/

Ongoing approach: In order to capture the semantics of a textual question, we intend to use the aforementioned approaches (WSD and STS), supported by ISR-WN, to interpret both questions written in natural language and the descriptions of ontology nodes (also considering the names of the adjacent nodes) in order to find further alignments with existing ontologies. Reusing the work below, which was the first approach to use ISR-WN in this way, we would be able to recover those ontology nodes whose textual descriptions are relevant to the textual question. The following research is part of this ongoing approach.
 The research paper "Semantic Information extraction method on ontologies" (Dávila et al., 2012), discussed at SEPLN'12 (XXVIII Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural), in which semantic ontology searches are applied on the basis of ISR-WN and natural language adaptation.
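Before moving on to the case study, the following toy sketch illustrates the graded (rather than binary) similarity notion used by the STS approaches above: a sentence pair is scored on a 0-5 scale instead of receiving a yes/no decision. It is only an illustrative stand-in based on plain word overlap; the cited systems (Fernández et al., 2012a; Dávila et al., 2013) combine many lexical-semantic features extracted from ISR-WN with a machine learning model.

```python
def graded_similarity(text_a: str, text_b: str) -> float:
    """Toy STS score in [0, 5]: 5 = semantically equivalent, 0 = unrelated.

    Uses Jaccard word overlap only; real STS systems add semantic features.
    """
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    return round(5 * jaccard, 2)

print(graded_similarity("a student is a person", "a student is a person"))  # 5.0
print(graded_similarity("a student is a person", "a dog chased the cat"))   # low score
```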
Next, we describe the annotation process through a case study taken from IMDb.

6. Case study
In order to enrich texts using ISR-WN, we have developed different approaches to automatically process textual information and obtain the semantic knowledge associated with each word or with the general context. As a result, we provide a semantic framework suitable for use as a support tool for content-based recommender systems, which is able to find content that is similar to content already evaluated by the users by considering semantic information. In this section we illustrate the annotation process with an example obtained from IMDb 39 (Internet Movie Database) and explain the different approaches used to conduct it.

Our experiments are carried out using textual information from IMDb. IMDb is the world's most popular and authoritative source for movie, TV and celebrity content. This website has more than 250 million unique monthly visitors and offers a searchable database of more than 185 million data items, including more than 3 million movies, TV and entertainment programs and more than 6 million cast and crew members. As mentioned above, IMDb provides a lot of information related to movies, TV shows or TV series. However, since we are interested in people's interests and opinions, we focus our attention on the reviews & ratings section. For each movie or TV series, this section contains a list of reviews from different users who give their opinions about general concepts, feelings, likes and dislikes, etc. Our purpose is to collect the textual information provided by the reviews of different users about TV series or movies in order to enrich it with semantic information and thus be able to rate or recommend similar content according to people's feelings, interests or likes. With this case study, we aim to demonstrate the practical use of our proposed resource.

39 http://www.imdb.com

7. Semantic enrichment approaches
In this section, we briefly describe how to enrich textual information by using ISR-WN. As mentioned in Section 3, we have integrated a set of different resources in order to obtain new relations and are therefore able to build new connections. With ISR-WN we can enrich texts in different dimensions using affective labels, domain labels, semantic classes, etc. To conduct our experiments and annotate texts we have developed a framework that uses different techniques in order to take into account each dimension of ISR-WN.

7.1 WSD
First of all, it is important to extract the correct senses of the words. To do this, our framework uses an unsupervised multilingual WSD approach. It works over a window of words and applies a modification of the Personalized PageRank (Ppr) algorithm (Agirre and Soroa, 2009), using the ISR-WN semantic network as its knowledge base. In more detail, it uses sense frequencies to rank the synsets of each lemma (in descending order) according to a calculated relevance factor (the Ppr+Freq algorithm; see (Gutiérrez et al., 2013) for more details). This approach was evaluated in Semeval-2013, obtaining the best results of that campaign.
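The sketch below illustrates the general idea behind this kind of Personalized PageRank disambiguation over a semantic graph: the candidate senses of the context words are injected as the personalization vector, the graph is ranked, and each target lemma keeps its best-ranked sense, biased by its sense frequency. The graph, sense identifiers and frequency values are invented for the example; the actual Ppr+Freq algorithm of (Gutiérrez et al., 2013) works over the full ISR-WN network, so this is only a minimal approximation of the idea.

```python
import networkx as nx

# Tiny stand-in for the ISR-WN semantic network (invented nodes and edges).
graph = nx.Graph()
graph.add_edges_from([
    ("mother.n.01", "parent.n.01"), ("parent.n.01", "person.n.01"),
    ("surgeon.n.01", "doctor.n.01"), ("doctor.n.01", "person.n.01"),
    ("surgeon.n.01", "Medicine"), ("doctor.n.01", "Medicine"),
    ("mother.n.05", "chemistry.n.01"),      # a competing, unrelated sense
])
candidate_senses = {"mother": ["mother.n.01", "mother.n.05"],
                    "surgeon": ["surgeon.n.01"]}
sense_frequency = {"mother.n.01": 0.9, "mother.n.05": 0.1, "surgeon.n.01": 1.0}

def ppr_freq_wsd(context_lemmas, graph, candidate_senses, sense_frequency, alpha=0.85):
    """Rank senses with Personalized PageRank seeded by all context senses,
    then combine the rank with the sense frequency and keep the best sense."""
    seeds = {s: 1.0 for lemma in context_lemmas for s in candidate_senses.get(lemma, [])}
    rank = nx.pagerank(graph, alpha=alpha, personalization=seeds)
    chosen = {}
    for lemma in context_lemmas:
        senses = candidate_senses.get(lemma, [])
        if senses:
            chosen[lemma] = max(
                senses, key=lambda s: rank.get(s, 0.0) * sense_frequency.get(s, 1e-6))
    return chosen

print(ppr_freq_wsd(["mother", "surgeon"], graph, candidate_senses, sense_frequency))
```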
An example of the results obtained after processing a text fragment, taken from IMDb's user reviews, is shown in the next table:

Original fragment (format: [word][(word position in the sentence)]):
Her mother(1) is(2) the famous(3) surgeon(4) and she is trying(5) to follow(6) in her mother's(7) footsteps(8)

WSD results (format: [position of the word in the sentence] [word sense key suggested as appropriate in this context] [gloss of the suggested word sense]):
(1) mother%1:18:00:: "a woman who has given birth to a child (also used as a term of address to your mother); "the mother of three children""
(2) be%2:42:03:: "have the quality of being; (copula, used with an adjective or a predicate noun); "John is rich"; "This is not a good answer""
(3) famous%5:00:00:known:00 "widely known and esteemed; "a famous actor"; "a celebrated musician"; "a famed scientist"; "an illustrious judge"; "a notable historian"; "a renowned painter""
(4) surgeon%1:18:00:: "a physician who specializes in surgery"
(5) try%2:41:00:: "make an effort or attempt; "He tried to shake off his fears"; "The infant had essayed a few wobbly steps"; "The police attempted to stop the thief"; "He sought to improve himself"; "She always seeks to do good in the world""
(6) follow%2:38:00:: "to travel behind, go after, come after; "The ducklings followed their mother around the pond"; "Please follow the guide through the museum""
(7) mother%1:18:00:: "a woman who has given birth to a child (also used as a term of address to your mother); "the mother of three children""
(8) footstep%1:23:00:: "the distance covered by a step; "he stepped off ten paces from the old tree and began to dig""
Table 9: Disambiguation example

As Table 9 shows, each word (nouns, verbs, adjectives and adverbs) is disambiguated. For each word, the annotation process obtains the lemma and word sense (e.g. mother%1:18:00::) and a gloss or definition with examples ("a woman who has given birth to a child (also used as a term of address to your mother); "the mother of three children""). With this information we are able to discriminate among a set of senses and understand the meaning of words when they appear in different contexts.

7.2 Concepts and Polarities
In order to obtain relevant concepts and extract sentiment information, we propose an unsupervised knowledge-based approach, described in the previous section, that uses the RST (Relevant Semantic Trees) technique (Gutiérrez et al., 2011c). The result is a set of relevant semantic trees associated with each sentence, which provides sentiment polarity values. An example of the relevant semantic tree technique based on the WordNet Domains resource is shown in Fig 9, where the number in parentheses next to each concept indicates its order of relevance, from 1 to 7, according to the following fragment.

Fig 9. Relevant Semantic Tree (RST) based on WordNet Domains (WND)

This tree represents the most relevant domains related to the following text fragment, extracted from a review of Grey's Anatomy from IMDb: "…The story revolves around her time as an intern and the people she meets and sort of is portrayed as "survival camp for medical students." The minute she arrives at work, she meets Christina Yang (Sandra Oh - flawless in her bitchy supporting role), George O'Malley (T.R.
Knight - one word: breakthrough performer), Izzie Stevens (Katherine Heighl - very, very believable as a model who is more like the girl-next-door), and Alex Karev (Justin Chambers - plays sort of a not-so-likable person). Most of all, there is Dr. Derek Shepherd (Patrick Dempsey - very attractive), the man that Meredith had a one-night stand with - he just happens to be her boss. This is a show that wants to be liked….". As we can see in Fig 9, there are seven domains that are closely related to the meaning of the context, starting from Person to Telecommunications. Moreover, WND has a hierarchy in which all the domain labels (172 labels) are connected, and Fig 9 also shows how the relevant domains are arranged in that hierarchy. This information is useful for classifying texts into different categories and for obtaining related information through a previous categorization.

7.3 Semantic Textual Similarity
The last step to enrich textual information is to extract semantic textual similarities. Our approach is a Machine Learning System (MLS) that uses several algorithms to extract all the features: similarity measures, lexical-semantic alignment and semantic alignment (Fernández et al., 2012a). In order to extract the semantic features, it uses the multidimensional resource ISR-WN. Our STS approach thus provides a value scale with which to decide whether a pair of contexts can be considered semantically similar or not. The scale has 5 values that go from 1 to 5, where 1 indicates that there is no semantic relation between two contexts and 5 indicates that a pair of contexts is semantically equivalent. See the evaluation described in the next section.

8. Experiment setup and results
Since we need real texts to evaluate our framework, our experiments have been conducted using texts from IMDb user reviews. These texts contain information related to user feelings, expectations, complaints, acceptance, etc. Moreover, each review has an associated overall rating. In order to achieve comprehensive results we have extracted information related to medical TV series: House M.D., Grey's Anatomy, etc. From our data set, we have randomly selected 3 reviews to show how texts are annotated. Table 10 shows a brief part of the original reviews that have been annotated.

| TV series/Movies | Original Texts |
| Grey's Anatomy | If you think ABC can't get any better - you're wrong. With the great success over smash-hits, "Desperate Housewives" and "Lost" they also picked up a few shows over mid-season hoping for more success. They got "Jake in Progress," "Eyes" and "Grey's Anatomy" - but "Grey's Anatomy" is definitely considered the best out of those three. Grey's Anatomy stars Ellen Pompeo (who has starred in a few movies but was never really noticeable) as the narrator - Meredith Grey. Her mother is the famous surgeon and she is trying to follow in her mother's footsteps.… |
| House M.D. | Let me put it simply. I am a physician, and as an inviolable rule, I HATE medical shows. Granted, TV series tend to be one dimensional, due to inherent difficulties in the genre, but "doctor shows" are something I avoid like the proverbial plague. And then one evening I caught "House, MD" and was completely drawn into the show. In House I find the anti-hero that I've been waiting for in a medical show.
The guy who knows everything, but is wrong often enough to keep us all guessing. I enjoy the contrast of House and his cadre of young fresh faced colleagues… |
| Tomorrowland | When the director of "The Incredibles" signed for this film, I was looking forward to the same amount of humour and exhilaration present in that animated masterpiece, something similar to what the little boy expresses when he realizes he can run on water. Nothing remotely close occurs in "Tomorrowland", a film that suffers from having too big a budget and hardly any original or exciting thoughts. It is also hindered by the fact that almost all of the actors appear clueless and not quite matching their characters. There's something about George Clooney… |
Table 10: Original reviews obtained from IMDb

IMDb provides the overall rating for each TV show and movie in its database. For Grey's Anatomy the overall rating is 7.7/10 from 140,600 users, for House M.D. it is 8.9/10 from 261,929 users and for Tomorrowland 6.7/10 from 39,939 users. After processing each text we obtain a set of different features that we use to enrich the original information. Table 11 shows the results of the annotation with affective labels and domain labels, including ratings of positiveness (P), negativeness (N) and objectiveness (O), respectively:

| Title | Affects | P | N | O | Domains | P | N | O |
| Grey's Anatomy | love, annoyance, liking, anger, anxiety | 0.82 | 0.18 | 0.64 | Person, Theatre, Cinema, Art, Heraldry, Medicine, Telecommunication | 0.72 | 0.28 | 0.93 |
| House M.D. | sensation, love, wonder | 0.73 | 0.27 | 0.68 | Psychology, Telecommunication, Humanities, Literature, Theatre, Medicine, Person | 0.72 | 0.28 | 0.94 |
| Tomorrowland | closeness, belonging, joy, behaviour, exhilaration, sensation | 0.49 | 0.51 | 0.67 | Racing, Telecommunication, Sociology, Theatre, Cinema, Sport, Radio+Tv | 0.50 | 0.50 | 0.96 |
Table 11: Results of the annotation of three IMDb reviews

Once the annotation process is complete, we are able to infer whether a review is positive or negative and also which are the general domains of each context. This information is useful if we want to recommend similar shows to other people, because we can analyse the general content and extract common domains. Moreover, we are able to rate each show according to people's reviews by gathering whether they are positive, negative or objective. Fig 10 shows the annotation of sentiment positiveness of each IMDb review in comparison with the real IMDb rating:

Fig 10. Annotation of sentiment positiveness of three IMDb reviews
(The figure plots, for each of the three reviews, the positiveness obtained with each dimension (Affect, Domain, SUMO, Semantic Class, Synset and their mean) against the real IMDb ratings of 0.77, 0.89 and 0.67, respectively.)

As we can observe, our predictions for Grey's Anatomy and Tomorrowland are very similar to the real IMDb rating. In these cases the textual enrichment process obtains very good results, and the semantic class and SUMO dimensions are close to the real ranking. In the case of House M.D. there are two points of difference between our prediction and the real rating. This is because the selected review mixes negative and positive reactions to the show. For example, it begins by saying "I HATE medical shows" but then says "I enjoy the contrast of House and his cadre of young fresh faced colleagues, complete with starched white lab coats, who struggle as much with their professionally imposed constraints, and sense of decorum, as they do with his personality". In this case it is therefore important to remark that, despite the negative aspects, our framework is able to establish that there are positive aspects that must be taken into account. Another dimension with which to enrich texts is provided by the STS (Semantic Textual Similarity) approach.
In STS, three types of features are used: syntactic (similar words, syntactic distances, etc.), sentiment and semantic. As a result, we are able to identify similar texts by measuring them on a scale from 1 to 5 (where 1 indicates that there is no semantic relation between a pair of contexts and 5 indicates that both are semantically equivalent). Table 12 shows the results obtained after applying STS. The results indicate that there is a slight relation between Grey's Anatomy and House M.D. (1.53) and no relation between Grey's Anatomy and Tomorrowland (0).

| Semantic Textual Similarity | Grey's Anatomy | House M.D. | Tomorrowland |
| Grey's Anatomy | - | 1.53 | 0.00 |
| Machine Learning technique | Support Vector Machine 40 | | |
| Approach | STS approach described in Section 7 | | |
Table 12: STS results to measure context similarities

40 http://www.support-vector-machines.org/

In order to demonstrate how our framework would help to predict the rating with a set of multiple user reviews, we present a more exhaustive evaluation over the first 10 IMDb reviews of each of the TV shows mentioned. Table 13 lists the title of each review considered in these evaluations and its rating, which was manually set by the user.

| Title | No. | Review title | IMDb Rating (0-1) |
| Grey's Anatomy | 1 | So far, so good | 1 |
| | 2 | Addictive show | 0.8 |
| | 3 | Excellent Series | 1 |
| | 4 | Waste of Time | 0.4 |
| | 5 | Boring! | 0.1 |
| | 6 | This show is absolutely wonderful!!! | 1 |
| | 7 | Wonderful! | 1 |
| | 8 | Can someone explain me WHY this is a Top 10 Show? | 0.4 |
| | 9 | Started well, now its just more of the same... | 0.6 |
| | 10 | I'm Done--and SHAME on me for taking so long | 0.1 |
| House | 11 | Abrasive medical doctor saves lives - no matter what the cost. | 1 |
| | 12 | Extremely formulaic fantasy medical detective show | 0.1 |
| | 13 | Ingenious and Compelling!!! | 1 |
| | 14 | The Greatest Medical Show Ever | 1 |
| | 15 | Best Show Ever! | 1 |
| | 16 | Watch it for the characters, not the medicine! | 0.7 |
| | 17 | One of the best things on the box | 0.9 |
| | 18 | Very Good and Getting Better | 0.7 |
| | 19 | Everybody lies... | 1 |
| | 20 | Pllllllease watch a full episode. | 1 |
| Tomorrowland | 21 | Do not listen to the critics on this one! A vastly under-appreciated tale of promise and hope. | 1 |
| | 22 | A fantastic sci-fi experience | 1 |
| | 23 | A spectacular Disney sci-fi thrill ride | 0.9 |
| | 24 | Misunderstood | 0.9 |
| | 25 | Why do people hate this movie? | 1 |
| | 26 | Maybe a bit childish but enjoyable sci-fi tale none the less. | 0.8 |
| | 27 | Upbeat positive story for the whole family. | 0.8 |
| | 28 | A great movie! | 0.9 |
| | 29 | Fun kid safe sci fi movie that looks great | 0.6 |
| | 30 | Offensively shallow | 0.1 |
Table 13. List of reviews used for automatically rating these three shows

Once the 30 reviews had been processed, Positive, Negative and Objective scores were obtained by the tree-based OM approach (Gutiérrez et al., 2011c), using as configuration: WNA (Affect), WND (Domain), SUMO, SC (Semantic Class) and WN (Synsets). Moreover, an additional experiment was conducted to take into account the Harmonic Mean 41 of all the positive outputs per review. As Table 14 shows, different types of evaluation were performed. One evaluation measure is the Pearson correlation, which compares the real rating with the positiveness automatically obtained by our framework. Another one is the combination of all the outputs with the Harmonic Mean, with which we reach a correlation of 69%.
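As an illustration of how these two evaluation figures are computed, the snippet below combines the per-resource positiveness scores of a review with the harmonic mean and then correlates the combined scores of several reviews with the user ratings (Pearson). The three example reviews and their scores are taken from Table 14 (reviews 1, 4 and 5 of Grey's Anatomy); the helper functions are just the standard formulas, not part of the framework itself, and the tiny three-review correlation shown here is not the 69% figure reported over all 30 reviews.

```python
from statistics import harmonic_mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Positiveness per resource (Affect, Domain, SUMO, Semantic Class, Synset)
# for reviews 1, 4 and 5 of Grey's Anatomy, as listed in Table 14.
per_resource_P = {1: [0.81, 0.78, 0.59, 0.69, 0.58],
                  4: [0.55, 0.52, 0.44, 0.55, 0.56],
                  5: [0.25, 0.64, 0.52, 0.58, 0.66]}
user_rating = {1: 1.0, 4: 0.4, 5: 0.1}

combined = {k: harmonic_mean(v) for k, v in per_resource_P.items()}
print({k: round(v, 2) for k, v in combined.items()})   # matches the Harmonic Mean P column
print(round(pearson(list(combined.values()), list(user_rating.values())), 2))
```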
With regard to each individual resource, we can see that WNA (Affect) obtains the best correlation results. This means that this resource provides better knowledge for dealing with OM challenges, because WNA conceptualizes and links only affective words. On the other hand, Table 14 also shows the mean difference between the real rating and the positiveness obtained by using each resource; WNA reaches the smallest mean difference, with a margin of 0.22 points. It is very difficult to compare this kind of review with OM system outputs because users often set dramatic scores (e.g. 1 or 2) even though their reviews are not as harsh as their scores; for example, consider IMDb review number 12 of Table 13, titled "Extremely formulaic fantasy medical detective show", with a score of 0.1. Note the cases in Table 14 in which the user ratings are very low. In those cases our OM functionality is able to identify that the score should be lower than the others, but it is not able to establish by how much. Thus, as future work we propose to carry out new evaluations in which the OM outputs are introduced into a machine learning system. In this way the framework could learn from the different patterns provided by these outputs and suggest more balanced results.

41 The harmonic mean is one of several kinds of average, and in particular one of the Pythagorean means. Typically, it is appropriate for situations in which the average of rates is desired.

| Title | No. | Rating (0-1) | Affect P/N/O | Domain P/N/O | SUMO P/N/O | Semantic Class P/N/O | Synset P/N/O | Harmonic Mean P/N/O |
| Grey's Anatomy | 1 | 1 | 0.81/0.19/0.66 | 0.78/0.22/0.96 | 0.59/0.41/0.88 | 0.69/0.31/0.92 | 0.58/0.42/0.72 | 0.68/0.28/0.81 |
| | 2 | 0.8 | 0.92/0.08/0.68 | 0.84/0.16/0.92 | 0.81/0.19/0.83 | 0.71/0.29/0.94 | 0.6/0.4/0.68 | 0.76/0.17/0.79 |
| | 3 | 1 | 0.9/0.1/0.66 | 0.88/0.12/0.95 | 0.65/0.35/0.91 | 0.62/0.38/0.93 | 0.75/0.25/0.8 | 0.74/0.18/0.84 |
| | 4 | 0.4 | 0.55/0.45/0.76 | 0.52/0.48/0.93 | 0.44/0.56/0.89 | 0.55/0.45/0.93 | 0.56/0.44/0.7 | 0.52/0.47/0.83 |
| | 5 | 0.1 | 0.25/0.75/0.76 | 0.64/0.36/0.96 | 0.52/0.48/0.92 | 0.58/0.42/0.94 | 0.66/0.34/0.7 | 0.47/0.43/0.84 |
| | 6 | 1 | 0.91/0.09/0.62 | 0.83/0.17/0.94 | 0.77/0.23/0.87 | 0.74/0.26/0.93 | 0.64/0.36/0.71 | 0.77/0.18/0.79 |
| | 7 | 1 | 0.93/0.07/0.62 | 0.56/0.44/0.93 | 0.68/0.32/0.89 | 0.63/0.38/0.93 | 0.66/0.34/0.73 | 0.67/0.2/0.8 |
| | 8 | 0.4 | 0.62/0.38/0.65 | 0.41/0.59/0.94 | 0.44/0.56/0.88 | 0.69/0.31/0.94 | 0.67/0.33/0.77 | 0.54/0.4/0.82 |
| | 9 | 0.6 | 0.79/0.21/0.66 | 0.59/0.41/0.89 | 0.55/0.45/0.87 | 0.61/0.39/0.92 | 0.62/0.38/0.73 | 0.62/0.34/0.8 |
| | 10 | 0.1 | 0.54/0.46/0.58 | 0.58/0.42/0.95 | 0.57/0.43/0.89 | 0.62/0.38/0.92 | 0.6/0.4/0.74 | 0.58/0.42/0.79 |
| House | 11 | 1 | 0.92/0.08/0.65 | 0.72/0.28/0.94 | 0.64/0.36/0.9 | 0.61/0.39/0.92 | 0.66/0.34/0.73 | 0.69/0.21/0.81 |
| | 12 | 0.1 | 0.58/0.42/0.64 | 0.49/0.51/0.92 | 0.59/0.41/0.83 | 0.53/0.47/0.92 | 0.67/0.33/0.76 | 0.57/0.42/0.8 |
| | 13 | 1 | 0.5/0.5/0.68 | 0.52/0.48/0.97 | 0.6/0.4/0.93 | 0.58/0.42/0.93 | 0.73/0.27/0.77 | 0.58/0.39/0.84 |
| | 14 | 1 | 0.72/0.28/0.73 | 0.73/0.27/0.96 | 0.6/0.4/0.89 | 0.65/0.35/0.93 | 0.65/0.35/0.76 | 0.67/0.32/0.84 |
| | 15 | 1 | 0.76/0.24/0.55 | 0.7/0.3/0.94 | 0.56/0.44/0.91 | 0.61/0.39/0.94 | 0.66/0.34/0.73 | 0.65/0.33/0.78 |
| | 16 | 0.7 | 0.81/0.19/0.72 | 0.76/0.24/0.95 | 0.6/0.4/0.91 | 0.65/0.35/0.93 | 0.58/0.42/0.69 | 0.67/0.29/0.83 |
| | 17 | 0.9 | 0.69/0.31/0.68 | 0.64/0.36/0.95 | 0.63/0.37/0.89 | 0.63/0.37/0.93 | 0.67/0.33/0.74 | 0.65/0.35/0.82 |
| | 18 | 0.7 | 0.61/0.39/0.68 | 0.5/0.5/0.95 | 0.46/0.54/0.91 | 0.59/0.41/0.92 | 0.42/0.58/0.78 | 0.5/0.47/0.83 |
| | 19 | 1 | 0.84/0.16/0.7 | 0.79/0.21/0.95 | 0.65/0.35/0.92 | 0.67/0.33/0.94 | 0.69/0.31/0.79 | 0.72/0.25/0.85 |
| | 20 | 1 | 0.78/0.22/0.6 | 0.8/0.2/0.94 | 0.69/0.31/0.89 | 0.68/0.32/0.93 | 0.62/0.38/0.76 | 0.71/0.27/0.8 |
| Tomorrowland | 21 | 1 | 0.74/0.26/0.6 | 0.83/0.17/0.95 | 0.77/0.23/0.88 | 0.72/0.28/0.93 | 0.65/0.35/0.75 | 0.74/0.25/0.8 |
| | 22 | 1 | 0.8/0.2/0.65 | 0.64/0.36/0.95 | 0.68/0.32/0.88 | 0.66/0.34/0.93 | 0.61/0.39/0.71 | 0.67/0.31/0.81 |
| | 23 | 0.9 | 0.58/0.42/0.65 | 0.56/0.44/0.93 | 0.51/0.49/0.87 | 0.57/0.43/0.92 | 0.65/0.35/0.71 | 0.57/0.42/0.8 |
| | 24 | 0.9 | 0.59/0.41/0.62 | 0.51/0.49/0.51 | 0.58/0.42/0.89 | 0.59/0.41/0.93 | 0.65/0.35/0.76 | 0.58/0.41/0.7 |
| | 25 | 1 | 0.69/0.31/0.59 | 0.63/0.37/0.94 | 0.57/0.43/0.91 | 0.59/0.41/0.93 | 0.6/0.4/0.75 | 0.61/0.38/0.8 |
| | 26 | 0.8 | 0.55/0.45/0.69 | 0.46/0.54/0.95 | 0.49/0.51/0.89 | 0.57/0.43/0.92 | 0.64/0.36/0.73 | 0.53/0.45/0.82 |
| | 27 | 0.8 | 0.76/0.24/0.6 | 0.7/0.3/0.92 | 0.62/0.38/0.87 | 0.7/0.3/0.92 | 0.62/0.38/0.77 | 0.68/0.31/0.8 |
| | 28 | 0.9 | 0.68/0.32/0.59 | 0.64/0.36/0.93 | 0.65/0.35/0.85 | 0.63/0.37/0.91 | 0.62/0.38/0.69 | 0.64/0.35/0.77 |
| | 29 | 0.6 | 0.76/0.24/0.69 | 0.73/0.27/0.93 | 0.66/0.34/0.85 | 0.49/0.51/0.93 | 0.56/0.44/0.71 | 0.62/0.33/0.81 |
| | 30 | 0.1 | 0.57/0.43/0.74 | 0.35/0.65/0.93 | 0.33/0.67/0.89 | 0.42/0.58/0.92 | 0.7/0.3/0.76 | 0.44/0.48/0.84 |
| Pearson correlation | | | 0.62 | 0.54 | 0.56 | 0.52 | 0.09 | 0.69 |
| Difference (Rating-P) Mean | | | 0.22 | 0.25 | 0.28 | 0.29 | 0.31 | 0.26 |
Table 14. Opinion Mining evaluation over 30 reviews

Another evaluation was carried out in order to compare the rating obtained by each show (considering 10 reviews per title) with the rating suggested by our framework. Fig 11 shows that our suggestions were very consistent with regard to Grey's Anatomy. There are some differences in the results for Tomorrowland and House M.D., but they are not so distant. This is because our results provide positive sentiment polarities and not, specifically, user ratings. Even though there are some differences, we can still obtain accurate results according to the WNA (Affect) or Domain dimensions.

Fig 11. Review rating (rating mean) and OM rating (positiveness mean) comparison
| Means plotted in Fig 11 | Grey's Anatomy | House | Tomorrowland |
| IMDB Rating (Mean) | 0.640 | 0.840 | 0.800 |
| Affect Positiveness (Mean) | 0.722 | 0.721 | 0.672 |
| Domain Positiveness (Mean) | 0.664 | 0.663 | 0.605 |
| SUMO Positiveness (Mean) | 0.602 | 0.601 | 0.585 |
| Semantic Class Positiveness (Mean) | 0.643 | 0.619 | 0.593 |
| Synset Positiveness (Mean) | 0.634 | 0.636 | 0.632 |
| Mean Positiveness (Mean) | 0.635 | 0.640 | 0.609 |

We want to remark that our framework is able to provide the OM outputs as inputs to a recommender system in different ways. One way is to provide the individual analytics for each semantic resource (in this case SUMO, WND, WNA, SC and the WN taxonomy) separately. Another way is to consider all the resources together in order to get a single sentiment analysis result. Finally, we can provide all these outputs together with a mean calculation.

9. Discussion
According to the experiments conducted and the results obtained, we can conclude that our resource ISR-WN and the different approaches that take advantage of it are suitable for acquiring enough knowledge to make recommendations for TV shows or films. Moreover, we are also able to classify texts with different labels for the purpose of ranking a set of TV shows according to their similarities. Furthermore, since user reviews have an associated set of feelings that indicate likes or dislikes, we are also able to detect emotions and classify texts into three different categories: positive, negative or objective. In some cases there are contexts that mix a set of emotions and can confuse the framework.
This is the case of the review for "House M.D.". However, we can establish the appropriate feelings by taking into account all the context information. We would also like to mention a similar work, (Briguez et al., 2014), which presents a novel framework for the specific domain of movie recommendation. That work proposes a complete set of postulates accounting for both quantitative and qualitative aspects of the movie domain, implemented by means of DeLP (Defeasible Logic Programming). With regard to our work, there are some differences. First of all, we want to remark that we present a semantic framework as a tool to help recommender systems; in our proposal, the information provided to recommend similar movies is established by measuring different dimensions (i.e. feelings, sentiment polarities, word senses, textual similarities), whereas in (Briguez et al., 2014) the system uses specific rules to determine which characteristics have to be shared. Although the process of building specific rules helps to obtain better user recommendations, it is designed for one specific domain; applying that system to another domain would require creating additional rules in order to obtain accurate results. On the other hand, our approach can be easily adapted to other domains or languages (since WN is a nucleus for WNs in other languages 42) with minimal changes. Another difference is that in (Briguez et al., 2014) the system combines quantitative and qualitative aspects, whereas we have only focused on quantitative aspects. The results obtained after mixing qualitative and quantitative aspects demonstrate that the incorporation of qualitative aspects introduces significant improvements; in fact, it would be a reasonable feature to take into consideration in future work. With regard to the evaluation results, we cannot provide an in-depth comparison because our datasets are different; moreover, as far as we know, there is no example in the literature that uses exactly the same dataset employed in this work.

Even though our experiments have been carried out with user reviews of TV shows, we can deal with other domains. Moreover, we can deal with different languages by using a specific version of WN for each language. This has been demonstrated by our participation in different competitions in which our framework was evaluated over different languages. One interesting usage of our proposal is shown in Fig 12: through the semantic labels and features provided by our framework, a user could navigate among different TV shows or movies that have semantic similarities and thus automatically discover which movies or shows are the most appropriate for him or her.

42 http://globalwordnet.org/wordnets-in-the-world/

Fig 12. Conceptual similarities among different IMDB reviews

10. Conclusions and future works
In this work our goal has been to enrich texts by measuring similarities among different comments/reviews, grading their positiveness or negativeness by using textual information, and using this information to create a recommendation rating according to people's interests.
To do this, we have developed a framework that allows the integration of several semantic resources in order to provide recommender systems with enough knowledge to extract similarities among texts, measure sentiment polarities and obtain a semantic analysis with which to understand the meanings of contexts. In order to build the knowledge database of the framework we selected a set of resources by conducting an in-depth analysis of the different semantic resources used in NLP. As discussed in previous sections, several authors have created integrated resources, but most of them are based on linguistic features rather than on semantic or conceptual features. In this research work we have therefore considered previous studies in order to develop a multidimensional knowledge network that integrates different semantic resources. An integrated resource (ISR-WN) has thus been proposed. This resource integrates semantic resources (WN, WND, WNA, SUMO, SC and new semantic relations provided by XWN, together with a sentiment analysis resource (SWN)), using WN versions 1.6 and 2.0. The knowledge database obtained in the integration process was evaluated in order to detect any problems in the alignment caused by using different intermediate versions of WN. Regardless of the WN nucleus used, WND and SUMO reached an alignment of 100%, while SC reached 100% and 99.76% for WN1.6 and WN2.0, respectively.

In addition, we have reviewed a set of research works that involve interesting knowledge bases used to deal with NLP tasks. One of the most interesting resources is SWN; combined within ISR-WN, it supplies the positivity of a sense, domain, category, emotion or semantic class. It is important to stress that this resource has been used in several research works in order to take advantage of its semantic multidimensionality. Based on the semantic multidimensionality that ISR-WN provides, we are able to extract new information to classify texts, obtain opinions or recommend similar texts in open domains. Basically, we have applied four different tasks (WSD, semantic similarity, domain classification and sentiment analysis) to add new information from different points of view: extracting similarities among texts, measuring sentiment polarities and obtaining a semantic analysis for understanding the meanings of contexts. Moreover, we have illustrated a case study with information related to movie and TV series reviews to demonstrate that our framework works properly.

As future work we plan to enrich the ISR-WN resource with collocation sense relations 43. This type of information provides better results in tasks such as WSD (Gutiérrez, 2012). In (Gutiérrez et al., 2011c) ISR-WN was used to combine conceptualizations with polarities; we propose adding resources such as Micro-WNOp 44 (a corpus labelled with sentiments covering around 1,105 WN synsets). Moreover, we are working to align ISR-WN with WNs in other languages in order to create a multilingual resource.

43 Synset pairs that commonly appear together in a corpus.
44 http://www.unipv.it/micrownop

In order to add more knowledge to ISR-WN, we propose the integration of different ontologies by using the RDF model to align WN synsets with ontological concepts. This kind of information will help us to apply different techniques (i.e. Domain Classification) to specific domains such as medicine, technology, tourism, pharmaceutical, etc.
Moreover, we want to integrate BabelNet (Navigli and Ponzetto, 2010) (a resource that connects the multilingual web encyclopaedia Wikipedia with WN) with ISR-WN. BabelNet was considered as a sense inventory in Semeval-2013 45 Task 12: Multilingual Word Sense Disambiguation. Finally, we plan to use the ISR-WN functionalities in order to help summarization systems to customize text summaries using different domains, categories, emotions, sentiment polarities, etc.

45 http://www.cs.york.ac.uk/semeval-2013/

Acknowledgments
This research work has been partially funded by the University of Alicante, Generalitat Valenciana, the Spanish Government and the European Commission through the projects TIN2015-65136-C2-2-R, TIN2015-65100-R, SAM (FP7-611312) and PROMETEOII/2014/001.

References
Agirre, E., Cer, D., Diab, M. & Gonzalez-Agirre, A. (2012) SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics -- Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012). Montreal, Canada, Association for Computational Linguistics.
Agirre, E., Lacalle, O. L. D., Fellbaum, C., Hsieh, S.-K., Tesconi, M., Monachini, M., Vossen, P. & Segers, R. (2010) SemEval-2010 task 17: All-words word sense disambiguation on a specific domain. Proceedings of the 5th International Workshop on Semantic Evaluation. Los Angeles, California, Association for Computational Linguistics.
Agirre, E. & Soroa, A. (2009) Personalizing PageRank for Word Sense Disambiguation. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2009). Athens, Greece.
Atserias, J., Villarejo, L., Rigau, G., Agirre, E., Carroll, J., Magnini, B. & Vossen, P. (2004) The MEANING Multilingual Central Repository. Proceedings of the Second International Global WordNet Conference (GWC'04). Brno, Czech Republic.
Baccianella, S., Esuli, A. & Sebastiani, F. (2010) SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. 7th Language Resources and Evaluation Conference (LREC 2010). Valletta, Malta.
Bentivogli, L., Forner, P., Magnini, B. & Pianta, E. (2004) Revising the WORDNET DOMAINS Hierarchy: semantics, coverage and balancing. Proceedings of the COLING 2004 Workshop on "Multilingual Linguistic Resources". Geneva, Switzerland.
Briguez, C. E., Budan, M. C. D., Deagustini, C. A. D., Maguitman, A. G., Capobianco, M. & Simari, G. R. (2014) Argument-based mixed recommenders and their application to movie suggestion. Expert Systems with Applications, 41, 6467-6482.
Brill, E. (1995) Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. MIT Press.
Cotton, S., Edmonds, P., Kilgarriff, A. & Palmer, M. (2001) The English all-words task. SENSEVAL-2: Second International Workshop on Evaluating Word Sense Disambiguation Systems. Toulouse, France, Association for Computational Linguistics.
Chávez, A., Dávila, H., Gutiérrez, Y., Collazo, A., Abreu, J. I., Fernández Orquín, A., Montoyo, A. & Muñoz, R. (2013) UMCC_DLSI: Textual Similarity based on Lexical-Semantic features. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity. Atlanta, Georgia, USA, Association for Computational Linguistics.
Dávila, H., Fernández, A., Gutiérrez, Y., Muñoz, R., Montoyo, A. & Vázquez, S. (2012) Semantic Information Extraction method on ontologies. SEPLN 2012: XXVIII Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Castellón, Spain.
Dávila, H., Fernández Orquín, A., Chávez, A., Gutiérrez, Y., Collazo, A., Abreu, J. I., Montoyo, A. & Muñoz, R. (2013) UMCC_DLSI-(EPS): Paraphrases Detection Based on Semantic Distance. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Atlanta, Georgia, USA, Association for Computational Linguistics.
Dorr, B. J. & Castellón, M. A. M. A. I. (1997) Spanish EuroWordNet and LCS-Based Interlingual MT. AMTA/SIG-IL First Workshop on Interlinguas. San Diego, CA.
Esuli, A. & Sebastiani, F. (2006) SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. Fifth International Conference on Language Resources and Evaluation (LREC 2006). Genoa, Italy.
Fellbaum, C. (1998) WordNet. An Electronic Lexical Database, University of Cambridge.
Fernández, A., Gutiérrez, Y., Dávila, H., Chávez, A., González, A., Estrada, R., Castañeda, Y., Vázquez, S., Montoyo, A. & Muñoz, R. (2012a) UMCC_DLSI: Multidimensional Lexical-Semantic Textual Similarity. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics -- Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012). Montreal, Canada, Association for Computational Linguistics.
Fernández, A., Gutiérrez, Y., Muñoz, R. & Montoyo, A. (2012b) Approaching Textual Entailment with Sentiment Polarity. ICAI'12 - The 2012 International Conference on Artificial Intelligence. Las Vegas, Nevada, USA.
Forner, P. (2005) WordNet Domains 2.0. ITC-irst, Povo-Trento, Italy.
Gediminas, A. & Alexander, T. (2005) Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Educational Activities Department.
Genesereth, M. R. & Fikes, R. E. (1992) Knowledge Interchange Format, Version 3.0 Reference Manual. Stanford, Computer Science Department.
Gliozzo, A., Strapparava, C. & Dagan, I. (2004) Unsupervised and Supervised Exploitation of Semantic Domains in Lexical Disambiguation. Computer Speech and Language.
Gutiérrez, Y. (2012) Análisis Semántico Multidimensional aplicado a la Desambiguación del Lenguaje Natural. Departamento de Lenguajes y Sistemas Informáticos. Alicante, Universidad de Alicante.
Gutiérrez, Y., Castañeda, Y., González, A., Estrada, R., Piug, D. D., Abreu, J. I., Pérez, R., Fernández Orquín, A., Montoyo, A., Muñoz, R. & Camara, F. (2013) UMCC_DLSI: Reinforcing a Ranking Algorithm with Sense Frequencies and Multidimensional Semantic Resources to solve Multilingual Word Sense Disambiguation. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Atlanta, Georgia, USA, Association for Computational Linguistics.
Gutiérrez, Y., Fernández, A., Montoyo, A. & Vázquez, S. (2010a) Integration of semantic resources based on WordNet. XXVI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN 2010). Universidad Politécnica de Valencia, Valencia.
Gutiérrez, Y., Fernández, A., Montoyo, A. & Vázquez, S. (2010b) UMCC-DLSI: Integrative resource for disambiguation task. Proceedings of the 5th International Workshop on Semantic Evaluation. Uppsala, Sweden, Association for Computational Linguistics.
Gutiérrez, Y., Fernández, A., Montoyo, A. & Vázquez, S. (2011a) Enriching the Integration of Semantic Resources based on WordNet. Procesamiento del Lenguaje Natural, 47, 249-257.
Gutiérrez, Y., Vázquez, S. & Montoyo, A. (2011b) Improving WSD using ISR-WN with Relevant Semantic Trees and SemCor Senses Frequency. Proceedings of the International Conference Recent Advances in Natural Language Processing 2011. Hissar, Bulgaria, RANLP 2011 Organising Committee.
Gutiérrez, Y., Vázquez, S. & Montoyo, A. (2011c) Sentiment Classification Using Semantic Features Extracted from WordNet-based Resources. Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011). Portland, Oregon, Association for Computational Linguistics.
Gutiérrez, Y., Vázquez, S. & Montoyo, A. (2011d) Word Sense Disambiguation: A Graph-Based Approach Using N-Cliques Partitioning Technique. IN Muñoz, R., Montoyo, A. & Métais, E. (Eds.) Natural Language Processing and Information Systems. Springer Berlin / Heidelberg.
Izquierdo, R. (2010) Una Aproximación a la Desambiguación del Sentido de las Palabras Basada en Clases Semánticas y Aprendizaje Automático. Departamento de Lenguajes y Sistemas Informáticos. Alicante, Universidad de Alicante.
Izquierdo, R., Suárez, A. & Rigau, G. (2007) A Proposal of Automatic Selection of Coarse-grained Semantic Classes for WSD. Procesamiento del Lenguaje Natural, 39, 189-196.
Izquierdo, R., Suárez, A. & Rigau, G. (2010) GPLSI-IXA: Using Semantic Classes to Acquire Monosemous Training Examples from Domain Texts. Proceedings of the 5th International Workshop on Semantic Evaluation. Uppsala, Sweden, Association for Computational Linguistics.
Leacock, C. & Chodorow, M. (1998) Using Corpus Statistics and WordNet Relations for Sense Identification. Computational Linguistics.
Luisa Bentivogli, P. F., Bernardo Magnini, Emanuele Pianta (2005) Revising the WORDNET DOMAINS Hierarchy: semantics, coverage and balancing. ITC-irst – Istituto per la Ricerca Scientifica e Tecnologica, Via Sommarive 18, Povo – Trento, Italy, 38050.
Magnini, B. & Cavaglia, G. (2000) Integrating Subject Field Codes into WordNet. Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-2000).
Magnini, B., Strapparava, C., Pezzulo, G. & Gliozzo, A. (July 2002) The Role of Domain Information in Word Sense Disambiguation. Trento, Cambridge University Press.
Magnini, B., Strapparava, C., Pezzulo, G. & Gliozzo, A. (2002) Comparing Ontology-Based and Corpus-Based Domain Annotations in WordNet. Proceedings of the First International WordNet Conference. Mysore, India.
Marco De, G., Pasquale, L., Giovanni, S. & Pierpaolo, B. (2008) Integrating tags in a semantic content-based recommender. Proceedings of the 2008 ACM Conference on Recommender Systems. Lausanne, Switzerland, ACM.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D. & Miller, K. (1990) Five papers on WordNet. Princeton University, Cognitive Science Laboratory.
Navigli, R. (2009) Word sense disambiguation: A survey. ACM Comput. Surv., 41, 10:1--10:69.
Navigli, R. & Ponzetto, S. P. (2010) BabelNet: Building a Very Large Multilingual Semantic Network.
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden, Association for Computational Linguistics. Niles, I. (2001) Mapping WordNet to the SUMO Ontology. Teknowledge Corporation. Niles, I. & Pease, A. (2001) Origins of the IEEE Standard Upper Ontology. Working Notes of the IJCAI-2001 Workshop on the IEEE Standard Upper Ontology. Seattle, Washington, USA. Niles, I. & Pease, A. (2003) Linking Lexicons and Ontologies: Mapping WordNet to the Suggested Upper Merged Ontology. Pease, A. (2007) Standard Upper Ontology Knowledge Interchange Format. Perner, P., Candillier, L., Meyer, F. & Boulle, M. (2007) Comparing State-of-the-Art Collaborative Filtering Systems. Machine Learning and Data Mining in Pattern Recognition. Springer Berlin Heidelberg. Peter, D. T. (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the 12th European Conference on Machine Learning. Springer-Verlag. Pianta, E., Bentivogli, L. & Girardi, C. (2002) MultiWordNet. Developing an aligned multilingual database. Proceedings of the 1st International WordNet Conference. Mysore, India. Rao, D., Mcnamee, P. & Dredze, M. (2013) Entity linking: Finding extracted entities in a knowledge base. Multi- source, multilingual information extraction and summarization. Springer. Russell, S. & Norvig, P. (1994) A Modern, Agent-Oriented Approach to Introductory Artificial Intelligence. Sanda M. Harabagiu, G. A. M., Dan I. Moldovan (1999) Wordnet 2-A morphologically and semantically enhanced resource. SIGLEX99: Standardizing Lexical Resources. Sara, T. & Daniele, P. (2009) New features for FrameNet: WordNet mapping. Proceedings of the Thirteenth Conference on Computational Natural Language Learning. Boulder, Colorado, Association for Computational Linguistics. Ševčenko, M. (2003) Online Presentation of an Upper Ontology. CTU Prague, Dept of Computer Science. Shinde, S. K. & Kulkarni, U. (2012) Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications, 39, 1381-1387. Sowa, J. F. (1999) Knowledge representation: logical, philosophical, and computational foundations. Course Technology. Strapparava, C. & Valitutti, A. (2004) WordNet-Affect: an affective extension of WordNet. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004). Lisbon. Turney, P. D. & Littman, M. L. (2003) Measuring Praise and Criticism: Inference of Semantic Orientation from Association. ACM Transactions on Information Systems (TOIS), 21, 315–346. Udi, M., Ash, P. & John, R. (2000) Experience with personalization of Yahoo! , ACM. Valitutti, A., Strapparava, C. & Stock, O. (Eds.) (2004) Developing Affective Lexical Resources, ITC-irst, Trento, Italy, PsychNology Journal. Vázquez, S. (2009) Resolución de la ambigüedad semántica mediante métodos basados en conocimiento y su aportación a tareas de PLN. Depto. de Lenguajes y Sistemas Informáticos. Alicante, Spain., Universidad de Alicante. Vázquez, S., Montoyo, A. & Rigau, G. (2004) Using Relevant Domains Resource for Word Sense Disambiguation. IC-AI’04. Proceedings of the International Conference on Artificial Intelligence. Ed: CSREA Press. Las Vegas, E.E.U.U. Vossen, P. (1998) EuroWordNet: A Multilingual Database with Lexical Semantic Networks, Dordrecht, Kluwer Academic Publishers. Vossen, P., Peters, W. & Gonzalo, J. (1999) Towards a Universal Index of Meaning. proceedings of the ACL-99 Siglex workshop. University of Maryland. 
Walter, C.-N., Maria Luisa, H.-A., Rafael, V.-G. & Francisco, G.-S. (2012) Social knowledge-based recommender system: Application to the movies domain. Pergamon Press, Inc.
Zhibiao, W. & Martha, P. (1994) Verb semantics and lexical selection. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. Las Cruces, New Mexico, Association for Computational Linguistics.
Zouaq, A., Gagnon, M. & Ozell, B. (2009) A SUMO-based Semantic Analysis for Knowledge Extraction. Proceedings of the 4th Language & Technology Conference. Poznań, Poland.