A semantic framework for textual data enrichment

Yoan Gutiérrez, Sonia Vázquez and Andrés Montoyo
Department of Software and Computing Systems, University of Alicante, Spain.
ygutierrez, svazquez, montoyo@dlsi.ua.es

Expert Systems With Applications (2016). doi: 10.1016/j.eswa.2016.03.048

Highlights
- A semantic framework for recommender systems is presented
- An in-depth analysis of different Natural Language Processing resources is shown
- A description of different Natural Language Processing approaches is provided
- Related research works are described
- A case study evaluating our proposal with real data is presented

Abstract: In this work we present a semantic framework suitable for use as a support tool for recommender systems. Our purpose is to use the semantic information provided by a set of integrated resources to enrich texts by conducting different NLP tasks: WSD, domain classification, semantic similarity and sentiment analysis. After the textual semantic enrichment has been obtained, we are able to recommend similar content or even to rate texts according to different dimensions. First, we describe the main characteristics of the integrated semantic resources together with an exhaustive evaluation. Next, we demonstrate the usefulness of our resource in different NLP tasks and campaigns. Moreover, we present a combination of different NLP approaches that provides enough knowledge to be used as a support tool for recommender systems. Finally, we illustrate a case study with information related to movies and TV series to demonstrate that our framework works properly.

Keywords: Recommender Systems, Framework, Integrated Semantic Resources, Sentiment Analysis, Word Sense Disambiguation, Content Categorization

1. Introduction

Recent advances in modern technologies have motivated the development of different techniques to improve human-machine communication. The Internet and new communication trends such as short messages, forum participation and social networks have led to a revolution in the way in which people work, communicate and manage their free time. As a consequence of this technological revolution, a huge quantity of information is generated in different social contexts via diverse sources such as forums, blogs, microblogs and social networks.
As a result, people are able to share their knowledge, expectations and emotions through the Internet, and they may also influence political, economic or social behaviour. At this point, governments, enterprises or even celebrities need to manage this information in order to extract relevant knowledge, social tendencies, etc. Because of this new context, the Natural Language Processing (NLP) research community has developed different tools with which to analyse news and opinions in order to discover what people think or how they perceive past, present and future.

At present, personalization and recommender systems have gained popularity. In fact, recommender systems began to appear in the market in 1996 (Udi et al., 2000). Since then, several approaches have been developed (Gediminas and Alexander, 2005):

- Content-based: these systems try to find products, services or contents that are similar to those already evaluated by the user. In this kind of system, users' feedback (which can be collected in many ways) is essential to support and accomplish recommendations (Marco de et al., 2008). A minimal sketch of this strategy is given below.
- Knowledge-based: these systems model the user profile in order to identify, through inference algorithms, the correlation between user preferences and existing products, services or content (Walter et al., 2012).
- Collaborative filtering: these systems create/classify groups of users that share similar profiles/behaviours in order to recommend products, services or content that has been well evaluated by the group to which a user belongs (Perner et al., 2007).
- Hybrid: these systems combine two or more of the previously mentioned techniques to improve the "quality" of recommendations (Shinde and Kulkarni, 2012).

Dealing with textual information and obtaining valuable knowledge requires advanced natural language techniques to solve different kinds of problems: document correction, automatic translation, summary elaboration, opinion extraction, word sense disambiguation, etc. Solving all of these problems requires considerable linguistic knowledge and, even more importantly, entails a high computational cost. In the vast majority of NLP tasks it is necessary to use external resources such as machine-readable dictionaries (dictionaries of words available in electronic format), thesauri (which provide relationships among words, i.e. synonyms, antonyms and others), ontologies (conceptualizations of a domain used to share information among different agents) and others. These resources have different internal structures, interfaces, concept relations and other characteristics. One of the most frequently used resources, in its different versions, is WordNet (WN) (Miller et al., 1990; http://wordnet.princeton.edu/). Various semantic resources related to WordNet have consequently been developed in different domains or by using semantic integration. But it is still difficult to find resources that provide semantic integration in different domains and which are useful for specific NLP tasks.

In this work we present a new semantic resource (ISR-WN) and a set of different methods to take advantage of it with the aim of enriching texts with semantic information. As a result, we provide a semantic framework suitable for use as a support tool for content-based recommender systems, by annotating texts with different features such as sentiments, polarities or domain labels. In order to analyse the results of the semantic enrichment process, we have carried out a comprehensive case study using texts from movie and TV series reviews obtained from IMDb (http://www.imdb.com). Finally, we have evaluated how our proposed framework performs by comparing our results with real ratings.
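To make the content-based strategy outlined above more concrete, the following illustrative sketch compares a text a user has already liked with candidate texts using TF-IDF cosine similarity. It is only a toy example with invented review snippets, not the framework proposed in this paper, and it assumes scikit-learn is available.

```python
# Toy content-based recommendation sketch (invented data, not the proposed framework):
# candidates are ranked by TF-IDF cosine similarity to a text the user already liked.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

liked = ["a dark psychological thriller about memory and identity"]
candidates = [
    "a light romantic comedy set in Paris",
    "a psychological thriller exploring lost memories",
    "a documentary about deep sea creatures",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(liked + candidates)
scores = cosine_similarity(matrix[:1], matrix[1:]).ravel()

# Recommend the candidates most similar to the liked text
for text, score in sorted(zip(candidates, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {text}")
```

A content-based system built on the framework described in this paper would replace such plain bag-of-words vectors with the semantically enriched annotations (senses, domains, polarities) introduced in the following sections.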
To summarize, we point out the main contributions of this work:

- Taking advantage of a previously developed semantic resource with different dimensions (ISR-WN)
- The use of a set of NLP methods based on ISR-WN to take advantage of each one of its semantic dimensions
- Providing a new semantic framework that is able to enrich texts in several dimensions with the aim of obtaining a support tool for content-based recommender systems
- An exhaustive evaluation with real datasets to demonstrate how it works

The document is structured as follows. After this introduction, each semantic resource used in ISR-WN is described, and an in-depth analysis of the different approaches for semantic integration resources in NLP is also presented. Having evaluated previous proposals, in Section 3 we go on to show how ISR-WN was developed. An evaluation according to its integration effectiveness is then provided in Section 4. In Section 5 we provide a brief description of the different NLP tasks selected to enrich texts. Section 6 describes the characteristics of a case study to illustrate how our framework works with real data obtained from IMDb. In Section 7 we show some examples of how the semantic enrichment approaches are used to annotate texts. Section 8 provides the experimental results of the case study and Section 9 presents a discussion about the results obtained. Finally, the conclusions and further work are presented in Section 10.

2. Related Work

This section presents the different semantic resources that are integrated into ISR-WN and a comparison with other semantic integration resources.

2.1 WordNet

As mentioned in the previous section, WN is one of the most frequently used semantic resources in computational linguistics (Navigli, 2009). WN is a lexical database for the English language that is also considered an ontology. It was created at Princeton University (http://wordnet.princeton.edu/) and it represents a semantic, conceptual and structured network of nouns, verbs, adjectives and adverbs. The basic unit of knowledge is the synset (synonym set), which represents a lexical concept (Ševčenko, 2003). A synset is associated with a unique eight-digit number called an offset (this number is the position in the data file). Each synset is related to other synsets through semantic, conceptual or lexical connections. The result of this set of connections is a wide navigable network with a high number of interrelations among different word senses. The semantic relations among synsets are:

- Synonymy
- Antonymy
- Hyponymy / Hyperonymy
- Meronymy / Holonymy
- Entailment and Cause
- and others (more details at http://wordnet.princeton.edu/man/wninput.5WN.html)

WN records the frequency of usage of each word sense (synset) in its internal files (https://wordnet.princeton.edu/man/cntlist.5WN.html). For example, the word image has eight senses in WN 2.0 (see Table 1). As can be observed, one word has different senses; each sense is described by a sentence (gloss) and has a set of synonyms, and senses are ordered by their frequency of usage.
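As a quick illustration of how such a sense inventory can be queried programmatically, the sketch below lists the offset, part of speech, lemmas and gloss of each synset of "image" using NLTK's WordNet interface. This is only a convenience example and is not part of ISR-WN; note that NLTK ships WN 3.0, so the offsets and the number of senses may differ slightly from the WN 2.0 entries shown in Table 1.

```python
# Minimal sketch: querying the WN sense inventory for "image" with NLTK
# (requires the NLTK wordnet corpus; NLTK uses WN 3.0, not WN 2.0 as in Table 1).
from nltk.corpus import wordnet as wn

for synset in wn.synsets("image"):
    lemmas = ", ".join(lemma.name() for lemma in synset.lemmas())
    # offset, part of speech, synonym lemmas and gloss, as described above
    print(f"{synset.offset():08d} [{synset.pos()}] {lemmas} -- {synset.definition()}")
```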
POS | Offset | Lemmas | Gloss
Noun | 00039630 | effigy, image, simulacrum | a representation of a person (especially in the form of sculpture); "the coin bears an effigy of Lincoln"; "the emperor's tomb had his image carved in stone"
Noun | 00043454 | picture, image, icon, ikon | a visual representation (of an object or scene or person or abstraction) produced on a surface; "they showed us the pictures of their wedding"; "a movie is a series of images projected so rapidly that the eye integrates them"
Noun | 00047709 | persona, image | (Jungian psychology) a personal facade that one presents to the world; "a public image is as fragile as Humpty Dumpty"
Noun | 00053779 | image, mental_image | an iconic mental representation; "her imagination forced images upon her too awful to contemplate"
Noun | 00053832 | prototype, paradigm, epitome, image | a standard or typical example; "he is the prototype of good breeding"; "he provided America with an image of the good father"
Noun | 00059547 | trope, figure_of_speech, figure, image | language used in a figurative or non-literal sense
Noun | 00074537 | double, image, look-alike | someone who closely resembles a famous person (especially an actor); "he could be Gingrich's double"; "she's the very image of her mother"
Verb | 00109926 | visualize, visualise, envision, project, fancy, see, figure, picture, image | imagine; conceive of; see in one's mind; "I can't see him on horseback!"; "I can see what will happen"; "I can see a risk in this strategy"
Table 1. Word senses of "image" in WN 2.0

It is important to emphasize that WN has been adapted to different languages: English, Spanish, Dutch, Italian, German, French, Czech, Estonian, Swedish, Norwegian, Danish, Greek, Portuguese, Basque, Catalan, Romanian, Lithuanian, Russian, Bulgarian, Slovenian and others that are under development. These versions have been developed under the supervision of Princeton University and later under that of the Global WordNet Association (http://www.globalwordnet.org/). This research work is based on two versions of WN: WN 1.6, with 99,643 synsets, of which 66,025 are nouns, 17,915 are adjectives, 3,575 are adverbs and 12,127 are verbs, and WN 2.0, with 115,424 synsets, of which 79,689 are nouns, 18,563 are adjectives, 3,664 are adverbs and 13,508 are verbs.

2.2 Semantic resources aligned to WordNet

Owing to the fact that WN has been used in many NLP research works, a set of different semantic resources aligned to WN synsets has been developed with the aim of obtaining more knowledge. Some of these resources were created from WN, such as WordNet Domains (http://wndomains.fbk.eu/) (Magnini and Cavaglia, 2000), WordNet Affect (http://wndomains.fbk.eu/wnaffect.html) (Magnini and Cavaglia, 2000, Sara and Daniele, 2009) and Semantic Classes (http://rua.ua.es/dspace/bitstream/10045/2522/1/ranlp07BLC2.pdf) (Izquierdo et al., 2007). Others emerged from the association of pre-produced tags, e.g. SUMO (http://www.ontologyportal.org/). The resources used in our proposed semantic integration resource (ISR-WN) are described in detail below.

2.2.1 WordNet Domains

This is a resource for the English language. WordNet Domains (WND) includes a set of Subject Field Codes (SFC) (Magnini and Cavaglia, 2000) with which to enrich WN synsets. Each SFC groups a set of words related to the same domain. On the one hand, these domains identify the context of the definition and, on the other, they allow concepts to be found quickly.
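A hypothetical sketch of this domain-driven lookup is shown below: a tiny hand-written mapping from synset identifiers to domain labels stands in for the real WND data, and a filter keeps only the senses belonging to the domain of interest. Offsets and labels are illustrative only; the prose example that follows shows the same idea for the word disc.

```python
# Minimal sketch of domain-based sense filtering (Section 2.2.1).
# The mapping below is a tiny invented stand-in for the WordNet Domains data;
# offsets and domain labels are illustrative only.
senses_of_disc = {
    "03129289-n": ["computer_science"],   # e.g. magnetic disk sense
    "03129456-n": ["music"],              # e.g. phonograph record sense
    "13875392-n": ["factotum"],           # e.g. round flat shape sense
}

def senses_in_domain(sense_domains, target_domain):
    """Keep only the senses labelled with the target domain."""
    return [sense for sense, domains in sense_domains.items()
            if target_domain in domains]

print(senses_in_domain(senses_of_disc, "computer_science"))  # -> ['03129289-n']
```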
For example, if we are searching for the meaning of disc in the Computer Science context, we need only check the domain label preceding each definition (in this case, Computer Science) until we find the correct definition ([magnetic_disk, magnetic_disc, disk, disc]). This resource therefore provides the integration of domain labels in WN with the aim of reducing WN sense granularity by grouping different senses in the same domain or semantic category.

In WND, WN has been annotated by using a semi-automatic process that assigns one or more domain labels to each synset. Domain labels are selected from a set of 200 hierarchically organized labels. In this research we use 172 of these labels. The main purposes of annotating WN with SFC are:

- To create new relations among words. Domain labels allow us to relate words that pertain to different grammatical categories.
- Semantic annotation. Domain labels are associated with synsets, signifying that the annotation takes place at the semantic level rather than at the word level.
- Synsets pertaining to different syntactic categories can be included in the same domain label.
- Word senses pertaining to different sub-hierarchies of WN can be included in the same domain label.
- To reduce word sense granularity. Grouping different senses of the same word in the same domain label reduces the polysemy of words.

Sense | Domain | Gloss
man#1 | person | an adult male person (as opposed to a woman); "there were two women and six men on the bus"
man#2 | military | someone who serves in the armed forces; "two men stood sentry duty"
man#3 | person | the generic use of the word to refer to any human being; "it was every man for himself"
man#4 | factotum | all of the inhabitants of the earth; "all the world loves a lover"
man#5 | biology, person | any living or extinct member of the family Hominidae
man#6 | person | a male subordinate; "the chief stationed two men outside the building"; "he awaited word from his man in Havana"
man#7 | person | an adult male person who has a manly character (virile and courageous competent); "the army will make a man of you"
man#8 | person | (informal) a male person who plays a significant role (husband or lover or boyfriend) in the life of a particular woman; "she takes good care of her man"
man#9 | person | a manservant who acts as a personal attendant to his employer; "Jeeves was Bertie Wooster's man"
man#10 | play | a small object used in playing certain board games; "he taught me to set up the men on the chess board"; "he sacrificed a piece to get a strategic advantage"
Table 2. Domain labels and senses for man

For example, the word man has ten senses in WN (see Table 2). However, in WND we can group different senses according to their domain labels, and the number of senses is thus reduced from ten to four (Magnini et al., July 2002). Moreover, there are different specification levels in the domain hierarchy: the deeper the level, the greater the specialisation. Fig 1 shows a brief excerpt of the WND hierarchy from (Luisa Bentivogli, 2005).

Fig 1. WordNet Domains hierarchy

Among all the SFC domain labels there is a special one called Factotum. This domain label was created in order to group two types of synsets (Magnini and Cavaglia, 2000):

- Generic synsets: those that are difficult to classify into a particular domain label.
- Stop senses: those that frequently appear in different contexts, such as numbers, days of the week, colours, etc.

2.2.2 WordNet Affect

WordNet Affect (WNA) is an extension of WND (Magnini and Cavaglia, 2000, Sara and Daniele, 2009). It contains different subsets of affective concepts that group together synsets which denote emotional states. This resource was labelled by following a similar process to that of WND. Some of the represented concepts are moods, situations eliciting emotions and emotional responses. This resource was extended with a set of additional labels called emotional categories. It has a hierarchical structure in which hyperonymy is used to relate the affective concepts of WN (Valitutti et al., 2004). In a second revision, some modifications were made in order to differentiate those senses that are closer to emotional labels, and new labels were also included: positive, negative, ambiguous and neutral:

- Positive pertains to positive emotions. For example, it includes synsets such as joy#1 or enthusiasm#1.
- Negative defines negative states such as anger#1 or sadness#1.
- Ambiguous represents synsets whose semantics depends on the contexts in which they appear, e.g. surprise#1.
- Neutral represents synsets that refer to mental states but which are not characterised by valence.

One important property of WNA labels is that they associate the nouns and adjectives involved in emotional states. In this case, the adjective modifies the state of the noun, and may in some situations determine the modified noun state, e.g. cheerful / happy boy (Strapparava and Valitutti, 2004). In other words, if the adjective pertains to an emotional state it could indicate how the noun is related. Table 3 shows a list of affective labels associated with synsets. 300 labels are integrated into our research work.

Affective label | Examples
emotion | noun anger#1, verb fear#1
mood | noun animosity#1, adjective amiable#1
trait | noun aggressiveness#1, adjective competitive#1
cognitive state | noun confusion#2, adjective dazed#2
physical state | noun illness#1, adjective all in#1
hedonic signal | noun hurt#3, noun suffering#4
emotion-eliciting situation | noun awkwardness#3, adjective out of danger#1
emotional response | noun cold sweat#1, verb tremble#2
behaviour | noun offense#1, adjective inhibited#1
attitude | noun intolerance#1, noun defensive#1
sensation | noun coldness#1, verb feel#3
Table 3. WordNet Affect labels and their associated synsets.

2.2.3 SUMO

SUMO (Suggested Upper Merged Ontology; http://suo.ieee.org/SUO/SUMO/index.html) is considered to be an upper level ontology. It provides definitions for general terms and can be used as a basis for domain-specific ontologies. It was created from the combination of different ontological contents in one single cohesive structure. It currently contains around 1,000 terms and 4,000 assertions (Niles and Pease, 2003). Our research work only uses 568 of the concepts that are aligned with WN. SUMO was obtained from the information of Ontolingua (http://www.ksl.stanford.edu/software/ontolingua/) and the ontologies developed by ITBM-CNR (http://www.ontologyportal.org/SUMOhistory/): Unrestricted-Time, Representation, Anatomy, Biologic-Functions and Biologic-Substances.
It uses a standard representation language called SUO-KIF (Pease, 2007), obtained from KIF (Knowledge Interchange Format) (Genesereth and Fikes, 1992). SUMO was built by dividing the concepts into two groups: high level concepts and low level concepts. For the first group, the ontologies of John Sowa (Sowa, 1999) and of Russell and Norvig (Russell and Norvig, 1994) were considered, while the remaining concepts were included in the second group. Finally, a unique conceptual structure combining the two high level ontologies was created. The remaining low level class contents were included after the combination. Fig 2 shows the high level concepts.

Fig 2. SUMO high level concepts (Ševčenko, 2003): Entity, Abstract, Physical, Quantity, Attribute, SetOrClass, Proposition, Relation, Process and Object.

The highest level concept is the Entity category, as occurs in most hierarchies. The Entity concept groups the rest of the concepts, and the Physical and Abstract concepts are closest to it. An example of the word sense bank#1 is shown in Fig 3, along with the SUMO hierarchy.

Fig 3. SUMO hierarchy for bank#1 (depository financial institution, bank, banking concern, banking company), involving the categories Corporation, Organization, Group, Collection, Agent, Object, Physical and Entity.

2.2.4 Semantic classes

A semantic class is a sense conceptualisation that can be manually or semi-automatically created at different abstraction levels and in different domains. WN is composed of a set of related and connected synsets with different semantic relations. Each of these synsets represents a concept and contains a set of words referring to the concept it describes (synonyms). Synsets are classified into forty-five groups which include lexical categories (nouns, verbs, adjectives and adverbs) and semantic groupings (person, phenomenon, sentiment, place, etc.) (Fellbaum, 1998). There are twenty-six categories for nouns, fifteen for verbs, three for adjectives and one for adverbs in WN (Izquierdo, 2010). The organisational design of WN helps lexicographers to obtain a structure with which to create and edit a set of words and senses in the same semantic class. These semantic categories are considered to be Semantic Classes that are more general than senses, in which different WN senses are grouped in a semantic class. It is also possible to use this sense conceptualisation in different languages because all the WN senses (in English) are linked to EuroWordNet, which contains WNs in different languages.

It is important to note that, despite the fact that WN was generated by following a top-down process, it is difficult to distinguish between those synsets that were originally semantic classes and those that were not. If we wish to obtain the sense categorisation it is therefore necessary to apply a method with which to generate the Semantic Classes. The main goal of Semantic Class (SC) generation (Izquierdo et al., 2007) is to reduce polysemy, and various techniques with which to group senses have therefore been developed. In all cases, senses of the same word have been grouped together, thus reducing polysemy and improving WSD system results.

The SC resource consists of a set of Base Level Concepts (BLC) obtained from WN using a bottom-up process with hyperonymy relations. For each synset of WN, its BLC is obtained from the first local maximum according to its relative number of relations. As a result, semantic classes have a set of BLCs that are linked to different synsets.
The process follows an ascendant path by using the hyperonymy relations in WN. In the case of one synset having several hyperonyms, the path with the maximum number of relations is selected. The process ends when a set of initial concepts is obtained whose synsets have been selected as BLCs for other synsets. In some cases, there are BLCs that do not represent an adequate number of concepts or synsets. This situation is avoided by applying a final filtering process for these false BLCs, in which those BLCs that do not represent a minimum number of concepts are eliminated. Each BLC therefore has a minimum threshold associated with it, as described in (Izquierdo et al., 2010). A set of different BLCs is obtained by combining different thresholds and types of relations (all relations, or only hypo/hyperonyms). Those synsets that do not have a BLC associated with them (because their number of relations does not reach the minimum threshold required) are processed again to select another BLC from their hyperonymy relations. This eventually results in a new number of labels with which to categorise a set of senses. SCs are applied with the use of different repositories; those currently being used are WN1.6 and WN2.0.

2.2.5 SentiWordNet

SentiWordNet (SWN; http://gandalf.aksis.uib.no/lrec2006/pdf/384_pdf.pdf) (Esuli and Sebastiani, 2006, Baccianella et al., 2010) is a lexical resource in which each synset is associated with three different sentiment categories: objectivity, positivity and negativity. Each category has a score in the interval [0..1] and the sum of the three scores is always 1. This means that one synset could have scores that are different from 0 for each of the categories; for example, atrocious#3 has Pos: 0, Neg: 0.625 and Obj: 0.375.

In our proposal we use SentiWordNet 3.0. This version was obtained in two steps: first a semi-supervised learning phase and then a random walk. The first step consists of four sub-steps, as in the first version of SWN:

- Semi-supervised learning
  - First sub-step. Two small sets of seeds are used, one in which all the synsets contain seven paradigmatically positive terms and the other in which all the synsets contain seven paradigmatically negative terms (Turney and Littman, 2003). Both are expanded by using WN binary relations in order to connect synsets with the same polarity. The expansion is carried out with a specific radius k.
  - Second sub-step. The previous set of synsets is used with another set of synsets from the Objectivity category in order to build a set of training synsets with which to create a ternary classifier (one synset is classified as Pos, Neg or Obj). The classifier uses synset glosses to conduct the process. In SWN 1.0, a bag-of-words model is used in which the bag of words is obtained from gloss words (those frequent words that are most important). In SWN 3.0, rather than using words from glosses (with the ambiguity problem), a bag of synsets is used. This sub-step is improved by varying the displacement radius.
  - Third sub-step. All WN synsets are classified as being either Pos, Neg or Obj via the classifier generated in the second sub-step.
  - Fourth sub-step. The second sub-step can be performed by using different values of the radius k and different supervised learning technologies.
- Step 2 (random walk). WN 3.0 is considered to be a graph and an iterative process is conducted.
In this random walk, Pos(s) and Neg(s) (and consequently Obj(s)) are determined starting from the values obtained in the previous step. The process ends when the iterations have converged. There are 117,659 WN synset descriptions in our proposal.

2.2.6 eXtendedWordNet

eXtendedWordNet (XWN; http://xwn.hlt.utdallas.edu/) (Sanda M. Harabagiu, 1999) is a lexical resource created at the University of Texas. This resource was developed to improve the semantic information in the different versions of WN. The goal is to add semantic information to glosses and establish new relations among words (now labelled with their senses) from glosses and synsets. The new annotated version of WN only uses information from gloss definitions; gloss examples and other information are discarded. eXtendedWordNet was created by applying three processes:

- Syntactic analysis. A voting process with two syntactic analysers was applied using the outputs of (Brill, 1995) as inputs. The content of the glosses was extended as follows:
  - Adverbs. Glosses of adverbs were extended by adding adverb + is at the beginning of the gloss and a full stop at the end of the definition. For example, the gloss for automatically would be: automatically is in a reflex manner. A direct semantic annotation is thus obtained between word and gloss.
  - Adjectives. Glosses of adjectives were extended by adding adjective + is something at the beginning of the gloss and a full stop at the end of the definition. For example, the gloss for pure would be: pure is something not mixed.
  - Verbs. Glosses of verbs were extended by adding to + verb + is to at the beginning of the gloss and a full stop at the end of the definition. For example, the gloss for shed would be: to shed is to cast off hair, skin, horn, or feathers.
  - Nouns. Glosses of nouns were extended by adding noun + is at the beginning of the gloss and a full stop at the end of the definition. For example, the gloss for play would be: play is the act of using a sword (or other weapon) vigorously and skilfully.
- Logical analysis. After the syntactic analysis, a logical form transformation is applied, covering syntactic relations, syntactic objects, prepositional links, complex nominalizations, etc.
- Semantic analysis. Two versions were created: automatic and manual. Automatic annotation was applied by using one specific system to disambiguate WN glosses (XWN WSD) and another system to disambiguate free text. The decision process was a voting system that selected those senses on which both systems coincided, with a precision of 90%. There were three different categories for semantic annotation (the verbs to be and to have were managed in a special way):
  - GOLD: manual annotation.
  - SILVER: both WSD methods returned the same value.
  - NORMAL: only the XWN WSD value is used.
  This annotation obtained 100% coverage and 70% precision. Words labelled with the same sense by both systems obtained 90% precision.

In our approach, 551,551 relations were integrated using XWN1.7 and 419,387 using XWN3.0.

2.3 Semantic integration resources

Additional information with which to solve different NLP problems was obtained by using different semantic integration resources (ISR). However, one of the main problems is decentralisation. WN can solve this problem because it has been used as a basic resource to build others. Different resources currently use WN as a basis to build their structures.
For example:

- MultiWordNet (MWN; http://multiwordnet.itc.it) (Pianta et al., 2002) is a project that is being carried out by the ITC-IRST group in Trento, Italy, in order to build an Italian WN that is aligned with the English (Princeton) WN. Its first version contains around 37,000 Italian words that are organised in 28,000 synsets and their connections to English synsets. MultiWordNet is different from EuroWordNet because at least two methods can be used to build a multilingual WN. The first method was used in the EuroWordNet project and consists of building the WNs of specific languages independently, while correspondences are found in a second phase (Vossen, 1998). The second method was used in MultiWordNet and consists of building the WNs of specific languages from the semantic relations in the English WN; new synsets are obtained from English synsets. In MWN, the domain information has been automatically transferred from English to Italian, thus obtaining WND (Bentivogli et al., 2004).
- EuroWordNet (EWN) (Dorr and Castellón, 1997, Vossen, 1998) was developed to align English, Spanish, Dutch, Italian, German, French, Czech and Estonian. New versions included Swedish, Norwegian, Danish, Greek, Portuguese, Basque, Catalan, Romanian, Lithuanian, Russian, Bulgarian and Slovenian. The ILI (Inter-Lingual-Index) is used to connect each language (Vossen et al., 1999). The use of the ILI allows closer senses to be obtained among the languages, thus reducing polysemy and obtaining a large number of connections among different languages.
- Meaning: the Multilingual Central Repository (Meaning project, MCR) (Atserias et al., 2004) is integrated into the EWN framework with five local WNs, including the English WN, and uses an improved version of the Superior Concept ontology of EWN, MWN Domains, SUMO (Suggested Upper Merged Ontology) (Zouaq et al., 2009) and new semantic relations from corpora. The first version of MCR includes only conceptual knowledge with semantic relations among synsets from local WNs. The latest version of MCR integrates:
  - the ILI based on WN1.6, including base concepts from EWN, the Superior Concept ontology of EWN, MultiWordNet Domains and SUMO;
  - local WNs (Basque, Catalan, Italian and Spanish) related to the ILI, including WN English versions 1.5, 1.6, 1.7 and 1.7.1;
  - semantic preference collections, from SemCor and the BNC, and nominal entities.
- UBY: this is a large-scale lexical-semantic resource for NLP based on the ISO standard Lexical Markup Framework (LMF). UBY combines a wide range of information from expert and collaboratively constructed resources for English and German. It presents a web browser and an API that allow this tool to be used. UBY currently holds structurally and semantically interoperable versions of ten resources in two languages:
  - English WordNet, Wiktionary, Wikipedia, FrameNet and VerbNet;
  - German Wikipedia, Wiktionary, GermaNet and IMSLex-Subcat, and the multilingual Omega Wiki.
- BabelNet: this is a very large multilingual semantic network with millions of concepts obtained from:
  - an integration of WordNet and Wikipedia based on an automatic mapping algorithm, and
  - translations of the concepts (i.e., English Wikipedia pages and WordNet synsets) based on Wikipedia cross-language links and the output of a machine translation system.

As will be noted, these examples attempt to build semantic networks with a common interface.
In most cases, ISRs apply lexical integration with a few conceptual resources, but our proposal, ISR-WN, provides semantic information and other types of relations. ISRs are resources that help to improve the results of tasks such as document classification, entity discrimination, author detection, etc. The improvement is owing to the fact that ISRs enrich the contexts analysed with additional information (for example, subjectivity, contextual domain, etc.). In this work we present a resource that provides navigability from any word sense in WN to domain labels (and affective labels from WNA), SUMO categories or Semantic Classes through the use of semantic relations, as shown in Fig 4. As a result, we can extract the multidimensionality of each sentence through the use of all the concepts and words related in a semantic network. We can also detect sentiment polarities in each sentence from SWN and relate them with WND, WNA, SUMO and SC.

Fig 4. Sentence semantic characteristics extraction (only a few labels) for the sentence "But it is unfair to dump on teachers as distinct from the educational establishment", showing related WND, WNA, SUMO and SC concepts and SWN polarities.

After using the WN interface to study different lexical and conceptual resources, our goal is to develop a tool with which to align different WN-based resources and exploit all their relations: hyperonymy, meronymy, synonymy, etc. The result will be a graph with a set of nodes that represent WN synsets, WND concepts, SUMO categories, WNA concepts, SC labels or SWN sentiment polarities.

3. New integrated resource: ISR-WN

Our semantic integration proposal is denominated ISR-WN (Integrated Semantic Resource aligned with WordNet). It consists of the integration of different resources that are isolated but which can be aligned with WN. Considering that WordNet provides synsets (S) that contain sets of synonym words, and that WND, WNA, SUMO, SC and SWN provide sets of concepts (C), we can relate each concept with different synsets from WordNet. Therefore, ISR-WN is a Lexical Knowledge Base (LKB) that is represented as a non-directed graph G = (V, E). Each vertex v_i ∈ V represents a concept or a sense, that is, V = C ∪ S. A relation between two vertices i and j is represented with an edge e_ij ∈ E. The graph is non-directed because each relation connecting two vertices is mirrored by an inverse semantic relation in the opposite direction (the internal relations of WN are described at http://wordnet.princeton.edu/man/wninput.5WN.html).

According to the different resources taken into account, we have developed the integration knowledge base of ISR-WN based on the following resources: WN, WND, SUMO, WNA, Semantic Classes (SC), SWN and eXtended WordNet (XWN) 1.7 and 3.0. Note that XWN provides additional semantic relations among WordNet synsets. Below, we describe each version and the integration process in each case.

3.1 Integration process

The integration process takes into account the above-mentioned resources.
The aim is to obtain a new environment from which to retrieve semantic information using a unified set of resources. It is necessary to mention that WN is one of the most frequently used resources in NLP, since it provides a sense inventory. Its possibilities have motivated several researchers to develop taxonomies with new semantic information (Magnini and Cavaglia, 2000, Valitutti et al., 2004, Niles and Pease, 2003, Niles, 2001, Sara and Daniele, 2009, Forner, 2005, Strapparava and Valitutti, 2004). With regard to the resulting resources, we can mention SUMO (Niles and Pease, 2001), WND (Sara and Daniele, 2009), WNA (Strapparava and Valitutti, 2004), SC (Izquierdo et al., 2007), SWN (Esuli and Sebastiani, 2006), eXtended WordNet (Sanda M. Harabagiu, 1999) and others. Several authors, such as (Gliozzo et al., 2004, Magnini et al., 2002, Magnini et al., July 2002, Vázquez, 2009, Vázquez et al., 2004), have developed methods and systems based on these resources, and have also demonstrated improvements in several tasks: Information Extraction, Automatic Summarization, Document Indexing and Lexical Disambiguation. However, these authors have developed their approaches using one or two semantic resources, since there is no tool or resource with which to integrate all the semantic resources that are mapped onto WN. Bearing in mind both this fact and researchers' current need to use a single tool or resource to obtain semantic information from different resources, this work is focused on integrating the greatest possible quantity of semantic resources mapped onto WN into a single tool.

ISR-WN includes WN as its lexical nucleus because its internal structure and relations provide relevant information for many NLP tasks. Fig 5 represents the conceptual model, in which each dimension (resource) is aligned with WN through the use of semantic interconnections (Gutiérrez et al., 2010a) (Gutiérrez et al., 2011a). The main challenge as regards the integration consists of dealing with different versions of WN and different versions of the rest of the resources involved in this proposal. It is therefore necessary to match the mappings to the WN versions used by each resource (see Fig 6).

This integration resource involves a set of semantic resources with the following element distribution: 99,643 synsets for WN1.6 (66,025 nouns, 17,915 adjectives, 3,575 adverbs and 12,127 verbs) and 115,424 synsets for WN2.0 (79,689 nouns, 18,563 adjectives, 3,664 adverbs and 13,508 verbs); WND (with 172 labels), WNA (with 300 labels), SUMO (with 568 labels), SWN (with 117,659 labels), SC (with 1,231 labels), and XWN (551,551 relations for XWN1.7 and 419,387 new synset relations for XWN3.0). This is known as "Integration of Semantic Resources based on WordNet" (ISR-WN) (Gutiérrez et al., 2010a) (Gutiérrez et al., 2011a). It is important to note that all the labels and relations included in both versions have been reused from the resources cited. The elements of which ISR-WN is composed are described in Section 2.

Fig 6. Logic Model of the Integration of Semantic Resources (ISR-WN).
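A minimal sketch of the kind of labelled, bidirectional graph that this integration produces is given below. It is only an illustration of the data structure, not the actual ISR-WN implementation; the node identifiers, resource names and the relations chosen are examples based on the atrocious exploration discussed in Section 3.2.

```python
# Illustrative sketch of an LKB graph with typed vertices and labelled,
# bidirectional relations (not the actual ISR-WN implementation).
from collections import defaultdict

class LKB:
    def __init__(self):
        self.vertices = {}                # vertex id -> resource type
        self.edges = defaultdict(list)    # vertex id -> [(relation, vertex id)]

    def add_vertex(self, vertex_id, resource):
        self.vertices[vertex_id] = resource

    def add_relation(self, source, relation, target, inverse):
        # every relation is stored together with its inverse (cf. Table 5)
        self.edges[source].append((relation, target))
        self.edges[target].append((inverse, source))

lkb = LKB()
lkb.add_vertex("wn:00193347-a", "WordNet")                      # atrocious#3
lkb.add_vertex("wnd:Psychological_Feature", "WordNet Domains")
lkb.add_vertex("sumo:Subjective_Assessment_Attribute", "SUMO")
lkb.add_relation("wn:00193347-a", "Pertainym",
                 "wnd:Psychological_Feature", "Pertainym")
lkb.add_relation("wn:00193347-a", "Hypernym",
                 "sumo:Subjective_Assessment_Attribute", "Hyponym")
print(lkb.edges["wn:00193347-a"])
```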
3.2 Integration architecture

This section describes some of the features of the process used to integrate all the resources by using WN as an interlinking nucleus. This process takes into account each of their particularities and the resource annotations based on different WN versions, in this case versions 1.6 and 2.0. Despite the fact that this version only covers the English language, it can be extended to other languages if the Inter-Lingual-Index (ILI) of EWN (Dorr and Castellón, 1997, Vossen, 1998) or a similar technology is used to align it to them.

Fig 7 shows WN synsets linked to several taxonomies (SUMO, WND and WNA) and to the descriptions of SC and SWN, respectively. In many cases the links are established through the use of mapping files, thus allowing the resources tagged with different WN versions to be interlinked. This results in the creation of an enriched semantic graph that is highly suitable for NLP applications. Fig 7 shows how all the semantic resources (the taxonomies of SUMO (in green), WND (in purple) and WNA (in red), the SC labels (in orange) and the SWN descriptions (in grey)) are linked to the WN synsets. Particular aspects of each of the resources involved have therefore been taken into consideration.

Fig 7. Architecture.

With regard to integrating all the resources mentioned, mapping files are suitable for interlinking different WN versions, and are used by the semantic resources to map their labels onto WN definitions. Owing to the word sense granularity of the WN versions (word senses that exist in one version may not exist in another), many word senses cannot be taken into account when a mapping occurs. A suitable solution to this problem is to navigate all the necessary mapping files in order to take into account the greatest quantity of possible interlinks. For instance, WNA has been used in this proposal and has been tagged with WN 2.0, while SWN has been tagged with 1.6 and 2.0 and SC with 2.0; there is, therefore, no need to use mapping files to integrate them. It is worth noting that those semantic resources that are annotated with both WN 1.6 and 2.0 reduce the lost interlinks to 0%.

Fig 7 shows a model that allows one of the two nuclei, WN 1.6 or 2.0, to be chosen depending on the user's needs. This design does not preclude the inclusion of other WN versions in further ISR-WN versions. When comparing the second version with the first, the former has new semantic labels and the option of selecting the WN nucleus in order to build the semantic graph on demand. This integration model therefore reuses all the semantic relations included in the Princeton WN. Furthermore, in ISR-WN the SUMO categories also keep their relations with their respective WN mappings; moreover, the semantic relations of WND and WNA are now hypernym (owing to the is_a relation) and hyponym (owing to the is_child relation) in order for their hierarchical taxonomy to make sense. Pertainym is, correspondingly, the semantic relation used to link the WN synset to each label of both taxonomies (e.g. a synset can pertain to one or many WND and WNA labels). It is important to stress that WNA version 1.1 has been populated with new affective relations, which do not exist in WNA1.0 (e.g. entailment and cause). New relations in WNA1.1 can therefore link verbs, adjectives and adverbs with the nouns from which they are derived. All these considerations have been taken into account when developing the ISR-WN resource.
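A small sketch of how such a mapping file can be applied is shown below. The file name and the simple two-column "source_offset target_offset" format are assumptions made for illustration; the real mapping files used to align WN versions may carry additional columns (e.g. confidence scores), in which case only the relevant columns would be read.

```python
# Hypothetical sketch: remapping resource annotations from one WN version to
# another using a mapping file (file name and two-column format are assumed).
def load_mapping(path):
    mapping = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) >= 2:
                source_offset, target_offset = parts[0], parts[1]
                mapping[source_offset] = target_offset
    return mapping

def remap(offsets, mapping):
    """Offsets without an entry in the mapping become lost interlinks."""
    kept = [mapping[o] for o in offsets if o in mapping]
    lost = [o for o in offsets if o not in mapping]
    return kept, lost

# Example (hypothetical file name): align XWN1.7 annotations to a WN2.0 nucleus.
# mapping_17_to_20 = load_mapping("wn17-to-wn20.noun")
# kept, lost = remap(["00043454", "00047709"], mapping_17_to_20)
```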
The resulting semantic interconnections of this knowledge base permit navigation among all the resources involved. For instance, if the English word atrocious is explored using WN 1.6 as a nucleus, a list of WN synsets can be obtained for this word. Once a synset (e.g. offset 00193347) has been selected from its representative list, the following information is retrieved:

- 00193347 atrocious [Adjective]
  - Similar-To (00192906 alarming [Adjective])
  - Pertainym (Psychological_Feature [Domain])
  - Hypernym (Subjective_Assessment_Attribute [SUMO])
  - Pertainym (Emotion [Affect])
  - Pertainym (Horror [Affect])
  - SentiWN-Description (atrocious#3 Pos: 0 | Neg: 0.625 | Obj: 0.375 [SentiWordNet])
  - Cause (05591212 horror [Noun])

As will be noted, the offset, part-of-speech, synonym list and gloss are retrieved for each WN synset. The description of each semantic label from each particular resource linked to the synset being explored is also retrieved. As a search tool, ISR-WN also allows labels to be discovered in all the resources included, thus providing their associated information and establishing semantic navigation in order to find interesting semantic paths.

An interesting case may be discussed as regards the new features introduced in this integration resource for WNA. For instance, when exploring the label Horror from WNA, we find that this label is not linked directly with the third synset of atrocious (i.e. atrocious#3 with offset 00193347). However, it has been assumed that if a synset (e.g. atrocious#3) is linked with a noun (e.g. horror#1) by an affective relation (e.g. entailment, cause) obtained from WNA1.1, the synset will be linked to all the WNA labels linked to that noun. Upon applying this procedure, the synset horror#1, which is linked with the WNA label Horror, causes atrocious#3 to be additionally linked to Horror. Fig 8 shows the aforementioned example after applying this procedure, focusing on the synset atrocious#3 when linked to two WNA labels (e.g. Emotion and Horror).

Fig 8. ISR-WN including new affective relations from WNA1.1.

As part of the semantic improvement strategy for ISR-WN, the semantic links suggested by XWN1.7 and XWN3.0 have also been considered. Their semantic relations to the synsets of ISR-WN have been tagged as follows:

- XWN17_Relation_as_Synset,
- XWN17_Relation_as_Gloss,
- XWN30_Relation_as_Synset,
- XWN30_Relation_as_Gloss.

Different features are involved in the resulting semantic graph depending on the nucleus selected (WN 1.6 or 2.0) in order to load the semantic knowledge base on demand. It is important to highlight that Relation_as_Synset and Relation_as_Gloss represent a bidirectional relation between synset and gloss. When WN1.6 is used as a nucleus, 505,755 and 119,123 relations will be involved, from XWN1.7 and XWN3.0, respectively. However, when WN2.0 is used to load ISR-WN, 551,065 relations from XWN1.7 will be involved, in addition to 358,843 relations from XWN3.0. Table 4 shows all the semantic relations included in this integration resource. Note that XWN1.7 is annotated with WN1.7, and it is therefore necessary to apply a conversion of WN versions in order to map its offsets of version 1.7 onto the WN version used as nucleus (1.6 or 2.0). The integration of XWN3.0, which is annotated with WN3.0, is performed in the same way.
These mappings result in some relations being missed owing to the conversion issues mentioned above. An in-depth analysis of this can be found in the evaluation section.

An exploration of ISR-WN is presented as follows, in which WN1.6 is used to load the knowledge base. This exploration reuses the example of the word love, in which an increase in the semantic information is evident as regards the target synset:

- 01211759 love [Verb]
  - Antonym (01211167 hate, detest [Verb])
  - Hyponym (01212004 love [Verb])
  - Hyponym (012125039 care_for, cherish, hold_dear, treasure [Verb])
  - Hyponym (01213640 dote [Verb])
  - Hyponym (01213998 adore [Verb])
  - Pertainym (Factotum [Domain])
  - Hypernym (Intentional_Psychological_Process [SUMO])
  - Pertainym (Emotion [Affect])
  - Pertainym (Love [Affect])
  - Stative (05607724 love [Noun])
  - Hypernym (love.v.01 [SemanticClass])
  - SentiWN-Description (love#1 Pos: 0.5 | Neg: 0 | Obj: 0.5 [SentiWordNet])
  - XWN17_Relation_as_Synset (05608483 affection, affectionateness, fondness, tenderness, heart, warmheartedness [Noun])
  - XWN17_Relation_as_Synset (05573285 liking [Noun])
  - XWN17_Relation_as_Synset (01508689 have, have_got, hold [Verb])
  - XWN17_Relation_as_Synset (01332909 great [Adjective])
  - XWN17_Relation_as_Gloss (07626109 sweetheart, sweetie, steady, truelove [Noun])
  - XWN17_Relation_as_Gloss (07462325 patriot, nationalist [Noun])
  - XWN17_Relation_as_Gloss (07111212 bibliophile, booklover, book_lover [Noun])
  - XWN17_Relation_as_Gloss (07073765 amorist [Noun])
  - XWN17_Relation_as_Gloss (06950629 lover [Noun])
  - XWN17_Relation_as_Gloss (06930637 Brunnhilde, Brynhild [Noun])
  - XWN17_Relation_as_Gloss (06921123 Psyche [Noun])
  - XWN17_Relation_as_Gloss (06897084 Adonis [Noun])
  - XWN17_Relation_as_Gloss (01402650 unloved, not loved [Adjective])
  - XWN30_Relation_as_Gloss (05315655 hyperbaton [Noun])
  - XWN30_Relation_as_Gloss (06950629 lover [Noun])
  - XWN30_Relation_as_Gloss (07111212 bibliophile, booklover, book_lover [Noun])
  - XWN30_Relation_as_Gloss (08287863 strawflower, golden_everlasting, yellow_paper_daisy, Helichrysum_bracteatum [Noun])
  - XWN30_Relation_as_Synset (05608483 affection, affectionateness, fondness, tenderness, heart, warmheartedness [Noun])
  - XWN30_Relation_as_Synset (05573285 liking [Noun])
  - XWN30_Relation_as_Synset (01213998 adore [Verb])
  - XWN30_Relation_as_Synset (01214144 idolize, worship, hero-worship, revere [Verb])

As can be appreciated in this example, the exploration of the same word love (offset 01211759 for the verb love) has been considerably enriched. The semantic information obtained combines many labels from different resources and establishes new relations among WN synsets. This integration of resources allows the discovery of new semantic interconnections never before shown in a single tool. Interesting features such as WND or SUMO concepts with positive or negative tendencies could also be mentioned. To return to the example of the word love, an exploration of one level in depth has been applied; however, if we were to navigate in greater depth, many more semantic elements would be discovered. Table 4 shows all the semantic relations; this table is used as a matrix to relate each resource by rows. Note that, as shown in Table 5, each relation in ISR-WN has one inverse relation. Most of the relation pairs have been taken from WN (http://wordnet.princeton.edu/man/wninput.5WN.html) and the other resources involved, while the rest (those marked with *) have been created in this research work.
The aim of creating bilateral relations has been to allow forwards and backwards navigation through the semantic links.

- WN-WN: Also see, Antonym, Attribute, Cause, Derivationally related form, Derived from adjective, Domain of synset (REGION, TOPIC, USAGE), Entailment, Hypernym, Hyponym, Instance Hypernym, Instance Hyponym, Member holonym, Member meronym, Member of this domain (REGION, TOPIC, USAGE), Part holonym, Part meronym, Participle of verb, Pertainym (pertains to noun), Similar to, Substance holonym, Substance meronym, Verb Group, XWN17_Relation_as_Gloss*, XWN17_Relation_as_Synset*, XWN30_Relation_as_Gloss*, XWN30_Relation_as_Synset*
- WN-WND: Pertainym; WN-WNA: Pertainym; WN-SUMO: Hypernym, Hyponym; WN-SC: Hypernym, Hyponym; WN-SWN: SentiWN_Description*
- WND-WN: Pertainym; WND-WND: Hypernym, Hyponym
- WNA-WN: Pertainym; WNA-WNA: Hypernym, Hyponym
- SUMO-WN: Hypernym, Hyponym; SUMO-SUMO: Hypernym, Hyponym
- SC-WN: Hypernym, Hyponym; SC-SC: Hypernym, Hyponym
- SWN-WN: SentiWN_Description*
Table 4. Relations between semantic labels on ISR-WN.

- Bidirectional relations on WordNet (pointer ↔ reflected relation): Antonym ↔ Antonym; Hyponym ↔ Hypernym; Hypernym ↔ Hyponym; Instance Hyponym ↔ Instance Hypernym; Instance Hypernym ↔ Instance Hyponym; Holonym ↔ Meronym; Meronym ↔ Holonym; Similar to ↔ Similar to; Attribute ↔ Attribute; Verb Group ↔ Verb Group; Derivationally Related ↔ Derivationally Related; Domain of synset ↔ Member of Domain.
- Bidirectional relations added by ISR-WN (pointer ↔ reflected relation): Entailment ↔ Cause; Participle ↔ Participle; Pertainym ↔ Pertainym; XWN17_Relation_as_Gloss* ↔ XWN17_Relation_as_Synset*; XWN30_Relation_as_Gloss* ↔ XWN30_Relation_as_Synset*; SentiWN_Description* ↔ SentiWN_Description*; Stative ↔ Cause; Synonymy ↔ Synonymy.
Table 5. Bidirectional relations on ISR-WN.

4. Integration process evaluation

In this section the results obtained are analysed by taking into consideration the semantic elements that are available for the integration and those eventually integrated. Once ISR-WN had been developed following the aforementioned process, an evaluation took place. This was carried out in order to obtain a statistical assessment of the quantity of WN synsets that should be linked with regard to those actually established. Two evaluations were applied, based on the semantic knowledge interlinked depending on the nucleus (WN1.6 or WN2.0) selected for loading the semantic graph.

4.1 Evaluation based on WN1.6 as a nucleus

Table 6 shows a distribution of the semantic labels involved in ISR-WN with WN1.6 as a nucleus. In this analysis the greatest possible quantity of labels has been extracted for inclusion, and the labels which have not been included are justified below. An important factor that supports the 100% alignment of WND, SUMO and SC has been the reduction in the usage of mapping files. This was possible since these three resources were built based on WN1.6 and the currently evaluated nucleus is WN1.6.
Resource | WND 2.0 | SUMO | WNA 1.0-1.1 | SC | SWN 3.0 | XWN 1.7 | XWN 3.0
# Labels | 172 | 568 | 300 | 1,231 | 117,659 | - | -
Synsets to link (n) | 86,901 | 67,923 | 1,256 | 66,025 | 82,114 | - | -
Synsets to link (a) | 19,322 | 18,531 | 2,418 | - | 18,157 | - | -
Synsets to link (v) | 12,843 | 12,469 | 801 | 12,127 | 13,767 | - | -
Synsets to link (r) | 3,735 | 3,627 | 614 | - | 3,621 | - | -
Total of synsets to link | 122,801 | 102,550 | 5,089 | 78,152 | 117,659 | 551,551 | 419,387
Linked synsets (n) | 86,901 | 67,923 | 1,096 | 66,025 | 56,563 | - | -
Linked synsets (a) | 19,322 | 18,531 | 2,125 | - | 8,757 | - | -
Linked synsets (v) | 12,843 | 12,469 | 474 | 12,127 | 9,223 | - | -
Linked synsets (r) | 3,735 | 3,627 | 549 | - | 2,101 | - | -
Total of linked synsets | 122,801 | 102,550 | 4,244 | 78,152 | 76,644 | 505,755 | 119,123
Difference | 0 | 0 | 845 | 0 | 41,015 | 45,796 | 300,264
Linked % | 100.00 | 100.00 | 83.40 | 100.00 | 65.14 | 91.70 | 28.40
Table 6. Synsets linked to each resource by using WN1.6 as a nucleus (mean linked percentage: 81.23%).

In this integration resource we have considered both WNA versions (1.0 and 1.1) to be a single resource and have mixed them. This mixture has been developed by involving the taxonomy of WNA1.1 and the WN mappings from both versions (1.0 and 1.1). However, a difference in labels (as regards all linked synsets with regard to those synsets that should be linked) between WNA1.0 and WNA1.1 has caused 845 missing links (see Table 6). This special case is owing to the fact that most of the WNA1.0 labels are included in WNA1.1, but several WNA1.1 labels are not included in WNA1.0. Some links have therefore been removed during the integration. The affective labels not considered by ISR-WN, because they are not represented in the WNA1.1 taxonomy, are the following: attitude, emotional response, psy, man, sympathy, sta, softheartedness, joy-pride, identification, levity-gaiety, general-gaiety, empathy, positive-concern, compatibility, kindheartedness and buck-fever. Note that the taxonomies used are the most recent for WND and WNA (i.e. WND 3.2 and WNA1.1).

WN mapping files have also been considered for the alignment of WNA to WN. WNA1.1 has a peculiar feature: only nouns are linked to affective labels. Adjectives, verbs and adverbs are therefore linked to these nouns at sense level by means of derived relations (i.e. entailment and cause). The description of this issue can be found in the section "3.1 Integration process".

Furthermore, when evaluating the SWN integration based on WN1.6 as a nucleus, it was found that the mismatching of SWN3.0 alignments occurred because this resource was developed on the basis of a very different WN version, WN3.0. It is therefore possible to find several SWN features annotated with WN3.0 that do not exist for WN1.6. This justifies the lost links shown in Table 6.

Table 6 also shows how the integration of XWN1.7 reached 91.70%, while XWN3.0 reached a lower integration of 28.40%. A similar phenomenon occurred with SWN3.0, in which several WN mapping versions were involved. This is responsible for the low enrichment of ISR-WN as regards XWN3.0 when WN1.6 is used as a nucleus. Graph-based approaches, such as (Gutiérrez, 2012), which uses ISR-WN with XWN3.0, cannot therefore obtain the same relevant results as those obtained using XWN1.7.

Once ISR-WN is loaded with WN1.6, it contains a total of 178,558 vertices, of which 99,643 represent WN synsets, 172 WND domains, 568 SUMO categories, 300 WNA affective labels, 1,231 SC semantic classes and, finally, 76,644 SWN descriptions. ISR-WN therefore includes a total of 2,326,211 semantic relations with WN1.6 as a nucleus.
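As a quick sanity check of the figures in Table 6, the short sketch below recomputes the "Linked %" column and its mean directly from the "Total of synsets to link" and "Total of linked synsets" rows; the counts are simply copied from the table.

```python
# Recompute the "Linked %" column and its mean for Table 6 (WN1.6 nucleus);
# the counts below are copied from the table.
to_link = {"WND": 122_801, "SUMO": 102_550, "WNA": 5_089, "SC": 78_152,
           "SWN": 117_659, "XWN1.7": 551_551, "XWN3.0": 419_387}
linked = {"WND": 122_801, "SUMO": 102_550, "WNA": 4_244, "SC": 78_152,
          "SWN": 76_644, "XWN1.7": 505_755, "XWN3.0": 119_123}

linked_pct = {name: 100.0 * linked[name] / to_link[name] for name in to_link}
for name, pct in linked_pct.items():
    print(f"{name}: {pct:.2f}%")
print(f"Mean: {sum(linked_pct.values()) / len(linked_pct):.2f}%")  # ~81.23%
```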
4.2 Evaluation based on WN2.0 as a nucleus
Table 7 shows a comparative study similar to Table 6, but here WN2.0 is considered as the nucleus.

| | WND 3.2 | SUMO | WNA 1.0-1.1 | SC | SWN 3.0 | XWN 1.7 | XWN 3.0 | Mean |
| # Labels | 172 | 568 | 300 | 1,231 | 117,659 | - | - | |
| Synsets to link (n) | 103,504 | 79,688 | 1,256 | 66,025 | 82,114 | - | - | |
| Synsets to link (a) | 19,398 | 18,564 | 2,418 | - | 18,157 | - | - | |
| Synsets to link (v) | 19,398 | 13,507 | 801 | 12,127 | 13,767 | - | - | |
| Synsets to link (r) | 3,835 | 3,663 | 614 | - | 3,621 | - | - | |
| Total of synsets to link | 146,135 | 115,422 | 5,089 | 78,152 | 117,659 | 551,551 | 419,387 | |
| Linked synsets (n) | 103,504 | 79,688 | 1,089 | 65,904 | 78,061 | - | - | |
| Linked synsets (a) | 19,398 | 18,564 | 2,118 | - | 11,052 | - | - | |
| Linked synsets (v) | 19,398 | 13,507 | 473 | 12,064 | 13,207 | - | - | |
| Linked synsets (r) | 3,835 | 3,663 | 580 | - | 3,428 | - | - | |
| Total of linked synsets | 146,135 | 115,422 | 4,260 | 77,968 | 105,748 | 551,065 | 358,843 | |
| Difference | 0 | 0 | 829 | 184 | 11,911 | 486 | 60,544 | |
| Linked % | 100.00 | 100.00 | 83.71 | 99.76 | 89.88 | 99.91 | 85.56 | 94.11 |
Table 7. Synsets linked to each resource by using WN2.0 as a nucleus.

As can be seen in Table 7, SC has some missing links in ISR-WN, since it was developed using WN1.6 and has to be connected to WN2.0 through mapping files. When this occurs, some synsets are not considered, since they do not exist in both versions (1.6 and 2.0). It will, however, be noted that the SWN links increase in comparison with the previous distribution table, because fewer mapping versions are involved. WNA 1.0 and WNA 1.1 have been presented together in both tables, since they are treated as a single resource. All the WNA1.1 labels and the synset-label relations from WNA 1.0 and 1.1 have therefore been considered; only those WNA1.0 labels that do not exist in the WNA1.1 taxonomy have been left out. Both versions of WNA are thus considered to be a single resource.

As Table 7 shows, when WN2.0 is used as a nucleus the incorporation of XWN increases significantly, particularly in the case of XWN3.0. It is most appropriate to use ISR-WN with WN2.0, since the mean integration percentage is 94.11%, whereas for WN1.6 it is 81.23%. The best integration of WN-based resources therefore occurs when WN2.0 is taken into account. In this case, 223,448 vertices have been computed, of which 115,424 are WN synsets, 172 are WND domains, 568 are SUMO categories, 300 are WNA affective labels, 1,231 are SC labels and 105,748 are SWN descriptions. All these vertices are connected by 2,893,838 semantic relations.

5. NLP tasks description
As mentioned before, our goal in this work is to provide a semantic framework able to enrich texts with semantic information and to use this new information to classify texts, obtain opinions or recommend similar texts in open domains. To do this, we use ISR-WN as the knowledge source for different NLP tasks. Basically, we have conducted four different tasks to add new information from different points of view:
1- Word Sense Disambiguation (WSD). We need to obtain the correct sense of each word in order to obtain accurate results. Since a word has different meanings depending on the context, it is necessary to decide which word sense is the correct one, because this determines a proper understanding of the text.
2- Semantic similarity. The goal of this task is to discover similar content. By considering the words, senses, content structure, etc., we are able to find similar opinions that may be expressed with different words while keeping a similar meaning.
3- Domain classification. This task provides the most relevant domains of a text or a set of texts.
This information will be useful to annotate texts in order to recommend similar content.
4- Sentiment analysis. The goal of this task is to automatically decide whether an opinion is positive or negative.
Moreover, many researchers have used the semantic features that ISR-WN provides to deal with different NLP tasks. Some of these tasks and their contributions are shown below. Notice that most semantic NLP challenges are based on these WN versions, so this integrated resource is very useful for dealing with such tasks (see http://senseval.org/).

Word Sense Disambiguation (WSD) approaches: In order to create effective NLP systems it is necessary to transform the information extracted from the words in plain text into a conceptual level so as to detect meaningful word senses. In WSD the goal is to determine the senses of the words in a text, where a word may have different senses depending on the context in which it appears. The main purpose of the approaches listed below is therefore to automatically choose the intended sense (meaning) of a word in a particular context. Two types of approaches can be used to this end: knowledge-based approaches (Zhibiao and Martha, 1994), (Leacock and Chodorow, 1998) and corpus-based approaches (Peter, 2001). The former require lexical resources such as WordNet, Roget's Thesaurus 24, BabelNet 25, DBpedia 26, etc. to obtain semantic similarities, while the latter use co-occurrences to measure the similarity between words. In our case, by conducting knowledge-based approaches through ISR-WN, we find all the possible synsets for a word in WordNet together with their links to concepts of the aforementioned resources (i.e. WND, WNA, SUMO, SC), which makes it possible to group word synsets under semantic concepts and to distinguish between them depending on the context. For example, for the WND concept "Sport" we can find three word senses of the word "exercise", two for "School", one for "Sociology" and one for "Pedagogy". Hence, if we determine the context of the sentence in which "exercise" is being used, we can reduce the number of possible meanings to assign. The advantage of using ISR-WN is that we can apply this procedure not only with WND; we can also use the rest of the conceptual resources on the same platform.

WSD can also be considered a text semantic enrichment task, since by applying this technique, and depending on the knowledge base used, the processed text can be tagged or linked with different sources. This phenomenon can be associated with the Entity Linking task, in which a textual entity mention is matched against a knowledge base. This entity mention can be a Wikipedia page, a DBpedia entry, or a specific URI that identifies an entity, that is, a canonical entry for that entity (Rao et al., 2013). The reason why we introduce the Entity Linking task here is that BabelNet can also be integrated with ISR-WN in order to deal with it, even for different languages. Such is the case of (Gutiérrez et al., 2013) in Semeval-2013 Task 12 "Multilingual Word Sense Disambiguation" 27. A list of WSD approaches that have been supported by the conceptual integration of ISR-WN is described below.
Among them, the second approach does not just describe a word sense disambiguation method; it also describes how, by using ISR-WN, relevant conceptual trees of every conceptual resource integrated in ISR-WN can be obtained. These trees can be interpreted as a way of obtaining the concepts that classify a text. For example, for the sentence taken from an IMDb 28 user review "Grey's Anatomy stars Ellen Pompeo (who has starred in a few movies but was never really noticeable) as the narrator - Meredith Grey", this approach based on WND classifies the text with the following concepts: ["Cinema", "Art", "Humanities"]. The following research works, which are based on ISR-WN, support the tasks mentioned:
 Participation of the UMCC-DLSI research team in Semeval-2010 Task 17: All-words Word Sense Disambiguation on a Specific Domain (Agirre et al., 2010). A description of this approach can be found in (Gutiérrez et al., 2010b).
 Research paper (Gutiérrez et al., 2011b), presented at the RANLP 2011 conference 29, which combined ISR-WN (considering WN, WND, WNA and SUMO) and word sense frequencies to solve Word Sense Disambiguation. This proposal attained higher results in comparison with the top proposals at a world level.
 Research paper (Gutiérrez et al., 2011d), presented at the NLDB 2011 conference 30, which used ISR-WN (considering WN, WND, WNA and SUMO) in a graph-based approach to deal with Word Sense Disambiguation. This proposal attained important results in comparison with the reported submissions of Senseval-2 (Cotton et al., 2001).
 Participation of the UMCC-DLSI research team in Semeval-2013 Task 12 "Multilingual Word Sense Disambiguation" 31. This approach attained the first position in the ranking. A description of it can be found in (Gutiérrez et al., 2013).

In order to show the results obtained by WSD approaches supported by ISR-WN, we describe the two types used in this work: graph-based (Gutiérrez et al., 2013) and tree-based (Gutiérrez et al., 2011b). The graph-based approach takes the whole ISR-WN graph as its configuration; it thus includes all ISR-WN resources and their links when applying WSD. As part of the evaluation of this approach we participated in the Semeval-2013 Task 12 "Multilingual Word Sense Disambiguation" campaign. For example, for evaluating WSD in English, 1,931 instances of BabelNet, 1,242 of Wikipedia and 1,644 of WordNet were considered. Note that in this case ISR-WN and BabelNet were aligned through WordNet, since WordNet is common to both. Based on the corpus 32 provided by Task 12 "Multilingual Word Sense Disambiguation", our WSD approach was placed at the top of the campaign ranking, reaching an F1 of around 68.5%, 54.6% and 64.7% for BabelNet, Wikipedia and WN respectively. For more details about the approach and its experiments in the Semeval-2013 campaign see (Gutiérrez et al., 2013).

24 http://www.thesaurus.com/roget
25 http://babelnet.org/
26 http://dbpedia.org
27 http://www.cs.york.ac.uk/semeval-2013/task12/
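As a rough illustration of the alignment step mentioned above, the sketch below shows how BabelNet entries could be connected to ISR-WN through the WordNet senses they contain, since WordNet is the common element of both resources. The identifiers and the dictionary-based representation are hypothetical toy data; the actual alignment used in (Gutiérrez et al., 2013) operates over the full resources.

```python
from typing import Dict, List

# Hypothetical miniature inventories: BabelNet ids -> WordNet sense keys they group,
# and ISR-WN nodes indexed by WordNet sense key (all ids are invented for the example).
babelnet = {
    "bn:mother-example-n": ["mother%1:18:00::"],
    "bn:surgeon-example-n": ["surgeon%1:18:00::"],
}
isr_wn_nodes = {"mother%1:18:00::": "WN:mother.n.01", "surgeon%1:18:00::": "WN:surgeon.n.01"}

def align_babelnet_to_isrwn(babelnet: Dict[str, List[str]],
                            isr_wn_nodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Link each BabelNet entry to the ISR-WN nodes of the WordNet senses it contains."""
    alignment: Dict[str, List[str]] = {}
    for bn_id, senses in babelnet.items():
        targets = [isr_wn_nodes[s] for s in senses if s in isr_wn_nodes]
        if targets:
            alignment[bn_id] = targets
    return alignment

print(align_babelnet_to_isrwn(babelnet, isr_wn_nodes))
```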
| Experiment | Configuration | Precision | Recall | F1 | F1 difference with the best system |
| Corpus d00.txt [648 instances] | | | | | |
| Exp 1 | MFS | 0.565 | 0.564 | 0.564 | |
| Exp 2 | individual resource (WN, WNA, SUMO or WND) | 0.572 | 0.572 | 0.572 | |
| Exp 3 | individual resource (WN, WNA, SUMO or WND) | 0.561 | 0.560 | 0.560 | |
| Exp 4 | individual resource (WN, WNA, SUMO or WND) | 0.555 | 0.554 | 0.554 | |
| Exp 5 | individual resource (WN, WNA, SUMO or WND) | 0.572 | 0.572 | 0.572 | |
| Exp 6 | Voting, all ISR-WN as configuration (WN, WNA, SUMO, WND and MFS) | 0.575 | 0.575 | 0.575 | |
| Whole corpus (d00.txt [648 instances], d01.txt [1032 instances], d02.txt [757 instances]) | | | | | |
| Exp 7 | MFS | 0.601 | 0.599 | 0.600 | 0.090 |
| Exp 8 | Voting, all ISR-WN as configuration (WN, WNA, SUMO, WND and MFS) | 0.610 | 0.609 | 0.609 | 0.081 |
| Best system of Senseval-2 | | 0.690 | 0.690 | 0.690 | |
| Average of the Senseval-2 results | | 0.499 | 0.360 | 0.391 | |
| Worst system of Senseval-2 | | 0.370 | 0.345 | 0.357 | |
Table 8. Evaluation results of the tree-based WSD approach with Senseval-2 corpora.

28 http://www.imdb.com/
29 http://lml.bas.bg/ranlp2011/
30 http://gplsi.dlsi.ua.es/congresos/nldb11/
31 http://www.cs.york.ac.uk/semeval-2013/
32 https://www.cs.york.ac.uk/semeval-2013/task12/index.php%3Fid=data.html

In order to show how each resource can be used individually (i.e. WN, WNA, SUMO and WND), we describe another WSD approach: the tree-based one (Gutiérrez et al., 2011b). Table 8 shows its evaluation, using the corpora provided by Rada Mihalcea's public repository. This repository includes 2,447 instances from the Senseval-2 competition "The English All-Words Task" (Cotton et al., 2001). Table 8 shows the results obtained with different configurations: each resource individually or all of them together. Note that the most frequent word sense (MFS) is also considered as a dimension for the voting. More details can be found in (Gutiérrez et al., 2011b).

Opinion Mining approaches: Nowadays, Opinion Mining (OM), also known as Sentiment Analysis (SA), has become a popular NLP discipline owing to its close relation to studies of social media behaviour. OM is commonly used to analyse the comments that people post on social networks; it also allows the preferences and criteria of users regarding situations, events, products, brands, etc. to be identified. In order to exploit the potential that ISR-WN provides through the integration of semantic, conceptual, affective and sentiment-scoring resources, several approaches have been produced for the detection of opinions and relevance, for polarity classification, and also for measuring the impact of sentiment polarity detection on Textual Entailment tasks. These approaches are the following:
 The multidimensional point of view of ISR-WN has proved useful in the Opinion Mining area. In this case it was presented at the WASSA'11 Workshop 33, dealing with the three opinion mining tasks of the MOAT (NTCIR Multilingual Opinion Analysis Task 34) competition. The first task consists of identifying whether or not a sentence represents an opinion, another involves identifying the polarity of the opinion sentence, while the last one consists of aligning questions with topics. This proposal used ISR-WN to extract the relevant concepts that represent the sentences analysed. The Opinion Mining tasks were carried out by taking these concepts and linking them with SWN (in Enriched ISR-WN) sentiment polarities. This research attained relevant results, comparable with the first places attained in the MOAT campaign (Gutiérrez et al., 2011c).
 The research paper "Approaching Textual Entailment with Sentiment Polarity" (Fernández et al., 2012b), discussed at the ICAI'12 conference (The 2012 International Conference on Artificial Intelligence).
This work takes ISR-WN as a knowledge base and uses the (Gutiérrez et al., 2011c) approach to determine Textual Entailment with Opinion Mining techniques.

Semantic Textual Similarity (STS) approaches: STS is related to the Textual Entailment 35 (TE) and Paraphrase 36 tasks. The main difference is that STS assumes bidirectional graded equivalence between the pair of textual snippets. In the case of TE, the equivalence is directional (e.g. a student is a person, but a person is not necessarily a student). In addition, STS differs from TE and Paraphrase in that, rather than being a binary yes/no decision, STS is a graded notion of similarity (e.g. a student is more similar to a person than a dog is to a person). This graded bidirectional notion is useful for NLP tasks such as Machine Translation (MT), Information Extraction (IE), Question Answering (QA) and Summarization. Several semantic tasks could be added as modules in the STS framework, such as WSD and Induction, Lexical Substitution, Semantic Role Labelling, Multiword Expression detection and handling, Anaphora and Co-reference resolution, Time and Date resolution and Named Entity recognition, among others. Below we list different participations in international competitions in which STS systems made use of ISR-WN as a semantic nucleus, allowing textual similarities to be judged from multidimensional perspectives, such as domain and affective points of view, among others. The following research works support STS tasks based on ISR-WN:
 Participation of the UMCC-DLSI research team in Semeval-2012 Task 6 "Semantic Textual Similarity" (Agirre et al., 2012). A description of this approach can be found in (Fernández et al., 2012a).
 Participation of the UMCC-DLSI research team in Semeval-2013 Task 5 37. "UMCC_DLSI-(EPS): Paraphrases Detection Based on Semantic Distance" (Dávila et al., 2013), discussed at the Second Joint Conference on Lexical and Computational Semantics (*SEM). It took ISR-WN as a knowledge base to determine the semantic similarity of different short texts.
 Participation of the UMCC-DLSI research team in Semeval-2012 Task 6 "Semantic Textual Similarity" 38. A description of this approach can be found in (Chávez et al., 2013).

33 http://gplsi.dlsi.ua.es/congresos/wassa2011/
34 http://research.nii.ac.jp/ntcir/ntcir-ws8/ws-en.html
35 http://aclweb.org/aclwiki/index.php?title=Recognizing_Textual_Entailment
36 The adaptation or alteration of a text or quotation to serve a different purpose from that of the original.
37 http://www.cs.york.ac.uk/semeval-2013/task5/
38 http://ixa2.si.ehu.es/sts/

Ongoing approach: In order to capture the semantics of a textual question, we intend to use the aforementioned approaches (WSD and STS), supported by ISR-WN, to interpret both questions written in natural language and the descriptions of ontology nodes (also considering the names of the adjacent nodes) in order to find further alignments with existing ontologies. Reusing the work below, which was the first approach to use ISR-WN in this way, we would be able to recover those ontology nodes whose textual descriptions are relevant to the textual question. The following research is part of this ongoing approach.
 The research paper "Semantic Information extraction method on ontologies" (Dávila et al., 2012), discussed at SEPLN'12 (XXVIII Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural), in which semantic ontology searches are applied on the basis of ISR-WN and natural language adaptation.
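Before moving on to the case study, the following toy sketch illustrates the graded (rather than binary) similarity notion used by the STS approaches above: a sentence pair is scored on a 0-5 scale instead of receiving a yes/no decision. It is only an illustrative stand-in based on plain word overlap; the cited systems (Fernández et al., 2012a; Dávila et al., 2013) combine many lexical-semantic features extracted from ISR-WN with a machine learning model.

```python
def graded_similarity(text_a: str, text_b: str) -> float:
    """Toy STS score in [0, 5]: 5 = semantically equivalent, 0 = unrelated.

    Uses Jaccard word overlap only; real STS systems add semantic features.
    """
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    return round(5 * jaccard, 2)

print(graded_similarity("a student is a person", "a student is a person"))  # 5.0
print(graded_similarity("a student is a person", "a dog chased the cat"))   # low score
```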
Next, we describe the annotation process through a case study taken from IMDb.

6. Case study
In order to enrich texts using ISR-WN, we have developed different approaches to automatically process textual information and obtain the semantic knowledge associated with each word or with the general context. As a result, we provide a semantic framework suitable for use as a support tool for content-based recommender systems, which is able to find content that is similar to content already evaluated by the users by considering semantic information. In this section we illustrate the annotation process with an example obtained from IMDb 39 (Internet Movie Database) and explain the different approaches used to conduct it.

Our experiments are carried out using textual information from IMDb. IMDb is the world's most popular and authoritative source for movie, TV and celebrity content. This website has more than 250 million unique monthly visitors and offers a searchable database of more than 185 million data items, including more than 3 million movies, TV and entertainment programs and more than 6 million cast and crew members. As mentioned above, IMDb provides a lot of information related to movies, TV shows or TV series. However, since we are interested in people's interests and opinions, we focus our attention on the reviews & ratings section. For each movie or TV series, this section contains a list of reviews from different users who give their opinions about general concepts, feelings, likes and dislikes, etc. Our purpose is to collect the textual information provided by the reviews of different users about TV series or movies in order to enrich it with semantic information and thus be able to rate or recommend similar content according to people's feelings, interests or likes. With this case study, we aim to demonstrate the practical use of our proposed resource.

39 http://www.imdb.com

7. Semantic enrichment approaches
In this section, we briefly describe how to enrich textual information by using ISR-WN. As mentioned in Section 3, we have integrated a set of different resources in order to obtain new relations and are therefore able to build new connections. With ISR-WN we can enrich texts in different dimensions using affective labels, domain labels, semantic classes, etc. To conduct our experiments and annotate texts we have developed a framework that uses different techniques in order to take into account each dimension of ISR-WN.

7.1 WSD
First of all, it is important to extract the correct senses of the words. To do this, our framework uses an unsupervised multilingual WSD approach. It works over a window of words and applies a modification of the Personalized PageRank (Ppr) algorithm (Agirre and Soroa, 2009), using the ISR-WN semantic network as its knowledge base. In more detail, it uses sense frequencies to rank the synsets of each lemma (in descending order) according to a calculated relevance factor (the Ppr+Freq algorithm; see (Gutiérrez et al., 2013) for more details). This approach was evaluated in Semeval-2013, obtaining the best results of that campaign.
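The sketch below illustrates the general idea behind this kind of Personalized PageRank disambiguation over a semantic graph: the candidate senses of the context words are injected as the personalization vector, the graph is ranked, and each target lemma keeps its best-ranked sense, biased by its sense frequency. The graph, sense identifiers and frequency values are invented for the example; the actual Ppr+Freq algorithm of (Gutiérrez et al., 2013) works over the full ISR-WN network, so this is only a minimal approximation of the idea.

```python
import networkx as nx

# Tiny stand-in for the ISR-WN semantic network (invented nodes and edges).
graph = nx.Graph()
graph.add_edges_from([
    ("mother.n.01", "parent.n.01"), ("parent.n.01", "person.n.01"),
    ("surgeon.n.01", "doctor.n.01"), ("doctor.n.01", "person.n.01"),
    ("surgeon.n.01", "Medicine"), ("doctor.n.01", "Medicine"),
    ("mother.n.05", "chemistry.n.01"),      # a competing, unrelated sense
])
candidate_senses = {"mother": ["mother.n.01", "mother.n.05"],
                    "surgeon": ["surgeon.n.01"]}
sense_frequency = {"mother.n.01": 0.9, "mother.n.05": 0.1, "surgeon.n.01": 1.0}

def ppr_freq_wsd(context_lemmas, graph, candidate_senses, sense_frequency, alpha=0.85):
    """Rank senses with Personalized PageRank seeded by all context senses,
    then combine the rank with the sense frequency and keep the best sense."""
    seeds = {s: 1.0 for lemma in context_lemmas for s in candidate_senses.get(lemma, [])}
    rank = nx.pagerank(graph, alpha=alpha, personalization=seeds)
    chosen = {}
    for lemma in context_lemmas:
        senses = candidate_senses.get(lemma, [])
        if senses:
            chosen[lemma] = max(
                senses, key=lambda s: rank.get(s, 0.0) * sense_frequency.get(s, 1e-6))
    return chosen

print(ppr_freq_wsd(["mother", "surgeon"], graph, candidate_senses, sense_frequency))
```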
An example of the results obtained after processing a text fragment, taken from IMDb's user reviews, is shown in the next table:

Original fragment (format: [word][(word position in the sentence)]):
Her mother(1) is(2) the famous(3) surgeon(4) and she is trying(5) to follow(6) in her mother's(7) footsteps(8)

WSD results (format: [position of the word in the sentence] [word sense key suggested as appropriate in this context] [gloss of the suggested word sense]):
(1) mother%1:18:00:: "a woman who has given birth to a child (also used as a term of address to your mother); "the mother of three children""
(2) be%2:42:03:: "have the quality of being; (copula, used with an adjective or a predicate noun); "John is rich"; "This is not a good answer""
(3) famous%5:00:00:known:00 "widely known and esteemed; "a famous actor"; "a celebrated musician"; "a famed scientist"; "an illustrious judge"; "a notable historian"; "a renowned painter""
(4) surgeon%1:18:00:: "a physician who specializes in surgery"
(5) try%2:41:00:: "make an effort or attempt; "He tried to shake off his fears"; "The infant had essayed a few wobbly steps"; "The police attempted to stop the thief"; "He sought to improve himself"; "She always seeks to do good in the world""
(6) follow%2:38:00:: "to travel behind, go after, come after; "The ducklings followed their mother around the pond"; "Please follow the guide through the museum""
(7) mother%1:18:00:: "a woman who has given birth to a child (also used as a term of address to your mother); "the mother of three children""
(8) footstep%1:23:00:: "the distance covered by a step; "he stepped off ten paces from the old tree and began to dig""
Table 9: Disambiguation example

As Table 9 shows, each word (nouns, verbs, adjectives and adverbs) is disambiguated. For each word, the annotation process obtains the lemma and word sense (e.g. mother%1:18:00::) and a gloss or definition with examples ("a woman who has given birth to a child (also used as a term of address to your mother); "the mother of three children""). With this information we are able to discriminate among a set of senses and understand the meaning of words when they appear in different contexts.

7.2 Concepts and Polarities
In order to obtain relevant concepts and extract sentiment information, we propose an unsupervised knowledge-based approach, described in the previous section, that uses the RST (Relevant Semantic Trees) technique (Gutiérrez et al., 2011c). The result is a set of relevant semantic trees associated with each sentence, which provides sentiment polarity values. An example of the relevant semantic tree technique based on the WordNet Domains resource is shown in Fig 9, where the number in parentheses next to each concept indicates its order of relevance, from 1 to 7, according to the following fragment.

Fig 9. Relevant Semantic Tree (RST) based on WordNet Domains (WND)

This tree represents the most relevant domains related to the following text fragment, extracted from a review of Grey's Anatomy from IMDb: "…The story revolves around her time as an intern and the people she meets and sort of is portrayed as "survival camp for medical students." The minute she arrives at work, she meets Christina Yang (Sandra Oh - flawless in her bitchy supporting role), George O'Malley (T.R.
Knight - one word: breakthrough performer), Izzie Stevens (Katherine Heighl - very, very believable as a model who is more like the girl-next-door), and Alex Karev (Justin Chambers - plays sort of a not-so-likable person). Most of all, there is Dr. Derek Shepherd (Patrick Dempsey - very attractive), the man that Meredith had a one-night stand with - he just happens to be her boss. This is a show that wants to be liked….". As we can see in Fig 9, there are seven domains that are closely related to the meaning of the context, starting from Person to Telecommunications. Moreover, WND has a hierarchy in which all the domain labels (172 labels) are connected, and Fig 9 also shows how the relevant domains are arranged in that hierarchy. This information is useful for classifying texts into different categories and for obtaining related information through a previous categorization.

7.3 Semantic Textual Similarity
The last step to enrich textual information is to extract semantic textual similarities. Our approach is a Machine Learning System (MLS) that uses several algorithms to extract all the features: similarity measures, lexical-semantic alignment and semantic alignment (Fernández et al., 2012a). In order to extract the semantic features, it uses the multidimensional resource ISR-WN. Our STS approach thus provides a value scale with which to decide whether a pair of contexts can be considered semantically similar or not. The scale has 5 values that go from 1 to 5, where 1 indicates that there is no semantic relation between two contexts and 5 indicates that a pair of contexts is semantically equivalent. See the evaluation described in the next section.

8. Experiment setup and results
Since we need real texts to evaluate our framework, our experiments have been conducted using texts from IMDb user reviews. These texts contain information related to user feelings, expectations, complaints, acceptance, etc. Moreover, each review has an associated overall rating. In order to achieve comprehensive results we have extracted information related to medical TV series: House M.D., Grey's Anatomy, etc. From our data set, we have randomly selected 3 reviews to show how texts are annotated. Table 10 shows a brief part of the original reviews that have been annotated.

| TV series/Movies | Original Texts |
| Grey's Anatomy | If you think ABC can't get any better - you're wrong. With the great success over smash-hits, "Desperate Housewives" and "Lost" they also picked up a few shows over mid-season hoping for more success. They got "Jake in Progress," "Eyes" and "Grey's Anatomy" - but "Grey's Anatomy" is definitely considered the best out of those three. Grey's Anatomy stars Ellen Pompeo (who has starred in a few movies but was never really noticeable) as the narrator - Meredith Grey. Her mother is the famous surgeon and she is trying to follow in her mother's footsteps.… |
| House M.D. | Let me put it simply. I am a physician, and as an inviolable rule, I HATE medical shows. Granted, TV series tend to be one dimensional, due to inherent difficulties in the genre, but "doctor shows" are something I avoid like the proverbial plague. And then one evening I caught "House, MD" and was completely drawn into the show. In House I find the anti-hero that I've been waiting for in a medical show.
The guy who knows everything, but is wrong often enough to keep us all guessing. I enjoy the contrast of House and his cadre of young fresh faced colleagues… |
| Tomorrowland | When the director of "The Incredibles" signed for this film, I was looking forward to the same amount of humour and exhilaration present in that animated masterpiece, something similar to what the little boy expresses when he realizes he can run on water. Nothing remotely close occurs in "Tomorrowland", a film that suffers from having too big a budget and hardly any original or exciting thoughts. It is also hindered by the fact that almost all of the actors appear clueless and not quite matching their characters. There's something about George Clooney… |
Table 10: Original reviews obtained from IMDb

IMDb provides the overall rating for each TV show and movie in its database. For Grey's Anatomy the overall rating is 7.7/10 from 140,600 users, for House M.D. it is 8.9/10 from 261,929 users and for Tomorrowland 6.7/10 from 39,939 users. After processing each text we obtain a set of different features that we use to enrich the original information. Table 11 shows the results of the annotation with affective labels and domain labels, including ratings of positiveness (P), negativeness (N) and objectiveness (O), respectively:

| Title | Affects | P | N | O | Domains | P | N | O |
| Grey's Anatomy | love, annoyance, liking, anger, anxiety | 0.82 | 0.18 | 0.64 | Person, Theatre, Cinema, Art, Heraldry, Medicine, Telecommunication | 0.72 | 0.28 | 0.93 |
| House M.D. | sensation, love, wonder | 0.73 | 0.27 | 0.68 | Psychology, Telecommunication, Humanities, Literature, Theatre, Medicine, Person | 0.72 | 0.28 | 0.94 |
| Tomorrowland | closeness, belonging, joy, behaviour, exhilaration, sensation | 0.49 | 0.51 | 0.67 | Racing, Telecommunication, Sociology, Theatre, Cinema, Sport, Radio+Tv | 0.50 | 0.50 | 0.96 |
Table 11: Results of the annotation of three IMDb reviews

Once the annotation process is complete, we are able to infer whether a review is positive or negative and also which are the general domains of each context. This information is useful if we want to recommend similar shows to other people, because we can analyse the general content and extract common domains. Moreover, we are able to rate each show according to people's reviews by gathering whether they are positive, negative or objective. Fig 10 shows the annotation of sentiment positiveness of each IMDb review in comparison with the real IMDb rating:

Fig 10. Annotation of sentiment positiveness of three IMDb reviews
(The figure plots, for each of the three reviews, the positiveness obtained with each dimension (Affect, Domain, SUMO, Semantic Class, Synset and their mean) against the real IMDb ratings of 0.77, 0.89 and 0.67, respectively.)

As we can observe, our predictions for Grey's Anatomy and Tomorrowland are very similar to the real IMDb rating. In these cases the textual enrichment process obtains very good results, and the semantic class and SUMO dimensions are close to the real ranking. In the case of House M.D. there are two points of difference between our prediction and the real rating. This is because the selected review mixes negative and positive reactions to the show. For example, it begins by saying "I HATE medical shows" but then says "I enjoy the contrast of House and his cadre of young fresh faced colleagues, complete with starched white lab coats, who struggle as much with their professionally imposed constraints, and sense of decorum, as they do with his personality". In this case it is therefore important to remark that, despite the negative aspects, our framework is able to establish that there are positive aspects that must be taken into account. Another dimension with which to enrich texts is provided by the STS (Semantic Textual Similarity) approach.
In STS, three types of features are used: syntactic (similar words, syntactic distances, etc.), sentiment and semantic. As a result, we are able to identify similar texts by measuring them on a scale from 1 to 5 (where 1 indicates that there is no semantic relation between a pair of contexts and 5 indicates that both are semantically equivalent). Table 12 shows the results obtained after applying STS. The results indicate that there is a slight relation between Grey's Anatomy and House M.D. (1.53) and no relation between Grey's Anatomy and Tomorrowland (0).

| Semantic Textual Similarity | Grey's Anatomy | House M.D. | Tomorrowland |
| Grey's Anatomy | - | 1.53 | 0.00 |
| Machine Learning technique | Support Vector Machine 40 | | |
| Approach | STS approach described in Section 7 | | |
Table 12: STS results to measure context similarities

40 http://www.support-vector-machines.org/

In order to demonstrate how our framework would help to predict the rating with a set of multiple user reviews, we present a more exhaustive evaluation over the first 10 IMDb reviews of each of the TV shows mentioned. Table 13 lists the title of each review considered in these evaluations and its rating, which was manually set by the user.

| Title | No. | Review title | IMDb Rating (0-1) |
| Grey's Anatomy | 1 | So far, so good | 1 |
| | 2 | Addictive show | 0.8 |
| | 3 | Excellent Series | 1 |
| | 4 | Waste of Time | 0.4 |
| | 5 | Boring! | 0.1 |
| | 6 | This show is absolutely wonderful!!! | 1 |
| | 7 | Wonderful! | 1 |
| | 8 | Can someone explain me WHY this is a Top 10 Show? | 0.4 |
| | 9 | Started well, now its just more of the same... | 0.6 |
| | 10 | I'm Done--and SHAME on me for taking so long | 0.1 |
| House | 11 | Abrasive medical doctor saves lives - no matter what the cost. | 1 |
| | 12 | Extremely formulaic fantasy medical detective show | 0.1 |
| | 13 | Ingenious and Compelling!!! | 1 |
| | 14 | The Greatest Medical Show Ever | 1 |
| | 15 | Best Show Ever! | 1 |
| | 16 | Watch it for the characters, not the medicine! | 0.7 |
| | 17 | One of the best things on the box | 0.9 |
| | 18 | Very Good and Getting Better | 0.7 |
| | 19 | Everybody lies... | 1 |
| | 20 | Pllllllease watch a full episode. | 1 |
| Tomorrowland | 21 | Do not listen to the critics on this one! A vastly under-appreciated tale of promise and hope. | 1 |
| | 22 | A fantastic sci-fi experience | 1 |
| | 23 | A spectacular Disney sci-fi thrill ride | 0.9 |
| | 24 | Misunderstood | 0.9 |
| | 25 | Why do people hate this movie? | 1 |
| | 26 | Maybe a bit childish but enjoyable sci-fi tale none the less. | 0.8 |
| | 27 | Upbeat positive story for the whole family. | 0.8 |
| | 28 | A great movie! | 0.9 |
| | 29 | Fun kid safe sci fi movie that looks great | 0.6 |
| | 30 | Offensively shallow | 0.1 |
Table 13. List of reviews used for automatically rating these three shows

Once the 30 reviews had been processed, Positive, Negative and Objective scores were obtained by the tree-based OM approach (Gutiérrez et al., 2011c), using as configuration: WNA (Affect), WND (Domain), SUMO, SC (Semantic Class) and WN (Synsets). Moreover, an additional experiment was conducted to take into account the Harmonic Mean 41 of all the positive outputs per review. As Table 14 shows, different types of evaluation were performed. One evaluation measure is the Pearson correlation, which compares the real rating with the positiveness automatically obtained by our framework. Another one is the combination of all the outputs with the Harmonic Mean, with which we reach a correlation of 69%.
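As an illustration of how these two evaluation figures are computed, the snippet below combines the per-resource positiveness scores of a review with the harmonic mean and then correlates the combined scores of several reviews with the user ratings (Pearson). The three example reviews and their scores are taken from Table 14 (reviews 1, 4 and 5 of Grey's Anatomy); the helper functions are just the standard formulas, not part of the framework itself, and the tiny three-review correlation shown here is not the 69% figure reported over all 30 reviews.

```python
from statistics import harmonic_mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Positiveness per resource (Affect, Domain, SUMO, Semantic Class, Synset)
# for reviews 1, 4 and 5 of Grey's Anatomy, as listed in Table 14.
per_resource_P = {1: [0.81, 0.78, 0.59, 0.69, 0.58],
                  4: [0.55, 0.52, 0.44, 0.55, 0.56],
                  5: [0.25, 0.64, 0.52, 0.58, 0.66]}
user_rating = {1: 1.0, 4: 0.4, 5: 0.1}

combined = {k: harmonic_mean(v) for k, v in per_resource_P.items()}
print({k: round(v, 2) for k, v in combined.items()})   # matches the Harmonic Mean P column
print(round(pearson(list(combined.values()), list(user_rating.values())), 2))
```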
With regard to each individual resource, we can see that WNA (Affect) obtains the best correlation results. This means that this resource provides better knowledge for dealing with OM challenges, because WNA conceptualizes and links only affective words. On the other hand, Table 14 also shows the mean difference between the real rating and the positiveness obtained by using each resource; WNA reaches the smallest mean difference, with a margin of 0.22 points. It is very difficult to compare this kind of review with OM system outputs because users often set dramatic scores (e.g. 1 or 2) even though their reviews are not as harsh as their scores; for example, consider IMDb review number 12 of Table 13, titled "Extremely formulaic fantasy medical detective show", with a score of 0.1. Note the cases in Table 14 in which the user ratings are very low. In those cases our OM functionality is able to identify that the score should be lower than the others, but it is not able to establish by how much. Thus, as future work we propose to carry out new evaluations in which the OM outputs are introduced into a machine learning system. In this way the framework could learn from the different patterns provided by these outputs and suggest more balanced results.

41 The harmonic mean is one of several kinds of average, and in particular one of the Pythagorean means. Typically, it is appropriate for situations in which the average of rates is desired.

| Title | No. | Rating (0-1) | Affect P/N/O | Domain P/N/O | SUMO P/N/O | Semantic Class P/N/O | Synset P/N/O | Harmonic Mean P/N/O |
| Grey's Anatomy | 1 | 1 | 0.81/0.19/0.66 | 0.78/0.22/0.96 | 0.59/0.41/0.88 | 0.69/0.31/0.92 | 0.58/0.42/0.72 | 0.68/0.28/0.81 |
| | 2 | 0.8 | 0.92/0.08/0.68 | 0.84/0.16/0.92 | 0.81/0.19/0.83 | 0.71/0.29/0.94 | 0.6/0.4/0.68 | 0.76/0.17/0.79 |
| | 3 | 1 | 0.9/0.1/0.66 | 0.88/0.12/0.95 | 0.65/0.35/0.91 | 0.62/0.38/0.93 | 0.75/0.25/0.8 | 0.74/0.18/0.84 |
| | 4 | 0.4 | 0.55/0.45/0.76 | 0.52/0.48/0.93 | 0.44/0.56/0.89 | 0.55/0.45/0.93 | 0.56/0.44/0.7 | 0.52/0.47/0.83 |
| | 5 | 0.1 | 0.25/0.75/0.76 | 0.64/0.36/0.96 | 0.52/0.48/0.92 | 0.58/0.42/0.94 | 0.66/0.34/0.7 | 0.47/0.43/0.84 |
| | 6 | 1 | 0.91/0.09/0.62 | 0.83/0.17/0.94 | 0.77/0.23/0.87 | 0.74/0.26/0.93 | 0.64/0.36/0.71 | 0.77/0.18/0.79 |
| | 7 | 1 | 0.93/0.07/0.62 | 0.56/0.44/0.93 | 0.68/0.32/0.89 | 0.63/0.38/0.93 | 0.66/0.34/0.73 | 0.67/0.2/0.8 |
| | 8 | 0.4 | 0.62/0.38/0.65 | 0.41/0.59/0.94 | 0.44/0.56/0.88 | 0.69/0.31/0.94 | 0.67/0.33/0.77 | 0.54/0.4/0.82 |
| | 9 | 0.6 | 0.79/0.21/0.66 | 0.59/0.41/0.89 | 0.55/0.45/0.87 | 0.61/0.39/0.92 | 0.62/0.38/0.73 | 0.62/0.34/0.8 |
| | 10 | 0.1 | 0.54/0.46/0.58 | 0.58/0.42/0.95 | 0.57/0.43/0.89 | 0.62/0.38/0.92 | 0.6/0.4/0.74 | 0.58/0.42/0.79 |
| House | 11 | 1 | 0.92/0.08/0.65 | 0.72/0.28/0.94 | 0.64/0.36/0.9 | 0.61/0.39/0.92 | 0.66/0.34/0.73 | 0.69/0.21/0.81 |
| | 12 | 0.1 | 0.58/0.42/0.64 | 0.49/0.51/0.92 | 0.59/0.41/0.83 | 0.53/0.47/0.92 | 0.67/0.33/0.76 | 0.57/0.42/0.8 |
| | 13 | 1 | 0.5/0.5/0.68 | 0.52/0.48/0.97 | 0.6/0.4/0.93 | 0.58/0.42/0.93 | 0.73/0.27/0.77 | 0.58/0.39/0.84 |
| | 14 | 1 | 0.72/0.28/0.73 | 0.73/0.27/0.96 | 0.6/0.4/0.89 | 0.65/0.35/0.93 | 0.65/0.35/0.76 | 0.67/0.32/0.84 |
| | 15 | 1 | 0.76/0.24/0.55 | 0.7/0.3/0.94 | 0.56/0.44/0.91 | 0.61/0.39/0.94 | 0.66/0.34/0.73 | 0.65/0.33/0.78 |
| | 16 | 0.7 | 0.81/0.19/0.72 | 0.76/0.24/0.95 | 0.6/0.4/0.91 | 0.65/0.35/0.93 | 0.58/0.42/0.69 | 0.67/0.29/0.83 |
| | 17 | 0.9 | 0.69/0.31/0.68 | 0.64/0.36/0.95 | 0.63/0.37/0.89 | 0.63/0.37/0.93 | 0.67/0.33/0.74 | 0.65/0.35/0.82 |
| | 18 | 0.7 | 0.61/0.39/0.68 | 0.5/0.5/0.95 | 0.46/0.54/0.91 | 0.59/0.41/0.92 | 0.42/0.58/0.78 | 0.5/0.47/0.83 |
| | 19 | 1 | 0.84/0.16/0.7 | 0.79/0.21/0.95 | 0.65/0.35/0.92 | 0.67/0.33/0.94 | 0.69/0.31/0.79 | 0.72/0.25/0.85 |
| | 20 | 1 | 0.78/0.22/0.6 | 0.8/0.2/0.94 | 0.69/0.31/0.89 | 0.68/0.32/0.93 | 0.62/0.38/0.76 | 0.71/0.27/0.8 |
| Tomorrowland | 21 | 1 | 0.74/0.26/0.6 | 0.83/0.17/0.95 | 0.77/0.23/0.88 | 0.72/0.28/0.93 | 0.65/0.35/0.75 | 0.74/0.25/0.8 |
| | 22 | 1 | 0.8/0.2/0.65 | 0.64/0.36/0.95 | 0.68/0.32/0.88 | 0.66/0.34/0.93 | 0.61/0.39/0.71 | 0.67/0.31/0.81 |
| | 23 | 0.9 | 0.58/0.42/0.65 | 0.56/0.44/0.93 | 0.51/0.49/0.87 | 0.57/0.43/0.92 | 0.65/0.35/0.71 | 0.57/0.42/0.8 |
| | 24 | 0.9 | 0.59/0.41/0.62 | 0.51/0.49/0.51 | 0.58/0.42/0.89 | 0.59/0.41/0.93 | 0.65/0.35/0.76 | 0.58/0.41/0.7 |
| | 25 | 1 | 0.69/0.31/0.59 | 0.63/0.37/0.94 | 0.57/0.43/0.91 | 0.59/0.41/0.93 | 0.6/0.4/0.75 | 0.61/0.38/0.8 |
| | 26 | 0.8 | 0.55/0.45/0.69 | 0.46/0.54/0.95 | 0.49/0.51/0.89 | 0.57/0.43/0.92 | 0.64/0.36/0.73 | 0.53/0.45/0.82 |
| | 27 | 0.8 | 0.76/0.24/0.6 | 0.7/0.3/0.92 | 0.62/0.38/0.87 | 0.7/0.3/0.92 | 0.62/0.38/0.77 | 0.68/0.31/0.8 |
| | 28 | 0.9 | 0.68/0.32/0.59 | 0.64/0.36/0.93 | 0.65/0.35/0.85 | 0.63/0.37/0.91 | 0.62/0.38/0.69 | 0.64/0.35/0.77 |
| | 29 | 0.6 | 0.76/0.24/0.69 | 0.73/0.27/0.93 | 0.66/0.34/0.85 | 0.49/0.51/0.93 | 0.56/0.44/0.71 | 0.62/0.33/0.81 |
| | 30 | 0.1 | 0.57/0.43/0.74 | 0.35/0.65/0.93 | 0.33/0.67/0.89 | 0.42/0.58/0.92 | 0.7/0.3/0.76 | 0.44/0.48/0.84 |
| Pearson correlation | | | 0.62 | 0.54 | 0.56 | 0.52 | 0.09 | 0.69 |
| Difference (Rating-P) Mean | | | 0.22 | 0.25 | 0.28 | 0.29 | 0.31 | 0.26 |
Table 14. Opinion Mining evaluation over 30 reviews

Another evaluation was carried out in order to compare the rating obtained by each show (considering 10 reviews per title) with the rating suggested by our framework. Fig 11 shows that our suggestions were very consistent with regard to Grey's Anatomy. There are some differences in the results for Tomorrowland and House M.D., but they are not so distant. This is because our results provide positive sentiment polarities and not, specifically, user ratings. Even though there are some differences, we can still obtain accurate results according to the WNA (Affect) or Domain dimensions.

Fig 11. Review rating (rating mean) and OM rating (positiveness mean) comparison
| Means plotted in Fig 11 | Grey's Anatomy | House | Tomorrowland |
| IMDB Rating (Mean) | 0.640 | 0.840 | 0.800 |
| Affect Positiveness (Mean) | 0.722 | 0.721 | 0.672 |
| Domain Positiveness (Mean) | 0.664 | 0.663 | 0.605 |
| SUMO Positiveness (Mean) | 0.602 | 0.601 | 0.585 |
| Semantic Class Positiveness (Mean) | 0.643 | 0.619 | 0.593 |
| Synset Positiveness (Mean) | 0.634 | 0.636 | 0.632 |
| Mean Positiveness (Mean) | 0.635 | 0.640 | 0.609 |

We want to remark that our framework is able to provide the OM outputs as inputs to a recommender system in different ways. One way is to provide the individual analytics for each semantic resource (in this case SUMO, WND, WNA, SC and the WN taxonomy) separately. Another way is to consider all the resources together in order to get a single sentiment analysis result. Finally, we can provide all these outputs together with a mean calculation.

9. Discussion
According to the experiments conducted and the results obtained, we can conclude that our resource ISR-WN and the different approaches that take advantage of it are suitable for acquiring enough knowledge to make recommendations for TV shows or films. Moreover, we are also able to classify texts with different labels for the purpose of ranking a set of TV shows according to their similarities. Furthermore, since user reviews have an associated set of feelings that indicate likes or dislikes, we are also able to detect emotions and classify texts into three different categories: positive, negative or objective. In some cases there are contexts that mix a set of emotions and can confuse the framework.
This is the case of the review for "House M.D.". However, we can establish the appropriate feelings by taking into account all the context information. We would also like to mention a similar work, (Briguez et al., 2014), which presents a novel framework for the specific domain of movie recommendation. That work proposes a complete set of postulates accounting for both quantitative and qualitative aspects of the movie domain, implemented by means of DeLP (Defeasible Logic Programming). With regard to our work, there are some differences. First of all, we want to remark that we present a semantic framework as a tool to help recommender systems; in our proposal, the information provided to recommend similar movies is established by measuring different dimensions (i.e. feelings, sentiment polarities, word senses, textual similarities), whereas in (Briguez et al., 2014) the system uses specific rules to determine which characteristics have to be shared. Although the process of building specific rules helps to obtain better user recommendations, it is designed for one specific domain; applying that system to another domain would require creating additional rules in order to obtain accurate results. On the other hand, our approach can be easily adapted to other domains or languages (since WN is a nucleus for WNs in other languages 42) with minimal changes. Another difference is that in (Briguez et al., 2014) the system combines quantitative and qualitative aspects, whereas we have only focused on quantitative aspects. The results obtained after mixing qualitative and quantitative aspects demonstrate that the incorporation of qualitative aspects introduces significant improvements; in fact, it would be a reasonable feature to take into consideration in future work. With regard to the evaluation results, we cannot provide an in-depth comparison because our datasets are different; moreover, as far as we know, there is no example in the literature that uses exactly the same dataset employed in this work.

Even though our experiments have been carried out with user reviews of TV shows, we can deal with other domains. Moreover, we can deal with different languages by using a specific version of WN for each language. This has been demonstrated by our participation in different competitions in which our framework was evaluated over different languages. One interesting usage of our proposal is shown in Fig 12: through the semantic labels and features provided by our framework, a user could navigate among different TV shows or movies that have semantic similarities and thus automatically discover which movies or shows are the most appropriate for him or her.

42 http://globalwordnet.org/wordnets-in-the-world/

Fig 12. Conceptual similarities among different IMDB reviews

10. Conclusions and future works
In this work our goal has been to enrich texts by measuring similarities among different comments/reviews, grading their positiveness or negativeness by using textual information, and using this information to create a recommendation rating according to people's interests.
To do this, we have developed a framework that allows the integration of several semantic resources in order to provide recommender systems with enough knowledge to extract similarities among texts, measure sentiment polarities and obtain a semantic analysis with which to understand the meanings of contexts. In order to build the knowledge database of the framework we selected a set of resources by conducting an in-depth analysis of the different semantic resources used in NLP. As discussed in previous sections, several authors have created integrated resources, but most of them are based on linguistic features rather than on semantic or conceptual features. In this research work we have therefore considered previous studies in order to develop a multidimensional knowledge network that integrates different semantic resources. An integrated resource (ISR-WN) has thus been proposed. This resource integrates semantic resources (WN, WND, WNA, SUMO, SC and new semantic relations provided by XWN, together with a sentiment analysis resource (SWN)), using WN versions 1.6 and 2.0. The knowledge database obtained in the integration process was evaluated in order to detect any problems in the alignment caused by using different intermediate versions of WN. Regardless of the WN nucleus used, WND and SUMO reached an alignment of 100%, while SC reached 100% and 99.76% for WN1.6 and WN2.0, respectively.

In addition, we have reviewed a set of research works that involve interesting knowledge bases used to deal with NLP tasks. One of the most interesting resources is SWN; combined within ISR-WN, it supplies the positivity of a sense, domain, category, emotion or semantic class. It is important to stress that this resource has been used in several research works in order to take advantage of its semantic multidimensionality. Based on the semantic multidimensionality that ISR-WN provides, we are able to extract new information to classify texts, obtain opinions or recommend similar texts in open domains. Basically, we have applied four different tasks (WSD, semantic similarity, domain classification and sentiment analysis) to add new information from different points of view: extracting similarities among texts, measuring sentiment polarities and obtaining a semantic analysis for understanding the meanings of contexts. Moreover, we have illustrated a case study with information related to movie and TV series reviews to demonstrate that our framework works properly.

As future work we plan to enrich the ISR-WN resource with collocation sense relations 43. This type of information provides better results in tasks such as WSD (Gutiérrez, 2012). In (Gutiérrez et al., 2011c) ISR-WN was used to combine conceptualizations with polarities; we propose adding resources such as Micro-WNOp 44 (a corpus labelled with sentiments covering around 1,105 WN synsets). Moreover, we are working to align ISR-WN with WNs in other languages in order to create a multilingual resource.

43 Synset pairs that commonly appear together in a corpus.
44 http://www.unipv.it/micrownop

In order to add more knowledge to ISR-WN, we propose the integration of different ontologies by using the RDF model to align WN synsets with ontological concepts. This kind of information will help us to apply different techniques (i.e. Domain Classification) to specific domains such as medicine, technology, tourism, pharmaceutical, etc.
Moreover, we want to integrate BabelNet (Navigli and Ponzetto, 2010) (a resource that connects the multilingual web encyclopaedia Wikipedia with WN) with ISR-WN. BabelNet was considered as a sense inventory in Semeval-2013 45 Task 12: Multilingual Word Sense Disambiguation. Finally, we plan to use the ISR-WN functionalities in order to help summarization systems to customize text summaries using different domains, categories, emotions, sentiment polarities, etc.

45 http://www.cs.york.ac.uk/semeval-2013/

Acknowledgments
This research work has been partially funded by the University of Alicante, Generalitat Valenciana, the Spanish Government and the European Commission through the projects TIN2015-65136-C2-2-R, TIN2015-65100-R, SAM (FP7-611312) and PROMETEOII/2014/001.

References
Agirre, E., Cer, D., Diab, M. & Gonzalez-Agirre, A. (2012) SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics -- Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012). Montreal, Canada, Association for Computational Linguistics.
Agirre, E., Lacalle, O. L. D., Fellbaum, C., Hsieh, S.-K., Tesconi, M., Monachini, M., Vossen, P. & Segers, R. (2010) SemEval-2010 task 17: All-words word sense disambiguation on a specific domain. Proceedings of the 5th International Workshop on Semantic Evaluation. Los Angeles, California, Association for Computational Linguistics.
Agirre, E. & Soroa, A. (2009) Personalizing PageRank for Word Sense Disambiguation. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2009). Athens, Greece.
Atserias, J., Villarejo, L., Rigau, G., Agirre, E., Carroll, J., Magnini, B. & Vossen, P. (2004) The MEANING Multilingual Central Repository. Proceedings of the Second International Global WordNet Conference (GWC'04). Brno, Czech Republic.
Baccianella, S., Esuli, A. & Sebastiani, F. (2010) SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. 7th Language Resources and Evaluation Conference (LREC 2010). Valletta, Malta.
Bentivogli, L., Forner, P., Magnini, B. & Pianta, E. (2004) Revising the WORDNET DOMAINS Hierarchy: semantics, coverage and balancing. Proceedings of the COLING 2004 Workshop on "Multilingual Linguistic Resources". Geneva, Switzerland.
Briguez, C. E., Budan, M. C. D., Deagustini, C. A. D., Maguitman, A. G., Capobianco, M. & Simari, G. R. (2014) Argument-based mixed recommenders and their application to movie suggestion. Expert Systems with Applications, 41, 6467-6482.
Brill, E. (1995) Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. MIT Press.
Cotton, S., Edmonds, P., Kilgarriff, A. & Palmer, M. (2001) The English all-words task. SENSEVAL-2: Second International Workshop on Evaluating Word Sense Disambiguation Systems. Toulouse, France, Association for Computational Linguistics.
Chávez, A., Dávila, H., Gutiérrez, Y., Collazo, A., Abreu, J. I., Fernández Orquín, A., Montoyo, A. & Muñoz, R. (2013) UMCC_DLSI: Textual Similarity based on Lexical-Semantic features. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity. Atlanta, Georgia, USA, Association for Computational Linguistics.
Dávila, H., Fernández, A., Gutiérrez, Y., Muñoz, R., Montoyo, A. & Vázquez, S. (2012) Semantic Information Extraction method on ontologies. SEPLN 2012: XXVIII Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Castellón, Spain.
Dávila, H., Fernández Orquín, A., Chávez, A., Gutiérrez, Y., Collazo, A., Abreu, J. I., Montoyo, A. & Muñoz, R. (2013) UMCC_DLSI-(EPS): Paraphrases Detection Based on Semantic Distance. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Atlanta, Georgia, USA, Association for Computational Linguistics.
Dorr, B. J. & Castellón, M. A. M. A. I. (1997) Spanish EuroWordNet and LCS-Based Interlingual MT. AMTA/SIG-IL First Workshop on Interlinguas. San Diego, CA.
Esuli, A. & Sebastiani, F. (2006) SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. Fifth International Conference on Language Resources and Evaluation (LREC 2006). Genoa, Italy.
Fellbaum, C. (1998) WordNet. An Electronic Lexical Database, University of Cambridge.
Fernández, A., Gutiérrez, Y., Dávila, H., Chávez, A., González, A., Estrada, R., Castañeda, Y., Vázquez, S., Montoyo, A. & Muñoz, R. (2012a) UMCC_DLSI: Multidimensional Lexical-Semantic Textual Similarity. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics -- Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012). Montreal, Canada, Association for Computational Linguistics.
Fernández, A., Gutiérrez, Y., Muñoz, R. & Montoyo, A. (2012b) Approaching Textual Entailment with Sentiment Polarity. ICAI'12 - The 2012 International Conference on Artificial Intelligence. Las Vegas, Nevada, USA.
Forner, P. (2005) WordNet Domains 2.0. ITC-irst, Povo-Trento, Italy.
Gediminas, A. & Alexander, T. (2005) Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Educational Activities Department.
Genesereth, M. R. & Fikes, R. E. (1992) Knowledge Interchange Format, Version 3.0 Reference Manual. Stanford, Computer Science Department.
Gliozzo, A., Strapparava, C. & Dagan, I. (2004) Unsupervised and Supervised Exploitation of Semantic Domains in Lexical Disambiguation. Computer Speech and Language.
Gutiérrez, Y. (2012) Análisis Semántico Multidimensional aplicado a la Desambiguación del Lenguaje Natural. Departamento de Lenguajes y Sistemas Informáticos. Alicante, Universidad de Alicante.
Gutiérrez, Y., Castañeda, Y., González, A., Estrada, R., Piug, D. D., Abreu, J. I., Pérez, R., Fernández Orquín, A., Montoyo, A., Muñoz, R. & Camara, F. (2013) UMCC_DLSI: Reinforcing a Ranking Algorithm with Sense Frequencies and Multidimensional Semantic Resources to solve Multilingual Word Sense Disambiguation. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Atlanta, Georgia, USA, Association for Computational Linguistics.
Gutiérrez, Y., Fernández, A., Montoyo, A. & Vázquez, S. (2010a) Integration of semantic resources based on WordNet. XXVI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN 2010). Universidad Politécnica de Valencia, Valencia.
Gutiérrez, Y., Fernández, A., Montoyo, A. & Vázquez, S. (2010b) UMCC-DLSI: Integrative resource for disambiguation task. Proceedings of the 5th International Workshop on Semantic Evaluation. Uppsala, Sweden, Association for Computational Linguistics.
Gutiérrez, Y., Fernández, A., Montoyo, A. & Vázquez, S. (2011a) Enriching the Integration of Semantic Resources based on WordNet. Procesamiento del Lenguaje Natural, 47, 249-257.
Gutiérrez, Y., Vázquez, S. & Montoyo, A. (2011b) Improving WSD using ISR-WN with Relevant Semantic Trees and SemCor Senses Frequency. Proceedings of the International Conference Recent Advances in Natural Language Processing 2011. Hissar, Bulgaria, RANLP 2011 Organising Committee.
Gutiérrez, Y., Vázquez, S. & Montoyo, A. (2011c) Sentiment Classification Using Semantic Features Extracted from WordNet-based Resources. Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011). Portland, Oregon, Association for Computational Linguistics.
Gutiérrez, Y., Vázquez, S. & Montoyo, A. (2011d) Word Sense Disambiguation: A Graph-Based Approach Using N-Cliques Partitioning Technique. IN Muñoz, R., Montoyo, A. & Métais, E. (Eds.) Natural Language Processing and Information Systems. Springer Berlin / Heidelberg.
Izquierdo, R. (2010) Una Aproximación a la Desambiguación del Sentido de las Palabras Basada en Clases Semánticas y Aprendizaje Automático. Departamento de Lenguajes y Sistemas Informáticos. Alicante, Universidad de Alicante.
Izquierdo, R., Suárez, A. & Rigau, G. (2007) A Proposal of Automatic Selection of Coarse-grained Semantic Classes for WSD. Procesamiento del Lenguaje Natural, 39, 189-196.
Izquierdo, R., Suárez, A. & Rigau, G. (2010) GPLSI-IXA: Using Semantic Classes to Acquire Monosemous Training Examples from Domain Texts. Proceedings of the 5th International Workshop on Semantic Evaluation. Uppsala, Sweden, Association for Computational Linguistics.
Leacock, C. & Chodorow, M. (1998) Using Corpus Statistics and WordNet Relations for Sense Identification. Computational Linguistics.
Luisa Bentivogli, P. F., Bernardo Magnini, Emanuele Pianta (2005) Revising the WORDNET DOMAINS Hierarchy: semantics, coverage and balancing. ITC-irst – Istituto per la Ricerca Scientifica e Tecnologica, Via Sommarive 18, Povo – Trento, Italy, 38050.
Magnini, B. & Cavaglia, G. (2000) Integrating Subject Field Codes into WordNet. Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-2000).
Magnini, B., Strapparava, C., Pezzulo, G. & Gliozzo, A. (July 2002) The Role of Domain Information in Word Sense Disambiguation. Trento, Cambridge University Press.
Magnini, B., Strapparava, C., Pezzulo, G. & Gliozzo, A. (2002) Comparing Ontology-Based and Corpus-Based Domain Annotations in WordNet. Proceedings of the First International WordNet Conference. Mysore, India.
Marco De, G., Pasquale, L., Giovanni, S. & Pierpaolo, B. (2008) Integrating tags in a semantic content-based recommender. Proceedings of the 2008 ACM Conference on Recommender Systems. Lausanne, Switzerland, ACM.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D. & Miller, K. (1990) Five papers on WordNet. Princeton University, Cognitive Science Laboratory.
Navigli, R. (2009) Word sense disambiguation: A survey. ACM Comput. Surv., 41, 10:1--10:69.
Navigli, R. & Ponzetto, S. P. (2010) BabelNet: Building a Very Large Multilingual Semantic Network.
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden, Association for Computational Linguistics. Niles, I. (2001) Mapping WordNet to the SUMO Ontology. Teknowledge Corporation. Niles, I. & Pease, A. (2001) Origins of the IEEE Standard Upper Ontology. Working Notes of the IJCAI-2001 Workshop on the IEEE Standard Upper Ontology. Seattle, Washington, USA. Niles, I. & Pease, A. (2003) Linking Lexicons and Ontologies: Mapping WordNet to the Suggested Upper Merged Ontology. Pease, A. (2007) Standard Upper Ontology Knowledge Interchange Format. Perner, P., Candillier, L., Meyer, F. & Boulle, M. (2007) Comparing State-of-the-Art Collaborative Filtering Systems. Machine Learning and Data Mining in Pattern Recognition. Springer Berlin Heidelberg. Peter, D. T. (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the 12th European Conference on Machine Learning. Springer-Verlag. Pianta, E., Bentivogli, L. & Girardi, C. (2002) MultiWordNet. Developing an aligned multilingual database. Proceedings of the 1st International WordNet Conference. Mysore, India. Rao, D., Mcnamee, P. & Dredze, M. (2013) Entity linking: Finding extracted entities in a knowledge base. Multi- source, multilingual information extraction and summarization. Springer. Russell, S. & Norvig, P. (1994) A Modern, Agent-Oriented Approach to Introductory Artificial Intelligence. Sanda M. Harabagiu, G. A. M., Dan I. Moldovan (1999) Wordnet 2-A morphologically and semantically enhanced resource. SIGLEX99: Standardizing Lexical Resources. Sara, T. & Daniele, P. (2009) New features for FrameNet: WordNet mapping. Proceedings of the Thirteenth Conference on Computational Natural Language Learning. Boulder, Colorado, Association for Computational Linguistics. Ševčenko, M. (2003) Online Presentation of an Upper Ontology. CTU Prague, Dept of Computer Science. Shinde, S. K. & Kulkarni, U. (2012) Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications, 39, 1381-1387. Sowa, J. F. (1999) Knowledge representation: logical, philosophical, and computational foundations. Course Technology. Strapparava, C. & Valitutti, A. (2004) WordNet-Affect: an affective extension of WordNet. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004). Lisbon. Turney, P. D. & Littman, M. L. (2003) Measuring Praise and Criticism: Inference of Semantic Orientation from Association. ACM Transactions on Information Systems (TOIS), 21, 315–346. Udi, M., Ash, P. & John, R. (2000) Experience with personalization of Yahoo! , ACM. Valitutti, A., Strapparava, C. & Stock, O. (Eds.) (2004) Developing Affective Lexical Resources, ITC-irst, Trento, Italy, PsychNology Journal. Vázquez, S. (2009) Resolución de la ambigüedad semántica mediante métodos basados en conocimiento y su aportación a tareas de PLN. Depto. de Lenguajes y Sistemas Informáticos. Alicante, Spain., Universidad de Alicante. Vázquez, S., Montoyo, A. & Rigau, G. (2004) Using Relevant Domains Resource for Word Sense Disambiguation. IC-AI’04. Proceedings of the International Conference on Artificial Intelligence. Ed: CSREA Press. Las Vegas, E.E.U.U. Vossen, P. (1998) EuroWordNet: A Multilingual Database with Lexical Semantic Networks, Dordrecht, Kluwer Academic Publishers. Vossen, P., Peters, W. & Gonzalo, J. (1999) Towards a Universal Index of Meaning. proceedings of the ACL-99 Siglex workshop. University of Maryland. 
Walter, C.-N., Maria Luisa, H.-A., Rafael, V.-G. & Francisco, G.-S. (2012) Social knowledge-based recommender system: Application to the movies domain. Pergamon Press, Inc.
Zhibiao, W. & Martha, P. (1994) Verb semantics and lexical selection. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. Las Cruces, New Mexico, Association for Computational Linguistics.
Zouaq, A., Gagnon, M. & Ozell, B. (2009) A SUMO-based Semantic Analysis for Knowledge Extraction. Proceedings of the 4th Language & Technology Conference. Poznań, Poland.