SWSNL: Semantic Web Search Using Natural Language

Ivan Habernal (a,*), Miloslav Konopík (b)

(a) Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia, Univerzitní 8, 306 14 Plzeň, Czech Republic
(b) NTIS – New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia in Pilsen, Univerzitní 8, 306 14 Plzeň, Czech Republic

(*) Corresponding author. Tel.: +420 377632456. Email addresses: habernal@kiv.zcu.cz (Ivan Habernal), konopik@kiv.zcu.cz (Miloslav Konopík).

Abstract

As modern search engines approach the ability to deal with queries expressed in natural language, full support of natural language interfaces seems to be the next step in the development of future systems. The vision is that of users being able to tell a computer what they would like to find, using any number of sentences and as many details as required. In this article we describe our effort to move towards this future using currently available technology. The Semantic Web framework was chosen as the best means to achieve this goal. We present our approach to building a complete Semantic Web Search using Natural Language (SWSNL) system. We cover the complete process, which includes preprocessing, semantic analysis, semantic interpretation, and executing a SPARQL query to retrieve the results. We perform an end-to-end evaluation on a domain dealing with accommodation options. The domain data come from an existing accommodation portal and we use a corpus of queries obtained by a Facebook campaign. In our paper we work with written texts in the Czech language. In addition to that, the Natural Language Understanding (NLU) module is evaluated on another domain (public transportation) and language (English). We expect that our findings will be valuable for the research community, as they are strongly related to issues found in real-world scenarios. We struggled with inconsistencies in the actual Web data, with the performance of the Semantic Web engines on a decently sized knowledge base, and with other issues.

Keywords: semantic search, natural language understanding, semantic web, statistical semantic analysis

1. Introduction

Keyword-based search engines have proven to be very efficient on collections of unstructured textual content, e.g. web pages [1]. However, if users want to find information in structured content, e.g. in a database, the basic keyword search might not be expressive enough [2]. A simple, yet sufficient solution on the Web can be provided by a form-based user interface which typically combines keywords with some other restrictions, according to the specific domain structure [3]. Form-based interfaces are user friendly in the sense that they do not require the user's prior knowledge of the underlying data structures. The structure is typically shown as multiple forms or menus that allow further specification of the user request. Nevertheless, it is obvious that from the user's point of view, a simple keyword input is a more straightforward approach than filling forms.

A different approach to retrieving sought information consists in using a domain-specific query language. However, as pointed out by [4], querying structured data using a domain-specific query language, e.g. SQL or SPARQL [5], can be complicated for casual users. Most of the existing query languages use a precise syntax and the users are expected to be familiar with the back-end data structures.
Evidently, this allows exact queries to be formulated by domain experts, but on the other hand it is a big obstacle for casual users.

One step beyond the above-mentioned traditional approaches are Natural Language Interfaces (NLI). The key feature of such an interface is that users can search for the required information by posing their queries in natural language (NL). It is assumed that posing NL queries is more precise than the keyword approach and at the same time more natural than the form-based approach. Recent studies [6, 7] indeed confirmed that an NLI offers a better user experience than a form-based search for casual users.

In this article we present a realistic approach to designing and developing a complete Semantic Web Search using Natural Language (SWSNL) system. The system builds on the Semantic Web paradigm and thus uses ontologies as the main vehicle for storing the domain structure and all the data (the Knowledge Base, KB). Furthermore, ontologies, due to their ability to precisely capture semantics, are also used to describe the meaning of user queries expressed in natural language. The system can be seen as a search engine which accepts NL queries and performs the search in the back-end KB.

Our SWSNL system also has a few aspects that distinguish it from other related work. Firstly, we allow users to formulate their NL queries using more than a single sentence. Secondly, the Natural Language Understanding (NLU) module incorporates a stochastic semantic analysis model and does not require any syntactic parser preprocessing. The NLU module is trained from annotated data and it is language independent. Thirdly, the ontology for describing natural language queries is different from (independent of) the KB ontology. This independence enables switching between various KBs without affecting the NLU module.

The system was thoroughly tested and evaluated on three different domains. Primarily, the end-to-end evaluation was performed on the accommodation options domain, using actual data acquired from the Web as well as a corpus of actual NL queries in the Czech language. The other two domains, the Czech public transportation domain and a subset of the English ATIS dataset, were used to verify the portability to different domains and languages.

The initial impulse to build the presented system was the idea of enriching an existing form-based application with a newly developed NLI. The expected benefits were a better user experience and a practical test of the NLI approach. The accommodation options domain was selected because our preliminary survey revealed a promising potential of such a domain. The selection of the Czech language as the domain language was not a random choice; the decision was meant to verify the technology in a language other than English.

The rest of the article is organized as follows. Section 2 outlines the context of the related work. In section 3 we introduce our system in a schematic overview. Section 4 describes the target domain together with the techniques required to deal with actual Web data sources; a natural language corpus is also presented. Section 5 describes the formalism for capturing natural language query semantics.
Section 6 proposes a statistical semantic model for analysing natural language queries and section 7 deals with the semantic interpretation of a semantic annotation in order to perform a search in a particular KB. Section 8 thoroughly describes the evaluation of our SWSNL system. Open issues and future work are then discussed in section 9.

2. Related Work

Before we present the related work, we shortly discuss the terminology of the current research into Natural Language Interfaces. Typically, the term Natural Language Interface (NLI) is used when a system can be accessed using (written) natural language. Such a system mostly operates on structured information and, given the natural language question, it tries to find the correct answer. The family of NLI systems can be divided into various sub-classes. A Natural Language Interface to Databases (NLIDB) system holds its information in a relational database. The principles of NLIDB have been adapted to the Semantic Web, resulting in Natural Language Interfaces to Knowledge Bases (NLIKB). In this case of NLI, the information is stored in the form of an ontology, which plays the fundamental role in the Semantic Web.

We define our research as Semantic Web Search Using Natural Language (SWSNL). Although NLIKB covers a task similar to our approach, most of the existing NLIKB systems are designed to evaluate rather complex logical queries over the Knowledge Base. On the contrary, SWSNL can be viewed as a special case of a search engine that retrieves a set of results according to a natural language query, operating in the Semantic Web field. Furthermore, our SWSNL system does not belong to the family of question answering systems, since these two fields have a very different motivation. Whereas a question answering system tries to answer a user's NL questions, the purpose of our system is to retrieve search results according to a user's NL queries.

2.1. Overview of Recent NLIKB Systems

A very recent system called PowerAqua [8] is an ontology-based NLI system which surpasses traditional systems by managing multiple ontology sources and high scalability. Since its NL processing module remains the same as in the previous AquaLog system [9], we will review the AquaLog system. AquaLog is a portable NLIKB system which handles user queries in a natural language (English) and returns answers inferred from a knowledge base. The system uses the GATE libraries (http://gate.ac.uk), namely the tokenizer, the sentence splitter, the POS tagger, and the VP chunker.

ORAKEL [10] is an ontology-based NLI system. It accepts English factoid questions and translates them into first-order logic forms. This conversion uses full syntax parsing and a compositional semantics approach. ORAKEL can be ported to another domain, but such porting requires a domain expert to create a domain-dependent lexicon. The lexicon is used for an exact mapping from natural language constructs to ontology entities. A possible drawback of ORAKEL's approach is that the system can neither handle ungrammatical questions nor deal with unknown words.

The FREyA system [11] is an NLIKB system that combines syntactic parsing with ontology reasoning. It derives parse trees of English input questions and uses heuristic rules to find a set of potential ontology concepts (for mapping from question terms to ontology concepts) using GATE and the OntoRoot Gazetteer [12]. The primary source for question understanding is the ontology itself.
If the system encounters ambiguities, a clarification dialogue is offered to the user. The potential ontology concepts retrieved from the question analysis are then transformed into SPARQL. The system was tested on the Mooney Geography dataset [13].

A portable NLIKB system called PANTO [14] is based upon the off-the-shelf statistical Stanford Parser and integrates tools like WordNet and various metrics algorithms to map the NL question terms to an intermediate representation called QueryTriples. (A mixed terminology, triplets vs. triples, appears in the literature; in this paper we use triples as proposed in the RDF standard, http://www.w3.org/TR/rdf-concepts/#section-triples.) This semantic description is then mapped onto OntoTriples, which are connected to entities from the underlying ontology. This step involves a set of 11 heuristic mapping rules. Finally, the OntoTriples are represented as SPARQL queries. The main idea of transforming an NL question into triples in PANTO is based upon the empirical observation that two nominal phrases from a parse tree are expected to be mapped onto a triple in the ontology. The system was evaluated on the Mooney dataset and the output of PANTO was compared to manually generated SPARQL queries.

The NLP-Reduce system [15] does not involve any advanced linguistic and semantic tools and depends on matching the query words to the KB instances. Its core part is a query generator which is responsible for creating a SPARQL query given the words and the lexicon extracted from the KB. The major strength of the system is its good portability, as it does not depend on any complex NLP query processing.

The ontology-based NLI system called QACID, introduced in [16], covers the cinema/movie domain. Its target language is Spanish. It consists of two main components, a user query formulation database and a textual entailment engine. Whereas the former component serves mainly for development and system training purposes, the latter is intended for processing unknown queries. The core of the QACID system is the database of query formulations. The database contains a set of 54 clusters; each cluster represents one type of question and has a representative query pattern derived from a set of training data. Each cluster is also associated with one SPARQL query. As pointed out in the QACID evaluation, the system is not able to handle unknown ontology concepts and therefore it fails if the user poses a query using terms that are not present in the lexicon.

A domain-restricted NLI system called QUETAL [17] exploits a robust semantic analysis on a hybrid NLP architecture. The system was developed to answer NL questions on two domains: the Nobel prizes and information about the Language Technology Institute. For these two domains, the ontologies were converted from existing ontologies and merged with other more general ontologies (e.g. SUMO for the Nobel prizes). Question analysis is performed by the HPSG (Head-driven Phrase Structure Grammar [18]) syntactic and semantic parser with the support of the Heart of Gold architecture (an open source middleware for combining shallow and deep NLP components, http://heartofgold.dfki.de/), which provides shallow Named Entity Recognition (NER). The HPSG parser uses grammars for English and German, and the output of the parser is a formalism called Minimal Recursion Semantics, which is a flat, non-recursive semantic representation format [19]. This question representation is then transformed into proto queries that are suitable for querying a particular KB.
A domain-independent system, QuestIO [20], extracts the domain lexicalization from the KB. The natural language query is transformed into SeRQL and then executed against the KB. The system was tested on 36 questions collected from the GATE user mailing list and on several queries over a geographic ontology.

Instead of supporting full NL questions, the QUICK (QUery Intent Constructor for Keywords) system guides users in constructing a semantic query from keywords [21]. After typing in the keywords, the system analyses them according to the underlying ontology and allows the user to choose the required semantic query in a clarification dialogue. The performance was tested using two ontologies (a movie database and a song database) and 175 queries consisting of 2-5 keywords. Although QUICK is not a fully-fledged NLI system, it can be viewed as a trade-off between a simple keyword search and the NL question systems. Another keyword-based information extraction and retrieval system, based on an ontology in the soccer domain, is presented in [22]; the system is evaluated only on 10 simple queries. A good overview of other state-of-the-art keyword-based semantic search systems is given in [23].

2.2. Existing Ontologies in the Accommodation Domain

The Travel ontology [24] is a core part of Travel Guides (http://sites.google.com/site/ontotravelguides/). The goal of this system is to solve the problem of distributed tourist sources and to help users find a vacation package quickly. The ontology thoroughly models the tourist domain, but it is not populated with instances of actual accommodation options.

Another travel ontology, developed by Knublauch in 2004 [25], was intended as an example of how to put Semantic Web techniques into practice. The ontology covers the travel domain; however, it was not designed according to any existing travel information source.

Accessing a tourist ontology using an NLI is the goal of [26]. The system is evaluated on a set of 20 simple NL queries. The author of [27] deals with a related task, namely a personalised hotel search using Semantic Web technologies. However, his system is form-based and does not involve an NLI.

It is also worth mentioning that ontologies are nowadays used by some companies as a backbone in tourism portals and services, such as Seekda (http://www.seekda.com/) [28, 29] or Mondeca (http://www.mondeca.com/Clients).

2.3. Evaluation in Related Work

Whereas in established NLP branches standard test datasets together with evaluation criteria are available, there has been a lack of such datasets in the NLIKB field. Some typical problems related to the evaluation are:

• There is no widely accepted agreement on what a correct result is. This is a very subjective notion, as it stands for e.g. the recall of the correct query [10], the number of answered questions [9], or even the number of queries generated as an output [14].

• Each system operates on a different ontology and the ontologies vary in their size and complexity.

As pointed out by [30], more attention should be paid to creating a standard evaluation set. Recently, various evaluation campaigns have been initiated as a part of a semantic tool benchmarking project called Seals (Semantic Evaluation At Large Scale, http://www.seals-project.eu/). In the field of NLIKB, [6] and [7] evaluate different approaches to semantic search for expert and casual users. The Mooney Geoquery was chosen as the testing ontology. The dataset contains information about U.S. geography, such as cities, rivers, mountains, countries, etc.
Originally, the dataset was in Prolog (consisting of about 1000 Prolog facts); later it was converted into OWL by [31]. The question corpus consists of 880 user questions related to U.S. geography and was collected from undergraduate students of the authors of [13]. A deep quantitative analysis of this dataset was performed by [31], and the authors of [10] point out some interesting real-world issues of this dataset.

3. SWSNL System Architecture Overview

Our SWSNL system is a Java-based application consisting of various components. Figure 1 shows the architecture of the system together with the data flow of an example query.

On the input, the user poses a query in natural language (NL). There are no restrictions on the query in terms of its length, as the user can express the query using keywords, a single sentence or a whole paragraph. We evaluated our system on two languages, Czech and English. The details of the NL queries are described in section 4.1; some examples of actual queries can be found in Appendix A.

The query is analysed by the Natural Language Understanding (NLU) component. This component produces a KB-independent semantic representation of the NL query. The NLU component is comprised of three basic blocks: the NL query preprocessing, Named Entity Recognition (NER) and Semantic Analysis. The details of the NLU module are presented in section 6. The module uses a statistical model and thus requires an annotated corpus of NL queries.

The semantic interpretation module transforms the KB-independent query annotation into a query language which is then executed against the KB. The KB is created from existing data, as described in section 4.2. As a result, a list of KB instances with their properties is displayed to the user.
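The data flow of Figure 1 can be summarized by the following minimal sketch. All type and method names here are our illustrative inventions mirroring the described architecture, not the actual implementation:

    import java.util.List;

    // A hedged sketch of the pipeline from Figure 1: NL query ->
    // KB-independent semantic annotation -> SPARQL -> result instances.
    public class SwsnlPipelineSketch {
        interface NluModule {                 // preprocessing + NER + semantic analysis
            SemanticAnnotation analyze(String nlQuery);
        }
        interface SemanticInterpreter {       // KB-dependent transformation (section 7)
            String toSparql(SemanticAnnotation annotation);
        }
        interface KnowledgeBase {             // SPARQL execution over the ontology
            List<String> execute(String sparqlQuery);
        }
        record SemanticAnnotation(List<String> triples) {}

        static List<String> search(String nlQuery, NluModule nlu,
                                   SemanticInterpreter interpreter, KnowledgeBase kb) {
            SemanticAnnotation annotation = nlu.analyze(nlQuery);    // KB-independent
            String sparql = interpreter.toSparql(annotation);        // KB-specific
            return kb.execute(sparql);                               // accommodation instances
        }
    }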
4. Target Domain

The target domain sets constraints for the demonstration of the practical and theoretical values of the developed NLI system. Even in cases where the system is domain independent (as most of the current NLI systems are), a suitable domain will show both the potential and the limitations of the system. As was pointed out in the introduction, we focus on the Czech language in the first place. Due to its high degree of inflection and relatively free word order, it makes the task more challenging. The system can, however, be adapted to English, as will be shown later in the evaluation.

We decided not to develop our system using any dataset from the existing related work. Firstly, there is no dataset in the Czech language which meets the requirements for performing an end-to-end evaluation, namely the availability of a KB that contains the desired domain information, a corpus of NL queries, and the correct "answers" from the KB for each query. Secondly, the existing English corpora limit the queries to single clauses or sentences. Thirdly, the widely used Mooney Geoquery corpus [13] is, from our point of view, a typical corpus for the question answering task, whereas our system is a replacement/enrichment of an existing form-based UI. In the rest of this section, the process of collecting an NL query corpus and creating a KB is described.

4.1. Natural Language Query Corpus

Whereas all existing NLI systems presented in the related work section expect queries to be single clauses or single sentences, we decided to go beyond the state of the art by allowing users to formulate their queries using one or more sentences, to express their needs more precisely and more naturally.

We decided to use a voluntary-based approach and started a Facebook campaign, which can target a broad audience. The motivation stems from our previous experience with obtaining data for NLI systems [32], which showed us that obtaining data only from, say, a group of students can negatively affect the data (in terms of a higher number of e.g. off-topic or silly queries).

The Facebook page contains a clear description of the "virtual" SWSNL system which says: "There is a search engine that operates on the accommodation domain in the Czech Republic. Try to search for an accommodation option that you really want by describing it using natural language instead of keywords. You are not limited to a single sentence." A few diverse examples followed. At that time, we had no back-end database or knowledge base, thus we did not limit the users to queries that could be answered by an existing accommodation search portal. On one hand, we did not restrict the user queries to comply with the structure or content of a particular domain model. On the other hand, there was a risk that the KB would not contain the answers to the collected NL queries.

Figure 1: The SWSNL system architecture.

We asked each user to pose one to three queries. We gathered 68 queries in total. This was fewer than expected, but the whole campaign was based upon spontaneity and our intention was not to push anyone into participation. Note that the participants were not asked to find answers to their queries; the main goal in this step was to collect just the queries. Examples of the user queries can be found in Appendix A.

Table 1 shows some statistics of the NL query corpus. Although there was no limit on the query length, the distribution of query lengths (in terms of the number of sentences used) illustrates that most of the users posed single-sentence and two-sentence queries. However, the average query length shows that the queries are relatively long, which is also important from another point of view: the queries contain a lot of information (see the examples in Appendix A). This makes the corpus quite atypical compared to other existing corpora.

Table 1: Various NL query corpus statistics.

  Queries consisting of 1 sentence           37
  Queries consisting of 2 sentences          23
  Queries consisting of 3 sentences           8
  Average query length (words)             24.9
  Average number of sentences per query    1.54
  Number of unique participating users       42
  Average number of queries per user       1.54

4.2. Knowledge Base and Domain Ontology

In order to evaluate a fully functional SWSNL prototype application, the underlying KB is essential. The KB is the module that stores the complete knowledge about the covered domain. In the Semantic Web, ontologies are the main vehicle for domain modeling. Henceforth, we will use KB in the sense of an ontology populated with instances.

As the source website for data mining, the portal hotelypenziony.cz was chosen. The site provides a large database of various types of accommodation options (from cottages to luxury hotels). Almost all entries have some location information, such as GPS coordinates or an exact address.
Furthermore, each accommodation page contains some information about facilities, prices, etc., but these fields are not mandatory. Some entries are enhanced with textual descriptions, e.g. a site or a restaurant description.

The entire site was downloaded using an in-house crawler written in Groovy. A pattern-based information extraction tool was developed in order to extract the structured content from the crawled web pages. The extracted data were then transformed into the corresponding ontology instances and properties. The ontology is stored in the OWL format (http://www.w3.org/TR/owl2-overview/).

The domain ontology structure was designed according to the extracted information, see Figure 2. The structure is more or less "flat". It has one important class Accommodation with many object and data properties. The reason for this design stems from the structure of the website (the information source), where one accommodation page corresponds to one instance of the ontology class Accommodation. (The inheritance relation between RoomType and PrivateDouble, PrivateSingle, etc. was the authors' design choice; another way to capture the same information would be e.g. a property numberOfBeds.)

A transitive relation isSubpartOf (see Figure 2) allows capturing a hierarchy of Location instances, e.g. a Street is a sub-part of a City, etc. However, the property is not functional. This means that it is possible that one location is a sub-part of more than one other location. Typically, this is the case of a city c that is a sub-part of both a district a and a tourist region t, where c is an instance of City, a is an instance of District and t is an instance of TouristRegion. The task of creating such a hierarchy only from the given address and the GPS coordinates is described in [33].

Some existing ontologies for the accommodation domain were presented in section 2.2. Although some of these ontologies are more complex in terms of the number of classes and the relations among them, we decided not to re-use any of them in our system. The reason was that the structure of the source domain (the website) and the structure of the existing ontologies were so different that we simply did not see any benefit in trying to fit our data into such different ontologies.
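The location hierarchy can be sketched as follows. This is a minimal example using the Jena ontology API of that period; the namespace and instance names are our assumptions, while the class and property names follow Figure 2:

    import com.hp.hpl.jena.ontology.*;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    // A hedged sketch of the transitive, non-functional location hierarchy.
    // The namespace URI is hypothetical.
    public class LocationHierarchySketch {
        public static void main(String[] args) {
            String ns = "http://example.org/accommodation#";
            OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);

            OntClass city = m.createClass(ns + "City");
            OntClass district = m.createClass(ns + "District");
            OntClass touristRegion = m.createClass(ns + "TouristRegion");

            TransitiveProperty isSubpartOf = m.createTransitiveProperty(ns + "isSubpartOf");

            Individual c = city.createIndividual(ns + "cPraha");
            Individual a = district.createIndividual(ns + "dPraha");
            Individual t = touristRegion.createIndividual(ns + "trPrahaOkoli");

            // a city may be a sub-part of more than one other location
            c.addProperty(isSubpartOf, a);
            c.addProperty(isSubpartOf, t);
            // with a reasoner that supports transitivity, a street that is a
            // sub-part of cPraha is inferred to be a sub-part of dPraha as well
        }
    }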
On the contrary, if a piece of infor- mation is available only in text properties, it can hardly be searched for using a query language. In our KB, about 60% of all accommodation instances have no additional property, except for the obligatory contact information. This means that we cannot say anything specific about these accommodation options. The system only knows that a certain accommodation option exists in a certain place but no additional information is offered. However, almost all the user queries (see the examples in Appendix A) relate to a more specific type of accommodation op- tions, with e.g. certain properties, prices, room types, etc. This means that almost 40% of accommodation options cannot be found when users ask for additional accommo- dation requirements. Note that this lack of structured properties is caused by the source web data, not by the ontology structure. 4.2.2. Ambiguity in the Knowledge Base The main advantage of a well-designed ontology is its precise semantics. It allows complex semantic constructs to be used in order to infer new facts and query the KB using a query language. In reality, systems must often deal with existing data since the amount of work needed to create and populate an ontology from scratch would be tremendous. Also our KB is completely created from existing data. As the original web data were not designed to respect precise domain se- mantics, a sort of semantic ambiguity or inconsisteny was transferred to the KB. Some notable problems are: (1) The accommodation types are quite ambiguous. There is no clear difference between e.g. Hut and Cottage, among others. (2) Lots of facilities are duplicated in some sense. For example, the KB contains four similar instances of a parking lot (namely parkingLot, parkingLotWithLights, parkingGarage, and securedParkingGarage). Note that we left all the data unchanged because it was not feasible to check and correct all the inconsistencies manually. 4.3. Gold Data Preparation The correct answer (result) to a NL query is a set of accommodation instances that satisfy the user’s require- ments. Many NL queries contain sorts of subjective ex- pressions, e.g. “cheap”, “nice”, “close to something”, etc. It is necessary to decide in advance how to interpret these expressions consistently. The same interpretation must also be used later by the SWSNL system. 6 Figure 2: Structure of the accommodation ontology. Expression Interpretation near a city all places in the district in which the city is located close to (a place) the distance is less than 1 km cheap any room of the accommodation option has a price lower than 1000 CZK cheapest returns only one cheapest accommodation option Table 2: Semantic interpretation of some NL expressions Table 2 presents some examples of the semantic inter- pretation. The values (prices and distances) were set after a consensus was reached in our research team, however, this interpretation definitely depends on each user’s point of view. We deal only with such expressions that can be quantified or expressed using the KB semantics. 4.3.1. Assigning the Correct Results For this task we combined manually constructed struc- tured and fulltext search queries. Structured search. For each NL query a corresponding SPARQL query was manually constructed. We put as many constraints from the NL query as possible into the SPARQL query. For example, if the user asks for particular facilities (e.g. parking, internet), we add all of them in the SPARQL query. 
4.3.1. Assigning the Correct Results

For this task we combined manually constructed structured and fulltext search queries.

Structured search. For each NL query a corresponding SPARQL query was manually constructed. We put as many constraints from the NL query as possible into the SPARQL query. For example, if the user asks for particular facilities (e.g. parking, internet), we add all of them to the SPARQL query. This ensures that only those accommodation instances that satisfy the constraints are selected from the KB. However, adding such strict criteria has a side effect: results that satisfy the requirements only partially are discarded, and no ranking of results is involved. (This could be solved using OPTIONAL in SPARQL; however, at this stage we focused on results that satisfy all the requirements.)

Fulltext search. As presented in section 4.2.1, each accommodation instance can have a few text descriptions (labels). Such a description usually contains information which might be available in the structured properties as well (e.g. a description of hotel facilities in the text and then again as structured properties). These text fields can be indexed and searched through using a fulltext search.

Finally, the whole process of assigning the correct results to each NL query was performed as follows:

• If the NL query can be completely expressed using SPARQL, create such a query and add the results to the correct results.

• If the NL query contains other requirements that cannot be expressed using SPARQL, filter the structured search results with a fulltext search (see the sketch after this list) and check the returned results manually.

• Moreover, for each NL query create the least restrictive SPARQL query (only with the location and the accommodation type) and filter the results using a fulltext search. The fulltext keywords are manually selected according to the NL query requirements. The results must be checked manually.

The gold data (in CSV format) as well as the KB (in OWL format) are licensed under CC-BY-SA (Creative Commons Attribution-ShareAlike 3.0 Unported License, http://creativecommons.org/licenses/by-sa/3.0/) and are publicly available at http://liks.fav.zcu.cz/swsnl/.
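The combination step amounts to an intersection of the two result sets, as in the following minimal sketch (the class is our hypothetical simplification; results are identified by their instance URIs):

    import java.util.HashSet;
    import java.util.Set;

    // A hedged sketch of gold-result assignment: keep only those accommodation
    // URIs returned by the (least restrictive) SPARQL query whose indexed text
    // description also matches the manually selected fulltext keywords.
    public class GoldResultFilter {
        public static Set<String> filter(Set<String> sparqlResults,
                                         Set<String> fulltextHits) {
            Set<String> gold = new HashSet<>(sparqlResults);
            gold.retainAll(fulltextHits);   // set intersection
            return gold;                    // still checked manually afterwards
        }
    }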
5. Semantic Description of NL Queries

Some traditional approaches to expressing NL query semantics are e.g. frames, FOPL (first-order predicate logic), or semantic trees, among others. Related research [32, 34], based upon semantic trees, showed that describing the semantics of NL queries should follow a particular structure, e.g. using a schema for adding constraints to the possible semantic trees. Ontologies offer such a mechanism by allowing precise semantic relations between instances to be defined.

We will now discuss the term semantic annotation. In the Semantic Web, this term has multiple meanings: e.g. web pages are annotated according to a particular ontology (taxonomy), or named entities can be annotated with related instances from an ontology for further named entity disambiguation [35], [36], [37]. Our usage comes from NLU research, where sentences (or queries) can be annotated with some additional semantic information. Henceforth, the term semantic annotation will be used in the sense of a semantic description of an NL query.

On the Semantic Web, ontologies serve as a tool for storing domain information (the KB). However, in our proposal the semantic annotation of NL queries based upon ontologies has no direct relation to the ontologies for storing domain information. This means that our semantic annotation is completely independent of a particular KB; it uses ontologies exclusively to describe the NL query semantics. The motivation for this separation was to enable e.g. switching between various KBs without affecting the NLU module, or adapting the NLI to a different language while keeping the original KB.

5.1. Ontology for NL Query Semantics

The domain-independent query ontology is shown in Figure 3. Each word is represented as an instance of the Word ontology class with some additional morphological properties and its position within the sentence. All Word instances are chained using the hasNextWord property. An instance of the Sentence class covers a whole particular NL query.

Figure 3: Domain-independent NL query ontology (sentence ontology).

The ontology also supports marking named entities. An instance of NamedEntity is connected to its content words using the property startsWithWord; its length is stored in the numberOfWords property. (This marking style was our design choice; we are aware of various different approaches to named entity annotation, such as using offsets in GATE.)

Traditionally, named entities are used to label expressions with a special meaning (e.g. cities, persons, countries, etc.). In this work, we extended the usage of a named entity to all words/word groups that carry important semantic information in the NL query. For example, we added the following domain named entities: relative location (covering phrases such as "near", "in the center", etc.), relative price (e.g. "cheap"), or additional requirements (e.g. "fitness", "internet", etc.), and others. Thus the named entities in our approach can represent both "traditional" entities (e.g. place, accommodation type, facilities, etc.) as well as certain domain-specific knowledge (e.g. relative location or price) and key phrases. We believe that such an arbitrary extension of the meaning of the term named entity is still acceptable.

Named entities can also form a taxonomy, as shown in Figure 3. This taxonomy can be extended in the ontology for a particular domain. For modeling the taxonomy of named entities, we use the inheritance of classes in OWL. For each query, instances of the domain-dependent query ontology classes (triples) are created. A complete example of an annotated query can be found in Appendix B. The NL query ontologies are stored in the OWL format and were developed using Protégé (a free, open source ontology editor and knowledge base framework, http://protege.stanford.edu/).

5.2. NL Query Corpus Annotation

The whole NL query corpus from section 4.1 was annotated in the above-mentioned fashion. The annotated corpus is required for training the supervised statistical Natural Language Understanding module, as will be shown later in section 6.

Figure 4: Annotation ontology for queries in the accommodation domain. The dotted lines represent the is-a relation.

The corpus was annotated by two independent annotators. For this task we developed an annotation tool that provides a balanced trade-off between usability and robustness; the editor hides the complexity of the underlying triples and the sentence ontology. (This annotation task could also be achieved using other existing tools, such as the ontology-based gazetteer in GATE or CA Manager [38].) The inter-annotator agreement was measured on the number of matching triples (explained in detail later in section 8.1.2) from the NL query annotations and was found to be κ = 0.68, 95% CI (0.504, 0.848).
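To make the sentence ontology of section 5.1 concrete, the following minimal sketch (Jena API; the namespace is hypothetical, the class and property names follow Figure 3) instantiates a two-word query fragment with one single-word named entity:

    import com.hp.hpl.jena.ontology.*;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    // A hedged sketch of the sentence ontology from Figure 3: Word instances
    // chained by hasNextWord, a NamedEntity anchored by startsWithWord and
    // numberOfWords. The namespace URI is hypothetical.
    public class AnnotationSketch {
        public static void main(String[] args) {
            String ns = "http://example.org/query#";
            OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);

            OntClass word = m.createClass(ns + "Word");
            OntClass namedEntity = m.createClass(ns + "NamedEntity");
            OntProperty hasNextWord = m.createOntProperty(ns + "hasNextWord");
            OntProperty startsWithWord = m.createOntProperty(ns + "startsWithWord");
            DatatypeProperty numberOfWords = m.createDatatypeProperty(ns + "numberOfWords");

            Individual w1 = word.createIndividual(ns + "w1");  // e.g. "levný" (cheap)
            Individual w2 = word.createIndividual(ns + "w2");  // e.g. "hotel"
            w1.addProperty(hasNextWord, w2);

            // a RelativePrice-like named entity spanning a single word
            Individual ne = namedEntity.createIndividual(ns + "ne1");
            ne.addProperty(startsWithWord, w1);
            ne.addLiteral(numberOfWords, 1);
        }
    }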
5.3. Formal Definition of Semantic Annotation

Let $S = w_1 \dots w_W$ be a sentence (a query) consisting of $W$ words and let $T(Subj, Pred, Obj)$ be a triple of a subject $Subj$, a predicate $Pred$ and an object $Obj$. (Please note that $Subj$, $Pred$ and $Obj$ are just the names of the first, second and third position in the triple; they may or may not be related to the syntactic roles of words in a sentence.) Then the semantic annotation $Sem$ of a sentence $S$ is an unordered set of $M$ triples

  $Sem(S) = \{T_1, T_2, \dots, T_M\}.$    (1)

Within $Sem(S)$, the triples can be connected. Formally, two triples

  $T_x(Subj_1, Pred_1, Obj_1), \quad T_y(Subj_2, Pred_2, Obj_2)$    (2)

can share their subjects or objects, so that $Obj_1 = Subj_2$ or $Subj_1 = Subj_2$, forming de facto a directed graph.

Named Entities. Let $N^\tau$ be a Named Entity (NE) instance with type $\tau$. Then

  $N^\tau_{j,k} = N^\tau_{span}(w_j \dots w_k)$    (3)

is an NE of type $\tau$ which spans the words $w_j$ to $w_k$, where $1 \le j \le k \le W$. This means that an NE is associated with words from the sentence. Let $C$ be a semantic concept; $C$ is not associated with any words. Both $N^\tau$ and $C$ can be parts of a triple, so that each triple has one of the following forms:

  $T(Subj, Pred, Obj) = \begin{cases} T(C, Pred, N^\tau_{j,k}) & \text{if } Subj \text{ is a concept and } Obj \text{ is an NE} \\ T(N^{\tau_1}_{j,k}, Pred, N^{\tau_2}_{o,p}) & \text{if both } Subj \text{ and } Obj \text{ are NEs} \\ T(N^\tau_{j,k}, Pred, C) & \text{if } Subj \text{ is an NE and } Obj \text{ is a concept} \\ T(C_a, Pred, C_b) & \text{if both } Subj \text{ and } Obj \text{ are concepts} \end{cases}$    (4)

where $j, k, o, p$ are the NE spans, $a, b$ are used to distinguish possibly different concepts $C$, and $\tau_1, \tau_2$ can be different NE types.

6. Semantic Analysis of Natural Language Queries

The main task of the Natural Language Understanding component is to create a semantic description of previously unseen NL queries. Our NLU component is based upon a statistical model and supervised learning. Furthermore, the component uses both off-the-shelf and newly developed preprocessing tools. All queries are preprocessed using the tokenizer, part-of-speech tagger and lemmatizer from the Prague Dependency Treebank tools [39].

6.1. Statistical model

Using the formalism from section 5.3, the task of semantic analysis is to find an appropriate semantic annotation $Sem(S)$ given a sentence $S$. Using probability, we can formulate the problem as follows:

  $P(Sem(S)|S) = P(T_1, T_2, \dots, T_M|S).$    (5)

We approximate the probability by treating the triples as independent, thus

  $P(T_1, T_2, \dots, T_M|S) \approx \prod_{m=1}^{M} P(T_m|S)$    (6)

where

  $P(T_m|S) = P(T_m(Subj, Pred, Obj)|w_1 \dots w_W).$    (7)

Furthermore, we limit our model to the first two triple types from Equation 4. This means that all triple objects are NEs and some triple subjects are NEs, formally $T = T(C, Pred, N^\tau)$ and $T = T(N^{\tau_1}, Pred, N^{\tau_2})$. This limitation is based upon our observation that the other two types of triples are not currently present in the corpus.

In our model the triple probabilities are approximated as

  $P(T(C, Pred, N^\tau)|S) \approx P(T(C, Pred, N^\tau)|N^\tau_{j,k}, w_{j-1}) \cdot P(N^\tau_{j,k}|S)$    (8)

and

  $P(T(N^{\tau_1}, Pred, N^{\tau_2})|S) \approx P(T(N^{\tau_1}_{j,k}, Pred, N^{\tau_2}_{o,p})|N^{\tau_1}_{j,k}, N^{\tau_2}_{o,p}, w_{j-1}, w_{o-1}) \cdot P(N^{\tau_1}_{j,k}|S) \cdot P(N^{\tau_2}_{o,p}|S),$    (9)

where $w$ is the lemma and $P(N^\tau_{j,k}|S)$ is obtained from the named entity recognizer. Given a training corpus $\mathcal{S}$ of sentences and their annotations, $\mathcal{S} = \{S_n, Sem(S_n)\}$, the probabilities are estimated using Maximum Likelihood Estimation (MLE).
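As a worked example (our own illustration; the predicate names hasPriceRequirement and hasLocationRequirement are hypothetical stand-ins for the predicates of Figure 4), consider the query "levný hotel v Praze" ("a cheap hotel in Prague") with words $w_1 \dots w_4$. A plausible annotation is

  $Sem(S) = \{\, T_1(Accommodation, hasPriceRequirement, N^{RelativePrice}_{1,1}),\; T_2(Accommodation, hasLocationRequirement, N^{City}_{4,4}) \,\}$

and, by Equations 6 and 8, its probability is approximated as

  $P(Sem(S)|S) \approx \prod_{m=1}^{2} P(T_m|N^{\tau_m}_{j,k}, w_{j-1}) \cdot P(N^{\tau_m}_{j,k}|S),$

where for $T_2$ the conditioning lemma $w_{j-1} = w_3$ ("v", "in") is the word preceding the City entity, and $P(N^{City}_{4,4}|S)$ comes from the named entity recognizer.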
6.1.1. Named Entity Recognition

Named entity recognition is a crucial part of the query preprocessing. To achieve reasonable performance, we incorporate three different NER approaches, complemented by a fourth, KB-based one (the OntologyNER described below).

Maximum Entropy NER. The maximum entropy-based NER (MaxEntNER) introduced in [40] was later extended and trained on a corpus from the Czech News Agency. The parameters of that corpus are very different from our NL query corpus, since it consists of newspaper articles. However, the MaxEntNER is used to identify only a few general named entities, such as Place and City.

LINGVOParser. A shallow semantic parser based upon hand-written grammars, LINGVOParser [41, 42], is used to identify named entities with a complex structure, such as Date, Currency or Number.

String Similarity Matching. This NER approach (called WordSimilarityTrainableNER) was used as an ad-hoc tool to deal with the named entities that are not recognized by the two previous NERs. These are usually the domain-dependent entities. Recall that in our semantic annotation the set of named entity types is much broader than in general NER, because we use named entities to cover all semantically important word groups (e.g. named entities like AdditionalRequirements or AccommodationType). The details of the string similarity matching will be explained later in section 7.1.

OntologyNER. One of the key features of our system is that the NLU module is independent of the back-end KB. However, some systems from the related work (e.g. [11] or [20]) use the KB as a valuable source for recognizing various domain entities, such as places or various facilities. Therefore we also developed OntologyNER, which exploits the knowledge stored in the KB. This vocabulary-based NER searches for named entity candidates using instance labels from the KB. The search uses string similarity matching, which will be explained later in section 7.1.

7. Semantic Interpretation and Search

After the NL query is automatically annotated with its semantic description, it must be interpreted in order to find the desired results. Since our semantic representation of an NL query (see section 5) is independent of a particular KB, it is transformed into an ontology query language (SPARQL) to perform the search in the KB. As an input, the semantic annotation of an NL query is given; as an output, a SPARQL query for the back-end KB is produced.

The transformation consists of a set of domain-specific heuristic hand-written rules. Each rule processes particular triples from the NL query annotation and outputs a corresponding SPARQL snippet. The example in Figure 5 demonstrates how the semantic annotation of an NL query is transformed into the corresponding SPARQL query.

7.1. Matching Entities to Ontology Instances

Some named entities recognized in the NL query, such as City or Place, are related to their corresponding ontology instances. For example, if the NL query searches for a certain accommodation option in a particular location (e.g. City["Praha"]), the resulting SPARQL query must restrict the search to that entity instance (i.e. ?Accommodation p:isLocatedIn p:cPraha .). Therefore, named entity disambiguation is necessary to map the recognized named entities to KB instances.

The mapping between named entities and KB instances is based upon string similarity (or string distance). In order to select the best string metric, we manually created a small subset of the ontology instances paired with the named entities (50 pairs) and measured the accuracy of various string metrics. The best accuracy, about 96%, was achieved with the Jaro-Winkler distance [43]. Only a selected subset of the named entity types is matched using this technique, namely all instances of the ontology classes Place and Facility.
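The matching step can be sketched as follows. This is a hedged example: we use the Jaro-Winkler implementation from Apache Commons Text for illustration, and the 0.9 acceptance threshold is our assumption:

    import org.apache.commons.text.similarity.JaroWinklerSimilarity;
    import java.util.List;

    // A hedged sketch: map a recognized named entity to the KB instance whose
    // label is most similar under the Jaro-Winkler measure. The threshold is
    // hypothetical; the paper reports ~96% accuracy on 50 evaluation pairs.
    public class InstanceMatcher {
        private static final JaroWinklerSimilarity JW = new JaroWinklerSimilarity();

        public static String bestMatch(String entityText, List<String> kbLabels) {
            String best = null;
            double bestScore = 0.9;           // minimum acceptable similarity
            for (String label : kbLabels) {
                double score = JW.apply(entityText.toLowerCase(), label.toLowerCase());
                if (score > bestScore) {
                    bestScore = score;
                    best = label;
                }
            }
            return best;                      // null if nothing is similar enough
        }
    }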
Using this string similarity-based named entity disambiguation, three types of errors may occur. First, the named entity may be expressed in such a way that the string similarity between its words and the KB instance label is very low. Second, the desired named entity may have no corresponding instance in the KB. These two types of errors are quite obvious. The third type of error is caused by ambiguity within the KB: in some cases, two different instances have very similar descriptions. For example, the KB contains more than 30 instances with Praha (Prague) in their labels (referring to e.g. a District, a City or a city part). Currently, we do not deal with this kind of ambiguity. However, it could be solved e.g. by choosing the most likely instance learned from the training data or by showing a clarification dialogue (the case of [11] or [9]).

Figure 5: An example of transforming an annotated NL query into SPARQL. The NL query semantic annotation sample contains two triples in this example, namely (Accommodation, hasRoomTypeRequirement, RoomTypeRequirement["jednolůžkové"]) and (RoomTypeRequirement["jednolůžkové"], hasNumberOfRoomsReq, Number["dva"]), where RoomTypeRequirement and Number represent named entities. These two triples are translated into the two corresponding SPARQL snippets on the right-hand side. Notice that the NL query annotation is created according to the NL query ontology (see Figure 4), whereas the SPARQL query is bound to the particular back-end KB (see Figure 2).

7.2. Interpretation of Named Entities with Numbers and Dates

The named entity types Number and Date are not related to any ontological instances (they do not directly represent any instance from the KB). The LINGVOParser tool used to recognize this type of named entities (see section 6.1.1) also has built-in support for the semantic interpretation of the content of the named entities, namely numbers and dates. The tool utilizes hand-written semantic grammars for the translation into a computer representation (e.g. integers or dates). This allows the interpretations of these entities to be integrated into the resulting SPARQL query.

7.3. Semantic Reasoning and Result Representation

Given all the mechanisms introduced in the previous sections, it is possible to formulate the SPARQL query. Once the query is created, it can be passed to an ontology reasoner. Currently, various OWL reasoner Java implementations are available. They also vary in their OWL expressivity, i.e. in what subset of the OWL semantics they can handle. To select a suitable reasoner for our system, a simple test was carried out. The most important OWL feature we use is the transitivity among location instances (see section 4.2). Thus, a simple SPARQL query was executed to obtain all instances in a particular location and its sub-locations (sketched below). Table 3 outlines the results of this test. We tested reasoners from the Jena package as well as the Pellet reasoner; only reasoners that support transitive properties are listed. Given the measured performance, the Pellet reasoner was selected for our system. The performance of the reasoners will be discussed later in section 9.

Table 3: Performance comparison of various OWL reasoners.

  Reasoner class       Total time
  PelletReasoner       15m 14s
  OWLMicroReasoner     55m 5s
  OWLFBRuleReasoner    5h 15m 26s

The result of the semantic search is a set of accommodation instances. The system displays all the retrieved accommodation instances together with their properties.
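The timing test can be sketched as follows. This is a hedged example of the Jena/Pellet combination of that period; the KB file name, the prefix and the instance names are our assumptions:

    import com.hp.hpl.jena.ontology.OntModel;
    import com.hp.hpl.jena.query.*;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import org.mindswap.pellet.jena.PelletReasonerFactory;

    // A hedged sketch of the reasoner test: retrieve all accommodation options
    // located in a district, relying on the transitivity of isSubpartOf so
    // that sub-locations (cities, streets) are covered.
    public class ReasonerTestSketch {
        public static void main(String[] args) {
            OntModel model = ModelFactory.createOntologyModel(PelletReasonerFactory.THE_SPEC);
            model.read("file:accommodation-kb.owl");   // hypothetical KB file

            String q =
                "PREFIX p: <http://example.org/accommodation#>\n" +
                "SELECT ?acc WHERE {\n" +
                "  ?acc a p:Accommodation .\n" +
                "  ?acc p:isLocatedIn ?loc .\n" +
                "  ?loc p:isSubpartOf p:dPraha .\n" +   // transitive closure inferred
                "}";
            QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(q), model);
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next());
            }
            qe.close();
        }
    }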
8. Evaluation

In this section we thoroughly evaluate our SWSNL system. The section is structured as follows. First, we focus on the evaluation of the Natural Language Understanding module (introduced in section 6); various combinations of NERs are also tested in this part of the evaluation. Second, we present an end-to-end evaluation of our system: we describe the baseline approach and compare our system results with the baseline results. Third, we evaluate our NLU module on another two corpora and languages to demonstrate its portability.

Our main corpus (the NL query corpus on the accommodation domain, see section 4.1) consists of 68 queries. We split the data into training and test sets using leave-one-out cross-validation [44]. In this method, each fold of the cross-validation has only a single test example and all the rest of the data is used for training. We use this method because the amount of available data is very small.

8.1. Natural Language Understanding Module Evaluation

The NLU module (section 6) works in two stages: the Named Entity Recognition and the ontology-based statistical model for the semantic analysis. Its evaluation will therefore be split into two sections, too. We use standard metrics, namely the precision (p), the recall (r) and the F-measure (Fm):

  $p = \frac{tp}{tp + fp}, \quad r = \frac{tp}{tp + fn}, \quad Fm = \frac{2pr}{p + r},$    (10)

where tp are true positives, fp are false positives, and fn are false negatives [45].

8.1.1. Evaluation of NER

The system uses four types of NER, namely the MaxEntNER, the WordSimilarityTrainableNER, the LINGVOParser, and the OntologyNER (see section 6.1.1). A correct result (true positive) occurs when a named entity from the annotated corpus exactly matches the entity returned by the NER, in terms of both the NE type and the word span.

Table 4: F-measure performance of various NERs on different entity types. Some NER names are abbreviated; WSim is WordSimilarityTrainableNER and LinPar is LINGVOParser.

  Entity type \ NER      WSim  MaxEnt  Onto  LinPar
  AccommodationType      0.90  0.0     0.0   0.0
  Place                  0.37  0.45    0.22  0.0
  Date                   0.0   0.22    0.0   0.28
  NumberOfPersons        0.62  0.0     0.0   0.0
  RelativeLocationReq    0.70  0.0     0.0   0.0
  City                   0.33  0.70    0.35  0.0
  Price                  0.72  0.13    0.0   0.0
  AdditionalReq          0.57  0.0     0.04  0.0
  Number                 0.32  0.10    0.0   0.37
  RelativePrice          0.87  0.0     0.0   0.0
  RoomTypeReq            0.55  0.0     0.0   0.0

The results are shown in Table 4 and can be interpreted as follows. The second column (WSim) displays the results of the WordSimilarityTrainableNER. This NER is based on the lexical similarity between the training examples and the unknown word(s). It performs best mostly for the domain-dependent entities, such as AdditionalReq or AccommodationType, because these entities are expressed using similar words in the data. We assume that this approach is sufficient for domain-dependent entities.

Our assumption that the MaxEntNER, even though it is trained on a completely different corpus, would yield acceptable results for general named entities, such as cities or places, is validated by the results in the third column (MaxEnt). This NER outperforms the rest of the NERs on City by almost 100% relative.

The poor results obtained by the OntologyNER in the fourth column have the following explanation. This NER exploits the lexical similarity between the KB instances and the word(s) being recognized. The results for the named entities City and Place are thus similar to the results in the second column (WordSimilarityTrainableNER). However, the performance of the OntologyNER is surpassed by the MaxEntNER, as the latter involves many features for entity recognition, such as the local context, word shape, etc.
Finally, the last column shows that for complex named entities in natural language, such as dates or numbers, the entity-specific hand-crafted LINGVOParser is the best choice.

CombinedNER. For each entity type we selected the best performing NER from Table 4 and created the CombinedNER. This NER is used in the overall evaluation and in the semantic analysis evaluation in the next sections. However, we expect that with a larger training set it would be reasonable to use a more sophisticated NER combination, e.g. a linear interpolation model with the combination parameters set by the Expectation-Maximization algorithm.

8.1.2. Semantic Analysis Evaluation

The result of the semantic analysis is a semantic annotation of an NL query (recall section 5.3). The annotation of an NL query can also be seen as a set of connected triples. For each NL query, we compute the precision, recall and F-measure using the set of gold triples (the human annotation from the NL query corpus) and the triples returned by the semantic analysis module. (We also considered some other measures developed for semantic annotation evaluation, but none of them was suitable for our purposes: BDM [46] computes the semantic similarity between two annotations of the same token in a document, and the tree edit distance [47] expects the semantic annotation to form a tree.)

A result triple is a true positive if it matches a gold triple as follows (a sketch of this test follows the list):

• The predicates are the same.

• The objects/subjects have the same ontology class; triples whose subject or object classes differ do not match.

• Furthermore, if the objects/subjects are named entities, their positions in the sentence and their contents must also be the same.
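A hedged sketch of this matching test follows; Triple and Entity are hypothetical simplifications of our actual annotation structures:

    // A hedged sketch of the true-positive test for the semantic analysis
    // evaluation. The record types are simplified stand-ins.
    public class TripleMatcher {
        public record Entity(String ontologyClass, boolean isNamedEntity,
                             String span, String text) {}
        public record Triple(Entity subject, String predicate, Entity object) {}

        public static boolean isTruePositive(Triple result, Triple gold) {
            if (!result.predicate().equals(gold.predicate())) return false;  // same predicate
            return matches(result.subject(), gold.subject())
                && matches(result.object(), gold.object());
        }

        private static boolean matches(Entity a, Entity b) {
            if (!a.ontologyClass().equals(b.ontologyClass())) return false;  // same class
            if (a.isNamedEntity() != b.isNamedEntity()) return false;
            if (a.isNamedEntity()) {
                // named entities must also agree in sentence position and content
                return a.span().equals(b.span()) && a.text().equals(b.text());
            }
            return true;
        }
    }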
The results of the semantic analysis module (with the CombinedNER incorporated) are shown in Table 5. The most important factor that influences the performance is the small size of the NL query corpus. Only 67 training examples (note that we still use the leave-one-out cross-validation, which ensures that the training and test data are distinct) are not sufficient to train the semantic model properly. Moreover, a very important part of the CombinedNER for recognizing non-general named entities, the WordSimilarityTrainableNER, is also completely trained from the data.

Table 5: The results of the semantic analysis with the CombinedNER.

                                    p     r     Fm
  CombinedNER + Semantic Model    0.66  0.34  0.45

In the following test, we evaluated the semantic analysis model's dependency on the training data size. Using the same NL query corpus, we simulated a 100% accurate NER module (called MockNER) and changed the proportions of the training and test sets. Given the results in Figure 6, we can draw the following conclusion: even with small training data, most of the returned triples are correct (p → 1); however, to obtain all triples of the semantic annotation of a single NL query, a larger training set is probably required (r grows more slowly).

Figure 6: The semantic model performance depending on the training set size.

8.2. End-to-end Evaluation

Each query from our NL query corpus has its own corresponding set of correct search results (gold results), as described in section 4.3.1. This allows us to evaluate the complete SWSNL system. We performed three experiments. First, the baseline fulltext-based search system was evaluated. Second, we tested our SWSNL system. Third, we performed the same test with a simulation of a 100% correct Semantic Analysis module.

For each query the system returns a set of accommodation instances. There is no ranking of the results because the KB querying is purely boolean, which means that a single result either matches or not and no further discrimination is given. Thus, for each NL query we compute the precision, the recall, and the F-measure of the returned result set against the gold set. The final result is then the average over all queries (the macro F1 measure).

8.2.1. Baseline System

As the baseline for the end-to-end evaluation we used a fulltext search. The crawled documents from the source website were indexed using Apache Lucene (http://lucene.apache.org/core/). In the preprocessing, the best available Czech stemmer [48] was used, the stop-words were filtered and the content was lowercased. In this experiment, each NL query was treated as a bag of words with the stop-words removed.

Other standard measures used in Information Retrieval (IR) for evaluating ranked results are Mean Average Precision, Precision at k (P@k) and R-Precision (R-Prec), see e.g. [45]. These metrics are suitable only for ranked results, as retrieved e.g. by a fulltext search. Since we compare the baseline with our SWSNL system, which does not produce ranked results, it is not reasonable to measure the baseline using these metrics.

Table 6: The end-to-end results of (a) the baseline fulltext search system, (b) the search using our SWSNL system, and (c) the search using the correct semantic annotation of NL queries.

                                      p     r     Fm
  (a) Baseline fulltext             0.02  0.42  0.03
  (b) SWSNL                         0.25  0.70  0.37
  (c) Correct semantic annotation   0.73  0.84  0.78

The baseline system results are shown in Table 6, row (a). The results show very clearly that the keyword-based search performs poorly in such a complicated task. The relatively high recall means that there are many hits in the retrieved set. The low precision is caused by the fact that the search model is not discriminative (i.e. it does not filter the results according to a certain location, etc.). However, this might serve as an approximation of the results that would be returned by the current state-of-the-art general-purpose search engines. The parameter N for selecting only the top N retrieved documents was set empirically in order to maximize the F-measure.
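The baseline can be sketched as follows. This is a hedged example using a recent Lucene API; the index path, the field names and the value of N are our assumptions, and StandardAnalyzer stands in for the Czech stemming pipeline [48]:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;

    // A hedged sketch of the fulltext baseline: the NL query is treated as a
    // bag of words and the top-N documents are retrieved.
    public class FulltextBaselineSketch {
        public static void main(String[] args) throws Exception {
            DirectoryReader reader = DirectoryReader.open(
                    FSDirectory.open(Paths.get("accommodation-index")));  // hypothetical path
            IndexSearcher searcher = new IndexSearcher(reader);

            QueryParser parser = new QueryParser("description", new StandardAnalyzer());
            Query query = parser.parse(QueryParser.escape(
                    "levný hotel v Praze s parkováním"));   // an example NL query

            int n = 50;                          // N was tuned empirically in the paper
            TopDocs top = searcher.search(query, n);
            for (ScoreDoc sd : top.scoreDocs) {
                System.out.println(searcher.doc(sd.doc).get("url"));
            }
            reader.close();
        }
    }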
8.3. Evaluation of the NLU Module on Other Domains and Languages

Although we put our main effort into the end-to-end evaluation on the accommodation domain in the Czech language, we also tested our NLU component on two different corpora in order to demonstrate its usability across domains and languages.21 In the following sections, only the NLU module is evaluated, since no databases/KBs exist for these datasets that would allow an end-to-end evaluation.

21 Both corpora can be found at http://liks.fav.zcu.cz/swsnl/.

8.3.1. Czech Public Transportation Queries Corpus

The Czech Public Transportation Queries Corpus (ConnectionsCZ) is an NL query corpus in the Czech language. It contains 238 queries in the public transportation domain. Most of the queries ask for a certain connection from one place to another, for train types, travel times, etc. The corpus is a subset of a larger corpus used in [32].

This type of corpus follows the tradition of corpora for Spoken Language Understanding (SLU) or human-computer spoken dialogue, such as ATIS [49]. Since our current work deals with search and Semantic Web technologies, we are not convinced that our SWSNL system would be the best approach for this domain and task. Nevertheless, this evaluation serves as proof that our semantic model is domain-independent and can perform well if a larger training corpus is available.

In order to port the corpus into our system, we created an ontology for an NL query in this particular domain (see section 5). The domain-dependent query ontology is shown in Figure 7. The queries were then annotated according to this ontology.

Figure 7: The ontology of the queries in the public transportation domain. The dotted lines represent the is-a relation.

We conducted the following two experiments. In the first one, we used the simulation of a 100% correct NER. Figure 8 (a) illustrates the improvement in the semantic analysis as the training set size grows. In the second experiment, we used the same NER configuration as in the accommodation domain (namely the CombinedNER, see section 8.1.1). The results are shown in Figure 8 (b). The improvement in the performance is caused by the larger training size for both the NER and the semantic model.

8.3.2. ATIS Corpus Subset

The ATIS (Air Travel Information Service) corpus [49] contains travel information queries in English and has been widely used in NLU research. It is one of the corpora commonly used for testing semantic analysis systems, used for evaluation in e.g. [50], [51], [52] and [53]. We extracted a small subset of the corpus (348 sentences) and annotated it using the same query ontology as for the Czech Public Transportation Queries Corpus, see the previous section and Figure 7.

Since our NER components are able to deal with the Czech language only, we performed only one test on the ATIS corpus, with the simulation of a 100% correct NER. The test is intended to evaluate the performance of our semantic model on a different language. The results are shown in Figure 8 (c).

Figure 8: The evaluation of the NLU module on different domains and languages w.r.t. the increasing sizes of the training data. (a) Semantic analysis on the Czech Public Transportation Queries Corpus with a simulation of a 100% correct NER. (b) Semantic analysis on the Czech Public Transportation Queries Corpus with the CombinedNER. (c) Semantic analysis on the English ATIS corpus with a simulation of a 100% correct NER.

We decided to test our system on the ATIS corpus to get an insight into the possibilities of porting the semantic model to another language. This should not serve as a comparison with other state-of-the-art semantic parsers, for two reasons. Firstly, our semantic annotation is different from the original frame-based ATIS annotation. Secondly, we use a small subset of the whole corpus, which also negatively affects the performance. The complete corpus was not used because manual annotation according to our semantic representation was required and it was not feasible to annotate the complete corpus.

The obtained results illustrate that our semantic analysis model can be used across different languages. No tweaking was required to adapt the system to English. The achieved performance (about 0.73 F-measure) is satisfactory, given the fact that the system was primarily developed on a different domain and language.
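The learning curves in Figures 6 and 8 follow a common protocol: shuffle the corpus once, train on a growing fraction, and evaluate on the remainder. A generic Python sketch; train and evaluate are placeholders passed in by the caller, since the semantic model's real interface is not reproduced here:

```python
import random

def learning_curve(corpus, train, evaluate,
                   fractions=(0.1, 0.25, 0.5, 0.75, 0.9), seed=42):
    """Model performance as a function of the training set size.

    corpus   -- list of (query, gold_annotation) pairs
    train    -- callable: training data -> model
    evaluate -- callable: (model, test data) -> F-measure
    """
    data = corpus[:]
    random.Random(seed).shuffle(data)
    curve = []
    for frac in fractions:
        split = int(len(data) * frac)
        model = train(data[:split])
        curve.append((split, evaluate(model, data[split:])))
    return curve
```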
8.3.3. Final Remarks on the Evaluation

We should also comment on the fact that our system was not evaluated on the already-mentioned Mooney Geoquery dataset (see section 2.3). Based on our knowledge of question answering [54], the Mooney Geoquery is a typical factoid question-answering corpus. The answers may be lists of instances (e.g. "what states border florida ?"), numbers (e.g. "how long is the rio grande river ?"), or an aggregated result (e.g. "how many square kilometers in the us ?"). In contrast, our system is developed as a replacement for a form-based UI for information retrieval. The types of queries in our corpus are significantly different from the Mooney Geoquery queries. In our case, the user wants to find a list of instances that fulfill his or her requirements instead of finding facts about entities.

9. Conclusion

This final section outlines the major contributions of this article as well as open issues and future work. Semantic Web Search using Natural Language is, beyond all doubt, a challenging task. In order to develop a real-world system that could be deployed for public use, many theoretical and practical problems must be resolved. We will examine these problems from various perspectives and also propose solutions to some of them.

9.1. Performance Issues

One of the most critical issues, which actually prevents our system from being tested in the real Web environment, is the performance of the ontology reasoning. As already shown in Table 3 on page 11, we tested several OWL reasoners and their ability to deal with transitive properties. The Pellet reasoner requires approximately 15 minutes on average to answer a single SPARQL query with transitive relations. We suspect that the reason for this is the size of our KB (281,686 triples, see section 4.2.1). However, to the best of our knowledge, there is no room for improvement without changing the back-end Semantic Web tools completely.

Possible solutions. First, another ontology storage engine can be used, e.g. Sesame22. Second, it is possible to perform the materialisation first and query afterwards (as OWLIM23 does).

22 http://www.openrdf.org/

23 http://www.ontotext.com/owlim
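A third possibility, sketched below for illustration only: when the transitive property is a location hierarchy, a SPARQL 1.1 property path can traverse it at query time without an OWL reasoner, trading reasoning time for per-query traversal. A minimal rdflib sketch; the prefix, property and label names are assumptions, not the actual KB schema:

```python
from rdflib import Graph

g = Graph()
g.parse("kb.rdf")  # hypothetical serialization of the accommodation KB

# The property path acc:locatedIn/acc:partOf* walks the location
# hierarchy transitively at query time (SPARQL 1.1), so no OWL
# reasoning over the transitive property is needed.
query = """
PREFIX acc: <http://example.org/accommodation#>
SELECT ?hotel WHERE {
    ?hotel a acc:Accommodation ;
           acc:locatedIn/acc:partOf* ?region .
    ?region acc:label "Beskydy" .
}
"""
for row in g.query(query):
    print(row.hotel)
```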
9.2. Problems Caused by Actual Web Data

Many practical issues related to the KB content were pointed out in section 4.2.1. The most important problem is missing structured data; in other words, the data from the source Web site are incomplete. However, this will be the case in most systems that use public Web sources for populating their KBs, and it is not feasible to check, let alone correct, the data manually. Another problem is caused by inconsistent source data. In our scenario, various KB instances have the same label, which makes the mapping from named entities to ontology instances (NE disambiguation) much harder (see section 7.1).

Possible solutions. Better data post-processing after the information extraction step. Nevertheless, a huge manual effort would be required.

9.3. Future Work

There are four directions that are worth exploring further:

Larger NL query corpus. The main limitation of our current semantic model performance is the small size of the corpus. Thus, it would be beneficial to obtain more data, e.g. in a bootstrapping manner, once the prototype is deployed on a testing server and is accessible to the public.

Integrating fulltext search. A combination of structured search and fulltext search is a promising future task, based upon our preliminary research (see the sketch after this list).

Better Named Entity Recognition. The results clearly show that the NER component is crucial in semantic search. The better the NER is, the better the semantic model performs and the more precise the search results are.

Performance Improvement. Replacing the back-end with more scalable Semantic Web tools or with a relational database.
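One simple way to combine the two kinds of search, sketched here as a design idea rather than an implemented component: use the structured (boolean) result set as a filter and the fulltext scores for ranking, optionally backfilling with fulltext-only hits to recover results that the structured search misses (cf. section 8.2.3):

```python
def combine(structured_ids, fulltext_ranking, backfill=False):
    """Rank the boolean structured matches by their fulltext score.

    structured_ids   -- set of KB instance ids from the structured search
    fulltext_ranking -- list of (id, score) pairs from the fulltext
                        engine, best first
    """
    ranked = [(i, s) for i, s in fulltext_ranking if i in structured_ids]
    # Structured hits absent from the fulltext ranking go last.
    scored = {i for i, _ in ranked}
    ranked += [(i, 0.0) for i in structured_ids - scored]
    if backfill:  # optionally append fulltext-only hits
        ranked += [(i, s) for i, s in fulltext_ranking
                   if i not in structured_ids]
    return ranked
```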
9.4. Notes on Related Work

We would also like to discuss some of our findings regarding the related work. Since our goal was to design a fully functional real-world system, many details had to be taken into account. However, from our point of view, in some related work these details were ignored or hidden. Firstly, many systems operate either on small ontologies [10, 9] or on simple domains [55, 17]; it therefore cannot be assumed how these approaches would perform on decent-scale ontologies. Secondly, the natural language queries used for the development and testing of some systems were mostly collected by the research teams themselves or by their students [17, 56, 13, 14, 57, 26]. In our opinion, a different way of obtaining queries should be considered in a real-world scenario to avoid any potential skewness in the data. Thirdly, many systems rely on full syntactic processing of the queries and on a heuristic processing of the semantics in their NLU component [9, 58, 10, 11]. Although syntactic parsing combined with hand-crafted rules has been a widely used approach to NLU, in some cases it may not be a feasible solution, e.g. when a syntactic parser is not available for the particular language. Finally, an exhaustive evaluation and a realistic discussion of the limitations are missing in some systems [59, 14].

9.5. Main Contributions

This article describes an attempt to develop a fully functional semantic search system with a natural language interface. The following list summarizes the main contributions.

The complete accommodation web portal was converted into an OWL ontology which allows a geospatial search using an automatically-created hierarchy of locations. We also identified many practical problems that are likely to arise when a system must deal with noisy data obtained from the Web.

A corpus of natural language queries in the Czech language was collected using a social media-based approach.

We proposed a statistical semantic model for the semantic analysis of natural language queries. It is a language-independent model which is trained from labeled data and does not depend on syntactic parsing.

We created an evaluation dataset on which our system was tested. This dataset may be a valuable resource for any related research on natural language interfaces operating on ontologies or serving as a front-end to information retrieval systems. We offer this dataset and two other corpora under a licence which allows both research and commercial use.

Acknowledgments

This work was supported by grant no. SGS-2010-028 and by the European Regional Development Fund (ERDF), project "NTIS – New Technologies for the Information Society", European Centre of Excellence, CZ.1.05/1.1.00/02.0090.

Appendix A. Query Examples

Here are a few examples of NL queries in Czech, with their English translations, from the NL query corpus as introduced in section 4.1.

• Ráda bych se ubytovala v Karlštejně s výhledem na hrad a královské vinice, nejlépe s polopenzí, parkováním a úschovnou kol, pro 2 lidi od pátku do neděle. — I'd like accommodation for two people in Karlštejn with a view of the castle and the royal vineyards, preferably with half-board, parking and bike storage, from Friday till Sunday.

• Jsme dva a chceme strávit jarní prázdniny v malebném hotýlku nebo apartmánu v Beskydech, blízko modré sjezdovky, předpokládáme lyžárnu, vyžadujeme polopenzi. Bazén, pingpong, wellness a podobné legrácky potěší, ale nejsou nezbytné. — Two persons want to spend the spring holidays in a picturesque little hotel or apartment in the Beskydy Mountains, close to a blue ski slope. We expect a ski storage room and require half-board. A swimming pool, a ping-pong table and similar extras will be a plus, but are not necessary.

• Hledám levné víkendové ubytování pro 3 odoby v Jindřichově Hradci nebo blízkém okolí, v blízkosti půjčovny jízdních kol nebo s možností uskladnění vlastních, pokoj nejlépe s krbem a možností připojení na internet, vlastní sociální zařízení podmínkou. — I'm looking for cheap weekend accommodation for 3 persons in Jindřichův Hradec or its close surroundings, near a bike rental or with the possibility of storing our own bikes; a room preferably with a fireplace and an Internet connection; a private bathroom is a must.

• Hledám hotel na pořádání mezinárodní konference dne 25. 10. Hotel musí mít sál pro minimálně 100 lidí, wifi pokrytí, možnost uspořádání večeře, a minimálně 10 dvůjlůžkových pokojů. Upřednostňuji hotel s vlastní prezentační technikou. — I'm looking for a hotel suitable for hosting an international conference on October 25. The hotel must have a hall for at least 100 people, Wi-Fi coverage, the option of arranging a dinner, and at least 10 double-bed rooms. I prefer a hotel with its own presentation equipment.

Appendix B. Complete NL Query Annotation Example

Please refer to Figure B.9.

Figure B.9: This example displays a complete semantic annotation of a query from the accommodation corpus. The query can be translated as "I would like one double-bed room and two four-bed rooms including breakfast somewhere in Pec pod Sněžkou or Harrachov from 25. 12. till 1. 1.". We added the translation to the corresponding tokens for easy understanding.

References

[1] M. L. Wilson, B. Kules, m. c. schraefel, B. Shneiderman, From keyword search to exploration: Designing future search interfaces for the web, Foundations and Trends in Web Science 2 (1) (2010) 1–97.

[2] E. Demidova, P. Fankhauser, X. Zhou, W. Nejdl, DivQ: diversification for keyword search over structured databases, in: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '10, ACM, New York, NY, USA, 2010, pp. 331–338.

[3] H. He, W. Meng, C. Yu, Z. Wu, Constructing interface schemas for search interfaces of web databases, in: A. Ngu, M. Kitsuregawa, E. Neuhold, J.-Y. Chung, Q. Sheng (Eds.), Web Information Systems Engineering – WISE 2005, Vol. 3806 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2005, pp. 29–42.
[4] E. Kaufmann, A. Bernstein, How useful are natural language interfaces to the semantic web for casual end-users?, in: Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference, ISWC'07/ASWC'07, Springer-Verlag, Berlin, Heidelberg, 2007, pp. 281–294.

[5] E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF, W3C Recommendation (2008).

[6] K. Elbedweihy, S. N. Wrigley, F. Ciravegna, Evaluating semantic search query approaches with expert and casual users, in: 11th International Semantic Web Conference (ISWC 2012), Boston, USA, 2012.

[7] K. Elbedweihy, S. N. Wrigley, F. Ciravegna, D. Reinhard, A. Bernstein, Evaluating semantic search systems to identify future directions of research, in: R. García-Castro, N. Lyndon, S. N. Wrigley (Eds.), Second International Workshop on Evaluation of Semantic Technologies, no. 843 in CEUR Workshop Proceedings, 2012, pp. 25–36.

[8] V. Lopez, M. Fernández, E. Motta, N. Stieler, PowerAqua: Supporting users in querying and exploring the Semantic Web, Semantic Web (2011) 1–17.

[9] V. Lopez, V. Uren, E. Motta, M. Pasin, AquaLog: An ontology-driven question answering system for organizational semantic intranets, Web Semantics 5 (2) (2007) 72–105.

[10] P. Cimiano, P. Haase, J. Heizmann, M. Mantel, R. Studer, Towards portable natural language interfaces to knowledge bases – the case of the ORAKEL system, Data and Knowledge Engineering 65 (2) (2008) 325–354.

[11] D. Damljanovic, M. Agatonovic, H. Cunningham, Natural language interfaces to ontologies: Combining syntactic analysis and ontology-based lookup through the user interaction, in: L. Aroyo, G. Antoniou, E. Hyvönen, A. ten Teije, H. Stuckenschmidt, L. Cabral, T. Tudorache (Eds.), The Semantic Web: Research and Applications, Springer Berlin / Heidelberg, 2010, pp. 106–120.

[12] Cunningham et al., Text Processing with GATE (Version 6), University of Sheffield Department of Computer Science (April 2011).

[13] L. R. Tang, R. J. Mooney, Using multiple clause constructors in inductive logic programming for semantic parsing, in: Proceedings of the 12th European Conference on Machine Learning, Springer-Verlag, London, UK, 2001, pp. 466–477.

[14] C. Wang, M. Xiong, Q. Zhou, Y. Yu, PANTO: A portable natural language interface to ontologies, in: Proceedings of the 4th European Conference on The Semantic Web: Research and Applications, ESWC '07, Springer-Verlag, Berlin, Heidelberg, 2007, pp. 473–487.

[15] E. Kaufmann, A. Bernstein, L. Fischer, NLP-Reduce: A "naïve" but domain-independent natural language interface for querying ontologies, in: 4th European Semantic Web Conference (ESWC 2007), 2007, pp. 1–2.

[16] Ó. Ferrández, R. Izquierdo, S. Ferrández, J. L. Vicedo, Addressing ontology-based question answering with collections of user queries, Information Processing & Management 45 (2) (2009) 175–188.

[17] A. Frank, H.-U. Krieger, F. Xu, H. Uszkoreit, B. Crysmann, B. Jörg, U. Schäfer, Question answering from structured knowledge sources, Journal of Applied Logic 5 (1) (2007) 20–48.

[18] C. Pollard, I. A. Sag, Information-based syntax and semantics: Vol. 1: fundamentals, Center for the Study of Language and Information, Stanford, CA, USA, 1988.
[19] A. Copestake, D. Flickinger, C. Pollard, I. Sag, Minimal recursion semantics: An introduction, Research on Language and Computation 3 (2005) 281–332.

[20] D. Damljanovic, V. Tablan, K. Bontcheva, A text-based query interface to OWL ontologies, in: N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odjik, S. Piperidis, D. Tapias (Eds.), Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), European Language Resources Association (ELRA), Marrakech, Morocco, 2008, pp. 361–375.

[21] G. Zenz, X. Zhou, E. Minack, W. Siberski, W. Nejdl, From keywords to semantic queries – incremental query construction on the semantic web, Web Semantics 7 (3) (2009) 166–176.

[22] S. Kara, Ö. Alan, O. Sabuncu, S. Akpınar, N. K. Cicekli, F. N. Alpaslan, An ontology-based retrieval system using semantic indexing, Information Systems 37 (4) (2012) 294–305.

[23] B. Fazzinga, T. Lukasiewicz, Semantic search on the web, Semantic Web 1 (1-2) (2010) 89–96.

[24] D. Damljanovic, V. Devedžic, Semantic Web and E-Tourism, in: M. Khosrow-Pour (Ed.), Encyclopedia of Information Science and Technology, Second Edition, IGI Global, 2009, pp. 3426–3432.

[25] H. Knublauch, Ontology-Driven Software Development in the Context of the Semantic Web: An Example Scenario with Protege/OWL, in: D. S. Frankel, E. F. Kendall, D. L. Mcguinness (Eds.), 1st International Workshop on the Model-Driven Semantic Web (MDSW2004), 2004.

[26] J. Ruiz-Martínez, D. Castellanos-Nieves, R. Valencia-García, J. Fernández-Breis, F. García-Sánchez, P. Vivancos-Vicente, J. Castejón-Garrido, J. Camón, R. Martínez-Béjar, Accessing touristic knowledge bases through a natural language interface, in: D. Richards, B.-H. Kang (Eds.), Knowledge Acquisition: Approaches, Algorithms and Applications, Springer Berlin / Heidelberg, 2009, pp. 147–160.

[27] D. Yoo, Hybrid query processing for personalized information retrieval on the semantic web, Knowledge-Based Systems 27 (2012) 211–218.

[28] J. Scicluna, N. Steinmetz, Modelling e-tourism services and bundles, in: R. Law, M. Fuchs, F. Ricci (Eds.), Information and Communication Technologies in Tourism 2011, Springer Vienna, 2011, pp. 403–415.

[29] J. Scicluna, N. Steinmetz, M. Zaremba, Service bundling with seekda! dynamic shop, in: U. Gretzel, R. Law, M. Fuchs (Eds.), Information and Communication Technologies in Tourism 2010, Springer Vienna, 2010, pp. 369–380.

[30] D. Damljanović, K. Bontcheva, Towards enhanced usability of natural language interfaces to knowledge bases, in: V. Devedžić, D. Gašević (Eds.), Web 2.0 & Semantic Web, Springer US, 2009, pp. 105–133.

[31] P. Cimiano, M. Minock, Natural language interfaces: What is the problem? – a data-driven quantitative analysis, in: H. Horacek, E. Métais, R. Muñoz, M. Wolska (Eds.), Natural Language Processing and Information Systems, Springer Berlin / Heidelberg, 2010, pp. 192–206.

[32] I. Habernal, M. Konopík, Semantic annotation for the LingvoSemantics project, in: Proceedings of the 12th International Conference on Text, Speech and Dialogue, TSD '09, Springer-Verlag, Berlin, Heidelberg, 2009, pp. 299–306.

[33] I. Habernal, Semantic web search using natural language, Ph.D. thesis, University of West Bohemia, Faculty of Applied Sciences (2012).
[34] I. Habernal, M. Konopík, On the way towards standardized semantic corpora for development of semantic analysis systems, in: SEMAPRO 2010: The Fourth International Conference on Advances in Semantic Processing, IARIA, 2010, pp. 96–99.

[35] A. Kiryakov, B. Popov, I. Terziev, D. Manov, D. Ognyanoff, Semantic annotation, indexing, and retrieval, Web Semantics 2 (1) (2004) 49–79.

[36] P. Cimiano, S. Handschuh, S. Staab, Towards the self-annotating web, in: Proceedings of the 13th International Conference on World Wide Web, WWW '04, ACM, New York, NY, USA, 2004, pp. 462–471.

[37] L. Ding, T. Finin, A. Joshi, R. Pan, R. S. Cost, Y. Peng, P. Reddivari, V. Doshi, J. Sachs, Swoogle: a search and metadata engine for the semantic web, in: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, CIKM '04, ACM, New York, NY, USA, 2004, pp. 652–659.

[38] D. Damljanovic, F. Amardeilh, K. Bontcheva, CA Manager framework: creating customised workflows for ontology population and semantic annotation, in: Y. Gil, N. F. Noy (Eds.), K-CAP, ACM, 2009, pp. 177–178.

[39] J. Hajič, J. Panevová, E. Hajičová, P. Sgall, P. Pajas, J. Štěpánek, J. Havelka, M. Mikulová, Prague Dependency Treebank 2.0, Linguistic Data Consortium, Philadelphia (2006).

[40] M. Konkol, M. Konopík, Maximum entropy named entity recognition for Czech language, in: Proceedings of the 14th International Conference on Text, Speech and Dialogue, TSD '11, Springer-Verlag, Berlin, Heidelberg, 2011, pp. 203–210.

[41] I. Habernal, M. Konopík, Active tags for semantic analysis, in: Proceedings of the 11th International Conference on Text, Speech and Dialogue, TSD '08, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 69–76.

[42] M. Konopík, I. Habernal, Hybrid semantic analysis, in: Proceedings of the 12th International Conference on Text, Speech and Dialogue, TSD '09, Springer-Verlag, Berlin, Heidelberg, 2009, pp. 307–314.

[43] W. E. Winkler, String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage, in: Proceedings of the Section on Survey Research (American Statistical Association), 1990, pp. 354–359.

[44] B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, 2nd Edition, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2011.

[45] C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, New York, NY, USA, 2008.

[46] D. Maynard, W. Peters, Y. Li, Evaluating evaluation metrics for ontology-based applications: Infinite reflection, in: LREC, European Language Resources Association, 2008.

[47] K.-C. Tai, The tree-to-tree correction problem, J. ACM 26 (3) (1979) 422–433.

[48] L. Dolamic, J. Savoy, Indexing and stemming approaches for the Czech language, Information Processing & Management 45 (6) (2009) 714–720.

[49] D. Dahl, M. Bates, M. Brown, W. Fisher, K. Hunicke-Smith, D. Pallett, C. Pao, A. Rudnicky, E. Shriberg, J. Garofolo, J. Fiscus, D. Danielson, E. Bocchieri, B. Buntschuh, B. Schwartz, S. Peters, R. Ingria, R. Weide, Y. Chang, E. Thayer, L. Hirschman, J. Polifroni, B. Lund, G. Kawai, T. Kuhn, L. Norton, ATIS3 Test Data, Linguistic Data Consortium, Philadelphia (1995).

[50] Y. He, S. Young, Semantic processing using the hidden vector state model, Computer Speech & Language 19 (1) (2005) 85–106.

[51] E. Iosif, A. Potamianos, A soft-clustering algorithm for automatic induction of semantic classes, in: Interspeech-07, Antwerp, Belgium, 2007, pp. 1609–1612.
[52] M. Jeong, G. G. Lee, Practical use of non-local features for statistical spoken language understanding, Computer Speech and Language 22 (2) (2008) 148–170.

[53] C. Raymond, G. Riccardi, Generative and discriminative algorithms for spoken language understanding, in: Interspeech-07, Antwerp, Belgium, 2007, pp. 1605–1608.

[54] I. Habernal, M. Konopík, O. Rohlík, Question Answering, in: C. Jouis, I. Biskri, J.-G. Ganascia, M. Roux (Eds.), Next Generation Search Engines: Advanced Models for Information Retrieval, IGI Global, 2012, in press.

[55] V. Tablan, D. Damljanovic, K. Bontcheva, A natural language query interface to structured information, in: Proceedings of the 5th European Semantic Web Conference on The Semantic Web: Research and Applications, ESWC'08, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 361–375.

[56] J. Ruiz-Martínez, D. Castellanos-Nieves, R. Valencia-García, J. Fernández-Breis, F. García-Sánchez, P. Vivancos-Vicente, J. Castejón-Garrido, J. Camón, R. Martínez-Béjar, Accessing touristic knowledge bases through a natural language interface, in: D. Richards, B.-H. Kang (Eds.), Knowledge Acquisition: Approaches, Algorithms and Applications, Vol. 5465 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2009, pp. 147–160.

[57] J. Lim, K.-H. Lee, Constructing composite web services from natural language requests, Web Semantics 8 (1) (2010) 1–13.

[58] N. Stratica, L. Kosseim, B. C. Desai, Using semantic templates for a natural language interface to the CINDI virtual library, Data & Knowledge Engineering 55 (1) (2005) 4–19.

[59] M. Gao, J. Liu, N. Zhong, F. Chen, C. Liu, Semantic mapping from natural language questions to OWL queries, Computational Intelligence 27 (2) (2011) 280–314.