Large-scale Semantic Parsing without Question-Answer Pairs Siva Reddy, Mirella Lapata, Mark Steedman School of Informatics, University of Edinburgh 10 Crichton Street, Edinburgh EH8 9AB siva.reddy@ed.ac.uk, mlap@inf.ed.ac.uk, steedman@inf.ed.ac.uk Abstract In this paper we introduce a novel semantic parsing approach to query Freebase in natu- ral language without requiring manual anno- tations or question-answer pairs. Our key in- sight is to represent natural language via se- mantic graphs whose topology shares many commonalities with Freebase. Given this rep- resentation, we conceptualize semantic pars- ing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Free- base guided by denotations as a form of weak supervision. Evaluation experiments on a sub- set of the FREE917 and WEBQUESTIONS benchmark datasets show our semantic parser improves over the state of the art. 1 Introduction Querying a database to retrieve an answer, telling a robot to perform an action, or teaching a com- puter to play a game are tasks requiring communi- cation with machines in a language interpretable by them. Semantic parsing addresses the specific task of learning to map natural language (NL) to machine interpretable formal meaning representations. Tra- ditionally, sentences are converted into logical form grounded in the symbols of some fixed ontology or relational database. Approaches for learning semantic parsers have been for the most part supervised, using annotated training data consisting of sentences and their cor- responding logical forms (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Wong and Mooney, 2007; Kwiatkowski et al., 2010). More recently, al- ternative forms of supervision have been proposed to alleviate the annotation burden, e.g., by learn- ing from conversational logs (Artzi and Zettlemoyer, 2011), from sentences paired with system behav- ior (Chen and Mooney, 2011; Goldwasser and Roth, Question What is the capital of Texas? Logical Form λx. city(x)∧capital(x,Texas) Answer {Austin} Figure 1: An example question with annotated logi- cal query, and its answer. 2011; Artzi and Zettlemoyer, 2013), via distant su- pervision (Krishnamurthy and Mitchell, 2012; Cai and Yates, 2013), from questions (Goldwasser et al., 2011; Poon, 2013; Fader et al., 2013), and question- answer pairs (Clarke et al., 2010; Liang et al., 2011). Indeed, methods which learn from question-answer pairs have been gaining momentum as a means of scaling semantic parsers to large, open-domain problems (Kwiatkowski et al., 2013; Berant et al., 2013; Berant and Liang, 2014; Yao and Van Durme, 2014). Figure 1 shows an example of a question, its annotated logical form, and answer (or denotation). In this paper, we build a semantic parser that does not require example annotations or question-answer pairs but instead learns from a large knowledge base (KB) and web-scale cor- pora. Specifically, we exploit Freebase, a large community-authored knowledge base that spans many sub-domains and stores real world facts in graphical format, and parsed sentences from a large corpus. We formulate semantic parsing as a graph matching problem. We convert the output of an open-domain combinatory categorial gram- mar (CCG) parser (Clark and Curran, 2007) into a graphical representation and subsequently map it onto Freebase. 
The parser’s graphs (also called un- grounded graphs) are mapped to all possible Free- base subgraphs (also called grounded graphs) by re- placing edges and nodes with relations and types in Freebase. Each grounded graph corresponds to a unique grounded logical query. During learning, our semantic parser is trained to identify which KB sub- graph best corresponds to the NL graph. Problem- capital(Austin)∧UNIQUE(Austin)∧capital.of.arg1(e,Austin)∧capital.of.arg2(e,Texas) (a) Semantic parse of the sentence Austin is the capital of Texas. capital capital unique Austin e Texas ty p e capital. of.arg1 capital. of.arg2 unique(Austin) ∧ capital(Austin)∧ capital.of.arg1(e, Austin) ∧ capital.of.arg2(e, Texas) (b) Ungrounded graph for semantic parse (a); UNIQUE means that Austin is the only capital of Texas. capital capital target x e Texas ty p e capital. of.arg1 capital. of.arg2 target(x) ∧ capital(x) ∧ capital.of.arg1(e, x) ∧ capital.of.arg2(e, Texas) {AUSTIN} (c) Query graph after removing Austin from graph (b) and its denotation. location .city capital target x m Texas ty p e location. capital.arg1 location. capital.arg2 target(x) ∧ location.city(x) ∧ location.capital.arg1(m, x) ∧ location.capital.arg2(m, Texas) location .city capital target x n Texas ty p e location. containedby.arg1 location. containedby.arg2 target(x) ∧ location.city(x) ∧ location.containedby.arg1(n, x) ∧ location.containedby.arg2(n, Texas) {AUSTIN} {AUSTIN, DALLAS, HOUSTON ... } (d) Freebase graphs for NL graph (c) and their denotations. Figure 2: Steps involved in converting a natural language sentence to a Freebase grounded graph. atically, ungrounded graphs may give rise to many grounded graphs. Since we do not make use of manual annotations of sentences or question-answer pairs, we do not know which grounded graphs are correct. To overcome this, we rely on comparisons between denotations of natural language queries and related Freebase queries as a form of weak supervi- sion in order to learn the mapping between NL and KB graphs. Figure 2 illustrates our approach for the sentence Austin is the capital of Texas. From the CCG syn- tactic derivation (which we omit here for the sake of brevity) we obtain a semantic parse (Figure 2a) and convert it to an ungrounded graph (Figure 2b). Next, we select an entity from the graph and replace it with a variable x, creating a graph corresponding to the query What is the capital of Texas? (Figure 2c). The math function UNIQUE on Austin in Figure 2b indi- cates Austin is the only value of x which can satisfy the query graph in Figure 2c. Therefore, the denota- tion1 of the NL query graph is {AUSTIN}. Figure 2d shows two different groundings of the query graph in the Freebase KB. We obtain these by replacing edges and nodes in the query graph with Freebase relations and types. We use the denotation of the NL query as a form of weak supervision to select the best grounded graph. Under the constraint that the denotation of a Freebase query should be the same as the denotation of the NL query, the graph on the left hand-side of Figure 2d is chosen as the correct grounding. Experimental results on two benchmark datasets consisting of questions to Freebase — FREE917 (Cai and Yates, 2013) and WEBQUESTIONS (Berant 1The denotation of a graph is the set of feasible values for the nodes marked with TARGET. et al., 2013) — show that our semantic parser im- proves over state-of-the-art approaches. 
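To make the weak supervision concrete, the following sketch is a toy illustration in Python, not the actual system: the knowledge base is a hand-built set of triples, the mediator/argument structure of Figure 2d is collapsed into single relation labels, and denotations are computed by set comprehension rather than SPARQL. It selects the grounding whose denotation agrees with the denotation of the natural language query.

```python
# Toy knowledge base: (subject, relation, object) triples standing in for the
# two candidate groundings in Figure 2d (labels are simplified, not real
# Freebase identifiers with mediator nodes).
KB = {
    ("Austin", "location.capital", "Texas"),
    ("Austin", "location.containedby", "Texas"),
    ("Dallas", "location.containedby", "Texas"),
    ("Houston", "location.containedby", "Texas"),
}

def denotation(relation, value):
    """All x such that (x, relation, value) holds in the toy KB."""
    return {x for (x, r, v) in KB if r == relation and v == value}

# Denotation of the NL query graph (Figure 2c), licensed by UNIQUE in Figure 2b.
nl_denotation = {"Austin"}

# Candidate groundings of the ungrounded edge capital.of; keep the one whose
# denotation matches the NL denotation.
candidates = ["location.capital", "location.containedby"]
surrogate_gold = [r for r in candidates if denotation(r, "Texas") == nl_denotation]
print(surrogate_gold)   # ['location.capital']
```

The learning procedure of Section 3 generalizes this comparison to full graphs and uses the selected groundings as a surrogate gold standard.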
Our contri- butions include: a novel graph-based method to con- vert natural language sentences to grounded seman- tic parses which exploits the similarities in the topol- ogy of knowledge graphs and linguistic structure, to- gether with the ability to train using a wide range of features; a proposal to learn from a large scale web corpus, without question-answer pairs, based on denotations of queries from natural language statements as weak supervision; and the develop- ment of a scalable semantic parser which besides Freebase uses CLUEWEB09 for training, a corpus of 503.9 million webpages. Our semantic parser can be downloaded from http://sivareddy.in/ downloads. 2 Framework Our goal is to build a semantic parser which maps a natural language sentence to a logical form that can be executed against Freebase. We begin with CLUEWEB09, a web-scale corpus automatically an- notated with Freebase entities (Gabrilovich et al., 2013). We extract the sentences containing at least two entities linked by a relation in Freebase. We parse these sentences using a CCG syntactic parser, and build semantic parses from the syntactic out- put. Semantic parses are then converted to seman- tic graphs which are subsequently grounded to Free- base. Grounded graphs can be easily converted to a KB query deterministically. During training we learn which grounded graphs correspond best to the natural language input. In the following, we pro- vide a brief introduction to Freebase and its graph structure. Next, we explain how we obtain seman- tic parses from CCG (Section 2.2), how we convert them to graphs (Section 2.3), and ground them in Freebase (Section 2.4). Section 3 presents our learn- ing algorithm. 2.1 The Freebase Knowledge Graph Freebase consists of 42 million entities and 2.5 bil- lion facts. A fact is defined by a triple containing two entities and a relation between them. Entities represent real world concepts, and edges represent relations, thus forming a graph-like structure. A Freebase subgraph is shown in Figure 3 with usa natasha obama p q US president r s Columbia University m Barack Obama n Michelle Obama m m n n education .university Bachelor of Arts 1992 h e a d q u a rt e rs .c o u n tr y h e a d q u a rt e rs .o rg a n is a ti o n person. nationality.arg2 person. nationality.arg1 pe rs on . pa re nt s. ar g2 pe rs on . pa re nt s. ar g1 p e rso n . p a re n ts.a rg 2 p e rso n . p a re n ts.a rg 1 education. institution education .student marriage .spouse marriage .spouse education .institution education .degree e d u c a ti o n .d e g re e e d u c a ti o n .s tu d e n t m arriage .from m arriage .spouse m a rr ia g e .f ro m m a rr ia g e .s p o u se ty p e ty p e Figure 3: Freebase knowledge graph. Entities are represented by rectangles, relations between enti- ties by edges, mediator nodes by circles, types by rounded rectangles. rectangles denoting entities. In addition to sim- ple facts, Freebase encodes complex facts, repre- sented by multiple edges (e.g., the edges connect- ing BARACK OBAMA, COLUMBIA UNIVERSITY and BACHELOR OF ARTS). Complex facts have in- termediate nodes called mediator nodes (circles in Figure 3 with the same identifiers e.g., m and n). For reasons of uniformity, we assume that simple facts are also represented via mediator nodes and split single edges into two with each subedge going from the mediator node to the target node (see per- son.nationality.arg1 and person.nationality.arg2 in Fig- ure 3). Finally, Freebase also has entity types defin- ing is-a relations. 
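As a rough illustration only (a hand-rolled toy encoding, not the Freebase data model or its API; the argument order shown is illustrative), the facts of Figure 3 can be pictured as mediator identifiers pointing to lists of subedges, each pairing a relation-argument label with a participant entity:

```python
# Toy encoding of a Freebase fragment (Figure 3): every fact, simple or
# complex, hangs off a mediator node, and each subedge links the mediator to
# one participant via a relation-argument label.
freebase_subgraph = {
    # simple fact, split into two subedges through mediator "p"
    "p": [("person.nationality.arg1", "BARACK OBAMA"),
          ("person.nationality.arg2", "USA")],
    # complex fact: a single mediator with more than two subedges
    "m": [("education.student", "BARACK OBAMA"),
          ("education.institution", "COLUMBIA UNIVERSITY"),
          ("education.degree", "BACHELOR OF ARTS")],
}

# is-a relations (entity types)
entity_types = {"BARACK OBAMA": ["US president"],
                "COLUMBIA UNIVERSITY": ["education.university"]}

# list every fact BARACK OBAMA participates in
for mediator, subedges in freebase_subgraph.items():
    if any(entity == "BARACK OBAMA" for _, entity in subedges):
        print(mediator, subedges)
```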
In Figure 3 types are represented by rounded rectangles (e.g., BARACK OBAMA is of type US president, and COLUMBIA UNIVERSITY is of type education.university). 2.2 Combinatory Categorial Grammar The graph like structure of Freebase inspires us to create a graph like structure for natural lan- guage, and learn a mapping between them. To do this we take advantage of the representational power of Combinatory Categorial Grammar (Steed- man, 2000). CCG is a linguistic formalism that tightly couples syntax and semantics, and can be used to model a wide range of language phenom- Cameron directed Titanic NP (S\NP)/NP NP Cameron λyλx. directed.arg1(e,x) Titanic ∧ directed.arg2(e,y) > S\NP λx. directed.arg1(e,x) ∧ directed.arg2(e, Titanic) < S directed.arg1(e, Cameron)∧ directed.arg2(e, Titanic) 1 Figure 4: CCG derivation containing both syntactic and semantic parse construction. ena. CCG is well known for capturing long-range dependencies inherent in constructions such as co- ordination, extraction, raising and control, as well as standard local predicate-argument dependencies (Clark et al., 2002), thus supporting wide-coverage semantic analysis. Moreover, due to the transparent interface between syntax and semantics, it is rela- tively straightforward to built a semantic parse for a sentence from its corresponding syntactic derivation tree (Bos et al., 2004). In our case, the choice of syntactic parser is moti- vated by the scale of our problem; the parser must be broad-coverage and robust enough to handle a web-sized corpus. For these reasons, we rely on the C&C parser (Clark and Curran, 2004), a general- purpose CCG parser, to obtain syntactic deriva- tions. To our knowledge, we present the first at- tempt to use a CCG parser trained on treebanks for grounded semantic parsing. Most previous work has induced task-specific CCG grammars (Zettlemoyer and Collins, 2005, 2007; Kwiatkowski et al., 2010). An example CCG derivation is shown in Figure 4. Semantic parses are constructed from syntac- tic CCG parses, with semantic composition be- ing guided by the CCG syntactic derivation.2 We use a neo-Davidsonian (Parsons, 1990) semantics to represent semantic parses.3 Each word has a semantic category based on its syntactic cate- gory and part of speech. For example, the syn- tactic category for directed is (S\NP)/NP, i.e., it 2See Bos et al. (2004) for a detailed introduction to semantic representation using CCG. 3Neo-Davidsonian semantics is a form of first-order logic that uses event identifiers (e) to connect verb predicates and their subcategorized arguments through conjunctions. Titanic e Cameron directed e e 1997 dir ec ted .ar g1 dir ect ed .ar g2 d ire c te d .in d ire c te d .a rg 2 directed.arg1 directed.in directed.arg1(e, Cameron) ∧ directed.arg2(e, Titanic) ∧ directed.in(e, 1997) (a) Ungrounded graph Titanic m Cameron directed n 1997 film .di rec ted by .ar g2 film .di rec ted by .ar g1 fi lm .in itia l re le a se d a te .a rg 2 fi lm .in itia l re le a se d a te .a rg 1 film.directed by.arg2(m, Cameron) ∧ film.directed by.arg1(m, Titanic) ∧ film.initial release date.arg1(n, Titanic) ∧ film.initial release date.arg2(n, 1997) (b) Grounded graph Figure 5: Graph representations for the sentence Cameron directed Titanic in 1997. takes two argument NPs and becomes S. To rep- resent its semantic category, we use a lambda term λyλx. directed.arg1(e,x)∧ directed.arg2(e,y), where e identifies the event of directed, and x and y are arguments corresponding to the NPs in the syn- tactic category. 
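The following Python fragment is an illustrative rendering (not the authors' implementation) of this compositional step for the derivation in Figure 4: the semantic category of directed is a curried function which, applied first to its object and then to its subject, produces the neo-Davidsonian conjunction at the root of the derivation.

```python
# Illustrative composition for "Cameron directed Titanic" (Figure 4).
# A semantic parse is represented as a set of predicate tuples.

def entity(name):
    # NP : the referent of a proper name (simplified)
    return name

def directed(e="e"):
    # (S\NP)/NP : λy λx. directed.arg1(e, x) ∧ directed.arg2(e, y)
    return lambda y: lambda x: {("directed.arg1", e, x),
                                ("directed.arg2", e, y)}

vp = directed()(entity("Titanic"))        # forward application:  S\NP
sentence = vp(entity("Cameron"))          # backward application: S
print(sorted(sentence))
# [('directed.arg1', 'e', 'Cameron'), ('directed.arg2', 'e', 'Titanic')]
```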
We obtain semantic categories automatically us- ing the indexed syntactic categories provided by the C&C parser. The latter reveal the bind- ings of basic constituent categories in more com- plex categories. For example, in order to convert ((S\NP)\(S\NP))/NP to its semantic category, we must know whether all NPs have the same refer- ent and thus use the same variable name. The in- dexed category ((Se\NPx)\(Se\NPx))/NPy reveals that there are only two different NPs, x and y, and that one of them (i.e., x) is shared across two subcat- egories. We discuss the details of semantic category construction in the Appendix. Apart from n-ary predicates representing events (mostly verbs), we also use unary predicates repre- senting types in language (mostly common nouns and noun modifiers). For example, capital(Austin) indicates Austin is of type capital. Prepositions, ad- jectives and adverbs are represented by predicates lexicalized with their head words to provide more information (see capital.of.arg1 instead of of.arg1 in Figure 2a). 2.3 Ungrounded Semantic Graphs We will now illustrate how we create ungrounded semantic graphs from CCG-derived semantic parses. Figure 5a displays the ungrounded graph for the sen- Who target directed x e The Nutty Professor directed.arg1 directed .arg2 target(x) ∧ directed.arg1(e, x) ∧ directed.arg2(e, TheNuttyProfessor) (a) Who directed The Nutty Professor? capital capital unique Austin e Texas capital .state state ty p e ty p e capital.of .arg1 capital.of .arg2 unique(Austin) ∧ capital(Austin) ∧ capital.state(Austin) ∧ capital.of.arg1(e, Austin) ∧ capital.of.arg2(e, Texas) (b) Austin is the state capital of Texas. Figure 6: Ungrounded graphs with math functions TARGET and UNIQUE. tence Cameron directed Titanic in 1997. In order to construct ungrounded graphs topologically simi- lar to Freebase, we define five types of nodes: Word Nodes (Ovals) Word nodes are denoted by ovals. They represent natural language words (e.g., directed in Figure 5a, capital and state in Fig- ure 6b). Word nodes are connected to other word nodes via syntactic dependencies. For readability, we do not show inter-word dependencies. Entity Nodes (Rectangles) Entity nodes are denoted by rectangles and represent entities e.g., Cameron in Figure 5a. In cases where an entity is not known, we use variables e.g., x in Figure 6a. Entity variables are connected to their correspond- ing word nodes from which they originate by dotted links e.g., x in Figure 6a is connected to the word node who. Mediator Nodes (Circles) Mediator nodes are de- noted by circles and represent events in language. They connect pairs of entities which participate in an event forming a clique (see the entities Cameron, Titanic and 1997 in Figure 5a). We define an edge as a link that connects any two entities via a medi- ator. The subedge of an edge i.e., the link between a mediator and an entity, corresponds to the predi- cate denoting the event and taking the entity as its argument (e.g. directed.arg1 links e and Cameron in Figure 5a). Mediator nodes are connected to their corresponding word nodes from which they origi- nate by dotted links e.g. mediators in Figure 5a are connected to word node directed. Type nodes (Rounded rectangles) Type nodes are denoted by rounded rectangles. They represent unary predicates in natural language. In Figure 6b type nodes capital and capital.state are attached to Austin denoting Austin is of type capital and capi- tal.state. 
Type nodes are also connected to their cor- responding word nodes from which they originate by dotted links e.g. type node capital.state and word node state in Figure 6b. Math nodes (Diamonds) Math nodes are denoted by diamonds. They describe functions to be applied on the nodes/subgraphs they attach to. The function TARGET attaches to the entity variable of interest. For example, the graph in Figure 6a represents the question Who directed The Nutty Professor?. Here, TARGET attaches to x representing the word who. UNIQUE attaches to the entity variable modified by the definite article the. In Figure 6b, UNIQUE at- taches to Austin implying that only Austin satisfies the graph. Finally, COUNT attaches to entity nodes which have to be counted. For the sentence Julie An- drews has appeared in 40 movies in Figure 7, the KB could either link Julie Andrews and 40, with type node movies matching the grounded type integer, or it could link Julie Andrews to each movie she acted in and the count of these different movies add to 40. In anticipation of this ambiguity, we generate two semantic parses resulting in two ungrounded graphs (see Figures 7a and 7b). We generate all possible grounded graphs corresponding to each ungrounded graph, and leave it up to the learning to decide which ones the KB prefers. 2.4 Grounded Semantic Graphs We ground semantic graphs in Freebase by mapping edge labels to relations, type nodes to entity types, and entity nodes to Freebase entities. Math nodes remain unchanged. Though word nodes are not present in Freebase, we retain them in our grounded graphs to extract sophisticated features based on words and grounded predicates. appeared movies movies Julie Andrews e 40 appeared .arg1 appeared.in ty p e appeared.arg1(e, JulieAndrews) ∧ appeared.in(e, 40) ∧ movies(40) (a) Ungrounded Graph appeared movies movies Julie Andrews e z count 40 appeared .arg1 appeared .in ty p e appeared.arg1(e, JulieAndrews) ∧ appeared.in(e, z) ∧ movies(z) ∧ count(z, 40) (b) Alternate Ungrounded Graph appeared film movies Julie Andrews m z count 40 performance .actor performance .film ty p e performance.actor(m, JulieAndrews) ∧ performance.film(m, z) ∧ film(z) ∧ count(z, 40) (c) Grounded graph Figure 7: Graph representations for the sentence Julie Andrews has appeared in 40 movies. Un- grounded graph (a) directly connects Julie Andrews and 40, whereas graph (b) uses the math func- tion COUNT. Ungrounded graph (b) and grounded graph (c) have similar topology. Entity nodes Previous approaches (Cai and Yates, 2013; Berant et al., 2013; Kwiatkowski et al., 2013) use a manual lexicon or heuristics to ground named entities to Freebase entities. Fortunately, CLUEWEB09 sentences have been automatically annotated with Freebase entities, so we use these an- notations to ground proper names to Freebase enti- ties (denoted by uppercase words) e.g., Cameron in Figure 5a is grounded to Freebase entity CAMERON in Figure 5b. Common nouns like movies (see Fig- ure 7b) are left as variables to be instantiated by the entities satisfying the graph. Type nodes Type nodes are grounded to Free- base entity types. Type nodes capital and capi- tal.state in Figure 6b are grounded to all possible types of Austin (e.g., location.city, location.capital city, book.book subject, broadcast.genre). In cases where entity nodes are not grounded, (e.g., z in Figure 7b), employees employees 120000 e Alcoa has e e 2007 type ha s.a rg1 ha s.a rg 2 h a s.in h a s. 
a rg 2 has.arg1 has.in has.arg1(e, Alcoa) ∧ has.arg2(e, 120000) ∧ has.in(e, 2007) ∧ employees(120000) (a) Ungrounded Graph employees type.int 119000 m Alcoa has m m 2007 type em plo ye r.n um be r of em plo ye es .in ve rse m ea su re m en t un it .d at ed in te ge r .n um be r m e a su re m e n t u n it .d a te d in te g e r .y e a r m e a su re m e n t u n it .d a te d in te g e r .n u m b e r employer.number of employees .inverse m easurem ent unit .dated integer .year employer.number of employees.inverse(m, Alcoa) ∧ measurement unit.dated integer.number(m, 119000) ∧ measurement unit.dated integer.year(m, 2007) ∧ type.int(119000) (b) Grounded Graph Figure 8: Graph representations for Alcoa has 120000 employees in 2007. we use an automatically constructed lexicon which maps ungrounded types to grounded ones (see Sec- tion 4.2 for details). Edges An edge between two entities is grounded using all edges linking the two entities in the knowl- edge graph. For example, to ground the edge be- tween Titanic and Cameron in Figure 5, we use the following edges linking TITANIC and CAMERON in Freebase: (film.directed by.arg1, film.directed by.arg2), (film.produced by.arg1, film.produced by.arg2). If only one entity is grounded, we use all possible edges from this grounded entity. If no entity is grounded, we use a mapping lexicon which is automatically created as described in Section 4.2. Given an un- grounded graph with n edges, there are O((k + 1)n) possible grounded graphs, with k being the grounded edges in the knowledge graph for each ungrounded edge together with an additional empty (no) edge. Mediator nodes In an ungrounded graph, media- tor nodes represent semantic event identifiers. In the grounded graph, they represent Freebase fact identi- fiers. Fact identifiers help distinguish if neighboring edges belong to a single complex fact, which may or may not be coextensive with an ungrounded event. In Figure 8a, the edges corresponding to the event identifier e are grounded to a single complex fact in Figure 8b, with the fact identifier m. However, in Figure 5a, the edges of the ungrounded event e are grounded to different Freebase facts, distinguished in Figure 5b by the identifiers m and n. Furthermore, the edge in 5a between CAMERON and 1997 is not grounded in 5b, since no Freebase edge exists be- tween the two entities. We convert grounded graphs to SPARQL queries, but for readability we only show logical expressions. The conversion is deterministic and is exactly the inverse of the semantic parse to graph conversion (Section 2.3). Wherever a node/edge is instantiated with a grounded entity/type/relation in Freebase, we use them in the grounded parse (e.g., type node cap- ital.state in Figure 6b becomes location.capital city). Math function TARGET is useful in retrieving instan- tiations of entity variables of interest (see Figure 6a). 3 Learning A natural language sentence may give rise to several grounded graphs. But only one (or a few) of them will be a faithful representation of the sentence in Freebase. We next describe our algorithm for find- ing the best Freebase graph for a given sentence, our learning model, and the features it uses. 3.1 Algorithm Freebase has a large number of relations and enti- ties, and as a result there are many possible grounded graphs g for each ungrounded graph u. We con- struct and score graphs incrementally, traversing each node in the ungrounded graph and matching its edges and types in Freebase. 
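To make the size of this search space concrete, the sketch below uses a hypothetical edge lexicon (relation names follow the paper's notation but are written with underscores) and enumerates candidate grounded graphs for a simplified version of Figure 5a. Each ungrounded edge is either mapped to one of the Freebase edges linking its two entities or dropped, which yields the O((k+1)^n) candidates noted in Section 2.4.

```python
# Sketch of grounded-graph enumeration (toy lexicon, not the released code).
from itertools import product

# candidate Freebase groundings per ungrounded edge of Figure 5a (simplified)
edge_candidates = {
    ("directed.arg1", "directed.arg2"): [          # Cameron -- Titanic
        ("film.directed_by.arg2", "film.directed_by.arg1"),
        ("film.produced_by.arg2", "film.produced_by.arg1"),
    ],
    ("directed.arg2", "directed.in"): [            # Titanic -- 1997
        ("film.initial_release_date.arg1", "film.initial_release_date.arg2"),
    ],
}

def grounded_graphs(edge_candidates):
    ungrounded_edges = list(edge_candidates)
    # None stands for the additional empty (no) edge
    options = [edge_candidates[e] + [None] for e in ungrounded_edges]
    for choice in product(*options):
        yield dict(zip(ungrounded_edges, choice))

print(sum(1 for _ in grounded_graphs(edge_candidates)))   # (2+1) * (1+1) = 6
```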
Given a NL sentence s, we construct from its CCG syntactic derivation all corresponding ungrounded graphs u. Using a beam search procedure (described in Section 4.2), we find the best scoring graphs (ĝ, û), maximizing over different graph configurations (g, u) of s:

(ĝ, û) = arg max_{g,u} Φ(g, u, s, KB) · θ        (1)

We define the score of (ĝ, û) as the dot product between a high dimensional feature representation Φ = (Φ1, ..., Φm) and a weight vector θ (see Section 3.3 for details on the features we employ).

We estimate the weights θ using the averaged structured perceptron algorithm (Collins, 2002), shown in Algorithm 1.

Algorithm 1: Averaged Structured Perceptron
Input: training sentences {s_i}, i = 1, ..., N
1:  θ ← 0
2:  for t = 1, ..., T do
3:    for i = 1, ..., N do
4:      (ĝ_i, û_i) = arg max_{g_i, u_i} Φ(g_i, u_i, s_i, KB) · θ
5:      if (g_i+, u_i+) ≠ (ĝ_i, û_i) then
6:        θ ← θ + Φ(g_i+, u_i+, s_i, KB) − Φ(ĝ_i, û_i, s_i, KB)
7:  return (1/T) Σ_{t=1}^{T} (1/N) Σ_{i=1}^{N} θ_t^i

The perceptron makes several passes over the sentences, and in each iteration it computes the best scoring (ĝ, û) among the candidate graphs for a given sentence. In line 6, the algorithm updates θ with the difference (if any) between the feature representations of the best scoring graph (ĝ, û) and the gold standard graph (g+, u+). The goal of the algorithm is to rank gold standard graphs higher than any other graphs. The final weight vector θ (line 7) is the average of the weight vectors over T iterations and N sentences, where θ_t^i denotes the weight vector after processing sentence i in iteration t. This averaging procedure avoids overfitting and produces more stable results (Collins, 2002).
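The Python sketch below shows the shape of this training loop; it is a schematic rendering of Algorithm 1, not the released system. The helpers candidate_graphs (candidate grounded/ungrounded graph pairs for a sentence), phi (the feature function of Section 3.3, returning a sparse dictionary) and gold (the surrogate gold graphs of Section 3.2) are assumed placeholders.

```python
# Schematic averaged structured perceptron (Algorithm 1); helper functions
# candidate_graphs, phi, and gold are hypothetical placeholders.
from collections import defaultdict

def sparse_dot(features, theta):
    # dot product between a sparse feature dict and the weight vector
    return sum(v * theta.get(f, 0.0) for f, v in features.items())

def train_perceptron(sentences, candidate_graphs, phi, gold, T=20):
    theta = defaultdict(float)      # current weight vector
    theta_sum = defaultdict(float)  # running sum used for averaging
    seen = 0
    for t in range(T):
        for s in sentences:
            # line 4: best-scoring candidate under the current weights (Eq. 1)
            g_hat, u_hat = max(
                candidate_graphs(s),
                key=lambda gu: sparse_dot(phi(gu[0], gu[1], s), theta))
            g_plus, u_plus = gold[s]   # surrogate gold graph (Section 3.2)
            # lines 5-6: perceptron update towards the gold feature vector
            if (g_plus, u_plus) != (g_hat, u_hat):
                for f, v in phi(g_plus, u_plus, s).items():
                    theta[f] += v
                for f, v in phi(g_hat, u_hat, s).items():
                    theta[f] -= v
            # accumulate weights for the average over T iterations, N sentences
            for f, v in theta.items():
                theta_sum[f] += v
            seen += 1
    return {f: v / seen for f, v in theta_sum.items()}
```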
As we do not make use of question-answer pairs or manual annotations of sentences, gold standard graphs (g+, u+) are not available. In the following, we explain how we approximate them by relying on graph denotations as a form of weak supervision.

3.2 Selecting Surrogate Gold Graphs

Let u be an ungrounded semantic graph of s. We select an entity E in u, replace it with a variable x, and make it a target node. Let u+ represent the resulting ungrounded graph. Next, we obtain all grounded graphs g+ which correspond to u+ such that the denotations [[u+]]_NL = [[g+]]_KB. We use these surrogate graphs g+ as gold standard, and the pairs (u+, g+) for model training. There is considerable latitude in choosing which entity E to replace. This can be done randomly, according to entity frequency, or by some other criterion. We found that substituting the entity with the most connections to other entities in the sentence works well in practice. All the entities that can replace x in u+ to constitute a valid fact in Freebase form the denotation of u+, [[u+]]_NL. While it is straightforward to compute [[g+]]_KB, it is hard to compute [[u+]]_NL because of the mismatch between our natural language semantic representation and the Freebase query language. To ensure that graphs u+ and g+ have the same denotations, we impose the following constraints:

Constraint 1 If the math function UNIQUE is attached to the entity being replaced in the ungrounded graph, we assume the denotation of u+ contains only that entity. For example, in Figure 2b, we replace Austin by x, and thus assume [[u+]]_NL = {AUSTIN} (we also remove the UNIQUE attached to x to exactly mimic the test-time setting). Any grounded graph which results in [[g+]]_KB = {AUSTIN} will be considered a surrogate gold graph. This allows us to learn entailment relations, e.g., capital.of should be grounded to location.capital (left-hand side graph in Figure 2d) and not to location.containedby, which results in all locations in Texas (right-hand side graph in Figure 2d).

Constraint 2 If the target entity node is a number, we select the Freebase graphs with a denotation close to this number. For example, in Figure 8a, if 120,000 is replaced by x, we assume [[u+]]_NL = {120,000}. However, the grounded graph in Figure 8b retrieves [[g+]]_KB = {119,000}. We treat this as correct if β/γ ∈ [0.9, 1.1], where β ∈ [[u+]]_NL and γ ∈ [[g+]]_KB. Integers can either occur directly in relation with an entity as in Figure 8b, or must be enumerated as in Figure 7c.

Constraint 3 If the target entity node is a date, we select the grounded graph which results in the smallest set containing the date, based on the intuition that most sentences in the data describe specific rather than general events.

Constraint 4 If none of the above constraints apply to the target entity E, we know E ∈ [[u+]]_NL, and hence we select the grounded graphs which satisfy E ∈ [[g+]]_KB as surrogate gold graphs.

3.3 Features

Our feature vector Φ(g, u, s, KB) denotes the features extracted from a sentence s and its corresponding graphs u and g with respect to a knowledge base KB. The elements of the vector (φ1, φ2, ...) take integer values denoting the number of times a feature appeared. We devised the following broad feature classes:

Lexical alignments Since ungrounded graphs are similar in topology to grounded graphs, we extract ungrounded and grounded edge and type alignments. So, from graphs 5a and 5b, we obtain the edge alignment φedge(directed.arg1, directed.arg2, film.directed by.arg2, film.directed by.arg1) and the subedge alignments φedge(directed.arg1, film.directed by.arg2) and φedge(directed.arg2, film.directed by.arg1). In a similar fashion we extract type alignments (e.g., φtype(capital, location.city)).

Contextual features In addition to lexical alignments, we also use contextual features which essentially record words or word combinations surrounding grounded edge labels. Feature φevent records an event word and its grounded predicates (e.g., in Figure 7c we extract the features φevent(appear, performance.film) and φevent(appear, performance.actor)). Feature φarg records a predicate and its argument words (e.g., φarg(performance.film, movie) in Figure 7c). Word combination features are extracted from the parser’s dependency output. The feature φdep records a predicate and the dependencies of its event word (e.g., from the grounded version of Figure 6b we extract the features φdep(location.state.capital.arg1, capital, state) and φdep(location.state.capital.arg2, capital, state)). Using such features, we are able to handle multiword predicates.

Lexical similarity We count the number of word stems (obtained with the Porter stemmer) shared by grounded and ungrounded edge labels, e.g., in Figure 5, directed.arg1 and film.directed by.arg2 have one stem overlap (ignoring the argument labels arg1 and arg2). For a grounded graph, we compute φstem, the aggregate stem overlap count over all its grounded and ungrounded edge labels. We did not incorporate WordNet/Wiktionary-based lexical similarity features, but these were found fruitful in Kwiatkowski et al. (2013). We also have a feature for the stem overlap count between the grounded edge labels and the context words.

Graph connectivity features These features penalize graphs with non-standard topologies. For example, we do not want a final graph with no edges. The feature value φhasEdge is one if there exists at least one edge in the graph. We also have a feature φnodeCount for counting the number of connected nodes in the graph. Finally, feature φcolloc captures the collocation of grounded edges (e.g., edges belonging to a single complex fact are likely to co-occur; see Figure 8b).
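As a toy illustration of two of these feature classes (the feature names and the crude stemmer below are simplified stand-ins, not the system's actual feature templates):

```python
# Toy extraction of alignment and stem-overlap features from aligned edge labels.
import re
from collections import Counter

def stem(word):
    # crude stand-in for the Porter stemmer used by the authors
    return word.lower().rstrip("eds")

def extract_features(aligned_edges):
    """aligned_edges: list of (ungrounded_label, grounded_label) pairs."""
    feats = Counter()
    for u_label, g_label in aligned_edges:
        feats[("edge", u_label, g_label)] += 1                     # phi_edge
        u_stems = {stem(w) for w in re.split(r"[._]", u_label)}
        g_stems = {stem(w) for w in re.split(r"[._]", g_label)}
        feats[("stem_overlap",)] += len(u_stems & g_stems)         # phi_stem
    return feats

print(extract_features([("directed.arg1", "film.directed_by.arg2")]))
# Counter({('edge', 'directed.arg1', 'film.directed_by.arg2'): 1,
#          ('stem_overlap',): 1})
```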
4 Experimental Setup

In this section we present our experimental set-up for assessing the performance of the semantic parser described above. We present the datasets on which our model was trained and tested, discuss implementation details, and briefly introduce the models used for comparison with our approach.

4.1 Data

We evaluated our approach on the FREE917 (Cai and Yates, 2013) and WEBQUESTIONS (Berant et al., 2013) datasets. FREE917 consists of 917 questions and their meaning representations (written in a variant of lambda calculus) which we, however, do not use. The dataset represents 81 domains covering 635 Freebase relations, with most domains containing fewer than 10 questions. We report results on three domains, namely film, business, and people, as these are relatively large in both FREE917 and Freebase. WEBQUESTIONS consists of 5,810 question-answer pairs, 2,780 of which are reserved for testing. Our experiments used a subset of WEBQUESTIONS representing the three target domains. We extracted domain-specific queries semi-automatically by identifying question-answer pairs with entities in target domain relations. In both datasets, named entities were disambiguated to Freebase entities with a named entity lexicon (FREE917 comes with such a lexicon; for WEBQUESTIONS we hand-coded one). Table 1 presents descriptive statistics for each domain.

Domain     #Rels  #Types  #Triples  #Train  #Free917  #WebQ
business     226     102      23m      30k        46     49
film         113      75      42m      13k        49     91
people        85      59      68m      56k        29    430
all*         411     210     120m      99k       124    570

Table 1: Domain-specific Freebase statistics (*some relations/types/triples are shared across domains); number of training CLUEWEB09 sentences; number of test questions in FREE917 and WEBQUESTIONS.

Evaluating on all domains in Freebase would generate a very large number of queries for which denotations would have to be computed (the number of queries is linear in the number of domains and the size of training data). Our system loads Freebase using Virtuoso (http://virtuoso.openlinksw.com) and queries it with SPARQL. Virtuoso is slow in dealing with millions of queries indexed on the entire Freebase, which is the only reason we did not work with the complete Freebase.

4.2 Implementation

To train our model, we extracted sentences from CLUEWEB09 which contain at least two entities associated with a relation in Freebase, and have an edge between them in the ungrounded graph. These were further filtered so as to remove sentences which do not yield at least one semantic parse without an uninstantiated entity variable. For example, the sentence Avatar is directed by Cameron would be used for training, whereas Avatar directed by Cameron received a critical review would not. In the latter case, any semantic parse will have an uninstantiated entity variable for review. Table 1 (Train) shows the number of sentences we obtained.

In order to train our semantic parser, we initialized the alignment and type features (φedge and φtype, respectively) with the alignment lexicon weights. These weights are computed as follows. Let count(r′, r) denote the number of pairs of entities which are linked with edge r′ in Freebase and edge r in CLUEWEB09 sentences. We then estimate the probability distribution P(r′ | r) = count(r′, r) / Σ_i count(r′_i, r). Analogously, we created a type alignment lexicon. The counts were collected from CLUEWEB09 sentences containing pairs of entities linked with an edge in Freebase (business 390k, film 130k, and people 490k). Contextual features were initialized to −1, since most word contexts and grounded predicates/types do not appear together. All other features were set to 0.
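A small sketch of the alignment-lexicon estimation above, with made-up co-occurrence counts (the input format is an assumption, not the released code):

```python
# Estimate P(r' | r) = count(r', r) / sum_i count(r'_i, r) from co-occurrence
# counts of Freebase edges r' and ungrounded (CLUEWEB09) edges r.
from collections import defaultdict

def alignment_lexicon(cooccurrence):
    totals = defaultdict(float)
    for (r_prime, r), c in cooccurrence.items():
        totals[r] += c
    return {(r_prime, r): c / totals[r]
            for (r_prime, r), c in cooccurrence.items()}

# toy counts: entity pairs linked by "directed" in text and by each Freebase edge
counts = {("film.directed_by", "directed"): 900.0,
          ("film.produced_by", "directed"): 100.0}
lexicon = alignment_lexicon(counts)
print(lexicon[("film.directed_by", "directed")])   # 0.9
```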
We used a beam-search algorithm to convert ungrounded graphs to grounded ones. The edges and types of each ungrounded graph are placed in a priority queue. Priority is based on edge/type tf-idf scores collected over CLUEWEB09. At each step, we pop an element from the queue and ground it in Freebase. We rank the resulting grounded graphs using the perceptron model, and pick the n-best ones, where n is the beam size. We continue until the queue is empty. In our experiments we used a beam size of 100. We trained a single model for all the domains combined together. We ran the perceptron for 20 iterations (around 5–10 million queries). At each training iteration we used 6,000 randomly selected sentences from the training corpus.

4.3 Comparison Systems

We compared our graph-based semantic parser (henceforth GRAPHPARSER) against two state-of-the-art systems, both of which are open-domain and work with Freebase. The semantic parser developed by Kwiatkowski et al. (2013) (henceforth KCAZ13) is learned from question-answer pairs and follows a two-stage procedure: first, a natural language sentence is converted to a domain-independent semantic parse and then grounded onto Freebase using a set of logical-type equivalent operators. The operators explore possible ways sentential meaning could be expressed in Freebase and essentially transform the logical form to match the target ontology. Our approach also has two steps (i.e., we first generate multiple ungrounded graphs and then ground them to different Freebase graphs). We do not use operators to perform structure matching; rather, we create multiple graphs and leave it up to the learner to find an appropriate grounding using a rich feature space. To give a specific example, their operator literal to constant is equivalent to having named entities for larger text chunks in our case. Their operator split literal explores different edge possibilities in an event, whereas we start with a clique and remove unwanted edges. Our approach has (almost) similar expressive power but is conceptually simpler.

Our second comparison system was the semantic parser of Berant and Liang (2014) (henceforth PARASEMPRE), which also uses QA pairs for training and makes use of paraphrasing. Given an input NL sentence, they first construct a set of logical forms based on hand-coded rules, and then generate sentences from each logical form (using generation templates and a lexicon). Pairs of logical forms and natural language are finally scored using a paraphrase model consisting of two components. An association model determines whether they contain phrase pairs likely to be paraphrases, and a vector space model assigns a vector representation to each sentence and learns a scoring function that ranks paraphrase candidates.

System         Prec   Rec    F1
MWG            52.6   49.1   50.8
KCAZ13         72.6   66.1   69.2
GRAPHPARSER    81.9   76.6   79.2

Table 2: Experimental results on FREE917.
Our seman- tic parser employs a graph-based representation as a means of handling the mismatch between natu- ral language, whereas PARASEMPRE opts for a text- based one through paraphrasing. Finally, we compared our semantic parser against a baseline which is based on graphs but em- ploys no learning. The baseline converts an un- grounded graph to a grounded one by replacing each ungrounded edge/type with the highest weighted grounded label creating a maximum weighted graph, henceforth MWG. Both GRAPHPARSER and the baseline use the same alignment lexicon (a weighted mapping from ungrounded to grounded labels). 5 Results Table 2 summarizes our results on FREE917. As described earlier, we evaluated GRAPHPARSER on a subset of the dataset representing three domains (business, film, and people). Since this sub- set contains a relatively small number of instances (124 in total), we performed 10-fold cross valida- tion with 9 folds as development data8, and one fold as test data. We report results averaged over all test folds. With respect to KCAZ13, we present re- sults with their cross-domain trained models, where training data from multiple domains is used to test foreign domains.9 KCAZ13 used generic features like string similarity and knowledge base features which apply across domains and do not require in- domain training data. We do not report results with PARASEMPRE as the small number of training in- stances would put their method at a disadvantage. We treat a predicted query as correct if its denota- 8The development data is only used for model selection and for determining the optimal training iteration. 9We are grateful to Tom Kwiatkowski for supplying us with the output of their system. Features FREE917 WEBQ All 79.2 41.4 −Contextual 73.3 42.6 −Alignment 66.7 34.8 −Connectivity 65.0 36.6 −Similarity 62.5 35.0 Table 3: GRAPHPARSER ablation results on FREE917 and WEBQUESTIONS development set. tion is exactly equal to the denotation of the manu- ally annotated gold query. As can be seen, GRAPHPARSER outperforms KCAZ13 and the MWG baseline by a wide margin. This is an encouraging result bearing in mind that our model does not use question-answer pairs. We should also point out that our domain relation set is larger compared to KCAZ13. We do not prune any of the relations in Freebase, whereas KCAZ13 use only 112 relations and 83 types from our three domains (see Table 1). We further performed a feature ab- lation study to examine the contribution of different feature classes. As shown in Table 3, the most im- portant features are those based on lexical similar- ity, as also observed in KCAZ13. Graph connectivity and lexical alignments are equally important (these features are absent from KCAZ13). Contextual fea- tures are not very helpful over and above alignment features which also encode contextual information. Overall, generic features like lexical similarity are helpful only to a certain extent; the performance of GRAPHPARSER improves considerably when addi- tional graph-related features are taken into account. We also analyzed the errors GRAPHPARSER makes. 25% of these are caused by the C&C parser and are cases where it either returns no syntactic analysis or a wrong one. 19% of the errors are due to Freebase inconsistencies. For example, our system answered the question How many stores are in Nittany mall? with 65 using the relation shop- ping center.number of stores whereas the gold stan- dard provides the answer 25 counting all stores using the relation shopping center.store. 
Around 15% of er- rors include structural mismatches between natural language and Freebase; for the question Who is the president of Gap Inc?, our method grounds president to a grounded type whereas in Freebase it is repre- sented as a relation employment.job.title. The remain- System Prec Rec F1 MWG 39.4 34.0 36.5 PARASEMPRE 37.5 37.5 37.5 GRAPHPARSER 41.9 37.0 39.3 GRAPHPARSER + PARA 44.7 38.4 41.3 Table 4: Experimental results on WEBQUESTIONS. ing errors are miscellaneous. For example, the ques- tion What are some films on Antarctica? receives two interpretations, i.e., movies filmed in Antarctica or movies with Antarctica as their subject. We next discuss our results on WEBQUESTIONS. PARASEMPRE was trained with 1,115 QA pairs (corresponding to our target domains) together with question paraphrases obtained from the PARALEX corpus (Fader et al., 2013).10 While training PARASEMPRE, out-of-domain Freebase relations and types were removed. Both GRAPHPARSER and PARASEMPRE were tested on the same set of 570 in-domain QA pairs with exact answer match as the evaluation criterion. For development purposes, GRAPHPARSER uses 200 QA pairs. Table 4 displays our results. We observe that GRAPHPARSER obtains a higher F1 against MWG and PARASEMPRE. Dif- ferences in performance among these systems are less pronounced compared to FREE917. This is for a good reason. WEBQUESTIONS is a challenging dataset, created by non-experts. The questions are not tailored to Freebase in any way, they are more varied and display a wider vocabulary. As a result the mismatch between natural language and Free- base is greater and the semantic parsing task harder. Error analysis further revealed that parsing errors are responsible for 13% of the questions GRAPH- PARSER fails to answer. Another cause of er- rors is mismatches between natural language and Freebase. Around 7% of the questions are of the type Where did X come from?, and our model an- swers with the individual’s nationality, whereas an- notators provide the birthplace (city/town/village) as the right answer. Moreover, 8% of the ques- tions are of the type What does X do?, which the annotators answer with the individual’s profession. In natural language, we rarely attest constructions 10We used the SEMPRE package (http://www-nlp. stanford.edu/software/sempre/) which does not use any hand-coded entity disambiguation lexicon. like X does dentist/researcher/actor. The proposed framework assumes that Freebase and natural lan- guage are somewhat isomorphic, which is not al- ways true. An obvious future direction would be to paraphrase the questions so as to increase the num- ber of grounded and ungrounded graphs. As an illus- tration, we rewrote questions like Where did X come from to What is X’s birth place, and What did X do to What is X’s profession and evaluated our model GRAPHPARSER + PARA. As shown in Table 4, even simple paraphrasing can boost performance. Finally, Table 3 (third column) examines the con- tribution of different features on the WEBQUES- TIONS development dataset. Interestingly, we ob- serve that contextual features are not useful and in fact slightly harm performance. We hypothesize that this is due to the higher degree of mismatch between natural language and Freebase in this dataset. Fea- tures based on similarity, graph connectivity, and lexical alignments are more robust and generally useful across datasets. 6 Discussion In this paper, we introduce a new semantic pars- ing approach for Freebase. 
A key idea in our work is to exploit the structural and conceptual similari- ties between natural language and Freebase through a common graph-based representation. We formal- ize semantic parsing as a graph matching problem and learn a semantic parser without using annotated question-answer pairs. We have shown how to ob- tain graph representations from the output of a CCG parser and subsequently learn their correspondence to Freebase using a rich feature set and their deno- tations as a form of weak supervision. Our parser yields state-of-the art performance on three large Freebase domains and is not limited to question an- swering. We can create semantic parses for any type of NL sentences. Our work brings together several strands of re- search. Graph-based representations of sentential meaning have recently gained some attention in the literature (Banarescu et al., 2013), and attempts to map sentences to semantic graphs have met with good inter-annotator agreement. Our work is also closely related to Kwiatkowski et al. (2013) and Be- rant and Liang (2014) who present open-domain se- mantic parsers based on Freebase and trained on QA pairs. Despite differences in formulation and model structure, both approaches have explicit mechanisms for handling the mismatch between natural language and the KB (e.g., using logical-type equivalent oper- ators or paraphrases). The mismatch is handled im- plicitly in our case via our graphical representation which allows for the incorporation of all manner of powerful features. More generally, our method is based on the assumption that linguistic structure has a correspondence to Freebase structure which does not always hold (e.g., in Who is the grandmother of Prince William?, grandmother is not directly ex- pressed as a relation in Freebase). Additionally, our model fails when questions are too short with- out any lexical clues (e.g., What did Charles Darwin do? ). Supervision from annotated data or paraphras- ing could improve performance in such cases. In the future, we plan to explore cluster-based semantics (Lewis and Steedman, 2013) to increase the robust- ness on unseen NL predicates. Our work joins others in exploiting the connec- tions between natural language and open-domain knowledge bases. Recent approaches in relation ex- traction use distant supervision from a knowledge base to predict grounded relations between two tar- get entities (Mintz et al., 2009; Hoffmann et al., 2011; Riedel et al., 2013). During learning, they ag- gregate sentences containing the target entities, ig- noring richer contextual information. In contrast, we learn from each individual sentence taking into account all entities present, their relations, and how they interact. Krishnamurthy and Mitchell (2012) formalize semantic parsing as a distantly supervised relation extraction problem combined with a manu- ally specified grammar to guide semantic parse com- position. Finally, our approach learns a model of seman- tics guided by denotations as a form of weak su- pervision. Beyond semantic parsing (Artzi and Zettlemoyer, 2013; Liang et al., 2011; Clarke et al., 2010), feedback-based learning has been previously used for interpreting and following NL instructions (Branavan et al., 2009; Chen and Mooney, 2011), playing computer games (Branavan et al., 2012), and grounding language in the physical world (Krishna- murthy and Kollar, 2013; Matuszek et al., 2012). 
Lemma POS Semantic Class Semantic Category * VB*, IN, TO, POS EVENT directed : (Se\NPx<1>)/NPy<2> : λQλPλe.∃x∃y. directed.arg1(e,x)∧directed.arg2(e,y)∧P(x)∧Q(y) * NN, NNS TYPE movie : NP : λx.movie(x) * NNP*, PRP* ENTITY Obama : NP : λx.equal(x,Obama) * RB* EVENTMOD annually : Se\Se : λPλe.lexe.annually(e)∧P(e) * JJ* TYPEMOD state : NPx/NPx : λPλx.lexx.state(x)∧P(x) be * COPULA be: (Sy\NPx)/NPy : λQλPλy.∃x.lexy(x)∧P(x)∧Q(y) the * UNIQUE the : NPx/NPx : λPλx.UNIQUE(x)∧P(x) * CD COUNT twenty : Nx/Nx : λPλx.COUNT(x,20)∧P(x) twenty : Nx/Nx : λPλx.equal(x,20)∧P(x) not, n’t * NEGATION not : (Se\NPx)/(Se\NPx) : λPλQλe.∃x.NEGATION(e)∧P(x,e)∧Q(x) no * COMPLEMENT no : NPx/Nx : λPλx.COMPLEMENT(x)∧P(x) * WDT, WP*, QUESTION what : S[wq]e/(S[dcl]e\NPx) WRB : λPλe.∃x.TARGET(x)∧P(x,e) * WDT, WP*, CLOSED which : (NPx\NPx)/(S[dcl]e\NPx) WRB : λPλQλx.∃e.P(x,e)∧Q(x) Table 5: Rules used to classify words into semantic classes. * represents a wild card expression which matches anything. lexx denotes the lexicalised form of x e.g., when state : NPx/NPx : λPλx.lexx.state(x)∧ P(x) is applied to capital : NP : λy.capital(y), the lexicalised form of x becomes capital, and there- fore the predicate lexx.state becomes capital.state. The resulting semantic parse after application is λx.capital.state(x)∧capital(x). Appendix We use a handful of rules to divide words into semantic classes. Based on a word’s seman- tic class and indexed syntactic category, we con- struct its semantic category automatically. For example, directed is a member of the EVENT class, and its indexed syntactic category is ((Se\NPx<1>)/NPy<2>) (here, <1> and <2> in- dicate that x and y are the first and second ar- guments of e). We then generate its seman- tic category as λQλPλe.∃x∃y.directed.arg1(e,x)∧ directed.arg2(e,y)∧ P(x)∧ Q(y). Please refer to Appendix B of Clark and Curran (2007) for a list of their indexed syntactic categories. The rules are described in Table 5. Syntactic cate- gories are not shown for the sake of brevity. Most rules will match any syntactic category. Excep- tions are copula-related rules (see be in the sixth row) which apply only to the syntactic category (S\NP)/NP, and rules pertaining to wh -words (see the last two rows in the table). When more than one rule apply, we end up with multiple semantic parses. There are a few cases like passives, question words, and prepositional phrases where we modified the original indexed categories for better interpretation of the semantics (these are not displayed here). We also handle non-standard CCG operators involving unary and binary rules as described in Appendix A of Clark and Curran (2007). Acknowledgements We are grateful to the anonymous reviewers for their valuable feedback on an earlier version of this paper. Thanks to Mike Lewis and the members of ILCC for helpful discussions and comments. We acknowl- edge the support of EU ERC Advanced Fellowship 249520 GRAMPLUS and EU IST Cognitive Sys- tems IP EC-FP7-270273 “Xperience”. References Artzi, Yoav and Luke Zettlemoyer. 2011. Boot- strapping semantic parsers from conversations. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Edin- burgh, Scotland, pages 421–432. Artzi, Yoav and Luke Zettlemoyer. 2013. Weakly su- pervised learning of semantic parsers for mapping instructions to actions. Transations of the Associ- ation for Computational Linguistics 1(1):49–62. 
Banarescu, Laura, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning repre- sentation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interop- erability with Discourse. Sofia, Bulgaria, pages 178–186. Berant, Jonathan, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Nat- ural Language Processing. Seattle, Washington, USA, pages 1533–1544. Berant, Jonathan and Percy Liang. 2014. Seman- tic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, Maryland, USA, pages 1415–1425. Bos, Johan, Stephen Clark, Mark Steedman, James R. Curran, and Julia Hockenmaier. 2004. Wide-coverage semantic representations from a ccg parser. In Proceedings of Coling 2004. Geneva, Switzerland, pages 1240–1246. Branavan, S.R.K., Harr Chen, Luke Zettlemoyer, and Regina Barzilay. 2009. Reinforcement learn- ing for mapping instructions to actions. In Pro- ceedings of the Joint Conference of the 47th An- nual Meeting of the ACL and the 4th International Joint Conference on Natural Language Process- ing of the AFNLP. Suntec, Singapore, pages 82– 90. Branavan, S.R.K., Nate Kushman, Tao Lei, and Regina Barzilay. 2012. Learning high-level plan- ning from text. In Proceedings of the 50th An- nual Meeting of the Association for Computa- tional Linguistics. Jeju Island, Korea, pages 126– 135. Cai, Qingqing and Alexander Yates. 2013. Large- scale semantic parsing via schema matching and lexicon extension. In Proceedings of the 51st Annual Meeting of the Association for Compu- tational Linguistics. Sofia, Bulgaria, pages 423– 433. Chen, David L. and Raymond J. Mooney. 2011. Learning to interpret natural language navigation instructions from observations. In Proceedings of the 25th AAAI Conference on Artificial Intelli- gence. San Francisco, California, pages 859–865. Clark, Stephen and James R Curran. 2004. Pars- ing the wsj using CCG and log-linear models. In Proceedings of the 42nd Annual Meeting on Asso- ciation for Computational Linguistics. Barcelona, Spain, pages 103–111. Clark, Stephen and James R Curran. 2007. Wide- coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics 33(4):493–552. Clark, Stephen, Julia Hockenmaier, and Mark Steed- man. 2002. Building deep dependency structures with a wide-coverage CCG parser. In Proceed- ings of the 40th Annual Meeting on Association for Computational Linguistics. pages 327–334. Clarke, James, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. Driving semantic parsing from the world’s response. In Proceedings of the 14th Conference on Natural Language Learning. Uppsala, Sweden, pages 18–27. Collins, Michael. 2002. Discriminative training methods for Hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empir- ical Methods in Natural Language Processing. Philadelphia, Pennsylvania, pages 1–8. Fader, Anthony, Luke Zettlemoyer, and Oren Et- zioni. 2013. Paraphrase-driven learning for open question answering. In Proceedings of the 51st Annual Meeting of the Association for Computa- tional Linguistics. Sofia, Bulgaria, pages 1608– 1618. Gabrilovich, Evgeniy, Michael Ringgaard, and Amarnag Subramanya. 2013. 
FACC1: Freebase annotation of ClueWeb corpora, Version 1 (Re- lease date 2013-06-26, Format version 1, Correc- tion level 0). Goldwasser, Dan, Roi Reichart, James Clarke, and Dan Roth. 2011. Confidence driven unsupervised semantic parsing. In Proceedings of the 49th Annual Meeting of the Association for Compu- tational Linguistics: Human Language Technolo- gies. Portland, Oregon, USA, pages 1486–1495. Goldwasser, Dan and Dan Roth. 2011. Learning from natural instructions. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence. Barcelona, Spain, pages 1794–1800. Hoffmann, Raphael, Congle Zhang, Xiao Ling, Luke S Zettlemoyer, and Daniel S Weld. 2011. Knowledge-based weak supervision for informa- tion extraction of overlapping relations. In Pro- ceedings of the 49th Annual Meeting of the As- sociation for Computational Linguistics: Human Language Technologies. Portland, Oregon, USA, pages 541–550. Krishnamurthy, Jayant and Thomas Kollar. 2013. Jointly learning to parse and perceive: Connect- ing natural language to the physical world. Tran- sations of the Association for Computational Lin- guistics 1(1):193–206. Krishnamurthy, Jayant and Tom Mitchell. 2012. Weakly supervised training of semantic parsers. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Process- ing and Computational Natural Language Learn- ing. Jeju Island, Korea, pages 754–765. Kwiatkowski, Tom, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceed- ings of the 2013 Conference on Empirical Meth- ods in Natural Language Processing. Seattle, Washington, USA, pages 1545–1556. Kwiatkowski, Tom, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. pages 1223–1233. Lewis, Mike and Mark Steedman. 2013. Combined distributional and logical semantics. Transactions of the Association for Computational Linguistics 1:179–192. Liang, Percy, Michael Jordan, and Dan Klein. 2011. Learning dependency-based compositional se- mantics. In Proceedings of the 49th Annual Meet- ing of the Association for Computational Linguis- tics: Human Language Technologies. Portland, Oregon, USA, pages 590–599. Matuszek, Cynthia, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox. 2012. A joint model of language and perception for grounded attribute learning. In Proceedings of the 29th International Conference on Machine Learn- ing. Edinburgh, Scotland, pages 1671–1678. Mintz, Mike, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint conference of the 47th Annual Meet- ing of the Association for Computational Linguis- tics and the 4th International Joint Conference on Natural Language Processing of the Asian Fed- eration of Natural Language Processing. pages 1003–1011. Parsons, Terence. 1990. Events in the Semantics of English. MIT Press, Cambridge, MA. Poon, Hoifung. 2013. Grounded unsupervised se- mantic parsing. In Proceedings of the 51st An- nual Meeting of the Association for Computa- tional Linguistics. Sofia, Bulgaria, pages 933– 943. Riedel, Sebastian, Limin Yao, Andrew McCallum, and Benjamin M Marlin. 2013. Relation ex- traction with matrix factorization and universal schemas. 
In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Atlanta, Georgia, pages 74–84. Steedman, Mark. 2000. The Syntactic Process. The MIT Press. Wong, Yuk Wah and Raymond Mooney. 2007. Learning synchronous grammars for semantic parsing with lambda calculus. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. Prague, Czech Re- public, pages 960–967. Yao, Xuchen and Benjamin Van Durme. 2014. In- formation extraction over structured data: Ques- tion answering with freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, Maryland, USA, pages 956–966. Zelle, John M and Raymond J Mooney. 1996. Learning to parse database queries using induc- tive logic programming. In Proceedings of the National Conference on Artificial Intelligence. Portland, Oregon, pages 1050–1055. Zettlemoyer, Luke and Michael Collins. 2007. On- line learning of relaxed CCG grammars for pars- ing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natu- ral Language Processing and Computational Nat- ural Language Learning. Prague, Czech Repub- lic, pages 678–687. Zettlemoyer, Luke S. and Michael Collins. 2005. Learning to map sentences to logical form: Struc- tured classification with probabilistic categorial grammars. In Proceedings of 21st Conference in Uncertainilty in Artificial Intelligence. Edin- burgh, Scotland, pages 658–666.