Large-scale Semantic Parsing without Question-Answer Pairs Siva Reddy, Mirella Lapata, Mark Steedman School of Informatics, University of Edinburgh 10 Crichton Street, Edinburgh EH8 9AB siva.reddy@ed.ac.uk, mlap@inf.ed.ac.uk, steedman@inf.ed.ac.uk Abstract In this paper we introduce a novel semantic parsing approach to query Freebase in natu- ral language without requiring manual anno- tations or question-answer pairs. Our key in- sight is to represent natural language via se- mantic graphs whose topology shares many commonalities with Freebase. Given this rep- resentation, we conceptualize semantic pars- ing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Free- base guided by denotations as a form of weak supervision. Evaluation experiments on a sub- set of the FREE917 and WEBQUESTIONS benchmark datasets show our semantic parser improves over the state of the art. 1 Introduction Querying a database to retrieve an answer, telling a robot to perform an action, or teaching a com- puter to play a game are tasks requiring communi- cation with machines in a language interpretable by them. Semantic parsing addresses the specific task of learning to map natural language (NL) to machine interpretable formal meaning representations. Tra- ditionally, sentences are converted into logical form grounded in the symbols of some fixed ontology or relational database. Approaches for learning semantic parsers have been for the most part supervised, using annotated training data consisting of sentences and their cor- responding logical forms (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Wong and Mooney, 2007; Kwiatkowski et al., 2010). More recently, al- ternative forms of supervision have been proposed to alleviate the annotation burden, e.g., by learn- ing from conversational logs (Artzi and Zettlemoyer, 2011), from sentences paired with system behav- ior (Chen and Mooney, 2011; Goldwasser and Roth, Question What is the capital of Texas? Logical Form λx. city(x)∧capital(x,Texas) Answer {Austin} Figure 1: An example question with annotated logi- cal query, and its answer. 2011; Artzi and Zettlemoyer, 2013), via distant su- pervision (Krishnamurthy and Mitchell, 2012; Cai and Yates, 2013), from questions (Goldwasser et al., 2011; Poon, 2013; Fader et al., 2013), and question- answer pairs (Clarke et al., 2010; Liang et al., 2011). Indeed, methods which learn from question-answer pairs have been gaining momentum as a means of scaling semantic parsers to large, open-domain problems (Kwiatkowski et al., 2013; Berant et al., 2013; Berant and Liang, 2014; Yao and Van Durme, 2014). Figure 1 shows an example of a question, its annotated logical form, and answer (or denotation). In this paper, we build a semantic parser that does not require example annotations or question-answer pairs but instead learns from a large knowledge base (KB) and web-scale cor- pora. Specifically, we exploit Freebase, a large community-authored knowledge base that spans many sub-domains and stores real world facts in graphical format, and parsed sentences from a large corpus. We formulate semantic parsing as a graph matching problem. We convert the output of an open-domain combinatory categorial gram- mar (CCG) parser (Clark and Curran, 2007) into a graphical representation and subsequently map it onto Freebase. 
The parser’s graphs (also called un- grounded graphs) are mapped to all possible Free- base subgraphs (also called grounded graphs) by re- placing edges and nodes with relations and types in Freebase. Each grounded graph corresponds to a unique grounded logical query. During learning, our semantic parser is trained to identify which KB sub- graph best corresponds to the NL graph. Problem- capital(Austin)∧UNIQUE(Austin)∧capital.of.arg1(e,Austin)∧capital.of.arg2(e,Texas) (a) Semantic parse of the sentence Austin is the capital of Texas. capital capital unique Austin e Texas ty p e capital. of.arg1 capital. of.arg2 unique(Austin) ∧ capital(Austin)∧ capital.of.arg1(e, Austin) ∧ capital.of.arg2(e, Texas) (b) Ungrounded graph for semantic parse (a); UNIQUE means that Austin is the only capital of Texas. capital capital target x e Texas ty p e capital. of.arg1 capital. of.arg2 target(x) ∧ capital(x) ∧ capital.of.arg1(e, x) ∧ capital.of.arg2(e, Texas) {AUSTIN} (c) Query graph after removing Austin from graph (b) and its denotation. location .city capital target x m Texas ty p e location. capital.arg1 location. capital.arg2 target(x) ∧ location.city(x) ∧ location.capital.arg1(m, x) ∧ location.capital.arg2(m, Texas) location .city capital target x n Texas ty p e location. containedby.arg1 location. containedby.arg2 target(x) ∧ location.city(x) ∧ location.containedby.arg1(n, x) ∧ location.containedby.arg2(n, Texas) {AUSTIN} {AUSTIN, DALLAS, HOUSTON ... } (d) Freebase graphs for NL graph (c) and their denotations. Figure 2: Steps involved in converting a natural language sentence to a Freebase grounded graph. atically, ungrounded graphs may give rise to many grounded graphs. Since we do not make use of manual annotations of sentences or question-answer pairs, we do not know which grounded graphs are correct. To overcome this, we rely on comparisons between denotations of natural language queries and related Freebase queries as a form of weak supervi- sion in order to learn the mapping between NL and KB graphs. Figure 2 illustrates our approach for the sentence Austin is the capital of Texas. From the CCG syn- tactic derivation (which we omit here for the sake of brevity) we obtain a semantic parse (Figure 2a) and convert it to an ungrounded graph (Figure 2b). Next, we select an entity from the graph and replace it with a variable x, creating a graph corresponding to the query What is the capital of Texas? (Figure 2c). The math function UNIQUE on Austin in Figure 2b indi- cates Austin is the only value of x which can satisfy the query graph in Figure 2c. Therefore, the denota- tion1 of the NL query graph is {AUSTIN}. Figure 2d shows two different groundings of the query graph in the Freebase KB. We obtain these by replacing edges and nodes in the query graph with Freebase relations and types. We use the denotation of the NL query as a form of weak supervision to select the best grounded graph. Under the constraint that the denotation of a Freebase query should be the same as the denotation of the NL query, the graph on the left hand-side of Figure 2d is chosen as the correct grounding. Experimental results on two benchmark datasets consisting of questions to Freebase — FREE917 (Cai and Yates, 2013) and WEBQUESTIONS (Berant 1The denotation of a graph is the set of feasible values for the nodes marked with TARGET. et al., 2013) — show that our semantic parser im- proves over state-of-the-art approaches. 
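To make the weak supervision concrete, the following sketch is a toy illustration in Python, not the actual system: the knowledge base is a hand-built set of triples, the mediator/argument structure of Figure 2d is collapsed into single relation labels, and denotations are computed by set comprehension rather than SPARQL. It selects the grounding whose denotation agrees with the denotation of the natural language query.

```python
# Toy knowledge base: (subject, relation, object) triples standing in for the
# two candidate groundings in Figure 2d (labels are simplified, not real
# Freebase identifiers with mediator nodes).
KB = {
    ("Austin", "location.capital", "Texas"),
    ("Austin", "location.containedby", "Texas"),
    ("Dallas", "location.containedby", "Texas"),
    ("Houston", "location.containedby", "Texas"),
}

def denotation(relation, value):
    """All x such that (x, relation, value) holds in the toy KB."""
    return {x for (x, r, v) in KB if r == relation and v == value}

# Denotation of the NL query graph (Figure 2c), licensed by UNIQUE in Figure 2b.
nl_denotation = {"Austin"}

# Candidate groundings of the ungrounded edge capital.of; keep the one whose
# denotation matches the NL denotation.
candidates = ["location.capital", "location.containedby"]
surrogate_gold = [r for r in candidates if denotation(r, "Texas") == nl_denotation]
print(surrogate_gold)   # ['location.capital']
```

The learning procedure of Section 3 generalizes this comparison to full graphs and uses the selected groundings as a surrogate gold standard.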
Our contri- butions include: a novel graph-based method to con- vert natural language sentences to grounded seman- tic parses which exploits the similarities in the topol- ogy of knowledge graphs and linguistic structure, to- gether with the ability to train using a wide range of features; a proposal to learn from a large scale web corpus, without question-answer pairs, based on denotations of queries from natural language statements as weak supervision; and the develop- ment of a scalable semantic parser which besides Freebase uses CLUEWEB09 for training, a corpus of 503.9 million webpages. Our semantic parser can be downloaded from http://sivareddy.in/ downloads. 2 Framework Our goal is to build a semantic parser which maps a natural language sentence to a logical form that can be executed against Freebase. We begin with CLUEWEB09, a web-scale corpus automatically an- notated with Freebase entities (Gabrilovich et al., 2013). We extract the sentences containing at least two entities linked by a relation in Freebase. We parse these sentences using a CCG syntactic parser, and build semantic parses from the syntactic out- put. Semantic parses are then converted to seman- tic graphs which are subsequently grounded to Free- base. Grounded graphs can be easily converted to a KB query deterministically. During training we learn which grounded graphs correspond best to the natural language input. In the following, we pro- vide a brief introduction to Freebase and its graph structure. Next, we explain how we obtain seman- tic parses from CCG (Section 2.2), how we convert them to graphs (Section 2.3), and ground them in Freebase (Section 2.4). Section 3 presents our learn- ing algorithm. 2.1 The Freebase Knowledge Graph Freebase consists of 42 million entities and 2.5 bil- lion facts. A fact is defined by a triple containing two entities and a relation between them. Entities represent real world concepts, and edges represent relations, thus forming a graph-like structure. A Freebase subgraph is shown in Figure 3 with usa natasha obama p q US president r s Columbia University m Barack Obama n Michelle Obama m m n n education .university Bachelor of Arts 1992 h e a d q u a rt e rs .c o u n tr y h e a d q u a rt e rs .o rg a n is a ti o n person. nationality.arg2 person. nationality.arg1 pe rs on . pa re nt s. ar g2 pe rs on . pa re nt s. ar g1 p e rso n . p a re n ts.a rg 2 p e rso n . p a re n ts.a rg 1 education. institution education .student marriage .spouse marriage .spouse education .institution education .degree e d u c a ti o n .d e g re e e d u c a ti o n .s tu d e n t m arriage .from m arriage .spouse m a rr ia g e .f ro m m a rr ia g e .s p o u se ty p e ty p e Figure 3: Freebase knowledge graph. Entities are represented by rectangles, relations between enti- ties by edges, mediator nodes by circles, types by rounded rectangles. rectangles denoting entities. In addition to sim- ple facts, Freebase encodes complex facts, repre- sented by multiple edges (e.g., the edges connect- ing BARACK OBAMA, COLUMBIA UNIVERSITY and BACHELOR OF ARTS). Complex facts have in- termediate nodes called mediator nodes (circles in Figure 3 with the same identifiers e.g., m and n). For reasons of uniformity, we assume that simple facts are also represented via mediator nodes and split single edges into two with each subedge going from the mediator node to the target node (see per- son.nationality.arg1 and person.nationality.arg2 in Fig- ure 3). Finally, Freebase also has entity types defin- ing is-a relations. 
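As a rough illustration only (a hand-rolled toy encoding, not the Freebase data model or its API; the argument order shown is illustrative), the facts of Figure 3 can be pictured as mediator identifiers pointing to lists of subedges, each pairing a relation-argument label with a participant entity:

```python
# Toy encoding of a Freebase fragment (Figure 3): every fact, simple or
# complex, hangs off a mediator node, and each subedge links the mediator to
# one participant via a relation-argument label.
freebase_subgraph = {
    # simple fact, split into two subedges through mediator "p"
    "p": [("person.nationality.arg1", "BARACK OBAMA"),
          ("person.nationality.arg2", "USA")],
    # complex fact: a single mediator with more than two subedges
    "m": [("education.student", "BARACK OBAMA"),
          ("education.institution", "COLUMBIA UNIVERSITY"),
          ("education.degree", "BACHELOR OF ARTS")],
}

# is-a relations (entity types)
entity_types = {"BARACK OBAMA": ["US president"],
                "COLUMBIA UNIVERSITY": ["education.university"]}

# list every fact BARACK OBAMA participates in
for mediator, subedges in freebase_subgraph.items():
    if any(entity == "BARACK OBAMA" for _, entity in subedges):
        print(mediator, subedges)
```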
In Figure 3 types are represented by rounded rectangles (e.g., BARACK OBAMA is of type US president, and COLUMBIA UNIVERSITY is of type education.university). 2.2 Combinatory Categorial Grammar The graph like structure of Freebase inspires us to create a graph like structure for natural lan- guage, and learn a mapping between them. To do this we take advantage of the representational power of Combinatory Categorial Grammar (Steed- man, 2000). CCG is a linguistic formalism that tightly couples syntax and semantics, and can be used to model a wide range of language phenom- Cameron directed Titanic NP (S\NP)/NP NP Cameron λyλx. directed.arg1(e,x) Titanic ∧ directed.arg2(e,y) > S\NP λx. directed.arg1(e,x) ∧ directed.arg2(e, Titanic) < S directed.arg1(e, Cameron)∧ directed.arg2(e, Titanic) 1 Figure 4: CCG derivation containing both syntactic and semantic parse construction. ena. CCG is well known for capturing long-range dependencies inherent in constructions such as co- ordination, extraction, raising and control, as well as standard local predicate-argument dependencies (Clark et al., 2002), thus supporting wide-coverage semantic analysis. Moreover, due to the transparent interface between syntax and semantics, it is rela- tively straightforward to built a semantic parse for a sentence from its corresponding syntactic derivation tree (Bos et al., 2004). In our case, the choice of syntactic parser is moti- vated by the scale of our problem; the parser must be broad-coverage and robust enough to handle a web-sized corpus. For these reasons, we rely on the C&C parser (Clark and Curran, 2004), a general- purpose CCG parser, to obtain syntactic deriva- tions. To our knowledge, we present the first at- tempt to use a CCG parser trained on treebanks for grounded semantic parsing. Most previous work has induced task-specific CCG grammars (Zettlemoyer and Collins, 2005, 2007; Kwiatkowski et al., 2010). An example CCG derivation is shown in Figure 4. Semantic parses are constructed from syntac- tic CCG parses, with semantic composition be- ing guided by the CCG syntactic derivation.2 We use a neo-Davidsonian (Parsons, 1990) semantics to represent semantic parses.3 Each word has a semantic category based on its syntactic cate- gory and part of speech. For example, the syn- tactic category for directed is (S\NP)/NP, i.e., it 2See Bos et al. (2004) for a detailed introduction to semantic representation using CCG. 3Neo-Davidsonian semantics is a form of first-order logic that uses event identifiers (e) to connect verb predicates and their subcategorized arguments through conjunctions. Titanic e Cameron directed e e 1997 dir ec ted .ar g1 dir ect ed .ar g2 d ire c te d .in d ire c te d .a rg 2 directed.arg1 directed.in directed.arg1(e, Cameron) ∧ directed.arg2(e, Titanic) ∧ directed.in(e, 1997) (a) Ungrounded graph Titanic m Cameron directed n 1997 film .di rec ted by .ar g2 film .di rec ted by .ar g1 fi lm .in itia l re le a se d a te .a rg 2 fi lm .in itia l re le a se d a te .a rg 1 film.directed by.arg2(m, Cameron) ∧ film.directed by.arg1(m, Titanic) ∧ film.initial release date.arg1(n, Titanic) ∧ film.initial release date.arg2(n, 1997) (b) Grounded graph Figure 5: Graph representations for the sentence Cameron directed Titanic in 1997. takes two argument NPs and becomes S. To rep- resent its semantic category, we use a lambda term λyλx. directed.arg1(e,x)∧ directed.arg2(e,y), where e identifies the event of directed, and x and y are arguments corresponding to the NPs in the syn- tactic category. 
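The following Python fragment is an illustrative rendering (not the authors' implementation) of this compositional step for the derivation in Figure 4: the semantic category of directed is a curried function which, applied first to its object and then to its subject, produces the neo-Davidsonian conjunction at the root of the derivation.

```python
# Illustrative composition for "Cameron directed Titanic" (Figure 4).
# A semantic parse is represented as a set of predicate tuples.

def entity(name):
    # NP : the referent of a proper name (simplified)
    return name

def directed(e="e"):
    # (S\NP)/NP : λy λx. directed.arg1(e, x) ∧ directed.arg2(e, y)
    return lambda y: lambda x: {("directed.arg1", e, x),
                                ("directed.arg2", e, y)}

vp = directed()(entity("Titanic"))        # forward application:  S\NP
sentence = vp(entity("Cameron"))          # backward application: S
print(sorted(sentence))
# [('directed.arg1', 'e', 'Cameron'), ('directed.arg2', 'e', 'Titanic')]
```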
We obtain semantic categories automatically us- ing the indexed syntactic categories provided by the C&C parser. The latter reveal the bind- ings of basic constituent categories in more com- plex categories. For example, in order to convert ((S\NP)\(S\NP))/NP to its semantic category, we must know whether all NPs have the same refer- ent and thus use the same variable name. The in- dexed category ((Se\NPx)\(Se\NPx))/NPy reveals that there are only two different NPs, x and y, and that one of them (i.e., x) is shared across two subcat- egories. We discuss the details of semantic category construction in the Appendix. Apart from n-ary predicates representing events (mostly verbs), we also use unary predicates repre- senting types in language (mostly common nouns and noun modifiers). For example, capital(Austin) indicates Austin is of type capital. Prepositions, ad- jectives and adverbs are represented by predicates lexicalized with their head words to provide more information (see capital.of.arg1 instead of of.arg1 in Figure 2a). 2.3 Ungrounded Semantic Graphs We will now illustrate how we create ungrounded semantic graphs from CCG-derived semantic parses. Figure 5a displays the ungrounded graph for the sen- Who target directed x e The Nutty Professor directed.arg1 directed .arg2 target(x) ∧ directed.arg1(e, x) ∧ directed.arg2(e, TheNuttyProfessor) (a) Who directed The Nutty Professor? capital capital unique Austin e Texas capital .state state ty p e ty p e capital.of .arg1 capital.of .arg2 unique(Austin) ∧ capital(Austin) ∧ capital.state(Austin) ∧ capital.of.arg1(e, Austin) ∧ capital.of.arg2(e, Texas) (b) Austin is the state capital of Texas. Figure 6: Ungrounded graphs with math functions TARGET and UNIQUE. tence Cameron directed Titanic in 1997. In order to construct ungrounded graphs topologically simi- lar to Freebase, we define five types of nodes: Word Nodes (Ovals) Word nodes are denoted by ovals. They represent natural language words (e.g., directed in Figure 5a, capital and state in Fig- ure 6b). Word nodes are connected to other word nodes via syntactic dependencies. For readability, we do not show inter-word dependencies. Entity Nodes (Rectangles) Entity nodes are denoted by rectangles and represent entities e.g., Cameron in Figure 5a. In cases where an entity is not known, we use variables e.g., x in Figure 6a. Entity variables are connected to their correspond- ing word nodes from which they originate by dotted links e.g., x in Figure 6a is connected to the word node who. Mediator Nodes (Circles) Mediator nodes are de- noted by circles and represent events in language. They connect pairs of entities which participate in an event forming a clique (see the entities Cameron, Titanic and 1997 in Figure 5a). We define an edge as a link that connects any two entities via a medi- ator. The subedge of an edge i.e., the link between a mediator and an entity, corresponds to the predi- cate denoting the event and taking the entity as its argument (e.g. directed.arg1 links e and Cameron in Figure 5a). Mediator nodes are connected to their corresponding word nodes from which they origi- nate by dotted links e.g. mediators in Figure 5a are connected to word node directed. Type nodes (Rounded rectangles) Type nodes are denoted by rounded rectangles. They represent unary predicates in natural language. In Figure 6b type nodes capital and capital.state are attached to Austin denoting Austin is of type capital and capi- tal.state. 
Type nodes are also connected to their cor- responding word nodes from which they originate by dotted links e.g. type node capital.state and word node state in Figure 6b. Math nodes (Diamonds) Math nodes are denoted by diamonds. They describe functions to be applied on the nodes/subgraphs they attach to. The function TARGET attaches to the entity variable of interest. For example, the graph in Figure 6a represents the question Who directed The Nutty Professor?. Here, TARGET attaches to x representing the word who. UNIQUE attaches to the entity variable modified by the definite article the. In Figure 6b, UNIQUE at- taches to Austin implying that only Austin satisfies the graph. Finally, COUNT attaches to entity nodes which have to be counted. For the sentence Julie An- drews has appeared in 40 movies in Figure 7, the KB could either link Julie Andrews and 40, with type node movies matching the grounded type integer, or it could link Julie Andrews to each movie she acted in and the count of these different movies add to 40. In anticipation of this ambiguity, we generate two semantic parses resulting in two ungrounded graphs (see Figures 7a and 7b). We generate all possible grounded graphs corresponding to each ungrounded graph, and leave it up to the learning to decide which ones the KB prefers. 2.4 Grounded Semantic Graphs We ground semantic graphs in Freebase by mapping edge labels to relations, type nodes to entity types, and entity nodes to Freebase entities. Math nodes remain unchanged. Though word nodes are not present in Freebase, we retain them in our grounded graphs to extract sophisticated features based on words and grounded predicates. appeared movies movies Julie Andrews e 40 appeared .arg1 appeared.in ty p e appeared.arg1(e, JulieAndrews) ∧ appeared.in(e, 40) ∧ movies(40) (a) Ungrounded Graph appeared movies movies Julie Andrews e z count 40 appeared .arg1 appeared .in ty p e appeared.arg1(e, JulieAndrews) ∧ appeared.in(e, z) ∧ movies(z) ∧ count(z, 40) (b) Alternate Ungrounded Graph appeared film movies Julie Andrews m z count 40 performance .actor performance .film ty p e performance.actor(m, JulieAndrews) ∧ performance.film(m, z) ∧ film(z) ∧ count(z, 40) (c) Grounded graph Figure 7: Graph representations for the sentence Julie Andrews has appeared in 40 movies. Un- grounded graph (a) directly connects Julie Andrews and 40, whereas graph (b) uses the math func- tion COUNT. Ungrounded graph (b) and grounded graph (c) have similar topology. Entity nodes Previous approaches (Cai and Yates, 2013; Berant et al., 2013; Kwiatkowski et al., 2013) use a manual lexicon or heuristics to ground named entities to Freebase entities. Fortunately, CLUEWEB09 sentences have been automatically annotated with Freebase entities, so we use these an- notations to ground proper names to Freebase enti- ties (denoted by uppercase words) e.g., Cameron in Figure 5a is grounded to Freebase entity CAMERON in Figure 5b. Common nouns like movies (see Fig- ure 7b) are left as variables to be instantiated by the entities satisfying the graph. Type nodes Type nodes are grounded to Free- base entity types. Type nodes capital and capi- tal.state in Figure 6b are grounded to all possible types of Austin (e.g., location.city, location.capital city, book.book subject, broadcast.genre). In cases where entity nodes are not grounded, (e.g., z in Figure 7b), employees employees 120000 e Alcoa has e e 2007 type ha s.a rg1 ha s.a rg 2 h a s.in h a s. 
a rg 2 has.arg1 has.in has.arg1(e, Alcoa) ∧ has.arg2(e, 120000) ∧ has.in(e, 2007) ∧ employees(120000) (a) Ungrounded Graph employees type.int 119000 m Alcoa has m m 2007 type em plo ye r.n um be r of em plo ye es .in ve rse m ea su re m en t un it .d at ed in te ge r .n um be r m e a su re m e n t u n it .d a te d in te g e r .y e a r m e a su re m e n t u n it .d a te d in te g e r .n u m b e r employer.number of employees .inverse m easurem ent unit .dated integer .year employer.number of employees.inverse(m, Alcoa) ∧ measurement unit.dated integer.number(m, 119000) ∧ measurement unit.dated integer.year(m, 2007) ∧ type.int(119000) (b) Grounded Graph Figure 8: Graph representations for Alcoa has 120000 employees in 2007. we use an automatically constructed lexicon which maps ungrounded types to grounded ones (see Sec- tion 4.2 for details). Edges An edge between two entities is grounded using all edges linking the two entities in the knowl- edge graph. For example, to ground the edge be- tween Titanic and Cameron in Figure 5, we use the following edges linking TITANIC and CAMERON in Freebase: (film.directed by.arg1, film.directed by.arg2), (film.produced by.arg1, film.produced by.arg2). If only one entity is grounded, we use all possible edges from this grounded entity. If no entity is grounded, we use a mapping lexicon which is automatically created as described in Section 4.2. Given an un- grounded graph with n edges, there are O((k + 1)n) possible grounded graphs, with k being the grounded edges in the knowledge graph for each ungrounded edge together with an additional empty (no) edge. Mediator nodes In an ungrounded graph, media- tor nodes represent semantic event identifiers. In the grounded graph, they represent Freebase fact identi- fiers. Fact identifiers help distinguish if neighboring edges belong to a single complex fact, which may or may not be coextensive with an ungrounded event. In Figure 8a, the edges corresponding to the event identifier e are grounded to a single complex fact in Figure 8b, with the fact identifier m. However, in Figure 5a, the edges of the ungrounded event e are grounded to different Freebase facts, distinguished in Figure 5b by the identifiers m and n. Furthermore, the edge in 5a between CAMERON and 1997 is not grounded in 5b, since no Freebase edge exists be- tween the two entities. We convert grounded graphs to SPARQL queries, but for readability we only show logical expressions. The conversion is deterministic and is exactly the inverse of the semantic parse to graph conversion (Section 2.3). Wherever a node/edge is instantiated with a grounded entity/type/relation in Freebase, we use them in the grounded parse (e.g., type node cap- ital.state in Figure 6b becomes location.capital city). Math function TARGET is useful in retrieving instan- tiations of entity variables of interest (see Figure 6a). 3 Learning A natural language sentence may give rise to several grounded graphs. But only one (or a few) of them will be a faithful representation of the sentence in Freebase. We next describe our algorithm for find- ing the best Freebase graph for a given sentence, our learning model, and the features it uses. 3.1 Algorithm Freebase has a large number of relations and enti- ties, and as a result there are many possible grounded graphs g for each ungrounded graph u. We con- struct and score graphs incrementally, traversing each node in the ungrounded graph and matching its edges and types in Freebase. 
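To make the size of this search space concrete, the sketch below uses a hypothetical edge lexicon (relation names follow the paper's notation but are written with underscores) and enumerates candidate grounded graphs for a simplified version of Figure 5a. Each ungrounded edge is either mapped to one of the Freebase edges linking its two entities or dropped, which yields the O((k+1)^n) candidates noted in Section 2.4.

```python
# Sketch of grounded-graph enumeration (toy lexicon, not the released code).
from itertools import product

# candidate Freebase groundings per ungrounded edge of Figure 5a (simplified)
edge_candidates = {
    ("directed.arg1", "directed.arg2"): [          # Cameron -- Titanic
        ("film.directed_by.arg2", "film.directed_by.arg1"),
        ("film.produced_by.arg2", "film.produced_by.arg1"),
    ],
    ("directed.arg2", "directed.in"): [            # Titanic -- 1997
        ("film.initial_release_date.arg1", "film.initial_release_date.arg2"),
    ],
}

def grounded_graphs(edge_candidates):
    ungrounded_edges = list(edge_candidates)
    # None stands for the additional empty (no) edge
    options = [edge_candidates[e] + [None] for e in ungrounded_edges]
    for choice in product(*options):
        yield dict(zip(ungrounded_edges, choice))

print(sum(1 for _ in grounded_graphs(edge_candidates)))   # (2+1) * (1+1) = 6
```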
Given a NL sentence s, we construct from its CCG syntactic derivation all corresponding ungrounded graphs u. Using a beam search procedure (described in Section 4.2), we find the best scoring graphs (ĝ, û), maximizing over different graph configurations (g, u) of s:

(ĝ, û) = arg max_{g,u} Φ(g, u, s, KB) · θ        (1)

We define the score of (ĝ, û) as the dot product between a high dimensional feature representation Φ = (Φ1, ..., Φm) and a weight vector θ (see Section 3.3 for details on the features we employ).

We estimate the weights θ using the averaged structured perceptron algorithm (Collins, 2002), shown in Algorithm 1.

Algorithm 1: Averaged Structured Perceptron
Input: training sentences {s_i}, i = 1, ..., N
1:  θ ← 0
2:  for t = 1, ..., T do
3:    for i = 1, ..., N do
4:      (ĝ_i, û_i) = arg max_{g_i, u_i} Φ(g_i, u_i, s_i, KB) · θ
5:      if (g_i+, u_i+) ≠ (ĝ_i, û_i) then
6:        θ ← θ + Φ(g_i+, u_i+, s_i, KB) − Φ(ĝ_i, û_i, s_i, KB)
7:  return (1/T) Σ_{t=1}^{T} (1/N) Σ_{i=1}^{N} θ_t^i

The perceptron makes several passes over the sentences, and in each iteration it computes the best scoring (ĝ, û) among the candidate graphs for a given sentence. In line 6, the algorithm updates θ with the difference (if any) between the feature representations of the best scoring graph (ĝ, û) and the gold standard graph (g+, u+). The goal of the algorithm is to rank gold standard graphs higher than any other graphs. The final weight vector θ (line 7) is the average of the weight vectors over T iterations and N sentences, where θ_t^i denotes the weight vector after processing sentence i in iteration t. This averaging procedure avoids overfitting and produces more stable results (Collins, 2002).
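The Python sketch below shows the shape of this training loop; it is a schematic rendering of Algorithm 1, not the released system. The helpers candidate_graphs (candidate grounded/ungrounded graph pairs for a sentence), phi (the feature function of Section 3.3, returning a sparse dictionary) and gold (the surrogate gold graphs of Section 3.2) are assumed placeholders.

```python
# Schematic averaged structured perceptron (Algorithm 1); helper functions
# candidate_graphs, phi, and gold are hypothetical placeholders.
from collections import defaultdict

def sparse_dot(features, theta):
    # dot product between a sparse feature dict and the weight vector
    return sum(v * theta.get(f, 0.0) for f, v in features.items())

def train_perceptron(sentences, candidate_graphs, phi, gold, T=20):
    theta = defaultdict(float)      # current weight vector
    theta_sum = defaultdict(float)  # running sum used for averaging
    seen = 0
    for t in range(T):
        for s in sentences:
            # line 4: best-scoring candidate under the current weights (Eq. 1)
            g_hat, u_hat = max(
                candidate_graphs(s),
                key=lambda gu: sparse_dot(phi(gu[0], gu[1], s), theta))
            g_plus, u_plus = gold[s]   # surrogate gold graph (Section 3.2)
            # lines 5-6: perceptron update towards the gold feature vector
            if (g_plus, u_plus) != (g_hat, u_hat):
                for f, v in phi(g_plus, u_plus, s).items():
                    theta[f] += v
                for f, v in phi(g_hat, u_hat, s).items():
                    theta[f] -= v
            # accumulate weights for the average over T iterations, N sentences
            for f, v in theta.items():
                theta_sum[f] += v
            seen += 1
    return {f: v / seen for f, v in theta_sum.items()}
```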
As we do not make use of question-answer pairs or manual annotations of sentences, gold standard graphs (g+, u+) are not available. In the following, we explain how we approximate them by relying on graph denotations as a form of weak supervision.

3.2 Selecting Surrogate Gold Graphs

Let u be an ungrounded semantic graph of s. We select an entity E in u, replace it with a variable x, and make it a target node. Let u+ represent the resulting ungrounded graph. Next, we obtain all grounded graphs g+ which correspond to u+ such that the denotations [[u+]]_NL = [[g+]]_KB. We use these surrogate graphs g+ as gold standard, and the pairs (u+, g+) for model training. There is considerable latitude in choosing which entity E to replace. This can be done randomly, according to entity frequency, or by some other criterion. We found that substituting the entity with the most connections to other entities in the sentence works well in practice. All the entities that can replace x in u+ to constitute a valid fact in Freebase form the denotation of u+, [[u+]]_NL. While it is straightforward to compute [[g+]]_KB, it is hard to compute [[u+]]_NL because of the mismatch between our natural language semantic representation and the Freebase query language. To ensure that graphs u+ and g+ have the same denotations, we impose the following constraints:

Constraint 1 If the math function UNIQUE is attached to the entity being replaced in the ungrounded graph, we assume the denotation of u+ contains only that entity. For example, in Figure 2b, we replace Austin by x, and thus assume [[u+]]_NL = {AUSTIN} (we also remove the UNIQUE attached to x to exactly mimic the test-time setting). Any grounded graph which results in [[g+]]_KB = {AUSTIN} will be considered a surrogate gold graph. This allows us to learn entailment relations, e.g., capital.of should be grounded to location.capital (left-hand side graph in Figure 2d) and not to location.containedby, which results in all locations in Texas (right-hand side graph in Figure 2d).

Constraint 2 If the target entity node is a number, we select the Freebase graphs with a denotation close to this number. For example, in Figure 8a, if 120,000 is replaced by x, we assume [[u+]]_NL = {120,000}. However, the grounded graph in Figure 8b retrieves [[g+]]_KB = {119,000}. We treat this as correct if β/γ ∈ [0.9, 1.1], where β ∈ [[u+]]_NL and γ ∈ [[g+]]_KB. Integers can either occur directly in relation with an entity as in Figure 8b, or must be enumerated as in Figure 7c.

Constraint 3 If the target entity node is a date, we select the grounded graph which results in the smallest set containing the date, based on the intuition that most sentences in the data describe specific rather than general events.

Constraint 4 If none of the above constraints apply to the target entity E, we know E ∈ [[u+]]_NL, and hence we select the grounded graphs which satisfy E ∈ [[g+]]_KB as surrogate gold graphs.

3.3 Features

Our feature vector Φ(g, u, s, KB) denotes the features extracted from a sentence s and its corresponding graphs u and g with respect to a knowledge base KB. The elements of the vector (φ1, φ2, ...) take integer values denoting the number of times a feature appeared. We devised the following broad feature classes:

Lexical alignments Since ungrounded graphs are similar in topology to grounded graphs, we extract ungrounded and grounded edge and type alignments. So, from graphs 5a and 5b, we obtain the edge alignment φedge(directed.arg1, directed.arg2, film.directed by.arg2, film.directed by.arg1) and the subedge alignments φedge(directed.arg1, film.directed by.arg2) and φedge(directed.arg2, film.directed by.arg1). In a similar fashion we extract type alignments (e.g., φtype(capital, location.city)).

Contextual features In addition to lexical alignments, we also use contextual features which essentially record words or word combinations surrounding grounded edge labels. Feature φevent records an event word and its grounded predicates (e.g., in Figure 7c we extract the features φevent(appear, performance.film) and φevent(appear, performance.actor)). Feature φarg records a predicate and its argument words (e.g., φarg(performance.film, movie) in Figure 7c). Word combination features are extracted from the parser’s dependency output. The feature φdep records a predicate and the dependencies of its event word (e.g., from the grounded version of Figure 6b we extract the features φdep(location.state.capital.arg1, capital, state) and φdep(location.state.capital.arg2, capital, state)). Using such features, we are able to handle multiword predicates.

Lexical similarity We count the number of word stems (obtained with the Porter stemmer) shared by grounded and ungrounded edge labels, e.g., in Figure 5, directed.arg1 and film.directed by.arg2 have one stem overlap (ignoring the argument labels arg1 and arg2). For a grounded graph, we compute φstem, the aggregate stem overlap count over all its grounded and ungrounded edge labels. We did not incorporate WordNet/Wiktionary-based lexical similarity features, but these were found fruitful in Kwiatkowski et al. (2013). We also have a feature for the stem overlap count between the grounded edge labels and the context words.

Graph connectivity features These features penalize graphs with non-standard topologies. For example, we do not want a final graph with no edges. The feature value φhasEdge is one if there exists at least one edge in the graph. We also have a feature φnodeCount for counting the number of connected nodes in the graph. Finally, feature φcolloc captures the collocation of grounded edges (e.g., edges belonging to a single complex fact are likely to co-occur; see Figure 8b).
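As a toy illustration of two of these feature classes (the feature names and the crude stemmer below are simplified stand-ins, not the system's actual feature templates):

```python
# Toy extraction of alignment and stem-overlap features from aligned edge labels.
import re
from collections import Counter

def stem(word):
    # crude stand-in for the Porter stemmer used by the authors
    return word.lower().rstrip("eds")

def extract_features(aligned_edges):
    """aligned_edges: list of (ungrounded_label, grounded_label) pairs."""
    feats = Counter()
    for u_label, g_label in aligned_edges:
        feats[("edge", u_label, g_label)] += 1                     # phi_edge
        u_stems = {stem(w) for w in re.split(r"[._]", u_label)}
        g_stems = {stem(w) for w in re.split(r"[._]", g_label)}
        feats[("stem_overlap",)] += len(u_stems & g_stems)         # phi_stem
    return feats

print(extract_features([("directed.arg1", "film.directed_by.arg2")]))
# Counter({('edge', 'directed.arg1', 'film.directed_by.arg2'): 1,
#          ('stem_overlap',): 1})
```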
4 Experimental Setup

In this section we present our experimental set-up for assessing the performance of the semantic parser described above. We present the datasets on which our model was trained and tested, discuss implementation details, and briefly introduce the models used for comparison with our approach.

4.1 Data

We evaluated our approach on the FREE917 (Cai and Yates, 2013) and WEBQUESTIONS (Berant et al., 2013) datasets. FREE917 consists of 917 questions and their meaning representations (written in a variant of lambda calculus) which we, however, do not use. The dataset represents 81 domains covering 635 Freebase relations, with most domains containing fewer than 10 questions. We report results on three domains, namely film, business, and people, as these are relatively large in both FREE917 and Freebase. WEBQUESTIONS consists of 5,810 question-answer pairs, 2,780 of which are reserved for testing. Our experiments used a subset of WEBQUESTIONS representing the three target domains. We extracted domain-specific queries semi-automatically by identifying question-answer pairs with entities in target domain relations. In both datasets, named entities were disambiguated to Freebase entities with a named entity lexicon (FREE917 comes with such a lexicon; for WEBQUESTIONS we hand-coded one). Table 1 presents descriptive statistics for each domain.

Domain     #Rels  #Types  #Triples  #Train  #Free917  #WebQ
business     226     102      23m      30k        46     49
film         113      75      42m      13k        49     91
people        85      59      68m      56k        29    430
all*         411     210     120m      99k       124    570

Table 1: Domain-specific Freebase statistics (*some relations/types/triples are shared across domains); number of training CLUEWEB09 sentences; number of test questions in FREE917 and WEBQUESTIONS.

Evaluating on all domains in Freebase would generate a very large number of queries for which denotations would have to be computed (the number of queries is linear in the number of domains and the size of training data). Our system loads Freebase using Virtuoso (http://virtuoso.openlinksw.com) and queries it with SPARQL. Virtuoso is slow in dealing with millions of queries indexed on the entire Freebase, which is the only reason we did not work with the complete Freebase.

4.2 Implementation

To train our model, we extracted sentences from CLUEWEB09 which contain at least two entities associated with a relation in Freebase, and have an edge between them in the ungrounded graph. These were further filtered so as to remove sentences which do not yield at least one semantic parse without an uninstantiated entity variable. For example, the sentence Avatar is directed by Cameron would be used for training, whereas Avatar directed by Cameron received a critical review would not. In the latter case, any semantic parse will have an uninstantiated entity variable for review. Table 1 (Train) shows the number of sentences we obtained.

In order to train our semantic parser, we initialized the alignment and type features (φedge and φtype, respectively) with the alignment lexicon weights. These weights are computed as follows. Let count(r′, r) denote the number of pairs of entities which are linked with edge r′ in Freebase and edge r in CLUEWEB09 sentences. We then estimate the probability distribution P(r′ | r) = count(r′, r) / Σ_i count(r′_i, r). Analogously, we created a type alignment lexicon. The counts were collected from CLUEWEB09 sentences containing pairs of entities linked with an edge in Freebase (business 390k, film 130k, and people 490k). Contextual features were initialized to −1, since most word contexts and grounded predicates/types do not appear together. All other features were set to 0.
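A small sketch of the alignment-lexicon estimation above, with made-up co-occurrence counts (the input format is an assumption, not the released code):

```python
# Estimate P(r' | r) = count(r', r) / sum_i count(r'_i, r) from co-occurrence
# counts of Freebase edges r' and ungrounded (CLUEWEB09) edges r.
from collections import defaultdict

def alignment_lexicon(cooccurrence):
    totals = defaultdict(float)
    for (r_prime, r), c in cooccurrence.items():
        totals[r] += c
    return {(r_prime, r): c / totals[r]
            for (r_prime, r), c in cooccurrence.items()}

# toy counts: entity pairs linked by "directed" in text and by each Freebase edge
counts = {("film.directed_by", "directed"): 900.0,
          ("film.produced_by", "directed"): 100.0}
lexicon = alignment_lexicon(counts)
print(lexicon[("film.directed_by", "directed")])   # 0.9
```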
We used a beam-search algorithm to convert ungrounded graphs to grounded ones. The edges and types of each ungrounded graph are placed in a priority queue. Priority is based on edge/type tf-idf scores collected over CLUEWEB09. At each step, we pop an element from the queue and ground it in Freebase. We rank the resulting grounded graphs using the perceptron model, and pick the n-best ones, where n is the beam size. We continue until the queue is empty. In our experiments we used a beam size of 100. We trained a single model for all the domains combined together. We ran the perceptron for 20 iterations (around 5–10 million queries). At each training iteration we used 6,000 randomly selected sentences from the training corpus.

4.3 Comparison Systems

We compared our graph-based semantic parser (henceforth GRAPHPARSER) against two state-of-the-art systems, both of which are open-domain and work with Freebase. The semantic parser developed by Kwiatkowski et al. (2013) (henceforth KCAZ13) is learned from question-answer pairs and follows a two-stage procedure: first, a natural language sentence is converted to a domain-independent semantic parse and then grounded onto Freebase using a set of logical-type equivalent operators. The operators explore possible ways sentential meaning could be expressed in Freebase and essentially transform the logical form to match the target ontology. Our approach also has two steps (i.e., we first generate multiple ungrounded graphs and then ground them to different Freebase graphs). We do not use operators to perform structure matching; rather, we create multiple graphs and leave it up to the learner to find an appropriate grounding using a rich feature space. To give a specific example, their operator literal to constant is equivalent to having named entities for larger text chunks in our case. Their operator split literal explores different edge possibilities in an event, whereas we start with a clique and remove unwanted edges. Our approach has (almost) similar expressive power but is conceptually simpler.

Our second comparison system was the semantic parser of Berant and Liang (2014) (henceforth PARASEMPRE), which also uses QA pairs for training and makes use of paraphrasing. Given an input NL sentence, they first construct a set of logical forms based on hand-coded rules, and then generate sentences from each logical form (using generation templates and a lexicon). Pairs of logical forms and natural language are finally scored using a paraphrase model consisting of two components. An association model determines whether they contain phrase pairs likely to be paraphrases, and a vector space model assigns a vector representation to each sentence and learns a scoring function that ranks paraphrase candidates.

System         Prec   Rec    F1
MWG            52.6   49.1   50.8
KCAZ13         72.6   66.1   69.2
GRAPHPARSER    81.9   76.6   79.2

Table 2: Experimental results on FREE917.
Our seman- tic parser employs a graph-based representation as a means of handling the mismatch between natu- ral language, whereas PARASEMPRE opts for a text- based one through paraphrasing. Finally, we compared our semantic parser against a baseline which is based on graphs but em- ploys no learning. The baseline converts an un- grounded graph to a grounded one by replacing each ungrounded edge/type with the highest weighted grounded label creating a maximum weighted graph, henceforth MWG. Both GRAPHPARSER and the baseline use the same alignment lexicon (a weighted mapping from ungrounded to grounded labels). 5 Results Table 2 summarizes our results on FREE917. As described earlier, we evaluated GRAPHPARSER on a subset of the dataset representing three domains (business, film, and people). Since this sub- set contains a relatively small number of instances (124 in total), we performed 10-fold cross valida- tion with 9 folds as development data8, and one fold as test data. We report results averaged over all test folds. With respect to KCAZ13, we present re- sults with their cross-domain trained models, where training data from multiple domains is used to test foreign domains.9 KCAZ13 used generic features like string similarity and knowledge base features which apply across domains and do not require in- domain training data. We do not report results with PARASEMPRE as the small number of training in- stances would put their method at a disadvantage. We treat a predicted query as correct if its denota- 8The development data is only used for model selection and for determining the optimal training iteration. 9We are grateful to Tom Kwiatkowski for supplying us with the output of their system. Features FREE917 WEBQ All 79.2 41.4 −Contextual 73.3 42.6 −Alignment 66.7 34.8 −Connectivity 65.0 36.6 −Similarity 62.5 35.0 Table 3: GRAPHPARSER ablation results on FREE917 and WEBQUESTIONS development set. tion is exactly equal to the denotation of the manu- ally annotated gold query. As can be seen, GRAPHPARSER outperforms KCAZ13 and the MWG baseline by a wide margin. This is an encouraging result bearing in mind that our model does not use question-answer pairs. We should also point out that our domain relation set is larger compared to KCAZ13. We do not prune any of the relations in Freebase, whereas KCAZ13 use only 112 relations and 83 types from our three domains (see Table 1). We further performed a feature ab- lation study to examine the contribution of different feature classes. As shown in Table 3, the most im- portant features are those based on lexical similar- ity, as also observed in KCAZ13. Graph connectivity and lexical alignments are equally important (these features are absent from KCAZ13). Contextual fea- tures are not very helpful over and above alignment features which also encode contextual information. Overall, generic features like lexical similarity are helpful only to a certain extent; the performance of GRAPHPARSER improves considerably when addi- tional graph-related features are taken into account. We also analyzed the errors GRAPHPARSER makes. 25% of these are caused by the C&C parser and are cases where it either returns no syntactic analysis or a wrong one. 19% of the errors are due to Freebase inconsistencies. For example, our system answered the question How many stores are in Nittany mall? with 65 using the relation shop- ping center.number of stores whereas the gold stan- dard provides the answer 25 counting all stores using the relation shopping center.store. 
Around 15% of er- rors include structural mismatches between natural language and Freebase; for the question Who is the president of Gap Inc?, our method grounds president to a grounded type whereas in Freebase it is repre- sented as a relation employment.job.title. The remain- System Prec Rec F1 MWG 39.4 34.0 36.5 PARASEMPRE 37.5 37.5 37.5 GRAPHPARSER 41.9 37.0 39.3 GRAPHPARSER + PARA 44.7 38.4 41.3 Table 4: Experimental results on WEBQUESTIONS. ing errors are miscellaneous. For example, the ques- tion What are some films on Antarctica? receives two interpretations, i.e., movies filmed in Antarctica or movies with Antarctica as their subject. We next discuss our results on WEBQUESTIONS. PARASEMPRE was trained with 1,115 QA pairs (corresponding to our target domains) together with question paraphrases obtained from the PARALEX corpus (Fader et al., 2013).10 While training PARASEMPRE, out-of-domain Freebase relations and types were removed. Both GRAPHPARSER and PARASEMPRE were tested on the same set of 570 in-domain QA pairs with exact answer match as the evaluation criterion. For development purposes, GRAPHPARSER uses 200 QA pairs. Table 4 displays our results. We observe that GRAPHPARSER obtains a higher F1 against MWG and PARASEMPRE. Dif- ferences in performance among these systems are less pronounced compared to FREE917. This is for a good reason. WEBQUESTIONS is a challenging dataset, created by non-experts. The questions are not tailored to Freebase in any way, they are more varied and display a wider vocabulary. As a result the mismatch between natural language and Free- base is greater and the semantic parsing task harder. Error analysis further revealed that parsing errors are responsible for 13% of the questions GRAPH- PARSER fails to answer. Another cause of er- rors is mismatches between natural language and Freebase. Around 7% of the questions are of the type Where did X come from?, and our model an- swers with the individual’s nationality, whereas an- notators provide the birthplace (city/town/village) as the right answer. Moreover, 8% of the ques- tions are of the type What does X do?, which the annotators answer with the individual’s profession. In natural language, we rarely attest constructions 10We used the SEMPRE package (http://www-nlp. stanford.edu/software/sempre/) which does not use any hand-coded entity disambiguation lexicon. like X does dentist/researcher/actor. The proposed framework assumes that Freebase and natural lan- guage are somewhat isomorphic, which is not al- ways true. An obvious future direction would be to paraphrase the questions so as to increase the num- ber of grounded and ungrounded graphs. As an illus- tration, we rewrote questions like Where did X come from to What is X’s birth place, and What did X do to What is X’s profession and evaluated our model GRAPHPARSER + PARA. As shown in Table 4, even simple paraphrasing can boost performance. Finally, Table 3 (third column) examines the con- tribution of different features on the WEBQUES- TIONS development dataset. Interestingly, we ob- serve that contextual features are not useful and in fact slightly harm performance. We hypothesize that this is due to the higher degree of mismatch between natural language and Freebase in this dataset. Fea- tures based on similarity, graph connectivity, and lexical alignments are more robust and generally useful across datasets. 6 Discussion In this paper, we introduce a new semantic pars- ing approach for Freebase. 
A key idea in our work is to exploit the structural and conceptual similari- ties between natural language and Freebase through a common graph-based representation. We formal- ize semantic parsing as a graph matching problem and learn a semantic parser without using annotated question-answer pairs. We have shown how to ob- tain graph representations from the output of a CCG parser and subsequently learn their correspondence to Freebase using a rich feature set and their deno- tations as a form of weak supervision. Our parser yields state-of-the art performance on three large Freebase domains and is not limited to question an- swering. We can create semantic parses for any type of NL sentences. Our work brings together several strands of re- search. Graph-based representations of sentential meaning have recently gained some attention in the literature (Banarescu et al., 2013), and attempts to map sentences to semantic graphs have met with good inter-annotator agreement. Our work is also closely related to Kwiatkowski et al. (2013) and Be- rant and Liang (2014) who present open-domain se- mantic parsers based on Freebase and trained on QA pairs. Despite differences in formulation and model structure, both approaches have explicit mechanisms for handling the mismatch between natural language and the KB (e.g., using logical-type equivalent oper- ators or paraphrases). The mismatch is handled im- plicitly in our case via our graphical representation which allows for the incorporation of all manner of powerful features. More generally, our method is based on the assumption that linguistic structure has a correspondence to Freebase structure which does not always hold (e.g., in Who is the grandmother of Prince William?, grandmother is not directly ex- pressed as a relation in Freebase). Additionally, our model fails when questions are too short with- out any lexical clues (e.g., What did Charles Darwin do? ). Supervision from annotated data or paraphras- ing could improve performance in such cases. In the future, we plan to explore cluster-based semantics (Lewis and Steedman, 2013) to increase the robust- ness on unseen NL predicates. Our work joins others in exploiting the connec- tions between natural language and open-domain knowledge bases. Recent approaches in relation ex- traction use distant supervision from a knowledge base to predict grounded relations between two tar- get entities (Mintz et al., 2009; Hoffmann et al., 2011; Riedel et al., 2013). During learning, they ag- gregate sentences containing the target entities, ig- noring richer contextual information. In contrast, we learn from each individual sentence taking into account all entities present, their relations, and how they interact. Krishnamurthy and Mitchell (2012) formalize semantic parsing as a distantly supervised relation extraction problem combined with a manu- ally specified grammar to guide semantic parse com- position. Finally, our approach learns a model of seman- tics guided by denotations as a form of weak su- pervision. Beyond semantic parsing (Artzi and Zettlemoyer, 2013; Liang et al., 2011; Clarke et al., 2010), feedback-based learning has been previously used for interpreting and following NL instructions (Branavan et al., 2009; Chen and Mooney, 2011), playing computer games (Branavan et al., 2012), and grounding language in the physical world (Krishna- murthy and Kollar, 2013; Matuszek et al., 2012). 
Lemma POS Semantic Class Semantic Category * VB*, IN, TO, POS EVENT directed : (Se\NPx<1>)/NPy<2> : λQλPλe.∃x∃y. directed.arg1(e,x)∧directed.arg2(e,y)∧P(x)∧Q(y) * NN, NNS TYPE movie : NP : λx.movie(x) * NNP*, PRP* ENTITY Obama : NP : λx.equal(x,Obama) * RB* EVENTMOD annually : Se\Se : λPλe.lexe.annually(e)∧P(e) * JJ* TYPEMOD state : NPx/NPx : λPλx.lexx.state(x)∧P(x) be * COPULA be: (Sy\NPx)/NPy : λQλPλy.∃x.lexy(x)∧P(x)∧Q(y) the * UNIQUE the : NPx/NPx : λPλx.UNIQUE(x)∧P(x) * CD COUNT twenty : Nx/Nx : λPλx.COUNT(x,20)∧P(x) twenty : Nx/Nx : λPλx.equal(x,20)∧P(x) not, n’t * NEGATION not : (Se\NPx)/(Se\NPx) : λPλQλe.∃x.NEGATION(e)∧P(x,e)∧Q(x) no * COMPLEMENT no : NPx/Nx : λPλx.COMPLEMENT(x)∧P(x) * WDT, WP*, QUESTION what : S[wq]e/(S[dcl]e\NPx) WRB : λPλe.∃x.TARGET(x)∧P(x,e) * WDT, WP*, CLOSED which : (NPx\NPx)/(S[dcl]e\NPx) WRB : λPλQλx.∃e.P(x,e)∧Q(x) Table 5: Rules used to classify words into semantic classes. * represents a wild card expression which matches anything. lexx denotes the lexicalised form of x e.g., when state : NPx/NPx : λPλx.lexx.state(x)∧ P(x) is applied to capital : NP : λy.capital(y), the lexicalised form of x becomes capital, and there- fore the predicate lexx.state becomes capital.state. The resulting semantic parse after application is λx.capital.state(x)∧capital(x). Appendix We use a handful of rules to divide words into semantic classes. Based on a word’s seman- tic class and indexed syntactic category, we con- struct its semantic category automatically. For example, directed is a member of the EVENT class, and its indexed syntactic category is ((Se\NPx<1>)/NPy<2>) (here, <1> and <2> in- dicate that x and y are the first and second ar- guments of e). We then generate its seman- tic category as λQλPλe.∃x∃y.directed.arg1(e,x)∧ directed.arg2(e,y)∧ P(x)∧ Q(y). Please refer to Appendix B of Clark and Curran (2007) for a list of their indexed syntactic categories. The rules are described in Table 5. Syntactic cate- gories are not shown for the sake of brevity. Most rules will match any syntactic category. Excep- tions are copula-related rules (see be in the sixth row) which apply only to the syntactic category (S\NP)/NP, and rules pertaining to wh -words (see the last two rows in the table). When more than one rule apply, we end up with multiple semantic parses. There are a few cases like passives, question words, and prepositional phrases where we modified the original indexed categories for better interpretation of the semantics (these are not displayed here). We also handle non-standard CCG operators involving unary and binary rules as described in Appendix A of Clark and Curran (2007). Acknowledgements We are grateful to the anonymous reviewers for their valuable feedback on an earlier version of this paper. Thanks to Mike Lewis and the members of ILCC for helpful discussions and comments. We acknowl- edge the support of EU ERC Advanced Fellowship 249520 GRAMPLUS and EU IST Cognitive Sys- tems IP EC-FP7-270273 “Xperience”. References Artzi, Yoav and Luke Zettlemoyer. 2011. Boot- strapping semantic parsers from conversations. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Edin- burgh, Scotland, pages 421–432. Artzi, Yoav and Luke Zettlemoyer. 2013. Weakly su- pervised learning of semantic parsers for mapping instructions to actions. Transations of the Associ- ation for Computational Linguistics 1(1):49–62. 
Banarescu, Laura, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning repre- sentation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interop- erability with Discourse. Sofia, Bulgaria, pages 178–186. Berant, Jonathan, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Nat- ural Language Processing. Seattle, Washington, USA, pages 1533–1544. Berant, Jonathan and Percy Liang. 2014. Seman- tic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, Maryland, USA, pages 1415–1425. Bos, Johan, Stephen Clark, Mark Steedman, James R. Curran, and Julia Hockenmaier. 2004. Wide-coverage semantic representations from a ccg parser. In Proceedings of Coling 2004. Geneva, Switzerland, pages 1240–1246. Branavan, S.R.K., Harr Chen, Luke Zettlemoyer, and Regina Barzilay. 2009. Reinforcement learn- ing for mapping instructions to actions. In Pro- ceedings of the Joint Conference of the 47th An- nual Meeting of the ACL and the 4th International Joint Conference on Natural Language Process- ing of the AFNLP. Suntec, Singapore, pages 82– 90. Branavan, S.R.K., Nate Kushman, Tao Lei, and Regina Barzilay. 2012. Learning high-level plan- ning from text. In Proceedings of the 50th An- nual Meeting of the Association for Computa- tional Linguistics. Jeju Island, Korea, pages 126– 135. Cai, Qingqing and Alexander Yates. 2013. Large- scale semantic parsing via schema matching and lexicon extension. In Proceedings of the 51st Annual Meeting of the Association for Compu- tational Linguistics. Sofia, Bulgaria, pages 423– 433. Chen, David L. and Raymond J. Mooney. 2011. Learning to interpret natural language navigation instructions from observations. In Proceedings of the 25th AAAI Conference on Artificial Intelli- gence. San Francisco, California, pages 859–865. Clark, Stephen and James R Curran. 2004. Pars- ing the wsj using CCG and log-linear models. In Proceedings of the 42nd Annual Meeting on Asso- ciation for Computational Linguistics. Barcelona, Spain, pages 103–111. Clark, Stephen and James R Curran. 2007. Wide- coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics 33(4):493–552. Clark, Stephen, Julia Hockenmaier, and Mark Steed- man. 2002. Building deep dependency structures with a wide-coverage CCG parser. In Proceed- ings of the 40th Annual Meeting on Association for Computational Linguistics. pages 327–334. Clarke, James, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. Driving semantic parsing from the world’s response. In Proceedings of the 14th Conference on Natural Language Learning. Uppsala, Sweden, pages 18–27. Collins, Michael. 2002. Discriminative training methods for Hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empir- ical Methods in Natural Language Processing. Philadelphia, Pennsylvania, pages 1–8. Fader, Anthony, Luke Zettlemoyer, and Oren Et- zioni. 2013. Paraphrase-driven learning for open question answering. In Proceedings of the 51st Annual Meeting of the Association for Computa- tional Linguistics. Sofia, Bulgaria, pages 1608– 1618. Gabrilovich, Evgeniy, Michael Ringgaard, and Amarnag Subramanya. 2013. 
FACC1: Freebase annotation of ClueWeb corpora, Version 1 (Re- lease date 2013-06-26, Format version 1, Correc- tion level 0). Goldwasser, Dan, Roi Reichart, James Clarke, and Dan Roth. 2011. Confidence driven unsupervised semantic parsing. In Proceedings of the 49th Annual Meeting of the Association for Compu- tational Linguistics: Human Language Technolo- gies. Portland, Oregon, USA, pages 1486–1495. Goldwasser, Dan and Dan Roth. 2011. Learning from natural instructions. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence. Barcelona, Spain, pages 1794–1800. Hoffmann, Raphael, Congle Zhang, Xiao Ling, Luke S Zettlemoyer, and Daniel S Weld. 2011. Knowledge-based weak supervision for informa- tion extraction of overlapping relations. In Pro- ceedings of the 49th Annual Meeting of the As- sociation for Computational Linguistics: Human Language Technologies. Portland, Oregon, USA, pages 541–550. Krishnamurthy, Jayant and Thomas Kollar. 2013. Jointly learning to parse and perceive: Connect- ing natural language to the physical world. Tran- sations of the Association for Computational Lin- guistics 1(1):193–206. Krishnamurthy, Jayant and Tom Mitchell. 2012. Weakly supervised training of semantic parsers. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Process- ing and Computational Natural Language Learn- ing. Jeju Island, Korea, pages 754–765. Kwiatkowski, Tom, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceed- ings of the 2013 Conference on Empirical Meth- ods in Natural Language Processing. Seattle, Washington, USA, pages 1545–1556. Kwiatkowski, Tom, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. pages 1223–1233. Lewis, Mike and Mark Steedman. 2013. Combined distributional and logical semantics. Transactions of the Association for Computational Linguistics 1:179–192. Liang, Percy, Michael Jordan, and Dan Klein. 2011. Learning dependency-based compositional se- mantics. In Proceedings of the 49th Annual Meet- ing of the Association for Computational Linguis- tics: Human Language Technologies. Portland, Oregon, USA, pages 590–599. Matuszek, Cynthia, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox. 2012. A joint model of language and perception for grounded attribute learning. In Proceedings of the 29th International Conference on Machine Learn- ing. Edinburgh, Scotland, pages 1671–1678. Mintz, Mike, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint conference of the 47th Annual Meet- ing of the Association for Computational Linguis- tics and the 4th International Joint Conference on Natural Language Processing of the Asian Fed- eration of Natural Language Processing. pages 1003–1011. Parsons, Terence. 1990. Events in the Semantics of English. MIT Press, Cambridge, MA. Poon, Hoifung. 2013. Grounded unsupervised se- mantic parsing. In Proceedings of the 51st An- nual Meeting of the Association for Computa- tional Linguistics. Sofia, Bulgaria, pages 933– 943. Riedel, Sebastian, Limin Yao, Andrew McCallum, and Benjamin M Marlin. 2013. Relation ex- traction with matrix factorization and universal schemas. 
In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Atlanta, Georgia, pages 74–84. Steedman, Mark. 2000. The Syntactic Process. The MIT Press. Wong, Yuk Wah and Raymond Mooney. 2007. Learning synchronous grammars for semantic parsing with lambda calculus. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. Prague, Czech Re- public, pages 960–967. Yao, Xuchen and Benjamin Van Durme. 2014. In- formation extraction over structured data: Ques- tion answering with freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, Maryland, USA, pages 956–966. Zelle, John M and Raymond J Mooney. 1996. Learning to parse database queries using induc- tive logic programming. In Proceedings of the National Conference on Artificial Intelligence. Portland, Oregon, pages 1050–1055. Zettlemoyer, Luke and Michael Collins. 2007. On- line learning of relaxed CCG grammars for pars- ing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natu- ral Language Processing and Computational Nat- ural Language Learning. Prague, Czech Repub- lic, pages 678–687. Zettlemoyer, Luke S. and Michael Collins. 2005. Learning to map sentences to logical form: Struc- tured classification with probabilistic categorial grammars. In Proceedings of 21st Conference in Uncertainilty in Artificial Intelligence. Edin- burgh, Scotland, pages 658–666.