Learning Strictly Local Subsequential Functions

Jane Chandlee, University of Delaware, Department of Linguistics and Cognitive Science, 125 East Main Street, Newark, DE 19716, janemc@udel.edu

Rémi Eyraud, QARMA team, Laboratoire d'Informatique Fondamentale, Marseille, France, remi.eyraud@lif.univ-mrs.fr

Jeffrey Heinz, University of Delaware, Department of Linguistics and Cognitive Science, 125 East Main Street, Newark, DE 19716, heinz@udel.edu

Abstract

We define two proper subclasses of subsequential functions based on the concept of Strict Locality (McNaughton and Papert, 1971; Rogers and Pullum, 2011; Rogers et al., 2013) for formal languages. They are called Input and Output Strictly Local (ISL and OSL). We provide an automata-theoretic characterization of the ISL class and theorems establishing how the classes are related to each other and to Strictly Local languages. We give evidence that local phonological and morphological processes belong to these classes. Finally we provide a learning algorithm which provably identifies the class of ISL functions in the limit from positive data in polynomial time and data. We demonstrate this learning result on appropriately synthesized artificial corpora. We leave a similar learning result for OSL functions for future work and suggest future directions for addressing non-local phonological processes.

1 Introduction

In this paper we define two proper subclasses of subsequential functions based on the properties of the well-studied Strictly Local formal languages (McNaughton and Papert, 1971; Rogers and Pullum, 2011; Rogers et al., 2013). These are languages that can be defined with grammars of substrings of length k (called k-factors), such that a string is in the language only if its own k-factors are a subset of the grammar. These languages have also been characterized by Rogers and Pullum (2011) as those that have the property expressed in the following theorem (which can be taken as a defining property):

Theorem 1 (Suffix Substitution Closure). L is Strictly Local iff there exists k ∈ N such that for all strings u1, v1, u2, v2 and for any string x of length k − 1, if u1xv1, u2xv2 ∈ L, then u1xv2 ∈ L.

These languages can model natural language phonotactic constraints which pick out contiguous substrings bounded by some length k (Heinz, 2007; Heinz, 2010). We define Input Strictly Local (ISL) and Output Strictly Local (OSL) functions, which model phonological processes for which the target and triggering context form a bounded contiguous substring. Here our use of 'process' is not specific to rule-based grammatical formalisms (such as SPE (Chomsky and Halle, 1968)). ISL and OSL functions model mappings from underlying forms to surface forms, which are also the bedrock of constraint-based frameworks like Optimality Theory (Prince and Smolensky, 1993).

By showing that local phonological processes can be modeled with ISL (and OSL) functions, we provide the strongest computational characterization of the input-output mappings these processes represent. While it has been shown that phonological mappings describable with rules of the form A → B / C __ D (where A, B, C, and D are regular languages) are regular (Johnson, 1972; Kaplan and Kay, 1994), and even subsequential (Chandlee and Heinz, 2012; Heinz and Lai, 2013), many logically possible regular and subsequential mappings are not plausible phonological mappings.
Since these implausible mappings cannot be modeled with ISL or OSL functions, we provide a more precise notion of what constitutes plausible phonological mappings.

In addition, we present the Input SL Function Learning Algorithm (ISLFLA) and prove that it identifies this class in the limit (Gold, 1967) in polynomial time and data (de la Higuera, 1997). Our approach follows previous work on learning subsequential transductions, namely Oncina et al. (1993), Oncina and Varó (1996), Castellanos et al. (1998), and Gildea and Jurafsky (1996). Oncina et al. (1993) present OSTIA (Onward Subsequential Transducer Inference Algorithm), an algorithm that learns the class of subsequential functions in the limit from positive data. OSTIA is only guaranteed to identify total functions exactly, but Oncina and Varó (1996) and Castellanos et al. (1998) present the modifications OSTIA-N, OSTIA-D, and OSTIA-R, which learn partial functions using negative data, domain information, and range information, respectively.

In terms of linguistic applications, Gildea and Jurafsky (1996) show that OSTIA fails to learn the phonological mapping of English flapping when given natural language data. The authors modified OSTIA with three learning heuristics (context, community, and faithfulness) and showed that the modified learner successfully learns flapping and several other phonological rules. Context encodes the idea that phonological changes depend on the context of the segment undergoing the change. Community gives the learner the ability to deduce that segments belonging to a natural class are likely to behave similarly. Lastly, faithfulness, by which underlying segments are assumed to be realized similarly on the surface, was encoded with a forced alignment between the input-output strings in the data set.[1] We believe this alignment removes OSTIA's guarantee that all subsequential functions are learned.

[1] The alignment was similar to the string-edit distance method used in Hulden et al. (2011).

Similar to the approach of Gildea and Jurafsky (1996), our learner employs a context bias because it knows its target is an ISL function and therefore the transduction only involves bounded contiguous substrings. And similar to OSTIA-D (Oncina and Varó, 1996; Castellanos et al., 1998), the ISLFLA makes use of domain information, because it makes decisions based on the input strings of the data set. It also employs a faithfulness bias in terms of the property of onwardness (see §2). The ISLFLA is supported by a theoretical result like Oncina et al. (1993), but learns a more restrictive class of mappings. We believe the theoretical results for this class will lead to new algorithms which include something akin to the community bias and that will succeed on natural language data while keeping strong theoretical results.

The proposed learner also builds on earlier work by Chandlee and Koirala (2014) and Chandlee and Jardine (2014), which also used strict locality to learn phonological processes but with weaker theoretical results. The former did not precisely identify the class of functions the learner could learn, and the latter could only guarantee learnability of the ISL functions with a closed learning sample.

The paper is organized as follows.
§2 presents the mathematical notations to be used. §3 defines ISL and OSL functions, provides an automata-theoretic characterization for ISL, and establishes some properties of these classes. §4 demonstrates how these functions can model local phonological processes, including substitution, insertion, and deletion. §5 presents the ISLFLA, proves that it efficiently learns the class of ISL functions, and provides demonstrations. §6 concludes.

2 Preliminaries

The set of all possible finite strings of symbols from a finite alphabet Σ and the set of strings of length ≤ n are Σ* and Σ^{≤n}, respectively. The unique empty string is represented with λ. The length of a string w is |w|, so |λ| = 0. The set of prefixes of w, Pref(w), is {p ∈ Σ* | (∃s ∈ Σ*)[w = ps]}, and the set of suffixes of w, Suff(w), is {s ∈ Σ* | (∃p ∈ Σ*)[w = ps]}. For all w ∈ Σ* and n ∈ N, Suff^n(w) is the single suffix of w of length n if |w| ≥ n; otherwise Suff^n(w) = w. If w = ps, then ws^{-1} = p and p^{-1}w = s. The longest common prefix of a set of strings S, lcp(S), is the p ∈ ∩_{w∈S} Pref(w) such that |p′| ≤ |p| for all p′ ∈ ∩_{w∈S} Pref(w).

A function f with domain A and co-domain B is written f : A → B. We sometimes write x ↦_f y for f(x) = y. When A and B are free monoids (like Σ*), the input and output languages of a function f are the stringsets dom(f) = {x | (∃y)[x ↦_f y]} and image(f) = {y | (∃x)[x ↦_f y]}, respectively.

Following Oncina et al. (1993), a subsequential finite state transducer (SFST) is a 6-tuple (Q, q0, Σ, Γ, δ, σ), where Q is a finite set of states, Σ and Γ are finite alphabets, q0 ∈ Q is the initial state, δ ⊆ Q × Σ × Γ* × Q is a set of edges, and σ : Q → Γ* is the final output function that maps states to strings that are appended to the output if the input ends in that state. δ recursively defines a mapping δ*: (q, λ, λ, q) ∈ δ*; if (q, u, v, q′) ∈ δ* and (q′, a, w, q″) ∈ δ then (q, ua, vw, q″) ∈ δ*. SFSTs are deterministic, which means their edges have the following property: (q, a, u, r), (q, a, v, s) ∈ δ ⇒ u = v ∧ r = s. Hence, we also refer to δ as the transition function, and for all (q, a, u, r) ∈ δ, we let δ1(q, a) = u and δ2(q, a) = r. The relation that a SFST T recognizes/accepts/generates is

R(T) = {(x, yz) ∈ Σ* × Γ* | (∃q ∈ Q)[(q0, x, y, q) ∈ δ* ∧ z = σ(q)]}.

Since SFSTs are deterministic, the relations they recognize are functions. Subsequential functions are defined as those describable with SFSTs.

For any function f : Σ* → Γ* and x ∈ Σ*, let the tails of x with respect to f be defined as

tails_f(x) = {(y, v) | f(xy) = uv ∧ u = lcp(f(xΣ*))}.

If strings x1, x2 ∈ Σ* have the same set of tails with respect to a function f, they are tail-equivalent with respect to f and we write x1 ∼_f x2. Clearly, ∼_f is an equivalence relation which partitions Σ*.

Theorem 2 (Oncina and García, 1991). A function f is subsequential iff ∼_f partitions Σ* into finitely many blocks.

The above theorem can be seen as the functional analogue of the Myhill-Nerode theorem for regular languages. Recall that for any stringset L, the tails of a word w w.r.t. L are defined as tails_L(w) = {u | wu ∈ L}. These tails can be used to partition Σ* into a finite set of equivalence classes iff L is regular. Furthermore, these equivalence classes are the basis for constructing the (unique up to isomorphism) smallest deterministic acceptor for a regular language. Likewise, Oncina and García's proof of Theorem 2 shows how to construct the (unique up to isomorphism) smallest SFST for a subsequential function f.
We refer to this transducer as the canonical transducer for f and denote it with T^C_f. The states of T^C_f are in one-to-one correspondence with tails_f(x) for all x ∈ Σ* (Oncina and García, 1991). To construct T^C_f, first let, for all x ∈ Σ* and a ∈ Σ, the contribution of a w.r.t. x be cont_f(a, x) = lcp(f(xΣ*))^{-1} lcp(f(xaΣ*)). Then,

• Q = {tails_f(x) | x ∈ Σ*},
• q0 = tails_f(λ),
• for all tails_f(x) ∈ Q, σ(tails_f(x)) = lcp(f(xΣ*))^{-1} f(x) if x ∈ dom(f) and is undefined otherwise,
• δ = {(tails_f(x), a, cont_f(a, x), tails_f(xa))}.

T^C_f has an important property called onwardness.

Definition 1 (onwardness). For all q ∈ Q let the outputs of the edges out of q be outs(q) = {u | (∃a ∈ Σ)(∃r ∈ Q)[(q, a, u, r) ∈ δ]}. A SFST T is onward if for all q ∈ Q∖{q0}, lcp(outs(q)) = λ.

Informally, this means that the writing of output is never delayed. Readers are referred to Oncina and García (1991), Oncina et al. (1993), and Mohri (1997) for more on SFSTs.

3 Strictly Local Functions

In this section we define Input and Output Strictly Local functions and provide properties of these classes. These definitions are analogous to the language-theoretic definition of Strictly Local languages (Theorem 1) (Rogers and Pullum, 2011; Rogers et al., 2013).

Definition 2 (Input Strictly Local Function). A function f is Input Strictly Local (ISL) if there is a k such that for all u1, u2 ∈ Σ*, if Suff^{k−1}(u1) = Suff^{k−1}(u2) then tails_f(u1) = tails_f(u2).

The theorem below establishes an automata-theoretic characterization of ISL functions.

Theorem 3. A function f is ISL iff there is some k such that f can be described with a SFST for which

1. Q = Σ^{≤k−1} and q0 = λ
2. (∀q ∈ Q, ∀a ∈ Σ, ∀u ∈ Γ*)[(q, a, u, q′) ∈ δ ⇒ q′ = Suff^{k−1}(qa)].

This theorem helps make clear how ISL functions are Markovian: the output for input symbol a depends on the last (k − 1) input symbols. Also, since the transducer defined in Theorem 3 is deterministic, it is unique and we refer to it as T^{ISL}_f. T^{ISL}_f may not be isomorphic to T^C_f. Figure 1 shows T^{ISL}_f (with k = 2) and T^C_f for the identity function.[2]

Figure 1: Non-isomorphic T^{ISL}_f (left) and T^C_f (right) for the identity function.

[2] In all figures, single-symbol transition labels indicate that the input and output are identical, and the final output function is represented as a transition on the end-of-word symbol #.

Before we present the proof of Theorem 3, we make the following remarks, and then prove a lemma.

Remark 1. For all n, m ∈ N with n ≤ m, and for all w ∈ Σ*, Suff^n(Suff^m(w)) = Suff^n(w), since both w and Suff^m(w) share the same n-long suffix.

Remark 2. For all w, v ∈ Σ* and k ∈ N, Suff^{k−1}(Suff^{k−1}(w)v) = Suff^{k−1}(wv). This is a direct consequence of Remark 1.

Lemma 1. Let T be a SFST constructed as in Theorem 3. If (λ, w, u, q) ∈ δ* then q = Suff^{k−1}(w).

Proof. The proof is by induction on δ*. The base case follows from the facts that (λ, λ, λ, λ) ∈ δ* (by definition of δ*), and that λ = Suff^{k−1}(λ). Next assume the inductive hypothesis (IH) that there exist w ∈ Σ*, u ∈ Γ*, such that (λ, w, u, q) ∈ δ* implies q = Suff^{k−1}(w). We show for all a ∈ Σ and x ∈ Γ* that (λ, wa, ux, q′) ∈ δ* implies q′ = Suff^{k−1}(wa). Now (λ, w, u, Suff^{k−1}(w)) ∈ δ* (by IH), so (Suff^{k−1}(w), a, x, q′) ∈ δ. By construction of T, q′ = Suff^{k−1}(Suff^{k−1}(w)a), which by Remark 2 equals Suff^{k−1}(wa).

Proof of Theorem 3. (⇐) Fix k ∈ N and let f be a function described by T = (Q, q0, Σ, Γ, δ, σ) constructed as in Theorem 3. Let w1, w2 ∈ Σ* be such that Suff^{k−1}(w1) = Suff^{k−1}(w2).
By Lemma 1, both w1 and w2 lead to the same state and therefore tails_f(w1) = tails_f(w2). Thus f is k-ISL.

(⇒) Consider any ISL function f. Then there is a k such that (∀w1, w2 ∈ Σ*)[Suff^{k−1}(w1) = Suff^{k−1}(w2) ⇒ tails_f(w1) = tails_f(w2)]. We show that T^{ISL}_f exists. Since Σ^{≤k−1} is a finite set, the equivalence relation ∼_f partitions Σ* into at most |Σ^{≤k−1}| blocks. Hence by Theorem 2, f is subsequential and a canonical subsequential transducer T^C_f = (Q_c, q0_c, Σ, Γ, δ_c, σ_c) exists. Construct T = (Q, q0, Σ, Γ, δ, σ) as follows:

• Q = Σ^{≤k−1} and q0 = λ
• for all q ∈ Q, σ(q) = σ_c(tails_f(q)) if f(q) is defined, else σ(q) is undefined.
• for all q ∈ Q, a ∈ Σ, u ∈ Γ*, (q, a, u, Suff^{k−1}(qa)) ∈ δ iff (tails_f(q), a, u, tails_f(qa)) ∈ δ_c.

First, by its construction, the states and transitions of T meet requirements (1) and (2) of Theorem 3. Next, we show that T computes the same function as T^C_f. We first establish by induction on δ* the following (⋆): for all w ∈ Σ* and u ∈ Γ*, (q0, w, u, Suff^{k−1}(w)) ∈ δ* iff (tails_f(λ), w, u, tails_f(w)) ∈ δ_c*.

Clearly (tails_f(λ), λ, λ, tails_f(λ)) ∈ δ_c* and (λ, λ, λ, λ) ∈ δ* by definition of the extended transition function. Thus the base case is satisfied. Next assume the inductive hypothesis (IH) for w ∈ Σ* and u ∈ Γ*. We show for all a ∈ Σ and x ∈ Γ* that (λ, wa, ux, Suff^{k−1}(wa)) ∈ δ* iff (tails_f(λ), wa, ux, tails_f(wa)) ∈ δ_c*.

Suppose (tails_f(λ), wa, ux, tails_f(wa)) ∈ δ_c*. By IH, (tails_f(λ), w, u, tails_f(w)) ∈ δ_c*; hence (tails_f(w), a, x, tails_f(wa)) ∈ δ_c. Let q = Suff^{k−1}(w). Observe that Suff^{k−1}(w) = Suff^{k−1}(q) by Remark 1. Consequently, since f is k-ISL, tails_f(w) = tails_f(q). Similarly, Suff^{k−1}(wa) = Suff^{k−1}(qa) and thus tails_f(wa) = tails_f(qa). By substitution then, we have (tails_f(λ), w, u, tails_f(q)) ∈ δ_c* and (tails_f(q), a, x, tails_f(qa)) ∈ δ_c. By construction of T, (q, a, x, Suff^{k−1}(qa)) ∈ δ. By IH, (λ, w, u, Suff^{k−1}(w)) = (λ, w, u, q) ∈ δ*. Thus, (λ, wa, ux, Suff^{k−1}(qa)) ∈ δ*, which is the same as saying (λ, wa, ux, Suff^{k−1}(wa)) ∈ δ*.

Conversely, consider any a ∈ Σ and x ∈ Γ* and suppose (λ, wa, ux, Suff^{k−1}(wa)) ∈ δ*. By IH, (λ, w, u, Suff^{k−1}(w)) ∈ δ*; hence (Suff^{k−1}(w), a, x, Suff^{k−1}(wa)) ∈ δ. As before, letting q = Suff^{k−1}(w), it follows that Suff^{k−1}(w) = Suff^{k−1}(q) so tails_f(w) = tails_f(q), and Suff^{k−1}(wa) = Suff^{k−1}(qa) so tails_f(wa) = tails_f(qa). Therefore, (q, a, x, Suff^{k−1}(qa)) ∈ δ. By construction of T, this means (tails_f(w), a, x, tails_f(wa)) ∈ δ_c. By IH, (tails_f(λ), w, u, tails_f(w)) ∈ δ_c*. By the definition of the extended transition function, it follows that (tails_f(λ), wa, ux, tails_f(wa)) ∈ δ_c*. This completes the induction, establishing (⋆).

Let w ∈ Σ*. If f(w) is defined then f(w) = uv where (tails_f(λ), w, u, tails_f(w)) ∈ δ_c* and σ_c(tails_f(w)) = v. From (⋆), we know (λ, w, u, Suff^{k−1}(w)) ∈ δ*. Also, σ(Suff^{k−1}(w)) = σ_c(tails_f(Suff^{k−1}(w))). Since f is k-ISL, tails_f(w) = tails_f(Suff^{k−1}(w)) (cf. Remark 1). Thus σ(Suff^{k−1}(w)) = σ_c(tails_f(w)) = v. Therefore T(w) = uv also.

If f(w) is not defined then there are two cases. Case 1: (tails_f(λ), w, u, tails_f(w)) ∉ δ_c*. By (⋆), it follows that (λ, w, u, Suff^{k−1}(w)) ∉ δ* and so T(w) is also undefined. Case 2: (tails_f(λ), w, u, tails_f(w)) ∈ δ_c* but σ_c(tails_f(w)) is undefined. By construction of T, σ(Suff^{k−1}(w)) is also undefined.
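To make the construction in Theorem 3 concrete, the following is a minimal sketch in Python (ours, for illustration; the helper names suffix, make_isl_transducer, transduce, out, and fin are all hypothetical). It builds the state set Σ^{≤k−1}, wires every transition to target Suff^{k−1}(qa), and leaves the edge-output function out and final-output function fin to be supplied by the modeler:

    from itertools import product

    def suffix(w, n):
        """Suff^n(w): the length-n suffix of w, or w itself if |w| < n."""
        return w if len(w) < n else w[len(w) - n:]

    def make_isl_transducer(sigma, k, out, fin):
        """Theorem 3's SFST: Q = Sigma^{<=k-1}, q0 = lambda (''), and
        reading a in state q emits out(q, a) and moves to Suff^{k-1}(qa)."""
        states = [''.join(p) for n in range(k) for p in product(sigma, repeat=n)]
        delta = {(q, a): (out(q, a), suffix(q + a, k - 1))
                 for q in states for a in sigma}
        return delta, fin

    def transduce(delta, fin, w):
        """Run the transducer deterministically from the initial state ''."""
        q, output = '', ''
        for a in w:
            u, q = delta[(q, a)]
            output += u
        return output + fin(q)

For the identity function of Figure 1, out(q, a) = a and fin always returns λ (the empty string), so transduce reproduces its input; the phonological examples of §4 can be encoded with the same scaffolding.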
Similar to Definition 2 above, the following definition of OSL functions says that if the outputs of two input strings share the same suffix of length (k − 1), then those inputs have the same tails.

Definition 3 (Output Strictly Local Function). A function f is Output Strictly Local (OSL) if there is a k such that for all u1, u2 ∈ Σ*, if Suff^{k−1}(f(u1)) = Suff^{k−1}(f(u2)) then tails_f(u1) = tails_f(u2).

The automata-theoretic characterization of this class is left as future work. Next, some properties of ISL and OSL functions are identified. The proofs of the following theorems will refer to the example SFSTs in Figure 2. First we establish that ISL and OSL functions are proper subclasses of the subsequential functions. (They are subclasses by definition.)

Theorem 4. There are subsequential functions which are neither ISL nor OSL.

Figure 2: Example SFSTs f1 through f6 for the proofs of Theorems 4-7.

Proof. Consider the subsequential function f1 in Figure 2 and choose any k ∈ N. Since f1(bc^k b) = bc^k b and f1(ac^k b) = ac^k a, it follows that tails_{f1}(bc^k) ≠ tails_{f1}(ac^k) even though the inputs bc^k and ac^k share a common suffix of length (k − 1). Likewise, the outputs of these inputs, f1(bc^k) = bc^k and f1(ac^k) = ac^k, share a common suffix of length (k − 1), but the tails of these inputs, as mentioned, are distinct. Hence by Definitions 2 and 3, there is no k for which f1 is ISL or OSL.

Theorem 5. The class of ISL functions is incomparable to the class of OSL functions.

Proof. Consider f2 in Figure 2. This function is ISL by Theorem 3. However it is not OSL. Consider any k ∈ N. Observe that f2(aa^k) = f2(ab^k) = ab^k and so the outputs share the same suffix of length (k − 1). However, tails_{f2}(aa^k) ≠ tails_{f2}(ab^k) since (a, b) ∈ tails_{f2}(aa^k) but (a, a) ∈ tails_{f2}(ab^k).

Similarly, consider f3 in Figure 2. This function is 2-OSL because inputs mapped to outputs which end in a have the same tails (i.e., they all lead to state a), and inputs mapped to outputs ending in b have the same tails. However, f3 is not ISL. Consider any k ∈ N. The inputs cb^k and ab^k share the same suffix of length (k − 1). However, tails_{f3}(cb^k) ≠
tails_{f3}(ab^k), since (b, b) ∈ tails_{f3}(cb^k) but (b, a) ∈ tails_{f3}(ab^k). Finally, the two classes are obviously not disjoint, since the identity function is both ISL and OSL.

Theorem 6. The output language of an ISL or OSL function is not necessarily a Strictly Local language.

Proof. Consider f4 in Figure 2. By Theorem 3, it is ISL. It is also 2-OSL since inputs mapped to outputs which end in a have the same tails (i.e., they all lead to state a). Similarly, inputs mapped to outputs ending in b have the same tails. Let k ∈ N. Observe that f4(a^k) = a^{2k} and further that there is no input x and k such that f4(x) = a^{2k+1}. Since a^{2k} = a^{k−1}a^k a = a^{k−2}a^k aa, if the output language of f4 were SL it would follow, by Theorem 1, that a^{k−1}a^k aa = a^{2k+1} also belongs to the output language. But there is no input in Σ* which f4 maps to an odd sequence of a's.

Theorem 7. If either the output or input language of a subsequential function f is Strictly Local, it does not follow that f belongs to either ISL or OSL.

Proof. Let Σ = Γ = {a} and consider the function f5 which, for all n ∈ N, maps a^n to a^n if n is even and a^n to a^n a if n is odd. f5 is subsequential, as shown in Figure 2, and its domain, a*, is a Strictly Local language. However, f5 is neither ISL nor OSL for any k, since the tails of a^{2k} include (λ, λ) but the tails of aa^{2k} include (λ, a).

Next consider f6, which for all n ∈ N maps a^n to a^m where m = (n div 2) if n is even and m = (n div 2) + 1 if n is odd. f6 is subsequential, as shown in Figure 2, and image(f6) = a* is Strictly Local. However, f6 is neither ISL nor OSL for any k, since the tails of a^{2k} include (a, a) but the tails of aa^{2k} include (a, λ).

4 Relevance to Phonology

This section briefly discusses the range of phonological processes that can be modeled with ISL and/or OSL functions by providing illustrative examples for three common, representative processes. These are shown in (1) with SPE-style rules.

(1) a. Devoicing: D → T / __ #
    b. Epenthesis: ∅ → @ / {l, r} __ [-coronal]
    c. Deletion: {T, D} → ∅ / __ {T, s}

See Chandlee (2014) for more details.

First, consider the process of final obstruent devoicing (1-a), attested in German and other languages. This process causes an underlying voiced obstruent in word-final position to surface as its voiceless counterpart. In (1-a), D abbreviates the set of voiced obstruents and T the voiceless obstruents.[3] The mapping from underlying form to surface form that this process describes is represented with the 2-ISL function in Figure 3 (note N represents any sonorant).

Figure 3: T^{ISL}_f for (1-a) with k = 2 and Σ = {T, D, N}.

In addition to substitution processes like (1-a), another common type of process is insertion of a segment, such as the @-epenthesis process attested in Dutch (Warner et al., 2001). This process inserts a schwa between a liquid and a non-coronal consonant, as specified by (1-b). Using L to represent the liquids {l, r} and K to represent any non-coronal consonant, Figure 4 presents T^{ISL}_f for this process. Following Beesley and Karttunen (2003), the symbol '?' represents any segment of the alphabet other than the ones for which transitions are defined.[4]

Lastly, an example deletion process from Greek (Joseph and Philippaki-Warburton, 1987) deletes interdental fricatives before /T/ or /s/, as in (1-c). Figure 5 presents T^{ISL}_f for this process.

[3] These abbreviations serve to improve the readability of the FST; it is assumed that the transduction replaces a given voiced obstruent with its voiceless counterpart (not with any voiceless obstruent).
[4] This means Figure 4 is actually a shorthand for T^{ISL}_f, which would otherwise have a distinct state for each symbol now represented with '?'.

Figure 4: T^{ISL}_f for (1-b) with k = 2 and Σ = {L, K, ?}.

Figure 5: T^{ISL}_f for (1-c) with k = 2 and Σ = {T, D, s, ?}.

The German, Dutch, and Greek examples demonstrate how ISL functions can be used to model the input-output mapping of a phonological rule. Beyond substitution, insertion, and deletion, it is shown in Chandlee (2014) that ISL and/or OSL functions can also model many metathesis patterns, specifically those for which there is an upper bound on the amount of material that intervenes between a segment's original and surface positions (this appears to include all synchronic patterns). In addition, morpho-phonological processes such as local partial reduplication (i.e., in which the reduplicant is affixed adjacent to the portion of the base it was copied from) and general affixation are also shown to be ISL or OSL.
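As an illustration, the 2-ISL devoicing map of Figure 3 can be plugged into the transducer sketch given after Theorem 3 (the encoding below is our illustrative reconstruction, with D, T, N as single symbols): a read D is withheld, surfaces faithfully once a following symbol shows it is not word-final, and devoices to T at the word boundary.

    def devoice_out(q, a):
        # A D pending in state q surfaces faithfully once a next symbol
        # arrives; a newly read D is withheld until its position is known.
        pending = 'D' if q == 'D' else ''
        return pending if a == 'D' else pending + a

    def devoice_fin(q):
        # sigma(q): a word-final pending D devoices to T.
        return 'T' if q == 'D' else ''

    delta, fin = make_isl_transducer(['T', 'D', 'N'], 2, devoice_out, devoice_fin)
    assert transduce(delta, fin, 'NND') == 'NNT'   # word-final D devoices
    assert transduce(delta, fin, 'TDN') == 'TDN'   # non-final D is faithful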
More generally, we currently conjecture that a SPE-style rewrite rule of the form A → B / C __ D which applies simultaneously (left-to-right) describes an Input (Output) Strictly Local function iff there is a k such that for all w ∈ CAD it is the case that |w| ≤ k. We refer readers to Kaplan and Kay (1994) and Hulden (2009) for more on how SPE rules and application modes determine mappings.

In contrast, non-local (long-distance) processes such as vowel harmony with transparent vowels (Gainor et al., 2012; Heinz and Lai, 2013), long-distance consonant harmony (Hansson, 2001; Rose and Walker, 2004; Luo, 2013) and dissimilation (Suzuki, 1998; Bennett, 2013; Payne, 2013), unbounded displacement/metathesis (Chandlee et al., 2012; Chandlee and Heinz, 2012), non-local partial reduplication (Riggle, 2003), and some tonal patterns (Jardine, 2013) cannot be modeled with ISL or OSL functions. In §6 we comment on the potential for adapting the current analysis to non-local mappings.

The next section presents a learning algorithm for ISL functions, the ISLFLA. The development of a corresponding algorithm for OSL functions is the subject of ongoing work, but see §6 for comments.

5 Learning Input Strictly Local Functions

We now present a learning algorithm for the class of ISL functions that uses its defining property as an inductive principle to generalize from a finite amount of data to a possibly infinite function. This learner begins with a prefix tree representation of this data and then generalizes by merging states.[5] Its state-merging criterion is based on the defining property of ISL functions: two input strings with the same suffix of length (k − 1) have the same tails. The next section explains in detail how the algorithm works.

[5] This general strategy is common in grammatical inference. See Angluin (1982), Heinz (2009), and de la Higuera (2010).

5.1 The Algorithm: ISLFLA

Given a finite dataset D of input-output string pairs (w, w′) such that f(w) = w′, where f is the target function, the learner tries to build T^{ISL}_f. The dataset is submitted to the learner in the form of a prefix tree transducer (PTT), which is defined in Definition 4.

Definition 4 (Prefix Tree Transducer). A prefix tree transducer for a finite set D = {(w, w′) | f(w) = w′} for some f is PTT(D) = (Q, q0, Σ, Γ, δ, σ), where

• Q = ⋃{Pref(w) | (w, w′) ∈ D} and q0 = λ
• δ = {(u, a, λ, ua) | u, ua ∈ Q}
• σ(w) = w′ for all (w, w′) ∈ D

As an example, the sample of data in (2) exemplifies the final devoicing rule in (1-a). Figure 6 gives the PTT for this data.

(2) {(D, T), (DD, DT), (DNT, DNT), (NND, NNT), (TDN, TDN)}

Figure 6: PTT for the data in (2).

Given such a PTT, the learner's first step is to make it onward. In the PTT, the output string is withheld until the end of the input (i.e., #) is reached. In the onward version (shown in Figure 7), output is advanced as close to the root as possible. This involves determining the lcp of all the output strings of all outgoing transitions of a state (starting from the leaves) and suffixing that lcp to the output of the single incoming transition of the state.

Figure 7: Onward version of the PTT in Figure 6.

Once the learner has constructed this onward representation of the data, it begins to merge states, using two nested loops.
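Before turning to the loops, here is a minimal Python sketch (ours, not the authors' implementation) of the two preprocessing steps just described. The transducer is stored as a dictionary mapping each state (an input prefix) to its outgoing edges, with the end marker '#' carrying the final output:

    import os

    def build_ptt(data):
        """Definition 4: one state per prefix of an input string; every input
        edge outputs lambda ('') and the full output waits on the '#' edge."""
        tree = {}
        for w, w_out in data:
            for i in range(len(w)):
                tree.setdefault(w[:i], {})[w[i]] = ('', w[:i + 1])
            tree.setdefault(w, {})['#'] = (w_out, None)
        return tree

    def make_onward(tree, q=''):
        """Advance output toward the root, leaves first: hoist the lcp of each
        state's outgoing outputs onto its incoming edge.  Returns the string
        hoisted out of state q (the root q0 hoists nothing)."""
        for a, (u, r) in list(tree[q].items()):
            if r is not None:                        # recurse below first
                tree[q][a] = (u + make_onward(tree, r), r)
        if q == '':
            return ''
        hoist = os.path.commonprefix([u for u, _ in tree[q].values()])
        tree[q] = {a: (u[len(hoist):], r) for a, (u, r) in tree[q].items()}
        return hoist

Applied to the sample in (2), build_ptt followed by make_onward yields the transducer of Figure 7: the root edges become N:NNT and T:TDN, while D:λ remains, since state D's outgoing outputs T, DT, and DNT share no common prefix.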
The outer loop proceeds through the entire state set (ordered lexicographically) and merges each state with the state that corresponds to its (k − 1)-length suffix. For example, for final devoicing k = 2, so each state will be merged with the state that represents its final symbol. This merging may introduce non-determinism, which must be removed since by definition T^{ISL}_f is deterministic. Non-determinism is removed in the inner loop with additional state merges.

Consider the situation depicted on the lefthand side of Figure 8, which resulted from a previous merge. The non-determinism could be resolved by merging states 1 and 2, except for the fact that the output strings of the two transitions differ. Before merging 1 and 2, therefore, the learner performs an operation called pushback that retains the lcp of the two output strings and prefixes what's left to all output strings of all outgoing transitions from the respective destination states.

Figure 8: Before (left) and after (right) pushback.

Definition 5 (pushback (Oncina et al., 1993)). For SFST T = (Q, q0, Σ, Γ, δ, σ), v ∈ Γ*, and e = (q, a, w, q′) ∈ δ,

pushback(T, v, e) = (Q, q0, Σ, Γ, δ′, σ), where
δ′ = (δ ∪ {(q, a, wv^{-1}, q′)} ∪ {(q′, b, vz, q″) | (q′, b, z, q″) ∈ δ}) ∖ ({(q′, b, z, q″) | (q′, b, z, q″) ∈ δ} ∪ {e}).

In the example in Figure 8, pushback is applied to the edges (0, T, DT, 1) and (0, T, DTN, 2). Only the lcp of the output strings, which is DT, is retained as the output string of both edges. The remainder (which is λ and N, respectively) is prefixed to the output string of all transitions leaving the respective destination state. The result is shown on the righthand side of Figure 8. Essentially, pushback 'undoes' onwardness when needed.

After pushback, states 1 and 2 can be merged. This removes the initial non-determinism but creates new non-determinism. The inner loop iterates until all non-determinism is resolved, after which the outer loop continues with the next state in the order. If the inner loop encounters non-determinism that cannot be removed, the ISLFLA exits with a message indicating that the data sample is insufficient.

Non-removable non-determinism occurs if and only if the situation depicted on the left in Figure 9 obtains.

Figure 9: Irreparable non-determinism (left) and non-determinism from an outer loop merge (right).

The normal procedure for removing non-determinism cannot be applied in this case. Assuming z ≠ λ, all of z would have to be pushed back, but since this transition has no destination state there is nowhere for z to go. OSTIA handles this situation by rejecting the outer loop merge that led to it, restoring the FST to its state before that merge and moving on to the next possible merge. But the ISLFLA cannot reject merges. If it could, the possibility would arise that two states with the same (k − 1)-length suffix would remain distinct in the final FST the learner outputs. Such a FST would not (by definition) be ISL. Therefore, the algorithm is at an impasse: rejecting a merge can lead to a non-ISL FST, while allowing it can lead to a non-subsequential (hence non-ISL) FST. It therefore terminates.
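In the dictionary encoding used above, the pushback of Definition 5 is a small local rewrite. The sketch below is ours; it abstracts away from the multigraph bookkeeping a real merge needs when two edges temporarily share a symbol, and it assumes the string v being pushed back is a suffix of the edge's output (so that wv^{-1} is defined):

    def pushback(tree, q, a, v):
        """Definition 5: replace the output w of edge (q, a, w, q') by wv^{-1}
        and prefix v to the outputs of all edges leaving q'."""
        w, r = tree[q][a]
        assert w.endswith(v)                    # wv^{-1} must be defined
        tree[q][a] = (w[:len(w) - len(v)], r)
        tree[r] = {b: (v + z, s) for b, (z, s) in tree[r].items()}

In the Figure 8 example, the outputs DT and DTN have lcp DT, so the remainders λ and N are pushed back onto states 1 and 2 respectively; the two edges then carry identical outputs and the states can be merged.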
Below is pseudo-code for the ISLFLA.

Algorithm 1: Pseudo-code for the ISLFLA

    Data: PTT(D)
    Result: T^{ISL}_f
    τ = onward(PTT)
    q = next(Q_τ, first(Q_τ))
    while q < last(Q_τ) do
        merge(q, Suff^{k−1}(q))
        while ∃(q, a, x, q1), (q, a, y, q2) ∈ δ_τ do
            if a = # and x ≠ y then
                exit 'Insufficient data'
            else
                pushback(τ, lcp(x, y)^{-1} x, (q, a, x, q1))
                pushback(τ, lcp(x, y)^{-1} y, (q, a, y, q2))
                merge(q1, q2)
        q = next(Q_τ, q)
    return τ

5.2 Learning Results

Here, we present a proof that the ISLFLA identifies the class of ISL functions in the limit from positive data, in the sense of Gold (1967), with polynomial bounds on time and data (de la Higuera, 1997). First, we establish the notion of characteristic sample.

Definition 6 (Characteristic sample). A sample CS is characteristic of a function f for an algorithm A if for all samples S such that CS ⊆ S, A returns a representation τ such that for all x ∈ dom(f), τ(x) = f(x), and for all x ∉ dom(f), τ(x) is not defined.

We can now define the learning criteria.

Definition 7 (Identification in polynomial time and data (de la Higuera, 1997)). A class F of functions is identifiable in polynomial time and data using a class T of representations iff there exist an algorithm A and two polynomials p() and q() such that:

1. Given a sample S of size m for f ∈ F, A returns a hypothesis in O(p(m)) time;
2. For each representation τ of size k of a function f ∈ F, there exists a characteristic sample CS of f for A of size at most O(q(k)).

Essentially the proof for convergence follows from the fact that given a sufficient sample the algorithm merges all and only states with the same (k − 1)-length suffix. Clearly, merges in the outer loop only involve states with the same (k − 1)-length suffix. This is also the case for inner loop merges. Consider the scenario depicted on the right in Figure 9, in which q is a state created by an outer loop merge. After pushback, states s and t will be merged. If x = Suff^{k−1}(q), then both s and t must have xa as a suffix. Since |xa| = k, it follows that Suff^{k−1}(s) = Suff^{k−1}(t). It also follows that additional states merged to remove non-determinism resulting from the merge of s and t will have the same suffix of length (k − 1).

To show that all states with the same (k − 1)-length suffix will be merged, it is shown that the ISLFLA will never encounter the irreparable situation on the left in Figure 9, provided the data set includes a seed defined as follows.

Definition 8 (ISLFLA seed). Given a k-ISL function f and T^{ISL}_f, let Q′ = {q ∈ Q | δ1(q, #) ≠ λ}. A seed for f is S = S′ ∪ S″, where

• S′ = {(w, w′) | w ∈ Σ^{≤k} ∧ f(w) = w′}
• S″ = {(wa, w″) | a ∈ Σ, w ∈ Σ^k ∧ Suff^{k−1}(w) ∈ Q′ ∧ f(wa) = w″}.
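For concreteness, a seed per Definition 8 can be generated as in the following sketch (ours, not part of the proof), assuming oracle access to the target function f as a Python callable that returns None outside dom(f), and taking the set Q′ as an explicit argument:

    from itertools import product

    def strings(sigma, n):
        """All strings over sigma of length exactly n."""
        return (''.join(p) for p in product(sigma, repeat=n))

    def build_seed(f, sigma, k, q_prime):
        """Definition 8: S' pairs each w in Sigma^{<=k} with f(w); S'' extends
        each w in Sigma^k whose (k-1)-suffix lies in Q' by one more symbol."""
        suff = lambda w, n: w if len(w) < n else w[len(w) - n:]
        s_prime = {(w, f(w)) for n in range(k + 1) for w in strings(sigma, n)
                   if f(w) is not None}
        s_dprime = {(w + a, f(w + a)) for w in strings(sigma, k)
                    if suff(w, k - 1) in q_prime
                    for a in sigma if f(w + a) is not None}
        return s_prime | s_dprime

For final devoicing with k = 2 and the encoding of §4, Q′ = {D} (a pending D is the only state with a non-empty final output), so S″ extends exactly the length-2 strings ending in D.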
This is because the only way for the output strings of the outgoing transitions of q2 to differ from those of q1 is if fewer transitions were present on q2 when the PTT was made onward. (By definition of S we know q1 has transitions for all a ∈ Σ.) But since tailsf (q1) = tailsf (q2), we also know that z = ux for some u ∈ Γ∗. By Definition 8, all states up to length k have tran- sitions for all a ∈ Σ; therefore, |q2| ≥ k + 1. This means ∃q′ ∈ Σk between q1 and q2, which will be merged with some other state before q2 will. This merge will cause non-determinism, which in turn will trigger pushback and cause u to move further down the branch toward q2. By extension there will be |q2|−k states between q1 and q2, each of which will be merged, triggering pushback of u, so that by the time q1 and q2 are merged, δ1(q2, #) = ux = z = δ1(q2, #). Thus, all non-determinism can be removed, and so T is subsequential. It remains to show that ∀q ∈ QT ,a ∈ Σ, δ2(q,a) = Suff k−1(qa). Since state merging pre- serves transitions, this follows from the construction of PTT(D). By Theorem 3, T = T ISLf . Next we establish complexity bounds on the run- time of the ISLFLA and the size of the characteristic sample for ISL functions. We observe that both of these bounds improve the bounds of OSTIA. While not surprising, since ISL functions are less general than subsequential functions, the result is important since it is an example of greater a priori knowledge enabling learning with less time and data. Let m be the length of the longest output string in the sample and let n denote the number of states of the PTT; n is at most the sum of the lengths of the input strings of the pairs in the sample. Lemma 3 (Polynomial time). The time complexity of the ISLFLA is in O(n ·m ·k · |Σ|). Proof. First, making the PTT onward can be done in O(m · n): it consists of a depth-first parsing of the PTT from its root, with a computation at each state of the lcp of the outgoing transition outputs after the recursive computation of the function (see de la Higuera (2010), Chap. 18, for details). As the com- putation of the lcp takes at most m steps, and as it has to be done for each state, the complexity of this step is effectively in O(m ·n). For the two loops, we need to find a bound on the number of merges that can occur. States q such that |q| < k do not yield any merges in the outer loop. All other states q′ are merged with Suffk−1(q), in the outer loop or in the inner one. The number of merges is thus bounded by n. Computing the suffix of length (k − 1) of any word can be done in O(k) with a correct implementation of strings of charac- ters. The test of the inner loop can be done in con- stant time and so can the merge and pushback pro- cedures. After each merge, the test of the inner loop needs to be done at most |Σ| times. As computing the lcp has a complexity in O(m), the overall com- plexity of the two loops is in O(n ·m ·k · |Σ|). The overall complexity of the algorithm is thus O(m ·n + n ·m ·k · |Σ|) = O(n ·m ·k · |Σ|). Let T ISLf = (Q,q0, Σ, Γ,δ,σ) be a target trans- ducer. We define m = max{|u| | (q,a,u,q′) ∈ δ}, n = |Q|, and p = max {|v| | (q, #,v) ∈ σ}. Lemma 4 (Polynomial data). The size of the charac- teristic sample is in O(n·|Σ|·k·m+n2·m+n·|Σ|·p). Proof. The first item of the seed, S′, covers all and only the states of the target: the left projection of these pairs is thus linear in n and every right projec- tion is at most n · m + p. Thus the size of S′ is at most n · (n + n ·m + p) = O(n2 ·m + n ·p). 
Concerning the second part, S″, its cardinality is at most n·|Σ| (in the rare case where Q′ = Q). Each element of the left projection of S″ is of length (k + 1) and each element of its right projection is at most of length (k + 1)·m + p. The size of S″ is thus in O(n·|Σ|·(k·m + p)).

Therefore, the size of the characteristic sample is in O(n·|Σ|·k·m + n²·m + n·|Σ|·p), which is clearly polynomial in the size of the target transducer.

Theorem 8. The ISLFLA identifies the class of ISL functions in polynomial time and data.

Proof. Immediate from Lemmas 2, 3, and 4.

5.3 Demonstrations

We tested the ISLFLA with the three examples in §4, as well as the English flapping process (t → R / V́ __ V). For each case, a data set was constructed according to Definition 8 using the alphabets presented in §4. The alphabet for English flapping was {V́, V, t, ?}. The value of k is 2 for final devoicing, @-epenthesis, and fricative deletion and 3 for English flapping. A few additional data points of length 5 or 6 were also added to make the data set a superset of the seed. In all four cases, the learner returned the correct T^{ISL}_f.

The decision to use artificial corpora in these demonstrations was motivated by the fact that the sample in Definition 8 will not be present in a natural language corpus. That sample includes all possible sequences of symbols from the alphabet of a given length, whereas a natural language corpus will reflect the language-particular restrictions against certain segment sequences (i.e., phonotactics).

As discussed in the introduction, Gildea and Jurafsky (1996) address this issue with natural language data by equipping OSTIA with a community bias, whereby segments belonging to a natural class (i.e., stops, fricatives, sonorants) are expected to behave similarly, and a faithfulness bias, whereby segments are assumed to be realized similarly on the surface. In our demonstrations we put aside the issue of the behavior of segments in a natural class by using abbreviated alphabets (e.g., T for all voiceless stops). But if in fact knowledge of natural classes precedes the learning of phonological processes, the use of such an alphabet is appropriate.

In future developments of the ISLFLA we likewise aim to accommodate natural language data, but in a way that maintains the theoretical result of identification in the limit. The restrictions on segment sequences represented in natural language data amount to 'missing' transitions in the initial prefix tree transducer that is built from that data. In other words, the transducer represents a partial, not a total, function. Thus it seems the approach of Oncina and Varó (1996) and Castellanos et al. (1998) could be very instructive, as their use of domain information enabled OSTIA to learn partial functions. In our case, the fact that the domain of an ISL function is an SL language could provide a means of 'filling in' the missing transitions. The details of such an approach are, however, being left for future work.

6 Conclusion

This paper has defined Input and Output Strictly Local functions, which synthesize the properties of subsequential transduction and Strictly Local formal languages. It has provided language-theoretic characterizations of these functions and argued that they can model many phonological and morphological processes. Lastly, an automata-theoretic characterization of ISL functions was presented, along with a learning algorithm that efficiently learns this class in the limit from positive data.
Current work includes developing a comparable automata characterization and learning algorithm for OSL functions, as well as defining additional functional classes to model those phonological processes that cannot be modeled with ISL or OSL functions. The SL languages are just one region of a subregular hierarchy of formal languages (McNaughton and Papert, 1971; Rogers and Pullum, 2011; Rogers et al., 2013). The ISL and OSL functions defined here are the first step in developing a corresponding hierarchy of subregular functions. Of immediate interest to phonology are functional counterparts for the Tier-Based Strictly Local and Strictly Piecewise language classes, which have been shown to model long-distance phonotactics (Heinz, 2010; Heinz et al., 2011). Such functions might be useful for modeling the long-distance processes that repair violations of these phonotactic constraints.

Acknowledgements

We thank Adam Jardine, Jim Rogers, and the three anonymous reviewers for helpful comments. This research was supported by NSF CPS#1035577.

References

Dana Angluin. 1982. Inference of reversible languages. Journal of the Association for Computing Machinery, 29(3):741–765.

Kenneth R. Beesley and Lauri Karttunen. 2003. Finite State Morphology. Center for the Study of Language and Information.

William Bennett. 2013. Dissimilation, Consonant Harmony, and Surface Correspondence. Ph.D. thesis, Rutgers University.

Antonio Castellanos, Enrique Vidal, Miguel A. Varó, and José Oncina. 1998. Language understanding and subsequential transducer learning. Computer Speech and Language, 12:193–228.

Jane Chandlee and Jeffrey Heinz. 2012. Bounded copying is subsequential: implications for metathesis and reduplication. In Proceedings of SIGMORPHON 12.

Jane Chandlee and Adam Jardine. 2014. Learning phonological mappings by learning Strictly Local functions. In John Kingston, Claire Moore-Cantwell, Joe Pater, and Robert Staubs, editors, Proceedings of the 2013 Meeting on Phonology. LSA.

Jane Chandlee and Cesar Koirala. 2014. Learning local phonological rules. In Proceedings of the 37th Penn Linguistics Conference.

Jane Chandlee, Angeliki Athanasopoulou, and Jeffrey Heinz. 2012. Evidence for classifying metathesis patterns as subsequential. In The Proceedings of the 29th West Coast Conference on Formal Linguistics, Somerville, MA. Cascadilla.

Jane Chandlee. 2014. Strictly Local Phonological Processes. Ph.D. thesis, University of Delaware.

Noam Chomsky and Morris Halle. 1968. The Sound Pattern of English. Harper & Row, New York.

Colin de la Higuera. 1997. Characteristic sets for polynomial grammatical inference. Machine Learning, 27(2):125–138.

Colin de la Higuera. 2010. Grammatical Inference: Learning Automata and Grammars. Cambridge University Press.

Brian Gainor, Regine Lai, and Jeffrey Heinz. 2012. Computational characterizations of vowel harmony patterns and pathologies. In The Proceedings of the 29th West Coast Conference on Formal Linguistics, pages 63–71, Somerville, MA. Cascadilla.

Daniel Gildea and Daniel Jurafsky. 1996. Learning bias and phonological-rule induction. Computational Linguistics, 22(4):497–530.

E. Mark Gold. 1967. Language identification in the limit. Information and Control, 10:447–474.

Gunnar Hansson. 2001. Theoretical and Typological Issues in Consonant Harmony. Ph.D. thesis, University of California, Berkeley.

Jeffrey Heinz and Regine Lai. 2013. Vowel harmony and subsequentiality.
In András Kornai and Marco Kuhlmann, editors, Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), pages 52–63.

Jeffrey Heinz, Chetan Rawal, and Herbert G. Tanner. 2011. Tier-based Strictly Local constraints for phonology. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 58–64. Association for Computational Linguistics.

Jeffrey Heinz. 2007. The Inductive Learning of Phonotactic Patterns. Ph.D. thesis, University of California, Los Angeles.

Jeffrey Heinz. 2009. On the role of locality in learning stress patterns. Phonology, 26:303–351.

Jeffrey Heinz. 2010. Learning long-distance phonotactics. Linguistic Inquiry, 41:623–661.

Mans Hulden, Iñaki Alegria, Izaskun Etxeberria, and Montse Maritxalar. 2011. Learning word-level dialectal variation as phonological replacement rules using a limited parallel corpus. In Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties, DIALECTS '11, pages 39–48. Association for Computational Linguistics.

Mans Hulden. 2009. Finite-State Machine Construction Methods and Algorithms for Phonology and Morphology. Ph.D. thesis, University of Arizona.

Adam Jardine. 2013. Tone is (computationally) different. Unpublished manuscript, University of Delaware.

C. Douglas Johnson. 1972. Formal Aspects of Phonological Description. The Hague, Mouton.

Brian D. Joseph and Irene Philippaki-Warburton. 1987. Modern Greek. Croom Helm, Wolfeboro, NH.

Ronald M. Kaplan and Martin Kay. 1994. Regular models of phonological rule systems. Computational Linguistics, 20:371–387.

Huan Luo. 2013. Long-distance consonant harmony and subsequentiality. Unpublished manuscript, University of Delaware.

Robert McNaughton and Seymour A. Papert. 1971. Counter-Free Automata. MIT Press.

Mehryar Mohri. 1997. Finite-state transducers in language and speech processing. Computational Linguistics, 23:269–311.

José Oncina and Pedro García. 1991. Inductive learning of subsequential functions. Technical Report DSIC II-34, Universidad Politécnica de Valencia.

José Oncina and Miguel A. Varó. 1996. Using domain information during the learning of a subsequential transducer. Lecture Notes in Computer Science - Lecture Notes in Artificial Intelligence, pages 313–325.

José Oncina, Pedro García, and Enrique Vidal. 1993. Learning subsequential transducers for pattern recognition interpretation tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(5):448–457.

Amanda Payne. 2013. Dissimilation as a subsequential process. Unpublished manuscript, University of Delaware.

Alan Prince and Paul Smolensky. 1993. Optimality Theory: Constraint interaction in generative grammar. Technical Report 2, Rutgers University Center for Cognitive Science.

Jason Riggle. 2003. Non-local reduplication. In Proceedings of the 34th Annual Meeting of the North Eastern Linguistic Society.

James Rogers and Geoffrey K. Pullum. 2011. Aural pattern recognition experiments and the subregular hierarchy. Journal of Logic, Language and Information, 20:329–342.

James Rogers, Jeffrey Heinz, Margaret Fero, Jeremy Hurst, Dakotah Lambert, and Sean Wibel. 2013. Cognitive and sub-regular complexity. In Glyn Morrill and Mark-Jan Nederhof, editors, Formal Grammar, Lecture Notes in Computer Science, volume 8036, pages 90–108. Springer.

Sharon Rose and Rachel Walker. 2004. A typology of consonant agreement as correspondence. Language, 80:475–531.

Keiichiro Suzuki. 1998.
A Typological Investigation of Dissimilation. Ph.D. thesis, University of Arizona.

Natasha Warner, Allard Jongman, Anne Cutler, and Doris Mücke. 2001. The phonological status of Dutch epenthetic schwa. Phonology, 18:387–420.