key: cord-0060364-nw6zz53c
authors: Klin, Bartek; Lasota, Sławomir; Toruńczyk, Szymon
title: Nondeterministic and co-Nondeterministic Implies Deterministic, for Data Languages
date: 2021-03-23
journal: Foundations of Software Science and Computation Structures
DOI: 10.1007/978-3-030-71995-1_19
sha: a7dff2cda495ae7dd05a16e2c7aca9edc94bcb5f
doc_id: 60364
cord_uid: nw6zz53c

We prove that if a data language and its complement are both recognized by nondeterministic register automata (without guessing), then they are also recognized by deterministic ones.

Register automata are finite-state automata equipped with a finite number of registers that can store values from an infinite data domain. When processing an input string, an automaton compares the current input data value to its registers and, based on this comparison and on the current control state, it chooses its next control state and possibly stores the input value in one of its registers. In the original model, introduced over 25 year ago by Francez and Kaminski [15] , data values can only be compared for equality and not for any other property. Subsequent extensions of the model allow for comparing data values with respect to some fixed relations such as a total order, or introduce alternation, variations on the allowed form of nondeterminism, etc.

It appears that register automata lack most of the good properties known from the classical theory of finite automata. For example, while languages of nondeterministic register automata are closed under unions and intersections, they are not closed under complement, and they do not determinize. Moreover, the expressivity of register automata is very sensitive to natural variants and extensions. Any of the following relaxations of the model leads to a strict increase of expressive power (see [15, 23, 1] for details):

increasing the number of registers (when this number is bounded), extension from one-way to two-way automata, extension from deterministic to unambiguous, nondeterministic or alternating ones,

adding the capability to nondeterministically guess data values.

In fact, almost every combination of these extensions leads to a different class of recognized languages. Furthermore, no satisfactory characterizations of languages of register automata in terms of regular expressions [17, 20] or logic [23, 12] are known. There are a few positive results: a simulation of two-way nondeterministic automata by one-way alternating automata with guessing [1] , a Myhill-Nerode characterization of languages of deterministic automata [16, 4, 5] , and the well-behaved class of languages definable by orbit-finite monoids [2] , which admits equivalent characterisations in terms of logic [11] and a syntactic subclass of deterministic automata [7] . Nevertheless, register automata satisfy almost no semantic equivalences that hold for classical finite automata.

Contribution. Our primary contribution is a collapse result: if a language and its complement are both recognized by nondeterministic register automata (NRA), then they are both recognized by deterministic ones (DRA). In symbols, we prove the following equality of language classes:

This result is shown under the assumption that the data values can be compared only for equality, and it turns out to be quite fragile. For instance, it fails if the automata can compare data values using a total order relation. It also fails if NRA are additionally equipped with the capability of guessing fresh data values, even when data values can only be compared for equality. Our secondary contribution is a collapse result for NRA with 1 register only (1-NRA), but over an arbitrary data domain that admits well quasi-order (wqo), meaning roughly that finite induced substructures of the data domain, ordered by embeddings, form a wqo. This includes both equality and ordered data domains. In short, we prove the following inclusion of language classes: 1-NRA ∩ co-1-NRA ⊆ DRA.

The inclusion is strict, as some DRA languages are not recognizable by 1-NRA.

Our proofs are mostly self-contained, but use basic notions and results about sets with atoms [1] , also known as nominal sets [24] . In particular, automorphisms of the data domain play a central role in our arguments, and we extensively use notions such as finite support and orbit-finiteness of sets. In both results, we prove that for every data language L ∈ NRA ∩ co-NRA the set of derivative languages w −1 L is orbit-finite, i.e., finite up to automorphism of data values. The collapse then follows from an orbit-finite version of the Myhill-Nerode theorem.

In our primary contribution, orbit-finiteness of the set of derivative languages is a consequence of a key technical result (Lem. 1), an abstract observation about orbit-finite families of sets, which we believe may be of independent interest. As another example application of this lemma, we give a new proof of decidability of universality for unambiguous register automata (URA).

Relation to other work. Our primary result partially confirms a conjecture of Thomas Colcombet [10] , according to which every two disjoint languages of NRA with guessing are separable by a language recognized by an URA. Working in the special case when the NRA are complementing and have no guessing, we show more: both languages are then recognized not only by an URA but by a DRA.

NRA do not have good algorithmic properties: while the emptiness problem is PSpace-complete [14] , the universality problem (does a given automaton accept all data words?) is undecidable [15] (it is decidable only for 1-NRA [14] ). Universality becomes decidable for URA, as shown recently in [22] (2-ExpSpace upper bound, improved to 2-ExpTime upper bound in [8] ), and language containment and equality for URA reduce polynomially to universality (see [8, Lemma 8] ). As mentioned above, our results allow us to re-prove this decidability result.

Register automata have been intensively investigated, with respect both to their foundational properties [15, 25, 17, 23] and to their applications to XML databases and logics [14] (see [26] for a survey). There are several other ways to extend finite-state machines with a capability to recognize languages over infinite alphabets. These include, apart from register automata: their abstract version -nominal automata or automata over atoms [4, 5, 1] ; symbolic automata [13] ; pebble automata [21] ; and data automata [3, 6] .

The model of register automata, as considered in this paper, is parametrized by an underlying relational structure Atoms over a finite vocabulary Σ. This structure constitutes a data domain; its elements are called atoms. A register automaton processes sequences of atoms, possibly coupled with labels from a fixed finite set. It may store atoms read from the input in its registers, and compare them with previously stored atoms using relations in Σ (equality included).

Here are some example data domains:

-Equality atoms: natural numbers with equality (N, =). Since equality is the only available relation, any other countably infinite set could be used instead. -Dense order atoms: rational numbers with the standard order (Q, ). Again, any countably infinite dense order without endpoints could be used instead. is the equality on the first coordinate: (n 1 , n 2 ) = 1 (m 1 , m 2 ) if n 1 = m 1 .

In the following we consider input alphabets of the form S × Atoms, where S is a finite set of labels. A data word is a finite sequence w ∈ (S × Atoms) * , and a data language is a set of data words.

A nondeterministic register automaton (NRA) A consists of:

an input alphabet of the form S × Atoms, for some finite set S, a positive integer r ∈ N (the number of registers), a finite set of control states (locations) Q, subsets I, F ⊆ Q of initial resp. accepting states, a finite set ∆ of transition rules of the form

where p, q ∈ Q, s ∈ S, ϕ(x 1 , . . . , x r , x) is a quantifier-free Σ-formula with free variables in {x 1 , . . . , x r , x}, and st ∈ {1, . . . , r, none}.

Intuitively, ϕ defines a condition which needs to be satisfied by the register contents (x 1 , . . . , x r ) and by the current atom (x) for a transition to happen, and st specifies the register in which the input atom is stored after the transition, st = none meaning that it is not to be stored in any register. An NRA A is deterministic (DRA) if it has exactly one initial state and if for every two transition rules (p, s, ϕ 1 , st 1 , q 1 ), (p, s, ϕ 2 , st 2 , q 2 ) ∈ ∆, such that ϕ 1 ∧ ϕ 2 is satisfiable in Atoms, we have st 1 = st 2 and q 1 = q 2 . We write r-NRA, resp. r-DRA, when the number of registers r is fixed.

A configuration q(a) ∈ Q × (Atoms ∪ {⊥}) r of A consists of a control state q ∈ Q and a content of registers a ∈ (Atoms ∪ {⊥}) r , where ⊥ means that the content of a register is undefined (i.e., the register is empty). A rule (1) induces a transition p(a) A run of A on a data word w = (s 1 , a 1 ) · · · (s n , a n ) is a sequence

where q 0 is an initial state and a 0 is a tuple where the content of all registers is undefined. We then say that the configuration q n (a n ) is reachable along w. The finite set of all configurations reachable along w is finite, and it is denoted A(w). A run is accepting if it ends in a configuration with an accepting state. A data word w is accepted by A if there is an accepting run of A on w. A NRA is unambiguous (URA) if every word has at most one accepting run.

The language of A, denoted L(A), is the set of all data words accepted by A.

In all our examples, the finite component S of data alphabets will be a singleton set. We will therefore omit S when describing automata, so (1) will simplify to (p, ϕ, st, q) ∈ ∆.

Graphically, a transition rule like this will be presented as Furthermore, / / p means that p is initial and q means that q is accepting.

Example 1. For the equality atoms, consider the language L ⊆ Atoms * of those words where the first letter appears at some later position:

This language is recognized by a DRA with one register and three control states:

This automaton stores the first letter in its only register and then remains in the (non-accepting) state q until the letter is encountered again; then it moves to the accepting state r and stays there.

Example 2. Still for the equality atoms, consider the reverse of the language from Example 1, i.e., the language of those words where the last letter appears at some earlier position. This language is not recognized by any DRA, but it is recognized by a NRA with one register and three control states:

This automaton nondeterministically decides to store a letter in its register and then checks that the last letter is equal to the stored one.

Example 3. Still for the equality atoms, consider the complement of the language from Example 2, i.e., the language L of those words where the last letter does not appear at any earlier position. (In particular, we consider the empty word and all length-one words to be in this language.) The language L is not recognized by any NRA. However, it becomes recognizable if automata are additionally equipped with the ability of guessing, that is, of updating the contents of their registers with arbitrary atoms, possibly different from the one that comes with the current input letter. Unlike NRA without guessing, those with guessing are closed under reversal [18, Def. 3 and Corollary 31] , and the reversal of the language L is even recognized by a DRA.

Example 4. Automata from Ex. 1-3 work just as well over the dense order domain: the formulas in their transition rules simply do not use the order relation. However, over densely ordered atoms something more happens: the language from Ex. 3 is recognizable by a NRA without guessing.

The automaton has two registers. The idea is that, at any moment in an accepting run where these registers store atoms a 1 < a 2 :

(a) in the part of the word read so far, no letter is in the open interval (a 1 , a 2 ), (b) the last letter of the word will belong to that open interval.

Condition (a) can be ensured easily: upon reading a letter a that belongs to the open interval (a 1 , a 2 ), the automaton will (enter an accepting state for the moment and) put a in one of the two registers. The register is chosen nondeterministically so that condition (b) remains true. If the currently input letter is not in the interval (a 1 , a 2 ), the automaton enters a rejecting state for the moment, with the registers kept unchanged.

Special treatment is needed to deal with situations where the last letter of the word will be larger than (or smaller than) all the letters encountered so far. These are taken care of by introducing special control states where one of the two registers remains undefined.

Over equality atoms, consider the language L k of all words w of length at least k whose kth last letter is equal to the last letter. Then L k is recognised by a NRA with one register and k + 1 states, depicted below:

The complement of L k is also recognised by an NRA, similar to the one above, but with x = x 1 in place of x = x 1 in the last transition, and with an additional component for accepting words of length smaller than k. The language L k is also recognised by a DRA with k registers, where register number i stores the letter which appeared on the latest seen position with index congruent to i, mod k. It has k states, for counting the index of the current position, mod k.

Our primary contribution is:

Over equality atoms, if a data language and its complement are both recognizable by nondeterministic register automata, then they are both recognizable by deterministic register automata.

Note that this result fails if automata with guessing are considered (see Ex. 3). Indeed, the language from Ex. 2 is recognized by a 1-NRA, and its complement in Ex. 3 is recognized by a 1-NRA with guessing, but they are not deterministically recognizable.

Moreover, the result fails (even without guessing) for densely ordered atoms. The counterexample is the same: the language from Ex. 2 is recognized by a 1-NRA, and its complement is recognized by a 2-NRA over densely ordered atoms as explained in Ex. 4, but they are not deterministically recognizable. Here the use of two registers in NRA is necessary, due to our secondary contribution: for a wide range of data domains, if a data language and its complement are both recognized by 1-NRA, then they are recognized by DRA.

We prove this for any data domain Atoms which admits wqo in the following sense. A well quasi-order (wqo) is a quasi-order (Z, ) such that for every infinite sequence z 1 , z 2 , . . . ∈ Z there are 1 i < j with z i z j . For a finite set X, an X-labeled substructure of Atoms is a set B ⊆ Atoms together with a labelling B : B → X. For two X-labeled substructures B and C of Atoms, we say that B embeds into C (written B C) if some automorphism π of Atoms, restricted to B, yields a label-preserving injection from B to C, so that B = C • π B . Let age X (Atoms) be the set of all finite labeled substructures of Atoms, partially ordered by . We say that Atoms admits wqo if for every finite set X, the quasi-order (age X (Atoms), ) is a wqo. All data domains listed in Section 2 admit wqo [19] . They are also oligomorphic (see Sec. 5 below).

Theorem 2. Over any oligomorphic atoms that admit wqo, if a data language and its complement are both recognizable by nondeterministic register automataa with one register, then they are recognizable by deterministic register automata. The rest of the paper consists of the proofs of Thms. 1 and 2, in Sec. 6 and 8, respectively, preceded by Sec. 5 that recalls basic definitions of the setting of sets with atoms which are used in the proofs. Our main technical lemma is proved in Sec. 6. Besides proving Thm. 1, in Sec. 7 we explain how it implies decidability of universality for unambiguous register automata.

Our proofs rely on some basic notions and results of the theory of sets with atoms [1] , also known as nominal sets [24] . In this section we recall what is necessary to follow our arguments; this is part of a uniform abstract approach to register automata developed in [4, 5, 1] .

Let Aut(Atoms) denote the group of all automorphisms of a relational structure Atoms. (For the equality atoms (N, =) this means the group of all bijections; for the densely ordered atoms (Q, ), the group of monotone bijections.) We consider sets equipped with an action of this group, typically, Atoms n for some n 0 or Atoms * with the componentwise action.

Group actions. A (left) action of a group G on a set X is a mapping · : G × X → X such that 1 · x = x and σπ · x = σ · (π · x) for all σ, π ∈ G and x ∈ X. We then say that G acts on X, or that X is a G-set. For x ∈ X, we call the set {π · x | π ∈ G} the orbit of x; or an orbit in X. The orbits in X partition X into disjoint sets. We call X orbit-finite if it has finitely many orbits.

Group actions canonically extend along familiar set-theoretic constructions: if X and Y are G-sets then the cartesian product X ×Y , the disjoint union X Y , the set of sequences X * , the powerset P(X) etc. are all G-sets, in the expected way. For example, G acts componentwise on X × Y via π · (x, y) = (π · x, π · y).

A structure Atoms is oligomorphic if for every n ∈ N, the componentwise action of Aut(Atoms) on Atoms n induces finitely many orbits. All structures considered in this paper are oligomorphic; an example of a nonoligomorphic structure is the total order of integers.

Supports. Let Aut(Atoms) act on a set X and let x ∈ X. A support of x is any set S ⊆ Atoms such that the following implication holds for all π ∈ Aut(Atoms):

if π(s) = s for all s ∈ S then π · x = x.

An element x ∈ X is finitely supported if it has some finite support.

For many structures Atoms, finite supports of a fixed element are always closed under intersections. Then every finitely supported x has the least support, denoted sup(x). This happens in particular for the equality atoms (as proved in [24, Prop. 2.3] or in [5, Cor. 9.4] ) and for the dense order atoms (as proved in [5, Prop. 9.5] ). It is easy to prove that taking least supports commutes with group actions: π · sup(x) = sup(π · x) for every x ∈ X and π ∈ Aut(Atoms).

Equivariance. An element (or a subset, relation, function. . . ) of an Aut(Atoms)set is called equivariant if it is supported by the empty set; equivalently, it is fixed by every automorphism of Atoms. For example:

a subset Z of an Aut(Atoms)-set X is equivariant if and only if it is a union of orbits in X (indeed, it is then equivariant as an element of P(X));

for all x ∈ X, y ∈ Y and π ∈ Aut(Atoms). An equivariant function is a function whose graph is an equivariant relation.

Standard set-theoretic relations such as set membership, or set containment, are

If ∼ is an equivariant equivalence relation on X then Aut(Atoms) acts on the set X/ ∼ , by π · C = {π · x | x ∈ C} for each ∼-equivalence class C ⊆ X.

Register automata. Fix a structure Atoms and let R be an NRA with input alphabet S × Atoms, control states Q, and with r registers. The group Aut(Atoms) acts on all the components of R:

on the input alphabet A := S × Atoms, via π · (s, a) = (s, π(a)); on the set C := Q × (Atoms {⊥}) r of all configurations of R, via π · q(a 1 , . . . , a r ) = q(π(a 1 ), . . . , π(a r )) (where π(⊥) = ⊥); −→ π · q(a ).

Furthermore, each of these components is orbit-finite, and each of its elements has a finite support. Using the terminology of [5] , this means that register automata are a special case of orbit-finite automata.

By equivariance of all the components above, the language L(R) of a register automaton is an equivariant subset of A * = (S × Atoms) * , considered with the componentwise action of Aut(Atoms) on A * , i.e. π · ((s 1 , a 1 ) , . . . , (s n , a n )) = ((s 1 , π · a 1 ), . . . , (s n , π · a n )).

Myhill-Nerode theorem. In order to prove that a language is deterministically recognizable, we use the following Myhill-Nerode characterization.

For an alphabet A = S × Atoms and data language L ⊆ A * , consider its Myhill-Nerode equivalence ∼ L ⊆ A * × A * , defined by Among other things, this theorem immediately implies that the language from Ex. 2 is not deterministically recognizable, neither for the equality atoms nor for the total order atoms. Indeed, two words are Myhill-Nerode equivalent with respect to that language if and only if they contain the same set of letters. Therefore, the language cannot be deterministically recognizable, since automorphisms of Atoms preserve the number of distinct letters in a word.

In the proof, we will make use of an abstract notion of a split of a family of sets. For any family F of subsets of a set X, a split of F is a pair (U, V ) of sets which partition X: X = U V , such that both U and V are finite unions of elements of F. Obviously, for any splits to exist, X = F must hold.

In the following lemma, Atoms is the equality atoms.

Lemma 1. For any Aut(Atoms)-set X with finitely supported elements, and any equivariant, orbit-finite family F of finitely supported subsets of X, the set G of splits of F is orbit-finite. Moreover, a bound on the number of orbits of G and the maximal size of the support of an element in G are computable from the analogous bounds for F.

As should be clear after reading Sec. 5, the set of splits of F is considered with the natural action of Aut(Atoms): π · (U, V ) = (π · U, π · V ), where π · W = {π · x | x ∈ W } for W ⊆ X.

We will prove Lem. 1 in Sec. 6.2. For now, let us show how the lemma implies Thm. 1.

Let A and B be two NRA over an alphabet A = S × Atoms such that L(A) and L(B) partition A * . We will show that the Myhill-Nerode equivalence of L = L(A) has orbit-finitely many classes. Together with Thm. 3, this will prove that L is deterministically recognizable.

Let C be the set of configurations of A B (the disjoint union of A and B.) Hence, C consists of tuples of the form q(a) where q is either a state of A or a state of B (but not both), and a is a tuple of elements of Atoms {⊥} of appropriate length. For c ∈ C denote L c := {w ∈ A * | A B accepts w from configuration c} , and let F = {L c | c ∈ C}. Since C is equivariant and orbit-finite, so is F. Moreover, if c = q(a) then L c is finitely supported by the atoms in a. Clearly, every word (s 1 , a 1 ) · · · (s n , a n ) ∈ A * is supported by {a 1 , . . . , a n }. This means that F and X = A * satisfy the assumptions of Lem. 1, therefore F has only orbit-finitely many splits.

Every word v ∈ A * induces a partition of A * into two disjoint sets:

Moreover, the sets U v and V v are finite unions of sets from F, namely

These unions are finite because automata A and B allow no guessing and so A(v) and B(v), the sets of configurations reachable in A resp. B by reading the word v, are finite. Therefore, (U v , V v ) is a split of F, for any word v. By definition, u ∼ L v if and only if U u = U v . Consider any two words v, w ∈ A * such that the splits (U v , V v ) and (U w , V w ) are in the same orbit, i.e., U w = π · U v (and therefore also V w = π · V v ) for some automorphism π. Since L is an equivariant language, we have π · U v = U π·v and so w ∼ L π · v. Theorem 1 now follows from Thm. 3.

Before proving Lem. 1, we give some examples of families of splits, which may be helpful in developing some intuitions.

The first example shows that the number of orbits of splits may grow as fast as double-exponentially, relative to the least supports of elements of F. Example 6. For the equality atoms, fix k 1 and let X be the set of all k-tuples of pairwise distinct atoms. For each S ⊆ Atoms with |S| = k, let S (k) = S k ∩ X and let M S = X \ S (k) . Note that S (k) is finite, with k! elements.

The family F ⊆ P(X) of all singletons in X and all sets M S as above is equivariant and has two orbits. Each set in F has a support of size k.

For any K ⊆ S (k) , consider the partition of X into K and X \ K. Then

Moreover, every split (U, V ) of F is of the form (K, X \ K) or (X \ K, K) for some S and K as above. Indeed, suppose U = U and V = V for some finite U, V ⊆ F. As U ∪ V = X is infinite, U ∪ V must contain M S for some set S of k atoms. Suppose without loss of generality that M S ∈ U. By disjointness of U and V , the set V ⊆ F may only contain singletons {v}, for v ∈ S (k) . Then (U, V ) = (X \ K, K), where K = V.

For K, K ⊆ S (k) , the splits defined by K and K are in the same orbit only if there is an automorphism π that fixes S as a set, such that π · K = K . Since there are only k! bijections on S, the set of splits of F has at least 2 k! k! orbits.

The next example shows the difference between splits and the finite subfamilies of F that define those splits: the set of those families may be orbit-infinite.

Example 7. Let X be the set of all finite sets of equality atoms. For any distinct atoms a, b, define E a,b , D a,b ⊆ X by:

And let F contain all sets E a,b and D a,b . This F has two orbits. Obviously, (U, V ) = (X, ∅) is a split of F; it is enough to take U = {D a,b , E a,b } and V = ∅ for any fixed a, b. However, there are many more minimal families U and V that achieve the same effect. Indeed, for any number n, and for any pairwise distinct atoms a 1 , . . . , a n , consider:

It is easy to check that U = X. All such families are minimal (in fact, removing any element from U would prevent it from being the part of any split of F), and for each n these families form a separate orbit.

The following example shows that the statement of Lem. 1 fails if the atoms are (Q, ). It is obtained from Ex. 4 via the translation given in the proof of Thm. 1, and a simplification replacing each word by its last letter. Then F has five orbits (here ±∞ are fixed under the action of Aut(Atoms)). For any finite set K ⊆ X, consider the partition of X into K and X \ K. Then K = q∈K {q} whereas X \ K is the union of all intervals (p, q), where p < q are consecutive elements in K ∪ {−∞, +∞}. Hence, (K, X \ K) is a split of F. In particular, the set of all splits of F has infinitely many orbits, because the set of finite subsets of X has infinitely many orbits.

We prove by induction a stronger statement, where the atoms are assumed to be an expansion of (N, =) by finitely many constants. In other words, in this section we will assume that Atoms is a structure over a vocabulary that consists of (equality and) a finite number of constant symbols; the universe of Atoms is N, with the constants interpreted as some pairwise distinct numbers. The group Aut(Atoms) then consists of all bijections of Atoms which fix every constant.

If Atoms is such a structure and T is a finite set of atoms all different from the constants, then by Atoms T we denote the structure, over an extended vocabulary, that arises from Atoms by interpreting all the atoms in T as additional constants. Obviously, Aut(Atoms T ) is a subgroup of Aut(Atoms), so every action of Aut(Atoms) on a set X restricts to an action of Aut(Atoms T ). This restriction preserves and reflects the existence of finite supports: an element x ∈ X is supported by some S in the action of Aut(Atoms) if and only if it is supported by S \ T in the restricted action of Aut(Atoms T ). In particular, if Atoms is an expansion of (N, =) by finitely many constants, then every finitely supported element x has a least support sup(x). Note that sup(x) never contains any constants, since those can always be safely removed from any support.

For a subset U of an orbit-finite equivariant set F, its dimension dim(U) is the maximum size of the least support of an element of U. This makes sense even if U is infinite, because F is orbit-finite and sets from the same orbit have least supports of the same size. In particular, dim(F) is well defined.

The following lemma says that adding constants to atoms preserves orbitfiniteness. It is a standard result in the theory of sets with atoms, see e.g. [1, Lem. 3.19] or [24, Lem. 5.22] , indeed it is a fundamental property of oligomorphic structures, but we re-prove it here to extract explicit bounds: Lemma 2. Fix a finite set T ⊆ Atoms. For any orbit-finite Aut(Atoms)-set F with l orbits, the corresponding action of Aut(Atoms T ) on F is also orbit-finite, with at most l · (|T | + 1) dim(F) orbits.

Proof. Assume first that F has only one orbit in the Aut(Atoms)-action, i.e., that l = 1. Let d = dim(F). Let Y denote the set of d-tuples of pairwise distinct atoms different from the constants in Atoms. This is a single-orbit set under the componentwise action of Aut(Atoms). Pick any x 0 ∈ F. Let y 0 = (a 1 , . . . , a d ) ∈ Y be an enumeration of sup(x 0 ). There is a unique equivariant surjection f : Y → X such that f (π · y 0 ) = π · x 0 for all π ∈ Aut(Atoms). (The function f is total since Y has one orbit; it is well defined because y 0 enumerates a support of x 0 , and it is surjective since X has one orbit.) Two tuples in Y are in the same orbit in the action of Aut(Atoms T ) if and only if they contain the same arrangement of atoms from T at the same positions. There are at most (|T | + 1) d such arrangements, (in fact fewer than this if d > 1, because tuples in Y are pairwise distinct), so Y has at most (|T | + 1) d such orbits. X is an image of the equivariant function f : Y → X, so the same bound applies to X. For a set F with l orbits, each of dimension at most d, the bound simply multiplies by l.

From now on consider Atoms as described above, and let X and F be as in the statement of Lem. 1. The following key lemma says that every split of F has a support of a bounded size. Lemma 3. Let U V be a split of F and let U, V be finite subfamilies of F such that U = U and V = V . Then U and V each have a support of size at most N , for some bound N computable only from dim(U), dim(V), dim(F) and the number of orbits in F.

The crux of this lemma is that the number N does not depend on the split U V . It only depends on the number of orbits in F, its dimension dim(F), and on dim(U) and dim(V) (which, anyway, are bounded from above by dim(F)).

Proof (of Lem. 3). We proceed by induction on k = dim(U) + dim(V). Fix k 0 and assume that the statement of the lemma holds for all smaller values of k. Without loss of generality, we may assume that ∅ does not belong to U nor V (as it can be safely removed from each of them).

For a finitely supported set F ⊆ X define F := {π · y | π ∈ Aut(Atoms), y ∈ F, sup(y) ∩ sup(F ) = ∅} .

Intuitively, F arises by taking all elements of F that are "fresh for F ", i.e., ones whose supports share no atoms with the support of F , and then by applying arbitrary atom automorphisms to those elements. Note that that F is equivariant and F = (π · F ) for any automorphism π.

Proof. Take any x ∈ X. Let S = F ∈U∪V sup(F ). Since U and V are finite, S is a finite set. Pick an automorphism π such that its inverse π −1 maps sup(x) to a set disjoint with S. Consider the element y = π −1 · x ∈ X. Since U ∪ V = X, there must be some F ∈ U ∪ V such that y ∈ F . Then x ∈ F .

Let us first prove the lemma for the special case where X = F for some F ∈ U ∪ V. Suppose that X = F for some F ∈ U (the case F ∈ V is symmetric).

Claim 2 Every y ∈ X with sup(y) ∩ sup(F ) = ∅ belongs to F .

Proof. Take any y as above. As X = F , there is some π and x ∈ F such that y = π · x and sup(x) ∩ sup(F ) = ∅. Pick an automorphism θ such that:

θ agrees with π on sup(x), mapping it bijectively to sup(y), θ fixes sup(F ) pointwise.

Such a θ exists since sup(x) and sup(y) are both disjoint from F . Then θ · x = π ·x = y by the first property above, and θ ·x ∈ θ ·F = F by the second property. Altogether, y ∈ F .

Proof. We show that if sup(G) is disjoint from sup(F ) then G must be empty, contradicting our previous assumption. Suppose x ∈ G. Pick an automorphism π which fixes sup(G) pointwise and maps sup(x) to a set disjoint with sup(F ). Such a π exists because sup(G) and sup(F ) are disjoint. Letting y := π · x, we have y ∈ F by Claim 2, and moreover y = π · x ∈ π · G = G. Then y ∈ F ∩ G ⊆ U ∩ V = ∅, a contradiction. This proves G = ∅, which in turn contradicts the assumption that ∅ ∈ V.

Denote T = sup(F ). If T = ∅ then by Claim 3, V has dimension 0 and therefore V is supported by the empty set. So we may assume that T = ∅. For the same reason we may assume that the family V is not empty.

Let Atoms T be obtained from Atoms by including the elements of T as new constants. Hence, Atoms T extends Atoms by at most r constants, where r := dim(F).

Let l be the number of orbits in F. By Lem. 2, the family F, treated as a family of sets over the atoms Atoms T , is still orbit-finite, with the number of orbits l depending only on l and r. Clearly, U V remains a split of F. Note that if F ∈ F is supported by some set S over Atoms, then F is supported by S, indeed even by S \ T , over Atoms T . In particular, the dimension of F does not increase by moving from Atoms to Atoms T . More interestingly, by Claim 3, the least supports of all the elements in V actually decrease when considering Atoms T as atoms. Since V is not empty, the dimension of V strictly decreases and it follows that dim(U) + dim(V) < k over Atoms T . Applying the inductive assumption yields a set T of size N , depending on k − 1 and l , such that T supports V over Atoms T . By construction, V is supported by T ∪ T over Atoms. Note that |T ∪ T | N := N + r.

This concludes the proof in the special case when X = F for some F ∈ U ∪ V.

In the general case, for each F ∈ U ∪ V define:

Then F F = F and (U F , V F ) is a split of F F which falls into the special case considered above. Hence, U F has some support S F of size at most N .

Then U is supported by S := F ∈U∪V S F . Note that S F only depends on the orbit of F , as F = (π · F ) for any automorphism π. As there are l such orbits contained in F, it follows that S has size at most N := N l. This concludes the inductive step, and the proof of Lem. 3.

Using Lem. 3, we now proceed to prove Lem. 1.

Proof (of Lemma 1). Consider an equivariant set X and an equivariant, orbit finite family F of finitely supported subsets of X. Let ((U i , V i )) i∈I be a family of splits of F. By Lem. 3, each one of these splits is supported by some set of a bounded size. Applying suitable automorphisms to each of these splits, we can obtain a family of splits ((U i , V i )) i∈I such that, for all i ∈ I: -U i and U i are in the same orbit, and each U i is supported by the same set S.

It is now enough to show that there are only finitely many subsets U ⊆ X supported by a fixed set S, which are unions of elements of F.

By Lem. 2 it follows that F has finitely many orbits under the action of the group Aut(Atoms S ) of all automorphisms which fix S pointwise. (Here, as in the statement of Lem. 1, Atoms are the pure equality atoms without any constants.) If a set U ⊆ X supported by S contains some F ∈ F as a subset, then it contains π · F for every π ∈ Aut(Atoms S ). In other words, U contains (the union of) the entire orbit in F under the action of Aut(Atoms S ). Since we assume that U is a union of elements of F, it is a union of (the unions of) orbits in F, and there are only finitely many of these.

This completes the proof of Lem. 1.

Lemma 1 is interesting in its own right and its applications are not limited to the ones mentioned in Sec. 4. We shall now show how it can be used to decide universality (and hence also language containment and equality, cf. [8, Lem. 8] ) of URA over the pure equality atoms Atoms.

Theorem 4. [22, Thm. 14] The language containment and equality problems are decidable for unambiguous register automata.

As an application of Lem. 1, we give an alternative decidability proof for the universality problem of URA. First, we prove a consequence of Lem. 1.

Lemma 4. Let X be an equivariant set over equality atoms, and let F be an equivariant, orbit-finite family of finitely supported subsets of X. There is a bound M , computable from dim(F) and the number of orbits in F, such that every P ⊆ F which is a partition of X has size at most M .

Proof. Let G = {U | (U, V ) is a split of F}. By Lem. 1, G is orbit-finite. Moreover, its elements are finitely supported. Let P ⊆ F be a partition of X into nonempty subsets. For each U ⊆ P, the union U belongs to G; in particular, we have 2 |P| elements of G, each containing different sets in P. The proof is completed by the following counting argument. 1 Let S = F ∈P sup(F ). An S-orbit in G is an orbit in G with respect to the action of those atom permutations which fix S pointwise. Equivalently, it is an orbit in G viewed as a Aut(Atoms S )-set. By Lem. 2, for any finite S ⊆ Atoms, the number of S-orbits in G is bounded by l · (|S| + 1) k , where k and l are computable from dim(F) and the number of orbits of F.

Two splits G, G ∈ G in the same S-orbit contain the same elements of P: if G = π · G then by equivariance of F and G, for each F ∈ P we have F ⊆ G if and only if π · F ⊆ π · G, but π · F = F when π fixes S pointwise. Hence, for any two distinct U, U ⊆ P, their unions U and U belong to different S-orbits in G, so there are least 2 |P| such orbits. As |S| dim(F) · |P|, we get:

It follows that |P| is bounded by some M computable from k, l, and dim(F). Lemma 4 has the following corollary, which is a strong restriction on the structure of universal URA and easily yields Thm. 4.

Call a configuration c of a NRA A nonempty if the NRA accepts some word from this configuration, i.e., the following language is nonempty:

Since NRA emptiness is decidable, it is not difficult to modify any given NRA to one with only nonempty configurations. This transformation preserves URA, so we may safely assume that we only consider URA with this property. Corollary 1. Let A be a URA with nonempty configurations and which accepts every input word. Then there is a computable bound M such that A may reach at most M different configurations when reading any given input word.

Proof. Let A be an URA over an input alphabet A = S × Atoms. Let C be the set of configurations of A and let F := {L c | c ∈ C} . Note that dim(F) is not larger than the number of registers r of A, and the number of orbits in F is not larger than the number of orbits of configurations in A, which in turn is equal to the number of control states in A times the number of orbits in (Atoms {⊥}) r (equal to the r + 1-st Bell number).

For each w ∈ A * , the set A(w) ⊆ C of configurations reachable when reading w is finite, since A has no guessing. Unambiguity of A implies that the family

consists of pairwise disjoint sets. If additionally L(A) = A * , then P w forms a partition of A * , so |P w | M where M is the bound from Lemma 4. As |A(w)| |P w |, this yields the corollary.

Decidability of universality of URA now follows using standard ideas.

Proof (of Thm. 4, sketch). We use the notation of the proof of Cor. 1. The idea is to construct the truncated powerset automaton whose states are sets of at most M states of A.

Let C denote the family of subsets of C of size at most M ; then C is orbitfinite. We define a deterministic automaton A with an infinite, but orbit-finite state space C . Its transitions are X a −→ Y, for X, Y ∈ C such that

The initial state of A is the set C 0 ⊆ C of initial configurations of A (unless |C 0 | > M , but then L(A) = A * by the corollary). Accepting states are all states X ∈ C which contain an accepting configuration of A. All the ingredients of A are equivariant, orbit-finite sets, so A is an orbit-finite deterministic automaton, and can be effectively constructed given A and M . Its language L(A ) is defined as usual. By construction, if L(A) = A * then L(A ) = A * , by Cor. 1.

Hence, A is universal if and only if A is universal. Since A is orbit-finite, universality of A can be effectively decided, using standard techniques for orbitfinite automata [1, 5] : by first complementing and then testing emptiness.

Towards proving Thm. 2, assume A and B are two complementing 1-NRA over an alphabet A = S × Atoms and that Atoms admit wqo.

Recall that configurations of a 1-NRA are either of the form q(a) where q is a control state and a ∈ Atoms is the register value, or of the form q(⊥) when the register value is still undefined. We assume, without losing generality, that both register automata A and B immediately update their register, i.e., every transition rule outgoing from an initial state updates the register.

Let Q and Q denote sets of control states of A and B, respectively, and assume without losing generality that Q and Q are disjoint.

For every nonempty data word w ∈ A + , the set A(w) ∪ B(w) of configurations of A and B reachable along w is finite, since NRA have no guessing, and contains no undefined configurations of the form q(⊥) due to the immediate update assumption. For every w ∈ A + define a finite induced substructure C w of Atoms, labeled with the finite set P = P(Q ∪ Q ), as follows. The elements of C w are the atoms that appear in configurations in A(w) ∪ B(w):

The labeling w : C w → P of C w maps a ∈ C w to the set of all control states which appear in A(w) ∪ B(w) together with a: w (a) = {q ∈ Q | (q, a) ∈ A(w)} ∪ {q ∈ Q | (q, a) ∈ B(w)} .

Let L = L(A). For each v ∈ A * define the partition of A * into:

Recall that u ∼ L v if and only if U u = U v .

Claim. Let u, v ∈ A + . If C u C v then π · u ∼ L v for some automorphism π.

Proof. By definition of , there is some π ∈ Aut(Atoms) which maps C u to a substructure of C v , so that π · C u ⊆ C v and u (a) = v (π(a)) for a ∈ C u .

Let u = π · u. By equivariance of register automata, if A reaches a configuration (q, a) when reading u, then it reaches the configuration (q, π(a)) when reading u = π · u. Hence, C u ⊆ C v and u (a) = u (π(a)) for a ∈ C u . Together with (2) we get u (a) = v (a) for all a ∈ C u . We show that this implies U u = U v , which will yield the claim as u = π · u. Towards proving U u ⊆ U v take any w ∈ U u ; then u w ∈ L. Pick an accepting run of A on u w. Let q(a) be the configuration of A in this run reached after reading the (nonempty) prefix u . In particular, A accepts w starting from the configuration q(a). Moreover, a ∈ C u and q ∈ u (a). As C u ⊆ C v and u (a) = v (a), it follows that A may reach the configuration q(a) after reading v. As w is accepted by A from this configuration, it follows that A accepts vw, so w ∈ U v .

The inclusion V u ⊆ V v is proved by a similar argument, using B instead of A, since L(B) = A * \ L(A) = A * \ L. As U u = A * \ V u and V v = A * \ U v , the inclusion V u ⊆ V v implies U u ⊇ U v . Altogether, U u = U v , so u ∼ L v, yielding the claim.

Theorem 2 now follows easily: assume towards a contradiction that A * / ∼ L is not orbit-finite. Then there is an infinite set X ⊆ A + such that π(u) ∼ L v for all distinct u, v ∈ X and π ∈ Aut(Atoms). As Atoms admits wqo, there are distinct u, v ∈ X such that C u C v . The claim above yields a contradiction.

We have studied a deterministic collapse for NRA: if a language and its complement are both recognized by NRA then they are also recognized by DRA. We have proved this for register automata over equality atoms; and for automata with one register only, over any atoms that admit wqo. We have also applied our key technical observation, namely orbit-finiteness of the set of splits of an orbit-finite family of sets, in order to re-prove decidability of universality of URA.

The assumed form A = S × Atoms of the input alphabets is not important; the results apply to arbitrary orbit-finite input alphabets A.

The proof of our main result (also of decidability of universality of URA) is effective, with elementary bounds. In particular, given two NRA with complementing languages the equivalent DRA from Thm. 1 has an exponential number of registers and a doubly-exponential number of orbits of states. The same bounds apply to a DRA constructed in our proof of Thm. 4. Moreover, assuming Atoms satisfy standard effectiveness assumptions, like decidability of their first-order theory, one can also compute an equivalent DRA from Thm. 2.

Concerning possible generalisations of our results, we believe that Thm. 1 holds not only for equality atoms, but for arbitrary oligomorphic ω-stable atoms. These include e.g. the nested equality atoms mentioned in Sec. 2. On the other hand Thm. 1 does not extend to disjoint but non-complementing NRA languages: it is not true that for every two disjoint NRA languages there is a DRA language that separates them, i.e., includes one of them and is disjoint from the other. The corresponding decision problem (given two disjoint NRA, does a separating DRA exist?) is decidable when the number of registers of a separating automaton is fixed [9] , and open in general.

An intriguing open question (not unlike the wqo Dichotomy Conjecture [19] ) is whether it is necessary for Atoms to admit wqo for Thm. 2 to hold.

Slightly infinite sets

Data monoids

Twovariable logic on data words

Automata with group actions

Automata theory in nominal sets

An extension of data automata that captures XPath

Single-use automata and transducers for infinite alphabets

Bidimensional linear recursive sequences and universality of unambiguous register automata

Timed games and deterministic separability

Forms of Determinism for Automata

Logics with rigidly guarded data tests

Generalized data automata and fixpoint logic

Minimization of symbolic automata

LTL with the freeze quantifier and register automata

Finite-memory automata

An algebraic characterization of deterministic regular languages over infinite alphabets

Regular expressions for languages over infinite alphabets

Finite-memory automata with non-deterministic reassignment

Decidability border for Petri nets with data: WQO dichotomy conjecture

Regular expressions for data words

Typechecking for XML transformers

The containment problem for unambiguous register automata

Finite state machines for strings over infinite alphabets

Nominal Sets: Names and Symmetry in Computer Science

Intractability of decision problems for finite-memory automata. Theor

Automata and logics for words and trees over an infinite alphabet

Acknowledgments. We thank Lorenzo Clemente for posing the collapse question studied in this paper, and Joanna Ochremiak and Radek Piórkowski for valuable discussions.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.