key: cord-0046140-wybbc832
authors: Banderier, Cyril; Goldwurm, Massimiliano
title: Number of Prefixes in Trace Monoids: Clique Polynomials and Dependency Graphs
date: 2020-06-24
journal: Beyond the Horizon of Computability
DOI: 10.1007/978-3-030-51466-2_22
sha: 0682c17808c5bddfa94ca3727b54c3878eb10605
doc_id: 46140
cord_uid: wybbc832

We present some asymptotic properties on the average number of prefixes in trace languages. Such languages are characterized by an alphabet and a set of commutation rules, also called concurrent alphabet, which can be encoded by an independency graph or by its complement, called dependency graph. One key technical result, which has its own interest, concerns general properties of graphs and states that “if an undirected graph admits a transitive orientation, then the multiplicity of the root of minimum modulus of its clique polynomial is smaller or equal to the number of connected components of its complement graph”. As a consequence, under the same hypothesis of transitive orientation of the independency graph, one obtains the relation [Formula: see text], where the random variables [Formula: see text] and [Formula: see text] represent the number of prefixes in traces of length n under two different fundamental probabilistic models: the uniform distribution among traces of length n (for [Formula: see text]), the uniform distribution among words of length n (for [Formula: see text]). These two quantities are related to the time complexity of algorithms for solving classical membership problems on trace languages.

In computer science, trace monoids have been introduced by Mazurkiewiecz [22] as a model of concurrent events, describing which action can permute or not with another action (we give a formal definition of traces and trace monoids in Sect. 2, see also [14] for a treatise on the subject). In combinatorics, they are related to the fundamental studies of the "monoïde partiellement commutatif" introduced by Cartier and Foata in [10] , and to its convenient geometrical view as heap of pieces proposed by Viennot in [25] . Several classical problems in language theory (recognition of rational and context-free trace languages, determination of the number of representative words of a given trace, computing the finite state automaton recognizing these words) can be solved by algorithms that work in time and space proportional to (or strictly depending on) the number of prefixes of the input trace [3, [6] [7] [8] 15, 23] . This is due to the fact that prefixes represent the possible decompositions of a trace in two parts and hence they are natural indexes for computations on traces.

This motivates the analysis of the number of prefixes of a trace of given length both in the worst and in the average case. In the average case analysis, two natural sequences of random variables play a key role:

-{T n } n∈N , the number of prefixes of traces of length n generated at random under the equidistribution of traces of given size; -{W n } n∈N , the number of prefixes of traces of length n generated at random under the equidistribution of representative words of given size.

For some families of trace monoids, the asymptotic average, variance, and limit distributions of {T n } and {W n } are known [6, 7, [19] [20] [21] . It is interesting that they rely on the structural properties of an underlying graph (the independency graph, defined in Sect. 2). For example, it is known that, for every trace monoid M, the maximum number of prefixes of a trace of length n is of the order Θ(n α ), where α is the size of the largest clique in the concurrent alphabet defining M [8] . We summarize further such results in Sect. 3. In analytic combinatorics (see [17] for an introduction to this field), it remains a nice challenge to get a more universal description of the possible asymptotics of T n and W n . In this work we prove that, if the concurrent alphabet (Σ, C) admits a transitive orientation, then

This is obtained by showing a general property of undirected graphs, which in our context is applied to the concurrent alphabet (Σ, C) and its complement (Σ, C c ). Such a property states that, for any undirected graph G admitting a transitive orientation of its edges, the number of connected components of its complement is greater or equal to the multiplicity of the root of smallest modulus in the clique polynomial of G. The interest for the present discussion mainly relies on the use of finite state automata and on classical tools of formal languages to study properties of integer random variables in particular the asymptotic behaviour of their moments. The paper is organized as follows: in Sect. 2 we recall the basic definitions on trace monoids; in Sect. 3 we summarize some asymptotic results on the random variables T n and W n ; in Sects. 4 and 5, we present our main results on crosssections of trace monoids, clique polynomials, and a new bound relating the asymptotic behaviour of T n and W n ; we then conclude with possible future extensions of our work.

For the reader not already familiar with the terminology of trace languages, we present in this section the key notions used in this article (see e.g. [14] for more details on all these notions).

Given a finite alphabet Σ, as usual Σ * denotes the free monoid of all words over Σ, ε is the empty word and |w| is the length of a word w for every w ∈ Σ * . We recall that, for any w ∈ Σ * , a prefix of w is a word u ∈ Σ * such that w = uv, for some v ∈ Σ * . Also, for any finite set S, we denote by #S the cardinality of S.

A concurrent alphabet is then a pair (Σ, C), where C ⊆ Σ ×Σ is a symmetric and irreflexive relation over Σ. Such a pair can alternatively be defined by an undirected graph, which we call independency graph, where Σ is the set of nodes and {{a, b} | (a, b) ∈ C} is the set of edges. Its complement (Σ, C c ) is called dependency graph. As the notions of concurrent alphabet and independency graph are equivalent, in the sequel we indifferently refer to either of them. Informally, a concurrent alphabet lists the pairs of letters which can commute.

The trace monoid generated by a concurrent alphabet (Σ, C) is defined as the quotient monoid Σ * / ≡ C , where ≡ C is the smallest congruence extending the equations {ab = ba : (a, b) ∈ C}, and is denoted by M(Σ, C) or simply by M. Its elements are called traces and its subsets are named trace languages. In other words, a trace is an equivalence class of words with respect to the relation ≡ C given by the reflexive and transitive closure of the binary relation ∼ C over Σ * such that uabv ∼ C ubav for every (a, b) ∈ C and every u, v ∈ Σ * . For any w ∈ Σ * , we denote by [w] the trace represented by w; in particular [ε] is the empty trace, i.e. the unit of M. Note that the product of two traces r, s ∈ M, where r = [x] and s = [y], is the trace t = [xy], which does not depend on the representative words x, y ∈ Σ * and we denote the product by t = s · r. The length of a trace t ∈ M, denoted by |t|, is the length of any representative word. For any n ∈ N, let M n := {t ∈ M : |t| = n} and m n := #M n .

Note that if C = ∅ then M reduces to Σ * , while if C = {(a, b) ∈ Σ×Σ | a = b} then M is the commutative monoid of all monomials with letters in Σ.

Any trace t ∈ M can be represented by a partial order over the multiset of letters of t, denoted by PO(t). It works as follows: first, consider a word w satisfying t = [w]. Then, for any pair of letters (a, b) of w, let a i be the i-th occurrence of the letter a and b j the j-th occurrence of the letter b. The partial order is then defined as a i < b j whenever a i precedes b j in all representative words of [w]. (See Example 1 hereafter.)

A prefix of a trace t ∈ M is a trace p such that t = p · s for some s ∈ M. Clearly, any prefix of t is a trace p = [u] where u is a prefix of a representative of t. It is easy to see that if p is a prefix of t then the PO(u) is an order ideal of PO(t) and can be represented by the corresponding antichain. We recall that an antichain of a partial order set (S, ≤) is a subset A ⊆ S such that a ≤ b does not hold for any pair of distinct elements a, b ∈ A, while an order ideal in (S, ≤) is a subset {a ∈ S | ∃ b ∈ A such that a ≤ b} for some antichain A of (S, ≤). For every t ∈ M, we denote by Pref(t) the set of all prefixes of t.

Let M be the trace monoid characterized by the following independency graph:

That is, one has ab = ba, bc = cb, cd = dc. Then, the trace [bacda] (i.e., the equivalence class of the word bacda) is the set of words {bacda, badca, abdca, abcda, acbda}. The corresponding partially ordered set is given by the following diagram

where an arrow from x i to y j means that x i always precedes y j and where we omitted the arrows implied by transitivity. The set of prefixes is given by Recognizable, rational and context-free trace languages are well defined by means of linearization and closure operations over traditional string languages; their properties and in particular the complexity of their membership problems are widely studied in the literature (see for instance [8, 14, 15, 23] ).

For any alphabet Σ and trace monoid M, we denote by Z Σ the set of formal series on words (they are thus series in noncommutative variables) and by Z M the set of formal series on traces (they are thus series in partially commutative variables), and Z[[z]] stands for ring of classical power series in the variable z. These three distinct rings (with the operations of sum and Cauchy product, see [5, 14, 24] ) will be used in Sects. 4 and 5.

Several algorithms are presented in the literature for the recognition of rational and context-free trace languages, or for other problems like computing the number of representative words of a trace, that take a trace t as input and then carry out some operations on all prefixes of t [3, [6] [7] [8] [9] 15, 23] . Thus, their time and space complexity strictly depend on the number of prefixes of t and in many cases they work just in time Θ(# Pref(t)). Now, it follows from [8] that

where α is the size of the largest clique in the independency graph of M. It is thus essential to get a more refined analysis of the asymptotic behaviour of # Pref(t) under natural distribution models in order to obtain a better understanding of the average complexity of all these algorithms.

In this section, we recall the main results on the number of prefixes of a random trace, under two different probabilistic models.

A main goal of the present contribution is to compare the random variables T n and W n , defined by

where t is uniformly distributed over M n , while w is uniformly distributed over Σ n . Clearly the properties of T n and W n immediately yield results on time complexity of the algorithms described in [3, 6, 7] assuming, respectively, equiprobable input traces of length n and equiprobable representative words of length n. Since every trace of length n has at least n + 1 prefixes, a first crude asymptotic bound is

for a suitable constant d > 0, where α is defined as in (1). More precise results on the moments of W n are studied in [6, 7, 20] :

where k is the number of connected components of the dependency graph of M. This relation is obtained by constructing suitable bijections between each moment of W n and the set of words of length n in a regular language [6] . These bijections also allow proving a first order cancellation of the variance, i.e. var(W n ) = O(n 2k−1 ) [20] . Further, when the dependency graph is transitive, this leads to two different limit laws, either chi-squared or Gaussian, according whether all the connected components of (Σ, C c ) have the same size or not [19] .

Now, in order to analyse T n (the number of prefixes of a random trace of size n), it is useful to introduce the generating function of the trace monoid M: The Möbius function of M is defined as

where all a i ∈ Σ are distinct and (a i , a j ) ∈ C for any i = j, 0 otherwise.

It is in fact a polynomial belonging to Z M . As established by Cartier and Foata in [10] , an important property of μ M is that

where ξ M = t∈M t is the characteristic series of M. Here, ξ M can be seen as a partially commutative analogue of M (z). Now, let p M ∈ Z[z] be the commutative analogue of μ M . It then follows that

where c i is the number of cliques of size i in the independency graph of M.

For this reason, we call p M the clique polynomial of the independency graph (Σ, C). Its properties are studied in several papers (see for instance [18, 21] ). In particular, the commutative analogue of Eq. (4) is then

This entails that M (z) = (p M (z)) −1 , a fundamental identity which can also be derived by an inclusion-exclusion principle.

As it is known from [21] that p M has a unique root ρ of smallest modulus (and clearly ρ > 0 via Pringsheim's theorem, see [17] ), one gets m n = #M n = cρ −n n −1 + O ρ −n n −2 , where c > 0 is a constant and is the multiplicity of ρ in p M (z). We observe that the existence of a unique root of smallest modulus for p M (z) is not a consequence of the strict monotonicity of the sequence {m n }. Indeed, if one considers M (z) = 1 (1−z 3 )(1−z) 2 , one has m n+3 = ((n + 5)m n + 2m n+1 + 2m n+2 )/(n + 3) so the sequence {m n } is strictly increasing; however, the polynomial (1−z 3 )(1−z) 2 has 3 distinct roots of smallest modulus. Therefore, such a M (z) cannot be the generating function of a trace monoid.

In our context, clique polynomials are particularly relevant as they are related to the average value of the number of prefixes of traces [7, 21] . In fact, for any trace monoid M, we have E[T n ] = Pn mn , where P n = t∈Mn # Pref(t). Since ξ 2 M = t∈M # Pref(t)t, from (4) and (6) its commutative analogue becomes n P n z n = p M (z) −2 and hence P n = Θ(ρ −n n 2 −1 ), which proves

where is the multiplicity of the smallest root of p M (z).

Cross-sections are standard tools to study the properties of trace monoids by lifting the analysis at the level of free monoids. Intuitively, a cross-section of a trace monoid M is a language L having exactly one representative string for each trace in M. Thus, the generating function of L coincides with M (z) and hence it satisfies equality (6) . Among all cross-sections of M, it is convenient to consider a canonical one. A natural one is based on a normal form using the lexicographic order [1] . Alternatively, one can see it as based on the orientations of edges in the independency graph of M, as used in [12, 13] to study properties of Möbius functions in trace monoids. It works as follows. Let ≤ be any total order on the alphabet Σ and let ≤ * be the lexicographic linear order induced by ≤ over Σ * . We denote by < C the binary relation over Σ such that a < C b if (a, b) ∈ C and a ≤ b. Thus, < C is an orientation of the independency graph of M. We now consider the following cross-section of M: the language L ≤ of all minimal lexicographic representatives of traces in M, i.e. L ≤ = {w ∈ Σ * | w ≤ * y for every y ∈ [w]}. Moreover, L ≤ is regular, as it satisfies the equality

where C a := {c ∈ Σ | (a, c) ∈ C} is the set of letters allowed to commute with a. Thus, L ≤ is the set of all words in Σ * that do not contain any factor of the form bva where a < C b and v ∈ C * a . Then, for any w ∈ Σ * , in order to verify whether w ∈ L ≤ , one can read the letters of w in their order, updating at each step the family of letters a ∈ Σ forming a "forbidden" factor of the form bva, with a < C b, v ∈ C * a . If one of these letters is met then w is rejected, otherwise it is accepted.

To formalize the definition, for each b ∈ Σ, the predecessors of b are Pred(b) = {a ∈ Σ | a < C b}. Define the finite state automaton A as the 4-tuple (2 Σ , ∅, δ, F ), where the set of states is 2 Σ , i.e. the power set of Σ, the initial state is the empty set ∅, F = {S ∈ 2 Σ | S = Σ} is the family of final states and the transition function δ : (2 Σ × Σ) → 2 Σ is given by

Note that, during a computation, the current state S represents the set of forbidden letters. At the beginning, all input letters are allowed, as ∅ is the initial state, while Σ is a trap state, where all letters are forbidden. In a general step, if S ⊆ Σ is the current state and b / ∈ S is an input letter, the new set of forbidden letters must be obtained from S ∪ Pred(b) by removing those elements that do not commute with b. This justifies the above definition of δ and it is clear that

Moreover, the state set of the above automaton can be reduced to the states S Σ reachable from ∅. Setting

the entries of the transition matrix A of the automaton A are given by:

] of this transition matrix has therefore all its entries which are monomials of degree one in z. Factorizing by z, this commutative analogue can thus be written zA, for a matrix A we call the adjacency matrix of A. Note that A strictly depends on both the concurrent alphabet (Σ, C) and the total order ≤ over Σ.

As a consequence, since A recognizes a cross-section of M, denoting by π and η, respectively, the characteristic (column) vectors of ∅ and Q, the generating function M (z) is given by

where I is the identity matrix of size #Q × #Q and π is the transposed of π. This identity, together with relation (6) proves the following proposition.

For any trace monoid M with a concurrent alphabet (Σ, C), let ≤ be a total order on Σ, let A be the adjacency matrix of the automaton A recognizing the cross-section L ≤ of M, and assume I, π and η defined as in (9) . Then, M (z) and p M (z) satisfy the identities 

The standard ordering (a, b, c, d, e) on Σ induces the following (non-transitive) orientation < C over the independency graph From that I − zA is easily computed (where A is the adjacency matrix of A): 

where α is the size of the maximum clique in (Σ, C) and all x i 's are eigenvalues of a adjacency matrix A.

Proof (sketch). The result follows from Proposition 1 by refining equalities (10) and recalling that all roots of clique polynomials are different from 0.

We observe that the reverse property does not hold in general, i.e. it may occur that an eigenvalue of A is not the reciprocal of a root of p M (z). However, as shown in the following section, such a reverse sentence is true whenever the graph (Σ, C) admits a transitive orientation.

Now let us consider a trace monoid M such that its independency graph (Σ, C) admits a transitive orientation. Then, we may fix a total order ≤ on Σ such that < C is transitive. In this case, the definition of cross-section L ≤ and of the automaton A can be simplified, since the set of "forbidden" factors of the form bwa, with a < C b and w ∈ C * a , can be reduced to the simple set of words S = {τ σ ∈ Σ 2 | σ < C τ }. To prove this property, consider a forbidden factor of the above form bwa, with a < C b and w ∈ C * a ; thus any symbol c occurring in w must verify (a, c) ∈ C. As a consequence, either a < C c or c < C a: in the first case ca belongs to S while, in the second case, by transitivity of < C we have c < C b and hence bc is in S.

Thus, identity (8) can be simplified as L ≤ = Σ * \ a<Cb Σ * baΣ * . Moreover, the state set of the automaton A can be reduced to Q = {Pred(a) | a ∈ Σ} and the transition function now assumes values δ(S, b) = Pred(b), for every S ∈ Q and every b ∈ Σ\S. Proposition 3. Let (Σ, C) be a concurrent alphabet with an associated independency graph admitting a transitive orientation < C . Let ≤ be a total order on Σ extending < C . Also assume that the dependency graph (Σ, C c ) is connected. Then the adjacency matrix A is primitive.

Proof (sketch) . Under these hypotheses, by the simplifications above, it turns out that the state diagram of the automaton A (defined by ≤) is strongly connected and has at least one loop.

The hypothesis of transitivity for < C cannot be avoided to guarantee that A is primitive. For instance, in Example 2 the dependency graph (Σ, C c ) is connected but the orientation < C of (Σ, C) is not transitive, and in fact observe that the corresponding transition matrix is not irreducible and hence A is not primitive. Nevertheless, the smallest root of p M (z) is simple and then the same concurrent alphabet satisfies the following theorem. Proof (sketch). First, it is well-known [18, 21] that p M (z) is always the product of the clique polynomials of all independency subalphabets given by the connected components of (Σ, C c ). Then, each of these clique polynomials (using the additional condition that one has a transitive orientation) has a smallest root of multiplicity 1: this follows from Proposition 3 and a commutative analogue of a result in [11] stating that, when (Σ, C) has a transitive orientation, its clique polynomial equals det(I − zA).

Applying the previous theorem to relations (3) and (7), one gets the following. Theorem 5. Let (Σ, C) be a concurrent alphabet. If its independency graph admits a transitive orientation < C , then the random variables counting the number of prefixes in traces (as defined in (2)

Consider the concurrent alphabet (Σ, C) and the orientation < C of Example 2. Note that (Σ, C) is connected but < C is not transitive and in fact A is not primitive. However, (Σ, C) admits a (different) orientation that is transitive, given by

A total order extending the previous orientation is c < d < a < e < b. Computing matrix A with respect to this total order we obtain

and hence det(I − zA) = 1 − 5z + 6z 2 − z 3 = p M (z).

The following example considers an independency graph of M that does not admit any transitive orientation. In this case p M (z) is a proper factor of det(I − zA), but its smallest root is again simple and hence ≤ k is still true even if the hypothesis of Theorem 4 is not satisfied.

Consider the concurrent alphabet corresponding to the following independency graph G, associated to the following partial order < C :

Thus the transition matrix, defined according to Sect. 4, is given by the following table: 

We have investigated the fundamental role played by the clique polynomial in asymptotic studies of trace monoids. Building on the factorization property (stated in Proposition 1), we got a link between the multiplicity of its smallest root and the number of connected components of some associated graph (Theorem 4). This, in turn, is the key for a new asymptotic relation between the number of prefixes in traces of length n: E[T n ] = O(E[W n ]) (Theorem 5), where T n and W n correspond to two natural models (uniform distribution over traces and over words). In the long version of this article, we plan to extend these analyses to more general cases (including concurrent alphabets without transitive orientation). Several other problems remain open in our context and could be at the centre of future investigations. The first one concerns the adjacency matrix A defined in Sect. 4, which does not seem to be studied too much in the previous literature; in particular, in all our examples det(I − zA) is a clique polynomial, even when the concurrent alphabet (Σ, C) does not admit any transitive orientation. For this purpose, similarly to the approach used in [11] and in our proof of Theorem 4, it is possible to adapt a noncommutative approach building on links to words with forbidden patterns (see [2] ). We plan to use these links to tackle the asymptotic behaviour of the variance and higher moments of {T n }, and the limit distributions of both {T n } and {W n } for all trace monoids.

In conclusion, all these studies are further illustration of the nice interplay between complex analysis (analytic combinatorics) and the structural properties of formal languages, as also illustrated e.g. in [4, 5, 16, 17, 19, 20] .

Inhomogeneous sorting

Analytic combinatorics of lattice paths with forbidden patterns, the vectorial kernel method, and generating functions for pushdown automata

Analysis of algorithms for the recognition of rational and context-free trace languages

Formulae and asymptotics for coefficients of algebraic functions

Rational Series and Their Languages

On the prefixes of a random trace and the membership problem for context-free trace languages

Analysis of a class of algorithms for problems on trace languages

Equivalence and membership problems for regular trace languages

Efficient recognition of trace languages defined by repeat until loops

Problèmes combinatoire de commutation et réarrangements

Determinants and Möbius functions in trace monoids

Transitive orientations, Möbius functions, and complete semi-thue systems for free partially commutative monoids

Möbius functions and confluent semi-commutations

The Book of Traces

Commutations dans les monoïdes libres: un cadre théorique pour l'étude du parallélisme

Analytic models and ambiguity of context-free languages

Analytic Combinatorics

Dependence polynomials

Some limit distributions in analysis of algorithms for problems on trace languages

Probabilistic estimation of the number of prefixes of a trace

Clique polynomials have a unique root of smallest modulus

Concurrent program schemes and their interpretations

Some properties of trace languages

Automata-Theoretic Aspects of Formal Power Series

Heaps of pieces, I : Basic definitions and combinatorial lemmas