key: cord-0046140-wybbc832 authors: Banderier, Cyril; Goldwurm, Massimiliano title: Number of Prefixes in Trace Monoids: Clique Polynomials and Dependency Graphs date: 2020-06-24 journal: Beyond the Horizon of Computability DOI: 10.1007/978-3-030-51466-2_22 sha: 0682c17808c5bddfa94ca3727b54c3878eb10605 doc_id: 46140 cord_uid: wybbc832 We present some asymptotic properties on the average number of prefixes in trace languages. Such languages are characterized by an alphabet and a set of commutation rules, also called concurrent alphabet, which can be encoded by an independency graph or by its complement, called dependency graph. One key technical result, which has its own interest, concerns general properties of graphs and states that “if an undirected graph admits a transitive orientation, then the multiplicity of the root of minimum modulus of its clique polynomial is smaller or equal to the number of connected components of its complement graph”. As a consequence, under the same hypothesis of transitive orientation of the independency graph, one obtains the relation [Formula: see text], where the random variables [Formula: see text] and [Formula: see text] represent the number of prefixes in traces of length n under two different fundamental probabilistic models: the uniform distribution among traces of length n (for [Formula: see text]), the uniform distribution among words of length n (for [Formula: see text]). These two quantities are related to the time complexity of algorithms for solving classical membership problems on trace languages. In computer science, trace monoids have been introduced by Mazurkiewiecz [22] as a model of concurrent events, describing which action can permute or not with another action (we give a formal definition of traces and trace monoids in Sect. 2, see also [14] for a treatise on the subject). In combinatorics, they are related to the fundamental studies of the "monoïde partiellement commutatif" introduced by Cartier and Foata in [10] , and to its convenient geometrical view as heap of pieces proposed by Viennot in [25] . Several classical problems in language theory (recognition of rational and context-free trace languages, determination of the number of representative words of a given trace, computing the finite state automaton recognizing these words) can be solved by algorithms that work in time and space proportional to (or strictly depending on) the number of prefixes of the input trace [3, [6] [7] [8] 15, 23] . This is due to the fact that prefixes represent the possible decompositions of a trace in two parts and hence they are natural indexes for computations on traces. This motivates the analysis of the number of prefixes of a trace of given length both in the worst and in the average case. In the average case analysis, two natural sequences of random variables play a key role: -{T n } n∈N , the number of prefixes of traces of length n generated at random under the equidistribution of traces of given size; -{W n } n∈N , the number of prefixes of traces of length n generated at random under the equidistribution of representative words of given size. For some families of trace monoids, the asymptotic average, variance, and limit distributions of {T n } and {W n } are known [6, 7, [19] [20] [21] . It is interesting that they rely on the structural properties of an underlying graph (the independency graph, defined in Sect. 2). For example, it is known that, for every trace monoid M, the maximum number of prefixes of a trace of length n is of the order Θ(n α ), where α is the size of the largest clique in the concurrent alphabet defining M [8] . We summarize further such results in Sect. 3. In analytic combinatorics (see [17] for an introduction to this field), it remains a nice challenge to get a more universal description of the possible asymptotics of T n and W n . In this work we prove that, if the concurrent alphabet (Σ, C) admits a transitive orientation, then This is obtained by showing a general property of undirected graphs, which in our context is applied to the concurrent alphabet (Σ, C) and its complement (Σ, C c ). Such a property states that, for any undirected graph G admitting a transitive orientation of its edges, the number of connected components of its complement is greater or equal to the multiplicity of the root of smallest modulus in the clique polynomial of G. The interest for the present discussion mainly relies on the use of finite state automata and on classical tools of formal languages to study properties of integer random variables in particular the asymptotic behaviour of their moments. The paper is organized as follows: in Sect. 2 we recall the basic definitions on trace monoids; in Sect. 3 we summarize some asymptotic results on the random variables T n and W n ; in Sects. 4 and 5, we present our main results on crosssections of trace monoids, clique polynomials, and a new bound relating the asymptotic behaviour of T n and W n ; we then conclude with possible future extensions of our work. For the reader not already familiar with the terminology of trace languages, we present in this section the key notions used in this article (see e.g. [14] for more details on all these notions). Given a finite alphabet Σ, as usual Σ * denotes the free monoid of all words over Σ, ε is the empty word and |w| is the length of a word w for every w ∈ Σ * . We recall that, for any w ∈ Σ * , a prefix of w is a word u ∈ Σ * such that w = uv, for some v ∈ Σ * . Also, for any finite set S, we denote by #S the cardinality of S. A concurrent alphabet is then a pair (Σ, C), where C ⊆ Σ ×Σ is a symmetric and irreflexive relation over Σ. Such a pair can alternatively be defined by an undirected graph, which we call independency graph, where Σ is the set of nodes and {{a, b} | (a, b) ∈ C} is the set of edges. Its complement (Σ, C c ) is called dependency graph. As the notions of concurrent alphabet and independency graph are equivalent, in the sequel we indifferently refer to either of them. Informally, a concurrent alphabet lists the pairs of letters which can commute. The trace monoid generated by a concurrent alphabet (Σ, C) is defined as the quotient monoid Σ * / ≡ C , where ≡ C is the smallest congruence extending the equations {ab = ba : (a, b) ∈ C}, and is denoted by M(Σ, C) or simply by M. Its elements are called traces and its subsets are named trace languages. In other words, a trace is an equivalence class of words with respect to the relation ≡ C given by the reflexive and transitive closure of the binary relation ∼ C over Σ * such that uabv ∼ C ubav for every (a, b) ∈ C and every u, v ∈ Σ * . For any w ∈ Σ * , we denote by [w] the trace represented by w; in particular [ε] is the empty trace, i.e. the unit of M. Note that the product of two traces r, s ∈ M, where r = [x] and s = [y], is the trace t = [xy], which does not depend on the representative words x, y ∈ Σ * and we denote the product by t = s · r. The length of a trace t ∈ M, denoted by |t|, is the length of any representative word. For any n ∈ N, let M n := {t ∈ M : |t| = n} and m n := #M n . Note that if C = ∅ then M reduces to Σ * , while if C = {(a, b) ∈ Σ×Σ | a = b} then M is the commutative monoid of all monomials with letters in Σ. Any trace t ∈ M can be represented by a partial order over the multiset of letters of t, denoted by PO(t). It works as follows: first, consider a word w satisfying t = [w]. Then, for any pair of letters (a, b) of w, let a i be the i-th occurrence of the letter a and b j the j-th occurrence of the letter b. The partial order is then defined as a i < b j whenever a i precedes b j in all representative words of [w]. (See Example 1 hereafter.) A prefix of a trace t ∈ M is a trace p such that t = p · s for some s ∈ M. Clearly, any prefix of t is a trace p = [u] where u is a prefix of a representative of t. It is easy to see that if p is a prefix of t then the PO(u) is an order ideal of PO(t) and can be represented by the corresponding antichain. We recall that an antichain of a partial order set (S, ≤) is a subset A ⊆ S such that a ≤ b does not hold for any pair of distinct elements a, b ∈ A, while an order ideal in (S, ≤) is a subset {a ∈ S | ∃ b ∈ A such that a ≤ b} for some antichain A of (S, ≤). For every t ∈ M, we denote by Pref(t) the set of all prefixes of t. Let M be the trace monoid characterized by the following independency graph: That is, one has ab = ba, bc = cb, cd = dc. Then, the trace [bacda] (i.e., the equivalence class of the word bacda) is the set of words {bacda, badca, abdca, abcda, acbda}. The corresponding partially ordered set is given by the following diagram where an arrow from x i to y j means that x i always precedes y j and where we omitted the arrows implied by transitivity. The set of prefixes is given by Recognizable, rational and context-free trace languages are well defined by means of linearization and closure operations over traditional string languages; their properties and in particular the complexity of their membership problems are widely studied in the literature (see for instance [8, 14, 15, 23] ). For any alphabet Σ and trace monoid M, we denote by Z Σ the set of formal series on words (they are thus series in noncommutative variables) and by Z M the set of formal series on traces (they are thus series in partially commutative variables), and Z[[z]] stands for ring of classical power series in the variable z. These three distinct rings (with the operations of sum and Cauchy product, see [5, 14, 24] ) will be used in Sects. 4 and 5. Several algorithms are presented in the literature for the recognition of rational and context-free trace languages, or for other problems like computing the number of representative words of a trace, that take a trace t as input and then carry out some operations on all prefixes of t [3, [6] [7] [8] [9] 15, 23] . Thus, their time and space complexity strictly depend on the number of prefixes of t and in many cases they work just in time Θ(# Pref(t)). Now, it follows from [8] that where α is the size of the largest clique in the independency graph of M. It is thus essential to get a more refined analysis of the asymptotic behaviour of # Pref(t) under natural distribution models in order to obtain a better understanding of the average complexity of all these algorithms. In this section, we recall the main results on the number of prefixes of a random trace, under two different probabilistic models. A main goal of the present contribution is to compare the random variables T n and W n , defined by where t is uniformly distributed over M n , while w is uniformly distributed over Σ n . Clearly the properties of T n and W n immediately yield results on time complexity of the algorithms described in [3, 6, 7] assuming, respectively, equiprobable input traces of length n and equiprobable representative words of length n. Since every trace of length n has at least n + 1 prefixes, a first crude asymptotic bound is for a suitable constant d > 0, where α is defined as in (1). More precise results on the moments of W n are studied in [6, 7, 20] : where k is the number of connected components of the dependency graph of M. This relation is obtained by constructing suitable bijections between each moment of W n and the set of words of length n in a regular language [6] . These bijections also allow proving a first order cancellation of the variance, i.e. var(W n ) = O(n 2k−1 ) [20] . Further, when the dependency graph is transitive, this leads to two different limit laws, either chi-squared or Gaussian, according whether all the connected components of (Σ, C c ) have the same size or not [19] . Now, in order to analyse T n (the number of prefixes of a random trace of size n), it is useful to introduce the generating function of the trace monoid M: The Möbius function of M is defined as where all a i ∈ Σ are distinct and (a i , a j ) ∈ C for any i = j, 0 otherwise. It is in fact a polynomial belonging to Z M . As established by Cartier and Foata in [10] , an important property of μ M is that where ξ M = t∈M t is the characteristic series of M. Here, ξ M can be seen as a partially commutative analogue of M (z). Now, let p M ∈ Z[z] be the commutative analogue of μ M . It then follows that where c i is the number of cliques of size i in the independency graph of M. For this reason, we call p M the clique polynomial of the independency graph (Σ, C). Its properties are studied in several papers (see for instance [18, 21] ). In particular, the commutative analogue of Eq. (4) is then This entails that M (z) = (p M (z)) −1 , a fundamental identity which can also be derived by an inclusion-exclusion principle. As it is known from [21] that p M has a unique root ρ of smallest modulus (and clearly ρ > 0 via Pringsheim's theorem, see [17] ), one gets m n = #M n = cρ −n n −1 + O ρ −n n −2 , where c > 0 is a constant and is the multiplicity of ρ in p M (z). We observe that the existence of a unique root of smallest modulus for p M (z) is not a consequence of the strict monotonicity of the sequence {m n }. Indeed, if one considers M (z) = 1 (1−z 3 )(1−z) 2 , one has m n+3 = ((n + 5)m n + 2m n+1 + 2m n+2 )/(n + 3) so the sequence {m n } is strictly increasing; however, the polynomial (1−z 3 )(1−z) 2 has 3 distinct roots of smallest modulus. Therefore, such a M (z) cannot be the generating function of a trace monoid. In our context, clique polynomials are particularly relevant as they are related to the average value of the number of prefixes of traces [7, 21] . In fact, for any trace monoid M, we have E[T n ] = Pn mn , where P n = t∈Mn # Pref(t). Since ξ 2 M = t∈M # Pref(t)t, from (4) and (6) its commutative analogue becomes n P n z n = p M (z) −2 and hence P n = Θ(ρ −n n 2 −1 ), which proves where is the multiplicity of the smallest root of p M (z). Cross-sections are standard tools to study the properties of trace monoids by lifting the analysis at the level of free monoids. Intuitively, a cross-section of a trace monoid M is a language L having exactly one representative string for each trace in M. Thus, the generating function of L coincides with M (z) and hence it satisfies equality (6) . Among all cross-sections of M, it is convenient to consider a canonical one. A natural one is based on a normal form using the lexicographic order [1] . Alternatively, one can see it as based on the orientations of edges in the independency graph of M, as used in [12, 13] to study properties of Möbius functions in trace monoids. It works as follows. Let ≤ be any total order on the alphabet Σ and let ≤ * be the lexicographic linear order induced by ≤ over Σ * . We denote by < C the binary relation over Σ such that a < C b if (a, b) ∈ C and a ≤ b. Thus, < C is an orientation of the independency graph of M. We now consider the following cross-section of M: the language L ≤ of all minimal lexicographic representatives of traces in M, i.e. L ≤ = {w ∈ Σ * | w ≤ * y for every y ∈ [w]}. Moreover, L ≤ is regular, as it satisfies the equality where C a := {c ∈ Σ | (a, c) ∈ C} is the set of letters allowed to commute with a. Thus, L ≤ is the set of all words in Σ * that do not contain any factor of the form bva where a < C b and v ∈ C * a . Then, for any w ∈ Σ * , in order to verify whether w ∈ L ≤ , one can read the letters of w in their order, updating at each step the family of letters a ∈ Σ forming a "forbidden" factor of the form bva, with a < C b, v ∈ C * a . If one of these letters is met then w is rejected, otherwise it is accepted. To formalize the definition, for each b ∈ Σ, the predecessors of b are Pred(b) = {a ∈ Σ | a < C b}. Define the finite state automaton A as the 4-tuple (2 Σ , ∅, δ, F ), where the set of states is 2 Σ , i.e. the power set of Σ, the initial state is the empty set ∅, F = {S ∈ 2 Σ | S = Σ} is the family of final states and the transition function δ : (2 Σ × Σ) → 2 Σ is given by Note that, during a computation, the current state S represents the set of forbidden letters. At the beginning, all input letters are allowed, as ∅ is the initial state, while Σ is a trap state, where all letters are forbidden. In a general step, if S ⊆ Σ is the current state and b / ∈ S is an input letter, the new set of forbidden letters must be obtained from S ∪ Pred(b) by removing those elements that do not commute with b. This justifies the above definition of δ and it is clear that Moreover, the state set of the above automaton can be reduced to the states S Σ reachable from ∅. Setting the entries of the transition matrix A of the automaton A are given by: ] of this transition matrix has therefore all its entries which are monomials of degree one in z. Factorizing by z, this commutative analogue can thus be written zA, for a matrix A we call the adjacency matrix of A. Note that A strictly depends on both the concurrent alphabet (Σ, C) and the total order ≤ over Σ. As a consequence, since A recognizes a cross-section of M, denoting by π and η, respectively, the characteristic (column) vectors of ∅ and Q, the generating function M (z) is given by where I is the identity matrix of size #Q × #Q and π is the transposed of π. This identity, together with relation (6) proves the following proposition. For any trace monoid M with a concurrent alphabet (Σ, C), let ≤ be a total order on Σ, let A be the adjacency matrix of the automaton A recognizing the cross-section L ≤ of M, and assume I, π and η defined as in (9) . Then, M (z) and p M (z) satisfy the identities The standard ordering (a, b, c, d, e) on Σ induces the following (non-transitive) orientation < C over the independency graph From that I − zA is easily computed (where A is the adjacency matrix of A): where α is the size of the maximum clique in (Σ, C) and all x i 's are eigenvalues of a adjacency matrix A. Proof (sketch). The result follows from Proposition 1 by refining equalities (10) and recalling that all roots of clique polynomials are different from 0. We observe that the reverse property does not hold in general, i.e. it may occur that an eigenvalue of A is not the reciprocal of a root of p M (z). However, as shown in the following section, such a reverse sentence is true whenever the graph (Σ, C) admits a transitive orientation. Now let us consider a trace monoid M such that its independency graph (Σ, C) admits a transitive orientation. Then, we may fix a total order ≤ on Σ such that < C is transitive. In this case, the definition of cross-section L ≤ and of the automaton A can be simplified, since the set of "forbidden" factors of the form bwa, with a < C b and w ∈ C * a , can be reduced to the simple set of words S = {τ σ ∈ Σ 2 | σ < C τ }. To prove this property, consider a forbidden factor of the above form bwa, with a < C b and w ∈ C * a ; thus any symbol c occurring in w must verify (a, c) ∈ C. As a consequence, either a < C c or c < C a: in the first case ca belongs to S while, in the second case, by transitivity of < C we have c < C b and hence bc is in S. Thus, identity (8) can be simplified as L ≤ = Σ * \ a