key: cord-0043263-y1o0mda9 authors: Lye, Aaron title: Context-Sensitive Fusion Grammars Are Universal date: 2020-01-07 journal: Language and Automata Theory and Applications DOI: 10.1007/978-3-030-40608-0_19 sha: d695fd8696b44fdb691f603e9b4d46bb3d850af7 doc_id: 43263 cord_uid: y1o0mda9 Context-sensitive fusion grammars are a special case of context-dependent fusion grammars where a rule has only a single positive context condition instead of finite sets of positive and negative context conditions. They generate hypergraph languages from start hypergraphs via successive applications of context-sensitive fusion rules and multiplications of connected components, as well as a filtering mechanism to extract terminal hypergraphs from derived hypergraphs in a certain way. The application of a context-sensitive fusion rule consumes two complementarily labeled hyperedges and identifies corresponding attachment vertices provided that the context condition holds. In this paper, we show that the Post correspondence problem can be formulated very intuitively by such a grammar. Furthermore, we prove that these grammars can generate all recursively enumerable string languages (up to representation of strings as graphs) and are universal in this respect. In 2017 we introduced fusion grammars as generative devices on hypergraphs (cf. [2] ). They are motivated by the observation, that one encounters various fusion processes in various scientific fields like DNA computing, chemistry, tiling, fractal geometry, visual modeling and others. The common principle is that a few small entities may be copied and fused to produce more complicated entities. Besides hypergraph language generation they can be used to model and solve interesting decision problems, e.g., in [3] it is shown that the Hamiltonian path problem can be solved efficiently by a respective fusion grammar due to the massive parallelism in a way that mimics Adleman's famous experiment in DNA computing (cf. [1] ). In this paper, we show that the Post correspondence problem (PCP, cf. [6] ), which is well-known to be undecidable, can be expressed very intuitively by means of fusion and its solvability by using context-sensitive fusion rules. Hence, undeciability results carry over to context-sensitive fusion grammars. Recently, we showed that context-dependent fusion grammars (introduced in [4] ) are powerful enough to simulate Turing machines (cf. [5] ). In this paper, we show that one can do much better. We show that rules with a single positive context condition are sufficient. To prove this, a known result of formal language theory is used, which is, that each recursively enumerable string language is a (left) quotient of two linear languages. In our construction we employ the same recognition mechanism as the one for PCP. Throughout in the proofs we are actually operating on graphs. As graphs are a subclass of hypergraphs the results hold for the general case. The paper is organized as follows. In Sect. 2, basic notions and notations of hypergraphs are recalled. Section 3 introduces the notions of context-sensitive fusion grammars. In Sect. 4 we present a reduction of the Post correspondence problem to the membership and emptiness problem for context-sensitive fusion grammars. Afterwards, we prove that context-sensitive fusion grammars can generate all recursively enumerable string languages (up to representation) in Sect. 5. Section 6 concludes the paper pointing out some open problems. A hypergraph over a given label alphabet Σ is a system H = (V, E, s, t, lab) where V is a finite set of vertices, E is a finite set of hyperedges, s, t : E → V * are two functions assigning to each hyperedge a sequence of sources and targets, respectively, and lab : E → Σ is a function, called labeling. Given H, H ∈ H Σ , a hypergraph morphism g : H → H consists of two mappings g V : V H → V H and g E : We will use removals of the form (∅, E) below. Let H ∈ H Σ and let att(e) be the set of source and target vertices for Given H, H ∈ H Σ , the disjoint union of H and H is denoted by H + H . It is defined by the disjoint union of the underlying sets (also denoted by +). The disjoint union of H with itself k times is denoted by k · H. We use the multiplication of H defined by means of C(H) as follows. Let m : C(H) → N be a mapping, called multiplicity, A string is represented by a simple path where the sequence of labels along the path equals the given string. Let Σ be a label alphabet. Let w = x 1 . . . x n ∈ Σ * for n ≥ 1 and x i ∈ Σ for i = 1, . . . , n. Then the string graph of w is defined by The string graph of the empty string λ, denoted by sg(λ), is the discrete graph with a single node 0. Obviously, there is a one-toone correspondence between Σ * and sg(Σ * ) = {sg(w) | w ∈ Σ * }. For technical reasons, we need the extension of a string graph sg(w) for some w ∈ Σ * by a s-labeled edge bending from the begin node 0 to the end node n, where n is the length of w. The resulting graph is denoted by sg(w) s . In this section, we introduce context-sensitive fusion grammars. These grammars generate hypergraph languages from start hypergraphs via successive applications of context-sensitive fusion rules, multiplications of connected components, and a filtering mechanism. Such a rule is applicable if the positive contextcondition holds. Its application consumes the two hyperedges and fuses the sources of the one hyperedge with the sources of the other as well as the targets of the one with the targets of the other. H and called a direct derivation. A context-sensitive fusion rule is a tuple csfr = (fr (A), c: H . Remark 1. 1. In this paper, we only make use of the case where every hyperedge has one source and one target vertex. Hence, fusion rules are of the form A A . The type is therefore omitted throughout the paper. 2. The applications of fr (A) and (fr (A), id) are equivalent. We use the first as an abbreviation for the latter. We call these rules context-free fusion rules. for each x ∈ F where the morphism is uniquely defined by the labels and maps the vertices as follows: Only reduce(a 1 ) is applicable because the other complementarily labeled edges do not share a common source vertex. The matching morphism g maps the edges labeled a 1 , a 1 , resp. in fr (a 1 ) to the a 1 -labeled (resp, a 1 -labeled) edges in G; vertices are mapped respectively: wards, no further context-sensitive fusion rule is applicable. Given a finite hypergraph, the set of all possible successive fusions is finite as fusion rules never create anything. To overcome this limitation, arbitrary multiplications of disjoint components within derivations are allowed. The generated language consists of the terminal part of all resulting connected components that contain no fusion symbols and at least one marker symbol, where marker symbols are removed in the end. These marker symbols allow us to distinguish between wanted and unwanted terminal components. If for every A ∈ F a rule in P exists and every rule is context-free, then all rules are specified F and CSFG is a fusion grammar as defined in [2] . P is obsolete. In this section, we model Post correspondence problems (PCPs) by means of context-sensitive fusion grammars in such a way that a PCP is solvable if the generated language of the corresponding grammar consists of a single vertex and that a PCP is not solvable if the language is empty. Therefore, it turns out that the emptiness problem and the membership problem for context-sensitive fusion grammars are undecidable. The Post correspondence problem is defined as follow. Given a finite set of pairs {(u 1 , v 1 ), (u 2 , v 2 ), . . . , (u k , v k )} with u i , v i ∈ Σ * for some finite alphabet Σ. Does there exist a sequence of indices i 1 · · · i n with n > 0 such that u i1 · · · u in = v i1 · · · v in ? In terms of fusion, the pairs may be copied and fused in order to concatenate the strings. However, one needs a recognition mechanism to decide whether u i1 · · · u in = v i1 · · · v in or not. This recognition procedure is expressible by means of context-sensitive fusion. The proof of the theorem is based on the following lemmata. Let G = dsg(u 1 · · · u n , u 1 · · · u n ) be the hypergraph consisting of two string graphs sg(u 1 · · · u n ) and sg(u 1 · · · u n ) with u 1 , . . . , u n ∈ Σ where the first vertex of both string graphs is the same. i.e., Proof. Induction base: n = 0. dsg(λ, λ) = [1] because by definition sg(λ) is the discrete graph [1] by construction of dsg these two vertices are identified yielding the discrete graph [1] . Hence, dsg(λ, λ) 0 =⇒ [1] . Induction step: Given G = dsg(u 1 · · · u n+1 , u 1 · · · u n+1 ). Then reduce(u 1 ) can be applied because by construction of dsg(u 1 · · · u n+1 , u 1 · · · u n+1 ) the two complementary u 1 -and u 1 -labeled hyperedges share a common source vertex yielding Proof. The statement follows directly from the fact that the two rules do not share fusion symbols such that they matches are hyperedge disjoint and that the context conditions of reduce(x) only requires a commonly shared source for the two hyperedges. Proof (of Theorem 1). Let S = {(u 1 , v 1 ), (u 2 , v 2 ), . . . , (u k , v k )}. Let i 1 · · · i n be a solution to S, i.e., u i1 · · · u in = v i1 · · · v in . Let m 1 , . . . , m k be the number of occurrences of (u j , v j ) in the sequence except the first. Then there exists a derivation . . i n is a solution to S; and (4) the two connected complementary strings graphs can be erased by successive applications of reduce(x) for suitable x due to Lemma 1. Hence, • ∈ L(CSFG(S)). Then there exists a derivation Z S * = X + µ for some hypergraph X. A μ is the only connected component with marker in the start hypergraph, therefore, µ must stem from A μ . The only possibility to get rid of the A-hyperedge without attaching a new one is the application of fr (A) to A μ and some init(x 1 x 2 · · · x w1 , y 1 y 2 · · · y w2 ) with x j , y j ∈ Σ where the latter connected component is obtained from respective multiplications and the successive fusion wrt fr (A) to some init(u i1 , v i1 )+cont(u i2 , v i2 )+. . .+cont(u in , v in ) for some n and possibly applications of reduce(x) for suitable x. Due to Lemma 2 all the applications of reduce(x) can be shifted behind the applications of fr (A) and due to [2, Corollary 1] all the multiplications can be done as initial derivation step. To obtain µ the two connected complementary strings graphs must be erased by successive applications of reduce(x 1 ), . . . , reduce(x w1 ). If x 1 · · · x w1 is a proper prefix of y 1 · · · y w2 , i.e., y 1 · · · y w2 = x 1 · · · x w1 y w1+1 · · · y w2 , then one gets µ y w1+1 . . . y w2 , and analogously if y 1 · · · y w2 is a proper prefix of x 1 · · · x w1 , then one gets µ x w2+1 . . . x w1 . This implies w 1 = w 2 and y i = x i for 1 ≤ i ≤ w 1 must hold. Because The second statement is a direct consequence of the first. Other connected components do not contribute to the language due to the lack of μ-hyperedges. In this section, we prove that context-sensitive fusion grammars can generate all recursively enumerable string languages. For every Chomsky grammar one can construct a corresponding context-sensitive fusion grammar such that the generated languages of the corresponding grammars coincide up to representation. is the corresponding context-sensitive fusion grammar where Schematic drawings of some connected components of the start hypergraph are depicted in Fig. 1 . The proof is based on the following fact. We recall some details of the proof because we will refer to them in the proof of Theorem 2. Remark 3. L 0 = rev(rev(L 0 )) = rev(L(G)) = L = \L P , where rev(L 0 ) = {r(w) | w ∈ L 0 } where r(w) = x n · · · x 1 for w = x 1 · · · x n and G = (N, T, P, S) is a Chomsky grammar with L(G) = rev(L 0 ). The basic idea is that for each w ∈ L(G) exists a derivation S = w 1 → w 2 → · · · → w n−1 → w n → w n+1 = w with w i = x i u i y i and w i+1 = x i v i y i where u i :: = v i ∈ P for i = 1, . . . , n , i.e., S = x 1 u 1 y 1 → x 1 v 1 y 1 = x 2 u 2 y 2 → · · · → x n−1 v n−1 y n−1 = x n u n y n → x n v n y n = w. L = captures the relation x i v i y i = x i+1 u i+1 y i+1 and L P captures the relation x i u i y i → x i v i y i . 1 L = and L P are linear. The following grammars generate them. Two derivations may be X 0 =⇒ aX 0 a =⇒ aAX 1 baa =⇒ aAbX 1 bbaa =⇒ aAbcX 2 cccbbaa =⇒ aAbcAX 3 bAacccbbaa =⇒ aAbcAccbAacccbbaa = d and Y 0 =⇒ Y 1 ccc =⇒ aY 1 accc =⇒ aAY 1 Aaccc =⇒ aAbY 1 bAaccc =⇒ aAbcAccbAaccc = z. Removing the prefix z from d yields bbaa. Every context-free string grammars can be transformed into fusion grammars generating the same language up to representation of strings as graphs as the following construction shows. Because only dsg(X 0 , Y 0 ) μ contains a μ-hyperedge this connected component is substantial for some derived connected component contributing to the generated language. W.l.o.g. one can assume that dsg(X 0 , Y 0 ) μ is never multiplied due to the following reasoning. Let C be a connected component derivable from Z. Let # μ : H Σ → N be a mapping of hypergraphs over Σ to the number of μ-labeled hyperedges in the respective hypergraph. Then # μ (C) ≤ 1, i.e., no two or more copies of dsg(X 0 , Y 0 ) μ contribute to C as the following reasoning indicates. For each C ∈ C(Z) # μ (C) ≤ 1 by construction. For each C / ∈ C(Z) assume Z * for some k, l ∈ N where C 1 and C 2 are two connected components and # μ ( Further, r must be a context-free fusion rule because the positive context conditions of reduce(x) restrict that both hyperedges must be attached to a common source vertex which is not possible if C 1 and C 2 are two connected components. Let fr (A) be the applied context-free fusion rule, A ∈ {Y 0 , Y 1 , X 1 , X 2 , X 3 , X 4 }. W.l.o.g. let A be the label of the hyperedge in C 1 and let A be the label of the hyperedge in C 2 . Furthermore, it is sufficient to analyze the case # μ (C i ) = 1 for i = 1, 2. However, # μ (C i ) = 1 implies that dsg(X 0 , Y 0 ) μ contributes to C i but because the linear structure of the rules in P = and P P carries over to the connected components C 2 cannot contain both a μ-and a A-labeled hyperedge. Hence, the assumption must be false. The fusion rules wrt Y 0 , Y 1 , X 0 , X 1 , X 2 , X 3 are context free and thus one connected component or two connected components with two complementarily labeled hyperedges from this subset can be fused arbitrarily. This may produce connected components without markers where all the hyperedges labeled with Y 0 , Y 0 , Y 1 , . . . , X 3 , X 3 are fused. E.g. sg(xY 1 x) Y1 may be multiplied several times and all the complementary Y 1 -and Y 1 -hyperedges can be fused yielding two circles. However this connected component is not fusible to some other connected component because now it is only labeled with fusion symbols {N ∪ T ∪ {c}} but for these symbols the fusion is restricted to take only place if the two complementary hyperedges are attached to the same vertex. A similar argument can be applied to other cases wrt connected components with X i -hyperedges. The direct derivations steps can be interchanged 3 in such a way that one gets a derivation of the following form: Hence, Y = sg(w 1 · · · w m ) μ . The linear structure of the connected components gives us w 1 · · · w m ∈ L(G). In this paper, we have continued the research on context-dependent fusion grammars. We have introduced context-sensitive fusion grammars and have showed that the Post correspondence problem can be formulated very intuitively by such a grammar. Afterwards, we have showed that every Chomsky grammar can be simulated by a corresponding context-sensitive fusion grammar. Hence, they can generate all recursively enumerable string languages (up to representation of strings as graphs). This improves the previous result presented in [5] showing that context-dependent fusion grammars (with positive and negative context-conditions) are another universal computing model. However, further research is needed including the following open question. Is it true, that fusion grammars without context-conditions are not universal? Are also only negative context conditions powerful enough to simulate Chomsky grammars? If so is also a single negative context-condition sufficient? One may also investigate fusion grammar with other regulations like priorities or regular expressions. Molecular computation of solutions to combinatorial problems Fusion grammars: a novel approach to the generation of graph languages Relating DNA computing and splitting/fusion grammars Transformation of petri nets into contextdependent fusion grammars Transformation of turing machines into context-dependent fusion grammars A variant of a recursively unsolvable problem DNA Computing -New Computing Paradigms We are grateful to Hans-Jörg Kreowski and Sabine Kuske for valuable discussions. We also thank the reviewers for their valuable comments. Proof. 1. Each context-free string grammar G can be transformed into a hyperedge replacement grammar with connected right hand sides. Hence, the transformation of hyperedge replacement grammars into fusion grammars (cf. [2] ) can be applied yielding FG(G). Remark 4. The connected components in the start hypergraphs of the contextsensitive fusion grammar in Construction 2 are hypergraph representation of the rules of the two linear string grammars (cf. Construction 3) slightly modified. The connected components in Z = are constructed for the linear rules in G = such that each symbol in N ∪ T ∪ {c} is complemented and for each T -symbol the primed copy is used instead. The connected components for the linear rules in G P containing X 0 and X 1 are constructed such that they contain fusion symbols left and terminal symbols right of the X i -labeled hyperedge. Again for each terminal symbol the primed copy is used instead. The other connected components use the standard construction and are therefore only fusion symbol labeled (replacing also terminal symbols by their primed copy).Proof (of Theorem 2). Let w ∈ L(G). Then w ∈ L = \L P by Fact 1 and there are derivations in G = and G P with Y 0 * → u and X 0 * → uw with u = u 1 · · · u n and w = w 1 · · · w m . For each of these derivations exists by Lemma 3 a derivation in the corresponding fusion grammar (FG(G = ), FG(G P ), resp. where G = and G P are defined in Remark 3). Because the nonterminal alphabets of G = and G P are disjoint and the connected component dsg(X 0 , Y 0 ) μ contains two hyperedges one labeled with each start symbol of the two linear string grammars there is a derivation . . . w 1 w m µ + [n]. . Consequently, sg(w 1 · · · w m ) ∈ L(CSFG(G)).