key: cord-0046135-tx5z2nep authors: Vu, Martin; Fernau, Henning title: Insertion-Deletion Systems with Substitutions I date: 2020-06-24 journal: Beyond the Horizon of Computability DOI: 10.1007/978-3-030-51466-2_33 sha: 4855e15d1e3ad9bbfcba4f1d857a2cb929e9fe54 doc_id: 46135 cord_uid: tx5z2nep With good biological motivation, we add substitutions as a further type of operations to (in particular, context-free) insertion-deletion systems. This way, we obtain new characterizations of and normal forms for context-sensitive and recursively enumerable languages. Insertion-deletion systems, or ins-del systems for short, are well established as computational devices and as a research topic within Formal Languages throughout the past nearly 30 years, starting off with the PhD thesis of Kari [3] . However, from its very beginning, papers highlighting the potential use of such systems in modelling DNA computing also discussed the replacement of single letters (possibly within some context) by other letters, an operation called substitution in [2, 4] . Interestingly, all theoretical studies on grammatical mechanisms involving insertions and deletions omitted including the substitution operation in their studies. With this paper, we are stepping into this gap by studying ins-del systems with substitutions, or ins-del-sub systems for short. We put special emphasis on extending context-free ins-del systems with substitutions. We observe quite diverse effects, depending on whether the substitutions are context-free, one-sided or two-sided. We can characterize the contextsensitive languages by extending context-free insertion systems with substitutions, which can be seen as a new normal form for monotone Chomsky grammars. For omitted proofs and further results, see [13] . An ins-del system is a 5-tuple ID = (V, T, A, I, D), consisting of two alphabets V and T with T ⊆ V , a finite language A over V , a set of insertion rules I and a set of deletion rules D. Both sets of rules are formally defined as sets of triples of the form (u, a, v) with a, u, v ∈ V * and a = λ. We call elements occurring in T terminal symbols, while referring to elements of V \T as nonterminals. Elements of A are called axioms. Let w 1 uvw 2 , with w 1 , u, v, w 2 ∈ V * , be a string. Applying the insertion rule (u, a, v) ∈ I inserts the string a ∈ V * between u and v, which results in the string w 1 uavw 2 . The application of a deletion rule (u, a, v) ∈ D results in the removal of an substring a from the context (u, v). More formally let w 1 uavw 2 ∈ V * be a string. Then, applying (u, a, v) ∈ D results in the string w 1 uvw 2 . We define the relation =⇒ as follows: Let x, y ∈ V * . Then we write x =⇒ ins y if y can be obtained by applying an insertion or deletion rule to x. We also write (u, a, v) ins or (u, a, v) del to specify whether the applied rule has been an insertion or a deletion rule. Consider (u, a, v) ins or (u, a, v) del . Then we refer to u as the left context and to v as the right context of (u, a, v) ins /(u, a, v) del . Let ID = (V, T, A, I, D) be an ins-del system. The language generated by ID is defined by L(ID) = {w ∈ T * | α =⇒ * w, α ∈ A}. The size of ID describes its descriptional complexity and is defined by a tuple (n, m, m ; p, q, q ), where By INS m,m n DEL q,q p we denote the family of all ins-del systems of size (n, m, m ; p, q, q ) [1, 12] . Depending on the context, we also denote the family of languages characterized by ins-del systems of size (n, m, m ; p, q, q ) by INS m,m n DEL q,q p . We call a family INS 0,0 n DEL 0,0 p a family of context-free ins-del systems, while we call a family INS m,m n DEL q,q p with m + m > 0 ∧ mm = 0 or (q + q > 0 ∧ qq = 0) a family of one-sided ins-del systems. According to [1] , an ins-del system of size (n, m, m ; p, q, q ) is said to be in normal form if -for any (u, a, v) ∈ I , it holds that |a| = n, |u| = m and |v| = m , and -for any (u, a, v) ∈ D , it holds that |a| = p, |u| = q and |v| = q . Alhazov et al. [1, 12] have shown the following auxiliary result: Theorem 1. For every ins-del system ID, one can construct an insertiondeletion system ID in normal form of the same size with L(ID ) = L(ID). In the following sections, we use a modified normal form for ins-del systems of size (1, 1, 1; 1, 1, 1). Given an arbitrary ins-del system of size (1, 1, 1; 1, 1, 1), the construction of this modified normal form is as follows: Construction 1. Let ID = (V, T, A, I, D) be an ins-del system of size (1, 1, 1; 1, 1, 1). Without loss of generality, we assume {$, X} ∩ V = ∅ and $ = X. We construct ID = (V ∪ {$, X}, T, A , I , D ) as follows: The basic idea of Construction 1 is the same as in the usual normal form constructions (see Theorem 1): the symbol $ is used as a padding symbol to ensure that the left and right contexts of all rules are of the required size. We can show that Construction 1 is equivalent to the usual normal form construction. Unlike the usual normal form construction, context-free deletions can only occur at the beginning and the end of a sentential form in the case of Construction 1. This fact will prove useful below. Ins-del systems have been extensively studied regarding the question if they can describe all of the recursively enumerable languages. Let us summarize these results first by listing the classes of languages known to be equal to RE: INS [6] . We define substitution rules to be of the form (u, a → b, v); u, v ∈ V * ; a, b ∈ V . Let w 1 uavw 2 ; w 1 , w 2 ∈ V * be a string over V . Then applying the substitution rule (u, a → b, v) allows us to substitute a single letter a with another letter b in the context of u and v, resulting in the string w 1 ubvw 2 . Formally, we define an ins-del-sub system to be a 6-tuple ID r = (V, T, A, I, D, S), where V, T, A, I and D are defined as in the case of usual ins-del systems and S is a set of substitution rules. Substitution rules define a relation =⇒ sub : Let x = w 1 uavw 2 and y = w 1 ubvw 2 be strings over V . We write x =⇒ sub y iff there is a substitution rule (u, a → b, v). In the context of ins-del-sub systems, we write= ⇒ to denote any of the relations =⇒ ins , =⇒ del or =⇒ sub . We define the closures= ⇒ * and= ⇒ + as usual. The language generated by an ins-del-sub system ID r is defined as L(ID r ) = {w ∈ T * | α= ⇒ * w, α ∈ A}. As with usual ins-del system, we measure the complexity of an ins-delsub system ID r = (V, T, A, I, D, S) via its size, which is defined as a tuple (n, m, m ; p, q, q ; r, r ), where n, m, m , p, q and q are defined as in the case of usual ins-del systems, while r and r limit the maximal length of the left and right context of a substitution rule, respectively, i.e., r = max{|u| | (u, a → b, v) ∈ S}, r = max{|v| | (u, a → b, v) ∈ S}. INS m,m n DEL q,q p SUB r,r denotes the family of all ins-del-sub systems of size (n, m, m ; p, q, q ; r, r ). Note that, as only one letter is replaced by any substitution rule, there is no subscript below SUB. Depending on the context, we also refer to the family of languages generated by ins-del-sub systems of size (n, m, m ; p, q, q ; r, r ) by INS m,m n DEL q,q p SUB r,r . Expanding our previous terminology, we call substitution rules of the form (λ, a → b, λ) contextfree, while substitution rules of the form (u, Let R be the the reversal (mirror) operator. For a language L and its mirror L R the following lemma holds. We will now define the term resolve. Let ID r = (V, T, A, I, D, S) be an insdel-sub system. We say that a nonterminal X of ID r is resolved if X is either deleted or substituted. It is easy to see that in any terminal derivation of ID r all nonterminals must be resolved at some point of the derivation. We remark that a nonterminal X may be resolved by being substituted with a nonterminal Y , which in turn must be resolved. As in the case of ins-del systems without substitution rules, we define a normal form for ins-del-sub systems. An ins-del-sub system of size (n, m, m ; p, q, q ; r, r ) is said to be in normal form if -for any (u, a, v) ∈ I, it holds that |a| = n, |u| = m and |v| = m ; -for any (u, a, v) ∈ D, it holds that |a| = p, |u| = q and |v| = q ; -for any (u, a → b, v) ∈ S, it holds that |u| = r and |v| = r . For every ins-del-sub system ID r of size (n, m, m ; p, q, q ; r, r ), one can construct an ins-del-sub system ID r of the same size in normal form, with L(ID r ) = L(ID r ). Proof. Let ID r = (V, T, A, I, D, S) be an ins-del-sub system of size (n, m, m ; p, q, q ; r, r ). The basic idea is similar to the normal form construction for insdel systems in [1, 12] . In fact, the sets of insertion and deletion rules of ID r = (V ∪ {$}, T, A , I , D ∪ {(λ, $, λ)}, S ) are constructed as in the ins-del system normal form construction. S and A are defined as follows: As in Theorem 1, $ is a new symbol, that is, $ / ∈ V , which is introduced to be the padding symbol. Let shown by induction. While the only-if part follows easily, consider the following for the if part. We can assume that in any derivation of ID r the first i and the last j letters of the axiom are not deleted until the very end of derivation. Hence, insertion rules of the form (z 1 , $ n , z 2 ) with z 1 ∈ ($ ∪ V ) m , z 2 ∈ ($ ∪ V ) m are applicable until the very end. It is clear that due to insertion rules of the form (z 1 , $ n , z 2 ) and the deletion rule (λ, $, λ) it is possible to generate an arbitrary number of $ at an arbitrary position of a sentential form of ID r . The following result will be useful in subsequent proofs; compare to Lemma 1. Let L be a family of languages that is closed under reversal. Then: Due to the definition of ins-del-sub systems, the following result is clear. Whether this inclusion is proper, is the question, that will be addressed in the following sections. We will see that while in some cases an arbitrary system of size (n, m, m ; p, q, q , r, r ) can be simulated by a system of size (n, m, m ; p, q, q ), this is not the general case. Furthermore, we will see that families INS m,m n DEL q,q p , which are not computationally complete, may reach computational completeness via an extension with substitution rules. Additionally, we will see below that families of ins-del systems which are equally powerful may no longer be after being extended with the same class of substitution rules, i.e., we have DEL q2,q 2 p2 SUB r,r . The reverse case might occur, as well. Because the application of an insertion rule (u, x, v) corresponds to the application of the monotone rewriting rule uv → uav and the application of a substitution rule (u, a → b, v) corresponds to the application of the monotone rewriting rule uav → ubv, a monotone grammar can simulate derivations of an insertion-substitution system. (More technically speaking, we have to do the replacements on the level of pseudo-terminals N a for each terminal a and also add rules N a → a, but these are minor details.) Hence, we can conclude: We will focus on context-free ins-del systems, which are extended with substitution rules. More precisely, we will analyze the computational power of the family of systems INS 0,0 n DEL 0,0 p SUB r,r . We are going to analyze substitution rules of the form (λ, a → b, λ), which means that letters may be substituted regardless of any context. We will show that extending context-free ins-del systems with context-free substitution rules does not result in a more powerful system. In fact, a context-free ins-del-sub system of size (n, 0, 0; p, 0, 0; 0, 0) can be simulated by an ins-del system of size (n, 0, 0; p, 0, 0). (λ, a → b, λ) , has been introduced by either an insertion rule (λ, w 1 aw 2 , λ) or as part of an axiom w 1 aw 2 at some point before executing the substitution. As a serves no purpose other than to be replaced (i.e., it is not used as a context), the basic idea is to skip introducing a altogether and introduce b instead. More formally: instead of applying an insertion rule (λ, w 1 aw 2 , λ)/an axiom w 1 aw 2 and replacing a via (λ, a → b, λ) at a later point, we introduce a new insertion rule (λ, w 1 bw 2 , λ)/a new axiom w 1 bw 2 , which we apply instead of (λ, w 1 aw 2 , λ)/w 1 aw 2 . This idea can be cast into an algorithm to produce an ins-del system ID = (V, T, A , I , D) with L(ID r ) = L(ID). As only context-free insertion rules of size maximum (n, 0, 0) are added to I , it is clear that Considering the question about the generative power of context-free ins-del systems with context-free substitution rules compared to usual context-free insdel systems, Theorem 5 and Lemma 3 together yield: The language generated by ID r is L(ID r ) = {w | w ∈ {a, b, c} * , |w| = 3n, n ∈ N}. Using the construction introduced in Theorem 5 yields the ins-del system ID = ({a, b, c}, {a, b, c}, {λ}, I, ∅) , While it is clear that L(ID) = L(ID r ), we remark that this example shows that the construction method of Theorem 5 may yield an ins-del system whose number of rules is exponentially greater than the number of rules of the system with substitutions. Now, we will analyze the effect of one-sided substitution rules if used to extend a context-free ins-del system. We will show that using one-sided substitution rules can greatly increase the computational power of context-free insertion and deletion rules. In some cases, we even get computationally completeness results. We will now construct an ins-del-sub system ID r of size (1, 0, 0; 1, 0, 0; 1, 0) which simulates ID r of size (1, 1, 0; 1, 1, 0; 1, 0). The system ID r is constructed in the following manner: Construction 2. We assume the system ID r = (V, T, A, I, D, S) to be in normal form and any rule of ID r to be labelled in a one-to-one manner, i.e., there is a bijection between a set of labels and the rule set. Let $ be the padding symbol used in the construction of the normal form of ID r . The system ID r = (V , T, A, I , D , S ∪ S ) is constructed as follows. For each rule of ID r , we introduce a new nonterminal X i and define The set I of insertion rules of ID r contains all (λ, X i , λ), where i is the label of an insertion rule of ID r , while the set D of deletion rules as contains (λ, $, λ) and all (λ, X i , λ), where i is the label of a deletion rule of ID r . Furthermore, we define the set of substitution rules S = S 1 ∪ S 2 , with Each deletion rule (u, a, λ) ∈ D of ID r , where i is the label of (u, a, λ), corresponds to a deletion rule (λ, X i , λ) ∈ D and a substitution rule (u, a → X i , λ) ∈ S 2 of ID r . The basic idea of the construction is to simulate a deletion rule (u, a, λ) ∈ D by substituting the letter a with left context u via (u, a → X i , v) ∈ S 2 . The introduced nonterminal X i is then deleted at some point by the deletion rule (λ, X i , λ) ∈ D . It is clear that a derivation of the form in which the application of (λ, X i , λ) ∈ D succeeds an application of (u, a → X i , λ) ∈ S 2 immediately, is equivalent to the application of a deletion rule (u, a, v) ∈ D. It needs much more care to prove the following converse: Proposition 1. Let α ∈ A. Consider a derivation α= ⇒ * w ∈ T * of ID r . Then, there is an alternative derivation of ID r , leading from α to w, in which all nonterminals X i ∈ V \V are resolved immediately after being introduced. This allows us to state: {(b, X 1 → a, λ), (a, X 2 → b, λ), (b, X 3 → a, λ) }. The generated language is L(ID s ) = (ba) + , as we can easily see that any generated word begins with a letter b and ends with a letter a. Furthermore, any word generated by ID s is not of the form w 1 bbw 2 , as the only way to introduce the terminal symbol b (except for the b introduced via the axiom) is by substituting a nonterminal X 2 with b. However, this substitution requires a left context a, which means that at some point the letter to the left of any b has been a. There are no insertion rules which can insert an additional b or X 2 between a and b. Furthermore, there are no deletion rules at all, which means that no a can be deleted. Therefore, the letter to the left of any b cannot be another b. Using the same argumentation, we can see, that any word generated by ID s is not of the form w 1 aaw 2 , either. It is easy to see that a result identical to Theorem 6 can be shown analogously for the mirrors of INS 0,0 1 DEL 0,0 1 SUB 1,0 and INS 1,0 1 DEL 1,0 1 SUB 1,0 . Therefore: We now analyze the computational power of ins-del-sub systems of size (2, 0, 0; 2, 0, 0; 0, 1). While the family of ins-del systems of size (2, 0, 0; 2, 0, 0) is known to be a proper subset of CF, see [11, 12] , we will show that an extension with substitution rules of the form (λ, A → B, C) results in a significant increase in computational power. More precisely, by simulating an ins-del systems of size (2, 0, 1; 2, 0, 0), we will show that INS 0,0 2 DEL 0,0 2 SUB 0,1 = RE holds. The basic idea of Construction 3 is essentially the same as in Construction 2: as context-free insertion rules cannot scan for contexts (by definition), this task is handled by the corresponding substitution rules. Consider an insertion rule (λ, N i,2 N i,1 , λ) of ID r where i is the label of an insertion rule (λ, ab, c) ∈ I. Then the substitution rules, corresponding to this rule, are (λ, N i,1 → b, c) and (λ, N i,2 → a, b) . This idea leads us to: As INS 0,1 2 DEL 0,0 2 = RE holds according to [8, Theorem 5] , we conclude: This is an interesting result as the families of ins-del systems of size (2, 0, 0; 2, 0, 0) and of size (2, 0, 0; 0, 0, 0) are known to be equal [12, Theorem 4.7 ], yet both classes extended with the same class of substitution rules differ in computational power. As RE and CS are closed under reversal, the next corollary follows with Lemma 2 and Theorem 4. After analyzing the effect of context-free and one-sided substitution rules on context-free ins-del systems, we will now proceed to two-sided substitution rules, i.e., substitution rules with left and right context. Somehow surprisingly, this lifts the computational power of even the 'weakest' ins-del systems, that is, systems of size (1, 0, 0; 1, 0, 0), up to the level of RE. Let ID ∈ INS 1,1 1 DEL 1,1 1 . We will show that there is a system ID r ∈ INS 0,0 1 DEL 0,0 1 SUB 1,1 capable of simulating ID. The basic idea is that the context checks, necessary for simulating rules with left and right context, are performed by the substitution rules. The system ID r is constructed in the following manner: The basic idea is similar to Construction 2. Each deletion rule (u, a, v) ∈ D of ID, where i is the label of (u, a, v), corresponds to a deletion rule (λ, X i , λ) ∈ D and a substitution rule (u, a → X i , v) ∈ S 2 . We leave the context checks to the substitution rules. The same idea is applied to the insertion rules. With some technical effort, we can prove the following result. Let α ∈ A. Consider a derivation α= ⇒ * w ∈ T * . Then, there is an alternative derivation, leading from α to w, in which all nonterminals X i ∈ V \V are resolved immediately after being introduced. This property is the key to show that for a system ID r of size (1, 0, 0; 1, 0, 0; 1, 1) constructed from a given ins-del system ID of size (1, 1, 1; 1, 1, 1) in normal form according to Construction 4, we find L(ID) = L(ID r ). As such ins-del systems are known to be computational complete, we conclude: We now analyze the power of ins-del-sub systems of size (1, 0, 0; 0, 0, 0; 1, 1). By definition, it is clear that INS 0,0 1 DEL 0,0 0 SUB 1,1 ⊆ INS 0,0 1 DEL 0,0 1 SUB 1,1 holds. In the following, we will show that this inclusion is proper. To be more precise, we will show that ins-del-sub systems of size (1, 0, 0; 0, 0, 0; 1, 1) characterize the context-sensitive languages. By Theorem 4, we are left to prove CS ⊆ INS 0,0 1 DEL 0,0 0 SUB 1,1 . For every context-sensitive language L, there is a linear bounded automaton (LBA) LB = (Q, T, Γ, q 0 , δ, , F ) accepting L. We are going to construct an insdel-sub system of size (1, 0, 0; 0, 0, 0; 1, 1) to simulate LB. We first give a brief sketch of the basic idea behind this simulation in the following paragraph. The simulation evolves around strings of the form with u 1 , . . . , u n ∈ T ; q j ∈ Q and v 1 , . . . , v n ∈ Γ . The concatenation of the first component of each tuple, that is, u 1 . . . u n , is the input word of the linear bounded automaton LB, while the concatenation of the second component of each tuple, that is, an accepting configuration, i.e., q i ∈ F , we substitute all tuples with their respective first component. For instance (u k , v k ) is substituted with u k . In short, this means that if (the simulation of) LB running on u 1 . . . u n halts in an accepting configuration, we generate the word u 1 . . . u n . More details follow: ,qibr) ,r , X (a,lqib),l | a ∈ T, b ∈ Γ, q i ∈ Q, r ∈ R, l ∈ L} generates the language L by simulating LB. Strings of the form consist of symbols in V 2 , while the symbols in V 1 are auxiliary symbols used to generate such strings. The symbols in V 3 are used to simulate LB's transitions. We define A = {X 0 (a, a#) | a ∈ T } ∪ {a | a ∈ L ∩ T } and I = {(λ, X a , λ) | a ∈ T }. If λ ∈ L, we add λ to the axiom. The set of substitution rules is defined as we collect substitution rules used to initialize the simulation. The substitution rules in the set S N are used to simulate the application of a transition δ(q i , b) (q j , c, N) . S N consists of substitution rules of the form c, N) . The substitution rules in the set S L are used to simulate left moves. For each transition δ(q i , b) (q j , c, L) of LB, we add substitution rules (λ, (a, q i br) → X (a,cr);qj ;L , λ), (λ, (d, le) → X (d,lqj e),l , X (a,cr);qj ;L ), (X (d,lqj e),l , X (a,cr);qj ;L → (a, cr), λ), (λ, X (d,lqj e),l → (d, lq j e), (a, cr)) to S L with a, d ∈ T, b, c, e ∈ Γ, q i , q j ∈ Q, l ∈ L, r ∈ R. Similarly, the substitution rules in S R can simulate right moves δ(q i , b) (q j , c, R) with: (λ, (a, lq i b) → X (a,lc);qj ;R , λ), (X (a,lc);qj ;R , (d, er) → X (d,qj er),r , λ) (λ, X (a,lc);qj ;R → (a, lc), X (d,qj er),r ), ((a, lc), X (d,qj er),r → (d, q j er), λ) . The set S endmarker,L consists of substitution rules of the form with a ∈ T, b, c ∈ Γ, q i , q j ∈ Q, δ(q i , b) (q j , c, L) and δ(q i , $) (q j , $, R). The set S endmarker,R consists of substitution rules of the form (λ, (a, q i b#) → (a, cq j #), λ), (λ, (a, bq i #) → (a, q j b#), λ) with a ∈ T, b, c ∈ Γ, q i , q j ∈ Q, δ(q i , b) (q j , c, R) and δ(q i , #) (q j , #, L). Both sets are used for the simulation of δ(q i , b) (q j , c, L) and δ(q i , b) (q j , c, R) as well, in the case the read/write head moves to/from an endmarker. The set S final = S f1 ∪ S f2 is used to generate a word w ∈ T * if w has been accepted by the simulated linear bounded automaton LB. S f1 consists of the substitutions (λ, (a, q f b) → a, λ), (λ, (a, $q f b) → a, λ), (λ, (a, q f $b) → a, λ), (λ, (a, q f b#) → a, λ), (λ, (a, bq f #) → a, λ) and S f2 consists of (λ, (a, b) → a, c), (c, (a, b) → a, λ), (λ, (a, $b) → a, c), (c, (a, b#) → a, λ) with a, c ∈ T, b ∈ Γ, q f ∈ F . Working out the correctness of this construction, we can show: Theorem 8. INS 0,0 1 DEL 0,0 0 SUB 1,1 = CS. As a consequence of Theorem 8, we can formulate the following Penttonenstyle normal form theorem for context-sensitive languages. We believe that this could be useful in particular when dealing with variations of insertion systems. Allowing erasing productions on top, we also arrive at a new characterization of the family of recursively enumerable languages. By different methods, we could even prove that either of the two non-context-free forms suffices to achieve RE. We have shown that the addition of substitution rules to ins-del systems yields new characterizations of RE and CS. In particular we have shown the following equalities: INS 0,0 2 DEL 0,0 2 SUB 1,0 = RE, INS 0,0 1 DEL 0,0 1 SUB 1,1 = RE and INS 0,0 1 DEL 0,0 0 SUB 1,1 = CS. Additionally we have shown INS 1,0 1 DEL 1,0 1 SUB 1,0 = INS 0,0 1 DEL 0,0 1 SUB 1,0 . While in the above cases an extension with (non-contextfree) substitution rules leads to an increase in computational power, we have also shown that the addition of context-free substitution rules to context-free ins-del systems does not affect the computational power. The main open question is if INS 0,0 1 DEL 0,0 1 SUB 1,0 is computationally complete. We conjecture this not to be the case, as with only left context rules, information can only be propagated in one direction. Yet, should INS 0,0 1 DEL 0,0 1 SUB 1,0 equal RE, this would provide an interesting new normal form. A minor open question is the strictness of the inclusion INS 0,0 2 DEL 0,0 0 SUB 0,1 ⊆ CS. Small size insertion and deletion systems Computing with DNA On insertions and deletions in formal languages DNA computing: arrival of biological mathematics Further results on insertion-deletion systems with one-sided contexts Computational power of insertiondeletion (P) systems with rules of size two Context-free insertiondeletion systems Insertion-deletion systems with onesided contexts DNA Computing: New Computing Paradigms On the computational power of insertion-deletion systems On minimal context-free insertion-deletion systems Recent developments on insertion-deletion systems On insertion-deletion systems with substitution rules