key: cord-0046128-is6a5f7y
authors: Konstantinidis, Stavros
title: Theoretical and Implementational Aspects of the Formal Language Server (LaSer)
date: 2020-06-24
journal: Beyond the Horizon of Computability
DOI: 10.1007/978-3-030-51466-2_25
sha: b396d4b321028e1addbbf19837f86d09037fe7fc
doc_id: 46128
cord_uid: is6a5f7y

LaSer, the formal language server, allows a user to enter a question about an independent language and provides an answer either in real time or by generating a program that can be executed at the user’s site. Typical examples of independent languages are codes in the classic sense, such as prefix codes and error-detecting languages, or DNA-computing related codes. Typical questions about independent languages are the satisfaction, maximality and construction questions. We present some theoretical and implementational aspects of LaSer, as well as some ongoing progress and research plans.

LaSer, [18] , the formal language server, allows a user to enter a question about an independent language and provides an answer either in real time or by generating a program that can be executed at the user's site. Typical examples of independent languages are codes in the classic sense, such as prefix codes and error-detecting languages, or DNA-computing related codes. Typical questions about independent languages are the satisfaction, maximality and construction questions. For example, the satisfaction question is to decide, given an independence I and a regular language L, whether L is independent with respect to I.

We present some theoretical and implementational aspects of LaSer, as well as some ongoing progress and research plans.

Let R be a binary relation, that is, a subset of Σ * × Σ * , where Σ is an alphabet. A language L is R-independent, [21, 22] , if Examples of R-independent languages are error-detecting languages for various error combinations, variable length codes, such as prefix codes and suffix codes, as well as DNA-related languages [2, 5, 6, 10, [12] [13] [14] 23] . LaSer allows users to represent rational relations via transducers 1 , and regular languages via NFAs 2 . The satisfaction question is whether L(a) is R(t)-independent, given an NFA a and a transducer t. If the answer is NO, a witness word pair (x, y) is computed such that x = y and one of (x, y) and (y, x) is in R(t). The maximality question is whether L(a) is a maximal R(t)-independent language knowing that L(a) is R(t)-independent. If the answer is NO, a witness word x / ∈ L(a) is computed such that L(a) ∪ {x} is R(t)-independent. The construction question is to make an n-element language (if possible) that is R(t)-independent, given transducer t, integer n > 0 and the size of the alphabet.

If t is a transducer then the following language class

is called the independence, or property, described by t. In this case any language L ∈ P t is said to satisfy P t . For example, the independence "prefix codes" is described by the transducer px and the independence "2-synchronization-error detecting languages" is described by the transducer id 2 (see Fig. 1 ). An arrow with label a/a denotes multiple transitions: one with label a/a for each a ∈ Σ, and similarly for labels a/λ. An arrow with label a/a denotes multiple transitions: one with label a/a for all a, a ∈ Σ with a = a . Let x be any word. We have: px(x) = the set of proper prefixes of x, equivalently, R(px) = the set of word pairs (x, y) such that y is a proper prefix of x; sx(x) = set of proper suffixes of x; sub1(x) = set of words resulting by substituting at most one symbol in x with another one; id2(x) = set of words resulting by inserting and/or deleting at most 2 symbols in x.

Transducer independences are closed under intersection, that is, if P t1 and P t2 are transducer independences, then also P t1 ∩ P t2 is a transducer independence.

This is useful when we are interested in languages L satisfying two or more properties, such as the combined property of being a prefix and 1-substitution detecting code. As prefix codes are described by the transducer px and 1-substitution detecting codes are described by the transducer sub 1 then also the combined property is described by a transducer. This implies that LaSer can answer the satisfaction, maximality and construction questions for the combined property. LaSer's backend is based on the Python package FAdo [8] , which implements automata, transducers, and independences described by transducer objects in the module codes.py [17] . The choice to use FAdo is based on the facts that its installation is very simple, it contains a rich set of easy to use methods, and is written in Python which in turn provides a rich availability of high level methods.

Three examples of independences that are not of the form P t are the following.

-The class of UD codes (uniquely decodable/decipherable codes). LaSer supports this class. -The class of comma-free codes: that is, all languages L satisfying the equation

-The class of language pairs (L 1 , L 2 ) satisfying the equation

Here the site directed insertion operation x sdi ← y between words x, y, introduced in [3] , is such that

where u, v are nonempty. This operation models site-directed mutagenesis, an important technique for introducing a mutation into a DNA sequence.

That "UD codes" and "comma-free codes" are not transducer independences can be shown using dependence theory: every transducer independence is a 2independence, but the "comma-free codes" independence, for instance, is a 3independence and not a 2-independence-see [12, 15] . In general, intersecting (combining) the above independences with a transducer independence results into a new independence that is not of the form P t for some transducer t. LaSer does not handle non-transducer independences, with the exception of "UD codes" for which specific algorithms are employed. More specifically, for the satisfaction question LaSer uses the quadratic-time elegant algorithm of [9] . For the maximality question LaSer uses Schützenberger's theorem that maximality of L is equivalent to the condition that every word in Σ * is subword (part) of some word of L * [2] . A specific algorithm is also used in [23] for the satisfaction question of the combination of the UD code and two other DNA-related independences.

We now discuss possible approaches of representing non-transducer independences as finite objects in a way that these objects can be manipulated by algorithms which can answer questions about the independences being represented. Some approaches are discussed in [15, 19] . It is important to note that there is a distinction between the terms "property" and "independence". A (language) property is simply a class (set) of languages. A (language) independence is a property P for which the concept of maximality is defined, that is, P must satisfy if L ∈ P then also L ∈ P for all L ⊆ L (see [15] for details). A general type of independences can be defined via rational language equations ϕ(L) = ∅, where ϕ(L) is an independence expression involving the variable L. An independence expression ϕ is defined inductively as follows: it is L or a language constant, or one of ϕ 1 ϕ 2 , ϕ 1 ∪ϕ 2 , ϕ 1 ∩ϕ 2 , (ϕ 1 ) * , t(ϕ 1 ), θ(ϕ 1 ), where ϕ 1 , ϕ 2 are independence expressions, t is a transducer constant, and θ is an antimorphic permutation 3 constant. A rational language equation is an expression of the form ϕ(L) = ∅, where ϕ(L) is an independence expression containing the variable L and the language constants occurring in ϕ(L) represent regular languages. A language L is ϕ-independent if it satisfies the equation ϕ(L) = ∅. The independence P ϕ described by ϕ is the set of ϕ-independent languages. Example 1. Every transducer independence P t such that t is an input-altering transducer 4 is described by the rational language equation

where we have used the standard notation t(L) = {y | (x, y) ∈ R(t), x ∈ L}.

The independence "comma-free codes" is described by the rational language equation (2) . The independence "θ-free languages", [10] , is described by the rational language equation

The satisfaction question for independences P ϕ is implemented in [19] , where parsing of expressions ϕ(L) is implemented using Python's lark library, and evaluation of ϕ(L) for given L = L(a) is implemented using the FAdo package.

What about independences described by equations like (3)? This is a nontransducer independence. Various general types of independences are defined 3 An antimorphic permutation θ maps the alphabet Σ onto Σ and extends to words anti-morphically: θ(xy) = θ(y)θ(x). A typical example of this is the DNA involution on the alphabet {a, c, g, t} such that θ(a) = t, θ(c) = g, θ(g) = c, θ(t) = a. In this case, θ(aac) = gtt. 4 This is a transducer t such that w / ∈ t(w) for all words w. For example, px and sx in Fig. 1 are input-altering transducers.

in [5, 11, 12, 22] . Equation (3) however involves an independence expression with two language variables: L 1 and L 2 . In analogy to transducer independences that are R-independences, for binary relations R, one can consider independences with respect to higher degree relations 5 . Consider the ternary relation Above we have made the following notation: for any k-tape transducer t, for any i ∈ {1, . . . , k}, and for any list of k − 1 languages L 1 , . . . , L k−1 , the expression t(L 1 , . . . , L k−1 : i) denotes the set of all words w that result if we consider the i-th tape as output tape and the rest k − 1 tapes as input tapes such that the input k − 1 words are from the k − 1 languages. Thus, we can talk about the independence described by the equation

Given a k-tape transducer t and k − 1 NFAs accepting the languages L 1 , . . . , L k−1 , the satisfaction question can be decided if we construct the transducer resulting by intersecting t with the NFAs at the k − 1 positions other than i, and then testing whether the resulting transducer has a path from an initial to a final state.

We propose to investigate further the rational language equations defined here as well as in [15] . Topics of interest are maximality, embedding and expressibility 6 as well as enhancement and implementation of algorithms involved. Some of these topics could be complex. For example, the maximality question for transducer independences can be answered by a simple algorithm, but the question is PSPACE hard. The embedding question is to construct a maximal independent language containing the given independent language L(a). For transducer independences described by Eq. (4), the embedding question is addressed in [16] . The maximality and embedding questions for non-transducer independences appear to be far more complex. For the satisfaction question of independences described by rational language equations, a useful question is to define and implement witnesses of non-satisfaction. For example, a witness of non-satisfaction for equation (2) would be a word triple (x, y, z) such that x, y, z ∈ L and xy = uzv for some nonempty words u and v.

Transductions and Context-Free Languages

Codes and Automata

Site-directed insertion: decision problems, maximality and minimality

Relations over words and logic: a chronology

Trajectory-based codes

Bond-free DNA language classes

On relations defined by generalized finite automata

FAdo: Tools for formal languages manipulation

Deciding code related properties by means of finite transducers

Coding properties of DNA languages

Syntactic monoids of codes

Codes

Transducer descriptions of DNA code properties and undecidability of antimorphic problems

An algebra of discrete channels that involve combinations of three basic error types

Applications of transducers in independent languages, word distances, codes

Embedding rationally independent languages into maximal ones

This is the full journal version of the paper "Implementation of Code Properties via Transducers

LaSer: Independent LAnguage SERver

Deciding rational property definitions, Honours Undergraduate Thesis

Elements of Automata Theory

Codes and binary relations

Languages and Codes

Testing DNA code words properties of regular languages

We thank the anonymous referees for constructive comments.