key: cord-0060412-ebjdigdr authors: Mitterwallner, Fabian; Lochmann, Alexander; Middeldorp, Aart; Felgenhauer, Bertram title: Certifying Proofs in the First-Order Theory of Rewriting date: 2021-02-26 journal: Tools and Algorithms for the Construction and Analysis of Systems DOI: 10.1007/978-3-030-72013-1_7 sha: d6c4ccbaaaec1bbe876011c226377f7fa08e7137 doc_id: 60412 cord_uid: ebjdigdr The first-order theory of rewriting is a decidable theory for linear variable-separated rewrite systems. The decision procedure is based on tree automata techniques and recently we completed a formalization in the Isabelle proof assistant. In this paper we present a certificate language that enables the output of software tools implementing the decision procedure to be formally verified. To show the feasibility of this approach, we present FORT-h, a reincarnation of the decision tool FORT with certifiable output, and the formally verified certifier FORTify. Many properties of rewrite systems can be expressed as logical formulas in the first-order theory of rewriting. This theory is decidable for the class of linear variable-separated rewrite systems, which includes all ground rewrite systems. The decision procedure is based on tree automata techniques and goes back to Dauchet and Tison [7] . It is implemented in FORT [17, 18] . FORT takes as input one or more rewrite systems R 0 , R 1 , . . . and a formula ϕ, and determines whether or not the rewrite systems satisfy the property expressed by ϕ, in which case it reports yes or no. FORT may not reach a conclusion due to limited resources. For properties related to confluence and termination, designated competitions (CoCo [15] , termCOMP [9] ) of software tools take place regularly. Occasionally, yes/no conflicts appear. Since the participating tools typically couple a plethora of techniques with sophisticated search strategies, human inspection of the output of tools to determine the correct answer is often not feasible. Hence certified categories were created in which tools must output a formal certificate. This certificate is verified by CeTA [21] , an automatically generated Haskell program using the code generation feature of Isabelle. This requires not only that the underlying techniques are formalized in Isabelle, but the formalization must be executable for code generation to apply. During the time-consuming formalization process, mistakes in papers are sometimes brought to light. Since 2017 we are concerned with the question of how to ensure the correctness of the answers produced by FORT. The certifier CeTA supports a great many techniques for establishing concrete properties like termination and confluence, but the formalizations in the underlying Isabelle Formalization of Rewriting (IsaFoR) 3 are orthogonal to the ones required for supporting the decision procedure underlying FORT. We recently completed the formalization of the automata constructions involved in the decision procedure [14] . Earlier fragments were described in [8, 13] . In this paper we put these efforts to the test. More precisely, we 1. present a certificate language which is rich enough to express the various automata operations in decision procedures for the first-order theory of rewriting as well as numerous predicate symbols that may appear in formulas in this theory, 2. describe the tasks required to turn the formalization described in [14] into verified code to check certificates within reasonable time, 3. present a new reincarnation of FORT in Haskell, named FORT-h, which is capable of producing certificates. The remainder of the paper is organized as follows. The next section briefly recapitulates the first-order theory of rewriting and the variant of the decision procedure described in [14] . Sections 3 and 4 describe the representation of formulas in certificates and the certificate language. In Section 5 we describe how certificates are validated by FORTify, the verified Haskell program obtained from the Isabelle formalization. Section 6 describes FORT-h. Experimental results are presented in Section 7, before we conclude in Section 8. Familiarity with term rewriting [2] and tree automata [6] is useful, but we briefly recall important definitions and notation that we use in the remainder. Terms T (F , V) are constructed from a signature F, consisting of function symbols with fixed arities, and a set of variables V. A term rewrite system (TRS for short) R consists of rewrite rules → r between terms and r. Instead of the usual restrictions / ∈ V and Var(r) ⊆ Var( ), we require Var( )∩Var(r) = ∅. Here Var(t) denotes the set of variables in a term t. Moreover, and r are assumed to be linear terms (i.e., variables occur at most once). The conditions on the rewrite rules are necessary to ensure decidability of the first-order theory of rewriting for these linear variable-separated TRSs. The (one-step) rewrite relation of a TRS R is denoted by → R . A term t is ground if Var(t) = ∅. The set of ground terms is denoted by T (F ). The first-order theory of rewriting is defined over a language L containing the predicate symbols →, → * , =, and many more. As models, we consider finite linear variable-separated TRSs R over signatures F such that T (F ) is nonempty. The set T (F ) serves as domain for the variables in formulas over L. The interpretation of the predicate symbol → in R is the one-step rewrite relation → R over T (F ), → * denotes the restriction of → * R to terms in T (F ), and = is interpreted as the identity relation on T (F ). Since we use ground terms as carrier, formulas in the first-order theory of rewriting express properties on ground terms. For instance, the following formula ϕ expresses the property of having unique normal forms (UNR): To use ϕ for establishing UNR for arbitrary terms (i.e., terms in T (F , V)) two additional constant symbols need to be added to the signature [18] . (More on this in Section 8.) Additional predicates in L increase the expressive power and also allow expressing properties more compactly. For instance, we can write In Section 3 we present a grammar that describes the available constructions for predicates. All predicates that can be represented using these constructions are supported in our decision procedure. The decision procedure is based on tree automata that recognize relations on ground terms. Here we give a brief summary. More information can be found in [6] and [14] . A tree automaton A = (F , Q, Q f , ∆) consists of a finite signature F , a finite set Q of states, disjoint from F , a subset Q f ⊆ Q of final states, and a set of transition rules ∆. Transition rules have one of the following two shapes: f (p 1 , . . . , p n ) → q with f ∈ F and p 1 , . . . , p n , q ∈ Q, or p → q with p, q ∈ Q. The latter are called epsilon transitions. Transition rules can be viewed as rewrite rules between ground terms in T (F ∪Q). The induced rewrite relation is denoted by → ∆ or → A . A ground term t ∈ T (F ) is accepted by A if t → * ∆ q for some q ∈ Q f . The set of all accepted terms is denoted by L(A) and a set L of ground terms is regular if L = L(A) for some tree automaton A. We encode n-tuples with n 1 of ground terms as terms over an enriched signature, as follows. We write F (n) for the signature (F ∪ {⊥}) n where ⊥ / ∈ F is a fresh constant. The arity of a symbol f 1 · · · f n ∈ F (n) is the maximum of the arities of f 1 , . . . , f n . The encoding of terms t 1 , . . . , t n ∈ T (F ) is the unique term t 1 , . . . , t n ∈ T (F (n) ) such that Pos( t 1 , . . . , t n ) = Pos(t 1 ) ∪ · · · ∪ Pos(t n ) and t 1 , . . . , t n (p) = f 1 · · · f n where f i = t i (p) if p ∈ Pos(t i ) and f i = ⊥ otherwise, for all p ∈ Pos( t 1 , . . . , t n ) and 1 i n. As an example, for the terms s = f(g(a), f(b, b)), t = g(g(a)), and u = f(b, g(a)) we obtain s, t, u = fgf(ggb(aa⊥), f⊥g(b⊥a, b⊥⊥)). An n-ary relation on ground terms is regular if its encoding is accepted by a tree automaton operating on terms in T (F (n) ). Such an automaton is called an RR n automaton and regular n-ary relations are called RR n relations. The i-th cylindrification of an RR n relation R over T (F ) is the RR n+1 relation {(t 1 , . . . , t i−1 , u, t i , . . . , t n ) | (t 1 , . . . , t n ) ∈ R and u ∈ T (F )}. Besides RR n automata, the decision procedure makes use of ground tree transducers (GTTs for short). A GTT is a pair G = (A, B) of tree automata over the same signature F . A pair (s, t) of ground terms in T (F ) is accepted by t for some term u ∈ T (F ∪ Q). Here Q is the combined set of states of A and B. The set of all such pairs is denoted by L(G). We denote by L a (G) the set of all pairs (s, t) such that s → * A q → * B t for some state q ∈ Q. A binary relation R on ground terms is a(n anchored) GTT relation if there exists a GTT G such that R = L(G) (R = L a (G)). The decision procedure for the firstorder theory of rewriting described in [7] and implemented in FORT uses GTTs, the formalized variant described in [14] uses anchored GTTs (aGTTs), which have better closure properties. Both are supported in our certificate language, but FORT-h and FORTify use anchored GTTs since they permit us to model more predicates while reducing the need for ad-hoc constructions that need to be turned into executable (verified) code. The decision procedure for the first-order theory of rewriting constructs RR n automata for the subformulas in a bottom-up fashion. GTTs (aGTTs) come into play for some of the atomic subformulas consisting of predicate symbols and variables. Closure properties take care of the logical structure of formulas. A final emptiness check determines whether the formula is satisfied for the TRS given as input to the decision procedure. Rather than formally stating the properties involved, we illustrate the decision procedure on an example. Example 1. Consider the formula ϕ = ∀ s ∃ t (s → * t ∧ NF(t)), which expresses the normalization property of TRSs. To determine whether a TRS R over a signature F satisfies ϕ, we first construct an RR 1 automaton A 1 that accepts the ground normal forms in T (F ), using an algorithm first described in [5] and recently formalized in [13] . For the subformula s → * t we construct a GTT G 1 for the parallel rewrite relation − → R . Since GTT relations are effectively closed under transitive closure (while RR 2 relations are not), we obtain a GTT G 2 for → * R . This GTT is transformed into an RR 2 automaton A 2 . (In the formalized decision procedure described in [14] , an RR 2 automaton for → * is constructed from an anchored GTT for the root step relation → R , using suitable closure properties of anchored GTT and RR 2 relations.) We cylindrify the RR 1 automaton A 1 into an RR 2 automaton A 3 that accepts T (F ) × NF R . A product construction involving A 2 and A 3 produces an RR 2 automaton A 4 for the subformula s → * t ∧ NF(t). Projection yields an RR 1 automaton A 5 corresponding to ∃ t (s → * t ∧ NF(t)). So ϕ holds if and only if L(A 5 ) = T (F ). In FORT the ∀ quantifier is transformed into the equivalent ¬ ∃ ¬. Hence complementation is used to obtain an RR 1 automaton A 6 and the existential quantifier is implemented using projection. This gives an RR 0 automaton A 7 which either accepts the empty relation ∅ or the singleton set {()} consisting of the nullary tuple (). The outermost negation gives rise to another complementation step. The final RR 0 automaton A 8 is tested for emptiness: L(A 8 ) = ∅ if and only the TRS R does not satisfy ϕ. The first step in the certification process is to translate formulas in the first-order theory of rewriting into a format suitable for further processing. We adopt de Bruijn indices [4] to avoid alpha renaming. in FORT syntax. It expresses the commutation of two TRSs, indicated by the indices 0 and 1. Using de Bruijn indices for the term variables s, t, u, v produces We refer to Example 4 for further explanation. The formal syntax of formulas in certificates is given below. Angle brackets are used for non-terminal symbols. Here rr 2 denotes the supported binary regular relations, which are formally defined after Example 3. Likewise, rr 1 stands for regular sets (which are identified with unary regular relations). term ::= nat trs ::= nat | natnat : De Bruijn indices are used for term variables and natdenotes a TRS with index nat in which the left-and right-hand sides of the rules have been swapped. The class of linear variable-separated TRSs is closed under this operation. We use it to represent the conversion relation ↔ * of a TRS R as the reachability relation → * induced by the TRS R ∪ R − . Example 3. The commutation property in Example 2 is rendered as follows: (forall (forall (forall (or (not (and (rr2 (step* (0)) 2 1) (rr2 (step* (1)) 2 0))) (exists (and (rr2 (step* (1)) 2 0) (rr2 (step* (0)) 1 0))))))) Here (step* (0)) denotes the RR 2 relation → * induced by the first TRS (which is indexed by 0) and (rr2 (step* (1)) 2 0) represents the subformula We continue with the certificate syntax of RR 1 and RR 2 relations: Here (terms) refers to T (F ), (nf ( trs + )) to the normal forms (NF) induced by the union of the underlying TRSs, and (inf rr 2 ) to the infinity predicate (INF R ) which is satisfied by all terms having infinitely many successors with respect to the relation R. Furthermore, (proj (1 | 2) rr 2 ) denotes projection (π) to the first (second) argument, (gtt gtt pos num ) the transformation of a GTT relation into an RR 2 relation with corresponding context closure (cf. [14, Section 3]), (id rr 1 ) the identity relation on the underlying set, and (gtc gtt ) ((atc gtt )) the (anchored) transitive closure of the underlying (anchored) GTT relation. The constructs defined above closely correspond to the formalized closure operations for the predicates in the first-order theory of rewriting, reported in [14] and summarized below: Here A are anchored GTT relations ( gtt ), R are RR 2 relations ( rr 2 ), and T are regular sets of ground terms ( rr 1 ). For convenience of tool authors, we add a few other constructs to rr 2 . The certifier expands these to a sequence of basic constructs given above. rr 2 ::= · · · | (step ( trs + )) | (step= ( trs + )) | (step+ ( trs + )) | (step* ( trs + )) | (equality) The complete list can be obtained from the accompanying website. A certificate for a first-order formula ϕ explains how the corresponding RR n automaton is constructed. We adopt a line-oriented natural deduction style. The automata are implicit. This is a deliberate design decision to keep certificates small. More importantly, it avoids having to check equivalence of finite tree automata, which is EXPTIME-complete [6, Section 1.7]. certificate ::= ( item inference formula info * ) certificate | (empty item ) | (nonempty item ) item ::= nat info ::= (size nat nat nat ) | · · · inference ::= (rr1 rr 1 term ) | (rr2 rr 2 term term ) Currently the info field only serves as an interface between the tool (which provides the certificate) and the certifier to compare the sizes of the constructed automata. In the future we plan to extend this field with concrete automata. This allows to test language equivalence of a tree automaton computed by a tool that supports our certificate language and the one reconstructed by FORTify, thereby providing tool authors with a mechanism to trace buggy constructions in case a certificate is rejected. We revisit Example 1 to illustrate the construction of certificates. ) expressing normalization is rendered as ϕ = ∀∃(1 → * 0 0 ∧ 0 ∈ NF[0]) in de Bruijn notation. Here 1 refers to the variable s, the second and third occurrences of 0 refer to t, and the last occurrence of 0 refer to the first (and only) TRS, which has index 0. We construct the certificate bottom-up, to mimic the decision procedure. The first line is for The components can be read as follows: item = 0 denotes the first step in our proof, inference = rr1 (nf (0)) 0 construct the automaton that accepts the normal forms and keeps track of the variable 0, formula = rr1 (nf (0)) 0 denotes the subformula 0 ∈ NF[0]; it is satisfiable if and only if the automaton constructed using the description in inference is not empty. The apparent redundancy will disappear when we continue. We proceed by expressing the relation → * 0 and subsequently make sure that the second component of → * 0 is in normal form: (1 (rr2 (step* (0)) 1 0) (rr2 (step* (0)) 1 0)) (2 (and (1 0)) (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0)))) Line 1 is similar to line 0. The inference step and 1 0 in line 2 constructs an RR 2 automaton that accepts the intersection of the relations modeled in lines 1 and 0. This automaton corresponds to A 4 in Example 1. The cylindrification step from A 1 to A 3 in Example 1 is left implicit. We continue with the projection of variable 0 and afterwards complement the resulting automaton. This is done by an exists followed by a not inference step: (3 (exists 2) (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))) (4 (not 3) (not (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0)))))) The inference steps until this point describe the construction of A 6 in Example 1. We complete the certificate by introducing the remaining operators: (5 (exists 4) (exists (not (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))))) (6 (not 5) (not (exists (not (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0)))))))) (7 (nnf 6) (forall (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0)))))) (nonempty 7) The nnf inference step does not modify the tree automaton computed in step 6 (which corresponds to A 8 in Example 1) but checks the equivalence of the formula in line 6 with the one of line 7, which corresponds to the input formula ϕ . The equivalence check incorporates ∀ elimination, negation normal form, and associativity, commutativity and idempotency of ∧ and ∨. In the future we might add support for additional equivalences in first-order logic. The final step (nonempty 7) checks that L(A 8 ) = ∅. So this certificate claims that the input TRS is normalizing. For TRSs that do not satisfy ϕ, the final line in the certificate would be (empty 7). In the previous example we intentionally skipped over some details to convey the underlying intuition. First of all, the rr 2 construct (step* (0)) is derived and internally unfolded via (anchored) GTTs into (gtt (gtc (root-step 0)) >= >) Starting from an anchored GTT that accepts the root step relation induced by the first (and only) TRS in the list, an application of the GTT transitive closure operation followed by a multi-hole context closure operation with at least one hole that may appear in any position, an RR 2 automaton that accepts the relation → * 0 is constructed. We also mentioned that cylindrification is implicit. The same holds for the projection operation that is used in the exists inference steps. A projection takes place in the first component if the variable 0 is present in the list of variables, otherwise the inference step preserves the automaton. This approach is sound as variables indicate the relevant components of the RR n automaton. Thanks to the de Bruijn representation, the innermost quantifier refers to variable 0, the first component in the given RR n automaton. However we must keep track of all variables occurring in the surrounding formula and update that list accordingly. The example in the preceding section makes clear that a certificate can be viewed as a recipe for the certifier to perform certain operations on automata and for-mulas to confirm the final (non-)emptiness claim. In particular, checking a certificate is expensive because the decision procedure for the first-order theory is replayed using code-generated operations from a verified version of the decision procedure. In this section we describe the steps we performed to turn the Isabelle formalization of the decision procedure described in [14] into our certifier FORTify. We use the FOL-Fitting library [3] , which is part of the Archive of Formal Proofs, 4 to connect the first-order theory of rewriting and first-order logic. The translation is more or less straightforward. We interpret RR 1 constructions as predicates and RR 2 construction as relations in first-order logic and prove both interpretations to be semantically equivalent: lemma eval formula F Rs α f = eval α undefined (for eval rel F Rs) (form of formula f ) With this equivalence we are able to define the semantics of formulas: definition correct certificate where correct certificate F Rs claim infs n ≡ (claim = Empty ←→ (formula unsatisfiable (fset F ) (map fset Rs) (fst (snd (snd (infs ! n))))) ∧ claim = Nonempty ←→ formula satisfiable (fset F ) (map fset Rs) (fst (snd (snd (infs ! n))))) Last but not least we define the important function check certificate which takes as input a signature, a list of TRSs, a boolean, a formula, and a certificate. This function first verifies that the given formula and the claim corresponds to the ones referenced in the certificate and afterwards checks the integrity of the certificate. The following lemmata, which are formally proved in Isabelle, state the correctness of the check certificate function: lemma check certificate F Rs A fm (Certificate infs claim n) = Some B =⇒ fm = fst (snd (snd (infs ! n))) ∧ A = (claim = Nonempty) lemma check certificate F Rs A fm (Certificate infs claim n) = Some B =⇒ (B = True −→ correct certificate F Rs claim infs n) ∧ (B = False −→ correct certificate F Rs (case claim of Empty ⇒ Nonempty | Nonempty ⇒ Empty) infs n) The first lemma ensures that our check function verifies that the provided parameters fm (formula) and A (answer satisfiable/unsatisfiable) match the formula and the claim stated in the certificate. The second lemma is the key result. It states that the check function returns Some True if and only if the certificate is correct. The only-if case is hidden in the last two lines. More precisely, if the claim of the certificate is wrong then negating the claim (the first-order theory of rewriting is complete) leads to a correct certificate. Therefore, if our check function returns Some None then the certificate is correct after negating the claim. Our check function returns None if the global assumptions (the input TRS is not linear variable-separated, the signature is not empty, etc.) are not fulfilled. We plan to extend the check certificate function in the near future such that it reports these kind of errors. A central part of the formalization is to obtain a trustworthy decision procedure to verify certificates. Hence we use the code generation facility of Isabelle/HOL to produce an executable version of our check certificate function. Isabelle's code generation facility is able to derive executable code for our constructions with the exception of inductively defined sets. In [8, Section 7] an abstract Horn inference system for finite sets is introduced to overcome this limitation. We use this framework to obtain executable code for the following constructions defined as inductive sets: reachable and productive states of a tree automaton, states of tree automata obtained by the subset construction, epsilon transitions for the composition and transitive closure constructions of (anchored) GTTs, an inductive set needed for the tree automaton for the infinity predicate. At this point we can use Isabelle's code generation to obtain an executable check function. However, more effort is needed to obtain an efficient check function. Checking the certificate in Example 6 below did not terminate after more than 24 hours computation time. We used the profiling capabilities of the Glasgow Haskell Compiler (GHC) to analyze the generated code. This revealed that most of the time was spent on checking membership. Since the computed tree automata can grow very large, the use of lists as underlying data structure for sets in the generated code is a bottleneck. To overcome this problem we decided to use the container framework of Lochbihler [12] . In our case, the setup involved a non-trivial overhead as the container framework requires multiple class instances for data types used inside sets. Some of these instances could be derived automatically by the deriving framework of Sternagel and Thiemann [20] . Afterwards Isabelle's code generation was able to generate a check certificate function that uses red-black trees as underlying data structure for sets. Sadly, the function was still infeasible for the certificate in Example 6. This time the power set construction, which is exponential in worst case, turned out to be the culprit. In this construction we compute the transitive closure of the present epsilon transitions multiple times. Adding an explicit construction to remove epsilon transitions from tree automata solved this issue. To make a long story short, after further modifications we were able to verify the certificate for Example 6 in a little less than 3 minutes, which we consider fast enough for a first prototype. The resulting code-generated certifier is called FORTify. The overall design of FORTify is shown in Figure 1 . It can be viewed as two separate modules A and B. Module B is the verified Haskell code base that is generated by Isabelle's code generation facility, containing the check certificate function and the data type declarations for formulas and certificates. To use this functionality, we wrote a parser which translates strings representing formulas (signatures, TRSs, certificates) to semantically equivalent formulas (signatures, TRSs, certificates) represented in the data types obtained from the generated code. This was done in Haskell and refers to module A in Figure 1 . Module A accepts formulas in FORT syntax. Hence it also applies the conversion to the de Bruijn representation. After the translation in module A, the check certificate function in module B is executed and its output is reported. Importantly, the code in module A is not verified in Isabelle. Correctness of FORTify must therefore assume correctness of module A as well as the correctness of the Glasgow Haskell Compiler, which we use to generate a standalone executable from the generated code. FORT-h is a new decision tool for the first order theory of rewriting. It is a reimplementation of the decision mode of the previous FORT tool [18] based on a modified decision procedure. The decision procedure, like the formalization, is based on anchored GTTs. The new tool is implemented in Haskell whereas FORT is written in Java. FORT-h supports all features of FORT while extending the domain of supported TRSs from left-linear right-ground TRSs to linear variable-separated ones. While FORT could technically take such TRSs as input, it is unsound when checking non-ground properties on them. Example 5. To check confluence of the linear variable-separated TRS g(g(x)) → g(y) a → g(a) FORT-h can be called with > ./fort-h "CR" input.trs NO where input.trs is a text file containing the rewrite system. The tool correctly states that NO the system is not confluent. However, FORT incorrectly identifies this as confluent due to the lack of support for variables appearing in right-hand sides of rules. FORT-h took part in the 2020 edition of the Confluence Competition, competing in five categories: COM, GCR, NFP, UNC and UNR. Even though it does not support many problems tested in the competition, due to the restriction to linear variable-separated TRSs, it was able to win the category for most YES results in UNR. The tool expects as input a formula ϕ and one or more TRSs, as seen in Figure 2 . It then outputs the answer YES or NO depending on whether ϕ is satisfied or not by the given TRSs. FORT-h may be passed some additional options: -c FILE: causes FORT-h to write a certificate to the given FILE, -i: enables the additional info in the inference steps in the certificate, -v: enables verbose output (e.g. showing the internal formula representation). -w: enables witness generation. As an example of the latter, consider Example 5 and the call > fort-h -w "CR" input.trs NO formula body / witness: (0 (<-o ->*) 1 &~0 (->* o *<-) 1) 0 = g(_00()) 1 = g(_01()) So in addition to the answer NO, it also outputs a counter example for the given formula consisting of the two terms g( 00()) and g( 01()). Here 00 and 01 are additional constants required to reduce confluence to ground-confluence, and represent variables. The terms should therefore be read as g(x) and g(y). Internally FORT-h represents formulas using de Bruijn indices as described in Section 4. Additionally, universal quantifiers and implications are eliminated, and negations are pushed as far as possible to the atomic subformulas. The tool then traverses the formula in a bottom-up fashion, constructing the corresponding anchored GTTs and RR n automata. During this traversal we also keep track of the steps taken, to construct the certificate if necessary. To improve performance the automata are cached and reused for equal subformulas. The tree automaton representing the whole formula is then checked for emptiness. If the accepted language is empty, FORT-h reports NO, otherwise it outputs YES. The experiments described in this section were run on a computer with a Intel(R) Core(TM) i7-5930K CPU with 6 cores at 3.50GHz. In the 2019 edition of the Confluence Competition [15] three tools contested the commutation (COM) category: 5 ACP [1], CoLL [19] , and FORT. On input problem COPS #1118 the tools gave conflicting answers. To determine the correct answer we use FORT-h to produce a certificate for ground-confluence by calling > fort-h -c cert -i "GCom([0], [1] )" 1118.trs YES This produces the following certificate: (0 (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (size 13 53 0)) (1 (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1) (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1) (size 11 47 0)) (2 (not 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1))) (3 (and (0 2)) (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1))))) (4 (exists 3) (1))) (step* (0))) 0 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)))))) (5 (exists 4) (exists (exists (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1))))))) (6 (not 5) (not (exists (exists (and ( (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)))))))) (7 (nnf 6) (forall (forall (or ( (not (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1)) (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)))))) (nonempty 7) When passing this certificate to FORTify, after 2 minutes and 57 seconds the output Certified is produced, so we can be assured that the TRSs do commute. Note that the inference steps 0 and 1 contain the optional size information. Here (size k m n) means the underlying RR n automaton constructed by FORT-h contains k final states, m transitions, and n epsilon transitions. We also ran some experiments comparing FORT-h to FORT. The problems for these experiments are taken from the Confluence Problems database (COPS), and consists of 122 left-linear right-ground TRSs. Note that FORT-h implements no parallelism, while FORT does. For the first two experiments we chose a timeout of 60 seconds for the decision tools and 600 seconds for FORTify. The formulas were taken from the experiments reported in [17] . The first three denote different but equivalent formulations of ground-confluence (GCR). The results are shown in Table 1 , where the YES (NO) column shows the number of systems determined to be (non-)ground-confluent together with average time (∅-time) the tool took. The ∞ column is the number of timeouts. To compare overall performance the total time column contains the sum of all runtimes, including timeouts but excluding the time taken by FORTify. The columns show the numbers of certifiable results as well as the overall time taken by FORTify (-time). These results show that, even though they have the same meaning, the choice of formula has an impact on performance. Interestingly FORT-h is generally faster and can solve more problems than FORT even though it can not take advantage of any parallelism. This performance advantage is more prominent in systems which are non-confluent. For problems with the answer YES, FORT can still prove more. The table also shows that FORTify can only certify a small portion the results. This is due to the performance of the certifier, since all other problems time out. It is also apparent that formulas containing conversion (↔ * ) are especially slow. No wrong results by the decision tools where identified. The second set of formulas represents the normal form property, restricted to ground terms (GNFP): The results for these are shown in Table 2 . The same pattern is observed, where even though both can (dis)prove satisfaction for the same formulas, FORT-h is faster overall. For the last experiment we test performance on properties over two TRSs. This is done by checking ground-commutation (GCOM) for all pairs of systems form the dataset, resulting in 7503 problems. A timeout of 60 seconds was used. The results, presented in Table 3 , show that FORT-h is ahead here as well, (dis)proving more problems and doing so in significantly less time. Full details of the experiments are available from the website 6 accompanying this paper. Precompiled binaries of FORT-h and FORTify are available from the same site. We also present a few additional experiments with FORTify. 6 https://fortissimo.uibk.ac.at/tacas2021 In this paper we presented FORTify, a certifier for the first-order theory of rewriting for linear variable-separated TRSs, together with an expressive certificate language for formulas and proofs. Moreover, a new implementation of the decision procedure for the theory of rewriting, FORT-h, is capable of producing certificates in this language. We mention three topics which require further research. First of all, many certificates produced by FORT-h cannot be validated by the current version of FORTify within reasonable time. We will further improve the algorithms and data structures used in the check-certificate function. A natural candidate for optimization is the transitive closure algorithm generated by Isabelle, which always takes cubic time. Currently, sharing only takes place in the inference rules. Expanding this to the individual constructions will be the next step. Also trimming of anchored GTTs could improve the run time. In the current state of the formalization only trimming of GTTs is proved to be sound. Profiling will be used to determine other candidates that are likely to have a large impact on the validation time. A second topic for future research is the certification of properties on open (i.e., non-ground) terms. In [8, 16, 18] conditions are presented to reduce properties related to confluence to the corresponding properties on ground terms, by adding additional constants to the signature. These results need to be formalized in Isabelle and the certificate language needs to be extended, before FORTify can be used to certify the corresponding categories in the Confluence Competition. We plan to define signature extensions directly in formulas, to offer the most flexibility. A related issue is the support for many-sorted signatures in the Isabelle formalization. FORT-h already supports many-sorted TRSs, which is the format in the GCR category of CoCo. A third topic is improving the efficiency of FORT-h. We anticipate that supporting parallelism will further speed up FORT-h, especially for large formulas. Preprocessing techniques that go beyond the mere transformation to negation normal form will be helpful to obtain equivalent formulas that reduce the size of the ensuing tree automata in the decision procedure. In [10] similar ideas are applied to WSkS, in connection with MONA [11] . Proving confluence of term rewriting systems automatically Term Rewriting and All That First-order logic according to Fitting. Archive of Formal Proofs Lambda calculus notation with nameless dummies: A tool for automatic formula manipulation, with application to the Church-Rosser theorem Sequentiality, monadic second-order logic and tree automata Tree automata techniques and applications The theory of ground rewrite systems is decidable A verified ground confluence tool for linear variable-separated rewrite systems in Isabelle/HOL The termination and complexity competition Antiprenexing for WSkS: A little goes a long way MONA implementation secrets Light-weight containers for Isabelle: Efficient, extensible, nestable Formalized proofs of the infinity and normal form predicates in the first-order theory of rewriting Proc. 26th International Conference on Tools and Algorithms for the Construction and Analysis of Systems A verified decision procedure for the first-order theory of rewriting for linear variable-separated rewrite systems variable-separated rewrite systems in Isabelle/HOL Confluence competition Extending Tools for Confluence and Related Properties of Rewrite Systems. Master's thesis Automating the first-order theory of left-linear rightground term rewrite systems Proc. 9th International Joint Conference on Automated Reasoning. LNAI CoLL: A confluence tool for left-linear term rewrite systems Deriving class instances for datatypes Certification of termination proofs using CeTA ), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use Acknowledgments. We thank René Thiemann for giving valuable advice on how to improve the efficiency of the generated code. The comments by the anonymous reviewers improved the presentation.