title: Making Theory Reasoning Simpler
authors: Reger, Giles; Schoisswohl, Johannes; Voronkov, Andrei
date: 2021-02-26  journal: Tools and Algorithms for the Construction and Analysis of Systems  DOI: 10.1007/978-3-030-72013-1_9

Reasoning with quantifiers and theories is at the core of many applications in program analysis and verification. Whilst the problem is undecidable in general and hard in practice, we have been making large pragmatic steps forward. Our previous work proposed an instantiation rule for theory reasoning that produced pragmatically useful instances. Whilst this led to an increase in performance, it had its limitations, as the rule produces ground instances which (i) can be overly specific, thus not useful in proof search, and (ii) contribute to the already problematic search space explosion as many new instances are introduced. This paper begins by introducing a new rule that specifically addresses these two concerns: it produces general solutions and it is a simplification rule, i.e. it replaces an existing clause by a 'simpler' one. Encouraged by initial success with this new rule, we performed an experiment to identify further common cases where the complex structure of theory terms blocked existing methods. This resulted in four further simplification rules for theory reasoning. The resulting extensions are implemented in the Vampire theorem prover and evaluated on SMT-LIB, showing that the new extensions result in a considerable increase in the number of problems solved, including 90 problems unsolved by state-of-the-art SMT solvers.

Many applications of reasoning in program analysis and verification depend on reasoning with the first-order theory of arithmetic, often in combination with other theories and quantifiers. A common approach to this problem is via Satisfiability Modulo Theory (SMT) solving, which has strong support for decidable theories but may struggle to scale in the presence of quantifiers. Conversely, superposition-based first-order solvers handle quantifiers naturally and have, recently, been extended to reason with theories [2, 3, 5, 6, 9, 13, 16, 21]. Such solvers are based on a saturation loop and tend to suffer from search space explosion. This is compounded by the effective but explosive use of theory axioms, leading to the derivation of numerous inconsequential consequences of the theory. So far we have attempted to control this explosive behaviour [10, 17], but now we aim to eliminate some of it. This paper introduces a set of simplification rules for reasoning in the theory of (any combination of linear or non-linear real, rational, or integer) arithmetic, i.e. rules that make reasoning in arithmetic simpler. This work was motivated by our previous attempt [20] to find useful instances of first-order clauses that would be otherwise difficult to find via reasoning with theory axioms. For example, when considering the two clauses

    r(7x)        ¬r(6 + y) ∨ p(y)

our previous work would apply resolution on r(7x) and ¬r(6 + y) using unification with abstraction to produce the clause 7x ≉ 6 + y ∨ p(y), and then apply theory instantiation, utilising an SMT solver to find the substitution {x → 1, y → 1}, producing the instance p(1). This may or may not be useful to proof search and, crucially, we need to keep performing inferences with the original clauses in case it is not.
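The following minimal sketch (ours, not part of the paper; it assumes the Z3 Python bindings are available) illustrates how theory instantiation queries an SMT solver for a single ground solution of the constraint introduced above.

```python
# A minimal sketch (not Vampire's code) of how theory instantiation can use an
# SMT solver: ask for one ground solution of 7x = 6 + y via Z3's Python bindings.
from z3 import Ints, Solver, sat

x, y = Ints("x y")
solver = Solver()
solver.add(7 * x == 6 + y)

if solver.check() == sat:
    model = solver.model()
    # Some ground solution, e.g. x = 1, y = 1 as in the paper's example,
    # which yields the single ground instance p(1).
    print(model[x], model[y])
```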
In this case, we would prefer to instantiate with {y → 7x − 6} to produce 7x ≉ 6 + (7x − 6) ∨ p(7x − 6), which can be reduced to p(7x − 6). This is a general solution (being logically equivalent) that is also simpler; in this case it has fewer variables than the original clause. Hence, we replace the clause by the more general result, aiding proof search and preventing the addition of unnecessary instances. The above was motivated by the observation that we would often see clauses of the form kx ≉ t ∨ C[x] (for numeral k, variable x, and term t) and expend much effort using theory axioms to rewrite kx ≉ t into x ≉ t/k. This led us to conduct an experiment to identify other common cases where arithmetic clauses could be simplified. An immediate observation is that, if x ranges over the reals, p(7x − 6) can be instantiated with {x → (y + 6)/7} to produce p(y). Furthermore, in the above example we no longer need to employ the expensive unification with abstraction, as we can instantiate r(7x) with {x → z/7} to produce r(z) and then resolve with ¬r(6 + y) ∨ p(y) to produce p(y) directly. Another observation was that a large amount of effort was expended by the theorem prover reordering sums and products to expose seemingly obvious structure. For example, taking (3t + x) + 2t and producing 5t + x requires three theory axioms and 12 rewriting steps. To combat this, we introduce an evaluation method that flattens sums and products, reorders and simplifies them, before reintroducing the necessary bracketed structure. A related common issue was the occurrence of terms that could easily be cancelled, such as in 4x + 3 < 4x + 10, again requiring significant rewriting effort that can be replaced by a special rule. This paper does not present the exploratory experimentation described above but focusses instead on the fruits of this work. After introducing the necessary preliminaries (Sec. 2), we make the following contributions:
- A new Gaussian Variable Elimination rule (Sec. 3) that eliminates variables if they can be described completely in terms of other variables.
- A set of Arithmetic Subterm Generalisation rules (Sec. 4) that replace clauses with obvious generalisations, as in the above cases of replacing p(7x − 6) with p(y) and r(7x) with r(x).
- A general approach to the evaluation of terms involving arithmetic (Sec. 5), including a special rule to handle a surprisingly common corner case involving unary minus.

First-Order Logic and Theories. We consider a many-sorted first-order logic with equality. A signature is a pair Σ = (Ξ, Ω) where Ξ is a set of sorts and Ω a set of predicate and function symbols with associated argument and return sorts from Ξ. Terms are of the form c, x, or f(t1, . . . , tn) where f is a function symbol of arity n ≥ 1, t1, . . . , tn are terms, c is a zero-arity function symbol (i.e. a constant) and x is a variable. We assume that all terms are well-sorted and write t : σ if term t has sort σ. Atoms are of the form p(t1, . . . , tn), q, or t1 ≈s t2 where p is a predicate symbol of arity n, t1, . . . , tn are terms, q is a zero-arity predicate symbol and, for each sort s ∈ Ξ, ≈s is the equality symbol for the sort s. We write simply ≈ when s is known from the context or irrelevant. A literal is either an atom A, in which case we call it positive, or a negation of an atom ¬A, in which case we call it negative. When L is a negative literal ¬A and we write ¬L, we mean the positive literal A. For negative literals with binary predicates ¬(t1 ♦ t2) (like, e.g.
equality), we sometimes write the predicate struck through, as in t1 ≉ t2 for ¬(t1 ≈ t2). A clause is a disjunction of literals L1 ∨ . . . ∨ Ln for n ≥ 0. We disregard the order of literals and treat a clause as a multiset. When n = 0 we speak of the empty clause, which is always false. When n = 1 a clause is called a unit clause. Variables in clauses are considered to be universally quantified. Standard methods exist to transform an arbitrary first-order formula into clausal form (e.g. [15] and our recent work in [19]). In the following we use expression to mean a term, an atom, a literal, or a clause. We write E[t]p to denote an expression E containing a term t at position p (a position is a unique point in an expression's syntax tree) and may then write E[s]p to denote the same expression with t replaced by term s at p. We will normally leave the position p implicit. A substitution is any θ of the form {x1 → t1, . . . , xn → tn}, where n ≥ 0. Eθ is the expression obtained from E by the simultaneous replacement of each xi by ti. An expression is ground if it contains no variables. An instance of E is any expression Eθ, and a ground instance of E is any instance of E that is ground. A unifier of two terms, atoms or literals E1 and E2 is a substitution θ such that E1θ = E2θ. It is known that if two expressions have a unifier, then they have a so-called most general unifier. We assume a standard notion of a (first-order, many-sorted) interpretation I, which assigns a non-empty domain Is to every sort s ∈ Ξ, and maps every function symbol f to a function If and every predicate symbol p to a relation Ip on these domains so that the mapping respects sorts. We call If the interpretation of f in I, and similarly for Ip and Is. Interpretations are also sometimes called first-order structures. A sentence is a closed formula, i.e. one with no free variables. We use the standard notions of validity and satisfiability of sentences in such interpretations. An interpretation is a model for a set of clauses if (the universal closure of) each of these clauses is true in the interpretation. A theory T is identified by a class of interpretations. A sentence is satisfiable in T if it is true in at least one of these interpretations and valid if it is true in all of them. A function (or predicate) symbol f is called uninterpreted in T if, for every interpretation I of T, every interpretation I' which agrees with I on all symbols apart from f is also an interpretation of T. A theory is called complete if, for every sentence F of this theory, either F or ¬F is valid in this theory. Evidently, every theory of a single interpretation is complete. We can define satisfiability and validity of arbitrary formulas in an interpretation in a standard way by treating free variables as new uninterpreted constants. The theories we will deal with are the theories of integer, rational, and real arithmetic with uninterpreted functions, denoted by TZ, TQ, and TR, which fix the interpretation of a distinguished sort σZ, σQ, and σR to the set of mathematical integers Z, rationals Q, and reals R respectively, and assign the usual meanings to the function and predicate symbols {+, −, <, ≤, ·}. By k we denote the numeral interpreted as k in any of these theories. We consider signatures over these theories to additionally contain uninterpreted functions and predicates; hence, in contrast to the case without uninterpreted functions, there is no sound and complete proof system for any of these theories (see e.g. [13]).
Unless stated differently, we use the symbols x, y, z for variables, s, t, u for terms, C, D for clauses, p, q, r for predicate symbols, f, g, h for function symbols, and σ for substitutions and sorts, sometimes with suffixes added.

Term Orderings. A simplification ordering (see, e.g. [8]) on terms is an ordering that is well-founded, monotonic, stable under substitutions and has the subterm property. Such an ordering captures a notion of simplicity, i.e. t1 ≺ t2 implies that t1 is in some way simpler than t2. Vampire uses the Knuth-Bendix ordering [12], which is parametrized by a symbol weight function and a total precedence ordering on function and predicate symbols. This ordering is total on ground terms and partial on non-ground ones, leading to the possibility of incomparable terms, e.g. f(x, a) and f(b, y). A simplification ordering ≺ on terms can be extended to a simplification ordering on literals and clauses using a multiset extension of orderings. For simplicity, we will use ≺ to refer to both the term ordering and its lifting.

Saturation-Based Proof Search. We introduce our new rules within the context of saturation-based proof search. The general idea in saturation is to maintain two sets of Active and Passive clauses. A saturation loop then selects a clause C from Passive, places C in Active, applies generating inferences between C and clauses in Active, and finally places newly derived clauses in Passive after applying some retention tests. The retention tests involve checking whether the new clause is itself redundant (i.e. a tautology) or redundant with respect to existing clauses (implied by a set of smaller clauses in Active ∪ Passive). Rules that remove the parent clause immediately from the search space without performing a retention test are called immediate simplification rules. Whenever there are applicable immediate simplification rules, the first one wrt. some fixed ordering is chosen to be applied to the selected clause instead of applying any other rule. The rules introduced in this paper are all introduced as immediate simplification rules. However, as mentioned later, not all of them strictly obey the requirement that the result is smaller. Normally this would have implications for the completeness of the approach, but we lose completeness as soon as we start reasoning with theories. This leads us to a trade-off between the potential loss of some proofs by missing some inferences, and the potential gain via simplifying proof search. Our later experimental results show that forgoing completeness is of pragmatic interest.

Superposition Calculus. Vampire works with the superposition and resolution calculus (see our previous work [11, 14] for a description). The calculus itself is not of direct interest to this work. We do, however, draw attention to two rules. Firstly, the Equality Resolution rule

    s ≉ t ∨ C
    ---------    where θ is a most general unifier of s and t
       Cθ

is a starting point for both our previous theory instantiation work and the Gaussian Variable Elimination rule introduced later (Sec. 3). Secondly, we draw attention to the Demodulation (or rewriting by unit equalities) rule

    l ≈ r    L[t] ∨ C
    -----------------    where lθ = t, rθ ≺ lθ, and (l ≈ r)θ ≺ L[t] ∨ C
        L[rθ] ∨ C

This is of interest as later we will need to take special care of the last side-condition when evaluating terms.

Theory Reasoning. To perform theory reasoning within this context it is common to do two things. Firstly, to evaluate new clauses to put them in a common form (e.g. rewrite all inequalities in terms of <) and evaluate ground theory terms and literals (e.g.
1 + 2 becomes 3 and 1 < 2 becomes true). More complex evaluation is possible and is the subject of this work (see Section 5). Secondly, relevant theory axioms can be added to the initial search space. For example, if the input clauses use the + symbol one can add the axioms x + y ≈ y + x and x + 0 ≈ x, among others. In addition to these basic methods, Vampire also employs a number of other techniques. AVATAR modulo theories [16] uses an SMT solver within the context of clause splitting to ensure that the ground part of any chosen clause splits is theory-consistent. The previously mentioned unification with abstraction and theory instantiation [20] rules support lazy unification modulo theories and pragmatic instantiation. Theory axiom usage can be controlled by the set of support strategy [17] or layered clause selection [10]. Both approaches de-prioritise reasoning with theory axioms.

Gaussian Variable Elimination. Recall the example 7x ≉ 6 + y ∨ p(y) from the Introduction (Sec. 1), where we want to identify the substitution {y → 7x − 6} to produce the simpler instance p(7x − 6). Our general approach is to rewrite 7x ≉ 6 + y in terms of y and then apply the standard Equality Resolution rule introduced in Sec. 2. This gives us the straightforward rule gve:

    s ≉ t ∨ C
    ---------
    C{x → u}

where x : σZ, x : σQ, or x : σR, ⟨s, t⟩ =⇒*gve ⟨x, u⟩ or ⟨t, s⟩ =⇒*gve ⟨x, u⟩, and x is not a subterm of u. The relation =⇒*gve is the reflexive and transitive closure of the relation =⇒gve, which is defined on pairs of terms as follows:

    ⟨s + t, u⟩ =⇒gve ⟨s, u − t⟩
    ⟨s + t, u⟩ =⇒gve ⟨t, u − s⟩
    ⟨s · t, u⟩ =⇒gve ⟨t, u / s⟩    if s ≠ 0, and s : σQ or s : σR
    ⟨s / t, u⟩ =⇒gve ⟨s, u · t⟩    if t ≠ 0, and t : σQ or t : σR

It should be noted that =⇒gve is not normalising. The pair ⟨s1 + s2, t⟩ can, for example, be rewritten to ⟨s1, t − s2⟩ as well as to ⟨s2, t − s1⟩. But due to the fact that there is at most a linear number of such rewritings, we can enumerate all of them and choose the first ⟨x, t⟩ such that x is not a subterm of t. Further choice comes from the fact that, for a literal l ≉ r, we can rewrite based on ⟨l, r⟩ or on ⟨r, l⟩. Looking at our example, we could rewrite ⟨6 + y, 7x⟩ =⇒gve ⟨y, 7x − 6⟩ but also ⟨7x, 6 + y⟩ =⇒gve ⟨x, (6 + y) / 7⟩ if x is not of integer sort, leaving us with a choice. Another source of choice comes from the fact that our premise can contain multiple negative equalities; any of those could potentially be used to rewrite the rest of the clause. Since an application of the rule will yield a logically equivalent conclusion with fewer literals and fewer distinct variables, we make an arbitrary choice. For the same reason, we implement this as a simplification rule (thus removing the premise from the search space) even though the conclusion will often be incomparable to (not smaller than) the premise. To further demonstrate this rule we consider the additional example

    x + y ≉ 36 ∨ x + 3y ≉ 90 ∨ p(x, y)
    =⇒gve   (36 − y) + 3y ≉ 90 ∨ p(36 − y, y)
    =⇒eval  36 + 2y ≉ 90 ∨ p(36 − y, y)
    =⇒gve   p(36 − (90 − 36)/2, (90 − 36)/2)
    =⇒eval  p(9, 27)

which highlights the need to interleave evaluation between successive Gaussian elimination steps; we discuss our evaluation strategy below.
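To make the enumeration of =⇒gve rewrites concrete, here is a minimal sketch under assumptions of our own (a toy tuple-based term representation, the rational/real case only, and illustrative helper names; this is not Vampire's implementation):

```python
# Minimal sketch of the ==>_gve rewriting on a toy term representation.
# Terms are ints (numerals), strings (variables), or tuples
# ("+", a, b), ("*", a, b), ("/", a, b).
from fractions import Fraction

def occurs(x, t):
    """Does variable x occur in term t?"""
    if t == x:
        return True
    return isinstance(t, tuple) and any(occurs(x, a) for a in t[1:])

def gve_steps(s, t):
    """One-step rewrites of the pair <s, t>, mirroring ==>_gve (no subtraction
    constructor here, so u - v is encoded as u + (-1 * v))."""
    if isinstance(s, tuple):
        op, a, b = s
        if op == "+":                     # <a + b, t> ==> <a, t - b> and <b, t - a>
            yield a, ("+", t, ("*", -1, b))
            yield b, ("+", t, ("*", -1, a))
        elif op == "*":                   # <k * b, t> ==> <b, t / k>, k a non-zero numeral
            if isinstance(a, (int, Fraction)) and a != 0:
                yield b, ("/", t, a)
        elif op == "/":                   # <a / k, t> ==> <a, t * k>, k a non-zero numeral
            if isinstance(b, (int, Fraction)) and b != 0:
                yield a, ("*", t, b)

def isolate(s, t):
    """Search <s, t> ==>*_gve <x, u> with variable x not occurring in u."""
    stack = [(s, t)]
    while stack:
        lhs, rhs = stack.pop()
        if isinstance(lhs, str) and not occurs(lhs, rhs):
            return lhs, rhs               # usable substitution {lhs -> rhs}
        stack.extend(gve_steps(lhs, rhs))
    return None

# <6 + y, 7x>  ==>_gve  <y, 7x - 6>, i.e. the substitution {y -> 7x - 6}
print(isolate(("+", 6, "y"), ("*", 7, "x")))
```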
Arithmetic Subterm Generalisation. Taking a closer look at the choice for our example from the previous section, we see that we could have instantiated the premise y + 6 ≉ 7x ∨ p(y) either with {y → 7x − 6} to get p(7x − 6), or with {x → (6 + y) / 7} to obtain p(y) (again, assuming that x is not of integer sort). Both of these clauses are logically equivalent in TQ and TR, since the former is an instance of the latter, and the former also implies the latter, as we can apply the substitution {x → (y + 6) / 7} to it and simplify the result to the latter clause. Obviously this kind of reasoning can be applied for any linear subterm k · x + d where k ≠ 0. Splitting this idea into multiple rules lets us take these generalizations further. Therefore we propose four rules for arithmetic subterm generalization, which are illustrated in a single example in Figure 1. Since we do not want the applicability of our generalization rules to depend on associativity and commutativity (AC), we will formulate them modulo AC. For this purpose we introduce the following notation. We use C[t]AC to denote a clause that contains the subterm t modulo AC. Further, we use C[t']AC to denote the same clause with all occurrences of t modulo AC replaced by t'. The first rule deals with the case where a clause contains a sum with a variable as a summand:

    C[x + t1 + . . . + tn]AC
    ------------------------
    C[x]AC

where x : σ for some σ ∈ {σZ, σQ, σR}, all occurrences of x are in the subterm x + t1 + . . . + tn (modulo AC), and x is not a subterm of any ti. Such a sum can be generalized by applying the substitution {x → x − t1 − . . . − tn} and simplifying the result.

Numeral Multiplication Generalization. In the second rule we generalize a product that contains one variable occurring only once in this product, by dropping its numeral factor k. Its soundness is justified by the substitution {x → x / k}. In the third rule we generalize subterms that are products of variables containing redundant variables; the rule is sound since we can replace xi by 1. The last rule lets us generalize away redundant powers of variables; its soundness is guaranteed by the fact that, for real numbers, the co-domains of x^n and x^k are the same. All of the above rules produce a result that is smaller with respect to any simplification ordering due to the removal of terms, justifying their implementation as immediate simplifications.

Evaluation. As mentioned above, reasoning with arithmetic often requires us to be able to evaluate terms: evaluations such as 3 + 3 =⇒ 6 and f(x) + 0 =⇒ f(x) are straightforward, but we also want to support evaluations such as (3t + x) + 2t =⇒ 5t + x for variable x and arbitrary term t. We introduce a new method for this (replacing a previous ad-hoc method implemented in Vampire). The general idea is to first rewrite terms into a special normal form, apply simplifying steps that preserve this form, and then denormalise to obtain standard terms again. We describe the three steps in detail below.

Normalization. This step removes the need to take care of reordering and bracketing of terms. Our general normal form is a sum of monomials k1 · s1 + . . . + kn · sn, where each ki is a numeral, each si is a product of non-numeral factors sorted with respect to ≺1, and the summands themselves are sorted with respect to ≺2. To get to this normal form we rewrite −t as −1 · t, rewrite t / c as t · (1/c), rewrite t as 1 · t where necessary, and sort with respect to ≺1 and ≺2. Both relations ≺1 and ≺2 need to be strict total orderings, on terms and on ≺1-sorted lists of terms respectively. Vampire uses so-called aggressive sharing for terms, meaning that for each distinct term there is at most one instance present in memory, and copies are made by copying the term's id. Hence we can define ≺1 as comparing the ids of two terms. We use the same approach for ≺2.

Simplification. Once in normal form, terms can be simplified by joining coefficients for identical terms and removing terms multiplied by zero: summands k1 · s and k2 · s with the same non-numeral part s are joined into (k1 + k2) · s, and summands of the form 0 · s are removed. If we would thereby generate an empty sum, we simplify to 0 instead.
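The following is a small self-contained sketch of this normalise–simplify–denormalise pipeline, on a toy tuple-based term representation of our own (function names are illustrative, not Vampire's; it also covers the denormalisation step described below):

```python
# Normalise a term into sorted (coefficient, monomial) pairs, merge and drop
# summands, then rebuild an ordinary term.  Toy representation only.
from fractions import Fraction
from itertools import groupby

# Terms: numbers, strings (atoms such as variables or f(x)), or tuples
# ("+", a, b), ("*", a, b), ("-", a), ("/", a, k).

def monomials(t):
    """Flatten t into a list of (coefficient, sorted tuple of atomic factors)."""
    if isinstance(t, (int, Fraction)):
        return [(Fraction(t), ())]
    if isinstance(t, str):
        return [(Fraction(1), (t,))]
    op = t[0]
    if op == "+":
        return monomials(t[1]) + monomials(t[2])
    if op == "-":                                 # -s  ~~>  -1 * s
        return [(-c, m) for c, m in monomials(t[1])]
    if op == "/":                                 # s / k  ~~>  s * (1/k)
        return [(c / Fraction(t[2]), m) for c, m in monomials(t[1])]
    if op == "*":                                 # multiply out both factor lists
        return [(c1 * c2, tuple(sorted(m1 + m2)))
                for c1, m1 in monomials(t[1])
                for c2, m2 in monomials(t[2])]
    raise ValueError(t)

def simplify(ms):
    """Join coefficients of identical monomials; drop zero summands."""
    ms = sorted(ms, key=lambda cm: cm[1])
    out = []
    for mono, group in groupby(ms, key=lambda cm: cm[1]):
        coeff = sum(c for c, _ in group)
        if coeff != 0:
            out.append((coeff, mono))
    return out or [(Fraction(0), ())]             # empty sum simplifies to 0

def denormalise(ms):
    """Rebuild a left-nested term, dropping redundant 1-coefficients."""
    def product(coeff, mono):
        if isinstance(coeff, Fraction) and coeff.denominator == 1:
            coeff = coeff.numerator
        factors = ([] if coeff == 1 and mono else [coeff]) + list(mono)
        term = factors[0]
        for f in factors[1:]:
            term = ("*", term, f)
        return term
    term = product(*ms[0])
    for cm in ms[1:]:
        term = ("+", term, product(*cm))
    return term

# (3t + x) + 2t  ==>  5t + x   (up to the chosen summand order)
print(denormalise(simplify(monomials(("+", ("+", ("*", 3, "t"), "x"), ("*", 2, "t"))))))
```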
All of these steps can be implemented in linear time and in a bottom-up manner, since we can rely firstly on the summands being sorted by their non-numeral parts, and secondly on the numeral part of a product being in a fixed position.

Denormalisation. Finally, as the normal form contains redundant information (such as 1 · t + . . . instead of t + . . .), we need to denormalise, e.g. rewriting 1 · t back to t and −1 · t back to −t. We define the rule eval to be the chain of normalising, simplifying and denormalising a clause in a bottom-up manner, which is only applied if the step of simplification is successful for some subterm. The reason for not always applying the rules is to prevent arbitrary reordering of sums and products, which in many cases leads to conclusions being bigger than the premise. This can have significant consequences beyond perturbing proof search. Consider a scenario involving the Demodulation rule (see Sec. 2) in which a clause is rewritten and evaluation then merely reorders the result back into the original clause: this process would repeat itself ad infinitum, as the initial clause is deleted and replaced by an identical clause. Evaluation would violate the side-condition that should have prevented this if we did not insist on the step of simplification being successful for the rule to be applied. In most cases this inference rule is a true simplification wrt. our simplification ordering, since we eliminate at least one symbol in each of the cases of the simplification step. Because normalisation sometimes generates bigger terms, as in the case x + x =⇒ 1 · x + 1 · x =⇒ 2 · x, we sometimes violate the simplification ordering. Since these cases do not occur too frequently, and completeness is not attainable in our base theories anyway, we ignore these violations. During experimentation, we discovered many cases where a unary minus blocks our evaluation rule: the desired derivation first pushes the unary minus below a sum, rewriting a subterm −(y + t) to −y + −t, so that evaluation can then combine the resulting summands with other occurrences of y and t. The ordering problem this raises, and our fix to the Knuth-Bendix ordering yielding the additional rule push-minus, are described in the note following the references.

Cancellation. The motivation for our last rule was two-fold. Firstly, evaluation of constant predicates can be helpful in some cases but fails in seemingly trivial ones. One example is the redundant literal 4x + 3 < 4x + 10. The simple approach of evaluating interpreted predicates fails since we are dealing with non-ground symbols. However, by cancelling away the 4x on both sides of the inequality, the literal can be reduced to the ground literal 3 < 10, which can then be evaluated. The second motivation was cases where unification with abstraction yields literals to which gve could almost be applied but which first require a step of cancellation. In order to resolve both of these cases we propose the inference rule cancel, which consists of two symmetric cases depending on which side of the (in)equality is cancelled. In order for the rule not to be sensitive to associativity and commutativity, we perform the same steps of normalisation and denormalisation as for the rule eval. Again, we will only simplify a clause if cancellation itself, and not only normalisation and denormalisation, is applicable. The rule is a simplification rule since the number of symbols is reduced with (almost) every application of the cancellation.

Table 1. Compares the number of problems solved with any configuration where a new option is enabled against those where it is disabled, with a runtime of 10 seconds. The column "both" lists how many were solved in both cases. The columns "on" and "off" list how many additional problems could be solved only with the option enabled, or only with it disabled, respectively.

Experimental Evaluation. We describe two experiments to establish the impact of the new rules.
The first experiment compares the new rules to each other, whilst the second experiment aims to determine how helpful the new rules will be in designing extensions to Vampire's portfolio mode. This is a standard approach to evaluating the benefit of new features in an automated theorem prover [18].

Experimental Setup. We implemented the rules as immediate simplification rules in Vampire 4.5 (the implementation is available from the GitHub repository linked from the Vampire website [1], on the branch integer-arithemtic). We selected a suitable subset of problems as follows. We started with the set of 56,210 problems from SMT-LIB that involve quantifiers and arithmetic. In a first step we filtered out benchmarks that Vampire could solve within 1 second both in default mode (which involves a simpler version of the rule eval) and in default mode with eval enabled. Our main experiments were carried out on the remaining set of 21,512 benchmarks, which we will refer to as B. Filtering out trivial benchmarks avoids the results containing noise from benchmarks that can easily be solved, and is an approach recently adopted by SMT-COMP [22]. Experiments were run on a Linux cluster where each node contains two octa-core 2.1 GHz Intel Xeon processors and 160 GB of RAM. The raw results of our experiments can be found on GitHub.

Experiment 1. In our first experiment we wanted to find out which are the best combinations of new rules, and whether the rules themselves have a positive impact on proof search. Therefore we ran Vampire in each of the 32 configurations C resulting from enabling or disabling each of the 5 groups of rules (asg, gve, eval, push-minus, and cancel) over B with a timeout of 10 seconds. The results are given in Table 1, showing the total number of problems solved and the problems gained/lost when compared to the default mode with no options set. Each row represents the combination (union) of the 16 strategies where that option is turned on. This shows that, with the exception of evaluation, the gains outweigh the losses, sometimes considerably. This result for evaluation tells us that the other rules can still operate effectively without our new evaluation and, further, that the two evaluation methods are in some sense complementary. Therefore, whilst we explore this further, we will keep both evaluation methods. The most significant gains are with cancellation, which may be related to the fact that it is applicable to inequalities as well as equalities.

Greedy Ranking. Another way of looking at the results of Experiment 1 is to create a greedy ranking of all configurations C, starting with the set of all configurations, ranking the configuration solving the most benchmarks in B as the best, ranking the one that solves most of the remaining benchmarks as second, and so on. The top 10 strategies in this ranking are given in Table 2. The overall best strategy uses all 5 of the new rules. Interestingly, the second best strategy only uses the gve rule. This ranking indicates the most promising strategies to use in our next experiment.

Experiment 2. In our second experiment we wanted to see how many new problems we can solve with the new simplification rules compared to our current best effort in Vampire 4.5. Therefore we ran Vampire with the three top ranking configurations of Experiment 1 forced on top of Vampire's portfolio mode. The portfolio mode executes a sequence of strategies heuristically chosen based on problem features.
Forcing a configuration of new options on top of this forces each strategy to make use of the new options. We ran this experiment over B with a timeout of 200 seconds. Results are given in Table 3 and show that the new rules allow Vampire to solve considerably more problems (1052) than it could before, whilst losing relatively few (47). The best configuration of options (all five new rules) solves the most, with the other two configurations solving roughly the same. The interesting point here is that they remain complementary, solving a large number of problems uniquely. These are the exact conditions we require for producing a new, powerful portfolio mode. It is likely that performance will improve even further when also considering other option combinations. Finally, Table 4 compares the number of problems solved by any of the three top strategies (referred to as Vampire*) against Vampire 4.5, Z3 [7] and Cvc4 [4]. Results are further separated by the logic to which the benchmarks belong: A stands for Arrays, UF for Uninterpreted Functions, DT for Data Types, L for Linear, N for Non-linear, I for Integers, R for Reals, with the final A standing for Arithmetic. Here we notice that the new rules make a considerable impact in the case of pure linear real arithmetic. This is likely due to the fact that the asg rules allow us to fully generalise away most linear terms, and gve will be broadly applicable without uninterpreted functions. It is interesting to note that, whilst the new Vampire solves fewer problems than Cvc4 and Z3 overall, it solves many problems (1056 and 1350, respectively) that those provers do not solve. The most striking result is that we can solve 90 new problems that neither Vampire 4.5 nor either of the state-of-the-art SMT solvers could solve.

Conclusion. We have motivated and introduced five new simplification rules for reasoning in the theory of arithmetic within saturation-based first-order theorem provers. These rules were implemented within the Vampire theorem prover and demonstrated to improve its reasoning power on problems taken from SMT-LIB. It remains future work to explore the ideal combinations of these rules and existing proof search heuristics. It also remains an open question whether we can design an evaluation rule and a modified simplification ordering that ensure that every evaluation we want to perform is a true simplification. As demonstrated, this is not necessary pragmatically, but it would be satisfying theoretically.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
References.
- Superposition modulo linear arithmetic SUP(LA)
- Refutational theorem proving for hierarchic first-order theories
- Proceedings of the 23rd International Conference on Computer Aided Verification, number 6806 in Lecture Notes in Computer Science
- Hierarchic Superposition With Weak Abstraction
- On deciding satisfiability by theorem proving with speculative inferences
- Z3: an efficient SMT solver
- Rewriting
- Theory instantiation
- Layered clause selection for theory reasoning
- Selecting the selection
- Simple word problems in universal algebra
- Integrating linear arithmetic into superposition calculus
- First-order theorem proving and Vampire
- Computing small clause normal forms
- AVATAR modulo theories
- Set of support for theory reasoning
- The challenges of evaluating a new feature in Vampire
- New techniques in clausal form generation
- Unification with abstraction and theory instantiation in saturation-based reasoning
- A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic
- The SMT competition

Table 4. Comparing our new approach, Vampire*, against Vampire 4.5, Cvc4, and Z3, with results separated by logic. The notation (+a, −b) means that the solver solved a problems the new Vampire could not solve, and the new Vampire solved b problems the other solver could not. The entries a(b) in the column Vampire* list the number a of problems that could be solved with our new rules, and b the number of these problems that could not be solved by any of the other solvers.

Note on the Knuth-Bendix ordering (cf. the unary-minus discussion in Sec. 5). Rewriting −(y + t) to −y + −t is not currently possible as a simplification: the weight of −y + −t is 5, which is larger than the weight of −(y + t), meaning the second step of the desired derivation is not a simplification. We introduce a simple fix by modifying the weight function and symbol precedence of the Knuth-Bendix ordering as follows:
1. We let − have weight 0 (for every sorted version of −).
2. We let − be the largest symbol among the symbols of its sort.
As a result we can use the rewrite rule −(x + y) =⇒ −x + −y as an additional simplification rule, since the right-hand side has the same weight as the left-hand side, but −, the outermost symbol on the left-hand side, has higher precedence than +, the one on the right-hand side.
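As a worked check of this claim (our own calculation, assuming the usual convention that variables and all remaining symbols have weight 1):

```latex
% Before the modification (every symbol and variable has weight 1):
%   w(-(y+t)) = 1+1+1+1 = 4,   w(-y + -t) = 1+1+1+1+1 = 5,
% so pushing the minus inwards increases the weight.
% After setting w(-) = 0:
\begin{align*}
  w\bigl(-(y+t)\bigr)      &= 0 + 1 + 1 + 1 = 3,\\
  w\bigl((-y) + (-t)\bigr) &= 1 + 0 + 1 + 0 + 1 = 3.
\end{align*}
% The weights are now equal, so KBO falls back to comparing the head symbols;
% with $-$ made the largest symbol of its sort we have $- \succ +$ in the
% precedence, hence $-(y+t) \succ (-y) + (-t)$ and the rewrite is a genuine
% simplification.
```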