key: cord-0060380-mma82iba
authors: Ghilardi, Silvio; Gianola, Alessandro; Kapur, Deepak
title: Interpolation and Amalgamation for Arrays with MaxDiff
date: 2021-03-23
journal: Foundations of Software Science and Computation Structures
DOI: 10.1007/978-3-030-71995-1_14
sha: bfcb1c180ece6afad0271e5846da7d63fc81bac3
doc_id: 60380
cord_uid: mma82iba

In this paper, the theory of McCarthy’s extensional arrays enriched with a maxdiff operation (this operation returns the biggest index where two given arrays differ) is proposed. It is known from the literature that a diff operation is required for the theory of arrays in order to enjoy the Craig interpolation property at the quantifier-free level. However, the diff operation introduced in the literature is merely instrumental to this purpose and has only a purely formal meaning (it is obtained from the Skolemization of the extensionality axiom). Our maxdiff operation significantly increases the level of expressivity; however, obtaining interpolation results for the resulting theory becomes a surprisingly hard task. We obtain such results via a thorough semantic analysis of the models of the theory and of their amalgamation properties. The results are modular with respect to the index theory and it is shown how to convert them into concrete interpolation algorithms via a hierarchical approach.

Since McMillan's seminal papers [31, 32], interpolation has been successfully applied in software model checking, also in combination with orthogonal techniques like PDR [38] or k-induction [29]. Interpolation techniques are attractive because they allow one to discover, in a completely automatic way, new atoms (often improperly called 'predicates') that might contribute to the construction of invariants. In fact, software model-checking problems are typically infinite-state, so invariant synthesis may require introducing formulae whose search is not finitely bounded. One way to discover them is to analyze spurious error traces; for instance, if the system under examination (described by a transition formula Tr(x, x')) cannot reach an error configuration in U(x) in n steps starting from an initial configuration in In(x), this means that the formula

$$\mathit{In}(x_0) \wedge \mathit{Tr}(x_0, x_1) \wedge \cdots \wedge \mathit{Tr}(x_{n-1}, x_n) \wedge U(x_n) \qquad (1)$$

is inconsistent (modulo a suitable theory T). From the inconsistency proof, by computing an interpolant, say at the i-th iteration, one can produce a formula φ(x) such that, modulo T, we have

$$\mathit{In}(x_0) \wedge \bigwedge_{j=1}^{i} \mathit{Tr}(x_{j-1}, x_j) \vdash_T \phi(x_i) \quad\text{and}\quad \phi(x_i) \wedge \bigwedge_{j=i+1}^{n} \mathit{Tr}(x_{j-1}, x_j) \wedge U(x_n) \vdash_T \bot .$$

This formula (and the atoms it contains) can contribute to the refinement of the current candidate loop invariant guaranteeing safety. This fact can be exploited in very different ways during invariant search, depending on the techniques employed. It should be noticed, however, that interpolants are not unique and that different interpolation algorithms may return interpolants of different quality: all interpolants restrict the search, but not all of them are necessarily conclusive.

The third author has been partially supported by the National Science Foundation CCF award 1908804.

This application of interpolation differs from the role interpolants play in analyzing the proof theory of various logics, starting with the pioneering works [15, 24, 34]. It should be said, however, that Craig's interpolation theorem in first-order logic does not by itself give any information on the shape of the interpolant when a specific theory is involved. Nevertheless, this shape is crucial for applications: when we extract an interpolant from a trace like (1), we are typically handling a theory which might be undecidable, but whose quantifier-free fragment is decidable for satisfiability (usually within a 'reasonable' computational complexity). Thus, it is desirable (although not always possible) that the interpolant be quantifier-free, which is not guaranteed in general. This is why much effort has been devoted to analyzing quantifier-free interpolation, also exploiting its connection to semantic properties like amalgamation and strong amalgamation (see [9] for comprehensive results in the area).

The specific theories we want to analyze in this paper are variants of McCarthy's theory of arrays [30] with extensionality (see Section 3 below for a detailed description). The main operations considered in this theory are the write operation (i.e. the array update) and the read operation (i.e. the access to the content of an array cell). As such, this theory is suitable to formalize programs over arrays, such as the standard copying, comparing, searching and sorting functions; verification problems of this kind are collected in the SV-COMP benchmark category "ReachSafety-Arrays", where safety verification tasks involving arrays of finite but unknown length are considered.

By itself, the theory of arrays with extensionality does not have quantifier-free interpolation [28]; however, in [8] it was shown that quantifier-free interpolation is restored if one enriches the language with a binary function Skolemizing the extensionality axiom (the result was confirmed, via different interpolation algorithms, in [23, 37]). Such a Skolem function, applied to two array variables a, b, returns an index diff(a, b) where a, b differ (it returns an arbitrary value if a is equal to b). This semantics for the diff operation is highly underdetermined and does not have a significant interpretation in concrete programs. That is why we propose to modify it in order to give it a definite and natural meaning: we require diff(a, b) to return the biggest index where a, b differ (in case a = b, we require diff(a, b) to be the minimum index 0). Since it is natural to view arrays as functions defined on initial intervals of the nonnegative integers, this choice has a clear semantic motivation. The expressive power of the theory of arrays so enriched becomes bigger: for instance, if we also add to the language a constant symbol ε for the undefined array, constantly equal to some 'undefined' value ⊥ (where ⊥ is meant to be different from the values a[i] actually in use), then we can define |a| as diff(a, ε). In this way we can model the fact that a is undefined outside the interval [0, |a|]; this is useful to formalize the above-mentioned SV-COMP benchmarks.

The effectiveness of quantifier-free interpolation in the theory of arrays with maxdiff is exemplified by the simple example of Figure 1: the invariant certifying the assert in line 7 of the Strcpy algorithm can be obtained by taking a suitable quantifier-free interpolant out of the spurious trace (1) already for n = 2. In more realistic examples, as witnessed by current research [2, 3, 4, 5, 16, 22, 25, 13], it is quite clear that useful invariants require universal quantifiers to be expressed and, if undecidable fragments are entered, incomplete solvers must be used. However, even in such circumstances, quantifier-free interpolation does not lose its interest: for instance, the tool Booster [5] synthesizes universally quantified invariants out of quantifier-free interpolants (quantifier-free interpolation problems are generated by negating and Skolemizing universally quantified formulae arising during invariant search; see [4] for details).

Fig. 1. Strcpy function: code and associated transition system (the program counter is omitted in the latter for simplicity).

Proving that the theory of arrays with the above 'maxdiff' operation enjoys quantifier-free interpolation turned out to be a surprisingly difficult task. In the end, the interpolation algorithm we obtain resembles the interpolation algorithms generated via the hierarchic locality techniques introduced in [35, 36] and also employed in [37]; however, its correctness, completeness and termination proofs require a long detour through non-trivial model-theoretic arguments (these arguments would not be substantially simplified by adopting the complex framework of 'amalgamation closures' and 'W-separability' of [37], which is why we preferred to supply direct proofs).

This paper concentrates on theoretical and methodological results rather than on experimental aspects. It is almost entirely dedicated to the correctness and completeness proof of our interpolation algorithm: in Subsection 3.1 we summarize our proof plan and supply basic intuitions. The paper is structured as follows: in Section 2 we recall some background; in Section 3 we introduce our theory of arrays with maxdiff; Sections 4 and 5 supply the semantic proof of the amalgamation theorem; Sections 6 and 7 are dedicated to the algorithmic aspects, whereas Section 8 analyzes complexity for the restricted case where indexes are constrained by the theory of total orders. In the final Section 9, we mention some still open problems. The main results of the paper are Theorems 2, 4 and 5: for space reasons, the proofs of these theorems are only sketched; full details are supplied in the extended version available online [21]. This extended version contains additional material on complexity analysis and implementation. It also contains a proof of the nonexistence of uniform interpolants (see [26, 27, 20, 10, 11, 12] for the definition and more information on uniform interpolants).

We assume the usual syntactic (e.g., signature, variable, term, atom, literal, formula, and sentence) and semantic (e.g., structure, sub-structure, truth, satisfiability, and validity) notions of (possibly many-sorted) first-order logic. The equality symbol "=" is included in all signatures considered below. Notations like E(x) mean that the expression (term, literal, formula, etc.) E contains free variables only from the tuple x. A 'tuple of variables' is a list of variables without repetitions and a 'tuple of terms' is a list of terms (possibly with repetitions). Finally, whenever we use a notation like E(x, y), we implicitly assume not only that the variables in x and those in y are pairwise distinct, but also that x and y are disjoint. A constraint is a conjunction of literals. A formula is universal (existential) iff it is obtained from a quantifier-free formula by prefixing it with a string of universal (existential, resp.) quantifiers.

Theories and satisfiability modulo theory. A theory T is a pair (Σ, Ax_T), where Σ is a signature and Ax_T is a set of Σ-sentences, called the axioms of T (we shall sometimes write directly T for Ax_T). The models of T are those Σ-structures in which all the sentences in Ax_T are true. A Σ-formula φ is T-satisfiable (or T-consistent) if there exists a model M of T such that φ is true in M under a suitable assignment a to the free variables of φ (in symbols, (M, a) ⊨ φ); it is T-valid (in symbols, T ⊢ φ) if its negation is T-unsatisfiable or, equivalently, φ is provable from the axioms of T in a complete calculus for first-order logic. A theory T = (Σ, Ax_T) is universal iff all sentences in Ax_T are universal. A formula φ_1 T-entails a formula φ_2 if φ_1 → φ_2 is T-valid (in symbols, φ_1 ⊢_T φ_2, or simply φ_1 ⊢ φ_2 when T is clear from the context). If Γ is a set of formulae and φ a formula, Γ ⊢_T φ means that there are γ_1, ..., γ_n ∈ Γ such that γ_1 ∧ ··· ∧ γ_n ⊢_T φ. The satisfiability modulo the theory T (SMT(T)) problem amounts to establishing the T-satisfiability of quantifier-free Σ-formulae (equivalently, the T-satisfiability of Σ-constraints). A theory T admits quantifier elimination iff for every formula φ(x) there is a quantifier-free formula φ'(x) such that T ⊢ φ ↔ φ'.

Some theories have special names, which are becoming standard in the SMT literature; for instance, EUF(Σ) is the pure equality theory in the signature Σ (commonly abbreviated as EUF if there is no need to specify the signature Σ). More standard theory names will be recalled throughout the paper.

Embeddings and sub-structures. The support of a structure M is denoted with |M|. For a (sort, function, relation) symbol σ, we denote as σ^M the interpretation of σ in M. An embedding is a homomorphism that preserves and reflects relations and operations (see, e.g., [14]). Formally, a Σ-embedding (or, simply, an embedding) between two Σ-structures M and N is any mapping µ : |M| → |N| satisfying the following three conditions: (a) it is a (sort-preserving) injective function; (b) it is an algebraic homomorphism, that is, for every n-ary function symbol f and for every a_1, ..., a_n ∈ |M|, we have f^N(µ(a_1), ..., µ(a_n)) = µ(f^M(a_1, ..., a_n)); (c) it preserves and reflects predicates, i.e. for every n-ary predicate symbol P, we have (a_1, ..., a_n) ∈ P^M iff (µ(a_1), ..., µ(a_n)) ∈ P^N. If |M| ⊆ |N| and the embedding µ : M → N is just the identity inclusion |M| ⊆ |N|, we say that M is a substructure of N or that N is a superstructure of M. As is well known, the truth of a universal (resp. existential) sentence is preserved through substructures (resp. superstructures).
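Conditions (a)-(c) can be checked mechanically on finite structures. The following Python sketch is a toy, one-sorted illustration (an assumption: the text allows many-sorted structures), and the helper `is_embedding` together with its dictionary encoding of structures is hypothetical: function symbols map to (arity, callable) pairs and predicate symbols map to (arity, set of tuples).

```python
from itertools import product
from typing import Callable, Dict, Hashable, Set, Tuple

Funcs = Dict[str, Tuple[int, Callable[..., Hashable]]]
Preds = Dict[str, Tuple[int, Set[Tuple[Hashable, ...]]]]


def is_embedding(mu: Dict[Hashable, Hashable], support_m: Set[Hashable],
                 funcs_m: Funcs, funcs_n: Funcs,
                 preds_m: Preds, preds_n: Preds) -> bool:
    """Check conditions (a)-(c) for a map mu from a finite one-sorted M into N."""
    # (a) mu is injective on |M|
    if len({mu[x] for x in support_m}) != len(support_m):
        return False
    # (b) homomorphism: f^N(mu(a1), ..., mu(an)) = mu(f^M(a1, ..., an))
    for name, (arity, f_m) in funcs_m.items():
        f_n = funcs_n[name][1]
        for args in product(support_m, repeat=arity):
            if f_n(*(mu[a] for a in args)) != mu[f_m(*args)]:
                return False
    # (c) every predicate is both preserved and reflected
    for name, (arity, p_m) in preds_m.items():
        p_n = preds_n[name][1]
        for args in product(support_m, repeat=arity):
            if (args in p_m) != (tuple(mu[a] for a in args) in p_n):
                return False
    return True


# Toy structures: ({0, 1}, max, <=) and ({0, 1, 2}, max, <=)
M = {0, 1}
N = {0, 1, 2}
funcs = {"max": (2, max)}
le_m = {"le": (2, {(a, b) for a in M for b in M if a <= b})}
le_n = {"le": (2, {(a, b) for a in N for b in N if a <= b})}
print(is_embedding({0: 0, 1: 2}, M, funcs, funcs, le_m, le_n))  # True
print(is_embedding({0: 1, 1: 0}, M, funcs, funcs, le_m, le_n))  # False: (b) fails
```

The swapped map is injective but breaks condition (b), since max(µ(0), µ(1)) = 1 while µ(max(0, 1)) = 0.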

A theory T is stably infinite iff every T-satisfiable quantifier-free formula (from the signature of T) is satisfiable in an infinite model of T. By compactness, it is possible to show that T is stably infinite iff every model of T embeds into an infinite one (see, e.g., [17]). A theory T is convex iff for every conjunction of literals δ, if δ ⊢_T ⋁_{i=1}^{n} x_i = y_i, then δ ⊢_T x_i = y_i holds for some i ∈ {1, ..., n}. Let T_i be a stably infinite theory over the signature Σ_i such that the SMT(T_i) problem is decidable for i = 1, 2, and such that Σ_1 and Σ_2 are disjoint (i.e. the only shared symbol is equality). Under these assumptions, the Nelson-Oppen combination result [33] says that the SMT problem for the combination T_1 ∪ T_2 of the theories T_1 and T_2 is decidable.

Interpolation properties. Craig's interpolation theorem [14] roughly states that if a formula φ implies a formula ψ then there is a third formula θ, called an interpolant, such that φ implies θ, θ implies ψ, and every non-logical symbol in θ occurs both in φ and ψ. Our interest is to specialize this result to the computation of quantifier-free interpolants modulo (combinations of) theories.

Definition 1 (Plain quantifier-free interpolation). A theory T admits (plain) quantifier-free interpolation (or, equivalently, has quantifier-free interpolants) iff for every pair of quantifier-free formulae φ, ψ such that ψ ∧ φ is T-unsatisfiable, there exists a quantifier-free formula θ, called an interpolant, such that: (i) ψ T-entails θ, (ii) θ ∧ φ is T-unsatisfiable, and (iii) only the variables occurring in both ψ and φ occur in θ.
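Definition 1 can be made concrete in the simplest possible setting. The sketch below is an assumption-laden toy: it replaces the theory T by classical propositional logic (so variables are propositional letters and T-satisfiability is checked by truth tables), and the helper `is_interpolant` is hypothetical. It only illustrates conditions (i)-(iii); it is not one of the interpolation algorithms discussed later.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Set

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def assignments(variables: Set[str]) -> Iterator[Assignment]:
    """Enumerate all truth assignments to the given propositional variables."""
    names = sorted(variables)
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def is_interpolant(psi: Formula, psi_vars: Set[str],
                   phi: Formula, phi_vars: Set[str],
                   theta: Formula, theta_vars: Set[str]) -> bool:
    """Check conditions (i)-(iii) of Definition 1 by brute force."""
    # (iii) theta may only mention variables shared by psi and phi
    if not theta_vars <= (psi_vars & phi_vars):
        return False
    for v in assignments(psi_vars | phi_vars | theta_vars):
        if psi(v) and not theta(v):   # (i) fails: psi does not entail theta
            return False
        if theta(v) and phi(v):       # (ii) fails: theta and phi are jointly satisfiable
            return False
    return True


def psi(v: Assignment) -> bool:       # p AND q
    return v["p"] and v["q"]


def phi(v: Assignment) -> bool:       # (NOT q) AND r
    return (not v["q"]) and v["r"]


def theta(v: Assignment) -> bool:     # candidate interpolant: q
    return v["q"]


print(is_interpolant(psi, {"p", "q"}, phi, {"q", "r"}, theta, {"q"}))  # True
```

Here q is the only shared letter, and it already separates the two sides; the candidate p would violate (iii) and the constant True would violate (ii).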

In verification, the following extension of Definition 1 is considered more useful.

Definition 2 (General quantifier-free interpolation). Let T be a theory in a signature Σ; we say that T has the general quantifier-free interpolation property iff for every signature Σ' (disjoint from Σ) and for every pair of ground Σ ∪ Σ'-formulae φ, ψ such that φ ∧ ψ is T-unsatisfiable, there is a ground formula θ such that: (i) φ T-entails θ; (ii) θ ∧ ψ is T-unsatisfiable; (iii) all relations, constants and function symbols from Σ' occurring in θ also occur in φ and ψ.

By replacing free variables with free constants, it should be clear that general quantifier-free interpolation (Definition 2) implies plain quantifier-free interpolation (Definition 1); however, the converse implication does not hold.

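Amalgamation and strong amalgamation. Interpolation can be characterized semantically via amalgamation. Recall that a universal theory T has the amalgamation property iff, whenever M_1 and M_2 are models of T sharing a common substructure A, there exist a model M of T and embeddings µ_1 : M_1 → M and µ_2 : M_2 → M whose restrictions to |A| coincide. A universal theory T has the strong amalgamation property if the embeddings µ_1, µ_2 and the model M can be chosen so as to satisfy the following additional condition: if µ_1(m_1) = µ_2(m_2) holds for some m_1 ∈ |M_1| and m_2 ∈ |M_2|, then there exists an element a in |A| such that m_1 = a = m_2.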

The first statement of the following theorem is an old result due to [6]; the second statement is proved in [9] (where it is also suitably reformulated for theories which are not universal):

Theorem 1. Let T be a universal theory. Then (i) T has the amalgamation property iff it admits quantifier-free interpolants; (ii) T has the strong amalgamation property iff it has the general quantifier-free interpolation property.

We underline that, in presence of stable infiniteness, strong amalgamation is a modular property (in the sense that it transfers to signature-disjoint unions of theories), whereas amalgamation is not (see again [9] for details).

The McCarthy theory of arrays [30] has three sorts ARRAY, ELEM, INDEX (called "array", "element", and "index" sort, respectively) and two function symbols rd ("read") and wr ("write") of appropriate arities; its axioms are:

$$\forall y, i, e.\ \mathit{rd}(\mathit{wr}(y, i, e), i) = e \qquad\qquad \forall y, i, j, e.\ i \neq j \rightarrow \mathit{rd}(\mathit{wr}(y, i, e), j) = \mathit{rd}(y, j)$$

The McCarthy theory of arrays with extensionality has the further axiom

$$\forall x, y.\ x \neq y \rightarrow \exists i.\ \mathit{rd}(x, i) \neq \mathit{rd}(y, i) \qquad (2)$$

called the 'extensionality' axiom. The theory of arrays with extensionality is not universal, and quantifier-free interpolation fails for it [28]. In [8] a variant of the McCarthy theory of arrays with extensionality, obtained by Skolemizing the extensionality axiom, is introduced. This variant turns out to be universal and to enjoy quantifier-free interpolation. However, the Skolem function introduced in [8] is generic; here we want to make it more informative, so that it returns the biggest index where two different arrays differ. To locate our contribution in the general context, we need the notion of an index theory.

Definition 4. An index theory T_I is a mono-sorted theory (let INDEX be its sort) satisfying the following conditions:
- T_I is universal, stably infinite and has the general quantifier-free interpolation property (i.e. it is strongly amalgamable, see Theorem 1);
- SMT(T_I) is decidable;
- T_I extends the theory T_O of linear orderings with a distinguished element 0.

We recall that T_O is the theory whose only proper symbols (besides equality) are a binary predicate ≤ and a constant 0, subject to the axioms saying that ≤ is reflexive, transitive, antisymmetric and total (the latter means that i ≤ j ∨ j ≤ i holds for all i, j). Thus, the signature of an index theory T_I contains at least the binary relation symbol ≤ and the constant 0. In the paper, by a T_I-term, T_I-atom, T_I-formula, etc. we mean a term, atom, formula in the signature of T_I. Below, we use the abbreviation i < j for i ≤ j ∧ i ≠ j. The constant 0 is meant to separate 'formally positive' indexes (those satisfying 0 ≤ i) from the remaining 'formally negative' ones.

Examples of index theories are T_O itself, integer difference logic IDL, integer linear arithmetic LIA, and real linear arithmetic LRA. In order to match the requirements of Definition 4, however, one must make a careful choice of the language (see [9] for details): the most important detail is that integer (resp. real) division by all positive integers should be added to the language of LIA (resp. LRA). For most applications, IDL (namely the theory of integer numbers with 0, ordering, successor and predecessor) suffices, as in this theory one can model counters for scanning arrays.

Given an index theory T_I, we now introduce our array theory with maxdiff ARD(T_I) (parameterized by T_I) as follows. We still have three sorts ARRAY, ELEM, INDEX; the language includes the symbols of T_I, the read and write operations rd, wr, a binary function diff of type ARRAY × ARRAY → INDEX, as well as constants ε and ⊥ of sorts ARRAY and ELEM, respectively. The constant ⊥ models an undetermined (e.g. undefined, not in use, not coming from appropriate initialization, etc.) value and ε models the totally undefined array; the term diff(x, y) returns the maximum index where x and y differ and returns 0 if x and y are equal. Formally, the axioms of ARD(T_I) include, besides the axioms of T_I, the following ones:

In the read-over-write axiom (3), we put the proviso i ≥ 0 because we want all our arrays to be undefined on negative indexes (negative updates make no sense and have no effect: by axiom (8), reading a negative index always produces ⊥).

We call AR_ext(T_I) (the 'theory of arrays with extensionality parameterized by T_I') the theory obtained from ARD(T_I) by removing the symbol diff and by replacing the axioms (5)-(7) with the extensionality axiom (2). Since the extensionality axiom follows from axiom (5), ARD(T_I) is an extension of AR_ext(T_I).

As an effect of the above axioms, an array x is undefined outside the interval [0, |x|], where |x| is defined as |x| := diff(x, ε). Typically this interval is finite; in fact, our proof of Theorem 3 below shows that any satisfiable constraint is satisfiable in a model where all such intervals (relative to the variables involved in the constraint) are finite.
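To make the intended semantics tangible, here is a minimal Python model of arrays with maxdiff, assuming integer indexes (as in IDL) and arrays with finitely many defined cells. The names `rd`, `wr`, `diff`, `length`, `EPS` and `BOT` are illustrative stand-ins for rd, wr, diff, |·|, ε and ⊥, and encoding ⊥ as `None` is our own choice. It is a sketch of one concrete model, not a decision procedure.

```python
from typing import Dict, Hashable, Optional

BOT = None                   # encodes the 'undefined' element ⊥ (assumption: never a real value)
Array = Dict[int, Hashable]  # finite map; every index not in the map holds BOT
EPS: Array = {}              # the totally undefined array ε


def rd(a: Array, i: int) -> Optional[Hashable]:
    # negative indexes always read as BOT
    return BOT if i < 0 else a.get(i, BOT)


def wr(a: Array, i: int, e: Optional[Hashable]) -> Array:
    # updates at negative indexes have no effect
    if i < 0:
        return a
    b = dict(a)
    if e is BOT:
        b.pop(i, None)       # keep the encoding canonical, so == is extensional equality
    else:
        b[i] = e
    return b


def diff(a: Array, b: Array) -> int:
    # biggest index where a and b differ, or 0 if they are equal
    return max((k for k in a.keys() | b.keys() if rd(a, k) != rd(b, k)), default=0)


def length(a: Array) -> int:
    # |a| := diff(a, ε): the last defined index of a (0 if none)
    return diff(a, EPS)


a = wr(wr(EPS, 0, "x"), 3, "y")
print(length(a), rd(a, -1), rd(a, 5))   # 3 None None
print(diff(a, wr(a, 1, "z")))           # 1
# diff = 0 alone does not mean equality: the arrays may differ exactly at index 0
print(diff(wr(EPS, 0, "x"), EPS), wr(EPS, 0, "x") == EPS)  # 0 False
```

The last line shows why, in this model, diff(a, b) = 0 must be complemented by a check of rd at index 0 before concluding a = b.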

The next lemma is immediate from the axiomatization of ARD(T I ):

An atom of the form a = wr(b, i, e) is equivalent (modulo ARD) to

An atom of the form diff(a, b) = i is equivalent (modulo ARD(T I )) to

For our interpolation algorithm in Section 7, we need to introduce iterated diff operations, similarly to [37]. As we know, diff(a, b) returns the biggest index where a and b differ (it returns 0 if a = b). Now we want an operator that returns the last-but-one index where a, b differ (0 if a, b differ in at most one index), an operator that returns the last-but-two index where a, b differ (0 if they differ in at most two indexes), etc. Our language is already expressive enough for that, so we can introduce such operators explicitly as follows. Given array variables a, b, we define by mutual recursion the sequence of array terms b_1, b_2, ... and of index terms diff_1(a, b), diff_2(a, b), ...:
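The iterated diffs can be computed on the toy finite-array model. This Python sketch assumes the recursion b_1 = b, diff_k(a, b) = diff(a, b_k), b_{k+1} = wr(b_k, diff_k(a, b), rd(a, diff_k(a, b))); this is our reading, chosen to match the intuition that b_{k+1} agrees with a on the last k indexes where a and b differ. The helper name `iterated_diffs` is hypothetical, and the model (integer indexes, finite support, `None` for ⊥) is an assumption.

```python
from typing import Dict, Hashable, List

BOT = None                   # encodes ⊥
Array = Dict[int, Hashable]  # finite map; absent indexes hold BOT


def rd(a: Array, i: int):
    return BOT if i < 0 else a.get(i, BOT)


def wr(a: Array, i: int, e) -> Array:
    if i < 0:
        return a
    b = dict(a)
    if e is BOT:
        b.pop(i, None)
    else:
        b[i] = e
    return b


def diff(a: Array, b: Array) -> int:
    return max((k for k in a.keys() | b.keys() if rd(a, k) != rd(b, k)), default=0)


def iterated_diffs(a: Array, b: Array, n: int) -> List[int]:
    """Return [diff_1(a,b), ..., diff_n(a,b)] under the assumed recursion
    b_1 = b, diff_k(a,b) = diff(a, b_k),
    b_{k+1} = wr(b_k, diff_k(a,b), rd(a, diff_k(a,b)))."""
    result = []
    b_k = b
    for _ in range(n):
        d_k = diff(a, b_k)
        result.append(d_k)
        b_k = wr(b_k, d_k, rd(a, d_k))  # b_{k+1} now agrees with a at d_k
    return result


print(iterated_diffs({1: "p", 4: "q", 7: "r"}, {}, 5))  # [7, 4, 1, 0, 0]
print(iterated_diffs({0: "x", 5: "y"}, {}, 3))          # [5, 0, 0]
```

As with diff itself, the value 0 is ambiguous: in the second call, diff_2 = 0 reports a genuine difference at index 0, while diff_3 = 0 reports that no third difference exists.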

Intuitively, b_{k+1} is the same as b except at the last k indexes where a and b differ, at which b_{k+1} has the same value as a. A useful fact is that conjunctions of formulae of the kind ⋀_{j=1}^{l} diff_j(a, b) = k_j can be eliminated in favor of universal clauses in a language where rd is the only symbol applied to array variables. In detail:

Lemma 2. The formula ⋀_{j=1}^{l} diff_j(a, b) = k_j is equivalent modulo ARD(T_I) to the conjunction of the following five formulae:

The main result of the paper is that, for every index theory T_I, the array theory with maxdiff ARD(T_I) indexed by T_I enjoys quantifier-free interpolation, and that interpolants can be computed hierarchically by relying on a black-box quantifier-free interpolation algorithm for the weaker theory T_I ∪ EUF (the latter theory has quantifier-free interpolation because T_I and EUF are both strongly amalgamable and stably infinite, so that their union is strongly amalgamable, and because of Theorem 1). In this subsection, we supply intuitions and give a qualitative high-level view of our proofs: more technical details and full proofs can be found in [21].

The algorithm.

By general easy transformations (recalled in Section 7 below), it suffices to extract a quantifier-free interpolant out of a pair of quantifier-free formulae A, B such that (i) A ∧ B is ARD(T_I)-inconsistent; (ii) both A and B are conjunctions of flat literals, i.e. of literals which are equalities between variables, disequalities between variables or literals of the form R(x), ¬R(x), f(x) = y (where x, y are variables, R is a predicate symbol and f a function symbol). Let us call common the variables occurring in both A and B. The existence of a quantifier-free interpolant intuitively means that there are two reasoners (an A-reasoner operating on formulae involving only the variables occurring in A, and a B-reasoner operating on formulae involving only the variables occurring in B) that are able to discover the inconsistency of A ∧ B by exchanging information on the common language, i.e. by communicating to each other only entailed quantifier-free formulae involving the common variables.
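Since the transformations to flat literals are only recalled in Section 7, here is an illustrative Python sketch of the standard flattening step (our own sketch, not the paper's code). Terms are nested tuples such as `("rd", ("wr", "a", "i", "e"), "j")`; the helper names `flatten` and `complexity` are hypothetical, and the fresh names `_v0`, `_v1`, ... are assumed not to clash with existing variables. The `complexity` helper counts function symbols, matching the notion of term complexity used in Section 6.

```python
import itertools
from typing import List, Tuple, Union

Term = Union[str, Tuple]          # a variable name, or (symbol, arg_1, ..., arg_n)
Literal = Tuple[str, Term, Term]  # ("=" or "!=", left, right)


def complexity(t: Term) -> int:
    """Number of function symbols in t (variables have complexity 0)."""
    if isinstance(t, str):
        return 0
    return 1 + sum(complexity(s) for s in t[1:])


def flatten(lhs: Term, rhs: Term, positive: bool = True,
            prefix: str = "_v") -> List[Literal]:
    """Turn lhs = rhs (or lhs != rhs) into an equisatisfiable list of flat literals."""
    fresh = (f"{prefix}{n}" for n in itertools.count())
    out: List[Literal] = []

    def name(t: Term) -> str:
        # replace a compound term by a fresh variable y, recording f(x1, ..., xn) = y
        if isinstance(t, str):
            return t
        args = tuple(name(s) for s in t[1:])
        y = next(fresh)
        out.append(("=", (t[0], *args), y))
        return y

    if positive and isinstance(rhs, str) and not isinstance(lhs, str):
        # f(t1, ..., tn) = x: only the arguments need to be named
        out.append(("=", (lhs[0], *(name(s) for s in lhs[1:])), rhs))
    else:
        x, y = name(lhs), name(rhs)
        out.append(("=" if positive else "!=", x, y))
    return out


print(flatten(("rd", ("wr", "a", "i", "e"), "j"), "d"))
# [('=', ('wr', 'a', 'i', 'e'), '_v0'), ('=', ('rd', '_v0', 'j'), 'd')]
```

Every literal produced has the shape f(x_1, ..., x_n) = y, x = y or x ≠ y, i.e. each side has complexity at most 1.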

A problem that must be addressed when designing an interpolation algorithm is that infinitely many common terms can be built out of finitely many common variables, and it may happen that some uncommon terms are recognized to be equal to common terms during the deductions performed by the A-reasoner and the B-reasoner.

As an example, suppose that A contains the literals c_1 = wr(c_2, i, e), c_1 ≠ c_2, a = wr(c_3, i, e), where only c_1, c_2, c_3 are common (i.e. only these variables occur in B). Then, using diff operations, we can deduce i = diff(c_1, c_2) and e = rd(c_1, i), so that in the end we can conclude that a is also 'common', being definable in terms of common variables. Thus, the A-reasoner must communicate to the B-reasoner (via a defining common term or in some other indirect way) any fact it discovers about a, although a was not listed among the common variables from the very beginning. In more sophisticated examples, iterated diff operations are needed to discover 'hidden' common facts.
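The deduction in this example can be sanity-checked on concrete finite arrays. The Python sketch below reuses the toy model of integer indexes, finite support and `None` for ⊥ (all assumptions), with a hypothetical helper `check_example`: it samples random instances satisfying A and confirms that i = diff(c_1, c_2), that e = rd(c_1, i), and that a is determined by the common arrays. Random testing of this kind illustrates the argument; it is not a proof.

```python
import random
from typing import Dict, Hashable

BOT = None                   # encodes ⊥
Array = Dict[int, Hashable]  # finite map; absent indexes hold BOT


def rd(a: Array, i: int):
    return BOT if i < 0 else a.get(i, BOT)


def wr(a: Array, i: int, e) -> Array:
    if i < 0:
        return a
    b = dict(a)
    if e is BOT:
        b.pop(i, None)
    else:
        b[i] = e
    return b


def diff(a: Array, b: Array) -> int:
    return max((k for k in a.keys() | b.keys() if rd(a, k) != rd(b, k)), default=0)


def check_example(trials: int = 1000, seed: int = 0) -> int:
    """Sample models of A = {c1 = wr(c2,i,e), c1 != c2, a = wr(c3,i,e)} and
    check the facts derived by the A-reasoner; return #instances checked."""
    rng = random.Random(seed)
    values = ["u", "v", "w"]
    checked = 0
    for _ in range(trials):
        c2 = {k: rng.choice(values) for k in range(rng.randint(0, 5))}
        c3 = {k: rng.choice(values) for k in range(rng.randint(0, 5))}
        i, e = rng.randint(-2, 6), rng.choice(values)
        c1 = wr(c2, i, e)
        if c1 == c2:
            continue  # the literal c1 != c2 fails, so this is not a model of A
        a = wr(c3, i, e)
        # derived facts: i = diff(c1, c2) and e = rd(c1, i)
        assert diff(c1, c2) == i and rd(c1, i) == e
        # hence a is definable from the common arrays c1, c2, c3
        assert a == wr(c3, diff(c1, c2), rd(c1, diff(c1, c2)))
        checked += 1
    return checked


print(check_example() > 0)  # True
```

Negative values of i are sampled on purpose: they make the update ineffective, so c_1 = c_2 and such samples are discarded as non-models of A.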

To cope with the above problem, our algorithm gives names i_k = diff_k(c_1, c_2) to all the iterated diffs of common array variables c_1, c_2 (the newly introduced names i_k are considered common and can be replaced back with their defining terms when the interpolants are computed at the end of the algorithm).

The second component of our algorithm is instantiation. Both the A- and the B-reasoner use the content of Lemmas 1 and 2 in order to handle atoms of the kind a = b, a_1 = wr(a_2, i, e), i = diff_k(a_1, a_2). Whenever they come across such atoms, they take into consideration the equivalent formulae supplied by these lemmas; in fact, whenever the lemmas produce universally quantified clauses of the kind ∀h C, they replace the universally quantified index variable h in C by all possible instantiations with their own index terms (these are the terms built up from index variables occurring in A for the A-reasoner, and occurring in B for the B-reasoner, respectively). Such instantiations can be read as clauses in the language of T_I ∪ EUF if we replace every array variable a by a fresh unary function symbol f_a and read terms like rd(a, i) as f_a(i).
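The instantiation step and the reading of rd(a, i) as f_a(i) can be sketched as follows. This is a hypothetical helper, not the paper's implementation: terms are nested tuples, a clause is a list of literals read disjunctively, and the clause ∀h (h = i ∨ rd(a, h) = rd(b, h)) is only an illustrative input (it is a valid consequence of a = wr(b, i, e) under the read-over-write axioms). The prefix `f_` for the fresh unary symbols is our own naming choice.

```python
from typing import Iterable, List, Tuple, Union

Term = Union[str, Tuple]
Literal = Tuple[str, Term, Term]   # (op, left, right) with op in {"=", "!="}


def substitute(t: Term, var: str, s: Term) -> Term:
    """Replace every occurrence of the variable var in t by the term s."""
    if isinstance(t, str):
        return s if t == var else t
    return (t[0], *(substitute(u, var, s) for u in t[1:]))


def to_euf(t: Term) -> Term:
    """Read rd(a, i) as f_a(i), with one fresh unary symbol per array variable a."""
    if isinstance(t, str):
        return t
    if t[0] == "rd" and isinstance(t[1], str):
        return ("f_" + t[1], to_euf(t[2]))
    return (t[0], *(to_euf(u) for u in t[1:]))


def instantiate(clause: List[Literal], bound: str,
                index_terms: Iterable[Term]) -> List[List[Literal]]:
    """All instances of the universal clause (forall bound. clause), read in EUF."""
    return [[(op, to_euf(substitute(left, bound, s)), to_euf(substitute(right, bound, s)))
             for (op, left, right) in clause]
            for s in index_terms]


clause = [("=", "h", "i"), ("=", ("rd", "a", "h"), ("rd", "b", "h"))]
for ground in instantiate(clause, "h", ["i", "j"]):
    print(ground)
# [('=', 'i', 'i'), ('=', ('f_a', 'i'), ('f_b', 'i'))]
# [('=', 'j', 'i'), ('=', ('f_a', 'j'), ('f_b', 'j'))]
```

Each ground clause now lives in the language of T_I ∪ EUF, so it can be passed to a black-box solver for that theory.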

Of course, both the production of names for iterated diff-terms and the instantiation with owned index terms need to be repeated (possibly infinitely many times); we prove, however (this is the content of our main Theorem 4 below), that if A ∧ B is ARD(T_I)-inconsistent, then sooner or later the union of the sets of clauses deduced by the A-reasoner and the B-reasoner in the restricted signature of T_I ∪ EUF is T_I ∪ EUF-inconsistent, i.e., the instantiation process terminates. This means that an interpolant can be extracted using a black-box quantifier-free interpolation algorithm for the weaker theory T_I ∪ EUF. In the simple case where T_I is just the theory T_O of total orders, we shall prove in Section 8 that a quadratic number of instantiations always suffices. In the general case, however, the situation is similar to Herbrand's theorem: finitely many instantiations suffice to get an inconsistency proof in the weaker logical formalism, but no bound can be given.

Theorem 4 is proved contrapositively: we show that if a T_I ∪ EUF-inconsistency never arises, then A ∧ B is ARD(T_I)-consistent. This is proved in two steps. First, if T_I ∪ EUF-inconsistency does not arise, we produce two ARD(T_I)-models 𝒜 and ℬ, where 𝒜 satisfies A and ℬ satisfies B; moreover, 𝒜 and ℬ are built in such a way that they share the same ARD(T_I)-substructure. In the second step, we prove the amalgamation theorem for ARD(T_I), so that the amalgamated model yields the desired model of A ∧ B. In our exposition the two steps appear in reverse order: we first prove the amalgamation theorem in Section 5 (Theorem 2) and then our main theorem in Section 7 (Theorem 4).

We first discuss the class of models of ARD(T_I) and make important clarifications about embeddings between such models. An embedding µ : M → N between AR_ext(T_I)-models is said to be diff-faithful iff whenever diff(a, b) is defined, so is diff(µ(a), µ(b)), and it is equal to µ(diff(a, b)). Since there might not be a maximum index where a, b differ, in principle it is not always possible to expand a functional model of AR_ext(T_I) to a functional model of ARD(T_I) while keeping the set of indexes unchanged. Indeed, in order to do that in a diff-faithful way, one needs to explicitly add to INDEX^M new indexes, including at least indexes representing the missing maximum indexes where two given arrays differ. This idea is used in the following lemma (proved in the extended version available online [21]):

Lemma 3. For every index theory T_I, every model of AR_ext(T_I) has a diff-faithful embedding into a model of ARD(T_I).

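We now sketch the proof of the amalgamation property for ARD(T_I). Let N, M_1, M_2 be models of ARD(T_I) and let µ_i : N → M_i (i = 1, 2) be embeddings. We recall that strong amalgamation holds for models of T_I (see Definition 4). Our aim is to build a model M of AR_ext(T_I), together with diff-faithful embeddings ν_i : M_i → M such that ν_1 ∘ µ_1 = ν_2 ∘ µ_2. If we succeed, the claim follows by Lemma 3: indeed, thanks to that lemma, we can embed M (which is a model of AR_ext(T_I)) in a diff-faithful way into a model M' of ARD(T_I), which is the required ARD(T_I)-amalgam.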

We take the T_I-reduct of M to be a model supplied by the strong amalgamation property of T_I (again, we can freely assume that the T_I-reducts of M_1, M_2 are included in it via identity inclusions); we let ELEM^M be ELEM^{M_1} ∪ ELEM^{M_2}. We need to define ν_i : M_i → M (i = 1, 2) in such a way that ν_i is diff-faithful and ν_1 ∘ µ_1 = ν_2 ∘ µ_2. We take the INDEX- and ELEM-components of ν_1, ν_2 to be just identity inclusions. The only relevant point is the action of ν_i on ARRAY^{M_i}: since we have strong amalgamation for indexes, in order to define it, it is sufficient to extend any a ∈ ARRAY^{M_i} to all the indexes k ∈ (…); the definition for such k is as follows: (*) we let ν_i(a)(k) be equal to µ_{3−i}(c)(k), where c is any array c ∈ ARRAY^N for which there is a' ∈ ARRAY^{M_i} such that a ∼_{M_i} a' and such that the relation k > diff^{M_i}(a', µ_i(c)) holds in INDEX^M; if such a c does not exist, then we put ν_i(a)(k) = ⊥.

Definition (*) is forced by some constraints that ν_i(a)(k) must satisfy. Of course, definition (*) itself needs to be justified: besides showing that it enjoys the required properties, we must also prove that it is well-defined (i.e. that it does not depend on the selected c and a'). It is easy to see that, if the definition is correct, then ν_1 ∘ µ_1 = ν_2 ∘ µ_2; also, it is clear that ν_i preserves read and write operations (hence, it is a homomorphism) and is injective. For (i) justifying the definition of ν_i and (ii) showing that it is also diff-faithful, we need to show the following two claims (the proof is not easy; see the extended version [21] for details) for arrays a_1, a_2 ∈ ARRAY^{M_1}, for an index k ∈ (INDEX^{M_2} \ INDEX^N) and for arrays c_1, c_2 ∈ ARRAY^N (checking the same facts in M_2 is symmetrical): … diff^{M_1}(a_1, a_2), then ν_1(a_1)(k) = ν_1(a_2)(k).

The key step of the interpolation algorithm that will be proposed in Section 7 depends upon the problem of checking satisfiability (modulo ARD(T I )) of quantifier-free formulae; this will be solved in the present section by adapting instantiation techniques, like those from [7] .

We define the complexity c(t) of a term t as the number of function symbols occurring in t (thus variables and constants have complexity 0). A flat literal L is a formula of the kind x_1 = x_2, x_1 ≠ x_2, R(x_1, ..., x_n), ¬R(x_1, ..., x_n) or t = x_1, where the x_i are variables, R is a relation symbol, and t is a term of complexity at most 1. If I is a set of T_I-terms, an I-instance of a universal formula of the kind ∀i φ is a formula of the kind φ(t/i) for some t ∈ I.

A pair of sets of quantifier-free formulae Φ = (Φ_1, Φ_2) is a separated pair iff (1) Φ_1 contains equalities of the form diff_k(a, b) = i and a = wr(b, i, e); moreover, if it contains the equality diff_k(a, b) = i, it must also contain an equality of the form diff_l(a, b) = j for every l < k; (2) Φ_2 contains Boolean combinations of T_I-atoms and of atoms of the forms rd(a, i) = rd(b, j), rd(a, i) = e, e_1 = e_2, where a, b, i, j, e, e_1, e_2 are variables or constants of the appropriate sorts. The separated pair is said to be finite iff Φ_1 and Φ_2 are both finite.

In practice, in a separated pair $\Phi = (\Phi_1, \Phi_2)$, we can read $rd(a, i)$ as a functional application. The formulae from $\Phi_2$ can then be translated into quantifier-free formulae of the combined theory $T_I \cup EUF$, with the array variables occurring in $\Phi_2$ converted into free unary function symbols. $T_I \cup EUF$ has a decidable quantifier-free fragment and enjoys quantifier-free interpolation because $T_I$ is an index theory (see the Nelson-Oppen results [33] and Theorem 1). We adopt a hierarchical approach (similar to [35, 36]) and rely on satisfiability and interpolation algorithms for this theory as black boxes.
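The translation of $\Phi_2$ into the signature of $T_I \cup EUF$ is a purely syntactic rewriting. The sketch below assumes the same hypothetical tuple encoding of terms as before and maps each read $rd(a, i)$ to an application of a fresh unary symbol named `'f_' + a`. The naming scheme is an illustrative choice, not taken from the paper.

```python
def to_euf(t):
    """Rewrite every read rd(a, i) into f_a(i), where f_a is a free unary
    function symbol associated with the array variable a."""
    if isinstance(t, str):
        return t
    f, *args = t
    if f == 'rd':
        a, i = args
        if not isinstance(a, str):
            # In Phi_2, rd is only applied to array variables or constants.
            raise ValueError(f"rd applied to a non-variable array term: {a!r}")
        return ('f_' + a, to_euf(i))
    return (f, *(to_euf(x) for x in args))
```

After this rewriting no array variable occurs as a term any more, so the resulting formulae can be passed to any satisfiability or interpolation procedure for $T_I \cup EUF$.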

Let $I$ be a set of $T_I$-terms and let $\Phi = (\Phi_1, \Phi_2)$ be a separated pair. We let $\Phi(I) = (\Phi_1(I), \Phi_2(I))$ be the smallest separated pair satisfying the following conditions:
- $\Phi_1(I)$ is equal to $\Phi_1$, and $\Phi_2(I)$ contains $\Phi_2$;
- $\Phi_2(I)$ contains all $I$-instances of the two formulae

where $a$ is any array variable occurring in $\Phi_1$ or $\Phi_2$;
- if $\Phi_1$ contains the atom $a = wr(b, i, e)$, then $\Phi_2(I)$ contains all the $I$-instances of the formulae (11);
- if $\Phi_1$ contains the conjunction $\bigwedge_{i=1}^{l} \mathrm{diff}_i(a, b) = k_i$, then $\Phi_2(I)$ contains the formulae (14), (15), (16), (17), as well as all $I$-instances of the formula (18).

is the set of $T_I$-terms of complexity at most $M$ built up from the index variables occurring in $\Phi_1, \Phi_2$. The full in-

and let $\Phi_2$ be empty. Then $(\Phi_1, \Phi_2)$ is a separated pair; 0-instantiating it adds the following formulae to $\Phi_2$ (we delete the redundant ones)

The following results are proved in the extended version [21] :

Lemma 4. Let $\phi$ be a quantifier-free formula. Then it is possible to compute finitely many finite separated pairs $\Phi^1 = (\Phi^1_1, \Phi^1_2), \dots, \Phi^n = (\Phi^n_1, \Phi^n_2)$ such that $\phi$ is $\mathrm{ARD}(T_I)$-satisfiable iff one of the $\Phi^i$ is.

Lemma 5. The following conditions are equivalent for a finite separated pair

Theorem 3. The $SMT(\mathrm{ARD}(T_I))$ problem is decidable for every index theory $T_I$ (i.e. for every theory satisfying Definition 4).

Concerning the complexity of the above procedure, note that satisfiability for the quantifier-free fragment of common index theories (such as IDL, LIA, LRA) is in NP. As a consequence, the above proof also yields, for such index theories, an NP bound for our $SMT(\mathrm{ARD}(T_I))$ problems, because 0-instantiation is clearly finite and polynomial. The fact that 0-instantiation suffices is shared by the above satisfiability procedure and the satisfiability procedures of [7]. Unfortunately, for the interpolation algorithms of the next section, there is no evidence that 0-instantiation suffices.

Since amalgamation is equivalent to quantifier-free interpolation for universal theories like $\mathrm{ARD}(T_I)$ (see Theorem 1), Theorem 2 ensures that $\mathrm{ARD}(T_I)$ has the quantifier-free interpolation property. However, the proof of Theorem 2 is not constructive. To compute an interpolant for an $\mathrm{ARD}(T_I)$-unsatisfiable conjunction like $\psi(x, y) \wedge \phi(y, z)$, one would therefore have to enumerate quantifier-free formulae $\theta(y)$. The search stops at a $\theta$ that is a logical consequence of $\phi$ and is inconsistent with $\psi$ (modulo $\mathrm{ARD}(T_I)$). Since the quantifier-free fragment of $\mathrm{ARD}(T_I)$ is decidable by Theorem 3, this is an effective procedure. Since interpolants of jointly unsatisfiable pairs of formulae exist, it also terminates. However, such an algorithm is not practical.
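The generate-and-test idea can be illustrated on a toy propositional analogue. This is an assumption made for the example: in $\mathrm{ARD}(T_I)$ the space of candidate formulae $\theta(y)$ is infinite, and each test would be a call to the decision procedure of Theorem 3 rather than a truth-table check. In the sketch, formulae are Python predicates on assignments, and candidates are all Boolean functions of the common variables, i.e. sets of satisfying rows.

```python
from itertools import product


def assignments(names):
    """All total Boolean assignments to the given variable names."""
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def brute_force_interpolant(psi, phi, psi_vars, phi_vars):
    """Search for theta over the common variables with phi |= theta and
    theta & psi unsatisfiable; return its truth table (set of rows) or None."""
    common = sorted(set(psi_vars) & set(phi_vars))
    all_vars = sorted(set(psi_vars) | set(phi_vars))
    rows = [tuple((c, r[c]) for c in common) for r in assignments(common)]
    # Each bitmask selects a set of rows, i.e. one candidate Boolean function.
    for mask in range(2 ** len(rows)):
        chosen = {rows[j] for j in range(len(rows)) if (mask >> j) & 1}

        def theta(v, chosen=chosen):
            return tuple((c, v[c]) for c in common) in chosen

        if all((not phi(v) or theta(v)) and not (theta(v) and psi(v))
               for v in assignments(all_vars)):
            return chosen
    return None  # psi & phi is satisfiable: no interpolant exists
```

For $\psi = x \wedge \neg y$ and $\phi = y \wedge z$ the search returns the table of $\theta = y$. The number of candidates grows doubly exponentially with the number of common variables, which illustrates why such a search is not practical.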

In this section, we improve the situation by supplying a better algorithm based on instantiation (à la Herbrand). In the next section, building on the results of the present section, we identify a complexity bound for this algorithm in the special case where $T_I$ is just the theory of linear orders.

Our problem is the following. Given two quantifier-free formulae $A$ and $B$ such that $A \wedge B$ is not satisfiable (modulo $\mathrm{ARD}(T_I)$), we must compute a quantifier-free formula $C$ with three properties: $\mathrm{ARD}(T_I) \models A \to C$; $\mathrm{ARD}(T_I) \models C \wedge B \to \bot$; and $C$ contains only the variables (of sort INDEX, ARRAY, ELEM) that occur both in $A$ and in $B$.

We call the variables occurring in both $A$ and $B$ common variables, whereas the variables occurring in $A$ (resp. in $B$) are called $A$-variables (resp. $B$-variables). The same terminology applies to terms, atoms and formulae. For example, a term $t$ is an $A$-term ($B$-term, common term) iff it is built up from $A$-variables ($B$-variables, common variables, resp.).

The following operations can be freely performed (see [9] or [8] for details):
(i) pick an $A$-term $t$ and a fresh variable $a$ (of appropriate sort) and conjoin $a = t$ to $A$ ($a$ will be considered an $A$-variable from now on);
(ii) pick a $B$-term $t$ and a fresh variable $b$ (of appropriate sort) and conjoin $b = t$ to $B$ ($b$ will be considered a $B$-variable from now on);
(iii) pick a common term $t$ and a fresh variable $c$ (of appropriate sort) and conjoin $c = t$ to both $A$ and $B$ ($c$ will be considered a common variable from now on);
(iv) conjoin to $A$ some quantifier-free $A$-formula that is implied (modulo $\mathrm{ARD}(T_I)$) by $A$;
(v) conjoin to $B$ some quantifier-free $B$-formula that is implied (modulo $\mathrm{ARD}(T_I)$) by $B$.

Operations (i)-(v) either add logical consequences or add explicit definitions that can be eliminated (if desired) after the final computation of the interpolant. In addition, suppose $A$ is of the form $A' \vee A''$ (resp. $B$ is of the form $B' \vee B''$). Then we can recover an interpolant of $A \wedge B$ from interpolants of $A' \wedge B$ and $A'' \wedge B$ (resp. of $A \wedge B'$ and $A \wedge B''$) by taking their disjunction (resp. conjunction).
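The disjunction/conjunction rule can be checked on a toy propositional instance. The sketch is an illustration under the assumption that formulae are Python predicates on Boolean assignments. Only the two entailment conditions are tested by truth tables; the condition on shared variables holds by construction, because each partial interpolant already mentions only variables common to its pair.

```python
from itertools import product


def valid(f, names):
    """True iff the predicate f holds under every assignment to names."""
    return all(f(dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))


def is_interpolant(A, B, C, names):
    """Truth-table check of A |= C and C & B |= False
    (the shared-variable condition is not checked here)."""
    return (valid(lambda v: not A(v) or C(v), names)
            and valid(lambda v: not (C(v) and B(v)), names))


def combine_disjunct_A(C1, C2):
    """A = A' | A'': from interpolants of (A', B), (A'', B) take C1 | C2."""
    return lambda v: C1(v) or C2(v)


def combine_disjunct_B(C1, C2):
    """B = B' | B'': from interpolants of (A, B'), (A, B'') take C1 & C2."""
    return lambda v: C1(v) and C2(v)
```

For example, with $A' = a \wedge c_1$, $A'' = \neg a \wedge c_2$ and $B = \neg c_1 \wedge \neg c_2 \wedge b$, the interpolants $c_1$ and $c_2$ of the two disjuncts combine into the interpolant $c_1 \vee c_2$ of $A \wedge B$. Neither $c_1$ nor $c_2$ alone is an interpolant of $A \wedge B$.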

Because of the above remarks, using the procedure in the proof of Lemma 4, we can assume that both $A$ and $B$ are given in the form of finite separated pairs. Thus $A$ is of the form $A_1 \wedge A_2$ and $B$ is of the form $B_1 \wedge B_2$, for separated pairs $(A_1, A_2)$ and $(B_1, B_2)$. Also, by (iv)-(v) above, $A$ and $B$ can both be assumed to be 0-instantiated. We use the letters $A_1, A_2, B_1, B_2$ both for sets of formulae and for the corresponding conjunctions. Similarly, $A$ denotes both the separated pair $(A_1, A_2)$ and the conjunction $A_1 \wedge A_2$ (and likewise for $B$).

The formulae from $A_2$ and $B_2$ are formulae in the signature of $T_I \cup EUF$, after rewriting terms of the kind $rd(a, i)$ to $f_a(i)$, where the $f_a$ are free function symbols. Of course, if $A_2 \wedge B_2$ is $T_I \cup EUF$-inconsistent, we can get our quantifier-free interpolant from our black-box interpolation algorithm for the weaker theory $T_I \cup EUF$. Recall that $T_I \cup EUF$ has quantifier-free interpolation by Theorem 1, because $T_I$ is an index theory. The remarkable fact is that, provided $A \wedge B$ is $\mathrm{ARD}(T_I)$-inconsistent, $A_2 \wedge B_2$ always becomes $T_I \cup EUF$-inconsistent. This happens once sufficiently many diffs among common array variables are introduced and sufficiently many instantiations are performed.

Formally, we apply the loop below until $A_2 \wedge B_2$ becomes inconsistent. The loop is justified by (i)-(v) above. If $A \wedge B$ was originally inconsistent modulo $\mathrm{ARD}(T_I)$, Theorem 4 guarantees that $A_2 \wedge B_2$ eventually becomes inconsistent modulo $T_I \cup EUF$. At that point, we get our interpolant from the interpolation algorithm for $T_I \cup EUF$. [Of course, in the interpolant returned for $T_I \cup EUF$, the extra variables introduced by the explicit definitions from (iii) above need to be eliminated.] We need a counter $M$ recording how many times the loop below has been executed (initially $M = 0$).
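Eliminating the extra variables amounts to substituting each defined variable by its defining term, repeatedly, since one definition may mention a variable introduced by an earlier one. A minimal sketch follows. It assumes the hypothetical tuple encoding of terms used above, a dictionary of definitions $c \mapsto t$, and that the definitions are acyclic; they are, because each $c$ is fresh when introduced.

```python
def eliminate_definitions(formula, definitions):
    """Replace every variable defined by an explicit definition c = t
    (as in operation (iii)) by its defining term, recursively, so that the
    result mentions only the original variables. Assumes acyclic definitions."""
    def subst(t):
        if isinstance(t, str):
            return subst(definitions[t]) if t in definitions else t
        f, *args = t
        return (f, *(subst(a) for a in args))
    return subst(formula)
```

Applied to a hypothetical interpolant mentioning $k_0$ and $k_1$, with the definitions $k_0 = \mathrm{diff}_0(c_1, c_2)$ and $k_1 = \mathrm{diff}_1(c_1, c_2)$, the function replaces both constants by the corresponding diff terms.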

Loop (to be repeated until $A_2 \wedge B_2$ becomes inconsistent modulo $T_I \cup EUF$). Pick two distinct common ARRAY-variables $c_1, c_2$ and an $n \geq 1$ such that no conjunct of the form $\mathrm{diff}_n(c_1, c_2) = k$ occurs in both $A_1$ and $B_1$, but for every $l < n$ a conjunct of the form $\mathrm{diff}_l(c_1, c_2) = k$ occurs in both $A_1$ and $B_1$. Pick also a fresh INDEX constant $k_n$ and conjoin $\mathrm{diff}_n(c_1, c_2) = k_n$ to both $A_1$ and $B_1$. Then $M$-instantiate both $A$ and $B$, and increase $M$ to $M + 1$.
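The loop leaves open which pair $(c_1, c_2)$ is picked at each iteration. The sketch below assumes a round-robin strategy over unordered pairs of common array variables, a hypothetical choice made so that every pair is revisited. It only simulates the bookkeeping: which level $n$ of $\mathrm{diff}_n(c_1, c_2)$ is introduced and which instantiation depth $M$ is used in each iteration. The instantiation step and the $T_I \cup EUF$ consistency test are omitted.

```python
from itertools import combinations


def diff_schedule(common_arrays, steps, lowest=1):
    """Simulate `steps` iterations of the loop under a round-robin choice of
    pairs. Each iteration introduces diff_n(c1, c2) = k_n for the least level
    n not yet introduced for that pair and then M-instantiates with the
    current counter M. Returns a list of (c1, c2, n, M) tuples."""
    pairs = list(combinations(sorted(common_arrays), 2))
    if not pairs:
        return []  # fewer than two common arrays: nothing to introduce
    next_level = {p: lowest for p in pairs}
    schedule = []
    for M in range(steps):  # M counts the executions performed so far
        c1, c2 = pairs[M % len(pairs)]
        n = next_level[(c1, c2)]
        next_level[(c1, c2)] = n + 1
        schedule.append((c1, c2, n, M))
    return schedule
```

With `lowest=0`, the first two iterations for a single pair `c1`, `c2` introduce levels 0 and 1, matching the numbering used in Example 2.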

Notice that the fresh index constants $k_n$ introduced during the loop are considered common constants, because they come from explicit definitions like (iii) above. They are therefore taken into account in the $M$-instantiation of both $A$ and $B$.

Example 2. Let $A$ be the formula $\Phi_1$ from Example 1 and let $B$ be

$B$ is 0-instantiated, and 0-instantiating $A$ produces the formulae shown in Example 1. The loop needs to be executed twice. It adds the literals $\mathrm{diff}_0(c_1, c_2) = k_0$ and $\mathrm{diff}_1(c_1, c_2) = k_1$. 0-instantiation then produces formulae $A_2$, $B_2$ whose conjunction is $T_I \cup EUF$-inconsistent. Inconsistency can be tested via an SMT solver like z3 or MathSAT (see the ongoing implementation [1]). The resulting $T_I \cup EUF$-interpolant, once $k_0$ and $k_1$ are replaced by $\mathrm{diff}_0(c_1, c_2)$ and $\mathrm{diff}_1(c_1, c_2)$ respectively, gives our $\mathrm{ARD}(T_I)$-interpolant.

Theorem 4. If $A \wedge B$ is $\mathrm{ARD}(T_I)$-inconsistent, then the above loop terminates.

Proof. Suppose that the loop does not terminate. Let $A' = (A'_1, A'_2)$ and $B' = (B'_1, B'_2)$ be the separated pairs obtained after infinitely many executions of the loop, i.e. the unions of the pairs obtained at each step. Notice that both $A'$ and $B'$ are fully instantiated. We claim that $(A', B')$ is $\mathrm{ARD}(T_I)$-consistent. Every step only adds logical consequences and explicit definitions, so this contradicts the assumption that $(A, B)$ was already $\mathrm{ARD}(T_I)$-inconsistent.

Since no contradiction was found, by the compactness of first-order logic, $A'_2 \cup B'_2$ has a $T_I \cup EUF$-model $\mathcal{M}$. Below, we treat the index and element variables occurring in $A'$, $B'$ as free constants, and the array variables occurring in $A'$, $B'$ as free unary function symbols. $\mathcal{M}$ is a two-sorted structure (the sorts are INDEX and ELEM). For every array variable $a$ occurring in $A'$, $B'$, it is endowed with a function $a^{\mathcal{M}} : \mathrm{INDEX}^{\mathcal{M}} \to \mathrm{ELEM}^{\mathcal{M}}$. In addition, $\mathrm{INDEX}^{\mathcal{M}}$ is a model of $T_I$. We build three $\mathrm{ARD}(T_I)$-structures $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$ and two embeddings $\mu_1 : \mathcal{C} \to \mathcal{A}$, $\mu_2 : \mathcal{C} \to \mathcal{B}$. They satisfy $\mathcal{A} \models A'$ and $\mathcal{B} \models B'$, and for every common variable $x$ we have $\mu_1(x^{\mathcal{C}}) = x^{\mathcal{A}}$ and $\mu_2(x^{\mathcal{C}}) = x^{\mathcal{B}}$. The consistency of $A' \cup B'$ then follows from the amalgamation Theorem 2. To obtain $\mathcal{A}$ (resp. $\mathcal{B}$), we restrict $\mathcal{M}$ to the interpretation of the $A'$-terms (resp. $B'$-terms) of sorts INDEX and ELEM, take the induced full functional model, and apply Lemma 3. The construction of $\mathcal{C}$ involves some subtleties, detailed in the extended version [21], where the full proof of the theorem is provided.

Comparing the results of Sections 6 and 7, a striking difference emerges. Instantiation with variables and constants suffices for satisfiability checking, whereas our interpolation algorithm requires full instantiation over all common terms. Such a full instantiation might be quite impractical, especially in index theories like LIA and LRA. It is less annoying in theories like IDL, where all terms are of the kind $S^n(x)$ or $P^n(x)$, with $x$ a variable or 0 and $S$, $P$ the successor and predecessor functions. The problem disappears in simpler theories like the theory of linear orders $T_O$, where all terms are variables (or the constant 0). Still, even in the case of $T_O$, the proof of Theorem 4 gives no bound on the termination of the interpolation algorithm. We know that sooner or later an inconsistency will occur, but we do not know how many times the main loop must be executed. We now improve the proof of Theorem 4 by supplying the missing bound. In this section, the index theory is fixed to be $T_O$, and we abbreviate $\mathrm{ARD}(T_O)$ as ARD. The full proof of the theorem below is in [21].

Theorem 5. If $A \wedge B$ is inconsistent modulo ARD, then the above loop terminates in at most $\frac{m^2 - m}{2} \cdot (n + 1)$ steps, where $n$ is the number of index variables occurring in $A$, $B$ and $m$ is the number of common array variables.
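The two quantitative remarks above can be made concrete with a small Python sketch. The function names are hypothetical. IDL terms are taken up to the identities $S(P(t)) = P(S(t)) = t$, with $S^d(x)$ encoded as the pair `(x, d)` and $P^d(x)$ as `(x, -d)`. `idl_terms_up_to` lists the IDL index terms of complexity at most $M$; there are only $2M + 1$ of them per variable or constant. `loop_bound` evaluates the bound of Theorem 5.

```python
def idl_terms_up_to(atoms, M):
    """IDL index terms of complexity <= M over S (successor) and P
    (predecessor), up to equivalence: (x, d) encodes S^d(x) if d >= 0
    and P^(-d)(x) if d < 0."""
    return {(x, d) for x in atoms for d in range(-M, M + 1)}


def loop_bound(m: int, n: int) -> int:
    """Bound of Theorem 5: ((m^2 - m) / 2) * (n + 1), where m is the number
    of common array variables and n the number of index variables.
    m * m - m is always even, so the integer division is exact."""
    return (m * m - m) // 2 * (n + 1)
```

The factor $\frac{m^2 - m}{2}$ is the number of unordered pairs of distinct common array variables. For example, `loop_bound(3, 2)` is $3 \cdot 3 = 9$.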

Proof. We sketch a proof of the theorem. The idea is that if no inconsistency occurs after $N := \frac{m^2 - m}{2} \cdot (n + 1)$ steps, then the algorithm can run for infinitely many further steps without finding an inconsistency either. … ($\frac{m^2 - m}{2}$ is the number of distinct unordered pairs of common array variables, so the pair $(c_1, c_2)$ has been examined more than $n$ times). In $\mathcal{M}$, consider the constants $k_l$ with $l \leq n + 1$. Some $k_l$ among them, if not assigned to 0, is assigned to an element $x$ that differs from the elements assigned to the $n$ index variables occurring in $A$, $B$. By 'duplicating' $x$, we can enlarge $\mathcal{M}$ to a superstructure that is a model of $A_2^{N+1} \wedge B_2^{N+1}$. Continuing in this way, we produce a chain of $T_O \cup EUF$-models witnessing that the algorithm can run for infinitely many steps without finding an inconsistency.

We studied an extension of McCarthy's theory of arrays with a maxdiff symbol. This symbol yields a much more expressive theory than the theory with the plain diff symbol already considered in the literature [8, 37].

We have also considered another strong enrichment, namely the combination with arithmetic theories like IDL, LIA, LRA, … (all such theories are encompassed by the general notion of an 'index theory'). Such a combination is non-trivial because it is a non-disjoint combination, with the ordering relation in the shared signature. Moreover, it does not fulfill the $T_0$-compatibility requirements of [17, 19, 18] needed to modularly import satisfiability and interpolation algorithms from the component theories.

The above enrichments come at a substantial cost. Although decidability of satisfiability for quantifier-free formulae is not difficult to obtain, quantifier-free interpolation becomes challenging. In this paper, we proved that quantifier-free interpolants do exist. The interpolation algorithm is rather simple, but its justification requires a complicated detour through semantic investigations of amalgamation properties.

The interpolation algorithm is based on a hierarchical reduction to general quantifier-free interpolation in the index theory. The reduction requires the introduction of iterated diff terms and a finite number of instantiations of the universal clauses associated with write and diff atoms. For the simple case where the index theory is just the theory of total orders, we were able to bound polynomially both the depth of the iterated diff terms to be introduced and the number of instantiations needed. The main open problem we leave for future work is determining analogous bounds for richer index theories.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Lazy abstraction with interpolants for arrays

SAFARI: SMT-based abstraction for arrays with interpolants

An extension of lazy abstraction with interpolation for programs with arrays

Booster: An acceleration-based verification framework for array programs

Amalgamation properties and interpolation theorems for equational theories

What's decidable about arrays?

Quantifier-free interpolation of a theory of arrays

Quantifier-free interpolation in combinations of equality interpolating theories

Model completeness, covers and superposition

Combined covers and Beth definability

Model completeness, uniform interpolants and superposition calculus (with applications to verification of data-aware processes)

Verifying array manipulating programs with full-program induction

Model Theory

Three uses of the Herbrand-Gentzen theorem in relating model theory and proof theory

Quantified invariants via syntax-guided synthesis

Model theoretic methods in combined constraint satisfiability

Interpolation, amalgamation and combination (the nondisjoint signatures case)

Modularity results for interpolation, amalgamation and superamalgamation

Computing uniform interpolants for EUF via (conditional) DAG-based compact representations

Interpolation and amalgamation for Arrays with MaxDiff (extended version)

Quantifiers on demand

Efficient interpolation for the theory of arrays

Constructing Craig interpolation formulas

Putting the squeeze on array programs: Loop verification via inductive rank reduction

Nonlinear polynomials, interpolants and invariant generation for system analysis

Conditional congruence closure over uninterpreted and interpreted symbols

Interpolation for Data Structures

Interpolating strong induction

Towards a Mathematical Science of Computation

Interpolation and SAT-based model checking

Lazy abstraction with interpolants

Simplification by Cooperating Decision Procedures

Lower bounds for resolution and cutting plane proofs and monotone computations

Interpolation in local theory extensions

On interpolation and symbol elimination in theory extensions

Complete instantiation-based interpolation

Interpolating property directed reachability