Competition Report: CHC-COMP-20

Philipp Rümmer

2020-08-07 · DOI: 10.4204/eptcs.320.15

Abstract. CHC-COMP-20 is the third competition of solvers for Constrained Horn Clauses. This year, 9 solvers participated in the competition and were evaluated in four separate tracks on problems in linear integer arithmetic, linear real arithmetic, and arrays. The competition was run in the first week of May 2020 using the StarExec computing cluster. This report gives an overview of the competition design, explains the organisation of the competition, and presents the competition results.

1 Introduction

Constrained Horn Clauses (CHC) have over the last decade emerged as a uniform framework for reasoning about different aspects of software safety [10, 2]. Constrained Horn clauses form a fragment of first-order logic, modulo various background theories, in which models can be constructed effectively with the help of model checking algorithms. Horn clauses can be used as an intermediate verification language that elegantly captures various classes of systems (e.g., sequential code, programs with functions and procedures, concurrent programs, or networks of timed automata) and various verification methodologies (e.g., the use of state invariants, verification with the help of contracts, Owicki-Gries-style invariants, or rely-guarantee methods). Horn solvers can be used as off-the-shelf back-ends in verifiers, and thus enable the construction of verification systems in a modular way.

CHC-COMP-20 is the third competition of solvers for Constrained Horn Clauses, a competition affiliated with the 7th Workshop on Horn Clauses for Verification and Synthesis (HCVS) at ETAPS 2020. The goal of CHC-COMP is to compare state-of-the-art tools for Horn solving with respect to performance and effectiveness on realistic, publicly available benchmarks. The deadline for submitting solvers to CHC-COMP-20 was April 30, 2020, resulting in 9 participating solvers, which were evaluated in the first week of May 2020 in four separate tracks on problems in linear integer arithmetic, linear real arithmetic, and the theory of arrays. The results of the competition can be found in Section 6 of this report.

Due to the Covid-19 crisis, both ETAPS 2020 and HCVS were postponed, and at the time of finalising this report no new dates had been set; this means that the present report is the main documentation of CHC-COMP-20 at the moment. It is planned, however, that the competition will also be presented and discussed in detail at the HCVS workshop when it takes place, either in physical or virtual form.

CHC-COMP-20 heavily builds on the infrastructure and scripts written for CHC-COMP-18 and CHC-COMP-19, run by Arie Gurfinkel and Grigory Fedyukovich, respectively. Contributors to the competition infrastructure also include Adrien Champion, Dejan Jovanović, and Nikolaj Bjørner. Like in the first two competitions, CHC-COMP-20 was run using StarExec [24].
We are extremely grateful for the computing resources and evaluation environment provided by StarExec, and for the fast and competent support by Aaron Stump.

2 Brief Overview of the Competition Design

In CHC-COMP-20 the same tracks as in CHC-COMP-19 were used for evaluating solvers:

• LIA-nonlin: benchmarks with at least one non-linear clause, and linear integer arithmetic as background theory;

• LIA-lin: benchmarks with only linear clauses, and linear integer arithmetic as background theory;

• LIA-lin-arrays: benchmarks with only linear clauses, and the combined theory of linear integer arithmetic and arrays as background theory;

• LRA-TS: benchmarks encoding transition systems, with linear real arithmetic as background theory. Benchmarks in this track have exactly one uninterpreted relation symbol, and exactly three linear clauses encoding initial states, transitions, and error states.

The competition was run on 30 nodes provided by StarExec. Each node had two quadcore CPUs, and each node was used to run two jobs in parallel during the competition runs.

All solvers submitted to CHC-COMP-20 were evaluated twice:

• in a first set of test runs, in which pre-submissions of the solvers were evaluated to check their configurations and identify possible inconsistencies. For the test runs, a smaller set of randomly selected benchmarks was used. The results of the test runs for each solver were directly communicated to the team submitting the solver, but not made public and not shared with other teams. In the test runs, each solver-benchmark pair was limited to 600s CPU time, 600s wall-clock time, and 64GB memory.

• in the competition runs, the results of which determined the outcome of CHC-COMP-20. The selection of the benchmarks for the competition runs is described in Section 4, and the evaluation of the competition runs in Section 2.4. In the competition runs, each job was limited to 1800s CPU time, 1800s wall-clock time, and 64GB memory.

The ranking of solvers in each track was based on the Score reached by the solvers in the competition run for that track. In case two solvers had equal Score, the ranking of the two solvers was determined by CPU time. It was assumed that the outcome of running one solver on one benchmark can only be sat, unsat, or unknown; the last outcome includes solvers giving up, running out of resources, or crashing. The definitions of Score and CPU time are:

• Score: the number of sat or unsat results produced by a solver on the benchmarks of a track.

• CPU time: the average CPU time needed by a solver to produce the result sat or unsat in some track, not counting the runtime for unknown results. This average time was computed based on the CPU time stamp reported by StarExec for the output of the result sat/unsat.

In addition, the following features were computed for each solver and each track:

• Wall-clock time: the average wall-clock time needed by a solver to produce the result sat or unsat in some track, not counting the runtime for unknown results.

• Speedup: the ratio CPU time / Wall-clock time.

• SotAC: the state-of-the-art contribution of a solver, computed in the same way as in the CADE ATP System Competition (CASC). The SotAC of a benchmark in some track is the inverse of the number of systems that reported sat or unsat for the benchmark. The SotAC of a solver is the average SotAC of the benchmarks it could solve.
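The metrics above are straightforward to compute from a table of raw results. The following Python sketch shows one possible computation; the nested dictionary layout is a hypothetical stand-in for the parsed StarExec output, and every solver is assumed to have been run on every benchmark of the track.

```python
# Sketch of the ranking metrics; the nested dictionary
#   results[solver][benchmark] = (answer, cpu_seconds, wall_seconds)
# is a hypothetical stand-in for the parsed StarExec output, and every
# solver is assumed to have been run on every benchmark of the track.
def track_metrics(results):
    metrics = {}
    for solver, runs in results.items():
        solved = {b: r for b, r in runs.items() if r[0] in ("sat", "unsat")}
        score = len(solved)
        cpu = sum(r[1] for r in solved.values()) / score if score else 0.0
        wall = sum(r[2] for r in solved.values()) / score if score else 0.0

        def sotac_of(b):  # inverse of the number of systems solving b
            return 1.0 / sum(1 for rs in results.values()
                             if rs[b][0] in ("sat", "unsat"))

        metrics[solver] = {
            "score": score,                      # primary ranking criterion
            "cpu_time": cpu,                     # tie-breaker
            "wall_time": wall,
            "speedup": cpu / wall if wall else 1.0,
            "sotac": (sum(sotac_of(b) for b in solved) / score
                      if score else 0.0),
        }
    return metrics
```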
CHC-COMP represents benchmarks in a fragment of the SMT-LIB 2.6 format. The fragment is defined on https://chc-comp.github.io/format.html. For CHC-COMP-20, several minor modifications were made to the format definition; in particular, the role of nullary predicates was clarified. The conformance of a well-typed SMT-LIB script with the CHC-COMP fragment can be checked using the format-checker available on https://github.com/chc-comp/chc-tools.

All benchmarks used in CHC-COMP-20 were pre-processed using the format.py script available in the repository https://github.com/chc-comp/scripts, using the command line

> python3 format.py --out_dir <out-dir> --merge_queries True

The script tries to translate arbitrary Horn-like problems in SMT-LIB format to problems within the CHC-COMP fragment. Only benchmarks processed in this way were used in the competition. The option --merge_queries was added to the script for CHC-COMP-20, and has the effect of merging multiple queries in a benchmark into a single query by introducing an auxiliary nullary predicate. For a discussion of this, see Section 3.3.

After processing with format.py, benchmarks were checked and categorised into the four tracks using the format-checker scripts available on https://github.com/chc-comp/chc-tools. Benchmarks that could not be processed by format.py, were rejected by the format-checker, or did not conform to any of the competition tracks, were not used in CHC-COMP-20.

In the CHC-COMP format, a query is a clause with the head false, and encodes the property to be verified in a benchmark. At the moment, the CHC-COMP fragment of SMT-LIB only allows benchmarks with exactly one query, which means that problems involving multiple queries have to be mapped to the single-query format. A significant number of the CHC-COMP benchmarks contain multiple queries. In the past competitions, this was handled by splitting benchmarks with multiple queries into multiple single-query benchmarks (option --split_queries of format.py); in CHC-COMP-20, benchmarks with multiple queries were instead converted to the single-query format by introducing an auxiliary nullary predicate (or Boolean variable) and merging all queries into a single query (--merge_queries of format.py); a sketch of this transformation is given at the end of this subsection.

The motivation for the new pre-processing is that splitting of queries sometimes makes benchmarks artificially hard: with multiple queries, often some of the queries can be disproven easily, while other queries are difficult to prove or disprove. Such problems, seen as a whole, are simple to solve, but splitting and considering all queries individually will lead to some hard benchmarks. Since the benchmarks used in CHC-COMP should represent realistic applications, this is an undesired effect.

In future editions of CHC-COMP, it might be even better to just allow benchmarks with multiple queries, and extend the CHC-COMP format accordingly. Many solvers are already able to process benchmarks with multiple queries. Other solvers could decide for themselves whether benchmarks with multiple queries should be handled by splitting or by merging; both translations are quite simple to implement in a solver, or alternatively a solver could invoke the existing script for pre-processing.
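The merging transformation itself is simple. The following is a minimal sketch of one natural way to realise it, not the actual format.py code; clauses are represented abstractly as (body, head) pairs, with the head "false" marking a query.

```python
# Minimal sketch of one natural way to realise the query merging (the
# actual implementation is format.py's --merge_queries option).  Clauses
# are (body, head) pairs, where the body is a list of atoms/constraints
# and the head "false" marks a query.
def merge_queries(clauses, aux="AUX"):
    merged, num_queries = [], 0
    for body, head in clauses:
        if head == "false":
            merged.append((body, aux))   # redirect each query to AUX
            num_queries += 1
        else:
            merged.append((body, head))
    if num_queries > 0:
        merged.append(([aux], "false"))  # the single remaining query
    return merged
```

The auxiliary nullary predicate AUX is derivable exactly when some original query body is derivable, so false is derivable in the transformed system if and only if it was derivable in the original one; satisfiability is therefore preserved.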
In contrast to most other competitions, CHC-COMP stores benchmarks in a decentralised way, in multiple repositories managed by the contributors of the benchmarks themselves. Table 1 summarises the number of benchmarks that were obtained by collecting benchmarks from all available repositories using the process in Section 3.2. Duplicate benchmarks were identified by computing a checksum for each (processed) benchmark, and were discarded. The repository chc-comp19-benchmarks of benchmarks selected for CHC-COMP-19 was included in the collection, because this repository contains several unique families of benchmarks that are not available in other repositories under https://github.com/chc-comp. Such benchmarks include problems generated by the Ultimate tools in the LIA-lin-arrays track. From jayhorn-benchmarks, only the problems generated for sv-comp-2020 were considered, which subsume the problems for sv-comp-2019.

This section describes how the benchmarks for CHC-COMP-20 were selected among the unique benchmarks summarised in Table 1. For the tracks LIA-lin-arrays and LRA-TS, the library only contains 501 and 499 unique benchmarks, respectively, which are small enough sets to use all benchmarks in the competition. For the tracks LIA-nonlin and LIA-lin, in contrast, too many benchmarks are available, so that a representative sample of the benchmarks had to be chosen.

To gauge the difficulty of the available problems in LIA-nonlin and LIA-lin, a simple rating based on the performance of the CHC-COMP-19 solvers was computed. For this, the two top-ranked competing solvers from CHC-COMP-19 were run for a few seconds on each of the LIA-nonlin and LIA-lin benchmarks:

• Eldarica: was run with a 5s timeout on each benchmark. Eldarica runs entirely in managed code on a JVM; to avoid frequent JVM restarts, the daemon mode of Eldarica was used. Otherwise, the same binary and same options were chosen as in CHC-COMP-19.

• Ultimate Unihorn: was run with a slightly higher timeout of 8s, since the solver also partly runs in managed code, but (to the best of the organiser's knowledge) does not have a daemon mode similar to Eldarica's. The same binary and options as in CHC-COMP-19 were used.

The outcomes of those test runs gave rise to three possible ratings for each benchmark:

• A: both tools were able to determine the benchmark status within the given time budget.

• B: only one tool could determine the benchmark status.

• C: both tools timed out.

The numbers of benchmarks per rating are shown in Table 2. As can be seen from the table, the simple rating method separates the benchmarks into partitions of comparable size, and provides some information about the relative hardness of the problems in the different repositories.

From each repository r, up to 3·N_r benchmarks were then selected randomly: N_r benchmarks with rating A, N_r benchmarks with rating B, and N_r benchmarks with rating C. If a repository contained fewer than N_r benchmarks for some particular rating, benchmarks with the next-higher rating were chosen instead. As special cases, up to N_r benchmarks were selected from repositories with only A-rated benchmarks; up to 2·N_r benchmarks from repositories with only B-rated benchmarks; and up to 3·N_r benchmarks from repositories with only C-rated benchmarks. The number N_r was chosen individually for each repository, based on a manual inspection of the repository to judge the diversity of the contained benchmarks. The chosen N_r, and the numbers of selected benchmarks for each repository, are given in Table 3. For the actual selection of benchmarks with rating X, the following Unix command was used:

> cat <benchmarks-with-rating-X> | sort -R | head -n <N>

The final set of benchmarks selected for CHC-COMP-20 can be found in the github repository https://github.com/chc-comp/chc-comp20-benchmarks, and on StarExec in the public space CHC/CHC-COMP/chc-comp20-benchmarks.
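Putting the pieces together, the de-duplication and the quota-based selection can be summarised by the following sketch. The data layout and the use of SHA-256 as the checksum are assumptions made for this illustration (the report does not specify the checksum algorithm), and the actual selection was done with the shell one-liner shown above.

```python
import hashlib
import random

def dedup(paths):
    """Discard benchmarks whose processed file has a checksum seen before
    (SHA-256 is an assumption; the report does not name the algorithm)."""
    seen, unique = set(), []
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique

def select(rated, quota):
    """rated[repo][rating] lists benchmarks per rating "A"/"B"/"C";
    quota[repo] is the manually chosen N_r.  Shortfalls in one rating
    class carry over to the next-higher (harder) rating class, which
    reproduces the special cases described above."""
    chosen = {}
    for repo, by_rating in rated.items():
        picks, carry = [], 0
        for rating in ("A", "B", "C"):
            want = quota[repo] + carry
            pool = by_rating.get(rating, [])
            take = random.sample(pool, min(want, len(pool)))
            carry = want - len(take)
            picks.extend(take)
        chosen[repo] = picks
    return chosen
```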
In total, 9 solvers were submitted to CHC-COMP-20: 8 competing solvers, and one further solver (Eldarica, co-developed by the competition organiser) that entered outside the competition. A summary of the participating solvers is given in Table 4. More details about the participating solvers are provided in the solver descriptions in Section 8. The binaries of the solvers used for the competition runs can be found in the public StarExec space CHC/CHC-COMP/chc-comp20-benchmarks.

Detailed results for the four tracks can be found in Figures 1, 2, 3, and 4.

LIA-lin-arrays: Only one case of inconsistent results was observed in the competition runs, namely the results for benchmark chc-LIA-lin-arrays_381.smt2 in the LIA-lin-arrays track. Ultimate Unihorn claimed that this benchmark is satisfiable, whereas Spacer reported that it is unsatisfiable; all other solvers timed out. This issue was discussed with the authors of the solvers. Spacer can produce an internal counterexample for the problem, but no counterexample for the complete input problem that could be verified by independent tools. Ultimate Unihorn does not have functionality to output models. No obvious bug was found in either of the tools. This means that there is no immediate way to establish the actual status of the benchmark beyond doubt, and no way to tell which of the solvers gave the right answer; but clearly one of the tools contains a bug. Since no further inconsistencies were observed in the competition, in this particular case the organiser decided to remove the benchmark chc-LIA-lin-arrays_381.smt2 and to count neither Spacer's nor Ultimate Unihorn's answer. The issue highlights a problem in the competition design, however, which should be addressed in the next competition (see Section 7 for more thoughts on this).

The performance of IC3IA increased slightly, and the new version reached the same score as Ultimate TreeAutomizer. The performance of ProphIC3, in contrast, decreased significantly, and ProphIC3 moved from second to third place.

In the preparation phase of CHC-COMP-20, there was a discussion with several teams about the use of CPU time vs. wall-clock time to define the resource budget of solvers, and the possibility of having separate tracks in future editions of CHC-COMP for parallel solvers (with a wall-clock time budget and no limit on CPU time). For CHC-COMP-20, such tracks were not introduced in the end, and the primary resource limit was the available CPU time, but adding tracks for parallel solvers can be an interesting new feature for CHC-COMP-21, or beyond. It is meaningful to analyse the CHC-COMP-20 outcomes in this light.

An immediate observation is that the three LIA tracks show very different behaviour with respect to runtime than the LRA-TS track. In the LIA tracks, the average CPU time of solvers is usually significantly below 100s, and the cactus plots show that the ranking of solvers hardly changes above 100s CPU time (Figures 1, 2, 3). Only a few benchmarks in those tracks can be solved with CPU time above 600s. This means that CPU time (and therefore also wall-clock time) is essentially not a limiting factor in the LIA tracks; the current solvers hardly utilise the available computation time. Since there is plenty of computation time available, already at this point solvers can use portfolios and run different configurations in parallel, as done by some of the participating solvers.
To make specific parallel LIA tracks interesting, it would be necessary to change the scoring scheme of CHC-COMP: no longer count just the number of solved benchmarks, but also factor in the required wall-clock time in the ranking of solvers.

The situation is different in LRA-TS: in this track the average CPU time used by solvers is between 100s and 300s, and the cactus plots show interesting developments even after 600s CPU time. Since the maximum speedup observed in LRA-TS is 2.93, the cactus plot for wall-clock performance should be interpreted only up to a time limit of around 600s; beyond 600s, the results are limited by the available CPU time. Limiting the wall-clock time to 600s would indeed change the ranking of solvers, with the parallel computation used in pSally paying off; other solvers could of course be optimised in a similar way. It would be an interesting experiment to repeat the evaluation of LRA-TS with unlimited CPU time and wall-clock time limited to 1800s, which would amount to a parallel track.

In summary, for the next editions of CHC-COMP parallel tracks are mainly interesting for LRA-TS, where solvers can indeed utilise the available computation time. In the LIA tracks the solvers are mainly limited by the implemented algorithms and heuristics, and less by the available computation time.

The organiser would like to congratulate the winners of the four CHC-COMP-20 tracks, Spacer and IC3IA, and to thank all the teams submitting solvers for the excellent performance this year! Thanks go to all people who have been helping with infrastructure, scripts, benchmarks, or in other ways (see the acknowledgements in the introduction), and to the HCVS workshop for hosting CHC-COMP!

In order to keep CHC-COMP an interesting and relevant competition, the organiser also identified several questions and issues that should be discussed and addressed in the next editions:

• Duplicate benchmarks included in multiple repositories. The selection of benchmarks in CHC-COMP-20 probably contained some benchmarks repeatedly, for instance benchmarks from the repositories sv-comp or chc-comp19-benchmarks that were initially pre-processed using different tools, and thus not identified as duplicates. The outcome of the competition was probably not affected very much by this, but it is an issue that should be addressed in the long run. To keep the decentralised model of storing benchmarks in CHC-COMP, one could maintain a global list of all available unique benchmarks.

• A more systematic method for hardness rating of benchmarks is needed. Some rating is required to select interesting benchmarks, but any rating has the potential of changing the outcome of the competition. The use of Eldarica and Ultimate Unihorn (in the versions from CHC-COMP-19) for problem rating possibly puts the CHC-COMP-20 versions of those solvers at a disadvantage (or advantage) compared to other solvers not used for rating. For instance, if solvers A and B show completely uncorrelated performance on some set of benchmarks, then picking 50% easy and 50% hard benchmarks for solver A will not affect the expected outcome for solver B, but will fix the outcome of solver A to 50%.

• A better way has to be found to handle cases of inconsistent results, as observed this year in the LIA-lin-arrays track. There are different solutions: (i) only use benchmarks with known status in the competition; (ii) require solvers to produce, on demand, models or counterexamples when they claim to be able to solve a problem.
This requires a discussion.

• An approach is needed to determine and store the expected outcome of the individual benchmarks.

• A discussion is needed about new tracks to be added to the competition: parallel tracks, or tracks with new background theories, for instance algebraic data-types or bit-vectors?

• A bigger set of benchmarks is needed, and all users and tool authors are encouraged to submit benchmarks! In particular, in the LIA-nonlin and LRA-TS tracks, the competition results indicate that harder benchmarks are required.

8 Solver Descriptions

The tool descriptions in this section were contributed by the tool submitters, and the copyright on the texts remains with the individual authors.

Eldarica-abs

Algorithm. Eldarica-abs is a variant of Eldarica [15], which is a model checker for Horn clauses, Numerical Transition Systems, and software programs. Eldarica-abs mainly modifies the strategies used in Eldarica for exploring abstraction lattices. Like Eldarica, Eldarica-abs reads inputs in a variety of formats, including SMT-LIB v2 and Prolog for Horn clauses, and fragments of Scala and C for software programs; these inputs are analysed using a variant of the Counterexample-Guided Abstraction Refinement (CEGAR) method, in which interpolation-based techniques are used to synthesise new predicates for CEGAR. In contrast to Eldarica, Eldarica-abs uses a cost-guided bidirectional heuristic search over the constructed abstraction lattices in order to compute interpolants of good quality [25]. Since the goal elements are not only maximal and feasible, but also of minimal cost, the search starts from the top of the abstraction lattice and proceeds downwards, using a predecessor-computing function to avoid visiting predecessors repeatedly. For the feasible predecessors of the element under consideration, an upward search is used to find an element all of whose successors are infeasible; for the infeasible predecessors, the cost and infeasibility information contained in these elements is used to update the set of elements still to be searched. In addition, when selecting the next element to expand, the information contained in the set of visited elements and the set of elements still to be visited is exploited to choose the best candidate and steer the search along an effective path.

Architecture and Implementation. The overall architecture of Eldarica-abs is the same as that of Eldarica, and can be divided into three phases: the input encoding phase, the preprocessing phase, and the CEGAR engine solving phase [15]. Among these phases, interpolation abstraction is used as an effective convergence heuristic to compute Craig interpolants of good quality, which guide the discovery of the predicates needed in the CEGAR process. In the course of obtaining suitable Craig interpolants, Eldarica-abs applies its cost-guided bidirectional heuristic strategy to explore the constructed abstraction lattices. Building on Eldarica, Eldarica-abs is also implemented in Scala and depends on Java and Scala libraries and the Princess SMT solver.
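To make the refinement loop mentioned above concrete, here is a minimal schematic sketch of CEGAR as used by predicate-abstraction-based Horn solvers such as Eldarica-abs (and several of the tools described below); the helper functions abstract_reachability, is_feasible, and interpolants_to_predicates are hypothetical placeholders, not the API of any of these tools.

```python
# Schematic CEGAR loop (hypothetical helpers, not any tool's actual API).
def cegar(clauses, predicates):
    while True:
        # Model check an abstraction of the clauses w.r.t. the current
        # predicates; returns None if the abstraction is safe, or an
        # abstract counterexample otherwise.
        cex = abstract_reachability(clauses, predicates)
        if cex is None:
            return "sat"      # abstraction safe: a model exists
        if is_feasible(cex):
            return "unsat"    # counterexample is real: clauses unsatisfiable
        # Spurious counterexample: extract new predicates from Craig
        # interpolants computed along the counterexample, and retry.
        predicates = predicates | interpolants_to_predicates(cex)
```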
ProphIC3

Algorithm. The tool prophic3 is a prototype implementation of a Counterexample-Guided Abstraction Refinement (CEGAR) [20] algorithm for model checking modulo the theory of arrays. The algorithm abstracts the theory of arrays using uninterpreted functions and lazily adds array axioms. Furthermore, it uses counterexamples to add history and prophecy variables, which can help find simpler invariants, in some cases even reducing quantified invariants to quantifier-free ones. The approach wraps a standard model checker. The underlying model checker must support all the theories used in the input problem, except the theory of arrays.

Architecture and Implementation. The prophic3 prototype depends on ic3ia [11], an implementation of IC3 Modulo Theories via Implicit Predicate Abstraction (IC3IA) [6]. IC3IA is one approach for extending IC3 [5] to arbitrary theories. The competition submission uses version 2020.05 of ic3ia, which depends on MathSAT 5.6.3 [7]. This tool operates on transition systems rather than CHC. CHC clauses are translated to the Verification Modulo Theories (VMT) format using a program distributed with ic3ia.

Configuration in CHC-COMP-20. The submitted solver is a portfolio approach which runs prophic3 from git tag chccomp-2020 with option -no-eq-uf, as well as bounded model checking [1] up to bound 100 on the concrete system.

https://github.com/makaimann/prophic3
GPLv3

pSally

Algorithm. Our competition entry is derived from the model checker Sally [17], which we have enhanced in two ways: first, we have supplied our interpolating SMT solver OpenSMT with a specialised interpolation procedure for Linear Real Arithmetic (LRA) as part of Sally's backend; secondly, we have implemented a parallel version of Sally (pSally) in which multiple instances cooperate by sharing information discovered about the problem at hand. The current version of the tool is limited to solving safety of transition systems encoded in LRA.

The tool uses Sally's PD-KIND engine [17] as the core algorithm. PD-KIND strengthens the IC3 algorithm with k-induction and gradually builds a k-inductive safe invariant of the transition system. It relies on an SMT solver for answering satisfiability queries, generalization, and interpolation. Our specialised interpolation procedure computes stronger interpolants than traditional interpolation algorithms, which has been shown to be useful in model checking scenarios [3]. The parallel version, on the other hand, leverages a portfolio of interpolation algorithms for discovering useful facts about the system under analysis, as well as a cooperative framework for sharing the discovered information [4].

Architecture and Implementation. The competition entry uses Sally's PD-KIND reasoning engine, with the SMT solvers Yices2 [9] and OpenSMT [16] for generalization and interpolation, respectively. OpenSMT uses an interpolation algorithm that computes decomposed LRA interpolants [3]. The parallel version uses the SMTS framework [21] for managing multiple instances and their communication.
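As background for PD-KIND, the following sketch shows a plain k-induction check for a transition system, written with the z3 Python API purely for illustration; pSally itself builds on Yices2 and OpenSMT, and PD-KIND is considerably more sophisticated than this. The naming convention x_p for primed variables is an assumption of this sketch.

```python
# Plain k-induction for a transition system (illustration only; not
# pSally's implementation, which uses Yices2/OpenSMT and PD-KIND).
from z3 import Int, Not, Solver, substitute, sat, unsat

def frame(formula, variables, i):
    # rename every variable x to a fresh copy x_i for unrolling step i
    return substitute(formula, [(x, Int(f"{x}_{i}")) for x in variables])

def k_induction(variables, init, trans, prop, k):
    # assumed convention: trans mentions x and its primed copy named x_p
    primed = [Int(f"{x}_p") for x in variables]

    def trans_at(i):
        t = substitute(trans, [(x, Int(f"{x}_{i}")) for x in variables])
        return substitute(t, [(xp, Int(f"{x}_{i + 1}"))
                              for x, xp in zip(variables, primed)])

    # base case: no counterexample of length <= k from the initial states
    s = Solver()
    s.add(frame(init, variables, 0))
    for i in range(k + 1):
        s.push()
        s.add(Not(frame(prop, variables, i)))
        if s.check() == sat:
            return "unsafe"        # real counterexample found
        s.pop()
        s.add(trans_at(i))

    # inductive step: k consecutive good states imply a good successor
    s = Solver()
    for i in range(k):
        s.add(frame(prop, variables, i))
        s.add(trans_at(i))
    s.add(Not(frame(prop, variables, k)))
    return "safe" if s.check() == unsat else "unknown"  # retry larger k

# Example: a counter starting at 0 and incrementing; x >= 0 is 1-inductive.
x, xp = Int("x"), Int("x_p")
print(k_induction([x], x == 0, xp == x + 1, x >= 0, k=1))  # -> safe
```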
Spacer

Algorithm. SPACER [19] is an IC3/PDR-style algorithm for solving linear and non-linear CHCs. Given a set of CHCs, it iteratively proves the unreachability of false at larger and larger depths until a model is found or the set of CHCs is proven unsatisfiable. To prove unreachability at a particular depth, SPACER recursively generates sets of predecessor states (called proof obligations, POBs) from which false can be derived, and blocks them. Once a POB is blocked, SPACER generalizes the proof to learn a lemma that blocks multiple POBs. SPACER uses many heuristics to learn lemmas, including interpolation, inductive generalization, and quantifier generalization. The latest version of Spacer presents a new heuristic for learning lemmas [18]. The current implementation of SPACER supports linear and nonlinear CHCs in the theories of arrays, linear arithmetic, and fixed-size bit-vectors. SPACER can generate both quantified and quantifier-free models, as well as resolution proofs of unsatisfiability.

Architecture and Implementation. SPACER is implemented on top of the Z3 theorem prover. It uses many SMT solvers implemented in Z3. Additionally, it implements a max-SMT solver and an interpolating SMT solver.

Configuration in CHC-COMP-20. We used different configurations for different tracks.

Eldarica

Hossein Hojjat, University of Tehran, Iran

Algorithm. Eldarica [15] is a Horn solver applying classical algorithms from model checking: predicate abstraction and counterexample-guided abstraction refinement (CEGAR). Eldarica can solve Horn clauses over linear integer arithmetic, arrays, algebraic data-types, and bit-vectors. It can process Horn clauses and programs in a variety of formats, implements sophisticated algorithms to solve tricky systems of clauses without diverging, and offers an elegant API for programmatic use.

Architecture and Implementation. Eldarica is entirely implemented in Scala, and only depends on Java and Scala libraries, which implies that Eldarica can be used on any platform with a JVM. For computing abstractions of systems of Horn clauses and inferring new predicates, Eldarica invokes the SMT solver Princess [22] as a library.

Configuration in CHC-COMP-20. Eldarica is run in the competition with the option -abstractPO, which enables a simple portfolio mode: two instances of the solver are run in parallel, one with the default options, and one with the option -abstract:off to switch off the interpolation abstraction technique.

[Figure 3: Solver performance on 500 benchmarks of the LIA-lin-arrays track (one benchmark on which Spacer and Ultimate Unihorn give conflicting answers is not counted)]

References

[1] Symbolic Model Checking without BDDs. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS).
[2] Horn Clause Solvers for Program Verification.
[3] Decomposing Farkas Interpolants. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS).
[4] A Cooperative Parallelization Approach for Property-Directed k-Induction.
[5] SAT-Based Model Checking without Unrolling. In: VMCAI.
[6] Infinite-state Invariant Checking with IC3 and Predicate Abstraction.
[7] The MathSAT5 SMT Solver.
[8] Ultimate TreeAutomizer (CHC-COMP Tool Description). In: HCVS/PERR@ETAPS.
[9] Yices 2.2.
[10] Synthesizing Software Verifiers from Proof Rules.
[11] Open-Source IC3 Modulo Theories with Implicit Predicate Abstraction.
[12] Ultimate Automizer and the Search for Perfect Interpolants (Competition Contribution). In: TACAS (2).
[13] Software Model Checking for People Who Love Automata.
[14] Efficient Interpolation for the Theory of Arrays.
[15] The ELDARICA Horn Solver.
[16] OpenSMT2: An SMT Solver for Multi-core and Cloud Computing.
[17] Property-Directed k-Induction.
[18] Global Guidance for Local Generalization in Model Checking.
[19] SMT-Based Model Checking for Recursive Programs.
[20] Counterexample Guided Abstraction Refinement via Program Execution.
[21] SMTS: Distributed, Visualized Constraint Solving.
[22] A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic.
[23] Probabilistic Inference for Predicate Constraint Satisfaction.
[24] StarExec: A Cross-Community Infrastructure for Logic Solving.
[25] Facilitating CHC Solving with Improving Interpolation Abstraction Exploration.
Ultimate Unihorn

Algorithm. ULTIMATE UNIHORN reduces the satisfiability problem for a set of constrained Horn clauses to a software verification problem. In a first step, UNIHORN applies a yet unpublished translation in which the constrained Horn clauses are translated into a recursive program that is nondeterministic and whose correctness is specified by an assert statement. The program is correct (i.e., no execution violates the assert statement) if and only if the set of CHCs is satisfiable; a toy illustration of this correspondence is sketched below. For checking whether the recursive program satisfies its specification, Unihorn uses ULTIMATE AUTOMIZER [12], which implements an automata-based approach to software verification [13].

Architecture and Implementation. ULTIMATE UNIHORN is a toolchain in the ULTIMATE framework. This toolchain first parses the CHC input and then runs the chctoboogie plugin, which does the translation from CHCs into a recursive program. We use the Boogie language to represent that program. Afterwards, the default toolchain for verifying recursive Boogie programs with ULTIMATE AUTOMIZER is applied. The ULTIMATE framework shares the libraries for handling SMT formulas with the SMTInterpol SMT solver. While verifying a program, ULTIMATE AUTOMIZER needs SMT solvers for checking satisfiability, for computing Craig interpolants, and for computing unsatisfiable cores. The version of UNIHORN that participated in the competition used the SMT solvers SMTInterpol and Z3. The ULTIMATE framework is written in Java and built upon the Eclipse Rich Client Platform (RCP). The source code is available on GitHub.

Configuration in CHC-COMP-20. Our StarExec archive for the competition ships with the bin/starexec_run_default shell script, which calls the ULTIMATE command line interface with the AutomizerCHC.xml toolchain file and the AutomizerCHC_No_Goto.epf settings file. Both files can be found in the toolchain (resp. settings) folder of ULTIMATE's repository.

https://ultimate.informatik.uni-freiburg.de/
LGPLv3 with a linking exception for Eclipse RCP
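As promised above, here is a toy illustration of the correspondence that UNIHORN exploits. Since the actual translation is unpublished, this Python sketch only conveys the general idea (nondeterminism is modelled by random choices, and assume-style blocking by an exception); it is not the output of the chctoboogie plugin.

```python
# Toy illustration of the CHC-to-program correspondence (Unihorn's actual
# translation is unpublished; this sketch only conveys the general idea).
# CHC system:   x >= 0            ->  Inv(x)
#               Inv(x)            ->  Inv(x + 1)
#               Inv(x) /\ x < 0   ->  false
import random

class BlockedPath(Exception):
    """Raised when a nondeterministic path violates an assume."""

def nondet_int():
    return random.randint(-10, 10)   # stands for an unconstrained integer

def assume(cond):
    if not cond:
        raise BlockedPath()          # infeasible path, not an error

def derive_Inv():
    """Nondeterministically replay one derivation of some Inv(x),
    choosing between the two non-query clauses."""
    if random.choice([True, False]):
        x = nondet_int()
        assume(x >= 0)               # clause: x >= 0 -> Inv(x)
        return x
    x = derive_Inv()                 # clause: Inv(x) -> Inv(x + 1)
    return x + 1

def main():
    x = derive_Inv()
    assert not (x < 0)               # query: Inv(x) /\ x < 0 -> false

if __name__ == "__main__":
    for _ in range(1000):            # sample many nondeterministic runs
        try:
            main()                   # never fails: the CHC system is sat
        except BlockedPath:
            pass
```

The clause system is satisfiable (take Inv(x) := x >= 0), and correspondingly no execution of main can violate the assert; an unsatisfiable system would instead correspond to a program with a reachable assertion failure.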