A Flexible Proof Format for SAT Solver-Elaborator Communication
Seulkee Baek, Mario Carneiro, Marijn J. H. Heule
Tools and Algorithms for the Construction and Analysis of Systems, 2021. DOI: 10.1007/978-3-030-72016-2_4

We introduce FRAT, a new proof format for unsatisfiable SAT problems, and its associated toolchain. Compared to DRAT, the FRAT format allows solvers to include more information in proofs to reduce the computational cost of subsequent elaboration to LRAT. The format is easy to parse forward and backward, and it is extensible to future proof methods. The provision of optional proof steps allows SAT solver developers to balance implementation effort against elaboration time, with little to no overhead on solver time. We benchmark our FRAT toolchain against a comparable DRAT toolchain and confirm a >84% median reduction in elaboration time and a >94% median decrease in peak memory usage.

The Boolean satisfiability problem is the problem of determining, for a given Boolean formula consisting of Boolean variables and connectives, whether there exists a variable assignment under which the formula evaluates to true. Boolean satisfiability (SAT) is interesting in part because there are surprisingly diverse types of problems that can be encoded as Boolean formulas and solved efficiently by checking their satisfiability. SAT solvers, programs that automatically solve SAT problems, have been successfully applied to a wide range of areas, including hardware verification [2], planning [14], and combinatorics [12].

The performance of SAT solvers has taken great strides in recent years, and modern solvers can often solve problems involving millions of variables and clauses, which would have been unthinkable a mere 20 years ago [15]. But this improvement comes at the cost of a significant increase in the code complexity of SAT solvers, which makes it difficult either to assume their correctness on faith or to certify their program correctness directly. As a result, the ability of SAT solvers to produce independently verifiable certificates has become a pressing necessity. Since there is an obvious certificate format (the satisfying Boolean assignment) for satisfiable problems, the real challenge in proof-producing SAT solving is in devising a compact proof format for unsatisfiable problems, and developing a toolchain that efficiently produces and verifies it.

The current de facto standard proof format for unsatisfiable SAT problems is DRAT [10]. The format, as well as its predecessor DRUP, was designed with a strong focus on quick adoption by the community, emphasizing easy proof emission, practically zero overhead, and reasonable validation speed [11]. DRAT has been the only proof format supported in SAT Competitions and Races since 2014, as entrants lost interest in the alternatives.

DRAT is a clausal proof format [6], which means that a DRAT proof consists of a sequence of instructions for adding and deleting clauses. It is helpful to think of a DRAT proof as a program for modifying the 'active multiset' of clauses: the initial active multiset is the clauses of the input problem, and this multiset grows and shrinks over time as the program is executed step by step. The invariant throughout program execution is that the active multiset at any point in time is at least as satisfiable as the initial active multiset.
This invariant holds trivially in the beginning and after a deletion; it is also preserved by addition steps, each of which is justified by either RUP or RAT, which we explain shortly. The last step of a DRAT proof is the addition of the empty clause, which ensures the unsatisfiability of the final active multiset, and hence that of the initial active multiset, i.e. the input problem.

Every addition step in DRAT is either a reverse unit propagation (RUP) step [6] or a resolution asymmetric tautology (RAT) step [13]. A clause C has the property AT (asymmetric tautology) with respect to a formula F if F, ¬C ⊢₁ ⊥, which is to say, there is a proof of the empty clause by unit propagation using F and the negated literals in C. A RUP step that adds C to the active multiset F is valid if C has property AT with respect to F. A clause l ∨ C has property RAT with respect to F if for every clause ¬l ∨ D ∈ F, the clause C ∨ D has property AT with respect to F. In this case, C is not necessarily logically entailed by F, but F and F ∧ C are equisatisfiable, and a RAT step will add C to the active multiset if C has property RAT with respect to F. (See [10] for more about the justification for this proof system.)

DRAT has a number of advantages over formats based on more traditional proof calculi, such as resolution or analytic tableaux. For SAT solvers, DRAT proofs are easier to emit because CNF clauses are the native data structures that the solvers store and manipulate internally. Whenever a solver obtains a new clause, the clause can simply be streamed out to a proof file without any further modification. Also, DRAT proofs are more compact than resolution proofs, as the latter can become infeasibly large for some classes of SAT problems [7].

There is, however, room for further improvement in the DRAT format due to the information loss incurred by DRAT proofs. Consider, for instance, the SAT problem and proofs shown in Figure 1. The left column is the input problem in the DIMACS format, the center column is its DRAT proof, and the right column is the equivalent proof in the LRAT format, which can be thought of as an enriched version of DRAT with more information. The numbers before the first zero on lines without a "d" represent literals: positive numbers denote positive literals, while negative numbers denote negative literals. The first clause of the input formula is (x₁ ∨ x₂ ∨ ¬x₃), or equivalently 1 2 -3 0 in DIMACS. The first lines of both the DRAT and LRAT proofs are RUP steps adding the clause (x₁ ∨ x₂), written 1 2 0. When an LRAT checker verifies this step, it is informed of the IDs of the active clauses (the trailing numbers 1 6 3) relevant for unit propagation, in the exact order they should be used. Therefore, the LRAT checker only has to visit the first, sixth, and third clauses and confirm that, starting with the unit literals ¬x₁, ¬x₂, they yield the new unit literals ¬x₃, x₄, ⊥. In contrast, a DRAT checker verifying the same step must add the literals ¬x₁, ¬x₂ to the active multiset (in this case, the eight initial clauses) and carry out blind unit propagation over the whole resulting multiset until it reaches a contradiction. This omission of RUP information in DRAT proofs introduces significant overhead in proof verification. Although the exact figures vary from problem to problem, checking a DRAT proof typically takes approximately twice as long as solving the original problem, whereas the verification time for an LRAT proof is negligible compared to its solution time.
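The difference between hinted and unhinted checking can be made concrete with a small sketch. The following Rust function (illustrative only, not part of the published toolchain; the names and types are our own) checks a single RUP step the way an LRAT checker does: it assumes the negation of the new clause and visits only the hinted clauses, in order, expecting each one to become unit or empty under the current assignment.

    use std::collections::{HashMap, HashSet};

    /// Check one hinted RUP step: assuming the negation of `new_clause`, the
    /// hinted clauses must become unit one after another, ending in a conflict.
    /// Clauses use the DIMACS convention: nonzero i64 literals, negation by sign.
    fn check_rup_with_hints(
        formula: &HashMap<u64, Vec<i64>>,
        new_clause: &[i64],
        hints: &[u64],
    ) -> bool {
        // Start from the negated literals of the clause being added.
        let mut assignment: HashSet<i64> = new_clause.iter().map(|&l| -l).collect();
        for id in hints {
            let clause = match formula.get(id) {
                Some(c) => c,
                None => return false, // hint refers to a missing clause
            };
            // Literals of the hinted clause not falsified by the assignment.
            let remaining: Vec<i64> = clause
                .iter()
                .copied()
                .filter(|&l| !assignment.contains(&-l))
                .collect();
            match remaining.as_slice() {
                [] => return true,                       // conflict: step verified
                [unit] => { assignment.insert(*unit); }  // clause became unit
                _ => return false,                       // hint does not propagate
            }
        }
        false // hints exhausted without reaching a conflict
    }

A DRAT checker has no such hints: in place of the single pass over the hint list, it must repeatedly scan the entire active multiset for clauses that become unit until it finds a conflict, which is where the bulk of the checking cost comes from.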
This additional cost of checking DRAT proofs also represents a lost opportunity: when a SAT solver emits a RUP step, it knows exactly how the new clause was obtained, and this knowledge can (in theory) be turned into an LRAT-style RUP annotation, which can cut down verification costs significantly if conveyed to the verifier. For the DRAT format, a design choice was made not to include such information, since demanding explicit proofs for all steps turned out to be impractical. Although it is theoretically possible to always glean the correct RUP annotation from the solver state, computing this information can be intricate and costly for some types of inferences (e.g. conflict-clause minimization [22]), making it harder to support proof logging [25]. Reducing such overheads is particularly important when solving satisfiable formulas, as proofs are superfluous for them and the penalty for maintaining the proof should be minimized. We should note, however, that proof elaboration need not be an all-or-nothing business; if it is infeasible to demand fully elaborated proofs, we can still ask solvers to fill in as many gaps as is convenient for them, which is still a considerable improvement over leaving all of the work to the verifier.

Inclusion of final clauses is another potential area for improvement over the DRAT format. A DRAT proof typically includes many addition steps that do not ultimately contribute to the derivation of the empty clause. This is unavoidable in the proof emission phase, since a SAT solver cannot know in advance whether a given clause will ultimately be useful, and must stream out the clause before it can find out. All such steps, however, should be dropped in the postprocessing phase in order to compress proofs and speed up verification. The most straightforward way of doing this is to process the proof in reverse order [6]: when processing a clause C_{k+1}, identify all the clauses used to derive C_{k+1}, mark them as 'used', and move on to clause C_k. For each clause, process it if it is marked as used, and skip it otherwise. The only caveat of this method is that the postprocessor needs to know which clauses were present at the very end of the proof, since otherwise there is no way to identify which clauses were used to derive the empty clause. Although it is possible to enumerate the final clauses by a preliminary forward pass through a DRAT proof, this is clearly unnecessary work, since SAT solvers know exactly which clauses are present at the end, and it is desirable to put this information in the proof in the first place.

To address the above issues, we introduce FRAT, a new proof format designed to allow fine-grained communication between SAT solvers and elaborators. The main differences between FRAT and DRAT are: (1) optional annotation of RUP steps, (2) inclusion of final clauses, and (3) identification of clauses by unique IDs. We have already explained the rationale for (1) and (2); (3) is necessary for concise references to clauses in deletions and RUP step annotations. More specifically, a FRAT proof consists of the following six types of proof steps:

- o: An original step; a clause from the input file. The purpose of these lines is to name the clauses from the input with identifiers; they are not required to come in the same order as the file, they are not required to be numbered in order, and not all clauses of the input need appear here. The proof may also progress (with a and d steps) before all o steps have been added.
- a, l: An addition step, and an optional LRAT-style unit propagation proof of the step. The proof, if provided, is a sequence of clauses in the current formula in the order that they become unit. For solver flexibility, they are allowed to come out of order, but the elaborator is optimized for the case where they are correctly ordered. For a RAT step, the negative numbers in the proof refer to the clauses in the active set that contain the negated pivot literal, each followed by the unit propagation proof of the corresponding resolvent. See [3] for more details on the LRAT checking algorithm.
- d: A deletion step, deleting the clause with the given ID from the formula. The literals given must match the literals in the corresponding addition step up to permutation.
- r: A relocation step. The syntax is r ids 0, where ids has the form s_0, t_0, . . . , s_k, t_k and must consist of an even number of clause IDs. It indicates that the active clause with ID s_i is re-labeled and now has ID t_i, for each 0 ≤ i ≤ k. (This is used for solvers that use pointer identity for clauses, but also do garbage collection to decrease memory fragmentation.)
- f: A finalization step. These steps come at the end of a proof, and provide the list of all active clauses at the end of the proof. The clauses may come in any order, but every clause that has been added and not deleted must be present. (For best results, clauses should be finalized in roughly reverse order of when they were added.)

(Our modified version of CaDiCaL also outputs a seventh kind of step, t todo_id 0, to collect statistics on code paths that produce a steps without proofs. See Section 3 for how this information is used.)

Figure 1 is an example from [3], which includes a SAT problem in DIMACS format and the proofs of its unsatisfiability in the DRAT and LRAT formats. It shows how proofs are produced and elaborated via the DRAT toolchain. Figure 2 shows the corresponding problem and proofs for the FRAT toolchain. Notice how the FRAT proof is more verbose than its DRAT counterpart and includes all the hints for addition steps, which are reused in the subsequent LRAT proof. To illustrate that proofs are optional, we have omitted the proofs of steps 11 and 12 in this example. The steps must still be legal RAT steps, but the elaborator will derive the proof rather than the solver.

Binary FRAT. The files shown in Figure 2 are in the text version of the FRAT format, but for efficiency reasons solvers may also wish to use a binary encoding. The binary FRAT format is exactly the same in structure, but the integers are encoded using the same variable-length integer encoding used in binary DRAT [9]. Unsigned numbers are encoded in 7-bit little endian, with the high bit set on each byte except the last. That is, a number n = a_0 + a_1·2^7 + · · · + a_k·2^{7k} with 0 ≤ a_i < 2^7 is encoded as the byte sequence a_0 + 2^7, a_1 + 2^7, . . . , a_{k-1} + 2^7, a_k. Signed numbers are encoded by mapping n ≥ 0 to f(n) := 2n and −n (with n > 0) to f(−n) := 2n + 1, and then using the unsigned encoding. (Incidentally, the mapping f is not surjective, as it misses 1, but it is used by other formats so we have decided not to change it.)

The purpose of the FRAT format is for solvers to be able to quickly write down what they are doing while they are doing it, with the elaborator stage "picking up the pieces" and preparing the proof for consumption by simpler mechanisms such as certified LRAT checkers. As such, it is important that we are able to concisely represent all manner of proof methods used by modern SAT solvers.
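To make the integer encoding just described concrete, here is a small Rust sketch (ours, for illustration; the toolchain's actual reader and the solvers' writers are separate implementations) of the unsigned and signed encodings and of decoding:

    /// Encode an unsigned number in the 7-bit little-endian form used by binary
    /// DRAT and FRAT: low 7 bits first, high bit set on every byte but the last.
    fn encode_unsigned(mut n: u64, out: &mut Vec<u8>) {
        loop {
            let byte = (n & 0x7f) as u8;
            n >>= 7;
            if n == 0 {
                out.push(byte);
                return;
            }
            out.push(byte | 0x80);
        }
    }

    /// Map a signed number to its unsigned code: n >= 0 goes to 2n,
    /// -n (with n > 0) goes to 2n + 1, then use the unsigned encoding.
    /// (i64::MIN is not handled in this sketch.)
    fn encode_signed(n: i64, out: &mut Vec<u8>) {
        let code = if n >= 0 { 2 * n as u64 } else { 2 * (-n) as u64 + 1 };
        encode_unsigned(code, out);
    }

    /// Decode one unsigned number starting at `pos`; return the value and the
    /// position just past its last byte, or None if the input is malformed.
    fn decode_unsigned(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
        let (mut value, mut shift) = (0u64, 0u32);
        loop {
            let byte = *buf.get(pos)?;
            pos += 1;
            value |= u64::from(byte & 0x7f) << shift;
            if byte & 0x80 == 0 {
                return Some((value, pos));
            }
            shift += 7;
            if shift >= 64 {
                return None;
            }
        }
    }

For example, the literal −48 is mapped to 2·48 + 1 = 97 = 0x61, which is the ASCII code of the step character a; this collision is exactly the disambiguation issue discussed next.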
The high-level syntax of a FRAT file is quite simple: a sequence of "segments", each of which begins with a character, followed by zero or more nonzero numbers, followed by a 0. In the binary version, each segment similarly begins with a printable character, followed by zero or more nonzero bytes, followed by a zero byte. (Note that continuation bytes in an unsigned number encoding are always nonzero.) This means that it is possible to jump into a FRAT file and find segment boundaries by searching for a nearby zero byte. This is in contrast to binary LRAT, in which add steps are encoded as a ⟨id⟩ ⟨literal⟩* 0 ⟨±id⟩* 0, where a random zero byte could either be the end of a segment or the middle of an add step. Since 0x61, the ASCII representation of a, is also a valid step ID (encoding the signed number −48), in a sequence such as (a ⟨nonzero⟩* 0)* the literals and the steps cannot be locally disambiguated. The local disambiguation property is important for our FRAT elaborator, because it means that we can efficiently parse FRAT files generated by solvers backward, reading the segments in reverse order so that we can perform backward checking in a single pass.

DRAT is based on adding clauses that are RAT with respect to the active formula. It is quite versatile and sufficient for most common cases, covering CDCL steps, hyper-resolution, unit propagation, blocked clause elimination and many other techniques. However, we recognize that not all methods can be cast into this format, and some are too expensive to translate into this proof system. In this work we define only six segment characters (a, d, f, l, o, r), which suffice to cover the methods used by SAT solvers targeting DRAT. However, the format is forward-compatible with new kinds of proof steps, which can be indicated with different characters.

For example, CryptoMiniSat [21] is a SAT solver that also supports XOR clause extraction and reasoning, and can derive new XOR clauses using proof techniques such as Gaussian elimination. Encoding this in DRAT is quite complicated: the XOR clauses must be Tseitin transformed into CNF, and Gaussian elimination requires a long resolution proof. Participants in SAT competitions therefore turn this reasoning method off, as producing the DRAT proofs is either too difficult or the performance gains are canceled out by the proof overhead. FRAT resolves this impasse by allowing the solver to express itself with minimal encoding overhead. A hypothetical extension to FRAT would add new segment characters to allow adding and deleting XOR clauses, and a new proof method for proof by linear algebra on these clauses. The FRAT elaborator would be extended to support the new step kinds, and it could either perform the expensive translation into DRAT at that stage (only doing the work when it is known to be needed for the final proof), or it could pass the new methods on to some XLRAT backend format that understands these steps natively. Since the extension is backward compatible, it can be made without impacting any other FRAT-producing solvers.

The FRAT proof format is designed to allow conversion of DRAT-producing solvers into FRAT-producing solvers at minimal cost, both in terms of implementation effort and impact on runtime efficiency. In order to show the feasibility of such conversions, we chose two popular SAT solvers, CaDiCaL and MiniSat, to modify as case studies.
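The local disambiguation property translates directly into a simple backward scan. The following Rust sketch (again illustrative; the real elaborator is more involved) walks a binary FRAT buffer from the end, using the zero bytes as segment terminators and yielding the segments in reverse file order:

    /// Split a binary FRAT buffer into segments, last segment first.
    /// Each segment starts with a step character and ends with a 0 byte; since
    /// step characters and number bytes are nonzero, every 0 byte is a boundary.
    fn segments_backward(buf: &[u8]) -> Vec<&[u8]> {
        let mut segments = Vec::new();
        let mut end = buf.len(); // exclusive end of the current segment
        while end > 0 {
            // buf[end - 1] is the terminating 0 of the current segment; look left
            // for the previous terminator (or the start of the file).
            let start = buf[..end - 1]
                .iter()
                .rposition(|&b| b == 0)
                .map_or(0, |i| i + 1);
            segments.push(&buf[start..end]);
            end = start;
        }
        segments
    }

Decoding an individual segment then amounts to reading its leading step character and decoding the following numbers, for instance with decode_unsigned from the earlier sketch.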
The solvers were chosen to demonstrate two different aspects of feasibility. Since MiniSat forms the basis of the majority of modern SAT solvers, an implementation using MiniSat shows that the format is widely applicable, and provides code which developers can easily incorporate into a large number of existing solvers. CaDiCaL, on the other hand, is a cutting-edge modern solver which employs a wide range of sophisticated optimizations; a successful conversion of CaDiCaL shows that the technology is scalable, and is not limited to simpler toy examples.

As mentioned in Section 2, the main solver modifications required for FRAT production are the inclusion of clause IDs, finalization steps, and LRAT proof traces. The provision of IDs requires some non-trivial modification, as many solvers, including CaDiCaL and MiniSat, do not natively keep track of clause IDs, and DRAT proofs use literal lists up to permutation for clause identity. In CaDiCaL, we added IDs to all clauses, leading to an overhead of 8 bytes per clause. Additionally, unit clauses are tracked separately, and ensuring proper ID tracking for unit clauses resulted in some added code complexity. In MiniSat, we achieved zero bytes of overhead by using the pointer value of a clause as its ID, with unit clauses having IDs computed from the literal. This requires the use of relocation steps during garbage collection. The output of finalization steps requires identifying the active set from the solver state, which can be subtle depending on the solver architecture, but is otherwise a straightforward task given knowledge of the solver.

LRAT trace production is the heart of the work, and requires the solver to justify each addition step. This modification is easier to apply to MiniSat, as it only adds clauses in a few places, and it already tracks the "reasons" for each literal in the current assignment, which makes the proof trace straightforward. In contrast, CaDiCaL has over 30 ways to add clauses; in addition to the main CDCL loop, there are various in-processing and optimization passes that can create new clauses. To accommodate this complexity, we leverage the flexibility of the FRAT format, which allows optional hints, to focus on the most common clause addition steps and reap the majority of the runtime advantage with only a few changes. The FRAT elaborator falls back on the standard elaboration by unit propagation when proofs are not provided, so future work can add more proofs to CaDiCaL without any changes to the toolchain.

To maximize the efficacy of the modification, we used a simple method to find places to add proofs. In the first pass, we added support for clause ID tracking and finalization, and changed the output format to FRAT syntax. Since CaDiCaL was already producing DRAT proofs, we could easily identify the addition and deletion steps and replace them with a and d steps. Once this is done, CaDiCaL produces valid FRAT files which can pass through the elaborator to yield LRAT results, but elaboration is quite slow, since the FRAT elaborator is essentially acting as a less optimized version of DRAT-trim at this point. We then find all code paths that lead to an a step being emitted, and add an extra call to output a step of the form t todo_id 0, where todo_id is a unique identifier of this position in the code. The FRAT elaborator is configured to ignore these steps, so they have no effect, but by running the solver on benchmarks we can count how many t steps of each kind appear, and so see which code paths are hottest.
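On the solver side, the changes amount to emitting these step lines alongside the normal clause bookkeeping. The sketch below shows a hypothetical tracer for the text format; the names (FratWriter, original, add, delete, finalize) are our own and do not correspond to the actual CaDiCaL or MiniSat patches.

    use std::io::{self, Write};

    /// Minimal text-format FRAT tracer. A solver would hold one of these and call
    /// it whenever it loads, learns, deletes, or finalizes a clause.
    struct FratWriter<W: Write> {
        out: W,
        next_id: u64,
    }

    impl<W: Write> FratWriter<W> {
        fn new(out: W) -> Self {
            FratWriter { out, next_id: 1 }
        }

        fn nums(ns: &[i64]) -> String {
            ns.iter().map(|n| n.to_string()).collect::<Vec<_>>().join(" ")
        }

        /// o step: name an original clause from the input.
        fn original(&mut self, lits: &[i64]) -> io::Result<u64> {
            let id = self.next_id;
            self.next_id += 1;
            writeln!(self.out, "o {} {} 0", id, Self::nums(lits))?;
            Ok(id)
        }

        /// a step with an optional l hint (clause IDs in propagation order).
        fn add(&mut self, lits: &[i64], hints: Option<&[i64]>) -> io::Result<u64> {
            let id = self.next_id;
            self.next_id += 1;
            write!(self.out, "a {} {} 0", id, Self::nums(lits))?;
            if let Some(h) = hints {
                write!(self.out, " l {} 0", Self::nums(h))?;
            }
            writeln!(self.out)?;
            Ok(id)
        }

        /// d step: delete an active clause by ID.
        fn delete(&mut self, id: u64, lits: &[i64]) -> io::Result<()> {
            writeln!(self.out, "d {} {} 0", id, Self::nums(lits))
        }

        /// f step: finalize a clause still active at the end of the proof.
        fn finalize(&mut self, id: u64, lits: &[i64]) -> io::Result<()> {
            writeln!(self.out, "f {} {} 0", id, Self::nums(lits))
        }
    }

A MiniSat-style integration would instead use the clause's pointer value as its ID and emit r steps when garbage collection relocates clauses, as described above.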
The basic idea is that elaborating a step that has a proof is much faster than elaborating a step that does not, but the distribution of code paths leading to add steps is highly skewed, so adding proofs to the top 3 or 4 paths already decreases the elaboration time by over 70%. At the time of writing, about one third of the CaDiCaL code paths are covered, and the median elaboration time is about 15% that of DRAT-trim (see Section 5). (This is despite the fact that our elaborator could stand to improve on low-level optimizations, and runs about twice as slow as DRAT-trim when no proofs are provided.)

The main tasks of the FRAT-to-LRAT elaborator are the provision of missing RUP step hints, the elimination of irrelevant clause additions, and the re-labeling of clauses with new IDs. These tasks are performed in two separate 'passes' over the files, writing and reading directly to disk (so the entire proof is never in memory at once). In the first pass, the elaborator reads the FRAT file and produces a temporary file (which may be stored on disk or in memory depending on configuration). The temporary file is essentially the original FRAT file with the steps put in reverse order, while satisfying the following additional conditions:

- All a steps have annotations.
- Every clause introduced by an o, a, or r step ultimately contributes to the proof of ⊥. Note that we consider an r step as using an old clause with the old ID and introducing a new clause with the new ID.
- There are no f steps.

Algorithm 1 shows the pseudocode of the first pass, Elaborate(cert). Here, cert is the FRAT proof obtained from the SAT solver, and the pass works by iterating over its steps in reverse order, producing the temporary file revcert. The map F maintains the active formula as a map with unique IDs for each clause (double inserts and removes to F are always error conditions), and the effect of each step is replayed backwards to reconstruct the solver's state at the point each step was produced.

Algorithm 1 First pass (elaboration): FRAT to elaborated reversed FRAT

    function Elaborate(cert)
        F ← ∅, revcert ← []                ▷ F is a map ID → clause, with a bool marking
        for step in reverse(cert) do
            case step of
                a(i, C, proof?) ⇒
                    C' ← F.remove(i); assert C ≡ C'
                    if C'.marked then
                        steps' ← case proof? of
                            ε ⇒ ProveRAT(F, C)
                            l(steps) ⇒ CheckHint(F, C, steps)
                        for j in {j | ±j ∈ steps'} do
                            if ¬F_j.marked then
                                F_j.marked ← true
                                revcert ← revcert, d(j, F_j)
                        revcert ← revcert, a(i, C, l(steps'))
                o(i, C), d(i, C), f(i, C) ⇒ . . .       ▷ see the notes below
                r(R) ⇒ . . . ; revcert ← revcert, r(R')

Some details of the first pass:

- All d or f clauses are immediately inserted into F, but (with the exception of the empty clause) they are marked as not necessarily required for the proof, and the d step is deferred until just before its first use (or rather, just after the last use).
- ProveRAT(F, C), not given here, checks that C has property RAT with respect to F, and produces a step list in LRAT format (where positive numbers are clause references in a unit propagation proof, and negative numbers are used in RAT steps, indicating the clauses to resolve against).
- CheckHint(F, C, steps) does the same thing, but it has been given a candidate proof, steps. It will check that steps is a valid proof and, if so, return it; the steps in the unit propagation proof may be out of order (in which case they are reordered to LRAT conformity), and if the given proof is not salvageable, it falls back on ProveRAT(F, C) to construct the proof.

In the second pass, Renumber(F_orig, revcert) (Algorithm 2) reads the input DIMACS file and the temporary file from the first pass, and produces the final result in LRAT format. Not much checking happens in this pass, but we ensure that the o steps in the FRAT file actually appear (up to permutation) in the input. The state maintained in this pass is the list of all active clause IDs and the corresponding list of LRAT IDs (in which original clauses are always numbered sequentially as in the file, and add/delete steps use a monotonic counter that is incremented on each addition step). The resulting LRAT file can then be verified by any of the verified LRAT checkers [26] (and our toolchain also includes a built-in LRAT checker for verification).

Algorithm 2 Second pass (renumbering): reversed FRAT to LRAT

    function Renumber(F_orig, revcert)
        M ← ∅, k ← |F_orig|, lrat ← []      ▷ M is a map ID → ID
        for step in reverse(revcert) do
            case step of
                o(i, C) ⇒ find j such that C ≡ (F_orig)_j; M.insert(i, j)
                a(i, C, l(steps)) ⇒
                    k ← k + 1; M.insert(i, k)
                    lrat ← lrat, add(k, C, [±M_i | ±i ∈ steps])
                    if C = ⊥ then return lrat
        assert false                        ▷ no proof of ⊥ found

The 2-pass algorithm is used in order to optimize memory usage. The result of the first pass is streamed out so that the intermediate elaboration result does not have to be held in memory all at once. Once the temporary file is streamed out, we need at least one more pass to reverse it (even if the labels did not need renumbering), since its steps are in reverse order.

We performed benchmarks comparing our FRAT toolchain (modified CaDiCaL + FRAT-to-LRAT elaborator written in Rust) against the DRAT toolchain (standard CaDiCaL + DRAT-trim), measuring their execution times, output file sizes, and peak memory usage while solving SAT instances in the DIMACS format and producing their LRAT proofs. All tests were performed on Amazon EC2 r5a.xlarge instances, running Ubuntu Server 20.04 LTS on 2.5 GHz AMD EPYC 7000 processors with 32 GB RAM and 512 GB SSD. The instances used in the benchmark were chosen by selecting all 97 instances for which default-mode CaDiCaL returned 'UNSAT' in the 2019 SAT Race results. One of these instances was excluded because DRAT-trim exhausted the available 32 GB of memory and failed during elaboration. Although this instance was not used for the comparisons below, we note that it offers further evidence of the FRAT toolchain's efficient use of memory, since the FRAT-to-LRAT elaboration of this instance succeeded on the same system. The remaining 96 instances were used for the performance comparison of the two toolchains.

Figures 5 and 6 show the time and memory measurements from the benchmark. We can see from Figure 5 that the FRAT toolchain is significantly faster than the DRAT toolchain. Although the modified CaDiCaL tends to be slightly (6%) slower than standard CaDiCaL, that overhead is more than compensated by a median 84% decrease in elaboration time (the totals over all instances are 1700.47 s for the DRAT toolchain vs. 381.70 s for the FRAT toolchain, so the average is down by 77%). If we include the time of the respective solvers, the FRAT + modified CaDiCaL toolchain takes 53.6% of the time of the DRAT + CaDiCaL toolchain on median. The difference in the toolchains' time budgets is clear: the DRAT toolchain spends 42% of its time in solving and 58% in elaboration, while the FRAT toolchain spends 85% on solving and only 15% on elaboration.

Figure 6 shows a dramatic difference in peak memory usage between the FRAT and DRAT toolchains. On median, the FRAT toolchain used only 5.4% as much peak memory as DRAT.
(The average is 318.62 MB, which is 11.98% of the DRAT toolchain's 2659.07 MB, but this figure is dominated by the largest instances. The maximum memory usage was 2.99 GB for FRAT and 21.5 GB for DRAT, though one instance exhausted the available 32 GB under DRAT and is not included in this figure.) This result is in agreement with our initial expectations: the FRAT toolchain's 2-pass elaboration method allows it to limit the number of clauses held in memory to the size of the active set used by the solver, whereas the DRAT toolchain loads all clauses in a DRAT file into memory at once during elaboration. This difference suggests that the FRAT toolchain can be used to verify instances that would otherwise require more memory than the system limit on the DRAT toolchain.

There were no noticeable differences in the sizes or verification times of the LRAT proofs produced by the two toolchains. On average, LRAT proofs produced by the FRAT toolchain were 1.873% smaller and 3.314% faster to check than those from the DRAT toolchain. One minor downside of the FRAT toolchain is that it requires the storage of a temporary file during elaboration, but we do not expect this to be a problem in practice, since the temporary file is typically much smaller than either the FRAT or the LRAT file. In our test cases, the average temporary file size was 28.68% and 47.60% that of the FRAT and LRAT files, respectively. In addition, users can run the elaborator with the -m option to bypass temporary files and write the temporary data to memory instead, which further improves performance but forgoes the memory conservation that comes with 2-pass elaboration.

The CaDiCaL modification is only a prototype, and some of its weaknesses show in the data. The general pattern we observed is that on problems for which the predicted CaDiCaL code paths were taken, the generated files have a large number of hints and the elaboration time is negligible (the "FRAT elab" line in Figure 5); but on problems which make use of the more unusual in-processing operations, many steps with no hints are given to the elaborator, and performance becomes comparable to DRAT-trim. For solver developers, this means that there is a very direct relationship between proof annotation effort and mean solution + elaboration time. Currently, elaboration of FRAT files with no annotations (the worst-case scenario for the FRAT toolchain) typically takes slightly more than twice as long as elaboration of DRAT files with DRAT-trim, likely due to missing optimizations from DRAT-trim that could be incorporated, but this only underscores the effectiveness of adding hints to the format.

As already mentioned, the FRAT format is most closely related to the DRAT format [8], which it seeks to replace as an intermediate output format for SAT solvers. It is also dependent on the LRAT format and related tools [3], as the FRAT toolchain targets LRAT as the final output format. The GRAT format [16] and toolchain also aim to improve the elaboration of SAT unsatisfiability proofs, but take a different approach from that of FRAT: they retain DRAT as the intermediate format, but use parallel processing and target a new final format with more information than LRAT in order to improve overall performance. GRAT also comes with its own verified checker [17].

Specifying and verifying the program correctness of SAT solvers themselves (sometimes called the autarkic method, as opposed to the proof-producing skeptical method) is a radically different approach to ensuring the correctness of their results.
There have been various efforts to verify nontrivial SAT solvers [18, 20, 19, 4, 5]. Although these solvers have become significantly faster, they cannot compete with the (unverified) state-of-the-art solvers, and it is also difficult to maintain and modify certified solvers. Proving the correctness of nontrivial SAT solvers can, however, provide new insights about the key invariants underlying the techniques used [5].

Generally speaking, devising proof formats for automated reasoning tools and augmenting the tools with proof output capability is an active research area. Notable examples outside SAT solving include the LFSC format for SMT solving [23] and the TSTP format for classical first-order ATPs [24]. In particular, the recent work on the veriT SMT solver [1] is motivated by a rationale similar to that of the FRAT toolchain; the key insight is that a proof production pipeline is often easier to optimize on the solver side than on the elaborator side, as the former has direct access to many types of useful information.

The test results show that the FRAT format and toolchain yield significant performance gains relative to their DRAT equivalents in both elaboration time and memory usage. We take this as confirmation of our initial conjectures that (1) there is a large amount of useful and easily extracted information in SAT solvers that is left untapped by DRAT proofs, and (2) the use of streaming verification is the key to verifying very large proofs that cannot be held in memory at once. The practical ramification is that, provided solvers produce well-annotated FRAT proofs, the elaborator is no longer a bottleneck in the pipeline. Typically, when DRAT-trim fails it does so either by taking excessive time, or by attempting to read an entire proof file into memory at once and exhausting it (the so-called "uncheckable" proofs that can be produced but not verified). But FRAT-to-LRAT elaboration is typically faster than FRAT production, and the memory consumption of the FRAT-to-LRAT elaborator at any given point is proportional to the memory used by the solver at the same point in the proof. Since LRAT verification is already efficient, the only remaining limiting factor is essentially the time and memory usage of the solver itself.

In addition to performance, the other main consideration in the design of the FRAT format and toolchain was flexibility of use and extension. The encoding of FRAT files allows them to be read and parsed both backward and forward, and the format can be extended to include more advanced inferences, as we have discussed in the example of XOR steps. The optional l steps allow SAT solvers to decide precisely when they will provide explicit proofs, thereby promoting a workable compromise between implementation complexity and runtime efficiency. SAT solver developers can begin using the format by producing the most bare-bones FRAT proofs with no annotations (essentially DRAT proofs with metadata for original/final clauses) and gradually work toward providing more complete hints. We hope that this combination of efficiency and flexibility will motivate performance-minded SAT solver developers to adopt the format and support more robust proof production, which is presently only an afterthought in most SAT solvers.
[1] Scalable fine-grained proofs for formula processing.
[2] Symbolic model checking using SAT procedures instead of BDDs.
[3] Efficient certified RAT verification.
[4] Optimizing a verified SAT solver.
[5] A verified SAT solver with watched literals using Imperative HOL.
[6] Verification of proofs of unsatisfiability for CNF formulas.
[7] The intractability of resolution.
[8] The DRAT format and DRAT-trim checker.
[9] Clausal proof compression.
[10] Verifying refutations with extended resolution.
[11] Bridging the gap between easy generation and efficient verification of unsatisfiability proofs.
[12] Solving and verifying the Boolean Pythagorean triples problem via cube-and-conquer.
[13] Inprocessing rules.
[14] Pushing the envelope: planning, propositional logic, and stochastic search.
[15] Fascicle 6: Satisfiability.
[16] The GRAT tool chain.
[17] Efficient verified (un)SAT certificate checking.
[18] Formal verification of a modern SAT solver by shallow embedding into Isabelle/HOL.
[19] versat: A verified modern SAT solver.
[20] The mechanical verification of a DPLL-based satisfiability solver.
[21] Extending SAT solvers to cryptographic problems.
[22] Minimizing learned clauses.
[23] SMT proof checking using a logical framework.
[24] TSTP data-exchange formats for automated theorem proving tools. In: Distributed Constraint Problem Solving and Reasoning in Multi-Agent Systems.
[25] Improved conflict-clause minimization leads to improved propositional proof traces.
[26] Mechanical verification of SAT refutations with extended resolution.