Distributed Cube and Conquer with Paracooba
Maximilian Heisinger, Mathias Fleury, Armin Biere
Theory and Applications of Satisfiability Testing - SAT 2020 (2020-06-26). DOI: 10.1007/978-3-030-51825-7_9

Cube and conquer is currently the most effective approach to solve hard combinatorial problems in parallel. It organizes the search in two phases. First, a look-ahead solver splits the problem into many sub-problems, called cubes, which are then solved in parallel by incremental CDCL solvers. In this tool paper we present the first fully integrated and automatic distributed cube-and-conquer solver Paracooba, targeting cluster and cloud computing. Previous work was limited to multi-core parallelism or relied on manual orchestration of the solving process. Our approach uses one master per problem to initialize the solving process and automatically discovers and releases compute nodes through elastic resource usage. Multiple problems can be solved in parallel on shared compute nodes, controlled by a custom peer-to-peer based load-balancing protocol. Experiments show the scalability of our approach.

SAT solvers have been successfully applied in many practical domains, including cryptanalysis and hardware and software verification, and are increasingly used to solve hard mathematical problems [17, 21, 26]. Sequential state-of-the-art SAT solving combines the well-known conflict-driven clause-learning procedure (CDCL) [33, 34] with sophisticated preprocessing techniques [10, 23] and other efficient heuristics for variable selection [6, 28, 30], restarts [2, 7, 32], and clause database reduction [32]. While some authors argue that there has been "no major performance breakthrough in close to two decades" [29], computers have at the same time become more and more powerful thanks to the ubiquitous availability of multi-core processors and the increasing usage of computers in the cloud. Thus improving the efficiency of parallel SAT solving remains an important topic. Accordingly, besides the traditional parallel track, the SAT Competition 2020 [19] features a cloud track for the first time.

One approach to solve large problems in parallel consists in splitting the problem into smaller, more manageable instances, for example, using cube and conquer [16, 20]. All these sub-problems are subsequently solved independently in parallel. This method was used by Heule to settle some long-standing mathematical conjectures [17, 21]. Splitting the problems was done automatically by a tool, but the resulting instances then had to be distributed manually for parallel solving.

In this paper, we present Paracooba [15]. After splitting a problem with the look-ahead solver March, Paracooba transfers the sub-problems efficiently to many nodes (including over the network). It detects when new nodes come online and balances the work across all available nodes. Other attempts at automatic and efficient distribution of problems exist, but they use divide and conquer: problems are dynamically split when nodes are underused (Sect. 2). In contrast, Paracooba assumes the problem is already split. Each node runs at least one instance of the SAT solver CaDiCaL [5]. The sub-problems are solved incrementally to reuse information from previous solving.
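To make the incremental reuse concrete, the following minimal sketch (not Paracooba's actual code) shows how the cubes of one formula could be solved on a single CaDiCaL instance through its assumption interface; the function solve_cubes and the overall structure are illustrative assumptions, while the CaDiCaL calls (add, assume, solve) are part of its public C++ API.

```cpp
// Sketch: solving several cubes of one formula incrementally with CaDiCaL.
// Learned clauses and preprocessing are kept between solve() calls, which is
// why solving similar cubes on the same solver instance pays off.
#include <vector>
#include "cadical.hpp"

int solve_cubes(const std::vector<std::vector<int>> &clauses,
                const std::vector<std::vector<int>> &cubes) {
  CaDiCaL::Solver solver;
  for (const auto &clause : clauses) {          // load the shared formula once
    for (int lit : clause) solver.add(lit);
    solver.add(0);                              // 0 terminates a clause
  }
  for (const auto &cube : cubes) {              // solve each cube under assumptions
    for (int lit : cube) solver.assume(lit);    // assumptions are dropped after solve()
    int res = solver.solve();                   // 10 = SAT, 20 = UNSAT
    if (res == 10) return 10;                   // any satisfiable cube settles the formula
  }
  return 20;                                    // all cubes UNSAT: the formula is UNSAT,
}                                               // assuming the cubes cover the search space
```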
Paracooba relies on a custom protocol to automatically detect underused nodes and to balance work across all nodes, including newly joining ones. It also supports nodes disconnecting by rebalancing their jobs (Sect. 3). In the experiments, we focus on a single CNF, a cruxmiter over 32-bit adder trees [25], which is considered a challenge for resolution-based solvers and is exemplary for the difficulties that arise in the verification of arithmetic circuits (see also [24]). Such benchmarks were also used in the SAT Race 2019. Already in the original work on cube and conquer, similar multiplier equivalence-checking problems were shown to benefit from the cube-and-conquer approach. Our results in Sect. 4 show that we get linear scaling with respect to the number of threads.

We use standard notation and refer the reader to the Handbook of Satisfiability for an introduction to SAT [8] as well as to the chapter on parallel SAT solving [4] in the Handbook of Parallel Constraint Reasoning [14].

One idea to improve the solving of large instances is to distribute the work across different machines, either via diversification of the search or via splitting of the search space. In the first approach, several solvers are run as a portfolio. By changing some of the parameters used by the SAT solvers, they heuristically search different parts of the search space and share some of the clauses they learn. ManySAT [13] pioneered this approach, which is now used in various tools like CryptoMiniSat [35], HordeSat [3], Plingeling [5], and Syrup [1]. As soon as any instance derives SAT or UNSAT, the problem is solved.

We use the other approach, which divides the search space explicitly, as pioneered in [9, 22, 36] and refined in [16, 20]. Solving the formula ϕ is equivalent to splitting it into the two formulas ϕ ∧ x and ϕ ∧ ¬x and solving them. Unlike diversification, the overall problem is only considered to be UNSAT if all sub-problems are. Still, if any sub-problem is SAT, the overall problem is SAT, too. Splitting can be done dynamically during solving whenever a problem is deemed too hard. This is used, for instance, by Painless [27] or MapleAmpharos [31]. These tools also share clauses to get some of the benefits of portfolio solvers. Splitting can also be done upfront by look-ahead. By splitting the formula recursively, we obtain a formula of the form ϕ ∧ c_1, ..., ϕ ∧ c_n, where the conjunctions c_i are called cubes. We use March [18] to split the problem: it produces cubes, e.g., of the form L_1 L_2 L_3, L_1 L_2 ¬L_3, ..., ¬L_1 ¬L_2 ¬L_3. The cubes can be represented as a binary tree, the cube tree, where each cube is a path to a leaf: at each node, either the left (positive) or the right (negative) branch is taken.

Paracooba distinguishes between masters, which initiate work, and workers, which do the actual solving. Each node can either explore the cube tree deeper by sending work further (see Sect. 3.1) or solve the problem itself if a leaf node of the cube tree has been reached (Sect. 3.2). Nodes are also responsible for sending the result SAT or UNSAT back. Paracooba supports new nodes joining dynamically, and leaf nodes are able to wait for new tasks without consuming resources or to shut down automatically, which is important if Paracooba is run in the cloud (Sect. 3.3). Figure 1 gives an overview of the solving process. To combine fast local solving with automatic distribution to networked compute nodes, Paracooba treats tasks as paths in the cube tree.
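As a small illustration of the correspondence between cubes and root-to-leaf paths (and not Paracooba code), the sketch below enumerates a complete cube tree over three hypothetical splitting variables; the trees produced by March are in general not balanced like this.

```cpp
// Sketch: enumerating the cubes of a complete cube tree of depth 3.
// Each recursive call extends the current path by one literal; every leaf
// corresponds to one cube c_i, and the original formula phi is equisatisfiable
// with the collection of sub-problems phi /\ c_i.
#include <iostream>
#include <vector>

void enumerate(const std::vector<int> &vars, std::size_t depth,
               std::vector<int> &cube, std::vector<std::vector<int>> &cubes) {
  if (depth == vars.size()) { cubes.push_back(cube); return; }
  for (int sign : {+1, -1}) {                  // left branch: positive literal,
    cube.push_back(sign * vars[depth]);        // right branch: negative literal
    enumerate(vars, depth + 1, cube, cubes);
    cube.pop_back();
  }
}

int main() {
  std::vector<int> vars = {1, 2, 3};           // hypothetical splitting variables
  std::vector<int> cube;
  std::vector<std::vector<int>> cubes;
  enumerate(vars, 0, cube, cubes);             // 1 2 3, 1 2 -3, ..., -1 -2 -3
  std::cout << cubes.size() << " cubes\n";     // prints "8 cubes"
  return 0;
}
```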
Paracooba distinguishes between assigned tasks (paths to leaves) that are waiting for an available local worker and unassigned tasks. Only unassigned tasks are distributed further. A compute node is mapped to one Paracooba process, which contains a fixed-size thread pool of local workers. Besides maintaining information on available nodes, every compute node has a unique 64-bit ID. Connections between compute nodes can be established at any time, either by an integrated auto-discovery protocol or by providing a known peer at startup. Once connected, each compute node receives the full formula sent by the master. Then it announces that it is ready to receive tasks. Each compute node has a solving context for every master, with the problem and the cubes to solve, a queue for unassigned tasks, and one for assigned tasks. Only paths in cube trees are exchanged during solving, and similar assigned tasks are solved by the same solver. New contexts are created whenever a new master comes online, and old ones are deleted when their master goes offline. By using low-level socket functionality (UDP/TCP), Paracooba can be run without setting up a specialized environment (as needed for MPI [12]).

New (unassigned) tasks received by a compute node are inserted into the queue. When a compute node becomes idle, tasks with paths to leaves are instantiated into assigned tasks to be solved locally, whereas shorter paths are split (by going deeper in the cube tree) into unassigned tasks that are distributed further. The overall strategy is to solve tasks with longer paths locally (as they are closer to the leaves), while other tasks are distributed to other known compute nodes.

The SAT solver CaDiCaL [5] solves the assigned tasks incrementally [11]. It makes use of efficient preprocessing, including variable elimination, and relies on efficiently restoring preprocessing steps when necessary [11]. This also provides a motivation for solving long paths locally: the cubes below a long shared path are similar, making it possible to reuse more information than when solving diverse cubes, where most of the preprocessing has to be undone. If hard sub-problems are clustered on a single compute node, some can be offloaded.

We use the look-ahead solver March [18] to generate cubes. Paracooba takes the output file containing the formula and the cubes as an argument. This Paracooba instance is the master node. All compute nodes parse both formula and cubes (reusing CaDiCaL's parser). After parsing, the initial task, consisting of the empty path, is created on the master compute node, which then branches on the first variable of the cube tree and creates new unassigned tasks. These are either solved directly on the master or distributed to other compute nodes.

Paths in the cube tree are often transmitted across the network and should, therefore, have a compact representation. We represent them as 64-bit unsigned integers, where the first 58 bits describe the path in the binary tree and the last 6 bits specify the length of the path. This representation entails a maximum tree height of 58, which limits the number of different tasks to 2^58. This constraint is not an issue, since it is 11 orders of magnitude larger than the one million cubes used for Heule's proof of the Pythagorean triples conjecture [21], which already produced a 200 TB proof.
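A compact path encoding consistent with this description could look like the sketch below; the bit order, the branch polarity, and the helper names are assumptions made for illustration, not Paracooba's actual implementation.

```cpp
// Sketch of the 64-bit path representation: the upper 58 bits hold the branching
// decisions from the root (assumed here to be stored most-significant-bit first,
// 0 = positive branch, 1 = negative branch), the lower 6 bits hold the path length.
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MAX_DEPTH = 58;       // maximum tree height
constexpr std::uint64_t LENGTH_MASK = 0x3F;   // lower 6 bits store the length
constexpr std::uint64_t ROOT_PATH = 0;        // empty path: no decisions, length 0

inline std::uint64_t path_length(std::uint64_t path) { return path & LENGTH_MASK; }

// Branch taken at a given depth (depth i is assumed to live at bit 63 - i).
inline int branch_at(std::uint64_t path, std::uint64_t depth) {
  assert(depth < path_length(path));
  return static_cast<int>((path >> (63 - depth)) & 1);
}

// Extend a path by one decision, yielding the path of a child task in the cube tree.
inline std::uint64_t extend_path(std::uint64_t path, int branch) {
  std::uint64_t len = path_length(path);
  assert(len < MAX_DEPTH);
  std::uint64_t with_bit = path | (static_cast<std::uint64_t>(branch & 1) << (63 - len));
  return (with_bit & ~LENGTH_MASK) | (len + 1);
}
```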
Communication between compute nodes uses a custom protocol, which defines messages sent over UDP and TCP. UDP is unreliable (packets can be dropped) and is used for non-critical messages, like auto-discovery, while reliable TCP is used for the transmission of formulas, tasks, results, and status updates. Once a new compute node becomes known, all other nodes establish a TCP connection to it, which is used for all remaining transfers in order to avoid the reliability issues of UDP in larger environments.

A sample interaction between a master and two daemon compute nodes is given in Fig. 2. First, the master starts with a problem to solve. It broadcasts an announcement request to all devices on the network. Daemons 1 and 2 answer the request and receive the formula in iCNF format and a job-initiator message. After that, solving starts and a path is sent from the master to daemon 1. Work is rebalanced from daemon 1 to daemon 2. Once the problem is solved, the status is bubbled up to the master, and each node is responsible for collecting the results of the jobs it offloaded. Finally, the master can conclude (UN)SAT.

Every daemon and every master sends a status message at every "tick", i.e., at configurable intervals with a default of 100 ms, to all compute nodes it knows. These messages describe the current queue sizes and are used by the distribution algorithm to decide whether and where tasks should be offloaded. Paracooba allows an "m to n" relation between masters and daemons, where daemons are used by different masters at the same time. Jobs are scheduled based on path length, not on the identity of the master. When distributing tasks to other compute nodes, the IDs of the original master, of the distribution target, and of the sender are always referenced, making all Paracooba instances aware of the sender and receiver of each task. For the same reason, status messages of daemon compute nodes also contain a list of all current contexts, announcing the formulas for which they can solve tasks.

By automatically discovering compute nodes in the same network, Paracooba can manage its overall resources automatically. Every daemon that is newly discovered by a master gets the formula and the cubes and, once ready, can receive tasks from all other connected compute nodes. Whenever a master node goes offline, it sends an offline announcement, which removes its solving context from all connected daemons, including all results and solver instances. Compute nodes maintain a moving average of the time between status messages for all other connected nodes. If a remote compute node does not send a status update in time, it is removed from the list of known nodes, and all tasks sent to that node are re-added to the local unassigned-task queue (and can, for example, be offloaded again). To save compute resources, an auto-shutdown timer can be enabled: it measures how long a compute node has been idle without active tasks and shuts the node down if no new tasks arrive before the timer runs out. Because tasks are distributed to inactive nodes quickly, the timeout can be set to a low value (e.g., 3 s) to reduce cost, making Paracooba well suited for cloud scenarios.
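The bookkeeping behind these status ticks and timeouts might look roughly like the following sketch; the message fields, the exponential smoothing, and the timeout factor are assumptions and do not reflect Paracooba's actual wire format or constants.

```cpp
// Sketch: per-peer state driven by periodic status messages. A moving average
// of the inter-arrival time is maintained; a peer that stays silent too long is
// dropped and its outstanding tasks are re-queued locally for redistribution.
#include <chrono>
#include <cstdint>
#include <vector>

struct StatusMessage {             // conceptual content only, not the wire format
  std::uint64_t sender_id;         // unique 64-bit node ID
  std::uint32_t unassigned_tasks;  // queue sizes drive the offloading decisions
  std::uint32_t assigned_tasks;
};

struct PeerState {
  std::chrono::steady_clock::time_point last_seen;  // set when the peer is discovered
  double avg_interval_ms = 100.0;                    // default tick is 100 ms
  std::vector<std::uint64_t> offloaded_paths;        // tasks currently sent to this peer
};

// Update the moving average whenever a status message from this peer arrives.
void on_status(PeerState &peer, const StatusMessage &) {
  auto now = std::chrono::steady_clock::now();
  double delta_ms =
      std::chrono::duration<double, std::milli>(now - peer.last_seen).count();
  peer.avg_interval_ms = 0.9 * peer.avg_interval_ms + 0.1 * delta_ms;  // assumed smoothing
  peer.last_seen = now;
}

// A peer silent for much longer than its usual interval is considered offline;
// its offloaded_paths go back into the local unassigned-task queue.
bool peer_timed_out(const PeerState &peer, double factor = 4.0) {      // assumed factor
  auto now = std::chrono::steady_clock::now();
  double silent_ms =
      std::chrono::duration<double, std::milli>(now - peer.last_seen).count();
  return silent_ms > factor * peer.avg_interval_ms;
}
```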
As motivated in the introduction, we tested our tool Paracooba on a 32-bit cruxmiter problem [25], for which March takes less than 10 s to split the initial problem into 52 520 cubes. We then ran Paracooba on our compute cluster of 32 nodes connected through cheap commodity 1 Gbit Ethernet cards. Each node contains two 8-core Intel Xeon E5-2620 v4 CPUs running at 2.10 GHz (turbo mode disabled) and 128 GB of main memory. Thus every node has 16 cores.

Table 1 shows the performance with respect to the number of threads. The run-time distribution for solving cubes is heavily skewed: most tasks need only a few seconds, but some take more than a minute, limiting the performance improvement that can be achieved by using more threads. Following Amdahl's law, the possible speedup is limited by the time required to solve the slowest cube: with cube run times t_1, ..., t_n, even unlimited parallelism cannot reduce the wall-clock time below max_i t_i, so the speedup is bounded by (t_1 + ... + t_n) / max_i t_i. After 2 min, 5 instances of CaDiCaL are still running, and it takes another minute to solve those. Paracooba outperforms static scheduling, done by splitting the cubes upfront over 512 threads and solving the resulting iCNF for each group of cubes incrementally with CaDiCaL (4 min 17 s wall-clock time). We experimented with resplitting cubes, but could not improve the solving time.

Paracooba is the first distributed cube-and-conquer solver. It relies on the state-of-the-art look-ahead solver March to split the problem and then efficiently distributes the cubes over as many nodes as are available. Our experiments reveal that the speedup is larger than the number of cores until saturation is reached. As future work, it would be interesting to support proof generation on the nodes and to store the proofs on the master node.

Acknowledgment. This work is supported by the Austrian Science Fund (FWF), NFN S11408-N23 (RiSE), the LIT project LOGTECHEDU, and the LIT AI Lab funded by the State of Upper Austria. Daniela Kaufmann, Sibylle Möhle, and the reviewers suggested many textual improvements.

References
A distributed version of Syrup
Refining restarts strategies for SAT and UNSAT
HordeSat: a massively parallel portfolio SAT solver
Parallel satisfiability. Handbook of Parallel Constraint Reasoning
CaDiCaL, Lingeling, Plingeling, Treengeling and YalSAT entering the SAT Competition
Evaluating CDCL variable scoring schemes
Evaluating CDCL restart schemes
Handbook of Satisfiability
Parallel propositional satisfiability checking with distributed dynamic learning
Effective preprocessing in SAT through variable and clause elimination
Incremental inprocessing in SAT solving
The MPI 2.2 standard and the emerging MPI 3 standard
ManySAT: a parallel SAT solver
Handbook of Parallel Constraint Reasoning
Cube and conquer: guiding CDCL SAT solvers by lookaheads
Schur number five
March eq: implementing additional reasoning into an efficient look-ahead SAT solver
Cube-and-conquer for satisfiability. Handbook of Parallel Constraint Reasoning
Solving very hard problems: cube-and-conquer, a hybrid SAT solving method
A distribution method for solving SAT in grids
Inprocessing rules
Verifying large multipliers by combining SAT and computer algebra
Arithmetic verification problems submitted to the SAT Race
Computer-aided proof of Erdős discrepancy properties
Modular and efficient divide-and-conquer SAT solver on top of the painless framework
Learning rate based branching heuristic for SAT solvers
SAT: Disruption, demise & resurgence (2019). POS 2019
Chaff: engineering an efficient SAT solver
A propagation rate based splitting heuristic for divide-and-conquer solvers
Between SAT and UNSAT: the fundamental difference in CDCL SAT
Conflict-driven clause learning SAT solvers
GRASP - a new search algorithm for satisfiability
Extending SAT solvers to cryptographic problems
PSATO: a distributed propositional prover and its application to quasigroup problems