key: cord-0047733-z1x1syhn
authors: Blondin, Michael; Esparza, Javier; Helfrich, Martin; Kučera, Antonín; Meyer, Philipp J.
title: Checking Qualitative Liveness Properties of Replicated Systems with Stochastic Scheduling
date: 2020-06-16
journal: Computer Aided Verification
DOI: 10.1007/978-3-030-53291-8_20
sha: e067750a192ef87818ef429587df999d85206ebb
doc_id: 47733
cord_uid: z1x1syhn

We present a sound and complete method for the verification of qualitative liveness properties of replicated systems under stochastic scheduling. These are systems consisting of a finite-state program, executed by an unknown number of indistinguishable agents, where the next agent to make a move is determined by the result of a random experiment. We show that if a property of such a system holds, then there is always a witness in the shape of a Presburger stage graph: a finite graph whose nodes are Presburger-definable sets of configurations. Due to the high complexity of the verification problem (non-elementary), we introduce an incomplete procedure for the construction of Presburger stage graphs, and implement it on top of an SMT solver. The procedure makes extensive use of the theory of well-quasi-orders, and of the structural theory of Petri nets and vector addition systems. We apply our results to a set of benchmarks, in particular to a large collection of population protocols, a model of distributed computation extensively studied by the distributed computing community.

Replicated systems consist of a fully symmetric finite-state program executed by an unknown number of indistinguishable agents, communicating by rendez-vous 18-11193S. or via shared variables [14, 16, 41, 46] . Examples include distributed protocols and multithreaded programs, or abstractions thereof. The communication graph of replicated systems is a clique. They are a special class of parameterized systems, i.e., infinite families of systems that admit a finite description in some suitable modeling language. In the case of replicated systems, the (only) parameter is the number of agents executing the program. Verifying a replicated system amounts to proving that an infinite family of systems satisfies a given property. This is already a formidable challenge, made even harder by the fact that we want to verify liveness (more difficult than safety) against stochastic schedulers. Loosely speaking, stochastic schedulers select the set of agents that should execute the next action as the result of a random experiment. Stochastic scheduling often appears in distributed protocols, and in particular also in population protocols-a model much studied in distributed computing with applications in computational biology 1 -that supplies many of our case studies [9, 58] . Under stochastic scheduling, the semantics of a replicated system is an infinite family of finite-state Markov chains. In this work, we study qualitative liveness properties, stating that the infinite runs starting at configurations of the system satisfying a precondition almost surely reach and stay in configurations satisfying a postcondition. In this case, whether the property holds or not depends only on the topology of the Markov chains, and not on the concrete probabilities.

We introduce a formal model of replicated systems, based on multiset rewriting, where processes can communicate by shared variables or multiway synchronization. We present a sound and complete verification method called Presburger stage graphs. A Presburger stage graphs is a directed acyclic graphs with Presburger formulas as nodes. A formula represents a possibly infinite inductive set of configurations, i.e., a set of configurations closed under reachability. A node S (which we identify with the set of configurations it represents) has the following property: A run starting at any configuration of S almost surely reaches some configuration of some successor S of S, and, since S is inductive, get trapped in S . A stage graph labels the node S with a witness of this property in the form of a Presburger certificate, a sort of ranking function expressible in Presburger arithmetic. The completeness of the technique, i.e., the fact that for every property of the replicated system that holds there exists a stage graph proving it, follows from deep results of the theory of vector addition systems (VASs) [52] [53] [54] .

Unfortunately, the theory of VASs also shows that, while the verification problems we consider are decidable, they have non-elementary computational complexity [33] . As a consequence, verification techniques that systematically explore the space of possible stage graphs for a given property are bound to be very inefficient. For this reason, we design an incomplete but efficient algorithm for the computation of stage graphs. Inspired by theoretical results, the algorithm combines a solver for linear constraints with some elements of the theory of wellstructured systems [2, 39] . We report on the performance of this algorithm for a large number of case studies. In particular, the algorithm automatically verifies many standard population protocols described in the literature [5, 8, 20, 22, 23, 28, 31] , as well as liveness properties of distributed algorithms for leader election and mutual exclusion [3, 40, 42, 44, 50, 59, 61, 64] .

Related Work. The parameterized verification of replicated systems was first studied in [41] , where they were modeled as counter systems. This allows one to apply many efficient techniques [11, 24, 37, 47] . Most of these works are inherently designed for safety properties, and some can also handle fair termination [38] , but none of them handles stochastic scheduling. To the best of our knowledge, the only works studying parameterized verification of liveness properties under our notion of stochastic scheduling are those on verification of population protocols. For fixed populations, protocols can be verified with standard probabilistic model checking [13, 65] , and early works follow this approach [28, 31, 60, 63] . Subsequently, an algorithm and a tool for the parameterized verification of population protocols were described in [21, 22] , and a first version of stage graphs was introduced in [23] for analyzing the expected termination time of population protocols. In this paper we overhaul the framework of [23] for liveness verification, drawing inspiration from the safety verification technology of [21, 22] . Compared to [21, 22] , our approach is not limited to a specific subclass of protocols, and captures models beyond population protocols. Furthermore, our new techniques for computing Presburger certificates subsume the procedure of [22] . In comparison to [23] , we provide the first completeness and complexity results for stage graphs. Further, our stage graphs can prove correctness of population protocols and even more general liveness properties, while those of [23] can only prove termination. We also introduce novel techniques for computing stage graphs, which compared to [23] can greatly reduce their size and allows us to prove more examples correct.

There is also a large body of work on parameterized verification via cutoff techniques: one shows that a specification holds for any number of agents iff it holds for any number of agents below some threshold called the cutoff (see [6, 26, 30, 34, 46] , and [16] for a comprehensive survey). Cut-off techniques can be applied to systems with an array or ring communication structure, but they require the existence and effectiveness of a cutoff, which is not the case in our setting. Further parameterized verification techniques are regular model checking [1, 25] and automata learning [7] . The classes of communication structures they can handle are orthogonal to ours: arrays and rings for regular model checking and automata learning, and cliques in our work. Regular model checking and learning have recently been employed to verify safety properties [29] , liveness properties under arbitrary schedulers [55] and termination under finitary fairness [51] . The classes of schedulers considered in [51, 55] are incomparable to ours: arbitrary schedulers in [55] , and finitary-fair schedulers in [51] . Further, these works are based on symbolic state-space exploration, while our techniques are based on automatic construction of invariants and ranking functions [16] .

Let N denote {0, 1, . . .} and let E be a finite set. A unordered vector over E is a mapping V : E → Z. In particular, a multiset over E is an unordered vector M : E → N where M (e) denotes the number of occurrences of e in M . The sets of all unordered vectors and multisets over E are respectively denoted Z E and N E . Vector addition, subtraction and comparison are defined componentwise. The size of a multiset M is denoted |M | = e∈E M (e). We let E k denote the set of all multisets over E of size k. We sometimes describe multisets using a set-like notation, e.g.M = f, g, g or equivalently M = f, 2 · g is such that M (f ) = 1, M (g) = 2 and M (e) = 0 for all e ∈ {f, g}.

Presburger Arithmetic. Let X be a set of variables. The set of formulas of Presburger arithmetic over X is the result of closing atomic formulas, as defined in the next sentence, under Boolean operations and first-order existential quantification. Atomic formulas are of the form

where a i and b are integers, x i are variables and ∼ is either < or ≡ m , the latter denoting the congruence modulo m for any m ≥ 2. Formulas over X are interpreted on N X . Given a formula φ of Presburger arithmetic, we let φ denote the set of all multisets satisfying φ. A set E ⊆ N X is a Presburger set if E = φ for some formula φ.

A configuration is a multiset C of states, which we interpret as a global state with C(q) agents in each state q ∈ Q.

For every t = (x, y) ∈ T with x = X 1 , X 2 , . . . , X k and y = Y 1 , Y 2 , . . . , Y k , we write

and, if so, can occur, leading to the configuration C = C + Δ(t). If t is not enabled at C, then we say that it is disabled. We use the following reachability notation: C t − → C ⇐⇒ t is enabled at C and its occurrence leads to C ,

Observe that, by definition of transitions, C − → C implies |C| = |C |, and likewise for C * − → C . Intuitively, transitions cannot create or destroy agents.

A run is an infinite sequence C 0 t 1 C 1 t 2 C 2 · · · such that C i ti+1 −−→ C i+1 for every i ≥ 0. Given L ⊆ T * and a set of configurations C, we let

Stochastic Scheduling. We assume that, given a configuration C, a probabilistic scheduler picks one of the transitions enabled at C. We only make the following two assumptions about the random experiment determining the transition: first, the probability of a transition depends only on C, and, second, every transition enabled at C has a nonzero probability of occurring. Since C * − → C implies |C| = |C |, the number of configurations reachable from any configuration C is finite. Thus, for every configuration C, the semantics of P from C is a finite-state Markov chain rooted at C. Example 1. Consider the replicated system P = (Q, T ) of arity 2 with states

Intuitively, at every moment in time, agents are either Active or Passive, and have output Yes or No, which corresponds to the four states of Q. This system is designed to satisfy the following property: for every configuration C in which all agents are initially active, i.e., C satisfies C(

, then eventually all agents stay forever in the "yes" states {A Y , P Y }, and otherwise all agents eventually stay forever in the "no" states {A N , P N }.

Let us fix a replicated system P = (Q, T ). Formulas of linear temporal logic (LTL) on P are defined by the following grammar:

where φ is a Presburger formula over Q. We look at φ as an atomic proposition over the set N Q of configurations. Formulas of LTL are interpreted over runs of P in the standard way. We abbreviate ♦ϕ ≡ true U ϕ and ϕ ≡ ¬♦¬ϕ.

Let us now introduce the probabilistic interpretation of LTL. A configuration C of P satisfies an LTL formula ϕ with probability p if Pr[C, ϕ] = p, where Pr[C, ϕ] denotes the probability of the set of runs of P starting at C that satisfy ϕ in the finite-state Markov chain rooted at C. The measurability of this set of runs for every C and ϕ follows from well-known results [65] . The qualitative model checking problem consists of, given an LTL formula ϕ and a set of configurations I, deciding whether Pr[C, ϕ] = 1 for every C ∈ I. We will often work with the complement problem, i.e., deciding whether Pr[C, ¬ϕ] > 0 for some C ∈ I.

In contrast to the action-based qualitative model checking problem of [35] , our version of the problem is undecidable due to adding atomic propositions over configurations (see the full version of the paper [19] for a proof):

It is known that qualitative model checking problems of finite-state probabilistic systems reduces to model checking of non-probabilistic systems under an adequate notion of fairness.

A run of a replicated system P is fair if for every possible step C t − → C of P the following holds: if the run contains infinitely many occurrences of C, then it also contains infinitely many occurrences of C t C .

So, intuitively, if a run can execute a step infinitely often, it eventually will. It is readily seen that a fair run of a finite-state transition system eventually gets "trapped" in one of its bottom strongly connected components, and visits each of its states infinitely often. Hence, fair runs of a finite-state Markov chain have probability one. The following proposition was proved in [35] for a model slightly less general than replicated systems; the proof can be generalized without effort:

. Let P be a replicated system, let C be a configuration of P, and let ϕ be an LTL formula. It is the case that Pr[C, ϕ] = 1 iff every fair run of P starting at C satisfies ϕ.

We implicitly use this proposition from now on. In particular, we define:

Liveness Specifications for Replicated Systems. We focus on a specific class of temporal properties for which the qualitative model checking problem is decidable and which is large enough to formalize many important specifications. Using well-known automata-theoretic technology, this class can also be used to verify all properties describable in action-based LTL, see e.g. [35] .

A stable termination property is given by a pair Π = (ϕ pre , Φ post ), where Φ post = {ϕ 1 post , . . . , ϕ k post } and ϕ pre , ϕ 1 post , . . . , ϕ k post are Presburger formulas over Q describing sets of configurations. Whenever k = 1, we sometimes simply write Π = (ϕ pre , ϕ post ). The pair Π induces the LTL property

Abusing language, we say that a replicated system P satisfies Π if ϕ pre ⊆ ϕ Π , that is, if every configuration C satisfying ϕ pre satisfies ϕ Π with probability 1.

The stable termination problem is the qualitative model checking problem for I = ϕ pre and ϕ = ϕ Π given by a stable termination property Π = (ϕ pre , Φ post ).

Let us reconsider the system from Example 1. We can formally specify that all agents will eventually agree on the majority output Yes or No. Let

. The system satisfies the property specified in Example 1 iff it satisfies Π Y and Π N . As an alternative (weaker) property, we could specify that the system always stabilizes to either output by

In the rest of the paper, we fix a replicated system P = (Q, T ) and a stable termination property Π = (ϕ pre , Φ post ), where Φ post = {ϕ 1 post , . . . , ϕ k post }, and address the problem of checking whether P satisfies Π. We start with some basic definitions on sets of configurations.

-Let C, C be sets of configurations. We say that C leads to C , denoted C C , if for all C ∈ C, every fair run from C eventually visits a configuration of C .

Note that certificates only require the existence of some executions decreasing f , not for all of them to to decrease it. Despite this, we have:

C leads to C iff there exists a certificate for C C .

The proof, which can be found in the full version [19] , depends on two properties of replicated systems with stochastic scheduling. First, every configuration has only finitely many descendants. Second, for every fair run and for every finite execution C w − → C , if C appears infinitely often in the run, then the run contains infinitely many occurrences of C w − → C . We can now introduce stage graphs:

A stage graph of P for the property Π is a directed acyclic graph whose nodes, called stages, are sets of configurations satisfying the following conditions:

1. every stage is an inductive set; 2. every configuration of ϕ pre belongs to some stage; 3. if C is a non-terminal stage with successors C 1 , . . . , C n , then there exists a certificate for C

The existence of a stage graph implies that P satisfies Π. Indeed, by conditions 2-3 and repeated application of Proposition 2, every run starting at a configuration of ϕ pre eventually reaches a terminal stage, say C, and, by condition 1, stays in C forever. Since, by condition 4, all configurations of C satisfy some ϕ i post , after its first visit to C every configuration satisfies ϕ i post .

Example 3. Figure 1 depicts stage graphs for the system of Example 1 and the properties defined in Example 2. The reader can easily show that every stage C is inductive by checking that for every C ∈ C and every transition t ∈ {t 1 , . . . , t 4 } enabled at C, the step C ti − → C satisfies C ∈ C. For example, if a configuration satisfies A Y > A N , so does any successor configuration. The following proposition shows that stage graphs are a sound and complete technique for proving stable termination properties.

Proposition 3 does not tell us anything about the decidability of the stable termination problem. To prove that the problem is decidable, we introduce Presburger stage graphs. Intuitively these are stage graphs whose stages and certificates can be expressed by formulas of Presburger arithmetic.

n ⇐⇒ ϕ(C, n) for some Presburger formula ϕ(x, y). -A Presburger stage graph is a stage graph whose stages and certificates are all Presburger.

Using a powerful result from [36] , we show that: (1) P satisfies Π iff it has a Presburger stage graph for Π (Theorem 2); (2) there exists a denumerable set of candidates for a Presburger stage graph for Π; and (3) there is an algorithm that decides whether a given candidate is a Presburger stage graph for Π (Theorem 3). Together, (1) (2) (3) show that the stable termination problem is semi-decidable. To obtain decidability, we observe that the complement of the stable termination problem is also semi-decidable. Indeed, it suffices to enumerate all initial configurations C |= ϕ pre , build for each such C the (finite) graph G C of configurations reachable from C, and check if some bottom strongly connected component B of G C satisfies B |= ϕ i post for all i. This is the case iff some fair run starting at C visits and stays in B, which in turn is the case iff P violates Π.

We observe that testing whether a given graph is a Presburger stage graph reduces to Presburger arithmetic satisfiability, which is decidable [62] and whose complexity lies between 2-NEXP and 2-EXPSPACE [15] : Theorem 3. The problem of deciding whether an acyclic graph of Presburger sets and Presburger certificates is a Presburger stage graph, for a given stable termination property, is reducible in polynomial time to the satisfiability problem for Presburger arithmetic.

At the current state of our knowledge, the decision procedure derived from Theorem 3 has little practical relevance. From a theoretical point of view, the TOWERhardness result of [33] implies that the stage graph may have non-elementary size in the system size. In practice, systems have relatively small stage graphs, but, even so, the enumeration of all candidates immediately leads to a prohibitive combinatorial explosion.

For this reason, we present a procedure to automatically construct (not guess) a Presburger stage graph G for a given replicated system P and a stable termination property Π = (ϕ pre , Φ post ). The procedure may fail, but, as shown in the experimental section, it succeeds for many systems from the literature.

The procedure is designed to be implemented on top of a solver for the existential fragment of Presburger arithmetic. While every formula of Presburger arithmetic has an equivalent formula within the existential fragment [32, 62] , quantifier-elimination may lead to a doubly-exponential blow-up in the size of the formula. Thus, it is important to emphasize that our procedure never requires to eliminate quantifiers: If the pre-and postconditions of Π are supplied as quantifier-free formulas, then all constraints of the procedure remain in the existential fragment. We give a high-level view of the procedure (see Algorithm 1), which uses several functions, described in detail in the rest of the paper. The procedure maintains a workset WS of Presburger stages, represented by existential Presburger formulas. Initially, the only stage is an inductive Presburger overapproximation PotReach( ϕ pre ) of the configurations reachable from ϕ pre (PotReach is an abbreviation for "potentially reachable"). Notice that we must necessarily use an overapproximation, since post * ( ϕ pre ) is not always expressible in Presburger arithmetic 3 . We use a refinement of the overapproximation introduced in [22, 37] , equivalent to the overapproximation of [24] .

In its main loop (lines 2-9), Algorithm 1 picks a Presburger stage S from the workset, and processes it. the unsatisfiability of the existential Presburger formula φ ∧ ¬ϕ i post , where φ is the formula characterizing S. If S is not terminal, then the procedure attempts to construct successor stages in lines 5-9, with the help of three further functions: AsDead, IndOverapprox, and Split. In the rest of this section, we present the intuition behind lines 5-9, and the specification of the three functions. Sections 5, 6 and 7 present the implementations we use for these functions.

Lines 5-9 are inspired by the behavior of most replicated systems designed by humans, and are based on the notion of dead transitions, which can never occur again (to be formally defined below). Replicated systems are usually designed to run in phases. Initially, all transitions are alive, and the end of a phase is marked by the "death" of one or more transitions, i.e., by reaching a configuration at which these transitions are dead. The system keeps "killing transitions" until no transition that is still alive can lead to a configuration violating the postcondition. The procedure mimics this pattern. It constructs stage graphs in which if S is a successor of S, then the set of transitions dead at S is a proper superset of the transitions dead at S. For this, AsDead(S) computes a set of transitions that are alive at some configuration of S, but which will become dead in every fair run starting at S (line 5). Formally, AsDead(S) returns a set U ⊆ Dead(S) such that S |= ♦dead(U ), defined as follows. Observe that we can compute Dead(S) by checking unsatisfiability of a sequence of existential Presburger formulas: as S is inductive, we have Dead(S) = {t | S |= dis(t)}, and S |= dis(t) holds iff the existential Presburger formula

The following proposition, whose proof appears in the full version [19] , shows that determining whether a given transition will eventually become dead, while decidable, is PSPACE-hard. Therefore, Sect. 7 describes two implementations of this function, and a way to combine them, which exhibit a good trade-off between precision and computation time.

If the set U returned by AsDead(S) is nonempty, then we know that every fair run starting at a configuration of S will eventually reach a configuration of S ∩ dead(U ) . So, this set, or any inductive overapproximation of it, can be a legal successor of S in the stage graph. Function IndOverapprox(S, U) returns such an inductive overapproximation (line 7). To be precise, we show in Sect. 5 that dead(U ) is a Presburger set that can be computed exactly, albeit in doubly-exponential time in the worst case. The section also shows how to compute overapproximations more efficiently. If the set U returned by AsDead(S) is empty, then we cannot yet construct any successor of S. Indeed, recall that we want to construct stage graphs in which if S is a successor of S, then Dead(S ) is a proper superset of Dead(S). In this case, we proceed differently and try to split S: Definition 7. A split of some stage S is a set {S 1 , . . . , S k } of (not necessarily disjoint) stages such that the following holds:

If there exists a split {S 1 , . . . , S k } of S, then we can let S 1 , . . . , S k be the successors of S in the stage graph. Observe that a stage may indeed have a split. We have Dead(C 1 ∪ C 2 ) = Dead(C 1 ) ∩ Dead(C 2 ), and hence Dead(C 1 ∪ C 2 ) may be a proper subset of both Dead(C 1 ) and Dead(C 2 ): Example 4. Consider the system with states {q 1 , q 2 } and transitions t i :

The canonical split of S, if it exists, is the set {S ∩ dead(t) | t / ∈ Dead(S)}. As mentioned above, Sect. 5 shows that dead(U ) can be computed exactly for every U , but the computation can be expensive. Hence, the canonical split can be computed exactly at potentially high cost. Our implementation uses an underapproximation of dead(t) , described in Sect. 6.

We show that, given a set U of transitions, -we can effectively compute an existential Presburger formula describing the set dead(U ) , with high computational cost in the worst case, and -we can effectively compute constraints that overapproximate or underapproximate dead(U ) , at a reduced computational cost.

Downward and Upward Closed Sets. We enrich N with the limit element ω in the usual way. In particular, n < ω holds for every n ∈ N. An ω-configuration is a mapping C ω : Q → N ∪ {ω}. The upward closure and downward closure of a set C ω of ω-configurations are the sets of configurations ↑ C ω and ↓ C ω , respectively defined as:

A set C of configurations is upward closed if C = ↑ C, and downward closed if C = ↓ C. These facts are well-known from the theory of well-quasi orderings:

For every set C of configurations, the following holds:

configurations sup(C), called its decomposition, such that C = ↓ sup(C).

Computing dead(U ) Exactly. It follows immediately from Definition 6 that both dis(U ) and dead(U ) are downward closed. Indeed, if all transitions of U are disabled at C, and C ≤ C, then they are also disabled at C , and clearly the same holds for transitions dead at C. Furthermore:

Proof. For every t ∈ U and q ∈ • t, let C ω t,q be the ω-configuration such that C ω t,q (q) = • t(q) − 1 and C ω t,q (p) = ω for every p ∈ Q \ {q}. In other words, C ω t,q is the ω-configuration made only of ω's except for state q which falls short from • t(q) by one. This ω-configurations captures all configurations disabled in t due to an insufficient amount of agents in state q. We have:

The latter can be made minimal by removing superfluous ω-configurations.

For the case of sup( dead(U ) ), we invoke [45, Prop. 2] which gives a proof for the more general setting of (possibly unbounded) Petri nets. Their procedure is based on the well-known backwards reachability algorithm (see, e.g., [2, 39] ).

Since sup( dead(U ) ) is finite, its computation allows to describe dead(U ) by the following linear constraint 4 :

However, the cardinality of sup( dead(U ) ) can be exponential [45, Remark for Prop. 2] in the system size. For this reason, we are interested in constructing both under-and over-approximations.

Overapproximations of dead(U ) . For every i ∈ N, define dead(U ) i as:

Loosely speaking, dead(U ) i is the set of configurations C such that every configuration reachable in at most i steps from C disables U . We immediately have:

Using Proposition 5 and the following proposition, we obtain that dead(U ) i is an effectively computable overapproximation of dead(U ) . 

A death certificate for U in P is a finite set C ω of ω-configurations such that:

1. ↓ C ω |= dis(U ), i.e., every configuration of ↓ C ω disables U , and 2. ↓ C ω is inductive, i.e., post

If U is dead at a set C of configurations, then there is always a certificate that proves it, namely sup( dead(U ) ). In particular, if C ω is a death certificate for U then ↓ C ω ⊆ dead(U ) , that is, ↓ C ω is an underapproximation of dead(U ) Using Proposition 6, it is straightforward to express in Presburger arithmetic that a finite set C ω of ω-configurations is a death certificate for U : Proposition 7. For every k ≥ 1 there is an existential Presburger formula DeathCert k (U, C ω ) that holds iff C ω is a death certificate of size k for U .

Given a stage S, we try to find a set C ω 1 , . . . , C ω of death certificates for transitions t 1 , . . . , t ∈ T \ Dead(S) such that S ⊆ ↓ C ω 1 ∪ · · · ∪ ↓ C ω . This allows us to split S into S 1 , . . . , S , where S i def = S ∩ ↓ C ω i . For any fixed size k ≥ 1 and any fixed , we can find death certificates C ω 1 , . . . , C ω of size at most k by solving a Presburger formula. However, the formula does not belong to the existential fragment, because the inclusion check S ⊆ ↓ C ω 1 ∪· · ·∪↓ C ω requires universal quantification. For this reason, we proceed iteratively. For every i ≥ 0, after having found C ω 1 , . . . , C ω i we search for a pair

). An efficient implementation requires to guide the search for (C i+1 , C ω i+1 ), because otherwise the search procedure might not even terminate, or might split S into too many parts, blowing up the size of the stage graph. Our search procedure employs the following heuristic, which works well in practice. We only consider the case k = 1, and search for a pair (C i+1 , C ω i+1 ) satisfying (i) and (ii) above, and additionally:

(iii) all components of C ω i+1 are either ω or between 0 and max t∈T,q∈Q

. Condition (iii) guarantees termination. Intuitively, condition (iv) leads to certificates valid for sets U ⊆ T \ Dead(S) as large as possible. So it allows us to avoid splits that, loosely speaking, do not make as much progress as they could. Condition (v) allows us to avoid splits with many elements because each element of the split has a small intersection with S.

An example illustrating these conditions is given in the full version [19] .

Recall that the function AsDead(S) takes an inductive Presburger set S as input, and returns a (possibly empty) set U ⊆ Dead(S) of transitions such that S |= ♦dead(U ). This guarantees S dead(U ) and, since S is inductive, also S S ∩ dead(U ) .

By Proposition 4, deciding if there exists a non-empty set U of transitions such that S |= ♦dead(U ) holds is PSPACE-hard, which makes a polynomial reduction to satisfiability of existential Presburger formulas unlikely. So we design incomplete implementations of AsDead(S) with lower complexity. Combining these implementations, the lack of completeness essentially vanishes in practice.

The implementations are inspired by Proposition 2, which shows that S dead(U ) holds iff there exists a certificate f such that:

To find such certificates efficiently, we only search for linear functions f (C) = q∈Q a(q) · C(q) with coefficients a(q) ∈ N for each q ∈ Q.

Our first procedure computes the existence of a linear ranking function.

Definition 8. A function r : S → N is a ranking function for S and U if for every C ∈ S and every step C t − → C the following holds:

1. if t ∈ U , then r(C) > r(C ); and 2. if t / ∈ U , then r(C) ≥ r(C ).

If r : S → N is a ranking function for S and U , then there exists k ∈ N such that (r, k) is a bounded certificate for S dead(U ) .

Proof. Let M be the minimal finite basis of the upward closed set dead(U ) . For every configuration D ∈ M , let σ D be a shortest sequence that enables some

− − → C for some configurations C and C . By Definition 8, we have r(C) ≥ r(C ) > r(C ), and so condition (Cert) holds. As |σ D t D | ≤ k, we have that (r, k) is a bounded certificate.

It follows immediately from Definition 8 that if r 1 and r 2 are ranking functions for sets U 1 and U 2 respectively, then r defined as r(C) def = r 1 (C) + r 2 (C) is a ranking function for U 1 ∪ U 2 . Therefore, there exists a unique maximal set of transitions U such that S dead(U ) can be proved by means of a ranking function. Further, U can be computed by collecting all transitions t ∈ Dead(S) such that there exists a ranking function r t for {t}. The existence of a linear ranking function r t can be decided in polynomial time via linear programming, as follows. Recall that for every step C u − → C , we have C = C + Δ(u). So, by linearity, we have

Thus, the constraints of Definition 8 can be specified as:

where a : Q → Q ≥0 gives the coefficients of r t , that is, r t (C) = a · C, and a · x def = q∈Q a(q) · x(q) for x ∈ N Q . Observe that a solution may yield a function whose codomain differs from N. However, this is not an issue since we can scale it with the least common denominator of each a(q).

Transitions layers were introduced in [22] as a technique to find transitions that will eventually become dead. Intuitively, a set U of transitions is a layer if (1) no run can contain only transitions of U , and (2) U becomes dead once disabled; the first condition guarantees that U eventually becomes disabled, and the second that it eventually becomes dead. We formalize layers in terms of layer functions. Condition C2 is expressible in Presburger arithmetic because of Proposition 5. However, instead of computing dead(U ) explicitly, there is a more efficient way to express this constraint. Intuitively, dis(U ) = dead(U ) is the case if enabling a transition u ∈ U requires to have previously enabled some transition u ∈ U . This observation leads to:

This allows us to give the constraint lin-layer (U, a) , which is of polynomial size:

lin-layer(U, a) def = lin-layer-fun(U, a) ∧ dis-eq-dead(U ).

The ranking and layer functions of Sects. 7.1 and 7.2 are incomparable in power, that is, there are sets of transitions for which a ranking function but no layer function exists, and vice versa. This is shown by the following two systems: showing that U eventually becomes dead. As C t4t5 −−→ C for C = A, B , there is no ranking function r for this U , which would need to satisfy r(C) < r(C).

For our implementation of AsDead(S), we therefore combine both approaches. We first compute (in polynomial time) the unique maximal set U for which there is a linear ranking function. If this U is non-empty, we return it, and otherwise compute a set U of maximal size for which there is a linear layer function.

We implemented the procedure of Sect. 4 on top of the SMT solver Z3 [57] , and use the Owl [48] and HOA [12] libraries for translating LTL formulas. The resulting tool automatically constructs stage graphs that verify stable termination properties for replicated systems. We evaluated it on two sets of benchmarks, described below. The first set contains population protocols, and the second leader election and mutual exclusion algorithms. All tests where performed on a machine with an Intel Xeon CPU E5-2630 v4 @ 2.20 GHz and 8GB of RAM. The results are depicted in Fig. 2 and can be reproduced by the certified artifact [18] . For parametric families of replicated systems, we always report the largest instance that we were able to verify with a timeout of one hour. For IndOverapprox, from the approaches in Sect. 5, we use IndOverapprox 0 in the examples marked with * and IndOverapprox ∞ otherwise. Almost all constructed stage graphs are a chain with at most 3 stages. The only exceptions are the stage graphs for the approximate majority protocols that contained a binary split and 5 stages. The size of the Presburger formulas increases with increasing size of the replicated system. In the worst case, this growth can be exponential. However, the growth is linear in all examples marked with *. Population Protocols. Population protocols [8, 9] are replicated systems that compute Presburger predicates following the computation-as-consensus paradigm [10] . Depending on whether the initial configuration of agents satisfies the predicate or not, the agents of a correct protocol eventually agree on the output "yes" or "no", almost surely. Example 1 can be interpreted as a population protocol for the majority predicate A Y > A N , and the two stable termination properties that verify its correctness are described in Example 2. To show that a population protocol correctly computes a given predicate, we thus construct two Presburger stage graphs for the two corresponding stable termination properties. In all these examples, correctness is proved for an infinite set of initial configurations. Our set of benchmarks contains a broadcast protocol [31] , three majority protocols (Example 1, [23, Ex. 3], [5] ), and multiple instances of parameterized families of protocols, where each protocol computes a different instance of a parameterized family of predicates 5 . These include various flock-of-birds protocol families ( [28] , [20, Sect. 3] , [31, threshold-n] ) for the family of predicates x ≥ c for some constant c ≥ 0; two families for threshold predicates of the form a · x ≥ c [8, 20] ; and one family for remainder protocols of the form a · x ≡ m c [22] . Further, we check approximate majority protocols ( [27, 56] , [51, coin game]). As these protocols only compute the predicate with large probability but not almost surely, we only verify that they always converge to a stable consensus.

Comparison with [22] . The approach of [22] can only be applied to so-called strongly-silent protocols. However, this class does not contain many fast and succinct protocols recently developed for different tasks [4, 17, 20] .

We are able to verify all six protocols reported in [22] . Further, we are also able to verify the fast Majority [5] protocol as well as the succinct protocols Flock-of-birds [20, Sect. 3] and Threshold [20] . All three protocols are not strongly-silent. Although our approach is more general and complete, the time to verify many strongly-silent protocol does not differ significantly between the two approaches. Exceptions are the Flock-of-birds [28] protocols where we are faster ( [22] reaches the timeout at c = 55) as well as the Remainder and the Flock-ofbirds-threshold-n protocols where we are substantially slower ( [22] reaches the timeout at m = 80 and c = 350, respectively). Loosely speaking, the approach of [22] can be faster because they compute inductive overapproximations using an iterative procedure instead of PotReach. In some instances already a very weak overapproximation, much less precise than PotReach, suffices to verify the result. Our procedure can be adapted to accommodate this (it essentially amounts to first running the procedure of [22] , and if it is inconclusive then run ours).

Other Distributed Algorithms. We have also used our approach to verify arbitrary LTL liveness properties of non-parameterized systems with arbitrary communication structure. For this we apply standard automata-theoretic techniques and construct a product of the system and a limit-deterministic Büchi automaton for the negation of the property. Checking that no fair runs of the product are accepted by the automaton reduces to checking a stable termination property.

Since we only check correctness of one single finite-state system, we can also apply a probabilistic model checker based on state-space exploration. However, our technique delivers a stage graph, which plays two roles. First, it gives an explanation of why the property holds in terms of invariants and ranking functions, and second, it is a certificate of correctness that can be efficiently checked by independent means.

We verify liveness properties for several leader election and mutex algorithms from the literature [3, 40, 42, 44, 50, 59, 61, 64] under the assumption of a probabilistic scheduler. For the leader election algorithms, we check that a leader is eventually chosen; for the mutex algorithms, we check that the first process enters its critical section infinitely often.

Comparison with PRISM [49] . We compared execution times for verification by our technique and by PRISM on the same models. While PRISM only needs a few seconds to verify instances of the mutex algorithms [3, 40, 50, 59, 61, 64] where we reach the time limit, it reaches the memory limit for the two leader election algorithms [42, 44] already for 70 and 71 processes, which we can still verify.

We have presented stage graphs, a sound and complete technique for the verification of stable termination properties of replicated systems, an important class of parameterized systems. Using deep results of the theory of Petri nets, we have shown that Presburger stage graphs, a class of stage graphs whose correctness can be reduced to the satisfiability problem of Presburger arithmetic, are also sound and complete. This provides a decision procedure for the verification of termination properties, which is of theoretical nature since it involves a blind enumeration of candidates for Presburger stage graphs. For this reason, we have presented a technique for the algorithmic construction of Presburger stage graphs, designed to exploit the strengths of SMT-solvers for existential Presburger formulas, i.e., integer linear constraints. Loosely speaking, the technique searches for linear functions certifying the progress between stages, even though only the much larger class of Presburger functions guarantees completeness.

We have conducted extensive experiments on a large set of benchmarks. In particular, our approach is able to prove correctness of nearly all the standard protocols described in the literature, including several protocols that could not be proved by the technique of [22] , which only worked for so-called stronglysilent protocols. We have also successfully applied the technique to some selfstabilization algorithms, leader election and mutual exclusion algorithms.

Our technique is based on the mechanized search for invariants and ranking functions. It avoids the use of state-space exploration as much as possible. For this reason, it also makes sense as a technique for the verification of liveness properties of non-parameterized systems with a finite but very large state space.

Regular model checking

General decidability theorems for infinite-state systems

Regular model checking without transducers (on efficient verification of parameterized systems)

Recent algorithmic advances in population protocols

Fast and exact majority in population protocols

Liveness of parameterized timed networks

Learning regular sets from queries and counterexamples

Computation in networks of passively mobile finite-state sensors

Computation in networks of passively mobile finite-state sensors

The computational power of population protocols

Unbounded-thread program verification using thread-state equations

The Hanoi omega-automata format

Principles of Model Checking

Symbolic counter abstraction for concurrent software

The complexitiy of logical theories

Decidability of Parameterized Verification. Synthesis Lectures on Distributed Computing Theory

Succinct population protocols for presburger arithmetic

Artifact evaluation VM and instructions to generate experimental results for the CAV20 paper: checking Qualitative Liveness Properties of Replicated Systems with Stochastic Scheduling

Checking qualitative liveness properties of replicated systems with stochastic scheduling

Large flocks of small birds: on the minimal size of population protocols

Peregrine: a tool for the analysis of population protocols

Towards efficient verification of population protocols

Automatic analysis of expected termination time for population protocols

The logical view on continuous petri nets

Regular model checking

Reasoning about networks with many identical finite state processes

The cell cycle switch computes approximate majority

Algorithmic verification of population protocols

Learning to prove safety over parameterised concurrent systems

Verification by network decomposition

Guidelines for the verification of population protocols

Theorem proving in arithmetic without multiplication

The reachability problem for petri nets is not elementary

On reasoning about rings

Model checking population protocols

Verification of population protocols

An SMTbased approach to coverability analysis

An SMT-based approach to fair termination analysis

Well-structured transition systems everywhere!

Reachability sets of parameterized rings as regular languages

Reasoning about systems with many processes

Probabilistic self-stabilization

On the reachability problem for 5-dimensional vector addition systems

Token management schemes and random walks yield selfstabilizing mutual exclusion

Structural liveness of petri nets is expspace-hard and decidable

Dynamic cutoff detection in parameterized concurrent programs

A widening approach to multithreaded program verification

Owl: a library for ω-words, automata, and LTL

PRISM 4.0: verification of probabilistic real-time systems

On the advantages of free choice: a symmetric and fully distributed solution to the dining philosophers problem

Fair termination for parameterized probabilistic concurrent systems

Vector addition systems reachability problem (a simpler solution)

Presburger vector addition systems

Vector addition system reversible reachability problem

Liveness of randomised parameterised systems under arbitrary schedulers

Random processes in genetics

Z3: an efficient SMT solver

Distributed information processing in biological and computational systems

Regular model checking

On automatic verification of self-stabilizing population protocols

Myths about the mutual exclusion problem

Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. Comptes Rendus du I er Congrès des mathématiciens des pays slaves

PAT: towards flexible verification under fairness

A simple solution to Lamport's concurrent programming problem with linear wait

Automatic verification of probabilistic concurrent finite-state programs

Open Access This chapter is licensed under the terms of the Creative Commons