title: Simple Stochastic Games with Almost-Sure Energy-Parity Objectives are in NP and coNP
authors: Mayr, Richard; Schewe, Sven; Totzke, Patrick; Wojtczak, Dominik
date: 2021-03-23
journal: Foundations of Software Science and Computation Structures
DOI: 10.1007/978-3-030-71995-1_22

We study stochastic games with energy-parity objectives, which combine quantitative rewards with a qualitative ω-regular condition: the maximizer aims to avoid running out of energy while simultaneously satisfying a parity condition. We show that the corresponding almost-sure problem, i.e., checking whether there exists a maximizer strategy that achieves the energy-parity objective with probability 1 when starting at a given energy level k, is decidable and in NP ∩ coNP. The same holds for checking if such a k exists and if a given k is minimal.

Simple stochastic games (SSGs), also called competitive Markov decision processes [30] or 2½-player games [23, 22], are turn-based games of perfect information played on finite graphs. Each state is either random or belongs to one of the players (maximizer or minimizer). A game is played by successively moving a pebble along the game graph, where the next state is chosen by the player who owns the current one or, in the case of random states, according to a predefined distribution. This way, an infinite run is produced. The maximizer tries to achieve an objective (in our case almost surely), while the minimizer tries to prevent this. The maximizer can be seen as a controller trying to ensure an objective in the face of both known random failure modes (encoded by the random states) and an unknown or hostile environment (encoded by the minimizer).

Stochastic games were first introduced in Shapley's seminal work [46] in 1953 and have since played a central role in the solution of many problems in computer science, including synthesis of reactive systems [45, 42]; checking interface compatibility [27]; well-formedness of specifications [28]; verification of open systems [4]; and many others.

A huge variety of objectives for such games has already been studied in the literature. We will mainly focus on three of them in this paper: parity, mean-payoff, and energy objectives. In order to define them, we assume that numeric rewards are assigned to transitions and priorities (encoded by bounded non-negative integers) are assigned to states. The parity objective simply asks that the minimal priority that appears infinitely often in a run is even. Such a condition is a canonical way to define desired behaviors of systems, such as safety, liveness, fairness, etc.; it subsumes all ω-regular objectives. The algorithmic problem of deciding the winner in non-stochastic parity games is polynomial-time equivalent to the model checking of the modal µ-calculus [49] and is at the center of the algorithmic solutions to Church's synthesis problem [44]. But the impact of parity games goes well beyond automata theory and logic: they facilitated the solution of two long-standing open problems in stochastic planning [29] and in linear programming [32], which was done by careful adaptation of the parity game examples on which the strategy improvement algorithm [31] requires exponentially many iterations.
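To make the parity condition concrete: on an ultimately periodic run, the priorities that appear infinitely often are exactly those on the repeated cycle. The following minimal check (ours, not from the paper) illustrates this special case:

```python
def satisfies_parity(prefix, cycle):
    """The ultimately periodic run prefix.cycle^omega satisfies the parity
    condition iff the minimal priority on the cycle is even (priorities on
    the finite prefix occur only finitely often and are irrelevant)."""
    return min(cycle) % 2 == 0

print(satisfies_parity(prefix=[3, 1], cycle=[2, 4]))  # True: min priority seen infinitely often is 2
print(satisfies_parity(prefix=[0], cycle=[1, 2]))     # False: it is 1
```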
The parity objective can be seen as a special case of the mean-payoff objective, which asks for the limit average reward per transition along the run to be non-negative. Mean-payoff objectives are among the first objectives studied for stochastic games and go back to a 1957 paper by Gillette [33]. They allow for reasoning about the efficiency of a system, e.g., how fast it operates once optimally controlled. The energy objective [14] can be seen as a refinement of the mean-payoff objective. It asks for the accumulated reward at any point of a run not to be lower than some finite threshold. As the name suggests, it is useful when reasoning about systems with a finite initial energy level that should never become depleted. Note that the accumulated reward is not bounded a priori, which essentially turns a finite-state game into an infinite-state one.

In this paper we consider SSGs with energy-parity objectives, which require runs to satisfy both an energy and a parity objective. It is natural to consider such an objective for systems that should not only be correct, but also energy efficient. For instance, consider a robot maintaining a nuclear power plant. We not only require the robot to correctly react to all possible chains of events (parity objective for functional correctness), but also never to run out of energy, as charging it manually would be risky (energy objective).

While the complexity of games with single objectives is often in NP ∩ coNP, asking for multiple objectives often makes solving games harder. Parity games are commonly viewed as the simplest of these objectives, and some traditional solutions for non-stochastic parity games go through simple reductions, via mean-payoff or energy conditions (which are quite similar in non-stochastic games), to discounted-payoff games; this establishes membership of those problems in UP ∩ coUP [35]. However, asking for two parity objectives to be satisfied at the same time already leads to coNP-completeness [21].

We study the almost-sure satisfaction of the energy-parity objective, i.e., satisfaction with probability 1. Such qualitative analysis is important as there are many applications where we need to know whether the correct behavior arises almost-surely, e.g., in the analysis of randomized distributed algorithms (see, e.g., [43, 47]) and safety-critical examples like the one above. Moreover, algorithms for quantitative analysis, i.e., computing the optimal probability of satisfaction, typically start by performing the qualitative analysis first and then solving a game with a simpler objective (see, e.g., [23, 15]). Finally, there are stochastic models for which qualitative analysis is decidable but quantitative analysis is not (e.g., probabilistic finite automata [6]). This may also be the case for our model.

Our contributions. We consider stochastic games with energy-parity winning conditions and show that deciding whether maximizer can win almost-surely for a given initial energy level k is in NP ∩ coNP. We show the same for checking if such a k exists at all and for checking if a given k is the smallest possible for which this holds. The proofs are considerably harder than the corresponding result for MDPs [40] (on which they are partly based), because the attainable mean-payoff value is no longer a valid criterion in the analysis (via combinations of sub-objectives).
E.g., even though the stored energy might be inexorably drifting towards +∞ (resp. −∞), the mean-payoff value might still be zero, because the minimizer (resp. maximizer) can delay payoffs for longer and longer (though not indefinitely, due to the parity condition). Moreover, the minimizer might be able to choose between different ways of losing and never commit to any particular way after any finite prefix of the play (see Example 1). Our proof characterizes almost-sure energy-parity via a recursive combination of complex sub-objectives called Gain and Bailout, each of which can eventually be solved in NP ∩ coNP.

Our proof of the coNP membership is based on a result on the strategy complexity of a natural class of objectives, which is of independent interest. We show (cf. Theorem 6; based on previous work in [34]) that if an objective O is such that its complement is both shift-invariant and submixing, and every finite MDP admits optimal finite-memory deterministic maximizer strategies for O, then the same is true in turn-based stochastic games.

Example 1. Fig. 1 shows an energy-parity game that the maximizer can win almost surely when starting with an energy level of ≥ 2 from the middle left node.

[Fig. 1: An SSG with two maximizer states, one minimizer state and one probabilistic state. Each state is annotated with its priority. Each edge is annotated with a reward by which the energy level is increased after traversing it (respectively, decreased if the reward is negative). The maximizer wins if the lowest priority visited infinitely often is even and the energy level never drops below 0.]

Whenever the game is at that node with an energy level ≥ 3, the maximizer can turn left and has a chance of at least 1/2 that the energy level will never drop to 2 while winning the game with priority 2. This is because we can view this process as a random walk on a half-line. If x_n is the probability of reaching energy level 2 when starting at energy n, then these probabilities are the least point-wise positive solution to the following system of linear equations: x_2 = 1 and x_n = (2/3)·x_{n+1} + (1/3)·x_{n−1} for all n ≥ 3. We then get x_n = (1/2)^{n−2}, so the probability of not reaching energy level 2 is ≥ 1/2 for all n ≥ 3.
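A quick sanity check of these probabilities (ours, not from the paper): simulate the biased walk that moves up with probability 2/3 and down with probability 1/3, and estimate how often it ever falls back to level 2. Since the drift is upward, walks that survive the step cap below almost surely never return.

```python
import random

def hit_prob(start, target=2, p_up=2/3, trials=10_000, cap=400):
    """Estimate the probability that the walk started at `start` ever
    reaches `target` (+1 w.p. p_up, -1 otherwise). The step cap is a
    practical cut-off and slightly underestimates the true probability."""
    hits = 0
    for _ in range(trials):
        n = start
        for _ in range(cap):
            if n == target:
                hits += 1
                break
            n += 1 if random.random() < p_up else -1
    return hits / trials

for n in range(3, 7):  # estimates match the closed form x_n = (1/2)**(n-2)
    print(n, round(hit_prob(n), 3), (1/2) ** (n - 2))
```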
Always turning left guarantees that, almost surely, the parity condition holds and the limit inferior of the energy level is not −∞. We call this condition Gain. Strategies for Gain can be used when the energy level is sufficiently high (at least 3 in our example) to win with a positive probability. However, if maximizer plays for Gain and always moves left, then for every initial energy level the chance of eventually dropping the energy down to level 2 is positive, due to the negative cycle. When that happens, the only other option for the maximizer is to move right. There minimizer can 'choose how to lose', via a disjunction of two conditions that we later formalize as Bailout. Either minimizer goes back to the start state without changing the energy level (thus maximizer wins, as the energy stays at level 2 and only the good priority 2 is seen), or minimizer turns right. In the latter case, the play visits a dominating odd priority (which is bad for maximizer) but also increases the energy by 1, which allows maximizer to switch back to playing left for the Gain condition until energy level 2 is reached again. Our maximizer strategies are a complex interplay between Bailout and Gain.

In the example, it is easy to see that the probability of seeing priority 1 infinitely often is zero if maximizer follows the strategy just described (the probability of needing to go right more than n times is at most (1/2)^n), so maximizer wins this energy-parity game almost surely. Note that maximizer does not win almost surely when the initial energy level is 0 or 1.

Previous work on combined objectives. Non-stochastic energy-parity games have been studied in [16]. They can be solved in NP ∩ coNP, and maximizer strategies require only finite (but exponential) memory, a property that also made it possible to show polynomial-time inter-reducibility with mean-payoff parity games. More recently they were also shown to be solvable in pseudo-quasi-polynomial time [26]. Related results on non-stochastic games (e.g., mean-payoff parity) are summarized in [18].

Most existing work on combined objectives for stochastic systems, for example [17, 18, 9, 40], is restricted to Markov decision processes (MDPs; aka 1½-player games). Almost-sure energy-parity objectives for MDPs were first considered in [17, 18], where a direct reduction to ordinary energy games was proposed. This reduction relies on the assumption that maximizer can win using finite memory if he can win at all. Unfortunately, this assumption does not necessarily hold: it was shown in [40] that an almost-surely winning strategy for energy-parity in finite MDPs may require infinite memory. Nevertheless, it was possible to recover the original result, that deciding the existence of a.s. winning strategies is in NP ∩ coNP (and pseudo-polynomial time), by showing that the existence of an a.s. winning strategy can be witnessed by the existence of two compatible, finite-memory, winning strategies for two simpler objectives. We generalize this approach from MDPs to full stochastic games.

Stochastic mean-payoff parity games were studied in [20], where it was shown that they can be solved in NP ∩ coNP. However, this does not imply a solution for stochastic energy-parity games, since, unlike in the non-stochastic case [16], there is no known reduction from energy-parity to mean-payoff parity in stochastic games. (The reduction in [16] relies on the fact that maximizer has a winning finite-memory strategy for energy-parity, which does not generally hold for stochastic games or MDPs; see above.) A related model are the one-counter MDPs (and stochastic games) studied in [12, 11, 8], since the value of the counter can be interpreted as the stored energy. These papers consider the objective of reaching counter value zero (which is dual to the energy objective of staying above zero), so the roles of minimizer and maximizer are swapped. However, unlike this paper, these works do not combine termination objectives with extra parity conditions.

Structure of the paper. The rest of the paper is organized as follows. We start by introducing the notation and formal definitions of games and objectives in the next section. In Section 3 we show how checking almost-sure energy-parity objectives can be characterized in terms of two newly defined auxiliary objectives: Gain and Bailout. In Sections 4 and 5 we show that almost-sure Bailout and Gain objectives, respectively, can be checked in NP and coNP. Section 6 contains our main result: NP and coNP algorithms for checking almost-sure energy-parity games with a known and unknown initial energy, as well as checking if a given initial energy is the minimal one. We conclude and point out some open problems in Section 7.
Due to page restrictions, most proofs in the main body of the paper were replaced by sketches. The detailed proofs can be found in the full version of this paper [41].

A probability distribution over a set X is a function f : X → [0, 1] such that Σ_{x∈X} f(x) = 1. We write D(X) for the set of distributions over X.

A simple stochastic game (SSG) is a finite game graph G = (V, E, λ), where all states have an outgoing edge and the set of states is partitioned into states owned by maximizer (V_Max), states owned by minimizer (V_Min), and probabilistic states (V_Rand). The set of edges is E ⊆ V × V, and λ assigns to each probabilistic state a probability distribution over its outgoing edges. W.l.o.g., we assume that each probabilistic state has at most two successors, because one can introduce a new probabilistic state for each excess successor. We extend λ to paths by letting λ(ws) def= λ(s) for every path ws ending in a probabilistic state s.

A path is a finite or infinite sequence ρ def= s_0 e_0 s_1 e_1 ... such that e_i = (s_i, s_{i+1}) ∈ E holds for all indices i. A run is an infinite path, and we write Runs def= (V E)^ω for the set of all runs. A strategy for maximizer is a function σ : (V E)* V_Max → D(E) that assigns to each path ws ∈ (V E)* V_Max a probability distribution over the outgoing edges of its target node s. That is, σ(ws)(e) > 0 implies e = (s, t) ∈ E for some t ∈ V. Of particular interest to us will be the class of memoryless deterministic (MD) strategies, which deterministically pick one outgoing edge based only on the current state, and the class of finite-memory deterministic (FD) strategies. Strategies for minimizer are defined analogously and will usually be denoted by τ.

A maximizing (minimizing) Markov decision process (MDP) is a game in which minimizer (maximizer) has no choices, i.e., all her states have exactly one successor. We will write G[τ] for the MDP resulting from fixing the strategy τ. A Markov chain is a game where neither player has a choice. In particular, G[σ, τ] is the Markov chain obtained by setting, in the game G, the strategies for maximizer and minimizer to σ and τ, respectively.

Given an initial state s ∈ V and strategies σ and τ for maximizer and minimizer, respectively, the set of runs starting in s naturally extends to a probability space as follows. We write Runs^G_w for the w-cylinder, i.e., the set of all runs with prefix w ∈ (V E)* V. We let F_G be the σ-algebra generated by all these cylinders. We inductively define a probability function P^{G,σ,τ}_s on all cylinders, which then uniquely extends to F_G by Carathéodory's extension theorem [5], by setting P^{G,σ,τ}_s(Runs^G_s) def= 1 and P^{G,σ,τ}_s(Runs^G_{wtet'}) def= P^{G,σ,τ}_s(Runs^G_{wt}) · ι(wt)(e), where ι is σ if t ∈ V_Max, τ if t ∈ V_Min, and λ if t ∈ V_Rand.

Objective functions. A (Borel) objective is a set Obj ∈ F_G of runs. We write ¬Obj def= Runs \ Obj for its complement. Borel objectives Obj are weakly determined [39, 38], which means that sup_σ inf_τ P^{G,σ,τ}_s(Obj) = inf_τ sup_σ P^{G,σ,τ}_s(Obj). This quantity is called the value of Obj in state s and is written as Val^G_s(Obj). We say that Obj holds almost-surely (abbreviated as a.s.) at state s iff there exists σ such that ∀τ. P^{G,σ,τ}_s(Obj) = 1. Let AS^G(Obj) denote the set of states at which Obj holds almost surely. We will drop the superscript G and simply write Runs, P^{σ,τ}_s and AS(Obj) if the game is clear from the context.

We use the syntax and semantics of the operators F (eventually) and G (always) from the temporal logic LTL [25] to specify some conditions on runs. A reachability condition is defined by a set of target states T ⊆ V. A run ρ = s_0 e_0 s_1 ... satisfies the reachability condition iff there exists an i ∈ N s.t. s_i ∈ T. We write FT ⊆ Runs for the set of runs that satisfy this reachability condition. Given a set of states W ⊆ V, we lift this to a safety condition on runs and write GW ⊆ Runs for the set of runs ρ = s_0 e_0 s_1 ... where s_i ∈ W for all i.
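A minimal executable sketch of these definitions (our own toy encoding, not the paper's): a game with the three kinds of states, and a sampler for finite path prefixes under fixed MD strategies.

```python
import random

owner = {'s': 'max', 't': 'min', 'u': 'rand'}   # hypothetical 3-state game
edges = {'s': ['t', 'u'], 't': ['s', 'u'], 'u': ['s', 't']}
lam   = {'u': {'s': 0.5, 't': 0.5}}             # lambda: distribution over successors
sigma = {'s': 't'}                              # MD maximizer strategy: state -> successor
tau   = {'t': 'u'}                              # MD minimizer strategy

def sample_prefix(start, steps):
    """Sample a finite path prefix with `steps` edges: player-owned states
    follow the MD strategies, random states are resolved by lambda."""
    path = [start]
    for _ in range(steps):
        v = path[-1]
        if owner[v] == 'max':
            nxt = sigma[v]
        elif owner[v] == 'min':
            nxt = tau[v]
        else:
            succs, probs = zip(*lam[v].items())
            nxt = random.choices(succs, probs)[0]
        path.append(nxt)
    return path

print(sample_prefix('s', 8))  # e.g. ['s', 't', 'u', 's', ...]
```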
A parity condition is given by a bounded function parity : V → N that assigns a priority (a non-negative integer) to each state. A run ρ ∈ Runs satisfies the parity condition iff the minimal priority that appears infinitely often on the run is even. The parity objective is the subset PAR ⊆ Runs of runs that satisfy the parity condition.

Energy conditions are given by a function r : E → Z that assigns a reward value to each edge. For a given initial energy value k ∈ N, a run s_0 e_0 s_1 e_1 ... satisfies the k-energy condition if, for every n ∈ N, the energy level k + Σ_{i=0}^{n} r(e_i) is greater than or equal to 0. Let EN(k) ⊆ Runs denote the k-energy objective, consisting of those runs that satisfy the k-energy condition. The l-storage condition holds for a run s_0 e_0 s_1 e_1 ... if l + Σ_{i=m}^{n−1} r(s_i, s_{i+1}) ≥ 0 holds for every infix s_m e_m s_{m+1} ... s_n. Let ST(k, l) ⊆ Runs denote the k-energy l-storage objective, consisting of those runs that satisfy both the k-energy and the l-storage condition. We write ST(k) for ⋃_l ST(k, l). Clearly, ST(k) ⊆ EN(k).

Mean-payoff and limit-payoff conditions are defined w.r.t. the same reward function as the energy conditions. The mean-payoff value of a run ρ = s_0 e_0 s_1 e_1 ... is lim inf_{n→∞} (1/n) Σ_{i=0}^{n−1} r(e_i). The limit-payoff objectives instead constrain the accumulated reward: LimInf(> −∞) denotes the set of runs with lim inf_{n→∞} Σ_{i=0}^{n−1} r(e_i) > −∞, and LimInf(= −∞) and LimSup(= ∞) are defined analogously.
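The k-energy and l-storage conditions above are easy to check operationally on finite path prefixes; a small sketch (ours, not from the paper), given the list of edge rewards:

```python
from itertools import accumulate

def energy_ok(rewards, k):
    """k-energy condition on a finite prefix: k plus every partial sum
    of edge rewards stays non-negative."""
    return all(k + level >= 0 for level in accumulate(rewards))

def storage_ok(rewards, l):
    """l-storage condition: l + sum(infix) >= 0 for every infix, i.e.
    no infix may decrease the accumulated reward by more than l."""
    worst = 0
    for r in rewards:
        worst = min(worst + r, r)   # most negative infix sum ending here
        if l + worst < 0:
            return False
    return True

print(energy_ok([1, -2, 1], k=1), storage_ok([1, -2, 1], l=2))  # True True
print(storage_ok([1, -2, 1], l=1))                              # False: the infix [-2] needs l >= 2
```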
The combined energy-parity objective EN(k) ∩ PAR is Borel and therefore weakly determined, meaning that it has a well-defined (inf sup = sup inf) value for every game [39, 38]. Moreover, the almost-sure energy-parity objective (asking to win with probability 1) is even strongly determined [37]: either maximizer has a strategy to enforce the condition with probability 1, or minimizer has a strategy to prevent this.

The main theorem of this section (Theorem 5) characterizes almost-sure energy-parity objectives in terms of two intermediate objectives called Gain and k-Bailout, for parameters k ≥ 0. This will form the basis of all computability results: we will show (as Theorems 14, 17 and 18) how to compute almost-sure sets for these intermediate objectives.

Definition 2. Consider a finite SSG G = (V, E, λ), as well as reward and parity functions defining the objectives PAR, LimInf(> −∞), LimSup(= ∞), as well as ST(k, l) and EN(k) for every k, l ∈ N. We define the combined objectives

Gain def= LimInf(> −∞) ∩ PAR and
k-Bailout def= (⋃_l ST(k, l) ∩ PAR) ∪ (EN(k) ∩ LimSup(= ∞)).

The main idea behind these two objectives is a special witness property for energy-parity. We argue that if maximizer has an almost-sure winning strategy for energy-parity, then he also has one that combines two almost-sure winning strategies, one for Gain and one for k-Bailout. Notice that playing an almost-sure winning strategy for Gain implies a uniformly lower-bounded, strictly positive chance that the energy level never drops below zero (assuming it is sufficiently high to begin with). This fact uses the finiteness of the set of control states and does not hold for infinite-state MDPs. In the unlikely event that the energy level does get close to zero, maximizer switches to playing an almost-sure winning strategy for k-Bailout. This is a disjunction of two scenarios, and the balance might be influenced by minimizer's choices. In the first scenario (ST(k, l) ∩ PAR), the energy never drops much and stays above zero (thus satisfying energy-parity). In the second scenario (EN(k) ∩ LimSup(= ∞)), the parity objective is temporarily suspended in favor of boosting the energy (while always staying above zero) to a sufficiently high level to switch back to the strategy for Gain and thus try again from the beginning. The probability of infinitely often switching between these modes is zero, due to the lower-bounded chance of success in the Gain phase. Therefore, maximizer eventually wins by playing for Gain. Note that maximizer needs to remember the current energy level in order to know when to switch; consequently, this strategy uses infinite memory.

Example 3. Consider again the game in Fig. 1. The middle left state satisfies both the Gain and the k-Bailout objective almost-surely for all k ≥ 2. The respective winning strategies are to always go left for Gain, or to always go right for k-Bailout, when at that state. Note that it satisfies neither the 0-Bailout nor the 1-Bailout objective almost-surely.

Definition 4. We define the subset W ⊆ V of states from which maximizer can almost surely win both Gain and k-Bailout for some k (assuming sufficiently high initial energy), while at the same time ensuring that the play remains within this set of states: W is the largest set of states satisfying W ⊆ AS(Gain ∩ GW) ∩ ⋃_k AS(k-Bailout ∩ GW).

These are the states from which maximizer can win by freely combining individual strategies for the Gain and Bailout objectives. This condition describes a fixed point, and it is easy to see that if two sets W_1 and W_2 satisfy it, then so does W_1 ∪ W_2. Thus, the maximal such set W is well-defined.

Our main characterization of almost-sure energy-parity objectives is the following Theorem 5. It states that maximizer can almost surely win an EN(k) ∩ PAR objective if, and only if, he can win the easier k-Bailout objective while always staying in the safe set W.

Theorem 5. For every k ∈ N, AS(EN(k) ∩ PAR) = AS(k-Bailout ∩ GW).

Our proof of this characterization theorem relies on the following claim, which allows us to lift the existence of finite-memory deterministic optimal strategies from MDPs to SSGs. It applies to a fairly general class of objectives and, we believe, is of independent interest. Recall that ¬Obj def= Runs \ Obj denotes the complement of objective Obj. For runs a, b, c ∈ Runs we say that a is a shuffle of b and c if there exist factorizations b = b_0 b_1 ... and c = c_0 c_1 ... such that a = b_0 c_0 b_1 c_1 .... An objective Obj is called submixing if, for every run a ∈ Obj that is a shuffle of runs b and c, either b ∈ Obj or c ∈ Obj. Obj is shift-invariant if, for every run s_1 e_1 s_2 e_2 ..., it holds that s_1 e_1 s_2 e_2 ... ∈ Obj ⟺ s_2 e_2 ... ∈ Obj. Shift-invariance slightly generalizes the better-known tail condition (see [34] for a discussion).

Theorem 6. Let O be an objective such that ¬O is both shift-invariant and submixing. If maximizer has optimal FD strategies (from any state s) for O for every finite MDP, then maximizer has optimal FD strategies (from any state s) for O for every finite SSG.

This applies in particular to the Gain objective, but not to k-Bailout objectives, as these are not shift-invariant. A proof of Theorem 6 can be found in [41]. It uses a recursive argument based on the notion of reset strategies from [34].

The remainder of this section is dedicated to proving Theorem 5. We will first collect the remaining technical claims about Gain, Bailout, and reachability objectives. Most notably, as Lemma 8, we show that if maximizer can almost surely win Gain in an SSG, then he can do so using an FD strategy which moreover satisfies an energy-parity objective with strictly positive (and lower-bounded) probability. This is shown in part based on Theorem 6 applied to the Gain objective. We will also need the following fact about reachability objectives in finite MDPs.

Lemma 7. Let M be a finite MDP with reward function r, and let T def= {s | Val_s(LimInf(= −∞)) = 1}. There exist a horizon h ∈ N and a constant c < 1 such that, for every state s, every strategy τ, and every i ≥ h, P^τ_s(¬EN(i) ∩ ¬FT) ≤ c^i/(1−c).

Lemma 8. Let G be a finite SSG, s ∈ AS(Gain) and δ ∈ [0, 1). Then there exist an FD maximizer strategy σ̃ and a bound k̃ ∈ N such that (1) ∀τ. P^{σ̃,τ}_s(Gain) = 1, and (2) ∀τ. P^{σ̃,τ}_s(EN(k̃) ∩ PAR) ≥ δ.

Proof. Fix a δ ∈ [0, 1) and a state s ∈ AS(Gain).
Both LimInf(= −∞) and ¬PAR are shift-invariant and submixing (the complement of a parity objective is itself a parity objective), and therefore the union also has both these properties. It follows that ¬Gain = ¬(LimInf(> −∞) ∩ PAR) = LimInf(= −∞) ∪ ¬PAR is both shift-invariant and submixing. By Lemma 16 and Theorem 6, there exists an almost-sure winning FD strategy σ̃ for maximizer for the objective Gain from s, i.e., ∀τ. P^{σ̃,τ}_s(Gain) = 1, thus yielding Item 1.

Let M be the MDP obtained from G by fixing the strategy σ̃ for maximizer from s. Since G is finite and σ̃ is FD, M is also finite. In M we have ∀τ. P^τ_s(Gain) = 1. In particular, in M, the set T def= {s | Val_s(LimInf(= −∞)) = 1} is not reachable, i.e., ∀τ. P^τ_s(FT) = 0. By Lemma 7, in M there exist a horizon h ∈ N and a constant c < 1 such that for all i ≥ h we have ∀τ. P^τ_s(¬EN(i) ∩ ¬FT) ≤ c^i/(1−c). Since T cannot be reached in M, the condition ¬FT evaluates to true and we have ∀τ. P^τ_s(EN(i)) ≥ 1 − c^i/(1−c). Since c < 1 and δ < 1, we can pick a sufficiently large k̃ ≥ h such that 1 − c^k̃/(1−c) ≥ δ and obtain ∀τ. P^τ_s(EN(k̃)) ≥ δ in M. Moreover, the above property ∀τ. P^τ_s(Gain) = 1 in particular implies ∀τ. P^τ_s(PAR) = 1. Thus we obtain ∀τ. P^τ_s(EN(k̃) ∩ PAR) ≥ δ in M. Back in the SSG G, we have ∀τ. P^{σ̃,τ}_s(EN(k̃) ∩ PAR) ≥ δ, as required for Item 2.

Lemma 9. For every k ∈ N, EN(k) ∩ PAR ⊆ k-Bailout.

Proof. Let ρ be a run in EN(k) ∩ PAR. There are two cases. In the first case we have ρ ∈ ⋃_l ST(k, l) ∩ PAR, and thus directly ρ ∈ k-Bailout. Otherwise, ρ ∉ ⋃_l ST(k, l) ∩ PAR. Since ρ ∈ PAR, we must have ρ ∉ ⋃_l ST(k, l). Since ρ ∈ EN(k), it follows that ρ does not satisfy the l-storage condition for any l ∈ N. So, for every l ∈ N, there exists an infix ρ″ of ρ s.t. l + r(ρ″) < 0. Let ρ′ be the prefix of ρ before ρ″. Since ρ ∈ EN(k), we have k + r(ρ′ρ″) ≥ 0 and thus r(ρ′) ≥ −k − r(ρ″) > −k + l. To summarize, if ρ ∉ ⋃_l ST(k, l) ∩ PAR then, for every l, it has a prefix ρ′ with r(ρ′) > −k + l. Thus ρ ∈ LimSup(= ∞), and since also ρ ∈ EN(k), we get ρ ∈ k-Bailout.

We now define W′ def= ⋃_k AS(EN(k) ∩ PAR) as the set of states that are almost-sure winning for energy-parity with some sufficiently high initial energy level. (W′ is also called the winning set for the unknown initial credit problem.)

Lemma 11. For every k ∈ N and every s ∈ AS(EN(k) ∩ PAR): (1) s ∈ AS(Gain ∩ GW′), and (2) s ∈ AS(k-Bailout ∩ GW′).

Proof. Let s ∈ AS(EN(k) ∩ PAR) and σ a strategy that witnesses this property. Except for a null set, all runs ρ = s e_0 s_1 e_1 ... e_{n−1} s_n ... from s induced by σ satisfy EN(k) ∩ PAR. Let ρ′ = s e_0 s_1 e_1 ... s_m be a finite prefix of ρ. For every n ≥ 0 we have k + Σ_{i=0}^{n−1} r(e_i) ≥ 0, since ρ ∈ EN(k). In particular this holds for all n ≥ m. So, for every n ≥ m, we have k_m + Σ_{i=m}^{n−1} r(e_i) ≥ 0 for k_m def= k + Σ_{i=0}^{m−1} r(e_i), i.e., the suffix of ρ starting at s_m satisfies the k_m-energy condition, and by shift-invariance of PAR it also satisfies the parity condition. Since σ, continued after the prefix ρ′, witnesses this for almost all such runs, we get s_m ∈ AS(EN(k_m) ∩ PAR) ⊆ W′. So, except for a null set, all runs induced by σ remain inside W′.

Towards Item 1, we have EN(k) ⊆ LimInf(> −∞) and thus EN(k) ∩ PAR ⊆ LimInf(> −∞) ∩ PAR = Gain. Therefore σ witnesses s ∈ AS(Gain ∩ GW′). Towards Item 2, we have EN(k) ∩ PAR ⊆ k-Bailout by Lemma 9. Thus σ witnesses s ∈ AS(k-Bailout ∩ GW′).

Lemma 12. W′ ⊆ W.

Proof. It suffices to show that W′ satisfies the monotone condition imposed on W (cf. Definition 4), since W is defined as the largest set satisfying this condition. Let s ∈ W′ = ⋃_k AS(EN(k) ∩ PAR). Then s ∈ AS(EN(k̃) ∩ PAR) for some fixed k̃. By Lemma 11(1) we have s ∈ AS(Gain ∩ GW′). By Lemma 11(2) we have s ∈ AS(k̃-Bailout ∩ GW′) ⊆ ⋃_k AS(k-Bailout ∩ GW′).

Proof of Theorem 5. Towards the ⊆ inclusion: by Lemma 11(2), every s ∈ AS(EN(k) ∩ PAR) satisfies s ∈ AS(k-Bailout ∩ GW′), and since W′ ⊆ W by Lemma 12, also s ∈ AS(k-Bailout ∩ GW). Towards the ⊇ inclusion, let s ∈ AS(k-Bailout ∩ GW) and σ_1 be a strategy that witnesses this. We show that s ∈ AS(EN(k) ∩ PAR). We now consider the modified SSG G′ with the state set restricted to W. In particular, s ∈ W and σ_1 witnesses s ∈ AS(k-Bailout) in G′.
We now construct a strategy σ that witnesses s ∈ AS(EN(k) ∩ PAR) in G′, and thus also in G. The strategy σ will use infinite memory to keep track of the current energy level of the run. Apart from σ_1, we require several more strategies as building blocks for the construction of σ.

First, in G we had s′ ∈ AS(Gain ∩ GW) for every s′ ∈ W, and thus in G′ we have s′ ∈ AS(Gain) for every s′ ∈ W. For every s′ ∈ W we instantiate Lemma 8 for G′ with δ = 1/2 and obtain a number k̃_{s′} and an FD strategy σ̃_{s′} with (1) ∀τ. P^{σ̃_{s′},τ}_{s′}(Gain) = 1 and (2) ∀τ. P^{σ̃_{s′},τ}_{s′}(EN(k̃_{s′}) ∩ PAR) ≥ 1/2. The strategies σ̃_{s′} are called gain strategies; let k_1 def= max_{s′∈W} k̃_{s′}. Second, every s′ ∈ W satisfies s′ ∈ ⋃_k AS(k-Bailout) in G′, and since k-Bailout objectives are monotone in k, there is a uniform bound k_2 such that in G′, for every s′ ∈ W, there exists a strategy σ̂_{s′} with ∀τ. P^{σ̂_{s′},τ}_{s′}(k_2-Bailout) = 1. The strategies σ̂_{s′} are called bailout strategies. Let k′ def= k_1 + k_2 − k + 1. We now define the strategy σ.

Start: First σ plays like σ_1 from s. Since σ_1 witnesses s ∈ AS(k-Bailout) against every minimizer strategy τ, almost all induced runs ρ = s e_0 s_1 e_1 ... satisfy either (A) ⋃_l ST(k, l) ∩ PAR or (B) EN(k) ∩ LimSup(= ∞). Almost all runs ρ of the latter type (B) (and potentially also some runs of type (A)) satisfy EN(k) and eventually Σ_{i=0}^{l} r(e_i) ≥ k′ for some l. If we observe Σ_{i=0}^{l} r(e_i) ≥ k′ for some prefix s e_0 s_1 e_1 ... e_l s′ of the run ρ, then our strategy σ plays from s′ as described in the Gain part below. Otherwise, if we never observe this condition, then our run ρ is of type (A), and σ continues playing like σ_1. Since property (A) implies EN(k) ∩ PAR, this is sufficient.

Gain: In this case we are in the situation where we have reached some state s′ after some finite prefix ρ′ of the run, where r(ρ′) ≥ k′. Our strategy σ now plays like the gain strategy σ̃_{s′}, as long as r(ρ″) ≥ k′ − k_1 holds for the current prefix ρ″ of the run. By Item 2 above, this will satisfy ∀τ. P^{σ̃_{s′},τ}_{s′}(EN(k̃_{s′}) ∩ PAR) ≥ 1/2 and thus ∀τ. P^{σ̃_{s′},τ}_{s′}(EN(k_1) ∩ PAR) ≥ 1/2. It follows that with probability ≥ 1/2 we will keep playing σ̃_{s′} forever and satisfy PAR and always r(ρ″) ≥ k′ − k_1, and thus EN(k), since k + r(ρ″) ≥ k + k′ − k_1 = k_2 + 1 ≥ 0. Otherwise, if eventually r(ρ″) = k′ − k_1 − 1, then we have k + r(ρ″) = k_2. In this case (which happens with probability < 1/2) we continue playing as described in the Bailout part below.

Bailout: In this case we are in the situation where we have reached some state s″ ∈ W after some finite prefix ρ″ of the run, where k + r(ρ″) = k_2. Since s″ ∈ W, we can now let our strategy σ play like the bailout strategy σ̂_{s″} and obtain ∀τ. P^{σ̂_{s″},τ}_{s″}(k_2-Bailout) = 1. As long as r(ρ‴) < k′ holds for the current prefix ρ‴ of the run, we keep playing σ̂_{s″}. Otherwise, if eventually r(ρ‴) ≥ k′ holds, then we switch back to playing the Gain strategy above. All the runs that never switch back to playing the Gain strategy must be of type (A) and thus satisfy PAR. Since we have k_2-Bailout ⊆ EN(k_2), it follows that, for every prefix π of the run from s″ according to σ̂_{s″}, we have k_2 + r(π) ≥ 0. Thus, for every prefix ρ‴ = ρ″π of ρ, we have k + r(ρ‴) = k + r(ρ″) + r(π) = k_2 + r(π) ≥ 0. Therefore, the EN(k) objective is satisfied by all runs.

As shown above, almost all runs induced by σ that eventually stop switching between the three modes satisfy EN(k) ∩ PAR. Switching from Gain or Bailout back to Start is impossible, but switching from Gain to Bailout and back is possible. However, the set of runs that infinitely often switch between Gain and Bailout is a null set, because the probability of switching from Gain to Bailout is ≤ 1/2 each time. Thus, σ witnesses s ∈ AS(EN(k) ∩ PAR).
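The following sketch (ours; `sigma1`, `gain_move` and `bailout_move` stand for σ_1, σ̃ and σ̂, treated as black boxes and simplified to functions of the current state) summarizes the bookkeeping performed by σ: beyond the current mode, the only memory needed is the accumulated reward of the play prefix.

```python
def switching_strategy(k, k1, k2, sigma1, gain_move, bailout_move):
    """Sketch of the infinite-memory strategy from the proof of Theorem 5:
    track the accumulated reward `acc` of the prefix and switch between the
    Start, Gain and Bailout modes at the thresholds derived above."""
    k_prime = k1 + k2 - k + 1
    mode, acc = 'start', 0

    def next_move(state, last_reward):
        nonlocal mode, acc
        acc += last_reward
        if mode == 'start':
            if acc >= k_prime:              # energy boosted: switch to Gain
                mode = 'gain'
            else:
                return sigma1(state)        # keep playing for k-Bailout
        if mode == 'gain':
            if acc == k_prime - k1 - 1:     # energy dropped: k + acc == k2
                mode = 'bailout'
            else:
                return gain_move(state)
        # Bailout mode: pump the energy back up, then retry Gain.
        if acc >= k_prime:
            mode = 'gain'
            return gain_move(state)
        return bailout_move(state)

    return next_move
```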
In this section we will argue that it is possible to decide, in NP and coNP, whether the Bailout objective can be satisfied almost surely. More precisely, we show the existence of procedures to decide if, for a given k ∈ N and state s, there exists an l ∈ N such that s almost-surely satisfies the objective Bailout(k, l), the variant of k-Bailout in which the storage bound l is fixed.

Recall that the idea behind the Bailout objective is that, during a game for energy-parity, maximizer temporarily abandons the parity (but not the energy) condition in order to increase the energy to a sufficient level (which will then allow him to try an a.s. strategy for Gain once more). However, in a stochastic game, as opposed to an MDP [40], an opponent could possibly prevent this increase in energy level at the expense of satisfying the original energy-parity objective in the first place (cf. Example 1). The Bailout objective is designed to capture the disjunction of both outcomes, as both are favorable for the maximizer. The parameter k is the acceptable total energy drop (i.e., the initial value), and the parameter l is the acceptable energy drop on any infix of a play, which translates to the upper bound on the energy level in the second outcome. The question can be phrased equivalently as membership of a control state s in the almost-sure set for the k-Bailout objective for a given game G and energy level k ∈ N.

Theorem 14. One can check in NP, coNP and pseudo-polynomial time if, for a given SSG G def= (V, E, λ), k ∈ N and control state s ∈ V, maximizer can almost-surely satisfy k-Bailout from s. Moreover, there are K, L ∈ N, polynomial in |V| and the largest absolute transition reward, so that ⋃_{k≥0} AS^G(k-Bailout) = AS^G(Bailout(K, L)). And so, checking whether state s belongs to ⋃_{k≥0} AS^G(k-Bailout) is in NP and coNP.

Proof (sketch). This is shown by a sequence of transformations of the game, ultimately reducing to finding the winner of a non-stochastic game with an energy-parity objective, which is known to be solvable in NP, coNP and pseudo-polynomial time [19]. One important observation is that it is possible to replace, without changing the outcome, the energy condition EN(k) in the Bailout(k, l) objective by the more restrictive energy-storage condition ST(k, l). See [41] for further details.

In this section we will argue that it is possible to decide, in NP and coNP, whether the Gain objective (i.e., LimInf(> −∞) ∩ PAR) can be satisfied almost surely. We start by investigating the strategy complexity of winning strategies for the Gain objective.

Lemma 15. In every finite SSG, minimizer has optimal MD strategies for the objective Gain.

Proof. We show that maximizer has MD optimal strategies for LimInf(= −∞) ∪ ¬PAR. This is equivalent to the claim of the lemma because ¬Gain = ¬(LimInf(> −∞) ∩ PAR) = LimInf(= −∞) ∪ ¬PAR, and the complement of a parity condition is itself a parity condition (with all priorities incremented by one). We note that both LimInf(= −∞) and parity objectives are shift-invariant and submixing, and therefore the union LimInf(= −∞) ∪ ¬PAR also has both these properties. The claim now follows from the fact that SSGs with objectives that are both submixing and shift-invariant admit MD optimal strategies for maximizer [34, Theorem 5.2].

Based on the results in [40], one can show a similar claim for maximizer strategies in MDPs.

Lemma 16. For finite MDPs, almost-sure winning maximizer strategies for Gain can be chosen FD.

Using the existence of MD optimal minimizer strategies (Lemma 15) and a coNP upper bound for checking almost-sure Gain in MDPs established in [40], we can derive a coNP procedure. See [41] for full details.

Theorem 17. Checking whether a state s ∈ V of an SSG satisfies Gain almost-surely is in coNP.
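Schematically, the coNP procedure checks a universal certificate of failure (our own pseudocode; `fix_minimizer` and `mdp_gain_refutation_valid` are hypothetical helpers standing in for the constructions behind Lemma 15 and [40]):

```python
def valid_refutation(game, s, tau, cert):
    """A state s is NOT almost-surely winning for Gain iff some MD
    minimizer strategy tau (optimal MD strategies exist by Lemma 15)
    induces an MDP game[tau] in which s fails almost-sure Gain. Since
    almost-sure Gain for finite MDPs is in coNP [40], that failure
    itself has a polynomially checkable certificate `cert`."""
    mdp = fix_minimizer(game, tau)                  # hypothetical helper
    return mdp_gain_refutation_valid(mdp, s, cert)  # hypothetical helper
```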
The rest of this section will deal with the NP upper bound, which is the most challenging part of this paper. The crux of our proof is the observation that if maximizer has a strategy that wins almost surely against all MD minimizer strategies, then he wins almost surely, because one of these MD strategies is optimal by Lemma 15. We show that, in order to witness such an almost-sure winning strategy for maximizer in an SSG G, it suffices to provide a polynomially larger SSG G_3, together with an almost-sure winning strategy for the storage-parity objective (see Theorem 21 in Section 6) in G_3. This will give us an NP algorithm, because G_3, along with its winning strategy, can be guessed and verified in polynomial time. Formally, we claim:

Theorem 18. Checking whether a state s ∈ V of G satisfies Gain almost-surely is in NP.

Proof (sketch). For technical convenience, we will assume w.l.o.g. that every SSG henceforth is in a normal form, where every random state has only one predecessor, which is owned by the maximizer. To show the existence of G_3, we are going to introduce two intermediate games, G_1 and G_2. These games are never constructed by our NP algorithm; they are defined only to break down the complex construction of G_3 into more manageable steps. Intuitively, G_1 is just G where all rewards on edges are multiplied by a large enough factor f to turn strategies with a mean-payoff > 0 into ones with a mean-payoff > 2. G_2 is an extension of G_1 where the maximizer is given a choice before every visit to a probabilistic node. He can either let the game proceed as before, or sacrifice part of his one-step reward in exchange for a more evenly balanced reward outcome, so that the energy can no longer drop arbitrarily low when a probabilistic cycle is reached. As a result, in G_2 it suffices to consider a storage-parity objective (see Theorem 21 in Section 6) instead of Gain. The number of choices maximizer is given is the number of MD minimizer strategies, which can clearly be exponential; that would not suffice for an NP algorithm. Therefore, we show that most of these choices are redundant and can be removed without impairing the almost-sure winning region. As the result of that pruning, we obtain a G_3 of polynomial size. For the technical details of the G → G_1 → G_2 → G_3 constructions, please see [41]. Figure 2 shows what these transformations may look like.

[Fig. 2: An example game G (left) and the derived games; (c) shows the game G_3.]

In the example, the strategy that always loops in the right-most state of G ensures a mean-payoff of 3. As this is the only MD strategy for maximizer that ensures a positive mean-payoff, a factor f = 1 is sufficient here and we have G_1 = G. In the derived game G_2 in Fig. 2b there are as many trade-in options for the random state as there are MD minimizer strategies in G_1 (just two in this example). The blue one (top left) corresponds to minimizer going left and the red one (top right) to going up in G_1. Maximizer almost-surely wins Gain in G iff he almost-surely wins a storage-parity condition (see Theorem 21) in G_3.

In this section, we prove the main results of the paper, namely that almost-sure energy-parity stochastic games can be decided in NP and coNP. The proofs are straightforward and follow from the much more involved characterization of the almost-sure energy-parity objective in terms of the Bailout and Gain objectives established in Section 3, and their computational complexity analysis in Sections 4 and 5, respectively.

Theorem 19. Given an SSG and an energy level k*, checking if a state s is almost-sure winning for EN(k*) ∩ PAR is in NP ∩ coNP.

Proof. Recall that we can compute the set W from Definition 4 by iterating W_i def= AS(Gain ∩ GW_{i−1}) ∩ ⋃_k AS(k-Bailout ∩ GW_{i−1}), starting with W_0 def= V, until we reach the greatest fixed point W. Note that at step i we need to solve almost-sure Gain and almost-sure ⋃_k AS(k-Bailout), where the states of the game are restricted to W_{i−1}. There can be at most |V| steps, because at least one state is removed in each iteration.
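A sketch of the iteration just described (ours), with the two almost-sure sets computed by hypothetical oracles `as_gain` and `as_bailout_any_k` standing for the procedures of Theorems 17/18 and 14:

```python
def greatest_fixed_point_W(states, as_gain, as_bailout_any_k):
    """Compute W from Definition 4 as in the proof of Theorem 19.
    `as_gain(S)` / `as_bailout_any_k(S)` are hypothetical oracles that
    return the subsets of S from which, in the subgame restricted to S,
    maximizer almost-surely wins Gain resp. k-Bailout for some k."""
    W = set(states)                          # W_0 = V
    while True:
        W_next = as_gain(W) & as_bailout_any_k(W)
        if W_next == W:
            return W                         # greatest fixed point reached
        W = W_next                           # at least one state removed
```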
It then suffices to check AS(k*-Bailout ∩ GW) (i.e., AS(k*-Bailout) for the subgame that consists only of the states of the fixed point W). Note that this step can be skipped if k* ≥ K, the bound from Theorem 14.

Before we discuss how to use NP and coNP procedures to construct these sets and to conduct the final test on the fixed point W, we note that the '∩ GW_{i−1}' does not add anything substantial: these are simply the same tests and procedures, conducted on the subgame that consists only of the states of W_{i−1}.

To obtain an NP procedure for constructing AS(Gain), or, as remarked above, AS(Gain ∩ GW_{i−1}), we can guess and validate membership for each state s in this set using the NP result from Theorem 18, and we can guess and validate non-membership for each state s not in this set in NP using the coNP result from Theorem 17. Similarly, we can guess and validate both membership and non-membership in ⋃_k AS(k-Bailout ∩ GW_{i−1}) (again by analysing the subgame with only the states in W_{i−1}) by using the NP and coNP results, respectively, from Theorem 14. Once we can construct these sets, we can also intersect them and check if a fixed point has been reached. (One can, of course, stop as soon as s ∉ W_i.) We can now conduct the final check, membership in AS(k*-Bailout ∩ GW), in NP using Theorem 14.

A coNP algorithm that constructs W can be designed analogously: once W_{i−1} is known, membership and non-membership of a state s in AS(Gain ∩ GW_{i−1}) can be guessed and validated in coNP by Theorem 17 and by Theorem 18, respectively; and membership or non-membership of a state in ⋃_k AS(k-Bailout ∩ GW_{i−1}) can be guessed and validated in coNP using the coNP and NP parts, respectively, of Theorem 14. Once W is constructed, we can conduct the final check in coNP using Theorem 14.

This result, together with the upper bound on the energy needed to win the energy-parity objective, allows us to solve the 'unknown initial energy problem' [7], which is to compute the minimal initial energy level required.

Corollary 20. For any state s, checking if there is a k such that s ∈ AS(EN(k) ∩ PAR) is in NP ∩ coNP. Also, for a given k*, checking if k* is the minimal energy level required to win almost surely is in NP ∩ coNP as well.

Proof. Due to Theorem 14, if there is an energy level k for which s ∈ AS(EN(k) ∩ PAR), then this also holds for the bound K, whose size is polynomial in the size of the game. We can then simply calculate K and use the NP and coNP algorithms from Theorem 19 for AS(EN(K) ∩ PAR). As for the second claim, note that checking whether maximizer cannot win EN(k) ∩ PAR almost surely is also in NP and coNP, as the complement of a coNP and an NP set, respectively. Therefore, for an NP/coNP upper bound it suffices to simultaneously guess certificates for almost-sure EN(k*) ∩ PAR and for not almost-sure EN(k* − 1) ∩ PAR, and to verify both in polynomial time.
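Corollary 20 also yields a simple search procedure for the minimal initial energy: since EN(k) ⊆ EN(k+1), the almost-sure set is monotone in k, so one can binary-search over [0, K] using the decision procedure of Theorem 19 as an oracle. A sketch (ours; `almost_sure_energy_parity` is a hypothetical oracle):

```python
def minimal_initial_energy(s, K, almost_sure_energy_parity):
    """Smallest k in [0, K] with s almost-sure winning for EN(k) ∩ PAR,
    or None if no such k exists. Monotonicity in k justifies binary search."""
    if not almost_sure_energy_parity(s, K):
        return None                      # no initial credit suffices
    lo, hi = 0, K
    while lo < hi:
        mid = (lo + hi) // 2
        if almost_sure_energy_parity(s, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```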
Finally, let us mention that the slightly more restrictive storage-parity objectives can also be solved in NP ∩ coNP. These are almost identical to energy-parity except that, in addition, there must exist some bound l ∈ N such that the energy level never drops by more than l during a run. This extra condition ensures that, if the storage-parity objective holds almost-surely, then there must exist a finite-memory winning strategy for maximizer.

Theorem 21. One can check in NP, coNP and pseudo-polynomial time if, for a given SSG H def= (V, E, λ), k ∈ N and control state s ∈ V, maximizer can almost-surely satisfy ST(k) ∩ PAR from s. Moreover, there is a bound L ∈ N, polynomial in the number of states and the largest absolute transition reward, so that ST(k) ∩ PAR = ST(k, L) ∩ PAR.

Proof (sketch). This result follows by a simple adaptation of the proofs showing the same computational complexity for the Bailout objective (Section 4). See [41] for further details.

Example 22. In the game in Fig. 1, maximizer cannot ensure the storage-parity condition ST(k) ∩ PAR almost-surely for any initial energy level k. This is because it would imply the existence of a finite-memory almost-surely winning strategy, which, as we have already argued, cannot exist. More intuitively, to prevent an intermediate energy drop of l units, a winning maximizer strategy for storage-parity would need to stop moving left after observing the negative cycle in the leftmost state l successive times. However, when maximizer moves right, this gives minimizer the chance to visit the rightmost bad state (with dominating odd priority 1). The chance of that happening is (1/3)^l > 0. In particular, this probability is > 0 for any value of the intermediate energy drop l. Therefore, for any fixed l, maximizer would need to move right infinitely often to satisfy storage, and would lose (against an optimal minimizer strategy that moves to the rightmost state).
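A quick numeric illustration (ours) of why a fixed bound l cannot work: each excursion to the right independently hits the bad priority with probability p = (1/3)^l > 0, so the chance of avoiding it over many excursions vanishes.

```python
def avoid_prob(l, excursions):
    """Probability of never seeing the bad priority in `excursions`
    independent right-moves, each hitting it with probability (1/3)**l."""
    return (1 - (1/3) ** l) ** excursions

print(avoid_prob(l=5, excursions=10_000))   # ~1e-18: avoidance is hopeless
```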
We showed that several almost-sure problems for combined energy-parity objectives in simple stochastic games are in NP ∩ coNP. No pseudo-polynomial algorithm is known (just like for stochastic mean-payoff parity games [20]). All these problems subsume (stochastic) parity games, obtained by setting all rewards to 0. Thus the existence of a pseudo-polynomial algorithm would imply that (stochastic and non-stochastic) parity games are in P, which is a long-standing open problem.

It is known that maximizer already needs infinite memory to win a combined energy-parity objective almost-surely in MDPs [40]. Our results do not imply anything about the memory requirements of optimal minimizer strategies in SSGs for this objective. We conjecture that memoryless minimizer strategies suffice. If this conjecture holds (and is proven), this would greatly simplify the coNP upper bound that we established for this problem.

A natural question is whether results on mean-payoff/energy/parity games can be generalized to a setting with multi-dimensional payoffs. Non-stochastic multi-mean-payoff and multi-energy games have been studied in [48, 36, 1]. To the best of our knowledge, the techniques used there, e.g., upper bounds on the necessary energy levels as in [36], do not generalize to stochastic games (or MDPs). Multiple mean-payoff objectives in MDPs have been studied in [10, 24], but the corresponding multi-energy (resp. multi-energy-parity) objective has extra difficulties due to the 0-boundary condition on the energy. I.e., even on Markov chains, and without any parity condition, it subsumes problems about multi-dimensional random walks. Some partial results on Markov chains and MDPs have been obtained in [13, 2, 3], but the decidability of the almost-sure problem for stochastic multi-energy-parity games (and MDPs) remains open.

References:
- Solving parity games on integer vectors
- Qualitative analysis of VASS-induced MDPs
- Decisive Markov Chains
- Alternating-time temporal logic
- Undecidable problems for probabilistic automata of fixed dimension
- Infinite runs in weighted timed automata with energy constraints
- Approximating the Termination Value of One-Counter MDPs and Stochastic Games
- Optimizing the expected mean payoff in energy Markov decision processes
- Markov decision processes with multiple long-run average objectives
- One-Counter Stochastic Games
- One-counter Markov decision processes
- Zero-reachability in probabilistic multi-counter automata
- Resource interfaces
- The complexity of stochastic Rabin and Streett games
- Energy parity games
- Energy and mean-payoff parity Markov decision processes
- Games and Markov decision processes with mean-payoff parity and energy parity objectives
- Energy parity games
- Perfect-information stochastic mean-payoff parity games
- Generalized parity games. In: International Conference on Foundations of Software Science and Computational Structures (FoSSaCS)
- Simple stochastic parity games
- Quantitative stochastic parity games
- Unifying two views on multiple mean-payoff objectives in Markov decision processes
- A pseudo-quasi-polynomial algorithm for mean-payoff parity games
- Interface automata
- Trace theory for automatic hierarchical verification of speed-independent circuits
- Exponential lower bounds for policy iteration
- Competitive Markov Decision Processes
- An exponential lower bound for the parity game strategy improvement algorithm as we know it
- Subexponential lower bounds for randomized pivoting rules for the simplex algorithm
- Stochastic games with zero stop probabilities
- Two-Player Perfect-Information Shift-Invariant Submixing Stochastic Games Are Half-Positional
- Deciding the winner in parity games is in UP ∩ co-UP
- Fixed-dimensional energy games are in pseudo-polynomial time
- On strong determinacy of countable stochastic games
- Stochastic games with Borel payoffs
- The determinacy of Blackwell games
- MDPs with Energy-Parity Objectives
- Simple Stochastic Games with Almost-Sure Energy-Parity Objectives are in NP and coNP (2021)
- On the synthesis of a reactive module
- Verification of the randomized consensus algorithm of Aspnes and Herlihy: a case study
- Automata on infinite objects and Church's problem
- Supervisory control of a class of discrete event processes
- Stochastic games. Proceedings of the National Academy of Sciences
- Fun with FireWire: A comparative study of formal verification methods applied to the IEEE 1394 root contention protocol
- The complexity of multi-mean-payoff and multi-energy games
- Alternating tree automata, parity games, and modal mu-calculus
Acknowledgments. The work of all the authors was supported in part by EPSRC grant EP/M027287/1. Sven Schewe and Dominik Wojtczak were also supported by EPSRC grant EP/P020909/1.