When To Test? Troubleshooting with Postponed System Test

Thorsten J. Ottosen and Finn V. Jensen
Technical Report 10-001
Department of Computer Science, Aalborg University, Denmark
October 19, 2010

Abstract. Troubleshooting is about computing a cost-efficient way to repair a faulty device. In this paper we investigate the simplest action-based troubleshooting scenario with a single extension: the overall system test is not required after each action, but may be postponed until several actions have been performed. As shown by Kadane and Simon, the simple troubleshooting scenario is solvable in polynomial time. However, we conjecture that the new troubleshooting scenario is NP-hard and therefore describe a Θ(n³) time heuristic. The new heuristic guarantees optimality for a class of models, and for other classes of models we benchmark it against the optimal solution and against simpler greedy heuristics. The benchmarks are performed on artificial models as well as a real-world troubleshooting model, and they suggest that the heuristics are close to optimal.

Introduction

Imagine that you have a device which has been running well up to now, but suddenly it is malfunctioning. A set of faults F describes the possible causes of the problem. To fix the problem you have a set A of actions, which may fix the problem by repairing one or more faults. An action α has a positive cost Cα and two possible outcomes, namely "α = yes" (the problem was fixed) and "α = no" (the action failed to fix the problem). Informally, our task is to determine the best sequence of actions to perform until the problem is fixed or no more actions remain to be carried out.

For an overview of the state of the art of decision theoretic troubleshooting we refer the reader to (Jensen et al., 2001), (Langseth and Jensen, 2001), (Skaanning and Vomlel, 2001), (Langseth and Jensen, 2003), (Koca and Bilgiç, 2004a), (Koca and Bilgiç, 2004b), and (Warnquist et al., 2008). Basically all non-trivial troubleshooting domains are NP-hard (Vomlelová, 2003).
The scenario investigated in this article is similar to troubleshooting with conditional costs, but the proof of NP-hardness does not carry through in this simple scenario. Nevertheless we conjecture that our scenario is NP-hard. In traditional troubleshooting it has been assumed that each action is followed by a test that determines whether the entire system D is working. Furthermore, it has generally been assumed that the cost of this test, C_D, is so low that it need not even be modelled. We show that this assumption is incorrect in general, and we develop terminology and algorithms for this new scenario.

In this paper we shall examine troubleshooting problems where the following model parameters are given:

1. An overall problem D that we need to solve (e.g. "my printer does not work").
2. A set of faults F = {f₁, ..., f_m} describing the possible causes of the problem (e.g. "toner cartridge empty").
3. For each fault f ∈ F, a probability P(f) describing how likely it is that f is present when troubleshooting begins.
4. A set of actions A = {α₁, ..., α_n} that can potentially solve D (e.g. "change the toner cartridge").
5. For each action α ∈ A, a cost Cα describing the resources required to perform the action.
6. For each action α ∈ A, a success probability P(α = yes | f) describing the likelihood of solving D by performing the action when f is present.
7. A cost of testing whether the problem D remains, C_D > 0.

Furthermore, we make the following simplifying assumptions about the model:

Assumption 1. The single fault assumption: initially the problem D is known to exist, and it is due to the presence of a single fault from F.

Assumption 2. The action result assumption: repeating a failed action will not fix the problem.

Assumption 3. The carefulness assumption: by performing an action or testing the system, we never introduce new faults.

Assumption 4. The independent actions assumption: each action repairs a distinct set of faults that may be causing the problem D.

We uphold these assumptions throughout this paper.

Remark. Assumption 1 implies $\sum_{f \in F} P(f) = 1$, and by Assumption 4 the repair probability of an action can be computed as
$$P(\alpha = \mathrm{yes}) = \sum_{f \in F} P(\alpha = \mathrm{yes} \mid f) \cdot P(f).$$
However, since actions may be imperfect ($P(\alpha = \mathrm{yes} \mid f) < 1$), Assumption 4 does not imply $\sum_{\alpha \in A} P(\alpha = \mathrm{yes}) = 1$. This equation can also fail because not all faults need be addressed by an action.

Before we can define our optimization problem formally, we need some notation and terminology. We write ε^i to denote the evidence that the first i actions have failed (ε^0 ⊆ ε^i), and by assumption P(ε^0) = 1 because the device is faulty. We call P(α = yes) the repair probability of α and abbreviate it Pα. To make it clear when the cost of an action includes the cost of testing the system, we write C_{A+D} for C_A + C_D; thus C_A never includes the cost of testing the system. For each V ⊆ A we can form a compound action A with the properties
$$P_A = \sum_{\alpha \in V} P_\alpha, \qquad C_A = \sum_{\alpha \in V} C_\alpha, \qquad |A| = |V|.$$
To make the distinction clear, we call the actions α ∈ A atomic actions. A partition of A into k subsets {V₁, ..., V_k} thus induces a new model with independent actions {A₁, ..., A_k}, and we say that {A₁, ..., A_k} is a partition of A whenever the corresponding sets of atomic actions form one. From now on we shall not distinguish between a set of atomic actions and the corresponding compound action; that is, we use A to mean a set of atomic actions and write A = {α₁, ..., α_{|A|}}.
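The quantities above are straightforward to compute. The following Python sketch is our own illustration (the data layout and function names are assumptions, not part of the model definition); it derives the repair probabilities as in the remark and the aggregate statistics of a compound action.

    # Minimal sketch: repair probabilities and compound-action statistics.

    def repair_probabilities(p_yes_given_f, p_fault):
        """P(alpha = yes) = sum_f P(alpha = yes | f) * P(f) (Assumptions 1 and 4).
        p_yes_given_f[a][f] holds P(alpha_a = yes | f); p_fault[f] holds P(f)."""
        return [sum(row[f] * p_fault[f] for f in range(len(p_fault)))
                for row in p_yes_given_f]

    def compound_stats(V, P, C, C_D):
        """For a set V of atomic action indices: P_A, C_A, and C_{A+D}."""
        P_A = sum(P[i] for i in V)
        C_A = sum(C[i] for i in V)
        return P_A, C_A, C_A + C_D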
A troubleshooting sequence is a sequence of compound actions s = 〈A₁, ..., A_ℓ〉 prescribing the process of repeatedly performing the next action until the problem is fixed or the last action has been performed, and such that {A₁, ..., A_ℓ} is a partition of A. When each compound action contains exactly one atomic action, we say that the sequence is atomic. A subsequence of s, s[k,m] = 〈A_{k+1}, ..., A_m〉 with k ≤ m, is called a partial troubleshooting sequence if k ≥ 1 or m < ℓ. If k = 0, we may simply write s[m].

Definition 1. Let s = 〈A₁, ..., A_ℓ〉 be a troubleshooting sequence. Then the expected cost of repair (ECR) of s is the mean cost until an action succeeds or all actions have been performed:
$$\mathrm{ECR}(s) = \sum_{i=1}^{\ell} C_{A_i+D} \cdot P(\varepsilon^{i-1}).$$

Formally, the problem is to find a troubleshooting sequence with minimal ECR over all possible sequences. We call such a sequence optimal. An important definition is that of efficiency:

Definition 2. The efficiency of an action A is
$$\mathrm{ef}(A) = \frac{P_A}{C_{A+D}}.$$

Its importance stems from the following results.

Theorem 1 (Kadane and Simon (1977)). Let s = 〈α₁, ..., α_n〉 be an atomic troubleshooting sequence, and let C_D = 0. Then s is optimal if and only if
$$\mathrm{ef}(\alpha_i) \ge \mathrm{ef}(\alpha_{i+1}) \quad \text{for } i \in \{1, \ldots, n-1\}.$$

Theorem 2 (Jensen et al. (2001)). Let s = 〈α₁, ..., α_n〉 be an atomic troubleshooting sequence. Then a necessary condition for s to be optimal, even when C_D > 0, is that
$$\mathrm{ef}(\alpha_i) \ge \mathrm{ef}(\alpha_{i+1}) \quad \text{for } i \in \{1, \ldots, n-1\}.$$

Corollary 1. Theorem 2 holds for arbitrary troubleshooting sequences.

Proof. Because a troubleshooting sequence partitions A, the compound actions can be seen as virtual actions which are also independent, thus satisfying Assumption 4. The other assumptions cannot be invalidated.

We may also compute the ECR of any atomic troubleshooting sequence:

Theorem 3 (Jensen et al. (2001)). Let s = 〈α₁, ..., α_n〉 be an atomic troubleshooting sequence. Then the ECR of s may be computed as
$$\mathrm{ECR}(s) = \sum_{i=1}^{n} C_{\alpha_i+D} \cdot \Big(1 - \sum_{j=1}^{i-1} P_{\alpha_j}\Big), \quad \text{where } 1 - \sum_{j=1}^{i-1} P_{\alpha_j} = P(\varepsilon^{i-1}).$$

Corollary 2. Theorem 3 holds for arbitrary troubleshooting sequences.

Proof. Same argument as for Corollary 1.

Thus, due to Assumptions 1-4, we may completely ignore F, P(f), and P(α = yes | f) once the repair probabilities have been computed. Therefore we only use Pα in the rest of this paper.

The following example shows that Theorem 1 does not extend to arbitrary troubleshooting sequences when C_D > 0.

Example 1. Consider a model with four atomic actions α₁, α₂, α₃ and α₄ and with C_D = 1. The other properties of the model are as follows:

          Pα     Cα    Cα+D   ef(α)
    α1    0.24    1      2    0.12
    α2    0.42    3      4    0.105
    α3    0.2     1      2    0.1
    α4    0.14   19     20    0.007

We have

    ECR(〈α1, α2, α3, α4〉) = 2 + 0.76 · 4 + 0.34 · 2 + 0.14 · 20 = 8.52
    ECR(〈{α1, α2}, α3, α4〉) = 5 + 0.34 · 2 + 0.14 · 20 = 8.48

thus proving that it is more efficient to form the compound action A = {α₁, α₂}.
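Corollary 2 makes the ECR of any compound-action sequence easy to compute once the Pα are known. The sketch below is our own illustration of that formula, not pseudocode from the paper; it reproduces the numbers of Example 1.

    def ecr(seq, P, C, C_D):
        """ECR of a compound-action sequence (Definition 1 / Corollary 2).
        `seq` is a list of compound actions, each a list of atomic action
        indices; P and C hold the repair probabilities and costs."""
        total, p_evidence = 0.0, 1.0             # p_evidence = P(eps^{i-1})
        for A in seq:
            C_A = sum(C[i] for i in A)
            total += (C_A + C_D) * p_evidence
            p_evidence -= sum(P[i] for i in A)   # single-fault assumption
        return total

    # Example 1 with C_D = 1:
    P = [0.24, 0.42, 0.20, 0.14]
    C = [1.0, 3.0, 1.0, 19.0]
    print(ecr([[0], [1], [2], [3]], P, C, 1.0))  # 8.52
    print(ecr([[0, 1], [2], [3]], P, C, 1.0))    # 8.48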
To be able to test heuristic algorithms, we need an algorithm that is guaranteed to find an optimal value. An exhaustive search can always do this, albeit in exponential or super-exponential time. Algorithm 1 shows an exhaustive search with pruning (line 12, applying Theorem 2) and memoization (lines 6-9).

Algorithm 1 Exhaustive Search With Compound Actions
 1: function GenerateAllSequences(A, &S)
 2:   if A = ∅ then return {〈〉}
 3:   Let all = ∅
 4:   for i = 1 to |A| do
 5:     for all (|A| choose i) ways to construct the first compound action A do
 6:       if A ∉ S then
 7:         Compute P_A and C_A for A from scratch
 8:         Set S = S ∪ {A}
 9:       end if
10:       Let after = GenerateAllSequences(A \ A, S)
11:       for s = 〈A₁, ..., A_ℓ〉 ∈ after do
12:         if ef(A₁) ≤ ef(A) then
13:           Let s′ = 〈A, A₁, ..., A_ℓ〉
14:           Set all = all ∪ {s′}
15:         end if
16:       end for
17:     end for
18:   end for
19:   return all
20: end function
21: function OptimalCompoundActionSequence(A)
22:   Let S = ∅
23:   Let all = GenerateAllSequences(A, S)
24:   return arg min_{s ∈ all} ECR(s)
25: end function

Let us briefly discuss the complexity of this algorithm. The algorithm recursively generates all possible sequences of compound actions. The number of such sequences is n! · 2^{n−1}, because for each of the n! permutations of the n atomic actions we can form 2^{n−1} different partitions. Thus the running time and memory requirement of the algorithm is O(n! · 2^n). Note, however, that this is not a tight upper bound, since a troubleshooting sequence s = 〈A₁, ..., A_ℓ〉 can be constructed by $\prod_{i=1}^{\ell} |A_i|!$ different orderings. Later we show that the complexity of an exhaustive search can be reduced to Θ(n³ · n!).

Two Simple Greedy Heuristics

In this section we shall develop two simple greedy algorithms. Furthermore, we give examples showing that neither algorithm is guaranteed to find an optimal troubleshooting sequence.

For the first algorithm we consider how the ECR of a troubleshooting sequence can be improved by merging two neighbouring actions into one compound action. We have the following result.

Theorem 4. Let the two troubleshooting sequences s and s′ be given as
s = 〈..., A_x, A_{x+1}, ...〉 and s′ = 〈..., A, ...〉
with A = A_x ∪ A_{x+1} and everything else the same. Then s′ has a strictly lower ECR than s if and only if
$$C_D > C_{A_{x+1}} \cdot \frac{P_{A_x}}{1 - \sum_{j=1}^{x} P_{A_j}}.$$

Proof. The sequence with the new compound action is strictly better exactly when ECR(s′) − ECR(s) < 0. We observe that most of the terms cancel each other out, and we are left with
$$C_{A+D} \cdot P(\varepsilon^{x-1}) - \big[ C_{A_x+D} \cdot P(\varepsilon^{x-1}) + C_{A_{x+1}+D} \cdot P(\varepsilon^{x}) \big] < 0$$
$$\Updownarrow$$
$$C_{A+D} < C_{A_x} + C_D + (C_{A_{x+1}} + C_D) \cdot \frac{P(\varepsilon^{x})}{P(\varepsilon^{x-1})}$$
$$\Updownarrow$$
$$C_{A_{x+1}} \cdot \Big( 1 - \frac{P(\varepsilon^{x})}{P(\varepsilon^{x-1})} \Big) < C_D \cdot \frac{P(\varepsilon^{x})}{P(\varepsilon^{x-1})}$$
$$\Updownarrow$$
$$C_D > C_{A_{x+1}} \cdot \Big( \frac{P(\varepsilon^{x-1})}{P(\varepsilon^{x})} - 1 \Big)$$
$$\Updownarrow$$
$$C_D > C_{A_{x+1}} \cdot \frac{P_{A_x}}{1 - \sum_{j=1}^{x} P_{A_j}}$$
where the last step uses $P(\varepsilon^{x-1}) - P(\varepsilon^{x}) = P_{A_x}$ and $P(\varepsilon^{x}) = 1 - \sum_{j=1}^{x} P_{A_j}$.

Theorem 4 immediately suggests a procedure where we greedily merge adjacent actions until the ECR can no longer be improved: we start with an atomic troubleshooting sequence and form compound actions by repeated application of Theorem 4. Justified by Theorem 1, there are two obvious choices of initial sorting: by descending efficiency or by descending P/C-value. We have summarized this procedure in Algorithm 2. (The special function 1st arg(·) returns the first index for which the boolean argument is true, or the upper bound if no such index exists.)

Algorithm 2 Computing sequence with compound actions greedily
 1: function CompoundAction(〈α₁, ..., α_n〉, x)
 2:   if x = n then
 3:     return {α_n}
 4:   end if
 5:   Let m = 1st arg_{i=x}^{n−1} ( C_D ≤ C_{α_{i+1}} · P_{α_i} / (1 − Σ_{j=1}^{i} P_{α_j}) )
 6:   return {α_x, ..., α_m}
 7: end function
 8: function GreedyCompoundActionSequence(A)
 9:   Let s′ = 〈α₁, ..., α_n〉 such that ef(α_i) ≥ ef(α_{i+1}) or P_{α_i}/C_{α_i} ≥ P_{α_{i+1}}/C_{α_{i+1}}
10:   Let s = 〈〉
11:   Let i = 1
12:   while i ≤ n do
13:     Let A = CompoundAction(s′, i)
14:     Set s = s + A
15:     Set i = i + |A|
16:   end while
17:   return s
18: end function

As specified, the algorithm runs in Θ(n²) time. However, if the sums in line 5 of CompoundAction(·) are precomputed, the total cost of all calls to CompoundAction(·) can be made linear. The running time of the algorithm is then dominated by the initial sorting of the actions, leading to a Θ(n · lg n) worst-case running time.
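A direct implementation of the merging rule is short. The sketch below is our own illustration of Algorithm 2 (not the authors' code); it multiplies the Theorem 4 test through by 1 − Σ_j P_{α_j} to avoid division.

    def greedy_merge(order, P, C, C_D):
        """Sketch of Algorithm 2: walk along an atomic sequence `order`
        (indices pre-sorted by efficiency or by P/C) and grow a compound
        action while Theorem 4 says merging the next action lowers the ECR."""
        seq, x, n = [], 0, len(order)
        cum = 0.0                        # sum of P over actions placed so far
        while x < n:
            A, i = [order[x]], x
            cum += P[order[x]]
            # merge alpha_{i+1} while C_D > C_{alpha_{i+1}} * P_{alpha_i} / (1 - cum)
            while i + 1 < n and C_D * (1.0 - cum) > C[order[i + 1]] * P[order[i]]:
                i += 1
                A.append(order[i])
                cum += P[order[i]]
            seq.append(A)
            x = i + 1
        return seq

    # For the model of Example 2 below (C_D = 10) every merge test succeeds:
    print(greedy_merge([0, 1, 2], [0.61, 0.21, 0.18], [1.0, 5.0, 3.0], 10.0))
    # -> [[0, 1, 2]], i.e. one single compound action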
Is Algorithm 2 guaranteed to find an optimal troubleshooting sequence? The example below shows it is not.

Example 2. We consider the following model with C_D = 10:

          Pα     Cα    Cα+D    ef(α)   Pα/Cα
    α1    0.61    1     11     0.055   0.61
    α2    0.21    5     15     0.014   0.042
    α3    0.18    3     13     0.013   0.06

Using the ef(·) order, Algorithm 2 first computes
$$10 < 5 \cdot \frac{0.61}{1 - 0.61} \approx 7.8.$$
Since this is false, it continues with the second test
$$10 < 3 \cdot \frac{0.21}{1 - 0.61 - 0.21} = 3.5,$$
which is also false, so the algorithm suggests the sequence consisting of the single compound action {α₁, α₂, α₃}. If we repeat the calculations using the P/C order, we get the same result. The ECR of this sequence is easily seen to be 19, but
$$\mathrm{ECR}(\langle \alpha_1, \{\alpha_2, \alpha_3\} \rangle) = 11 + 0.39 \cdot 18 = 18.02,$$
so the algorithm is not optimal.

Let us now consider another greedy heuristic. The algorithm is based on the idea that we should repeatedly form the most efficient compound action until all atomic actions have been assigned to a compound action. To do this, we need a simple way of computing the most efficient compound action from a set of atomic actions. We first need a lemma.

Lemma 1. Let a, b, c, and d be strictly positive numbers. Then
$$\frac{a+b}{c+d} \ge \frac{a}{c} \quad \text{if and only if} \quad \frac{b}{d} \ge \frac{a}{c}.$$

Proof.
$$\frac{a+b}{c+d} \ge \frac{a}{c} \;\Leftrightarrow\; a \cdot c + b \cdot c \ge a \cdot c + a \cdot d \;\Leftrightarrow\; \frac{b}{d} \ge \frac{a}{c}.$$

Theorem 5. Let V ⊆ A be given. Let A be a compound action consisting of actions from V with ef(A) maximal and with |A| minimal among those compound actions with maximal efficiency. Then A can be found by the following procedure: define ef(A) = 0 when A = ∅; then repeatedly add an action α ∈ V with the highest Pα/Cα value to A until ef(A) no longer increases.

Proof. Let A consist of actions from V such that ef(A) is maximal and |A| is minimal. With $S_P = \sum_{\alpha \in A \setminus \{\beta\}} P_\alpha$ and $S_C = C_D + \sum_{\alpha \in A \setminus \{\beta\}} C_\alpha$ for an arbitrarily chosen β ∈ A, we have
$$\mathrm{ef}(A) = \frac{\sum_{\alpha \in A} P_\alpha}{C_D + \sum_{\alpha \in A} C_\alpha} = \frac{S_P + P_\beta}{S_C + C_\beta} = \frac{P}{C}.$$
Let furthermore α ∈ V \ A. We shall prove
$$\frac{P_\beta}{C_\beta} \ge \frac{P}{C} \ge \frac{P_\alpha}{C_\alpha},$$
which implies the theorem. We first prove $P_\beta/C_\beta \ge P/C$ (i). Because ef(A) is maximal and A has minimal size, Lemma 1 gives
$$\frac{S_P + P_\beta}{S_C + C_\beta} > \frac{S_P}{S_C} \;\Leftrightarrow\; \frac{P_\beta}{C_\beta} > \frac{S_P}{S_C} \quad \text{(ii)}.$$
Now assume that (i) is false; then by Lemma 1
$$\frac{S_P + P_\beta}{S_C + C_\beta} \ge \frac{P_\beta}{C_\beta} \;\Leftrightarrow\; \frac{S_P}{S_C} \ge \frac{P_\beta}{C_\beta},$$
which contradicts (ii). We then consider the second inequality. Since ef(A) is maximal, Lemma 1 implies
$$\frac{P}{C} \ge \frac{P + P_\alpha}{C + C_\alpha} \;\Leftrightarrow\; \frac{P}{C} \ge \frac{P_\alpha}{C_\alpha},$$
which completes the proof.

Remark. In (Langseth and Jensen, 2001) a similar heuristic procedure is derived, albeit for a slightly different problem.

Algorithm 3 Greedy algorithm with max-efficient actions
 1: function GreedyCompoundActionSequence2(A)
 2:   Let s′ = 〈α₁, ..., α_n〉 such that P_{α_i}/C_{α_i} ≥ P_{α_{i+1}}/C_{α_{i+1}}
 3:   Let s = 〈〉
 4:   Repeatedly add the most efficient compound action (Theorem 5) to s
 5:   return s
 6: end function

Algorithm 3 shows a greedy algorithm based on Theorem 5. Like Algorithm 2 it can precompute the sums used in the efficiencies, giving a running time of Θ(n · lg n).
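The construction of Theorem 5 and the resulting Algorithm 3 can be sketched as follows (again our own illustration under the stated assumptions, not the authors' implementation).

    def most_efficient_compound(V, P, C, C_D):
        """Theorem 5: grow a compound action from the atomic actions in V
        (a list of indices) by repeatedly adding the action with the highest
        P/C ratio while the efficiency ef(A) strictly increases."""
        rest = sorted(V, key=lambda i: P[i] / C[i], reverse=True)
        A, P_A, C_A = [], 0.0, C_D       # C_A accumulates C_{A+D}
        best_ef = 0.0                    # ef(empty set) := 0
        for i in rest:
            new_ef = (P_A + P[i]) / (C_A + C[i])
            if new_ef <= best_ef:        # efficiency no longer increases
                break
            A.append(i)
            P_A, C_A, best_ef = P_A + P[i], C_A + C[i], new_ef
        return A

    def greedy_sequence(V, P, C, C_D):
        """Sketch of Algorithm 3: repeatedly peel off the most efficient
        compound action until every atomic action has been assigned."""
        seq, rest = [], list(V)
        while rest:
            A = most_efficient_compound(rest, P, C, C_D)
            seq.append(A)
            rest = [i for i in rest if i not in A]
        return seq

For the model of Example 3 below, greedy_sequence([0, 1, 2], [0.15, 0.35, 0.5], [1, 2, 3], 1.0) returns the single compound action [[1, 2, 0]], matching the computation in the text.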
The following example shows that Algorithm 3 is not guaranteed to find an optimal troubleshooting sequence.

Example 3. Consider the following model with C_D = 1:

          Pα     Cα    Cα+D    Pα/Cα
    α1    0.15    1      2     0.150
    α2    0.35    2      3     0.175
    α3    0.50    3      4     0.167

The P/C ordering considered is 〈α₂, α₃, α₁〉. Algorithm 3 computes the following values to determine the first compound action:

    ef({α2}) = 0.35/3 ≈ 0.117
    ef({α2, α3}) = (0.35 + 0.5)/6 ≈ 0.1417
    ef({α2, α3, α1}) = 1/7 ≈ 0.1429

Thus the suggested sequence is one single compound action with ECR 7. However,
$$\mathrm{ECR}(\langle \{\alpha_2, \alpha_3\}, \alpha_1 \rangle) = 6 + 0.15 \cdot 2 = 6.3,$$
which shows the algorithm is not optimal.

Having established the heuristic nature of both greedy algorithms, we would like to know whether they at least find a local optimum. We use the following definition of a local optimum.

Definition 3. Let s = 〈α₁, ..., α_n〉 be a sequence of atomic actions, and let s_c = 〈B₁, ..., B_m〉 be a compound action sequence. Then s_c is said to be consistent with s if
$$\forall k < l: \alpha_i \in B_k \text{ and } \alpha_j \in B_l \implies i < j.$$

In other words, s_c can be seen as an ordered partition of s. If an atomic action sequence s is given, we call the set of all compound action sequences consistent with s C(s). A local optimum is then a compound action sequence s_c ∈ C(s) with minimal ECR, and we say that the (ordered) partition induced by s_c is optimal. From Examples 2 and 3 it follows that neither Algorithm 2 nor Algorithm 3 guarantees an optimal partition.

An Algorithm For Optimal Partitioning

We have now seen how the greedy algorithms fail to provide optimal sequences even under the restriction that the solution must be consistent with a single seed sequence s of atomic actions. Since there are 2^{n−1} ways to partition the actions of s, a brute-force approach is quite impractical even for models of moderate size. By exploiting Theorem 6 below we can instead construct an O(n³) dynamic programming algorithm.

Lemma 2. Let k ≥ 0 and let s_c = 〈A₁, ..., A_ℓ〉 be a partial troubleshooting sequence that is performed after the first k atomic actions have been performed (and have failed). Then the contribution of s_c to the total ECR is given by
$$\mathrm{ECR}_k(s_c) = \sum_{i=1}^{\ell} P(\varepsilon^{k+i-1}) \cdot C_{A_i+D}, \quad \text{where} \quad P(\varepsilon^{k+i-1}) = 1 - \sum_{j=1}^{k} P_{\alpha_j} - \sum_{j=1}^{i-1} P_{A_j}.$$

Proof. Follows directly from Corollary 2.

The lemma ensures that the following definition is sound.

Definition 4. Let s = 〈α₁, ..., α_n〉 be an atomic troubleshooting sequence. For any partial sequence s[k,m] with k ≤ m, the ECR of an optimal compound action sequence consistent with s[k,m] is defined by
$$\mathrm{ECR}^*(s[k,m]) = \begin{cases} \min_{s_c \in C(s[k,m])} \mathrm{ECR}_k(s_c) & \text{if } k < m, \\ 0 & \text{if } k = m. \end{cases}$$

Theorem 6. Let s = 〈α₁, ..., α_n〉 be an atomic troubleshooting sequence. Then for k < m, ECR*(s[k,m]) can be calculated as
$$\mathrm{ECR}^*(s[k,m]) = \min_{i=k+1}^{m} \Big( \mathrm{ECR}_k(\langle \{\alpha_{k+1}, \ldots, \alpha_i\} \rangle) + \mathrm{ECR}^*(s[i,m]) \Big).$$

Proof. We use induction on m − k. Basis step: m − k = 1. Since there is only one compound action sequence consistent with 〈α_m〉, we get
$$\mathrm{ECR}^*(s[k,m]) = \mathrm{ECR}_k(\langle \{\alpha_m\} \rangle) + 0,$$
so the theorem holds in this case. Induction step: assume the theorem holds for m − k ≤ x, and let m − k = x + 1. Let
$$s^* = \arg\min_{s_c \in C(s[k,m])} \mathrm{ECR}_k(s_c),$$
for which we must prove
$$\mathrm{ECR}_k(s^*) = \min_{i=k+1}^{m} \Big( \mathrm{ECR}_k(\langle \{\alpha_{k+1}, \ldots, \alpha_i\} \rangle) + \mathrm{ECR}^*(s[i,m]) \Big).$$
By Lemma 2 it is correct to split the ECR into two terms corresponding to an ordered partition of s^*. Furthermore, by the induction hypothesis we can compute ECR*(s[i,m]) for any i since m − i ≤ x. Now, s^* must start with a compound action consisting of y ∈ {1, 2, ..., x+1} actions. These choices of the first compound action correspond exactly to the expressions that we minimize over, and the result follows.
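Before turning to the table-based formulation below (Algorithm 4), here is a direct memoized sketch of the Theorem 6 recurrence. It is our own illustration; note that when several partitions tie on ECR, it may break the tie differently from the tables shown in Example 4.

    from functools import lru_cache

    def optimal_partition(order, P, C, C_D):
        """Theorem 6 recurrence: optimal ECR and the ordered partition
        realising it, for a fixed atomic seed sequence `order` (indices)."""
        n = len(order)
        prefix_p = [0.0]                 # prefix_p[k] = sum of P over first k actions
        for i in order:
            prefix_p.append(prefix_p[-1] + P[i])

        @lru_cache(maxsize=None)
        def ecr_star(k, m):
            """Best (ECR, end index of the first compound action) for s[k, m]."""
            if k == m:
                return 0.0, m
            best, best_i = float("inf"), m
            for i in range(k + 1, m + 1):     # first compound = actions k+1..i
                cost = sum(C[order[j]] for j in range(k, i)) + C_D
                total = (1.0 - prefix_p[k]) * cost + ecr_star(i, m)[0]
                if total < best:
                    best, best_i = total, i
            return best, best_i

        # recover the partition from the memoized split points
        seq, k = [], 0
        while k < n:
            i = ecr_star(k, n)[1]
            seq.append([order[j] for j in range(k, i)])
            k = i
        return ecr_star(0, n)[0], seq

    # Example 4 (ef order <alpha3, alpha2, alpha1>, C_D = 2): ECR* = 7.45.
    # print(optimal_partition([2, 1, 0], [0.15, 0.35, 0.5], [1.0, 2.0, 3.0], 2.0))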
Theorem 6 shows that the problem of finding an optimal partition of a seed sequence s requires optimal partitions of smaller partial sequences (the optimal-substructure property) and that these smaller partial sequences are needed in the computation of several bigger sequences (the overlapping-subproblems property). These two properties enable a dynamic programming algorithm, that is, an algorithm that computes the solution to each subproblem only once by storing the results in a table. Algorithm 4 uses this approach to find the optimal partition of a seed sequence s; it runs in Θ(n³) time and uses Θ(n²) space, and its correctness follows from the above theorem. Algorithm 5 combines an initial sorting with Algorithm 4, and Algorithm 4 also enables a more efficient exhaustive search for the best troubleshooting sequence: Algorithm 6 runs in Θ(n³ · n!) time.

Algorithm 4 Optimal Partition into Compound Actions
 1: procedure FillCell(row, col, s, C_D, &ecrMatrix, &prevMatrix)   (requires row > col)
 2:   Let N = row − col
 3:   Let A = {α_col, ..., α_{col+N−1}} from s
 4:   Let ecr = P(ε^{col−1}) · C_{A+D}
 5:   Set ecrMatrix[row, col] = ecr
 6:   Set prevMatrix[row, col] = col − 1
 7:   for prev = col + 1 to row − 1 do
 8:     Let ecr′ = ecrMatrix[prev, col] + ecrMatrix[row, prev]
 9:     if ecr′ < ecrMatrix[row, col] then
10:       Set ecrMatrix[row, col] = ecr′
11:       Set prevMatrix[row, col] = prevMatrix[row, prev]
12:     end if
13:   end for
14: end procedure
15: function OptimalPartition(s = 〈α₁, ..., α_n〉, C_D)
16:   Let N = |s| + 1
17:   Let ecrMatrix = {0}
18:   Let prevMatrix = {0}
19:   for row = 2 to N do
20:     for col = row − 1 down to 1 do
21:       FillCell(row, col, s, C_D, ecrMatrix, prevMatrix)
22:     end for
23:   end for
24:   Let s* = 〈〉
25:   Let prev = N
26:   repeat
27:     Let i = prevMatrix[prev, 1]
28:     Let A = {α_{i+1}, ..., α_{prev−1}}
29:     Set s* = 〈A〉 + s*
30:     Set prev = i + 1
31:   until prev ≤ 1
32:   Assert(ECR(s*) = ecrMatrix[N, 1])
33:   return s*
34: end function

Algorithm 5 Compound Action Sequence with Optimal Partition
 1: function CompoundActionSequence(A)
 2:   Let s = 〈α₁, ..., α_n〉 such that ef(α_i) ≥ ef(α_{i+1}) or P_{α_i}/C_{α_i} ≥ P_{α_{i+1}}/C_{α_{i+1}}
 3:   Let s = OptimalPartition(s, C_D)
 4:   return s
 5: end function

Algorithm 6 Optimized Exhaustive Search With Compound Actions
 1: function OptimalCompoundActionSequence2(A)
 2:   Let bestEcr = ∞
 3:   Let bestSeq = 〈〉
 4:   for all n! permutations s of A do
 5:     Let s′ = OptimalPartition(s, C_D)
 6:     if ECR(s′) < bestEcr then
 7:       Set bestEcr = ECR(s′)
 8:       Set bestSeq = s′
 9:     end if
10:   end for
11:   return bestSeq
12: end function

Unfortunately, Algorithm 5 is not guaranteed to find an optimal troubleshooting sequence, as the following example shows.

Example 4. Consider the model from Example 3, but take C_D = 2:

          Pα     Cα    Cα+D    ef(α)   Pα/Cα
    α1    0.15    1      3     0.05    0.150
    α2    0.35    2      4     0.0875  0.175
    α3    0.50    3      5     0.1     0.167

For the ef(·) ordering 〈α₃, α₂, α₁〉 the algorithm constructs the following tables, here visualized as one table of pairs (ECR*, i), where the split that minimizes the (partial) ECR lies between positions i and i + 1:

    (0, 0)     (0, 0)    (0, 0)    (0, 0)
    (5, 0)     (0, 0)    (0, 0)    (0, 0)
    (7, 0)     (2, 1)    (0, 0)    (0, 0)
    (7.45, 2)  (2.45, 2) (0.45, 2) (0, 0)

Looking at the bottom-left entry, we see that the last compound action is {α₁}, and from the entry above it that the first compound action is {α₃, α₂}; the sequence 〈{α₃, α₂}, α₁〉 has ECR 7.45. Even though the initial P/C ordering is different, the algorithm returns the same troubleshooting sequence. However,
$$\mathrm{ECR}(\langle \{\alpha_1, \alpha_3\}, \alpha_2 \rangle) = 6 + 0.35 \cdot 4 = 7.4,$$
which shows that the algorithm does not find an optimal sequence.
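For completeness, the optimized exhaustive search (Algorithm 6) is equally short given a partition routine; the sketch below assumes the optimal_partition function from the earlier sketch.

    from itertools import permutations

    def optimal_sequence(P, C, C_D):
        """Sketch of Algorithm 6: try all n! atomic orderings and partition
        each one optimally; exponential, so only for small benchmark models."""
        return min((optimal_partition(list(perm), P, C, C_D)
                    for perm in permutations(range(len(P)))),
                   key=lambda pair: pair[0])

    # Example 4: ECR 7.4 (one optimal partition is <{alpha1, alpha3}, alpha2>).
    # print(optimal_sequence([0.15, 0.35, 0.5], [1.0, 2.0, 3.0], 2.0))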
A Heuristic for The General Problem

We have seen that by changing the ordering of the atomic actions, we could improve the ECR compared to keeping the ordering fixed. We shall therefore consider the orthogonal task of optimizing the ordering given that the partitioning is fixed.

Definition 5. Let s_c = 〈A₁, ..., A_ℓ〉 and s_d = 〈B₁, ..., B_ℓ〉 be two compound action sequences of the same length. Then s_c and s_d are partition equivalent if |A_i| = |B_i| for all i ∈ {1, 2, ..., ℓ}.

Lemma 3. Let s_c = 〈..., A_x, ..., A_y, ...〉 and s_d = 〈..., B_x, ..., B_y, ...〉 be two partition equivalent compound action sequences, and let the only difference between s_c and s_d be that two actions α ∈ A_x and β ∈ A_y have been swapped such that β ∈ B_x and α ∈ B_y. Then
$$\mathrm{ECR}(s_c) - \mathrm{ECR}(s_d) = (C_\alpha - C_\beta) \cdot \Big[ P(\beta) - P(\alpha) + \sum_{i=x}^{y-1} P(A_i) \Big] + [P(\beta) - P(\alpha)] \cdot \sum_{i=x+1}^{y} C_{A_i+D}.$$

Proof. Let s_c and s_d be given as above. We then have
$$\mathrm{ECR}(s_c) = \sum_{i=1}^{\ell} C_{A_i+D} \cdot \Big(1 - \sum_{j=1}^{i-1} P(A_j)\Big) = \sum_{i=1}^{\ell} C_{A_i+D} \cdot a_i,$$
$$\mathrm{ECR}(s_d) = \sum_{i=1}^{\ell} C_{B_i+D} \cdot \Big(1 - \sum_{j=1}^{i-1} P(B_j)\Big) = \sum_{i=1}^{\ell} C_{B_i+D} \cdot b_i.$$
The difference between these two numbers can be expressed as
$$\mathrm{ECR}(s_c) - \mathrm{ECR}(s_d) = \Delta_X + \Delta_I + \Delta_Y \quad \text{(i)}$$
where
• Δ_X is the difference caused by the change to C_{A_x},
• Δ_I is the difference caused by a_i changing into b_i for all compound actions strictly between A_x and A_y, and
• Δ_Y is the difference caused by the changes to C_{A_y} and a_y.
For i ∈ {1, ..., x−1, y+1, ..., ℓ} the terms cancel out. We have
$$\Delta_X = a_x \cdot (C_{A_x+D} - C_{B_x+D}) = a_x \cdot (C_\alpha - C_\beta).$$
Furthermore, since $b_i = a_i + P(\alpha) - P(\beta)$ for all $i \in \{x+1, \ldots, y\}$:
$$\Delta_I = \sum_{i=x+1}^{y-1} (a_i - b_i) \cdot C_{A_i+D} = [P(\beta) - P(\alpha)] \cdot \sum_{i=x+1}^{y-1} C_{A_i+D}.$$
Finally we have
$$\Delta_Y = a_y \cdot C_{A_y+D} - b_y \cdot C_{B_y+D} = a_y \cdot C_{A_y+D} - b_y \cdot (C_{A_y+D} - C_\beta + C_\alpha)$$
$$= (a_y - b_y) \cdot C_{A_y+D} - b_y \cdot (C_\alpha - C_\beta) = [P(\beta) - P(\alpha)] \cdot C_{A_y+D} - b_y \cdot (C_\alpha - C_\beta).$$
Now observe that
$$b_y = a_y - [P(\beta) - P(\alpha)] = \Big( a_x - \sum_{i=x}^{y-1} P(A_i) \Big) - [P(\beta) - P(\alpha)],$$
which implies that
$$\Delta_Y = [P(\beta) - P(\alpha)] \cdot C_{A_y+D} - \Big[ a_x - P(\beta) + P(\alpha) - \sum_{i=x}^{y-1} P(A_i) \Big] \cdot (C_\alpha - C_\beta).$$
Inserting these three results into Equation (i) yields
$$\mathrm{ECR}(s_c) - \mathrm{ECR}(s_d) = (C_\alpha - C_\beta) \cdot \Big[ P(\beta) - P(\alpha) + \sum_{i=x}^{y-1} P(A_i) \Big] + [P(\beta) - P(\alpha)] \cdot \sum_{i=x+1}^{y} C_{A_i+D}$$
as required.

Theorem 7. Let s_c, α and β be given as in Lemma 3. Then the ECR of s_c can be improved by swapping α and β if and only if
$$(C_\alpha - C_\beta) \cdot \Big[ P(\beta) - P(\alpha) + \sum_{i=x}^{y-1} P(A_i) \Big] > [P(\alpha) - P(\beta)] \cdot \sum_{i=x+1}^{y} C_{A_i+D}.$$

Proof. We should swap the two actions if ECR(s_c) − ECR(s_d) > 0. By Lemma 3 this is equivalent to
$$(C_\alpha - C_\beta) \cdot \Big[ P(\beta) - P(\alpha) + \sum_{i=x}^{y-1} P(A_i) \Big] + [P(\beta) - P(\alpha)] \cdot \sum_{i=x+1}^{y} C_{A_i+D} > 0,$$
and moving the second term to the right-hand side gives the inequality above.

It turns out that in certain cases we can avoid the full analysis of Theorem 7.

Proposition 1. Let s_c, α and β be given as in Lemma 3. If
P(β) > P(α) and C_α ≥ C_β, or P(β) = P(α) and C_α > C_β,
then ECR(s_c) can be improved by swapping α and β.

Proof. If P(β) > P(α) and C_α ≥ C_β, then P(α) − P(β) < 0 while C_α − C_β ≥ 0. In Theorem 7 the two other factors are strictly positive, so the left-hand side becomes non-negative while the right-hand side becomes negative, making the inequality true. If P(β) = P(α) and C_α > C_β, then the left-hand side is strictly positive while the right-hand side is zero, which also makes the inequality true.
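The Theorem 7 test translates directly into a few lines of code. The sketch below is our own illustration (0-based positions, not the authors' implementation); it is reused by the swap pass of Algorithm 7 in the next section.

    def should_swap(sc, P, C, C_D, x, a, y, b):
        """Theorem 7 test: does swapping atomic action `a` in compound
        action sc[x] with atomic action `b` in sc[y], x < y, strictly lower
        the ECR of sc (a list of lists of atomic action indices)?"""
        p_between = sum(P[i] for A in sc[x:y] for i in A)   # sum_{i=x}^{y-1} P(A_i)
        costs_after = sum(sum(C[i] for i in A) + C_D        # sum_{i=x+1}^{y} C_{A_i+D}
                          for A in sc[x + 1:y + 1])
        lhs = (C[a] - C[b]) * (P[b] - P[a] + p_between)
        return lhs > (P[a] - P[b]) * costs_after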
Based on the above results, we can easily search for a better partition equivalent troubleshooting sequence; the procedure is given in Algorithm 7. Arguably, after the algorithm has terminated, further swaps might still improve the ECR. However, we have been unable to bound the number of such swaps, and we therefore prefer an algorithm that terminates deterministically. Furthermore, when we ran the procedure until no more swaps could be made on our test models, it did not improve the ECR compared to Algorithm 7.

Algorithm 7 Optimizing a fixed partition by swapping atomic actions
 1: procedure OptimizePartition(&s_c = 〈A₁, ..., A_ℓ〉)
 2:   for x = 1 to ℓ do
 3:     for α ∈ A_x do
 4:       for y = x + 1 to ℓ do
 5:         for β ∈ A_y do
 6:           if (C_α − C_β) · [P(β) − P(α) + Σ_{i=x}^{y−1} P(A_i)] > [P(α) − P(β)] · Σ_{i=x+1}^{y} C_{A_i+D} then
 7:             Swap(α, β)
 8:           end if
 9:         end for
10:       end for
11:     end for
12:   end for
13: end procedure

It should be clear that the algorithm runs in Θ(n³) time, where n is the number of atomic actions. The reason it cannot run in Θ(n²) time is that checking Theorem 7 takes linear time on average, albeit only constant time when Proposition 1 applies. Because any swap invalidates the involved sums, they cannot be precomputed as in the greedy approaches we saw earlier.

We can now formulate the improved heuristic described in Algorithm 8.

Algorithm 8 Compound Action Sequence with Optimized Partition
 1: function CompoundActionSequence(A)
 2:   Let s = 〈α₁, ..., α_n〉 such that ef(α_i) ≥ ef(α_{i+1}) or P_{α_i}/C_{α_i} ≥ P_{α_{i+1}}/C_{α_{i+1}}
 3:   Let s = OptimalPartition(s, C_D)
 4:   OptimizePartition(s)
 5:   return s
 6: end function

We have, by means of a computer, constructed examples showing that this algorithm, too, is not guaranteed to find optimal troubleshooting sequences. However, for certain models we can prove that optimality is ensured. Consider a model where the actions can be ordered into a sequence s = 〈α₁, ..., α_n〉 such that $P_{\alpha_i} \ge P_{\alpha_{i+1}}$ and $C_{\alpha_i} \le C_{\alpha_{i+1}}$ for i ∈ {1, 2, ..., n−1}; we call such models non-reorderable models.

Theorem 8. Let s = 〈α₁, ..., α_n〉 be an atomic troubleshooting sequence in a non-reorderable model such that ef(α_i) ≥ ef(α_{i+1}). Then there exists an optimal troubleshooting sequence s_c consistent with s.

Proof. Let s_d = 〈B₁, ..., B_ℓ〉 and s_c = 〈A₁, ..., A_ℓ〉 be partition equivalent compound troubleshooting sequences where s_d is optimal and s_c is consistent with s. Assume that s_d is not consistent with s. Since s_c ≠ s_d, we can find the index x of the first pair of compound actions A_x and B_x where the two sequences differ. Then there is at least one action β ∈ A_x with β ∉ B_x and at least one action α ∈ B_x with α ∉ A_x; furthermore, α must be found in some A_y with y > x. Since s_c is consistent with s and since the model is non-reorderable,
$$P_\beta \ge P_\alpha \quad \text{and} \quad C_\beta \le C_\alpha \quad \text{(i)}.$$
We now consider two cases. Case 1: at least one of the inequalities in (i) is strict. If we apply Theorem 7 to s_d, we find that we can improve the ECR of s_d by swapping α and β. This contradicts the optimality of s_d, so in this case any optimal sequence must be consistent with s. Case 2: we have equality in both inequalities of (i). In that case we can safely swap α and β in s_d without altering the ECR. If we now have s_d = s_c, we are done; if not, we repeat the procedure until s_d = s_c. Since each swap puts at least one atomic action into its rightful compound action, the procedure terminates after a finite number of steps, thus proving that there exists an optimal sequence consistent with s.

Corollary 3. Algorithm 5 and Algorithm 8 find an optimal troubleshooting sequence in non-reorderable models.

Proof. Both algorithms call OptimalPartition(·) after sorting the actions by descending efficiency.
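Putting the pieces together, the full heuristic can be sketched as follows, assuming the should_swap and optimal_partition functions from the earlier sketches (again our own illustration; for brevity it uses only the ef(·) ordering, whereas the paper also evaluates the P/C ordering).

    def optimize_partition(sc, P, C, C_D):
        """Sketch of Algorithm 7: one deterministic pass over all pairs of
        atomic actions in different compound actions, swapping in place
        whenever Theorem 7 says the ECR strictly improves."""
        for x in range(len(sc)):
            for ai in range(len(sc[x])):
                for y in range(x + 1, len(sc)):
                    for bi in range(len(sc[y])):
                        a, b = sc[x][ai], sc[y][bi]
                        if should_swap(sc, P, C, C_D, x, a, y, b):
                            sc[x][ai], sc[y][bi] = b, a
        return sc

    def heuristic_sequence(P, C, C_D):
        """Sketch of Algorithm 8: sort by efficiency, partition the seed
        sequence optimally (Theorem 6), then run the swap pass."""
        order = sorted(range(len(P)), key=lambda i: P[i] / (C[i] + C_D),
                       reverse=True)
        _, sc = optimal_partition(order, P, C, C_D)
        return optimize_partition(sc, P, C, C_D)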
Results

In this section we investigate the performance of the heuristic algorithms. In particular, we investigate how the following two model parameters affect the precision:

1. The closeness of the initial efficiencies of the actions.
2. The closeness of the costs of the actions.

Due to the difficulty of finding an optimal solution, we could not investigate models with more than 8 actions. In the models below the repair probabilities have not been normalized, to keep the specification exact (they are of course normalized at runtime), and the stated efficiency is the efficiency (with normalized probabilities) rounded to three decimals when C_D = 0.

Model 1: close efficiencies + close costs

           α1     α2     α3     α4     α5     α6     α7     α8
  Pα      0.21   0.17   0.19   0.155  0.185  0.139  0.17   0.135
  Cα      1.8    1.4    1.7    1.3    1.6    1.2    1.5    1.1
  ef(α)   0.086  0.090  0.083  0.088  0.085  0.085  0.084  0.091

Model 2: close efficiencies + non-close costs

           α1     α2     α3     α4     α5     α6     α7     α8
  Pα      0.36   0.205  0.32   0.155  0.28   0.11   0.25   0.05
  Cα      8      4      7      3      6      2      5      1
  ef(α)   0.026  0.030  0.026  0.030  0.027  0.031  0.029  0.029

Model 3: non-close efficiencies + close costs

           α1     α2     α3     α4     α5     α6     α7     α8
  Pα      0.4    0.105  0.52   0.055  0.78   0.13   0.6    0.02
  Cα      1.8    1.4    1.7    1.3    1.6    1.2    1.5    1.1
  ef(α)   0.085  0.029  0.117  0.016  0.187  0.042  0.153  0.007

Model 4: non-close efficiencies + non-close costs

           α1     α2     α3     α4     α5     α6     α7     α8
  Pα      0.16   0.15   0.05   0.165  0.165  0.159  0.07   0.165
  Cα      8      4      7      3      6      2      5      1
  ef(α)   0.018  0.035  0.007  0.051  0.025  0.073  0.013  0.152

For each of the four initial models we ran all the algorithms for increasing values of C_D. The following observation shows when it no longer makes sense to increase C_D.

Proposition 2. Once C_D is so high that the optimal sequence consists of one compound action, increasing C_D can never change the optimal sequence.

The increment in C_D was set to one permille of the largest cost of any action in the model. Starting from C_D = 0, we kept sampling until the optimal sequence consisted of only one compound action. The tables below summarize the results for the four models, where each column describes the results of a given algorithm. The first four rows summarize the percentage deviation from the optimal ECR, and the last row shows the percentage of cases where the algorithm was optimal. The number of increments to C_D is given in parentheses after the model name. The first algorithm column is the simple ef(α_i) ≥ ef(α_{i+1}) sorting. For Algorithms 2, 5, and 8 we give two numbers: the first corresponds to an initial ef(·) ordering and the second to an initial P/C ordering.

Model 1: close efficiencies + close costs (5828)

           ef(αi) ord.  Alg. 3   Alg. 2        Alg. 5        Alg. 8
  min          0          0      0/0           0/0           0/0
  max        128.26     45.56    4.28/2.83     1.47/0.68     1.08/0.63
  mean        73.14     10.05    1.66/0.66     0.77/0.07     0.15/0.02
  median      79.37      5.97    1.52/0.37     0.79/0        0.11/0
  % opt.       1.48      0.05    1.49/26.56    1.49/62.06    39.88/84.80

Model 2: close efficiencies + non-close costs (4201)

           ef(αi) ord.  Alg. 3   Alg. 2        Alg. 5        Alg. 8
  min          0          0      0/0           0/0           0/0
  max         97.16     38.90    6.17/2.34     5.05/0.48     4.17/0.48
  mean        54.71      9.01    2.75/0.33     1.87/0.06     1.09/0.03
  median      58.87      5.71    3.15/0.07     1.77/0.01     1.17/0
  % opt.       0.33      0.05    8.40/32.85    9.50/46.04    18.64/63.91

Model 3: non-close efficiencies + close costs (79145)

           ef(αi) ord.  Alg. 3   Alg. 2        Alg. 5        Alg. 8
  min          0          0      0/0           0/0           0/0
  max        149.50      3.93    2.80/2.80     0/0           0/0
  mean       124.27      0.08    0.16/0.16     0/0           0/0
  median     137.14      0       0/0           0/0           0/0
  % opt.       0.45     76.27    73.89/73.89   100/100       100/100

Model 4: non-close efficiencies + non-close costs (18085)

           ef(αi) ord.  Alg. 3   Alg. 2        Alg. 5        Alg. 8
  min          0          0      0/0           0/0           0/0
  max        219.13      4.87    2.62/2.62     0/0           0/0
  mean       162.80      0.35    0.24/0.24     0/0           0/0
  median     180.95      0       0/0           0/0           0/0
  % opt.       0.25     53.08    76.08/76.08   100/100       100/100
We can summarize the results as follows:

1. Algorithm 8 is by far the best heuristic, and Algorithm 3 is the worst of the new heuristics. Algorithm 2 performs surprisingly well, almost as well as Algorithm 5, and would thus be the preferred choice for a system under real-time constraints.
2. Models with non-close efficiencies are much easier to solve than models with close efficiencies. Models with non-close costs seem to induce larger deviations from the optimal ECR than models with close costs.

We consider none of these findings surprising.

We have also tested the algorithms on a real-world model with 32 actions. The results are summarized in the table below, where the statistics are computed as the percentage deviation from the overall best algorithm (Algorithm 8 with P/C ordering). The −0 entries indicate that in a small percentage of the cases the corresponding algorithm was better than Algorithm 8. Also notice that the apparent conflict between "median" and "% best" in the second column is due to rounding of the median.

Model 5: 32 actions (33145)

           ef(αi) ord.  Alg. 3   Alg. 2         Alg. 5        Alg. 8
  min          0          0      0/0            −0/0          −0
  max        667.30      3.79    25.30/19.05    10.68/0.05    6.07
  mean       555.34      0.14    5.94/3.14      3.25/0        1.51
  median     598.70      0       3.73/1.32      3.59/0        1.30
  % best       0.01     48.15    0.01/0         0.01/99.43    0.01

The conclusion here is the same, with one noticeable difference: Algorithm 3 is now the third best algorithm, beaten only by Algorithms 5 and 8 with P/C ordering, and far better than the other heuristics. The fact that Algorithms 5 and 8 lie very close to each other also suggests that we are quite close to the optimal value.

Conclusion And Further Research

We have extended a classical polynomial troubleshooting scenario with a postponed system test. Although this is a somewhat simple extension, we conjecture that the new scenario is NP-hard. We have identified a class of non-reorderable models for which the best of the described heuristics are guaranteed to be optimal. For reorderable models, we found that the suggested heuristics provide quite close approximations to an optimal solution, even for heuristics with an O(n · lg n) complexity. Apart from proving or disproving NP-hardness, future research includes finding guarantees on the performance of the heuristics as well as guarantees about local optimality when optimizing a given partition.

Acknowledgements

We would like to thank Sven Skyum for the discussions that led to Theorem 5. We are also grateful to Dezide ApS for providing the real-world model used in the experiments. Finally, we would like to thank our colleagues in the Machine Intelligence group at Aalborg University for providing such a fruitful research environment.

References

F. V. Jensen, U. Kjærulff, B. Kristiansen, C. Skaanning, J. Vomlel, and M. Vomlelová. The SACSO methodology for troubleshooting complex systems. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 15:321–333, 2001.

J. Kadane and H. Simon. Optimal strategies for a class of constrained sequential problems. The Annals of Statistics, 5:237–255, 1977.

E. Koca and T. Bilgiç. Troubleshooting with questions and conditional costs. In Proceedings of the 13th Turkish Symposium on Artificial Intelligence and Artificial Neural Networks, pages 271–280, 2004a.

E. Koca and T. Bilgiç. A troubleshooting approach with dependent actions. In R. L. de Mántaras and L. Saitta, editors, ECAI 2004: 16th European Conference on Artificial Intelligence, pages 1043–1044. IOS Press, 2004b. ISBN 1-58603-452-9.
H. Langseth and F. V. Jensen. Decision theoretic troubleshooting of coherent systems. Reliability Engineering & System Safety, pages 49–62, 2003.

H. Langseth and F. V. Jensen. Heuristics for two extensions of basic troubleshooting. In Proceedings of the Seventh Scandinavian Conference on Artificial Intelligence, pages 80–89. IOS Press, 2001. ISBN 1-58603-161-9.

C. Skaanning and J. Vomlel. Troubleshooting with simultaneous models. In Symbolic and Quantitative Approaches to Reasoning with Uncertainty: 6th European Conference (ECSQARU 2001), 2001.

M. Vomlelová. Complexity of decision-theoretic troubleshooting. International Journal of Intelligent Systems, 18(2):267–277, 2003.

H. Warnquist, M. Nyberg, and P. Säby. Troubleshooting when action costs are dependent with application to a truck engine. In Proceedings of the Tenth Scandinavian Conference on Artificial Intelligence, pages 68–75, Amsterdam, The Netherlands, 2008. IOS Press. ISBN 978-1-58603-867-0.