Sufficient Conditions for Causality to be Transitive

Joseph Y. Halpern∗
Cornell University
Computer Science Department
Ithaca, NY 14853
halpern@cs.cornell.edu
http://www.cs.cornell.edu/home/halpern

Abstract

Natural conditions are provided that are sufficient to ensure that causality as defined by approaches that use counterfactual dependence and structural equations will be transitive.

∗I thank Chris Hitchcock and the anonymous reviewers of the paper for perceptive comments that greatly influenced the structure and story of the paper. Work supported in part by NSF grants IIS-0812045, IIS-0911036, and CCF-1214844, by AFOSR grants FA9550-08-1-0438, FA9550-09-1-0266, and FA9550-12-1-0040, and by ARO grant W911NF-09-1-0281.

1 Introduction

The question of the transitivity of causality has been the subject of much debate. As Paul and Hall [2013] say: "Causality seems to be transitive. If C causes D and D causes E, then C thereby causes E." The appeal to transitivity is quite standard in informal scientific reasoning: we say things like "the billiards expert hit ball A, causing it to hit ball B, causing it to carom into ball C, which then drops into the pocket". It then seems natural to conclude that the expert's shot caused ball C to drop into the pocket. Paul and Hall [2013, p. 215] suggest that "preserving transitivity is a basic desideratum for an adequate analysis of causation". Hall [2000] is even more insistent, saying "That causation is, necessarily, a transitive relation on events seems to many a bedrock datum, one of the few indisputable a priori insights we have into the workings of the concept." Lewis [1986, 2000] imposes transitivity in his influential definition of causality, by taking causality to be the transitive closure ("ancestral", in his terminology) of a one-step causal dependence relation.

But numerous examples have been presented that cast doubt on transitivity. Paul and Hall [2013] give a sequence of such counterexamples; Hall [2000] gives others. I review two such examples in Section 3. This leaves us in a somewhat uncomfortable position. It seems so natural to think of causality as transitive. In light of the examples, should we just give up on these intuitions? Paul and Hall [2013] suggest that "What's needed is a more developed story, according to which the inference from "C causes D" and "D causes E" to "C causes E" is safe provided such-and-such conditions obtain—where these conditions can typically be assumed to obtain, except perhaps in odd cases . . . ". The goal of this paper is to provide sufficient conditions for causality to be transitive. I formalize this using the structural equations framework of Halpern and Pearl [2001, 2005]. The properties that I require suggest that these conditions apply to any definition of causality that depends on counterfactual dependence and uses structural equations (see, for example, [Glymour and Wimberly 2007; Hall 2007; Halpern 2015; Halpern and Pearl 2005; Hitchcock 2001; Hitchcock 2007; Woodward 2003] for examples of such approaches).

These conditions may explain why, although causality is not transitive in general (and is not guaranteed to be transitive according to any of the counterfactual accounts mentioned above), we tend to think of causality as transitive, and are surprised when it is not.

2 Defining causation using counterfactuals

In this section, I review some of the machinery of structural equations needed to define causality.
For definiteness, I use the same formalism as that given by Halpern and Pearl [2005].

2.1 Causal structures

Approaches based on structural equations assume that the world is described in terms of random variables and their values. Some random variables may have a causal influence on others. This influence is modeled by a set of structural equations. It is conceptually useful to split the random variables into two sets: the exogenous variables, whose values are determined by factors outside the model, and the endogenous variables, whose values are ultimately determined by the exogenous variables. For example, in a voting scenario, we could have endogenous variables that describe what the voters actually do (i.e., which candidate they vote for), exogenous variables that describe the factors that determine how the voters vote, and a variable describing the outcome (who wins). The structural equations describe how the outcome is determined (majority rules; a candidate wins if A and at least two of B, C, D, and E vote for him; etc.).

Formally, a causal model M is a pair (S, F), where S is a signature, which explicitly lists the endogenous and exogenous variables and characterizes their possible values, and F defines a set of modifiable structural equations, relating the values of the variables. A signature S is a tuple (U, V, R), where U is a set of exogenous variables, V is a set of endogenous variables, and R associates with every variable Y ∈ U ∪ V a nonempty set R(Y) of possible values for Y (that is, the set of values over which Y ranges). For simplicity, I assume here that V is finite, as is R(Y) for every endogenous variable Y ∈ V. F associates with each endogenous variable X ∈ V a function denoted FX such that FX : (×U∈U R(U)) × (×Y∈V−{X} R(Y)) → R(X). This mathematical notation just makes precise the fact that FX determines the value of X, given the values of all the other variables in U ∪ V. If there is one exogenous variable U and three endogenous variables, X, Y, and Z, then FX defines the values of X in terms of the values of Y, Z, and U. For example, we might have FX(u, y, z) = u + y, which is usually written as X = U + Y.1 Thus, if Y = 3 and U = 2, then X = 5, regardless of how Z is set.

1The fact that X is assigned U + Y (i.e., the value of X is the sum of the values of U and Y) does not imply that Y is assigned X − U; that is, FY(U, X, Z) = X − U does not necessarily hold.

The structural equations define what happens in the presence of external interventions. Setting the value of some variable X to x in a causal model M = (S, F) results in a new causal model, denoted MX=x, which is identical to M, except that the equation for X in F is replaced by X = x.

Following [Halpern and Pearl 2005], I restrict attention here to what are called recursive (or acyclic) models. This is the special case where there is some total ordering ≺ of the endogenous variables (the ones in V) such that if X ≺ Y, then X is independent of Y, that is, FX(. . . , y, . . .) = FX(. . . , y′, . . .) for all y, y′ ∈ R(Y). Intuitively, if a theory is recursive, there is no feedback. If X ≺ Y, then the value of X may affect the value of Y, but the value of Y cannot affect the value of X. It should be clear that if M is an acyclic causal model, then given a context, that is, a setting ~u for the exogenous variables in U, there is a unique solution for all the equations. We simply solve for the variables in the order given by ≺. The values of the variables that come first in the order, that is, the variables X such that there is no variable Y with Y ≺ X, depend only on the exogenous variables, so they are immediately determined by the values of the exogenous variables. The values of variables later in the order can be determined once we have determined the values of all the variables earlier in the order.
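To make the mechanics concrete, here is a minimal sketch (in Python) of one way to encode a recursive causal model and compute the unique solution in a context, with and without interventions. The encoding and the function name solve are illustrative choices, not part of the formalism; the equations for Y and Z are assumed purely to complete the example.

```python
# A minimal sketch: a recursive causal model encoded as a dictionary mapping
# each endogenous variable to its structural equation (a function of the
# values of the other variables).  Illustrative only.

def solve(equations, context, interventions=None):
    """Compute the unique solution of an acyclic model in a context.

    equations:     dict mapping endogenous variable -> function(env) -> value
    context:       dict giving the values of the exogenous variables
    interventions: dict of settings Y <- y; each replaces Y's equation by Y = y
    """
    eqs = dict(equations)
    for var, val in (interventions or {}).items():
        eqs[var] = lambda env, v=val: v        # intervened equation: Y = y
    env = dict(context)
    pending = set(eqs)
    # Acyclicity guarantees that on each pass at least one pending variable
    # has all of its arguments determined, so the loop terminates.
    while pending:
        for var in sorted(pending):
            try:
                env[var] = eqs[var](env)
                pending.discard(var)
            except KeyError:                   # some parent not yet determined
                pass
    return env

# The example from the text: X = U + Y.  The equations for Y and Z are
# assumed here just to make the model complete.
equations = {
    "X": lambda env: env["U"] + env["Y"],
    "Y": lambda env: 3,
    "Z": lambda env: 0,
}
print(solve(equations, {"U": 2})["X"])             # 5, as in the text
print(solve(equations, {"U": 2}, {"Y": 7})["X"])   # 9: the intervention Y <- 7
```

This mirrors the argument just given: solving the equations in any order consistent with ≺ yields the unique solution, and an intervention simply replaces one equation before solving.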
It is sometimes helpful to represent a causal model graphically. Each node in the graph corresponds to one variable in the model. An arrow from one node to another indicates that the former variable figures as a nontrivial argument in the equation for the latter. The graphical representation is useful for visualizing causal models, and will be used in the next section.

2.2 A language for reasoning about causality

To define causality carefully, it is useful to have a language to reason about causality. Given a signature S = (U, V, R), a primitive event is a formula of the form X = x, for X ∈ V and x ∈ R(X). A causal formula (over S) is one of the form [Y1 ← y1, . . . , Yk ← yk]ϕ, where

• ϕ is a Boolean combination of primitive events,
• Y1, . . . , Yk are distinct variables in V, and
• yi ∈ R(Yi).

Such a formula is abbreviated as [~Y ← ~y]ϕ. The special case where k = 0 is abbreviated as ϕ. Intuitively, [Y1 ← y1, . . . , Yk ← yk]ϕ says that ϕ would hold if Yi were set to yi, for i = 1, . . . , k.

A causal formula ψ is true or false in a causal model, given a context. As usual, I write (M, ~u) |= ψ if the causal formula ψ is true in causal model M given context ~u. The |= relation is defined inductively. (M, ~u) |= X = x if the variable X has value x in the unique (since we are dealing with acyclic models) solution to the equations in M in context ~u (that is, the unique vector of values for the endogenous variables that simultaneously satisfies all equations in M with the variables in U set to ~u). The truth of conjunctions and negations is defined in the standard way. Finally, (M, ~u) |= [~Y ← ~y]ϕ if (M~Y=~y, ~u) |= ϕ.

2.3 Defining causality

The basic intuition behind counterfactual definitions of causality is that A is a cause of B if there is counterfactual dependence between A and B: if A hadn't occurred (although it did), then B would not have occurred. It is well known that counterfactual dependence does not completely capture causality; there are many examples in the literature where people say that A is a cause of B despite the fact that B does not counterfactually depend on A (at least, not in this simple sense). Nevertheless, all the counterfactual definitions of causality (as well as people's causality ascriptions) agree that this simple type of counterfactual dependence gives a sufficient condition for causality. For the purposes of this paper, I consider only cases where this counterfactual dependence holds.

More formally, say that X = x is a but-for cause of ϕ in (M, ~u) (where ϕ is a Boolean combination of primitive events) if (M, ~u) |= X = x ∧ ϕ (so both X = x and ϕ hold in context ~u) and there exists some x′ such that (M, ~u) |= [X ← x′]¬ϕ. Thus, with a but-for cause, changing the value of X to something other than x changes the truth value of ϕ; that is, ϕ counterfactually depends on X. All the complications in counterfactual approaches to causality arise in how they deal with cases of causality that are not but-for causality.
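The but-for test itself is a short counterfactual check. The following sketch reuses the solve function from the example in Section 2.1 and treats ϕ as a predicate on solutions; the function name and signature are illustrative assumptions, not from the paper.

```python
# A sketch of the but-for test from the definition above, reusing solve()
# from the earlier example.  X = x is a but-for cause of phi in (M, u) if
# X = x and phi hold in the actual solution, and some intervention X <- x'
# makes phi false.

def is_but_for_cause(equations, context, X, x, phi, range_of_X):
    actual = solve(equations, context)
    if actual[X] != x or not phi(actual):        # (M, u) |= X = x and phi
        return False
    return any(not phi(solve(equations, context, {X: x2}))
               for x2 in range_of_X if x2 != x)  # some [X <- x'] falsifies phi
```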
Roughly speaking, the idea is that X = x is a cause of Y = y if the outcome Y = y counterfactually depends on X under the appropriate contingency (i.e., holding some other variables fixed at certain values). While the various approaches to defining causality differ in exactly how this is done, they all agree that a but-for cause should count as a cause. So, for simplicity, in this paper I consider only but-for causality and do not bother to give a general definition of causality.

3 Sufficient Conditions for Transitivity

In this section I present two different sets of conditions sufficient for transitivity. Before doing that, I give two counterexamples to transitivity, since these motivate the conditions. The first example is taken from (an early version of) Hall [2004], and is also considered by Halpern and Pearl [2005].

Example 1: Consider the following scenario: Billy contracts a serious but nonfatal disease, so he is hospitalized. Suppose that Monday's doctor is reliable, and administers the medicine first thing in the morning, so that Billy is fully recovered by Tuesday afternoon. Tuesday's doctor is also reliable, and would have treated Billy if Monday's doctor had failed to. Given that Monday's doctor treated Billy, it's a good thing that Tuesday's doctor did not treat him: one dose of medication is harmless, but two doses are lethal.

Suppose that we are interested in Billy's medical condition on Wednesday. We can represent this using a causal model MB with three variables:

• MT for Monday's treatment (1 if Billy was treated Monday; 0 otherwise);
• TT for Tuesday's treatment (1 if Billy was treated Tuesday; 0 otherwise); and
• BMC for Billy's medical condition (0 if Billy feels fine on Wednesday; 1 if Billy feels sick on Wednesday; 2 if Billy is dead on Wednesday).

We can then describe Billy's condition as a function of the four possible combinations of treatment/nontreatment on Monday and Tuesday. I omit the obvious structural equations corresponding to this discussion; the causal graph is shown in Figure 1.

[Figure 1: Billy's medical condition. The causal graph has edges from MT to TT, from TT to BMC, and from MT directly to BMC.]

In the context where Billy is sick and Monday's doctor treats him, MT = 1 is a but-for cause of TT = 0: because Billy is treated Monday, he is not treated on Tuesday morning. And TT = 0 is a but-for cause of Billy's being alive (BMC = 0 ∨ BMC = 1). However, MT = 1 is not a cause of Billy's being alive. It is clearly not a but-for cause; Billy will still be alive if MT is set to 0. Indeed, it is not even a cause under the more general definitions of causality, according to all the approaches mentioned above; no setting of the other variables will lead to a counterfactual dependence between MT and BMC ≠ 2. This shows that causality is not transitive according to these approaches. Although MT = 1 is a cause of TT = 0 and TT = 0 is a cause of BMC = 0 ∨ BMC = 1, MT = 1 is not a cause of BMC = 0 ∨ BMC = 1. (Of course, according to Lewis [1986, 2000], who takes the transitive closure of the one-step dependence relation, MT = 1 is a cause of BMC = 0 ∨ BMC = 1.)
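The failure of transitivity in Example 1 can be checked mechanically. Here is a sketch of MB using the helpers above; the equations simply transcribe the story (Tuesday's doctor treats Billy exactly if Monday's did not; two doses are lethal, one dose leaves Billy fine, no dose leaves him sick).

```python
# A sketch of the model MB, reusing solve() and is_but_for_cause() above.
MB = {
    "MT":  lambda env: env["U"],                 # Monday's doctor treats Billy
    "TT":  lambda env: 1 - env["MT"],            # Tuesday treats iff Monday didn't
    "BMC": lambda env: (2 if env["MT"] and env["TT"] else   # two doses: dead
                        0 if env["MT"] or env["TT"] else    # one dose: fine
                        1),                                 # no dose: still sick
}
u = {"U": 1}                                     # context: Monday's doctor treats
alive = lambda env: env["BMC"] != 2              # BMC = 0 or BMC = 1

print(is_but_for_cause(MB, u, "MT", 1, lambda e: e["TT"] == 0, [0, 1]))  # True
print(is_but_for_cause(MB, u, "TT", 0, alive, [0, 1]))                   # True
print(is_but_for_cause(MB, u, "MT", 1, alive, [0, 1]))                   # False
```

The three outputs exhibit exactly the failure just described: MT = 1 causes TT = 0, and TT = 0 causes BMC ≠ 2, yet MT = 1 is not a but-for cause of BMC ≠ 2.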
Although this example may seem somewhat forced, there are many quite realistic examples of lack of transitivity with exactly the same structure. Consider the body's homeostatic system. An increase in external temperature causes a short-term increase in core body temperature, which in turn causes the homeostatic system to kick in and return the body to normal core body temperature shortly thereafter. But if we say that the increase in external temperature happened at time 0 and the return to normal core body temperature happened at time 1, we certainly would not want to say that the increase in external temperature at time 0 caused the body temperature to be normal at time 1!2

2I thank Richard Scheines [personal communication, 2013] for this example.

There is another reason that causality is intransitive, which is illustrated by the following example, due to McDermott [1995].

Example 2: Suppose that a dog bites Jim's right hand. Jim was planning to detonate a bomb, which he normally would do by pressing the button with his right forefinger. Because of the dog bite, he presses the button with his left forefinger. The bomb still goes off.

Consider the causal model MD with variables DB (the dog bites, with values 0 and 1), P (the press of the button, with values 0, 1, and 2, depending on whether the button is not pressed at all, pressed with the right hand, or pressed with the left hand), and B (the bomb goes off). We have the obvious equations: DB is determined by the context, P = DB + 1, and B = 1 if P is either 1 or 2. In the context where DB = 1, it is clear that DB = 1 is a but-for cause of P = 2 (if the dog had not bitten, P would have been 1), and P = 2 is a but-for cause of B = 1 (if P were 0, then B would be 0), but DB = 1 is not a but-for cause of B = 1. And again, DB = 1 is not a cause of B = 1 even under a more general notion of causation. Whether or not the dog had bitten Jim, the button would have been pressed and the bomb would have detonated.
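Example 2 can be checked the same way. Below is a sketch of MD with the equations given in the text; the name of the context variable (U) is an illustrative choice.

```python
# A sketch of the model MD from Example 2, reusing the helpers above.
MD = {
    "DB": lambda env: env["U"],        # whether the dog bites is set by the context
    "P":  lambda env: env["DB"] + 1,   # 1 = right forefinger, 2 = left forefinger
    "B":  lambda env: 1 if env["P"] in (1, 2) else 0,  # bomb goes off if pressed
}
u = {"U": 1}                           # context: the dog bites

print(is_but_for_cause(MD, u, "DB", 1, lambda e: e["P"] == 2, [0, 1]))     # True
print(is_but_for_cause(MD, u, "P", 2, lambda e: e["B"] == 1, [0, 1, 2]))   # True
print(is_but_for_cause(MD, u, "DB", 1, lambda e: e["B"] == 1, [0, 1]))     # False
```

Note that the failure here has a different source than in Example 1: no intervention on DB alone can make P take the value 0 that witnesses the second but-for dependence.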
As I said, I believe that we feel that causality is transitive because, in typical settings, it is. My belief is based mainly on introspection and informal polling of colleagues. Even when told that causality is not transitive, people seem to find it hard to construct counterexamples. This suggests that when they think about their everyday experience of causality, they come up with examples where causality is transitive. If there were many counterexamples available in everyday life, it would be easier to generate them.

I now give two sets of simple conditions that are sufficient to guarantee transitivity. Specifically, I give conditions to guarantee that if X1 = x1 is a but-for cause of X2 = x2 in (M, ~u) and X2 = x2 is a but-for cause of X3 = x3 in (M, ~u), then X1 = x1 is a but-for cause of X3 = x3 in (M, ~u).

The first set of conditions assumes that X1, X2, and X3 each has a default setting. We can think of the default setting as the result of doing nothing. This makes sense, for example, in the billiards example at the beginning of the paper, where we can take the default setting for the shot to be the expert doing nothing, and the default setting for the balls to be that they are not in motion. Let the default setting be denoted by the value 0.

Proposition 3.1: Suppose that (a) X1 = x1 is a but-for cause of X2 = x2 in (M, ~u), (b) X2 = x2 is a but-for cause of X3 = x3 in (M, ~u), (c) x3 ≠ 0, (d) (M, ~u) |= [X1 ← 0](X2 = 0), and (e) (M, ~u) |= [X1 ← 0, X2 ← 0](X3 = 0). Then X1 = x1 is a but-for cause of X3 = x3 in (M, ~u).

Proof: If X2 = 0 in the unique solution to the equations in the causal model MX1←0 in context ~u and X3 = 0 in the unique solution to the equations in MX1←0,X2←0 in context ~u, then it is immediate that X3 = 0 in the unique solution to the equations in MX1←0 in context ~u. That is, (M, ~u) |= [X1 ← 0](X3 = 0). It follows from assumption (a) that (M, ~u) |= X1 = x1. We must thus have x1 ≠ 0, since otherwise (M, ~u) |= X1 = 0 ∧ [X1 ← 0](X3 = 0), so (M, ~u) |= X3 = 0, which contradicts assumptions (b) and (c). Thus, X1 = x1 is a but-for cause of X3 = x3, since the value of X3 depends counterfactually on that of X1.

Although the conditions of Proposition 3.1 are clearly rather specialized, they arise often in practice. Conditions (d) and (e) say that if X1 remains in its default state, then so will X2, and if both X1 and X2 remain in their default states, then so will X3. (These assumptions are very much in the spirit of the assumptions that make a causal network self-contained, in the sense defined by Hitchcock [2007].) Put another way, this says that the reason for X2 not being in its default state is X1 not being in its default state, and the reason for X3 not being in its default state is X1 and X2 both not being in their default states. The billiards example can be viewed as a paradigmatic example of when these conditions apply. It seems reasonable to assume that if the expert does not shoot, then ball A does not move; and if the expert does not shoot and ball A does not move (in the context of interest), then ball B does not move, and so on.

Of course, the conditions of Proposition 3.1 do not apply in either Example 1 or Example 2. The obvious default values in Example 1 are MT = TT = 0, but the equations say that in all contexts ~u of the causal model MB for this example, we have (MB, ~u) |= [MT ← 0](TT = 1). In the second example, if we take DB = 0 and P = 0 to be the default values of DB and P, then in all contexts ~u of the causal model MD, we have (MD, ~u) |= [DB ← 0](P = 1).

While Proposition 3.1 is useful, there are many examples where there is no obvious default value. When considering the body's homeostatic system, even if there is arguably a default value for core body temperature, what is the default value for the external temperature? But it turns out that the key ideas of the proof of Proposition 3.1 apply even if there is no default value. Suppose that X1 = x1 is a but-for cause of X2 = x2 in (M, ~u) and X2 = x2 is a but-for cause of X3 = x3 in (M, ~u). Then to get transitivity, it suffices to find values x′1, x′2, and x′3 such that x3 ≠ x′3, (M, ~u) |= [X1 ← x′1](X2 = x′2), and (M, ~u) |= [X1 ← x′1, X2 ← x′2](X3 = x′3). The argument in the proof of Proposition 3.1 then shows that (M, ~u) |= [X1 ← x′1](X3 = x′3).3 It then follows that X1 = x1 is a but-for cause of X3 = x3 in (M, ~u). In Proposition 3.1, x′1, x′2, and x′3 were all 0, but there is nothing special about the fact that 0 is a default value here. As long as we can find some values x′1, x′2, and x′3, these conditions apply. I formalize this as Proposition 3.2, which is a straightforward generalization of Proposition 3.1.

3The analogous statement is also valid in standard conditional logic. That is, taking A > B to represent "if A were the case then B would be the case", using standard closest-world semantics [Lewis 1973], (A > B) ∧ ((A ∧ B) > C) ⇒ (A > C) is valid. I thank two of the anonymous reviewers of this paper for encouraging me both to note that this idea is the key argument of the paper and to relate it to the Lewis approach.

Proposition 3.2: Suppose that there exist values x′1, x′2, and x′3 such that (a) X1 = x1 is a but-for cause of X2 = x2 in (M, ~u), (b) X2 = x2 is a but-for cause of X3 = x3 in (M, ~u), (c) x3 ≠ x′3, (d) (M, ~u) |= [X1 ← x′1](X2 = x′2), and (e) (M, ~u) |= [X1 ← x′1, X2 ← x′2](X3 = x′3). Then X1 = x1 is a but-for cause of X3 = x3 in (M, ~u).

To see how these ideas apply, suppose that a student receives an A+ in a course, which causes her to be accepted at Cornell University (her top choice, of course!), which in turn causes her to move to Ithaca. Further suppose that if she had received an A in the course she would have gone to university U1 and as a result moved to city C1, and if she had gotten anything else, she would have gone to university U2 and moved to city C2. This story can be captured by a causal model with three variables: G for her grade, U for the university she goes to, and C for the city she moves to. There are no obvious default values for any of these three variables. Nevertheless, we have transitivity here: the student's A+ was a cause of her being accepted at Cornell and being accepted at Cornell was a cause of her move to Ithaca; it seems like a reasonable conclusion that the student's A+ was a cause of her move to Ithaca. And, indeed, transitivity follows from Proposition 3.2. We can take the student getting an A to be x′1, the student being accepted at university U1 to be x′2, and the student moving to C1 to be x′3 (assuming that U1 is not Cornell and that C1 is not Ithaca, of course).
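The student example can be used to illustrate Proposition 3.2 concretely. In the sketch below, which reuses the helpers above, grades, universities, and cities are encoded numerically; the encoding is an assumption made purely for illustration.

```python
# A sketch of the student example, reusing solve() and is_but_for_cause().
# Encoding (illustrative): G: 2 = A+, 1 = A, 0 = other; Univ: 2 = Cornell,
# 1 = U1, 0 = U2; City: 2 = Ithaca, 1 = C1, 0 = C2.
MG = {
    "G":    lambda env: env["U"],
    "Univ": lambda env: env["G"],      # A+ -> Cornell, A -> U1, else -> U2
    "City": lambda env: env["Univ"],   # Cornell -> Ithaca, U1 -> C1, U2 -> C2
}
u = {"U": 2}                           # context: the student gets an A+

# Conditions (c)-(e) of Proposition 3.2, with x1' = A, x2' = U1, x3' = C1:
print(solve(MG, u, {"G": 1})["Univ"] == 1)              # (d): [G <- A](Univ = U1)
print(solve(MG, u, {"G": 1, "Univ": 1})["City"] == 1)   # (e); and C1 != Ithaca gives (c)
# Transitivity, as the proposition predicts:
print(is_but_for_cause(MG, u, "G", 2, lambda e: e["City"] == 2, [0, 1, 2]))  # True
```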
The conditions provided in Proposition 3.2 are not only sufficient for causality to be transitive, they are necessary as well, as the following result shows.

Proposition 3.3: If X1 = x1 is a but-for cause of X3 = x3 in (M, ~u), then there exist values x′1, x′2, and x′3 such that x3 ≠ x′3, (M, ~u) |= [X1 ← x′1](X2 = x′2), and (M, ~u) |= [X1 ← x′1, X2 ← x′2](X3 = x′3).

Proof: Since X1 = x1 is a but-for cause of X3 = x3 in (M, ~u), there must exist values x′1 ≠ x1 and x′3 ≠ x3 such that (M, ~u) |= [X1 ← x′1](X3 = x′3). Let x′2 be such that (M, ~u) |= [X1 ← x′1](X2 = x′2). Since (M, ~u) |= [X1 ← x′1](X2 = x′2 ∧ X3 = x′3), it easily follows that (M, ~u) |= [X1 ← x′1, X2 ← x′2](X3 = x′3).

In light of Propositions 3.2 and 3.3, understanding why causality is so often taken to be transitive comes down to finding sufficient conditions to guarantee the assumptions of Proposition 3.2. I now present another set of conditions sufficient to guarantee the assumptions of Proposition 3.2 (and thus, sufficient to make causality transitive), motivated by the two examples showing that causality is not transitive. To deal with the problem in Example 2, I require that for every value x′2 in the range of X2, there is a value x′1 in the range of X1 such that (M, ~u) |= [X1 ← x′1](X2 = x′2). This requirement holds in many cases of interest; it is guaranteed to hold if X1 = x1 is a but-for cause of X2 = x2 and X2 is a binary variable (i.e., takes on only two values), since but-for causality requires that two different values of X1 result in different values of X2. But this requirement does not hold in Example 2; no setting of DB can force P to be 0.

Imposing this requirement still does not deal with the problem in Example 1. To do that, we need one more condition. Say that a variable Y depends on X if there is some setting of all the variables in U ∪ V other than X and Y such that varying the value of X in that setting results in Y's value varying; that is, there is a setting ~z of the variables other than X and Y and values x and x′ of X such that FY(x, ~z) ≠ FY(x′, ~z). Up to now I have used the phrase "causal path" informally; I now make it more precise. A causal path in a causal model M is a sequence (Y1, . . . , Yk) of variables such that Yj+1 depends on Yj for j = 1, . . . , k − 1.
Since there is an edge between Yj and Yj+1 in the causal graph for M exactly if Yj+1 depends on Yj, a causal path is just a path in the causal graph. A causal path from X1 to X2 is just a causal path whose first node is X1 and whose last node is X2. Finally, Y lies on a causal path from X1 to X2 if Y is a node (possibly X1 or X2) on a directed path from X1 to X2.

The additional condition that I require for transitivity is that X2 must lie on every causal path from X1 to X3. Roughly speaking, this says that all the influence of X1 on X3 goes through X2. This condition does not hold in Example 1; as Figure 1 shows, there is a direct causal path from MT to BMC that does not include TT. On the other hand, this condition does hold in many examples of interest. Going back to the example of the student's grade, the only way that the student's grade can influence which city the student moves to is via the university that accepts the student. The following result summarizes the second set of conditions sufficient for transitivity.

Proposition 3.4: Suppose that X1 = x1 is a but-for cause of X2 = x2 in the causal setting (M, ~u), X2 = x2 is a but-for cause of X3 = x3 in (M, ~u), and the following two conditions hold:

(a) for every value x′2 ∈ R(X2), there exists a value x′1 ∈ R(X1) such that (M, ~u) |= [X1 ← x′1](X2 = x′2);
(b) X2 is on every causal path from X1 to X3.

Then X1 = x1 is a but-for cause of X3 = x3.

The proof of Proposition 3.4 is not hard, although we must be careful to get all the details right. The high-level idea of the proof is easy to explain, though. Suppose that X2 = x2 is a but-for cause of X3 = x3 in (M, ~u). Then there must be some values x′2 ≠ x2 and x′3 ≠ x3 such that (M, ~u) |= [X2 ← x′2](X3 = x′3). By assumption, there exists a value x′1 ∈ R(X1) such that (M, ~u) |= [X1 ← x′1](X2 = x′2). The requirement that X2 is on every causal path from X1 to X3 guarantees that [X2 ← x′2](X3 = x′3) implies [X1 ← x′1, X2 ← x′2](X3 = x′3). Roughly speaking, X2 "screens off" the effect of X1 on X3, since it is on every causal path from X1 to X3. Now we can apply Proposition 3.2. I defer the formal argument to the appendix.

It is easy to construct examples showing that the conditions of Proposition 3.4 are not necessary for causality to be transitive. Suppose that X1 = x1 causes X2 = x2, X2 = x2 causes X3 = x3, and there are several causal paths from X1 to X3. Roughly speaking, the reason that X1 = x1 may not be a but-for cause of X3 = x3 is that the effects of X1 on X3 may "cancel out" along the various causal paths. This is what happens in the homeostasis example. If X2 is on all the causal paths from X1 to X3, then, as we have seen, all the effect of X1 on X3 is mediated by X2, so the effect of X1 on X3 on different causal paths cannot "cancel out". But even if X2 is not on all the causal paths from X1 to X3, the effects of X1 on X3 may not cancel out along the causal paths, and X1 = x1 may still be a cause of X3 = x3. That said, it seems difficult to find a weakening of the condition in Proposition 3.4 that is simple to state and suffices for causality to be transitive.
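Both conditions of Proposition 3.4 are mechanically checkable for a given model: condition (a) by enumerating interventions on X1, and condition (b) by checking that deleting X2 from the causal graph disconnects X1 from X3. The sketch below assumes the graph is available as a map from each variable to the list of variables its equation depends on; the function names are illustrative, and solve() is reused from Section 2.

```python
# Sketches of checks for the two conditions of Proposition 3.4.
# Illustrative, not optimized.

def condition_a(equations, context, X1, X2, range_X1, range_X2):
    """(a): every value x2' of X2 is reachable by intervening on X1 alone."""
    reachable = {solve(equations, context, {X1: x1})[X2] for x1 in range_X1}
    return set(range_X2) <= reachable

def condition_b(parents, X1, X2, X3):
    """(b): X2 lies on every causal path from X1 to X3, i.e., there is no
    directed path from X1 to X3 in the graph with X2 removed."""
    stack, seen = [X1], set()
    while stack:
        v = stack.pop()
        if v == X3:
            return False                       # found a path avoiding X2
        for w, ps in parents.items():          # w is a child of v if v in ps
            if v in ps and w != X2 and w not in seen:
                seen.add(w)
                stack.append(w)
    return True

# In the student example: endogenous parents of each endogenous variable.
parents = {"G": [], "Univ": ["G"], "City": ["Univ"]}
print(condition_a(MG, u, "G", "Univ", [0, 1, 2], [0, 1, 2]))  # True
print(condition_b(parents, "G", "Univ", "City"))              # True
```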
A Proof of Proposition 3.4

To prove Proposition 3.4, I need a preliminary result, which states a key (and obvious!) property of causal paths: if there is no causal path from X to Y, then changing the value of X cannot change the value of Y. Although it is intuitively obvious, proving it carefully requires a little bit of work.

Lemma A.1: If Y and all the variables in ~X are endogenous, Y ∉ ~X, and there is no causal path from a variable in ~X to Y, then for all sets ~W of variables disjoint from ~X and Y, and all settings ~x and ~x′ for ~X, y for Y, and ~w for ~W, we have (M, ~u) |= [~X ← ~x, ~W ← ~w](Y = y) iff (M, ~u) |= [~X ← ~x′, ~W ← ~w](Y = y), and (M, ~u) |= [~X ← ~x](Y = y) iff (M, ~u) |= Y = y.

Proof: Define the maximum distance of a variable Y in a causal model M, denoted maxdist(Y), to be the length of the longest causal path from an exogenous variable to Y. We prove the result by induction on maxdist(Y). If maxdist(Y) = 1, then the value of Y depends only on the values of the exogenous variables, so the result trivially holds. If maxdist(Y) > 1, let Z1, . . . , Zk be the endogenous variables on which Y depends. These are the endogenous parents of Y in the causal graph (i.e., these are exactly the endogenous variables Z such that there is an edge from Z to Y in the causal graph). For each Z ∈ {Z1, . . . , Zk}, maxdist(Z) < maxdist(Y): for each path from an exogenous variable to Z, there is a longer path to Y, namely, the one formed by adding the edge from Z to Y. Moreover, there is no path from a variable in ~X to any of Z1, . . . , Zk, nor is any of Z1, . . . , Zk in ~X (for otherwise there would be a path from a variable in ~X to Y, contradicting the assumption of the lemma). Thus, the inductive hypothesis holds for each of Z1, . . . , Zk. Since the value of each of Z1, . . . , Zk does not change when we change the setting of ~X from ~x to ~x′, and the value of Y depends only on the values of Z1, . . . , Zk and ~u (i.e., the values of the exogenous variables), the value of Y cannot change either.

I can now prove Proposition 3.4. I restate it here for the convenience of the reader.

Proposition 3.4: Suppose that X1 = x1 is a but-for cause of X2 = x2 in the causal setting (M, ~u), X2 = x2 is a but-for cause of X3 = x3 in (M, ~u), and the following two conditions hold:

(a) for every value x′2 ∈ R(X2), there exists a value x′1 ∈ R(X1) such that (M, ~u) |= [X1 ← x′1](X2 = x′2);
(b) X2 is on every causal path from X1 to X3.

Then X1 = x1 is a but-for cause of X3 = x3.

Proof: Since X2 = x2 is a but-for cause of X3 = x3 in (M, ~u), there must exist x′2 ≠ x2 and x′3 ≠ x3 such that (M, ~u) |= [X2 ← x′2](X3 = x′3). By assumption, there exists a value x′1 such that (M, ~u) |= [X1 ← x′1](X2 = x′2). I claim that (M, ~u) |= [X1 ← x′1](X3 = x′3). This follows from a more general claim. I show that if Y is on a causal path from X2 to X3, then

(M, ~u) |= [X1 ← x′1](Y = y) iff (M, ~u) |= [X2 ← x′2](Y = y). (1)

Although it is not obvious, this is essentially the argument sketched in the main part of the text. Literally the same argument as that given below for the proof of (1) also shows that (M, ~u) |= [X1 ← x′1](Y = y) iff (M, ~u) |= [X1 ← x′1, X2 ← x′2](Y = y).

Define a partial order ≺ on endogenous variables that lie on a causal path from X2 to X3 by taking Y1 ≺ Y2 if Y1 precedes Y2 on some causal path from X2 to X3. Since M is a recursive model, if Y1 ≺ Y2, we cannot have Y2 ≺ Y1 (otherwise there would be a cycle). I prove (1) by induction on the ≺ ordering. The least element in this ordering is clearly X2; X2 must come before every other variable on a causal path from X2 to X3. By assumption, (M, ~u) |= [X1 ← x′1](X2 = x′2), and clearly (M, ~u) |= [X2 ← x′2](X2 = x′2). Thus, (1) holds for X2. This completes the base case of the induction.
For the inductive step, let Y be a variable that lies on a causal path from X2 to X3, and suppose that (1) holds for all variables Y′ such that Y′ ≺ Y. Let Z1, . . . , Zk be the endogenous variables that Y depends on in M. For each of these variables Zi, either there is a causal path from X1 to Zi or there is not. If there is, then the path from X1 to Zi can be extended to a directed path P from X1 to X3, by going from X1 to Zi, from Zi to Y, and from Y to X3 (since Y lies on a causal path from X2 to X3). Since, by assumption, X2 lies on every causal path from X1 to X3, X2 must lie on P. Moreover, X2 must precede Y on P. (Proof: Since Y lies on a path P′ from X2 to X3, X2 must precede Y on P′. If Y preceded X2 on P, then there would be a cycle, a contradiction.) Since Zi precedes Y on P, it follows that Zi ≺ Y, so by the inductive hypothesis, (M, ~u) |= [X1 ← x′1](Zi = zi) iff (M, ~u) |= [X2 ← x′2](Zi = zi). Now if there is no causal path from X1 to Zi, then there also cannot be a causal path P from X2 to Zi (otherwise there would be a causal path from X1 to Zi formed by appending P to a causal path from X1 to X2, which must exist since, if not, it easily follows from Lemma A.1 that X1 = x1 would not be a cause of X2 = x2). Since there is no causal path from X1 to Zi, by Lemma A.1, we must have that (M, ~u) |= [X1 ← x′1](Zi = zi) iff (M, ~u) |= Zi = zi iff (M, ~u) |= [X2 ← x′2](Zi = zi). Since the value of Y depends only on the values of Z1, . . . , Zk and ~u, and I have just shown that (M, ~u) |= [X1 ← x′1](Z1 = z1 ∧ . . . ∧ Zk = zk) iff (M, ~u) |= [X2 ← x′2](Z1 = z1 ∧ . . . ∧ Zk = zk), it follows that (M, ~u) |= [X1 ← x′1](Y = y) iff (M, ~u) |= [X2 ← x′2](Y = y). This completes the proof of the inductive step.

Since X3 is on a causal path from X2 to X3, it follows that (M, ~u) |= [X1 ← x′1](X3 = x′3) iff (M, ~u) |= [X2 ← x′2](X3 = x′3). Since (M, ~u) |= [X2 ← x′2](X3 = x′3) by construction, we have that (M, ~u) |= [X1 ← x′1](X3 = x′3), as desired. Thus, X1 = x1 is a but-for cause of X3 = x3.

References

Glymour, C. and F. Wimberly (2007). Actual causes and thought experiments. In J. Campbell, M. O'Rourke, and H. Silverstein (Eds.), Causation and Explanation, pp. 43–67. Cambridge, MA: MIT Press.

Hall, N. (2000). Causation and the price of transitivity. Journal of Philosophy XCVII(4), 198–222.

Hall, N. (2004). Two concepts of causation. In J. Collins, N. Hall, and L. A. Paul (Eds.), Causation and Counterfactuals. Cambridge, MA: MIT Press.

Hall, N. (2007). Structural equations and causation. Philosophical Studies 132, 109–136.

Halpern, J. Y. (2015). A modification of the Halpern-Pearl definition of causality. In Proc. 24th International Joint Conference on Artificial Intelligence (IJCAI 2015), pp. 3022–3033.

Halpern, J. Y. and J. Pearl (2001). Causes and explanations: A structural-model approach. Part I: Causes. In Proc. Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI 2001), pp. 194–202.

Halpern, J. Y. and J. Pearl (2005). Causes and explanations: A structural-model approach. Part I: Causes. British Journal for Philosophy of Science 56(4), 843–887.

Hitchcock, C. (2001). The intransitivity of causation revealed in equations and graphs. Journal of Philosophy XCVIII(6), 273–299.

Hitchcock, C. (2007). Prevention, preemption, and the principle of sufficient reason. Philosophical Review 116, 495–532.

Lewis, D. (1986). Causation. In Philosophical Papers, Volume II, pp. 159–213. New York: Oxford University Press.
The original version of this paper, without numerous postscripts, appeared in the Journal of Philosophy 70, 1973, pp. 113–126.

Lewis, D. (2000). Causation as influence. Journal of Philosophy XCVII(4), 182–197.

Lewis, D. K. (1973). Counterfactuals. Cambridge, MA: Harvard University Press.

McDermott, M. (1995). Redundant causation. British Journal for the Philosophy of Science 40, 523–544.

Paul, L. A. and N. Hall (2013). Causation: A User's Guide. Oxford, U.K.: Oxford University Press.

Woodward, J. (2003). Making Things Happen: A Theory of Causal Explanation. Oxford, U.K.: Oxford University Press.