A Criterion of Probabilistic Causation*

Charles R. Twardy and Kevin B. Korb†‡

The investigation of probabilistic causality has been plagued by a variety of misconceptions and misunderstandings. One has been the thought that the aim of the probabilistic account of causality is the reduction of causal claims to probabilistic claims. Nancy Cartwright (1979) has clearly rebutted that idea. Another ill-conceived idea continues to haunt the debate, namely the idea that contextual unanimity can do the work of objective homogeneity. It cannot. We argue that only objective homogeneity in combination with a causal interpretation of Bayesian networks can provide the desired criterion of probabilistic causality.

*Received March 2003; revised November 2003.
†To contact the authors write to Computer Science and Software Engineering, Monash University, Clayton, VIC 3800, Australia; e-mail: {ctwardy, korb}@csse.monash.edu.au.
‡We are grateful to Christopher Hitchcock, Dan Hausman, Peter Moulder, Michaelis Michael, Lucas Hope and two anonymous referees for helpful comments. NSF grant SES 99-06565 provided partial support for this work.

Philosophy of Science, 71 (July 2004) pp. 241–262. 0031-8248/2004/7103-0001$10.00 Copyright 2004 by the Philosophy of Science Association. All rights reserved. #04353 UCP: PHOS article # 710301

1. Introduction. The early work on probabilistic causality, especially that of Salmon (1980) and Suppes (1970), made clear that we need to discriminate among purported causal factors, all of which have a probabilistic impact on an effect, ruling out those which are screened off by common ancestors. Subsequently, it also became clear that such a degree of discrimination was insufficient, that the context in which probabilistic impacts are found must be more acutely observed. The Contextual Unanimity Thesis (CUT) emerged as the leading proposal for sifting out those factors which promote (or prevent) some effect. CUT is the idea that causes are context-independent. It claims that C causes E only if C raises the probability of E across all homogeneous reference classes created by a partition using all other factors causally relevant for E.[1] This thesis has been endorsed by many of those attempting to make sense of the concept of probabilistic causality, among others by Cartwright (1979), Skyrms (1980), Eells and Sober (1983), Humphreys (1989) and (in a weakened form) Cartwright again (1989).

[1] Note that neither this nor any other probabilistic criterion of causality described in this paper is an attempted reduction of causality to probability. In later sections we discuss what else is needed to establish or identify causal relations.

According to this requirement, we should not say that imbibing a poison causes death if, in fact, there is a perfect antidote, for in the reference class of those treated with the antidote the poison is inefficacious, which result defeats the causal claim. This may seem like an odd requirement to impose upon causal talk, for there seems nearly always to be available some reference class in which a putative cause will fail to raise the probability of its effect: namely, the reference class in which the object, thing, or space-time slice of interest is going to be immediately obliterated, before the causal mechanism can operate to completion. Thus, we should say that firing squads fail to cause the deaths of their victims, since an imminent heart attack could preempt that eventuality.[2]

A plausible response, and defense of CUT, could invoke conversational implicature, ruling out pre-emptive or other annoying causal factors which are clearly, by conversational context, meant to be ruled out. But such maneuvering aims to save unanimity theory in order to account for ordinary language.
Probabilistic causality is not primarily an ordinary language investigation, as the long history in ordinary discourse of assuming that causes are sufficient for their effects surely demonstrates. CUT is, at best, a poor aid in understanding ordinary language causal talk, whereas the concept of objective homogeneity does the real work in helping us make sense of probabilistic causality, if only in combination with a known causal structure.[3]

First we need to clarify some of the terms of the discussion, before turning to a critique of CUT. Our interest is the conditions for C being a probabilistic cause of E. The relevant concept of probability is that of physical probability or objective chance, presumably under a propensity interpretation (or, perhaps, frequencies in an infinite hypothetical sequence of trials; the exact story is not the issue here). We are not directly concerned with token causality (the conditions for a token event to be a token cause of another token event) but with type causality, such as whether smoking causes lung cancer.[4]

[2] Skyrms’ formulation of “unanimity” escapes this kind of trivial counterexample by allowing the cause to sometimes, but not always, fail to have a positive probabilistic impact on the effect. Skyrms’ account nevertheless falls under every other objection we raise in this discussion.
[3] A forthcoming paper (“Probabilistic Causality and Practical Causal Generalizations”) by Daniel Hausman makes this point in a different way.
[4] The relation between type and token causality has not yet been fully explicated. Pearl addresses this issue in terms of “causal beams” in Causality (2000). See also Halpern and Pearl (2001).

We discuss, and distinguish, two concepts that some may have confused: contextual unanimity and objective homogeneity. Both appeal to populations of events and their partitions.
The cells in a partition are types of events, in particular, events which share some fixed set of values.[5] Both contextual unanimity and objective homogeneity begin with a partition where each cell (alternatively: subpopulation, reference class, or context) is homogeneous for the effect. A cell $H_i$ is homogeneous for an effect E if there is no computable-beforehand property A which makes a difference to the probability of E for events in the cell.[6] There are two basic concepts of homogeneity, namely:

Objective Homogeneity: A population (or subpopulation) $H_i$ is objectively homogeneous for an effect E if there is no computable-beforehand property A which makes a difference to the probability of E. That is, for any such A within $H_i$, $P(E \mid A) = P(E \mid \neg A)$.

Epistemic Homogeneity: A population (or subpopulation) is epistemically homogeneous for an effect E if we do not know, of any computable-beforehand property A, that it makes a difference to the probability of E.

When we are acting as practical investigators of causality, we are constrained by the limits of our knowledge, so epistemic homogeneity has a role to play in accounting for scientific method and pragmatic judgment. But, on our view, objective homogeneity is the regulative concept required for analyzing the goal of our methodological efforts, namely causal structure.

Employing causal Bayesian networks (treated in more detail subsequently), we can, in fact, introduce a third kind of homogeneity that explicitly reflects this regulative role of objective homogeneity:

Model Homogeneity: Given a causal model M, a population (or subpopulation) identified by a set of observed variables O in M is homogeneous for an effect E if there is no other variable in M which is a non-descendant of E and the observation of which makes a difference to the probability of E.
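As a rough illustration of the objective homogeneity condition defined above, the following sketch checks whether a candidate property A makes a difference to the probability of E within a single cell. The record layout and the function names are assumptions made for this example only, and relative frequencies in invented data merely stand in for objective chances, which are not directly observable.

```python
from fractions import Fraction

def prob_e_given(records, a_value):
    """Relative frequency of E among the records in the cell with A == a_value."""
    matching = [r for r in records if r["A"] == a_value]
    if not matching:
        return None  # undefined when the cell contains no such events
    return Fraction(sum(r["E"] for r in matching), len(matching))

def homogeneous_for(records):
    """True if P(E | A) == P(E | not A) within the cell (frequencies as stand-ins)."""
    p_a = prob_e_given(records, True)
    p_not_a = prob_e_given(records, False)
    if p_a is None or p_not_a is None:
        return True  # A cannot partition this cell any further
    return p_a == p_not_a

# Hypothetical cell: A splits it, but E occurs at the same rate on both sides.
cell_flat = (
    [{"A": True, "E": True}] * 2 + [{"A": True, "E": False}] * 2
    + [{"A": False, "E": True}] * 3 + [{"A": False, "E": False}] * 3
)
# Hypothetical cell in which A makes a difference to E.
cell_split = (
    [{"A": True, "E": True}] * 3 + [{"A": True, "E": False}] * 1
    + [{"A": False, "E": True}] * 1 + [{"A": False, "E": False}] * 3
)

print(homogeneous_for(cell_flat))   # True: 1/2 on both sides
print(homogeneous_for(cell_split))  # False: 3/4 versus 1/4
```

The check covers only one candidate property; the definition quantifies over every computable-beforehand property, which no finite test of this kind can exhaust.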
If our understanding of the causal structure of the world can be represented in a causal model, then our current best understanding of homogeneous contexts or subpopulations is given by its model homogeneities. As we learn more about the world we develop and refine our best models, and our epistemically homogeneous reference classes get refined in turn. We may reasonably hope that a sequence of such refinements is aimed at the truth, which is the set of model homogeneities for the true causal model. In the meantime, prior to Keynes’s limit, we work with models that are false and our epistemic homogeneities may well mislead us. Such is the fate of every inductive agent.

[5] We represent possible events as variables whose values determine the type of that event.
[6] Roughly, a property is computable-beforehand if it can be computed from properties available to us up to and including the occurrence of C. See Church (1940).

Note that, just as in Wesley Salmon’s early work on statistical relevance (Salmon 1974), homogeneity (of every kind) is concerned with quantities: it is essential for homogeneity to obtain that the probabilities do not deviate. One of the attractions of CUT is that it has far laxer standards: simply that differences in probability across contexts all pull in the same direction, at whatever magnitude. We shall argue that the attractiveness of laxity is strictly superficial.

2. Unanimity and Homogeneity. Contextual unanimity is properly defined in terms of objective homogeneity (cf. Eells 1985):

Contextual Unanimity: A population is contextually unanimous for C with respect to E if and only if in all cells (contexts) $H_i$ in an objectively homogeneous partition,[7] $P(E \mid C) > P(E \mid \neg C)$.

The corresponding thesis is the claim that causes are properly identified in terms of unanimity.
More formally,

Contextual Unanimity Thesis (CUT): C causes E only if the background population is unanimous for C with respect to E.

[7] Of course, the homogeneity requirement here applies to all relevant properties except the causal factor C in question and properties not fixed at the time of C.

Whereas unanimity theory supposes that what is causally efficacious (e.g., worthy of the rubric “causal power,” in Cartwright’s language) must be so across contexts, homogeneity theory adopts the exactly opposite position: that causal factors are context-dependent and, to get the causal story right, we must relativize causal claims to a specific context. Not only need the causal powers push in the same general direction, but they need to push with the very same strength. Anything less does in fact average disparate powers, which is not good enough even for government work.[8] Hence, our initial thesis is:

Objective Homogeneity Thesis (OHT): C is causally relevant to E in the homogeneous context $H_i$ only if, within $H_i$, $P(E \mid C) \neq P(E \mid \neg C)$.

[8] Despite Cartwright’s remarks to the contrary (1989). Also, we shall henceforth use “causal power” and “causal capacity” interchangeably, as we do not believe that Cartwright has a well-motivated distinction between them.

Objective homogeneity was first championed by Salmon (1970) as a key to statistical explanation. It takes seriously the relativization of causal claims to populations, advocated by Eells (1988), perhaps more seriously than Eells did himself. The requirement of OHT is that causal claims be relativized to objectively homogeneous reference classes (the cells $H_i$), so that the introduction of any further putative causal factor cannot change the probability of the effect to any degree whatsoever. If causal claims are not so relativized, they can be arbitrarily wrong about the causal strength (effect size).
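The contrast between the two theses can be made concrete with a small sketch. It compares a unanimity-style check with a per-cell relevance check and shows how pooling cells averages disparate strengths. The two cells, their weights, and all probabilities are invented for illustration, and the pooled figure assumes, as a simplification, that C is distributed independently of cell membership.

```python
from fractions import Fraction as F

# Hypothetical chances per objectively homogeneous cell H_i:
# (weight of the cell, P(E | C), P(E | not C)). All numbers are invented.
cells = {
    "H1": (F(1, 2), F(4, 5), F(1, 5)),
    "H2": (F(1, 2), F(8, 25), F(3, 10)),
}

def unanimous(cells):
    """CUT-style check: C raises the probability of E in every cell."""
    return all(p_c > p_not_c for _, p_c, p_not_c in cells.values())

def relevant_in(cells, name):
    """OHT-style check: within cell `name`, P(E | C) != P(E | not C)."""
    _, p_c, p_not_c = cells[name]
    return p_c != p_not_c

def strength(cells, name):
    """Effect size of C on E within a single cell."""
    _, p_c, p_not_c = cells[name]
    return p_c - p_not_c

def pooled_strength(cells):
    """Weighted average of the per-cell effect sizes (an averaged 'upshot')."""
    return sum(w * (p_c - p_not_c) for w, p_c, p_not_c in cells.values())

print(unanimous(cells))                                # True
print(strength(cells, "H1"), strength(cells, "H2"))    # 3/5 1/50
print(pooled_strength(cells))                          # 31/100
```

Both cells agree in direction, so the unanimity check passes, yet the pooled value of 31/100 describes neither cell: it understates the strength in H1 and overstates it in H2 by a wide margin.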
Note that whereas CUT is specifically about a state C promoting the outcome state E, OHT is about whether variable C is causally relevant to variable E. But whenever one state of a variable is promoted, another must be demoted, so when speaking of variables causing one another, causality just is causal relevance. It is also worth noting that one can always recover the CUT verdict from the OHT conditions, but not the other way around.

There is a clear and direct relation between the properties of contextual unanimity and objective homogeneity. Unanimity is defined in terms of homogeneity, so there is a sense in which any case having the property of being objectively homogeneous must trivially also have the property of being contextually unanimous. That is, if we restrict our context to the single cell satisfying our objective homogeneity requirement, that context will be unanimous for the effect. Of course, the aim of CUT is to find a useful criterion that generalizes beyond such narrow contexts, so the theses are distinct; OHT is quite significantly stricter than CUT.

It is worth noting that a weaker version of the homogeneity thesis can be had by substituting epistemic for objective homogeneity, that is, relying upon the model homogeneities of our currently best model. This would produce a criterion far easier to meet: it would require only that we not know of any causal factor which could partition the reference class to produce distinct probabilities of the effect. And it is surely true that as a practical matter we are often satisfied with epistemic homogeneity. Indeed, Dupré and Cartwright appear to believe that epistemic homogeneity is all that we can ever have:

But we have been arguing that however finely we partition the population, we cannot know from the statistics that we have anything better than an average causal upshot.
(Dupré and Cartwright 1988, 530)

Science progresses, quite explicitly in the form of experimental methods, from crude epistemically homogeneous classes to identifying inhomogeneities to more refined models and so improved epistemically homogeneous classes (as we pointed out in Korb 1999). And in some cases science progresses to what is arguably objectively homogeneous (cf. Bell’s theorem). Thus, objective homogeneity provides the metaphysical norm toward which epistemic homogeneity can tend. Dupré loudly denounces such a view as hiding behind a “metaphysical fig leaf” (1993). But a denunciation is no counterargument. If the history of science makes anything clear, it is that a scientific explanation which omits unknown causal factors may be wrong, be discovered to be wrong, and properly be replaced by an account which incorporates those variables. Dupré’s dismissal of metaphysics would simply leave these known facts about science mysterious. Objective homogeneity in OHT provides the stricter standard that can make sense of our normative aspirations.

3. Problems with CUT. CUT fails to provide an adequate criterion for probabilistic causality simply because it is not strict enough. If we partition only as finely as required to satisfy contextual unanimity, we have effectively declared that causation has only direction but not strength. CUT is equivalent to creating a homogeneous partition and then agglomerating disparate capacities to bring about an effect, on the grounds that they operate in the same direction, regardless of their strengths. As Dan Hausman emphasizes,[9] the appeal of CUT is that it appears to make causal talk useful.
Instead of requiring us to identify, even if only in principle, all the factors needed for objective homogeneity, we can take the more relaxed approach of identifying only such factors as are needed to force a common causal bearing (direction of impact) on the effect. The difficulty with this line of thought is twofold. First, in non-microphysical cases we are no closer practically to locating those factors needed for a common causal bearing than we are to locating those needed for objective homogeneity itself. If we can happily assert that smoking causes cancer, for example, it is not because we know that there are no countervailing forces where smoking is neutral or even preventative for cancer. So CUT fails even as an account of ordinary causal talk. But second, this line of thought (aiming to account for the utility and/or the pragmatics of ordinary causal language) is a diversion from giving a normative, metaphysical account of causation, and not the very same thing. If we can connect the distinct metaphysics and epistemics of causation, we will have achieved our goal.

Beyond the considerations about how causal talk works, CUT seems to have little going for it. If some treatment raises the probability of survival in one category of cancer patient by 300% and in another by 5%, this is not a trivial difference to be papered over by talk of “causal power.” Effect size matters! Indeed, the multivariate methods of statistics used to analyze contingency tables and make sense out of such cases unsurprisingly depend explicitly on exact estimates of just such differences. A methodology for causal inference founded upon a metaphysics which ignores them would simply be a non-starter.

[9] In his forthcoming paper.

3.1. Varieties of Populations. Taking particular values of C and E, there are four kinds of population to consider (where C abbreviates C = c for some c, etc.):

1. Positive case: C raises the probability of E in all objectively homogeneous cells. In this case we can say C promotes E, or sometimes just C causes E.
2. Negative case: C lowers the probability of E in all objectively homogeneous cells. In this case we can say C prevents E, and it is strictly parallel to the positive case.
3. Mixed case: C raises the probability of E in some cells and lowers the probability of E in others. Most real cases appear to be like this.
4. Neutral case: C is always irrelevant for E.

The neutral case has generally been taken to be non-problematic, with the conclusion that C is not a cause of E. Later, we see where this can fail.

In the mixed case, an uncompromising CUT advocate would insist that C is simply not a cause of E, regardless of it having measurable effects.[10] Eells’ insistence on relativizing to contextually unanimous subpopulations in fact puts him partway between that extremism and OHT. Eells’ view absorbs all mixed cases into one of the other three cases, by relativizing the causal claim to some, but not all, the factors which identify an objectively homogeneous reference class. Under either interpretation, CUT is mixing together distinct cells in the homogeneous partition. Dupré and Cartwright correctly complain that, despite the unanimity requirement, all that we really have are “causal upshots,” an average of a variety of causal powers (even though they happen to point in the same direction). As we just mentioned, this is a problem if we should like to make sense of multivariate methods of causal discovery, which crucially depend upon effect sizes (see, e.g., Korb and Nicholson 2003; Spirtes et al. 2000). It is also a problem when we attempt to make sense of causal interactions.

3.2. Mixtures and Interactions. In the linear case one cause interacts with another when the effect is not simply the sum of independently operating causal forces.
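To make the linear notion of interaction just described concrete, here is a minimal sketch. The response function, its coefficients, and the names `effect` and `interacts` are all hypothetical; the product term is one common way of modelling interaction in a linear setting, not something the argument depends upon.

```python
def effect(c1, c2, b1=2.0, b2=3.0, b12=0.0):
    """Hypothetical linear response with an optional product (interaction) term."""
    return b1 * c1 + b2 * c2 + b12 * c1 * c2

def interacts(b12):
    """Causes interact when the joint effect differs from the sum of the separate effects."""
    base = effect(0, 0, b12=b12)
    joint = effect(1, 1, b12=b12) - base
    separate = (effect(1, 0, b12=b12) - base) + (effect(0, 1, b12=b12) - base)
    return joint != separate

print(interacts(0.0))  # False: the joint effect is the plain sum 2 + 3
print(interacts(1.5))  # True: the joint effect is 6.5, not 5
```

In the second call each cause still raises the response whatever the value of the other, so the sketch shows an interaction in which no direction of influence is reversed.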
Contextual unanimity does not have the tools to make sense of these causal interactions. A strict CUT theorist must claim that causes cannot interact, because according to CUT, neither positive nor negative causes can be interacting, and mixed causes are not causes at all. But we all know that causes do interact, so Eells and others have concluded that interaction happens precisely in the mixed case: where the valence of a cause reverses from positive to negative across contexts.

[10] Glennan and others criticize CUT on these grounds, as we shall see later.

Yet mixed cases are neither necessary nor sufficient for causal interaction. They are not necessary because causes may interact by reinforcing each other beyond their individual effects (synergy or potentiation), by preempting one another, as well as by cancelling each other out in an exclusive-or relation (as, for example, alkali and acid can do). They are not sufficient because you can get a valence-reversal without interaction, via dual (multiple) causal pathways: for example, oral contraceptives do not interact with pregnancy to provide a mixed case for thrombosis; there is merely a multiple-path dependency (see Figure 1).

4. Problems with OHT. We shall now consider some objections to OHT. Glennan has recently objected that homogeneity forces us only to treat fully specified complexes of causal factors, which are unnatural and unwieldy; Dupré and Cartwright contend that homogeneity does not work in any case, since it fails for dual capacity scenarios. Neither objection can be sustained, although the possibility of dual capacities (in the form of precisely counteracting multiple pathways) will in the end force us to elaborate OHT with explicit reference to causal structure.

4.1. Glennan’s Objection.
In his discussion of contextual unanimity in the units of selection debate in evolutionary theory, Glennan argues:

if one requires context independence [i.e., contextual unanimity] for an entity to be a unit of selection, the unit of selection will inevitably be the entire genome. (2002, 122)

This is because there is very widespread interaction across loci, with alleles at some potentially negating or reversing the developmental effects at others, and all in such a complex interrelation that we cannot seriously expect the causal impact on reproduction to be unanimous without accounting for the entire nexus. So we will not get unanimity until we conjoin a great many background conditions into the cause. Glennan considers this an unwelcome result since it appears to trivialize the debate on units of selection. But perhaps that debate should be trivialized.

The same criticism must apply to OHT, as it is the more demanding requirement. If fixing the entire genomic background is required to achieve contextual unanimity, it will certainly also be required to find homogeneous reference classes for selection. To be sure, OHT here has a purely linguistic advantage over CUT: it asserts causality of the single gene (so long as there is any $H_i$ where it makes a difference), whereas CUT must assert causality only of the conjunction of all factors required to achieve unanimity.

The same point can be put in a simpler domain than the units of selection debate. Glennan also says,

Now it is possible to accept this consequence of contextual unanimity, to say that generalizations like the one that smoking causes cancer are false. (2002, 124)

Presumably, if we homogenists are constrained to deny such an evident truth as that smoking causes cancer, then we are in trouble. But we are not.
Taken as a general claim, it can readily be affirmed by homogenists, because there is at least one context in which smoking has the requisite probabilistic impact. Taken as a specific claim, we can say it is ambiguous and ask for a context. Within a context its truth or falsity will be clear. In support of the homogenist’s treatment of the general claim, some plausible account, under implicature, can be made ruling out known or easily anticipated cases where the causal power of smoking is negated or reversed. The same defense applies to Eells, though not to strict unanimists (if there are any). But this line of consideration is irrelevant to the metaphysics of causation.

Returning to the genome, it is entirely likely that a less trivializing account of the units of selection debate can be made out of pragmatic considerations. In any case, as with smoking, we can certainly say that particular alleles cause particular characteristics, although no doubt only and always in particular circumstances which may never be fully accounted for. Objective homogeneity provides a regulative, normative account of causality and not necessarily a pragmatic account accessible in the short term. But rather than dismissing this verdict as a metaphysical pipe dream, why shouldn’t we believe that a Bayesian network that fully depicts the causal structure of, say, human ontogeny would require variables representing every locus in the human genome? It would surely require that, and more, since humans do not develop in a vacuum.

4.2. Dupré and Cartwright’s Objection. Dupré and Cartwright (1988, section 4) argue that Eells’ relativization of causal claims to populations (and, by extension, ours to homogeneous reference classes) fails in the case of dual capacities.
We can use Hesslow’s famous example (Hesslow 1976) (Figure 1): oral contraceptives increase the risk of thrombosis in women who would not otherwise have become pregnant, but decrease that risk in women who would otherwise have become pregnant.[11] Of this example, Dupré and Cartwright say:

[This] criterion [unanimity and/or homogeneity] . . . yields the conclusion that in this population contraceptives do not prevent thrombosis (via prevention of pregnancy). The reason, we may all agree, is trivial. We have held pregnancy fixed . . . (1988, 529)

[11] This is because pregnancy increases the risk twelve-fold and second-generation oral contraceptives increase it three-fold. Third-generation oral contraceptives increase the risk six-fold (Development and Evaluation Service 1996).

If we hold pregnancy fixed, then contraceptives can only reinforce the tendency that pregnancy has (in one subpopulation) to produce thrombosis (and with similar result in the other subpopulation). But this is obviously wrong, as Dupré and Cartwright immediately observe:

[Many] women will in fact have been saved from thrombosis by the pills; without the pills they would probably have become pregnant and run a high risk of thrombosis.

Well, of course! If you hold fixed those who actually do become pregnant, etc., then you will run a very high risk of confusion in interpreting your experimental results. That is, in fact, the whole point behind Cartwright’s refinement of her causality condition, CC*, elaborated in her Nature’s Capacities and their Measurement (1989). A homogeneous reference class traditionally does not refer to intermediate effects (or any effects in the future of C), though it does require reference to the subjunctively identified subpopulations (captured retrospectively by CC*), such as those women who otherwise would have become pregnant, as we have discussed at length elsewhere (Korb 1999).
By confusing the subjunctive and indicative moods, Dupré and Cartwright have wrongly concluded that relativizing causal claims to homogeneous classes continues to mix causal capacities. Nevertheless, there are cases of dual capacities where OHT fails.

Figure 1. Multiple paths to thrombosis.

4.3. Causal Neutrality in Simple Worlds. When we identify homogeneous reference classes we are considering holding fixed all causally relevant factors prior to the one in question. Pregnancy comes after the factor in question (the pill). Of course, the subjunctive property comes before, which is what we were relying upon. We do not know precisely how to analyze such subjunctive properties, but presumably it is to be done in reference to a richer causal nexus than that described in Figure 1, with multiple hidden causes prior to taking the pill fixing an objective probability of falling pregnant without the pill.

But what if the world is simple? What if there really are only three variables that describe the world? If the world is as simple as Figure 1, and if the parameters exactly balance, so that the pill has no net probabilistic impact on thrombosis, then OHT advises that there is no causal relation between them. But that is wrong: we are assuming that Figure 1 is the right causal model, after all! So, it turns out that in the end, objective homogeneity alone cannot do the work we require of it.[12]

We might be tempted simply to extend the partitioning conditions of OHT to incorporate downstream events, since, after all, observing Pregnancy in the neutral Hesslow case reveals the causal connection between the pill and thrombosis.
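The exactly balanced case can be checked by hand, and the sketch below does the arithmetic for a three-variable world shaped like Figure 1 (Pill → Pregnancy → Thrombosis, plus Pill → Thrombosis). Every probability here is invented so that the two paths cancel; none is taken from the medical figures cited in the text. Since the pill has no parents in this toy model, observing it and intervening on it coincide.

```python
from fractions import Fraction as F

# Invented parameters, chosen so that the two paths balance exactly.
P_PREG = {True: F(1, 10), False: F(1, 2)}   # P(pregnant | pill)
P_THROMB = {                                 # P(thrombosis | pill, pregnant)
    (True, True): F(3, 20), (True, False): F(1, 20),
    (False, True): F(1, 10), (False, False): F(1, 50),
}

def p_thrombosis(pill):
    """P(thrombosis | pill), summing over both paths through pregnancy."""
    p = P_PREG[pill]
    return p * P_THROMB[(pill, True)] + (1 - p) * P_THROMB[(pill, False)]

def p_thrombosis_given_preg(pill, pregnant):
    """P(thrombosis | pill, pregnant), with the downstream variable observed."""
    return P_THROMB[(pill, pregnant)]

# Net impact of the pill on thrombosis vanishes.
print(p_thrombosis(True), p_thrombosis(False))  # 3/50 3/50

# With pregnancy observed, the pill's direct contribution reappears.
print(p_thrombosis_given_preg(True, False)
      - p_thrombosis_given_preg(False, False))  # 3/100
```

The marginal probabilities coincide even though the pill raises the risk within each pregnancy stratum and lowers the chance of pregnancy, which is the neutral situation in which a purely probabilistic test reports no causal relation.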
Furthermore, allowing downstream variables need not resurrect the Dupré–Cartwright objection about falsely concluding that contraceptives only promote thrombosis, because it is a principle of homogeneity theory to require specification of the context, explicitly or implicitly. It is no error to say that, given that we know someone is pregnant, the pill does not prevent pregnancy; nor in that case can it be neutral for thrombosis.

Nevertheless, this future-fixing OHT is inadequate for another reason: it sanctions the ascription of causality to non-causes. Consider the network of Figure 2. If we are allowed to observe the effect A, then we induce a probabilistic dependency between C and E (because the path from C to X becomes active, as explained in Section 5.1). But there is no sense in which C causes E here, even if it is known in advance that its effect A will hold.

Figure 2. Failure of future-fixing.

[12] Neither, of course, can CUT. And note that the real issue here is the neutralizing effect of multiple pathways, not the simplicity of the world per se. In complex worlds, however, it seems likely that the invocation of subjunctive properties will be available to imbalance such pathways. We cannot escape by observing that such neutral cases have measure zero (since they depend upon exact parameterizations): we understand perfectly well what the causal story is here, so we ought to be able to codify that understanding.

4.4. Intransitivity. On reflection OHT has an obvious problem: it ignores causal structure. Being exclusively probabilistic (and temporal), it cannot but mishandle exact balances. But if we move beyond probabilities to appeal to causal structure itself, we may wonder why we should bother with the probabilities at all. Indeed, a plausible conjecture for a satisfactory criterion of causality is the transitive closure over directed causal links in a causal graph.
In that case, we could deploy our best causal discovery programs, for example, and simply examine the resultant graph to see whether X and Y are connected by a directed chain.

Unfortunately, transitivity over direct causal links does not hold (Cooper 1999; Hitchcock 2001b; Pearl 2000). There are strange cases where, although X causes Y and Y causes Z, it just is not the case that X causes Z. For example, suppose Y describes how a terrorist pushes a detonator (with his left hand or his right hand) and Z an explosion. Then if X indicates whether a dog bites the terrorist’s right hand, it may well have an impact on Y: if the dog bites the right hand, the terrorist will push with the left and otherwise with the right. But in either case the bomb will explode, so the dog’s bite has no impact whatsoever on that outcome. We need a criterion of causality where X does not here cause Z.[13] It is only in cooperation that causal structure and probabilistic impact can identify general causal structure.

[13] A more mundane example was described by Richard Neapolitan (2003) in a presentation: finasteride reduces DHT (a kind of testosterone) levels in rats; and low DHT can cause erectile dysfunction. However, finasteride didn’t reduce DHT levels sufficiently for erectile dysfunction to ensue, in one study.

Figure 3. Representing perfect intervention on C.

5. Wiggle Logic. On OHT, the most natural kind of causation is similar to what Eells calls “causal relevance.” We approach the problem by considering what happens probabilistically when we intervene upon (wiggle) the causal variable C in an attempt to fix its value to some c, which we represent $I_{C=c}$ (or simply $I_C$).[14] In the spirit of OHT we could say that one variable C causes another E in a causal Bayesian network precisely when there is some causal context and some possible manipulation of C which is followed by a change in the distribution on E.
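The dog-bite example can be run as a small wiggle test on the chain X → Y → Z. The sketch below is a simplified, deterministic rendering: the prior on the bite is invented, and the third state "none" for Y (no hand pushes) is an assumption added so that Y can make a difference to Z at all. The names `p_left` and `p_explosion` are hypothetical helpers, not part of any library.

```python
from fractions import Fraction as F

P_BITE = F(3, 10)  # hypothetical prior chance that the dog bites the right hand

def hand(bite):
    """Y: which hand pushes the detonator, given X (the bite)."""
    return "left" if bite else "right"

def explodes(push):
    """Z: the bomb explodes whenever some hand pushes ('none' is an assumed state)."""
    return push != "none"

def p_left(do_x=None):
    """P(Y = left), optionally under a perfect intervention fixing X."""
    if do_x is not None:
        return F(1) if hand(do_x) == "left" else F(0)
    return P_BITE

def p_explosion(do_x=None, do_y=None):
    """P(Z = explosion), optionally under a perfect intervention on X or on Y."""
    total = F(0)
    for bite, p in ((True, P_BITE), (False, 1 - P_BITE)):
        if do_x is not None:
            # Intervening on X replaces its prior with a point mass on do_x.
            p = F(1) if bite == do_x else F(0)
        # Intervening on Y cuts the arc X -> Y, so the bite no longer matters.
        push = do_y if do_y is not None else hand(bite)
        total += p * int(explodes(push))
    return total

print(p_left(), p_left(do_x=True))                   # 3/10 1
print(p_explosion(), p_explosion(do_y="none"))       # 1 0
print(p_explosion(do_x=True), p_explosion(do_x=False))  # 1 1
```

Wiggling X shifts the distribution on Y, and wiggling Y shifts the distribution on Z, yet no setting of X moves the probability of the explosion, so a test based on manipulation does not count X as a cause of Z even though a directed chain connects them.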
In particular, a direct link should guarantee causation, given a causal Bayesian network which is also a perfect map.15 A natural way to interpret causal relevance is via manipulations of C given observations of other variables O:

Causal Relevance (CR): C causes E with respect to $G/O$ if and only if
$\exists O' \supseteq O$ s.t. $\exists c \; P_{G/O'}(E \mid I_{C=c}) \neq P_{G/O'}(E)$.

Here, $G/O$ means we have a causal graph G in which the variables O have been observed. O therefore identifies the background in which the causality question is raised, and $O'$ the context in which it is answered.16

14. C = c denotes a particular intervention rather than a particular value that C adopts. In general, an intervention need not select a single state.

15. A perfect map is one in which each missing arc implies a probabilistic independence (the Markov property) and each existing arc implies a probabilistic dependence (called "faithfulness" by Spirtes et al. 2000; cf. Pearl 1988). A causal Bayesian network is a Bayesian network which can model deterministic interventions on C by removing arcs into C.

16. Menzies (2002, forthcoming) has a similar approach which he calls "difference-making in context," although he develops it with respect to deterministic causal models.

5.1. Intervention and Activation. We have begun talking about intervening on variables rather than just observing (or conditioning on) them. Most of the literature on probabilistic causality has used notions of intervention, especially in reference to experimental procedure, but without clarifying the relation between intervention and observation. Recent work in Bayesian networks has developed tools to represent both formally.

Technically, the best way to represent $I_{C=c}$ in a Bayesian network is to add a parent node $I_C$ to C, allowing for any kind of success rate in manipulating the value of C (Korb and Nicholson 2003). It is best in the sense that it is the most flexible, for it can represent imperfect, as well as perfect, control; that is, it can represent manipulations that may fail to fix the value of C, as well as those which are guaranteed to succeed regardless of the state of the parents of C. Adding the intervention variable $I_C$ also allows us to represent any kind of interaction between the manipulation and any alternative causes.17 Regardless, in the case usually imagined (if unusually realized), $I_C$ indeed exerts perfect control over C. In that case, it is easiest to represent the intervention by cutting all arcs into C, and then fixing C = c.18 Under perfect control, the two representations are equivalent, although only the arc-cutting representation will be a perfect map. However, since they are equivalent, it follows that observations of $I_C$ (i.e., manipulations of C) cannot induce a probabilistic dependency with anything causally upstream from C.19 In particular, C and E cannot then have any common cause inducing a probabilistic dependency between them. Nothing in the discussion will hang on the difference between perfect and imperfect control, so we shall assume perfect control, while continuing to refer to $I_C$ as the potential controlling variable.

17. There is another kind of intervention which adding $I_C$ allows us to represent, namely $I_C$ deterministically forcing C to take a new non-degenerate probability distribution, independent of its other parents.

18. This is a graphical representation of Pearl's "do calculus" (Pearl 2000).

19. Note that this is only true with perfect control and generally false otherwise. Since imperfect control introduces such complications, without any compensating benefit for our purposes here, we shall ignore it hereafter.

Interventions and observations will have no probabilistic impact upon downstream events if they are blocked (d-separated) in the context O.20

Blocking: $\Phi \in \mathrm{Paths}(C, E)$ is blocked in $G/O$ if and only if O d-separates C from E in G, where $\mathrm{Paths}(C, E)$ is the set of all directed paths from C to E. Equivalently, $\Phi$ is blocked if and only if $\exists Z \, (Z \in \Phi \wedge Z \in O)$.21

20. For a formal definition of d-separation, see Pearl (1988). Informally, it is the graphical counterpart to "screening off," or conditional independence: observing a common ancestor (or intermediate node in a chain) blocks a path relating two events and observing a common descendant induces a probabilistic dependency. Sewall Wright (1934) was the first to represent these relationships graphically.

21. Note that this is only equivalent because $\mathrm{Paths}(C, E)$ consists exclusively of directed paths from C to E.

What is doing the blocking (or not) in these cases is a set of observed variables O. What we specifically need for our Wiggle Logic is the concept of paths being "blocked" or not either because of such observations or because a direct connection between two variables $X \to Y \in \mathrm{Paths}(C, E)$ has been otherwise blocked, for example by hypothetically introducing a previously unknown variable between X and Y and observing its value. Let R stand for a set of restrictions on G, meaning a set of observations O together with a set of arcs which are restricted or blocked by such a hypothetical observation. When describing a set of restrictions, we will by convention identify them with the variables at the head of the arcs being restricted. For example, $R = O \cup \mathrm{Parents}(E)$ would mean that all parents of E are either observed or their arcs into E are otherwise restricted. Then,

Activation: $\Phi \in \mathrm{Paths}(C, E)$ is active in $G/R$, where $R \supseteq O$, if and only if
(a) $\Phi$ is not blocked in $G/O$, and
(b) no $X \to Y \in \Phi$ is restricted in R.

5.2. Paths Plus Probabilities. Probabilistic impact is insufficient as a criterion for causality because of odd cases of neutrality.
Transitive closure along a path is insufficient because of odd cases of intransitivity of the probabilistic dependency. We are now finally ready to consider how to articulate these two in a sufficient criterion for causal relevance. The most direct attempt to marry causal structure with probabilistic impact is to find probabilistic dependency along single paths:

Wiggle Logic I: C causes E with respect to $G/R$ if and only if
$\exists \Phi \in \mathrm{Paths}(C, E)$ and $\exists c \; P_{G/R}(E \mid I_{C=c}) \neq P_{G/R}(E)$.

The criterion, illustrated in Figure 4, asserts that causal intervention reveals causal relationships when and only when downstream effects reflect the impact probabilistically. This rules out intransitivity cases, as we want. Note that the issues concerning screening off that vexed early attempts to provide a probabilistic criterion for causality do not even arise: observing $I_C$ cuts all other arcs into C, preventing any probabilistic influence from going backwards through a common cause.

Figure 4. Wiggle Logic I. R fixes a background wherein one or more paths $\Phi_i$ allow intervention on C to make a probabilistic difference.

On the other hand, this formulation fails to cope with accidental neutrality in small worlds: in the neutral Hesslow case, there just is no probabilistic dependency between the pill and thrombosis, because the dual pathways neutralize each other. But this is exactly and only because there are neutralizing multiple pathways. There is no difficulty in extending WL I to cope with multiple paths:

Wiggle Logic II: C causes E with respect to $G/R$ if and only if
$\exists R' \supseteq R$ s.t. $\exists! \Phi \in \mathrm{Paths}(C, E)$ s.t. $\Phi$ is active in $G/R'$ and $\exists c \; P_{G/R'}(E \mid I_{C=c}) \neq P_{G/R'}(E)$,

where $\exists! \Phi \in \mathrm{Paths}(C, E)$ means there is a unique directed path $\Phi$ from C to E (see Figure 5). The uniqueness requirement for $\Phi$ implies that any alternative directed paths from C to E are blocked by $R'$.
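The contrast between WL I and WL II can be illustrated on a neutral Hesslow-style network with an indirect path Pill → Pregnancy → Thrombosis and a direct arc Pill → Thrombosis. The sketch below uses made-up conditional probabilities chosen so that the two paths cancel exactly; the numbers and the function names are assumptions for illustration only. It compares the distribution over thrombosis under intervention with the unintervened one, first with no restrictions and then with Pregnancy observed, which blocks the indirect path and leaves the direct arc as the only active path.

```python
from fractions import Fraction as F
from itertools import product

# Assumed parameters, tuned so that the two paths cancel (neutral case).
P_PILL = F(1, 2)                                  # prior P(pill)
P_PREG = {True: F(1, 10), False: F(1, 2)}         # P(pregnant | pill)
P_THROMB = {                                      # P(thrombosis | pill, pregnant)
    (True, True): F(7, 10), (True, False): F(1, 5),
    (False, True): F(2, 5), (False, False): F(1, 10),
}


def p_thrombosis(do_pill=None, observe_preg=None):
    """P(thrombosis), optionally under do(pill) and/or observing pregnancy."""
    num = den = F(0)
    for pill, preg, thromb in product((True, False), repeat=3):
        if do_pill is None:
            w = P_PILL if pill else 1 - P_PILL
        else:
            w = F(1) if pill == do_pill else F(0)  # perfect intervention on pill
        w *= P_PREG[pill] if preg else 1 - P_PREG[pill]
        w *= P_THROMB[pill, preg] if thromb else 1 - P_THROMB[pill, preg]
        if observe_preg is not None and preg != observe_preg:
            continue                               # condition on the observation
        den += w
        if thromb:
            num += w
    return num / den


def wiggle(observe_preg=None):
    """True if some intervention on pill changes P(thrombosis) in this context."""
    base = p_thrombosis(observe_preg=observe_preg)
    return any(p_thrombosis(do_pill=c, observe_preg=observe_preg) != base
               for c in (True, False))


print("No restrictions (WL I style test):", wiggle())
print("Pregnancy observed False (one active path):", wiggle(observe_preg=False))
print("Pregnancy observed True (one active path):", wiggle(observe_preg=True))
```

With these assumed parameters the unrestricted test finds no difference, while either observation of Pregnancy reveals one, so the WL II condition is met by taking the restriction set to include Pregnancy.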
The causality criterion, then, just combines transitivity (i.e., there being a directed path) with probabilistic dependence under intervention, but in such a way that any neutralizing alternative pathways are discounted. WL II trivially subsumes WL I; that is, if WL I is satisfied then WL II must be satisfied. For that reason also, any directly linked cause and effect will be identified by WL II.

Figure 5. Wiggle Logic II. $R'$ fixes a background and foreground wherein exactly one path $\Phi_i$ allows an intervention on C to make a probabilistic difference.

With this in mind, let us revise CR to talk of restrictions:

Causal Relevance (CR II): C causes E with respect to $G/R$ iff
$\exists R' \supseteq R$ s.t. $\exists c \; P_{G/R'}(E \mid I_{C=c}) \neq P_{G/R'}(E)$.

Clearly if WL II holds, then CR II holds: there is some state of the network such that wiggling C makes a difference to the distribution over E. We conjecture that CR II implies WL II: if the second clause holds without $R'$ intersecting all but one directed path from C to E, then it also holds under some such circumstance. The advantage of WL II, in any case, lies simply in making the joint dependency on transitive closure and probabilistic dependency explicit.

R identifies the context within which the causality question is raised. $R'$ identifies some context within which it can be answered. $R'$ specifies some state of the Bayesian network G as a whole, with all unnamed variables left unobserved and unnamed arcs unrestricted. It may well be that $R'$ shall need to fix one or more parents of E to particular values before an intervention on C shows any probabilistic impact: that is, it may be that C affects E only in some specific interaction involving other causes. This criterion allows for that.

5.3. Objective Homogeneity. So what about objective homogeneity, or the model homogeneity which is its representative?
In complex worlds, where subjunctive properties are in principle computable beforehand from the rich causal background, objective homogeneity may be enough. However, as we have seen, in simple worlds we need more. Wiggle Logic (and CR) obtain this "more" by reference to any context $R'$ (or O) in a causal model, homogeneous or not. But in applying these criteria, if we are looking at an inhomogeneous context, we are looking at a wrong context.

Suppose, for example, that there is only a single, direct link between C and E. In that case, we can leave $R' = O$ in WL II, since there are no alternative paths to intercept. But if O fails to specify a homogeneous background, if there are other causes of E that are not held constant, then whether an intervention on C induces a change in the distribution over E may depend upon the prior probability distribution over those unfixed alternative causes. In short, WL II will only provide us with a criterion for belief in causality, even relative to the model at hand, rather than the normative, metaphysical criterion we were after. Without objective homogeneity, WL II is satisfied too easily.

In order to deal with this problem we need to revert to the original idea of OHT, requiring O to be model homogeneous in the first place.22 Thus, objective homogeneity refuses to go away.

Because it works thread-by-thread, this forward-looking homogeneity does not allow a direct answer to the question, "What is the overall effect of contraceptive pills on thrombosis?" But then we already knew how to get the right answer to that question: use the traditional computable-beforehand homogeneous reference classes at C. What we answer here is how to determine whether there is in fact some causal relation between C and E, but not what measurable strength that relation has nor how it contributes to an overall impact on E.
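The dependence on the prior over unfixed causes, noted above for inhomogeneous contexts, can be made concrete with a small Python sketch. It assumes a hypothetical network in which E has two independent parents, C and a second cause A, that interact; the conditional probabilities and function names are invented for illustration. Leaving A unobserved, the wiggle test on C succeeds or fails depending only on the prior over A, whereas holding A fixed gives the same verdict whatever that prior was.

```python
from fractions import Fraction as F

# Assumed interaction between C and a second cause A: P(E | C, A).
P_E = {
    (True, True): F(4, 5), (True, False): F(1, 5),
    (False, True): F(1, 5), (False, False): F(4, 5),
}


def p_e_do_c(c, p_a):
    """P(E | do(C = c)) when A is left unobserved with prior P(A) = p_a."""
    return p_a * P_E[c, True] + (1 - p_a) * P_E[c, False]


def p_e(p_c, p_a):
    """Unintervened P(E), with C and A independent root nodes."""
    return p_c * p_e_do_c(True, p_a) + (1 - p_c) * p_e_do_c(False, p_a)


def wiggle_shows_change(p_c, p_a):
    """True if some intervention on C shifts P(E) given the prior over A."""
    base = p_e(p_c, p_a)
    return any(p_e_do_c(c, p_a) != base for c in (True, False))


half = F(1, 2)
print("A unfixed, P(A) = 1/2:", wiggle_shows_change(half, half))
print("A unfixed, P(A) = 3/4:", wiggle_shows_change(half, F(3, 4)))
# Setting P(A) to 1 or 0 stands in for holding A fixed (a homogeneous context).
print("A fixed true:", wiggle_shows_change(half, F(1)))
print("A fixed false:", wiggle_shows_change(half, F(0)))
```

In this toy case the verdict with A unfixed is an artifact of the prior over A, while fixing A yields a stable answer, which is one way of reading the demand that the context be model homogeneous.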
We leave the proper accounting of the notion of causal strength to future work.

22. Here is one such O: $\mathrm{Parents}(E) \setminus \Phi_i$ plus the parents of every intermediate node on $\Phi_i$, excluding the members of $\Phi_i$. This is sufficient, but not necessary, to obtain model homogeneity.

5.4. Simple Simple Worlds. We suggest that WL II provides a fully satisfactory accounting for the neutral thrombosis case. But it could well be objected that by introducing restriction and control variables we have violated our own assumption of the world being simple: we have added new variables to the model that was supposedly an exhaustive description of the causal world. In short, we have cheated.

We unashamedly confess to cheating. Not being permitted this cheat is equivalent to restricting consideration to a world where there is no such thing as causal intervention. But it is arguable that in such a world there is no such thing as causality, and the only legitimate considerations are probabilistic. In that world there is no sense in which the original thrombosis model is superior to the simpler v-structure $\mathit{Pill} \to \mathit{Pregnancy} \leftarrow \mathit{Thrombosis}$; indeed, there is a sense in which the latter, being a simpler representation of the very same probability structure, is superior. We are not interested in worlds which are non-causal, and our criterion of causality cannot be faulted for failing to find any causality in them.

6. An INUS Condition for Probabilistic Causality. John Mackie (1993) proposed the INUS account of causality: that a condition C is a cause of E just in case it is an Insufficient, but Necessary, component of an Unnecessary, but Sufficient, condition for E. This fails, if only because it presupposes a deterministic universe and so rules upon synthetic matters with a priori hubris. What we here offer is quite different, namely an INUS condition for probabilistic causality:

Probabilistic dependency in an objectively homogeneous context is an Insufficient (in that it requires a causal model to provide the directed chain connecting C and E as well as the parent set of E) but Necessary part of WL II, which is an Unnecessary (because of quantum mechanical cases where the Markov property fails), but Sufficient condition for causality.

7. Conclusion. Unanimity and homogeneity are, of course, related. For one, unanimity is properly defined in terms of homogeneity, as we observed above. But also they both make some attempt to relativize causal claims and avoid the mistake of mixing together distinct effects. Thus, some objections made to relativization apply to both. Those objections succeed, at best, in pointing out that homogeneity theory does not provide much of an account of ordinary causal talk. We shall leave such worries to those who take ordinary language philosophy seriously.

Contextual unanimity theory for probabilistic causality is a red herring: it fails to find the true causal capacities hidden within the mass of artifactual causal claims based upon averages; it also fails to account for ordinary causal talk. Being easily impressed by causal factors that merely push the effect in the same direction, the unanimity account distracts from what is important. It is only from our best estimates of precise causal strengths that we can hope to learn causal structure, as is clearly revealed in the recent literature on causal discovery.

Objective homogeneity, properly understood within a causal context, does not and cannot mix distinct causal powers. It handles interactions and allows us to make sense of the learning of causal structure, whether by human or machine. Although causal structure alone does not suffice to identify causality in Bayesian networks (because of intransitivity), causal structure combined with objective homogeneity in Wiggle Logic does.
Finally, Wiggle Logic provides a clear account of the relation between epistemics and metaphysics. Wiggle Logic provides a clear criterion of causal relevance relative to a causal model, just as model homogeneity provides a clear criterion of homogeneity relative to a model (if we replace observations O with restrictions). As we learn more about the world we are in, we refine our causal model of that world, by adding missing nodes and arcs (or, perhaps, deleting some). Our account of causal relevance, and our best guess of what the objectively homogeneous reference classes look like, thereby become more sophisticated. In that way, our model homogeneities may tend toward what are truly the objective homogeneities.

Addendum I: More Pragmatics. We have repeatedly pushed aside ordinary causal talk, except to say that there is an obvious way in which the homogenist does not do violence to ordinary causal talk. But if a homogenist were to estimate the actual risk of smoking for a particular agent, that homogenist would have to specify a reference class. Yet this homogenist would certainly know that the reference class is not truly objectively homogeneous. So, what should we say?

Instead of relying on implicature, one could also give an account of actual causal talk as offering "best recommendations" given an epistemically homogeneous reference class. Such an account might use decision theory to choose the optimal plan given an agent's uncertainty about the homogeneous reference classes. Christopher Hitchcock has developed such an account (Hitchcock 2001a, 2002).23 Although his account is presented for a particular agent choosing its own actions, it is easily generalized. This is not our project, but we are sympathetic to such a decision-theoretic approach.

Addendum II: How Objective Homogeneity Is Not Just about Probabilities.
Christopher Hitchcock has asked us why probabilities are the right way to discriminate capacities.24 Suppose, he says, there is a population with three different subpopulations, $P_1$, $P_2$, and $P_3$, relevant to smoking S and death D. Smoking always increases the probability of death, but it can do so either via lung cancer L or heart disease H. Smoking increases the probability of death among $P_1$ and $P_2$ via H, and among $P_3$ via L. By strange coincidence it happens, however, that $P(D \mid S P_2) = P(D \mid S P_3)$, and analogously for $\neg S$. Hitchcock says, "Then, according to homogeneity, populations $P_2$ and $P_3$ should be joined to form one homogeneous sub-population," which would aggregate different capacities.

If we model the situation in just the three variables S, P, D, then we have the model shown in Figure 6. In this case, objective homogeneity requires us to combine the populations $P_2$ and $P_3$, because of their identical probabilistic consequences for other variables in the model.25

Figure 6

23. This is also the orientation of Daniel Hausman's forthcoming paper.

24. In private communication.

25. This is not strictly speaking true; however, Salmon's appeal for always adopting the broadest available homogeneous reference class (Salmon 1974) appears to us to be well justified, if only on the grounds of strengthening the statistical basis for inductive inference.

But what happens if we decide to model the story as Hitchcock has presented it? In that case our causal model would be as shown in Figure 7. Here we are no longer required to merge the subpopulations of P; rather, we are obliged not to do so, since that would violate the story, misdescribing causal effects on H and L. As its reformulation in WL II makes clear, objective homogeneity discriminates capacities relative to a causal model and not in vacuo. In short, the wrong model leads to wrong conclusions.
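The model-relativity of this verdict can be sketched numerically. The Python fragment below encodes a hypothetical version of Hitchcock's story in which S and P influence D only through H and L; all the probabilities, the population labels as strings, and the function names are assumptions made for illustration. It checks whether two subpopulations have identical probabilistic consequences first when only D is visible (the three-variable model) and then when H and L are also in the model.

```python
from fractions import Fraction as F
from itertools import product

# Assumed P(H = 1 | P, S) and P(L = 1 | P, S); S = 1 means smoking.
P_H = {("P1", 0): F(1, 5), ("P1", 1): F(3, 5),
       ("P2", 0): F(1, 10), ("P2", 1): F(3, 10),
       ("P3", 0): F(1, 10), ("P3", 1): F(1, 10)}
P_L = {("P1", 0): F(1, 10), ("P1", 1): F(1, 10),
       ("P2", 0): F(1, 10), ("P2", 1): F(1, 10),
       ("P3", 0): F(1, 10), ("P3", 1): F(3, 10)}
# Assumed P(D = 1 | H, L), deliberately symmetric in H and L.
P_D = {(0, 0): F(1, 20), (1, 0): F(1, 2), (0, 1): F(1, 2), (1, 1): F(9, 10)}


def p_death(pop, s):
    """P(D = 1 | S = s, P = pop), marginalizing over H and L."""
    total = F(0)
    for h, l in product((0, 1), repeat=2):
        p_h = P_H[pop, s] if h else 1 - P_H[pop, s]
        p_l = P_L[pop, s] if l else 1 - P_L[pop, s]
        total += p_h * p_l * P_D[h, l]
    return total


def same_in_three_variable_model(a, b):
    """Only S, P, D are modeled: compare the consequences for D."""
    return all(p_death(a, s) == p_death(b, s) for s in (0, 1))


def same_in_five_variable_model(a, b):
    """H and L are modeled too: compare the consequences for H and L."""
    return all(P_H[a, s] == P_H[b, s] and P_L[a, s] == P_L[b, s]
               for s in (0, 1))


print("Merge P2, P3 given S, P, D only:", same_in_three_variable_model("P2", "P3"))
print("Merge P2, P3 given S, P, H, L, D:", same_in_five_variable_model("P2", "P3"))
```

Under these assumed numbers the two subpopulations are indistinguishable in the coarse model and distinguishable in the fuller one, matching the point that homogeneity discriminates capacities only relative to a causal model.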
References

Cartwright, Nancy (1979), "Causal Laws and Effective Strategies", Noûs 13: 419–437. Reprinted and expanded in her 1983 book of the same title.
Cartwright, Nancy (1989), Nature's Capacities and Their Measurement. Oxford: Clarendon Press.
Church, Alonzo (1940), "On the Concept of Random Sequence", Bulletin of the American Mathematical Society 46: 130–135.
Cooper, Gregory (1999), "An Overview of the Representation and Discovery of Causal Relationships Using Bayesian Networks", in Clark Glymour and Gregory F. Cooper (eds.), Computation, Causation and Discovery. Cambridge, MA: MIT Press.
Development and Evaluation Service (1996), Factor V Leiden screening in oral contraceptive users. Technical Report 58, R&D Directorate, South and West Regional Health Authority, Canynge Hall, Whiteladies Road, Bristol BS8 2PR. http://www.doh.gov.uk/research/swro/rd/publicat/dec/dec58.htm, using data from the 1994 Leiden thrombophilia study and others.
Dupré, John (1993), The Disorder of Things: Metaphysical Foundations of the Disunity of Science. Cambridge, MA: Harvard University Press.
Dupré, John, and Cartwright, Nancy (1988), "Probability and Causality: Why Hume and Indeterminism Don't Mix", Noûs 22(4): 521–536.
Eells, Ellery (1985), "Probabilistic Causality: Reply to John Dupré", Philosophy of Science 54: 105–114.
Eells, Ellery (1988), "Probabilistic Causal Levels", in Brian Skyrms and William L. Harper (eds.), Causation, Chance and Credence. Dordrecht: Kluwer, 109–133.
Eells, Ellery, and Elliott Sober (1983), "Probabilistic Causality and the Question of Transitivity", Philosophy of Science 50: 35–57.
Glennan, Stuart (2002), "Contextual Unanimity and the Units of Selection Problem", Philosophy of Science 69: 118–137.
Halpern, Joseph Y., and Judea Pearl (2001), "Causes and Explanations: A Structural-Model Approach-Part II: Explanations", in Proceedings of the Seventeenth International Conference on Artificial Intelligence – IJCAI-01. San Francisco: Morgan Kaufmann, 27–34.
Hesslow, Germund (1976), "Discussion: Two Notes on the Probabilistic Approach to Causality", Philosophy of Science 43: 290–292.
Hitchcock, Christopher R. (2001a), "Causal Generalizations and Good Advice", Monist 84: 222–246.
Hitchcock, Christopher R. (2001b), "The Intransitivity of Causation Revealed in Equations and Graphs", Journal of Philosophy 158(6): 273–299.
Hitchcock, Christopher R. (2002), "Good Advice", in Henry Kyburg, Jr. and Mariam Thalos (eds.), Probability is the Very Guide of Life: A Poor Folks' Guide to the Uses of Probability. Chicago: Open Court.
Humphreys, Paul (1989), The Chances of Explanation. Princeton: Princeton University Press.
Korb, Kevin B. (1999), "Probabilistic Causal Structure", in Howard Sankey (ed.), Causation and Laws of Nature. Dordrecht: Kluwer, 265–311.
Korb, Kevin B., and Ann E. Nicholson (2003), Bayesian Artificial Intelligence. Boca Raton, FL: CRC Press.
Mackie, John (1993), "Causes and Conditions", in Ernest Sosa and Michael Tooley (eds.), Causation. Oxford Readings in the Philosophy of Science. New York: Oxford University Press, 33–55.
Menzies, Peter (2002), "Difference Making in Context", in John Collins, Ned Hall, and L. A. Paul (eds.), Counterfactuals and Causation. Cambridge, MA: MIT Press.
Menzies, Peter (forthcoming), "The Causal Efficacy of Mental States", in Jean-Maurice Monnoyer (ed.), The Structure of the World: the Renewal of Metaphysics in the Australian School. Paris: Vrin Publishers.
Neapolitan, Richard E. (2003), "Stochastic Causality", paper presented at the International Conference on Cognitive Science, Sydney, Australia, July 2003. Slides at http://www.neiu.edu/reneapol/chapters.htm.
Pearl, Judea (1988), Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann.
Pearl, Judea (2000), Causality. New York: Cambridge University Press.
Salmon, Wesley C. (1970), "Statistical Explanation", in Robert G. Colodny (ed.), The Nature and Function of Scientific Theories. Pittsburgh: University of Pittsburgh Press, 173–231. Reprinted in Salmon et al. 1971.
Salmon, Wesley C. (1974), Statistical Explanation and Statistical Relevance. Pittsburgh: University of Pittsburgh Press.
Salmon, Wesley C. (1980), "Probabilistic Causality", Pacific Philosophical Quarterly 61: 50–74.
Skyrms, Brian (1980), Causal Necessity. New Haven: Yale University Press.
Spirtes, Peter, Clark Glymour, and Richard Scheines (2000), Causation, Prediction, and Search, 2d ed. Cambridge, MA: MIT Press.
Suppes, Patrick (1970), A Probabilistic Theory of Causality. Amsterdam: North Holland.
Wright, Sewall (1934), "The Method of Path Coefficients", Annals of Mathematical Statistics 5(3): 161–215.