Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

What are conditional probabilities conditional upon?
Keith Hutchison
The British Journal for the Philosophy of Science; Dec 1999; 50, 4; Academic Research Library pg. 665

Brit. J. Phil. Sci. 50 (1999), 665-695

What Are Conditional Probabilities Conditional Upon?
Keith Hutchison

ABSTRACT
This paper rejects a traditional epistemic interpretation of conditional probability. Suppose some chance process produces outcomes X, Y, ..., with probabilities P(X), P(Y), ... If later observation reveals that outcome Y has in fact been achieved, then the probability of outcome X cannot normally be revised to P(X|Y) [= P(X∩Y)/P(Y)]. This can only be done in exceptional circumstances, when more than just knowledge of Y-ness has been attained. The primary reason for this failure is that the weight of a piece of evidence varies with the means by which it is provided, so knowledge of Y-ness does not have uniform impact on the probability of X. A better updating of the probability of X is provided by P(X|Y*), where Y* is not an outcome of the chance process being observed, but the sentence 'the outcome Y has been observed', an 'outcome' of the subsequent observation process. This alternative formula is widely endorsed in practice, but not well recognized in theory, where the oversight has generated some unsatisfactory consequences.

1 The epistemic interpretation of conditional probability
2 Clearing away some undergrowth
2.1 Non-epistemic interpretations
2.2 The dual chance-processes involved in epistemic conditioning
2.3 Is the proposed 'new' rule really novel?
3 The negative argument
3.1 The basic chance-process
3.2 Elaboration of the chance-process
3.3 Refuting the Standard Formulation
4 The positive supplement
4.1 Correcting the Standard Formulation
4.2 When does the Standard Formulation work?
4.3 Updating in ignorance of epistemic procedure
4.4 Updating upon receipt of irrelevant or incorrect information
4.5 A revised model and formula
4.6 Abbreviating the revised model
4.7 An illustrative example
5 Some traditional information-puzzles: Monty Hall

© Oxford University Press 1999

1 The epistemic interpretation of conditional probability

The main thesis of this essay is very simple and negative, though supplemented by a more complicated positive sub-thesis. The core of the negative case argues (in Section 3) that there is something dramatically unsatisfactory with the standard epistemic interpretation of conditional probability (the 'Rule of Conditionalization', as it is often called), while the positive coda (Sections 4-5) proposes a replacement, a subtle adaption of the Standard Rule. Though this modified rule is already widely used, the difference between the two formulations is not well recognized. Before the negative case is developed, however, a number of preliminaries need to be dealt with (in Section 2) to clarify the problem, after this long introduction has sketched the road ahead in some detail.

Widely deemed to govern the updating of probabilities (in the various senses of that difficult word) on receipt of new information, the version of the Rule of Conditionalization that I object to is typically formulated as follows:¹

Suppose circumstances are such as enable the reasonable observer of some chance-process to place 'confidence' P(X) in the outcome being of type X. THEN P(X|Y) ['=' P(X∩Y)/P(Y)] represents the confidence that observer should place in the outcome being of type X after the circumstances of the observer change, through addition of just the knowledge that some outcome of type Y has in fact been achieved.

To see what is meant by this Rule, it is useful to choose a concrete example,
where there is little confusion about the probability-values, even among those who espouse radically different notions of probability itself. Suppose then a

¹ For examples of the epistemic Rule stated in forms that seem equivalent to mine, see Skyrms ([1986], p. 189); Resnick ([1986], p. 75); Kyburg ([1990], pp. 50-2); Howson and Urbach ([1993], p. 104); Savage ([1954], p. 44); Maher ([1993], p. 85); Dudewicz and Mishra ([1988], pp. 39-51). For a version of the rule that differs importantly (yet erratically) from the one I treat as standard here, see my discussion below (in Section 2.3) of the alternative formulation in terms of H ('hypothesis') and E ('evidence'). The subtle difference in meaning between the outcome 'Y' here, and the vague 'E' there, is a key to this whole essay. A similar key is provided by the useful treatment in De Finetti ([1974], vol. 1, pp. 134-5). More leisurely than most, this discussion genuinely attempts to explain the rationale behind the Standard Formulation. Two of De Finetti's remarks help to indicate where the view defended in this paper differs from the standard one. Firstly, De Finetti says that 'every evaluation of probability' is conditional upon 'the state of information in which [the observer] finds himself', but I argue below that it is conditional upon his epistemic history (even if he does not know that history). Secondly, De Finetti motivates the Standard Formulation by envisaging a betting game in which every debt is called off 'if the [condition] does not turn out to be true'. This stipulation does not yield epistemic conditionals, however. For De Finetti does not tell us what happens when the condition is true but not known to be true. To get the Standard Formulation, one needs to add something omitted by De Finetti, but difficult to comply with: that the debt is to be enforced every time the condition turns out to be true (whether known to be true or not).
(An analogous requirement is introduced in Section 4.2, when I observe that the Standard Formulation presumes a 'persistent' observation-process.) For acceptable epistemic conditionals, however, one needs a more accessible rule: the debt has to be enforced if and only if the condition is known to be met.

family is chosen from the population of all two-child families in some region, and chosen 'randomly', in the sense that each family has equal probability of being chosen. (For simplicity, I shall use the word 'random' in this standard sense all through the discussion below.) The probability that this family has two boys in it is approximately 1/4 (while the probability that it has at least one boy is 3/4). Suppose that after the choice has been made, it is somehow ascertained that the family has at least one boy in it (without anything else of relevance being discovered). Should the probability that the family has two boys now be updated? If so, has enough information been provided to allow a reliable revision? And if that is so too, how is the updating to be effected?

This example has been chosen to be one of those which precisely fits the Rule of Conditionalization as set out above; and that Rule applies immediately. It answers the first two of our questions in the affirmative, and provides an algorithm that determines the updated probability for us. It gives us indeed no choice but to accept 1/3 as the updated probability.² For this 1/3 is simply the value here of the formula's P(X∩Y)/P(Y), with X the outcome ('two boys') of the initial draw whose probability is to be revised, and Y the outcome ascertained ('at least one boy').³

Are these three answers good ones, however? I presume we all agree that the first of them is correct.⁴ But the other two answers are very suspect. Indeed,
the dual function of this essay is (firstly) to demonstrate that both are wrong, in that additional information is required to perform the updating reliably; then (secondly) to provide a different, and weaker, rule, that uses the appropriate extra information, to provide a convincing updating. Both these points have been suggested elsewhere (in, say, the works cited in fns 8 and 15), but with varying degrees of conviction. My task is to pursue the question systematically.

To catch a preliminary glimpse of why I take these views, suppose that the existence of a boy is revealed by simply randomly choosing a child from the selected family. Then in a long run of such trials, half the families where a boy is revealed will contain two boys. So it seems that a rational observer, who finds out both that and how a boy has been detected, should now allocate the value 1/2 to the probability that the family has two boys. On the other hand, suppose the new knowledge had been acquired quite differently, by someone revealing the existence of a girl whenever possible. Such an informant would only indicate that the family contained a boy if it were impossible to indicate that it contained

² For some endorsements of this answer, see Hodges and Lehmann ([1964], pp. 78-9), plus fn. 7 below. The sources provided in fn. 7, however, need to be read in conjunction with those cited in fn. 9. For some rejections of this answer, closely related to those pursued in the present essay (together with pointers to further endorsements of the 1/3 answer), see fns 6 and 8.
³ For here: P(X∩Y) = 1/4 (the probability-prior-to-the-new-information of the family BOTH containing two boys AND at least one boy), while P(Y) = 3/4 (the probability-prior-to-the-new-information of the state of affairs actually revealed by that information).
⁴ Indeed, my remarks below simply do not address those (propensity-theorists, perhaps?) who believe the new information has no effect on the probability of two boys.
a girl. So after thus finding out that the chosen family contained one boy, the probability that it contained two of them would surely jump to 1.

Whether or not these really are the best answers to our question here has been the subject of much debate (see e.g. the works cited in fns 5, 6, 7 and 8), and it is perhaps not essential to agree with them yet. What immediately matters is: to feel the pull in their direction; to note that they provide a good evaluation of the long-run frequencies; and to accept the clash between them and the posterior probability (1/3) calculated using the standard Rule of Conditionalization. Yet that unreliable probability was the one imposed by the traditional algorithm, and we can now see why it is suspect: its observer did not know how the extra sex-information was being supplied. Finding out how the information was provided (apparently) enabled the observer to replace a defective estimate of the updated probability with a better one.⁵

Such conflicting probabilities are very common. Indeed it will turn out that it is only in quite special circumstances that the Standard Formulation of the Rule provides acceptable posterior probabilities. For conditional probabilities of the form P(X∩Y)/P(Y) in fact serve primarily to tell us something else, viz.: 'How are we to update the probability of two boys if we find out that we were wrong about how the family was being chosen, in that it was not chosen at random from the whole population of [two-child] families, but only from a sub-population, that of such families with at least one boy?' (The 1/3 above is obviously the correct answer to this question.) The basic problem with the Rule of Conditionalization is that the probability assigned by that Rule (here, the 1/3) is rarely the same as that to be allocated after ascertaining the family belongs to the sub-population,
if in truth the random choice was made from the whole population. For although arrival of the information does indeed indicate that the family was chosen from the sub-population, it does not provide the slightest reason to believe that it was chosen randomly from that sub-population. (And usually it is not, as I demonstrate at length below, for information is normally (if not always?) biased, and this bias in effect de-randomizes the earlier choice.) So the Rule needs to be reformulated if it is to be successfully applied to the epistemic circumstances set out in its formulation: situations where partial information is received about the outcome of a prior chance-process.

It is certainly not enough to insist that the Standard Rule is only meant to apply to situations where our information arrives free of bias (as hinted by at least one commentator, after glimpsing the inadequacies of the Standard Rule).⁶ Firstly because bias is context-dependent, so information seems never to be bias-free simpliciter. Secondly, because it turns out that the Rule only works when there is bias (but the bias has to be appropriate). In consequence, to impose this restriction would be to give up too hastily. For successful conditioning can be routinely effected with biased information. Indeed, what we need to do instead is just reformulate the Rule given above so that it accords with prevailing practice.

⁵ This is the basic message of Nickerson ([1996]). For other similar messages, see also fns 8 and 15.
⁶ Cf. Falk ([1993], pp. Sn, 177-8), who gives the impression that the probability of two boys in a two-child family after discovering at least one boy would be 1/3 if the source was not tainted by sexist attitudes.
For despite the lip-service paid to the Rule in theoretical discussion, epistemic conditionals are often evaluated differently in concrete situations. They are then evaluated using a slightly different rule, inadequately distinguished from the articulation above. This alternative rule happily accommodates biased information; and it uses more data than the Standard Rule, since it typically requires that we know both P(X∩Y*) and P(Y*), where the Y* is not an outcome of the chance-process which takes place prior to the observation. It is instead an outcome of a subsequent observation-process, viz., the epistemic proposition 'the observer has ascertained the original outcome is of type Y'. The revised rule then assigns an updated probability given by P(X∩Y*)/P(Y*).

To see this alternative rule in action, let us return to our family example, supposing that after selecting the family a girl is revealed to the observer whenever possible (i.e. whenever the family contains at least one girl). Otherwise (i.e. when the family contains two boys) a single boy is revealed. The information here is clearly biased against revealing boyness, but that does not hinder us from updating the probability of there being two boys (or girls) in the family, after detecting one boy (or one girl). Indeed, an observer who understands how the information is being released, should allocate a probability of 1/3 to there being two girls in the family if he finds out there is at least one girl. This 1/3 (as we have seen) is the answer given by the Standard Formulation, despite the bias. (It provides, indeed, a preliminary illustration of what I meant above by 'appropriate' bias.) But it is also the answer given by the Y*-alternative, though I will not pause to perform the calculation, for the Y*-alternative is more interesting in the case of boy-information.
For our observer should obviously reject this standard answer 1/3 (for the probability of two boys) after finding out by this means there is a boy in the family. Indeed, it is clearly a certainty that there are two boys, if the observer is told there is one. This, however, is the posterior probability provided by the alternative formula. For the Y*-formula recommends an updating to (1/4)/(1/4), i.e. 1, since both P(Y*), the probability of finding out that there is at least one boy, and P(X∩Y*), the probability of BOTH there being two boys AND finding out there is a boy in the family, are 1/4. Similarly, in our earlier example (where the sex-information was released by making a random selection), P(Y*) = 1/2 and P(X∩Y*) = 1/4, so the alternative formula updates the probability to 1/2, the figure we earlier recommended (as according with the long-run frequency).

The fact that we have reached updated probabilities here that differ from those provided by the Standard Formulation of the Rule of Conditionalization (and will similarly meet many more below, as summarized in Table 1 in Section 3.3) does not directly refute that Rule, for more information was used in reaching the non-standard value than the formulation under attack allows. The observer did not just ascertain that there was at least one boy in the family, for he was also presumed to have understood how that information was given to him. But we have certainly indicated that the updated probability varies with the process by which the information is received, for we have seen that the comparable information had different effects on the updating in the case of girls and boys. So the 'impact' of a piece of information varies with the means by which that information gets acquired. The Standard Rule of Conditionalization gives no hint that this is so, and does worse: it seems to preclude its accommodation.
For the Rule's insistence that no additional information reach the observer beyond knowledge that an outcome Y has been achieved means that the pre-conditions for application of the Rule are breached whenever receipt of the new information brings with it the slightest hint as to how it arrived! We have, however, glimpsed an alternative formula, one which seems capable of accommodating such surplus information. Indeed, the alternative demands it, for its use requires one to know P(Y*) and P(X∩Y*), and these depend on the epistemic procedures. In consequence, the revised rule gives better evaluations of the updated probability.

A thoroughly satisfactory description of the process of conditioning is, however, difficult to formulate. One obstacle to its articulation will tease us all through the discussion below: the apparent need to insist that the Y (or Y*) here represent the totality of relevant new knowledge, as implied by my including the word 'just' in our initial formulation of the Standard Rule of Conditionalization. Another reflects the fact that it is not, in the end, essential to conditioning that the observer acquire knowledge of some outcome of the chance-process: one can rationally update probabilities after receiving claims about outcomes that are known to be inaccurate; and one can do the same on receipt of knowledge about matters other than the outcomes of the chance-process under investigation. Such questions will be pursued further below, where they eventually lead us to entertain a revised formula for the posterior probability slightly different to the Y*-formula above. (We abandon the latter formula only because it is not of wide enough scope, not because it is wrong.)
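The dependence of P(Y*) and P(X∩Y*) on the epistemic procedure can be checked by brute enumeration of the two-child example. The following is only an illustrative sketch, not part of the paper's own apparatus: the function names, the encoding of families as strings such as 'BG', and the use of exact fractions are all assumptions introduced here. It models the choice of family and the subsequent release of sex-information as one joint process, then compares P(X∩Y)/P(Y) with P(X∩Y*)/P(Y*) for X = 'two boys'.

```python
from fractions import Fraction

# First process: the four equally likely two-child families (older child first).
# Equal likelihood is the text's own simplifying assumption.
FAMILIES = ["BB", "BG", "GB", "GG"]
PRIOR = Fraction(1, 4)


def girl_first_protocol(family):
    """Reveal a girl whenever possible, otherwise a boy.
    Returns {revealed_sex: probability} for the given family."""
    return {"G": Fraction(1)} if "G" in family else {"B": Fraction(1)}


def random_child_protocol(family):
    """Reveal one of the two children, each chosen with probability 1/2."""
    dist = {}
    for child in family:
        dist[child] = dist.get(child, Fraction(0)) + Fraction(1, 2)
    return dist


def standard_posterior(target, sex):
    """P(X n Y)/P(Y), where Y = 'the family contains at least one child of this sex'."""
    p_y = sum(PRIOR for family in FAMILIES if sex in family)
    p_xy = PRIOR if sex in target else Fraction(0)
    return p_xy / p_y


def star_posterior(target, sex, protocol):
    """P(X n Y*)/P(Y*), where Y* = 'a child of this sex has been revealed'."""
    joint = {}  # joint probability of (family, revealed sex) over both processes
    for family in FAMILIES:
        for revealed, p in protocol(family).items():
            joint[(family, revealed)] = PRIOR * p
    p_ystar = sum(p for (_, revealed), p in joint.items() if revealed == sex)
    p_x_ystar = joint.get((target, sex), Fraction(0))
    return p_x_ystar / p_ystar


for protocol in (random_child_protocol, girl_first_protocol):
    print(protocol.__name__,
          standard_posterior("BB", "B"),
          star_posterior("BB", "B", protocol))
```

Under these assumptions the standard figure is 1/3 for both procedures, since that formula never consults the observation-process, while the Y*-figure is 1/2 for the random-child procedure and 1 for the girl-first procedure.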
Another quite different set of difficulties arises because the interpretation of the words and symbols used to express the conditioning process is much at issue, varying with views as to the nature of probability. Some might interpret the 'confidence' as nothing more than an estimate of the long-run frequency; others will see it as a measure of the weight of evidence available to the observer. Some will interpret the X and Y as sentences; others will interpret them as sets, outcomes perhaps of a chance-process. Fortunately, the essence of the case here can be presented without choosing between these alternative interpretations, for the argument develops out of those simple urn models at the centre of our understandings of probability, models where (it seems) the variety of interpretations is of least significance. The case does not involve exotic examples from the fringes of our understanding, those inconvenient examples where the disputing interpretations drift apart. Indeed, my focus here will be on core examples of the conditioning process, the cases where there is a minimum of conceptual confusion, where prior probabilities are clearly defined, and where frequency-values (and safe betting ratios) are widely deemed an acceptable guide to probabilities. For an attack on the Standard Formulation is all the stronger, the more it relies on unproblematic chance-processes.

Accordingly, it is unlikely that the probability values I cite below will be resisted, for (as already hinted) my primary target here is not false estimates of probability. What I seek is a better analysis of the procedures we use to reach those probabilities. As a consequence of this emphasis on articulation, I am not directly interested here in recommending novel probability assignments, except in the circumstances where too literal a reading of the Rule has triumphed over common sense.
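Since frequency-values are being treated as an acceptable guide here, the long-run frequencies for the two-child example can be estimated by simulation. This is a rough sketch under stated assumptions rather than anything drawn from the paper: the function names, the seed, and the number of trials are arbitrary choices, and `boy_whenever_possible` is a hypothetical mirror image of the girl-revealing procedure, added only for comparison.

```python
import random


def random_child(family, rng):
    """Reveal the sex of one child picked at random from the family."""
    return rng.choice(family)


def girl_whenever_possible(family, rng):
    """Reveal a girl if the family has one; otherwise reveal a boy."""
    return "G" if "G" in family else "B"


def boy_whenever_possible(family, rng):
    """Hypothetical mirror image: reveal a boy if the family has one."""
    return "B" if "B" in family else "G"


def two_boy_frequency(reveal, trials=200_000, seed=0):
    """Among trials in which a boy is revealed, return the fraction of
    families that turn out to contain two boys."""
    rng = random.Random(seed)
    boy_revealed = 0
    two_boys = 0
    for _ in range(trials):
        # Random two-child family: each child independently 'B' or 'G'.
        family = (rng.choice("BG"), rng.choice("BG"))
        if reveal(family, rng) == "B":
            boy_revealed += 1
            if family == ("B", "B"):
                two_boys += 1
    return two_boys / boy_revealed


for procedure in (random_child, girl_whenever_possible, boy_whenever_possible):
    print(procedure.__name__, round(two_boy_frequency(procedure), 3))
```

With these settings the estimates should come out near 1/2 for the random-child procedure, exactly 1 for the girl-revealing procedure, and near 1/3 for the mirror-image procedure. Only the last agrees with P(X∩Y)/P(Y).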
A good example of such a retreat from common sense is provided by the widespread claim that the probability of our randomly chosen two-child family having two boys must always change from 1/4 to 1/3 upon discovery that the family contains a boy. As already observed, this grossly incorrect result comes from direct application of the Standard Rule of Conditionalization, and (yes!) is sometimes cited as an example of the way theory functions to correct the erroneous common sense of the beginner!⁷

⁷ E.g. Isaac ([1995], pp. 24-5). Cf. Feller ([1968], vol. 1, pp. 114-8, 125). Note especially the warning (at Feller's p. 116) that these interpretations are not to be taken too seriously, and compare Billingsley ([1986], p. 448). It is not initially clear that these discussions are intended to include epistemic conditionals. To confirm that this is a reasonable reading, one needs also to consult the discussion cited in fn. 9 below.

More fully, my conclusions can be surveyed in the following (overlapping) terms.⁸ When P(Y) ≠ 0,

a) If P(X|Y) is made equal to P(X∩Y)/P(Y) by fiat (via definition, axiom, etc.), then P(X|Y) cannot be given the epistemic interpretation needed for the Rule of Conditionalization.
b) For P(X∩Y)/P(Y) is usually not a good estimate of the confidence to be placed in X after discovery that Y is true.
c) Yet P(X∩Y)/P(Y) is sometimes a good estimate of the confidence to be placed in X after discovery that Y is true, but only in circumstances where more has been ascertained than just the truth of Y. So for P(X∩Y)/P(Y) to yield the probability posterior to the discovery it is often important that Y not be the 'total evidence'.
d) If indeed one knows that the process which provides the knowledge of Y is 'persistent' (in a technical sense elaborated below),
then P(X∩Y)/P(Y) is a good estimate of the confidence to be placed in X after discovery that Y is true.
e) More generally, the confidence to be placed in X after ascertaining Y is not a function of the variables X and Y, but is a function of X and a somewhat different variable, one representing the circumstances under which the truth of Y was conveyed to us: Y* above.

The essence of my case lies in the final claim here, e), for once that is made clear, the other sub-theses will be relatively obvious. In order to establish e), however, we need to confront a conspicuous haziness in the way the traditional Rule of Conditionalization was expressed above, where there was (as already noted) no extended accommodation of the 'principle of total evidence', the idea that Y needs to represent 'all' the evidence that becomes available to us. Though this principle is partially denied in sub-thesis c), a corrected version of it remains essential to all updating of probabilities, yet we also need a version which allows some of the new evidence to be ignored: that which is irrelevant. (Otherwise we are quickly led into an infinite regress.) But there are severe difficulties in articulating a criterion for relevance, so the principle

⁸ The argument of the present paper overlaps somewhat with those in: Bar-Hillel and Falk ([1982]); Freund ([1965]); Korb ([1994]); Nickerson ([1996]); Shafer ([1985]); (plus some further literature cited in these sources). But there are major differences. So Shafer seeks to defend the Standard Formula for P(X|Y), and thus finds a protocol that saves it, the one I refer to below as 'persistent' transmission; Shafer does not seek to emphasize that the formula is false for most protocols; Shafer focuses overwhelmingly on subjective probabilities, and barely appreciates (e.g. p. 269) that the same problem occurs for frequentists. Nickerson and Bar-Hillel and Falk by contrast ignore the Standard Formula; in consequence,
they do not develop the contrast between 'ontic' conditionals (to which that formula applies) and epistemic ones; neither do they develop a substitute for that formula. But they fully appreciate that posterior probabilities vary with the means by which a piece of information is received. Korb and Freund certainly recognize this, too, but Korb's primary interest lies elsewhere, while Freund presents his argument as an attack on something like the principle of indifference ('the assumption of equal a posteriori probabilities', p. 29). See also fn. 15.

remains imperfectly formulated, here and elsewhere. Accordingly, we will continue to proceed via concrete example, for once this example is understood, generalization will be reasonably obvious, and a negative case of wide validity can be made without solving the problem of identifying total relevant evidence in general. As already hinted, we eventually see that the relevance of the information is not what matters, in that it is often rational to revise probabilities after the receipt of information which, by normal criteria, is irrelevant.

But before we move to the core of the case, several minor pieces of housekeeping seem to warrant attention.

2 Clearing away some undergrowth

2.1 Non-epistemic interpretations of conditional probability

Conditional probabilities are often interpreted hypothetically, and even counterfactually, P(X|Y) being (say) the probability to allocate X were it to happen to be the case that Y was true. It is important to observe that such 'ontic' interpretations (as we shall call them below) are not being pursued in the present essay, whose concern is with investigatory situations, where the truth of Y becomes ascertained or believed, etc.
For there are many circumstances in which the two types of conditional probability are dramatically different: Bovine Spongiform Encephalopathy can (at the moment of writing) only be diagnosed via a post-mortem, so the probability that a cow known to be infected with BSE will be alive tomorrow is zero. But it is widely believed that many living cows have the disease, and can survive for long periods of time with it. The ontic conditional is a dubious guide to the epistemic one here.

Nor are we concerned here with another class of questions, also seen as exemplifying conditional probability, those which estimate the probability of some fact or observation, if a particular scientific theory that bears on the fact is true (and vice versa). So I do not ask how we estimate the probability that a cow with BSE will survive one year after infection, if a particular speculation about the operation of BSE happens to be true.

Though such conditionals are not the target of the criticisms below, the discussion does have an indirect bearing on them, in part because of an ongoing tendency in the literature to blur the distinctions between the varieties of conditionals.⁹ And more importantly, though the problems we are directly concerned with seem to have a different character, we will eventually conclude that epistemic conditionals are simply special cases of ontic ones, with the

⁹ See e.g. Kyburg ([1990], p. 50, where conditional probability is described epistemically, but stochastic independence interpreted ontically); Resnick ([1986], pp. 48, 75, where conditional probability is introduced ontically, then, later, interpreted epistemically, and pp. 53-4, where the author oscillates between ontic and epistemic interpretations); Dudewicz and Mishra ([1988], pp. 38-50, where questions about conditional probability are indiscriminately (?) posed in epistemic and ontic terms).
chance-process specially modified to include the epistemic activity that provides the partial knowledge used for updating. Indeed, it should already be obvious that the epistemic conditional which is the probability to be assigned to X when Y has been ascertained is just the ontic conditional which is the probability to be assigned to X when Y* is true. (This is to be contrasted with P(X∩Y)/P(Y), the probability to be assigned to X when it is only true that Y is true.)

2.2 The dual chance-processes involved in epistemic conditioning

It is important to note (with e.g. Rosenkrantz [1977], pp. 48-52) that as soon as we pose the sort of question the Rule of Conditionalization is routinely supposed to answer, we are envisaging two quite separate types of activity as taking place. The first is typically a chance-process in which a physical system evolves 'non-deterministically' from some initial macrostate to one of a range of final states, like the throw of a die which ends up in one of six possible orientations on the table below. (The letters X and Y without the asterisks have been used above to refer to final states, 'outcomes', of this initial process.)

The second process is typically subsequent to the first, and is an information-process, conveying a message to some observer that bears on the outcome achieved in the first process. If (say) our die had dots on it of more than one colour, an example of this second type of process would be the discovery that the dots on the uppermost face of the die were all red. As a result of the supplementary information provided by this second process, the observer might well abandon the probabilities he initially allocated to the outcomes of the first process and replace them by updated probabilities. In our case, suppose (to be concrete) that the die had red dots for 1, 2 and 3; and blue dots for 4, 5 and 6.
An observer who discovered that red dots had been thrown might then revise the probability of 1 or 2 having been thrown from 1/3 to 2/3. Probabilities prior to the second process are supplanted by those posterior to it. That is what conditioning is about. (More precisely, that is what conditioning seems to be about: we eventually see that it is not important that the second process be revealing facts about the outcome of the first one.)

Of course, the second process here, the information-process, may itself involve unpredictable activity, and thus constitute a separate chance-process; and even if the observing activity is not probabilistic, it can still be so conceptualized, since deterministic processes are simply chance ones with probabilities 1 and 0. The things I have designated Y* above are the outcomes (in the technical Kolmogorov sense) of this second chance-process. So the whole of our discussion here is set paradigmatically in a context where two distinguishable chance-processes are taking place: an 'evolution-process' (as I shall often call it below), and then an 'observation-process'.

I am not sure that these separate processes are always easy to distinguish, or that the observation has to take place after the evolution. But these conditions are often met, and characterize the conceptually simpler cases of conditioning. Such cases can serve as type-specimens for more complex situations, and accordingly we concentrate on them here. For the fact that the Standard Formulation of the Rule of Conditionalization fails to provide the simplest of probability updatings suffices to rob that formulation of its credibility. To make the negative case, there is no need to look at blurred situations.

It is important to separate the two different processes set out above,
10 because the main problem I detect with the Standard Formulation is its implicit claim that the probabilities associated with the second of the pair, the observation-process, are irrelevant to the updating of probability.11 Its formulations routinely assert that posterior probabilities can be calculated from nothing more than a knowledge of the probabilities associated with the first process, the temporal evolution, the one that undergoes observation. For they declare that if the second process detects that the first process has produced an outcome of type Y, then the new probability for an outcome of type X is simply P(X∩Y)/P(Y), where P(X∩Y) and P(Y) are both probabilities of outcomes of the first process.

In contrast, I claim we need to attend to the probabilities associated with the outcomes Y* of the second chance-process, the P(Y*) as I have called them, the probabilities of receiving the conditioning information. The need to embrace these probabilities is widely recognized in practice, for computations of posterior probabilities do in fact often use understandings of the second chance-process (as we will see in the next section). But the fact that it is these probabilities which are being used is often obscured, for the distinction between the two processes is blurred, especially in theoretical discussion. So the main problem confronted in this discussion is one of articulation, finding the right description of procedures that are already in partial use.

10 Two of the examples I give in Section 2.3 below of mis-application of the Rule of Conditionalization both involve situations where the first chance-process, that generating the priors, is not specified in any detail, and more or less left to the imagination of the reader. This indeed is one reason that prior probabilities are so often elusive, and (in consequence) so widely distrusted.
Yet when the first process is clearly spelled out, the priors are quite unproblematical, and should raise no hackles at all!

11 For some discussions of conditioning, where it is reasonably clear that the probabilities of the observation-process are overlooked, see: Rosenkrantz ([1977], pp. 48-9, example 1), where the protocol governing the stooge's release of information is not mentioned; Resnick ([1986], p. 56, problem 1), where a 'persistent' process (in the sense set out in Section 4.2 below) is presumably intended, but no indication is given that this is vital to the discussion; Dudewicz & Mishra ([1988], pp. 38-50), where no hint is given that the mode of observation is relevant. Contrast the discussions cited in fns 8 and 15.

Distinguishing the two chance-processes involved in paradigmatic cases of conditioning is a vital first step in this analysis.

2.3 Is the proposed 'new' rule really novel?

There is, however, a popular formulation of the conditioning process that might seem to have solved the problem of articulation here, the formulation phrased in terms of a 'hypothesis' (H = 'the family has two boys') and 'evidence' (E). It assigns an epistemic probability of P(H∩E)/P(E) to the hypothesis after receipt of the 'evidence', and this formula gives the correct answer, so long as the vague E is interpreted as referring to the second of the dual chance-processes distinguished above. To give the right answer, E has to be short for the sentence 'the evidence is acquired', so that P(E) is the probability of acquiring the evidence (via that second process). So in our family example above, where boyness was only revealed if the family had two boys, P(E) can be interpreted as 'the probability of the observation-process indicating there is at least one boy in the family', and this is here 1/4. So is P(H∩E).
In consequence, the revised probability recommended by this version of the conditioning process is now 1, a satisfactory answer. The calculation simply repeats that already carried out using Y* above. In many other situations such an interpretation of E (as referring to the second process) comes naturally, and the formula is repeatedly applied so as to be identical to the rule I formulate using Y*. It works well, and in so far as it does so, it supports my Y*-rule. Indeed, it does this so well that it might seem my Y*-rule adds nothing to it, so the whole discussion here is pointless, targeting a mere straw man.

But this is not so. Firstly, and relatively tritely, because it is not realized that the successful H-E formulation is so different from the standard X-Y formulation set out above. In particular, the successes of the H-E evidence-formulation are not deemed grounds to reject the X-Y outcome-formulation. In consequence, and far more importantly, there are important cases in the literature where this H/E-formulation is used with E referring to Y, the content of the evidence, as opposed to Y*, the fact that it is received. So even if it is true that E is sometimes interpreted correctly, it is not always interpreted correctly, and there is real need to distinguish the Y and Y* formulations, to avoid a confusion which Korb ([1994], pp. 142-5) aptly calls 'propositional mesmerization'. I give two examples immediately below, both of which have relatively serious consequences, beyond mere quantitative mis-estimation (and Korb, loc. cit., gives another). So the H/E-formulation is in fact ambiguous and invites both interpretations, the false and the correct. My aim is to eliminate this source of confusion, by forcing the two interpretations further apart.

A potent example of the failures here is provided by a familiar application of
Bayes' Theorem to situations in which some prediction made by a scientific theory H is observed. Howson and Urbach argue ([1993], pp. 119ff) that the probability of the theory is routinely increased by such a confirmation (and many others agree, e.g. Glymour [1980], p. 92 and Salmon [1967], p. 117). But the analyses here are fallacious, because they misidentify the evidence E, taking it to be that which is observed (my 'Y'), rather than the fact that it is observed (my 'Y*'). For Howson and Urbach take P(E|H) to be 1 (deducing this from the stipulation that H ⇒ E) and hence quickly infer that the posterior probability of the theory is P(H|E) = P(H)/P(E). This, of course, can never be less than the prior probability P(H), since P(E) ≤ 1, and will greatly exceed P(H) whenever P(E) is small. So they conclude (op. cit., p. 120): 'any evidence [...] must confirm every hypothesis [...] of which it is a logical consequence' (so long as we stay outside the territory of evidence and theories that are so implausible their probabilities vanish altogether).

To display the fallacy here, let us look at the concrete example provided by our two-child family. We are asking how one should update the probability of the hypothesis H (with prior probability 1/4) that the randomly chosen family has two boys, after observing something that follows from the truth of the hypothesis: that the family has at least one boy.

Let E be the sentence expressing the observed consequence of the hypothesis, i.e. 'the family has at least one boy'. Then E has prior probability P(E) = 3/4; and H ⇒ E. Let further E* be the fuller statement of the 'evidence', the sentence 'the family is observed to have at least one boy'. A probability cannot be confidently assigned to E* without more information about the observing process; but it is certainly not true that H ⇒ E*. Because E* is not entailed by H,
this cannot be what Howson and Urbach (and others who endorse this analysis) mean by the 'evidence'. They must mean something closer to my E, and I must assume this is what they do mean, as there are no plausible alternatives. Then indeed, the conditional P(E|H) becomes 1 (as they presume). Given this choice of evidence E, however, the posterior probability has to be evaluated as 1/3, since then P(H|E) = P(H)/P(E) = (1/4)/(3/4). Yet this is the answer already noted above as being grossly wrong. A better treatment would use E* as the evidence, to yield the posterior probability P(H|E*) = P(E*|H)·P(H)/P(E*), which can readily be less than the prior probability. That happens whenever P(E*|H) < P(E*). I.e. if some consequence of a hypothesis is especially unlikely to be observed when the hypothesis is true, then witnessing that consequence can decrease the plausibility of the hypothesis, as intuition would surely suggest.12 Howson and Urbach and their supporters give an analysis which denies this, for they misrepresent the evidence, misled by the ambiguity of the H/E-treatment of conditional probabilities. My target here is no straw man!

We can see much the same phenomenon in a famous discussion of Jeffrey's ([1983], p. 165), centred on the updating of the probability of a hypothesis after arrival of evidence which is so vague that its content cannot be put into words. Jeffrey claims the Rule of Conditionalization does not apply here, because there is no proposition (which I denote 'E') encapsulating the content of the new information acquired. Yet (as Jeffrey himself notes, op. cit., p. 166) there is a proposition (my 'E*') encapsulating the fact that the ineffable evidence has arrived, or Jeffrey would not have been able to describe his problem to us. So the applicability of my proposed alternative rule is not called into question.
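As a worked check, the contrast drawn above between E and E* in the two-child family can be set out side by side. The block below is only a sketch under a stated assumption: it uses the protocol mentioned earlier, in which a boy is reported only when the family has two boys, so that P(E*|H) = 1 and P(E*) = 1/4. The last line restates the general condition under which witnessing a consequence of H lowers the probability of H.

```latex
\begin{align*}
P(H \mid E) &= \frac{P(H)}{P(E)} = \frac{1/4}{3/4} = \frac{1}{3},\\[4pt]
P(H \mid E^{*}) &= \frac{P(E^{*} \mid H)\,P(H)}{P(E^{*})}
                 = \frac{1 \cdot 1/4}{1/4} = 1,\\[4pt]
\frac{P(H \mid E^{*})}{P(H)} &= \frac{P(E^{*} \mid H)}{P(E^{*})} < 1
  \iff P(E^{*} \mid H) < P(E^{*}).
\end{align*}
```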
This does not say that the revised formula is correct (or helpful), but it does confirm that it differs from a widely endorsed interpretation of the Standard Formulation of the process of updating. (In fact, the revised rule seems to make Jeffrey's special rule quite redundant, but I do not argue that case here.)

We are finally ready to begin our main critique of that formulation. As already indicated, we use variants of a simple urn experiment, designed to model the two-child family problem, but simplified and elaborated so as to emphasize the role of the observation-processes that provide the updating information.

3 The negative argument

3.1 The basic chance-process

Let us imagine an urn contains (say) eight coins,13 two with two heads on them, another two with two tails, and the remaining four with one tail and one head. Suppose a totally honest scrutineer ('Eve', we will call her) causes the state of these coins to change in a seemingly 'non-deterministic' manner, by making a random draw of one coin from the urn.

12 Before I arrive at work, I am 20% confident that both secretaries will be away (because of a severe epidemic of flu). After I arrive at work I find a note telling me that one of the secretaries is away. But in consequence of finding this note, my confidence that both secretaries are away drops dramatically, because by far the most plausible source for the note is the other secretary. Let: H be the sentence 'both secretaries are away'; E be the sentence 'at least one secretary is away'; and E* be the sentence 'I find a note telling me that at least one secretary is away'. Then H ⇒ E. But the truth of E* (i.e. observation of E = observation of a consequence of H) virtually guarantees that H is false. For another well-known example (due to Richmond Thomason), see Van Fraassen ([1984], p. 246).

13 At this stage in our analysis the 8 here is not important, and any multiple of 4 (including 4 itself) would serve equally well.
The 8 is chosen as the smallest number that will accommodate a variant of the problem introduced below.

Suppose an observer ('Adam') is interested in betting on the head/tail properties of the coin chosen. For such an observer, the draw (our 'evolutionary' chance-process, as articulated in Section 2.2 above) can be well modelled by choosing a 3-member outcome space Ω = {hh, ht, tt}, where 'hh' stands for the outcome-type description 'a two headed coin was chosen', and so on. Since the choice was declared to be random, the function p with values p({hh}) = p({tt}) = 1/4 & p({ht}) = 1/2, etc. will suffice to make the pair ⟨Ω, p⟩ a finite probability space (in the sense of Kolmogorov) that correctly models the chance-process. (I do not defend this claim, simply presuming the reader's familiarity with such matters.)

Let H be the set {hh}, representing a draw that produces two heads, and let us consider at length the way our observer Adam should update the probability he assigns to this two-head possibility as his epistemic circumstances change. If all Adam knows is that implicit in the specification of the initial draw, he should assign a prior probability to H of p({hh}) = 1/4.

Suppose Adam later participates in some observation-process, and thus receives additional information about the head/tail-state of the drawn coin, indeed that it has at least one head on it, i.e. that an outcome in the subset E = {hh, ht} has been achieved. (Note that p(E) = 3/4.) If it is also true that this constitutes the totality of his new knowledge, then the traditional formulation of the Rule of Conditionalization is paradigmatically applicable.
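The probability space just described is small enough to write out directly. The following is a minimal sketch, not part of the original argument: the names `URN`, `p`, `H` and `E` are illustrative choices, and the code simply counts coins to recover the prior and the value that the Standard Formulation would assign.

```python
from fractions import Fraction

# Urn from Section 3.1: two hh coins, two tt coins, four ht coins.
URN = {"hh": 2, "tt": 2, "ht": 4}
TOTAL = sum(URN.values())


def p(event):
    """Probability that a random draw yields an outcome-type in `event`."""
    return Fraction(sum(URN[outcome] for outcome in event), TOTAL)


H = {"hh"}          # the drawn coin has two heads
E = {"hh", "ht"}    # the drawn coin has at least one head

prior_H = p(H)
# Standard Formulation: P(H∩E)/P(E), using only the evolution-process.
standard_posterior = p(H & E) / p(E)

print(prior_H, p(E), standard_posterior)  # 1/4 3/4 1/3
```

Nothing in this calculation refers to how the head came to be observed, which is exactly the feature of the Standard Formulation under dispute.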
For we have all the prerequisites for application of that Rule in place: an evolutionary chance-process and an observational chance-process; an acceptable probability model for the evolution-process; and finally, the observation-process has revealed that an event represented by a subset of the outcome space has occurred. According to the Standard Rule, this is all we need to know to revise the probabilities the observer should assign to all other outcomes, and it tells us that the new probability to be assigned to H is unequivocally 1/3.14

14 For that Rule assigns a posterior probability given by the formula P(H∩E)/P(E), which now takes the values p({hh}∩{hh, ht})/p({hh, ht}) = p({hh})/(3/4) = (1/4)/(3/4) = 1/3.

For reasons that will be spelled out at length soon, I do not accept this answer; and I suspect that many readers (those who reject the analogous answers to the family-with-one-boy problem) already agree with me. For the moment however, that is not what matters. The important point is to accept that one is committed to this answer if one endorses a version of the Rule of Conditionalization like that articulated in the opening of this paper.

A defender of the Standard Rule could, it is true, deny this by insisting it is impossible to find out that the outcome is of type E without finding out something else. Indeed, every time one discovers something about the coin drawn, one will also learn something about oneself, e.g. that one's eyes have recently been open. So E will never represent the total evidence here, and the conditions for application of the Rule are not met. I do not have any compelling response to this version of the objection, except to remark that it obviously undermines many other versions of the Rule of Conditionalization. If this objection were taken too seriously,
it might well protect such rules from refutation, but at the dramatic price of protecting them from application. No sentence could ever be formulated to encapsulate the full evidence, and every case of conditioning would be confronted by Jeffrey's difficulty (noted in Section 2.3).

One must (I presume) deem that this particular piece of additional knowledge is irrelevant. But as indicated above, I have no useful account to offer of how we decide what is relevant and irrelevant, and hope that it is relatively easy to make this decision in individual cases. In the end, we will observe that relevance is not the big issue, so I am in no hurry to deal with this problem!

3.2 Elaboration of the chance-process

A far more serious version of the same objection observes that detection of an outcome of the evolution-process always brings with it information about some details of the observation-process. One cannot find out that the coin drawn has at least one head on it without understanding something about how one found out the coin possessed that head. That 'something' must be accommodated when assessing the total evidence, unless we deem it irrelevant, following the strategy above. But that exit is rightly closed to us here, since an insistence upon the relevance of the observation-process constitutes the germ of my case!

So the objection is a valid one. Yet its impact is slight. For it remains true that the conventional formulation of the Rule of Conditionalization does not force us to accommodate the source of our new knowledge. To see this, let us consider a slight modification of the above chance-process, one designed to leave the evolution-process unaltered in all essentials, but with details of the observation-process spelled out in advance.
In consequence (as my concrete example should make clear) the observer Adam's understanding of the observation-process is acquired separately from (and well before) the new knowledge about the outcome of the evolution-process, and hence cannot now get included in the 'total evidence'.

So let us now suppose that (in this revised process) our scrutineer Eve takes the coin behind a screen after making her draw. There she makes several decisions, via processes declared in advance and understood by Adam. After implementing these decisions, she emerges from the screen with the selected coin flat on a saucer but covered. She then shows the saucer to Adam, and removes the cover. His eyes then see the coin, quickly detecting whether it is a head or a tail that is on show. Prior to Adam's eyes seeing the coin, the probability of there being a two-headed coin on the saucer is 1/4. How should this probability be modified just after his eyes identify a head?

The standard version of the Rule of Conditionalization still applies to this revised chance-process, and it commits us to the same answer as before, 1/3. For an observer interested only in heads and tails can still model the elaborated process by exactly the same probability space as was used to model the simpler process, for the two evolution-processes are effectively identical: they end up with the same three possibilities for the head/tail-state of the coin drawn; and the prior probabilities of these states are quite unchanged.

In this case, however, it is implausible to insist that additional knowledge of relevance was acquired at the time the head was recognized: for Adam possessed a full understanding of how he was going to find out about the coin before his eyes made the identification.
The only thing he ascertains as a result of identifying the coin face is the fact that the coin has at least one head on it, i.e. that the outcome is in the subset {hh, ht}. This knowledge now constitutes the total new evidence of relevance. So the Standard Formulation of the Rule does indeed apply, and it leads to the posterior probability I reject, namely 1/3.

3.3 Refuting the Standard Formulation

The reason that this posterior probability 1/3 is unsatisfactory is that the probability here obviously depends on what decisions the scrutineer Eve implements behind the screen.15 If she had agreed, for example, to show a tail whenever possible, then she will only show a head when the coin has two heads, and Adam (we have agreed) will know this. So the posterior probability to be allocated by a wise observer to the coin's having two heads when one head has been revealed is 1. Two heads now becomes a certainty after seeing one head. Not only does this clearly correct answer differ from that provided by the Standard Formula,16 but more information has been used in its evaluation than the Standard Formulation deems to be required: more has been used than just the values of P(X∩Y) and P(Y). Indeed, knowledge restricted to the outcome-probabilities of the evolution-process (the simple selection of the coin) does not here enable Adam to update the probabilities. He cannot reach the answer of the last paragraph using such information alone, for his understanding of the observation-process was clearly vital there. (This is why I earlier described the Standard Formulation as pretending to use too little data, and hence as 'stronger' than the process I articulate.)

15 This fact is apparently well recognised by many statisticians, though it seems oddly obscured in their discussions. See e.g. Grimmett and Stirzaker ([1993], p. 24, problem 26), but note that the phenomenon is hidden in a problem, while an index search for the critical word 'protocol' leads to no exposition of note. For some other examples of the recognition, see the citations in fn. 8 above.

16 From oral discussion it has become clear that this claim is controversial. Yet those who dispute it are often unconsciously correcting the Standard Formulation in exactly the direction I recommend. The critical issue is what one takes as the denominator to evaluate the posterior probability: the prior probability of getting at least one head, or the prior probability that one will find out that one has got a head. The formula erroneously takes the former option, for it says that one uses P(Y), the probability of getting (in the evolution-process, the one generating the priors) the discovered outcome Y. This Y is what the observer finds out, so P(Y) is not the probability of discovering Y. An event might have a high probability of taking place, but discovery of the event might be very unlikely.

The answer provided by the Standard Formulation (viz. 1/3) does, however, apply when an appropriate observation-process is involved. If indeed Eve agrees to show a head whenever possible, the wise observer who understands that this is how she is deciding should now allocate 1/3 as the posterior probability. (For a head will then be revealed after 3/4 of all draws in a long run; but two heads will be present in 1/4 of draws, all of them included within the 3/4 where the head is revealed. So the two-heads cases will constitute 1/3 of the cases where a head is revealed.) Yet if Eve simply tosses the coin, and reveals the face which comes down uppermost, the situation is more complicated, for the bias on the coins becomes important, even though that (obviously!) had no bearing on the outcome probabilities of the evolution-process, the draw from the urn.
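The dependence of the posterior on Eve's protocol can be checked with a few lines of exact arithmetic. What follows is a sketch under stated assumptions rather than anything given in the text: the function names and the dictionary encoding of a protocol (the probability that a head ends up on show for each coin type) are illustrative, and the 3/4 bias in the last call is a purely hypothetical value.

```python
from fractions import Fraction

# Prior probabilities of the three coin types in the urn draw (Section 3.1).
PRIOR = {"hh": Fraction(1, 4), "ht": Fraction(1, 2), "tt": Fraction(1, 4)}


def posterior_two_heads(head_shown_given):
    """Return P(hh | a head is shown).

    head_shown_given[c] is the probability that the protocol ends with
    a head on show when the drawn coin is of type c.
    """
    p_head_shown = sum(PRIOR[c] * head_shown_given[c] for c in PRIOR)
    p_hh_and_head_shown = PRIOR["hh"] * head_shown_given["hh"]
    return p_hh_and_head_shown / p_head_shown


def toss_protocol(ht_lands_heads):
    """Eve tosses the coin; only the ht coin's bias affects what is shown."""
    return {"hh": Fraction(1), "ht": Fraction(ht_lands_heads), "tt": Fraction(0)}


tail_whenever_possible = {"hh": Fraction(1), "ht": Fraction(0), "tt": Fraction(0)}
head_whenever_possible = {"hh": Fraction(1), "ht": Fraction(1), "tt": Fraction(0)}

print(posterior_two_heads(tail_whenever_possible))         # 1
print(posterior_two_heads(head_whenever_possible))         # 1/3
print(posterior_two_heads(toss_protocol(Fraction(1, 2))))  # 1/2
print(posterior_two_heads(toss_protocol(Fraction(3, 4))))  # 2/5 (hypothetical bias)
```

The prior over coin types is identical in all four calls; only the description of the observation-process changes, and the posterior changes with it.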
A fortiori, the bias cannot be reflected in any conditional probabilities calculated using the Standard Formula. But it can be accommodated in more elaborate calculations that attend to the observation-process. Indeed, if the coins are known to be unbiased, the posterior probability (for an observer who knows Eve makes her decision by tossing the coin) is easily evaluated as 1/2. But if the coins are biased (and perhaps differently biased), a whole range of posterior probabilities can be provided, according to the degrees of bias. If the bias is unknown, I do not know what to do (but see Section 4.3 below). Yet if the observer knows the bias, the calculation is straightforward: he uses knowledge of the bias to calculate how often a head will be seen in a long run; he then calculates in what proportion of those cases two heads will in fact be on the coin, and takes that proportion to be the posterior probability sought.

So after the outcome Y of some evolution-process has been detected, the posterior probability to be allocated an outcome X depends on more than X, Y, and the prior probabilities of the outcomes of that process. It varies with the epistemic procedure as well. That establishes sub-thesis (e) in my summary above, and constitutes the core of my case. It also explains the title of my essay: for as the table below indicates in summary, conditional probabilities are conditional upon the protocols governing the release of information.

Table 1.

Protocol Used by Honest Scrutineer to Reveal at Least ONE Head on Coin Drawn | Probability to be Accorded to TWO Heads, After Discovering at Least ONE Head
Reveal head only if impossible to reveal tail | 100%
Toss (unbiased) coin, then reveal upper face | 50%
Reveal head whenever possible | 33.3%
Reveal head using complex protocol in Section 4.7 | 20%
Reveal head only if coin has head and tail | 0%

4 The positive supplement

4.1 Correcting the Standard Formulation

The Standard Formulation of the Rule of Conditionalization claims that after ascertaining that an outcome of type Y has been achieved, one can calculate the posterior probability to be accorded to some other type X, using nothing beyond the range of prior probabilities of all outcomes of the evolution-process. The example just given shows that this is not true, yet makes it fairly obvious how to correct the formula more generally. Clearly, one cannot update the probability without access to the probabilities attached to the observation-processes that provide the updating information about the outcome achieved in the evolution-process under observation. Indeed, the two critical probabilities are:

(a) the probability that one will receive the evidence in question, P(Y*) as I have called it above; and
(b) the probability that BOTH some outcome of type X will have been achieved in the process, AND the updating information will have been received by the observer, i.e. P(X∩Y*).

The posterior probability to be assigned to X is then just the ratio of these two quantities, P(X∩Y*)/P(Y*).17 This is the replacement formula urged in the opening of the paper.

17 Exactly why this revised formula is true is hard to tell, and the problem of defending it is just a variant of the famous problem of justifying the conventional formulation. See e.g. Howson and Urbach ([1993], pp. 99ff); Maher ([1993], pp. 120ff). A frequency analysis is clear and works well in situations (like my urn draws) where frequencies serve as a reliable guide to probabilities. I do not attempt a more elaborate justification, as I doubt that will be an issue for many readers. They will be unlikely to dispute the rule, merely claiming that it is what is already intended by the Standard Formulation. Hence the detour to confront this question in Section 2.3.

4.2 When does the Standard Formulation work?

Despite this correction, there remain circumstances in which the Standard Formulation gives an acceptable answer, and it is worth briefly noting some examples of these. The obvious sufficient (but certainly not necessary: see below) condition for this is that the observation-process be such as will ensure that both P(Y) = P(Y*) and P(X∩Y) = P(X∩Y*). These two requirements furthermore will be met (trivially) whenever the observation-process is such as might well be called 'persistent', meaning that it never fails to inform of the truth of Y if Y is indeed true. So persistence is a sufficient condition for the Standard Formulation, granted, of course, that the observation is also 'reliable' (in the sense of never indicating Y to be true when it is not). Such 'reliability' is implied by the presumption we have been making (right from the beginning of our discussion) that our observer is acquiring knowledge, but persistence (in my technical sense) is an additional requirement, and it is routine for epistemic processes not to be persistent. Many things are true without our finding out that they are true!

If we think of the observation-process as a test (for Y-ness, of course, not X-ness!), then this condition for the accuracy of the Standard Formula is that often expressed by saying that the test produce neither false negatives (my 'persistence') nor false positives ('reliability'). Under such conditions, the Standard Formulation will clearly provide the right answer. So in the urn draw example, we agreed with the answer 1/3, if it was true that the scrutineer had agreed to reveal a head whenever possible, i.e. 'persistently'. And in the two-boy family problem,
the updated probability 1/3 seems correct, if the observer knows the family was randomly chosen, and that he has found out that it has at least one boy in it by some persistent procedure, one revealing the existence of a boy whenever a boy is there. Being told that there is a boy in the family only when a girl cannot be revealed is not such a procedure. That is why the updated probability in these circumstances (viz. 1) is not the 1/3 provided by the formula.

These facts have a further consequence of importance, for they show that the principle of total relevant evidence (as embedded in the Standard Formulation of the Rule of Conditionalization) is false: new knowledge does not have to be restricted to that allowed by this principle; and imposition of such a restriction can generate inaccurate updatings. For we have seen that one can confidently assign the posterior probability P(X|Y) to X, after ascertaining Y, if one has additionally ascertained that the information about Y was provided by a persistent and reliable procedure, and if (as well) nothing else of relevance has been ascertained. There is clearly no necessity that the knowledge of the epistemic procedure here be acquired in advance of the knowledge of Y, as the timing of its arrival is not vital. So it is quite reasonable for this knowledge about the procedure to arrive slightly after the knowledge about the outcome of the evolution-process. It would then be part of the new knowledge that creates the need to update probabilities.

New knowledge does not then have to be as restricted as the principle of total evidence often declares.
To assign the posterior probability P(X|Y) to each outcome of type X, after ascertaining that an outcome of type Y has been achieved, it is not necessary for the knowledge that Y has been achieved to represent the totality of relevant new knowledge acquired. Highly relevant knowledge (knowledge that the epistemic process was persistent) can also be acquired without jeopardizing the assigned posterior probability.

To relax the rule of total evidence to accommodate this fact is, however, relatively easy, and involves a mere rewrite of our earlier Rule of Conditionalization. Doing this then gives us a restricted version of the Standard Formulation, one that now seems to give acceptable posterior probabilities, albeit within its limited scope. Using italics to indicate the modification, that revision reads:

Suppose circumstances are such as enable the reasonable observer of some chance-process to place confidence P(X) in the outcome being of type X. THEN P(X∩Y)/P(Y) represents the confidence that observer should place in the outcome being of type X after the circumstances of the observer change, through addition of just TWO relevant pieces of knowledge, viz. that some outcome of type Y has in fact been achieved, AND that this indication of Y-ness was provided by a 'persistent' epistemic process.

Conversely, if newly arrived knowledge is more restricted (in accordance with the standard version of the principle of total evidence) to knowledge of the outcome-type Y alone, one cannot confidently update the probabilities of the other outcomes. For if one does not already know the procedures, it is not clear how to update the probabilities. It does remain clear, however, that one cannot trust the unadjusted formulation of the Rule of Conditionalization.

But persistence is a rather strong requirement, and incompatible with most everyday epistemic activity (though quite acceptable in the careful model of conditioning that I set out in Section 4.5). Reliability is (perhaps?)
more common, and when information is received reliably, the whole discussion above can be significantly broadened. To illustrate this, I change temporarily to the language of 'hypothesis' and 'evidence', as this seems to make our new situation come alive. For the Standard Rule also gives a correct updating when the truth of some hypothesis H makes no difference to the transmission (as opposed to generation) of a piece of evidence. For P(H|E) then has the same value as P(H|E*), as can quickly be deduced from our assumption that P(E*|E) = P(E*|H∩E), since reliability ensures that both P(E*∩E) = P(E*) and P(E*∩H) = P(E*∩E∩H). But we still need to know these facts about our epistemic activity to apply that Rule, so again the principle of total evidence has to be relaxed. To update successfully, we need to have grasped our epistemic history.

4.3 Updating in ignorance of the epistemic procedure

To see this last point vividly, we need only consider a variant of the chance-process last set out, where the scrutineer Eve remains honest (so that all information released is 'reliable' in the sense just noted), but where the observer Adam is now denied knowledge of the decision procedures used by Eve, and where, furthermore, she is not obliged to release information after every draw. Every time his eyes identify a head, the observer Adam will still know with certainty that the coin drawn possesses at least one head, so continues to acquire knowledge about the outcome of the evolution-process, as presumed in the unadjusted Standard Formulation. But he will not now be in a position to assess the weight of his evidence. Eve may, for instance, be refusing to release information when there are two heads on the coin drawn, so that it is impossible for there to be two heads if one head is seen. Conversely, Eve may be doing the exact opposite,
showing the head only when there are two heads on the coin: the coin then has to have two heads if the observer sees one head. But Adam cannot distinguish these two diametrically opposed situations. He is in consequence dramatically hampered in allocating posterior probabilities, and can have no confidence in the answer provided by the Standard Formulation.

Adam is not completely incapacitated by these facts, however, and can certainly make guesses based on all sorts of evidence about how he believes Eve is making her decisions, and hence estimate the critical prior probabilities, α [= P(Y*), the probability of his having acquired the knowledge that an outcome of type Y has occurred] and β [= P(X∩Y*), the probability of BOTH his having acquired this knowledge AND an outcome of type X having occurred]. But the posterior probability he will thus assign to two heads (viz. β/α) is no more reliable than these guesses, and can be badly discordant with the reality of the situation. I.e. if the scrutineer Eve is choosing by a procedure totally different to that the observer Adam believes she is using, the long-term frequency with which two heads occurs among situations where the single head is revealed can be dramatically different to the probability the observer subjectively holds. Just how entitled he is to hold this probability in such circumstances is a question I do not wish to pronounce upon;18 but we must certainly recognize that the probability will fail as a guide to safe betting ratios. It is (in some serious sense) a defective estimate of the posterior probability. To update probabilities safely, one needs access to something beyond that allowed by the unadjusted rule of total evidence: the probabilities governing the release of information.

18 A subjectivist may insist that since probability represents a degree of belief, the observer has no choice but to hold the 'unsound' probability. But such a subjectivist will have to deny the standard Kolmogorov axiomatization, which attributes a probability of 1 to every tautology. For there are plenty of tautologies so complicated that some observers will not believe they are tautologies. I do not know how to resolve this dilemma. The positive case I present here seems as strong (and weak) as the claim that a tautology has to be given the probability 1. My negative case is far stronger, for it is clear the Standard Formulation does worse. It will not even allow the observer to hold the conditional probability allocated in the light of his suspicions as to how information is being provided. It insists uniformly on the posterior probability associated with a persistent observation-process.

4.4 Updating upon receipt of irrelevant or incorrect information

The whole of the discussion above has emphasized the importance of distinguishing between Y and Y*, Y being the knowledge provided to the observer by the action Y*. We have observed as well that posterior probabilities are directly dependent on the probabilities associated with Y* rather than Y. In consequence it can readily happen that it is wise to update probabilities in situations where the Y is irrelevant, so long as the associated Y* is not irrelevant. What the observer finds out may well be of no interest; but the fact that he 'hears about it' can be highly significant. Indeed, it can also be wise to update probabilities upon receipt of information Y that is badly wrong (whether suspected to be wrong or not); and it is possible to perform this updating rationally if one knows enough about the process that provided the dubious information. The principles involved in supporting these claims are quite straightforward, and an example should make them clear.
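The dependence of the posterior on the release procedure, as described in Section 4.3, can be made concrete with a small exact calculation. The sketch below is illustrative only. It assumes an urn of eight coins (two double-headed, two double-tailed, four ordinary), chosen so that the prior for two heads is 1/4; the names URN, posterior_two_heads and the three policy tables are hypothetical labels introduced here. Each policy gives, for each kind of coin, the probability that Eve shows Adam a head, and the function returns the ratio β/α.

```python
from fractions import Fraction

# Assumed urn (an illustration, not taken from the text verbatim):
# two double-headed, four ordinary and two double-tailed coins.
URN = ["HH", "HH", "HT", "HT", "HT", "HT", "TT", "TT"]


def posterior_two_heads(release_prob):
    """Return beta/alpha for the event 'Adam is shown a head'.

    release_prob maps a coin type to the probability that Eve shows a
    head when that coin is drawn. alpha = P(Y*), beta = P(X and Y*),
    where X is 'the coin has two heads'.
    """
    draw = Fraction(1, len(URN))  # each coin is equally likely
    alpha = sum(draw * release_prob[c] for c in URN)
    beta = sum(draw * release_prob[c] for c in URN if c == "HH")
    return beta / alpha


# Persistent policy: a head is shown whenever the coin has one.
PERSISTENT = {"HH": 1, "HT": 1, "TT": 0}
# Eve withholds information whenever the coin has two heads.
WITHHOLD_ON_TWO_HEADS = {"HH": 0, "HT": 1, "TT": 0}
# Eve shows a head only when the coin has two heads.
ONLY_ON_TWO_HEADS = {"HH": 1, "HT": 0, "TT": 0}

for name, policy in [("persistent", PERSISTENT),
                     ("withhold on two heads", WITHHOLD_ON_TWO_HEADS),
                     ("only on two heads", ONLY_ON_TWO_HEADS)]:
    print(name, posterior_two_heads(policy))
```

Under these assumptions only the persistent policy reproduces the value given by the Standard Formulation; the other two policies are equally 'reliable' yet push the posterior to its two extremes, which is why Adam's answer is only as good as his guess about Eve's procedure.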
We peruse indeed a further adjustment of the basic urn draw above, attending now to the chemical composition of our eight coins. Suppose that four of the coins happen to be made of bronze (with one bronze coin having two heads; one having two tails; and two having one head and one tail). Suppose also that the remaining four coins are made of nickel. Our observer Adam is to remain interested in betting on the head/tail properties of a randomly drawn coin, and has no interest in the coins' chemical compositions.

It is (I take it) obvious that the composition of the coin drawn has no bearing at all on its head/tail-state: the head/tail-state of the coin drawn is 'statistically independent' (to use some common jargon) of its chemical-state. The probability of the selected coin having two heads on it is unaffected by the composition of the coin, and if one finds out that the coin is made of bronze one has found out something that would normally be deemed irrelevant to its head/tail-state. But this does not mean that one should avoid revising the probability of there being two heads on the coin after receiving the information that the coin is bronze.19

19 I.e. two outcomes can be statistically independent in the ontic sense, yet epistemically dependent. The persistent blurring of this distinction in the literature (see fn. 9 above) is simply a variant version of the problem attacked in the present paper via the Rule of Conditionalization.

For again, we must look at the observation-process, the rules governing Eve's releasing information. It could happen, for instance, that the scrutineer has promised to release the information that the coin is bronze only when it BOTH has two heads on it AND is bronze. Otherwise, she will release no information about the chemical composition of the coin.
Instead she will then indicate that the coin has a tail on it (if the coin does not have two heads), or that it has a head on it (if it has two heads but is not bronze). Under such conditions, the rational observer who knows in advance that this is how information is going to be released will assign a posterior probability of 1 to the coin's having two heads after discovery that it is bronze. Obviously, such a posterior probability is again provided by the formula I recommend, P(X∩Y*)/P(Y*) (for X∩Y* = Y* here). Yet the traditional formulation leads to no change in the probability after discovery that the coin is bronze.20

It should also be obvious now that Eve's honesty is not an issue either. It is her practices that matter, not the words she uses to describe those practices, and discord between the words and the actions is of no interest to any observer who understands the actions.21 Suppose indeed that Eve claims to release information as outlined in the last paragraph, but in fact does something different, revealing a bronze composition only when there is a head and a tail. The observer who knows this should simply ignore what Eve claims to be doing. So if bronze is revealed to him, he will assign a posterior probability of 0 to the coin's possessing two heads. And again, such a posterior probability is recommended by the formula I propose, for now P(X∩Y*)/P(Y*) = 0 since in these circumstances P(X∩Y*) = 0. Suppose, indeed, that the scrutineer were to make her 'dishonesty' patent, by declaring a familiar falsehood (e.g. that 2 plus 2 is 5) every time there were two heads, but revealed a chemical composition in all other circumstances. The observer who understands her actions is still able to update the probability of two heads, despite the gross irrelevance of her false claims.
What matters to him is not the overt information declared in the scrutineer's message, but what might be called the 'latent' information, what one can infer from the fact that the claim has been made.

20 The probability of getting two heads on a bronze coin is 1/8; the probability of getting bronze is 1/2. The conditional probability of getting two heads given a bronze coin is just the ratio of these two probabilities, i.e. 1/4. This is the same as the prior probability of getting two heads.

21 An observer who does not understand the actions may of course use the words as a guide to the scrutineer's actions, and then honesty, etc. does become important.

4.5 A revised model and formula

Given then that the overt content of the message received by the observer is irrelevant to the transformation of prior probabilities into posterior ones, it seems a good idea to reconstrue our paradigm cases of conditioning, by imagining that the messages received are particularly bland. This reduces the risk of our being side-tracked by red herrings. So let us suppose that the information provided to the observer (in the observation process) takes an extremely simple form, say the illumination of a lamp, that comes on in one of two colours, green or red. This is intended to provide us with a relatively abstract model of a typical 'test', to which there are three standard responses: 'still waiting to hear' (= no light); 'the test was positive' (= green light); 'the test was negative' (= red light).

More fully, I suggest the following type-specimen of the problem of conditioning. A chance-process takes place. A rational observer of the process does not witness its actual outcome, but does attribute probability P(X) to each possible outcome X. After the process has completed,
the observer sees a light come on and display the colour green. He receives no other signals of significance. Should he revise the probability he assigns to each outcome X after seeing the green, and if so, how?

Before answering this question, we should briefly note that no generality is lost here if we presume the observer is behaving in such a way as to recognize the green light every time it comes on, i.e. that the information process is persistent in the sense described above.22 Realizing this enables us to simplify greatly the discussion below, since it means that there is no need to distinguish between epistemic and ontic conditionals. For the sentence 'the green light is on' is true in exactly the same circumstances as the sentence 'the observer realizes that a green light is being displayed'.

22 For let us suppose this were not so, that the observer was watching a light that shows either blue or yellow, and that he only saw the yellow intermittently. To conditionalize on the non-persistent yellow signal is, however, the same as to conditionalize on an imaginary but persistent green signal that comes on whenever the observer recognizes yellow. So once we invoke the abstraction here of using a simple coloured-light-signal as a standardized representation of the conditionalizing information, there is no need to allow for non-persistent signals. Each of them can be replaced by a persistent one. In the language of tests, we are presuming that the observer always finds out the result of the test, on the grounds that a test whose results only reach the observer intermittently can be modelled by a different test, one whose results always reach the observer. The modelling test is that variant test which is positive if BOTH the modelled test is positive AND the observer finds this out, and negative otherwise. We are not presuming here that the modelled test itself never produces false negatives. That would require us to make presumptions about more than just how often the green light is seen. We would need to presume that the green light came on every time the property it was intended to reveal was present. We cannot judge whether a test produces false negatives (or positives) without knowing the purpose of the test. It is not a property of the test simpliciter.

It is, of course, obvious that there are circumstances in which the observer should update probabilities after seeing the green signal. It is much less obvious whether there is any general algorithm that governs this updating. A typical situation in which it is desirable to update is when there is a causal link between the unwitnessed outcome of the evolution-process and the colour of the light displayed. The critical factor is the degree of correlation (however generated) between each outcome X of the evolution-process and the display of a green light, but to say that is to do little more than rephrase the question asked. But at least that rephrasing enables us to focus on what matters, and it is easy to analyse these in seemingly unproblematic frequentist terms. How often does the green light come on? In how many of those cases is X true? The updated probability we seek is surely (but see fn. 17) just the ratio of these two quantities, i.e. P(X is true & light is green)/P(light is green).

4.6 Abbreviating the revised model

This 'answer' can be expressed rather differently. We can stop thinking of conditioning as involving a pair of chance-processes (one involving the temporal evolution of the state of the primary system, that under observation, and the other involving the temporal evolution of the state of our signal-light).
And we can start thinking in terms of a single, but two-stage, chance-process, a single non-deterministic evolution occurring within a composite physical system. The new evolving system is that composed of the primary physical system under observation plus the signal-light. The two-stage evolution of the composite system produces outcomes which are pairs of the form (X, S), meaning that an outcome of the type X occurred in the primary system, and this was followed by the green signal. It would be natural (and formally possible) to refer to this pair using a notation like X∩S; and similarly to use S to refer to the composite outcome (Ω, S), where Ω is the set of all outcomes of the primary system. The symbol S then describes the event 'the signal light came on', irrespective of what happened in the primary system.

Once that is done, the formula for the posterior probability simplifies. Now invoking nothing but probabilities of outcomes of a single (albeit composite) chance-process, it declares the posterior probability to be the familiar ontic conditional P(X|S) = P(X∩S)/P(S). Of course, this looks just like the formula in the version of the Rule of Conditionalization that has been the target of this whole discussion. But the similarity is merely superficial, for the symbols have quite different meanings. The big difference is that my formula incorporates the probabilities associated with the observer's receipt of the information, while the Standard Formulation ignores this. The latter's denominator P(Y) is simply the probability of the outcome Y being achieved in the primary sub-process within our composite. It makes no essential reference to the means by which information about that outcome is transmitted. The denominator in my formula by contrast makes no reference to the outcomes of the initial process. It refers exclusively to the behaviour of the signalling process.
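The abbreviated model can be sketched as a short computation over (X, S) pairs, using the bronze-coin variant of Section 4.4 as data. This is a minimal illustration under stated assumptions, not a general implementation. The nickel coins are assumed to have the same head/tail mix as the bronze ones (as the independence claim requires), the 'none' signal in the deviant rule is an assumption about what Eve does when she reveals nothing, and the names COINS, composite, conditional and the three rule functions are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Primary system: eight equally likely coins, described as (faces, metal).
# The nickel half is assumed to mirror the bronze half.
COINS = [("HH", "bronze"), ("TT", "bronze"), ("HT", "bronze"), ("HT", "bronze"),
         ("HH", "nickel"), ("TT", "nickel"), ("HT", "nickel"), ("HT", "nickel")]


def composite(signal_rule):
    """Joint distribution over pairs (primary outcome, signal).

    signal_rule(coin) returns a dict mapping each possible signal to the
    probability that it is displayed when that coin has been drawn.
    """
    joint = defaultdict(Fraction)
    for coin in COINS:
        for signal, p in signal_rule(coin).items():
            joint[(coin, signal)] += Fraction(1, len(COINS)) * p
    return joint


def conditional(joint, is_x, signal):
    """P(X|S) = P(X and S)/P(S), computed inside the composite process."""
    p_s = sum(p for (coin, s), p in joint.items() if s == signal)
    p_xs = sum(p for (coin, s), p in joint.items() if s == signal and is_x(coin))
    return p_xs / p_s


def announce_metal(coin):
    # Persistent signalling: the metal is always announced.
    return {coin[1]: 1}


def promised_rule(coin):
    # 'bronze' is announced only for a double-headed bronze coin.
    faces, metal = coin
    if faces == "HH" and metal == "bronze":
        return {"bronze": 1}
    return {"tail": 1} if faces != "HH" else {"head": 1}


def deviant_rule(coin):
    # 'bronze' is announced only for a bronze coin with a head and a tail.
    faces, metal = coin
    if faces == "HT" and metal == "bronze":
        return {"bronze": 1}
    return {"none": 1}  # assumed: no signal otherwise


def two_heads(coin):
    return coin[0] == "HH"


for rule in (announce_metal, promised_rule, deviant_rule):
    print(rule.__name__, conditional(composite(rule), two_heads, "bronze"))
```

On these assumptions the same overt message, 'bronze', yields three different posteriors for two heads, because the denominator is fixed by the signalling rule rather than by the metal of the coin.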
4.7 An illustrative example

The discussion of the last two sections was inconveniently abstract, and it is a good idea to see how it applies to concrete situations. So let us elaborate our discussion: by (firstly, in the present section) observing how the signal-light analysis avoids the defective answer (*) to the two-boy family problem; and (secondly, in the next section) by using the abbreviated model alone to systematically evaluate one of the puzzling probabilities associated with a traditional paradox.

The observational-process we imagine in our first illustration is going to be convoluted, perhaps inconveniently so, but there are several good reasons for tolerating the inconvenience: firstly, because the convolutions model the fact that information-flow is often complex, with information sometimes withheld, and sometimes released, yet for different reasons in different circumstances, some of them erratic, and some of them systematic or deliberate; and secondly, to demonstrate the ease with which the 'signal-light modelling' can handle the computation. Artificial simplicity would obscure that fact.

So let us suppose that, after selecting the two-child family, a fair coin is tossed by the scrutineer. If it comes down heads, the scrutineer inspects the family. If it comes down tails, nothing else happens and no light comes on. When the inspection is carried out, the scrutineer tosses the coin again if the family is found to contain two boys, and turns the green light on if she then gets a head. If, however, the family contains only one boy, she also turns on the green light. If, finally, the inspected family contains no boys, she turns the red light on.

Prior to seeing the signal-light, a rational observer would assign a probability of 1/4 to there being two boys in the family. Suppose this observer sees the green light,
but gets no other relevant information, beyond the details above as to how the light is operated. What updated probability should he then assign to there being two boys in the family?

To answer this question now, we only need to ascertain two things. The first of them is what I have designated P(S) in Section 4.6 above, the probability the green signal-light comes on. This is just ~-