Joyce’s Argument for Probabilism

Patrick Maher (p-maher@uiuc.edu)
Department of Philosophy, University of Illinois at Urbana-Champaign

Abstract. James Joyce’s ‘Nonpragmatic Vindication of Probabilism’ gives a new argument for the conclusion that a person’s credences ought to satisfy the laws of probability. The premises of Joyce’s argument include six axioms about what counts as an adequate measure of the distance of a credence function from the truth. This paper shows that (a) Joyce’s argument for one of these axioms is invalid, (b) his argument for another axiom has a false premise, (c) neither axiom is plausible, and (d) without these implausible axioms Joyce’s vindication of probabilism fails.

1. Introduction

James Joyce (1998) offered a new argument for the conclusion that a person’s credences ought to satisfy the laws of probability. The argument assumes that it makes sense to talk about the distance that a person’s credence function is from the truth in any possible world. Joyce also states six axioms that, he says, any adequate measure of this distance must satisfy. Joyce’s “Main Theorem” is that these axioms imply that if a person’s credence function b does not satisfy the laws of probability then there exists another credence function b∗ which does satisfy the laws of probability and which is closer to the truth in every possible world. Joyce assumes that an epistemically rational person strives to have credences that are as close to the truth as possible and so concludes that an epistemically rational person’s credences will satisfy the laws of probability.

Joyce’s Main Theorem assumes that the set of propositions is countable. However, the case in which the set of propositions is countably infinite raises mathematical issues that I judge not worth pursuing here, so in this article I will consider only the special case in which the set of propositions is finite.
In that special case, at least, Joyce’s Main Theorem is correct.¹ But there remains the question of whether his axioms are true. In Section 3 I show that Joyce’s argument for one of his axioms is invalid. In Section 4 I show that his argument for another axiom rests on a false premise. In Section 5 I argue that both these axioms are implausible. In Section 6 I show that without the implausible axioms Joyce’s vindication of probabilism fails.

¹ There are a few small slips in Joyce’s proof but they can be corrected. For example, Joyce’s proposition II (p. 598) does not follow from the axiom Structure that he cites but it does follow from Normality. Also, his Lemma-3 (p. 599) is false if c = m, as is possible, but it is easy to show that the Main Theorem does hold if c = m.

jafp.tex; 13/02/2001; 13:23; p.1

2. Concepts and notation

I will use the following concepts and notation. Except for ‘n’, ‘X → Y’, and ‘𝓘’, this notation is Joyce’s.

Ω: A finite algebra of propositions. (Joyce allows Ω to be countable but I am here considering only the special case in which it is finite.)

n: The number of elements in Ω.

ℝ: The set of real numbers.

X → Y: The set of all functions from X to Y.

B: The set Ω → ℝ. Joyce refers to elements of B as “credence functions.”

V: The set of functions in Ω → {0, 1} that correspond to consistent truth value assignments to the elements of Ω (with 1 representing truth and 0 falsity). Elements of V can be thought of as possible worlds. Note that V ⊂ B.

𝓘: The set of all I ∈ B × V → ℝ which are such that, for any b ∈ B and ω ∈ V, I(b, ω) is a rationally permissible measure of the inaccuracy of b in the possible world ω.

In the preceding definitions I said B and V are subsets of Ω → ℝ. Here I was following what Joyce seems to be saying in the following passage (pp.
590f.):

    B is the family of all credence functions defined on a countable Boolean algebra of propositions Ω and V is the subset of B containing all consistent truth-value assignments to members of Ω.

However, there are other places (p. 583 for example) where Joyce says or implies that B and V are subsets of ℝ^n. Let X_1, …, X_n be some enumeration of the elements of Ω. Then we can identify any b ∈ Ω → ℝ with the point (b(X_1), …, b(X_n)) ∈ ℝ^n. Although Joyce does not mention it, I take it that he is assuming such an identification of Ω → ℝ with ℝ^n. For simplicity, and to make my treatment correspond to Joyce’s, I will likewise assume this identification in what follows.

3. Weak Convexity

The axiom that Joyce (p. 596) calls “Weak Convexity” may be stated as follows:

Weak Convexity: If b, b∗ ∈ B, ω ∈ V, I ∈ 𝓘, I(b, ω) = I(b∗, ω), and m = ½b + ½b∗ then I(b, ω) ≥ I(m, ω) with identity only if b = b∗.

Here is Joyce’s argument for Weak Convexity quoted in full (pp. 596f., with two typographic errors corrected):

    To see why Weak Convexity is a reasonable constraint on gradational inaccuracy notice that in moving from b to m an agent would alter each degree of belief b(X) by adding an increment of k(X) = ½[b∗(X) − b(X)]. She would add the same increment of k(X) to each m(X) in moving from m to b∗. To put it in geometric terms, the “vector” k that she must add to b to get m is the same as the vector she must add to m to get b∗. Furthermore, since b∗ = b + 2k the change in belief involved in going from b to b∗ has the same direction but a doubly greater magnitude than the change involved in going from b to m.
    This means that the former change is more extreme than the latter in the sense that, for every proposition X, both changes alter the agent’s degree of belief for X in the same direction, either by moving it closer to one or closer to zero, but the b to b∗ change will always move b(X) twice as far as the b to m change moves it. Weak Convexity is motivated by the intuition that extremism in the pursuit of accuracy is no virtue. It says that if a certain change in a person’s degrees of belief does not improve accuracy then a more radical change in the same direction and of the same magnitude should not improve accuracy either. Indeed, this is just what the principle says. If it did not hold, one could have absurdities like this: “I raised my confidence levels in X and Y and my beliefs became less accurate overall, so I raised my confidence levels in X and Y again, by exactly the same amounts, and the initial accuracy was restored.”

Joyce is here claiming that Weak Convexity follows from a premise that could be stated formally thus:

Premise 1: If b, k ∈ B, ω ∈ V, I ∈ 𝓘, and I(b + k, ω) ≥ I(b, ω) then I(b + 2k, ω) ≥ I(b, ω).

Define ρ ∈ B × V → ℝ by the condition

\[ \rho(b, \omega) = \sum_{X \in \Omega} |b(X) - \omega(X)|. \]

A proof of the following theorem is given in Section 8.1.

Theorem 1: The assumption that ρ ∈ 𝓘 is consistent with Premise 1 but not with Weak Convexity.

This theorem shows that Premise 1 does not entail Weak Convexity and so Joyce’s argument for Weak Convexity is invalid.

4. Symmetry

The axiom that Joyce (p. 596) calls “Symmetry” may be stated as follows:

Symmetry: If I ∈ 𝓘 and I(b, ω) = I(b∗, ω) then, for any λ ∈ [0, 1], I(λb + (1 − λ)b∗, ω) = I((1 − λ)b + λb∗, ω).

Joyce offers a “rationale” or argument for Symmetry. As he presents it, this argument depends on Weak Convexity and we have just seen that Joyce has failed to justify that.
However, Joyce’s essential point here seems not to depend on Weak Convexity, so I shall try to extract this essential point from his presentation. Speaking of a particular kind of violation of Symmetry, Joyce writes (p. 597):

    Given the initial symmetry of the situation this would amount to an unmotivated bias in favor of one set of beliefs or the other.

I think he means that, in violating Symmetry, there would be a “bias” in favor of either b or b∗ and this bias would be “unmotivated” because of the symmetry involved in the fact that I(b, ω) = I(b∗, ω). A little later on the same page Joyce writes:

    [Symmetry] says that when b and b∗ are equally accurate there can be no grounds, based on considerations of accuracy alone, for preferring a “compromise” that favors b to a symmetrical compromise that favors b∗.

Although Joyce presents this as what Symmetry “says”, in fact Symmetry makes no reference to “grounds” at all. So perhaps Joyce’s statement here is best interpreted as a restatement of his earlier claim, that any distinction between the “symmetrical compromises” would involve an “unmotivated bias”.

Supposing that Joyce is here giving an argument for Symmetry, I will now try to formulate more clearly what this argument is. I am trying to find premises that fit what Joyce actually says as closely as possible, are as plausible as possible, and entail Symmetry. The following is the reconstruction that seems to me to best meet these conflicting desiderata. I will take the passages from Joyce that I quoted in the paragraph before last to be asserting:

Premise 2: If a person judges b and b∗ to be equally accurate in ω, then that person has no grounds for judging λb + (1 − λ)b∗ and (1 − λ)b + λb∗ to have different accuracies in ω.

Premise 2 departs somewhat from Joyce’s own wording because he talks, not of a person judging b and b∗ to be equally accurate, but of b and b∗ being equally accurate (see my last quotation).
However, Joyce (p. 590) allows that there can be different measures of accuracy, so he cannot assume that there is a fact of the matter as to whether b and b∗ are equally accurate in ω. Hence my formulation of Premise 2 in terms of a person’s judgments.

In addition to Premise 2 I take Joyce to be tacitly assuming:

Premise 3: If a person has no grounds for judging that c and c∗ have different accuracies in ω then it is not rationally permissible for the person to judge that c and c∗ have different accuracies in ω.

It follows from Premises 2 and 3 that if a person makes rationally permissible judgments, and the person judges that b and b∗ are equally accurate in ω, then the person does not judge that λb + (1 − λ)b∗ and (1 − λ)b + λb∗ have different accuracies in ω. This, together with the definition of 𝓘, entails Symmetry.

So Symmetry does follow validly from Premises 2 and 3. But I will now argue that Premise 2 is false. Consider the following example. (Here each b ∈ B will be regarded as a point in ℝ^n with its ith coordinate denoted b_i.)

    ω = (1, 0, ω_3, …, ω_n)
    b = (−1, 0, ω_3, …, ω_n)
    b∗ = (2, 1, ω_3, …, ω_n)
    λ = 2/3
    c = λb + (1 − λ)b∗ = (0, 1/3, ω_3, …, ω_n)
    c∗ = (1 − λ)b + λb∗ = (1, 2/3, ω_3, …, ω_n).

Figure 1 shows the situation in the plane of the first two coordinates.

[Figure 1. Counterexample to Premise 2. The points b = (−1, 0), ω = (1, 0), b∗ = (2, 1), c = (0, 1/3), and c∗ = (1, 2/3) plotted in the plane of the first two coordinates.]

Since

\[ \rho(b, \omega) = \sum_{X \in \Omega} |b(X) - \omega(X)| = \sum_{i=1}^{n} |b_i - \omega_i| \]

we have ρ(b, ω) = ρ(b∗, ω) = 2 but ρ(c, ω) = 4/3 > 2/3 = ρ(c∗, ω). So if I take ρ(b, ω) to measure the inaccuracy of b in ω then in this example I will judge that b and b∗ are equally accurate in ω but c is less accurate than c∗. According to Premise 2, I can have no grounds for taking c to be less accurate than c∗ in ω. But in fact there are perfectly sensible grounds for it, as follows.
Moving from b to c improves accuracy with respect to X_1 by 1 and moving from b∗ to c∗ does the same. On the other hand, moving from b to c reduces accuracy with respect to X_2 by 1/3, while moving from b∗ to c∗ increases accuracy with respect to X_2 by the same amount.

Hence Premise 2 is false. A person who takes b and b∗ to be equally accurate in ω can have grounds for taking λb + (1 − λ)b∗ and (1 − λ)b + λb∗ to have different accuracies in ω.

5. These axioms not plausible

We have seen that Joyce’s arguments for Weak Convexity and Symmetry are both unsound. It also seems to me that neither of these axioms is self-evident. I will now give a reason for thinking that in fact these axioms are false.

It is natural to measure the inaccuracy of b with respect to the proposition X in possible world ω by |b(X) − ω(X)|. It is also natural to take the total inaccuracy of b to be the sum of its inaccuracies with respect to each proposition. But these two assumptions together imply that the overall inaccuracy of b in ω is ρ(b, ω). Hence, it is natural to use ρ(b, ω) as a measure of the inaccuracy of credence function b in the possible world ω.

Furthermore, there are other contexts in which measures analogous to ρ seem the most natural ones to use. For example, imagine that a student takes three examinations, with the maximum possible score for each examination being 100. Suppose the student scores 92 on the first examination, 98 on the second, and 97 on the third. I now ask: How far short has the student fallen from the goal of a perfect score on all three examinations? I think the most natural answer to this question is 8 + 2 + 3 = 13 points. But that answer assumes a measure with the same form as ρ.

These considerations make it plausible that ρ ∈ 𝓘. But if ρ ∈ 𝓘 then, by Theorem 1, it follows that Weak Convexity is false. The example used in Section 4 shows that it also follows that Symmetry is false.
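The arithmetic behind these two claims can be checked mechanically. What follows is a minimal sketch in Python and is not part of the original argument. The names `rho` and `mix` are illustrative. The sketch assumes that only the first two coordinates need to be tracked, since in both examples the remaining coordinates of every point coincide with those of ω and so contribute nothing to ρ. For the case used against Weak Convexity in Section 8.1, which holds for any ω, the choice ω = (1, 0) is likewise an assumption made only for illustration.

```python
from fractions import Fraction as F

def rho(b, w):
    """Sum of absolute differences between credences b and truth values w."""
    return sum(abs(bi - wi) for bi, wi in zip(b, w))

def mix(lam, b, b_star):
    """Coordinatewise mixture lam*b + (1 - lam)*b_star."""
    return tuple(lam * x + (1 - lam) * y for x, y in zip(b, b_star))

# Section 4 example, first two coordinates only (assumption: the other
# coordinates equal those of w and add nothing to rho).
w = (F(1), F(0))
b = (F(-1), F(0))
b_star = (F(2), F(1))
lam = F(2, 3)
c = mix(lam, b, b_star)           # compromise weighted toward b
c_star = mix(1 - lam, b, b_star)  # symmetrical compromise weighted toward b_star

print(rho(b, w), rho(b_star, w))  # 2 2
print(rho(c, w), rho(c_star, w))  # 4/3 2/3

# Section 8.1 case against Weak Convexity, with w = (1, 0) chosen for
# illustration: p = (w1 + 1, w2), q = (w1, w2 + 1), m their midpoint.
p = (w[0] + 1, w[1])
q = (w[0], w[1] + 1)
m = mix(F(1, 2), p, q)

print(rho(p, w), rho(q, w), rho(m, w))  # 1 1 1
```

Exact rational arithmetic is used so that the comparisons are not affected by floating-point rounding. The first two lines of output reproduce the violation of Symmetry under ρ, and the third shows a midpoint that is no more accurate than two distinct, equally accurate endpoints, which is what Weak Convexity forbids.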
I conclude that, in the absence of any cogent argument for them, Weak Convexity and Symmetry are both implausible.

6. Other axioms not sufficient

I have argued against two of Joyce’s axioms but have not criticized his other four axioms. In this section I will consider what can be inferred from these other axioms alone.

Joyce’s first four axioms may be stated as follows. (I have altered Joyce’s formulations of these axioms to make them clearer and more precise.)

Structure: If I ∈ 𝓘 and ω ∈ V then
  a. I(b, ω) is a non-negative continuous function of b.
  b. For all X ∈ Ω and ε > 0 there exists δ > 0 such that, for all b ∈ B, if |b(X)| > δ then I(b, ω) > ε.

Extensionality: 𝓘 is a subset of B × V → ℝ.

Dominance: If I ∈ 𝓘 and b(X) = b∗(X) for every X ∈ Ω other than Y then I(b, ω) > I(b∗, ω) iff |b(Y) − ω(Y)| > |b∗(Y) − ω(Y)|.

Normality: If I ∈ 𝓘 and |b(X) − ω(X)| = |b∗(X) − ω∗(X)| for all X ∈ Ω then I(b, ω) = I(b∗, ω∗).

It is easy to prove:

Theorem 2: The assumption that ρ ∈ 𝓘 is consistent with Structure, Extensionality, Dominance, and Normality.

Hence these four axioms cannot be used to argue against the assumption that ρ ∈ 𝓘.

As I mentioned in the Introduction, Joyce’s strategy for justifying probabilism is to show that if a credence function b does not satisfy the axioms of probability then there exists another credence function b∗ that does satisfy the axioms of probability and is closer to the truth in every possible world. On the other hand, we have the following theorem (proved in Section 8.2).

Theorem 3: If Ω contains at least one contingent proposition then there exists b ∈ B which does not satisfy the axioms of probability and is such that, for all b∗ ∈ B that do satisfy the axioms of probability, there exists ω ∈ V for which ρ(b, ω) ≤ ρ(b∗, ω).

So if ρ ∈ 𝓘 then Joyce’s vindication of probabilism fails. But Theorem 2 says that the assumption that ρ ∈ 𝓘 is consistent with Joyce’s first four axioms.
Hence, without the implausible axioms of Weak Convexity and Symmetry, Joyce’s vindication of probabilism fails.

7. The Norm of Gradational Accuracy

Besides the six axioms, Joyce’s argument for probabilism assumes that an epistemically rational person strives to have credences that are as close to the truth as possible. Joyce (p. 579) calls this “The Norm of Gradational Accuracy (NGA)”. I do not think that NGA is obviously true. Our concern with truth might more plausibly be understood as a concern to accept true propositions and not false ones, rather than to have credences that are close to the truth according to some favored measure of distance. Credences help determine what propositions it is rational to accept and this gives a connection between credences and truth. (See Maher 1993 ch. 6 for an elaboration of this approach.) I see no need to attach an additional epistemic value to having credences close to the truth; an advantage of not doing so is that we then do not need to make unjustifiable decisions about how to measure this distance. So NGA seems to me more dubious than the laws of probability that Joyce is attempting to justify. However, the main point of this paper has been to show that, even if we grant Joyce NGA, his vindication of probabilism is still unsuccessful because two of his axioms are unjustified and implausible.

8. Proofs

8.1. Proof of Theorem 1

Thinking of b, k, and ω as points in ℝ^n, let their ith coordinates be denoted b_i, k_i, and ω_i respectively. Then we have:

\[
\begin{aligned}
\rho(b + k, \omega) &= \sum_{i=1}^{n} |b_i + k_i - \omega_i| \\
&= \sum_{i=1}^{n} \left| \tfrac{1}{2}(b_i - \omega_i) + \tfrac{1}{2}(b_i + 2k_i - \omega_i) \right| \\
&\le \sum_{i=1}^{n} \left( \tfrac{1}{2}|b_i - \omega_i| + \tfrac{1}{2}|b_i + 2k_i - \omega_i| \right) \\
&= \tfrac{1}{2}\rho(b, \omega) + \tfrac{1}{2}\rho(b + 2k, \omega).
\end{aligned}
\]

Hence

\[
\begin{aligned}
\rho(b + 2k, \omega) &\ge 2\rho(b + k, \omega) - \rho(b, \omega) \\
&= \rho(b + k, \omega) + [\rho(b + k, \omega) - \rho(b, \omega)] \\
&\ge \rho(b, \omega), \quad \text{if } \rho(b + k, \omega) \ge \rho(b, \omega).
\end{aligned}
\]

Thus the assumption that ρ ∈ 𝓘 is consistent with Premise 1.

Now consider the case in which, for some ω,

\[
b = (\omega_1 + 1, \omega_2, \omega_3, \ldots, \omega_n), \qquad
b^* = (\omega_1, \omega_2 + 1, \omega_3, \ldots, \omega_n).
\]

Then ρ(b, ω) = 1 = ρ(b∗, ω). Also,

\[ m = (\omega_1 + \tfrac{1}{2}, \omega_2 + \tfrac{1}{2}, \omega_3, \ldots, \omega_n) \]

and so ρ(m, ω) = 1 = ρ(b, ω). Since b ≠ b∗, this shows that it is not consistent with Weak Convexity to have ρ ∈ 𝓘.

8.2. Proof of Theorem 3

For each X ∈ Ω let n_X denote the number of ω ∈ V for which ω(X) = 1. (Throughout this proof n denotes the number of elements of V, not of Ω as in Section 2; the counting below requires this.) Define b ∈ B by the condition that, for all X ∈ Ω,

\[
b(X) = \begin{cases} 1 & \text{if } n_X > n/2 \\ 0 & \text{if } n_X \le n/2. \end{cases}
\]

For each ω ∈ V let X_ω be the conjunction of all Y ∈ Ω for which ω(Y) = 1. Then ω(X_ω) = 1 and ω′(X_ω) = 0 if ω′ ≠ ω; hence n_{X_ω} = 1 for all ω ∈ V. Since there is at least one contingent proposition in Ω, n ≥ 2 and so n_{X_ω} ≤ n/2. Hence b(X_ω) = 0 for all ω ∈ V. But {X_ω : ω ∈ V} is a partition of Ω, so b does not satisfy the laws of probability.

For any X ∈ Ω,

\[
\begin{aligned}
\sum_{\omega \in V} |b(X) - \omega(X)| &= n_X |b(X) - 1| + (n - n_X)|b(X)| \\
&= n_X (1 - b(X)) + (n - n_X) b(X), \quad \text{since } 0 \le b(X) \le 1 \\
&= n_X + (n - 2n_X) b(X) \\
&= \begin{cases} n - n_X & \text{if } n_X > n/2 \\ n_X & \text{if } n_X \le n/2. \end{cases}
\end{aligned}
\tag{1}
\]

Let b∗ be any element of B that satisfies the laws of probability. Then we likewise have

\[ \sum_{\omega \in V} |b^*(X) - \omega(X)| = n_X + (n - 2n_X) b^*(X). \]

Since 0 ≤ b∗(X) ≤ 1, it follows that

\[
\sum_{\omega \in V} |b^*(X) - \omega(X)| \ge \begin{cases} n - n_X & \text{if } n_X > n/2 \\ n_X & \text{if } n_X \le n/2. \end{cases}
\tag{2}
\]

Comparing (1) and (2) we see that, regardless of the value of n_X,

\[ \sum_{\omega \in V} |b(X) - \omega(X)| \le \sum_{\omega \in V} |b^*(X) - \omega(X)|. \]

Since this holds for all X ∈ Ω, we have

\[ \sum_{X \in \Omega} \sum_{\omega \in V} |b(X) - \omega(X)| \le \sum_{X \in \Omega} \sum_{\omega \in V} |b^*(X) - \omega(X)|. \]

Reversing the order of the summations and applying the definition of ρ then gives:

\[ \sum_{\omega \in V} \rho(b, \omega) \le \sum_{\omega \in V} \rho(b^*, \omega). \]

Hence ρ(b∗, ω) ≥ ρ(b, ω) for at least one ω ∈ V.

References

Joyce, J. M.: 1998, ‘A Nonpragmatic Vindication of Probabilism’. Philosophy of Science 65, 575–603.

Maher, P.: 1993, Betting on Theories. Cambridge: Cambridge University Press.