Ignorance and Indifference

June 6, June 22, August 29, 2006; new abstract, Nov. 1, 2006; minor rev. Dec. 5, 2006

John D. Norton
Center for Philosophy of Science and Department of History and Philosophy of Science
University of Pittsburgh, Pittsburgh PA 15260
www.pitt.edu/~jdnorton

The epistemic state of complete ignorance is not a probability distribution. In it, we assign the same, unique ignorance degree of belief to any contingent outcome and each of its contingent, disjunctive parts. That this is the appropriate way to represent complete ignorance is established by two instruments, each individually strong enough to identify this state. They are the principle of indifference (“PI”) and the notion that ignorance is invariant under certain redescriptions of the outcome space, here developed into the “principle of invariance of ignorance” (“PII”). Both instruments are so innocuous as almost to be platitudes. Yet the literature in probabilistic epistemology has misdiagnosed them as paradoxical or defective, since they generate inconsistencies when conjoined with the assumption that an epistemic state must be a probability distribution. To underscore the need to drop this assumption, I express PII in its most defensible form, as relating symmetric descriptions, and show that paradoxes still arise if we assume the ignorance state to be a probability distribution. By separating out the different properties that characterize a probability measure, I show that the ignorance state is incompatible with both the additivity of the probability calculus and the dynamics of Bayesian conditionalization.

1. Introduction

In one ideal, a logic of induction would provide us with a belief state representing total ignorance that would evolve towards different belief states as new evidence is learned. That the Bayesian system cannot be such a logic follows from well-known, elementary considerations.
In familiar paradoxes to be discussed here, the notion that indifference over outcomes requires equality of probability rapidly leads to contradictions. If our initial ignorance is sufficiently great, there are so many ways to be indifferent that the resulting equalities contradict the additivity of the probability calculus. We can properly assign equal probabilities in a prior probability distribution only if our ignorance is not complete and we know enough to identify the right partition of the outcome space over which to exercise indifference.

Interpreting zero probability as ignorance also fails, in more than one way. Additivity precludes ignorance over all outcomes, since the sum of probabilities over a partition must be unity; and the dynamics of Bayesian conditionalization makes it impossible to recover from ignorance: once an outcome is assigned a zero prior, its posterior is always zero. Thus it is hard to see that any prior can properly be called an “ignorance prior,” to use the term favored by Jaynes (2003, Ch. 12); it is at best a “partial ignorance prior.” For these reasons the growing use of terms like “noninformative priors,” “reference priors” or, most clearly, “priors constructed by some formal rule” (Kass and Wasserman, 1996) is a welcome development.

What of the hope that we may identify an ignorance belief state worthy of the name? Must we forgo it and limit our inductive logics to states of greater or lesser ignorance only? The central idea of this paper is that, if we forgo the idea that belief states must be probability distributions, then there is a unique, well-defined ignorance state; the project of this paper is to identify it. The instruments more than sufficient to specify this state already exist in the literature and are described in Section 2. They are the familiar principle of indifference (“PI”) and the notion that ignorance states can be specified by invariance conditions.
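The clash between indifference and additivity can be made concrete in a toy computation. This is a sketch of my own (the die-throw labels are illustrative): indifference applied separately to a coarse partition and to a refinement of the same outcome space yields equalities that additivity cannot honor.

```python
from fractions import Fraction

def indifferent_prior(partition):
    """Assign equal probability to each cell of a partition, as PI directs."""
    n = len(partition)
    return {cell: Fraction(1, n) for cell in partition}

# Coarse description of a die throw: the die shows one, or it does not.
coarse = indifferent_prior(["one", "not-one"])

# Refined description of the very same outcome space.
fine = indifferent_prior(["one", "two", "three", "four", "five", "six"])

# Additivity demands that P(not-one) equal the sum over its disjunctive parts...
p_not_one_by_additivity = sum(fine[c] for c in ["two", "three", "four", "five", "six"])

# ...but indifference applied to each partition separately disagrees:
# 1/2 on the coarse partition versus 5/6 by summing the refinement.
print(coarse["not-one"], p_not_one_by_additivity)
assert coarse["one"] != fine["one"]  # 1/2 vs 1/6: the equalities contradict additivity
```

With no information singling out one partition as the right one, PI licenses both assignments, and additivity then fails; this is the pattern behind the paradoxes discussed below.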
I will argue, however, that common uses of invariance conditions do not employ them in their most secure form. The most defensible invariance requirements use perfect symmetries, and these are governed by what I shall call the “principle of the invariance of ignorance” (PII). In Section 3, I will review the familiar paradoxes associated with PI, identifying the strongest form of the paradoxes as those associated with competing but otherwise perfectly symmetric descriptions. I will also argue that invariance conditions are beset by paradoxes analogous to those troubling PI. They arise even in the most secure confines of PII, since simple problems can exhibit multiple competing symmetries, each generating a different invariance condition.

In Section 4, I will argue that we have misdiagnosed these paradoxes as some kind of deficiency of the principle of indifference or an inapplicability of invariance conditions. Rather, they are such innocuous principles of evidence as to be near platitudes. Both derive from the notion that beliefs must be grounded in reasons and that, in the absence of distinguishing reasons, there should be no difference of belief. How could we ever doubt the notion that, if we have no grounds at all to pick between two outcomes, then we should hold the same belief for each? The aura of paradox that surrounds the principles is an illusion created by our imposing the additional and incompatible assumption that an ignorance state must be a probability distribution.

In the remaining sections, it will be shown that these instruments identify a unique epistemic state of ignorance that is not a probability distribution. Section 5 describes the weaker theoretical context in which this ignorance state can be defined. It is based on a notion of non-numerical degrees of confirmation that may be compared qualitatively; and it may be selectively enriched to bring it closer to the full probability calculus.
In Section 6, we shall see that implicit in the paradoxes of indifference is the notion that the state of ignorance is unchanged under disjunctive coarsening or refinement of the outcome space; and that this same state is invariant under a transformation that exchanges propositions with their negations. These two conditions each pick out the same ignorance state, in which a unique ignorance degree is assigned to all contingent propositions. In particular, in that state we assign the same ignorance degree of belief to all contingent propositions and to each of their contingent, disjunctive parts. We shall see in Section 7 that this state is incompatible both with the additivity of the degrees of belief and with the property that allows for Bayesian conditionalization, when these properties are separated out by algebraic means, so that any logic of induction that employs this state can use neither property. Section 8 contains some concluding remarks.

2. Instruments for Defining the State of Ignorance

The present literature in probabilistic epistemology has identified two principles that can govern the distribution of belief. Both are based on the simple notion that beliefs must be grounded in reasons, so that when there are no differences in reasons there should be no differences in belief. Applying this notion to different outcomes gives us the Principle of Indifference (Section 2.1); applying it to two perfectly symmetric descriptions of the same outcome space gives us what I call the Principle of Invariance of Ignorance (Section 2.2).

2.1 Principle of Indifference

The “principle of indifference” was named by Keynes (1921, Ch. IV) to codify a notion long established in writings on probability theory. I will express it in a form independent of probability measures.[1]

(PI) Principle of Indifference. If we are indifferent among several outcomes, that is, if we have no grounds for preferring one over any other, then we assign equal belief to each.
Applications of the principle are familiar. In cases of finitely many outcomes, such as the throwing of a die, we assign equal probabilities of 1/6 to each of the 6 outcomes. If the outcomes form a continuum, such as the selection of a real magnitude between 1 and 2, we assign a uniform probability distribution.

[1] For completeness, I mention that this principle is purely epistemic. It is to be contrasted with an ontic symmetry principle, according to which outcomes A, B, C, … are assigned equal weights if, for every fact that favors A, there are corresponding facts favoring B, C, …; and similarly for B, C, … . In the familiar cases of die throws and dart tosses, it is this physical symmetry that more reliably governs the assigning of probabilities.

2.2 Principle of Invariance of Ignorance

A second, powerful notion has been developed and exploited by Jeffreys (1961, Ch. III) and Jaynes (2003, Ch. 12). The leading idea is that a state of ignorance can remain unchanged when we redescribe the outcomes; that is, there can be an invariance of ignorance under redescription. That invariance may powerfully constrain and even fix the belief distribution.

Jaynes (2003, pp. 39-40) uses this idea to derive the principle of indifference as applied to probability measures over an outcome space with finitely many mutually exclusive and exhaustive outcomes A1, A2, …, An. If we are really ignorant over which outcome obtains, our distribution of belief would be unchanged if we were to permute the labels A1, A2, …, An in any arbitrary way:

A'π(i) = Ai   (1)

where (π(1), π(2), …, π(n)) is a permutation of (1, 2, …, n). A probability measure P that remains unchanged under all these permutations must satisfy[2]

P(A1) = P(A2) = … = P(An).

If the outcomes Ai are mutually exclusive and exhaust the outcome space, then the measure is unique: P(Ai) = 1/n, for i = 1, …, n. This is the equality of belief called for by PI.
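The permutation argument can be checked mechanically. In this sketch of mine (the function names are not from the text), a measure over labeled outcomes is tested for invariance under every relabeling (1); only the uniform measure survives.

```python
from fractions import Fraction
from itertools import permutations

def is_permutation_invariant(P):
    """True if the measure P over labeled outcomes is unchanged by every relabeling (1)."""
    labels = list(P)
    return all(all(P[a] == P[b] for a, b in zip(labels, perm))
               for perm in permutations(labels))

uniform = {f"A{i}": Fraction(1, 4) for i in range(1, 5)}
skewed = {"A1": Fraction(1, 2), "A2": Fraction(1, 4),
          "A3": Fraction(1, 8), "A4": Fraction(1, 8)}

print(is_permutation_invariant(uniform))  # uniform P(Ai) = 1/n survives every permutation
print(is_permutation_invariant(skewed))   # some transposition exposes the inequality
```

As the footnote notes, checking the transpositions alone would already suffice: P(Ai) = P(Ak) for every pair forces all values to be equal.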
This example illustrates a principle that I shall call:

(PII) Principle of the Invariance of Ignorance. An epistemic state of ignorance is invariant under a transformation that relates symmetric descriptions.

The new and essential restriction is the limitation to “symmetric descriptions,” which, loosely speaking, are ones that cannot be distinguished other than through notational conventions. More precisely, symmetric descriptions are defined here as pairs of descriptions meeting two conditions:

(S1) The two describe exactly the same physical possibilities; and each description can be generated from the other by a relabeling of terms, such as the addition or removal of primes, or the switching of words. An example is the permutation of labels of (1) above; a second is found below in (2a), (2b).

(S2) The transformation that relates the two descriptions is “self-inverting.” That is, the same transformation takes us from the first description to the second as from the second to the first. An example is the permutation that merely exchanges two labels; a second exchange of the same pair takes us back from the second description to the first.

This principle is the most secure way of using invariance to fix belief distributions. What makes it so secure is the insistence on the perfect symmetry of the descriptions. That defeats any attempt to find reasons upon which to base a difference in the distribution of belief in the two cases; for any feature of one description will, under the symmetry, assuredly be found in the second. So any difference in the two epistemic states cannot be grounded in reasons, but must reflect an arbitrary stipulation.

[2] The simplest way to arrive at this result is to consider a transformation that merely exchanges two labels, Ai and Ak say, for i and k unequal. If the probability measure is to remain unchanged under all such exchanges, then we must have P(Ai) = P(Ak) for each pair i, k, which entails the equality stated.
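Condition (S2) is easy to check computationally. A minimal illustration (the examples are mine): a transposition of two labels composed with itself is the identity map, so it is self-inverting; a three-cycle is a legitimate relabeling but fails (S2).

```python
def compose(f, g):
    """Apply g, then f."""
    return lambda x: f(g(x))

# A transposition of labels A1 and A2: self-inverting, as (S2) requires.
swap = {"A1": "A2", "A2": "A1", "A3": "A3"}.get
# A three-cycle A1 -> A2 -> A3 -> A1: a relabeling, but not self-inverting.
cycle = {"A1": "A2", "A2": "A3", "A3": "A1"}.get

labels = ["A1", "A2", "A3"]
print(all(compose(swap, swap)(a) == a for a in labels))    # transposition: identity
print(all(compose(cycle, cycle)(a) == a for a in labels))  # three-cycle: not identity
```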
We shall see, however, that common invocations of invariance conditions in the literature do not adhere strictly to this symmetry in the transformations and are thus less secure.

If the outcomes form a continuum, the application of PII is identical in spirit to the deduction of the principle of indifference, though slightly more complicated. A clear illustration that gives the template for computing other cases is provided by applying PII to von Mises’ (1951, pp. 77-78) celebrated case of wine and water. We are given a glass with some unknown mixture of water and wine and know only that the mixture lies somewhere between 1:2 parts water to wine and 2:1 parts water to wine. That is,

the ratio of water to wine x lies in the interval 1/2 to 2;   (2a)

and

the ratio of wine to water x’ = 1/x also lies in the interval 1/2 to 2.   (2b)

If we represent our uncertainty over x with the probability density p(x) and our uncertainty over x’ with the probability density p’(x’), the idea that our ignorance is unchanged by redescription turns out to fully specify both densities. The calculation that shows this has two parts.

First, we note that the transformation from x to x’ merely redescribes the same outcome, so the two densities should agree in assigning the same probabilities to the same outcomes. The outcome of x lying in the small interval x to x+dx is the same outcome as x’ lying in x’ to x’+dx’, where x’ = 1/x. Since the two outcomes must agree in probabilities, we have[3]

p’(x’)dx’ = −p(x)dx

That is, more precisely,

A. Agreement in probability   p’(x’) = −p(x) dx/dx’   (3a)

In the second part, we note that there is a perfect symmetry between the two descriptions (2a) and (2b). Loosely speaking, that means that whatever our ignorance may be of the ratio of water to wine, it is just the same as our ignorance of the ratio of wine to water. Indeed, had we mistakenly switched the labels in (2a) and (2b), it would make no difference to the problem posed.
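The transformation rule (3a) can be checked numerically for any candidate density; it is just the change-of-variables formula. A sketch (the uniform candidate density is my illustrative choice, not the paper's): sample the water-to-wine ratio x, and compare the empirical distribution of x’ = 1/x against the density p(1/x’)/x’² that (3a) prescribes.

```python
import random

# Illustrative candidate density for the water-to-wine ratio x on [1/2, 2]:
# the uniform density p(x) = 2/3.
def p(x):
    return 2.0 / 3.0 if 0.5 <= x <= 2.0 else 0.0

# Rule (3a): with x = 1/x' and dx/dx' = -1/x'**2, the induced density of
# x' = 1/x is p'(x') = p(1/x') / x'**2.
def p_induced(xp):
    return p(1.0 / xp) / xp**2

# Monte Carlo check: the fraction of samples of x' = 1/x landing in [a, b]
# should match the integral of p_induced over [a, b].
random.seed(0)
samples = [1.0 / random.uniform(0.5, 2.0) for _ in range(200_000)]
a, b = 0.5, 1.0
empirical = sum(a <= s <= b for s in samples) / len(samples)

# Numerical integral of p_induced over [a, b] by the midpoint rule.
n = 10_000
h = (b - a) / n
integral = sum(p_induced(a + (i + 0.5) * h) for i in range(n)) * h

print(empirical, integral)  # both near 2/3
assert abs(empirical - integral) < 0.01
```

Note that (3a) alone does not yet fix p; any density for x, pushed through x’ = 1/x, satisfies it. It is the symmetry condition of the second part that does the fixing.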
Formally, that is expressed in the two descriptions (2a) and (2b) meeting the conditions (S1) and (S2) above.[4] The first condition (S1) is met in that (2a) becomes (2b) if we switch the words “water” and “wine” and replace the variable x by x’; and (2a) and (2b) still describe exactly the same outcome space. Condition (S2) is met since x relates to x’ in exactly the same way as x’ relates to x. That is, the function that transforms x to x’ is exactly the same as the function that transforms x’ to x; both are the taking of the arithmetic inverse

x’ = 1/x   x = 1/x’   (4)

In other words, they are self-inverting, since composing the transformation with itself yields the identity map. We have complete symmetry of descriptions. So PII requires that the two probability distributions be the same:

B. Symmetry   p’(.) = p(.)   (3b)

Since dx/dx’ = −x², the system of equations (3a), (3b) and (4) entails that any p(x) must satisfy the functional equation

p(1/x) (1/x) = p(x) x   (5)

Notably, the solutions of (5) do not include p(x) = constant. The most familiar solution is[5]

p(x) = K/x   (5a)

where the requirement that p(x) normalize to unity fixes K = 1/ln 4.

The example of wine and water cleanly embodies the symmetry of descriptions needed to trigger the requirements of PII. Other familiar cases of symmetry appear to be a little less symmetric in so far as the transformations between the descriptions are not self-inverting. Take for example the redescription of all the reals by unit translation:

x’ = x − 1   x = x’ + 1   (6)

The transformation is not self-inverting, so the perfect symmetry of descriptions fails: we proceed from the x description to x’ by adding unity, and from x’ to x by subtracting unity.

[3] The negative sign arises since the increments dx and dx’ increase in opposite directions.

[4] This symmetry can easily fail, as it did in von Mises’ original presentation.
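Equations (5) and (5a) can be verified directly. A numeric sketch, assuming nothing beyond those two equations: the density p(x) = 1/(x ln 4) satisfies the functional equation at every sampled point of [1/2, 2], the uniform density does not, and p integrates to one over the interval.

```python
import math

K = 1.0 / math.log(4.0)

def p(x):
    """The invariant density (5a) on [1/2, 2]."""
    return K / x

# Functional equation (5): p(1/x)*(1/x) == p(x)*x; both sides equal K identically.
xs = [0.5 + 1.5 * i / 100 for i in range(101)]
assert all(abs(p(1 / x) * (1 / x) - p(x) * x) < 1e-12 for x in xs)

# The uniform density p(x) = 2/3 fails (5) away from x = 1, e.g. at x = 1/2.
assert abs((2 / 3) * (1 / 0.5) - (2 / 3) * 0.5) > 0.1

# Normalization: the integral of K/x over [1/2, 2] is K * ln 4 = 1,
# checked here by the midpoint rule.
n = 100_000
h = 1.5 / n
integral = sum(p(0.5 + (i + 0.5) * h) for i in range(n)) * h
print(integral)  # ~1.0
```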
In his original presentation, von Mises took the ratio to lie in 1:1 to 2:1, so that permuting “wine” and “water” and replacing x with x’ does not lead to a description of the same outcome space.

[5] Briefly, arbitrarily many solutions can be constructed by stipulating p(x) for 1 ≤ x < 2 and using (5) to define p(x) for 1/2 < x ≤ 1.

4. What Should We Learn from the Paradoxes?

The moral usually drawn from the paradoxes of indifference is a correct but short-sighted one: indifference cannot be used as a means of specifying probabilities in cases of extensive ignorance. That there are analogous paradoxes for invariance conditions is less widely recognized. They are indicated obliquely in Jaynes’ work. He described (1973, §7) how he turned to the method of transformational invariance as a response to the paradoxes of indifference, exemplified in Bertrand’s paradoxes. They allowed him to single out just one partition over which to invoke indifference, so that (to use the language of Bertrand’s original writing) the problem becomes “well-posed.” The core notion of the method was (§7):

    Every circumstance left unspecified in the statement of a problem defines an invariance which the solution must have if there is to be any definite solution at all.

We saw above that this notion leads directly to new paradoxes if our ignorance is sufficiently great to yield excessive invariance. Jaynes reported (§8) that this problem arises in the case of von Mises’ wine-water problem:

    On the usual viewpoint, the problem is underdetermined; nothing tells us which quantity should be regarded as uniformly distributed. However, from the standpoint of the invariance group, it may be more useful to regard such problems as overdetermined; so many things are left unspecified that the invariance group is too large, and no solution can conform to it.
    It thus appears that the “higher-level” problem of how to formulate statistical problems in such a way that they are neither underdetermined nor overdetermined may itself be capable of mathematical analysis. In the writer’s opinion it is one of the major weaknesses of present statistical practice that we do not seem to know how to formulate statistical problems in this way, or even how to judge whether a given problem is well posed.

When the essential content of this 1973 paper was incorporated into Chapter 12 of Jaynes’ (2003) final and definitive work, this frank admission of the difficulty no longer appeared, even though no solution had been found. Instead Jaynes (2003, pp. 381-82) sought to dismiss cases of great ignorance as too vague for analysis, on the manifestly circular grounds that his methods were unable to provide a cogent analysis:

    If we merely specify ‘complete initial ignorance’, we cannot hope to obtain any definite prior distribution, because such a statement is too vague to define any mathematically well-posed problem. We are defining this state of knowledge far more precisely if we can specify a set of operations which we recognize as transforming the problem into an equivalent one. Having found such a set of operations, the basic desideratum of consistency then places nontrivial restrictions on the form of the prior.

My diagnosis, to be developed in the sections below, is that Jaynes was essentially correct in noting that invariance conditions may overdetermine an ignorance belief state. Indeed the principle of indifference also overdetermines such a state. In this regard, we shall see that both instruments are very effective at distinguishing a unique state of ignorance. The catch is that this state is not a probability distribution. Paradoxes arise only if we assume in addition that it must be.
We thereby fail to see that PI and PII actually work exactly as they should.

5. A Weaker Structure

In order to establish that PI and PII do pick out a unique state of ignorance, we need a structure hospitable to non-probabilistic belief states. Elsewhere, drawing on an extensive literature on axiom systems for the probability calculus, I have described such a structure (Norton, forthcoming). For a precise synopsis of its content, see the Appendix. Informally, its basic entity is [A|B], introduced through the property F. Framework. It represents the degree to which proposition B inductively supports proposition A, where these propositions are drawn from a (usually) finite set of propositions closed under the Boolean operations of ∼ (negation), ∨ (disjunction) and & (conjunction). The degrees are not assumed to be real valued. Rather, it is only assumed that they form a partial order, so that we can write [A|B] ≤ [C|D] and [A|B] < [C|D]. These comparison relations are restricted by one further notion. Whatever else may happen, we do not expect that some proposition B can have less support than one of its disjunctive parts A on the same evidence. That is, we require monotonicity: if A⇒B⇒C, then [A|C] ≤ [B|C]. This much of the structure will provide the background for the analysis to follow.

The structure has further notions that will prove incompatible with the ignorance state to be defined. The first, introduced through the property A. Addition, is an addition operator ⊕, which allows the combining of degrees of support of mutually contradictory propositions to yield the degree of support of their disjunction. It is the surrogate of the additivity of the probability calculus. The second, introduced through the property B. Bayes Property, is a multiplication operator ⊗, which allows for the distinctive dynamics of updating associated with Bayes’ theorem. Both of these properties are compatible with the probability calculus and express essential elements of it.
The existence of these two operators is logically independent; we can have systems with either one without the other. We shall see, however, that the unique state of ignorance is incompatible with each of them individually.

This structure is formulated in terms of “degrees of support.” On the supposition that we believe what we are warranted to believe, I will presume that our degrees of belief agree with these degrees of support.

6. Characterizing Ignorance

Once we dispense with the idea that a state of ignorance must be represented by a probability distribution, we can return to the ideas developed in the context of PI, PII and their paradoxes and deploy them without arriving at contradictions. We can discern two properties of a state of ignorance: invariance under disjunctive coarsenings and refinements; and invariance under negation. As we shall see below, each is sufficient to specify the state of ignorance fully, and it turns out to be the same state: a single ignorance degree of belief “I” assigned to all contingent propositions in the outcome space.

6.1 Invariance under Disjunctive Coarsenings and Refinements

We saw in Section 3.1 above that the paradoxes of indifference all depended upon a single idea: if our ignorance is sufficient, we may assign equal beliefs to all members of some partition of the outcome space, and that equality persists through disjunctive coarsenings and refinements. This idea is explored here largely because of its wide acceptance in that literature. I have already indicated above in Section 3.1 that the idea is less defensible in cases in which there is not a complete symmetry between the two descriptions. We shall see in Section 6.2 below that the same results about ignorance as derived here in Section 6.1 can be derived from PII using descriptions that are fully symmetric, related by self-inverting transformations.
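The contrast between the ignorance state and a probability distribution can be made concrete in a toy model. This is my own construction, following the paper's informal description: assign the single degree I to every contingent proposition over a small outcome space. That assignment is trivially invariant under negation and under disjunctive coarsening and refinement, whereas no additive measure with a strictly increasing ⊕ can make a disjunction and its proper part equal.

```python
from itertools import combinations

atoms = frozenset(["A1", "A2", "A3"])

def contingent_propositions(atoms):
    """All disjunctions of atoms other than the empty set and the whole space."""
    props = []
    for r in range(1, len(atoms)):
        props += [frozenset(c) for c in combinations(sorted(atoms), r)]
    return props

I = "I"  # the single ignorance degree of belief

def ignorance_degree(prop):
    """The ignorance state: every contingent proposition gets the same degree I."""
    return I

props = contingent_propositions(atoms)

# Invariance under negation: a proposition and its negation get the same degree.
assert all(ignorance_degree(p) == ignorance_degree(atoms - p) for p in props)

# Invariance under disjunctive refinement: A1 v A2 and its part A1 get the same degree.
assert ignorance_degree(frozenset(["A1", "A2"])) == ignorance_degree(frozenset(["A1"]))

# By contrast, property A. Addition with a strictly increasing (+) forces the degree
# of A1 v A2 strictly above that of A1 whenever A2 has more than minimal degree,
# so no additive assignment can give one shared value to all 6 contingent propositions.
print(len(props))
```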
Let us develop the idea of invariance of ignorance under disjunctive coarsenings and refinements. If we have an outcome space Ω partitioned into mutually contradictory propositions Ω = A1 ∨ A2 ∨ … ∨ An, an example of a disjunctive coarsening is the formation of the new partition of mutually contradictory propositions Ω = B1 ∨ B2 ∨ … ∨ Bn−1, where

B1 = A1, B2 = A2, …, Bn−1 = An−1 ∨ An   (12)

All disjunctive coarsenings of A1, A2, …, An are produced by finitely many applications of this coarsening operation along with arbitrary permutations of the propositions, as defined by (1) above. A partition and its coarsening are each non-trivial if none of their propositions is Ω or ∅. The inverse of a coarsening is a refinement.

Assume that we have no grounds for preferring any of the members of the non-trivial partition Ω = A1 ∨ A2 ∨ … ∨ An; then by PI we assign equal belief to each and, by supposition, this ignorance degree of belief is unchanged under disjunctive coarsening or refinement of the partition.

Appendix

The strict relation [A|B] < [C|D] holds just in case [A|B] ≤ [C|D] but not [A|B] = [C|D]. For all admissible[27] propositions A, B, C and D:

[∅|Ω] ≤ [A|B] ≤ [Ω|Ω]
[∅|Ω] < [Ω|Ω]
[A|A] = [Ω|Ω] and [∅|A] = [∅|Ω]
[A|B] ≤ [C|D] or [A|B] ≥ [C|D]   (universal comparability)
if A⇒B⇒C, then [A|C] ≤ [B|C]   (monotonicity)

A. Addition. For any admissible proposition Z and mutually contradictory propositions X and Y, there exists an addition operator ⊕ such that [X∨Y|Z] = [X|Z] ⊕ [Y|Z], where ⊕ is strictly increasing in both [X|Z] and [Y|Z].

B. Bayes Property is the conjunction of N. and M.:

N. Narrowness. For any proposition A and any admissible B, [A|B] = [A&B|B].

M. Multiplication. For any proposition A and admissible propositions B and C such that A ⇒ B ⇒ C, there exists a multiplication operator ⊗ such that [A|C] = [A|B] ⊗ [B|C], where ⊗ is strictly increasing and thus invertible in both arguments (excepting [B|C], when [A|B] = [∅|B]).

[27] Here and elsewhere, “admissible” precludes formation of the undefined [.|B], where B is of minimum degree.

R. Real Values.
For any admissible propositions A, A’, B and B’, the set of values possible for degrees of confirmation [A|B] can be mapped one-one onto a closed set of reals such that the mapped real values f([A|B]) > f([A’|B’]) just in case [A|B] > [A’|B’].

All these properties combined are sufficient to entail the existence of real valued degrees of support that can be rescaled to yield a conditional probability measure.

References

Bertrand, Joseph (1907) Calcul des Probabilités. 2nd ed. Paris: Gauthier-Villars; repr. New York: Chelsea.
Borel, Émile (1950) Elements of the Theory of Probability. Trans. John E. Freund. Englewood Cliffs, NJ: Prentice-Hall, 1965.
Cramér, Harald (1966) The Elements of Probability Theory and Some of Its Applications. 2nd ed. Huntington, NY: Robert E. Krieger Publishing Co.; repr. 1973.
Galavotti, Maria Carla (2005) Philosophical Introduction to Probability. Stanford: CSLI Publications.
Gillies, Donald (2000) Philosophical Theories of Probability. London: Routledge.
Howson, Colin and Urbach, Peter (1996) Scientific Reasoning: The Bayesian Approach. 2nd ed. Chicago and La Salle, IL: Open Court.
Jaynes, E. T. (1973) “The Well-Posed Problem,” Foundations of Physics, 3, pp. 477-493.
Jaynes, E. T. (2003) Probability Theory: The Logic of Science. Cambridge: Cambridge University Press.
Jeffreys, Harold (1961) Theory of Probability. 3rd ed. Oxford: Oxford University Press.
Kass, Robert E. and Wasserman, Larry (1996) “The Selection of Prior Distributions by Formal Rules,” Journal of the American Statistical Association, 91, pp. 1343-1370.
Keynes, John Maynard (1921) A Treatise on Probability. London: Macmillan; repr. New York: AMS Press, 1979.
Laplace, Pierre-Simon (1825) Philosophical Essay on Probabilities. 5th ed. Trans. Andrew I. Dale. New York: Springer-Verlag, 1995.
Norton, John D. (1994) “The Theory of Random Propositions,” Erkenntnis, 41, pp. 325-352.
Norton, John D.
(forthcoming) “Probability Disassembled,” British Journal for the Philosophy of Science.
Norton, John D. (manuscript) “Disbelief as the Dual of Belief.”
Shafer, Glenn (1976) A Mathematical Theory of Evidence. Princeton: Princeton University Press.
Van Fraassen, Bas (1989) Laws and Symmetry. Oxford: Clarendon.
Von Mises, Richard (1951) Probability, Statistics and Truth. 3rd German ed.; 2nd revised English translation. London: George Allen and Unwin, 1957; repr. New York: Dover, 1981.