IN FAVOR OF LOGARITHMIC SCORING

RANDALL G. MCCUTCHEON

Abstract. Shuford, Albert and Massengill proved, a half century ago, that the logarithmic scoring rule is the only proper measure of inaccuracy determined by a differentiable function of the probability assigned to the actual cell of a scored partition. In spite of this, the log rule has gained less traction in applied disciplines and among formal epistemologists than one might expect. In this paper we show that the differentiability criterion in the Shuford et al. result is unnecessary and use the resulting simplified characterization of the logarithmic rule to give novel arguments in favor of it.

1. Introduction: Scoring Rules

Measures of epistemic utility (or disutility), i.e. scoring rules, are used in various disciplines to elicit faithful report of, and to measure the accuracy (or inaccuracy) of, probabilistic predictions. Given a partition $A_1, \dots, A_n$ of an event space, we take a scoring rule for this partition to be a function from $\Delta \times \{1, \dots, n\}$ to $[0, \infty]$, where $\Delta$ is the set of $n$-tuples of non-negative reals summing to 1.[1] We shall look upon the value assigned to $(\sigma, j)$ as the inaccuracy (a negatively oriented quantity that one seeks to minimize) of the forecast $\sigma = (\mathrm{Cr}(A_1), \dots, \mathrm{Cr}(A_n))$ when $A_j$ obtains.

[1] We shall assume that agents adopt credence functions obeying the probability axioms. In particular, we shall not concern ourselves with vindications of probabilism by way of accuracy considerations. Strictly speaking we would say that for an agent submitting an incoherent credence function, inaccuracy ought simply to remain undefined.

One scoring rule with a long and storied history is the quadratic rule (Brier 1950),
$$B\bigl((x_1, \dots, x_n), j\bigr) = \sum_{i=1}^{n} (x_i - y_i)^2,$$
where $y_i = 1$ for $i = j$ and $y_i = 0$ otherwise.[2] Other well known scoring rules include
$$S\bigl((x_1, \dots, x_n), j\bigr) = 1 - \frac{x_j}{\sqrt{\sum_{i=1}^{n} x_i^2}}$$
(the so-called spherical rule), as well as the logarithmic rule (Good 1952),
$$L\bigl((x_1, \dots, x_n), j\bigr) = -\log x_j.$$

[2] For 2-cell partitions, the square difference between prior and posterior is the same for each cell, so it is more common to use a "half Brier" score, equal to one square difference rather than their sum.

Though the quadratic rule is the most popular ("by far", say Fallis and Lewis 2015) scoring rule, many investigators adopt an implicitly pluralistic attitude; each of several contenders, on this view, has good points and bad, with suitability to a given application depending on a weighing of various considerations.
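For concreteness, the three rules just defined can be computed as in the following minimal Python sketch (the function names and the example forecast are our own illustrative choices; cells are indexed from 0 rather than 1):

```python
import math

def brier(x, j):
    """Quadratic (Brier) inaccuracy of forecast x when cell j obtains."""
    return sum((xi - (1.0 if i == j else 0.0)) ** 2 for i, xi in enumerate(x))

def spherical(x, j):
    """Spherical inaccuracy of forecast x when cell j obtains."""
    return 1.0 - x[j] / math.sqrt(sum(xi ** 2 for xi in x))

def log_score(x, j):
    """Logarithmic inaccuracy (surprisal) of forecast x when cell j obtains."""
    return float('inf') if x[j] == 0 else -math.log(x[j])

# Example: the forecast (0.7, 0.2, 0.1) scored when the first cell obtains.
forecast = (0.7, 0.2, 0.1)
print(brier(forecast, 0), spherical(forecast, 0), log_score(forecast, 0))
```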
2. A Heuristic Argument for the Log Rule

We contend, to the contrary, that the logarithmic rule has sufficient virtues (and other rules sufficient defects) that it should be looked on as at least the clear favorite (and probably as the only serious contender). Though this conviction is bolstered in part by the arguments we develop below, it was, for us, in fact cemented by some austere, first blush information theoretic heuristics. Since we find these considerations as convincing now as ever, we rehearse a version here.

Imagine a tedious game of 20 questions in which we think of a number from 1 to 1024. Obviously you can figure out our number in 10 questions if you like. First you ask whether or not the number is greater than 512 (say). Regardless of how we answer, you get 1 bit of information, winnowing the pool of live numbers to 512. If we say "no" you next ask if the number is greater than 256, and so on. What is going on?

You gain 1 bit of information when we first answer "no, the number is not greater than 512" because your posterior probability (1/512) in the actual number is twice as great as your prior (1/1024). This multiplier, 2, represents your information gain. To convert a multiplicative quantity to an additive one, one takes the logarithm of 2. (It is customary to use logarithms base 2.)

You might have started with a riskier question, say "is the number greater than 256?" If we had answered "no" your risky behavior would have paid off; credence in the actual number would have quadrupled, from 1/1024 to 1/256. Hence you'd have gained 2 bits of information (log_2 4 = 2). But if we had answered "yes" your risky behavior would have cost you. Credence in the actual number would have jumped from 1/1024 to merely 1/768. The multiplier here is 4/3, and log_2(4/3) ≈ .415. Since the probability of this disappointment is 3/4, the expected information gain of the riskier path is only (3/4)(.415) + (1/4)(2) ≈ .811. So it's more prudent to ask, initially, whether the number is greater than 512.

The log rule is based on just this sort of "information counting". Suppose a weatherman is asked his credence in the proposition that it will rain tomorrow. If he answers 1/2, then regardless of whether it rains or not, he will gain 1 bit of information upon seeing the actual outcome. Namely, his credence in it will double, increasing from 1/2 to 1. If he has initial credence in rain 1/4 and it rains, he will gain 2 bits of information. Namely, his credence in the actual outcome will quadruple, increasing from 1/4 to 1. If it does not rain, however, his credence in the actual outcome will increase by a factor of 4/3 (from 3/4 to 1). This, as we have seen, gives him ≈ .415 bits of information.

The weatherman seeks to adopt credences that anticipate the actual to the greatest extent possible, in the sense of minimizing expected information gain. That is, he wants as much of the information that will be reflected in his posterior credences to be reflected already in his prior credences (the less he learns tomorrow, the more he knows today). The fact is general. It's rational to want your credences to reflect as much of your knowledge as possible. Using the log rule to measure inaccuracy captures this intuition; inaccuracy simply corresponds to the amount of posterior information not reflected in the priors, i.e. $-\log_2 x$ (bits gained, or surprisal, in information-theoretic parlance), where x is prior credence in the relevant actual outcome.

3. Shuford, Albert and Massengill on Logarithmic Scoring

Our more theoretical arguments for logarithmic scoring, meanwhile, are based on a result of Shuford et al. (1966). Consider a partition $(E_1, E_2, E_3)$ of an outcome space and an agent who announces a prior of $(x_1, x_2, 1 - x_1 - x_2)$ on that partition. Under the log rule, this agent's expected inaccuracy is then given by
$$L(x_1, x_2) = -p_1 \log x_1 - p_2 \log x_2 - p_3 \log(1 - x_1 - x_2),$$
where $p_i$ is the actual probability (objective chance or ideal epistemic probability) of $E_i$. We note two features:

(1) The minimum expected inaccuracy occurs at $x_1 = p_1$, $x_2 = p_2$.

(2) Inaccuracy is a differentiable (on (0, 1)) function of the agent's credence in the actual cell alone.

For its satisfaction of the first property one says that the logarithmic rule is proper: there's an incentive to have credences equal to the actual probabilities.
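Property (1) is easy to confirm numerically. The following sketch (the particular chances $p = (0.2, 0.5, 0.3)$ and the grid resolution are our own illustrative choices) searches a grid of coherent priors and finds that expected logarithmic inaccuracy is smallest when the announced prior matches the actual probabilities:

```python
import math

p = (0.2, 0.5, 0.3)  # assumed "actual" probabilities, an illustrative choice

def expected_log_inaccuracy(x1, x2):
    """p-expectation of -log credence in the realized cell, for the prior (x1, x2, 1-x1-x2)."""
    x3 = 1.0 - x1 - x2
    return -(p[0] * math.log(x1) + p[1] * math.log(x2) + p[2] * math.log(x3))

step = 0.005
best = min(((i * step, j * step)
            for i in range(1, 200)
            for j in range(1, 200 - i)),
           key=lambda t: expected_log_inaccuracy(*t))
print(best)  # -> approximately (0.2, 0.5), i.e. x1 = p1 and x2 = p2
```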
Though this is plainly a desirable feature, there are other proper scoring rules (among them the Brier score and the spherical rule). What sets the log rule apart is what Shuford et al. (1966) proved, namely that the logarithmic rule is the only one (up to a constant multiple) satisfying both (1) and (2).[3] In light of this result, we take it that the case for the logarithmic rule (as against pluralism) turns on whether (2) can be established as a necessary desideratum.

[3] To see this, consider a scoring rule assigning value f(p), where p is the agent's credence in the actual cell. (By definition f(1) = 0; inaccuracy is zero when the agent's prior reflects certainty in the actual outcome.) Expected score is
$$S(q_1, q_2) = p_1 f(q_1) + p_2 f(q_2) + (1 - p_1 - p_2) f(1 - q_1 - q_2).$$
Since the function S is differentiable and has a global minimum at $q_1 = p_1$, $q_2 = p_2$, its partial derivatives
$$S_{q_1} = p_1 f'(q_1) - (1 - p_1 - p_2) f'(1 - q_1 - q_2) \quad\text{and}\quad S_{q_2} = p_2 f'(q_2) - (1 - p_1 - p_2) f'(1 - q_1 - q_2)$$
are both equal to zero there. This yields $p_1 f'(p_1) = p_2 f'(p_2)$. But this should hold for all $p_1 > 0$, $p_2 > 0$ with $0 < p_1 + p_2 < 1$. In other words $x f'(x)$ is constant on (0, 1), and one quickly determines f(x) to be some constant multiple of log x.

We argue for this in two stages. First, we strengthen the result of Shuford et al. by eliminating the differentiability condition. (This technical portion is relegated to an appendix.) Second, we give two arguments that inaccuracy ought to be a function of the agent's credence in the actual cell alone.[4]

[4] At least one set of authors, Knab and Schoenfield (2015), explicitly state (as part of an argument that quadratic scoring can give "strange" results) that "...a probabilistic agent's accuracy...at world w should be determined solely by the amount of credence she invests in the true theory at w, and the amount of credence she invests in false theories at w." As we restrict to agents that obey the probability axioms, that is precisely what we are arguing for here.

3.1. Inaccuracy and Likelihood

A more compelling feature of logarithmic scoring that has been noted in the literature (see, e.g., Bernardo 1979) is that it promotes a strong relationship between accuracy and Bayesian confirmation. In typical applications, one is interested in assigning a score to a probabilistic model for a random variable having unknown distribution (in response to a random sample R taken from it). When the distribution is unknown, R provides evidence that may confirm one candidate model at the expense of another. For A and B in the support of one's prior distribution over the "true chances", A receives greater confirmation[5] by R than does B when A better fits the evidence, i.e. when Pr(R|A) > Pr(R|B). It is plausible that accuracy should mirror confirmation, i.e. that A should be deemed more accurate than B, given sample R, precisely when A receives greater confirmation by R. Only a rule for which score is a function of credence assigned the actual cell alone can have this property.

[5] We take the degree of confirmation of A by R to be Pr(A|R)/Pr(A) when Pr(A) > 0.

To illustrate, consider an agent C and a random experiment with outcome space $\{E_1, E_2, E_3\}$. We suppose that C views the experiment's true probability function $(c_1, c_2, c_3)$ as a random variable having a continuous distribution, the statistics of which may be described by a probability density function f(x, y, z) defined on triples (x, y, z) of non-negative reals summing to 1. (C's actual credence in $E_i$ is of course the expectation of $c_i$ under this distribution.) The confirmation afforded a given triple A = (x, y, z) by random sample $R = E_1$ is proportional to $\Pr(R|A) = \Pr(E_1|(x, y, z)) = x$. If greater accuracy is to correspond to greater confirmation, then, the inaccuracy of A = (x, y, z) upon observation of $E_1$ must be a function of $x = A(E_1)$.
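The mismatch that rules violating (2) allow is easy to exhibit. In the following sketch (the two candidate chance triples are our own illustrative choices), A is better confirmed than B by the sample $R = E_1$, the log rule ranks A as the more accurate, but the Brier rule ranks B as the more accurate:

```python
import math

def brier(x, j):
    return sum((xi - (1.0 if i == j else 0.0)) ** 2 for i, xi in enumerate(x))

def log_score(x, j):
    return -math.log(x[j])

A = (0.50, 0.50, 0.00)    # hypothetical chance triple
B = (0.45, 0.275, 0.275)  # hypothetical chance triple
j = 0                     # the sample R = E1 is observed

# Confirmation by R is proportional to the likelihood Pr(R | .), i.e. the first coordinate.
print(A[j] > B[j])                        # True: A is better confirmed than B
print(log_score(A, j) < log_score(B, j))  # True: log accuracy mirrors confirmation
print(brier(A, j) < brier(B, j))          # False: the Brier rule deems B the more accurate
```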
One might think that this argument only works if both A and B lie in the support of C's distribution over the true chances; if one starts out knowing that the true chances are either (1/3, 1/2, 1/6) or (1/6, 1/2, 1/3), then one cannot justify scoring, say, (1/4, 1/4, 1/2) in any particular way by its confirmation by evidence, since one's prior probability that this triple corresponds to the actual chances (or even the actual ideal epistemic probability) is zero. (Thanks to an anonymous referee for this point.) Recall though that inaccuracy scores are functions of credences and outcomes alone; they do not further depend upon, say, one's distribution over the true chances. Justification in cases where both A and B are in the support of C's distribution over the chances therefore generalizes to cases in which they are not.

There is precedent for both acknowledging the existence and denying the force of such considerations. R. Selten (1998) in particular writes: "The logarithmic scoring rule has a close connection to the maximum likelihood principle. However, in spite of this theoretical advantage, the logarithmic scoring rule is not really recommendable." He then goes on to detail several objections against logarithmic scoring, two of which we'll examine in the next section. For now, we move to our second argument in favor of (2), which is more difficult to answer.

3.2. Inaccuracy and Untested Conditional Probabilities

Consider again an experiment having outcome space partitioned as $\{E_1, E_2, E_3\}$. A credence function Cr over this space is wholly determined by (a) the restriction of Cr to the subspace generated by $\{E_1, E_2 \vee E_3\}$, and (b) the conditional probability $\mathrm{Cr}(E_2 \mid E_2 \vee E_3)$. Suppose that there is a scoring rule S and credence functions $A = (a_1, a_2, a_3)$ and $B = (b_1, b_2, b_3)$ with $b_1 = a_1$ such that S(A, 1) > S(B, 1), i.e. A is judged less accurate than B when $R = E_1$ is observed. Since Cr is wholly determined by (a) and (b), S(Cr, 1) is wholly determined by (a) and (b) as well. Note that the restrictions of A and B to the subspace generated by $\{E_1, E_2 \vee E_3\}$ are identical; (a), therefore, plays no part in the difference of S(A, 1) and S(B, 1). The reasons for this difference must therefore be found in (b), i.e. in the fact that the conditional probabilities $A(E_2 \mid E_2 \vee E_3)$ and $B(E_2 \mid E_2 \vee E_3)$ disagree.

To bring out the oddness of this, we can again take advantage of the fact that we are free to choose the experiment in any way we like. Here is our choice. First, a coin of uncertain bias is tossed. If the coin comes up heads, stop. If tails, a 6-sided die of uncertain bias is then rolled. Let now $E_1$ = heads, $E_2$ = tails ∧ six, and $E_3$ = tails ∧ ¬six. For emphasis, let A = (7/10, 1/10, 1/5) and B = (7/10, 3/20, 3/20). The outcomes of the toss and the potential roll are (by stipulation) causally independent.
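Before going on, note that for these particular A and B the premise S(A, 1) > S(B, 1) is easily verified for the quadratic and spherical rules, while the log rule cannot distinguish the two (a minimal sketch; the helper names are our own):

```python
import math

A = (0.7, 0.1, 0.2)    # A's credences over (heads, tails & six, tails & not-six)
B = (0.7, 0.15, 0.15)  # B's credences; same credence in heads as A

def brier(x, j):
    return sum((xi - (1.0 if i == j else 0.0)) ** 2 for i, xi in enumerate(x))

def spherical(x, j):
    return 1.0 - x[j] / math.sqrt(sum(xi ** 2 for xi in x))

def log_score(x, j):
    return -math.log(x[j])

j = 0  # E1 = heads obtains; the die is never rolled
print(brier(A, j), brier(B, j))          # 0.14 vs 0.135: A judged less accurate
print(spherical(A, j), spherical(B, j))  # ~0.047 vs ~0.043: A again judged less accurate
print(log_score(A, j), log_score(B, j))  # identical: both equal -log(0.7)
```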
Among agents respecting this stipulation, then, A would be adopted by, and only by, agents for whom the coin has expected propensity 7/10 to land heads and the die has expected propensity 1/3 to land six. B, meanwhile, would be adopted by, and only by, agents for whom the coin has expected propensity 7/10 to land heads and the die has expected propensity 1/2 to land six.

Suppose that the experiment is run and the coin lands heads, i.e. $E_1$ is realized. An agent adopting A (call her Amy) will be scored as less accurate than an agent adopting B (Beatrice). Why? Not on account of their attitudes toward heads; these are equivalent. Because Amy believes that the die has expected propensity 1/3 to land six, then, whereas for Beatrice the expected propensity in six is 1/2. But, Amy will no doubt protest, the die was not even rolled! No evidence bearing in any way upon the propensity of the die to land six was gathered!

Assuming that no satisfactory response to Amy's complaint is forthcoming, all the pluralist can now do is acknowledge that the example does tell against rules that do not score by the actual cell alone whilst holding out hope that other considerations might tell equally against proper rules that do (that is, against constant multiples of the log rule, by Shuford et al. 1966 and our appendix). That project succeeding, it might then be thought that this is just one of those situations in which "one can't have it all", and that the weight of various considerations ought to determine the best choice of rule in specific applications, which is just what pluralism recommends. Alas, this project is not as promising as one might hope. For, as we show in the next section, several well regarded objections to the log rule miss their mark.

4. Objections to Logarithmic Scoring Answered

In this section we answer three objections to logarithmic scoring. This set of objections is surely not exhaustive, but we believe it to be fairly representative.

4.1 Convexity

Joyce (1998) defends a constraint on scoring rules S, according to which, for every pair of distinct credence functions $c_1$, $c_2$ and outcome j, if $S(c_1, j) = S(c_2, j)$ then $S(\tfrac12 c_1 + \tfrac12 c_2, j) < S(c_1, j)$. That is, when two distinct credence functions are judged to be equally inaccurate for a given outcome, the midpoint of the two must be judged strictly more accurate than either. So, for example, since (7/10, 3/20, 3/20) is the midpoint of (7/10, 1/10, 1/5) and (7/10, 1/5, 1/10) (and since every scoring rule that merits consideration is invariant under permutation of cells), an advocate for this constraint would score (7/10, 3/20, 3/20) as strictly more accurate than (7/10, 1/5, 1/10) when $E_1$ obtains.

In his (2009), Joyce gives new arguments in favor of this constraint, which he terms Convexity (note, however, that he also backs away from his earlier position somewhat, choosing to treat Convexity as an "optional constraint"):

"...suppose that a single ball will be drawn at random from an urn containing nine white balls and one black ball. On the basis of this evidence, a person might reasonably settle on a credence of b = 0.1 for the proposition that the black ball will be drawn and a credence of b = 0.9 for the proposition that a white ball will be drawn. Suppose that the ball is drawn, and that we learn that it is black.
We are then asked to advise the person, without telling her which ball was drawn, whether or not to take a pill that will randomly raise or lower her credence for a black draw, with equal probability, by 0.01, while leaving her credence for a white draw at 0.9. If our only goal is to improve the person's epistemic utility, then our advice should depend on the convexity of the score for truths at credence 0.1. For a rule that is convex here...the pill's disadvantages outweigh its advantages."

Note that the pill induces probabilistic incoherence; it changes credence in black while leaving credence in white the same. That fact diminishes the force of the argument, as it opens the door for a critic to claim that incoherence, rather than concavity, is responsible for any encountered disutility. Better, we think (and we'll assume this going forward), would be to allow the agent's credence in white to vary in the expected way: to become .89 when credence in black becomes .11, etc. Subsequent to these changes we do at least agree with Joyce's claim that a scoring rule ought to deem use of the pill epistemically undesirable in the mean (the logarithmic rule does, as it satisfies Convexity for 2-cell partitions). The best explanation we see for this is that if one assigns probabilities .09 and .11 to two independent events A and B then one assigns probability .0099 to the conjunction A ∧ B, whereas if one assigns probabilities .1 and .1 to A and B respectively then one assigns probability .01 > .0099 to the conjunction. We accept (2), so, given any finite set of actual (independent) outcomes, we think one's inaccuracy with respect to the corresponding experiments ought to be a strictly decreasing function of the probability one assigns to the conjunction of those outcomes (so it is epistemically worse to have assigned half of them probability .09 and half of them probability .11 than it is to have assigned all of them probability .1).

That reasoning is unavailable for partitions having more than 2 cells. If an agent has credences (7/10, 3/20, 3/20) over a 3-cell partition $(E_1, E_2, E_3)$ and we know that $E_1$ is the case, the agent is exposed to no epistemic risk, from our perspective, if she takes a pill that will fix her credence in the true outcome $E_1$ but induce small offsetting random perturbations in her credences for $E_2$ and $E_3$. What Joyce saw as epistemically undesirable was the employment of "a random process that has just as much chance of moving her away from the truth as it has of moving her toward it". This does not preclude indifference to the employment of a process inducing movements known (by us) to be orthogonal to the truth.[6]

[6] A further indication that this example isn't harmful to the log rule is that the (standard piecemeal version of; see below) quadratic rule is more likely than the logarithmic rule to favor the pill taker over the non-taker over longish, finite sequences of independent draws from the urn. Letting $\Delta_q$ and $\Delta_l$ represent the greater inaccuracy incurred by the pill taker under the quadratic and logarithmic rules respectively, $\Delta_q$ takes on values (.0021, −.0019, −.0179, .0181) and $\Delta_l$ takes on values ≈ (.016119665, −.01594154, −.13750352, .15200309), with probabilities (.45, .45, .05, .05). So $\sqrt{\mathrm{Var}(\Delta_q)}/E(\Delta_q) = \sqrt{.000036}/.0001 = 60$, whereas $\sqrt{\mathrm{Var}(\Delta_l)}/E(\Delta_l) \approx \sqrt{.00120056}/.0005580758 \approx 59.96898$. In $10^4$ trials the expectations would increase 10,000-fold and the standard deviations 100-fold, so a case where the pill taker had lesser measured inaccuracy would lie ≈ 1/.5996898 ≈ 1.66753 standard deviations from the mean under logarithmic scoring (occurs with frequency p ≈ .0477), but only 1/.6 ≈ 1.66667 standard deviations under piecemeal quadratic scoring (p ≈ .0478).
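Before turning to the next objection, here is a quick check of the claims made above (a sketch; base-2 logarithms as in Section 2, and the numbers are those of the examples in the text):

```python
import math

log2 = lambda x: math.log(x, 2)

# Two-cell case: the true outcome is "black"; the (coherent) pill moves credence in black
# from 0.10 to 0.09 or 0.11 with equal probability.
with_pill = 0.5 * (-log2(0.09)) + 0.5 * (-log2(0.11))
without_pill = -log2(0.10)
print(with_pill > without_pill)       # True: the pill is epistemically undesirable in the mean

# The conjunction comparison driving the explanation above:
print(0.09 * 0.11, 0.10 * 0.10)       # 0.0099 < 0.01

# Three-cell case: E1 is true and credence in E1 is fixed at 0.7; offsetting perturbations
# of the remaining credences leave the log score unchanged.
print(-log2(0.7), -log2(0.7))         # same score for (0.7, 0.15, 0.15) and (0.7, 0.1, 0.2)
```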
4.2 Hypersensitivity

A number of objections to logarithmic scoring congregate around the fact that the log rule gives an inaccuracy score of infinity to an agent with zero credence in a realized cell of a scored partition. Even Shuford et al. (1966), summarizing their important positive results for the rule, write: "In review, the 'logarithmic' scoring system is the only one which has the property that the student's score depends only on the probability that he assigns to the correct answer when there are more than two possible answers. All other (proper scoring rules) lack this property. We find, however, that the logarithmic scoring system is unbounded and thus impossible to realize in practice, e.g. how can one give a student a score of minus infinity?"

Selten (1998), meanwhile, writes: "The use of the logarithmic scoring rule implies...that wrongly describing something extremely improbable as having zero probability is an unforgivable sin." We believe, to the contrary, that the log rule is both simple to realize (note that even a one-in-a-trillion outcome that comes out actual incurs an inaccuracy score of only 40 bits or so) and equitable in its judgments. As to the log rule's no-forgiveness policy regarding zero credences in actual outcomes, we find this reasonable. Offending agents would, in theory, take and lose arbitrarily many bets against a zero credence actuality, perhaps taking hallucination that the bets were going against them as a likely explanation for their mounting debts.

Selten (1998) makes a related complaint, saying of the log rule that "...it is too sensitive with respect to differences between very small probabilities...." Precisely, Selten calls a scoring rule S hypersensitive if for every ε > 0 and every M > 0 there are probability distributions (over n-cell partitions, n ≥ 2) r and p, assigning positive measure to each cell, such that the Euclidean distance from r to p is at most ε but the r-expectation of S(p) exceeds the r-expectation of S(r) by at least M. As Selten notes, the log rule is hypersensitive. As to why this is a problem, he writes: "...in general, it will be very difficult to judge how small a very small probability should be. Usually there will be no good theoretical reasons to specify a probability as $10^{-5}$ rather than $10^{-10}$. (...) such differences can be of crucial importance for the comparison of the two theories."

One needs to discriminate between two types of case, however. In the first type of case, where one is scoring a partition with a few large cells and a few (or one) small exceptional cell(s), hypersensitivity fails to manifest in logarithmic scoring provided one assigns even (very) modestly realistic credences. If one is scoring the toss of a fair coin, with outcomes heads, tails and other (other being a disjunction of such unlikely scenarios as "lands on edge", "flies off into space", "unreadable", etc.), it matters little (in the mean) if one assigns other credence $10^{-10}$ or even $10^{-50}$ in a case where the true probability is $10^{-5}$; the unlikeliness of the outcome dwarfs the magnitude of the penalty. Mean inaccuracy will of course increase dramatically if one assigns other an excessively low probability, such as $10^{-10^9}$, but these are just deserts for such an unconscionably impoverished estimate; the sketch below illustrates the contrast.
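Here is the arithmetic behind that contrast (a sketch; the figures $10^{-5}$, $10^{-10}$, $10^{-50}$ and $10^{-10^9}$ are the ones discussed above, and mean log inaccuracies are compared, in bits, against the ideal assignment):

```python
import math

# True chances for (heads, tails, other); "other" has true probability 1e-5.
p_other = 1e-5
p_heads = p_tails = (1.0 - p_other) / 2

def mean_bits(cred_other):
    """Mean log inaccuracy (bits) when 'other' gets cred_other and the rest is split evenly."""
    cred_main = (1.0 - cred_other) / 2
    return -(2 * p_heads * math.log2(cred_main) + p_other * math.log2(cred_other))

ideal = mean_bits(p_other)
for cred in (1e-10, 1e-50):
    print(cred, mean_bits(cred) - ideal)   # excess on the order of 1e-4 to 1e-3 bits: negligible

# For an assignment such as 10**(-10**9), the penalty term alone is
# p_other * (10**9) * log2(10), i.e. tens of thousands of bits in the mean:
print(p_other * (10 ** 9) * math.log2(10))
```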
In the other type of case, in which there are many small, unexceptional cells, such as when a respondent fills out a multi-question survey or reports credences about the outcome of a large single elimination tournament, there typically are "good theoretical reasons" to specify one probability over another. In the case of a 64-competitor single elimination tournament, there are $2^{63}$ possibilities for the final bracket, the most likely of which may have true probability ≈ $2^{-30}$. Even so it is easy enough to specify an accurate prior for the realized bracket using the piecemeal approach of assigning credences to each contest, conditional (where applicable) on results-to-date, and multiplying. We'll return to this point below.

4.3 Neutrality

The final objection we will consider is based on symmetry considerations. Given a scoring rule S and a probability measure p on a partition of event space, Selten (1998) defines V(p|q) to be the q-expectation of S(p) (i.e. the expected inaccuracy of p in a case where q gives the true probabilities), and defines the expected score loss of p at q by L(p|q) = V(p|q) − V(q|q). (This is a measure of the greater mean inaccuracy incurred by choosing p rather than the true probability function q.) Selten then formulates an axiom of Neutrality, which states that L(p|q) = L(q|p) for any q and p. He writes:

"The interpretation of (Neutrality) becomes clear if one looks at the hypothetical case that one and only one of two theories p and q is right, but it is not known which one. The expected score loss of the wrong theory is a measure of how far it is from the truth. It is only fair to require that this measure is 'neutral' in the sense that it treats both theories equally. If p is wrong and q is right, then p should be considered to be as far from the truth as q in the opposite case that q is wrong and p is right."

Having defended the logarithmic rule against the charges of no-forgiveness and hypersensitivity, it may seem odd that we are choosing to address this charge last; if no-forgiveness and hypersensitivity are justified, then Neutrality clearly isn't. We think, however, that there is value in looking at an argument against Neutrality that does not bring in near-zero probabilities; more so, in that it will serve well as an introduction to the next section.

Selten introduces four axioms in all, of which Neutrality is the last, then shows that, together, these axioms characterize the quadratic rule.[7] The logarithmic rule, meanwhile, satisfies the first three axioms but fails Neutrality. Since we accept the first three axioms, then, for us Neutrality and the quadratic rule are simply equivalent. Our argument against the former, then, will consist in showing how the latter engenders a deficient notion of "expected score loss".

[7] Though several good arguments against the quadratic rule appear in the literature of the past decade or so, few (if any) authors have offered wholesale endorsement of the logarithmic rule in its stead. H. Leitgeb and R. Pettigrew (2010) show that the quadratic rule is not consistent with Jeffrey conditionalization (Jeffrey 1965), but seem more willing to jettison the latter than the former. B. A. Levinstein (2012), responding to Leitgeb and Pettigrew, shows that the logarithmic rule does cohere with Jeffrey conditionalization, but stops short of embracing it. Fallis and Lewis (2015) show that the quadratic rule doesn't even cohere with standard conditionalization. They do not, however, endorse the logarithmic rule.

Consider two agents, p and q. We assume that p has credence 1/2 in A, which has true probability 1/4, while q has credence 1/4 in an independent event B for which the true probability is 1/2. According to the reasoning behind Neutrality, p is "as far from the truth" regarding A as q is regarding B. I.e., inaccuracy should be scored in such a way that their "expected score losses" are equal. (And so they are, according to the quadratic scoring rule.)
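This is easy to verify directly (a sketch; we use the full two-cell Brier score, though the "half Brier" of footnote 2 yields the same verdict, and base-2 logarithms for the log rule):

```python
import math

def brier2(x, first_cell_true):
    """Two-cell Brier inaccuracy of credence (x, 1-x) when the first cell is/isn't realized."""
    y = 1.0 if first_cell_true else 0.0
    return (x - y) ** 2 + ((1 - x) - (1 - y)) ** 2

def log2_score(x, first_cell_true):
    return -math.log2(x if first_cell_true else 1 - x)

def expected_loss(score, credence, chance):
    """L(p|q) = V(p|q) - V(q|q): extra expected inaccuracy from announcing `credence` when `chance` is true."""
    V = lambda c: chance * score(c, True) + (1 - chance) * score(c, False)
    return V(credence) - V(chance)

# p: credence 1/2 in A, true probability 1/4.  q: credence 1/4 in B, true probability 1/2.
print(expected_loss(brier2, 0.5, 0.25), expected_loss(brier2, 0.25, 0.5))          # equal: 0.125 each
print(expected_loss(log2_score, 0.5, 0.25), expected_loss(log2_score, 0.25, 0.5))  # unequal: the log rule fails Neutrality
```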
Suppose that we now attempt to flesh out p's and q's epistemic attitudes without introducing further expected score loss: p and q correctly deem A and B to be independent, and both p's credence in B and q's credence in A are aligned to the true probabilities. The situation is now as follows. Over the partition
$$W = \{A \wedge B,\ A \wedge \neg B,\ \neg A \wedge B,\ \neg A \wedge \neg B\},$$
p's and q's credence functions are (1/4, 1/4, 1/4, 1/4) and (1/16, 3/16, 3/16, 9/16), respectively. The true probabilities, meanwhile, are given by r = (1/8, 1/8, 3/8, 3/8). It is easy to see that p's quadratic score over W is always $(3/4)^2 + 3(1/4)^2 = 3/4$. We leave it to the reader to verify that the true expectation (i.e. the r-expectation) of q's quadratic score is, however, 49/64; in particular, q now has higher expected score loss, according to the quadratic rule.

Since "expected score loss" should mean, roughly, "expected amount of gratuitous inaccuracy", one ought to reject any scoring rule according to which either p or q incurs any additional expected score loss in fleshing out their attitudes as they do, i.e. in the ideally rational manner, let alone different amounts of it! The log rule, by contrast, isn't subject to this objection. We therefore judge the quadratic scoring rule to be unacceptable. Concomitantly, we reject Neutrality.

5. On the Computational Intractability of Competing Rules

Though we think that the arguments of the previous two sections provide compelling reasons to prefer proper rules that score by the actual cell alone (i.e. logarithmic rules), some readers will of course still insist on clinging to their favorite alternatives. In this section we suggest that the theoretical disadvantages these readers will meet with are the least of their worries. Indeed, once one advances beyond toy examples, scoring a credence function by a rule that doesn't score by the actual cell alone is apt to become computationally intractable.

Let us return to the single elimination tournament example. In practice, it would be extraordinarily tedious to specify probabilities for all $2^{63}$ possible brackets. For the log rule this isn't a problem. The agent simply assigns probabilities for each "first round" contest (outcomes from a given round may not be independent conditional on results of past rounds in general, so this constitutes a simplifying assumption), then after learning which competitors prevailed in those contests (but nothing else), assigns probabilities for each "second round" contest, etc. At the end, one may by working backward figure out the agent's prior probability in the actual bracket, which is sufficient to compute the agent's inaccuracy under the log rule. (This, owing to the identity $\log \mathrm{Cr}(A \cap B) = \log \mathrm{Cr}(A) + \log \mathrm{Cr}(B \mid A)$, is equal to the sum of the individual inaccuracy scores for the 63 contests for which the agent provided credences.)
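A sketch of this piecemeal bookkeeping (the per-contest credences below are our own illustrative choices): the log inaccuracy of the full bracket is recovered by summing the per-contest log inaccuracies of the credences the agent actually announced.

```python
import math

# Suppose the agent announced, for each of the 63 contests that actually took place,
# a credence in the competitor who went on to win that contest (conditional on results-to-date).
announced = [0.55, 0.9, 0.3, 0.75] * 15 + [0.6, 0.5, 0.8]  # 63 illustrative credences

# Piecemeal score: sum of per-contest log inaccuracies.
piecemeal = sum(-math.log2(c) for c in announced)

# Equivalent "holistic" score: the prior in the realized bracket is the product of the
# announced conditional credences, and its log inaccuracy is the same number.
prior_in_bracket = math.prod(announced)
holistic = -math.log2(prior_in_bracket)

print(len(announced), piecemeal, holistic)  # 63 contests; the two scores agree (up to rounding)
```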
For scoring rules that don't score by credence in the actual cell alone, though, this shortcut won't serve. In order to score the $2^{63}$-cell partition arising from the elimination tournament with the quadratic rule, for example, one requires credences for every cell. Unlike with the log rule, then, there is no simple, equivalent way to compute the score "piecemeal". Natural-looking attempts (such as computing quadratic scores for each contest and adding them) can give conflicting results.

To illustrate, suppose that we predict rain with probabilities 1/2 in New York and 1/2 in Tokyo, whereas your probabilities are 1/3 and 4/5, respectively. Suppose further that we agree that these are independent events. If it rains in both cities then the sum of our quadratic scores for the 2-cell partitions determined by the weather in the two cities respectively is $(1/2)^2 + (1/2)^2 = 1/2$, while yours is $(2/3)^2 + (1/5)^2 = 109/225 < 1/2$. You are more accurate, then, according to this piecemeal approach. On the other hand, your initial credence function on the smallest common refinement of the two independent partitions considered, namely
$$\{(\text{NY, Tokyo}),\ (\text{NY}, \neg\text{Tokyo}),\ (\neg\text{NY, Tokyo}),\ (\neg\text{NY}, \neg\text{Tokyo})\},$$
was (4/15, 1/15, 8/15, 2/15). So your actual quadratic score over the common refinement is
$$\Bigl(\frac{11}{15}\Bigr)^2 + \Bigl(\frac{1}{15}\Bigr)^2 + \Bigl(\frac{8}{15}\Bigr)^2 + \Bigl(\frac{2}{15}\Bigr)^2 = \frac{190}{225}.$$
Our credence function on the refinement meanwhile was (1/4, 1/4, 1/4, 1/4), yielding a quadratic score of $(3/4)^2 + 3(1/4)^2 = 3/4 < 190/225$. We are therefore more accurate, according to this more "holistic" approach. So these are different rules.

Indeed, piecemeal versions of the quadratic rule depend further on the generating sequence of partitions employed. Suppose you are scored first on the event that it rains in either both or neither of the cities in question (in which you have credence 2/5). Upon learning the truth of this event you would come to have credence 2/3 in NY. If you are then scored on NY, your running score would be $(3/5)^2 + (1/3)^2 = 106/225$. But imagine an agent whose priors are 1/2 in NY and .52 in Tokyo. That agent's running score would be .4804 (which lies strictly between 106/225 and 109/225) by either of the piecemeal methods we've considered. So, again, these are different rules.

Since scoring rules that are not functions of the actual cell alone generally depend on all cells, this is a problem that may plague any proper scoring rule that fails to satisfy (2); that is, all proper scoring rules except for the logarithmic rule. For large partitions, such rules are difficult to evaluate, and seemingly natural piecemeal variants fail to be equivalent, both to the target rule and to each other.
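The conflict in the New York/Tokyo example is easy to reproduce (a sketch; the credences are those given above, and the piecemeal scores follow footnote 2's "half Brier" convention for each two-cell partition):

```python
# It rains in both New York and Tokyo.
ours = {"NY": 0.5, "Tokyo": 0.5}    # our credences in rain, per city
yours = {"NY": 1/3, "Tokyo": 4/5}   # your credences in rain, per city

def piecemeal(cred):
    """Sum of 'half Brier' scores, one per city, given that rain occurs in each."""
    return sum((1 - c) ** 2 for c in cred.values())

def holistic(cred):
    """Full quadratic score over the four-cell common refinement (rain & rain is the actual cell)."""
    p, q = cred["NY"], cred["Tokyo"]
    cells = (p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q))
    truth = (1, 0, 0, 0)
    return sum((c - t) ** 2 for c, t in zip(cells, truth))

print(piecemeal(ours), piecemeal(yours))  # 0.5 vs ~0.4844: you look more accurate piecemeal
print(holistic(ours), holistic(yours))    # 0.75 vs ~0.8444: we look more accurate holistically
```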
6. Conclusion

In light of the compelling heuristics in its favor and the results of Shuford et al. (1966), it is surprising that the logarithmic scoring rule has lagged in popularity. Our goal has been to render it more palatable, or at least a "necessary evil". Other scoring rules, meanwhile, contradict confirmation, pay heed to untested differences in conditional probabilities, attribute increases in expected score loss to agents who have extended their credences ideally, and are likely either to depend arbitrarily on a choice of generating partitions or to make computation intractable. In light of this, epistemologists and others who employ scoring rules to evaluate the accuracy of credences and have neglected the logarithmic rule would do well to reconsider its merits.

7. Appendix

Theorem 1. Let f, taking values in the extended reals, be strictly decreasing on [0, 1] with f(1) = 0. If for every $p, q \ge 0$ with $p + q \le 1$ the function
$$H(x, y) = pf(x) + qf(y) + (1 - p - q)f(1 - x - y)$$
has a strict global minimum at $x = p$, $y = q$, then f is differentiable on (0, 1).

Remark. We adhere to the convention that $0 \cdot \infty = 0$, where applicable. Given that, Theorem 1 as formulated immediately generalizes to versions with greater numbers of cells. For example, one may establish that if, for every $p, q, r \ge 0$ with $p + q + r \le 1$,
$$H'(x, y, z) = pf(x) + qf(y) + rf(z) + (1 - p - q - r)f(1 - x - y - z)$$
has a strict global minimum at $x = p$, $y = q$, $z = r$, then f is differentiable on (0, 1). (The proof is immediate; just set r = 0 and apply Theorem 1.) The theorem does not, on the other hand, admit of a 2-cell version. If for example
$$f(x) = \begin{cases} 3 + (1 - x)^2 & 0 \le x < \tfrac13 \\ 1 + (1 - x)^2 & \tfrac13 \le x \le \tfrac23 \\ (1 - x)^2 & \tfrac23 < x \le 1 \end{cases}$$
then $H''(x) = pf(x) + (1 - p)f(1 - x)$ has a strict global minimum at $x = p$ for every $0 \le p \le 1$, but f is not differentiable (or even continuous) on (0, 1). (Cf. Section 7.2.2 of Predd et al. 2009.)

Proof of Theorem 1. Suppose $0 < x, y$ and $x + y < 1$. Then for any $0 < \delta < y$,
$$xf(x) + yf(y) + (1 - x - y)f(1 - x - y) < xf(x + \delta) + yf(y - \delta) + (1 - x - y)f(1 - x - y),$$
so that
$$x\bigl(f(x + \delta) - f(x)\bigr) > y\bigl(f(y) - f(y - \delta)\bigr). \tag{1}$$
Define $\Delta(w, \delta) = f(w + \delta) - f(w)$ for $w, \delta > 0$ with $w + \delta < 1$. Then
$$x\Delta(x, \delta) > y\Delta(y - \delta, \delta), \qquad 0 < x, y,\ x + y < 1,\ 0 < \delta < y.$$
Making the substitution $z = y - \delta$, one has
$$x\Delta(x, \delta) > (z + \delta)\Delta(z, \delta), \qquad x, z, \delta > 0,\ x + z + \delta < 1.$$
Switching the roles of x and z,
$$z\Delta(z, \delta) > (x + \delta)\Delta(x, \delta) > \frac{x + \delta}{x}(z + \delta)\Delta(z, \delta),$$
or
$$\frac{z}{x + \delta}\Delta(z, \delta) > \Delta(x, \delta) > \frac{z + \delta}{x}\Delta(z, \delta), \qquad x, z, \delta > 0,\ x + z + \delta < 1. \tag{2}$$

We claim that $\liminf_{\delta \to 0^+} \frac{\Delta(x, \delta)}{\delta} > -\infty$. Otherwise, dividing (1) by $\delta$ and choosing a "bad" sequence of $\delta$ tending to zero, one could conclude that
$$\liminf_{\delta \to 0^+} \frac{f(y) - f(y - \delta)}{\delta} = -\infty$$
for any $y \in [\tfrac{1 - x}{2}, 1 - x]$. Letting then $M > 0$ be arbitrary and
$$t = \inf\Bigl\{y \in \bigl[\tfrac{1 - x}{2}, 1 - x\bigr] : f(y) > f(1 - x) + M(1 - x - y)\Bigr\},$$
if $t > \tfrac{1 - x}{2}$ then choosing $\delta$ small so that $f(t) - f(t - \delta) < -M\delta$, one obtains
$$f(t - \delta) > f(t) + M\delta \ge f(1 - x) + M(1 - x - t) + M\delta = f(1 - x) + M\bigl(1 - x - (t - \delta)\bigr).$$
So $t = \tfrac{1 - x}{2}$ and $f(\tfrac{1 - x}{2}) \ge f(1 - x) + M \cdot \tfrac{1 - x}{2}$. But M is arbitrary, so this is absurd.

Suppose now that $0 < x < z < 1$. For any $\delta, y > 0$ with $y + z + \delta < 1$, from (2)
$$\Delta(x, \delta) > \frac{y + \delta}{x}\Delta(y, \delta) \quad\text{and}\quad \Delta(y, \delta) > \frac{z + \delta}{y}\Delta(z, \delta).$$
It follows that
$$\Delta(x, \delta) > \frac{(y + \delta)(z + \delta)}{xy}\Delta(z, \delta). \tag{3}$$
Similarly from (2)
$$\Delta(x, \delta) < \frac{y}{x + \delta}\Delta(y, \delta) \quad\text{and}\quad \Delta(y, \delta) < \frac{z}{y + \delta}\Delta(z, \delta),$$
from which follows
$$\Delta(x, \delta) < \frac{yz}{(x + \delta)(y + \delta)}\Delta(z, \delta). \tag{4}$$
Letting $y = \tfrac{1 - z}{2}$, (3) and (4) give
$$\frac{\tfrac{1 - z}{2}\,z}{(x + \delta)\bigl(\tfrac{1 - z}{2} + \delta\bigr)}\Delta(z, \delta) > \Delta(x, \delta) > \frac{\bigl(\tfrac{1 - z}{2} + \delta\bigr)(z + \delta)}{x\,\tfrac{1 - z}{2}}\Delta(z, \delta) \tag{5}$$
whenever $0 < x < z < 1$ and $\delta < \tfrac{1 - z}{2}$.

Fix now $x \in (0, 1)$ and let $B > 1$ be arbitrarily close to 1. Fix $\delta > 0$ sufficiently small that $x + 3\delta < 1$,
$$\frac{\tfrac{1 - z}{2}\,z}{(x + \gamma)\bigl(\tfrac{1 - z}{2} + \gamma\bigr)} > B^{-\frac13} \quad\text{and}\quad \frac{\bigl(\tfrac{1 - z}{2} + \gamma\bigr)(z + \gamma)}{x\,\tfrac{1 - z}{2}} < B^{\frac13}$$
for every $z \in [x, x + \delta]$ and $0 < \gamma < \delta$. Next let $0 < \gamma < \delta$ be any number so small that, choosing N such that $\tfrac{\delta}{N} \ge \gamma > \tfrac{\delta}{N + 1}$, one has $\tfrac{N + 1}{N} < B^{\frac16}$. Now for $j = 0, 1, \dots, N$, set $z_j = x + j\gamma$. From (5),
$$\frac{\tfrac{1 - z_j}{2}\,z_j}{(x + \gamma)\bigl(\tfrac{1 - z_j}{2} + \gamma\bigr)}\Delta(z_j, \gamma) > \Delta(x, \gamma) \quad\Longrightarrow\quad \Delta(z_j, \gamma) > \frac{\Delta(x, \gamma)}{\dfrac{\tfrac{1 - z_j}{2}\,z_j}{(x + \gamma)\bigl(\tfrac{1 - z_j}{2} + \gamma\bigr)}}.$$
Summing from 0 to N (and recalling that $\Delta(x, \gamma) < 0$) yields
$$\Delta(x, \delta) > \sum_{j=0}^{N} \Delta(z_j, \gamma) > \sum_{j=0}^{N} \frac{\Delta(x, \gamma)}{\dfrac{\tfrac{1 - z_j}{2}\,z_j}{(x + \gamma)\bigl(\tfrac{1 - z_j}{2} + \gamma\bigr)}} > B^{\frac13}\Delta(x, \gamma)(N + 1).$$
Since $\delta \ge N\gamma$, one then has
$$\frac{\Delta(x, \delta)}{\delta}\,B^{-\frac12} > \frac{\Delta(x, \gamma)}{\delta}\,B^{-\frac16}(N + 1) > \frac{N\Delta(x, \gamma)}{\delta} \ge \frac{\Delta(x, \gamma)}{\gamma}.$$
Letting $\gamma$ tend to zero one obtains
$$\limsup_{\gamma \to 0^+} \frac{\Delta(x, \gamma)}{\gamma} \le \frac{\Delta(x, \delta)}{\delta}\,B^{-\frac12}. \tag{6}$$
Similarly, from (5)
$$\Delta(x, \gamma) > \frac{\bigl(\tfrac{1 - z_j}{2} + \gamma\bigr)(z_j + \gamma)}{x\,\tfrac{1 - z_j}{2}}\Delta(z_j, \gamma) \quad\Longrightarrow\quad \frac{\Delta(x, \gamma)}{\dfrac{\bigl(\tfrac{1 - z_j}{2} + \gamma\bigr)(z_j + \gamma)}{x\,\tfrac{1 - z_j}{2}}} > \Delta(z_j, \gamma).$$
Summing from 0 to N − 1 yields
$$B^{-\frac13}\Delta(x, \gamma)N > \sum_{j=0}^{N-1} \frac{\Delta(x, \gamma)}{\dfrac{\bigl(\tfrac{1 - z_j}{2} + \gamma\bigr)(z_j + \gamma)}{x\,\tfrac{1 - z_j}{2}}} > \sum_{j=0}^{N-1} \Delta(z_j, \gamma) > \Delta(x, \delta).$$
Since $\tfrac{\delta}{N + 1} < \gamma$,
$$\frac{\Delta(x, \gamma)}{\gamma} > \frac{\Delta(x, \delta)\,B^{\frac13}}{N\gamma} > \frac{\Delta(x, \delta)(N + 1)\,B^{\frac13}}{N\delta} \ge \frac{\Delta(x, \delta)}{\delta}\,B^{\frac12}.$$
Letting $\gamma$ tend to zero one obtains
$$\liminf_{\gamma \to 0^+} \frac{\Delta(x, \gamma)}{\gamma} \ge \frac{\Delta(x, \delta)}{\delta}\,B^{\frac12}. \tag{7}$$
Since B may be taken arbitrarily close to 1 and $\limsup_{\delta \to 0^+} \bigl|\Delta(x, \delta)/\delta\bigr| < \infty$, $f'_+(x) = \lim_{\gamma \to 0^+} \frac{\Delta(x, \gamma)}{\gamma}$ exists for arbitrary $x \in (0, 1)$ by (6) and (7). One may show similarly that $f'_-(x) = \lim_{\gamma \to 0^-} \frac{\Delta(x, \gamma)}{\gamma}$ exists as well. (In particular, f is continuous on (0, 1).)

Dividing by $\delta$ on both sides of (1) and letting $\delta$ tend to zero,
$$xf'_+(x) \ge yf'_-(y) \quad\text{for every } x, y > 0 \text{ with } x + y < 1. \tag{8}$$
We claim that equality holds in (8). Assume for contradiction that $xf'_+(x) > yf'_-(y)$ for some $x, y > 0$ with $x + y < 1$. Choose $T < 1$ such that $xf'_+(x) > Tyf'_-(y)$. Then for all sufficiently small $\delta > 0$,
$$x\bigl(f(x + \delta) - f(x)\bigr) > Ty\bigl(f(y) - f(y - \delta)\bigr). \tag{9}$$
Let
$$S = \sup\Bigl\{\gamma \in [0, \delta] : y\bigl(f(y - \delta + \gamma) - f(y - \delta)\bigr) > T^{-1}x\bigl(f(x + \delta) - f(x + \delta - \gamma)\bigr)\Bigr\}.$$
Note that (by continuity)
$$y\bigl(f(y - \delta + S) - f(y - \delta)\bigr) \ge T^{-1}x\bigl(f(x + \delta) - f(x + \delta - S)\bigr). \tag{10}$$
Suppose that $S < \delta$. Since $(y - \delta + S)f'_+(y - \delta + S) \ge (x + \delta - S)f'_-(x + \delta - S)$, for all sufficiently small $\gamma > 0$ one has $S + \gamma < \delta$ and
$$(y - \delta + S)\bigl(f(y - \delta + S + \gamma) - f(y - \delta + S)\bigr) > T^{-\frac12}(x + \delta - S)\bigl(f(x + \delta - S) - f(x + \delta - S - \gamma)\bigr)$$
$$\Longrightarrow\quad y\bigl(f(y - \delta + S + \gamma) - f(y - \delta + S)\bigr) > T^{-1}x\bigl(f(x + \delta - S) - f(x + \delta - S - \gamma)\bigr),$$
the implication holding for $\delta$ sufficiently small ($\delta$ is chosen after T). Adding this to (10),
$$y\bigl(f(y - \delta + S + \gamma) - f(y - \delta)\bigr) > T^{-1}x\bigl(f(x + \delta) - f(x + \delta - S - \gamma)\bigr).$$
Thus $S + \gamma$ is in the set, the supremum of which is S. This contradiction establishes that $S = \delta$. Thus (10) says that $y\bigl(f(y) - f(y - \delta)\bigr) \ge T^{-1}x\bigl(f(x + \delta) - f(x)\bigr)$, contradicting (9) and establishing that equality does in fact hold in (8).

Taking then $x = y \in (0, \tfrac12)$, $xf'_-(x) = xf'_+(x)$, so f is differentiable on $(0, \tfrac12)$. Finally for $x \in [\tfrac12, 1)$, choose $y \in (0, 1 - x)$ and note that
$$xf'_+(x) = yf'_-(y) = yf'_+(y) = xf'_-(x),$$
so f is differentiable on (0, 1).[8]

[8] Thanks to Steve Kalikow, the anonymous referees and the editors at Philosophy of Science.

References

Bernardo, J. M. 1979. "Expected Information as Expected Utility." The Annals of Statistics 7: 686-690.

Brier, Glenn W. 1950. "Verification of Forecasts Expressed in Terms of Probability." Monthly Weather Review 78: 1-3.

Fallis, Don and Peter J. Lewis. 2015. "The Brier Rule Is not a Good Measure of Epistemic Utility (and Other Useful Facts about Epistemic Betterness)." Australasian Journal of Philosophy 94: 576-590.

Good, I. J. 1952. "Rational Decisions." Journal of the Royal Statistical Society, Ser. B 14: 107-114.

Jeffrey, R. 1965. The Logic of Decision. New York: McGraw-Hill.

Joyce, J. M. 1998. "A Nonpragmatic Vindication of Probabilism." Philosophy of Science 65(4): 575-603.

Knab, Brian and Miriam Schoenfield. 2015.
"A Strange Thing about the Brier Score." M-Phi, http://m-phi.blogspot.nl/2015/03/a-strange-thing-about-brier-score.html

Leitgeb, H. and R. Pettigrew. 2010. "An Objective Justification of Bayesianism II: The Consequences of Minimizing Inaccuracy." Philosophy of Science 77: 236-272.

Levinstein, Benjamin Anders. 2012. "Leitgeb and Pettigrew on Accuracy and Updating." Philosophy of Science 79: 413-424.

Predd, J., Robert Seiringer, Elliott H. Lieb, Daniel N. Osherson, H. Vincent Poor and Sanjeev R. Kulkarni. 2009. "Probabilistic Coherence and Proper Scoring Rules." IEEE Transactions on Information Theory 55(10): 4786-4792.

Selten, Reinhard. 1998. "Axiomatic Characterization of the Quadratic Scoring Rule." Experimental Economics 1: 43-62.

Shuford, Jr., Emir H., Arthur Albert and H. Edward Massengill. 1966. "Admissible Probability Measurement Procedures." Psychometrika 31(2): 125-145.

rmcctchn@memphis.edu