IMMODEST INDUCTIVE METHODS*

DAVID LEWIS¹
Princeton University

Inductive methods can be used to estimate the accuracies of inductive methods. Call a method immodest if it estimates that it is at least as accurate as any of its rivals. It would be unreasonable to adopt any but an immodest method. Under certain assumptions, exactly one of Carnap's $\lambda$-methods is immodest. This may seem to solve the problem of choosing among the $\lambda$-methods; but sometimes the immodest $\lambda$-method is $\lambda = 0$, which it would not be reasonable to adopt. We should therefore reconsider the assumptions that led to this conclusion: for instance, the measure of accuracy.

Suppose you are looking for an inductive method to trust. By an inductive method, I mean a systematic way of letting the available evidence govern your degree of belief in hypotheses. We can represent a method by a function C from pairs of propositions to real numbers in the unit interval. You trust the method if your degree of belief in any hypothesis h, conditionally on evidence e, is $C(h \mid e)$.

One thing you can do given an inductive method C is to estimate the values of numerical magnitudes. By a (numerical) magnitude, I mean a function from all possible worlds to numbers. The speed of a given racehorse (in a given race), for instance, is that function whose value at any possible world w is the speed of that horse in that race in the world w (or some arbitrarily chosen value if that horse does not exist, or does not run in that race, in the world w). For any magnitude m there is a set $V_m$ of its possible values. For each value v in $V_m$, there is a proposition $p_v$ which holds at all and only those possible worlds at which m has the value v; we regard $p_v$ as the proposition that m has the value v. The C-mean estimate, on evidence e, of a magnitude m may now be defined thus:

(D1)  $E_C(m \mid e) =_{df} \sum_{v} v \cdot C(p_v \mid e)$

where v ranges over $V_m$.
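The definition (D1) is a probability-weighted average, and a small sketch may make that concrete. The function name `c_mean_estimate`, the dictionary representation of the credences, and the horse-speed numbers are all assumptions introduced here for illustration; nothing in the paper fixes them.

```python
from typing import Mapping


def c_mean_estimate(credence_by_value: Mapping[float, float]) -> float:
    """Return the sum over v of v * C(p_v | e), as in (D1).

    credence_by_value maps each possible value v of the magnitude to the
    degree of belief C(p_v | e) that the magnitude has that value.
    """
    total = sum(credence_by_value.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("credences over the possible values must sum to 1")
    return sum(v * p for v, p in credence_by_value.items())


# Hypothetical credences about one horse's speed (illustrative numbers only).
horse_speed_credence = {14.0: 0.2, 15.0: 0.5, 16.0: 0.3}
estimate = c_mean_estimate(horse_speed_credence)
print(estimate)
```

The check that the credences sum to 1 reflects the fact that the propositions $p_v$ are mutually exclusive and jointly exhaustive.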
You might, for instance, wish to use your inductive method C to bet on the horse whose C-mean estimated speed, on the evidence available to you, is highest.

You should hope to give your trust to an inductive method C that will give you accurate estimates; you want there to be no more difference than you can help between the actual values and your C-mean estimates, on the available evidence, of the various magnitudes in which you are interested. It is plausible that the undesirability of errors might rise more than linearly with the size of the errors. Thus you might wish to measure accuracy by considering, say, the squared error of the inductive method in various estimating tasks.

* Received May, 1969.
¹ I am grateful to Robert L. Goble for many valuable comments; to the University of California for a Faculty Fellowship supporting the work reported in this paper; to the U.C.L.A. Campus Computing Network for an allocation of computer time used to prepare the numerical examples shown in Figures 1 and 2; to Diane Wells for drawing the figures; and to an anonymous referee for Philosophy of Science who suggested certain improvements in exposition.

This content downloaded from 130.132.173.105 on Tue, 4 Jun 2013 12:12:27 PM. All use subject to JSTOR Terms and Conditions: http://www.jstor.org/page/info/about/policies/terms.jsp

But you cannot just pick the most accurate method, not unless you already know the actual values of the magnitudes you wish to estimate, in which case you do not need to estimate them. The best you can do is pick the inductive method with the highest estimated accuracy, just as you might bet on the horse with the highest estimated speed.

The trouble is that you need an inductive method to estimate anything, even to estimate the accuracy of various inductive methods. And your selection of a method with the highest estimated accuracy will come out differently depending on which method you use to make the estimate.
It is as if Consumer Reports, Consumer Bulletin, etc., each published rankings of the consumers' magazines, as they do of other products. You would have to know which one to read in order to find out which one to read.

Let us say that an inductive method C recommends an inductive method C' if the C-mean estimate of the accuracy of C' is not exceeded by the C-mean estimate of the accuracy of any rival method. An inductive method might or might not recommend itself. If it does, let us call it immodest. When asked which method has the best estimated accuracy, the immodest method answers: "I have."

We may restate these definitions more precisely, making explicit reference to the class of competing inductive methods, to the way in which accuracy is measured, and to the total evidence available for use in estimating accuracies of methods.

(D2) Method C recommends method C' in the class M of methods, under the accuracy-measure A, on the evidence e, iff $E_C(A(C') \mid e) \geq E_C(A(C'') \mid e)$ for any method C'' in the class M.

(D3) Method C is immodest in M, under A, on e, iff $E_C(A(C) \mid e) \geq E_C(A(C') \mid e)$ for any method C' in M.

Notice that it depends on the evidence which methods are immodest. It may happen according to (D3), and as we shall see it does happen under the assumptions we are about to make, that a method which is immodest on evidence $e_1$ is not immodest on different evidence $e_2$.

Does the immodesty of an inductive method give you any good reason to trust it? Certainly not every immodest inductive method deserves your trust. Consider Barker's gypsy ([1], p. 17) who, when asked whether her method of crystal-gazing is reliable, answers:

Oh, yes indeed, you may be sure that my crystal gazing yields reliable answers to all your questions.
I know that it does, for I conducted an empirical inquiry into the matter; seeking an answer to the question whether my crystal gazing is a reliable way of answering questions, I looked into my crystal ball, and the answer that I saw there was "Yes."

Immodesty is too easy to come by. But reverse the question: would non-immodesty give you any good reason not to trust an inductive method? Indeed it would. Suppose you did trust some non-immodest method. By definition, it estimates some competing method to be more accurate than itself. So if you really did trust your original method, you should take its advice and transfer your trust to one of the competing methods it recommends. It is as if Consumer Bulletin were to advise you that Consumer Reports was a best buy whereas Consumer Bulletin itself was not acceptable; you could not possibly trust Consumer Bulletin completely thereafter.

The answer to our first question, whether immodesty is a good reason to trust an inductive method, ought to be: it depends on the competition. Any immodest method deserves your trust more than any non-immodest method, but the immodest methods must compete among themselves on other grounds. Immodesty is a necessary but not sufficient condition of adequacy for inductive methods.

The requirement of immodesty will not help you much in choosing an inductive method unless few of the otherwise adequate methods are immodest. We might expect all methods to be immodest; in that case, it will get you nowhere to require immodesty as a condition of adequacy. How many methods are immodest? We cannot answer this question yet; we must first specify the class M of methods you wish to choose from, the accuracy-measure A you have adopted, and your total available evidence e.
Assume you intend to apply your chosen inductive method only to a limited range of inductive tasks. Your interests are confined to propositions about a certain universe of N things, and about a certain family of k kinds (mutually exclusive and jointly exhaustive) to which those things may belong. You will be content with an inductive method defined on any pair of such propositions. Assume further that you have already adopted several conditions of adequacy which rule out all but the $\lambda$-methods: the inductive methods discussed by Carnap in [2]. Your remaining problem is to pick one of the $\lambda$-methods.

The $\lambda$-methods are of interest partly because they comprise all and only the methods that satisfy certain rather plausible conditions of adequacy ([2], sections 4, 8), but even more because they are simple and well understood. The quickest motivation for them is as follows. First, let us stipulate that any adequate method C must conform to the basic axioms of conditional probability. Second, if e is a complete description of a sample and h is the proposition that a certain thing not in the sample belongs to a certain kind i, what should $C(h \mid e)$ be? It should be close to the relative frequency of kind i in the sample if the sample is large, but close to 1/k if the sample is small. Therefore, let us stipulate that $C(h \mid e)$ is the relative frequency of kind i in an augmented sample consisting of the actual sample plus a fictitious sample of $\lambda$ things, $\lambda/k$ of each kind, where $\lambda$ is any positive real, or 0 or $\infty$. This has the desired result since when the actual sample is large compared to $\lambda$ the actual part of the augmented sample predominates, whereas when the actual sample is small compared to $\lambda$ the fictitious part of the augmented sample predominates. Carnap shows ([2], sections 5, 10) that these two stipulations suffice to determine a unique inductive method $C_\lambda$; the parameter $\lambda$ measures the method's caution in learning from experience.
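The augmented-sample stipulation can be written out directly. In the sketch below, the function name `lambda_method` and the tuple-of-counts representation of the sample are assumptions made for illustration; the arithmetic is just the relative frequency of kind i in the actual sample plus a fictitious sample of $\lambda$ things, $\lambda/k$ of each kind.

```python
import math
from typing import Sequence


def lambda_method(counts: Sequence[int], kind: int, lam: float) -> float:
    """Degree of belief that a thing outside the sample is of the given kind.

    counts[i] is the number of sampled things of kind i; lam is the
    caution parameter (0, a positive real, or math.inf).
    """
    k = len(counts)
    s = sum(counts)
    if math.isinf(lam):
        # The fictitious sample swamps any actual sample.
        return 1.0 / k
    if s + lam == 0:
        raise ValueError("undefined for an empty sample when lam is 0")
    return (counts[kind] + lam / k) / (s + lam)


# A hypothetical sample: 3 things of kind 0 and 1 thing of kind 1.
sample = (3, 1)
for lam in (0.0, 2.0, math.inf):
    print(lam, lambda_method(sample, 0, lam))
```

With $\lambda = 0$ the result is the bare sample frequency; as $\lambda$ grows, it moves toward 1/k.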
Suppose next that you have adopted one of the accuracy-measures employed by Carnap in [2], sections 21-24: an accuracy-measure based on mean square errors of estimates of relative frequencies. Choose arbitrarily some small number t; we shall see that the value of this parameter does not matter. Let i range over the k kinds of things, and let $r_i$ be the magnitude whose value $r_i(w)$ at any possible world w is the relative frequency in the universe, at world w, of things of kind i. Let $T_t^w$ be the set of all pairs $\langle i, j \rangle$ where i is any one of the kinds and j is any proposition true in w that completely describes a sample containing t things. We regard each such pair $\langle i, j \rangle$ in $T_t^w$ as representing an estimating task: the task of estimating the relative frequency $r_i$ of kind i on evidence j. Note that the set $T_t^w$ is finite.

The error of inductive method C on the task represented by $\langle i, j \rangle$ at the world w is the difference between the C-mean estimate $E_C(r_i \mid j)$ of $r_i$ on evidence j and the true value $r_i(w)$ of $r_i$ at w. Carnap suggests that we measure the inaccuracy of C at the world w by the mean square error of C at w on all such tasks: we take the mean over all pairs $\langle i, j \rangle$ in the set $T_t^w$. Carnap shows ([2], section 21) that the mean square error (with parameter t) of a $\lambda$-method $C_\lambda$ at w is given approximately by (1).

(1)  $\dfrac{t - \lambda^2/k + (\lambda^2 - t) \sum_i r_i(w)^2}{k (t + \lambda)^2}$

The approximation consists in taking estimated relative frequencies in the rest of the universe excluding the t things described by j, rather than estimated relative frequencies in the entire universe; thus it is a good approximation when t is sufficiently small compared to N.
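To see how expression (1) behaves, here is a minimal numerical sketch. The function names, the two-kind frequencies (0.75, 0.25), and the grid of candidate $\lambda$ values are all hypothetical choices made for illustration; the formula itself is the approximate mean square error (1).

```python
from typing import Sequence


def mean_square_error(lam: float, t: int, freqs: Sequence[float]) -> float:
    """Approximate mean square error (1) of the lambda-method at a world.

    freqs[i] is the actual relative frequency r_i(w) of kind i; t is the
    size of the samples used in the estimating tasks (t must be positive).
    """
    k = len(freqs)
    sum_sq = sum(r * r for r in freqs)
    return (t - lam ** 2 / k + (lam ** 2 - t) * sum_sq) / (k * (t + lam) ** 2)


def best_on_grid(t: int, freqs: Sequence[float], grid: Sequence[float]) -> float:
    """Return the grid value of lambda with the smallest error (1)."""
    return min(grid, key=lambda lam: mean_square_error(lam, t, freqs))


# Hypothetical world with two kinds, in proportions 3/4 and 1/4.
freqs = (0.75, 0.25)
grid = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0
best_t5 = best_on_grid(5, freqs, grid)
best_t2 = best_on_grid(2, freqs, grid)
print(best_t5, best_t2)
```

In this example the grid search picks the same $\lambda$ for t = 5 and for t = 2, which illustrates the remark that the choice of t does not matter for the comparison.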
Carnap does, in practice, use the approximate mean square error as given by (1); so let us follow his practice, defining the family of accuracy-measures $A_t$ for the $\lambda$-methods as follows.

(D4)  $A_t(C_\lambda)(w) =_{df} -\dfrac{t - \lambda^2/k + (\lambda^2 - t) \sum_i r_i(w)^2}{k (t + \lambda)^2}$

Thus $A_t(C_\lambda)$ is that magnitude whose value at any world w is the approximate negative mean square error of $C_\lambda$ at w on estimating tasks represented by the pairs in $T_t^w$.

Suppose finally that as you set out to pick an inductive method your total available evidence e is a complete description of a certain sample containing s things such that, for each kind i, $s_i$ things in the sample belong to kind i.

Having specified the class of inductive methods you wish to choose from, the accuracy-measure you wish to maximize, and the total evidence at your disposal, we are ready to reconsider the question: how many methods are immodest? The answer is: exactly one.

To show this, we begin by noting that $C_\lambda$ recommends $C_{\lambda'}$ on the evidence e if and only if $\lambda'$ is chosen to maximize the $C_\lambda$-mean estimate of $A_t(C_{\lambda'})$. Since $A_t(C_{\lambda'})$ is linear in $\sum_i r_i^2$ (we thus denote the magnitude whose value at any possible world w is $\sum_i r_i(w)^2$), we find that the $C_\lambda$-mean estimate on e of $A_t(C_{\lambda'})$ is given by (2).

(2)  $-\dfrac{t - \lambda'^2/k + (\lambda'^2 - t)\, E_{C_\lambda}(\sum_i r_i^2 \mid e)}{k (t + \lambda')^2}$

Setting the derivative of (2) with respect to $\lambda'$ equal to 0 and solving, we obtain (3). We can easily verify that (3) gives a maximum value of (2).

(3)  $\lambda' = \dfrac{1 - E_{C_\lambda}(\sum_i r_i^2 \mid e)}{E_{C_\lambda}(\sum_i r_i^2 \mid e) - 1/k}$

Observe that the parameter t has now vanished. That is why we were free to choose t arbitrarily. Equation (3) gives a necessary and sufficient condition for $C_\lambda$ to recommend $C_{\lambda'}$ under any one of the accuracy-measures $A_t$.
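The step from (2) to (3) can be spelled out as follows. This is a sketch of one way to carry out the differentiation, not necessarily the route the author took; the abbreviation E for the estimate $E_{C_\lambda}(\sum_i r_i^2 \mid e)$ and the name f for expression (2) are introduced here only for brevity, and t is assumed positive.

```latex
% Abbreviations (not in the original): E = E_{C_\lambda}(\sum_i r_i^2 \mid e),
% and f(\lambda') stands for expression (2).
\begin{align*}
f(\lambda') &= -\frac{t(1 - E) + \lambda'^2\,(E - 1/k)}{k\,(t + \lambda')^2}
  && \text{(2), numerator regrouped} \\
\frac{df}{d\lambda'}
  &= -\frac{2\lambda'(E - 1/k)(t + \lambda')
       - 2\left[t(1 - E) + \lambda'^2 (E - 1/k)\right]}{k\,(t + \lambda')^3} \\
  &= -\frac{2t\left[\lambda'(E - 1/k) - (1 - E)\right]}{k\,(t + \lambda')^3} \\
\frac{df}{d\lambda'} = 0
  &\iff \lambda' = \frac{1 - E}{E - 1/k}
  && \text{(3)}
\end{align*}
```

When E exceeds 1/k, the derivative is positive below this value of $\lambda'$ and negative above it, so the stationary point is a maximum of (2). The factor t multiplies the whole bracket, which is why it drops out of (3).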
Next, let h range over all possible statistical distributions: that is, propositions giving the relative frequencies in the universe of each of the k kinds. Since the statistical distributions are mutually exclusive and jointly exhaustive propositions, and since every statistical distribution h implies that each $r_i$ has a definite value $r_{ih}$, we can easily obtain (4).

(4)  $\sum_i \sum_h r_{ih}^2 \, C_\lambda(h \mid e) = E_{C_\lambda}(\sum_i r_i^2 \mid e)$

Using certain properties of the $\lambda$-methods ([2], section 11) we obtain (5).

(5)  $\sum_h r_{ih} \, C_\lambda(h \mid e) = \left(\dfrac{s_i + \lambda/k}{s + \lambda}\right)\left(\dfrac{N - s}{N}\right) + \dfrac{s_i}{N}$

Next, let $C_\Delta$ be the $\lambda$-method which would be most accurate, under any $A_t$, if the relative frequencies of the kinds in the universe were the same as the relative frequencies of the kinds in the sample described by the evidence e; that is, at a possible world w where, for each kind i, $r_i(w) = s_i/s$. The results of [2], section 22, yield (6).

(6)  $\sum_i s_i^2 = \dfrac{\Delta s^2 + s^2 k}{\Delta k + k}$

Next, let d be the proposition that a certain two things (an arbitrarily chosen two not in the sample described by e) both belong to the same kind. This proposition is of no relevance to our topic in itself; but it happens that by considering it we can obtain some useful equations. Since the statistical distributions are mutually exclusive and jointly exhaustive, we obtain (7).

(7)  $C_\lambda(d \mid e) = \sum_h C_\lambda(dh \mid e) = \sum_h C_\lambda(d \mid he)\, C_\lambda(h \mid e)$

Again using properties of the $\lambda$-methods ([2], section 11) we obtain (8).

(8)  $C_\lambda(d \mid e) = \sum_i \left(\dfrac{s_i + \lambda/k}{s + \lambda}\right)\left(\dfrac{s_i + 1 + \lambda/k}{s + 1 + \lambda}\right)$

We obtain equation (9) by considering corresponding terms on the left and right. Whenever h is a statistical distribution inconsistent with e, $C_\lambda(h \mid e) = 0$ and the term vanishes on both sides; but whenever h is a statistical distribution consistent with e, it can easily be shown that the left-hand factors of the terms on the left and right are equal.
(9)  $\sum_h C_\lambda(d \mid he)\, C_\lambda(h \mid e) = \sum_h \left[ \sum_i \left(\dfrac{N r_{ih} - s_i}{N - s}\right)\left(\dfrac{N r_{ih} - s_i - 1}{N - s - 1}\right) \right] C_\lambda(h \mid e)$

Given the system of equations (3)-(9), it is now merely a matter of laborious algebra to solve for $\lambda'$ in terms of $\lambda$, N, s, and $\Delta$. The plan is as follows. First substitute the right hand side of (8) for the left hand side of (7) and the right hand side of (9) for the right hand side of (7). After simplifying the resulting equation with the aid of (4), (5), and (6), it becomes possible to solve for $E_{C_\lambda}(\sum_i r_i^2 \mid e)$ in terms of $\lambda$, N, s, $\Delta$, and k. Substituting this solution into (3) and simplifying further, we eventually obtain (10) as a necessary and sufficient condition for $C_\lambda$ to recommend $C_{\lambda'}$.

(10)  $\lambda' = \dfrac{C_1 \lambda^2 + C_2 \lambda + C_3}{C_4 \lambda^2 + C_5 \lambda + C_6}$

where

$C_1 = N^2 + \Delta N^2 - N - \Delta N + s + \Delta s - s^2$
$C_2 = 2\Delta N^2 s + 2 N^2 s - 2 N s^2 + \Delta s^2$
$C_3 = \Delta N^2 s^2 + \Delta N s^2$
$C_4 = N + \Delta N - s - \Delta s + s^2$
$C_5 = N^2 + \Delta N^2 + 2 N s^2 - \Delta s^2$
$C_6 = N^2 s^2 + N^2 s + \Delta N^2 s - \Delta N s^2$.

Notice that the parameter k has vanished and that the only relevant properties of the evidence e turn out to be those given by the two numbers s and $\Delta$. The behavior of the recommendation relation between $\lambda'$ and $\lambda$, specified by (10), is illustrated in Figure 1.

Setting $\lambda' = \lambda$ in (10), we obtain the cubic equation (11) as a necessary and sufficient condition for $C_\lambda$ to be immodest.

(11)  $0 = C_4 \lambda^3 + (C_5 - C_1)\lambda^2 + (C_6 - C_2)\lambda - C_3$

If the sample described by e is empty, s = 0; if the sample is uniform, containing things of only one kind, $\Delta = 0$. In either case, $\lambda = 0$ is the only non-negative real solution of (11). If s and $\Delta$ are both positive, on the other hand, $C_4$ and $C_3$ are positive; hence (11) has a positive solution, and $\lambda = 0$ is not a solution. If s and $\Delta$ are positive, moreover, $C_5 - C_1$ cannot be negative unless $C_6 - C_2$ is also negative, so by Descartes' rule of signs (11) can have at most one positive solution.
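The recommendation relation (10) and the immodesty condition (11) lend themselves to a short numerical sketch. The function names and the bisection search are choices made here for illustration, not the procedure used to prepare the paper's figures, and the coefficients are written out as I read them from (10), with `delta` standing for $\Delta$.

```python
from typing import Tuple


def coefficients(N: float, s: float, delta: float) -> Tuple[float, ...]:
    """Coefficients C1..C6 of (10), as read from the text (an assumption)."""
    c1 = N**2 + delta * N**2 - N - delta * N + s + delta * s - s**2
    c2 = 2 * delta * N**2 * s + 2 * N**2 * s - 2 * N * s**2 + delta * s**2
    c3 = delta * N**2 * s**2 + delta * N * s**2
    c4 = N + delta * N - s - delta * s + s**2
    c5 = N**2 + delta * N**2 + 2 * N * s**2 - delta * s**2
    c6 = N**2 * s**2 + N**2 * s + delta * N**2 * s - delta * N * s**2
    return c1, c2, c3, c4, c5, c6


def recommended_lambda(lam: float, N: float, s: float, delta: float) -> float:
    """The lambda' that the lambda-method recommends, by (10).

    Not defined when s = 0 and lam = 0 (the denominator vanishes).
    """
    c1, c2, c3, c4, c5, c6 = coefficients(N, s, delta)
    return (c1 * lam**2 + c2 * lam + c3) / (c4 * lam**2 + c5 * lam + c6)


def immodest_lambda(N: float, s: float, delta: float) -> float:
    """Non-negative root of the cubic (11), located by bisection."""
    c1, c2, c3, c4, c5, c6 = coefficients(N, s, delta)
    if c3 == 0:
        # Empty sample (s = 0) or uniform sample (delta = 0).
        return 0.0

    def cubic(lam: float) -> float:
        return c4 * lam**3 + (c5 - c1) * lam**2 + (c6 - c2) * lam - c3

    lo, hi = 0.0, 1.0
    while cubic(hi) <= 0:  # cubic(0) = -c3 < 0; expand until the sign flips
        lo, hi = hi, 2 * hi
    for _ in range(200):
        mid = (lo + hi) / 2
        if cubic(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical case: universe of 128 things, sample of 8, delta = 10.
fixed = immodest_lambda(128, 8, 10.0)
print(fixed, recommended_lambda(fixed, 128, 8, 10.0))
```

One sanity check on this reading of the coefficients: when s = N the numerator of (10) is $\Delta$ times the denominator, so every $\lambda$-method recommends $C_\Delta$, as one would expect when the sample is the whole universe.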
Thus in every case (11) has a unique non-negative solution. The behavior of such solutions of (11) is illustrated in Figure 2. This completes the proof of the result stated earlier; exactly one $\lambda$-method is immodest. More precisely:

Theorem: Let M be the class of $\lambda$-methods; let A be one of Carnap's mean square error accuracy-measures $A_t$; and let e be a complete description of a sample. Then exactly one $\lambda$-method is immodest in M, under A, on e. The immodest $\lambda$-method is $C_0$ iff the sample described by e is either empty or uniform.

[Figure 1: the $\lambda'$-$\lambda$ relation given by (10), plotted on logarithmic scales, in panels for N = 16, N = 128, and N = 1024 at several sample sizes s.]

This theorem may seem welcome. Our new condition of adequacy, immodesty, and the conditions of adequacy that restrict you to the $\lambda$-methods are enough to solve completely your problem of choosing an inductive method. You need only choose the one remaining adequate method. The conditions of adequacy thereby determine the degrees to which you should believe all propositions of interest to you, given total evidence consisting of a complete description of a sample.

But you should not accept this seeming solution to your inductive problem. In view of the second part of the theorem, it is not satisfactory.
In some cases the immodest $\lambda$-method is $C_0$; and $C_0$, as Carnap argues ([2], section 14), is an extremely unreasonable method. It calls for jumping to conclusions, with absolute certainty, on little or no evidence.

[Figure 2: the $\lambda$-$\Delta$ relation given by (11), plotted on logarithmic scales, in panels for N = 16, N = 128, and N = 1024 at several sample sizes s.]

Suppose, for instance, that your total evidence e is the proposition that a certain thing belongs to a certain kind i. Thus e describes a uniform sample of size 1. Then $\Delta = 0$, so the immodest $\lambda$-method on e is $C_0$. Let h be the proposition that a certain other thing is also of kind i; $C_0(h \mid e) = 1$. In terms of our original motivation for the $\lambda$-methods, the fictitious part of the augmented sample vanishes, so we take the relative frequency of kind i in the actual sample. The small size of the sample makes no difference. Knowing only that one thing is of kind i, you are supposed to believe to degree 1 (to be absolutely certain) that the other thing is too. Surely that would be unreasonable.

Or suppose you have no evidence; e is merely the necessary proposition. Then s = 0, so the immodest $\lambda$-method on e is again $C_0$. Let d be the proposition that a certain two things are both of the same kind; $C_0(d \mid e) = 1$. You are supposed to be certain a priori of the contingent proposition d. That too would be unreasonable.
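Both examples can be checked numerically. The sketch below assumes a family of k = 2 kinds and uses hypothetical function names; the first function is the augmented-sample rule for a single further thing, and the second is expression (8) for the proposition d. For the empty sample, (8) is undefined at exactly $\lambda = 0$, so the sketch approaches $C_0$ with a very small positive $\lambda$ instead.

```python
from typing import Sequence


def next_is_kind(counts: Sequence[int], kind: int, lam: float) -> float:
    """Degree of belief that one further thing is of the given kind."""
    k = len(counts)
    s = sum(counts)
    return (counts[kind] + lam / k) / (s + lam)


def two_same_kind(counts: Sequence[int], lam: float) -> float:
    """Degree of belief, by (8), that two further things share a kind."""
    k = len(counts)
    s = sum(counts)
    return sum(
        (s_i + lam / k) / (s + lam) * (s_i + 1 + lam / k) / (s + 1 + lam)
        for s_i in counts
    )


# Uniform sample of size 1 (assuming k = 2 kinds), with lambda = 0.
uniform_one = next_is_kind((1, 0), 0, 0.0)

# Empty sample: lambda very close to 0, then a more cautious lambda = 2.
empty_small = two_same_kind((0, 0), 1e-9)
empty_cautious = two_same_kind((0, 0), 2.0)
print(uniform_one, empty_small, empty_cautious)
```

The contrast between the last two values shows how much of the certainty comes from the choice of $\lambda$ rather than from any evidence.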
If $C_0$ is inadequate, and if only $\lambda$-methods are adequate, and if only immodest methods are adequate, and if your total evidence happens to be a description of an empty or uniform sample, then you will be left with no adequate inductive method. You will have no reasonable way to assign degrees of belief to propositions on the basis of your evidence.

What can we do about this conclusion? We might accept it; but it amounts to a severe inductive scepticism. Ordinary scepticism is content to claim that there is no good reason to adopt any particular inductive method, but this scepticism is worse: it claims that there are good reasons not to adopt any given inductive method. So much the worse for any philosophical argument that leads to such a conclusion!

It will not help to use an exact expression for mean square error in place of the approximation (1). I used that approximation for simplicity and to stay close to [2]. But an exact expression is known (R. Carnap, personal communication); and when that is used in place of (1) in defining accuracy, it turns out to lead to the same unwelcome conclusion: $C_0$ is uniquely immodest on evidence consisting of an empty or uniform sample.

We can hardly overcome our objections to choosing $C_0$. If trusting $C_0$ in the cases I described would not be a clear case of inductive unreason, what would be?

I do not think we should escape by rejecting immodesty as a condition of adequacy. Consider what that would mean. If you wish to maximize accuracy in choosing a method, and you have knowingly given your trust to any but an immodest method, how can you justify staying with the method you have chosen? If you really trust your method, and you really want to maximize accuracy, you should take your method's advice and maximize accuracy by switching to some other method that your original method recommends.
If that method also is not immodest, and you trust it, and you still want to maximize accuracy, you should switch again; and so on, unless you happen to hit upon an immodest method. Immodesty is a condition of adequacy because it is a necessary condition for stable trust.

We might escape by looking beyond the $\lambda$-methods, hoping that in some larger class of inductive methods we will always find an immodest method better than $C_0$. Carnap gives conditions of adequacy that rule out all but the $\lambda$-methods; but, as he recognizes, some of these conditions are only prima facie plausible. Moreover, there are certain well-known objections to the $\lambda$-methods, independently of the problem of the unique immodesty of $C_0$.

Alternatively, we might escape by rejecting Carnap's mean square error accuracy-measures; I prefer this way out. The reasons for demanding immodesty under whatever accuracy-measure you want to maximize seem to me strong, but it is not at all obvious that you should want to maximize accuracy as measured by mean square error of estimates of relative frequencies of kinds. These measures are suggested by well-established practices in statistics, for instance least-squares curve-fitting. We have studied them because Carnap used them in [2], but Carnap did not argue for them there. If rejecting them is an easy way out of the problem of the unique immodesty of $C_0$, that seems a rather good reason for rejecting them.

One plausible change in the accuracy-measure comes to mind at once. Perhaps in taking the mean square error of estimates of relative frequencies of kinds on the basis of samples of size t, we should take the mean not over all such samples but only over those which include the sample described by our total evidence e. (This would mean choosing $t \geq s$.)
Why care about error in cases we already know cannot arise? This change might be appropriate on other grounds, but it will not solve our difficulty: $C_0$ is still uniquely immodest when s = 0.

To summarize: I have argued that immodesty (in the class of otherwise adequate methods, under an appropriate accuracy-measure, on the total evidence) is a necessary condition of inductive adequacy. Whether it is a condition that will help much in choosing a method depends on how selective it is. When it is applied to the $\lambda$-methods, using Carnap's accuracy-measures, it is extremely selective. But it is too selective, since sometimes there is no adequate method left. I take this not as an objection to the condition of immodesty, but rather as a reason to expand the class of eligible inductive methods, to find a different accuracy-measure, or both. Having done one or both, we will face a new version of the question: how many, and which, inductive methods are immodest?

REFERENCES

[1] Barker, S. F., Induction and Hypothesis, Cornell University Press, Ithaca, New York, 1957.
[2] Carnap, R., The Continuum of Inductive Methods, University of Chicago Press, Chicago, 1952.

Philosophy of Science, Vol. 38, No. 1 (Mar., 1971), pp. 54-63.