Theory Change and Bayesian Statistical Inference

Jan-Willem Romeijn†

Philosophy of Science, 72 (December 2005): 1174–1186. https://doi.org/10.1086/508963

This paper addresses the problem that Bayesian statistical inference cannot accommodate theory change, and proposes a framework for dealing with such changes. It first presents a scheme for generating predictions from observations by means of hypotheses.
An example shows how the hypotheses represent the theoretical structure underlying the scheme. This is followed by an example of a change of hypotheses. The paper then presents a general framework for hypotheses change, and proposes the minimization of the distance between hypotheses as a rationality criterion. Finally, the paper discusses the import of this for Bayesian statistical inference.

1. Introduction. This paper is concerned with Bayesian statistical inferences. These inferences are here considered in a scheme that generates predictions by means of hypotheses: Bayesian updating is used to adapt a probability over hypotheses to known observations, and this adapted probability is then used to generate predictions over unknown observations. The hypotheses in the scheme represent the theoretical structure that underlies the predictions. However, once we have chosen these hypotheses and a prior probability over them, updating fully determines the probabilities over the hypotheses at any later stage, and thus also the predictions that result from them. There is no room for any further amendments to the hypotheses, or to the prior over them, after they have been chosen. In Bayesian statistical inference, the theoretical structure is therefore fixed.

The fixity of the theoretical structure in such schemes is a specific form of a problem for Bayesianism as a whole. In the philosophy of science it has been formulated, among others by Earman (1992, 195–198), as the problem that Bayesianism fails to accommodate theory change. The fact that Bayesian inference is in this sense dogmatic is also at the origin of many other criticisms, including the criticism of Dawid (1982) that Bayesian inference is by definition calibrated. Furthermore, as hypotheses can be considered as specific terms in the observation language, changing the hypotheses in the scheme comes down to changing the language with which the predictions are made. The same problem can therefore be seen in light of the fact that Bayesianism fails to accommodate language change, as noted by Gillies (2001) and discussed at length by Williamson (2003).

This paper addresses the above problems with Bayesianism. More particularly, it proposes a way of dealing with theory change within Bayesian statistical inference. The plan of the paper is to introduce the Bayesian scheme for generating predictions from hypotheses, to present an example of such a scheme, then to show in the example how hypotheses can be changed, and finally to give a general framework for such changes.

†To contact the author, please write to: Department of Philosophy, University of Groningen, UVA/FMG/Psychology, Methodology Unit, Roeterstraat 15, 1018 WB, Amsterdam, Netherlands; e-mail: j.w.romeyn@uva.nl.

2. Hypotheses, Conditioning, and Predictions. This section describes the scheme for making predictions. It defines observations and observational hypotheses in terms of an observational algebra, and it presents degrees of belief as probability assignments over this algebra. This set-theoretical underpinning may seem unnecessary in the context of a short paper. However, as will become apparent in Sections 5 and 6, the underpinning is essential for a correct understanding of hypotheses change.

The predictions range over possible observations $K$, a set of consecutive natural numbers, say $\{0, 1\}$. At every time $t$ we observe one number $q_t \in K$.
We can represent these observations in an observational algebra. Let $K^{\omega}$ be the space of all infinite observation sequences $e$:

$$e = q_1 q_2 q_3 \ldots \qquad (1)$$

The observational algebra $\mathcal{Q}$, a so-called cylindrical $\sigma$-algebra, consists of all possible subsets of the space $K^{\omega}$. If we denote the $t$th element in a series $e$ with $e(t)$, we can define an observation $Q_t^q$ as an element of the algebra $\mathcal{Q}$ as follows:

$$Q_t^q = \{e \in K^{\omega} \mid e(t) = q\}. \qquad (2)$$

Note that there is a distinction between the observations $Q_t^q$ and the values of observations $q$. The values, represented with small letters, are natural numbers. The observations, denoted with capital letters, are elements of the algebra $\mathcal{Q}$.

In the same way we can define an element in the algebra that refers to a finite sequence of observations. If we define the ordered sequence $e_t = \langle q_1 q_2 \ldots q_t \rangle$, we can write

$$E_{e_t} = \{e \in K^{\omega} \mid \forall t' \leq t : e(t') = q_{t'}\}. \qquad (3)$$

Again, it must be noted that the small letters $e_t$ refer to a sequence of natural numbers, while the capital letters $E_t$ are elements of the algebra and carry a sequence of natural numbers as argument. The observations and sequences of observations are related to each other in the natural way:

$$Q_{t+1}^q \cap E_t = E_{t+1}. \qquad (4)$$

As in this equation, I normally refer to sequences of observations with the expression $E_t$, thereby suppressing the reference to the sequence $e_t$.

Observational hypotheses can also be seen as elements of the observational algebra. If we say of an observational hypothesis $h$ that its truth can be determined as a function of an infinitely long sequence of observations $e$, then we can define hypotheses as subsets of $K^{\omega}$ in the following way:

$$H_h = \{e \in K^{\omega} \mid W_h(e) = 1\}. \qquad (5)$$

Here $W_h(e) = 1$ if and only if the proposition $h$ is true of $e$, and $W_h(e) = 0$ otherwise. The hypotheses can thus be an argument of the same probability functions over the observational algebra. A partition of hypotheses is a collection $H = \{H_0, H_1, \ldots, H_N\}$ defined by the following condition for the indicator functions $W_{h_n}$:

$$\forall e \in K^{\omega} : \sum_n W_{h_n}(e) = 1. \qquad (6)$$

This means that the hypotheses $H_n$ are mutually exclusive and jointly exhaustive sets in $K^{\omega}$.

Belief states are represented with probability functions over $\mathcal{Q}$. They take observations $Q_t^q$, sequences $E_t$, and hypotheses $H_n$ as arguments. The functions are defined relative to a partition $H$ and a sequence of known observations $e_t$: the function $p_{[H,e_t]}$ represents the belief state upon observing $E_t$ under the assumption of a partition $H$. It can be constructed by conditioning a prior probability function $p_{[H,e_0]}$ on the observations $E_t$:

$$p_{[H,e_t]}(\cdot) = p_{[H,e_0]}(\cdot \mid E_t). \qquad (7)$$

Because of this, we have $p_{[H,e_t]}(E_t) = 1$. Updating the probability by simple conditioning is known as Bayes' rule. Both the probabilities assigned to observations and those assigned to hypotheses can be updated for new observations in this way. The probability before updating is called the prior probability, and the one after updating the posterior.

To calculate the predictions, we can employ a partition of hypotheses, and apply the law of total probability:

$$p_{[H,e_t]}(Q_{t+1}^q) = \sum_n p_{[H,e_t]}(H_n) \, p_{[H,e_t]}(Q_{t+1}^q \mid H_n). \qquad (8)$$

The terms $p_{[H,e_t]}(Q_{t+1}^q \mid H_n)$ are called the posterior likelihoods of $Q_{t+1}^q$ on the hypotheses $H_n$.
The prediction is obtained by weighting these posterior likelihoods with the posterior probability over the hypotheses, $p_{[H,e_t]}(H_n)$. Both posterior probabilities of equation (8) can be obtained from a Bayesian update of the prior probability $p_{[H,e_0]}$ according to expression (7). In this paper the likelihoods do not change upon conditioning:

$$p_{[H,e_t]}(Q_{t+1}^q \mid H_n) = p_{[H,e_0]}(Q_{t+1}^q \mid H_n). \qquad (9)$$

That is, the observations influence the predictions only via the probability over the hypotheses. Part of the input probabilities for generating the predictions $p_{[H,e_t]}(Q_{t+1}^q)$ are therefore the likelihoods $p_{[H,e_0]}(Q_{t+1}^q \mid H_n)$.

The predictions are further determined by the probability assignment over the hypotheses, $p_{[H,e_t]}(H_n)$. This probability can be determined by means of the relation

$$p_{[H,e_i]}(H_n) = p_{[H,e_{i-1}]}(H_n) \, \frac{p_{[H,e_{i-1}]}(Q_i^q \mid H_n)}{p_{[H,e_{i-1}]}(Q_i^q)}, \qquad (10)$$

where $q$ equals the last number in the sequence $e_i$. Note that the denominator $p_{[H,e_{i-1}]}(Q_i^q)$ can be rewritten with equation (8), substituting $t = i - 1$. Recall further that the likelihoods $p_{[H,e_{i-1}]}(Q_i^q \mid H_n)$ are in this paper equal for all sequences $e_{i-1}$, as expressed in equation (9). The posterior probability $p_{[H,e_t]}(H_n)$ can therefore be determined recursively by the prior probability $p_{[H,e_0]}(H_n)$ for all $n$, and the likelihoods $p_{[H,e_0]}(Q_i^q \mid H_n)$ for all $n$ and $i \leq t$. These are the other input probabilities for generating the predictions.

In sum, predictions can be generated if we assume hypotheses, their likelihoods, and a prior probability over them. The prior and the likelihoods are first used to determine the posterior probability over the partition. The likelihoods are then used together with this probability over the partition to generate the prediction itself. The whole construction that uses hypotheses to generate predictions is called the hypotheses scheme.
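To make the recursion concrete, the following minimal sketch implements equations (8) and (10) for a finite partition with constant likelihoods. It is my own illustration, not part of the paper; all names are chosen for exposition.

```python
# Minimal sketch of the hypotheses scheme (equations (8) and (10)).
# A partition is a list of probabilities; likelihoods is a list of
# dictionaries mapping an observation value q to its probability.

def update(prior, likelihoods, q):
    """Bayes' rule, equation (10): adapt the probability over the
    partition after observing the value q."""
    joint = [p * lk[q] for p, lk in zip(prior, likelihoods)]
    total = sum(joint)  # the denominator, by equation (8)
    return [j / total for j in joint]

def predict(prior, likelihoods, q):
    """Law of total probability, equation (8): the probability that
    the next observation has value q."""
    return sum(p * lk[q] for p, lk in zip(prior, likelihoods))
```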
3. Contaminated Cows. This section gives an example of a hypotheses scheme. Needless to say, the case presented falls short of actual scientific cases in many respects. The focus here is on the conceptual issues rather than on actual applications.

Consider a veterinarian investigating a herd of cows during an epidemic, classifying them into contaminated and uncontaminated. The farmer claims that the herd has been treated with a drug that reduces the risk of contamination. It is an accepted fact about the epidemic that the average incidence rate among untreated cows is 0.4, as more than half of the cows show a natural resistance against contamination from other cows. The incidence rate among treated cows is 0.2 on average, because the drug is not always effective. The aim of the investigation is to decide whether the cows have been treated with the drug, and further to predict the incidence rate of the contamination in the herd.

The observations of the veterinarian consist of test results concerning a number of cows. The result of testing cow $t$ can be that it is contaminated, $q_t = 1$, or that it is not, $q_t = 0$. The test results can then be framed in the observational algebra. The veterinarian may set up a scheme using a partition $D$ of two hypotheses on treatment with the drug, in which $D_1$ means that the cows are in fact treated and $D_0$ means that they are not. It must be noted that these hypotheses are not linked to observations directly, since the observations only concern contaminations of cows. The relation that treatment bears to the observations is given by the incidence rates, and this relation is purely statistical. For the observational content of the hypothesis on treatment $D_1$ we may take

$$W_{d_1}(e) = \begin{cases} 1 & \text{if } f(e) = 0.2, \\ 0 & \text{otherwise,} \end{cases} \qquad (11)$$

where $f(e)$ is the relative frequency of results $q_t = 1$ in the infinite sequence $e$. The hypothesis $D_0$ may be defined in a similar way using $f(e) = 0.4$. A more precise definition is that the hypotheses comprise all so-called von Mises collectives for the given incidence rates, but for present purposes the loose definition suffices.

Being sets in the observational algebra, the hypotheses can also appear as arguments in the probability functions $p_{[D,e_t]}$. The fact that the veterinarian is undecided on whether the farmer has treated his cows can be reflected in

$$p_{[D,e_0]}(D_0) = p_{[D,e_0]}(D_1) = 0.5. \qquad (12)$$

Hypotheses on other relative frequencies, which are strictly speaking part of the partition, are thus given a zero probability. The likelihoods of cow $t$ being contaminated on the hypotheses that it has or has not been treated are

$$p_{[D,e_0]}(Q_t^1 \mid D_1) = 0.2, \qquad (13)$$

$$p_{[D,e_0]}(Q_t^1 \mid D_0) = 0.4. \qquad (14)$$

These likelihoods are determined by the hypotheses. I further assume that the estimated incidence rates are not affected by the running investigations, so that equation (9) holds.

With these values in place, the veterinarian can start to predict the incidence rate in the herd, and decide on the farmer's treatment efforts. Imagine that the first five test results are positive,

$$e_5 = 11111. \qquad (15)$$

Subsequent updating on these test results yields the probabilities and predictions shown in Table 1.

TABLE 1.
Number of tests $t$:         0    1    2    3    4    5
$p_{[D,e_t]}(D_1)$:        .50  .33  .20  .11  .06  .03
$p_{[D,e_t]}(Q_{t+1}^1)$:  .30  .33  .36  .38  .39  .39

The probability that the farmer has treated his cows diminishes, and the probability that the next test result is positive tends to 0.4.

The conclusions expressed in the above values are that the farmer very probably did not treat his cows, and that a random cow from the herd has a probability close to 0.4 of being contaminated. It must be stressed, however, that these conclusions follow from the test results only if they are combined with the hypotheses scheme $D$. The scheme offers two possible hypotheses, and the observations are used to divide the probability between them. It is only relative to the partition $D$ that most probability settles on $D_0$ after $e_5$, so that the predictions are equal to the likelihoods that $D_0$ prescribes for the test results.
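For illustration, the numbers in Table 1 can be reproduced with the sketch from Section 2, reusing the update and predict functions defined there. This snippet is again my own; it is not part of the paper.

```python
# Reproduce Table 1: partition D with prior (12) and likelihoods (13)-(14).
prior = [0.5, 0.5]                  # p(D1), p(D0), equation (12)
liks  = [{1: 0.2, 0: 0.8},          # D1: treated, equation (13)
         {1: 0.4, 0: 0.6}]          # D0: untreated, equation (14)

for t in range(6):
    print(t, round(prior[0], 2), round(predict(prior, liks, 1), 2))
    prior = update(prior, liks, 1)  # the next test result is positive
```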
This example thus illustrates that the hypotheses in the scheme determine a range of probabilistic patterns, from which the observations may select the best fitting one. The hypotheses partition functions as an assumption on what patterns can be picked up in the observations. The partition may therefore be called an inductive assumption.

Finally, it can be noted that the partition of hypotheses is associated with the theory underlying the scheme. In this case it concerns a classification of the state of the cows into treated and not treated. Both these concepts come with specific observational contents, which define the relevant patterns in the observations. There is no conceptual space within the hypotheses scheme, at least not as it is set up in the above, to conclude anything other than that the cows are treated or not treated. In order to create this conceptual space, we must add hypotheses to the scheme.

4. Careless Vaccination. This section shows how the hypotheses employed in the above scheme can be changed. I describe this change, and illustrate that it allows us to derive different conclusions and predictions.

Imagine that the veterinarian becomes suspicious of the test results. After all, more than half of the cows are normally immune. The sequence of test results must therefore be a rather unusual stochastic fluctuation around the average relative frequency of 0.4. The veterinarian therefore decides to reconsider the inductive assumptions that underlie the scheme, and to run a number of additional tests with an adapted scheme. In particular, she investigates the drug that the farmer claims to have used, and finds that it is a vaccine of rather unstable quality. In most cases it works very well, even reducing the risk of contamination to 0.025, but careless use turns the vaccine into a substance that causes a proportion of 0.9 of the cows to be, or at least to appear, contaminated after treatment. The hypotheses that the veterinarian wants to add to the scheme are that the drug has been used either carefully or carelessly.

The additional hypotheses may be collected in a separate partition $C$, with $C_1$ for careful and $C_0$ for careless treatment. Both hypotheses only apply to the case in which the cows have actually been treated, $D_1$. The combined partition is $B = \{B_0, B_{10}, B_{11}\}$, in which $B_0 = D_0$, $B_{10} = D_1 \cdot C_0$, and $B_{11} = D_1 \cdot C_1$. Hypothesis $B_0$ is again defined with the relative frequency of 0.4, and the new hypotheses $B_{10}$ and $B_{11}$ can be defined with 0.9 and 0.025 respectively. These three relative frequencies define the new partition.

It is notable that the hypotheses $B_{10}$ and $B_{11}$ cannot be viewed as intersections $D_1 \cap C_0$ and $D_1 \cap C_1$: judged from the definition using relative frequencies, the original set $D_1$ and both sets $B_{10}$ and $B_{11}$ are disjoint. The relation between the old and the new hypotheses is a rather different one. We must imagine that within every infinite sequence $e \in D_1$, that is, within every possible world in which all cows are treated, we make a further selection of the observations $q_t$ into those concerning cows that have been vaccinated with care, and those concerning cows that have been vaccinated carelessly. So $B_{10}$ and $B_{11}$ can be distilled from the old hypothesis by breaking up every $e \in D_1$, for which $f(e) = 0.2$, into two subrows $e_0$ and $e_1$ by means of a place selection, taking care that the relative frequencies of the two subrows are 0.9 and 0.025 respectively, and by grouping these subrows into $B_{10}$ and $B_{11}$. Because $0.025 < 0.2 < 0.9$, such place selections can always be constructed.

The likelihoods of the hypotheses may again be equated to the relative frequencies that define the hypotheses:

$$p_{[B,e_0]}(Q_t^1 \mid B_{10}) = 0.9, \qquad (16)$$

$$p_{[B,e_0]}(Q_t^1 \mid B_{11}) = 0.025. \qquad (17)$$

In order to arrive at the overall incidence rate of 0.2 for treated cows, the veterinarian may further assume that a proportion of 0.2 of all farmers do not apply the vaccine with the necessary care, as $0.2 \times 0.9 + (1 - 0.2) \times 0.025 = 0.2$. I come back to this choice in Section 6.
Finally, using the probability assignment after five tests, the combined probability of treatment with the drug and the lack of care is

$$p_{[B,e_5]}(B_{10}) = 0.03 \times 0.2 = 0.006. \qquad (18)$$

It must be noted that with the employment of $B$, the probability over the observational algebra really undergoes an external shock: instead of allocating 0.030 probability to the set $D_1$, we now allocate 0.006 to $B_{10}$ and 0.024 to $B_{11}$.

With these new hypotheses and the associated inductive assumptions, the veterinarian can run a number of additional tests. Let us say that the next ten test results are all positive too,

$$e_{15} = 111111111111111. \qquad (19)$$

Subsequent updating on these test results yields the probabilities and predictions shown in Table 2.

TABLE 2.
Number of tests $t$:         5    7    9   11   13   15
$p_{[B,e_t]}(B_{10})$:     .01  .03  .14  .45  .80  .95
$p_{[B,e_t]}(Q_{t+1}^1)$:  .39  .42  .47  .62  .80  .88

Now the probability of $B_{10}$ approaches 1, while the predicted probability for a cow in the herd to be contaminated tends to 0.9. Clearly these values differ from those that were to be expected on the basis of $D$.

The conclusions expressed in these values are that the farmer did treat his cows with the drug, but that he did not apply it with the necessary care. The further conclusion is that the incidence rate of the epidemic in his herd is 0.9. Again, these conclusions are drawn from the test results in combination with the inductive assumptions of partition $B$. It is only when compared to the other members of the partition $B$ that the hypothesis $B_{10}$, which prescribes an incidence rate of 0.9, fits the test results best. For present purposes, however, it is most notable that these conclusions differ from those derivable from $D$.
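The external shock and the subsequent updating can be traced numerically. The following continuation of the earlier sketch is again my own illustration: it redistributes the posterior of $D_1$ over $B_{10}$ and $B_{11}$ as described above, and then updates on the ten additional positive tests, reproducing the values in Table 2 up to rounding.

```python
# Sketch: the partition change from D to B after five tests, followed by
# updating on ten more positive tests (reuses update and predict above).
prior = [0.97, 0.03 * 0.2, 0.03 * 0.8]  # p(B0), p(B10), p(B11) at t = 5
liks  = [{1: 0.4,   0: 0.6},            # B0:  untreated
         {1: 0.9,   0: 0.1},            # B10: treated carelessly
         {1: 0.025, 0: 0.975}]          # B11: treated with care

for t in range(5, 16):
    if t % 2 == 1:                      # print the columns of Table 2
        print(t, round(prior[1], 2), round(predict(prior, liks, 1), 2))
    prior = update(prior, liks, 1)      # the next test result is positive
```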
5. A Framework for Changing Partitions. The above illustrates how we can change a partition of hypotheses during an update procedure. This section gives a general framework for such changes, and draws attention to the need for new criteria of rationality to guide them.

On the change of partition itself I can be relatively brief. Let us say that the old partition $H = \{H_0, H_1, \ldots, H_N\}$ consists of hypotheses $H_n$ with likelihoods

$$p_{[H,e_t]}(Q_{t+1}^q \mid H_n) = \theta_n^q. \qquad (20)$$

The addition of a partition $F = \{F_0, F_1, \ldots, F_M\}$ to this partition generates a combined partition $G = H \times F$, which consists of $N \times M$ hypotheses $G_{nm} = H_n \cdot F_m$. Each of these hypotheses may be associated with a relative frequency of the observation $q$, denoted $\gamma_{nm}^q$, so that

$$p_{[G,e_t]}(Q_{t+1}^q \mid G_{nm}) = \gamma_{nm}^q. \qquad (21)$$

The details of the partition change may be such that for some of the $H_n$ we have that $\gamma_{nm}^q = \theta_n^q$ for all $q$ and $m$. We can then collect the hypotheses $G_{nm}$ under the single index number $n$, as for example $B_0$ above. More generally, if two hypotheses $G_{nm}$ and $G_{n'm'}$ are such that $\gamma_{nm}^q = \gamma_{n'm'}^q$ for all $q$, we can merge them into a single hypothesis. In the extreme case in which for all $q$ the $\gamma_{nm}^q$ vary only with $m$, the change of partition comes down to a replacement of $H$ by $F$.

With the introduction of new hypotheses, the probability over the observational algebra undergoes an external shock. First, the probability over the hypotheses themselves changes. But since the new hypotheses have different likelihoods, the probability over most other elements of the algebra changes as well. It is in this paper assumed that at the time of change $t$, the new probability assignment over the hypotheses observes the following restriction:

$$\sum_m p_{[G,e_t]}(G_{nm}) = p_{[H,e_t]}(H_n). \qquad (22)$$

That is, the probability assignment arrived at by updating over $H$ is taken over into the new partition $G$. This restriction serves to link every collection $\cup_m G_{nm}$ to the original hypotheses $H_n$, but it can be dropped if further details of the partition change permit it. Finally, within the limits set by this restriction, the probabilities of the hypotheses $G_{nm}$ can vary freely.

It can be noted that the change in probability due to partition change is not one that can be represented as Bayesian conditioning. Conditioning determines how to adapt probability assignments if for some observation $Q_t^q$ or $E_t$ the probability is externally fixed to 1. It is quite different to set the probability of a number of hypotheses $H_n$ to zero, and to redistribute this probability over new hypotheses $G_{nm}$. A partition change is therefore an external shock to the probability assignment, to which we cannot apply Bayesian updating. Now there are many arguments to the effect that Bayesian updating is the only rational way to adapt a probability assignment to new information, but these arguments do not apply in this case. It seems that the possibility of partition change necessitates new criteria of rationality, and the definition of an associated update operation.

6. Distance between Partitions. This section answers the need for a rationality criterion and an associated update operation. In particular, it elaborates on a distance function between the old and the new partition, and shows how to minimize this distance during the partition change.

Williamson (2003) argues that changes in the assignment must be conservative, that is, as small as possible, and further that such conservatism can be explicated by a minimization of the cross-entropy distance function between the old probability $p_0$ and the new probability $p$, under the restrictions imposed by the external shock. The distance function is defined by

$$D(p, p_0) = \sum_U p(U) \log \frac{p(U)}{p_0(U)}, \qquad (23)$$

where the index $U$ runs over all sets in the finite algebra over which $p_0$ and $p$ are defined. As elaborated in Kullback (1959) and Paris (1994, 120–126), minimizing this distance under the external restrictions effectively minimizes the information change that is induced in the probability assignment by the external shock. Interestingly, the operation of minimizing cross-entropy coincides with the operation of a Bayesian update in the case that some probability $p_{[H,e_t]}(Q_t^q)$ is restricted to 1. It therefore accords with Bayesian statistical inference to adopt the minimization of cross-entropy as the update operation in cases of partition change.
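This coincidence can be checked in a small computation. The sketch below is my own illustration, with all numbers chosen arbitrarily: it fixes the probability of an observation set $E$ to 1, searches over a grid for the new assignment that is closest to the old one in the sense of equation (23), and finds that the minimizer agrees with the conditioned probabilities.

```python
# Sketch: minimizing the cross-entropy (23) under the constraint p(E) = 1
# recovers Bayesian conditioning. A grid search keeps it dependency-free.
import math

p0 = [0.1, 0.2, 0.3, 0.4]          # old assignment over four outcomes
# constraint: outcomes 2 and 3 (the set E) jointly receive probability 1

def divergence(p):
    """Equation (23) for the new assignment p against p0."""
    return sum(pi * math.log(pi / p0i)
               for pi, p0i in zip(p, p0) if pi > 0)

# candidate assignments put weight r on outcome 2 and 1 - r on outcome 3
best_r = min((i / 1000 for i in range(1, 1000)),
             key=lambda r: divergence([0, 0, r, 1 - r]))

print(best_r)                       # ~0.429
print(p0[2] / (p0[2] + p0[3]))      # conditioning on E: 0.3/0.7 = 0.4286
```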
We are not yet done with the update operation for partition change. For one thing, the above distance function blows up if the algebra contains an infinite number of elements, as is the case for the algebra $\mathcal{Q}$. We need to select a finite collection of elements of the algebra, for which we may then minimize the distance between the old and the new probability assignment. As already indicated in the example, it is rather intuitive to choose a minimization of the distance between the likelihoods of $H_n$ and of the associated collection $\cup_m G_{nm}$: the likelihoods fully express the hypotheses, and the distance between the likelihoods is therefore an intuitive measure for the closeness of the two partitions.

A further reason for choosing this collection can be found in the relation between the old and the new hypotheses. Recall that the likelihoods of observations $Q_t^q$ on $H_n$ are determined by the relative frequencies of the observations $q \in K$ within the possible worlds for which $H_n$ is true. With the change of hypotheses, we effectively make a further division of these possible worlds into the hypotheses $G_{nm}$: each infinite sequence of observations $e \in H_n$, having a relative frequency $\theta_n^q$, must be split into $M$ infinite subsequences $e_m$, having relative frequencies $\gamma_{nm}^q$, and these subsequences can then be incorporated into separate hypotheses, $e_m \in G_{nm}$. Because the hypotheses $G_{nm}$ are derived from the original hypotheses $H_n$ in this way, we may expect the relative frequency associated with the aggregate $\cup_m G_{nm}$ to be the same as, or at least close to, the original relative frequency associated with $H_n$.

Any hypothesis prescribes the likelihoods for infinitely many observations $Q_{t+\tau}^q$, associated with different times $\tau \geq 0$. However, these likelihoods are in this paper constant, and it seems natural to define the distance between the partitions as the distance between the likelihoods at a single time $t + \tau$. For $p_0$ we can use the old likelihoods $p_{[H,e_t]}(Q_{t+\tau}^q \mid H_n)$. For $p$ we use the aggregated likelihoods, given by

$$\gamma_n^q = p_{[G,e_t]}(Q_{t+\tau}^q \mid \cup_m G_{nm}) = \sum_m p_{[G,e_t]}(Q_{t+\tau}^q \mid G_{nm}) \, \frac{p_{[G,e_t]}(G_{nm})}{\sum_{m'} p_{[G,e_t]}(G_{nm'})} \qquad (24)$$

$$= \sum_m r_{nm} \gamma_{nm}^q. \qquad (25)$$

Here the $r_{nm}$ are defined by the fraction in equation (24), so that $\sum_m r_{nm} = 1$. The $\gamma_n^q$ are a function of these $r_{nm}$.

We can now use the distance function to find the aggregated likelihoods $p_{[G,e_t]}(Q_{t+\tau}^q \mid \cup_m G_{nm})$ that are closest to the likelihoods $p_{[H,e_t]}(Q_{t+\tau}^q \mid H_n)$, for any time $t$. These distances are defined for each hypothesis $H_n$ separately:

$$D_n(r_{nm}) = \sum_q \gamma_n^q \log \frac{\gamma_n^q}{\theta_n^q}. \qquad (26)$$

The distance is a function of the fractions $r_{nm}$, which determine how the probability of $H_n$ is distributed over the $G_{nm}$. The update operation after a hypotheses change is to find, for every $H_n$ separately, the values of $r_{nm}$ that minimize the distance function $D_n$.

This can be employed to provide a further underpinning for the choice of the probabilities $p_{[B,e_5]}(B_{10})$ and $p_{[B,e_5]}(B_{11})$ in the example. It was stated there that the veterinarian chooses these probabilities in order to arrive at the overall incidence rate of 0.2. Note that the distance between the likelihoods of $H$ and the aggregated likelihoods of $G$ is zero, and therefore minimal, if we find values for $r_{nm}$ so that $\gamma_n^q = \sum_m r_{nm} \gamma_{nm}^q = \theta_n^q$. In the case of the partitions $D$ and $B$, the equation simply becomes $0.9 \times r_{10} + 0.025 \times (1 - r_{10}) = 0.2$, for which $r_{10} = 0.2$ is the solution.
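The minimization in this worked case can also be checked numerically. The sketch below is my own illustration: it minimizes the distance (26) for the transition from $D$ to $B$ over the fraction $r_{10}$, and the minimum lies at $r_{10} = 0.2$, where the aggregated likelihood matches the old one and the distance vanishes.

```python
# Sketch: minimize D_n of equation (26) for the change from D to B.
import math

theta  = 0.2              # old likelihood of q = 1 under D1
gammas = (0.9, 0.025)     # new likelihoods of q = 1 under B10 and B11

def distance(r):
    """Equation (26) for K = {0, 1}: distance between the aggregated
    likelihood r*0.9 + (1 - r)*0.025 and the old likelihood 0.2."""
    g1 = r * gammas[0] + (1 - r) * gammas[1]  # aggregated gamma for q = 1
    return sum(g * math.log(g / th)
               for g, th in ((g1, theta), (1 - g1, 1 - theta)))

r10 = min((i / 1000 for i in range(1001)), key=distance)
print(r10, distance(r10))  # 0.2 and a distance of ~0
```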
It must be stressed that the above is not the full story on partition change. There are many cases of partition change that are not covered by the above framework, but that can in principle be treated in a similar way. One such case deserves separate attention here. The above example presents a probability assignment that is not open-minded: almost all hypotheses on relative frequencies are given a zero probability. This may cause the impression that the framework for partition change can only be applied if the old probability assignment is not open-minded. It may be hard to see what other hypotheses can be added if, for instance, the prior probability already includes all possible hypotheses on relative frequencies. However, the above framework can also be used to change a partition of all hypotheses on relative frequencies into a partition of hypotheses that concern Markov processes. The application of the framework for partition change is thus not limited to cases in which the prior is not open-minded.

7. Concluding Remarks. The above shows how we can frame a partition change, and provides a procedure to render this change rational, employing a distance function between the partitions. I complete the paper with a summary and some remarks on the proposed framework in the context of Bayesian statistical inference.

The proposed framework enables us to adapt the hypotheses that function in a scheme for making predictions. By writing down the predictions in terms of such hypotheses schemes, I locate the theoretical structure underlying the predictions inside the probability assignment. Theoretical developments can therefore be framed as external shocks to the probability assignment representing the opinions, just like new observations. I then argue that the operation that updates the assignment for the external shock is a generalized version of Bayesian conditioning, namely cross-entropy minimization. The framework is therefore a natural extension of Bayesian statistical inference. On the whole, the paper proposes an answer to the problem that Bayesian statistical inference cannot accommodate theory change.

The paper may also fulfill a role in an older discussion between inductivists and Popperians: the above basically shows how we can encompass a notion of conjecture within an inductivist setting. It is a typical feature of Carnapian inductive logic that there is no room for an explicit formulation of inductive assumptions, as such assumptions are part and parcel of the choice of language. Conjectures can therefore not be captured within a Carnapian logic. However, the above framework locates the premisses in the hypotheses schemes, and further allows us to change them. It provides a truly nonmonotonic probabilistic inductive logic, in the sense that the inductive assumptions may be altered along the way. It is hoped that this paper is a first step in freeing inductive logic from its dependence on language.

REFERENCES

Dawid, A. P. (1982), "The Well-Calibrated Bayesian", Journal of the American Statistical Association 77: 605–613.
Earman, J. (1992), Bayes or Bust. Cambridge, MA: MIT Press.
Gillies, D. (2001), "Bayesianism and the Fixity of the Theoretical Framework", in D. Corfield and J. Williamson (eds.), Foundations of Bayesianism. Dordrecht: Kluwer, 363–379.
Howson, C., and P. Urbach (1989), Scientific Reasoning: The Bayesian Approach. LaSalle, IL: Open Court.
Kullback, S. (1959), Information Theory and Statistics. New York: Wiley.
Paris, J. (1994), The Uncertain Reasoner's Companion. Cambridge: Cambridge University Press.
Williamson, J. (2003), "Bayesianism and Language Change", Journal of Logic, Language and Information 12: 53–97.