THE CURVE FITTING PROBLEM: A BAYESIAN APPROACH

PRASANTA S. BANDYOPADHYAY, Department of Philosophy, Montana State University
ROBERT J. BOIK, Department of Mathematical Sciences, Montana State University
PRASUN BASU, Simon School of Business and Management, University of Rochester

Philosophy of Science, Vol. 63, Supplement: Proceedings of the 1996 Biennial Meetings of the Philosophy of Science Association, Part I: Contributed Papers (Sep., 1996), pp. S264-S272.

In the curve fitting problem two conflicting desiderata, simplicity and goodness-of-fit, pull in opposite directions. To this problem, we propose a solution that strikes a balance between simplicity and goodness-of-fit. Using Bayes' theorem we argue that the notion of prior probability represents a measure of the simplicity of a theory, whereas the notion of likelihood represents the theory's goodness-of-fit. We justify the use of prior probability and show how to calculate the likelihood of a family of curves. We diagnose the relationship between the simplicity of a theory and its predictive accuracy.

Overview. Two conflicting desiderata, simplicity and goodness-of-fit, play key roles in fitting a curve to numerical data. Simplicity determines the shape of the curve. Goodness-of-fit, on the other hand, determines the curve that best captures the data. A straight line is easy to work with when predicting future data because it is simple. A linear equation, however, does not necessarily fit the available data; a nonlinear equation may fit the data better, although it is more complex. In the curve fitting problem, these two desiderata, simplicity and goodness-of-fit, pull in opposite directions. How can we make the best trade-off between them?
Glymour (1980) writes, "The only moral I propose to draw [in the case of the curve fitting problem] is that there is no satisfactory rationale for curve fitting available to us." In response to Glymour, Turney (1990) has suggested a method to ease the tension between simplicity and goodness-of-fit. Forster and Sober (1994) have proposed a non-Bayesian solution to this problem. We suggest a solution to the curve fitting problem by using Bayes' theorem. Bayes' theorem states that one can obtain a posterior probability if one knows the prior probability and the likelihood function. That is,

$$\Pr(H \mid E) = \frac{\Pr(H)\Pr(E \mid H)}{\Pr(E)},$$

where $\Pr(E) > 0$ is the marginal probability of the evidence; $\Pr(H)$ is the agent's prior probability for the hypothesis $H$ before any evidence is known; and $\Pr(E \mid H)$, the probability of the evidence given the hypothesis, is called the likelihood function. In our proposal, prior probability measures the simplicity of a hypothesis. A hypothesis gets a higher prior probability than its competitors, ceteris paribus, if it has fewer parameters. In contrast, we say that the likelihood function measures goodness-of-fit. A hypothesis with more parameters generally has a higher likelihood than one with fewer parameters. Given the prior probability and likelihood function of a hypothesis, we obtain its posterior probability. We choose the hypothesis with the highest posterior probability as making the best trade-off between simplicity and goodness-of-fit.

1. Sketch of Solution. Consider three hypotheses, $H_1$, $H_2$, and $H_3$, in a domain in which each is mutually exclusive of the others:

$$H_1: E(Y \mid x) = \alpha_0 + \alpha_1 x;$$
$$H_2: E(Y \mid x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2; \text{ and}$$
$$H_3: E(Y \mid x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3;$$

where $Y$ is a random variable, $E(Y \mid x)$ is the conditional expectation of $Y$ given $x$, and $x$ is an explanatory variable. To say that these hypotheses are mutually exclusive is to say that under hypothesis $H_i$ the coefficient of $x^i$ is not equal to zero.

According to Bayes' theorem, the posterior probability of a hypothesis is directly proportional to the prior probability of the hypothesis multiplied by its likelihood. In our view, the prior probability represents a measure of the simplicity of the hypothesis. We assign prior probability as a decreasing function of the number of parameters. That is, $H_1$, with the fewest parameters, gets the highest prior probability; $H_2$, the second highest; and $H_3$ and any other hypotheses with more parameters are assigned lower probabilities. Specifically, for our three hypotheses, $H_1$ is assigned 1/2, followed by $H_2$ with 1/4 and $H_3$ with 1/8. The remaining hypothesis, denoted by $H_c$, we call the catch-all hypothesis and assign it probability 1/8. The expression "catch-all" hypothesis is due to Shimony (1993). For a Bayesian account of the catch-all hypothesis and its relation to the logical omniscience condition, see Earman (1992). Because the prior probability of $H_c$ is small and we have no clue as to how to calculate its likelihood, we do not consider it to be a serious contender.
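This assignment scheme is easy to make concrete. The sketch below (a minimal Python illustration of ours, not part of the original paper) halves the prior with each additional parameter, gives the leftover mass to the catch-all hypothesis, and combines the priors with likelihoods via Bayes' theorem. The likelihood values are the maximized likelihoods reported later in Table II; the catch-all is given likelihood zero only because, as noted above, we have no way to calculate it.

```python
def assign_priors(k):
    """Prior 2**-i for hypothesis H_i, i = 1..k; the remainder goes
    to the catch-all hypothesis Hc (here k = 3, so Hc gets 1/8)."""
    priors = {f"H{i}": 2.0 ** -i for i in range(1, k + 1)}
    priors["Hc"] = 1.0 - sum(priors.values())
    return priors

def posteriors(priors, likelihoods):
    """Bayes' theorem: Pr(H|E) = Pr(H) Pr(E|H) / Pr(E)."""
    joint = {h: priors[h] * likelihoods.get(h, 0.0) for h in priors}
    pr_e = sum(joint.values())  # Pr(E), the normalizing constant
    return {h: j / pr_e for h, j in joint.items()}

priors = assign_priors(3)  # {'H1': 0.5, 'H2': 0.25, 'H3': 0.125, 'Hc': 0.125}
likelihoods = {"H1": 0.0173, "H2": 0.0199, "H3": 0.0310}  # from Table II below
print(posteriors(priors, likelihoods))  # H1 wins despite its lower likelihood
```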
A justification for preferring simpler hypotheses is given in Section 2. The likelihood of the $i$th hypothesis, denoted by $L_i$, provides an answer to the question of how likely the evidence is given the hypothesis. The likelihood function is sometimes expressed as a probability function, $\Pr(\text{data} \mid \alpha_i)$; i.e., the probability of the data given a vector of parameters $\alpha_i = (\alpha_0, \ldots, \alpha_i)$ belonging to $H_i$. We choose to evaluate the likelihood function at the value of $\alpha_i$ which makes the data most probable under $H_i$. That is, we take the maximum of $\Pr(\text{data} \mid \alpha_i)$ over the entire parameter space of $H_i$ and denote its value by $L_i$. This maximal value, $L_i$, is obtained by equating the vector of coefficients to the maximum likelihood estimate (MLE), $\hat{\alpha}_i$, and is a measure of the highest degree of support that the data can provide under a particular hypothesis. A Bayesian justification for using $L_i$ to measure the likelihood is given in Section 3.

2. Justification of Prior Beliefs. An agent's prior probability for a theory represents that agent's belief in the hypothesis before any evidence is known. Another agent may have a different prior probability for the same hypothesis. Bayesianism allows two agents to start with divergent non-extreme priors, provided their assignments of priors are consistent with the probability calculus. For this reason, Bayesians are sometimes branded as subjectivists. Bayesians not only apply Bayes' theorem, but also interpret it. According to Bayesians, as evidence accumulates, two agents with divergent non-extreme priors that obey the probability calculus will, subject to certain constraints, eventually converge to strong belief in the correct hypothesis. If your degrees of belief disobey the probability calculus, then, according to Bayesians, you can be Dutch-booked if you bet with a Dutch bookie. For different ramifications of the Dutch-book argument, see Skyrms (1990). For a criticism of Bayesians, see Kyburg (1992).

Bayesians can assign any prior probability to a hypothesis, provided that the probability calculus is satisfied. Why, then, do we choose to give the highest prior probability, 1/2, to the hypothesis with the fewest parameters? An answer to this question depends on our account of simplicity and its relationship to the assignment of priors. On our account, the simplicity of a theory determines its prior probability. The interplay of two kinds of factors, formal and non-formal, in turn determines the simplicity of a theory. The formal factor relevant to the simplicity consideration is paucity of parameters. The non-formal factors that play key roles in determining simplicity are epistemological and pragmatic.

We consider $H_1$ the simplest hypothesis because we find it easiest to work with a hypothesis with fewer parameters. Though our selection of $H_1$ as the simplest hypothesis is based on a pragmatic consideration, this pragmatic consideration is not necessarily devoid of any relationship with our epistemic reason for embracing $H_1$ as the simplest hypothesis. Many philosophers, including van Fraassen (1980), contend that reasons for accepting a hypothesis may be numerous. Some reasons for acceptance are pragmatic, whereas others are epistemic. Van Fraassen thinks that epistemic reasons for acceptance of a theory cannot be pragmatic reasons for acceptance of that theory, and vice versa.
In contrast, Harman (forthcoming) convincingly argues that the distinction between pragmatic reasons and epistemic reasons need not be exclusive. In other words, pragmatic reasons can sometimes be epistemic reasons and vice versa. Following Harman, we contend that if a consideration makes a difference in the probability of embracing a theory, then it is an epistemic reason for embracing the theory. If there is a pragmatic consideration in the light of which we decide that the simplest theory has the highest prior probability, then this is a pragmatic-epistemic reason for believing the theory. That is, it is likely that the simpler theory will be true.

3. Bayesian Justification for Maximizing the Likelihood. In the example of Section 1, we assigned higher likelihoods to hypotheses with more adjustable parameters. The pertinent question, however, is how we can calculate the likelihood of a family of curves. By a family of curves, we mean the infinite set of curves generated by allowing the coefficients $\alpha_0, \ldots, \alpha_i$ to take on any values subject to $\alpha_i \neq 0$. Our proposal is to calculate the likelihood of a family of curves as the likelihood of the best fitting curve in that family. Bennett (in private discussion) urged on us the need to focus on the likelihood of the best fitting curve when calculating the likelihood of a family of curves.

One criticism of the proposed approach is that it appears to assign prior probability 1 to the maximum likelihood estimator, $\hat{\alpha}_i$. Forster, commenting on a previous version of this paper (presented at the APA Central Division meeting, 1995), raised this objection. Sober (in private correspondence) has objected to the same point. Because the MLE cannot be computed until the data are observed, our prior appears to depend on the data. In this section, we show that this is not the case. We do not assign prior probability 1 to the MLE. On the contrary, our approach spreads the prior probability for the vector $\alpha_i$ over the whole of the $(i+1)$-dimensional parameter space. The MLE, $\hat{\alpha}_i$, is but one point in this space. Our approach is identical to what Rosenkrantz (1977) called the method of averaged likelihoods, but our choice of prior distributions for the unknown parameters differs from his.

Suppose that there are $k$ hypotheses ($k = 3$ in our example), where each hypothesis corresponds to a family of polynomial curves. The $i$th hypothesis can be written as

$$H_i: \quad E(Y \mid x) = \sum_{j=0}^{i} \alpha_j x^j,$$

where the $\alpha_j$ for $j = 0, \ldots, i$ are unknown regression coefficients and $x$ is a known explanatory variable. A sample of $n$ data points (i.e., the evidence) will be drawn. Denote the sample by $(\mathbf{Y}, \mathbf{x})$, where $\mathbf{Y} = (Y_1\; Y_2\; \cdots\; Y_n)'$ and $\mathbf{x} = (x_1\; x_2\; \cdots\; x_n)'$. It is assumed that if $H_i$ is true, then the data points are independently and normally distributed with mean $E(Y_t \mid x_t) = \sum_{j=0}^{i} \alpha_j x_t^j$ and variance $\sigma^2$. This normality assumption is conveniently summarized as

$$\mathbf{Y} \mid H_i, \mathbf{X}_i, \alpha_i, \sigma^2 \sim N(\mathbf{X}_i \alpha_i, \sigma^2 \mathbf{I}), \tag{1}$$

where $\mathbf{X}_i$ is $n \times (i+1)$, $\alpha_i$ is $(i+1) \times 1$, and

$$\mathbf{X}_i = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^i \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^i \end{pmatrix} \quad \text{and} \quad \alpha_i = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_i \end{pmatrix}.$$

To compute the posterior distribution of $H_i$, prior distributions on $H_i$ and on the parameters $\alpha_i$ and $\sigma^2$ must be specified. In Section 1, prior probabilities were assigned to $H_i$ using an inverse relationship: the fewer the adjustable parameters in $H_i$, the larger its prior probability. This prior can be stated as

$$\Pr(H_i) = \theta_i \quad \text{for } i = 1, \ldots, k. \tag{2}$$

For the example of Section 1, $\theta_1 = 1/2$, $\theta_2 = 1/4$, and $\theta_3 = 1/8$; the remaining hypotheses were collectively assigned probability 1/8.
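The model in (1) is an ordinary polynomial regression, and the design matrix $\mathbf{X}_i$ is straightforward to build. The sketch below (Python with NumPy; our illustration, not anything in the original paper) constructs $\mathbf{X}_i$ for a given polynomial degree:

```python
import numpy as np

def design_matrix(x, degree):
    """Build the n x (degree+1) matrix X_i of equation (1):
    column j contains x**j for j = 0, ..., degree."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, degree + 1, increasing=True)

# First three degree-day values from Table I, and X_2 for the quadratic H2:
x = np.array([15.6, 26.8, 37.4])
print(design_matrix(x, 2))
```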
For the assignment of priors to $\alpha_i$ and $\sigma^2$, it is assumed that little is known about these parameters before collecting data. Accordingly, diffuse priors will be adopted. Diffuse priors spread the probability over the entire parameter space in such a way that no points are greatly favored over others. Specifically, we assume that

$$\alpha_i \mid \mathbf{X}_i, \sigma^2, \tau, a_i \sim N(a_i, \mathbf{V}_i), \quad \text{and} \quad \ln(\sigma^2) \sim \text{Uniform}(-\infty, \infty), \tag{3}$$

where $a_i$ is the prior mean of $\alpha_i$; $\mathbf{V}_i = \sigma^2 \tau^{1/(i+1)} (\mathbf{X}_i'\mathbf{X}_i)^{-1}$ is the prior variance of $\alpha_i$; and $\tau$ is a large positive constant. The prior mean of $\alpha_i$ can be assigned any value because the posterior distribution of the hypothesis depends on $a_i$ only minimally when $\tau$ is large. The prior adopted for $\alpha_i$ is a conjugate prior (Berger 1985) for the normal density of $\mathbf{Y}$ in (1). Furthermore, the prior on $\alpha_i$ is a special case of the prior used by Smith and Spiegelhalter (1980), as well as a special case of the g-prior suggested by Zellner (1986). Further details concerning the prior on $\alpha_i$ are available from the authors. The parameter $\tau$ controls how diffusely the prior probability on $\alpha_i$ is spread over the $(i+1)$-dimensional space. As the value of $\tau$ increases, the prior distribution on $\alpha_i$ becomes more diffuse. In our calculation of posterior probabilities, we will take $\tau$ to be infinitely large. The prior on $\sigma^2$ is improper and was suggested by Jeffreys (1961). It is an invariant diffuse prior and says that, on the log scale, $\sigma^2$ is equally likely to be in any interval of fixed size.

The posterior probability of $H_i$ conditional on the data is given by Bayes' theorem as

$$\Pr(H_i \mid \mathbf{Y}, \tau, a_1, \ldots, a_k) = \Pr(\mathbf{Y} \mid H_i, \tau, a_i) \Pr(H_i) / \Pr(\mathbf{Y} \mid \tau, a_1, \ldots, a_k).$$

Using the probability model in (1) and the priors in (2) and (3), the posterior probability, for fixed $\tau$ and $a_1, \ldots, a_k$, is

$$\Pr(H_i \mid \mathbf{Y}, \tau, a_1, \ldots, a_k)
= \frac{\theta_i \int \Pr(\mathbf{Y} \mid \mathbf{X}_i, \alpha_i, \sigma^2)\, \Pr(\alpha_i \mid \mathbf{X}_i, \sigma^2, \tau, a_i)\, \Pr(\sigma^2)\, d\alpha_i\, d\sigma^2}{\sum_{j=1}^{k} \theta_j \int \Pr(\mathbf{Y} \mid \mathbf{X}_j, \alpha_j, \sigma^2)\, \Pr(\alpha_j \mid \mathbf{X}_j, \sigma^2, \tau, a_j)\, \Pr(\sigma^2)\, d\alpha_j\, d\sigma^2}$$

$$= \frac{\theta_i \left[\text{SSE}_i + (\hat{\alpha}_i - a_i)'\mathbf{X}_i'\mathbf{X}_i(\hat{\alpha}_i - a_i)\varphi_i^{-1}\right]^{-n/2} \varphi_i^{-(i+1)/2}}{\sum_{j=1}^{k} \theta_j \left[\text{SSE}_j + (\hat{\alpha}_j - a_j)'\mathbf{X}_j'\mathbf{X}_j(\hat{\alpha}_j - a_j)\varphi_j^{-1}\right]^{-n/2} \varphi_j^{-(j+1)/2}}, \tag{4}$$

where $\varphi_j = 1 + \tau^{1/(j+1)}$ for $j = 1, \ldots, k$; $\hat{\alpha}_i = (\mathbf{X}_i'\mathbf{X}_i)^{-1}\mathbf{X}_i'\mathbf{Y}$; and $\text{SSE}_i = (\mathbf{Y} - \mathbf{X}_i\hat{\alpha}_i)'(\mathbf{Y} - \mathbf{X}_i\hat{\alpha}_i)$. The quantities $\hat{\alpha}_i$ and $\text{SSE}_i$ in (4) are the maximum likelihood estimator of $\alpha_i$ under hypothesis $H_i$ and the corresponding sum of squared residuals (a lack-of-fit measure) from the maximum likelihood fit, respectively. Mathematical details concerning the required integrations can be obtained from the authors.

Our approach is to compute $\Pr(H_i \mid \mathbf{Y})$ as the limiting probability as $\tau \to \infty$. The result is

$$\Pr(H_i \mid \mathbf{Y}) = \lim_{\tau \to \infty} \Pr(H_i \mid \mathbf{Y}, \tau, a_1, \ldots, a_k) = \frac{\theta_i\, \text{SSE}_i^{-n/2}}{\sum_{j=1}^{k} \theta_j\, \text{SSE}_j^{-n/2}}. \tag{5}$$

Alternatively, one can say that the posterior probability of $H_i$ is proportional to the numerator of (5), because the denominator becomes constant when conditioning on the data. That is,

$$\Pr(H_i \mid \mathbf{Y}) \propto \kappa\, \theta_i\, \text{SSE}_i^{-n/2}, \tag{6}$$

where $\kappa$ is any constant.

In Section 1, it was proposed to compute the posterior probability by multiplying the prior probability of $H_i$ by the maximized likelihood function. In the notation of this section, the proposal was to compute $\Pr(H_i \mid \mathbf{Y})$ as

$$\Pr(H_i \mid \mathbf{Y}) \propto \theta_i L_i = \theta_i \max \frac{\exp\left\{-(\mathbf{Y} - \mathbf{X}_i\alpha_i)'(\mathbf{Y} - \mathbf{X}_i\alpha_i)/(2\sigma^2)\right\}}{(2\pi\sigma^2)^{n/2}},$$

where the maximum is taken with respect to $\alpha_i$ and $\sigma^2$. Equating the derivatives with respect to the unknown parameters to zero and solving the resulting normal equations yields (6), with $\kappa = e^{-n/2}(2\pi/n)^{-n/2}$. Thus, if one adopts the priors in (2) and (3), then the posterior probability is obtained by multiplying the prior probability of $H_i$ by the maximized likelihood function.
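The maximization step can be made explicit. The following display (our reconstruction of a standard calculation, not spelled out in the paper) maximizes the normal likelihood over $\sigma^2$ after $\alpha_i$ has been set to its MLE; this is what produces the constant $\kappa$ in (6):

```latex
% With alpha_i at its MLE, the log-likelihood depends on sigma^2 through
%   l(sigma^2) = -(n/2) ln(2 pi sigma^2) - SSE_i / (2 sigma^2).
\[
\frac{\partial \ell}{\partial \sigma^2}
  = -\frac{n}{2\sigma^2} + \frac{\mathrm{SSE}_i}{2\sigma^4} = 0
  \quad\Longrightarrow\quad
  \hat{\sigma}^2 = \frac{\mathrm{SSE}_i}{n}.
\]
% Substituting sigma^2-hat back gives the maximized likelihood,
% i.e., (6) with kappa = e^{-n/2} (2 pi / n)^{-n/2}:
\[
L_i = \bigl(2\pi\hat{\sigma}^2\bigr)^{-n/2} e^{-n/2}
    = e^{-n/2}\left(\frac{2\pi}{n}\right)^{-n/2} \mathrm{SSE}_i^{-n/2}.
\]
```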
4. Illustration of Likelihood Calculations. Consider the following example. Suppose that Sue heats her house with natural gas. The amount of gas required to heat the home depends on the outside temperature: if the weather is cold, Sue needs more gas to heat her house, so long as her family's habits, the insulation of the house, and the other relevant factors remain unchanged. She measures her household's natural gas consumption each month during one heating season, from October to the following June. For the sake of simplicity, we assume each month consists of 30 days. Outside temperature influences gas consumption only when it is cold enough to require heating. We measure the need for heating in degree days: one heating degree day is accumulated for each degree the average daily temperature falls below 65 degrees Fahrenheit. An average temperature of 20°F, for example, corresponds to 45 degree days. In Table I, the explanatory variable, x, is heating degree days per day for the month, and the response variable, Y, is gas consumption per day in hundreds of cubic feet.

TABLE I
Variable   Oct    Nov    Dec    Jan    Feb    Mar    Apr    May    June
x          15.6   26.8   37.4   36.4   35.5   18.6   15.3    7.9    0.0
Y           5.2    6.1    8.7    8.5    8.8    4.9    4.5    2.5    1.1

The following summary statistics were calculated using equations (4) and (6).

TABLE II. Maximum Likelihood Estimates
Hypothesis   α0      α1      α2        α3       SSE     L        Pr(H|Y)
H1           1.221   0.203                      1.300   0.0173   0.0087
H2           1.095   0.223   -0.0005            1.259   0.0199   0.0050
H3           0.975   0.300   -0.0064   0.0001   1.140   0.0310   0.0039

(The final column reports the unnormalized product $\theta_i L_i$ of (6).) Table II shows that H1 makes a better trade-off between simplicity and goodness-of-fit than either H2 or H3. Even though the likelihoods of H2 and H3 are larger than that of H1, the posterior probability of H1 is largest because of the higher prior probability assigned to H1. In the fifth and final section, we will discuss the implications of this example.
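The entries of Table II can be checked directly from the Table I data. The sketch below (Python with NumPy; our reconstruction, using the closed forms $\hat{\alpha}_i = (\mathbf{X}_i'\mathbf{X}_i)^{-1}\mathbf{X}_i'\mathbf{Y}$ and $L_i = e^{-n/2}(2\pi/n)^{-n/2}\,\text{SSE}_i^{-n/2}$ from Section 3) should reproduce the table up to rounding:

```python
import numpy as np

x = np.array([15.6, 26.8, 37.4, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0])
Y = np.array([5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1])
n = len(Y)
priors = {1: 0.5, 2: 0.25, 3: 0.125}  # theta_i from Section 1

for degree, theta in priors.items():
    X = np.vander(x, degree + 1, increasing=True)      # design matrix X_i
    alpha_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # MLE of alpha_i
    sse = float(np.sum((Y - X @ alpha_hat) ** 2))      # SSE_i
    L = np.exp(-n / 2) * (2 * np.pi / n) ** (-n / 2) * sse ** (-n / 2)
    print(f"H{degree}: alpha_hat = {np.round(alpha_hat, 4)}, "
          f"SSE = {sse:.3f}, L = {L:.4f}, theta*L = {theta * L:.4f}")
```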
5. Predictive Accuracy and Simplicity. Predictive accuracy is a comparative concept: we say that one theory has greater predictive accuracy than another only if the former is closer to the truth than the latter. The evidential role of the simplicity of a theory is related to its predictive accuracy. The evidential role simplicity plays can be understood in two ways: (i) we can evaluate the simplicity of a hypothesis with respect to its retrodictions, and (ii) we can consider a theory's predictive accuracy in the future.

Philosophers disagree as to whether simplicity has an evidential role in theory choice. Instrumentalists of different stripes contend that simplicity never plays an evidential role; according to them, simplicity is a pragmatic reason for embracing a theory. Realists, on the other hand, argue that simplicity sometimes plays an evidential role in theory appraisal. Quine (1992) and van Fraassen belong to the former group, whereas Forster and Sober belong to the latter. We think both are mistaken in taking simplicity to play the one role or the other in theory choice, for whether a simple theory has better predictive accuracy depends on what future data we are confronted with. If the future data are more amenable to a simpler hypothesis like H1, then the simpler hypothesis is likely to have better predictive accuracy than a less simple hypothesis, and so to have an evidential role to play in theory appraisal. If, however, the new data fit well with a complex hypothesis such as H2, then it has better predictive accuracy and so has an evidential role to play. The key to understanding the connection, if any, between the simplicity of a theory and its predictive accuracy is what kind of future data, linear or quadratic, we confront.

Consider the posterior probabilities in Table II. Hypothesis H1 looks to be the most predictively accurate, given the currently available evidence. When we take into account some future data, however, we might find that a less simple hypothesis, say H2, gives better predictive accuracy. Recall the linear equation $E(Y \mid x) = \alpha_0 + \alpha_1 x$ under H1. Sue wants to know what her gas consumption will be next February. Suppose that next February her gas consumption is 870 cubic feet per day. We cannot compare this figure with last February's rate (880 cubic feet per day) unless the average temperatures for the two months are the same. Suppose that next February has an average of 40 degree days. We therefore forecast from the regression equation how much gas the house would have used at 40 degree days this year. Our forecast is $\hat{Y} = 1.221 + 0.203 \times 40 = 9.341$, or about 934 cubic feet per day, so Sue estimates that she saved about 64 cubic feet per day. Compare this prediction with the prediction based on H2: $\hat{Y} = 1.095 + 0.223x - 0.0005x^2 = 9.243$ at $x = 40$; in other words, 924 cubic feet per day. Though this prediction does not agree with the exact amount of gas consumed during the month of February, it is much closer to the actual value than that made on the basis of H1. This example shows the sense in which the predictive accuracy of a theory does not always depend on its simplicity: although H1 is simpler than H2, H2 is predictively more accurate than H1, at least for next February.
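The two forecasts are easy to recompute. The sketch below is ours (note that evaluating H2 with the rounded coefficients of Table II gives 9.215 rather than the paper's 9.243, which evidently comes from unrounded estimates):

```python
def predict(coeffs, x):
    """Evaluate a polynomial with coefficients (alpha_0, alpha_1, ...) at x."""
    return sum(a * x ** j for j, a in enumerate(coeffs))

x_feb = 40.0   # degree days per day next February
actual = 8.70  # observed consumption: 870 cubic feet per day

h1 = predict((1.221, 0.203), x_feb)            # linear forecast: 9.341
h2 = predict((1.095, 0.223, -0.0005), x_feb)   # quadratic forecast: 9.215
print(f"H1: {h1:.3f} hundred cu ft/day, error {h1 - actual:+.3f}")
print(f"H2: {h2:.3f} hundred cu ft/day, error {h2 - actual:+.3f}")
```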
One possible pragmatist contention is that any linear hypothesis is always simpler than any nonlinear hypothesis, and that this fact is independent of whether the theory in question is predictively accurate. If two theories are predictively accurate, then, according to pragmatists, we can choose the simpler of the two. If the theories are not predictively accurate, then, if simplicity is our sole concern, we can still go for the simpler theory. In both cases, for pragmatists, the simplicity of a theory has nothing to do with its predictive accuracy and therefore has no evidential role to play in theory appraisal.

Our response to this defense of simplicity is that simplicity is not the sole criterion in theory choice. The informativeness or empirical adequacy of a theory plays an equally important role in preferring one theory to another. Van Fraassen, however, would not agree with this pragmatist rejoinder to our argument. For him, a necessary condition for acceptance of a theory is the belief that the theory in question is empirically adequate. In the above scenario, the linear hypothesis is not an empirically adequate theory, since it does not provide a predictively accurate account of the relation between the response variable and the explanatory variable. Since the linear hypothesis is not empirically adequate, van Fraassen would not consider it a contending hypothesis. We do not know Quine's position on this point.

To see why the nature of future data is crucial in this situation, we will enrich the example with further details. Next February, Sue's gas consumption is, in fact, 870 cubic feet per day. Our additional information is that Sue adds insulation to her attic during the summer, anticipating that her gas consumption will be reduced in the coming year. Given this information, Sue saved about 64 cubic feet per day by adding insulation, based on the prediction of the linear hypothesis; in contrast, based on the prediction of the second-degree equation, she estimates a saving of 54 cubic feet per day.

Given this additional information, Forster and Sober may argue that the example does not show that the predictive accuracy of the second-degree hypothesis is better than that of the first-degree hypothesis. In their rejoinder, they would point out that the further information that Sue added insulation to her attic during the summer is not what the linear hypothesis is supposed to capture at the time of forecasting. When these two hypotheses are constructed, they are constructed on the basis of the information available at the time. Since the added information is not available when the hypotheses are constructed, concluding from it that the simpler of the two hypotheses fails to be predictively accurate, according to them, changes the subject. We think that they are right in pointing out that the new information is not something we can expect a linear or nonlinear hypothesis to predict beforehand.

Our discussion of the Sue example shows that the claim that a simpler theory has an evidential role to play in theory choice (because it has greater predictive accuracy than its more complex counterparts) does not make sense unless we know what kind of future data we will be predicting on the basis of that theory. This can be argued in two ways: (i) from the Sue example before any further information is added, and (ii) from the Sue example after the further information is added.

(i) In the first case, a simpler theory will have greater predictive accuracy so long as the future data are amenable to the simple hypothesis. Even though H1 has the highest posterior probability, as soon as we bring in a new datum, H2 or H3 could provide more accurate predictions.

(ii) In the second case, i.e., the Sue example after the further information is added, Forster and Sober rightly point out that we cannot expect our linear or nonlinear hypotheses to anticipate the fact that Sue adds insulation to her attic during the summer so that her gas bill next February will be appreciably lower. So it is clearly a change of subject. Recall, however, that the first-degree hypothesis was not predictively accurate even before any further information was supplied to us. In this situation, whether the first-order hypothesis fails to be predictively accurate depends on what future data we encounter. In any case, to appreciate the evidential relations between the simplicity of a theory and its predictive accuracy, so that we can say that a simpler theory is a good predictor, we need to know the nature of the future data. This is a point that Forster and Sober's recent article overlooks.
Summing Up. In the curve fitting problem two conflicting desiderata, simplicity and goodness-of-fit, pull in opposite directions. To this problem we have proposed a Bayesian solution that strikes a balance between simplicity and goodness-of-fit. Using Bayes' theorem, we argued that the notion of prior probability represents a measure of the simplicity of a theory, whereas the notion of likelihood represents the theory's goodness-of-fit. We justified the use of prior probability and showed how to calculate the likelihood of a family of curves. We also diagnosed the relationship between the simplicity of a theory and its predictive accuracy.

REFERENCES

Berger, J. O. (1985), Statistical Decision Theory and Bayesian Analysis, 2nd ed. New York: Springer-Verlag.
Earman, J. (1992), Bayes or Bust? Cambridge, MA: MIT Press.
Forster, M. and Sober, E. (1994), "How to Tell when Simpler, More Unified, or Less Ad Hoc Theories will Provide More Accurate Predictions", British Journal for the Philosophy of Science 45: 1-35.
Glymour, C. (1980), Theory and Evidence. Princeton: Princeton University Press.
Harman, G. (forthcoming), "Pragmatism and Reasons for Belief", in C. B. Kulp (ed.), Realism/Antirealism and Epistemology. Totowa, NJ: Rowman and Littlefield.
Jeffreys, H. (1961), Theory of Probability, 3rd ed. New York: Oxford University Press.
Kyburg, H. (1992), "The Scope of Bayesian Reasoning", in D. Hull, M. Forbes and K. Okruhlik (eds.), PSA 1992, vol. 2. East Lansing: Philosophy of Science Association, 139-152.
Quine, W. V. (1992), Pursuit of Truth. Cambridge, MA: Harvard University Press.
Rosenkrantz, R. D. (1977), Inference, Method and Decision. Boston: D. Reidel.
Shimony, A. (1993), "Scientific Inference", in Search for a Naturalistic World View, vol. 1. New York: Cambridge University Press.
Skyrms, B. (1990), The Dynamics of Rational Deliberation. Cambridge, MA: Harvard University Press.
Smith, A. F. M. and Spiegelhalter, D. J. (1980), "Bayes Factors and Choice Criteria for Linear Models", Journal of the Royal Statistical Society, Series B 42: 213-220.
Turney, P. (1990), "The Curve Fitting Problem: A Solution", British Journal for the Philosophy of Science 41: 509-530.
Van Fraassen, B. (1980), The Scientific Image. New York: Oxford University Press.
Zellner, A. (1986), "On Assessing Prior Distributions and Bayesian Regression Analysis with g-Prior Distributions", in P. K. Goel and A. Zellner (eds.), Bayesian Inference and Decision Techniques. New York: Elsevier, 233-243.