Laplace’s Demon and the Adventures of His Apprentices

Roman Frigg, Seamus Bradley, Hailiang Du, and Leonard A. Smith*†

The sensitive dependence on initial conditions (SDIC) associated with nonlinear models imposes limitations on the models’ predictive power. We draw attention to an additional limitation that has been underappreciated, namely, structural model error (SME). A model has SME if the model dynamics differ from the dynamics in the target system. If a nonlinear model has only the slightest SME, then its ability to generate decision-relevant predictions is compromised. Given a perfect model, we can take the effects of SDIC into account by substituting probabilistic predictions for point predictions. This route is foreclosed in the case of SME, which puts us in a worse epistemic situation than SDIC.

Received December 2012; revised June 2013.

*To contact the authors, please write to: Roman Frigg, Department of Philosophy, Logic, and Scientific Method, London School of Economics and Political Science; e-mail: r.p.frigg@lse.ac.uk. Seamus Bradley, Munich Centre for Mathematical Philosophy, Ludwig-Maximilians-Universität München; e-mail: seamus.bradley@lrz.uni-muenchen.de. Hailiang Du, Centre for the Analysis of Time Series, London School of Economics and Political Science; e-mail: h.l.du@lse.ac.uk. Leonard A. Smith, Centre for the Analysis of Time Series, London School of Economics and Political Science; e-mail: lenny@maths.ox.ac.uk.

†Work for this article has been supported by the London School of Economics’s Grantham Research Institute on Climate Change and the Environment and the Centre for Climate Change Economics and Policy funded by the Economics and Social Science Research Council and Munich Re. Frigg acknowledges financial support from the Arts and Humanities Research Council–funded Managing Severe Uncertainty project and grant FFI2012-37354 of the Spanish Ministry of Science and Innovation (MICINN). Bradley’s research was supported by the Alexander von Humboldt Foundation. Smith would also like to acknowledge continuing support from Pembroke College, Oxford. We would like to thank Wendy Parker, David Stainforth, Erica Thompson, and Charlotte Werndl for comments on earlier drafts and helpful discussions.

Philosophy of Science, 81 (January 2014) pp. 31–59. 0031-8248/2014/8101-0009$10.00 Copyright 2014 by the Philosophy of Science Association. All rights reserved.

1. Introduction. The sensitive dependence on initial conditions (SDIC) associated with nonlinear models imposes limitations on the models’ predictive power. These limitations have been widely recognized and extensively discussed.¹ In this article we draw attention to an additional problem that has been underappreciated, namely, structural model error (SME). A model has SME if the model dynamics differ from the dynamics in the target system. The central claim of this article is that if a nonlinear model has only the slightest SME, then its ability to generate decision-relevant probabilistic predictions is compromised. We will also show that SME in fact puts us in a worse epistemic situation than SDIC. Given a perfect model, we can take the effects of SDIC into account by substituting probabilistic predictions for point predictions. This route is foreclosed in the case of SME, which relegates both point predictions and accurate probabilistic predictions to the sweet land of idle dreams.

To reach our conclusion, we retell the tale of Laplace’s demon, but with a twist. In our rendering of the tale, the Demon has two apprentices, a Senior Apprentice and a Freshman Apprentice. The abilities of the apprentices fall short of the Demon’s in ways that turn them into explorers of SDIC and SME.
By assumption, the Demon can compute the unabridged truth about everything; comparing his predictions with those of the apprentices will reveal the ways in which SDIC and SME curtail our predictive abilities.²

In section 2 we introduce our three protagonists as well as basic elements of dynamical systems theory, which provides the theoretical backdrop against which our story is told. In section 3 we follow the apprentices on various adventures that show how predictions break down in the presence of SME. In section 4 we provide a general mathematical argument for our conclusion, thereby defusing worries that the results in section 3 are idiosyncrasies of our example and that they therefore fail to carry over to other nonlinear models. In section 5 we briefly discuss a number of scientific modeling endeavors whose success is threatened by problems with SME, which counters the charge that our analysis of SME is philosophical hairsplitting without scientific relevance. In section 6 we suggest a way of embracing the problem, and in section 7 we draw some general conclusions.

¹ For a discussion of the unpredictability associated with nonlinear systems, see Werndl (2009) and references therein. For discussions of chaos more generally, see, e.g., Smith (1992, 1998, 2007), Batterman (1993), and Kellert (1993).

² In other tellings of the tale, we have referred to this triad as the Demon, his Apprentice, and the Novice; the impact of chaos on the Demon is discussed in Smith (1992), and his Apprentice was introduced in Smith (2007). Of course, if the universe is in fact stochastic, then the Demon will make perfect probability forecasts and appears rather similar to I. J. Good’s Infinite Rational Org. In a deterministic universe, it is the (senior) Apprentice who shares the similarity of perfect probabilistic forecasts.

2. The Demon and His Apprentices. Laplace (1814) invites us to consider a supreme intelligence who is able both to identify all basic components of
nature and the forces acting between them and to observe these components’ initial conditions. On the basis of this information, the Demon knows the deterministic equations of motion of the world and uses his unlimited computational power to solve them. The solutions of the equations of motion together with the initial conditions tell him everything he wants to know so that “nothing would be uncertain and the future, as the past, would be present to [his] eyes” (4). This operationally omniscient creature is now known as Laplace’s Demon.

Let us introduce some formal apparatus in order to give a precise statement of the Demon’s capabilities. In order to predict the future, the Demon possesses a mathematical model of the world. It is part of Laplace’s original scenario that the model is a model of the entire world. However, nothing in what follows depends on the model being global in this sense, and so we consider a scenario in which the Demon predicts the behavior of a particular part or aspect of the world. In line with much of the literature on modeling, we refer to this part or aspect of the world as the target system.

Mathematically modeling a target system amounts to introducing a dynamical system, $(X, f_t, \mu)$, which represents that target system. As indicated by the notation, a dynamical system consists of three elements. The first element, the set $X$, is the system’s state space, whose elements represent states of the target system. The second element, $f_t$, is a family of functions mapping $X$ onto itself, which is known as the time evolution: if the system is in state $x_0 \in X$ at time $t = 0$, then it is in $y = f_t(x_0)$ at some later time $t$. The state $x_0$ is called the system’s initial condition. In what follows we assume that $f_t$ is deterministic.³ For this reason, calculating $y = f_t(x_0)$ for some future time $t$ and a given initial condition is making a point prediction.
In the dynamical systems we are concerned with in this article, the time evolution of a system is generated by the repeated application of a map $U$ at discrete time steps: $f_t = U^t$, for $t = 0, 1, 2, \ldots$,⁴ where $U^t$ is the result of applying $U$ $t$ times. The third element, $\mu$, is the system’s measure, allowing us to say that parts of $X$ have certain sizes.

With this in place, we can describe Laplace’s Demon as a creature with the following capabilities:

1. Computational Omniscience: he is able to calculate $y = f_t(x)$ for all $t$ and for any $x$ arbitrarily fast.
2. Dynamical Omniscience: he is able to formulate the true time evolution $f_t$ of the target system.
3. Observational Omniscience: he is able to determine the true initial condition $x_0$ of the target system.

If these conditions were met, the Demon could compute the future with certainty. Laplace is quick to point out that the human mind “will always remain infinitely removed” from the Demon’s intelligence, of which it offers only a “feeble idea” (1814, 4). The question then is what these shortcomings are and how they affect our predictive abilities. It is a curious fact that while the failure of computational and observational omniscience has been discussed extensively, relatively little has been said about how not being dynamically omniscient affects our predictive abilities.⁵ The aim of this article is to fill this gap.

³ In fact, it suffices for $f_t$ to be forward deterministic; see Earman (1986, chap. 2).

⁴ This is a common assumption. For an introduction to dynamical systems, see Arnold and Avez (1968).

To aid our explorations, we provide the Demon with two apprentices: the Senior Apprentice and the Freshman Apprentice. Like the master, both apprentices are computationally omniscient. The Demon has shared the gift of dynamical omniscience with the Senior Apprentice: they both have the perfect model.
But the Demon has not granted the Senior observational omniscience: she has only noisy observations and can specify the system’s initial condition only within a certain margin of error. The Freshman has not yet been granted either observational or dynamical omniscience: he has neither a perfect model nor precise observations.

Both apprentices are aware of their limitations and come up with coping strategies. They have read Poincaré and Lorenz, and they know that a chaotic system’s time evolution exhibits SDIC: even arbitrarily close initial conditions will follow very different trajectories. This effect, also known as the butterfly effect, makes it misinformative to calculate $y = f_t(z_0)$ for an approximate initial condition $z_0$ because even if $z_0$ is arbitrarily close to the true initial condition $x_0$, $f_t(z_0)$ and $f_t(x_0)$ will eventually differ significantly.

To account for their limited knowledge about initial conditions, each apprentice comes up with a probability distribution over relevant initial states, which accounts for their observational uncertainty about the system’s initial condition. Call such a distribution $p_0(x)$; the subscript indicates that the distribution describes uncertainty in $x$ at $t = 0$.⁶ The relevant question then is how initial probabilities change over the course of time. To answer this question, they use $f_t$ to evolve $p_0(x)$ forward in time (i.e., to calculate $p_t(x)$). We use square brackets to indicate that $f_t[p_0(x)]$ is the forward time image of $p_0(x)$. The time evolution of the distribution is given by the Frobenius-Perron operator (Berger 2001, 126–27). If the time evolution is one-to-one, this operator reduces to $p_t(x) = p_0(f_{-t}(x))$.

⁵ See, however, Smith (2002) and McWilliams (2007).

⁶ Our argument does not trade on the specific form of $p_0(x)$; we assume $p_0(x)$ is ideal given the information available.
The idea is simple and striking: if $p_0(x)$ provides them with the probability of finding the system’s state at a particular place in $X$ at $t = 0$, then $p_t(x)$ is the probability of finding the system’s state at a particular place at any later time $t$. And the apprentices do not only make the (trivial) statement that $p_t(x)$ is a probability distribution in a purely formal sense of being an object that satisfies the mathematical axioms of probability; they are committed to the (nontrivial) claim that the probabilities are decision relevant. In other words, the apprentices take $p_t(x)$ to provide us with predictions about the future of sufficient quality that we ought to place bets, set insurance premiums, or make public policy decisions according to the probabilities given to us by $p_t(x)$.

This solves the Senior Apprentice’s problem, but the Freshman has a further obstacle to overcome: the fact that his model has a structural model error (SME). We face an SME when the model’s functional form is relevantly different from that of the true system. In technical terms, by SME we mean the condition when the dynamical equations of the model differ from the true equations describing the system under study: in some cases we can write $f^M_t = f^T_t + \delta_t$, where $f^M_t$ is the dynamics of the model, $f^T_t$ is the true dynamics of the system, and $\delta_t$ is the difference between the two.⁷

The Freshman’s solution to this problem is to adopt what he calls the closeness-to-goodness link. The leading idea behind this link is the maxim that a model that is close enough to the truth will produce predictions that are close enough to what actually happens to be good enough for a certain predictive task. Given that we consider time evolutions that are generated by the iterative application of a map, this idea can be made precise as follows.
Let $U_T$ be the Demon’s map (where the subscript $T$ stands for ‘True’, as the Demon has the true model), and let $U_F$ be the Freshman’s approximate time evolution. Then $\Delta U := U_T - U_F$ is the difference between the two maps, assuming they share the same state space. Furthermore, let $p^T_t(x)$ be the probabilities obtained under the true time evolution (where $f^T_t = U_T^t$), and $p^F_t(x)$ the probabilities that result from the approximate time evolution (where $f^F_t = U_F^t$); $\Delta p(x, t)$ is the difference between the two. The closeness-to-goodness link says that if $\Delta U$ is small, then $\Delta p(x, t)$ is small too for all times $t$, presupposing an appropriate notion of being small. The notion of being small can be explained in different ways without altering the conclusion. Below we quantify $\Delta U$ in terms of the maximal one-step error and $\Delta p(x, t)$ in terms of the relative entropy of the two distributions.

⁷ Note that this equation assumes that the model and the system share the same state space, that is, that they are subtractable (see Smith 2006). They need not be. Also note that SME contrasts with parameter uncertainty, where the model shares the true system’s mathematical structure, yet the true values of certain parameters are uncertain in the model. Parameters may be uncertain when the mathematical structure is perfect, but they are indeterminate given SME: no set of parameter values will suffice to perfect the model.

3. The Apprentices’ Adventures. The Demon schedules a tutorial. The Senior Apprentice claims that while her inability to identify the true initial condition prevents her from making valid point predictions, her probability forecasts are good in the sense that, conditioned on the information the Demon allows her (specifically her initial probability distribution $p_0(x)$), she is able to produce a decision-relevant distribution $p_t(x)$ for all later times $t$.
The Freshman does not want to play second fiddle and ventures the bold claim that dynamical omniscience is as unnecessary as observational omniscience and that he can achieve decision relevance using an imperfect model and the closeness-to-goodness link.

The all-knowing Demon requires them to put their skills to the test in a concrete situation in ecology: the evolution over time of a population of rapidly reproducing fish in a pond. To this end, they agree to introduce the population density ratio $r_t$: the number of fish per cubic meter at time $t$ divided by the maximum number of fish the pond could accommodate per cubic meter. Hence $r_t$ lies in the unit interval $[0, 1]$. Then they go away and study the situation.

After a while they reconvene and compare notes. The Freshman suggests that the dynamics of the system can be modeled successfully with the well-known logistic map:

$$r_{t+1} = 4 r_t (1 - r_t), \tag{1}$$

where the difference between times $t$ and $t + 1$ is a generation (which, for ease of presentation, we assume to be 1 week). Recall from section 2 that a dynamical system is a tripartite entity consisting of a state space $X$, a time evolution operator $f_t$ (where $f_t = U^t$ if the time evolution is generated by the repeated application of a map $U$ at discrete time steps), and a measure $\mu$. The Freshman’s model is a dynamical system that consists of the state space $X = [0, 1]$; his time evolution $f^F_t$ is generated by iteratively applying $4 r_t (1 - r_t)$, which is $U_F$; $\mu$ is the standard Lebesgue measure on $[0, 1]$.

The Demon and the Senior Apprentice know the true dynamical law for $r_t$:

$$\tilde{r}_{t+1} = (1 - \varepsilon)\, 4 \tilde{r}_t (1 - \tilde{r}_t) + \varepsilon\, \frac{16}{5} \left[ \tilde{r}_t \left( 1 - 2 \tilde{r}_t^2 + \tilde{r}_t^3 \right) \right], \tag{2}$$

where $\varepsilon$ is a small parameter. The tilde notation is introduced and justified in Smith (2002). The right-hand side of equation (2), which we call the quartic map, is $U_T$; applying $U_T$ iteratively yields $f^T_t$.
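The two maps are simple enough to write down directly. The following is a minimal sketch, not the authors' code: the function names, the grid resolution, and the default ε = 0.1 (the value the text goes on to use for figure 1) are illustrative assumptions. It evaluates both maps on a grid over the unit interval and reports the largest one-step difference between them.

```python
import numpy as np

def logistic_map(r):
    """Freshman's map U_F, equation (1)."""
    return 4.0 * r * (1.0 - r)

def quartic_map(r, eps=0.1):
    """Demon's map U_T, equation (2); eps = 0.1 is an illustrative default."""
    return ((1.0 - eps) * 4.0 * r * (1.0 - r)
            + eps * (16.0 / 5.0) * r * (1.0 - 2.0 * r**2 + r**3))

# Grid over the state space X = [0, 1]; the resolution is an arbitrary choice.
grid = np.linspace(0.0, 1.0, 100_001)

# Pointwise one-step difference between the two maps.
one_step_error = np.abs(logistic_map(grid) - quartic_map(grid))
max_error = one_step_error.max()
x_at_max = grid[one_step_error.argmax()]

print(f"max one-step error {max_error:.5f} near x = {x_at_max:.5f}")
```

Because both maps depend on $r$ only through $r(1 - r)$, the difference is symmetric about $r = 0.5$, so the grid search may report the maximum at either of two mirror-image points.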
It is immediately clear that the Freshman’s model lacks a small structural perturbation: as $\varepsilon \to 0$ the Demon’s map converges toward the Freshman’s. Figure 1 shows both $U_T$ and $U_F$ for $\varepsilon = 0.1$, illustrating how small the difference between the two is.

Figure 1. Equation (1) in dotted line and equation (2) in shaded line, with $r_t$ and $\tilde{r}_t$ on the X-axis and $r_{t+1}$ and $\tilde{r}_{t+1}$ on the Y-axis. Color version available as an online enhancement.

We now associate $\Delta U$ with $f^F_t$’s one-step error: the maximum difference between $f^F_t(x)$ and $f^T_t(x)$ after a single step, for $x$ ranging over the entire $X$. The maximum one-step error of the model is $5 \times 10^{-3}$ at $x = 0.85344$, where $r_{t+1} = 0.50031$ and $\tilde{r}_{t+1} = 0.49531$, and hence it is reasonable to say that $\Delta U$ is small. Applying the closeness-to-goodness link, the Freshman now expects $\Delta p(x, t)$ to be small too. That is, starting with the same initial probability distribution $p_0(x)$, he would expect $p^T_t(x)$ and $p^F_t(x)$ to be at least broadly similar.

We will now see that the Freshman is mistaken. Since it is impossible to calculate $p^T_t(x)$ and $p^F_t(x)$ with pencil and paper, we resort to computer simulation. To this end, we partition $X$ into 32 cells, which, in this context, are referred to as bins. These bins are now the atoms of our space for evaluating predictions: in what follows we calculate the probabilities of the system’s state $x$ being in a certain bin. This is of course not the same as calculating a continuous probability distribution, but since nothing in what follows hangs on the difference between a continuous distribution and one over bins, and for the sake of notational ease, we refrain from introducing a new variable and take ‘$p^T_t(x)$’ and ‘$p^F_t(x)$’ to refer to the probabilities of bins. Similarly, a computer cannot handle analytical functions (or real numbers), and so we represent $p_0(x)$ by an ensemble of 1,024 points.
We first draw a random initial condition (according to the invariant measure of the logistic map). By assumption this is the true initial condition of the system at $t = 0$, and it is designated by the cross in figure 2a. We then choose 1,024 points consistent with the true initial condition. These points form our ensemble, shown as a distribution in figure 2a. Dividing the numbers on the Y-axis by 1,024 yields an estimate of the probability for the system’s state to be in a particular bin.

Figure 2. Evolution of the initial probability distribution under the Freshman’s approximate dynamics (black) and the Senior’s true dynamics (gray). The gray cross marks the Demon’s evolution of the true initial condition; the black cross is the Freshman’s evolution of the true initial condition. Y-axis in d is rescaled to make the details more visible. Color version available as an online enhancement.

We now evolve all these points forward both under the Senior’s dynamics (gray lines) and the Freshman’s dynamics (black lines). Figures 2b–2d show how many points there are in each bin at $t = 2$, $t = 4$, and $t = 8$. While the two distributions overlap relatively well after 2 and 4 weeks, they are almost completely disjoint after 8 weeks. Hence, for this $x_0$ these calculations show the failure of the closeness-to-goodness link: $\Delta U$ being small does not imply that $\Delta p(x, t)$ is also small for all $t$. In fact, for $t = 8$, $\Delta p(x, t)$ is as large as can be because there is no overlap at all between the two distributions.⁸

Two important points emerge from this example. The first point is that even though chaos undercuts point predictions, one can still make informative probabilistic predictions. The position of the gray cross is appropriately reflected by the gray distribution at all times: the gray probability distribution remains maximally informative about the system’s state given the information available.
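The ensemble experiment described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reproduction of figure 2: the text does not say how the 1,024 points "consistent with the true initial condition" are generated, so the Gaussian noise model, its width `NOISE_SIGMA`, and the random seed below are all hypothetical choices, and the resulting histograms will differ from the published ones. The initial condition is drawn from the invariant (arcsine) density of the logistic map by transforming a uniform variate.

```python
import numpy as np

N_BINS, N_ENSEMBLE = 32, 1024
NOISE_SIGMA = 1e-3  # ASSUMPTION: observational noise width, not given in the text

def logistic_map(r):
    """Freshman's map U_F, equation (1)."""
    return 4.0 * r * (1.0 - r)

def quartic_map(r, eps=0.1):
    """Senior's and Demon's map U_T, equation (2), with eps = 0.1."""
    return ((1.0 - eps) * 4.0 * r * (1.0 - r)
            + eps * (16.0 / 5.0) * r * (1.0 - 2.0 * r**2 + r**3))

def evolve(points, step, n_steps):
    """Apply the map `step` to every ensemble member n_steps times."""
    for _ in range(n_steps):
        # Clip to guard against round-off pushing a point just outside [0, 1].
        points = np.clip(step(points), 0.0, 1.0)
    return points

def bin_counts(points):
    """Number of ensemble members in each of the 32 equal-width bins."""
    counts, _ = np.histogram(points, bins=N_BINS, range=(0.0, 1.0))
    return counts

rng = np.random.default_rng(0)  # hypothetical seed, for repeatability only

# sin^2(pi * u / 2) with u uniform on [0, 1] follows the arcsine density
# 1 / (pi * sqrt(x * (1 - x))), the invariant density of the logistic map.
x0 = np.sin(0.5 * np.pi * rng.random()) ** 2

# ASSUMPTION: the ensemble is the true state plus small Gaussian noise.
ensemble = np.clip(x0 + rng.normal(0.0, NOISE_SIGMA, N_ENSEMBLE), 0.0, 1.0)

for t in (2, 4, 8):
    senior = bin_counts(evolve(ensemble, quartic_map, t))
    freshman = bin_counts(evolve(ensemble, logistic_map, t))
    shared = np.minimum(senior, freshman).sum()  # members in common bins
    print(f"t = {t}: {shared} of {N_ENSEMBLE} members fall in shared bins")
```

Dividing either count vector by 1,024 gives the binned probabilities that the text calls $p^T_t(x)$ and $p^F_t(x)$. How quickly the two histograms separate depends on the initial condition and on the assumed noise width.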
The second and more unsettling point is that the ability to reliably make decision-relevant probabilistic forecasts is lost if nonlinearity is combined with SME. Even though the Freshman’s dynamics are very close to the Demon’s, his probabilities are off track: he regards events that do not happen as very likely, while he regards what actually happens as very unlikely. So his predictions here are worse than useless: they are fundamentally misleading. Hence, simply moving an initial distribution forward in time under the dynamics of a model (even a good one) need not yield decision-relevant evidence. Even models that yield deep physical insight can produce disastrous probability forecasts. The fact that a small SME can destroy the utility of a model’s predictions is called the hawkmoth effect.⁹ The effect illustrates that the closeness-to-goodness link fails.

This example shows that what truly limits our predictive ability is not SDIC but SME. In other words, it is the hawkmoth effect rather than the butterfly effect that decimates our capability to make decision-relevant forecasts. We can mitigate the butterfly effect by replacing point forecasts with probabilistic forecasts, but we have no comparable move with force against the hawkmoth effect. And the situation does not change in the long run. It is true that distributions spread with time, and as $t \to \infty$ the distribution approaches the system’s natural measure and becomes uninformative. But becoming uninformative and being misleading are very different vices.

⁸ This notion is made precise in terms of relative entropy below.

⁹ Thompson (2013) introduced this term in analogy to the butterfly effect. The term also emphasizes that SME yields a worse epistemic position than SDIC: hawkmoths are better camouflaged and less photogenic than butterflies.

One could object that the presentation of our case is biased in various ways.
The first alleged bias is the choice of the particular initial distribution shown in figure 2a. This distribution, so the argument goes, has been carefully chosen to drive our point home, but most other distributions would not be misleading in such a way, and our result only shows that unexpected results can occur every now and then but does not amount to a wholesale rejection of the closeness-to-goodness link.

There is of course no denying that the above calculations rely on a particular initial distribution, but that realization does not rehabilitate the closeness-to-goodness link. We have repeated the same calculations with 2,048 different initial distributions (chosen randomly according to the natural measure of the logistic map), and so we obtain 2,048 pairs of $p^T_t(x)$ and $p^F_t(x)$ for $t = 2$, $t = 4$, and $t = 8$. So far we operated with an intuitive notion of the difference between two distributions. But in order to analyze the 2,048 pairs of distributions, we need a formal measure of the difference between two distributions. We choose the so-called relative entropy:

$$S(p^F_t \mid p^T_t) := \int_0^1 p^F_t \ln\!\left( \frac{p^F_t}{p^T_t} \right) dx,$$

where ‘ln’ is the natural logarithm.¹⁰ The relative entropy provides a measure for the overlap of two distributions. If the distributions overlap perfectly ($p^F_t$ equals $p^T_t$), their ratio inside the logarithm is one, and the entropy is zero; the more dissimilar the distributions, the higher the value of $S(p^F_t \mid p^T_t)$. Hence, it is reasonable to consider $\Delta p(x, t) := S(p^F_t \mid p^T_t)$.

Figure 3 shows a histogram of the relative entropy of our 2,048 distribution pairs at $t = 8$. The histogram shows that the Freshman’s probabilities are in line with the Senior’s only in about a quarter of the cases. Almost half of the distribution pairs have relative entropy 7 or more.
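For binned distributions the integral becomes a sum over the 32 bins, which can be sketched as follows. The function name and the handling of empty bins are assumptions made for illustration: a true-dynamics bin containing no ensemble member is given the floor probability 1/(1,024 × 32), as in the article's note on finite ensembles, and the floored distribution is not renormalized here, a detail the text leaves open. Bins with zero Freshman probability contribute nothing, following the usual convention that 0 ln 0 = 0.

```python
import numpy as np

N_BINS, N_ENSEMBLE = 32, 1024
FLOOR = 1.0 / (N_ENSEMBLE * N_BINS)  # probability assigned to empty bins

def relative_entropy(counts_f, counts_t):
    """S(p^F | p^T) in nats, computed from per-bin ensemble counts."""
    p_f = np.asarray(counts_f, dtype=float) / np.sum(counts_f)
    p_t = np.asarray(counts_t, dtype=float) / np.sum(counts_t)
    # ASSUMPTION: floor empty true-dynamics bins without renormalizing.
    p_t = np.maximum(p_t, FLOOR)
    occupied = p_f > 0.0  # bins with p^F = 0 contribute zero to the sum
    return float(np.sum(p_f[occupied] * np.log(p_f[occupied] / p_t[occupied])))

# Two toy ensembles: all members in bin 0 versus all members in bin 1.
all_in_first = np.zeros(N_BINS)
all_in_first[0] = N_ENSEMBLE
all_in_second = np.zeros(N_BINS)
all_in_second[1] = N_ENSEMBLE

same = relative_entropy(all_in_first, all_in_first)       # identical pair
disjoint = relative_entropy(all_in_first, all_in_second)  # no overlap at all
print(same, disjoint)
```

With this floor, a fully disjoint pair evaluates to ln(1,024 × 32) = ln 32,768, roughly 10.4 nats, which is the ceiling on the values that can appear in such an experiment.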
The two distributions shown in figure 2d have a relative entropy of 8.23.¹¹ So our histogram shows that at $t = 8$ almost half of all distribution pairs are as disconnected as those in figure 2d and, hence, are seriously misleading.

Figure 3. Histogram of the relative entropy of 2,048 pairs of distributions at $t = 8$. Color version available as an online enhancement.

There is a temptation to respond that this does not show that probabilities are useless; it only shows that we should not use these probabilities when they are misleading. The problem with this suggestion is that outside our thought experiment we have no means to tell when that happens. The only thing we have is the model, which we know to be imperfect in various ways. Our tale shows that model probabilities and probabilities in the world can separate dramatically, but we do not know where and when. In cases in which we have no means of separating the good from bad cases,¹² we had better be on guard.

¹⁰ In our case the integral becomes a sum over the bins of the partition. For a discussion of relative entropy and information theory, see Curd and Thomas (1991).

¹¹ Given that our ensemble is only finite, we assign the probability $1/(1{,}024 \times 32)$ to any bin with no ensemble member at all. If that bin occurs, then the entropy would be ~10.4 nats. Hence, ~10.4 reflects the maximum value of the entropy that can be observed in these experiments.

¹² In the case of recurrent dynamics, we may have such means; see Smith (1992).

The second alleged bias is the use of an 8-week forecast: had we used a different lead time, say 2 or 4 weeks, the Freshman’s endeavors would have been successful because at $t = 4$ his distribution is close to the Senior’s. Unfortunately this is insufficient: regularly getting the probability distribution only slightly wrong is enough to face catastrophic consequences.

To see this, let us observe the Freshman’s next endeavor. Still not accepting the Demon’s evaluation, he opens the Pond Casino. The Pond Casino functions like a normal casino in that it offers bets at certain odds on certain events, the difference being that the events on which punters can place bets are not outcomes of the spinning of a roulette wheel but future values of $r_t$. The Freshman takes the above division of the unit interval into 32 bins, which are his basic events (similar to the slots of a roulette wheel), and offers to take bets based on a four-step forecast. More specifically, playing a ‘round’ in the Pond Casino at time $t$ amounts to placing a bet at $t$ on bin $B_i$, where the outcome is whether the system is in $B_i$ at $t + 4$. So if you bet, say, on $B_{31}$ at $t = 3$, you win if $r_{t=7}$ is in $B_{31}$. Had the Freshman offered bets on an eight-step forecast, one would expect him to fail given that his probabilities at $t = 8$ are fundamentally misleading. Given that his probabilities look close to the Senior’s at $t = 4$, however, he holds the hope that he will do well.

What is the payout for a winning bet? Let $A$ be an event that can obtain in whatever game is played in a casino. The odds $o(A)$ the casino offers on $A$ are the ratio of payout to stake. If, for instance, the casino offers $o(A) = 2$ (‘two for one’), a punter who bets £1 on $A$ gets £2 back when $A$ obtains. Within the context of standard probability theory, odds are usually taken to be the reciprocals of probabilities: $o(A) = 1/p(A)$. When flipping an unbiased coin, the probability for heads is 0.5, and if you bet £1 on heads and win, you get £2 back.¹³ The Freshman follows this convention and takes the reciprocals of $p^F_t(x)$ in a four-step forecast as his odds.

Now a group of nine punters enters the casino. Each has £1,000, and they adopt a simple strategy. In every round, the first punter bets 10% of his total wealth on events with probability in the interval $(1/2, 1]$.
We call this strategy fractional betting (with $f = 1/10$) for the probability interval $(1/2, 1]$.¹⁴ The second punter does the same with events with probability in $(1/4, 1/2]$, the third with events with $(1/8, 1/4]$, and so on, with $(1/16, 1/8]$, $(1/32, 1/16]$, $(1/64, 1/32]$, $(1/128, 1/64]$, $(1/256, 1/128]$, $[0, 1/256]$. The minimum bet the casino accepts is £1, so if a punter’s wealth falls below £1 he is effectively broke and has to leave the game.

Using the same initial distribution as above (shown in fig. 2a), the Pond Casino now offers odds reflecting the Freshman’s probabilities. The outcomes of bets are of course determined by the true dynamics. We now generate a string of outcomes based on the true dynamics and trace the punters’ wealth, which we display in figure 4 as a function of the number of rounds played.

¹³ We use so-called odds-for throughout this article. They give the ratio of total payout to stake. Odds-to give the ratio of net gain to stake (net gain is the payout minus the stake paid for the bet). Odds-for and odds-to are interdefinable: if the odds-for for an event are $a/b$, then the odds-to are $(a - b)/b$. Since in this case odds-for are equal to $1/p(A)$, the odds-to are $(1 - p(A))/p(A)$, which is equal to $p(\neg A)/p(A)$, where $\neg A$ is ‘not $A$’.

¹⁴ The argument does not depend on fractional betting, which we chose for its simplicity. Our conclusions are robust in that they hold for other betting strategies.

We see that the punters have the time of their lives. Three of them make huge gains very soon, and a further four follow suit a bit later. After 2,500 rounds, seven out of nine punters have increased their wealth at least tenfold, while only two of them have gone bust. So the punters take a huge amount of money off the casino.
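The mechanics of a single bet in the Pond Casino can be sketched as follows. This is a simplified illustration, not the authors' simulation: the function names are hypothetical, the sketch settles one bet at a time, and it treats a punter as out of the game only when his wealth is below the £1 minimum. It also assumes the model probability of the event is strictly positive, since the odds 1/p are undefined otherwise.

```python
def punter_index(p_model):
    """Which of the nine punters bets on an event with model probability p_model.

    0 -> (1/2, 1], 1 -> (1/4, 1/2], ..., 7 -> (1/256, 1/128], 8 -> [0, 1/256].
    """
    for k in range(8):
        if p_model > 0.5 ** (k + 1):
            return k
    return 8

def settle_bet(wealth, p_model, won, fraction=0.1, min_bet=1.0):
    """Punter's wealth after one fractional bet at odds-for 1 / p_model.

    ASSUMPTION: a punter with wealth below min_bet cannot bet at all;
    p_model must be strictly positive.
    """
    if wealth < min_bet:
        return wealth  # effectively broke: no bet is placed
    stake = fraction * wealth
    payout = stake / p_model if won else 0.0  # odds-for: total payout per stake
    return wealth - stake + payout

def expected_return_per_pound(p_true, p_model):
    """Expected payout per £1 staked when the event's true probability is p_true."""
    return p_true / p_model

# If the Freshman rates an event at 5% that truly occurs 10% of the time,
# every pound staked on it returns two pounds on average.
print(expected_return_per_pound(0.10, 0.05))
print(settle_bet(1000.0, 0.5, won=True), settle_bet(1000.0, 0.5, won=False))
```

The expected-return ratio shows why symmetric information does not protect the house: whenever the model underestimates the true probability of an event, bets on that event have an expected payout above the stake.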
There is a temptation to make the same move as above and argue that this is a ‘bad luck event’ due to the particular initial distribution, which should not be taken as indicative of the casino’s performance in general. We counter in the same vein and consider again 2,048 randomly chosen initial probability distributions. For each of these we let the game take place as before. If the above was a rare special event, then one would expect to see different results in the other 2,047 runs. Since producing another 2,047 plots like the one seen in figure 4 is not a viable way to present the outcomes, we assume that the casino starts with a capital of £1,000,000 and calculate the time to bust. Figure 5 is a histogram of how the casino performs with our 2,048 different initial distributions. Once more the picture is sobering. Most casinos go bust after just a few rounds, and the last one goes out of business after 40 rounds. Offering odds based on $p^F_t(x)$ is disastrous.

Figure 4. Wealth of nine punters as a function of the number of rounds played. Color version available as an online enhancement.

Recall that the punters betting against the apprentice are not using any sophisticated strategy and have no extra knowledge to gain an advantage over the house. They are not, for instance, keeping track of the past as clever punters would (and indeed do in card-counting systems for games like blackjack whereby the bettor exploits the information contained in the past sequence of cards). In such a scenario the bettor is using more informed probabilities than the implied probabilities of the casino’s odds, and it is indeed no surprise if the casino loses money against such bettors. Our punters are not of this kind. They simply bet on the basis of the values of the odds offered. One punter just bets on all events with implied probabilities in the range $(1/16, 1/8]$.
The information is entirely symmetrical; the punters know nothing that the house does not know. Hence, our worry is not just that the apprentice loses money: a punter with access to the system probabilities could obviously do well against the house. Our worry is that the house does disastrously even against punters who know no more than the house.

Figure 5. Histogram of time to bust for 2,048 distributions. Color version available as an online enhancement.

Frustrated with his failures, the Freshman cannot help himself and starts peeping over the Demon’s shoulder to get the exact initial condition. He convinces the Demon to repeat the entire casino adventure, but rather than moving probability distributions forward in time, he now calculates the trajectory of the true initial condition (which he gleans from the Demon) under his dynamical law. This, he thinks, will guarantee him success. For want of space we do not follow his further adventures in detail, and in fact there is no need to. A look at figure 2 suffices to realize that he has set himself up for yet another fiasco. The gray crosses in figure 2 are the true time evolution of the true initial condition; the black crosses are the Freshman’s time evolution of the true initial condition. We see that the trajectories of the true initial condition under the two dynamical laws soon become completely different, and any prediction generated with the model is, once again, seriously misleading. So even if the Freshman were observationally omniscient, he would not be able to generate decision-relevant predictions. SME is a serious issue independently of SDIC.

The moral is now unavoidable: offering odds according to the probabilities of an imperfect model can be disastrous even when information is entirely symmetrical between all parties.

4. From Example to Generalization.
An obvious line of criticism would be to argue that the problems we describe are specific to the logistic map and do not occur in other systems. So the question is, how general are the effects we have discussed in the last section? To answer this question we review a number of mathematical results about the structural stability of dynamical systems. Our conclusion will be sobering. There are special cases in which the above effects do not occur,15 but in general there are no such assurances. Not only are there no general stability results; there are in fact mathematical considerations suggesting that the effects we describe are generic. So we urge a shift of the onus of proof: rather than assuming that nonlinear models are structurally stable and asking the skeptic to make his case, the default assumption ought to be that models are not structurally stable and hence exhibit the effects we describe. Using a particular model for predictive purposes therefore requires an argument to the effect that the model is structurally stable.

15. Integrable Hamiltonian systems, which respect the Kolmogorov-Arnold-Moser theorem, are one example with structural stability.

Roughly speaking, a dynamical system is structurally stable if its trajectories change only a little when the equation is changed only a little. Andronov and Pontrjagin (1937) presented the first systematic study of structural stability, providing both a definition of structural stability and a theorem. They consider a two-dimensional system that is defined on a disk $D^2$ in the plane with the equations $dx/dt = P(x, y)$ and $dy/dt = Q(x, y)$. We obtain the perturbed system by adding a differentiable function to each equation: $dx/dt = P(x, y) + p(x, y)$ and $dy/dt = Q(x, y) + q(x, y)$.
The original system is structurally stable if and only if for any real number $\varepsilon > 0$ there is a real number $\delta > 0$ such that there exists a smooth $\varepsilon$-homeomorphism $h_\varepsilon : D^2 \to D^2$ that transforms the trajectories of the original system into trajectories of the perturbed system. Being an $\varepsilon$-homeomorphism means that whenever the absolute value of both $p(x, y)$ and $q(x, y)$ as well as their first derivatives are [...] $v$, and $p_t(E_i) = v$ if $p^F_t(E_i) \le v$, where $v$ can be any real number so that $0 \le v \le 1$. We call odds thus calculated threshold odds. For the limiting case of $v = 0$ the $p_t(E_i)$ correspond to probabilities, and the respective odds correspond to probabilistic odds. It is important to emphasize that the threshold rule applies to all possible events and not only the atoms of the partition, the idea being that one simply does not offer $p$’s smaller than $v$ no matter what the event under consideration is. In particular, the rule applies simultaneously to events and their negation. If, for instance, we set $v = 0.2$ and have $p^F_t(E_i) = 0.95$ (and hence, by the axioms of probability, $p^F_t(\neg E_i) = 0.05$), then $p_t(E_i) = 0.95$ and $p_t(\neg E_i) = 0.2$, where $\neg E_i$ is the negation of $E_i$ (i.e., the nonoccurrence of $E_i$).

21. We only consider discrete and countable event spaces.

22. Nonprobability odds have been introduced in Judd (2007) and Smith (2007).

This move is motivated by the following observation. In figure 2 we see that, based on $p^F_t$, we sometimes offer very long odds on events that are in reality (i.e., according to $p^T_t$) very likely to happen. It is with these events that we run up huge losses. Putting a lower bound on the $p_t(E_i)$ amounts to limiting large odds and thus the amount one pays out for an actual event that one’s model wrongly regarded as unlikely.

We now repeat the scenario of figure 4 with one exception: the Freshman Apprentice now offers nonprobability odds with thresholds of $v = 0.05$, $v = 0.1$, and $v = 0.2$.
The result of these calculations is shown in figures 7a, 7b, and 7c, respectively. We see that this strategy brings some success. Already a very low threshold of $v = 0.05$ undercuts the success of five out of seven punters, and only two still manage to take money off the casino. A slightly higher threshold of $v = 0.1$ brings the number of successful punters down to one. With $v = 0.2$ the Freshman Apprentice achieves his goal of running a sustainable casino.

The second way of shortening odds is damping. On this method the betting quotients are given by $p_t(E_i) = 1 - b[1 - p^F_t(E_i)]$, where the damping parameter $b$ is a real number with $0 \le b \le 1$. We see that for $b = 1$ the $p_t$ correspond to probabilities. We call odds thus calculated damping odds. We now repeat the same calculations as above, and the results are very similar (which is why we are not reproducing the graphs here). For $b = 0.95$ only two punters succeed (indeed the same two as above). With a slightly stronger damping of $b = 0.9$ only one is still winning (again the same as above), and for $b = 0.8$ all punters are either losing or not playing at all (because no bets in their range are on offer).

The moral of this last part of our tale is that shortening odds, either by introducing a threshold or by damping, can provide some protection against losses. In doing so the Freshman has attempted to introduce what we call sustainable odds. There are no doubt better ways to construct sustainable odds and to better meet the challenges to their use in decision support. How to construct more useful varieties of sustainable odds is the question for a future project. For now we just note that while probability odds are easier to use, using them leads to disaster. Furthermore, we can regard the amount of deviation of the shortening parameters from their ‘probability limits’ (i.e., the deviation of $v$ from zero and of $b$ from one) as a measure of the model inadequacy: the greater this deviation, the less adequate the model.
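The two shortening rules described above are simple enough to state as code. The following Python sketch is an illustration only: the function names are hypothetical, and the helper that treats an event and its negation separately assumes, as the text states, that the rule is applied to each of them on its own.

```python
def threshold_quotient(p_model, v):
    """Threshold rule: keep the model probability if it exceeds v, else offer v."""
    return p_model if p_model > v else v


def damping_quotient(p_model, b):
    """Damping rule: 1 - b * (1 - p_model); b = 1 recovers the model probability."""
    return 1 - b * (1 - p_model)


def quotients_for_event_and_negation(p_model, shorten):
    """Apply a shortening rule to an event and, separately, to its negation.

    The negation's model probability is 1 - p_model; `shorten` is any
    one-argument function mapping a model probability to a betting quotient.
    """
    return shorten(p_model), shorten(1 - p_model)


# The worked example from the text: v = 0.2 and a model probability of 0.95.
p_e, p_not_e = quotients_for_event_and_negation(
    0.95, lambda p: threshold_quotient(p, 0.2)
)
# p_e is 0.95 and p_not_e is 0.2, so the two quotients sum to more than one.
```

Under damping the two quotients always sum to $2 - b$, since $[1 - b(1 - p)] + [1 - bp] = 2 - b$. The sum therefore exceeds one whenever $b < 1$, and the same holds for the threshold rule whenever one of the two model probabilities falls below a positive $v$. Either way the shortened betting quotients are not additive, so they are not probabilities.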
We would like to point out that this last part, too, is closer to reality than it seems. The sustainable yet interesting casino is modeled on a cooperative insurance company. Rather than playing for gain, the ‘bets’ placed are insurance policies bought to compensate for losses suffered should certain events happen. What makes our insurance a cooperative insurance is its attempt to offer a full payout (to fully compensate its clients) at the lowest rates that allow it to operate in a sustainable way (an insurance company that goes bust is of little use). So our nonprobability odds casino has a close real-world cousin, and the morals drawn above are relevant beyond the tale of Laplace’s Demon.

So far we have shown that one is all but certain to go bust when allowing bets on model probabilities. The conclusion of our argument might be seen as a decision-theoretic one: that it is pragmatically advantageous to adopt nonprobabilistic odds. This is not the interpretation we favor. We prefer to see it as an epistemological argument, albeit one that involves talk of betting. We are not making any decision-theoretic assumptions in coming to our conclusions. We mean for our agent to be shortening his odds due to epistemological flaws, not just so as to avoid bad outcomes. Talk of casinos, betting, and going bust helps to put an epistemic problem into focus; the main point is that the pragmatic flaw (systematic and statistically premature ruin) points to an epistemological flaw in the agent’s representation of belief.

Needless to say, the use of nonprobability odds raises a host of issues. How exactly should nonprobability odds inform decision making? Presented with nonprobability odds, what decision rules should we apply? These are important questions for decision theory and rational choice, but we cannot discuss them here.
An attempt to dismiss these issues quickly might be to try to bring them back into well-charted territory by denying that nonprobability odds are really sui generis items. Regarding them as such, so the argument goes, is a red herring because, even if we have odds whose inverses do not add up to one, it is trivial to renormalize them, and we then retrieve the homely probabilities for which there are well-worked-out decision theories.

Unfortunately things are not so simple. The problem is that the $p$ do not satisfy the axioms of probability even if they are renormalized to add up to one. The source of the problem is that nonprobability odds do not respect the symmetry between betting for and betting against that is enshrined in probabilities. For probabilities, we have $p(E) + p(\neg E) = 1$ for any event $E$.23 Nonprobability odds need not add up to one: $p(E) + p(\neg E)$ can take any value greater than one (which is easy to see in the case of threshold odds). For this reason the $p$ are not probabilities, and renormalizing is not an easy route back into the well-charted territory of probabilism. And, of course, the renormalized odds need not prove sustainable.

23. Odds-for for the negation are derived from probabilities by taking $p(\neg E) = 1 - p(E)$ and then applying the shortening rule.

Figure 7. Wealth of punters as a function of the number of rounds played with the casino offering threshold odds, with thresholds of (a) 0.05, (b) 0.1, and (c) 0.2. Color version available as an online enhancement.

Furthermore, one might worry that these nonprobabilistic odds do not have the requisite connection to degrees of belief in order for them to play the role of fixing degrees of belief. That is, one might worry that such odds allow one to avoid the pragmatically bad consequences of model error, but they do not line up with degrees of belief.
For example, Williamson (2010) argues that symmetry (the claim that your limiting price to sell a bet should be equal to your limiting price to buy that bet) is an intuitive part of what he calls the ‘betting interpretation’ of degrees of belief: “While we do, in practice, buy and sell bets at different rates, the rate at which we would be prepared to both buy and sell, if we had to, remains a plausible interpretation of strength of belief” (37). Others disagree and do suggest that nonsymmetrical odds can serve as a (perhaps partial) characterization of strength of belief (see, e.g., Dempster 1961; Good 1962; Levi 1974; Suppes 1974; Kyburg 1978; Walley 1991; Bradley 2012). If one knows one’s model is imperfect, it is hard to see a successful case in favor of symmetrical odds from model-based probabilities as relevant to rational belief or action.

We would not like to leave the issue without a brief remark about Dutch books. One might worry that our Freshman is subject to a Dutch book when he offers nonprobabilistic odds. That is, one might worry that a smarter bettor might be able to make money out of the apprentice by buying a set of bets that guarantee the bettor a sure gain, whatever happens. This is not the case, for the same reason that casinos cannot be Dutch booked. In a casino, you cannot bet on ‘not red’ with symmetrical probability to ‘red’.

In connection with this point, it is worth pointing out an analogy between the current project and the standard Dutch book argument. The latter argues from a pragmatic flaw (being subject to a Dutch book) to an epistemic conclusion (your degrees of belief ought to satisfy the probability calculus). We take ourselves to be doing the same sort of thing: we argue from a pragmatic flaw (houses go bust faster than expected, statistically) to an epistemic conclusion (nonprobability odds).
That is, we do not take ourselves to be merely making the point that one can avoid going bankrupt by shortening one’s odds. We are making the stronger claim that in the presence of model error, model probabilities sanction only nonprobability degrees of belief.

We conclude this section with an explanation of why one final response to our argument will not succeed. One might respond that we get wrong probabilities because we use probabilities in a bad way. From a Bayesian perspective one could point out that by using one particular model to generate predictions we have implicitly assigned a prior probability of 1 to that model. Given that we have no reason to assume that this model is true (indeed, there are good reasons to assume that it is not), this confidence is misplaced, and one really ought to take uncertainty about the model into account. This can be done by using probabilities: put a probability measure on the space of all models that expresses our uncertainty about the true model, generate predictions with all those models, and take some kind of weighted aggregate of the result. This, so the argument goes, would avoid the above problem, which is rooted in completely ignoring second-order uncertainty about models.

Setting aside the fact that it is unfeasible in practice to generate predictions with an entire class of models, there are theoretical limitations that ground the project. The first problem is that it is not clear how to circumscribe the relevant model class. This class would contain all possible models of a target system. But the phrase ‘all models’ masks the fact that mathematically this class is not defined, and indeed it is not clear whether it is definable at all. The second problem is that even if one could construct such a class in one way or another, there are both technical and conceptual problems with putting an uncertainty measure over this class.
The technical problem is that the relevant class of models would be a class of functions, and function spaces do not come equipped with measures. In fact, it is not clear how to put a measure on function spaces.24 The conceptual issue is that even if the technical problem could be circumvented somehow, what measure would we choose? The model class will contain an infinity of models, and it is at best unclear whether there is a nonarbitrary measure on such a set that reflects our uncertainty about model choice. And even if one can form a revised probability distribution in light of higher-order doubt about the model, it will still be inaccurate relative to the distribution given by the true model.25 Finally, we, like the Freshman, are restricted to sampling from the set of all conceivable models, which need not contain a perfect model even if such a thing exists. For these reasons this response does not seem to be workable.

24. This is a well-known problem in the foundations of statistical mechanics; see Frigg and Werndl (2012).

7. Conclusion. We have argued that model imperfection in the presence of nonlinear dynamics is a poison pill: treating model outputs as probability predictions can be seriously misleading. Many operational probability forecasts are therefore unreliable as a guide to rational action if interpreted as providing the probability of various outcomes. Yet not all the models underlying these forecasts are useless. This raises the question of what conclusion we are to draw from the insight into the unreliability of models. An extreme reaction would be to simply get rid of them. But this would probably amount to throwing out the baby with the bathwater because imperfect models can be qualitatively informative. Restricting models to tasks of purely qualitative understanding is also going too far. The question is how we can use the model where it provides insight while guarding against damage where it does not.
Finding a way of doing this is a challenge for future research. We have indicated that one possible route could be to use nonprobability odds, but more needs to be said about how these can be used to provide decision support, and there may be altogether different ways of avoiding the difficulties we sketch. We merely hope that this article leads to a wider acknowledgment that these challenges are important and their solution nontrivial.

25. We thank an anonymous referee for drawing our attention to this point.

REFERENCES

Andronov, Aleksandr A., and Lev Pontrjagin. 1937. “Systèmes Grossiers.” Doklady of the Academy of Sciences of the USSR 14:247–51.
Arnold, Vladimir I., and André Avez. 1968. Ergodic Problems of Classical Mechanics. New York: Benjamin.
Barreira, Luis, and Claudia Valls. 2012. Ordinary Differential Equations: Qualitative Theory. Washington, DC: American Mathematical Society.
Batterman, Robert W. 1993. “Defining Chaos.” Philosophy of Science 60:43–66.
Berger, Arno. 2001. Chaos and Chance: An Introduction to Stochastic Aspects of Dynamics. Hamburg: de Gruyter.
Bradley, Seamus. 2012. “Dutch Book Arguments and Imprecise Probabilities.” In Probabilities, Laws, and Structures, ed. Dennis Dieks, Wenceslao Gonzalez, Stephan Hartmann, Michael Stoeltzner, and Marcel Weber, 3–17. Berlin: Springer.
Cover, Thomas M., and Joy A. Thomas. 1991. Elements of Information Theory. New York: Wiley.
Dempster, Arthur. 1961. “Upper and Lower Probabilities Induced by a Multivalued Mapping.” Annals of Mathematical Statistics 38:325–39.
Earman, John. 1986. A Primer on Determinism. Dordrecht: Reidel.
Fan, Shu, and Rob J. Hyndman. 2012. “Short-Term Load Forecasting Based on a Semi-parametric Additive Model.” IEEE Transactions on Power Systems 27:134–41.
Frigg, Roman, Leonard A. Smith, and Dave A. Stainforth. 2013. “The Myopia of Imperfect Climate Models: The Case of UKCP09.” Philosophy of Science 80, no. 5, forthcoming.
Frigg, Roman, and Charlotte Werndl. 2012. “Demystifying Typicality.” Philosophy of Science 79:917–29.
Good, Irving J. 1962. “Subjective Probability as the Measure of a Non-measurable Set.” In Logic, Methodology and Philosophy of Science, ed. Ernest Nagel, Patrick Suppes, and Alfred Tarski, 319–29. Stanford, CA: Stanford University Press.
Hagedorn, R., and Leonard A. Smith. 2009. “Communicating the Value of Probabilistic Forecasts with Weather Roulette.” Meteorological Applications 16:143–55.
Hayashi, Shuhei. 1997. “Invariant Manifolds and the Solution of the $C^1$-Stability and Ω-Stability Conjectures for Flows.” Annals of Mathematics 145:81–137.
Jenkins, Geoff, James Murphy, David Sexton, Jason Lowe, and Phil Jones. 2009. “UK Climate Projections.” Briefing report, Department for Environment, Food and Rural Affairs, London.
Judd, Kevin. 2007. “Nonprobabilistic Odds.” Working paper, University of Western Australia.
Judd, Kevin, and Leonard A. Smith. 2004. “Indistinguishable States.” Pt. 2, “The Imperfect Model Scenario.” Physica D 196:224–42.
Kellert, Stephen. 1993. In the Wake of Chaos. Chicago: University of Chicago Press.
Kyburg, Henry. 1978. “Subjective Probability: Criticisms, Reflections, and Problems.” Journal of Philosophical Logic 7:157–80.
Laplace, Marquis de. 1814. A Philosophical Essay on Probabilities. New York: Dover.
Levi, Isaac. 1974. “On Indeterminate Probabilities.” Journal of Philosophy 71:391–418.
Mañé, Ricardo. 1988. “A Proof of the $C^1$ Stability Conjecture.” Publications Mathématiques de l’Institut des Hautes Études Scientifiques 66:161–210.
McGuffie, Kendal, and Ann Henderson-Sellers. 2005. A Climate Modelling Primer. Chichester, NY: Wiley.
McWilliams, James C. 2007. “Irreducible Imprecision in Atmospheric and Oceanic Simulations.” Proceedings of the National Academy of Sciences 104:8709–13.
Orrell, David, Leonard A. Smith, Tim Palmer, and Jan Barkmeijer. 2001. “Model Error in Weather Forecasting.” Nonlinear Processes in Geophysics 8:357–71.
Palis, Jacob, and Stephen Smale. 1970. “Structural Stability Theorems.” In Global Analysis, ed. Shiing-Shen Chern and Stephen Smale, 223–31. Proceedings of Symposia in Pure Mathematics 14. Providence, RI: American Mathematical Society.
Peixoto, Marilia C., and Maurício M. Peixoto. 1959. “Structural Stability in the Plane with Enlarged Boundary Conditions.” Anais da Academia Brasileira de Ciências 31:135–60.
Peixoto, Maurício. 1962. “Structural Stability on Two Dimensional Manifolds.” Topology 2:101–21.
Robinson, Clark. 1976. “Structural Stability of $C^1$ Diffeomorphisms.” Journal of Differential Equations 22:28–73.
Smale, Stephen. 1966. “Structurally Stable Systems Are Not Dense.” American Journal of Mathematics 88:491–96.
Smale, Stephen. 1967. “Differentiable Dynamical Systems.” Bulletin of the American Mathematical Society 73:747–817.
Smith, Leonard A. 1992. “Identification and Prediction of Low-Dimensional Dynamics.” Physica D 58:50–76.
Smith, Leonard A. 2002. “What Might We Learn from Climate Forecasts?” Proceedings of the National Academy of Sciences of the USA 4:2487–92.
Smith, Leonard A. 2006. “Predictability Past Predictability Present.” In Predictability of Weather and Climate, ed. Tim Palmer and Renate Hagedorn, 217–50. Cambridge: Cambridge University Press.
Smith, Leonard A. 2007. Chaos: A Very Short Introduction. Oxford: Oxford University Press.
Smith, Peter. 1998. Explaining Chaos. Cambridge: Cambridge University Press.
Snyder, Ralph D., J. Keith Ord, and Adrian Beaumont. 2012. “Forecasting the Intermittent Demand for Slow-Moving Inventories: A Modelling Approach.” International Journal of Forecasting 28:485–96.
Suppes, Patrick. 1974. “The Measurement of Belief.” Journal of the Royal Statistical Society B 36:160–91.
Thompson, Erica L. 2013. “Modelling North Atlantic Storms in a Changing Climate.” PhD diss., Imperial College, London.
Walley, Peter. 1991. Statistical Reasoning with Imprecise Probabilities. London: Chapman & Hall.
Werndl, Charlotte. 2009. “What Are the New Implications of Chaos for Unpredictability?” British Journal for the Philosophy of Science 60:195–220.
Williamson, Jon. 2010. In Defense of Objective Bayesianism. Oxford: Oxford University Press.