Casey Helgeson
Modus Darwin reconsidered
Article (Accepted version) (Refereed)

Original citation: Helgeson, Casey (2015) Modus Darwin reconsidered. British Journal for the Philosophy of Science. pp. 1-20. ISSN 0007-0882 (In Press)

© 2015 The Author

This version available at: http://eprints.lse.ac.uk/61099/
Available in LSE Research Online: February 2015

This document is the author’s final accepted version of the journal article. There may be differences between this version and the published version. You are advised to consult the publisher’s version if you wish to cite from it.

http://bjps.oxfordjournals.org/

Accepted at British Journal for Philosophy of Science. Author’s penultimate draft; please don’t cite.

MODUS DARWIN RECONSIDERED

CASEY HELGESON

Abstract. Modus Darwin is the name given by Elliott Sober to a form of argument that Sober attributes to Darwin in the Origin of Species, and to subsequent evolutionary biologists who have reasoned in the same way. In short, the argument form goes: Similarity, ergo common ancestry. In the present paper I review and critique Sober’s analysis of Darwin’s reasoning. I argue that modus Darwin has serious limitations that make the argument form unsuited for supporting Darwin’s conclusions, and that Darwin did not reason in this way.
Casey Helgeson
Centre for Philosophy of Natural and Social Science
Lakatos Building
London School of Economics
Houghton Street
London WC2A 2AE
C.Helgeson@lse.ac.uk

1. Introduction

One of the central tenets of modern evolutionary biology is the shared ancestry of all extant life on Earth. Darwin’s On the Origin of Species took a big step in that direction. Darwin could address only the portion of Earth’s biota of which nineteenth-century naturalists were aware, and he could see only a short way back into the long history of life. But he argued compellingly that diverse groups of organisms had evolved each from a single ancestor species, concluding that “animals have descended from at most only four or five progenitors, and plants from an equal or lesser number” (Darwin; 1859/2003, 484). It was a radical conclusion, yet his scientific audience was largely convinced (Bowler; 1989; Larson; 2004).1

Date: February 24, 2015.

Thanks to: Bengt Autzen, Matt Barker, David Baum, Malcolm Forster, Jillian Scott McIntosh, Trevor Pearce, Bill Saucier, Elena Spitzer, Michael Titelbaum, Joel Velasco, Peter Vranas, and especially Elliott Sober. Also to audiences at Philosophy of Biology in the UK 2014, APA Pacific 2013, and ISHPSSB 2013. This work was supported by a National Science Foundation Graduate Research Fellowship and a Visiting Fellowship at the Tilburg Center for Logic and Philosophy of Science.

1 The broad acceptance of common ancestry by Darwin’s scientific audience – within a decade or two of the Origin – should not be confused with their lukewarm response to natural selection, which languished until the modern synthesis.
In a series of publications, Elliott Sober has sought to clarify and analyze Darwin’s case for common ancestry, and to generalize Darwin’s reasoning to encompass contemporary thinking about newer evidence for the hypothesis (Sober; 1999, 2008, 2011; Sober and Steel; 2002, 2014). Sober’s project is thus part exegesis, part epistemology: How does Darwin argue? And how does that argument justify common ancestry? In answer to the first question, Sober attributes to Darwin the following argument form:

Similarity, ergo common ancestry. This form of argument occurs so often in Darwin’s writings that it deserves to be called modus Darwin. The finches in the Galapagos Islands are similar; hence, they descended from a common ancestor. Human beings and monkeys are similar; hence, they descended from a common ancestor. The examples are plentiful, not just in Darwin’s thought, but in evolutionary reasoning down to the present. (Sober; 1999, 265)

To address the epistemological question, Sober sets out to formalize modus Darwin with mathematical rigor, ultimately deriving the force of the argument form from the Law of Likelihood (explained below).

In this essay I review and critique Sober’s analysis of Darwin’s reasoning. After introducing Sober’s account, I temporarily bracket Darwin exegesis to focus on the epistemic merits of modus Darwin as Sober understands it. Here I argue that several difficulties undermine Sober’s defense of that argument form. Then I turn back to Darwin and argue against attributing to him the suspect argument form similarity, ergo common ancestry. I suggest an alternative reading of key Origin passages, and offer a partial epistemological defense of the reasoning that I see therein.

2. Modus Darwin

Sober derives the normative force of modus Darwin from the Law of Likelihood (Hacking; 1965; Royall; 1997; Sober; 2008), according to which an observation supports one hypothesis over another whenever that observation is more probable supposing the one hypothesis were true, compared with supposing the other hypothesis were true. More formally, observation o favors hypothesis h1 over hypothesis h2 if and only if p(o|h1) > p(o|h2). Mapping this framework onto Darwin’s reasoning requires identifying an observation o and two hypotheses h1 and h2. Similarity between two species is the observation o. The hypothesis h1 is common ancestry (CA), which says that those two species descended from a single ancestor species. For the alternative hypothesis h2, Sober chooses separate ancestry (SA), meaning that the two species’ lineages trace back to separate origin-of-life events. These are, however, only the rough, qualitative statements of o, h1, and h2. To evaluate the inequality p(o|h1) > p(o|h2), Sober must define o more concretely and then formally characterize h1 and h2 as stochastic (chancy) processes that can produce such outcomes with some concrete probability.

Regarding the observation o, when do two species count as “similar”? Any two species are similar in some ways and dissimilar in others. What is the right yardstick? Sober initially sidesteps this thorny question, and begins with a simpler and more tractable observation: that two species share the same trait on a single dichotomous character. A dichotomous character is one that has just two possible states; for example, an insect might have wings or lack them, or the edge of a leaf might be smooth or serrated.
(Coding morphology in terms of dichotomous characters typically masks more continuous underlying variation, but dichotomous characters are adequate in many scientific contexts, and they provide a convenient starting point for the formalization of modus Darwin.)

Does the observation favor CA over SA sensu the Law of Likelihood? To generate the required conditional probabilities Sober repurposes the idealizations and mathematical framework of contemporary phylogenetic inference, as follows. Let variables X and Y represent the two species, where each can take states {0, 1}, standing for the two possible states of the dichotomous character. So the observation o is both species in the same state (either both 0 or both 1). Each hypothesis is then characterized by a schematic genealogy for the two species (Figure 1), plus a stochastic model describing how the character variables change states as they move along a line in the genealogy. (While Darwin’s primary target in Origin was a non-evolutionary, creationist version of the separate ancestry hypothesis, Sober prefers to reconstruct modus Darwin using a separate ancestry hypothesis that allows for evolutionary change. The idea is that this choice leaves the basic form of Darwin’s reasoning intact, with the added benefit of illuminating the fundamental similarity between Darwin’s reasoning and subsequent arguments made within evolutionary theory.)

[Figure 1: two schematic genealogies, labeled SA and CA, each with tip species X and Y.]
Figure 1. Schematic diagrams illustrating lineages postulated by the common ancestry (CA) and separate ancestry (SA) hypotheses.

The model of character-state evolution (applied in the same way to all solid lines in both Figure 1 schematics) works as follows. Each solid line comprises a number of time steps (the same number for each of the four lines); the variable associated with each line starts in one state or the other, and then undergoes this many time steps of evolution.
At every step there is a small probability that the variable changes from its present state to the other state. (Two state-change probabilities are required: 0 → 1 and 1 → 0, which need not be equal.) The probability of changing states at any given step depends only on the current state of the variable. The longer the stretch of lineage, the greater the chance that the character variable will change along that stretch.

In which state does a character variable begin? The initial state is determined by a random draw from a probability distribution over the state space {0, 1} (i.e., a coin flip, though the coin may be biased). And here lies the only difference between the stochastic models of CA and SA: for SA the initial states of X and Y are set by two independent draws from that distribution, whereas for CA just one draw is required because X and Y must begin in the same state (think of this as the point just before speciation).

With CA and SA so characterized, Sober proves the following result: for X and Y to end up in the same state at the end of the process is more probable on CA than on SA regardless of time steps, state-change probabilities, and starting-state distribution (Sober; 2008, chap. 4).2 In other words, two species found in the same state always favors CA over SA. It isn’t hard to understand intuitively why this is so. If the state-change probabilities are small relative to the number of time steps, then the most probable outcome along any branch is stasis. In this case, since CA puts the two species in the same state from the start, chances are good they will both still be in that state at the end. The chances of ending in the same state are somewhat smaller on SA, since X and Y may or may not begin in the same state.
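The comparison just described can be checked numerically for particular parameter values. The following is a minimal sketch, not Sober's own derivation: the function names and the arguments `q1` (probability that a lineage starts in state 1), `u` (the 0 → 1 change probability), `v` (the 1 → 0 change probability) and `steps` are illustrative assumptions introduced here.

```python
def evolve(dist, u, v, steps):
    """Push a distribution (p0, p1) over states {0, 1} through `steps` time steps.

    u is the per-step probability of changing 0 -> 1, v of changing 1 -> 0.
    """
    p0, p1 = dist
    for _ in range(steps):
        p0, p1 = p0 * (1 - u) + p1 * v, p0 * u + p1 * (1 - v)
    return p0, p1


def match_probabilities(q1, u, v, steps):
    """Return (p(match | CA), p(match | SA)) for one dichotomous character.

    q1 is the (assumed) probability that a lineage starts in state 1.
    """
    from0 = evolve((1.0, 0.0), u, v, steps)  # end-state distribution given start 0
    from1 = evolve((0.0, 1.0), u, v, steps)  # end-state distribution given start 1
    # CA: one shared starting draw, then X and Y evolve independently.
    p_ca = ((1 - q1) * (from0[0] ** 2 + from0[1] ** 2)
            + q1 * (from1[0] ** 2 + from1[1] ** 2))
    # SA: two independent starting draws, then independent evolution.
    end = evolve((1 - q1, q1), u, v, steps)
    p_sa = end[0] ** 2 + end[1] ** 2
    return p_ca, p_sa
```

For example, `match_probabilities(0.3, 0.01, 0.03, 50)` returns the pair of match probabilities for a biased starting draw and small, unequal change probabilities; with values like these, the first entry (CA) should come out larger than the second (SA).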
As the probability of state change along a branch increases (due either to long lineages or high state-change probabilities), p(o|CA) and p(o|SA) converge on the same value, though p(o|CA) must always be a little bit higher. The opposite is true for species found in different states: mismatches always favor SA over CA.

2 With these very minor assumptions: the starting-state distribution gives non-zero probabilities to both states; transition probabilities are strictly between 0 and 1; and time steps are finite.

Sober goes on to extend this treatment to cover multistate characters as well, where the variables X and Y can now take any number of states {1, 2, ..., n} and correspondingly more state-change probabilities are needed: one for every possible transition from one state to another (i → j, for all i, j ∈ {1, 2, ..., n}). Sober shows that, here too, X and Y in the same state at the end of the process is more probable on CA than on SA. Mismatches on multistate characters, however, are more complicated. Some mismatches will favor CA, while others will favor SA, depending on the details (Sober; 2008, 295–314).

Finally, Sober returns to the question of overall similarity by considering a set of observations {o_1, o_2, ..., o_m}, each concerning a different trait. Given such a set, including both matches and mismatches, which hypothesis is favored overall? As described above, the evidential import of each individual observation o_i is encoded by the ratio of conditional probabilities p(o_i|CA)/p(o_i|SA). Supposing that the process by which each trait evolves is probabilistically independent of that governing every other trait,3 the set of observations favors CA over SA if and only if the product of those ratios (one from each observed trait) is greater than one; in mathematical notation, if and only if:

(1) \prod_{i=1}^{m} \frac{p(o_i \mid \mathrm{CA})}{p(o_i \mid \mathrm{SA})} > 1.
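As a small illustration of condition (1), the sketch below combines per-trait likelihood ratios under the independence idealization. The function names and the example ratios are hypothetical and chosen only for illustration; summing logarithms is used because it is equivalent to taking the product and is numerically safer when there are many traits.

```python
import math


def combined_log_ratio(ratios):
    """Sum of log likelihood ratios, one ratio p(o_i|CA)/p(o_i|SA) per trait.

    The sum is positive exactly when the product in condition (1) exceeds one.
    """
    return sum(math.log(r) for r in ratios)


def favors_ca(ratios):
    """True if the set of observations, taken together, favors CA over SA."""
    return combined_log_ratio(ratios) > 0.0


# Hypothetical ratios: two traits that favor CA and one that favors SA.
example_ratios = [1.4, 1.2, 0.8]
```

Here `favors_ca(example_ratios)` asks whether the two CA-favoring traits outweigh the one SA-favoring trait. A single strongly SA-favoring trait can outweigh several weak matches, which is why the overall verdict depends on the magnitudes of the ratios and not only on the count of matches.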
The idea is that for similar species (e.g., humans and chimpanzees, or two types of finch) the calculation will come out in favor of CA.4

Application to biogeography. To interpret Darwin’s geographical distribution observations (how species are distributed about the globe), Sober develops a variant of modus Darwin that proceeds from observed geographical proximity rather than anatomical similarity, in other words: proximity, ergo common ancestry. All that’s required is a reinterpretation of the stochastic model of character-state evolution as a model of geographical dispersal.

Consider a multistate character with ten discrete states. The model governing how this character evolves requires a ten-by-ten matrix of transition probabilities, one for each possible transition from one state to another (Equation 2). Allow non-zero probability only between neighboring states (and between a state and itself). Now think of the states themselves as geographical locations along a line (e.g., islands in an archipelago) rather than variants of an anatomical character. And think of state change as geographical dispersal rather than morphological evolution. A species can disperse from one location to another only by passing through the locations in between, thus the zeros for non-neighboring state transitions. “Neutral evolution within an ordered n-state character is formally just like random dispersal across an n-island archipelago.” (Sober; 2008, 326)

(2)
\begin{pmatrix}
.99 & .01 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
.01 & .98 & .01 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & .01 & .98 & .01 & 0 & 0 & 0 & 0 & 0 & 0 \\
 & & & & \vdots & & & & & \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .01 & .99
\end{pmatrix}

3 While this assumption is certainly not true, it is a standard idealization in, e.g., phylogenetic inference from genetic data (thinking of each nucleotide site, or of each codon, as a trait).

4 Not that anyone explicitly carried out such a calculation. But evaluating historical reasoning through the lens of modern epistemic norms always involves out-of-context formalities.
If one aims, in addition, to illuminate the way that an author or their audience actually reasoned, the fact that no actor drew pen and paper to calculate is not necessarily an obstacle: the success of a simple reasoning heuristic might require some math to explain, and the cognitive underpinnings of ostensibly qualitative reasoning can approximate complex calculations (as per the widespread use of Bayesian statistics in the descriptive modeling of human and animal reasoning, see, e.g., Kemp and Tenenbaum’s (2008) descriptive model of pre-Darwinian taxonomic reasoning).

A random draw from a distribution over the ten states determines where a species begins.5 And just as with anatomical modus Darwin, the difference between CA and SA is that CA posits one random draw (whence both species begin dispersing) while SA posits two independent draws, one for each species. The observation o is then the observed spatial separation between species (|X − Y|) after a period of dispersal. In this example, Sober calculates that:

With ten locations, the expectation under the separate-ancestry hypothesis is that X and Y will be a bit more than three islands away from each other. If X and Y are more spatially proximate than this, then CA has the higher likelihood; if not, not. (Sober; 2008, 326)

Sober goes on to analyze Darwin’s use of geographical distribution observations in the Origin by mapping the reinterpreted formalism onto Darwin’s (chap. 12) discussion of the Galapagos Archipelago. I will return to the Galapagos example later on.

3. Limitations of Sober’s Formal Framework

Set Darwin to one side for the moment and consider the argument form modus Darwin on its own merits. Is Sober’s formal, probabilistic argument cogent? And what does it tell us about the argument form similarity, ergo common ancestry? These are the questions I address here.
I will argue that two features of Sober’s formal framework sharply limit the validation that it provides for modus Darwin.

The core mathematical result underlying Sober’s analysis is that two species found in the same state for a single character always favors CA over SA. While this conclusion is striking in its generality, it does not by itself get one very far towards applying modus Darwin to real observations. Most applications call for a continuous, or at least multi-state, treatment, where exact matches will be few and far between. And as soon as we leave behind the special case of the exact match, all of the details and parameters that Sober’s proof manages to bracket become important again. Here modus Darwin can pronounce evidential favoring verdicts only after additional assumptions fix the moving parts within the stochastic models of CA and SA. Can the right assumptions be identified in the contexts in which the inference form is supposed to operate? There is reason for worry in the cases of branch lengths and the size of the anatomical space.

5 The state-change probabilities (Equation 2) determine what is called the equilibrium distribution of the location variable, which gives the probabilities of finding the variable in each of its ten states after (loosely speaking) infinitely many time steps. Sober uses this equilibrium distribution as the starting-state distribution; in this case that distribution is uniform.

Anatomical space. Suppose we compare species X and Y on a given anatomical character, and we model this character as having 10 ordered states (Sober’s example, from above). Say it’s the length of a certain bone in centimeters, and the species measure 1 cm and 4 cm, for a difference of 3. Sticking with the Equation 2 transition probabilities (and the resulting equilibrium distribution for the
initial states) and supposing a middling 300 time steps, the observation gives a likelihood ratio a hair above 1 (i.e., no evidence either way). But now rethink one of the modeling decisions that led to this number. Who said the range of possible character states is 1–10? Perhaps the upper limit is instead 5, or 15. Figure 2 displays likelihood ratios for the same observation (and others) recalculated on the assumption that the range of possible states is 1–5 (light gray) and 1–15 (dark gray). Using the 1–5 space, our 3 cm observation registers as evidence favoring SA, but using the 1–15 space, the observation favors CA. Looking across possible observations, on the 1–5 space the evidence turns against CA when the difference between states passes 1; in the case of 1–15 the same happens only above 4.

[Figure 2: likelihood ratio (vertical axis, 0.0 to 2.0) plotted against difference between states (horizontal axis, 0 to 14); two series, labeled "five states" and "fifteen states".]
Figure 2. Likelihood ratios p(obs.|CA)/p(obs.|SA) for character state observations, varying the assumed range of possible states. Ratios above one favor CA, ratios below one favor SA.

In general, positing a larger anatomical space raises the likelihood ratio, making the evidence appear more favorable to CA, while positing a smaller space lowers the ratio, pushing the needle back towards SA. This effect happens through the denominator, p(obs.|SA). (If the starting states of X and Y are chosen independently from a uniform distribution, then a bigger space makes larger observed differences more probable.) The size of the space matters little to the numerator, p(obs.|CA): if variables that begin in the same state will have typically evolved apart by 3 units at the end of the process, it makes no difference to the outcome whether the space in which this occurs is 15 units wide or 150.

The problem is that the choice between different state spaces appears to be arbitrary. What could privilege one over another?
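The sensitivity to the size of the state space can be explored with a short calculation. The sketch below is an illustration under stated assumptions rather than a reproduction of the computation behind Figure 2: it treats the observation as the absolute difference |X − Y| = d, indexes states from 0 to n − 1, uses the nearest-neighbour transition probabilities of Equation 2 with a uniform starting distribution, and all function names are hypothetical.

```python
def transition_matrix(n, p=0.01):
    """Nearest-neighbour walk on n ordered states, patterned on Equation 2."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            m[i][i - 1] = p
        if i < n - 1:
            m[i][i + 1] = p
        m[i][i] = 1.0 - sum(m[i])  # remaining probability: stay put
    return m


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, t):
    """t-step transition matrix, by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    while t > 0:
        if t & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        t >>= 1
    return result


def likelihood_ratio(n, d, steps, p=0.01):
    """p(|X - Y| = d | CA) / p(|X - Y| = d | SA) with a uniform start."""
    pt = mat_pow(transition_matrix(n, p), steps)
    start = [1.0 / n] * n
    pairs = [(x, y) for x in range(n) for y in range(n) if abs(x - y) == d]
    # CA: one shared starting state z, then two independent lineages.
    p_ca = sum(start[z] * pt[z][x] * pt[z][y]
               for z in range(n) for (x, y) in pairs)
    # SA: each lineage has its own independently drawn starting state.
    end = [sum(start[z] * pt[z][x] for z in range(n)) for x in range(n)]
    p_sa = sum(end[x] * end[y] for (x, y) in pairs)
    return p_ca / p_sa
```

Calling `likelihood_ratio(5, 3, 300)`, `likelihood_ratio(10, 3, 300)` and `likelihood_ratio(15, 3, 300)` compares the same observed difference of 3 across the three assumed state spaces. Under the assumptions above, the three values should fall below one, close to one, and above one respectively, which is the qualitative pattern described in the text.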
You might think to use the range of states observed across all taxa, but surely the organisms that have evolved so far don’t exhaust all possible anatomies. If there is no sensible way of fixing the allowable character states, then Sober’s formal framework does not, in the end, yield any evidence rulings. In other words, that framework fails to demonstrate how the Law of Likelihood can be brought to bear on the matter of common ancestry versus separate ancestry, in which case the framework does little to validate the argument form modus Darwin.

Branch lengths. Continuing with the example of the ordered, 10-state character (now bracketing concerns about how to choose the range of possible states), Sober writes that the expectation under SA is that species X and Y will be observed to differ by just over three states, and that observations below this threshold favor CA over SA. But this analysis understates the dependence of the evidential favoring verdict on the stipulations that go into the model’s evolutionary mechanics: while the expectation claim is correct, it does not follow, and it is not generally true, that distances below that expectation favor CA. For trait differences of 1–3, the direction of evidential favoring depends on branch length, a term from phylogenetics that refers to the probability of change along a lineage. Branch length is a function of both the number of time steps (in the solid lines of Figure 1) and the transition probabilities (e.g., Equation 2): one branch is longer than another if change is more probable along that branch, whether this is due to more time steps, or to higher transition probabilities, or a combination of the two.

To understand intuitively the dependence of evidential favoring on branch length, consider the case of branch lengths so short that any change at all is very improbable.
Since CA puts the species in the same state to begin with, they will very probably still be in the same state after the period of evolution, for a trait difference of 0. In this case, observing a difference of 1, 2, or 3 will heavily favor separate ancestry. The broader picture of dependence on branch length can be seen in Figure 3, which displays likelihood ratios for all observations 0–9, calculated on three different assumptions about the number of time steps of evolution. The shorter the time frame, the closer the two states must be for the observation to favor CA. And since the mathematics is the same for geographical dispersal, the lesson applies equally to biogeographical modus Darwin. (Equation 2 transition probabilities are used throughout; alternatively, one could explore dependence on branch length by fixing the number of time steps and scaling the transition probabilities, with equivalent results.)

[Figure 3: likelihood ratio (vertical axis; tick labels 0, 1, 2, 7.2) plotted against difference between states (horizontal axis, 0 to 9); three series, labeled t=10, t=150 and t=1000.]
Figure 3. Likelihood ratios p(obs.|CA)/p(obs.|SA) for observed character state differences, varying the number of time steps. Ratios above 1 favor CA and ratios below 1 favor SA.

So using Sober’s likelihood framework to interpret similarity/proximity as evidence bearing on common ancestry requires knowledge of branch lengths. What does this mean for modus Darwin? One thing the formal framework is meant to do is show that similarity can indeed be evidence for common ancestry. Dependence on branch length does not stand in the way; it means only that the range of circumstances under which similarity does count as evidence for common ancestry is defined by, among other things, details about branch length.

But we might hope to go beyond saying that similarity can sometimes be evidence for common ancestry to argue that agents in a particular context were justified in taking the similarities that they observed as evidence for common ancestry. For example, that in deploying modus Darwin, Darwin himself made a good argument. Or that readers swayed by that argument were convinced rationally. Sober’s likelihood framework is relevant here as well. No one is claiming that Darwin explicitly calculated any likelihood ratios, but Sober’s rigorous probabilistic framework articulates a line of reasoning that can also be appreciated qualitatively. Which hypothesis fits better with observed anatomical similarities: That two species started out identical, then evolved some? Or that they started with randomly chosen anatomies, then evolved some? This qualitative version of Sober’s likelihood reasoning displays the same dependence on branch length: one cannot judge without knowing something about how long the species have been evolving, and how quickly species evolve. Could a mid-nineteenth century naturalist have had sufficient grasp of the pace and timescale of evolution to justifiably argue that common ancestry is the better fit with the observed similarities?

Insight into the timescale of biological change came from geology via paleontology. Tremendous progress was made in the eighteenth and nineteenth centuries in collating layers of sediment from sites around the world, resulting in a coherent time-ordering of geological eras and of the fossil remains carried within those layers. But the project of assigning absolute dates to geological eras (and thus fossil remains) proceeded much more slowly. The nineteenth century was characterized by competing and wildly divergent estimates of the age of the earth and its geological eras (Gohau; 1990), and by interdisciplinary jostling on the subject between biologists, geologists, and physicists (Shipley; 2001).

Early nineteenth century catastrophists thought in terms of hundreds of thousands of years (Cuvier), or of millions (de Serres, Buckland).
Lyell’s uniformitarian assumptions led him to posit 240 million years since the beginning of the Cambrian period, which contained the earliest known fossils at that time (Gohau; 1990). But physicists balked at the idea of a steady-state earth, and following mid-century developments in thermodynamics William Thomson (later Lord Kelvin) calculated at most one half, and more probably one tenth, of that time for the earth’s entire history from a molten state to its present condition (Burchfield; 1975). Thomson’s work was influential, pushing most geologists in the late nineteenth century away from uniformitarianism and towards shorter time scales and faster geological processes (Bowler; 1989, chap. 7).

Darwin followed Lyell in matters geological, and his own back-of-the-envelope calculations were even more generous than Lyell’s Cambrian estimate.6 Darwin had originally assumed an almost unlimited amount of time for life to evolve (Bowler; 1989; Larson; 2004), and the trend towards shorter time scales put pressure on his theory of natural selection. In particular, Thomson’s timeframe was regarded by all parties as too short for Darwin’s slow, gradual process to yield the observed diversity of life. The discrepancy contributed to skepticism about natural selection and encouraged evolutionists’ explorations of alternative processes, including orthogenesis, saltationism, and Lamarckian inheritance. Though skeptical of Thomson’s results, Darwin himself gave the inheritance of acquired characteristics an ever greater role in later editions of the Origin, in part to allow for more rapid evolution (Larson; 2004, chap. 5).
Yet through all of the uncertainty and discord over the timescale, pace and processes of evolution, naturalists grew ever more committed to common ancestry and evolution by some mechanism or other, also called the “theory of descent.” Writing in 1907, American entomologist Vernon Kellogg summarized the state of play in his Darwinism to-day (“Darwinism” referring specifically to Darwin’s theory of natural selection):

While many reputable biologists to-day strongly doubt the commonly reputed effectiveness of the Darwinian selection factors to explain descent – some, indeed, holding them to be of absolutely no species-forming value – practically no naturalists of position and recognized attainment doubt the theory of descent. Organic evolution, that is, the descent of species, is looked on by biologists to be as proved a part of their science as gravitation is in the science of physics or chemical affinity in that of chemistry. Doubts of Darwinism are not, then, doubts of organic evolution. (Kellogg; 1907, 3)

6 Darwin gives an example close to home: a large geological feature in south-eastern England called the Weald, where relatively deep geological strata are exposed. Higher layers of known (local) thickness must have been worn away over time, and based on what Darwin considers a conservative estimate of the rate of denudation (wearing down, by various means) he estimates that the process must have required 300 million years (Darwin; 1859/2003, 285–7). All of the strata in question are well above the Cambrian layer, so compared to Lyell’s Cambrian estimate, Darwin’s 300 million is a bigger number for a period known to be significantly shorter.

So the broad historical narrative shows increasing conviction on common ancestry driving research into the mechanisms of evolution and inheritance, while
fuzzy ideas about the pace and timescale of evolution were pushed around by constraints from geology and physics.7 Consider again the essence of Sober’s likelihood reasoning: In any given case, are the observed similarities more like what would result from two species starting out identical, then evolving a while (CA), or starting with randomly chosen anatomies and then evolving a while (SA)? Given their ignorance about branch lengths, could Victorian naturalists have sensibly approached this question, in even a qualitative and tentative way? The broad-brush history rehearsed above suggests not, in which case we cannot view Sober’s likelihood reasoning as providing a rationale by which Darwin or his readers could have justifiably interpreted similarity as evidence for common ancestry.

To summarize, dependence on branch length does not undermine Sober’s demonstration that similarity can sometimes be evidence for common ancestry. But it does mean that without knowledge of branch lengths, one cannot know when this “sometimes” obtains. Sober’s likelihood ratios also depend on how one specifies the space of possible character states. This poses a deeper challenge since unlike branch length, the extent of the character space appears to be a fundamentally arbitrary mathematical stipulation. If there is no correct way of fixing the character state space, then the framework’s evidence rulings are themselves arbitrary and fail to show that similarity can sometimes be evidence for common ancestry.

Did Darwin use Modus Darwin?

So far I have raised concerns about the epistemological merits of modus Darwin. But now that the inference form looks increasingly difficult to justify, we might step back and (giving Darwin the benefit of the doubt) reconsider the attribution.

While Sober sees modus Darwin at work throughout Darwin’s thinking, two specific Origin passages receive special attention (Sober; 2008).
In what follows, I examine these two passages and ask whether modus Darwin plays a role in the reasoning displayed there. In each case, I’ll answer ‘No,’ and briefly sketch an alternative reading.

7 To quickly finish the narrative: In the early twentieth century Thomson’s age-of-the-earth calculations were conclusively undermined by new understanding of radioactivity, and geologists began to embrace longer time scales. Another few decades’ work in genetics and related fields would ease other worries about the efficacy of natural selection (Mayr; 1982, 510–525), leading to the modern synthesis and the resurgence of “Darwinism.”

Adaptive characters. Are some similarities between species X and Y more telling than others in favor of common ancestry? Sober raises the question while discussing the combined evidence from a set of observations (Sober; 2008, 297), and cites the following passage as Darwin’s answer:

On my view of characters being of real importance for classification, only in so far as they reveal descent, we can clearly understand why analogical or adaptive character, although of the utmost importance to the welfare of the being, are almost valueless to the systematist. For animals, belonging to two most distinct lines of descent, may readily become adapted to similar conditions and thus assume a close external resemblance; but such resemblances will not reveal—will rather tend to conceal their blood-relationship to their proper lines of descent. (Darwin; 1859/2003, 427)

Within the formal framework discussed above, Sober shows that transition probabilities that bias both X and Y towards a particular state (i.e., selection) give rise to smaller likelihood ratios for the observation of both species in the favored state, compared to symmetrical transition probabilities (drift) (Sober; 2008, 297–8).
In other words, matches on adaptive characters are weaker evidence for CA over SA, just as Darwin said.

Or did he? The quoted passage comes from a section of chapter 13 labeled “classification,” in which Darwin reinterprets existing taxonomic practice in light of his theory of evolution. Mid-nineteenth-century taxonomic classifications used a groups-within-groups structure to represent relationships between taxa. In essence, Darwin said that those taxonomic structures were in fact genealogical trees (now we would say phylogenetic trees), and that existing taxonomic practice amounted to a method of phylogenetic inference. To drive home the point, Darwin picked out a handful of taxonomic practices that, though widely followed, had no deep methodological justification, and he argued that those practices made sense in light of his theory of evolution and interpretation of taxonomy. One of those poorly-grounded practices was the discounting of adaptive characters.

So Darwin’s comments address the role of adaptive characters in phylogenetic systematics, where the competing hypotheses are alternative genealogical trees (Figure 4), all of which presuppose common ancestry. Such trees differ only on which species have more recently diverged from which. In this context, separate ancestry is out of the picture, and with it modus Darwin. The fundamental mode of reasoning to which Darwin’s discussion of adaptive characters adds a caveat is not similarity, ergo common ancestry but rather greater similarity, ergo more recent ancestry. The latter is the basic credo of phylogenetic inference (a comparatively well-researched inference problem). The adaptive characters passage does not show Darwin using modus Darwin after all.

[Figure 4: three tree diagrams labeled tree 1, tree 2, and tree 3, with leaves drawn in the orders (i, j, m), (j, m, i), and (m, i, j).]

Figure 4. Three competing genealogical hypotheses.
This is not to say that Darwin’s discussion of classification and adaptive characters doesn’t ultimately contribute to his case for common ancestry. Darwin takes the branching, tree-like structure of his common ancestry hypothesis to explain the groups-within-groups nature of existing taxonomic relations (Winsor; 2009), as well as, when combined with natural selection, the otherwise mysterious usefulness (for classification) of non-adaptive traits, rudimentary organs, and embryological characters (Richards; 2009). And these explanatory feats, thinks Darwin, redound to the credit of his theory. Of course this is only a casual sketch of Darwin’s reasoning, not a philosophical analysis linking that reasoning to well-defined epistemic norms. Yet so long as it is descriptively accurate, we can see that modus Darwin is not invoked.

Galapagos. Darwin’s Origin discussion of the Galapagos Archipelago is the second spot where Sober explicitly maps modus Darwin onto a specific passage. Darwin’s brief discussion of the Galapagos comes at the end of two chapters devoted to the geographical distribution of species, where it serves as an illustration of the following generalization: “The most striking and important fact for us in regard to the inhabitants of islands, is their affinity to those of the nearest mainland, without being actually the same species” (Darwin; 1859/2003, 238–9). Darwin takes this feature of island biogeography to speak in favor of common ancestry, and Sober reconstructs that reasoning as follows. Each Galapagos species {X1, X2, ..., Xn} is paired with a species found on mainland South America {Y1, Y2, ..., Yn} on the basis of close anatomical similarity (Figure 5). For each pair, the anatomical similarity of species Xi to its mainland counterpart Yi supports CA over SA, for that pair, via modus Darwin.
On top of that anatomical evidence, the geographical proximity of Xi and Yi then adds further support for CA over SA for that pair, now by the geographical distribution variant of modus Darwin (Sober; 2008, 330).8

[Figure 5: schematic diagram with points labeled Xi and Yi.]

Figure 5. Schematic representation of the Galapagos {X1, ..., Xn} and mainland South American {Y1, ..., Yn} species featured in Sober’s reading of Darwin’s Galapagos Archipelago illustration.

8 Sober sees an additional inference in Darwin’s discussion of the Galapagos, concerning whether the geographical origin of each pair (Xi, Yi) is the same (Sober; 2008, 331–2). But this additional reasoning is not an instance of modus Darwin, and no longer concerns CA versus SA.

I have another reading. Darwin’s island biogeography generalization is a special case of an even more general trend that he introduces at the very beginning of his first chapter on geographical distribution, namely that the more accessible any two geographical regions (by migration or dispersal), the more similar the inhabitants of those regions (Darwin; 1859/2003, 347–50). That Darwin’s real focus is relative proximity within groups of species (n > 2) is not obvious from his (rather cursory) treatment of the Galapagos, but it is clear looking at the few examples that he discusses in (somewhat) greater detail. His primary illustrations feature South America’s unique rodents and flightless birds. The agouti, viscacha, coypu and capybara are each other’s closest taxonomic relations (i.e., they’re more similar to each other than to anything else in the world) and they all live in nearby regions of South America. Somewhat similar, but less so (again, as judged by existing taxonomic classifications) are the beaver and muskrat, which are found much further afield in North America and Europe; hares and rabbits are even more widely dispersed.
The flightless birds (greater rhea, Darwin’s rhea, emu, and ostrich) illustrate the same pattern (Darwin; 1859/2003, 349).

Darwin’s argument goes roughly as follows. Suppose a group of species shares a branching, tree-like ancestry, and suppose the true tree is reflected (albeit imperfectly) in taxonomists’ classifications. How might this be checked against geographical distribution observations? Consider any species, together with its closest taxonomic relations plus a somewhat more distally classified species or two (similar to what we now call an outgroup). Since more recent common ancestry leaves less time for geographical dispersal, the closest taxonomic relations should typically be found somewhere more accessible than the outgroup species.9 The observed trend with which Darwin opens his geographical distribution discussion (the more accessible the regions, the more similar the inhabitants) shows that this is generally the case. This relationship is difficult to explain on the supposition that each species was created independently, so the observations support the common ancestry suppositions from which we began.

As before with the adaptive characters passage, my alternative reading falls short of a deep epistemological analysis or evaluation of the argument. The point is that Darwin’s geographical distribution argument does not conform to modus Darwin. The step in Darwin’s reasoning that links accessibility to ancestry presupposes CA for every species pair. And while modus Darwin attends to the absolute proximity between two species, e.g., “If X and Y are more spatially proximate than [three units away], then CA has the higher likelihood; if not, not” (Sober; 2008, 326), Darwin is talking about relative proximity (X is closer to Y than to Z), with no regard for scale. Indeed, on an absolute scale the Galapagos are very inaccessible from South America, being separated by 600 miles of open ocean.
Darwin’s point is that even in such cases, the general pattern of relative similarity mirroring relative proximity persists: for a given Galapagos species, the most similar species found outside the Galapagos archipelago inhabit the most accessible region, the South American mainland; less similar species are found further afield.10

9 Compare Sober’s (2008, 327–9) discussion of Darwin’s “space-time principle.”

10 One special threshold of accessibility does play a role in Darwin’s reasoning: if it were impossible for a species or their ancestors to get from point A to point B, then species in those locations could not share common ancestry. Darwin is therefore keen to emphasize the mechanisms and “accidental means” by which prima facie implausible journeys might have happened.

So neither the adaptive characters passage nor the Galapagos example illustrates modus Darwin in action. While two false alarms don’t show that Darwin never used modus Darwin, I hope that by supplying alternative readings of the passages that Sober discusses explicitly, I have at least shifted the burden of proof. My own judgement, which goes beyond what I can argue for here, is that (at least in the Origin) modus Darwin plays at best a minor role in Darwin’s reasoning.11 Other passages that may appear to espouse similarity, ergo common ancestry are in my view most likely abbreviated rehearsals of Darwin’s blanket reinterpretation of biological classifications as genealogical hypotheses. They are, in other words, further instances of greater similarity, ergo more recent ancestry. I suggest it is this phylogenetic thinking, and not modus Darwin, that occurs again and again in Darwin’s reasoning, as a recurring element within various arguments that Darwin constructs in support of his theory.

11 A good candidate may be Darwin’s closing comments on common ancestry (Darwin; 1859/2003, 484) where he suggests, on the grounds that all species share some basic chemical and cellular similarities, that there is just one original species from which everything evolved. But he concedes this is a flimsy argument, and doesn’t take it very seriously.

But it may not be quite correct to say that Darwin himself made inferences of the form greater similarity, ergo more recent ancestry. Nineteenth-century taxonomic classifications were produced by specialists with years of experience working on specific groups of organisms. Except where Darwin did this kind of work himself (e.g., on barnacles), he would have relied on the work of others, those classifications becoming part and parcel of any judgments of similarity between species. Other naturalists with deeper knowledge of the taxa in question would have proceeded, in the course of constructing their classifications, roughly along the lines of greater overall similarity, ergo closer taxonomic relatedness, to which Darwin added “by ‘closer taxonomic relatedness’ I think you mean more recent common ancestry.” In any case, the beginning-to-end chain of observation and reasoning that goes from in-depth knowledge of comparative anatomy to a particular genealogical tree for a given set of taxa is something to which Darwin contributes, and on which many of his arguments rely.

Modus Darwin versus phylogenetic inference

Given Darwin’s reliance on something like the inference form greater similarity, ergo more recent ancestry, one might wonder whether this mode of reasoning founders on the same objections raised above to Sober’s defense of modus Darwin. I should therefore briefly explain why those particular objections do not apply.12 Sober’s likelihood-based defense of modus Darwin proceeds by applying a probabilistic model of character state evolution along the branches of the Figure 1 genealogies to arrive at likelihoods p(obs.|CA) and p(obs.|SA), which can then be compared.

12 More generally, phylogenetic inference is the subject of a massive and ever-expanding scientific literature that I cannot hope to review here. See Baum and Smith (2013) for a non-technical overview, Sober (1991) for a philosophically-oriented introduction, and Felsenstein (1988) for an early review of mathematical methods.

Consider the analogous approach to greater similarity, ergo more recent ancestry. Here the competing hypotheses are different genealogical trees (Figure 4). And the observation to which a tree assigns a probability is not a single character state comparison between species X and Y, but rather a set of such comparisons, one for each pairing of species within the group under consideration. In the case of three species, there are three comparisons to make. It is convenient to arrange the numbers on a 2×2 table, where each cell shows the difference between the character states of the species associated with that row and column (Figure 6). Applying Sober’s model of character state evolution along the branches of any Figure 4 tree yields a stochastic model that assigns a probability to such data. This generates a likelihood p(obs.|tree_i) for each of trees 1–3. These likelihoods can be compared, and the tree that posits most recent ancestry for the most similar species pair indeed has the highest likelihood (Figure 7). (Sober’s mathematical modeling of CA and SA borrows from methods of phylogenetic inference, so shifting from CA and SA to competing genealogical trees is just to return to the original context. The result is a simplified version of how likelihoods of trees are calculated within contemporary maximum-likelihood and Bayesian phylogenetic inference.)

(a) Pairwise character state differences:

        i     m     j
  j    13     3     ·
  m    10     ·
  i     ·

(b) Trait values: i = 20, m = 30, j = 33.

Figure 6. (A) Example data for calculating likelihoods of phylogenetic trees, and (B) trait values that would generate those data.

My first objection to Sober’s defense of modus Darwin was that the likelihood ratio p(obs.|CA)/p(obs.|SA) is inappropriately sensitive to how one models the space of character traits. Recall that the culprit is the quantity p(obs.|SA); a bigger space allows for more divergent starting states, making observations of large character state differences more probable. In contrast, p(obs.|CA) does not depend on the size of the character space (provided it is not so small that the species have already bumped into the endpoints), so there is no need to specify its size beyond “bigger than what evolution has so far explored.” When it comes to comparing tree versus tree (as per greater similarity, ergo more recent ancestry), every hypothesis in the mix affirms common ancestry, and like p(obs.|CA) the likelihoods p(obs.|tree_i) can be calculated without having to postulate a concrete range of possible character states. (The calculations below employ an infinite one-dimensional anatomical space.) So the “anatomical space” objection does not apply to a likelihood-based defense of greater similarity, ergo more recent common ancestry.

My second objection was that Sober’s likelihood reasoning requires knowledge of branch length that was unavailable in Darwin’s time. While branch lengths are a source of uncertainty in phylogenetic inference as well, there is an important sense in which that uncertainty is less debilitating than in the case of modus Darwin.
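The tree-versus-tree comparison described above can be made concrete with a minimal sketch. It is not Sober’s model: in place of the Equation 2 transition probabilities it assumes Brownian-motion (Gaussian) trait evolution on an unbounded one-dimensional character space, with a hypothetical variance `rate` per timestep and branching at half the root-to-leaf depth. The function names and tree labels are illustrative only; the trait values are those of Figure 6b.

```python
import math

# Trait values from Figure 6b.
TRAITS = {"i": 20.0, "m": 30.0, "j": 33.0}

# Each candidate tree is named by its sister pair: ((a, b), outgroup).
TREES = {
    "(j,m),i": ("j", "m", "i"),
    "(i,m),j": ("i", "m", "j"),
    "(i,j),m": ("i", "j", "m"),
}


def normal_pdf(x, variance):
    """Density at x of a zero-mean normal distribution."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def tree_likelihood(a, b, outgroup, depth, rate=1.0):
    """Likelihood of the trait differences given the tree ((a, b), outgroup).

    Assumption (not from the paper): Brownian motion with variance `rate`
    per timestep, root-to-leaf depth `depth`, sister pair splitting at depth / 2.
    """
    half = depth / 2.0
    # The two sisters have evolved independently for `half` steps each.
    sister_diff = TRAITS[a] - TRAITS[b]
    sister_var = 2.0 * half * rate
    # Outgroup versus the sisters' mean: outgroup branch (depth), stem branch
    # (half), plus the spread of the mean around the sisters' ancestor (half / 2).
    out_diff = TRAITS[outgroup] - (TRAITS[a] + TRAITS[b]) / 2.0
    out_var = (depth + half + half / 2.0) * rate
    return normal_pdf(sister_diff, sister_var) * normal_pdf(out_diff, out_var)


def rank_trees(depth):
    """Tree names ordered from highest to lowest likelihood."""
    scores = {name: tree_likelihood(*spec, depth) for name, spec in TREES.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for depth in (100, 1000, 12000):
        print(depth, rank_trees(depth))
```

Under these assumptions the tree that pairs j and m, the most similar species, ranks first at every depth. All three likelihoods share the same normalizing constants, so in this sketch stretching the whole tree rescales the exponents without reordering them; the likelihoods only draw closer together as `depth` grows.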
In the likelihood contest between CA and SA, simply stretching or shrinking all branches proportionally can change the direction of evidential favoring from one hypothesis to the other: an observation that appears to favor common ancestry instead favors separate ancestry if you double the time scale. The same is not true when comparing one tree to another, as some example calculations will illustrate. Using the Figure 6 numbers as an example, Figure 7 displays the likelihoods p(obs.|tree_i) for trees 1–3 (refer to Figure 4) over a very wide sweep of branch length assumptions.13 The important feature of Figure 7 is that the lines never cross, meaning that the ranking of hypotheses by likelihood is independent of branch length. This independence is a general feature of the inference problem, not specific to these example observations. Even very severe uncertainty about the overall timescale of evolution therefore does not undermine claims about the observations favoring one tree over another. (Though in the limit, the three likelihoods converge to the same value, meaning that evidence for one tree over another gradually weakens. See Sober and Steel (2014) for an in-depth look at this phenomenon.)

13 The quantity varied is the number of time steps from the root of the tree to any leaf (assumed equal along all three paths); the branching takes place after half this number of steps. (Selectively stretching only certain sections of a tree, on the other hand, can upend the likelihood ranking, a genuine issue in phylogenetic inference, as rates of evolution can vary over time and between lineages.)

[Figure 7: line plot of likelihood (vertical axis, ticks at 0.0005, 0.001, 0.0015) against timesteps in thousands (horizontal axis, 1 to 12), with one curve each for tree 1, tree 2, and tree 3.]

Figure 7. Likelihoods p(obs.|tree_i) for Figure 6a observations, over a range of branch lengths. (Equation 2 transition probabilities used throughout; scaling the transition probabilities with a fixed number of timesteps yields equivalent results.)

Conclusion

What is absolutely clear is that Darwin is eager to convince his readers of common ancestry, and that some of the Origin passages where he argues most pointedly for this conclusion involve talk of “similarity” or “resemblance.” But the structure of the arguments can be somewhat opaque. Sober sees similarity, ergo common ancestry at work in those arguments, and launches an (informed and enlightening) investigation into the epistemology of the inference form and its relation to modern statistical inferences within evolutionary biology (Sober; 1999, 2008, 2011; Sober and Steel; 2002, 2014).

My aim here has been to review and assess the argument form modus Darwin and its role in Darwin’s case for common ancestry in the Origin. I have argued that the probabilistic justification Sober offers for modus Darwin is inadequate. The basic form of that justification is of course sound (it is the foundation of both likelihoodist and Bayesian statistics): compare the probability of an observation supposing common ancestry were true with the same observation’s probability supposing separate ancestry were true. But this is easier said than done. Sober picks an observation type and offers a recipe for calculating the two probabilities, but the recipe calls for some far-fetched ingredients. One of those is branch length, a perfectly legitimate scientific quantity that is routinely estimated with some confidence in modern molecular phylogenetics but not by Victorian naturalists. Another is the range of possible character states, a dubious notion that has no significance within evolutionary theory.

Sober’s mathematical construction provides a framework for investigating and rigorously evaluating modus Darwin. I have continued to use that framework here and it has enabled the present analysis.
But for the reasons just rehearsed, that construction does not yield a satisfactory justification for modus Darwin, especially not in the nineteenth century context.

In any case, it is far from clear that Darwin argued in that way. Closer inspection of the passages that motivate the attribution to Darwin reveals a different argument form, one familiar from contemporary phylogenetics: greater similarity, ergo more recent ancestry. This argument form is more defensible, both epistemically and exegetically, though it cannot replace modus Darwin as a self-contained argument for common ancestry; indeed it presupposes that conclusion. Greater similarity, ergo more recent ancestry describes just one step of reasoning, used by Darwin in constructing more complex arguments.

References

Baum, D. A. and Smith, S. D. (2013). Tree thinking: An introduction to phylogenetic biology, Roberts.

Bowler, P. J. (1989). Evolution: The history of an idea, University of California Press.

Burchfield, J. D. (1975). Lord Kelvin and the Age of the Earth, University of Chicago Press.

Darwin, C. (1859/2003). On the Origin of Species: A Facsimile of the First Edition, Harvard University Press.

Felsenstein, J. (1988). Phylogenies from molecular sequences: inference and reliability, Annual review of genetics 22(1): 521–565.

Gohau, G. (1990). A history of Geology, revised and translated from the French by Albert V. Carozzi and Marguerite Carozzi. Translation of: Histoire de la géologie, Rutgers University Press.

Hacking, I. (1965). The Logic of Statistical Inference, Cambridge University Press.

Kellogg, V. L. (1907). Darwinism today, Bell.

Kemp, C. and Tenenbaum, J. B. (2008). The discovery of structural form, Proceedings of the National Academy of Sciences 105(31): 10687.

Larson, E. J. (2004). Evolution: The Remarkable History of a Scientific Theory, Random House Digital, Inc.

Mayr, E. (1982). The growth of biological thought: diversity, evolution, and inheritance, Belknap Press.

Richards, R. J. (2009). Classification in Darwin’s origin, in M. Ruse and R. J. Richards (eds), The Cambridge Companion to the “Origin of Species”, Cambridge University Press, chapter 10, pp. 173–193.

Royall, R. M. (1997). Statistical Evidence: A Likelihood Paradigm, Chapman and Hall.

Shipley, B. C. (2001). ‘Had Lord Kelvin a right?’: John Perry, natural selection and the age of the Earth, 1894–1895, in C. L. E. Lewis and S. J. Knell (eds), The Age of the Earth: from 4004 BC to AD 2002, Vol. 190, Geological Society of London, pp. 91–105.

Sober, E. (1991). Reconstructing the past: Parsimony, Evolution, and Inference, second printing edn, MIT Press.

Sober, E. (1999). Modus Darwin, Biology and Philosophy 14(2): 253–278.

Sober, E. (2008). Evidence and Evolution: The Logic Behind the Science, Cambridge University Press.

Sober, E. (2011). Did Darwin Write the Origin Backwards?: Philosophical Essays on Darwin’s Theory, Prometheus Books.

Sober, E. and Steel, M. (2002). Testing the hypothesis of common ancestry, Journal of Theoretical Biology 218(4): 395–408.

Sober, E. and Steel, M. (2014). Time and knowability in evolutionary processes, Philosophy of Science 81(4): 558–579.

Winsor, M. (2009). Taxonomy was the foundation of Darwin’s evolution, Taxon 58(1): 43–49.