Mechanistic modelling of cancer: some reflections from software engineering and philosophy of science

Cañete-Valdeón, José M. and Wieringa, Roel and Smallbone, Kieran

2012

MIMS EPrint: 2012.115

Manchester Institute for Mathematical Sciences
School of Mathematics
The University of Manchester

Reports available from: http://eprints.maths.manchester.ac.uk/
And by contacting: The MIMS Secretary, School of Mathematics, The University of Manchester, Manchester, M13 9PL, UK

ISSN 1749-9097

CONCEPTS & SYNTHESIS

José M. Cañete-Valdeón · Roel Wieringa · Kieran Smallbone

Received: 17 August 2012 / Revised: 26 October 2012 / Accepted: 29 October 2012 / Published online: 13 November 2012
© Springer-Verlag Berlin Heidelberg 2012

Abstract There is a growing interest in mathematical mechanistic modelling as a promising strategy for understanding tumour progression. This approach is accompanied by a methodological change in the way research is done, in which models help to actively generate hypotheses instead of waiting for general principles to become apparent once sufficient data have been accumulated. This paper applies recent research from philosophy of science to uncover three important problems of mechanistic modelling which may compromise its mainstream application, namely: the dilemma of formal and informal descriptions, the need to express degrees of confidence and the need for an argumentation framework. We report experience and research on similar problems from software engineering and provide evidence that the solutions adopted there can be transferred to the biological domain. We hope this paper can provoke new opportunities for further and profitable interdisciplinary research in the field.

Keywords Cancer · Systems biology · Mechanistic models · Knowledge representation ·
Explanation

Communicated by: Sven Thatje

J. M. Cañete-Valdeón (*)
School of Informatics, University of Sevilla, Sevilla, Spain
e-mail: jmcv@us.es

R. Wieringa
Computer Science Department, University of Twente, Enschede, The Netherlands

K. Smallbone
Manchester Centre for Integrative Systems Biology, University of Manchester, Manchester, UK

Naturwissenschaften (2012) 99:973–983
DOI 10.1007/s00114-012-0991-4

Introduction

Mechanistic modelling has been employed in biology at least since the twentieth century. A paradigmatic example is the Hodgkin–Huxley model of the action potential in the squid giant axon (Hodgkin and Huxley 1952), regarded as a stunning accomplishment for which the authors shared the Nobel Prize in 1963. In cancer research, mechanistic models focus on describing specific aspects of tumour progression in order to explain the underlying biological processes that drive them (Anderson and Quaranta 2008). An early, highly cited example is Fearon and Vogelstein's work on colorectal tumorigenesis (Fearon and Vogelstein 1990). During the twenty-first century, the number of mechanistic models of cancer has been increasing, in most cases described with mathematics; see Komarova (2005) and Araujo and McElwain (2004) for reviews.

In parallel to advances in biological modelling, philosophy of science has experienced a renewed interest in mechanistic representation and explanation, and the resulting ideas elaborate on the rich body of research on modelling developed by philosophers of science during the last part of the twentieth century.

The goal of this paper is twofold. Our first aim is to identify and analyse, with the help of recent philosophical results, some open problems of mechanistic modelling which are hindering its transfer to mainstream use by clinical oncologists and tumour biologists. Although these problems will be detailed throughout the paper, a brief introduction will help to clarify the context of our discussion. First, it is not clear how mathematics can coexist with natural language (still the notation preferred by many biologists) as an instrument for describing mechanistic models. This makes the advantages of formal modelling inaccessible to a large majority of the biological community. Second, papers where models are presented contain scientific hypotheses, which are necessary to relate the models to some subject of interest in the world. Hypotheses are complex objects in their own right, held with varying degrees of confidence. Unfortunately, domain-specific notations such as SBML (Hucka et al. 2010), CellML (Lloyd et al. 2004) and Biocharts (Kugler et al. 2009) are not expressive enough to adequately describe the richness of scientific hypotheses, while the use of mathematics in biology is focused more on describing the models. Last, cancer researchers often employ unstructured natural language to explain why their models are claimed to represent reality. Existing argumentation frameworks might greatly help them to organise, share and validate their reasoning.

All of these issues have been the subject of intense research in software engineering. This discipline deals, on the one hand, with hardware and software constructs that have been engineered in such a way that their properties are largely predictable. On the other hand, software engineering also deals with those parts of the physical and human world that form the so-called "problem world" of the software system. While those parts are in general better understood than the growth of cancer cells, those involving human interaction, at least, are far less well understood (Jackson 2001). It is therefore not far-fetched to hope for a useful adaptation of certain software engineering principles to biological systems.
The second goal of this paper is to describe some approaches that have worked well in software engineering and to provide evidence on how they can be transferred to the biological domain.

The rest of this paper is organised as follows. The section "Related work" introduces related research and concepts from philosophy of science on mechanistic representation. The sections "The dilemma of formal and informal descriptions", "The need to express degrees of confidence about knowledge" and "The need of an argumentation framework" present the identified problems, solutions from software engineering and evidence of their application to the modelling of cancer. The paper closes with conclusions and references.

Running example: Throughout this paper, we will employ the model of Gatenby and Gillies (2004) for cell–environment interactions in carcinogenesis, which the authors described with natural language and diagrams. The model is mechanistic because it is organised as a process in which component parts (such as cells, substrates and basement membrane) interact and produce an emergent behaviour. According to this process, in pre-malignant lesions, cells adapt to intermittent hypoxia through persistent metabolism of glucose to lactate (even in aerobic conditions), resulting in increased glucose consumption. Upregulation of glycolysis, in turn, leads to microenvironmental acidosis, which requires an evolution towards phenotypes resistant to acid-induced cell toxicity. Evolved cells with upregulated glycolysis and acid resistance have a powerful growth advantage, which promotes unconstrained proliferation and invasion. We will also consider a mathematical formalisation of the Gatenby–Gillies model by Smallbone et al. (2007).
The formalisation is a hybrid approach (Anderson 2005), which represents cells in a two-dimensional array governed by rules that determine the discrete state of every cell, together with equations for the continuous distributions of the metabolites oxygen, glucose and H+.

Related work

Representation in biology has long been a subject of interest in philosophy of science. Waddington (1968), von Bertalanffy (1968) and others followed a logical empiricist programme. They claimed that biology should seek general theories similar to those found in physics, such as Newton's theory of gravitation as elaborated in the Principia Mathematica. They regarded scientific theories as linguistic entities, in which a small number of fundamental definitions and mathematical axioms constitute the essence of the theory, and the axioms are then elaborated deductively in the form of theorems.

For a generation now, a number of philosophers of science have been developing an alternative to the logical empiricist account of scientific theories. This alternative is sometimes called the "semantic view of theories", by way of contrast with the "syntactic" character of theories in logical empiricism (also called the "received view"). Giere (1999, p. 122) introduces the term "model-based view" to refer generically to the different "semantic" accounts. The term reflects the common agreement among philosophers that predicates present in theories (linguistic entities such as "pendulum" in classical mechanics) do not refer to anything in the real world. Instead, predicates refer to conceptual, idealised entities called "models". For example, the predicate "simple pendulum" refers to a mass swinging from a massless string attached to a frictionless pivot, subject to a uniform gravitational force, in an environment with no resistance.
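For concreteness, the idealisations just listed are the ones built into the textbook equation of motion of the simple pendulum, written here with θ for the angle from the vertical, ℓ for the string length and g for the gravitational acceleration (symbols introduced only for this illustration):

$$\ddot{\theta}(t) + \frac{g}{\ell}\,\sin\theta(t) = 0$$

No friction or drag term appears and the string contributes no mass, so the equation characterises the model rather than any real pendulum, which satisfies it at best approximately.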
Clearly, this is an ideal object: no real pendulum exactly satisfies any of these conditions, so no real pendulum is a simple pendulum as characterised in classical mechanics. On a model-based view, there is no simple answer to the question "what is the structure of scientific theories?" (Giere 1999, p. 98). Instead, there exist different accounts, sometimes opposed to one another, yet all sharing the idea of models as a common point.

The model-based view has continued to evolve in the twenty-first century with a focus on mechanisms and models of mechanisms (Machamer et al. 2000; Bechtel and Abrahamsen 2005; Glennan 2005; Craver 2006; Woodward 2002). The former term is normally used to refer to something in the real world, while the latter denotes the representation tool. Some definitions of mechanisms are:

• Machamer et al. (2000, p. 3): "Mechanisms are entities and activities organized such that they are productive of regular changes from start or set-up to finish or termination conditions".
• Bechtel and Abrahamsen (2005, p. 423): "A mechanism is a structure performing a function in virtue of its component parts, component operations, and their organization. The orchestrated functioning of the mechanism is responsible for one or more phenomena".
• Glennan (2005, p. 445): "A mechanism for a behavior is a complex system that produces that behavior by the interaction of a number of parts, where the interactions between parts can be characterized by direct, invariant, change-relating generalizations".

Glennan (2005, pp. 443–444) argues that: "Perhaps because of the realist tendencies of the philosophers involved, most of the literature has focused on the properties of mechanisms themselves and has not said much about the relationships between mechanisms and their models or theoretical representations". Matthewson and Calcott (2011, p. 738) agree that "it is sometimes difficult to see where the analysis of mechanisms in the world finishes and where (or if) a discussion of their representations begins".

Glennan (2005) and, more recently, other authors such as Matthewson and Calcott (2011) and Schaffner (2007) have elaborated on one of the model-based views of theories: constructive realism (Giere 1988). This philosophical programme regards a scientific theory as a heterogeneous structure consisting, on the one hand, of a family of (non-linguistic) models at different levels of abstraction and, on the other hand, of a set of (linguistic) hypotheses claiming the similarity of models with something in the real world (a system or a class of systems). However, as anything can be similar to anything else in a limitless number of ways, a scientific hypothesis must specify the respects in which a model is claimed to be similar to something in the world and the degree of accuracy with which the similarity is claimed.

Matthewson and Calcott (2011) agree with Giere (2006) that scientists construct different representations of mechanisms according to their different purposes. The authors claim that modelling involves two distinct steps: first, the model is described (which delineates the properties that the model has) and, second, the model is deployed: relevant similarities are found between the properties of the model and the target of interest in the world. According to the authors, "an understanding of a model and the relevant similarities between this model and its target enables scientists to describe, predict or explain part of the world" (Matthewson and Calcott 2011, p. 741).

Schaffner (2007, p. 146) also elaborates on constructive realism and claims that the typical theory in the biomedical sciences is a structure of overlapping interlevel causal temporal prototypical models.
The models of such a structure usually constitute a series of idealised prototypical mechanisms and variations that bear family or similarity resemblances to each other. The author concedes that only a few (but important) theories in biology "have a very broad scope and are characterizable in their more simplified forms as a set of 'laws' which admit of mathematically precise axiomatization and deductive elaboration", such as certain formulations of Mendelian genetics.

In the context of this paper, we employ the term "mechanistic model" in Glennan's sense, regardless of the description language, whether mathematics (e.g. Smallbone et al. 2007), words (e.g. Fearon and Vogelstein 1990; Gatenby and Gillies 2004) or diagrams (e.g. Bos et al. 2009). Moreover, we adopt constructive realism, with the elaborations made by the authors mentioned above, as our working view of scientific theories.

The dilemma of formal and informal descriptions

One difficulty with mechanistic modelling of cancer is actually a linguistic matter: which is the most convenient language for describing cancer models? Some alternatives are equations, informal diagrams, natural language and structured languages (such as SBML, CellML and Biocharts). The difficulty of developing linguistic tools able to describe tumour growth has long been acknowledged in the literature (Brú et al. 2003).

Numerous voices have been raised in favour of establishing mathematics as the lingua franca for cancer research, based on the conviction that cancer is dominated by non-linear system dynamics, whose outcomes cannot be determined by verbal or linear reasoning alone but can be adequately expressed in terms of ordinary or partial differential equations, or other mathematical constructs such as cellular automata (Gatenby and Maini 2002, 2003; Maini and Gatenby 2006).
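To make the kind of construct mentioned here concrete, the following Python sketch couples a cellular automaton (a grid of discrete cell states) with a continuous nutrient field advanced by an explicit finite-difference step of a reaction–diffusion equation, loosely in the spirit of hybrid formalisations such as that of Smallbone et al. (2007). It is a toy under stated assumptions, not their model: the single nutrient, the linear uptake term k·u, the unit grid spacing, the fixed boundary value, the survival threshold and the names diffuse_step and apply_rule are all hypothetical choices made for illustration.

```python
from typing import List

Grid = List[List[float]]   # continuous field values, one per grid point
Cells = List[List[bool]]   # discrete automaton state: True = living cell


def diffuse_step(u: Grid, alive: Cells, d: float, k: float,
                 dt: float, boundary: float) -> Grid:
    """One explicit Euler step of du/dt = d * laplacian(u) - k * u, with
    uptake only where a cell is alive, on a unit-spaced grid whose outside
    is held at a fixed boundary value. Stable roughly for d * dt <= 0.25."""
    m, n = len(u), len(u[0])

    def at(i: int, j: int) -> float:
        # Points outside the grid take the fixed (Dirichlet) boundary value.
        return u[i][j] if 0 <= i < m and 0 <= j < n else boundary

    new = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            lap = (at(i - 1, j) + at(i + 1, j) + at(i, j - 1)
                   + at(i, j + 1) - 4 * u[i][j])
            uptake = k * u[i][j] if alive[i][j] else 0.0
            new[i][j] = u[i][j] + dt * (d * lap - uptake)
    return new


def apply_rule(u: Grid, alive: Cells, threshold: float) -> Cells:
    """Discrete rule: a living cell dies if its local level is below threshold.
    Dead cells never revive in this sketch."""
    return [[alive[i][j] and u[i][j] >= threshold for j in range(len(u[0]))]
            for i in range(len(u))]


# Usage: a 3x3 grid fully occupied by cells, fed only from the boundary.
u = [[1.0] * 3 for _ in range(3)]
alive = [[True] * 3 for _ in range(3)]
for _ in range(50):
    u = diffuse_step(u, alive, d=1.0, k=2.0, dt=0.1, boundary=1.0)
    alive = apply_rule(u, alive, threshold=0.3)
print(alive)
```

Even in this toy, whether an interior cell survives depends on the balance between supply diffusing in from the boundary and uptake by the cells around it, which is the kind of outcome that is easier to settle by simulation than by verbal argument.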
Proponents of this approach also advocate a methodological change: mathematical models can help to formulate biological hypotheses, favouring an active way of doing research which contrasts with the more traditional approach of expecting the general principles of complex biological processes to become apparent once sufficient data have been accumulated.

However, the adoption of mathematics as a universal language for describing cancer dynamics is not free from controversy. Weinberg (2007) points out two main problems: (1) scientists still do not have enough knowledge about biology and biochemistry to create truly useful, predictive models of tumour development; (2) as a consequence of this lack of knowledge, parameters need to be assumed or arbitrarily fitted to existing data sets to ensure that the predictive powers of mathematical models conform to actual observations. Weinberg's position has been strongly rebutted by the mathematical biology community. Thus, Gatenby and Maini (2003) argue that similar simplifying assumptions are required in most experimental designs, and that even simple underlying mechanisms may yield highly complex observable behaviours. However, Komarova (2005) points out another difficulty: "cancer modelling is often very detached from experimental and clinical cancer research […] Because of the involved mathematical expositions, journals that publish theoretical work are mostly inaccessible to the wider, biologically oriented community".

Accordingly, cancer researchers face a dilemma: the dynamics of cancer progression are better described with mathematics but, at the same time, the complex formalism of such descriptions and their fruitful manipulation are not easily accepted by many experimental biologists. A similar question has long been the subject of intense debate and research in software engineering.
During the development of a software system, a number of specifications need to be elaborated. Specifications address different aspects of the system-to-be, such as its components, behaviour, physical deployment and user requirements. They cover different levels of abstraction, which help engineers to reason about their design before proceeding to write the program code.

There is a myriad of languages for elaborating software specifications, and some of them have mathematical semantics. Yet many engineers prefer to use informal diagrams and textual descriptions. Formal specifications are harder to build and to maintain, but they are amenable to automatic and semi-automatic analysis, proof and simulation. In contrast, informal specifications are easier to understand and require less bookkeeping, but tool support for them is very limited. This trade-off is even more noticeable in cancer models described with mathematical equations, which produce an emergent behaviour that is not easy to understand but can be simulated with software tools.

We will mention two approaches to this problem from requirements engineering, an area of software engineering focused on the systematic handling of the requirements of a software system and especially concerned with describing the real world where the software problem is located. The first proposal, the so-called "two-button" approach, works well in the practice of requirements engineering (van Lamsweerde 2004). It consists of keeping both formal and informal descriptions of the subject of interest; the former must be used only "when and where needed" to enable formal analysis techniques (van Lamsweerde 2009, p. 583). Table 1 shows an example, extracted from the requirements of a software system that must schedule meetings for people (van Lamsweerde 2009, pp. 9–12).
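The two-button idea can be pictured in code as a record that keeps both views of one statement side by side. The following Python sketch is only an illustration, not van Lamsweerde's method or the interface of any specification tool: it stores the meeting-scheduling goal of Table 1 as prose plus a predicate, and reads the deadline operator ◇≤d over a hypothetical finite trace in which each state is one day. The names State, TwoButtonSpec and goal_holds, the day-indexed trace and the reading of "3 weeks" as 21 days are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (participant, meeting)


@dataclass
class State:
    """One day of a hypothetical finite trace of the scheduling system."""
    scheduled: Set[str] = field(default_factory=set)
    invitation: Set[Pair] = field(default_factory=set)
    notified: Set[Pair] = field(default_factory=set)


@dataclass(frozen=True)
class TwoButtonSpec:
    """Both 'buttons' side by side: prose for people, a predicate for tools."""
    informal: str
    formal: Callable[[List[State], Dict[str, int]], bool]


def goal_holds(trace: List[State], meeting_day: Dict[str, int]) -> bool:
    """Finite-trace reading of
    Scheduled(m) ∧ Invitation(p,m) ∧ ¬Notified(p,m) ⇒ ◇≤ m.Date−3w Notified(p,m),
    where days are indices into `trace` and 3 weeks is taken as 21 days."""
    for today, state in enumerate(trace):
        for p, m in state.invitation:
            if m in state.scheduled and (p, m) not in state.notified:
                deadline = meeting_day[m] - 21
                if deadline < today:
                    return False  # the deadline has already passed
                # ◇≤d: Notified(p, m) must hold on some day in [today, deadline].
                window = trace[today:deadline + 1]
                if not any((p, m) in s.notified for s in window):
                    return False
    return True


notification_goal = TwoButtonSpec(
    informal=("Intended participants shall be notified of the meeting date "
              "at least 3 weeks before the meeting takes place."),
    formal=goal_holds,
)

# Usage: meeting m1 on day 30 (deadline day 9); alice is notified from day 5.
trace = [State(scheduled={"m1"}, invitation={("alice", "m1")},
               notified={("alice", "m1")} if day >= 5 else set())
         for day in range(31)]
print(notification_goal.formal(trace, {"m1": 30}))  # True
```

A trace that ends before a deadline without the notification is treated here as a violation. A model checker works differently: it explores all behaviours of a system model rather than evaluating one recorded trace.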
The requirements specification includes the following system goal in natural language: Intended participants shall be notified of the meeting date at least 3 weeks before the meeting takes place (van Lamsweerde 2009, p. 591). A formal specification of that goal can be expressed in linear temporal logic (LTL) (Manna and Pnueli 1992). To this end, two entities have been defined, Participant and Meeting, with arbitrarily chosen instances p and m, respectively. There are also three predicates, Scheduled(m), Invitation(p, m) and Notified(p, m), which express that some property holds in an arbitrarily chosen current state of the instances: meeting m has been scheduled; participant p has been invited to meeting m; and participant p has been notified of meeting m. The LTL operator ◇≤d means "some time in the future with deadline d". The expression m.Date−3w means "three weeks before the date when meeting m takes place".

The informal specification is more expressive and easier to understand. In contrast, the formal specification can be handled by a tool to prove certain properties. For example, a model checker (Clarke et al. 2000) might determine whether a given behavioural specification of the system satisfies the goal above. Note that, in the two-button approach, informal statements are not equivalent to their formal counterparts: if they were, the informal statements would be formal after all. Rather, both must be regarded as complementary views of the subject of interest.

We are not claiming that the languages employed for describing software systems are adequate, in general, for representing biological systems. We argue that the underlying idea, i.e. the two-button approach, may make formal models of cancer more appealing to many experimental biologists who are not comfortable with pure formal reasoning. Equations and their solutions can be linked to paragraphs and diagrams explaining the underlying processes and conclusions.

Table 1 A two-button specification of a goal, extracted from the requirements of a meeting-scheduling software system (van Lamsweerde 2009, p. 591)

Goal (informal specification): Intended participants shall be notified of the meeting date at least 3 weeks before the meeting takes place.

Goal (formal specification):
∀p: Participant, m: Meeting
Scheduled(m) ∧ Invitation(p, m) ∧ ¬Notified(p, m) ⇒ ◇≤ m.Date−3w Notified(p, m)

The specification consists of an informal part and a formal part, which are complementary views that serve different purposes. Linguistically, the informal part is expressed in English, while the formal part is expressed in linear temporal logic. The intuitive semantics of the formal symbols is: ∀ (for all), ∧ (and), ¬ (not), ⇒ (implies) and ◇≤d (some time in the future with deadline d). The formal specification establishes a statement that must be satisfied by any arbitrarily chosen Participant and Meeting, namely p and m. The formal specification employs three predicates (properties that can be true at one time and false at another): Scheduled(m) holds if meeting m is scheduled; Invitation(p, m) holds if participant p has received an invitation to meeting m; and Notified(p, m) holds if participant p has been notified of meeting m. The expression m.Date−3w means "three weeks before the date when meeting m takes place" (van Lamsweerde 2009).

An example will help to illustrate our point. Smallbone et al. (2007) formalised the model of Gatenby and Gillies (2004) as a hybrid cellular automaton: an M×N array of automaton elements (with a specific rule set governing their evolution), together with oxygen, glucose and H+ fields, each satisfying reaction–diffusion equations. We have extracted a rule from the model and built a two-button specification of it with natural language and LTL (see Table 2). The term ϕa(i, j) denotes the amount of ATP produced by the cell located at row i and column j of the array, and a0 denotes a threshold value, assumed to be 0.1. The LTL operator ◇ means "eventually in the future".

The specification of Table 2 can be regarded as two complementary views of the same portion of the mechanistic model described in Smallbone et al. (2007). The informal view may be employed for linear reasoning with words, while the LTL formulation may be employed in more in-depth research, where hypotheses are examined for internal consistency and compatibility with extant data and where predictions are tested by experiments (Gatenby and Maini 2002).

The other proposal for the formal/informal dilemma is suggested by Jackson (2001) in the Problem Frames approach to requirements engineering, where the author is concerned with the question of representing the part of the physical and human world in which a software system is intended to work (the so-called "problem context"). Reality is informal in nature, but the system must somehow deal with it to bring about some desired effects, i.e. to satisfy the problem requirements. Jackson (2001, p. 163) expresses this point very clearly: (1) any formalisation of the real world is at best approximate; and (2) we cannot confidently limit the considerations that might affect the domain properties and behaviour we are interested in: there is always much that has not been considered, and some of it may prove decisive. The author provides several tools to represent the world.
One of them is a classification of the kinds of things (phenomena) that typically appear in the problem context, such as individuals (phenomena that can be named and distinguished) and states (relationships among individuals that can be true at one time and false at another). Another tool for dealing with informal domains is the use of designations, which establish the empirical meaning of a set of ground terms that we can later use in our formal descriptions. For example:

Dead(c) ≈ state: Cell c is not alive.
LowATP(c) ≈ state: The amount of ATP produced by cell c has fallen below a critical threshold value.

The first designation introduces the formal term Dead(c). Then, after the designation symbol (≈), it specifies what kind of phenomenon it is: a state phenomenon. Finally, it gives a recognition rule by which we can determine whether what we are observing in the world is, or is not, an instance of the designated phenomenon. According to Jackson, when we write a designation, we are introducing a new class of observations that we can make of the world and naming the newly observable phenomena so that we can refer to them in our description (Jackson 2001, pp. 163–164). Another class of observations is introduced by the term LowATP(c): the empirical fact that the amount of ATP produced by a cell has fallen below a critical value.

The reader may find Jackson's designations essentially identical to the correspondence rules of logical empiricism (see Giere 1988, p. 25, for a review). Correspondence rules are used to empirically interpret the non-logical terms of a scientific theory, which logical empiricists understand as a formal, logical system. The symbol R in a theory, for example, could be interpreted as standing for the path of a light ray. However, according to Giere (1988), scientific theories make no distinction between the interpretation of "observable" terms and the interpretation of "non-observable" terms.
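A designation can likewise be sketched as data: a formal term, the kind of phenomenon it names and a recognition rule. The hypothetical Python below records Dead(c) and LowATP(c) in that form and uses them to check a finite-trace reading of the informal rule of Table 2 (if the ATP produced by a cell falls below a0 = 0.1, the cell eventually dies) on the recorded states of a single conceptual cell. The class and function names, the CellObservation fields and the single-cell trace are assumptions for illustration; the sketch does not reproduce the formal specification of Table 2 or the model of Smallbone et al. (2007).

```python
from dataclasses import dataclass
from typing import Callable, List

A0 = 0.1  # ATP threshold a0, using the value assumed in the text


@dataclass(frozen=True)
class CellObservation:
    """Hypothetical snapshot of one (conceptual) cell at one time step."""
    alive: bool
    atp: float


@dataclass(frozen=True)
class Designation:
    """Jackson-style designation: term ≈ kind: recognition rule."""
    term: str
    kind: str
    recognise: Callable[[CellObservation], bool]


DEAD = Designation("Dead(c)", "state", lambda cell: not cell.alive)
LOW_ATP = Designation("LowATP(c)", "state", lambda cell: cell.atp < A0)


def eventually(trace: List[CellObservation], start: int, d: Designation) -> bool:
    """◇ on a finite trace: d is recognised at some step t >= start."""
    return any(d.recognise(cell) for cell in trace[start:])


def rule_holds(trace: List[CellObservation]) -> bool:
    """Whenever LowATP(c) is recognised at step t, Dead(c) must be
    recognised at some step >= t."""
    return all(eventually(trace, t, DEAD)
               for t, cell in enumerate(trace) if LOW_ATP.recognise(cell))


# Usage: ATP drops below a0 at step 1 and the cell is dead at step 2.
history = [CellObservation(True, 0.5), CellObservation(True, 0.05),
           CellObservation(False, 0.0)]
print(rule_holds(history))  # True
```

The recognition rules here evaluate states of a conceptual cell in a model rather than measurements on a real cell, which matches the constructive-realist stance adopted in this paper.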
Postpositivist philosophers of science have been unanimous in rejecting correspondence rules in all their manifestations (Giere 1988). As we argued in "Related work", this paper does not follow a logical empiricist account of scientific theories, so we do not assume a direct relationship between the statements of a theory (formal or not) and the real world. In constructive realism, such a relationship is indirect, through the intermediary of a theoretical model. Therefore, we will regard a designation as the linking of a formal term with some concept employed in a model. Thus, Dead(c) is interpreted as "conceptual cell c is not alive" in the model that is being defined. A related problem is that of identification: the linking of a term (or a concept) with

Table 2 A two-button specification of one of the rules governing the evolution of a carcinoma, according to the model of Gatenby and Gillies (2004)

Model rule (informal specification): If the amount of ATP produced by a cell falls below a critical threshold value, the cell dies.

Model rule (formal specification): ∀i, j 1≤i≤M ∧ 1≤j≤N ∧ ϕa(i, j)