key: cord-0042649-7le1mxwy authors: Kretzschmar, Mirjam; Wallinga, Jacco title: Mathematical Models in Infectious Disease Epidemiology date: 2009-07-28 journal: Modern Infectious Disease Epidemiology DOI: 10.1007/978-0-387-93835-6_12 sha: 57e542a1452b19797da4dcdefe504985b53e3e26 doc_id: 42649 cord_uid: 7le1mxwy The idea that transmission and spread of infectious diseases follows laws that can be formulated in mathematical language is old. In 1766 Daniel Bernoulli published an article where he described the effects of smallpox variolation (a precursor of vaccination) on life expectancy using mathematical life table analysis (Dietz and Heesterbeek 2000). However, it was only in the twentieth century that the nonlinear dynamics of infectious disease transmission was really understood. In the beginning of that century there was much discussion about why an epidemic ended before all susceptibles were infected with hypotheses about changing virulence of the pathogen during the epidemic. Only towards the end of the twentieth century did mathematical modeling come into more widespread use for public health policy making. Modeling approaches were increasingly used during the first two decades of the AIDS pandemic for predicting the further course of the epidemic and for trying to identify the most effective prevention strategies. But the real impact of mathematical modeling on public health came with the need for evaluating intervention strategies for newly emerging and reemerging pathogens. In the first instance it was the fear of a bioterrorist attack with smallpox virus that sparked off the use of mathematical modeling to combine historical data from smallpox outbreaks with questions about vaccination in modern societies (Ferguson et al. 2003) . 
Later the outbreak of the SARS virus as a newly emerging pathogen initiated the use of mathematical modeling for analyzing infectious disease outbreak data in real time to assess the effectiveness of intervention measures (Wallinga and Teunis 2004). Analysis of historical data about pandemic outbreaks of influenza A has led to the important insight that the basic reproduction number of influenza has been low in historical outbreaks, but the serial interval is short (Mills et al. 2004). This implies that in principle an outbreak of influenza can be stopped with moderate levels of intervention, but measures have to be taken very rapidly in order to be effective. In contrast, for an infection such as measles with a high basic reproduction number, very high levels of vaccination coverage are needed for elimination. Such insights gained from mathematical analysis are extremely helpful for designing appropriate intervention policy and for the evaluation of existing interventions. The central idea of transmission models, as opposed to statistical models, is a mechanistic description of the transmission of infection between two individuals. This mechanistic description makes it possible to describe the time evolution of an epidemic in mathematical terms and in this way connect the individual-level process of transmission with a population-level description of incidence and prevalence of an infectious disease. The rigorous mathematical formulation of these dependencies makes it necessary to analyze in detail all dynamic processes that contribute to disease transmission. Therefore, developing a mathematical model helps to focus thoughts on the essential processes involved in shaping the epidemiology of an infectious disease and to reveal the parameters that are most influential and most amenable to control.
Mathematical modeling is also integrative in that it combines knowledge from very different disciplines such as microbiology, social sciences, and clinical sciences.

For many infections, such as influenza and smallpox, individuals can be categorized as either "susceptible," "infected," or "recovered and immune." The susceptibles that are affected by an epidemic move through these stages of infection (Fig. 12.1). A key quantity in infectious disease epidemiology is the reproduction number, denoted by the symbol R, which is defined as the number of secondary cases that are infected by one infectious individual. As an example we can sketch the typical course of an epidemic if the reproduction number R = 3 (here the generation time equals the duration of infectivity; Fig. 12.2). In the illustration of Fig. 12.2, the number of new infections increases in the first generation by a factor equal to the reproduction number R. The number of available susceptible individuals is depleted in the course of the epidemic. When the last infected person fails to contact any susceptible person, the epidemic dies out. The infection attack rate is the total proportion of the population that is eventually infected during the epidemic, and it is denoted by A. This infection attack rate is completely determined by the reproduction number R and the contact process that describes who contacts whom (Fig. 12.3). If the new infection is like influenza, with a reproduction number of about R = 1.5, we expect that more than half of the population will be infected; and if the new infection is like smallpox, with a reproduction number of about R = 5, we expect that almost the entire population will be infected during an epidemic without interventions. To capture the epidemic dynamics over time, we need to incorporate the natural course of infection of an individual host.
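The attack rates quoted above can be checked numerically. The following is a minimal sketch, assuming random mixing, for which the attack rate satisfies the implicit relation A = 1 - exp(-R A) stated later in this chapter. The function name `attack_rate` and the fixed-point iteration are illustrative choices, not the authors' method.

```python
import math

def attack_rate(R, tol=1e-12, max_iter=10_000):
    """Solve A = 1 - exp(-R * A) for the nonzero root by fixed-point iteration.

    Assumes a randomly mixing, initially fully susceptible population.
    """
    if R <= 1.0:
        return 0.0  # below the threshold only the trivial root A = 0 exists
    A = 1.0  # start at the upper bound; the iterates decrease to the nonzero root
    for _ in range(max_iter):
        A_next = 1.0 - math.exp(-R * A)
        if abs(A_next - A) < tol:
            return A_next
        A = A_next
    return A

# Reproduction numbers taken from the text: influenza-like and smallpox-like.
for label, R in [("influenza-like", 1.5), ("smallpox-like", 5.0)]:
    print(f"{label}: R = {R}, attack rate = {attack_rate(R):.3f}")
```

Under these assumptions the iteration gives an attack rate of roughly 58% for R = 1.5 and above 99% for R = 5, in line with "more than half" and "almost the entire population."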
As time proceeds, an infected host moves from the incubation period through the prodromal phase and the infectious period to recovery and immunity ( Fig. 12.4) . For influenza and smallpox, such timelines are depicted in Fig. 12 .4. The duration of the incubation period and the relative infectiousness in the stages before symptom onset (the prodromal phase) and the symptomatic stage are crucial in determining the success of control strategies such as contact tracing and isolation of symptomatic cases. The timelines determine another epidemiological key quantity, the generation time T. This generation time is defined as the typical duration between the time of infection of a source and the time of infection of its secondary case(s). For influenza, the generation time is in the order of T = 3 days. For smallpox, the generation time is in the order of T = 20 days. The chain reaction nature of the epidemic process leads to exponential growth in real (calendar) time during the initial phase of the epidemic, once the number of infected individuals has become large enough to avoid chance events that lead to an early extinction of the epidemic. The exponential growth rate r is determined by the precise timelines of infection. There is a lower limit to the growth rate r that is set by both the reproduction number R and the generation time T (specifically, r > ln (R)/T). To illustrate the strength of this basic approach to epidemic modeling, we use it to assess the impact of border closure on epidemic spread. The number of infected persons that will try to cross the border from an infected country into a country that is not yet infected will increase exponentially with a growth rate r. Closing the borders will stop most infected persons, but a proportion p might slip through. Therefore, closing the borders will result in a reduction by a factor p of the exponential growth of number of imported cases. 
This reduction corresponds to a delay in the exponential growth of the number of imported cases (specifically, the delay is at most (-ln p / ln R) T). Therefore, border closure will only postpone the import of cases for a few generations of infection. For example, if closure were to reduce the number of infected travelers crossing the border to 1% of those who would ordinarily have crossed (p = 0.01), the bound gives (ln 100 / ln 1.5) × 3 days ≈ 34 days for influenza and (ln 100 / ln 5) × 20 days ≈ 57 days for smallpox. That is, the introduction of an influenza epidemic may be delayed by about a month, and the introduction of a smallpox epidemic by about 2 months.

The key epidemiological variables that characterize spread of infection are the generation time T and the reproduction number R. If a novel infection starts spreading, such as SARS in 2003, these key variables are unknown. But even if an outbreak of a more familiar infection occurs, such as norovirus, we might be groping in the dark about the precise values of these key variables. Yet, if modeling is to be helpful in infectious disease control, it is crucial to have the best possible estimates for the generation time and the reproduction number, along with other quantities such as the incubation time and hospitalization rate. Estimation would be easy if we had perfect information about the outbreak. If we knew exactly who had infected whom, and precisely who was infected when, we could simply measure the duration of each time interval from infection of a case back to the time of infection of its source, and the distribution of the length of these time intervals would inform us about the generation interval. Similarly, we could simply count for each infected individual how many others were infected by this individual, and the distribution of such counts would inform us about the reproduction number. Of course, in the real world such information is not available and we have to deal with incomplete observations, proxy measures, and reporting delays.
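For the idealized case of perfect information described above, the two estimates reduce to simple bookkeeping on the transmission tree. The sketch below uses a small made-up line list. The case identifiers, the infection times, and the function names `generation_intervals` and `secondary_case_counts` are all hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical, fully observed outbreak:
# case id -> (time of infection in days, id of the infector, or None for the index case).
cases = {
    "a": (0.0, None),
    "b": (2.5, "a"),
    "c": (3.0, "a"),
    "d": (6.0, "b"),
    "e": (5.5, "c"),
    "f": (6.5, "c"),
}

def generation_intervals(cases):
    """Time from infection of each source to infection of each of its secondary cases."""
    return [t - cases[src][0] for t, src in cases.values() if src is not None]

def secondary_case_counts(cases):
    """Number of secondary cases per case, including cases that infected nobody."""
    counts = Counter(src for _, src in cases.values() if src is not None)
    return {cid: counts.get(cid, 0) for cid in cases}

intervals = generation_intervals(cases)
counts = secondary_case_counts(cases)
print("generation intervals:", sorted(intervals), "mean:", mean(intervals))
print("secondary cases per case:", counts)
```

Even in this toy example the most recently infected cases have had no time to infect others, so a naive average of the counts would understate the reproduction number. This is one form of the censoring that estimation procedures on real data have to handle.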
But real-time estimation procedures have been proposed that attempt to reconstruct the likely patterns of who infected whom, and who was infected when, from the incomplete data and proxy measures, using standard statistical techniques for dealing with missing data and censoring (Wallinga and Teunis 2004; Cauchemez et al. 2006). The main message is that during an outbreak it is important to collect data on cases (time of symptom onset) and on the relation between cases (existence of an epidemiological link). The more accurate these data are, the more useful they are for estimating the key model ingredients, the generation time T and the reproduction number R, and the more helpful they can be in predicting the likely future course of the epidemic without intervention and the control effort required to curb the epidemic.

Many of the above ideas can be formalized mathematically in the so-called SIR model, which describes the dynamics of different states of individuals in the population in terms of a system of ordinary differential equations. The variables of the system are given by the compartments described above: the group of susceptible persons (denoted by S), the group of infected persons (denoted by I), and the group of removed persons (removed from the process of transmission by immunity) (denoted by R). The mathematical model provides a precise description of the movements in and out of the three compartments. Those movements are birth (flow into the compartment of susceptible individuals), death (flow out of all compartments), transmission of infection (flow from S into I), and recovery (flow from I into R) (Fig. 12.5). Transitions between compartments are governed by rates, which in the simplest version of the model are assumed to be constant in time.
The birth rate ν describes the recruitment of new susceptibles into the population, the death rate μ the loss of individuals due to disease-unrelated background mortality, and γ denotes the recovery rate of infected individuals into immunity. The key element of the model is the term describing transmission of infection according to a rate β using a mass action term. The idea behind using a mass action term to describe transmission is that individuals of the population meet each other at random and each individual has the same probability per unit time to meet each other individual. Therefore, for a susceptible person the rate of meeting infected persons depends on their density or prevalence in the population, or in mathematical terms λ = βI, where λ is the so-called force of infection. The force of infection is a measure of the risk of a susceptible person to become infected per unit time. It depends on prevalence, either in an absolute sense on the number of infected people in the population, or in a relative sense on the fraction of infected people in the population. In the latter case we would get λ = βI/N with N denoting the total population size. The parameter β is a composite parameter combining the contact rate κ and the probability of transmission upon contact q, so β = κq. Using the relative form of the force of infection, the flow chart in Fig. 12.5 can be translated into a system of ordinary differential equations as follows:

$$\frac{dS}{dt} = \nu - \beta \frac{S I}{N} - \mu S, \qquad \frac{dI}{dt} = \beta \frac{S I}{N} - \gamma I - \mu I, \qquad \frac{dR}{dt} = \gamma I - \mu R, \qquad N = S + I + R.$$

For a full definition of the model the initial state of the system has to be specified, i.e., the numbers or fractions of the population in the states S, I, and R at time t = 0 have to be prescribed. Values for the parameters ν, μ, γ, and β have to be chosen either based on estimates from data or based on assumptions. Then standard numerical methods can be used to compute the time evolution of the system starting from the initial state. Up to now the model describes disease transmission without any possible intervention.
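As an illustration of such a numerical computation, the following minimal sketch integrates the SIR system with a classical fourth-order Runge-Kutta scheme. It assumes the standard form with transmission term βSI/N, constant recruitment ν, and per-capita death rate μ. The function names, the step size, and all parameter values are illustrative assumptions, not estimates for any particular infection.

```python
def sir_rhs(state, nu, mu, gamma, beta):
    """Right-hand side of the SIR equations with frequency-dependent transmission."""
    S, I, R = state
    N = S + I + R
    infection = beta * S * I / N          # flow from S into I
    return (nu - infection - mu * S,      # dS/dt
            infection - (gamma + mu) * I, # dI/dt
            gamma * I - mu * R)           # dR/dt

def rk4_step(f, state, dt, *params):
    """One classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = f(state, *params)
    k2 = f(shifted(k1, dt / 2), *params)
    k3 = f(shifted(k2, dt / 2), *params)
    k4 = f(shifted(k3, dt), *params)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(state, t_end, dt, nu, mu, gamma, beta):
    """Return a list of (time, (S, I, R)) pairs from t = 0 to t_end."""
    trajectory = [(0.0, state)]
    for n in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(sir_rhs, state, dt, nu, mu, gamma, beta)
        trajectory.append((n * dt, state))
    return trajectory

# Assumed example: closed population (nu = mu = 0), beta = 0.5 and gamma = 0.25 per day,
# one infected individual introduced into 999 susceptibles.
trajectory = simulate((999.0, 1.0, 0.0), t_end=200.0, dt=0.1,
                      nu=0.0, mu=0.0, gamma=0.25, beta=0.5)
t, (S, I, R) = trajectory[-1]
print(f"t = {t:.0f} days: S = {S:.1f}, I = {I:.4f}, R = {R:.1f}")
```

A fixed-step scheme keeps the sketch dependency-free. In practice an adaptive solver from a numerical library would usually be preferred.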
We now incorporate vaccination of newborns into this simple system to obtain some important insights into the effect of universal newborn vaccination. We denote the fraction of newborns that are vaccinated immediately after birth by p. Then instead of a recruitment rate of ν into the susceptible compartment, the recruitment is now (1 - p)ν into the susceptible compartment, while pν is recruited directly into the immune compartment. In terms of model equations this leads to

$$\frac{dS}{dt} = (1 - p)\,\nu - \beta \frac{S I}{N} - \mu S, \qquad \frac{dI}{dt} = \beta \frac{S I}{N} - \gamma I - \mu I, \qquad \frac{dR}{dt} = p\,\nu + \gamma I - \mu R.$$

We will now derive some basic principles using this model as an example. The most important concepts of epidemic models can be demonstrated using the SIR model. Let us first consider an infectious disease which spreads on a much faster time scale than the demographic process. Then, on the scale of disease transmission the birth rate ν and the death rate μ can be considered to be close to zero. When can the prevalence in the population increase? An increase in prevalence is equivalent to dI/dt > 0, which means that βSI/N > γI. This leads to βS/N > γ or equivalently to βS/(γN) > 1. In the situation that all individuals of the population are susceptible we have S = N; this means that an infectious disease can spread in a completely susceptible population if β/γ > 1. The quantity R0 = β/γ is known as the basic reproduction number; it can in principle be determined for every infectious disease model and estimated for every infectious disease. In biological terms the basic reproduction number describes the number of secondary infections produced by one index case in a completely susceptible population during his entire infectious period (Diekmann et al. 1990; Diekmann and Heesterbeek 2000). The effective reproduction number R, as mentioned in Section 12.2, describes the number of secondary cases per index case in a situation where intervention measures are applied or where a part of the population has already been infected and is now immune.
If R0 > 1 the infection can spread in the population, because on average every infected individual replaces himself by more than one new infected person. However, this process can only continue as long as there are sufficiently many susceptible individuals available. Once a larger fraction of the population has gone through the infection and has become immune, the probability of an infected person to meet a susceptible person decreases and with it the average number of secondary cases produced. If, as we assumed above, there is no birth into the population, no new susceptible individuals are coming in and the epidemic outbreak will invariably end. Analysis of the model shows, however, that the final size of the outbreak will never encompass the entire population, but there will always be a fraction of susceptible individuals left over after the outbreak has subsided. It can be shown that the final size A (attack rate in epidemiological terms) is related to the basic reproduction number by the implicit formula A = 1 - exp(-R0 A). In other words, if the basic reproduction number of an infectious disease is known, the attack rate in a completely susceptible population can be derived.

The situation changes when we consider the system on a demographic time scale where births and deaths play a role. Assuming that ν and μ are positive, with the same arguments as above we get that R0 = β/(γ + μ). Now if R0 > 1 the system can develop into an equilibrium state where the supply of new susceptible persons by birth is balanced by the transmission process and on average every infected person produces one new infection. This so-called endemic equilibrium can be computed from the model equations by setting the left-hand sides to zero and solving for the variables S, I, and R in terms of the model parameters. First one obtains the steady state population size as N* = ν/μ (the superscript * denotes the steady state value).
The steady state values for the infection-related variables are then given by

$$S^* = \frac{N^*}{R_0}, \qquad I^* = \frac{\mu N^*}{\gamma + \mu}\left(1 - \frac{1}{R_0} - p\right), \qquad R^* = N^* - S^* - I^*.$$

Hence the fractions of the population that are susceptible, infected and recovered in an endemic steady state are given by

$$\frac{S^*}{N^*} = \frac{1}{R_0}, \qquad \frac{I^*}{N^*} = \frac{\mu}{\gamma + \mu}\left(1 - \frac{1}{R_0} - p\right), \qquad \frac{R^*}{N^*} = p + \frac{\gamma}{\gamma + \mu}\left(1 - \frac{1}{R_0} - p\right).$$

Note that the fraction of susceptible individuals S*/N* in the endemic steady state is independent of the vaccination coverage p. On the other hand, the prevalence of infection I*/N* depends on p: the prevalence decreases linearly with increasing vaccination coverage until the point of elimination is reached. This means we can compute the critical vaccination coverage p_c, i.e., the threshold coverage needed for elimination, from 0 = 1 - 1/R0 - p_c as p_c = 1 - 1/R0. As we would expect intuitively, the larger the basic reproduction number, the higher the fraction of the population that has to be vaccinated in order to eliminate an infection from the population. However, it also follows that elimination can be reached without vaccinating everybody in the population. The reason is that with an increasing density of immune persons in the population, the risk for those who are not yet vaccinated to be exposed decreases. This effect, the indirect protection of susceptible individuals by increasing levels of immunity in the population, is known as herd immunity. Besides the positive effect of decreasing the risk of infection for non-vaccinated persons, herd immunity has the sometimes adverse effect of increasing the mean age at first infection in the population. This can lead to an increased incidence of adverse events following infection, if the coverage of vaccination is not sufficiently high. For an infection such as smallpox with an estimated basic reproduction number of around 5, a coverage of 80% is needed for elimination, while for measles with a reproduction number of around 20 the coverage has to be at least 95%.
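The threshold p_c = 1 - 1/R0 and the linear decline of endemic prevalence with coverage can be evaluated directly. The sketch below assumes the endemic prevalence I*/N* = μ/(γ + μ) · (1 - 1/R0 - p), obtained by setting the SIR equations with newborn vaccination to zero. The function names and the parameter values in the second example are illustrative assumptions.

```python
def critical_coverage(R0):
    """Threshold newborn vaccination coverage p_c = 1 - 1/R0 (zero if R0 <= 1)."""
    return max(0.0, 1.0 - 1.0 / R0)

def endemic_prevalence(p, beta, gamma, mu):
    """Endemic fraction infected I*/N* under coverage p; zero at or above p_c.

    Assumes the SIR model with newborn vaccination and R0 = beta / (gamma + mu).
    """
    R0 = beta / (gamma + mu)
    return max(0.0, mu / (gamma + mu) * (1.0 - 1.0 / R0 - p))

# Basic reproduction numbers quoted in the text.
for label, R0 in [("smallpox", 5.0), ("measles", 20.0)]:
    print(f"{label}: R0 = {R0:.0f}, critical coverage = {critical_coverage(R0):.0%}")

# Assumed rates per day: recovery after about 10 days, life expectancy of about 70 years.
gamma, mu = 0.1, 1.0 / (70 * 365)
beta = 5.0 * (gamma + mu)  # chosen so that R0 = 5
for p in (0.0, 0.4, 0.8, 0.9):
    print(f"coverage {p:.0%}: endemic prevalence = {endemic_prevalence(p, beta, gamma, mu):.2e}")
```

With these formulas the threshold is 80% for R0 = 5 and 95% for R0 = 20.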
This provides one explanation for the fact that it was possible to eradicate smallpox in the 1970s whereas we are still a long way from measles eradication. There are some countries, however, that have been successful in eliminating measles based on a consistently high vaccination coverage (Peltola et al. 1997).

Building on the basic ideas of the SIR framework, numerous types of mathematical models have since been developed, all incorporating more structure and details of the transmission process and infectious disease dynamics. A first obvious extension is the inclusion of more disease-specific details into a model. Compartments describing a latent period, the vaccinated population, chronic and acute stages of infection, and many more have been described in the literature (Anderson and May 1991). Another important refinement of compartmental models is to incorporate heterogeneity of the population into the model, for example, by distinguishing between population subgroups with different behaviors, population subgroups with differences in susceptibility, or geographically distinct populations. Heterogeneity in behavior was first introduced into models describing the spread of sexually transmitted infections by Hethcote and Yorke (1984). Later, during the first decade of the HIV/AIDS pandemic, models were proposed that were able to describe population heterogeneity in sexual activity and mixing patterns between population subgroups of various sexual activity levels (Koopman et al. 1988). Models of this type are used frequently for assessing the effects of intervention on the spread of sexually transmitted infections. Age structure has also been modeled as a series of compartments with individuals passing from one compartment to the next according to an aging rate, but this requires a large number of additional compartments to be added to the model structure.
This also shows the limitation of compartmental models: with increasing structure of the population the number of compartments increases rapidly and with it the necessity to define and parameterize the mixing between all the population subgroups in the model. The theory of how to define and compute the basic reproduction number in heterogeneous populations was developed by Diekmann et al. (1990) . Geographically distinct population groups with interaction among each other have been investigated using the framework of meta-populations for analyzing the dynamics of childhood infections (Rohani et al. 1999 ). Age structure can best be described as a continuous variable, where age progresses with time. Mathematically this leads to models in the form of partial differential equations, where all variables of the model depend on time and age (Diekmann and Heesterbeek 2000) . Analytically, partial differential equations are more difficult to handle than ordinary differential equations, but numerically solving an age-structured system of model equations is straightforward. In a deterministic model based on a system of differential equations it is implicitly assumed that the numbers in the various compartments are sufficiently large such that stochastic effects can be neglected. In reality this is not always the case. For example, when analyzing epidemic outbreaks in small populations such as schools or small villages, typical stochastic events can occur such as extinction of the infection from the population or large stochastic fluctuations in the final size of the epidemic. In contrast to deterministic models, stochastic models are formulated in terms of integers with probabilities describing the transitions between states. This means that outcomes are given in terms of probability distributions such as the final size distribution. 
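A final size distribution of this kind can be sampled with a few lines of code. The following is a minimal sketch of a stochastic (Markovian) SIR epidemic in a closed population. It assumes transmission rate βSI/N and recovery rate γI, and since only the order of events matters for the final size, it simulates the sequence of infection and recovery events without tracking time. The function names, the population size, and the parameter values are illustrative assumptions.

```python
import random

def final_size(N, beta, gamma, rng, initial_infected=1):
    """Sample the number ever infected in a closed stochastic SIR epidemic.

    Each event is an infection with probability (beta*S/N) / (beta*S/N + gamma)
    and a recovery otherwise; the epidemic ends when no infected individuals remain.
    """
    S, I = N - initial_infected, initial_infected
    while I > 0:
        infection_rate = beta * S / N  # per infected individual
        if rng.random() < infection_rate / (infection_rate + gamma):
            S -= 1
            I += 1
        else:
            I -= 1
    return N - S  # includes the initially infected

def sample_final_sizes(runs, N, beta, gamma, seed=0):
    """Repeat the simulation with a seeded generator so results are reproducible."""
    rng = random.Random(seed)
    return [final_size(N, beta, gamma, rng) for _ in range(runs)]

# Assumed example: N = 500, beta = 0.5 and gamma = 0.25 per day (beta/gamma = 2).
sizes = sample_final_sizes(runs=1000, N=500, beta=0.5, gamma=0.25)
minor = sum(size < 50 for size in sizes)  # outbreaks affecting under 10% of the population
print(f"minor outbreaks: {minor} of {len(sizes)}")
print(f"mean size of the remaining outbreaks: "
      f"{sum(s for s in sizes if s >= 50) / max(1, len(sizes) - minor):.0f}")
```

In this setting the sampled sizes fall into two groups: outbreaks that die out after a handful of cases, and outbreaks that reach a large part of the population. The cut-off of 10% used to separate them is an arbitrary choice for the illustration.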
Questions of stochastic influences on infectious disease dynamics have been studied in various ways, starting with the Reed-Frost model for a discrete time transmission of infection up to a stochastic version of the SIR model introduced above (Bailey 1975; Becker 1989). Finally, stochastic models have been investigated using simulation techniques also known as Monte Carlo simulations. An important theoretical result from the analysis of stochastic models is the distinction between minor and major outbreaks for infectious diseases with R0 > 1. While in a deterministic model an R0 larger than unity always leads to an outbreak if the infection is introduced into an entirely susceptible population, in a stochastic model a certain fraction of introductions remain minor outbreaks with only a few secondary infections. This leads to a bimodal probability distribution of the final epidemic size following the introduction of one infectious index case. The peak for small outbreak sizes describes the situation that the infection dies out after only a few secondary infections; the peak for large outbreak sizes describes those outbreaks that take off and affect a large part of the population. The larger the basic reproduction number, the larger the fraction of major outbreaks in the susceptible population (Andersson and Britton 2000).

Some aspects of contact between individuals cannot easily be modeled in compartmental models. In the context of the spread of sexually transmitted diseases, models were developed that take the duration of partnerships into account, the so-called pair formation models (Hadeler et al. 1988). Extending those models to also include simultaneous long-term partnerships leads to the class of network models, where the network of contacts is described by a graph with nodes representing individuals and links representing their contacts (Keeling and Eames 2005).
Different network structural properties have been related to the speed of spread of an epidemic through the population. In the so-called small world networks, most contacts between individuals are local, but some long-distance contacts ensure a rapid global spread of an epidemic (Watts and Strogatz 1998). Long-distance spread of infections is becoming increasingly important in a globalizing world with increasing mobility, as the example of the SARS epidemic in 2003 demonstrated. Recently the concept of scale-free networks, where the number of links per node follows a power law distribution (i.e., the probability for a node to have k links is proportional to k^(-γ) with a positive constant γ), was discussed in relation to the spread of epidemics. With respect to the spread of sexually transmitted diseases, a network structure where some individuals have very many partners while the majority of people have only few might lead to great difficulties in controlling the disease by intervention (Liljeros et al. 2001). Network concepts have also been applied to study the spread of respiratory diseases (Meyers et al. 2003).

Mathematical models have been widely used to assess the effectiveness of vaccination strategies, to determine the best vaccination ages and target groups, and to estimate the effort needed to eliminate an infection from the population. More recently, mathematical modeling has supported contingency planning in preparation for a possible attack with smallpox virus (Ferguson et al. 2003) and in planning the public health response to an outbreak with a pandemic strain of influenza A (Ferguson et al. 2006). Other types of intervention measures have also been evaluated, such as screening for asymptomatic infection with Chlamydia trachomatis (Kretzschmar et al. 2001), contact tracing (Eames and Keeling 2003), and antiviral treatment in the case of HIV.
In the field of nosocomial infections and transmission of antibiotic-resistant pathogens, modeling has been used to compare hospital-specific interventions such as cohorting of health workers, increased hygiene, and isolation of colonized patients (Grundmann and Hellriegel 2006). In health economic evaluations it has been recognized that dynamic transmission models are a prerequisite for conducting good cost-effectiveness analyses for infectious disease control (Edmunds et al. 1999).

It is a large step from developing mathematical theory for the dynamics of infectious diseases to application in a concrete public health-relevant situation. The latter requires an intensive focus on relevant data sources and on clinical and microbiological knowledge in order to decide how to design an appropriate model. Appropriate here means that the model uses the knowledge available, is able to answer the questions that are asked by policy makers, and is sufficiently simple so that its dynamics can be understood and interpreted. In the future it will be important to strengthen the link between advanced statistical methodology and mathematical modeling in order to further improve the performance of modeling as a public health tool.

One of the first comprehensive texts on epidemic modeling is Bailey (1975). Bailey treats both deterministic and stochastic models and links them to data. A more recent, but also classic text for infectious disease modeling is Anderson and May (1991); however, it deals mainly with deterministic unstructured models. Its strength is a good link with data and discussion of public health relevant questions. In Diekmann and Heesterbeek (2000) the mathematical theory of deterministic modeling is laid out with many exercises for the reader. A focus of the book is the incorporation of population heterogeneity into epidemic modeling and a generalization of the basic reproduction number to heterogeneous populations.
In Andersson and Britton (2000) an introduction to stochastic epidemic modeling is given. Becker (1989) describes advanced statistical methods for the analysis of infectious disease data taking the specific characteristics of these data into account. A recent text incorporating case studies from applications of epidemic modeling was published by Keeling and Rohani (2007).

Stochastic epidemic models and their statistical analysis
Estimating in real time the efficacy of measures to control emerging communicable diseases
On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations
Bernoulli was ahead of modern epidemiology
Contact tracing and disease control
Evaluating the cost-effectiveness of vaccination programmes: a dynamic perspective
Strategies for mitigating an influenza pandemic
Planning for smallpox outbreaks
Mathematical modelling: a tool for hospital infection control
Models for pair formation in bisexual populations
Epidemic disease in England: the evidence of variability and persistency of type
Gonorrhea transmission dynamics and control
Modeling infectious diseases in humans and animals
Contributions to the mathematical theory of epidemics, II. The problem of endemicity
Contributions to the mathematical theory of epidemics, I. 1927
Contributions to the mathematical theory of epidemics, III. Further studies of the problem of endemicity
Sexual partner selectiveness effects on homosexual HIV transmission dynamics
Comparative model-based analysis of screening programs for Chlamydia trachomatis infections
The web of human sexual contacts
Applying network theory to epidemics: control measures for Mycoplasma pneumoniae outbreaks
Transmissibility of 1918 pandemic influenza
No measles in Finland
Opposite patterns of synchrony in sympatric disease metapopulations
Collective dynamics of small-world networks
Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures