key: cord-0015267-xm5a816o
authors: Kirkeby, Carsten; Brookes, Victoria J.; Ward, Michael P.; Dürr, Salome; Halasa, Tariq
title: A Practical Introduction to Mechanistic Modeling of Disease Transmission in Veterinary Science
date: 2021-01-26
journal: Front Vet Sci
DOI: 10.3389/fvets.2020.546651
sha: 2376bc1361e6d5913017d2397bbb78a6bf149846
doc_id: 15267
cord_uid: xm5a816o

Computer-based disease spread models are frequently used in veterinary science to simulate disease spread. They are used to predict the impacts of the disease, plan and assess surveillance, or control strategies, and provide insights about disease causation by comparing model outputs with real life data. There are many types of disease spread models, and here we present and describe the implementation of a particular type: individual-based models. Our aim is to provide a practical introduction to building individual-based disease spread models. We also introduce code examples with the goal to make these techniques more accessible to those who are new to the field. We describe the important steps in building such models before, during and after the programming stage, including model verification (to ensure that the model does what was intended), validation (to investigate whether the model results reflect the modeled system), and convergence analysis (to ensure models of endemic diseases are stable before outputs are collected). We also describe how sensitivity analysis can be used to assess the potential impact of uncertainty about model parameters. Finally, we provide an overview of some interesting recent developments in the field of disease spread models.

INTRODUCTION A disease spread model is a simplified representation of a real-life system of disease transmission. As defined by Lessler and Cummings (1) , disease spread models (also known as mechanistic models of disease spread) include explicit hypotheses of the biological mechanisms that drive infection dynamics. Therefore, they differ from statistical models such as regression models. Disease spread models are motivated by a need to better understand the transmission dynamics of a disease, predict the spread of the disease in a population and its effects, and study how the spread can be influenced (including the evaluation of different strategies to improve surveillance and control of diseases). The quote, "all models are wrong, but some are useful, " (2) is often stated because disease spread models are simpler than reality, but they generate information which is otherwise difficult to obtain (3) . For example, experiments on disease transmission and control might insufficiently represent real-life disease ecology, or not be feasible due to high resource requirements (such as time and monetary costs), or logistical and ethical constraints. In addition, observational studies of disease spread might not provide comparisons of the relevant control strategies, or not occur in the population of interest (e.g., transboundary diseases).

Models of disease transmission can represent diverse diseases, including bacterial and viral infections, as well as parasites and vector-borne diseases, in a range of host populations and environments, and at different scales (4) . Disease spread models might identify critical elements and knowledge gaps by reconstructing a system using available knowledge (5) . They can also be useful decision-making tools by simulating surveillance or control of a specific disease and comparing strategies in specific contexts, such as outbreak situations (6, 7) . Models have also been used to inform outbreak preparedness [e.g., (8, 9) ], and the control of endemic pathogens [e.g., (10) (11) (12) (13) ].

Here, we focus on modeling the spread of infectious diseases of animals in a range of contexts. The methods described are not unique to veterinary systems and are used in other disciplines such as ecology and human health. In particular, we focus on a class of model called individual-based models (IBMs). Mancy et al. (4) provide an in-depth discussion of the different motivations for developing disease spread models in ecology and animal health. They present a conceptual framework to guide model construction, focusing on the pre-modeling stage (model selection, establishing, and testing the theory). In building on Mancy et al. (4) our objectives are 3-fold; (1) to provide a practical introductory guide to the process of developing a mechanistic model of animal disease transmission using IBMs, aimed at researchers beginning in this field; (2) to describe important concepts before, during and after the programming stage of developing model of animal disease transmission; and (3) to provide practical examples of models, including code, in veterinary science. Thus, we provide a hands-on introduction to model building, and its use and challenges, for scientists starting to work on disease spread models.

Before we describe the steps of model building in the context of IBMs, we briefly describe some key terms, concepts, and approaches applied in disease spread modeling. Terminology in this field can be inconsistent; for a list of terms and definitions used throughout this guide, see Appendix 1.

Disease spread models simulate the transmission of an infectious disease between the disease hosts, who are modeled as units of interest. This unit is the smallest entity of the model and could be an individual animal (or part of it; for example, a quarter of the udder in a mastitis model), a group of animals, herds, or populations in regions or countries. The units of interest can be aggregated and modeled as proportions of the total population in each disease state (see below) at a given time, or modeled as individuals whose disease status is tracked through the disease states included in the model.

The simulated system includes time, making the model dynamic. Time can be modeled as a continuous or discrete process. In the latter a fixed time-interval is chosen and the model steps through each consecutive interval (time-step) and updates the numbers of units of interest in each disease state from the beginning to the end of the simulated period (for example, every day, for a year) or until the disease fades out. In contrast, if time is modeled as a continuous process, the rate of change in the relative numbers of units of interest in each disease state in the system is continuously modeled using differential equations.

For discrete time models, the length of a time step is designated by the modeler and depends on the disease dynamics, purpose of the model (for example, predictions in monthly timesteps might be useful for surveillance or disease control), the availability of data needed to parameterize the model (outbreak data might only be available on a yearly scale), and the time spent by an individual unit of interest in each disease state of the model (see below). Whilst daily time-steps are typical for most discrete disease-spread models (11) , weekly (14) or biweekly [e.g., (15) ], biannual (16) , or even yearly time steps can be used [for example, when simulating long duration control programs, such as (13)].

A model can be deterministic or stochastic. A model is stochastic when there is variation in model outputs arising from the use of distributions to describe input parameters (rather than fixed values), or by allowing model events to occur as random processes (inherent stochasticity). See section "Modeling Disease Transmission" for illustration of the difference between deterministic and stochastic. The outputs from a stochastic model will vary every time the model is run. In contrast, outputs from deterministic models are consistent each time the model is run.

Disease spread models represent the dynamics of infection, or progression of the modeled units of interest through disease states, for instance Susceptible (S), Infectious (I), and Recovered (R) states (an SIR model). In a susceptible state, a unit of interest has yet to be exposed to an infectious individual and infected (termed "effective contact"). Once effective contact has occurred, an individual is in an infectious state prior to transition to a recovered state (or death). This basic formulation can be expanded with other disease states; for example, an Exposed (E) state representing the latent period of the infection can occur prior to transitioning to the Infectious (I) state [for example, within-herd spread of FMD; (17) ]. The modeled states are dependent on the natural history of the disease, the purpose and scale of the model, and the resolution of available data. For example, differentiation of clinical and subclinical infectious states can be included if the subclinical state is considered significant to spread given the scale of the model, or if clinical detection of the disease is an essential aspect in the model. In a model of rabies spread, the preinfectious period of rabies was considered essential to include in a model in which the dog populations were small (18) , and not considered necessary in a similar but larger-scale model of rabies spread in dog populations in Chad (19) . We illustrate how the dynamics of infection as modeled in an SEIR model relate to the dynamics of disease (the observed states) in Figure 1 .

The way in which the units of interest contact each other, or how they "mix, " is a core component of a disease model. Homogeneous contact means that all the units have equal probability of contact with each other (no clustering). Heterogeneous contact means that the probability of contact between units of interest is not equal, hence clustering (spatial or related to other contact characteristics) exists in the population. Heterogeneous contact can be modeled by stratifying models into population groups (for example, by age or farm type), modeling contacts between units of interest according to a network structure, or modeling specific characteristics of units that influence contact [for example, furious rabies in dogs; (18) ].

Since Kermack and McKendrick first formulated the basic compartmental equation-based SIR model using differential equations in 1927 (20) , numerous approaches to modeling disease transmission have been developed. For a comprehensive description of modeling approaches, see Mancy et al. (4) . Briefly, models can be classified according to how the disease hosts are modeled (as individual units of interest, or as groups in which the proportion of units of interest in disease states are followed) and how contact occurs (the connectivity between units), then further differentiated on how time is modeled (discrete or continuous) and whether stochasticity is included.

Here, we focus on individual-based models [IBMs, or Individual-level models; Mahsin et al. (21) ] in which individual units of interest are described and followed through the disease states. The units of interest in IBMs represent discrete entities (such as individuals or herds) and time steps are discrete.

An advantage of IBMs is that units of interest can be assigned their own properties that can influence disease transmission, detection or control. They are therefore useful to simulate heterogeneity in disease transmission between the units of interest. For example, in a model of foot-and-mouth disease (FMD), an individual herd might be predominantly either sheep or cattle, which might influence disease susceptibility and transmission at the herd level (22, 23) . Agent-based models (ABMs) are a subset of IBMs in which contact-and hence disease transmission-is simulated between explicit pairs of individual units of interest. ABMs often include explicit movement of-and therefore, contacts between-individual units of interest, thus introducing contact heterogeneity in the population due to spatial variation (24). In an example in which rabies transmission was modeled, individual dogs were assigned specific roaming characteristics that influenced their contacts with other dogs (25) . In a further example, heterogeneity of contacts between individuals was assigned using individuals' social network parameters (18, 19) . Consequently, these models can have a high level of complexity, but also be computationally intensive (and consequently, relatively slow to implement and simulate).

If the unit of interest in an IBM is a group of individual animals (for example, herds), within-group disease spread can be modeled using an equation-based model with proportions of the unit of interest in disease-state compartments. In this case, specific individuals are not tracked. Such models are called nested models in ecological modeling (26) .

Model building can be divided into three stages: preprogramming, programming and post-programming. These stages are common to all model types, and include different elements that should be considered (Figure 2) . We describe the concepts associated with each stage in detail below (labeled according to Figure 2 ).

In Appendix 2 (and https://github.com/ckirkeby/MDT), code examples are shown. We include code for a difference equation model, and a differential equation model (two model types not addressed in this article, but to enable the readers to compare the inputs and outputs with IBMs), and IBMs, for which we include examples of an individual-based stochastic model (at herd level), and an individual-based stochastic model (at individual animal level). We link the code for IBMs with each stage below.

When designing a model, it is important to consider the research question to be investigated. This not only drives the type of model that might be appropriate, but also dictates the model outputs required by the end-user (27) .

For example, whilst a model generally estimates the epidemiological consequences of the disease in terms of the number of infected individuals and epidemic duration, in the case of exotic diseases, the outputs could also be needed for contingency planning to improve surveillance and control; for example, identifying sentinel herds, culling capacity, or laboratory capacity [for example, (28, 29) ]. In this case, it is essential to generate capacity-related data, such as the number of surveillance teams required, by including these parameters in the model. Similarly, if the purpose is to compare different surveillance strategies, sensitivity and specificity of tests used to detect disease need to be included (30, 31) .

Evaluation and identification of optimal control strategies given a particular set of circumstances and constraints might also be a goal [for example (12, 17, 23, 32) ]. This would require policy-specific knowledge to inform model processes, as well as data and knowledge of mechanisms to simulate control strategies. For example, to simulate vaccination, estimates of vaccinationspecific parameters such as the number of individuals or herds vaccinated per day, vaccine efficacy, time required to order vaccine and perform vaccination could be included (9, 32) . In addition to epidemiological metrics, the optimal control strategies could be defined according to economic outputs (33) such as in a bio-economic disease spread model [for example, (11) ].

In the context of an IBM, the minimum inputs that must be included are a parameter to describe disease transmission (β; see later), and the number of individuals in each disease state. This will include at least one infectious individual as well as susceptible individuals (see code example, Appendix 2; https://github.com/ ckirkeby/MDT). Additional parameters, such as the number of surveillance teams deployed, can be included as the model steps through the discrete time intervals; for example, in response to trigger levels such as a threshold number of infected animals for disease detection.

The largest unit of interest is selected so the disease spread model sufficiently represents the true system. As described previously, this epidemiological unit of the model can range from individuals [e.g., (16) ] or their parts [e.g., (12) ] to sub-or entire populations (34) .

The choice of epidemiologic unit of interest is highly dependent on the purpose of the model, the disease of concern and the data available to parameterize the model. In models in which disease spread needs to be captured at the individual animal level (for example, because disease detection or control is performed at this level), individual animals are modeled and followed. In the case of modeling the spread of an exotic disease in animals aggregated in herds, the herd might be a more realistic unit to model, because surveillance and decisions occur at the herd-level.

Practical programming considerations also influence the choice of this unit of interest. For example, it is more likely that individual animals as units of interest are computationally more challenging, and therefore, herds are often mire suitable to be the epidemiologic unit of interest (see also Section Programming stage). In some systems, there might be more than one unit of interest to be modeled, as in the case of vector-borne diseasesboth the vector and the animal can be units of interest (35) .

In Appendix 2 we provide code examples of IBMs using different units of interest (also available online at https://github. com/ckirkeby/MDT).

To create a model that is a sufficient representation of a real-life system, decisions need to be made about which known processes to include and exclude. This decision is bound to available information on the system. Such information is important to gather prior to model building to assess the level of uncertainty that is due to limited knowledge, how much data about the system is available, and the feasibility of delivering requested outputs. If essential data are missing to fulfill the designated purpose, options include collecting more data before modeling is initiated, re-specifying model complexity, or re-evaluating the model purpose. Following the principle of parsimony, a model should only be as complex as necessary to achieve the model purpose, thereby requiring the minimum number of assumptions (36) .

Processes that should be considered include the population dynamics of the unit of interest (birth and death rate, and lifespan-this is usually based on age, or in the case of a livestock production system, this could be parity), migration of individual units in and out of the system, the contact patterns between the units and the production system of the modeled population (for example, milk or beef production), if this is relevant. It also includes knowledge of the epidemiology of the disease to be modeled, such as the relevant disease states and their durations, the modes of transmission of the causative pathogen (for example, whether or not airborne spread is an essential pathway of transmission) and how the disease develops in the individuals.

Model specification (units of interest, disease, and system dynamics and how they are modeled-for example, discrete vs. continuous time and deterministic vs. stochastic) is typically an iterative process and is re-examined as data gathering for parameterization occurs (Figure 2 , section Documentation and Communication). If data about population dynamics, disease dynamics and the system in which disease occurs are available at an individual level, and modeling at this level of detail and heterogeneity is considered valuable (for example, if the population is small or heterogeneity of the system is considered an important feature of disease transmission), an IBM is likely suitable. Otherwise, other model types can be considered (4).

In Figure 3 we show the difference in output between a deterministic and a stochastic model. 

Programming languages can be classified in many ways-such as whether interpreted directly or compiled (running one single line of code at a time, rather than all the code has to be run together; for example, Python and R vs. C++ and Fortran, respectively); and whether they are "high" or "low" level languages. This latter classification refers to the machine-readability of the language; many languages used in the context of disease modeling can be considered high-level (for example, Java, C++, R, and Python).

In general, programs written using high-level languages require more memory space but are more readable by a human, and therefore more accessible to people without detailed programming knowledge. Programs written using low-level languages (e.g., Assembly language) can better utilize hardware specific features. These programs require a high level of knowledge to write and maintain. They can be hardware-dependent making them less portable between computer architectures.

Features resulting from language classification are not always exclusive; with many factors affecting the overall performance and efficiency of a program. For example, a complex "realworld" program written in a more user-friendly and high-level language with a modern optimizing compiler can produce highly efficient machine code with excellent performance. The result is likely to outperform an equivalent program hand-written in the less user-friendly, low-level Assembly language converted to machine code via an assembler. Advances in computational power and improvements in system architecture enable the horizontal scaling of models by running processes in parallel across multiple cores to reduce "wall time" (the time taken to complete a simulation).

Focusing on final run speed also ignores the concept of overall programming productivity. Programming in some languages is more challenging and less accessible to the research team, which increases the time required for programming. An increasing number of researchers use the free software R (37), which is a statistical programming language suitable for building many model types, including equation-based [for example, (38) ] and individual-based models [for example, (11, 32) ]. There are many packages available for languages such as R, and they are wellsupported and maintained by R's open-source community, which allows the team to focus on modeling the system and the disease.

In regards to code programming, we highly recommend that modelers annotate their code during modeling with detailed descriptions of each part of the code. For a description of good practice in animal health modeling, see EFSA (39). Annotation assists the modeler to remember the function of each line of code, and also facilitates use of the model by others. Following publication of a study, it is a requirement of many journals that the code be made available to readers. Version control such as git (https://git-scm.com, accessed 10/09/2019) is a very valuable tool so that modelers can easily track changes in the code, and view previous versions (branches) of the model. This is of particular value when more than one modeler is involved in the project or when published code is used by other researchers. Locally, version control can be as simple as saving the script in a new file named with the specific day it is changed. We also highly recommend that during the programming process, each line or chunk of co-de should be executed with fictitious inputs to check for errors (debugging). This is part of the model verification (see section Model Verification and Validation for more details).

Initially when constructing an IBM, the host population dynamics are modeled as the "background" for the disease dynamics. For example, a model of canine rabies spread requires a population of dogs or a foot-and-mouth disease model the population of cloven-hoofed animals. An understanding of the population of interest's demographics are critical. Whilst demographic data for livestock populations can often be gained from government or industry sources, it might be necessary to conduct studies of other populations (such as companion animals) prior to modeling to for example determine age structure and birth and death rates (40) .

The population dynamics are linked to the disease model; for example, newborns can be susceptible, infected or immune (see section Modeling disease transmission). Also, characteristics can be allocated to the units of interest in case they influence disease transmission. In an example of Johne's disease (paratuberculosis) transmission, individual cattle or herds are modeled, and characteristics, such as individuals' milk production and lactation duration, are included because these characteristics influence disease spread [e.g., (11, 41) ].

In disease spread models, it can be important to include a spatial component to the population to allow spatio-temporal modeling of disease transmission (see section Modeling disease transmission). This can be realized by using geolocations of the units of interest, e.g., farms, as a feature of the population structure [e.g., (17, 42) ]. Spatio-temporal modeling could also represent population structures other than farms, as in the case of modeling spatio-temporal distributions of vectors that transmit bluetongue virus (43) , or in the location of dog's residence in a rabies transmission model (44) .

Once the background structure of the disease dynamic system has been modeled, it should be verified and tested (see sections Model verification and validation) before disease transmission is added to the model. This is to ensure that the model simulates the system with sufficient accuracy, as well as to determine computing requirements such as the number of iterations required for burn-in (see section Modeling disease transmission).

As discussed previously, each stage of disease in the transmission model should reflect a -state during the course of infection in the modeled system. In the simplest framework, an SI model with two, mutually exclusive disease states; Susceptible (S) and Infectious (I), all individuals in the model are assigned to either S or I (see code examples in Appendix 2; https://github.com/ ckirkeby/MDT). For each simulated time step, each individual has a probability of acquiring infection and thus transitioning from S to I, depending on the contact pattern between individuals and the disease transmission rate given a contact. In the case of the SI model, there is no probability of individuals returning to the S state. In the case that animals can recover from the disease, the model becomes an SIS model in which infectious individuals return to the S state. The transmission from I to S is quantified by the recovery rate (see below, in the context of an SIR model), which can be influenced by self-recovery or by treatment. The recovery rate is thus a probability of recovering during each time step. Recovery rates must be estimated from epidemiological studies on the duration of infection. This duration of infection can either be modeled as a fixed timespan, i.e., a fixed number of days can be assigned to it, or as a distribution, after which it will revert to the S state.

Another common framework is the SIR model (see the code example in Appendix 2, https://github.com/ckirkeby/MDT), in which the infectious individuals can enter the Recovered (R) state -which represents either "recovery" (and resistance to infection) or "removal" from the population; for example, in the case of a rabies model, infected dogs always die and therefore are removed. The transition from I to R is also modeled via a recovery rate (denoted as "r" in the code example). Following this logic, the disease transmission framework can be further extended dependent on the disease; for example, by introducing an Exposed (E) state for latently infected individuals before progressing to the I state. As previously mentioned, even if some disease states occur in reality, it is not always useful or necessary to represent them in the model.

In the case of modeling endemic diseases, once the population and disease dynamics frameworks are modeled, an IBM might need to be simulated for enough time steps to reach a stable prevalence ("burn-in" period; the number of time steps for the population characteristics and the disease prevalence to stabilize). When such a model is used to assess control strategies, these strategies are usually implemented after the burn-in period, when a stable situation has been reached.

The process of disease transmission is the core dynamic process in the model. Generally, transmission can be considered as either direct (from host to host) or indirect, for example via the environment or vector transmitted (45) . It can also be dependent on model features that increase contact heterogeneity; for example, some models are spatially explicit and the probability of transmission varies according to distance, mimicking a system in which transmission varies with spatial location (46) .

Since disease transmission is the core process in a disease transmission model, we guide the reader through the foundation of this in the context of an IBM, such as those shown in code in Appendix 2 (https://github.com/ckirkeby/MDT). In the case of direct transmission, we first describe β, a parameter that underpins the modeling of disease transmission in equation based models, and then we describe how this parameter can be used in IBMs (47) . Beta is defined as the per capita rate at which two specific individuals come into effective contact per unit time [sometimes called the transmission rate; Vynnycky and White (48) ]. An effective contact is one which is sufficient for disease transmission to occur. This effective contact rate, β, comprises a contact rate between individuals (C), and the probability of transmission per contact (P):

The contact rate C in the above equation is defined per unit time, and is fundamentally different between density-dependent or frequency-dependent transmitted diseases (49) (50) (51) . In densitydependent transmission, the greater the density of individuals, the greater the probability of contact per unit time (52) :

where dI/dt is the rate of new infections per unit time t, β is the effective contact rate, and S and I are the number of susceptible and infected individuals, respectively.

In frequency-dependent transmission, the rate of new infections per unit time, dI/dt, is independent of the density of individuals in the population (N):

where S and I are the same as in Equation 2, but β ′ is not equivalent to β in Equation 2 due to the underlying difference between the contact rates (C) of these two types of transmission. The difference between these two types of transmission is demonstrated in a study of mange in a fox population in the UK, in which researchers compared density and frequency dependent transmission and found that mange transmission was most likely frequency dependent in this population (53) .

As an example of a method to allow a random process of becoming infected that can be used at each time step in an IBM, we extend Equation 3 to calculate a probability of infection per susceptible individual, P(S), so each individual can be separately subjected to a Bernoulli process of becoming infected (54):

with the same notation as for Equation 2 , and N is the total number of individuals in the modeled population. If β is fixed, then the probability of infection for all susceptible individuals is the same (for all individuals and all simulated time), and assumes homogeneity of transmission in the population. In IBMs, β may vary from one individual to another representing the susceptibility and infectiousness of the individual, thus representing natural heterogeneity in transmission. This could be driven by a lower probability of infection as a result of, for instance, vaccination or due to different contact rates between individuals. The R code examples demonstrate this type of transmission in Appendix 2 (https://github.com/ckirkeby/MDT). In this way, the infection pressure is scaled to the proportion of the population that are infected within each time step, i.e., I changes over time, whereas β and N (within a closed system) remain constant. The infection process is dynamic because the P(S) changes over time with changing numbers of I in the population (assuming a fixed N and β).

As mentioned at the start of this section, it is possible to consider the spatial structure of the underlying demography and define the probability of effective contact per time step for a susceptible unit of interest dependent on its distance from infectious units in the model. For this approach, distance kernels can be built from which the probability of effective contact can be drawn (such as used in 8, 23) . This spatially dependent contact rate can be combined with information on the frequency of contacts between units of interest. For example, the frequency of potential contacts between herds may not only depend on the distance between them, but also on the frequency of movements between herds, which in turn may depend on the herd types (55, 56) .

When appropriate knowledge and data are available, the contact structure of a population can be based on a social network (18, 57) . A heterogeneous herd contact structure between groups of animals (for example, calves and heifers) and homogenous contacts within animal groups might also be described (11, 12) .

There are also several ways to simulate indirect (environmental) disease transmission. It can be similarly spatially dependent as described for the direct transmission, or simulated as a fixed transmission probability:

Here, P(S) is the probability of infection of a susceptible individual S, and β i is the indirect disease transmission rate. This fixed transmission rate can be based on a stable baseline infection pressure, or more variable, such as bacteria from infected individuals shed over time in the environment (11) . When disease transmission occurs through both direct and indirect contacts, a combination of both of these direct and indirect pathways can be used (12) .

In Figure 4 we show an example of an SI model in which the transmission rate, β, is varied.

Model verification and validation is essential to ensure that model concepts, programming and outputs are reliable, accurate, and representative for the modeled system (27, 58) . Model verification ensures that model code and the conceptual framework are implemented correctly. Verification is also called computerized model verification, internal validation, or conceptual validation (58) . Several methods can be used for model verification, including: (1) The rationalism method, in which several scenarios are simulated with different inputs, and outputs are compared to determine whether the changes in outputs are rational given the changes in the inputs (sensitivity analysis, see below); (2) The tracing method, in which individuals or other units of interest are followed through the different time steps and checked that they behave as expected; and (3) The face validitation method, in which an expert is asked to evaluate the outputs or even the code to verify the credibility of the model.

Model validation (also called external or operational validation) ensures that the model predictions have a satisfactory range of accuracy in relation to the actual behavior of the modeled system in real life (adapted from 54). Real-life data (i.e., empirical outbreak data) is needed to fully execute this process. To our knowledge, few models in veterinary science have been externally validated (59) (60) (61) . This is usually due to the high associated costs or ethics of obtaining such data, and the complexity of the modeled systems. If empirical outbreak data are lacking from the setting in which the model was built and applied-such as in the case of exotic diseases and regions with historical disease freedom-then validation options might include either adapting the model to a region where data are available, or using previous outbreak data. For example, historical data from the last Swiss FMD outbreak was used to validate a current FMD model for Switzerland (61) . In the two upper plots, β is higher than the two lower plots. This results in higher number of infected than susceptible at day 12 in the upper scenario compared to the lower scenario. In the histograms on the right, the resulting distributions of susceptible (dark gray) and infected (light gray) individuals are shown. Note that β is not kept constant, but varied for each iteration, incorporating uncertainty around this key parameter.

Convergence analysis assesses the repeatability of the outputs based on the number of iterations (repetitions) the model is simulated, and is conducted before final model simulations. Above a given threshold of simulations, the output statistics should be independent of the number of model iterations. This stability can be checked by ensuring that the variance of the outputs of interest (for example, the number of infected individuals or epidemic duration) is stable. A commonly used approach is to visualize the change in the variance when increasing the number of iterations (62), or to use thresholds of the coefficient of variance as a decision metric (9, 18, 63) .

We have included an example of how to determine convergence of a model in Appendix 2 (https://github.com/ ckirkeby/MDT).

Sensitivity analysis is essential to understand and examine the robustness of model predictions to changes in input parameter values, model structure and processes (64) . Sensitivity analysis can be used to identify parameters and processes that have a major influence on model predictions; therefore, the values of these parameters-and the way in which processes are modeledmust be certain enough to produce model predictions acceptable to the end-user.

During sensitivity analysis, the behavior of the model and the outputs of interest are examined when the model or its parameters are varied. There are different ways to approach sensitivity analysis. Sensitivity analysis of input parameters can be assessed by changing input values within a specified range (local sensitivity analysis) or the entire parameter space (global sensitivity analysis) to examine the impact of these changes on model outputs. The influence of parameters can also be examined singly (one-at-a-time sensitivity analysis) or in combination with other parameters [for example, a "Sobol" sensitivity analysis, (65) ]. Sensitivity analysis can also be implemented by modeling a specific process in alternative ways to examine the impact of this process on model predictions (this is sometimes referred to as structural sensitivity analysis).

The simplest method of sensitivity analysis of input parameters is one-at-a-time perturbations (66). However, this does not allow assessment of the sensitivity of the model output to changes in combinations of other parameter values' change. Many more methods exist and have been used in the context of IBMs (10, 66, 67); a complete review is beyond the scope of this article.

We have included code in Appendix 2 to conduct a simple sensitivity analysis on a model parameter (also available online at https://github.com/ckirkeby/MDT).

Presentation of clear results that deliver project requirements is an important element for transparent communication of the model outputs. This should already be reflected and incorporated during the design stage. Deterministic models provide single value outputs (without variation), whereas stochastic models provide distributions of outputs. Thus, when results from stochastic models are presented, it is essential to not only show median or mean values, but also the variation around these values; for example, using boxplots or histograms. From a disease spread model, outputs usually include the number of infected units of interest and the epidemic duration. Other outputs can also include the number of units of interest under control (culled, vaccinated, or banned in movements), economic metrics in case of a bio-economic model, predicted changes in production (such as milk yield or growth rates), or maps from spatiallyexplicit models.

Good documentation is essential to enable reproducibility of the model, communication of model outcomes, and comparison between different models. Standardized protocols for disease spread model documentation have been developed, such as the ODD (Overview, Design concepts, and Details) (68) and TRACE (69) and can be used to communicate models in scientific publications.

At all stages of model design, development and implementation, communication should be maintained with relevant stakeholders. These will include the end-users of the model, but can also include experts for the specific disease and system modeled, and those that are funding model development and implementation. Comprehensive communication at all stages ensures that the model focus remains on the defined purpose so that useful information is provided to the end-users, or that the end-user can adapt the model according to specific needs during the modeling process.

Recent developments in disease spread models used in veterinary science include the development of models that model more than one disease. Mostert et al. (70) present a bio-economic stochastic dynamic model that simulates subclinical and clinical ketosis, mastitis, metritis, displaced abomasum, and lameness in dairy cattle. In intense production systems, such as in the dairy sector, it is an advantage to evaluate the impact of several diseases concurrently, to optimize management strategies. Inclusion of economic impacts and the economics of disease mitigation in these models facilitates broader use, in addition to improving animal welfare.

Many populations can also be captured in one model. One example is the trend for models of vector-borne diseases (which we have not covered here, and introduces at least one more population, the vector, into the model).

Ensemble modeling is a relatively new approach in veterinary epidemiology (71) . Decisions on how to respond to an incursion of FMD virus in a previously disease-free country are complex and several models of FMD spread have been developed and applied. These vary in their disease processes modeled, assumptions made and parameterization. For any set of inputs, outputs from these various models are plausible. Variability in model outputs can be valuable because these are likely to include the range of realizations that could be observed during an FMD outbreak. A method of reconciling variabilityborrowed from fields such as meteorology, climate-change science and medical science-has recently been applied to this situation. Using outputs from six different models which simulated the spread of FMD in the Midlands and Wales areas of the United Kingdom in 2001, Webb et al. (71) applied a Bayesian Reliability Ensemble Average (BREA) method to integrate outputs regarding outbreak duration and two control methods. The BREA method determines the weights applied to each model output based on agreement with observed data (bias criterion) and consensus between models (convergence criterion). The latter was used by Webb et al. (71) and their case study highlights the potential of ensemble modeling to reduce the uncertainty of outputs from individual models, thus improving decision-making.

We emphasize two well-known, key axioms: (1) . disease spread models are simplified representations of real-life systems so that "all models are wrong, but some are useful" (2) , and (2) . model outputs can only be as accurate as model inputs allow.

Model simplification is often driven by data availability; therefore, full use of any available data is recommended. However, when considering whether more data should be collected or how a process should be modeled, we note that highly detailed models (more complex processes with more parameters, such as IBMs) can produce output that might be less generalizable than more simplified models. In addition, the output from more simplified models might adequately predict the essential components of disease transmission needed to achieve the endusers' objectives. This presents modelers with dilemmas: a highly detailed model is not necessarily less "wrong" or more "useful" than a simplified model. Whilst the steps of model verification, validation, and sensitivity analysis can help avoid too much or too little simplification, we recommend that particularly during the design phase, modelers focus on development of the simplest model to achieve useful output-whilst we focus on an introduction to modeling using IBMs, we do not suggest that they are the foundation of modeling approaches.

Communication between end-users and modelers about the value and assumptions of a model is critical. We therefore recommend that modelers and end-users, wherever possible, establish a framework for communication about modeling objectives, the need for verification, validation, and sensitivity analysis, and application of model outputs to ensure optimal use of simulation modeling, to improve animal health, welfare, and production.

The original contributions generated for the study are included in the article/Supplementary Materials, further inquiries can be directed to the corresponding author/s.

Mechanistic models of infectious disease and their impact on public health

Science and statistics

Wrong but useful-what Covid-19 epidemiologic models can and cannot tell us

An integrated framework for process-driven model construction in disease ecology and animal health. F Vet Sci

Reviewing model application to support animal health decision making

Transmission dynamics and economics of rabies control in dogs and humans in an African city

Eradicating BVD, reviewing Irish programme data and model predictions to support prospective decision making

Simulation of transmission and persistence of African swine fever in wild boar in Denmark

Development of a novel rabies simulation model for application in a non-endemic environment

Saltelli global sensitivity analysis and simulation modelling to identify intervention strategies to reduce the prevalence of Escherichia coli o157 contaminated beef carcasses

Simulating the epidemiological and economic impact of paratuberculosis control actions in dairy cattle

A strain-, cow-, and herd-specific bio-economic simulation model of intramammary infections in dairy cattle herds

Epidemiological and economic evaluation of alternative on-farm management scenarios for ovine footrot in Switzerland. Front Vet Sci

A stochastic model simulating paratuberculosis in a dairy herd

A simulation model to calculate costs and benefits of dry period interventions in dairy cattle

A simulation of Johne's disease control

Vaccination against foot-and-mouth disease I: epidemiological consequences

Rabies-induced behavioural changes are key to rabies persistence in dog populations: investigation using a network-based model

The importance of dog population contact network structures in rabies transmission

A contribution to the mathematical theory of epidemics

Geographically-dependent individuallevel models for infectious diseases transmission

Parameterization of the duration of infection stages of serotype O foot-and-mouth disease virus: an analytical review and meta-analysis with application to simulation models

Simulation of foot-andmouth disease spread within an integrated livestock system in Texas, USA

Using roaming behaviours of dogs to estimate contact rates: the predicted effect on rabies spread

Linking within-and between-host dynamics in the evolutionary epidemiology of infectious diseases

Principles of epidemiological modelling

Estimating resource requirements to staff a response to a medium to large outbreak of foot and mouth disease in Australia

Simulated effects of introducing emergency vaccination or depopulation during FMD outbreaks in Denmark

Sampling considerations for disease surveillance in wildlife populations

The impact of resources for clinical surveillance on the control of a hypothetical foot-and-mouth disease epidemic in Denmark

Evaluation of the benefit of emergency vaccination in a foot-andmouth disease free country with low livestock density

An epidemiological and economic simulation model to evaluate strategies for the control of bovine virus diarrhea in Germany

A stochastic model simulating paratuberculosis in a dairy herd

Simulating spread of bluetongue virus by flying vectors between hosts on pasture

Model relativism: a situational approach to model building

R: A Language and Environment for Statistical Computing, version 3.3. 1. Vienna: R Foundation for Statistical Computing

Bayesian inference in an extended SEIR model with nonparametric disease transmission rate: an application to the Ebola epidemic in Sierra Leone

Guidance on good practice in conducting scientific assessments in animal health using modelling

Demographic studies of owned dogs in the Northern Peninsula Area, Australia, to inform population and disease management strategies

Modelling the spread of bovine viral diarrhea virus (BVDV) in a beef cattle herd and its impact on herd productivity

A spatial simulation model for the dispersal of the bluetongue vector Culicoides brevitarsis in Australia

Modelling targeted rabies vaccination strategies for a domestic dog population with heterogeneous roaming patterns

Infectious Diseases of Humans

Understanding African swine fever infection dynamics in Sardinia using a spatially explicit transmission model in domestic pig farms

Evaluating potential sources of bovine tuberculosis infection in a New Zealand cattle herd

An Introduction to Infectious Disease Modelling

A clarification of transmission terms in host-microparasite models: numbers, densities and areas

Hostparasite population dynamics under combined frequency-and density-dependent transmission

Modeling Infectious Diseases in Humans and Animals

How should pathogen transmission be modeled?

Demonstrating frequency-dependent transmission of sarcoptic mange in red foxes

Mathematical models in infectious disease epidemiology

Results of epidemic simulation modeling to evaluate strategies to control an outbreak of foot-and-mouth disease

The foot-and-mouth epidemic in Great Britain: pattern of spread and impact of interventions

Which mechanisms drive seasonal rabies outbreaks in raccoons? A test using dynamic social network models

Verification and validation of simulation models

Stochastic simulation modeling to determine time to detect bovine viral diarrhea antibodies in bulk tank milk

Vaccination of dogs in an African city interrupts rabies transmission and reduces human exposure

Evidence for emergency vaccination having played a crucial role to control the 1965/66 foot-and-mouth disease outbreak in Switzerland

Simulating the epidemiological and economic effects of an African swine fever epidemic in industrialized swine populations

Controlling disease outbreaks in wildlife using limited culling: modelling classical swine fever incursions in wild pigs in Australia

Sensitivity analysisto identify keyparameters in modelling the spread of bovine viraldiarrhoea virus in a dairy herd

Making best use of model evaluations to compute sensitivity indices

Identification and review of sensitivity analysis methods

The ODD protocol: a review and first update

Towards better modelling and decision support: documenting model development, testing, and analysis using TRACE

Estimating the economic impact of subclinical ketosis in dairy cattle using a dynamic stochastic simulation model

Ensemble modelling and structured decision-making to support emergency disease management

This article was initiated during a special course in simulation modeling held at the University of Sydney in June 2019, funded by the University of Sydney and University of Copenhagen Partnership Collaboration Awards 2019. We thank David Brookes for his advice on information technology.

CK wrote the first draft of the manuscript. All authors participated in writing the manuscript.

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fvets. 2020.546651/full#supplementary-material Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.