key: cord-0000069-pk7pnmlo authors: Hanley, Brian title: An object simulation model for modeling hypothetical disease epidemics – EpiFlex date: 2006-08-23 journal: Theor Biol Med Model DOI: 10.1186/1742-4682-3-32 sha: 61d9a0fe39f4e845c44a06787de6f5f033b998a3 doc_id: 69 cord_uid: pk7pnmlo
BACKGROUND: EpiFlex is a flexible, easy-to-use computer model that runs on a single computer and is intended to be operated by one user, who need not be an expert. Its purpose is to study in silico the epidemic behavior of a wide variety of diseases, both known and theoretical, by simulating their spread at the level of individuals contracting the disease and infecting others. To understand the system fully, this paper must be read in conjunction with study of the software and its results. EpiFlex is evaluated using results from modeling influenza A epidemics and comparing them with a variety of field data sources and other types of modeling. EpiFlex is an object-oriented Monte Carlo system, allocating entities to correspond to individuals, disease vectors, diseases, and the locations that hosts may inhabit. EpiFlex defines eight different contact types available for a disease. Contacts occur inside locations within the model. Populations are composed of demographic groups, each of which has a cycle of movement between locations. Within locations, superspreading is defined by skewing of contact distributions.
RESULTS: EpiFlex indicates three phenomena of interest for public health: (1) R 0 is variable, and the smaller the population, the larger the infected fraction within that population will be; (2) significant compression/synchronization between cities by a factor of roughly 2 occurs between the early incubation phase of a multi-city epidemic and the major manifestation phase; (3) if better true morbidity data were available, more asymptomatic hosts would be seen to spread disease than we currently believe is the case for influenza.
These results suggest that field research to study such phenomena, while expensive, should be worthwhile.
CONCLUSION: Since EpiFlex shows all stages of disease progression, detailed insight into the progress of epidemics is possible. EpiFlex shows the multimodality and apparently random variation characteristic of real-world data, but does so as an emergent property of a carefully constructed model of disease dynamics and is not simply a stochastic system. EpiFlex can provide a better understanding of infectious diseases and strategies for response.
The most commonly used measure in public health, R 0, is estimated from historical data and derived from SIS/SIR-type models (and their descendants) for forward projection [14, 15]. R 0 is the basic reproductive ratio, the number of individuals each infected person is expected to infect [16]. R 0 is often used on its own in public health as an indicator of epidemic probability: if R 0 < 1, an epidemic is not generally considered possible; for R 0 > 1, the larger the value, the more likely an epidemic is to occur. R 0 is a composite value describing the behavior of an infectious agent. Hence, R 0 can be decomposed classically, for example, as R 0 = p d c, where p is the probability of infection occurring for a contact, d is the duration of infectiousness, and c is the number of contacts [17]. However, the classical decomposition above, while it is one of the best tools we have, does not account for age segregation of response, existing immunity in the population, network topology of infectious contacts, and other factors. These observations were significant in the motivation for developing EpiFlex.
The EpiFlex model was designed to incorporate as much realism as possible in an epidemic model so as to enable emerging disease events to be simulated. There are limitations, described below in a separate section, but the model is quite effective as it stands.
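As a worked illustration of the classical decomposition R 0 = p d c given above, the following minimal sketch computes the product and applies the R 0 > 1 threshold. The function name and the numeric inputs are hypothetical, and treating c as contacts per day (with d in days) is an assumption; the text only calls c the number of contacts.

```python
def basic_reproductive_ratio(p: float, d: float, c: float) -> float:
    """Classical decomposition R0 = p * d * c.

    p: probability of infection per contact, in [0, 1]
    d: duration of infectiousness (assumed here to be in days)
    c: number of contacts (assumed here to be per day)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if d < 0 or c < 0:
        raise ValueError("d and c must be non-negative")
    return p * d * c


# Hypothetical values, chosen only for illustration.
r0 = basic_reproductive_ratio(p=0.05, d=4.0, c=10.0)
verdict = "epidemic possible" if r0 > 1 else "epidemic not generally considered possible"
print(f"R0 = {r0:.2f}: {verdict}")
```

The sketch makes the limitation noted in the text visible: the three inputs are single population-wide numbers, so age structure, existing immunity, and contact network topology have no place in the calculation.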
In most cases, the limitations of EpiFlex are shared by other modeling systems.
There are a variety of methods used for mathematical modeling of diseases. The most common of these are the SIR (susceptible, infected, recovered) model of Kermack and McKendrick [15], SIS (susceptible, infected, susceptible), SEIR (susceptible, exposed, infected, recovered), and SIRP (susceptible, infected, recovered, partially immune) as developed by Hyman et al. [18] and further developed by Hyman and LaForce [19]. The SIRP model was used as the starting point for development of the object model of EpiFlex. In SIRP, the SIR model is extended to include partial immunity (denoted by P) and the progressive decline of partial immunity, allowing influenza to be modeled more accurately. (See Appendix.)
There is a need for experimentation in more realistic discrete modeling, since the lattice type of discrete modeling is understood to skew in favor of propagation, as discussed by Rhodes and Anderson [20] and Haraguchi and Sasaki [21]. Others such as Eames and Keeling [22] and Edmunds et al. [12] have explored the use of networks to model interactions between infectable entities, and Ferguson et al. [23] and others have called for more balance in realism for epidemiology models. Since EpiFlex was completed, Lloyd-Smith et al. [17] have shown the importance of superspreading in disease transmission for the SARS epidemic. EpiFlex is designed to take these issues into account.
There are known weaknesses in SIS-descended models, some of which are discussed by Hyman and LaForce [14]. They suggested that a model dealing with demographics and their subgroups would be useful and described a start toward conceiving such a model, creating a matrix of SIRP flows for each demographic group within a "city" and modeling contacts between these groups.
Thus, the possibility of building an entirely discrete model using the object-oriented approach, essentially setting the granularity of the Hyman-LaForce concept at the level of the individual, together with the Monte Carlo method, was attractive. The object method of design seemed to be a good fit, since object-oriented programming was invented for discrete simulations [24]. An object-oriented (OO) design defines as its primitive elements "black box" subunits that have defined ways of interacting with each other [25]. The OO language concept was originally conceived for the Simula languages [24] for the purpose of verifiable simulation. Enforcement of explicit connections between objects is fundamental to OO design, whereas procedural languages such as FORTRAN and COBOL do not enforce such connections, because data areas can be freely accessed by the whole program. OO languages wrap data in methods for accessing the data. If each "black box" (i.e. object) has a set of specified behaviors, without the possibility of invisible, unnoticed interactions between them, then the simulation can potentially be validated by logical proof in addition to testing. (It would take an entire course to introduce OO languages and concepts, and there is not space to do so here. Interested readers are encouraged to start with an implementation of Smalltalk. Excellent free versions are available for download, and Smalltalk has an enthusiastic and quite friendly user community. See: http://www.smalltalk.org/main/.)
The design of EpiFlex is described more completely in the appendix. Design proceeded by establishing the definition of a disease organism as the cornerstone, then defining practical structures and objects for simulating the movement of a disease through populations. The disease object was assigned a set of definitions drawn from the literature that would allow a wide spectrum of disease-producing organisms to be specified.
The aim was to minimize the number of configuration parameters that require understanding of mathematical models. The hosts that are infected became the second primary object. A host lives and works in some area, where hosts are members of some demographic group, which together determine what of n types of contacts they might have to spread an infectious disease. The hosts move about the area in which they live between locations at which they interact. In EpiFlex, an area contains some configured number of locations, and locations are containers for temporary groups of hosts. Since people travel between metro areas, the model supports linkages between areas to move people randomly drawn from a configurable set of demographic groups. The remainder of this section presents the disease model adopted, an overview of each component, an overview of program flow, and a description of the core methods. This is followed by discussion of results from the EpiFlex software system. This model has up to four stages during the infection cycle: the Incubation, Prodromal, Manifestation, and Chronic stages; to this is added a fatality phase. I have named this 'extended-SIRP'. Fig. 1 shows a diagram of this model. The model of Fig. 1 allows us to track the different phases of the disease process separately, and to define variable infectiousness, symptoms, fatality, recovery and transition to chronic disease at each stage as appropriate. This allows us to model the progress of a disease in an individual more realistically. For diseases that have no identifiable occurrence of a particular stage, this stage can be set to length zero to bypass it entirely. The 8 contact types designed into EpiFlex are drawn from literature in an attempt to model spread of infection more accurately. These contact types are: blood contact by needle stick, blood to mucosal contact, sexual intercourse, skin contact, close airborne, casual airborne, surface to hand to mucosa, and food contact. 
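The stage sequence and the eight contact types described above can be written down compactly. The sketch below is illustrative only: the names `ContactType`, `INFECTED_STAGES`, and `stage_timeline` are hypothetical and are not taken from the EpiFlex software, and the branching to recovery, partial immunity, or fatality is deliberately left out. It shows only how a stage of length zero is bypassed.

```python
from enum import Enum


class ContactType(Enum):
    """The eight contact types listed in the text (identifiers are hypothetical)."""
    NEEDLE_STICK = "blood contact by needle stick"
    BLOOD_TO_MUCOSA = "blood to mucosal contact"
    SEXUAL = "sexual intercourse"
    SKIN = "skin contact"
    CLOSE_AIRBORNE = "close airborne"
    CASUAL_AIRBORNE = "casual airborne"
    SURFACE_HAND_MUCOSA = "surface to hand to mucosa"
    FOOD = "food contact"


# Infected stages of the extended-SIRP model, in order of progression.
INFECTED_STAGES = ("incubation", "prodromal", "manifestation", "chronic")


def stage_timeline(durations):
    """Return [(stage, start_day, end_day)] for one host.

    `durations` maps stage name to length in days; a missing or zero
    length means the stage is bypassed entirely.
    """
    timeline = []
    day = 0
    for stage in INFECTED_STAGES:
        length = durations.get(stage, 0)
        if length < 0:
            raise ValueError(f"negative duration for stage {stage!r}")
        if length == 0:
            continue  # stage of length zero is skipped
        timeline.append((stage, day, day + length))
        day += length
    return timeline


# Hypothetical disease with no identifiable prodromal or chronic stage.
print(stage_timeline({"incubation": 2, "prodromal": 0, "manifestation": 5}))
```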
The probability of infection for a contact type is input by the user as estimated from the literature or based on hypothetical organism characteristics. Durations of disease stages are chosen uniformly at random from a user-specified interval $[R_{\text{low}}, R_{\text{high}}]$. Random numbers on [0, 1], denoted by ξ, are used to seed the determination of the infected disease stage periods (denoted I Incubation, I Prodromal, I Manifestation, I Chronic). $R_{\text{low}}$ and $R_{\text{high}}$ are taken from the medical literature and describe a range of days for each stage of an illness. The calculation is simply
$$D = \xi\,(R_{\text{high}} - R_{\text{low}}) + R_{\text{low}},$$
where D is the number of days for a particular stage. (This may be extended in the future to include the ability to define a graph that determines the flatness of the distribution and the normative peak. This will make a significant difference in modeling of diseases such as rabies, which can, under unusual circumstances, have very long incubations.)
Figure 1. Extended-SIRP disease model of EpiFlex. S: susceptible; I: infected; R: recovered; P: partially immune; F: fatality. Extended SIRP breaks the infected stage I into four stages (I Incubation, I Prodromal, I Manifestation, I Chronic) and adds a terminating fatality stage.
One of three equations describing immunity decay is chosen; in these, L is the current level of partial immunity and P is the level of partial immunity specified as existing. Random values on [0, 1] are then used to decide whether an infection occurs during the partial immunity phase P shown in the chart above. This decision uses the output of the immunity level algorithm, L, which is a number on [0, 1], as is the random value ξ.
EpiFlex uses a dynamic network to model the interactions between hosts at a particular location, based on the skew provided and the movement cycles of the demographic segments. The networks of contacts generated in this version of EpiFlex are not made visible externally; they can only be observed in their effects.
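The uniform draw of stage durations described above, D = ξ (R high − R low) + R low, can be sketched as follows. The function name, the seed, and the per-stage day ranges are hypothetical values chosen for illustration; they are not taken from the EpiFlex software or from any disease definition in the paper.

```python
import random


def sample_stage_days(r_low, r_high, rng=None):
    """Draw a stage duration D = xi * (r_high - r_low) + r_low.

    xi is uniform on [0, 1); r_low and r_high bound the stage length in days.
    """
    if r_high < r_low:
        raise ValueError("r_high must be >= r_low")
    if rng is None:
        rng = random.Random()
    xi = rng.random()
    return xi * (r_high - r_low) + r_low


# Hypothetical day ranges per stage; (0, 0) bypasses a stage entirely.
stage_ranges = {
    "incubation": (1, 3),
    "prodromal": (0, 0),
    "manifestation": (3, 7),
    "chronic": (0, 0),
}
rng = random.Random(42)  # fixed seed so that a run can be repeated
durations = {
    stage: sample_stage_days(low, high, rng)
    for stage, (low, high) in stage_ranges.items()
}
for stage, days in durations.items():
    print(f"{stage}: {days:.2f} days")
```

Because the draw is uniform, every value in the interval is equally likely. The extension mentioned in the text, a user-defined distribution shape, would replace only the line that computes the return value.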
(On the visibility of the contact networks, see: Limitations of EpiFlex modeling.) The algorithms that generate these networks were carefully designed and tested at small scales, observing each element.
A location describes a place, the activities that occur there, and the demographic groups that may be drawn there automatically. A location can have a certain number of cells, which are used to specify N identically behaving locations concurrently. This acts as a location repetition count within an area when the location is defined. The user sets an average number of hosts inhabiting each cell, and a maximum. A cell exchange fraction can also be specified to model hosts moving from cell to cell. The algorithm for allocating hosts to cells is semi-random. It randomly puts hosts into cells in the location. If a cell hits the average, then it does another random draw of a cell. If all cells are at maximum, then it overloads cells. Interactions occur within the cell, so an infectious host must be exchanged to another cell in order to infect hosts there. See 'Location component' in the appendix; with a model open in the software, one can also look at how hospitals were defined. Households are currently modeled using a cell configuration.
EpiFlex is implemented with a Monte Carlo algorithm such that each host in a location is assigned a certain number of interactions according to the Cauchy distribution parameter setting for that location. This distribution describes a curve with the y axis specifying the fraction of the maximum interactions for the location and the x axis specifying the fractional ordinal position within the list of hosts in the location. The distribution can be made nearly flat, or severely skewed with only a few actors providing nearly all contacts, as desired by the user of EpiFlex. Note that the structure of the network formed also depends on what locations are defined, what demographic groups are defined for the population, and how demographic groups are moved between locations.
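The semi-random allocation of hosts to cells described earlier in this section leaves some details open, so the following is only one possible reading, not the EpiFlex implementation. It assumes a single redraw when the first cell drawn has reached the average, and it assumes that a cell at maximum is avoided for as long as any other cell still has room. The function and parameter names are hypothetical.

```python
import random


def allocate_hosts(host_ids, n_cells, average, maximum, rng=None):
    """Distribute hosts over the cells of one location (illustrative reading).

    1. Draw a cell at random.
    2. If that cell already holds `average` hosts, draw once more.
    3. If the chosen cell is at `maximum`, use a random cell that still has
       room; if every cell is full, overload the chosen cell.
    """
    if n_cells <= 0:
        raise ValueError("a location needs at least one cell")
    if rng is None:
        rng = random.Random()
    cells = [[] for _ in range(n_cells)]
    for host in host_ids:
        idx = rng.randrange(n_cells)
        if len(cells[idx]) >= average:
            idx = rng.randrange(n_cells)  # assumed: exactly one redraw
        if len(cells[idx]) >= maximum:
            open_cells = [j for j, cell in enumerate(cells) if len(cell) < maximum]
            if open_cells:
                idx = rng.choice(open_cells)
            # otherwise all cells are at maximum and this cell is overloaded
        cells[idx].append(host)
    return cells


# Hypothetical location: 5 cells, average 4 hosts per cell, maximum 8.
cells = allocate_hosts(range(30), n_cells=5, average=4, maximum=8,
                       rng=random.Random(0))
print([len(cell) for cell in cells])
```

Since interactions occur only within a cell, the occupancy produced here bounds how many hosts an infectious individual can reach before any cell exchange takes place.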
Each location has a maximum number of interactions specified per person, which is used as the base input. Initially, a Gaussian equation was used, but it was discarded in favor of a Cauchy function since this better fits the needs of the skew function and computes faster. The algorithm iterates for each infectious host, and selects other hosts to expose to the infected party in the location, by a Monte Carlo function. This results in a dynamically allocated network of interactions within each location. The exposure cycle also makes use of Monte Carlo inputs. Each location has a list of contact types that can take place at a particular location, and a maximum frequency of interactions. This interaction frequency determines how many times contacts that can spread a disease will be made, and the contact specification defines the fractional efficacy of infection by any specific route. Modeling the effect of different types of contacts has been discussed in the literature, e.g. Song et al. [26] . EpiFlex attempts to make a more generalized version. For each host infection source, target hosts are drawn at random from the location queue. A contact connection is established with the target as long as the contact allocation of that target has not been used up already. Contact connections made to each target are kept track of within the location to prevent over-allocation of contacts to any target. Thus, for each randomly established connection, a value is set on both ends for the maximum number of connections that can be supported. Once the maximum for either end of the link is reached, the algorithm will search for a different connection. The location algorithm is described below in more detail. The user specifies the maximum number of connections for a location; the σ output from a Cauchy distribution function determines how many connections an individual will have. 
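The exposure cycle described above (random target selection, with the contact allocation tracked on both ends of each link) can be sketched as follows. This is a hedged illustration rather than the EpiFlex code: `build_contacts`, its parameters, and the bounded number of attempts used to guarantee termination are all assumptions, as is allowing repeated contacts between the same pair of hosts.

```python
import random


def build_contacts(hosts, infectious, capacity, rng=None, tries_per_host=10):
    """Create temporary contact links within one location.

    hosts:      list of host ids present in the location queue
    infectious: host ids that act as infection sources
    capacity:   dict mapping host id to its maximum number of contacts
    Returns (links, used), where links is a list of (source, target) pairs
    and used counts the contacts consumed by each host.
    """
    if rng is None:
        rng = random.Random()
    used = {h: 0 for h in hosts}
    links = []
    attempt_limit = tries_per_host * len(hosts)  # assumed safeguard
    for src in infectious:
        attempts = 0
        while used[src] < capacity[src] and attempts < attempt_limit:
            attempts += 1
            tgt = rng.choice(hosts)
            if tgt == src or used[tgt] >= capacity[tgt]:
                continue  # target unusable; search for a different one
            links.append((src, tgt))
            used[src] += 1  # the allocation is counted on both ends
            used[tgt] += 1
    return links, used


# Hypothetical location with 10 hosts, 2 of them infectious, 3 contacts each.
hosts = list(range(10))
capacity = {h: 3 for h in hosts}
links, used = build_contacts(hosts, [0, 1], capacity, rng=random.Random(1))
print(links)
```

Only infectious hosts are iterated as sources, which matches the statement that the algorithm iterates for each infectious host; whether a given link actually transmits infection would be decided separately from the contact type probabilities.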
Deriving each host's connection count from σ allows variations in the degree of skewness for superspreading in a population to be modeled, which has been shown to be of critical importance by Lloyd-Smith et al. [17]. If p is the position in the queue and q is the number of hosts in the queue for the location, then X = p/q, where X denotes the proportional position within the queue. With K a constant chosen for the location to express the skew of the distribution, the Cauchy distribution function yields σ from X and K. With κ the number of contacts for a particular host and κ max the maximum number of contacts for any given host in the location, κ is then determined from σ and κ max.
When hosts move from one location to another within the model, they tend to maintain a rough order of ordinal position. Consequently, when there is a high σ for a location, a high-connection host in one location tends to be a high-connection host in another. This reflects real-world situations (though not perfectly) and corresponds better than persistently maintaining high-connection individuals from location to location, since host behavior changes from place to place. The Cauchy distribution function is fairly fast in execution. The function can be used to approximate the often radical variations seen in epidemiology studies; as an extreme example, one active super-spreader individual might infect large numbers, when one or even zero is typical [17]. This type of scale-free network interaction has been explored by Chowell and Chavez [27]. The Cauchy function allows networks to be generated dynamically within each type of location in a very flexible manner, for example to correspond to super-spreader dynamics [17]. In addition to the specification of skew within a location, the network of contacts is also defined by (a) what locations are present and (b) the movement cycles defined for each demographic group within the model.
Processing time increases with population. This slowing is an expected characteristic of an object modeling system and is the price paid for the discrete detail of the EpiFlex model.
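To make the skew mechanism for contact counts concrete, the sketch below uses a standard Cauchy-shaped (Lorentzian) curve as a stand-in. The exact functional form used by EpiFlex is not reproduced in this text, so the expressions σ = 1 / (1 + (X/K)²) and κ = round(σ · κ max) are assumptions, as are the function names and the zero-based queue position.

```python
def skew_fraction(p, q, k):
    """Fraction of the maximum contacts for the host at queue position p.

    X = p / q is the proportional position in the queue (p counted from 0).
    ASSUMED form: sigma = 1 / (1 + (X / K)**2), a Cauchy-shaped curve.
    A small K gives a severe skew; a large K gives a nearly flat curve.
    """
    if q <= 0 or k <= 0:
        raise ValueError("q and k must be positive")
    x = p / q
    return 1.0 / (1.0 + (x / k) ** 2)


def contacts_for_host(p, q, k, kappa_max):
    """ASSUMED mapping from sigma to a contact count: round(sigma * kappa_max)."""
    return round(skew_fraction(p, q, k) * kappa_max)


# Hypothetical location: 100 hosts, at most 20 contacts per host.
for label, k in (("severe skew", 0.05), ("nearly flat", 100.0)):
    counts = [contacts_for_host(p, 100, k, 20) for p in (0, 5, 25, 50, 99)]
    print(label, counts)
```

Under this stand-in, hosts near the front of the queue receive close to the maximum number of contacts when K is small while most others receive few or none, which is the superspreading pattern the text describes; a large K gives every host nearly the maximum.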
The primary source of the increase in processing time with population is the sum of the series of possible infectious events that are modeled for each iteration. Processing time therefore scales as a series sum, not as a logarithm, depending on the contagiousness of the disease and the number of potential hosts in a location with an infected host. The cost is minimized by processing only the contacts of infectious hosts. The increase stems from the characteristics of networks in which each node has n connections to other nodes. When iteration is done for a location containing infectable hosts, it is the number of infected hosts that creates an element of the series. The infected hosts are put into a list, and each one interacts randomly with other hosts (including other infected ones) in the location. Thus, considered as a network with m nodes, each of the m nodes is a host. A temporary connection to another host is made to n other nodes where n