key: cord-0076178-aphhopfq
authors: Xavier, Joao B.; Monk, Jonathan M.; Poudel, Saugat; Norsigian, Charles J.; Sastry, Anand V.; Liao, Chen; Bento, Jose; Suchard, Marc A.; Arrieta-Ortiz, Mario L.; Peterson, Eliza J.R.; Baliga, Nitin S.; Stoeger, Thomas; Ruffin, Felicia; Richardson, Reese A.K.; Gao, Catherine A.; Horvath, Thomas D.; Haag, Anthony M.; Wu, Qinglong; Savidge, Tor; Yeaman, Michael R.
title: Mathematical models to study the biology of pathogens and the infectious diseases they cause
date: 2022-03-15
journal: iScience
DOI: 10.1016/j.isci.2022.104079
sha: 30c91a8b47d9ad48e2ce1241e7f0a0adf9d307b1
doc_id: 76178
cord_uid: aphhopfq

Mathematical models have many applications in infectious diseases: epidemiologists use them to forecast outbreaks and design containment strategies; systems biologists use them to study complex processes sustaining pathogens, from the metabolic networks empowering microbial cells to ecological networks in the microbiome that protects its host. Here, we (1) review important models relevant to infectious diseases, (2) draw parallels among models ranging widely in scale. We end by discussing a minimal set of information for a model to promote its use by others and to enable predictions that help us better fight pathogens and the diseases they cause.

How does an outbreak spread in a population of people that interact frequently? How does a pathogen grow in a new host using the nutrients available in the infected tissue to make the energy, the proteins, and all the building blocks it needs to replicate? How does the gut microbiome protect its host against invasion by an enteric pathogen? Why does a drug that kills a pathogen in vitro fail to cure an infected patient? These questions seem widely different, but they share commonalities: Answering them is key to fighting infectious diseases.

Infection results from a combination of processes that range widely in scale: from the biological molecules involved in the intracellular metabolism of pathogens to the interactions between individual people who may host and spread a virus across a population ( Figure 1 ). Despite the wide range of scales, in each of these processes the behavior of the system results from the interactions between many elements. Understanding how each element works in isolation is important. But that knowledge alone can be insufficient if the behavior of the full system is an emergent property of the interactions among its elements (Casadevall et al., 2011) . Mathematical modeling provides a way to quantify interactions among elements and dissect their relative contribution to infection dynamics.

Mathematical models can offer insights by switching the focus away from the elements-metabolites, bacteria, viruses, people, etc.-to the system of interactive elements. In many cases, the elements become nodes in a network, and their interactions become edges. We can then ask how the network structure-how the nodes connect to each other to create positive and negative feedback-determines how the system functions. The structure of the network can explain functions of the system, such as robustness to changes in individual elements (Bhalla and Iyengar, 1999) , which are emergent properties. This is key to identifying vulnerabilities that could be targeted by novel therapies, such as an enzyme essential for the pathogen that could be precisely targeted to stop its replication.

To illustrate how network structure can determine function, consider the bacterial chemotaxis system. This system is essential for the fitness of bacterial pathogens because it helps the pathogen cell move toward nutrients. But the chemotaxis system of a bacterium like Escherichia coli is difficult to understand by investigating each protein involved in the signal transduction that makes the system in isolation. A key function of the chemotaxis is its robustness: its ability to adapt. When a bacterium swims toward a nutrient source, the nutrient concentration increases. The signal transduction system must adapt its ability to continue sensing relative changes in the nutrient concentration even though the absolute levels in the nutrient concentration have increased. A system lacking robustness would be overwhelmed by higher levels and the pathogen cell would stop swimming toward the nutrient source. Mathematical modeling played a key role in understanding how robustness emerges from the network of interacting proteins: The scientists used parallel computer simulations to determine how small changes in individual proteins-such as alterations in enzyme kinetics caused by point mutations-altered the system's ability to adapt to higher nutrient levels. The simulations showed that rather than being a function of any single protein, robustness is an emergent property of the system that results from the network's connectivity. The ability of E. coli's chemotaxis system to adapt gradually to higher nutrient concentrations is robust to small changes in any of the network's elements (Barkai and Leibler, 1997) .

Fighting infectious diseases often means finding weak spots in a system that supports pathogen life and causes its collapse. Examples include drugs designed to block key elements in a biochemical network, such as a key protein in the bacterial chemotaxis network, or population lockdowns imposed to reduce person-to-person transmission and mitigate a pandemic (Schlosser et al., 2020) . Any perturbation should consider the system's properties; otherwise, it can have unwanted results. Designing a combination of antibiotics to treat a bacterial infection, for example, must consider unwanted consequences of simultaneously impacting two distinct processes in bacterial physiology. The antibiotic spiramycin inhibits protein synthesis, but the antibiotic trimethoprim inhibits DNA synthesis. Combining the two would seem like a good idea: it would impact two key bacterial processes simultaneously. In reality, combining the two drugs has an unwanted consequence: they suppress each other, and the pathogen grows better relative to separate antibiotic administration (Chait et al., 2007) . Understanding the system governing bacterial growth helps make sense of this unexpected consequence. Bacterial growth requires a certain balance. Inhibiting DNA synthesis with spiramycin creates an imbalance-protein synthesis outpaces DNA synthesis Figure 1 . Mathematical models in infectious diseases study the interaction between elements defined at a range of scales, from the molecules inside a pathogen's cell all the way to the people infected by pathogen as the disease spreads across a population level-and impacts growth. Spiramycin slows protein synthesis which creates the reverse imbalance. Combining the two corrects the imbalance: The drug combination, by restoring system balance, is antagonistic (Bollenbach et al., 2009 ). This principle of system balance, exemplified by DNA versus protein synthesis, can be grasped qualitatively. But mathematical modeling can lead to quantitative insights, and enable precision interventions that would otherwise be inaccessible. For example, computer models allow in silico systematic analysis of synergistic and antagonistic interactions between drugs (Yeh et al., 2009) . The models can be used to study drug combinations in situations where treatment efficacy must be balanced with a high risk of resistance (Torella et al., 2010) .

Beyond designing drug interventions, mathematical models can help explain counterintuitive observations at different scales, such as why probiotics impair the recovery of the gut microbiota after antibiotic treatment (Suez et al., 2018) , or why some public health interventions limit the spread of a pandemic better than others (Brauner et al., 2021) . It is hard to find common features among all the diverse models used for infectious diseases. It was interesting to find that several models use mass-action kinetics as their central assumption, despite their widely different scales. We conclude by suggesting practices that we acquired from our experiences of publishing mathematical models that facilitate their use by other researchers and further our understanding of infectious diseases.

The SIR model: modeling disease spread

The SIR model is perhaps the first that comes to mind when thinking about infectious disease. This compartmental model describes how a disease spreads across a population. In its deterministic form, the SIR model is a system of three coupled ordinary differential equations (Kermack and McKendrick, 1927) . Each equation describes individuals in a different state: susceptible (S), infected (I), and resistant (R) (Figure 2 ). On their right-hand side, the equations have one or two terms of two kinds: sources (positive terms) and sinks (negative terms).

The SIR model assumes mass-action kinetics. In this case, the elements are individual hosts, and the kinetics quantifies host transition between states. The terms for the infection rate (bIS) and the recovery rate (gI) are assumed to follow kinetics similar to those used to model chemical reactions, where the reaction rate is proportional to the concentrations of its reagents. This means that susceptible individuals become infected at a rate proportional to the number of infected individuals, and that infected individuals recover at a constant rate. The mass-action assumption is valid for reactions that occur when reagent molecules collide randomly because the rate of random collisions increases proportionally to the concentrations of the reagents. People do not interact randomly in a population as molecules do in a chemical flask, but the SIR model is still useful despite the simplification. SIR can compute infection curves during periods when an epidemic spreads through a population undisturbed, as well as compare scenarios when the rate of encounters changes (Schlosser et al., 2020) . The assumptions may break down when measures imposed for mitigation or seasonal changes in population activity affect the likelihood of certain encounters. For example, a lockdown during a pandemic could decrease the number of encounters between the population at large, but increase the rate of select encounters, such as between healthcare and other essential workers or between individuals confined in the same house. The model assumptions might need to change to forecast the success of proposed confinement strategies accurately.

Some versions of the SIR expand on the classic version (Equation 1) to account for social networks, geography, and other constraints (Keeling and Danon, 2009 ). Some additions include stochasticity (Tornatore et al., 2005) , spatial structure (Keeling, 1999) , heterogeneity within the population such as different age groups (Franceschetti and Pugliese, 2008) , and vaccination strategies (Shulgin et al., 1998) . To forecast demand for healthcare resources, the SIR model may also be expanded to account for symptomatology,

The virulence of a pathogen-its ability to thrive in a living host and cause infection-depends on the metabolic abilities of the pathogen (Brown et al., 2008) . The host tissues provide the growth media, but it is the system of metabolic reactions encoded by each pathogen's genome that determines whether the pathogenic cell can grow on that media, by making all the building blocks needed for new copies of itself (Fuchs et al., 2012) . Genome-scale network reconstructions unify knowledge about a specific organism from a wide array of sources-its genome sequence, assays that determine which metabolites it can grow on, its transcriptome measured in different conditions, and more-to show how the molecules in the network work together to enable pathogen proliferation (Sertbas and Ulgen, 2020) .

Genome-scale metabolic networks can be represented as mathematical expressions and then used to compute metabolic fluxes ( Figure 3 ). One way to make these computations is by flux balance analysis (FBA), a simplification that assumes the system is in balanced growth (Orth et al., 2010) . FBA simplifies a system of coupled ordinary differential equations-where each differential equation describes one biochemical reaction-into a system of algebraic equations. But balanced growth in the system still has too many equations to produce a unique solution. Additional assumptions can be added as upper or lower bound constraints on reactions such as rates of nutrient consumption measured experimentally. FBA tends to use an objective function that maximizes the rate of biomass production. This assumption is valid, for example, when the bacteria were selected to grow as fast as possible.

A high-quality model requires a well-annotated genome. Recent advances in the field use any given bacterial genome sequence and produce a draft genome-scale reconstruction network for that organism automatically (Mendoza et al., 2019) . A network produced automatically is often incomplete, even when the genome is well annotated but even more so when the genome belongs to a poorly studied organism with many genes of unknown function. Some methods make additional assumptions to gap-fill incomplete iScience Review pathways. Manual curation and validation can take weeks or months of work by trained researchers (Norsigian et al., 2020b) . Fortunately, the process tends to speed up exponentially as the number of high-quality genome-scale networks increases. The BiGG Models knowledge base centralizes high-quality genomescale metabolic models. The website allows users to browse and search models. This makes the process of adapting a high-quality reference network to a variant that differs in a few metabolic genes relatively easier than starting a model from scratch for a poorly studied organism that is only distantly related to existing models. The approach of adapting a high-quality reference model has been used to quickly generate multi-strain models and predict metabolic differences among a range of new variants. A recent update to the BiGG database includes multi-strain models (Norsigian et al., 2020c) .

A key part of creating a high-quality whole genome-scale network model is its validation and refinement of new experiments. This strategy was used recently for the pathogen Clostridoides difficile, an intestinal pathogen resilient to many common antibiotics, which causes severe intestinal infection in humans and can lead to medical emergencies such as pseudomembranous colitis, toxic megacolon, and perforation (Czepiel et al., 2019; Farooq et al., 2015) . C. difficile metabolism is crucial to understanding its outbreaks. Two epidemic strains of this pathogen (RT027 and RT078) acquired distinct ways to grow on low concentrations of trehalose. The two epidemic lineages emerged after trehalose entered the human diet, which could have been selected for the ability to grow on the threshold and may have helped their emergence (Collins et al., 2018) . A genome-scale network model of C. difficile built on two previous versions was further refined and curated with experimental data (Norsigian et al., 2020a) . The new data included BIOLOG phenotype microarrays, which are multi-well plates pre-loaded with many nutrients such as different sugars. BIOLOG data are commonly reported as a binary table: the bacteria either grow or do not on a certain nutrient. The genome-scale model can predict if the bacteria grows on that nutrient or not, and this binary prediction is compared to the BIOLOG data. When a prediction disagrees with the data, the model can be refined. For example, if there is a false negative-the model predicts no growth but the data show the pathogen can indeed grow-that could mean that a gene encoding for a transporter of that nutrient is mis-annotated. A literature search and an investigation of transporters in related organisms may reveal the transporter that had been annotated incorrectly. Non-targeted proteomics can be combined with BIOLOG microbial phenotyping microarray-based experiments to detail the metabolic A D E C B Figure 3 . Genome-scale models provide a way to study how a pathogen can thrive in different environments, for example by using the nutrients available in the various organs of its human host (A) A model describes the intracellular metabolism of the pathogen as a network of metabolites connected by biochemical reactions.

(B) Each reaction has a stoichiometry and a rate.

(C) Flux balance analysis (FBA) assumes that the pathogen is in balanced growth, which means that the concentration of each intracellular metabolite is in a steady state and its time derivative equals 0. Balanced growth is a simplifying assumption that turns a system of differential equations into a system of algebraic linear equations. The system is then solved using FBA by adding experimental constraints and an objective function, such as the biomass equation.

(D) A high-quality genome-scale network can describe a reference strain with a well annotated genome, and be used to predict its growth phenotypes.

(E) The model of the reference strain may also be customized to model other closely related strains. A set of network models can be used to compare the strains' abilities to use different nutrients.

iScience 25, 104079, April 15, 2022 5 iScience Review pathways necessary for a gut bacterium such as Bifidobacterium dentium to metabolize dietary sugars typical of the human diet (Engevik et al., 2021a) .

Each microbe has only a fixed number of genes and a limited number of regulators. But the connections between those genes and their regulators form gene regulatory networks. Gene regulation networks enable microbes to turn genes on and off in practically infinite ways. Natural selection acts on network variations introduced by horizontally acquired genes, gene duplications, gene deletions, and gene mutations, and favors regulatory circuitry that controls gene expression to optimize fitness (Brooks et al., 2011; Yan et al., 2017) . Gene regulation networks enable pathogens to respond to different stimuli with dynamic behaviors (eg, sporulation (de Hoon et al., 2010) and dormancy (Peterson et al., 2020) ) while still using a relatively small number of components.

Mathematical modeling requires transcriptional data acquired across multiple conditions and perturbations-different growth media, in isogenic mutants, etc. The models describe the influence of transcription factors (TFs) on individual target genes or clusters of co-regulated genes and uses machine learning methods such as linear regression and decision tree ensembles (Huynh-Thu et al., 2010) , among others. In the case of gene clusters, the network reconstruction is achieved in essentially two steps. The first step clusters the genes into sets of co-regulated genes; the clustering algorithm is often semi-supervised and can integrate prior knowledge from the literature . The second step models the expression of the co-regulated gene sets as a combination of TFs and environmental factors that influence the cluster . Interactions between TFs and gene clusters are inferred based on the mRNA profiles of gene clusters and TFs. Recent work has shown the advantages of estimating the regulatory activity of TFs and post-transcriptional regulators by leveraging the expression profile of known and putative targets when reconstructing gene regulation networks (Arrieta-Ortiz et al., 2015; Arrieta-Ortiz et al., 2020 Fu et al., 2011) . A recent improvement organizes gene regulatory elements-the DNA sequences within each gene promoter that TFs specifically bind to-of every gene using their distributions across the entire genome to explain what TFs bind the sequences, in what contexts they are bound, and the consequence of TF binding on activating or repressing transcription of downstream genes (Brooks et al., 2014) . This approach reveals environmental context whereby genes from different regulons can be co-regulated and special situations where genes of the same regulon, or even the same operon, can be conditionally and differentially expressed. These approaches have been recently applied to epidemiologically important pathogens such as Mycobacterium tuberculosis (Peterson et al., 2021) and Clostridoides difficile (Arrieta-Ortiz et al., 2021).

An alternative approach to infer gene regulatory networks is by decomposition methods such as independent component analysis (ICA) (Sastry et al., 2019) . ICA extracts regulatory signals underlying transcriptomic data and yields both the composition of the transcriptional regulatory network (TRN) (termed ''iModulons'') and the activity of the iModulons in the samples used to build the model. The iModulons represent a set of genes whose expression levels vary with each other, but are independent of all other genes in the genome. Their activities represent the collective expression level of the iModulon genes in a given sample. Therefore, both the static structure and the condition-specific dynamics of the transcriptional regulation network can be inferred simultaneously. A number of iModulons for E. coli, Bacillus subtilis, and Staphylococcus aureus have already been inferred and methods now exist to expand these to other bacteria by leveraging the rapidly growing transcriptomic profiles in public repositories (Poudel et al., 2020; Rychel et al., 2020 Rychel et al., , 2021 Sastry et al., 2021) .

Models of gene regulation networks have been expanded to include the networks of small non-coding RNAs (sRNAs) and discover new sRNA targets and interactions between sRNAs and TFs that were then experimentally validated (Modi et al., 2011) . Other models integrate gene regulation networks with metabolic networks to show how information flows between the different types of networks (Chandrasekaran and Price, 2010; Covert et al., 2001; Immanuel et al., 2021) . Gene regulatory network models integrated with metabolism are immensely useful in interpreting the causal, mechanistic, and physiological drivers of pathogen response to host-relevant stresses and drug treatment. In so doing, the models are useful to uncover mechanisms of drug action at a systems level and also discover novel vulnerabilities that can be exploited as new drug targets, or means to identify synergistic drug combinations Peterson et al., 2016 Peterson et al., , 2020 iScience Review Gene regulation networks, and even of integrated regulatory-metabolic models, can be expanded to model the coordinated behavior of more than one organism, for example, host-pathogen interactions to predict the outcomes of infection. Those models benefit from experimental resources such as transcriptional regulator deletion mutants from the fungal pathogen Candida glabrata, which revealed genetic and epigenetic pathways involved in stress resistance and virulence (Filler et al., 2021) . RNA sequencing can also be used to study the transcriptome of the infected hosts and the pathogens simultaneously (Peterson et al., 2019; Westermann et al., 2017) . This type of experimental work can provide new data for network models that combine host and pathogen gene regulation networks to understand interactions between host and pathogen (Schulze et al., 2016) . Similarly, integrated regulatory and metabolic models are also useful to investigate complex host and microbiome influences on the success of a pathogen to infect and colonize. The models can be used to formulate nutritional and probiotic interventions to prevent and treat infections by pathogens such as C. difficile (Arrieta-Ortiz et al., 2021; Girinathan et al., 2020) .

Gene regulation models can also explain phenotypic heterogeneity, a strategy that helps pathogens escape host attack and drug treatments. The models can compute growth characteristics, such as the growth rate and the carrying capacity, as well as drug susceptibility of different phenotypic states that emerge within the same monoclonal population. Integrating regulatory and metabolic processes models with quantitative systems pharmacology models can predict the clearance rate of a heterogenous pathogen population in complex contexts such as granuloma formed by tuberculosis (Azer et al., 2021) .

Predicting the susceptibility of a pathogen to a drug is essential to guide clinical intervention. This is traditionally done using in vitro assays in artificial media. But the sensitivity of these assays depends on the realism of the artificial conditions used: susceptibility in situ is often quite different, and a drug that works in vitro can easily fail to cure a patient. Mathematical models can help fill gaps between laboratory experiments and clinical intervention. Antibiotic-persistent bacteremia caused by methicillin-resistant Staphylococcus aureus (MRSA) exemplifies this problem well. Often, the antibiotic minimum inhibitory or bactericidal concentrations determined in the laboratory indicate that an S. aureus infection is susceptible in vitro, but then the infection persists despite treatment in vivo (Lewis, 2020; Li et al., 2019) . There are several reasons for the disagreement, including the emergence of new antimicrobial resistance, complex phenomena such as heteroresistance, tolerance, and persistence (Balaban et al., 2019) . Despite distinct mechanisms, they have the same impact: each lowers the efficacy of the antimicrobial therapy and its ability to treat the infection.

Physicochemical parameters such as pH, salt, temperature, oxygen, carbon dioxide, and other co-factors such as ionic strength of the biofluid or tissue compartment can also impact the susceptibility of microbes to antibiotic therapy (Goode et al., 2021) . Most in vitro assays lack host immune effectors such as kinocidins, defensins, or others that influence outcomes in situ (Sakoulas et al., 2014; Yeaman et al., 2002; Yount et al., 2011) . Cells of the host immune system, such as neutrophils, macrophages, and other professional defense cells, interact with and can be influenced by interactions with antimicrobial agents (Algorri and Wong-Beringer, 2020). Not all immune system interactions with antimicrobial agents are additive or synergistic (Yang et al., 2017) . Some antibiotics can have off-target effects on host immune cells and impair their trafficking, opsonophagocytic, or intracellular killing functions. Also, how the immune system defends against infection varies among anatomical locations in the body. For example, the mechanisms of host defense are very different on relatively inert epithelial or cutaneous surfaces such as skin compared with the central nervous system (Lee et al., 2021) . The human body represents a diverse array of anatomic, physiologic, and microbiologic ecosystems that influence the impact of the antimicrobial agent on the target pathogen. These different habitats may influence the pharmacodynamics and pharmacokinetics of antimicrobial agents, including access and activity. Over and above these factors, infections are dynamic systems. The contexts in which microbes and antimicrobial agents interact change over the course of the infection as the pathogen and the host act and counteract in relation to one another and complicate the use of models to predict the outcome of a drug intervention.

Predicting the outcome of antimicrobial therapy in vivo resembles solving a multidimensional equation involving the target pathogen, antimicrobial agent, immune effectors, the context of the host (including pharmacology), and time. Mathematical modeling to study the outcome of anti-infective therapy in vivo has proven challenging, but recent work has led to important progress in the field.

iScience 25, 104079, April 15, 2022 7 iScience Review A recent computational tool integrates host immunity and antibiotic dynamics to investigate treatment outcomes in tuberculosis (Pienaar et al., 2015) . These findings suggested that unforeseen antibiotic gradients and hypoxic microenvironments exist within granulomas that enable mycobacterial subpopulations to evade susceptibility. Pharmacodynamic modeling has also emerged as a tool to understand static versus dynamic aspects of infection. A model of bacterial phenotypes compared susceptible vegetative cells, resting cells, constitutively resistant cells, and adaptively resistant cells (Jacobs et al., 2016) . Importantly, the model indicated that the significant differences between static and dynamic contexts can explain uncertainties in therapy outcomes. A ''P-system'' modeling paradigm was used to study the real-time evolution of antimicrobial resistance (Campos et al., 2019) . This work pointed to the complex and embedded parameters that intersect to determine net susceptibility or non-susceptibility of a pathogen to antimicrobial therapy across orders of magnitude-from resistance plasmid to microbial community.

Another recent mathematical model explored the dynamics of MRSA persistence in the face of host immunity and typical antibiotic regimens (Mikkaichi et al., 2019) . Using an ensemble approach, computational models integrating selected parameter sets were generated under conditions simulating vancomycin therapy as used in clinical practice. Models distinguishing persistent versus resolving MRSA bacteremia outcomes were identified. Next, machine learning was implemented to identify specific parameters most contributing to either persistence or resolution. Parameters that correlated with persistent outcomes included bacterial growth rate, e.g. low-energy, slow growth rate associated with small colony variant phenotypes (Bates et al., 2003; Manuse et al., 2021) , and immune clearance, e.g. reduced molecular and cellular immune effector efficacy. Lastly, specific pharmacological strategies were modeled to investigate interventions that may avoid persistence outcomes in MRSA bacteremia. Results predicted that microbicidal agents effective against persistent cells, but not agents that prevent the emergence of persistent phenotypes, were more likely to have efficacy in persistent MRSA bacteremia.

These examples illustrate the recent efforts to model antimicrobial therapy in vivo. These approaches are becoming more sophisticated, integrating multiple variables and dynamic processes relevant to the in vivo setting that were traditionally missing in conventional antimicrobial susceptibility assays in vitro. Challenges remain, but the models of complex interaction that accurately describe and predict efficacy versus failure of antimicrobial regimens in clinical infection would result in a quantum leap toward improving therapeutic success, reduce antimicrobial resistance, and identify novel anti-infective agents and strategies (Talebi Bezmin Abadi et al., 2019).

Pathogens rarely exist in isolation. In their environmental reservoirs, they encounter many types of resident microbes and when they colonize the host they encounter the host's microbiome. Pathogens are constantly interacting with other microbes within an ecosystem: this ecology is another crucial aspect of infectious disease.

The human microbiome is a collection of microbes that naturally colonize the human body. Some of these microorganisms make up the first line of our defense against pathogen invasion (Becattini et al., 2017; Buffie et al., 2012) . In the gut microbiome, the densest of our microbiomes, bacteria naturally keep pathogens out by competing for nutrients (Oliveira et al., 2020) , producing molecules that inhibit or kill pathogen cells , or by training and stimulating the human immune system (Schluter et al., 2020) . Highthroughput sequencing can help study which microbes are present and how abundant they are (Jovel et al., 2016) . Longitudinal microbiome sequencing of samples taken from the same person over time reveals how the community responds to perturbations such as changes in diet (David et al., 2014) , antibiotic therapy (Dethlefsen and Relman, 2011) , and immune therapies (Schluter et al., 2020) . The community response allows us to parameterize mathematical models of the ecosystem network (Bucci et al., 2016; Buffie et al., 2015; Stein et al., 2013) . Understanding the ecology governing the gut microbiome, and how it contributes to our defense against pathogens, can benefit from mathematical modeling.

The gut microbiome can be described by systems of coupled differential equations that make mass-action assumptions. The Lotka-Volterra model was originally derived to describe predator-prey dynamics in animal ecosystems (Figure 4 ). This simple set of equations can be expanded to describe any number of microbial species interacting in a microbiome. Each differential equation in this system describes the dynamics of a microbe as a sum of terms. where X is a vector of absolute abundances of all species at a certain time, m max is a vector containing the maximum specific growth rates of each species, M is a matrix of interactions that represents the network, and ε is a vector containing the impact of a perturbation (for example, an antibiotic) on each species. The matrix can be parameterized using timeseries data using constraints derived from machine learning (Bucci et al., 2016; Stein et al., 2013) . This approach to model networks in microbiome ecology has applications in infectious diseases. A network model parameterized from mouse experiments identified a gut bacterium, Clostridium scindens, that is a key player in resisting colonization by the pathogen C. difficile (Buffie et al., 2015) . The same approach can also include the fungi in the gut microbiome to describe multi-kingdom ecological interactions (Rao et al., 2021) , and to include the host white blood cells to quantify how gut bacteria impact systemic immunity (Schluter et al., 2020) .

Identifying the molecular mechanisms of those ecological interactions requires approaches that go beyond population profiling. There are considerable efforts underway to develop metabolomic approaches that are capable of measuring microbially derived metabolites (i.e. bile acids/salts, amino acids and biogenic amines, short-chain fatty acids, and neurotransmitters) in a number of in vitro and in vivo models. Metabolomics measurements can be made on growth media collected from in vitro or ex vivo culture systems used to grow microbes or intestinal organoids (Engevik et al., 2021a (Engevik et al., , 2021b (Engevik et al., , 2021c Horvath et al., 2020; Ihekweazu et al., 2021; Luck et al., 2021) . There is an increasing number of protocols for metabolomics using extracts of cecal content and fecal samples, and homogenized gastrointestinal tract and brain tissues harvested from gnotobiotic mice (Engevik et al., 2021a (Engevik et al., , 2021b (Engevik et al., , 2021c Horvath et al., 2020; Ihekweazu et al., 2021; Luck et al., 2021) .

Besides ecology, evolution too is important to understand processes important for infectious diseases such as the emergence of antibiotic resistance. High-throughput sequencing has inspired new models to A C B Figure 4 . Lotka-Volterra equations model ecosystem dynamics in a multispecies system like the gut microbiome (A) The dynamics of predator and prey species in animal ecosystems inspired the Lotka-Volterra model, a system of two coupled ordinary differential equations that assumes that the law of mass action applies. The right-hand side of each differential equation has terms that represent the gains (sources) or losses (sinks) of each species. (B) The Lotka-Volterra system can be generalized to any number of species (n).

(C) Network models inferred from timeseries data can be used to study the relationship between network structure and function in the gut microbiome. One application is in finding gut bacteria such as Clostridium scindens that inhibit the growth of pathogens like Clostridioides difficile.

describe antibiotic response evolution over the unobserved network that characterizes the relationships of pathogens to each other over time (Cybis et al., 2015; Mather et al., 2013) . Modern phylogenetics and phylodynamics provide a framework to reconstruct these evolutionary relationships . These approaches start with molecular sequence alignments from the pathogens of interest, assume continuous-time Markov chain models that describe how the sequences and antibiotic response mutate over the unobserved history, and solve the inverse problem of inferring this history and changes of antibiotic response from the data. Recent examples in infectious diseases incorporate further pathogen phenotypes and epidemiological models of the host population (Dellicour et al., 2018) .

Making comparisons and drawing parallels between models Network representation to facilitate system analysis

The dynamics of epidemics, the biochemical reactions occurring inside pathogen cells, and the speciesspecific interactions in the microbiome are all examples of processes in infectious diseases that can be mathematically modeled as networks. In many problems, there are often many networks that fit the data comparably well. The need to compare network models arises from different areas of science and technology-social sciences, economics, biology, computer science, telecommunications, transportation, and others. Some of those methods have been reviewed elsewhere (Tantardini et al., 2019) and an indepth review is outside our scope. Instead, here, we discuss different methods to represent gene expression networks with examples of when one network representation or another works better in a specific situation.

Most network models of gene regulation discretize the interaction between regulated genes and regulating factors (Karlebach and Shamir, 2012) . For small networks, discretized networks actually outperform more sophisticated models (Garcia-Alonso et al., 2019) . The size of a network, for example, is a simple feature that distinguishes networks that can only regulate specific tasks because-they tend to be small-from networks that can regulate cellular physiology more globally-they tend to be larger (Rocks et al., 2019) . A discretized network model can infer whether a cellular system would be able to recover from an arbitrary perturbed state back to its initial state (Liu et al., 2011) . Discretized networks could in principle be used to study whether regulators or regulated sites are under stronger evolutionary pressure, which would correlate with their functional relevance, similar to the way the covariation of amino acid residues across protein homologs informs of links that can reveal new protein structures (Ovchinnikov et al., 2017) . Discretized networks can identify which elements of the network would be most vulnerable to perturbations (Greenwood et al., 2020) .

Gene regulatory networks that include the directionality and strength of gene-gene interaction can be used to study network motifs: patterns of wiring between multiple genes or regulators observed repeatedly across a network, and their structure encodes distinct functionality (Milo et al., 2002) . Motifs have been used to compare transcriptional networks within individual organisms. Aside from network motifs, directionality also allows one to predict molecular processes that could become co-compartmentalized. A directed network that extends beyond gene regulators can be used to predict which microbial systems will contain distinct microbes that all specialize in different metabolic pathways (Perez-Garcia et al., 2016) .

The dynamic behavior of gene regulatory networks may also be compared. Without any information on the rate constants of the gene in the network genes, for example, we may not predict whether negative feedback will lead to oscillations (Elowitz and Leibler, 2000) . Therefore, networks can be compared through their temporal dynamics or the relative dynamics of different edges within each network. However, to accurately describe the function of small gene regulatory networks, we need more parameters. For instance, varying temperature-and therefore kinetic constant of all proteins encoded by genes-can render an essential gene redundant even if neither the gene nor the network has changed (Ben-Aroya et al., 2008; Cassidy et al., 2019) . Many transcript molecules in mammals and microbes are low abundance. Low abundance means that relative changes in the copy number of molecules will lead to noise (Paulsson, 2004) . According to the internal confirmations of individual promoters, the conformations of regulatory factors, and the transition probabilities between all those, cells with the same kinetic rates will demonstrate cell-to-cell variability (Elowitz et al., 2002) . Notably, interferons and other host defense genes contain promoter states that seem to ensure high levels of variability between single cells (Shalek et al., 2013) , which may allow hosts to reduce the adaptation of pathogens (Avraham et al., 2015) . iScience Review

We discussed diverse mathematical models applied to infectious diseases. These models illustrate the wide range of scales that mathematics can model: from molecular processes happening inside the cells of bacterial pathogens to the social interactions between people that can fuel an outbreak across a population. It also demonstrates the challenge of drawing parallels across those models. To conclude this overview, we would like to dedicate some time to a common challenge that many of us encountered as practitioners of mathematical modeling: how to share mathematical models with others in the scientific community.

Models of the same class can define strict sets of rules. For example, whole-genome networks can adhere to the format of the BiGG framework. Adopting a common file format in the BiGG database ensures that every model can run in the same computational framework, such as the COBRA toolbox for Matlab or the CobraPy implementation in Python (Heirendt et al., 2019) . The authors of this paper were encouraged by the US NIAID to find a way to share diverse models of infectious diseases among us. From our experience, we propose these steps:

1. Preparing a slide deck or a white paper with a short presentation to the model.

2. Setting up a code repository such as GitHub, with instructions on how to install and run the code, including any packages from third parties needed.

3. Including a ''Hello World!'' type of walk-through demonstration, typically a Jupyter notebook or similar, to enable others to run a simple application of the code. 5. Define a format for reporting the results, or provide routines to convert the output into a uniform format. Omitting analysis files and variations in output format are a major challenge in reusing published mathematical models.

From our shared experiences, these steps provide a set of minimum information needed to reproduce a model. These steps are abstract, but they are also general enough that they can in principle be applied to any type of model in infectious diseases and other problems in systems biology. The code repository and especially the walk-through code can go a long way in making sure others can understand and reuse the code. To illustrate these principles, we provide examples of publishing models adhering to this format (Table 1) . iScience Review CONCLUSION Mathematical models can be useful to study diverse facets of infectious diseases. In this review, we have covered many topics to showcase this wide diversity: Epidemiology and dynamics, intracellular metabolic networks, dynamic interplay of pathogens and immune cells, transcriptional regulation, microbial communities, and more. We illustrated each theme with examples. But in many cases, our examples only covered a theme briefly and we could not give each theme the depth that it deserves. Still, we hope that showing the wide diversity of scales makes obvious a common challenge of mathematical modeling of infectious disease. It is a common adage of modeling that a model is only as good as its assumptions. The steps listed here to publish a model should help modelers make their assumptions transparent by providing their code, by listing the parameters used, and demonstrating the use of their models with simple examples. Adopting these practices can enable others to reuse our models toward gaining new insights on infectious diseases that would otherwise be inaccessible.

This work was supported in part by the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH) grants U01 AI124275 (MSKCC), U01 AI124290 (Baylor), U01 AI124302 (Boston College), U01 AI124319 (UCLA), U01 AI124316 (UCSD), U19 AI135995 (Scripps), U19 AI135976 (Omics4TB), U19-AI135964 (Northwestern). We acknowledge the NIAID/DMID Systems Biology Consortium for Infectious Diseases Modeling Working Group for the platform for method sharing and for critical feedback on the manuscript. We thank Reed Shabman for helpful comments and careful revisions of the paper and Liliana Brown for the support of the Program this paper originated from.

Differential effects of antibiotics on neutrophils exposed to lipoteichoic acid derived from Staphylococcus aureus

Identifying the measurements required to estimate rates of COVID-19 transmission, infection, and detection, using variational data assimilation

An experimentally supported model of the Bacillus subtilis global transcriptional regulatory network

Inference of bacterial small RNA regulatory networks and integration with transcription factor-driven regulatory networks

Predictive regulatory and metabolic network models for systems analysis of Clostridioides difficile

Pathogen cell-to-cell variability drives heterogeneity in host immune responses

History and future perspectives on the discipline of quantitative systems pharmacology modeling and its applications

Definitions and guidelines for research on antibiotic persistence

Robustness in simple biochemical networks

Staphylococcus aureus menD and hemB mutants are as infective as the parent strains, but the menadione biosynthetic mutant persists within the kidney

Commensal microbes provide first line defense against Listeria monocytogenes infection

Toward a comprehensive temperature-sensitive mutant repository of the essential genes of Saccharomyces cerevisiae

Emergent properties of networks of biological signaling pathways

Nonoptimal microbial response to antibiotics underlies suppressive drug interactions

The Inferelator: an algorithm for learning parsimonious regulatory networks from systemsbiology data sets de novo

Inferring the effectiveness of government interventions against COVID-19

A system-level model for the microbial regulatory genome

Adaptation of cells to new environments

Revisiting the host as a growth medium

MDSINE: microbial dynamical systems INference engine for microbiome time-series analyses

Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile

Profound alterations of intestinal microbiota following a single dose of clindamycin results in sustained susceptibility to Clostridium difficile-induced colitis

Simulating multilevel dynamics of antimicrobial resistance in a membrane computing model

BioWar: scalable agent-based model of bioattacks

Microbial virulence as an emergent property: consequences and opportunities

Repressive gene regulation synchronizes development with cellular metabolism

Antibiotic interactions that select against resistance

Probabilistic integrative modeling of genomescale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis

Dietary trehalose enhances virulence of epidemic Clostridium difficile

Regulation of gene expression in flux balance models of metabolism

Assessing phenotypic correlation through the multivariate phylogenetic latent liability model

Clostridium difficile infection: review. Eur

Diet rapidly and reproducibly alters the human gut microbiome

Phylodynamic assessment of intervention strategies for the West African Ebola virus outbreak

Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation

Hierarchical evolution of the bacterial sporulation network

A synthetic oscillatory network of transcriptional regulators

Stochastic gene expression in a single cell

The metabolic profile of Bifidobacterium dentium reflects its status as a human gut commensal

Bifidobacterium dentium-derived y-glutamylcysteine suppresses ER-mediated goblet cell stress and reduces TNBS-driven colonic inflammation

Human-derived Bifidobacterium dentium modulates the mammalian serotonergic system and gut-brain Axis

Identification of Candida glabrata transcriptional regulators that govern stress resistance and virulence

Threshold behaviour of a SIR epidemic model with age structure and immigration

Metabolic adaptation of human pathogenic and related nonpathogenic bacteria to extra-and intracellular habitats

Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities

Benchmark and integration of resources for the estimation of human transcription factor activities

The mechanisms of in vivo commensal control of Clostridioides difficile virulence

Ruggedness testing of liquid chromatography-tandem mass spectrometry system components using microbiome-relevant methods and matrices

Inferring regulatory networks from expression data using tree-based methods

Bacteroides ovatus promotes IL-22 production and reduces trinitrobenzene sulfonic acid-driven colonic inflammation

Quantitative prediction of conditional vulnerabilities in regulatory and metabolic networks using PRIME

Distinguishing antimicrobial models with different resistance mechanisms via population pharmacodynamic modeling

Characterization of the gut microbiome using 16S or shotgun metagenomics

Constructing logical models of gene regulatory networks by integrating transcription factor-DNA interactions with expression data: an entropy-based approach

Mathematical modelling of infectious diseases

The effects of local spatial structure on epidemiological invasions

A contribution to the mathematical theory of epidemics

Microbiotaderived lantibiotic restores resistance against vancomycin-resistant Enterococcus

PACAP is a pathogen-inducible resident antimicrobial neuropeptide affording rapid and contextual molecular host defense of the brain

The science of antibiotic discovery

Phenotypic and genotypic characteristics of methicillin-resistant Staphylococcus aureus (MRSA) related to persistent endovascular infection. Antibiotics (Basel) 8, 71

Controllability of complex networks

Neurotransmitter profiles are altered in the gut and brain of mice monoassociated with Bifidobacterium dentium

Bacterial persisters are a stochastically formed subpopulation of low-energy cells

Distinguishable epidemics of multidrug-resistant Salmonella Typhimurium DT104 in different hosts

A systematic assessment of current genome-scale metabolic reconstruction tools

Identifying determinants of persistent MRSA bacteremia using mathematical modeling

Network motifs: simple building blocks of complex networks

Functional characterization of bacterial sRNAs using a network biology approach

Systems biology analysis of the Clostridioides difficile core-genome contextualizes microenvironmental evolutionary pressures leading to genotypic and phenotypic divergence

A workflow for generating multi-strain genome-scale metabolic models of prokaryotes

BiGG Models 2020: multi-strain genome-scale models and expansion across the phylogenetic tree

Klebsiella michiganensis transmission enhances resistance to Enterobacteriaceae gut invasion by nutrition competition

What is flux balance analysis?

Protein structure determination using metagenome sequence data

Summing up the noise in gene networks

Metabolic network modeling of microbial interactions in natural and engineered environmental systems

Path-seq identifies an essential mycolate remodeling program for mycobacterial host adaptation

Intricate genetic programs controlling dormancy in Mycobacterium tuberculosis

MtrA regulation of essential peptidoglycan cleavage in Mycobacterium tuberculosis during infection

Network analysis identifies Rv0324 and Rv0880 as regulators of bedaquiline tolerance in. Mycobacterium Tuberculosis

In silico evaluation and exploration of antibiotic tuberculosis treatment regimens

Revealing 29 sets of independently modulated genes in Staphylococcus aureus, their regulators, and role in key physiological response

Multikingdom ecological drivers of microbiota assembly in preterm infants

Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks

Limits of multifunctionality in tunable networks

iModulonDB: a knowledgebase of microbial transcriptional regulation derived from machine learning

Machine learning uncovers independently regulated modules in the Bacillus subtilis transcriptome

Nafcillin enhances innate immunemediated killing of methicillin-resistant Staphylococcus aureus

The Escherichia coli transcriptome mostly consists of independently regulated modules

Mining all publicly available expression data to compute dynamic microbial transcriptional regulatory networks

COVID-19 lockdown induces disease-mitigating structural changes in mobility networks

The gut microbiota is associated with immune cell dynamics in humans

How to predict molecular interactions between species?

Genomescale metabolic modeling for unraveling molecular mechanisms of high threat pathogens

Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells

Pulse vaccination strategy in the SIR epidemic model

Ecological modeling from time-series inference: insight into dynamics and stability of intestinal microbiota

Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus. Evol. 4, vey016

Post-antibiotic gut mucosal microbiome reconstitution is impaired by probiotics and improved by autologous FMT

World health organization report: current crisis of antibiotic resistance

Comparing methods for comparing networks

Optimal drug synergy in antimicrobial treatments

Stability of a stochastic SIR system

Resolving host-pathogen interactions by dual RNA-seq

Modeling COVID-19 dynamics in Illinois under nonpharmaceutical interventions

Antibiotic-Induced changes to the host metabolic environment inhibit drug efficacy and alter immune function

Bow-tie signaling in c-di-GMP: machine learning in a simple biochemical network

Synthetic peptides that exert antimicrobial activities in whole blood and bloodderived matrices

Drug interactions and the evolution of antibiotic resistance

Context mediates antimicrobial efficacy of kinocidin congener peptide RP-1

All authors contributed to the concept and writing of the manuscript. JBX made the figures.