key: cord-0434458-ym9r9ai5 authors: Peterson, Andrew A.; Goldsmith, C. Franklin; Rose, Christopher; Medford, Andrew J.; Vegge, Tejs title: Should the rate term in the basic epidemiology models be second-order? date: 2020-05-10 journal: nan DOI: nan sha: 124ffab26ca4cedf5b76d4a64ed734434b95b9a8 doc_id: 434458 cord_uid: ym9r9ai5 Global leaders are taking unprecendented steps with long-term financial and societal impact to prevent the spread of COVID-19, relying on early data and epidemiological models of differing complexity. The standard"compartment"models of the spread of disease through a population follow the law of mass action; that is, the rate of conversion of individuals from one compartment to another is directly proportional to the number of individuals in the former. The basic S-I-R model, which tracks susceptible, infectious, and recovered people, is functionally identical to autocatalytic reactions studied in chemical kinetics. However, if we account for heterogeneity in individuals' susceptibility to catch the disease, then the most susceptible individuals are likely to be removed from the pool early; this is analogous to evaporative cooling in chemical kinetics. Here, we use standard tools from statistical physics to account for this effect, which suggest that the rate of appearance of new cases should be proportional to the emph{square} of the number of susceptible people in the population, not the first power. We validate this finding with both Monte Carlo (stochastic) and many-compartment differential equation (deterministic) simulations. This leads to quite different predictions of the peak infection rate and final outbreak size, with the second-order model giving milder predictions of each. This has dramatic effects on the extrapolation of a disease's progression when the data collected comes from the early, exponential-like region, and may remove a systematic bias from such models. The classic models of epidemiology divide the population into categories based on their experience with the disease. 1, 2 The simplest is the so-called S-I-R model, which tracks the number of people susceptible to the disease, the number of people who are actively infectious, and the number of people who are recovered (or deceased) and are considered neither infectious nor susceptible to re-infection. Individuals are assumed to move between compartments by the following conversions: That is, an infectious person can convert a susceptible person into a second infectious person with a certain rate proportional to β, and an infectious person can recover at second rate proportional to γ. This is directly analogous to an autocatalytic reaction in chemical kinetics. 3, 4 The differential equations that track the disease in the classic S-I-R model are written as where x S , x I , and x R track the fraction of susceptible, infectious, and recovered people in a population, with x S + x I + x R = 1. The model is functionally identical to those used in chemical kinetics under the law of mass action, 7 where the rate of a reaction is directly proportional to the concentration of reactants. Although this model is very simple, it provides epidemiologists and policymakers with valuable intuition on the progression of an outbreak, and often forms the basis for more complex models that include such effects as geography, travel, latency, susceptibility to re-infection, stochasticity, vital dynamics, etc. 8, 9 Our discussion focuses on the rate of new infections, r = βx I x S , with the assumption that many of the suggestions in this article are transferable to more sophisticated models which include similar rate terms. It is well-established that there is a large heterogeneity in the transmission of disease, with relatively few individuals responsible for spreading the vast majority of new cases. 5,10 This is the so-called "super-spreader" effect, in which new infections are found to roughly follow the 80/20 rule: about 80% of new cases can be attributed to just 20% of spreaders, 11-13 although for various diseases the most infectious 20% are estimated to create around 45-90% of new cases. 5 As an example, Figure 1 shows published data on disease spread for the 2003 SARS outbreak in Singapore; most individuals infected few or no others, but a long tail of highly infectious people exists, which dominated the total new cases. The likelihood of infecting new people is considered to be continuously distributed in a population. 5 Although the reasons for variations in spreading are debated, it is clear that there must a mixture of behavioral attributes (such as the number of new people encountered by an individual in a typical day, density of their work environment, or personal hygiene habits) and inherent (such as genetic differences in immune system, age, etc.). Here, we hypothesize that if the likelihood to spread a disease follows a distribution in the population, then the likelihood to catch a disease shouldas a first assumption-follow a similar distribution. We can imagine that many of the same traits responsible for spreading (e.g., frequency of encountering new people, hygiene) apply to catching as well, while other inherent traits (e.g., immune system, 14 age, etc.) may drive variability in spreading and catching differently. If certain people are more prone to catch a diseasee.g., due to physiological, demographic, or geographic conditions-then those people will be disproportionately infected towards the beginning of an outbreak. As these infected individuals are removed from the pool of the susceptible, not just the number of people in the susceptible pool will decrease, but also the average susceptibility of the pool will decrease, which should slow the rate of spread of the disease. This is analogous to evaporative cooling effects in physical chemistry: the particles that are removed via evaporation are disproportionately the highest energy particles in the system, and thus the mean energy (that is, the temperature) of the system decreases; this slows the rate of evaporation over time. Here we investigate the importance of this effect on epidemiological models, using the tools of statistical physics. We will define an individual's infection susceptibility, ε, as being proportional to their likelihood of becoming infected at a given instant. Under this definition, an individual with ε=2 will become infected in half the time, on average, as an individual with ε=1, given they exist in the same population of infected people. Without loss of generality, we define the mean susceptibility of the initial, uninfected population to be 1. As a first approximation, we will assume that the shape of the susceptibility curve is qualitatively similar to the shape of the spreading curve; i.e., most people are modestly susceptible, but some people are highly susceptible. For simplicity, we choose an exponential distribution. We will examine the impact of this specific choice later in this work. The population distribution of people according to their susceptibilities can then be written as where n S (ε) is the number density of individuals as a function of their susceptibility ε, andε is the mean susceptibility of the population. Since a person's instantaneous likelihood of being infected is proportional to ε, then the infection probability population density will be This distribution, which we refer to as the "draw probability" is shown along with the population distribution in Figure 2 . The draw probability is skewed to the right and will be for any assumed distribution; i.e., it is more likely to withdraw individuals with higher susceptibilities-by definition. For an exponential distribution, the draw probability has a mean value of 2ε (see SI); that is, the average person who gets sick was twice as susceptible as the susceptible population as a whole. This will effectively put downward pressure on the average susceptibility of the vulnerable pool. How does this affect the model predictions? For brevity, we'll summarize the logic of the derivation here; full rigor is provided in the SI. With our definition of susceptibility, the rate of new infections will be the integral over all values of susceptibility: (This form is general, if the entire population has an identical susceptibility; that is, ε=1, the above equation simplifies to the classic S-I-R rate.) This rate equation can be expressed in terms of the average susceptibility (ε) by recognizing that ∞ 0 ε n S (ε) dε =ε N S , which gives r = βx Iε x S . We then need to express how the average susceptibilityε varies over the course of time; since, on average, the susceptibility of a newly infected person is 2ε, the total susceptibility of the population changes as d(εN S )/dN S = 2ε, where N S is the number of susceptible individuals. Accordingly, the average susceptibility of the susceptible pool decreases in direct proportion to the number of individuals left in the pool: This is quite a dramatic effect, as we see when this is inserted into the rate equation: That is, when we account for the heterogeneity of the susceptible pool, the rate equation becomes second-order in the number of susceptible individuals remaining! The differential equations in the S-I-R model should therefore be expressed as below to account for the heterogeneity of infection susceptibilities among the population: Before examining how this difference is manifested, we need to understand the relationship between the infection rate coefficient β in differential equations (1) and (4). At early stages in the outbreak of a new disease, the doubling time is often used to estimate the rate coefficients; at this stage, x S ≈1, so both the first-and second-order equations behave identically: That is, an (approximately) exponential growth equation results with x I /x I,0 = exp((β −γ)t), and (β −γ) would be fit to doubling times in an identical fashion. (This is analogous to initial-rate measurements in chemistry.) Thus, the estimate of β would be identical for either model, so we can make an apples-to-apples comparison of how first- second-order models follow essential exponential characteristics, and would give identical predictions of the rate coefficients (or R0). However, they diverge dramatically after the early exponential phase. and second-order behavior manifests itself as the disease progresses, based on estimates of the rate coefficients at early stages. This is shown in Figure 3 , which tracks the growth in either model; at early stages both models are indistinguishable from exponential growth; however, as the models diverge from the exponential model they also diverge significantly from each other, with the 2nd-order model showing much dampened growth. We next analyze the implications of this for model predictions, in which we assume that β and γ are estimated in the early stages of the disease, when the estimates will be identical for either model. Figure 4 shows the results of numerically integrating the classic first-order model, eqs. (1), versus the second-order model that accounts for variation in infection susceptibility, eqs. (4). The secondorder behavior becomes very significant as the epidemic progresses, ultimately predicting a less steep peak and a substantially lower percentage of individuals who are eventually infected. A crucial output of an S-I-R-like model is the fraction of the population that must gain immunity before the rate of new cases falls into decline (often referred to as herd or community immunity). When x S falls bellow a critical value, x * S , then the sign of dx I /dt changes from positive to negative; i.e., the disease is no longer formally an epidemic. Policymakers (ideally) take into consideration predictions of x * S to determine when the outbreak will begin to decline naturally, and thus when it is safe to loosen societal constraints, such as stay-at-home orders. The first-order and second-order models have different predictions for x * S : (β/γ is often referred to as R 0 , the basic reproduction number, in epidemiological reports. 2 ) These critical x * S 's are shown with dashed lines on the figure for both models. For the second-order model, the peak occurs when a much lower fraction of the population is infected. With the arbitrary parameters chosen for the current modelwhich we emphasize are not chosen to accurately simulate any current outbreak-we see the implications are quite dramatic: the classic S-I-R model suggests the outbreak goes into decline when ∼50% of the population has gained immunity, whereas if we account for heterogeneity in infection susceptibility, we see this number is closer to 25%. These estimates would lead to very different policy implications and economic outfall. The second-order behavior was inferred by assuming that infection susceptibilities are continuously distributed in a population. If this is correct, we should see the same behavior emerge if we compartmentalize the population into discrete levels of susceptibility, then allow the number of these compartments to grow. We performed such simulations using both stochastic and deterministic approaches, both of which are described in the SI. For the stochastic (Monte Carlo) simulations, we divided the population into discrete levels of susceptibility, and randomly withdrew individuals according to the distribution of equation (2) . During this process, we saw theε of the susceptible pool decrease in direct proportion to x S , as predicted by equation (3), and the overall population susceptibility followed x 2 S . Our deterministic simulations took the same approach of compartmentalizing the population, but treated each compartment with a separate differential equation. We show results of of numerically integrating the differential equations in Figure 5 , with a single compartment (that is, every-one has ε=1), the first-order behavior is recovered; as we add compartments we quickly see the second-order behavior emerge. This suggests that models that explicitly include categories for variation in susceptibility or transmission [15] [16] [17] may implicitly capture at least a portion of the second-order behavior, though more detailed comparisons are required. In our treatment so far, we have used an exponential distribution to capture the variability in susceptibility in a population. This "Boltzmann" distribution allows us to use the tools of statistical physics to examine the consequences of this effect, leading to a very simple, yet consequential result: changing a first-order term to a second-order term. What happens if the distribution of susceptibilities in the population follows a different distribution? For a hint at possible distribution shapes we can examine those for disease spreaders. As mentioned earlier, in an examination of several diseases, the most infectious 20% of spreaders were found to be responsible for 45-90% of new infections; 5 under the exponential distribution, the top 20% of spreaders would be responsible for about 50% of new cases. Therefore, we can consider the exponential distribution to be on the conservative end of estimates. If the true initial distribution of susceptibility has a longer tail (that is, a greater proportion of highly-susceptible individuals) than the exponential distribution, this will make the effect more pronounced. That is, these very high-ε individuals will be disproportionately "discovered" at the beginning of the outbreak, putting even greater downward pressure on the average, before the dynamics of the process ultimately convert the distribution to a more exponential-like form. The effect of the initial distribution can readily be simulated with either the stochastic or binned deterministic The left plot shows the results of integrating a pure 1st-order and pure 2nd-order model as limiting cases, and compares integrating a first-order model with the population binned by susceptibility with an increasing number of bins. The binning structure is shown on the right; the height of the block indicates the population density of the bin, and the red dot indicates the mean of the bin used in the discrete model. approaches discussed earlier. If the variability in susceptibility among individuals in the population is low: say that the entire population fell in 0.9 < ε < 1.1, then this effect will be much less pronounced. However, we emphasize that any variability in susceptibility between individuals in the population will always lead to the average susceptibility of those infected, ε removed , to be greater than the average susceptibility of the uninfected pool, which will still put downward pressure on the mean. It is only in the (unrealistic) limit that all individuals have identical susceptibility (ε=1) that the first-order model emerges. Interestingly, in both types of simulations we ran, we found that the dynamics of the process itself pushed distributions into an exponential shape. As we describe in the SI, even if we set the initial population to have such extreme shapes as a uniform distribution or a linear decay, the shape of the distribution evolved into an exponential distribution. This is perhaps not surprising, as the exponential (Boltzmann) distribution is derived as the distribution that maximizes entropy 18, 19 -that is, the most probable distribution-and we can expect processes to evolve towards such high-entropy states. (We derive the mechanism for this effect in the SI.) Given these factors, we suspect that the exponential distribution is not just a reasonable and convenient distribution, but the most appropriate one to use in the absence of more detailed information. There are other factors which could make this effect either more or less pronounced. First, this derivation does not assume any correlation between spreading propensity and infection susceptibility. Put simply: are some super-spreaders also super-catchers? If a positive correlation exists between super-spreaders and super-catchers, then the super-spreaders will be most active at the beginning of an outbreak and contribute disproportionately to the rate of infection, but as the outbreak progresses, they also will be disproportionately removed, causing a more downward pressure on the relative rate of infection. In contrast, if there is no correlation, then superspreaders pop up throughout the outbreak with a frequency identical to their prevalence in the population, and the average kinetics are unaffected. It seems plausible that many of the social traits that are found in superspreaders also would be common in super-catchers. We leave the mathematical analysis of this possible synergistic effect to future work, but if a positive correlation exists, we might consider the second-order behavior to be a lower bound on the kinetic effects of variation in population in the S-I-R model. On the other hand, an implicit assumption in our derivation is that individuals' susceptibilities are static quantities; that is, they do not significantly change with time. For example, if an individual changes their behavior patterns from one month to the next, they may go from having a low susceptibility to a high susceptibility. We leave the balance between these two counter-balancing factors to future works, but for now we consider the second-order model to be a reasonable approximation, and a significant deviation from the classic first-order model. The prior analysis raises the question: Can this modification to the S-I-R model account for any discrepancies in real-world data? We believe this modification may remove an implicit, systematic bias which would push the model predictions of the ultimate size of outbreaks downward. It is not uncommon for models to over-predict the ultimate size of epidemics. 20 els predict worse-than-observed outcomes, including successful public-health interventions.) The classic S-I-R model predicts the final size of an epidemic based on the basic reproduction number (β/γ) to be given by the implicit equation below, although we note that temporary measures such as social distancing can reduce the final size of an outbreak. 2, 24 The equations for the final size, expressed as the final proportion of susceptible (neverinfected) individuals x ∞ S in the 1st-and 2nd-order models are: In the 2009 H1N1 influenza outbreak, estimates of the final outbreak size-which were based on the initial re-productive numbers-were reportedly much higher than occurred in practice; 20 we expect this pandemic to be a reasonable test of the model predictions, since the outbreak reached global proportions while large-scale interventions, such as social distancing, were not employed. The ultimate fraction of the population infected was on the order of 20%, as measured by seroepidemiological studies and shown in Figure 6 . The classic, 1st-order S-I-R model gave predictions of the final proportion infected to be greater than 50% (on average), as also shown in the figure. (We calculated these values based on 78 measurements of the reproductive number from reference [21] ; our final-size calculations are in agreement with reference [20] .) Using the same 78 published estimates of reproductive numbers, the final size estimated from the second-order model is around 30% and much closer to the final size observed in practice, as shown in the figure. In conclusion, if we account for heterogenerity in infection susceptibility, the rate equation of the standard S-I-R model is changed from first-to second-order. This provides a mathematically simple route to account for variability in human populations that are implicitly neglected in classic epidemiological models, although its interactions with more complex simulations need to be determined. Similar arguments should hold for models in population biology where populations exhibit variability, such as in the classic predator-prey models. 25 We hope that this insight, from analagous systems in chemical kinetics and statistical physics, can help to inform future epidemiological models, and perhaps remove an inherent systematic bias from model predictions. * andrew peterson@brown.edu Should the rate term in the basic epidemiology models be second-order? Andrew A. Peterson, C. Franklin Goldsmith, Christopher Rose, Andrew J. Medford, Tejs Vegge Contents S1 Detailed derivation of emergence of second-order kinetics 1 S1. Here, we provide a more detailed derivation of the emergence of second-order kinetics. We follow the classic S-I-R model, where the population is divided into susceptible, infectious, and recovered, and define N S , N I , and N R as the number of people in each of these compartments. The fraction of the population in these compartments is x S , x I , and x R , defined by dividing each of these numbers by the total population, N total (= N S + N I + N R ). We also assume a "virgin epidemic" [1] , in which the entire population is in principle susceptible to the disease (minus the tiny fraction infected at the start of the model). We examine the virgin S-I-R model for simplicity and transparency, and this general framework should lend itself to adaptation to more complex models. We define an individual's infection susceptibility ε to be proportional to their instantaneous rate of becoming infected. If we have discrete levels of susceptibility in a population, the rate equation of new infections is written as where x S,i is the fraction of the total population with susceptibility ε i . If we assume the susceptibility is continuously distributed with a distribution given by n S (ε), then the rate will formally be defined aŝ is properly a distribution of rates; a finite rate for a specific range of susceptibilities is obtained by integratingr(ε) between any two values of ε. The total rate of new infections at any point in time is then 1 arXiv:2005.04704v1 [q-bio.PE] 10 May 2020 We can note that the average susceptibility of the susceptible pool can be calculated, at any time, bȳ Thus, in either case the rate of new infections simplifies to where we emphasize that the mean susceptibilityε of this population can change over the course of the epidemic, consistent with the definitions ofε above. Equation (S.2) is general and applies to any distribution-discrete, continuous, or single-valued. We also enforce that the average susceptibility is one at time zero,ε 0 =1, to ensure compatibility of the definition ofε with the classic model. (We note that we recover the classic S-I-R model by assuming that every individual is identically susceptible to infection; that is, whenε(t) =ε 0 = 1.) We next need to find how this value changes in response to the epidemic. For reasons we discuss in the text and in Section S4, we choose a continuous, exponential distribution to describe the variation in infection susceptibility within the population. This distribution is given by We can verify that the total number of individuals in the distribution is N S and that the mean of this distribution is given byε: For convenience, we will define the total susceptibility of the population as E ≡ N S ·ε = ∞ 0 ε n S (ε) dε. The average ε of a person removed from the population, at any instant in time, is found from the distribution of rates as (see note in Section S1.1 below) That is, the average not-yet-infected person has susceptibility ofε, while the average person becoming infected has susceptibility 2ε. This is valid at any time, provided the population stays exponentially distributed (which, as we discuss in Section S4 we consider to be a reasonable assertion). Therefore, the total susceptibility of the pool changes as Since E =ε N S , it follows that dE dNS =ε + N S dε dNS . Combined with the previous result, we obtain: For the case of a virgin epidemicε 0 = 1 and N S,0 ≈ N total , so we find That is, the average susceptibility decreases in direct proportion to the fraction of the population still susceptible. Inserting this into our rate equation, (S.2), gives rise to second-order kinetics: And thus, if we account for the heterogeneity of infection susceptibility in the population, the S-I-R model should be written as Here, we provide justification for equation (S.3). (Note that for simplicity, we referred to this as the "draw probability" in the main text; what we provide here is more rigorous.) Let's first consider the simple case where we just have two discrete susceptibilities, ε 1 and ε 2 . The average ε of those individuals withdrawn (infected) in the time period between t and t + δt is ε removed = ε 1 · number withdrawn with ε 1 in (t, t + δt) + ε 2 · number withdrawn with ε 2 in (t, t + δt) number withdrawn with ε 1 in (t, t + δt) + number withdrawn with ε 2 in (t, t + δt) Taking the limit as δt → 0 gives us the instantaneous average ε of those withdrawn at time t (using L'Hôpital's Rule and the Fundamental Theorem of Calculus): This can be generalized to any number of levels as ε removed = i ε i r i i r i In the continuum limit, the above becomes We provide here a brief, qualitative derivation that reaches the same conclusion as above, in an alternate framework that may be more intuitive to some. First, consider the exponential distribution; the "sliver" at ε=0 represents an infinitesimally small segment of the population that can never become sick. This "sliver" will never leave the susceptible category, and thus the height of the curve at ε=0 will stay constant with time. The population distribution was given as where the intercept is N S /ε. At time zero, in a virgin epidemic, this quantity is Since the intercept does not change with time, the above equality must be true at all times. Putting inε 0 =1 and x S = N S /N total , we immediately findε To numerically validate the response of the mean population susceptibilityε to the dynamics, as well as to explore the effect of the initial distribution, we performed Monte Carlo (stochastic) simulations. In these simulations, we divided the population into small compartments, each with the same value of ε; that is, all individuals in bin i had a discrete susceptibility ε i . As we increase the number of compartments, the spacing between adjacent values of ε i becomes smaller, approaching the continuous limit. The probability of drawing from bin i is proportional to the number of individuals in that bin times the susceptibility of that bin, that is In the main text, we referred to this as the "draw probability". After assigning an initial distribution of the population-that is, filling the bins, which as a first approximation we assigned as exponential-we next sampled from the above probability distribution. That is, we drew a random number and assigned it to one of the above categories based on its probability. Figure S-1 shows a snapshot of this Monte Carlo simulation for N =100,000 individuals divided into M =1000 bins. Initially, the bins were distributed according to an exponential distribution with mean susceptibilityε=1. The bins were spaced such that each bin initially had 100 individuals in it; that is, bins on the right were wide and bins on the left were narrow. Samples were withdrawn from bins one-by-one according to the probability distribution above. This figure was made when 40,000 individuals had been removed from the pool of susceptible. The top-left plot shows a histogram of the current distribution of individuals according to their susceptibility, which we can see has remained exponential. The curve below this shows the draw probability, which is simply the population of each bin times the susceptibility ε of that bin. The top-right curve shows the average susceptibilityε of the susceptible pool; importantly, we see this decreasing in direct proportion to x S , as predicted by equation (S.4) . The curve below this shows the total susceptibility of the population over the course of the simulation, where we see second-order behavior emerge. Our Monte Carlo simulations were in excellent agreement with the equations derived in this work, which are shown in the figure with the curve marked "theory". The simulation described here started with a perfect exponential distribution. In Section S4, we discuss the effect if another distribution was instead assumed. In principle, if we divide the population into many bins according to their susceptibilities ε i , and run a first-order model assuming classic S-I-R behavior, we should see the second-order behavior emerge as the number of bins is increased. In this approach, we can divide individuals into M + 2 compartments: infected (I), recovered (R), and susceptible (S i , for i = 1, · · · , M ). The ordinary differential equations (ODEs) for this first-order model are then written as We numerically integrated the above set of M + 2 equations, using the LSODA package as implemented in SciPy [2] . This allowed us to explore the effect of increasing the number of bins M . In all simulations, we kept the mean of the initial populationε 0 to be 1. When only a single bin was used (M =1), all individuals have identical susceptibility ε=1; this limit therefore is identical to the classic S-I-R model. As the number of bins M becomes large, we approach the continuum limit: that is, ε is continuously distributed in the population. We can thus test to see if second-order behavior emerges by increasing the number of bins in a first-order model. The results of this were shown in the main text of the manuscript, where we did indeed see second-order behavior emerging. This was shown assuming the initial distribution was exponentially distributed (when M > 1), and we again saw the distribution remain exponential throughout the course of the simulation. We discuss the effect of other initial distributions in Section S4. In statistical physics, the "Boltzmann" distribution, e −ε/kBT , arises naturally by maximizing entropy under the constraint of conserving energy; this is actually how the concept of temperature is formally derived in many textbooks. [3] (In this equation, k B is the Boltzmann constant, which serves to convert units between temperature T and energy ε.) In the Boltzmann distribution, the energies ε are the microscopic distribution: such as kinetic energies of gas-molecules. As a maximum-entropy distribution, this implies that this is the most likely distribution to be encountered at a given temperature. In the mathematics and simulations described in the text, we chose the exponential distribution to describe the variability in susceptibilities ε that might be encountered in a population, both due to its similarity to published curves for spreading and its elegance in mathematical treatments. What happens if the initial distribution of ε in the susceptible population follows a different distribution? We can test this quite simply by running either our Monte Carlo or binned-ODE simulation with different initial distributions. Interestingly, when we do so, we find that all distributions of ε tend towards the exponential distribution during the course of the simulation, which we observed in both the Monte Carlo and the binned-ODE simulations. We tested this even on extreme distribution shapes, such as uniform and triangle-shaped distributions, and we invariantly saw distributions tend towards a Boltzmann shape (which we quantified based on the distribution's root-mean-squared error, RMSE, versus an exponential distribution with the same mean). We show this for a triangle distribution in Figure S-2 . The left column shows the progression in the binned ODE model; the initial population was given a "triangle" distribution: that is, a linear decay from the y intercept to 0. As the simulation progresses, the shape of the distribution progresses towards an exponential distribution. We see exactly the same behavior (but with noise) in the case of the Monte Carlo simulation. We emphasize that no mathematics including an exponential distribution of ε's was introduced into either simulation; the distribution emerges naturally from the dynamics. Why does this occur? Clearly the mechanism is different than in the cases Boltzmann was considering, where particles exchange energy with one another. Here, we have explicitly assumed that individuals' susceptibilities are static-that is, they do not naturally re-distribute over the time scale of an epidemic. We can consider the distribution of susceptibilities, which we'll write here as n S (ε, N ), to evolve with the number of people removed from the susceptible pool, N , according to the differential equation: Upon integration, this results in n S (ε, N ) = n S (ε, 0) e −a ε N If the initial distribution is exponential (n S (ε, 0) = e −b ε , with b > 0), we see that the distribution stays exponential at all times: n S (ε, N ) = e −b ε e −a ε N = e −(b+a N )·ε Now let's examine if the initial distribution is a perturbation from an exponential (n S (ε, 0) = e −(b+N (ε))ε ), where N (ε) describes the perturbation from an exponential distribution, which we'll assume to be within a similar magnitude as b for all ε. We have n S (ε, N ) = e −(b+N (ε))·ε e −a ε N = exp {− (b + a N + N (ε)) · ε} We can see from the above equation that as N grows, b + a N will start to dominate over N (ε), thus making the distribution tend towards an exponential. In our simulations, we saw this for such large perturbations as triangle-and square-shaped initial distributions. Susceptibility to infection Estimating the basic reproductive ratio for the Ebola outbreak in Liberia and Sierra Leone Modeling COVID-19 on a network: super-spreaders, testing and containment Risk SIR Model with Optimally Targeted Lockdown Molecular Driving Forces: Statistical Thermodynamics in Biology, Chemistry, Physics, and Nanoscience. Garland Science Did Modeling Overestimate the Transmission Potential of Pandemic (H1N1-2009)? Sample Size Estimation for Post-Epidemic Seroepidemiological Studies Estimates of the reproduction number for seasonal, pandemic, and zoonotic influenza: a systematic review of the literature Models overestimate Ebola cases Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola Generality of the Final Size Formula for an Epidemic of a Newly Invading Infectious Disease Mathematical Models in Population Biology and Epidemiology Modeling infectious epidemics Molecular Driving Forces: Statistical Thermodynamics in Biology, Chemistry, Physics, and Nanoscience. Garland Science