key: cord-0220572-aydi0uju authors: Hindes, Jason; Assaf, Michael; Schwartz, Ira B. title: Extreme outbreak dynamics in epidemic models date: 2021-08-02 journal: nan DOI: nan sha: 7836e787f5da746b31efbd254368fe80ef286b4d doc_id: 220572 cord_uid: aydi0uju Motivated by recent epidemic outbreaks, including those of COVID-19, we solve the canonical problem of calculating the dynamics and likelihood of extensive outbreaks in a population within a large class of stochastic epidemic models with demographic noise, including the Susceptible-Infected-Recovered (SIR) model and its general extensions. In the limit of large populations, we compute the probability distribution for all extensive outbreaks, including those that entail unusually large or small (extreme) proportions of the population infected. Our approach reveals that, unlike other well-known examples of rare events occurring in discrete-state stochastic systems, the statistics of extreme outbreaks emanate from a full continuum of Hamiltonian paths, each satisfying unique boundary conditions with a conserved probability flux. Introduction. Epidemic models are useful for understanding the general dynamics of infectious diseases, rumors, election outcomes, fads, and computer viruses [1] [2] [3] [4] [5] [6] [7] [8] . Moreover, in the early days of emerging disease outbreaks, such as the current COVID-19 pandemic, societies rely on epidemics models for disease forecasting, as well as identifying the most effective control strategies [9] [10] [11] [12] . To this end it is useful to quantify the risks of local epidemic outbreaks of various sizes. Within a given population, outbreak dynamics are typically described in terms of compartmental models [1, 4, 13] . For example, starting from some seed infection, over time individuals in a population make transitions between some number of discrete disease states (susceptible, exposed, infectious, etc.) based on prescribed probabilities for a particular disease [9, 10, 12, [14] [15] [16] . In the limit of infinite populations the stochastic dynamics approach deterministic (mean-field) differential equations for the expected fraction of a population in each state [1, 4, 13, 17] . Yet for real finite populations, outbreak dynamics have a wide range of different outcomes for each initial condition, which are not predicted by mean-field models. A natural and canonical question (for both statistical physics and population dynamics) is, what is the distribution of outbreak sizes? Beside stochastic simulations [1, 14, 17, 18] , methods exist for e.g., recursively computing the outbreak statistics [16, 19, 20] , solving the master equation for the stochastic dynamics directly by numerical linear algebra [18] , or deriving scaling laws for small outbreaks near threshold [21] [22] [23] . Yet, in addition to being numerically unstable for large populations, computationally expensive, or limited in scope, such methods also fail to provide physical and analytical insights into how unusual and extreme outbreaks occur. Here we develop an analytical approach based on WKB-methods [24] [25] [26] which provides a closed-form expression for the asymptotic outbreak distribution in SIR, SEIR, and COVID-19 models with fixed population sizes (N ) and heterogeneity in infectivity and recovery [12, [27] [28] [29] . We show that each outbreak is described by a unique most-probable path, and provide an effective picture of how stochasticity is manifested during a given outbreak. For instance, compared to the expected mean-field dynamics each outbreak entails a unique, depletion or boost in the pool of susceptibles and an increase or decrease in the effective recovery rate, depending on whether the final outbreak is larger or smaller than the mean-field prediction. Most importantly, unlike usual rare-event predictions for epidemic dynamics, such as extinction or other large fluctuations from an endemic state [26, [30] [31] [32] , and fade-out [33] , our results do not rely on metastability [25, [34] [35] [36] [37] , and thus are valid for the comparatively short time scales of outbreaks, O(ln N ) [38] . In sharp contrast to systems undergoing escape from a metastable state, we show that the outbreak distribution corresponds to an infinite number of distinct paths-one for every possible extensive outbreak. Each outbreak connects two unique fixed-points in a Hamiltonian system, both with non-zero probability flux. Hence, by solving a canonical problem in population dynamics and non-equilibrium statistical physics, we uncover a new degenerate class of rare events for discrete-state stochastic systems. Baseline model. We begin with the Susceptible-Infected-Recovered (SIR) model, often used as a baseline model for disease outbreaks [1, 2, 4, 13] . Individuals are either susceptible (capable of getting infected), infected, or recovered/deceased, and can make transitions between these states through two basic processes: infection and recovery. Denoting the total number of susceptibles S, infecteds I, and recovereds R in a population of fixed size N , the probability per unit time that the number of susceptibles decreases by one and the number of infecteds increases by one is βSI/N , where β is the infectious contact rate [1, 2, 4] . Similarly, the probability per unit time that the number of infecteds decreases by one is γI, where γ is the recovery rate [1, 2, 4] . Combining both processes results in a discrete-state system with the following stochastic reactions: As N is assumed constant, the model is appropriate for the short time scales of early emergent-disease outbreaks, arXiv:2108.00994v2 [q-bio.PE] 28 Jan 2022 for example, with an assumed separation between the outbreak dynamics and demographic time scales, as well as re-infection [1] . From the basic reactions (1-2), the master equation describing the probability of having S susceptibles and I infecteds at time t is Solving this equation allows one to predict the probability that a particular proportion of a population eventually becomes infected for a given set of parameters. This is our goal here, as in many other works [1, 14, [16] [17] [18] [19] [20] [21] [22] [23] . Yet, in full generality such equations cannot be solved analytically, and one must resort to high-dimensional numerics, recursive computations, and/or large numbers of simulations [18] . Yet, if N is large it is possible to construct an asymptotic solution to Eq.(3) for all O(N ) outbreaks using WKB methods [24] [25] [26] , as we will show. First, to summarize what is known for large N , let us define the fraction of individuals in each disease state x w = W/N where W ∈ {S, I, R}. Note that as the total population size is constant, 1 = x r +x s +x i . The mean-field limit of the reactions (1-2), corresponds to a simple set of differential equations:ẋ s = −βx i x s ,ẋ i = βx i x s − γx i , andẋ r = γx i . Of particular interest is the total fraction of the population infected in the long-time limit, x * r = x r (t → ∞), whose average, x * r , can be found by integrating the mean-field system. For small initial fractions infected, the solution (according to the mean-field) depends only on the basic reproductive number, R 0 ≡ β/γ [1, 2, 4, 13] , and solves the equation 1−x * r = e −R0x * r [1, 39] . But, what about a half, a fourth, twice, etc. of this expected outbreak, or a case in which the entire population eventually becomes infected? Since the SIR-model is inherently stochastic and governed by Eq.(3), such solutions are also possible. To get a sense of how the probabilities for various outbreaks arise, and to guide our analysis, we perform some stochastic simulations, and plot (on a semi-log scale) the fraction of outcomes that result in a given total-fraction infected. Examples are shown in Fig.1 for outbreaks: 100% (blue), 98% (red), and 96% (green) when R 0 = 2.5. For reference, the meanfield outbreak of 89% (magenta) is also plotted. Here and throughout, simulations were performed using Gillespie's direct method [1, 14, 40] starting from a single infectious individual. Notice that for each outbreak value, ln P is linear in N , with a slope that depends on the outbreak, ln P (x * r ) N S(x * r ). This asymptotic WKB scaling is consistent with what we expect on general theoretical grounds for large deviations in stochastic population models with a small O(1/N ) noise parameter [24] [25] [26] 30] . Equipped with the WKB hypothesis for the distribution of outbreaks, we substitute the ansatz P (x s , x i , t) ∼ exp[−N S(x s , x i , t)] into Eq.(3), and keep leading-order terms in N 1. In particular, we do a Taylor expan- FIG. 1. Extreme outbreak probability scaling with the population size in the SIR model. Plotted is the probability that 100% (blue), 98% (red), 96% (green), and 89% (magenta) of the population are infected during an outbreak vs N . Results from 10 11 simulations (symbols) are compared with theoretical lines whose slopes are given by Eq. (7). Here R0 = 2.5. sion of P (x s , x i , t); e.g., P (x s + 1/N, x i − 1/N, t) e −N S(xs,xi,t)−∂S/∂xs+∂S/∂xi . This allows finding the leading-order solution [41], called the action, given by S(x s , x i , t) [24, 26, 30] . Taking the large-N limit in this way converts the master equation (3) [24, 25] , with a Hamiltonian given by Here the momenta of the susceptibles and infecteds are respectively defined as p s = ∂S/∂x s and p i = ∂S/∂x i . As a consequence, in the limit of N 1 the outbreak dynamics satisfy Hamilton's equations:ẋ w = ∂H/∂p w andṗ w = −∂H/∂x w , just as in analytical mechanics [42] . Furthermore, solutions are minimumaction [25] , or maximum-probability. Namely, given boundary conditions for an outbreak, Hamilton's equations will provide the most-likely dynamics. As in mechanics, once the dynamics are solved, the action S(x s , x i , t) can be calculated along an outbreak path: Before continuing our analysis, let us comment on the distribution, P (x s , x i , t), and explain the sense in which certain outbreaks are extreme. As P (x s , x i , t) scales exponentially with N (for large N ), if the action S(x s , x i , t) associated with an outbreak differs significantly from 0, the outbreak will occur with an exponentially small probability, just as we observe in Fig.1 . In fact, the special case of S = 0 (p i = p s = 0) is nothing other than the aforementioned mean-field prediction, which nicely quantifies why it is the most-likely extensive outbreak. Results. In order to find the probability distribution of outbreaks, we observe that Hamiltonian (4) does not depend explicitly on time; that is H evaluated along an outbreak is conserved in time [42] . Now, we substituteṗ i = −∂H/∂x i , and write the Hamiltonian for the SIR model in a suggestive form, H = −x iṗi . Thus, if we consider the same large-population limit as the usual mean-field analysis discussed above, and restrict ourselves to outbreaks that start from small infection, e.g., x i (t = 0) = 1/N with N 1, it must be that H 0. Notably, because the energy is zero, we can drop the explicit time dependence in Eq. (5) . As a result, since the number of infecteds grows and then decreases during the course of an outbreak with x i (t) = 0 for general t, one must have p i = const. At this point, we highlight a crucial difference between our analysis for stochastic outbreak dynamics, and the traditional use of WKB for analyzing large deviations in population models with metastable states. In the latter, the traditional H 0 condition of the WKB usually derives from the fact that the model has a locally unique stable fixed-point for the mean-field coordinates, e.g,ẋ = 0 [25, 26, [30] [31] [32] [34] [35] [36] [37] . Common examples are stochastic switching and extinction from endemic equilibria. In our case, the zero-energy condition corresponds to a conserved momentum, and in fact, an infinite number of them. The non-zero momentum boundary conditions entailed by the conserved momenta are distinct from other known categories of extreme processes in discrete-state non-equilibrium systems and stochastic populations, and hence we uncover a new degenerate class. Now that we know that outbreaks in the SIR model are defined according to a conserved momenta, i.e., m ≡ e pi , we can equate the Hamiltonian (4) to zero, and find the non-constant fluctuational momentum p s , along an outbreak in terms of x s , m, and R 0 , This momentum is necessary for evaluating Eq. (5). Continuing on toward our main goal of calculating the action, we note that the integral over p i vanishes, since it is a constant of motion and x i (t = 0) = x i (t → ∞) 0. Furthermore, the integral over H also vanishes since H 0. As a result, in order to determine the action, we need to compute the integral over p s [Eq. (6)] from the initial state x s = 1 to the final state x s (t → ∞) = x * s . The only thing left for us is to express x * s in terms of the conserved momentum m. This can be done by using Hamilton's equations, see SM for details. Doing so, we arrive at the total action accumulated in the course of an outbreak Note that S is a function of x * s only, since for fixed R 0 there is a complete mapping between the final outbreak size and m (see SM Eq.(A9) for x * s (m)). Equation (7) is our main result: the asymptotic solution of Eq.(3) for the distribution of all O(N ) outbreaks. [43] Our main result can now be tested in several ways. First, we go back to the motivating Fig.1 . Recall that our approach predicts that, as a function of N , the action gives the slope of ln P (x * r ) N S(x * r ). As such, we can overlay lines in Fig.1 , where the slopes are predictions from Eq.(7). Doing so for three extreme outbreak values (as well as the mean-field), we observe very good agreement, especially for larger values of N . Second, we can fix N and R 0 , and see how well Eq.(7) predicts the full distribution. Such comparisons with stochastic simulations are shown in the upper panel of Fig.2 (a) . In particular, we plot the fraction of 10 12 simulations that resulted in an outbreak x * r in blue, and the solutions of Eq. (7) with a black line. Again, the agreement between the two is quite good for the population size N = 2000 and R 0 = 1.7. Disagreement increases as the outbreak sizes approach O(1/N ). Qualitatively, we can see that our theory captures the full cubic structure of the outbreak distribution, with a local maxima at the smallest outbreak (here 1/N ) and the mean-field solution, x * r 0.69 [24, 30, 33, 35] . To get more insight into the outbreak distribution, one can use Eq. (7) to compute the action, e.g. in the vicinity of the mean-field, x * r . Locally the distribution is a Gaussian around x * r , with a relative variance that takes a minimum at R 0 5/3, for which stochastic deviations from the mean-field outbreak are minimized (See SM for further details on the distribution's unique shape). Before moving to more general outbreak models we mention a few important qualitative details that emerge from our approach. In particular, let us consider the stochastic dynamics for the fraction of the population infected,ẋ i = ∂H/∂p i . Substituting Eq.(6) intoẋ i , yieldṡ First, note that when m = 1 (p i = p s = 0), we uncover the mean-field SIR model system, From the mean-field, we can recover Eq.(8) with the suggestive transformations x s → x s + (m − 1)/mR 0 , and γ → γ/m [44] . Recalling that each outbreak is parameterized by a unique constant m, evidently the effect of demographic stochasticity is to add an effective constant reduction (or boost) to the pool of susceptibles and to increase (or decrease) the effective recovery rate, depending on whether the final outbreak is smaller (m < 1) or larger (m > 1) than the mean-field, respectively. We can test our prediction that a conserved m constrains an entire outbreak path by picking a particular final outbreak size, corresponding to a particular value of m, and compare to stochastic trajectories. One method for comparison is to build a histogram in the (x i , x s ) plane from many simulations that end in the same outbreak size, and plot the constant-m prediction. The latter can be found by solving the differential equation dx i /dx s =ẋ i /ẋ s from Hamilton's equations, or An example is shown in Fig.2 (b) for a final outbreak of 86% when R 0 = 1.7 (the mean-field prediction is 69%). The color map for the histogram is plotted along with the prediction from Eq. (9) . As expected, the outbreak-path prediction lies in the maximum density region. Thus, not only does our approach predict probabilities, but also the optimal dynamics that leads to outbreaks-driven by an effective conserved momentum, m. General model. We now generalize our results to more complex and realistic outbreak models. Typically, such models derive from the same basic assumptions as SIR, but have more states and free parameters. For example, epidemiological predictions for COVID-19 (at a minimum) require an incubation period of around 5 days, and an asymptomatic disease state, i.e., a group of people capable of spreading the disease without documented symptoms [27] [28] [29] . Both features: finite incubation and heterogeneity in infectious states, can form the basis of a more general class of outbreak models [9, 10, 12, [27] [28] [29] . Within this class, we assume that upon infection, susceptible individuals first become exposed (E), and then enter an infectious state at a finite rate α. By assumption there are several possible infectious states (e.g., asymptomatic, mild, severe, tested, quarantined, etc.) that an exposed individual can enter according to prescribed probabilities [9, 10, 12, [27] [28] [29] . In addition, infectious states can have their own characteristic infection rates and recovery times. Putting these ingredients together, let us define N infectious states, I n , where n ∈ {1, 2, ..., N }, each with their own infectious contact rate β n and recovery rate γ n , and which appear from the exposed state with probabilities z n [12, [27] [28] [29] 45] . See SM for list of reactions. Following the WKB-prescription above, the Hamilto-nian for our general class of outbreak models is H = n β n x i,n x s e pe−ps −1 + γ n x i,n e −pi,n −1 + αz n x e e pi,n−pe −1 . (10) Despite the increased dimensionality and parameter heterogeneity, the general outbreak system defined through Eq.(10) can also be solved analytically by precisely the same approach as the baseline SIR model. As in the latter, the essential property that makes the system solvable is the constancy of all momenta except for p s . This property ensures that, here again, there is one free constant, m, that determines all momenta and the final outbreak size. Demonstrating this requires a few additional steps of algebra, but the result is a simple update to Eq.(7) that involves a sum over the heterogeneities {z n , β n , γ n }. See SM for general outbreak solution, Eq.(A29). An important consequence of the general solution is that, in the special case of the SEIR model [1] , where there is only one infectious state, the outbreak action is identical to the SIR model, Eq. (7). Namely, finite incubation changes the dynamics of outbreaks, but has only a subexponential contribution to their probability. An example prediction from our general analysis is shown in the lower panel of Fig.2 (a) . The analytical solution (black line) is in very good agreement with stochastic simulations of a COVID-19 model with asymptomatic (n = 1) and symptomatic (n = 2) infectious individuals. The infection parameters [46] take realistic heterogeneous values, i.e., β 1 = 1.8, β 2 = 1.12, γ 1 = 1, γ 2 = 0.8, α = 2, z 1 = 0.3, and N = 4000 [12, [27] [28] [29] 45] , where z 1 = 0.3 is a typical value for the fraction of asymptomatic infection. Despite the increased complexity, the distribution in the more general model is also well-captured by our theory. Before concluding, it is worth mentioning that although in real outbreaks the parameters in Eq.(10) may fluctuate in time, if the fluctuations are fast compared to outbreak time-scales O(ln N ) [38] , we expect the distribution to approach the SIR model with effective time-averaged parameters, which can be computed using methods detailed in [47, 48] . On the other hand, if the fluctuations are slow with respect to the same time scales, we expect the distribution to be described by integrating over the solution of Eq.(10), with weights given by the probability-density of rates [47, 48] . In the intermediate regime, one must solve a Hamiltonian system with increased dimensionality, which includes both demographic noise and environmental variability. In this way, our results can provide a basis for understanding even more realistic outbreak dynamics. Conclusions. We solved the canonical problem of predicting the outbreak distribution of epidemics in large, fixed-sized populations. Our theory was based on the exponential scaling of the probability of extensive outbreaks on the population size, which allowed the use of a semiclassical approximation. By analyzing SIR, SEIR, and COVID-19 models, we were able to derive simple formulas for the paths and probabilities of all extensive outbreaks, and find an effective picture of how stochasticity is manifested during outbreaks. Most importantly we showed that, unlike other well-known examples of rare events in population models, the statistics of extreme outbreaks depend on an infinite number of minimumaction paths satisfying a unique set of boundary conditions with conserved momenta. Due to their distinct and degenerate phase-space topology, extreme outbreaks represent a new class of rare process for discrete-state stochastic systems. As with other extreme processes, our solution can form the basis for predictions in many other scenarios, including stochastic outbreaks mediated through complex networks. JH Here the momenta of the susceptibles and infected are respectively defined as p s = ∂S/∂x s and p i = ∂S/∂x i . As in classical mechanics, the outbreak dynamics satisfy Hamilton's equationṡ As shown in the main text, since the Hamiltonian does not depend explicitly on time it is a constant of motion, and furthermore, it can be shown that H 0 when x i (t = 0) 0. As a consequence, p i is a free constant throughout the epidemic outbreak, and we can define the constant m = e pi . Equating the Hamiltonian (A1) to zero, after some algebra we find: At this point we can use Hamilton's equations (A2-A3) to express the final outbreak size in terms of the initial momentum of infected. Because the total population density is constant, the density of the recovered individuals, x r = R/N , satisfies:ẋ r = −ẋ s −ẋ i = γx i /m. As a result, we can write a differential equation for dx s /dx r by dividing Eq. (A2) byẋ r , which yields: Substituting e ps from Eq. (A6) into this equation, and integrating x s from 1 to x * s , and x r from 0 to 1 − x * s , we find an implicit equation for the final outbreak versus m: This equation can be solved for m, and the result is where W 0 (z) is the principle solution for w of z = we w . Thus, for fixed R 0 , we have a complete mapping between the final outbreaks and the free-parameter m: each outbreak corresponds to a unique value of m. In addition to finding the final outbreak size as function of m, we can find the trajectory for the population fraction infected during an outbreak by solving dx i /dx s . Using Hamilton's equations (A2-A3) and Eq. (A6) we find Integrating x i from 0 to x i , and x s from 1 to x s , results in Eq.(9) from the main text, or Notably, at m = 1 (the mean-field solution), Eq. (A8) becomes x * s = e −R0(1−x * s ) , which yields the well-known mean-field total outbreak size x * r = 1 − x * s = 1 + W 0 −R 0 e −R0 /R 0 . In addition, Eq. (A11) becomes x i (x s ) = 1 − x s + ln(x s )/R 0 , and we recover the wellknown mean-field result of how the fraction of infected depends on that of the susceptibles. Having found p s (x s ), and x * s (m) we can find the action by integrating p s dx s between x s (0) = 1 and x * s . The result is Eq. (7) from the main text. In this section, we generalize the SIR-model results to a broader class of outbreak models with finite incubation and heterogeneity in infectious states. First, let us list the possible reactions in the larger class (typical of COVID-19 models) described in the main text: (S, E) → (S −1, E +1) with rate S n β n I n /N, (A12) (E, I n ) → (E −1, I n +1) with rate z n αE, (A13) (I n , R) → (I n −1, R+1) with rate γ n I n . (A14) From these, the Hamiltonian Eq. (10) directly follows from the WKB limit described in the main text and in SM.I: + αz n x e e pi,n−pe −1 + γ n x i,n e −pi,n −1 . To solve the system Eq.(A15), let us adopt the convenient notation, m = e pe , m s = e ps , and m i,n = e pi,n . As before, we look for solutions withṗ i,n = −∂H/∂x i,n = 0, which implies Because the left hand side is a constant and the right hand side has no explicit dependence on (β n ,γ n ), we define an outbreak constant, (A20) we need to know the upper and lower limits for the integrals in Eq.(A20). As with the SIR model, since: all momenta are constant except p s , H = 0, x e (t = 0) ≈ 0, x i,n (t = 0) ≈ 0, x e (t → ∞) → 0, and x i,n (t → ∞) → 0, the only non-zero integral comes from p s , which depends on x s (t → ∞) ≡ x * s . One useful strategy for finding the final fraction of susceptibles x * s is to find relationships between the timeintegrals of x e , x i,n , and x s . As in the SIR model, let us start with three of Hamilton's equations determined from Eq.(A15): By defining I e ≡ ∞ 0 x e (t)dt and I i,n ≡ ∞ 0 x i,n (t)dt, we can integrate Eqs.(A22-A23) with respect to t. Remembering that x i,n (t = 0) ≈ 0, x i,n (t → ∞) → 0, and x r (t → ∞) → 1 − x * s, , the result is: Similarly, separating t and x s in Eq. (A21) and integrating over all time we get In this section we explore the unique shape of the outbreak distribution. In terms of analytical scaling, in Eq. (A9) we have expressed x * s (in the SIR model) as a function of m which allows finding an explicit solution for the outbreak distribution as a function of x * s . While this gives rise to a cumbersome expression, further analytical progress can be done in the vicinity of m 1, which is a local maximum of the distribution. Expanding the right hand side of Eq. (A9) around m = 1, we find m in terms of x * s , with x * s ≡ −W 0 (−R 0 e −R0 )/R 0 being the mean-field solution for x * s . Indeed, m is close to 1 in the vicinity of the maximum of the distribution, x * s x * s . Plugging Eq.(A30) into the action function Eq. (7), and approximating up to second order in x * s − x * s , we find . (A31) Equation (A31) means that the distribution in the vicinity of the mean field outbreak size is a Gaussian with a width of σ = (N S ( x * s )) −1/2 . Notably, close to the bifurcation, R 0 − 1 1, x * Modeling Infectious Diseases in Humans and Animals Stochastic epidemic models and their statistical analysis Application of sir epidemiological model: new trends Infectious Disease Modelling Annual Review of Statistics and Its Application Distribution of outbreak sizes for sir disease in finite populations Proceedings of the National Academy of Sciences Extinction and Quasi-stationarity in the Stochastic Logistic SIS Model Course of Theoretical Physics) For brevity we have dropped the dependence on xi in the final-outbreak action Eq A similar effect occurs in cell biology in a mRNA-protein genetic circuit, where fluctuations in the mRNA copy number can be effectively accounted for by taking a protein-only model with a modified production rate For COVID-19 modelling, a typical choice for time units would be t = 1 corresponding to 10 days s 1 − 2(R 0 − 1), and the width simplifies to σ 2/ N (R 0 − 1), whereas for R 0 1, x * s e −R0 and σ N −1/2 e −R0/2 . These calculations lead to a very interesting result: the coefficient of variation (COV), COV = σ/ x * s , receives a minimum at R 0 = 1.66. That is, the deviation from the meanfield outbreak size is minimized at R 0 5/3, whereas at R 0 → 1 or R 0 1, the COV diverges, see Fig. S1 (a). Another unique aspect of the outbreak distribution is the least-likely small outbreak, x min r , which lies in between the mean field and the minimum outbreak x * r = 0. The least-likely small outbreak satisfies ∂S/∂x * r (x min r ) = 0 from the main text Eq. (7). For outbreaks smaller than the mean field, x min r can be used to separate outbreaks into increasing or decreasing likelihoods. Usefully, we can track its dependence on parameters and compare to both Monte-Carlo simulations and the mean field.An example is shown in Fig.S1 (b) , where we plot x min r and x * r versus R 0 . The predicted value of x min r (from solving Eq. (7)) is shown with a blue line, which can be compared directly with simulation results shown with blue squares. The latter were determined by first building histograms from 10 11 stochastic simulations in the SEIR model, similar to Fig.(2) (a) , and then fitting the smallest-probability region below the mean-field value with a quartic polynomial of x * r . The polynomialfits were done to log 10 (P ). After fitting, the local minimum was extracted for each plotted value of R 0 . The least-likely small outbreaks can also be compared to the mean-field result, x * r , shown in red. Similar to the blue series, lines are theory predictions and points represent the local maxima of the simulation-based histograms.Note that we predicted that SEIR-model distributions are identical (on log scale) to SIR-model distributions, as described in the main text. This claim is tested in Fig.S1 (b), since simulations were performed under the former, while theory derived from the latter. As we can see, the two agree very well.