key: cord-1043800-wumlclej authors: St-Onge, Guillaume; Sun, Hanlin; Allard, Antoine; H'ebert-Dufresne, Laurent; Bianconi, Ginestra title: Universal nonlinear infection kernel from heterogeneous exposure on higher-order networks date: 2021-01-18 journal: Physical review letters DOI: 10.1103/physrevlett.127.158301 sha: 5d3f68fe40cb0666c2bb0f1eea0f9e594eafc360 doc_id: 1043800 cord_uid: wumlclej The colocation of individuals in different environments is an important prerequisite for exposure to infectious diseases on a social network. Standard epidemic models fail to capture the potential complexity of this scenario by (1) neglecting the higher-order structure of contacts which typically occur through environments like workplaces, restaurants, and households; and by (2) assuming a linear relationship between the exposure to infected contacts and the risk of infection. Here, we leverage a hypergraph model to embrace the heterogeneity of environments and the heterogeneity of individual participation in these environments. We find that combining heterogeneous exposure with the concept of minimal infective dose induces a universal nonlinear relationship between infected contacts and infection risk. Under nonlinear infection kernels, conventional epidemic wisdom breaks down with the emergence of discontinuous transitions, super-exponential spread, and hysteresis. The colocation of individuals in different environments is an important prerequisite for exposure to infectious diseases on a social network. Standard epidemic models fail to capture the potential complexity of this scenario by (1) neglecting the higher-order structure of contacts which typically occur through environments like workplaces, restaurants, and households; and by (2) assuming a linear relationship between the exposure to infected contacts and the risk of infection. Here, we leverage a hypergraph model to embrace the heterogeneity of environments and the heterogeneity of individual participation in these environments. We find that combining heterogeneous exposure with the concept of minimal infective dose induces a universal nonlinear relationship between infected contacts and infection risk. Under nonlinear infection kernels, conventional epidemic wisdom breaks down with the emergence of discontinuous transitions, super-exponential spread, and hysteresis. Mathematical models of epidemics play an increasingly important role in public health efforts and pandemic preparedness [1] . By providing insights on the interplay of the biological and sociological aspects of epidemics, models can test potential interventions in silico and suggest potential outcomes [2] . However, large-scale forecasting comparisons show that statistical models often outperform mechanistic models that make assumptions about spreading dynamics [3] . In this letter, we look at the interplay of two commonly used assumptions in disease models: Random mixing and the linearity between infection risk and exposures to infected individuals. In almost all disease models, doubling the number of contacts between susceptible and infectious individuals doubles the risk of infection for the susceptible individuals. Some past work in mathematical biology has considered nonlinear infection rates [4, 5] , but these models are rarely used in practice. Other fields such as sociology do often consider generalized contagion models, often dubbed complex contagions [6, 7] . In these complex contagions, having a nonlinear relationship between infection rate and sources of infection allows the model to consider mechanisms such as social reinforcement [8] , where a set of multiple exposures can have more impact than the mere sum of unique exposures. The mathematical convenience of assuming random mixing when modeling infectious diseases comes at the price that all contacts between susceptible and infectious individuals are effectively equivalent. This assumption has often been lifted using heterogeneous mathematical models where individuals are distinguished by some individual features such as their intrinsic susceptibility or reaction to the infection [9, 10] , relaxing the mass-action assumption directly [11, 12] , or by specifying an underlying contact network [13, 14] . Moreover, the linearity assumption says that all increments in the total exposure to infectious individuals (measured for example as a viral inoculum) are equivalent. Evidence asso-ciated with the minimal infective dose of different infectious diseases shows that not all exposures are equal, and that some minimal dose might be required for an infection to likely occur. More precisely, the ID 50 value is a measure of the dose needed to cause an infection in 50% of individuals. These concepts are needed because our immune system is usually able to handle microscopic challenges from viruses and bacteria alike. While an infective dose of tuberculosis might only require between 1 and 5 bacteria [19] , some enterics might require up to 10 9 pathogenic particles [20] , and others like common respiratory infections still require further study [21] . There are indeed multiple different physical mechanisms behind immune evasion, for example some airborne viruses need to find their receptors on lung epithelial cells, while some bacteria might instead require interaction with the immune system [22] . These mechanisms are reviewed in Refs. [22] [23] [24] [25] [26] , and all of them combine to determine the ID50 of specific pathogens. Likewise, the decay or clearing rates of pathogens in non-infectious courses can also vary a lot, potentially requiring days for bacteria to hours for airborne viral infections. For example, mathematical models for the pathogenesis of SARS-CoV-2 or influenza A use decay rates of the order of 7-18 hours but empirical estimates vary wildly (see Ref. [27] and [28] and references therein). To study the effect of simultaneously relaxing these two assumptions, we consider a social structure where individuals attend a certain number of environments such as work places, gyms, or supermarkets. This division of contact structure in environments is motivated by the known role of superspreading events, which are for example critical to the ongoing spread of COVID-19 [29] [30] [31] [32] [33] [34] [35] [36] . While variations at the individual level is often used to explain superspreading [37] , we focus here on the variability of environments and of temporal patterns [38] [39] [40] [41] [42] [43] [44] at the group level, which undoubtedly affect epidemics [31] , especially when a certain exposure within a where the number of people involved (size), the duration of the event, and the resulting proportion of infected individuals (attack rate) are all available (extracted from Refs. [15, 16] , see the supplemental material [17] ). (b)-(c) Framework for contagions on hypergraphs [18] , where the size m of the hyperedges (environment), the hyperdegree k of the nodes (individuals), and the participation time to the environment τ are all heterogeneous, distributed according toP(m),P(k), and P(τ) respectively. For the sake of simplicity, we assume the same distribution P(τ) for all environments. (b) At each time step t, an individual participates for a time τ (drawn independently) to each environment. (c) An individual gets infected with probability θ m (ρ) in the environment at time step t, which depends on the size m and the fraction infected ρ. certain time window is needed to confidently spark an infection. Interestingly, available case data highlight how there is no expected size or duration for such events. Transmission is highly context dependent on the settings (e.g., ventilation) and activity (e.g., singing or shouting) such that the resulting superspreading events are heterogeneous in size, duration and attack rate, as shown in Fig. 1(a) . Higher-order contact structures and heterogeneous temporal patterns are therefore key ingredients for more realistic models of spreading dynamics. Mathematically, we represent the contact structure as a hypergraph [45] [46] [47] where each environment is described by a hyperedge connecting m nodes (individuals) and where each node is incident to k hyperedges. All hyperedges of a same size m are considered equivalent, although this assumption is relaxed in the supplemental material to consider additional sources of heterogeneity [17] . To model heterogeneous temporal patterns, we consider a discrete-time process, where at each time step t = 1, 2, . . . , we draw for each individual a participation time τ ∈ [1, τ max ] for each environment to which they are connected [Fig 1(b) ]. The time steps correspond to fixed temporal windows of size τ max , during which susceptible individuals can get infected through their participation to environments. We first study the impact of the spatiotemporal co-location patterns on the infection kernel θ m (ρ), the probability of getting infected in an environment of size m when a fraction ρ of the other participants are infectious [ Fig. 1(c) ]. We then analyze the properties of the resulting contagion process. Universal infection kernel from heterogeneous exposure. Let us consider a susceptible individual participating to an environment of size m for a duration τ, where a fraction ρ of the other participants are infectious. During this exposure period, some of the other m − 1 individuals might participate to the environment as well. We assume that the considered individual receives an infective dose κ ∈ [0, ∞) from the infectious individuals, distributed according to π(κ; λ), where λ ≡ κ . A reasonable assumption is that the mean dose received is proportional to the time spent in the environment and to the proportion of infectious people, λ = β f (m)τρ, where β is a rate of dose accumulation and f (m), unitless, modulates the typical number of contacts in environments frequented by m individuals. While this is not a strict requirement for our results to hold (see supplemental material [17] ), we further assume that the random variable for the dose can be written as κ = λu, where u is a random variable that is independent of λ. In this case, u can be seen as an intrinsic property of the contagion processdetermined by rates of viral shedding, diffusion in the environment, variability of human interactions, etc.-while λ acts as a scale parameter, i.e., π(κ; λ) = π(κ/λ; 1)/λ ≡ π(κ/λ)/λ. To incorporate the concept of minimal infective dose, we assume that an individual develops the disease if κ > K, a perspective analogous to standard threshold models [48] [49] [50] and related to the assumption that successful host invasion necessitates multiple attempts by the pathogen [51] . The probability of getting infected in the environment is then The infection kernel θ m (ρ) is calculated by averaginḡ Π(K/λ) over P(τ). We focus here on the case of heterogeneous exposure periods modeled with a Pareto distribution P(τ) = C α τ −α−1 , where C α is a normalization constant, α > 0 and τ ∈ [1, τ max ]. However, for our dose mechanism to be well defined, we can only average over participation times τ ∈ [1, T ], where T ≤ τ max is the clearing window, i.e., the characteristic time for the immune system to get rid of any dose κ < K. If we assume that this clearing window is sufficiently large compared to τ c ≡ K/β f (m)ρ, the characteristic time to get infected in the environment, we can neglect events where τ ≥ T as they do not contribute significantly to the infection kernel [17] . We therefore redefine our support as τ ∈ [1, T ] and the infection kernel is When 1 τ c T and π(y) decreases faster than y −α−1 , then the integral on the right converges to a constant, the term in T −α can be neglected, andΠ(τ c ) τ −α c , which implies where D α is some constant. This form of infection kernel is universal, driven by temporal patterns, and does not depend on the value of K (given K > 0) or on the particular form of π. We illustrate it in Fig. 2 (a) using an exponential for the dose distribution-other cases such as the Weibull and the Fréchet distribution are presented in the supplemental material [17] . Let us stress that the condition 1 τ c T is not restrictive. On the time scale on which we measure the exposure periods, τ c 1 implies that the contagion does not spread too fast, otherwise the whole population would be rapidly infected, while T τ c suggests that it is transmissible before the immune system is able to clear the dose received. More broadly, we do not need to assume that π(y) decreases faster than y −α−1 , nor that λ is a scale parameter for κ. In fact, π(κ; λ) does not even need to be a continuous distribution and P(τ) could be asymptotically power law for large τ. In this more general context, we still recover a universal infection kernel θ m (ρ) ∝ ρ ν in most cases, but ν is not always directly equal to α [17] . Linear infection kernels (ν = 1) are recovered only in some specific cases, for instance when α = 1 or when we use a Poisson distribution for κ and K = 1. From now on, we make abstraction of the underlying distributions π(κ; λ) and P(τ), and focus on the resulting effective model parametrized by ν. Epidemic spreading with nonlinear infection kernel. We now illustrate the consequences of nonlinear infection kernels for contagion on hypergraphs. To simplify the mathematical analysis, we consider a Susceptible-Infectious-Susceptible model. At each time step, an infected node becomes susceptible with probability µ, while a susceptible node gets infected through a hyperedge of size m with probability θ m . If a node is part of multiple hyperedges, we assume that the probability of infection through each hyperedge are independent. This is reasonable if k max T ≤ τ max for a maximal hyperdegree k max . For instance, if T is of the order of a few hours, τ max a week, and an individual participate to an environment once a day, the night allows the immune system to clear any dose κ < K accumulated the day before. To obtain some analytical insights, we introduce a degreebased mean-field theory [14] , an approximation equivalent to consider an infinite size annealed hypergraph, where at each time step, the connections between nodes and hyperedges are shuffled. We validate our approach with Monte Carlo simulations on quenched hypergraphs in the supplemental material [17] . With the mean-field approximation, the marginal probability for a node to be infected at time t only depends on its hyperdegree, which we note ρ k (t). The global prevalence is then simply I(t) = k ρ k (t)P(k) and the evolution of the system is described by [14] where Θ k (ρ) = 1−[1−θ(ρ)] k is the probability for a susceptible node of hyperdegree k to get infected.ρ(t) is the probability that a node belonging in any hyperedge is infected andθ(ρ) is the probability for a susceptible node to get infected in any hyperedge, whereθ m (ρ) is the probability for a node to get infected in a hyperedge of size m. Because of the annealed structure,θ m (ρ) is just the average of θ m (ρ) with ρ = i/(m − 1) over a binomial distribution, with θ m (ρ) defined at Eq. (2). Figure 2 (b) shows a first consequence of the nonlinear kernel: the apparition of super-exponential growth for the global prevalence I(t) when ν > 1. Note that the growth is approximately exponential until a sufficiently high prevalence. For ν ≤ 1, we instead have a standard exponential growth until saturation is reached. In the steady state of the epidemic dynamics, we obtain a self-consistent solution since Θ k is a function ofρ. For contagions with a nonlinear infection kernel, the phase transition associated with the order parameter I * can be continuous or discontinuous with a bistable regime. Consequently, we define the invasion threshold β c such that for all β > β c , the absorbing state I * = 0 is unstable [dashed line in Fig. 2(c) ]. We also define the persistence threshold β p such that for all β < β p , the absorbing state I * = 0 is globally attractive [dotted line in Fig. 2(c) ]. For continuous phase transitions, β c and β p coincide, and is called the epidemic threshold; for a discontinuous phase transition, β p < β c , and for all (4-6) to evolve the system. (b) Supra-linear kernels ν > 1 lead to a super-exponential growth for the global prevalence I(t). We use β = 5 × 10 −4 , β = 0.025 and β = 0.077 for ν = 0.5, ν = 1 and ν = 1.5 respectively.τ is the median exposure period. (c) The phase diagram for stable solutions in the stationary state (t → ∞) can be continuous or discontinuous with a bistable regime. Sub-linear and linear kernels ν ≤ 1 lead to a continuous phase transition, and the invasion threshold β c vanishes for ν → 0. Supra-linear kernels ν > 1 can lead to a discontinuous phase transition with a bistable regime. β ∈ (β p , β c ), there exists typically three solutions, I * 1 = 0 and I * 2 , I * 3 > 0, with I * 1 and I * 3 locally stable. The invasion threshold β c can be found by imposing G (0) = 1, the persistence threshold β p is obtained by imposing bothρ * = G(ρ * ) and G (ρ * ) = 1 forρ * > 0, and any tricritical point can be found by imposing G (0) = 1 and G (0) = 0. In the supplemental material [17] , we obtain an exact selfconsistent expression for the invasion threshold, and using again an asymptotic approximation, we find It depends in a intricate manner on both the moments ofP(m) andP(k) and the nonlinearity of the infection kernel. As illustrated in Fig. 2 (c), the invasion threshold can become very small for ν < 1 (vanishing for ν → 0), even for homogeneous P(k) andP(m). Note that to obtain a sub-linear kernel ν < 1, it typically requires a distribution P(τ) ∝ τ −α−1 with α < 1, which implies that the mean participation time τ diverges. In the supplemental material [17] , we show that if instead we fix τ while varying α, there exists an optimal temporal heterogeneity α * that minimizes the invasion threshold β c , and maximizes early spread. The minimal kernel exponent ν c leading to a discontinuous phase transition is given by a tricritical point [17] . Although exact solutions require numerical evaluation, we get three insights from an asymptotic expansion. (i) ν > 1 is necessary in order to have a discontinuous phase transition, but it is not sufficient: ν c depends on the first three moments ofP(k), and in a more complicated manner on the distributionP(m). (ii) It is necessary to have environments of size m > 2 to have a discontinuous phase transition. (iii) A more heterogeneous P(k) leads to a larger ν c . Similar observations were made in Ref. [52] for m ≤ 3. Conclusion. Our framework captures many properties usually overlooked for the sake of simplicity in epidemic models: the higher-order structure of contacts, the temporal heterogeneity of human activity and thresholding effects over the exposure due to the host immune system. In the supplemental material [17] , we also demonstrate that our results are robust to variations in individual infectiousness or local transmission in different environments. In particular, we recover a universal nonlinear infection kernel that provides a connection between complex contagions based on nonlinear infection kernels [53] and threshold models [48] [49] [50] . Our results challenge a key assumption of most epidemic models and ask: Why assume a linear relationship between the number of infectious contacts and the risk of infection? This question is critical since three of the basic insights gathered from epidemic models break down under nonlinear infection kernels: they can lead to a discontinuous relationship between disease transmission and epidemic size, to a bistable regime where macroscopic outbreak and disease-free state coexist, and to a super-exponential growth. While the first two are difficult to assess for real contagions, super-exponential spread has been observed for influenza-like illness [54] . Even though we considered the SIS model to simplify the analysis, the universal infection kernel θ m (ρ) ∝ ρ ν could be directly integrated in more realistic models-such as SEIR or SIRS-where the same phenomenology typically carries over. The phenomenology being drastically different from standard epidemiological models begs the following question: Why do linear models work? Even for a nonlinear kernel θ m (ρ), the probability of infectionθ m (ρ) (averaged over hyperedge configurations) is linear inρ ifρ 1 [see Eq. (6)]. Therefore, linear models are a good approximation when the prevalence is sufficiently low, but breaks down at higher prevalence, as clearly illustrated in Fig. 2 The mathematical framework we use to solve the SIS model hinges on a mean-field or annealed approximation, as in other studies [52, [55] [56] [57] , thereby suppressing dynamical correlations within hyperedges. As we show in the supplemental material [17] , dynamical correlations can be captured using approximate master equations [58] [59] [60] [61] [62] [63] [64] , which are more complicated but provide similar results with better agreement to simulations. Future works could investigate more thoroughly the interplay between dynamical correlations, nonlinear kernels, and spatiotemporal heterogeneity. Altogether, our conclusions stress the need to embrace heterogeneity in disease modeling; in the infection dynamics itself, in patterns of temporal activity, and in the higher-order structure of contact networks. Epidemics should be seen as the result of a collective process, where higher-order structure and temporal patterns can drive complex dynamics. The authors thank Dr. Sean Diehl for help with advanced immunological concepts. This work was supported by the In the following sections, we make use of the following notations for asymptotic analysis: • f ∼ g stands for f is asymptotically equal to g The last two would be the equivalent to the small O and the small Omega in the Bachmann-Landau notation. When f is asymptotically proportional to g, we use f ∝ g, and it should be clear from the context if it is only true in some regime. As in the main text, let us assume that a susceptible individual participates to an environment of size m, with a fraction ρ of the m − 1 individual being infected, for a duration τ. During this period, the susceptible individual receives an infective dose κ of distribution π(κ; λ), where λ ≡ κ is (S1) The susceptible individual develops the infection if κ ≥ K, hence with probabilitȳ Note that the distribution could be discrete or continuous. The infection kernel is then obtained by averaging over the participation time, for some domain D τ . In this paper we focus on heterogeneous exposure modeled by P As mentioned in the main text,Π(K; λ) is ill-defined for τ ≥ T , where T is the clearing window for the immune system, but fortunately, we will be able to neglect these events. In fact, we always have thatΠ(K; λ) ≤ 1, hence the remainder associated to participation times τ ≥ T is upper-bounded As it will become clear in this section, terms O(T −α ) have a negligible contribution to the infection kernel. For mathematical convenience, we thus redefine our support as τ ∈ [1, T ] and the infection kernel as where, as in the main text, τ c ≡ K/β f (m)ρ is the characteristic time for an individual to become infected. Integrating by parts, we have Making the change of variable x = τ/τ c , we obtain As we will make clear with a few cases, in the regime where 1 τ c T , depending on the asymptotic properties of each term in Eq. (S7), many distributions π(κ; λ) map to the universal nonlinear infection kernel for some proportionality constant D α . An important subclass is all distributions π(κ; λ) where λ is a scale parameter, i.e., π(κ; λ) = 1 In this case, Eq. (S7) becomes Making the change of variable y = 1/x, and recall thatΠ is the complementary cumulative distribution, we obtain In the regime 1 τ c T , the integral on the right is bounded from below by some constant, or in asymptotic notation it is Ω(1), hence the last term is Ω(τ −α c ), whereas the second term is O(T −α ) so we can neglect it. This also justifies neglecting participation times τ ≥ T since R(T ) = O(T −α ). The infection kernel becomes It is now clear that for all distributions π(y) that decrease faster than τ −α−1 c in the tail, the integral converges to a constant D α andΠ(τ c ) τ −α c , therefore, we recover the infection kernel ∝ ρ α . (S13) If instead π(y) decreases slower than y −α−1 , let's say as y −ξ−1 for large y, where ξ < α, then both terms are asymptotically proportional to τ −ξ c , where E α is another constant prefactor. If ξ = α, we would have θ m (ρ) ∝ ρ α ln (1/ρ) . (S16) So except for this limit case, we always have with ν = min(α, ξ). This is illustrated in Fig. S1 with the Weibull and the Fréchet (inverse Weibull) distributions. When λ is a scale parameter, the properties of the dose distribution are constrained, for instance the fact that κ = λ, this implies that Var(κ) = κ 2 − κ 2 ∝ λ 2 . Here we investigate other scenarios. Perhaps the simplest case is to consider π(κ; λ) = δ(κ−λ), i.e., the dose is no longer a random variable, it is strictly proportional to the time spent in the environment and to the number of infected people. In this case, Var(κ) = 0. We directly havē where H(· · · ) is the Heaviside function, and therefore We thus recover the same result as before under the less restrictive condition 1 ≤ τ c T . This suggests that if π(κ; λ) is sufficiently concentrated around its mean, our result should hold. Another case study we consider is when Var(κ) ∝ λ 2−z , where z ∈ [0, 1]. We achieve this using the gamma distribution π(κ; σ, ψ) = κ ψ−1 e −κ/σ Γ(ψ)σ ψ . Since the mean of the distribution is κ = ψσ and the variance is Var(κ) = ψσ 2 , we can set σ = λ 1−z and ψ = λ z , in which case we respect κ = λ and have Var(κ) = λ 2−z . We are thus able to interpolate between two cases: λ being a scale parameter (z = 0), for which we know our result hold, and where the mean and the variance are the same (z = 1). We do not provide an analytical proof, but we show in Fig. S2 that the kernel θ m ∝ ρ α appears to hold for all z < 1, whereas for z = 1, we have a kernel θ m ∝ ρ ν where ν = min(α, 1). These again belong to the general class of universal power-law infection kernels. A last case we consider is when π(κ; λ) is discrete, i.e., κ ∈ N 0 . Let us consider more specifically the Poisson distribution This distribution can be justified from a mechanistic perspective: let us assume that individuals from a hyperedge interact at a constant rate β f (m). During the period of duration τ, a susceptible individual has on average λ ≡ β f (m)ρτ interactions with infected individuals (doses), and the distribution for κ is the Poisson distribution above. Without loss of generality, we assume that K ∈ N is a small positive integer. It will be more convenient to work with τ c ≡ 1/λ f (m)β = τ c /K, the characteristic time to receive a unit dose. We rewrite Eq. (S7) as We have that the complementary cumulative distribution is and For the sake of simplicity, let us consider K = 2, which implies and Inserting these in Eq. (S25), we obtain The integral converges to Γ(2 − α) if 1 τ c T and α < 2. If α = 2, the integral diverges as ln(τ c ) and for α > 2, it diverges asτ α−2 c . Consequently, we have Ignoring the limit case α = 2, we have θ m (ρ) ∝ ρ ν where ν = min(α, 2). In the general case K 2, similarly we can show that ν = min(α, K). Our previous results assume that P(τ) is a pure power-law distribution. However, we can relax this assumption by assuming that P(τ) ∝ τ −α for τ > τ * . In this case, it is straightforward to generalize our results by splitting the domain [1, T ] in two parts, where C α is again a normalization factor. Note that the first term can be upper-bounded, Integrating by parts the second term, we obtain Therefore, noting the similarity with Eq. (S7), all our previous results hold in the more restrictive regime τ * τ c T . Instead of having a piecewise distribution P(τ), many real distributions are asymptotically power law, i.e., P(τ) ∝ τ −α−1 for sufficiently large τ. One just have to pick a τ * large enough such that P(τ) is a sufficiently good approximation of a power law for τ > τ * . For the asymptotic developments above, we assume that 1 τ c T . In Fig. S3 , we illustrate what happens when this is not respected. For the sake of simplicity, let us say that the conditions are broken if τ c ≤ 10 or T /τ c ≤ 10. Let us recall that τ c = 1/β f (m)ρ. In Fig. S3(a) , we pick β and f (m) such that τ c ≤ 10 for ρ ≥ 0.1 (on the right of the vertical dashed line), in which case the condition τ c 1 is not respected. We see that the scaling ρ α is lost in this regime, but is still valid for ρ 0.1. In Fig. S3(b) , we pick T , β and f (m) such that T /τ c ≤ 10 for ρ ≤ 0.1 (on the left of the vertical dashed line), in which case the condition T τ c is not respected. We see that the scaling ρ α is lost in this regime, but is still valid for ρ 0.1. In summary, the condition T τ c , for which we can choose a specific tolerance such as T /τ c ≥ 10, gives a lower bound ρ 1 for the validity of θ m (ρ) ∝ ρ α , while the condition τ c 1, or as we chose τ c ≥ 10, gives an upper bound ρ 2 . We can therefore assess that the infection kernel is valid on some domain [ρ 1 , ρ 2 ]. The results put forward in this paper highlight the role of the burstiness of interactions for the emergence of nonlinear infection kernels. The heterogeneity of degree k r and environment sizes m γ is also incorporated in the modeling approach we use in the main text. In this section, we show that adding other layers of heterogeneity do not alter the form of the universal kernel θ m (ρ) ∝ ρ ν . We demonstrate it in the case where λ is a scaling parameter for the sake of simplicity. The attack rate in an environment is certainly affected by the conditions of the environment itself: for instance, non-ventilated places might be more prone to infections than better ventilated ones. Let us assume that β ∈ [0, ∞) is an i.i.d random variable for each environment, drawn from an arbitrary distribution Q(β). If Q(β) is a power law, it plays a similar role as P(τ), since β and τ appear side by side in the definition of the mean dose κ = λ. With τ and β both distributed as power laws, the distribution R(x) for x = τβ is a mixture of power-law distributions [1] . The tail of R(x) is then dominated by the slowest decaying distribution, either Q(β) or P(τ). In all cases, we could extend the analysis above using R(x) instead P(τ), and we would recover that θ m (ρ) ∝ ρ ν with ν being the exponent of the distribution that decreases the slowest (see Fig. S4 lower row) . We therefore assume that Q(β) decreases faster than a power law, i.e., Q(β) β −n ∀n ∈ N in the limit β → ∞. Recall that τ −1 c ≡ f (m)βρ, hence τ c = τ c (β). We can rewrite Eq. (S10) as Let now us suppose that we can divide the support for β in three subintervals, In the middle subinterval, we assume that the usual condition for asymptotic expansion holds, 1 τ c (β) T for all β ∈ B 2 . In the other subintervals, this condition is not respected for all β, i.e., there are values for β ∈ B 1 such that τ c (β) ≈ T or even τ c (β) T and values for β ∈ B 2 such that τ c (β) ≈ 1 or even τ c (β) 1. We investigate the asymptotic behavior of θ In the subinterval B 2 , we can replace the inside of the integral on β with the usual asymptotic expansion, leading to In the subinterval B 1 , note that smaller values of τ c increases the value of the integrand on β-by definition, λ ∝ τ −1 c , hence the mean dose received is larger for smaller τ c . Therefore, we can bound from above θ B 1 m by using for all β the smallest value possible on the subinterval, τ c (β) → τ c (β 1 ) T . In this case, the asymptotic expansion we used for B 2 is again possible, which implies in which case we can neglect the contribution from θ B 1 m (ρ) Finally, in the subinterval B 3 , we again use the fact that smaller values of τ c increases the value of the integrand on β. In the limit β → ∞, τ c → 0, and we are left with the upper bound For the universal kernel θ m (ρ) ∝ ρ α to hold, we must have A sufficient condition is if More intuitively, conditions at Eqs (S43) and (S46) indicate that the probability for β ∈ B 1 and β ∈ B 3 must be sufficiently small, i.e., environments of low or intense spreading are rare compared with environments where the spreading is moderate, satisfying 1 τ c (β) T . Infection kernels obtained with different Q(β) are presented in Fig. S4 , where we once again recover the form θ m (ρ) ∝ ρ ν . Similarly, all infectious individuals are not equivalent. For instance, the viral load within an infectious individual depends on the time since infection. Or more simply, people's behavior varies widely, some follow diligently public health recommendations (hand-washing, social distancing, etc.) and some do not. This motivates assigning i.i.d. random variable β j with distributioñ Q(β j ) to characterize the infectiousness of each infected individual j ∈ {1, . . . , i}, where i ≡ ρ(m − 1). A susceptible individual then receives a dose κ j with distributionπ(κ j ; λ j ) from each infectious individual j, where κ j = λ j = β j f (m)τρ. However, it is straightforward to verify that adding this additional source of heterogeneity does not affect our results either. If we define β = i j=1 β j , with distribution Q(β), and λ = i j=1 λ j , then κ = i j=1 κ j has a distribution π(κ; λ) corresponding to Therefore, this is totally equivalent to the case of a heterogeneous environment, with the infection kernel θ m (ρ) defined by Eq. (S38). In the main text, it is assumed that a node of hyperdegree k interacts in k different environments at each time step. More realistically, we can consider that an individual participate to a hyperedge with probability p, hence the effective degree of a node for a time window is drawn from a binomial distribution with parameters k and p. This can be incorporated in our approach simply by using The effect is simply to map θ m (ρ) → pθ m (ρ), leaving its scaling with ρ unaltered. Therefore, the whole phenomenology is preserved. For a specific environment, we suggest in the main text that the participation time τ j of each individual j is i.i.d., with distribution P(τ). However, our result about the infection kernel does not require the independence of the participation time. For instance, the participation time of all individuals belonging to an environment could be perfectly correlated, i.e., τ j = τ for each individual j belonging to the environment. This could be more representative of certain scenarios, for example, of certain workplaces where people have the exact same schedule. More generally, we expect some varying level of correlation for the participation time to different types of environments. The point is that correlations would be reflected on the form of the dose distribution π(κ; λ) and its mean-a more correlated participation time would be associated with a larger rate of dose accumulation β since the individuals spend more time altogether in the environment. Since our result does not depend on the form of π(κ; λ), if the condition 1 τ c T is respected, so should be our result about the infection kernel θ m (ρ) ∝ ρ ν . As discussed in the main text, the phase transition is characterized by G(ρ * ). In this section, we derive self-consistent equations for both the invasion threshold β c and any tricritical point (β c , ν c ), where ν c is a limit value for the nonlinear kernel exponent separating a continuous and a discontinuous phase transition. All time-varying quantities are assumed to have reached the steady state-to simplify the notation, we drop the asterisk. First, let us rewrite G(ρ) [Eq. (7) in the main text] and its derivatives as In the limitρ → 0, we haveθ → 0, which implies Θ k → 0 for all k and Therefore, we have Second,θ(ρ) is rewritten asθ hence it depends onρ only throughθ m (ρ). Finally, it is straightforward to obtain the derivatives of Eq. (6) in the main text forρ = 0, i.e., Combining Eqs. (S49), (S50) and (S51), we have self-consistent equations to solve for both the invasion threshold and the tricritical points. We can identify the invasion threshold β c for a fixed structure given byP(k) andP(m), and for an arbitrary distribution P(τ). Combining Eqs. (S49a), (S50) and (S51a), the constraint G (0) = 1 becomes The left-hand side is intuitively interpreted as the average number of new infections caused by an infected node in a single time step near the absorbing state. Let us assume that the conditions for the universal infection kernel are respected. Consequently, we can use We keep the dependence on β and f (m) to see the impact of these parameters, whereas D α is just some constant that we cannot tune like the others. Consequently In the case f (m) = 1, we see that having ν < 1 induces a very small value for the invasion threshold. For a fixed structure given byP(k) andP(m), the tricritical points (β c , ν c ) are solutions to both G (0) = 1 and G (0) = 0. We substituteθ (0) = µ/ k in (S49b) to obtain Since the second term is strictly positive, it is necessary thatθ (0) > 0 to have a tricritical point. Using again the asymptotic expansion for θ m (ρ) in combination with Eq. (S50) and (S51b), we get The tricritical point is evaluated at β = β c , hence we can substitute the asymptotic expansion for the invasion threshold [Eq. (S55)] to have an expression that only depends on ν, We get three insights from Eqs. (S56) and (S58). The invasion threshold β c is minimized for a value α * ≈ 1.15. β c goes to infinity in the limit α → 1, and converges to a certain value in the limit α → ∞. 1. It is necessary that ν > 1 to have a tricritical point, but not sufficient: it depends on the first three moments ofP(k), and in a more complicated manner on the distributionP(m). 2. It is necessary to haveP(m) > 0 for at least one value m > 2, i.e., environments of size m = 2 cannot lead to a discontinuous phase transition. 3. A more heterogeneousP(k) will typically require a larger ν to reach a tricritical point. Indeed, if we keep k fixed, but increase the value for the second and third moments (using a broader distribution for instance), the negative term on the right in Eq. (S56) increases, while the positive term is invariant. In the main text, we fixed the minimal participation time to τ min = 1, thereby fixing the scale of events relative to this minimal period, but we let τ vary with α. Here we show the consequence of fixing τ = 1, and to let τ min vary. For the sake of simplicity, we assume T → ∞. For a power-law distribution P(τ) = C α τ −α−1 , this is achieved if we set τ min = (α − 1)/α. Note that it is only possible for α > 1, i.e., when the mean is well-defined. As shown in Fig. S5 , keeping a fixed τ reveals another story for the bifurcation diagram. For a sufficiently large β, we now have that I * grows with α-a homogeneous participation time lead to bigger outbreaks-, which is the opposite of what we found for a fixed τ min . Moreover, the invasion threshold is now a non-monotonic function of α. It is minimized for some value α * ≈ 1.2, and the early spread of an epidemic would be maximized for this α * . In the main text, we supposed an annealed structure-nodes and hyperedges are shuffled at each time step. This is why we can assume that the number of infected individuals in a hyperedge of size m is distributed according to a binomial distribution When calculatingθ m (ρ) at Eq. (6), we restrict ourselves to hyperedges with at least one susceptible node, hence we average θ m [i/(m − 1)] over the distribution which is again a binomial, but over the m − 1 remaining nodes. In reality, people do engage with the same people in a recurrent manner, which can be represented by a quenched structure, i.e., an individual always participate to the same hyperedges. In this case, we can track the evolution of H m,i (t) using approximate master equations (AMEs) of the form In Eq. (S61), l counts the number of recoveries from t to t + 1 and j counts the number of infections. is the probability that a susceptible node belonging to a hyperedge (m,i) becomes infected. We have defined which is the probability of getting infected by the hyperedge (m,i), while R is the average probability for a susceptible node to get infected through any other hyperedge to which it belongs. It corresponds to The term in curly brace {· · · } is the probability for a susceptible node of hyperdegree k to get infected in any of the other k − 1 hyperedges to which it belongs. We average all this over a distribution ∝ kρ k (t)P(k), which is the probability for a random susceptible node in a hyperedge to be of hyperdegree k. Finally, the probabilityθ is now calculated as . Combining these equations with Eq. (4) in the main text, we have a closed system. Note that this new set of equations still depends on the infection kernel θ m,i , hence our main result is directly transposed to this case where dynamical correlations are taken into account. In fact, the derivation of the infection kernel is agnostic to the underlying mathematical framework used to model the contagion. It is also worth pointing out that we do not capture exactly all the local dynamical correlations with Eq. (S61): although we track H m,i (t), we assume that each susceptible node gets infected with independent probability φ m,i . However, for µ 1 and φ m,i 1, at most one node either recovers or becomes infected at the same time-in other words, we have asynchronous state transitions. In this limit (effectively taking the continuous time limit), we capture exactly the dynamical correlations within hyperedges. Therefore, Eq. (S61) can be seen as a good approximation for small µ and φ m,i . In Fig. S6 , we show that the phenomenology of the contagion process using AMEs is very similar, almost indistinguishable from the phenomenology of the annealed system [see Fig. 2 (b) and 2(c) in the main text]. Linear and sublinear kernel (ν ≤ 1) still yield a standard exponential spread with a continuous phase transition, with a very small invasion threshold β c for small ν. Supralinear kernel (ν > 1) still lead to super-exponential spread with a discontinuous phase transition. It worth mentioning that even though the results presented here suggest that dynamical correlations within hyperedges have a negligible impact on the contagion process, this does not hold for all scenarios (see Fig. S7 ). Moreover, in Ref. [2, 3] , it was shown that these dynamical correlations are crucial for the onset of mesoscopic localization, which does significantly alter the phenomenology of the system. We expect dynamical correlations to have a larger impact for heterogeneous hyperedge size distributionP(m) and more nonlinear contagions (larger ν). To validate both the annealed equations and the AMEs, we performed simulations of the contagion process on random hypergraphs. We generate random hypergraphs with fixed hyperdegree and hyperedge size distributionsP(k) andP(m). First, we draw a hyperdegree sequence k = [k 1 , k 2 , . . . , k N ] fromP(k), where the number of nodes N is initially fixed. Second, we draw a hyperedge size sequence m = [m 1 , m 2 , . . . , m N ] fromP(m), where the number of hyperedges N is not fixed. We want to have N j=1 k j = N j=1 m j , therefore we add hyperedges one at a time until N j=1 k j ≤ N j=1 m j . If N j=1 k j < N j=1 m j , we remove hyperedges randomly one at a time until N j=1 k j ≥ N j=1 m j . We alternate between these two processes until the equality is reached, which also determines N . Third, we assign the nodes to the hyperedges, best interpreted as a bipartite graph realization. We create a stub list (list of half edges) for each sequence. For instance, for node j, we include k j copies of the label j in the stub list of nodes. Similarly, we include m j copies of hyperedge j in the stub list of hyperedges. These two stub lists are of the same size by definition, hence we shuffle both and match the entries of the lists, thereby assigning nodes to hyperedges at random. However, this process does not avoid assigning a node j multiple times to a same hyperedge if k j > 1. To remove these multiple assignments, we swap the concerned entries of the node stub list with another chosen at random, making sure it does not create another hyperedge with multiple assignments. In the bipartite representation, this move is equivalent to a double edge swap [4, 5] . We then perform a number of additional random double edge swaps (N in our simulations) to ensure the uniformity of the generating process. Let us denote a node j and the set of hyperedges γ to which it is assigned as ∂ j. The parameters α, T, β, f (m) and K are fixed. To simulate the contagion process, we start with an initial set of infected nodes I(0) and a set of susceptible nodes S(0) at time t = 0. At each time step t → t + 1, we perform the following operations. . Comparison with simulations for the temporal evolution and the phase diagram. We use an exponential dose distribution π(κ; λ) ∝ e −κ/λ , a power-law distribution of participation time P(τ) ∝ τ −α−1 with α = 2, a clearing window T → ∞, and f (m) = 1. We set the recovery probability to µ = 0.05. We use Poisson distributions for bothP(k) andP(m), with k = 5 and m = 10. For the simulations, we use random hypergraphs with N = 10 4 nodes. We use Eqs. (4)- (6) in the main text for the annealed solution and Eqs. (S61)-(S65) for the AMEs. (a) Temporal evolution. We performed 200 simulations each on a different hypergraph realization starting with a fraction I(0) = 10 −2 of infected nodes. The semitransparent solid black lines show the prevalence I(t) for each realization. The AME solution is inside the region of high density of paths obtained from the simulations. (b) Phase diagram. For the simulations, we sampled the quasistationary state of the contagion process to estimate the stationary prevalence I * . We performed measures on 10 hypergraph realizations. The semitransparent black circles correspond to the average prevalence and the vertical bars (too small to see) correspond to the standard deviation. Because we sample the quasistationary state, only the upper branch of the hysteresis loop is obtained. Both the annealed and AME solutions fit well with the simulations. a randomly chosen state in the history with the current state of the system. At any time during the simulation, if the absorbing state is reached, we replace the current state with one chosen randomly from the history. To estimate quantities in the stationary state, we first let the system thermalize by performing 10 3 time steps. During this period, we use a short decorrelation period t d = 1 to make sure the whole history has been updated since the initial state. Then, we perform an additional 10 3 time steps during which we measure the system at regular intervals t d = 10. In Fig. S7(a) , we present multiple realization of the stochastic process (semitransparent solid black lines) for the temporal evolution of the prevalence I(t). The darker region corresponds to a high density of paths for I(t). Interestingly, while both analytical approaches have a similar qualitative temporal evolution, AMEs (solid blue line) best capture the temporal evolution, with their prediction being in the region of high density of paths obtained from the simulations. Therefore, dynamical correlations do matter here. Moreover, since α = 2, we have a supralinear infection kernel (ν = 2) leading to superexponential spread, a property validated by the simulations. In Fig. S7(b) , we compare the phase diagram obtained with our analytical approaches versus the simulations. In this case, both the annealed and the AMEs correctly predict the average prevalence in the stationary state-the prediction from AMEs are only slightly superior here. The simulations also confirm the discontinuous phase transition predicted by our analytical approaches. Using "outbreak science" to strengthen the use of models during epidemics Why Model? Results from the second year of a collaborative effort to forecast influenza seasons in the United States Influence of nonlinear incidence rates upon the behavior of SIRS epidemiological models Some epidemiological models with nonlinear incidence How Behavior Spreads: The Science of Complex Contagions The Spread of Behavior in an Online Social Network Experiment Global Behavior of an Age-Structured Epidemic Model Endemic Models with Arbitrarily Distributed Periods of Infection I: Fundamental Properties of the Model Van Ark, Epidemiological models for heterogeneous populations: Proportionate mixing, parameter estimation, and immunization programs Modeling heterogeneous mixing in infectious disease dynamics Epidemic Spreading in Scale-Free Networks Epidemic processes in complex networks SARS-CoV-2 superspreading events from around the world What settings have been linked to sars-cov-2 transmission clusters? See Supplemental Material for a broader derivation of the infection kernel, the consideration of other sources of heterogeneity, the characterization of the phase transition, and a treatment of dynamical correlations, which includes Refs Icons made by Freepik from www.flaticon.com Pathogenesis of tuberculosis: Pathway to apical localization Syndromes of enteric infection Inactivation of influenza A viruses in the environment and modes of transmission: A critical review Immune subversion and quorum-sensing shape the variation in infectious dose among bacterial pathogens Common themes in microbial pathogenicity revisited Bacterial strategies for overcoming host innate and adaptive immune responses Patterns of antigenic diversity and the mechanisms that maintain them Evolution of intracellular pathogens Mathematical modeling of interaction between innate and adaptive immune responses in COVID-19 and implications for viral pathogenesis A review of mathematical models of influenza A infections within a host or cell culture: lessons learned and challenges ahead Secondary attack rate and superspreading events for sars-cov-2 Evidence that coronavirus superspreading is fat-tailed Superspreading events in the transmission dynamics of SARS-CoV-2: Opportunities for interventions and control Estimating the overdispersion in COVID-19 transmission using outbreak sizes outside china Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in shenzhen, china: a retrospective cohort study Full genome viral sequences inform patterns of SARS-CoV-2 spread into and within israel Characterizing superspreading events and agespecific infectiousness of SARS-CoV-2 transmission in Georgia, USA Covid-19 superspreading suggests mitigation by social network modulation Superspreading and the effect of individual variation on disease emergence Bursty Human Dynamics Universal features of correlated bursty behaviour Temporal networks Dynamics of person-to-person interactions from distributed rfid sensor networks Dynamical and bursty interactions in social networks Social network dynamics of face-to-face interactions Temporal properties of higher-order interactions in social networks, arXiv Hypergraphs: Combinatorics of Finite Sets Networks beyond pairwise interactions: Structure and dynamics The why, how, and when of representations for complex systems Threshold Models of Collective Behavior A simple model of global cascades on random networks Universal Behavior in a Generalized Model of Contagion A mechanistic underpinning for sigmoid dosedependent infection The effect of heterogeneity on hypergraph contagion models Dynamical behavior of epidemiological models with nonlinear incidence rates The effect of a prudent adaptive behaviour on disease transmission Simplicial models of social contagion Simplicial SIS model in scalefree uniform hypergraph Social contagion models on hypergraphs Propagation dynamics on networks featuring complex topologies Adaptive networks: Coevolution of disease and topology High-accuracy approximation of binary-state dynamics on networks Willeboordse, Effective degree network disease models Gleeson, Mathematical modeling of complex contagion on clustered networks Social confinement and mesoscopic localization of epidemics on networks Master equation analysis of mesoscopic localization in contagion dynamics on higher-order networks Exact distribution of the product of m gamma and n Pareto random variables Configuring random graph models with fixed degree sequences Configuration models of random hypergraphs How to simulate the quasistationary state Superspreading events data from We create a set of new infectedĨ(t + 1). For all j ∈ S(t), we iterate over hyperedges γ ∈ ∂ j. For each hyperedge γ we: (i) draw a participation time τ γ ii) given τ γ , the hyperedge size m γ , and the fraction of the other nodes that are infected ρ γ , we draw a dose κ γ from π(κ; λ γ ), where λ γ = β f (m)τ γ ρ γ S (t + 1) \Ĩ(t + 1). the previous algorithm is sufficient to simulate the time evolution of the contagion process, estimating quantities in the stationary state can be problematic for finite size systems because the contagion process can get stuck in the absorbing state where all nodes are susceptible. We therefore use the quasistationary-state method [6] to avoid this problem. We keep a history of past states (I, S)-50 states in our case-which we update after each decorrelation period t d . To update the history, we replace Master equation analysis of mesoscopic localization in contagion dynamics on higher-order networks Configuring random graph models with fixed degree sequences Configuration models of random hypergraphs How to simulate the quasistationary state SARS-CoV-2 superspreading events from around the world What settings have been linked to sars-cov-2 transmission clusters? Superspreading events data from We provide Python modules to solve for the infection kernel, the invasion threshold β c , the time evolution of the system, both for the annealed and quenched systems (AMEs), and to perform simulations of the contagion process. The code is available here [7].In the main text, Fig. 1(a) was created using data from Refs. [8, 9]. They provide collections of COVID-19 superspreading events that were either reported in news outlet, on government websites, or in research papers.We kept those where the attack rate, the duration of the event, and the size of the event (number of people attending) were either reported or could be calculated. The curated data is available here [10].