key: cord-0481931-v896hrm3 authors: Cotton, Peter title: Addressing the Herd Immunity Paradox Using Symmetry, Convexity Adjustments and Bond Prices date: 2020-06-12 journal: nan DOI: nan sha: 53fd57e3099d3ab639117b9b4c96f6966dd15bdd doc_id: 481931 cord_uid: v896hrm3 In constant parameter compartmental models an early onset of herd immunity is at odds with estimates of R values from early stage growth. This paper utilizes a result from the theory of interest rate modeling, namely a bond pricing formula of Vasicek, and an approach inspired by a foundational result in statistics, de Finetti's Theorem, to show how the modeling discrepancy can be explained. Moreover the difference between predictions of classic constant parameter epidemiological models and those with variation and stochastic evolution can be reduced to simple"convexity"formulas. A novel feature of this approach is that we do not attempt to locate a true model but only a model that is equivalent after permutations. Convexity adjustments can also be used for cross sectional comparisons and we derive easy to use rules of thumb for estimating threshold infection level in one region given knowledge of threshold infection in another. We take an pragmatic approach to modeling epidemics subject to variation and random evolution. We show that modeled quantities can be computed as corrections to those that would be computed in deterministic homogeneous counterparts. We appeal to de Finetti's Theorem to justify the surprising generality of this strategy, despite the fact that the models we consider are simple mixtures of models -each one of which might not, in isolation, be considered plausible at the fine scale of neighbourhoods. Our ansatz is a collection of non-interacting regions, termed cities for the sake of exposition. Within each city we imagine there are strongly interacting sub-populations called neighborhoods. 1 We assume each city shares the same stochastic generative model. The path taken by each city's epidemic will be different, but we assume that mean quantities for a stochastic model for a city can be viewed as equivalent to an aggregate model for many cities -as if all paths are played out at once. We consider model observables, or functionals, that are expectations with respect to the model for one city. We focus in particular on two mean quantities: • The overall (mean) early stage growth rate • The mean across cities of the number of susceptible people at peak infectionwhere peak infection time is defined city by city. The second quantity is simpler to compute than a contemporaneous measure of peak infection, due to peak infection occurring at different times in different cities. Our approach is to consider only functionals, such as those listed above, that are symmetric with respect to interchange of neighborhoods. We consider two models providing the same mean observable quantities to be equivalent, and to belong to the same equivalence class (orbit). For example, here are three equivalent expressions for the expected number of infections after three days. We will describe this chain of equalities with slightly more formality. For now, it suggests that we may be well served by characterizing the orbit generated by a symmetry group which passes through nature's true model -perhaps by finding a representative of each orbit. To that end, we attempt to span the space of equivalence classes by considering only exchangeable models (there is precisely one on each orbit). Then, within that class, we only consider models that are mixtures of even simpler models. The simpler models assume neighborhoods are independent and identically distributed (iid). It may be clear to the statistical reader that this approach is inspired by de Finetti's Theorem and its variants for finite collections of variables. For concreteness, and risking the ire of de Finetti, we shall further narrow things down by choosing a specific type of iid model. Our choice is a stochastic variant of the SIR compartmental model in which infections rates follow an Ornstein-Uhlenbeck process. This choice is well motivated both by empirical epidemiological literature and a physical model for repeat contacts. It also takes advantage of the connection to short rate models for interest rates, and we use the Vasicek bond pricing formula with negative interest rates to compute mean early stage growth. In a similar fashion we can estimate mean peak infection using properties of the growth process. Moreover we can relate these quantities to their classic deterministic equivalents by multiplicative corrections that we call convexity adjustments. For example a crude but useful rule of thumb to account for stochastic evolution of infection rate is susceptible multiplier = 1 + relative variation R where the relative variation is the standard deviation of infection rate divided by mean infection rate. We advocate the use of convexity adjustments as a means of rapidly locating the orbit of nature's true sub-population model -by which we mean the orbit induced by permutations of neighborhoods. This paper assumes variation in infection rate is plausible, but only indirectly adds to the evidence for this insofar as it may explain peak infection rates. It is widely believed that infection rate (β as we shall refer to it) exhibits variation across various dimensions because it commingles a variety of factors driving the number of people who will be infected in unit time. Some are biological. Others are behavioral. Infection rate as defined in a compartmental model may be influenced by behavior and patterns of socializing that differ from place to place [7] . At the individual level age [29] dyspnea [26] cardiovascular disease [4] , pregnancy [28] , [30] and other factors may influence outcome and transmission likelihood. If airborne transmission [19] is a key factor, or if supply of ultraviolet light [14] is important, these factors will play a bigger role in some parts of a city than others -even in one apartment versus another. Humidity [24] is considered important. Variation in transmission rates may be driven by culture. A reasonable hypothesis is that household size and mixing of generations therein may be an important dynamic. Economic variation may be a driver also. Some may be better placed to afford protective measures, or voluntarily reduce hazardous work. A recent study of SARS CoV-2 in Iran suggests that wind speed may reduce infection in addition to other climatological factors [2] such as temperature, which is said to drive down infection rates by three percent for every one degree increase [27] . Another intriguing possibility is that existing vaccinations may be associated with dramatically lower incidence [13] . If true, those with the vaccinations and those without will constitute two very different groups with two very different turning points. Population density is another prime culprit because of its inherent plausibility and dramatic variation. We will devote additional time to population density in Section 3.4.1. Figure 1 : Outline of the strategy adopted in this paper. When only symmetric functionals of a model are required, we are motivated to consider representatives from orbits of the symmetric group only, rather than the full model space. Simple mixtures of models in which neighborhoods are independent might be used to quickly traverse a collection of exchangeable models, one of which might provide the same answers as nature's true model -a model we are unlikely to ever locate. Convexity adjustments relate stochastic model fucntionals to their deterministic counterparts, allowing for a potentially rapid resolution of modeling conundrums precipitated by early onset heard immunity. Our strategy is represented stylistically in Figure 5 and is hopefully clear in broad brush terms from the introduction. Here we go into more detail to avoid ambiguity, though the reader familiar with exchangeability and finite versions of de Finetti's Theorem might safely skim parts of this Section. Consider the following questions that might be asked of the model 1. When will peak infection occur? 2. What is the distribution of case counts after seven weeks? 3. What is the mean growth rate in the first three weeks? 4. How many people will remain susceptible when infection peaks? 5. How many cases will there be in the limit t → ∞? Next imagine a city (more generally some population) is divided up in some manner (we use the term neighbourhoods). We imagine some partition P 1 , . . . , P n and some quantity of interest X 1 , . . . , X n measured on each. We suppose, however, that only aggregates such as X = X 1 + . . . X n matter for decision making purposes. More generally any symmetric function can be allowed, such as the sum of the squares of infections in each neighborhood. We formalize this notion with the symmetric group of permutations. Under consideration is a space of models (joint distributions) on n variables. There is an operation on this collection of models induced by permutations π in the symmetric group S(n) acting on S = {1, . . . , n}. Namely if φ is any joint distribution over X = (X 1 , . . . , X n ) then π · φ is the function that when applied to x = (x 1 , . . . , x n ) yields As is well appreciated these actions form a group. If π is another permutation then applying the definition twice π ((π · φ)) (x 1 , . . . , x n ) = (π · φ) (π (x)) = φ (π (π (x))) = φ ((π • π)(x)) = ((π • π)φ) (x 1 , . . . , x n ) Let U denote the symmetrizing operator π that averages any function over all possible permutations. 2 This operator remains unchanged when composed with any permutation since given that the right hand side is a reordering of the sum on the left. Under the action of U any model φ is mapped to an exchangeable model U (φ) meaning that it is left unchanged by the action of all elements of the symmetric group. Furthermore if ψ 1 and ψ 2 are both exchangeable and ψ 1 = πψ 2 for any permutation π then showing that there is one and only one exchangeable model per orbit. 3 These orbits, such as are for us a useful decomposition of the set of possible models, and the considerations above show that the set of exchangeable models traverse the collection of all orbits. Next we formalize the notion that an important planning question for an epidemic might be invariant to permutations. Decision making is said to be driven by functionals of the model. Denoting the space of all models by M, any functional f : M → R on models (joint distributions) can also be acted on by a permutation π. 4 This action is The notation U for the symmetrizing operator comes from U -statistics. 3 In the second to last equality we are using the fact that every element of the symmetric group can be considered a permutation of the group itself, in turn because all elements are invertible. 4 We follow a convention of defining this action in a contragredient manner, but this is choice is not essential to the argument. The functionals need not be scalar valued but in practice we can take them on one at a time. and we can define a symmetrizing operator, also denoted U , on this space. It may be suggestive to write the application of the functional in bra-ket notation: where we appeal to the fact that the sum comprising U is left unchanged when we replace every permutation with its inverse, since this amounts to a mere reordering of terms. Equation 1 is the key to our suggested approach, for if we now suppose nature has a true model φ * and we observe f (φ * ), we also know this value is shard by the same functional applied to a symmetrized version of nature's model. On the surface this is perhaps a controversial way to solve the challenge of how to create an accurate model that is close to reality (i.e. by not really trying). But seemingly what matters is not proximity to truth but proximity to the orbit of the truth. A limitation is that this approach leaves partly unaddressed the challenge of assessing the reasonableness of a choice of orbit (beyond its ability to match empirical data) a task not necessarily made obvious by any given representative of that orbit. However, we know that the exchangeable models constitute a traversal set. This particular traversal set may bring an additional advantage provided by, or at least inspired by, de Finetti's Theorem. In Section 4 we will consider a collection of stochastic infection rate models. However that is not a requirement and the point can be made by considering the more familiar constant parameter SIR model [1] . Suppose we allow β to vary with each realization and leave γ fixed for simplicity. 5 To avoid confusion between average quantities and varying we shall separate out the 5 If using deterministic evolution seems out of spirit with de Finetti's Theorem, the reader may decide to add some simple perturbation of this model, for instance through the addition of a measurement error distribution on instrumented quantities i(t), s(t) and r(t) -or perhaps a more elaborate scheme involving delays in measurement. variation as: whereβ is a constant rate of infection and η(ω) is a function defined on the space Ω over which variation takes place. Assume the population density is a measure on the same space and we have both where ρ(ω) is population density. 6 Thenβ can be interpreted as a population mean infection rate and η(ω) is the variation of the same. If this model is applied to a neighborhood P k and at the same time applied to all other neighbourhoods independently, the city model will be exchangeable. In fact the city model will be iid with respect to neighborhoods. Let us denote such a model as The mixture model satisfies once ω is chosen. Assume now that η takes on the law of a known distribution -preferably one with closed form moment generating function or harmonic mean for reasons that will become clear. We allow θ to summarize parameters controlling η. For example η might be a rescaled Beta distribution and θ = (α, β), overloading the use of β just for a moment. Denote a mixing of models induced by η's law as This model is a mixture of independent, identically distributed models. The integration over η will induce dependence between the neighborhoods. The question arises as to whether there is a θ for which m θ can serve as a proxy for the truth. As per our discussion, this will be true if nature's model lies in the same orbit and, since m θ is exchangeable, this also implies that We provide no guarantee that every exchangeable model can be represented in this fashion (unless ρ is allowed to be a signed measure, which some readers may wish to consider) but as θ varies we may nonetheless traverse a reasonably representative collection of orbits. If the orbit of nature's true model exhibits strong positive correlation between neighborhoods -which in the case of an epidemic it almost certainly does -our chances that it exists on the orbit O(m θ ) for some choice of θ go up substantially. We now come to the chief advantage of using mixtures of simple models. Predictions of the mixture model can often be written as multiples of those for deterministic models. The multiplier captures the impact of Jensen's Inequality, and can often be expressed in closed form in which case it is particularly easy to invert. Our approach makes it difficult to create very badly mispecified models. As an aside, Jensen's Inequality has cropped up in numerous biological contexts. The effect of daily fluctuations in temperature on Rhodnius prolixus, a vector of Chagas disease, was studied by Roalandi and Schilman [20] . A Jensen's Inequality effect was noted in regard to prevalence of sigma virus and host Drosophila melanogaster in Georgia peach stands [25] . Closer to our interest, the nature of bias introduced in cohort-based models for disease progression was considered by Elbasha and Chhatwal [8] . We continue with the same notation as in Section 2. Suppose in a given realization ω peak infection occurs at time t * . At this moment the proportion of the population yet susceptible is s(t * ). In the classic SIR model we expect s(t * ) = γ/β but with a mixture model this quantity gets scaled by a quantity we know exceeds unity (by Jensen's Inequality). 7 We have instead The ratio of the mixed versus unmixed model with β =β is a multiplicative convexity 7 Fix two people Irene and Sam, respectively infectious and susceptible. Let β denote the instantaneous probability (per unit time) that Irena both interacts with, and also infects, Sam. Let γ denote the instantaneous probability that Irene ceases to be contagious. In more general compartmental models γ may comprise a sum of hazard rates for recovery, death, secure isolation or other states. In the absence of any other dynamics influencing the number of susceptible people 8 we have the accounting identity βs(t * ) = γ and an expression for the vulnerable fraction of society as a ratio of instantaneous rates. adjustment, where we would be tempted to use the notation H for herd immunity were it not for the confusion this would cause (J ≈ H −1 not H, where H is standard notation for the harmonic mean). Furthermore this is only approximate. 9 To illustrate, suppose that the law of the variation function η is a scaling of a Beta distributed random variable. 10 The mean of Beta(α, β) is α α+β and the harmonic mean exists for α > 1 and β > 0 which we assume. The harmonic mean is α−1 α+β−1 and the ratio of the mean to harmonic mean is which provides the convexity adjustment in closed form. A brief pragmatic comment on calculation of convexity adjustments. Suppose η = cϕ for some constant c and unormalized variation function ϕ. Then and it is often more convenient to compute a ratio of two unnormalized integrals than worry about constants. In particular we will consider cases where ϕ ∝ ρ α and thus 9 A slightly more careful analysis would consider the case when β(ω) < γ and the first order condition is no longer relevant. A better approximation is provided by where η min = γ/β. For smaller variation this may be splitting hairs. When one considers that many prominent epidemic models failed to predict peak infection to within a multiple of two, it seems reasonable to surmise that in practice, any halfway-reasonable convexity adjustment is likely to move the modeler significantly closer to a well specified model. We use the simpler harmonic mean approximation here because it is so easy to invert, thus giving the researcher a simple ballpark estimate of variation. 10 We need to scale so that the mean of η is unity. Let Ω = (0, β/α) and ρ = 1 β/α be the uniform measure and ν be defined by the inverse cumulative distribution of a beta distributed variable that has been scaled by β/α > 0. for some integral I with parameter α. Evidently it is not necessary to carry through all constants in the integrals in order to determine J(α) or a ratio of two J(α). With these thoughts in mind, if a reported number (R say) is known to be derived from a peak infection estimate, which is to saŷ for some estimateŝ(t * ) of peak s we should be careful to interpret this directly as an estimate of the population size and not a link to mean infection rate via the SIR identity R 0 = γ/β. Even if we bravely identify s T with s(t * ) we would be better served bȳ β = JγR instead of β = γR 0 . The mean rate of contagionβ we infer is higher than a constant β that might be inferred using a homogeneous model. In the preceding discussion of peak infection we implicitly assumed critical times T * = {t ω } ω∈Ω in the evolution of an ensemble of classic SIR compartmental models. Really, peak infection globally occurs when and it is bold to approximate this by assuming balance holds for each ω simultaneously: Here s(ω)ρ(ω)dω = sdρ is interpreted as the count of susceptible people in a subpopulation at this turning point t ω . This gives rise to the approximation However the first approximation may be poor. In the appendix we consider a further convexity adjustment taking into account the differing times of peak infection. On the other hand as variation is increased the set T , the times t ω at which individual peaks are reached, will become more disperse so the estimateŝ(t * ) will be lower than it otherwise would be with no variation. In fact we will havê for some ratio F > 1 depending on choice ofβ and η (also γ and i(0)). 11 Thus a more careful inference would beR Convexity adjustments are also available for early stage growth. We consider mean growth starting with the same proportion of initially infected people i(0) where again infection is drawn at random according to β(ω) =βη(ω) as before. 12 We further assume s(t) ≈ 1 since we assume the measurement takes place in the early stage of the epidemic. So approximately we have We might refer to the quantity defined for some time t that is well before herd immunity, as the growth effective infection rate. It plays a similar role to β in the constant parameter SIR model, at least as far as growth in aggregate infections is concerned. The growth effective infection rate is a 11 The mnemonic here is that F stands for Fourier. For readability all ratios F , J and G to follow are defined so as to be greater than 1. 12 We are attempting to determine properties of an exchangeable model that lies on the same orbit as a realistic one. For a discussion of interacting groups and early stage growth see [17] for example. multiple ofβ, where we introduce G and note that again by Jensen's Inequality G(t) > 1. The growth ratio G will depend on the parameters of η and the time t marking the end of the interval (0, t) where we measure growth. It will be apparent that G is closely related to the moment generating function of the variation function η. In financial terminology it also plays a role analogous to a yield. We will draw a closer connection to finance in Section 4. Note for now that dividing bȳ β we have showing that adjusted growth as a multiple ofβ, when time is measured in units of 1/β, is equal to the extent to which the logarithm of the moment generating function of η grows faster than linear time. In those more sensible time units is slightly more convenient expression of G with t =βt. is exact and this approximation may be useful elsewhere. The exponent is the variance σ 2 of the variation function η. Thus measured growth to time t will be and effective β g given by although this demands care in interpretation for non-gaussian variation. The issue is thatβt may not be small so approximations for small times may have limited utility. Early stage growth is often measured after a few multiples of 1/β to minimize small sample effects, censoring and related measurement inaccuracies. It may also be prudent to choose t larger than 1/γ, the typical recovery time, to reduce the impact of short time effects such as an initial, rapid decrease in novel interactions. If η varies by a large percentage over some regions of the probability space we must be additionally careful. Therefore in what follows we assume t is several multiples of 1/β. To the extent that analytic approximations are useful one might steer towards the expansions around tβ = ∞ rather than t = 0. We shall also compute G for several distributions of η in Section 6.4. Thus far we have considered deterministic evolution. However to prepare for Section 4 we briefly remark that this is not a prerequisite and we can allow stochastic dynamics. In the field of quantitative finance, considerable work has gone into the finding of solutions of the form where x s is a stochastic process and the integral can be viewed as a Feynman-Kac formula solving a diffusion equation. One interpretation arises when x s is a short rate (instantaneous rate of interest) and the integral is the balance in an account after time t when all interest is reinvested. Assuming a convenient expression or method of calculation, the growth convexity adjustment is facilitated by the fact that is tractable (where the equality glosses over some technicalities). More frequently, at least in the fixed income literature, formulas as provided for the closely related quantity whose interpretation is the price of a bond with maturity t. The insurance and failure rate literature also provided examples of convenient choices of stochastic process x s where x s is interpreted as an instantaneous rate of failure of a machine, or default of a bond issuer and the integral represents a probability of survival (either a real world probability or more likely, in the case of credit default swaps, a pricing measure). For some solutions there is a trivial connection between the moment generating function and the survival function. We give an example in Section 4 of using −x in place of x to compute mean growth for an epidemic. 13 The convexity adjustment G ratios assume s(t) ≈ 1 and will be invalid if the susceptible population rises to an appreciable level for any appreciable mass of Ω. Having considered the impact on both early stage growth and peak infection, we turn to the questions of how observations of the former might inform the latter -and how one might find models which are capable of fitting both types of evidence for some choice of parameters. The mean infection rateβ and variation η describe the system and thus must relate the two, but they are not directly observed. Through a homogeneous SIR model lens we might 1. Estimate R 0 by some means 2. Forecast s(t * ) = 1/R 0 . And we note that theoretically R 0 = γ/β. The evidence from Sweden seems to suggest there is something askew with this logic. Here is a possible explanation. Assuming γ is known and the true system behaves is as we have described it -as a mixture with varying β -the first step will in fact produce an estimate of the growth effective infection rate β g due to the relation R 0 = γ/β g . Here the denominator is related toβ and η by as we have seen. We have also determined that the susceptible population at peak s(t * ) relates toβ and η via Thus we might write The threshold number of infections will be higher due to two applications of Jensen's Inequality. One can add a third if we also consider the impact of peak infection at different times and are able to estimate this We emphasize that the term G(t) will be a function of the interval (0, t) over which a measurement of early stage growth is made. Assume now some empirical data at different stages of epidemics and a desire to build models that can, for some choice of parameters, explain multiple types of observation coherently. In particular suppose we are presented with data for the two functionals we have paid attention to: early stage growth and peak infection susceptible population. We do not feel qualified to accurately quantify mispecification given the miriad issues involved with COVID-19. But let us suppose, for the sake of discussion, that when a constant parameter SIR model is used to infer R at peak infection the ratio of β to γ is approximately 5/4. On the other let us also suppose, again just for illustration, that this ratio implies 1.7 times less early stage growth than is implied by reproduction numbers on the lower end of the spectrum. 14 We propose to use the relationships established in Section 3 to quickly identify the 14 In the case of constant γ the time until an infectious person transmits to another is exponentially distributed. This leads to a relationship between reproduction number R, growth r and recovery γ R = 1 + r γ consistent with R = β/γ and r = β − γ as we expect. When variation is introduced into γ the reproduction number will be reduced relative to the reproduction number computed using the mean of γ. However we shall limit our attention to convexity adjustments for the variable β, referring the reader to Wallinga and Lipsitch for further discussion of mean generation times [23] . parameter θ which controls the variation in η and thus β. In other words, we solve for θ or more abstractly, solve for η. We emphasize again that the variation implied by this procedure must be viewed in the light of de Finetti's Theorem. The mixing serves as a proxy for dependence between neighborhoods that are otherwise independent, so it need not represent a seemingly realistic level of actual spatial variation. Referring back to our original Figure 5 , it is apparent that inverting convexity adjustments can move us much closer to the correct orbit. This approach stands in stark contrast to other modeling strategies which involve modifications to elaborate simulations that attempt to model reality at a granular level (using geospatial demographic data, transport networks and so forth). These models may in some respect be closer to the truth. However they may easily be further from the true orbit. Inverting the convexity adjustments is unlikely to present serious difficulty but for concreteness, one simple iterative approach would simply assume equal convexity adjustments arising. Let's say the functionals f 1 and f 2 represent measurements and J 1 and J 2 the respective convexity adjustments, once might suppose γ is known andβ and θ are to be determined. We assume a missing factor of λ where, in the case of Sweden's puzzling herd immunity, we might set λ = 1.7 say. As a first pass we express this model mispecification as where φ * is nature's unseen true model but f 1 (φ * ) is observed, and andβ 0 is some first guess of infection rate parameter taking into account the conflicting evidence. For instance if threshold infection implies β = 1.25γ and growth implies β = 1.65γ then we take β * = 1.45γ. Here φ * is nature's true model and the right sides of these equations are observed quantities. 15 Each of these equations might be solved for θ independently, leading to a next guess at θ (say the average of the two) and a next value of λ. This simple approach is motivated by the observation that J's may be independent ofβ altogether or only weakly depend onβ. Before leaving the topic of convexity adjustments we point our that longitudinal prediction is not the only application. Since variation driven adjustments provide corrections for measurable quantities relative to deterministic compartmental models, they can also provide a very convenient way to relate two different epidemics at the same stage subject to different conditions. Moreover, the convexity adjustments per se may not vary strongly with underlying assumptions used for the deterministic models. So it is possible that this sort of comparison may sidestep problems associated with calibration of compartmental models. To illustrate we consider threshold infection levels and once again the measurable of interest is the number of susceptible people when balance holds. To illustrate we assume variation η is driven by population density. Though this is no the only source of variation it is certainly significant and moreover the degree of variation also varies considerably. This is manifest in the way population density takes very different shapes, referring to the geometry of population density. Figure 2 shows the population density for London, whereas the equivalent for Paris is depicted in Figure 3 . 16 Empirical study of population density and its effect on epidemic growth is a muddied picture with some authors appealing to the likelihood of a large impact (e.g. [21] ) whereas in a study of pine beetles a thinning of the population did not noticeably change an epidemic's equilibrium [16] . Population fluctuations appear to drive pathogen spread in Niger [9] . Different rates of microparasite transmission were observed in cities versus countries, providing indirect evidence of the role of density [10] . HIV transmission reconstructed with phylogenetic methods revealed density dependence [15] . COVID-19 cases in China and the U.S. have seemingly concentrated in areas of high population density -though there are of course confounding factors such as proximity to international airports and movements of people. Without seeking to evaluate the 15 This may well be the case at time of writing. If f 1 and f 2 represent measurements of early stage growth of an epidemic and threshold infection, as we have been discussing, then evidence from Sweden may be interpreted as an onset of herd immunity with s(t) ≈ 0.8 corresponding to a ratio of β to γ in the SIR model of 5/4, much lower than R values implied by spread of the disease. 16 Population graphics provided by Matt Daniels of The Pudding. evidence we suppose merely in this section that population density plays a role -possibly a weak one. 17 Focusing on cross-section comparison, we suppose two towns: Town A and Town B. We assume the epidemic has long since run its course in Town A. We are given the task of estimating herd immunity levels and reproduction numbers for Town B. This town shares many of the same properties as Town A including the same mean contagiousness -which is to say the likelihood of a contagious person infecting another. The only difference between the two towns we can identify is that Town A's population profile is shaped like a bell curve whereas Town B's falls away more like an exponential distribution. Gamma (r 2 exp(−r)) 3 Skirt (1/r 3 ) 6 Based on ratios of convexity adjustments to be provided, we suggest the following way to answer this challenge. A calculations such as the following can be used to adjust for population shape: where R hom represents a reproduction number that that would exist in the absence of any variation. We probably never get to see R hom , however, because there isn't a country, city or neighbourhood in the world lacking variation. Instead, we first infer a true underlying level of contagiousness in Town A by subtracting a convexity effect, and then add back the convexity correction for Town B. The coefficients provided in Table 3 .4.1 are simply the first terms in a Taylor expansion of the product of convexity adjustments. We provide some formalization of these calculus exercises. We identify the probability space with the plane R. The link between density and contagiousness is muddied [21] [15] but even a weak relationship may be important. We shall for simplicity assume that contagiousness β is a function of population density via a power law. for some 0 < α < 1 and population density ρ. Further, we shall assume that density is a radial function from the origin and thus so is β(·) = β(r) is as well. We can write β(r) β = η(r) ∝ ρ(r) α As per our comments in Section 3.1.1 it is marginally cleaner to work with integrals of an unnormalized variation φ and determine the ratio of the integral. If a city's population density varies as ρ(r, θ) = 1 2π e −r 2 /2 as we move out a distance r from the center then we take φ(r) = e −r 2 α/2 ρφ = 1 2π ∞ 0 e − 1+α 2 r 2 rdr = 1 1 + α Recycling the same integral with α → −α and taking a ratio: Merely as an example, J(a = 1/4) = 5/3 if we use a power law coefficient loosely informed by Measles outbreaks in the United States. Next consider a city with more sprawl. Set ρ(r, θ) = 1 2π e −r to create a city whose density falls off more abruptly at first, but with a much longer tail. With φ(r) = e −αr we have which is the square of the Jensen's ratio for a Gaussian city. Comparing with the Gaussian case and holding everything else constant, even a ten percent relationship between log contagiousness and log density might imply a twenty percent correction in the number of people who escape the virus when we translate the data from a Gaussian city to an Exponential one. More examples of shape adjustments are given in Section 6.5 We have outlined a strategy using mixtures of independent identically distributed (iid) models that are carefully interpreted as representatives of equivalence classes of models as per Figure 5 . We now apply the approach using models where each neighborhood's epidemic follows a stochastic compartmental model. Infection rate follows an Ornstein-Uhlenbeck process, which implies early stage growth follows an Ornstein-Uhlenbeck process as well. This approach also adopts the suggestion in Section 3.2.2, namely that early stage growth calculations can be borrowed from the fixed income literature. We call this a Vasicek compartmental model after the corresponding work on interest rates by Vasicek (in which short rate follow an Ornstein-Uhlenbeck process) and we borrow his formula for the price of a bond that pays no coupons. While this may be taken as a mere example and a somewhat arbitrary (suspiciously convenient) choice of basis, we provide in Section 7 some motivation for Ornstein-Uhlenbeck infection rates leaning both on empirical work [5] and also a more theoretical justification in terms of a physical model for repeat contact probability [6] . In contrast to most of the discussion in Section 3, we allow infection rate β(t) to follow a stochastic process rather than a deterministic process. However in a manner similar to the preceding it will have parameters driven by ω ∈ Ω once again, where Ω is an additional probability space having nothing to do with the evolution of the stochastic process driving β. A general example is Here β ∞ is a mean reversion level and the rate κ(ω) controls how quickly reversion occurs. This is inserted into the SIR differential equations as before. Here the initial value of infection rate, variance, mean reversion level and pull κ can all be varied with ω. Notice that if we define then dr(t) = dβ(t) and r(t) may be seen to satisfy a similar stochastic differential equation. Add zero to the drift: to see this, noting that r(t) also has different initial condition r 0 = β 0 − γ. We begin with an elementary observation that applies if evolution of β(t) is unrelated to choice of ω, or more accurately if it is conditionally independent. By the law of iterated expectations we have (8) which will simplify in some cases, as we shall see, to give simple additive corrections for early stage growth arising from variation. Unlike the deterministic mixture model, the mean growth is not obvious even for fixed ω. The mean across different paths generated by W t requires an expectation of the exponential of a stochastic integral. Fixing ω for now, expected growth in infections is given by where τ (t) and B(t) are functions not depending on the initial rate r 0 (ω) and we have written the expectation with a W subscript to emphasize that this is with respect to the law of the stochastic process (not the probability space Ω that might do its own mixing independently). Here r 0 (ω) appears only as a linear term. The function τ depends only on κ ω τ (t) := 1 − e −κt κ and is approximately equal to t for small times. The function B(t; κ, σ) is given by and revealed in the next section to be a thinly veiled bond price formula. The initial growth rate is r 0 (ω) as we expect and the eventual growth rate is although this holds only so long as s(t) ≈ 1. We have some flexibility to match an infection rate decaying towards a lower level, but actually the σ 2 terms allow us to fit slightly more complicated term structures, as is revealed by rearranging the growth equation. Here we might view the functions 1, τ /t and (1 + τ /2)(τ /t) as a basis for shapes taken by growth as a function of time. Or, separating the roles of the parameters, we might introduce ratios so that ν δ represents the ratio of the mean infection rate attenuation to eventual infection rate and ν σ is the ratio of the ergodic standard deviation of the infection rate process to the long term mean. Then growth is written Figure 4 : Basis for early stage growth shape using Equation 14 . The difference between initial infection rate and long term infection rate is multiplied by the orange function τ /t which can be interpreted as including an attenuation due to repeat contacts. Variation drives a yield component shown in green, which has been scaled down by a factor of 200 relative to the term τ 2 κt in Equation 14 . The green line would be multiplied by ν 2 σ where ν σ = σ √ 2κr∞ is the ratio of the long run standard deviation of instantaneous growth to the long run mean of the same. We plot this basis in Figure 4 , which might be viewed as a non-orthogonal basis. One might write to emphasize a natural choice of parameters r ∞ , ν 0 and ν 2 σ , yield basis and resultant ratio to long term growth. Two of these basis functions can be accommodated in the deterministic setting also, but we can revert to that by setting σ ≡ 0 rather than treating it separately. Our terminology is suggestive and as noted the growth formula in Equation 11 appears elsewhere in another guise. It is the price of a bond paying no coupons in a Vasicek short rate model [22] . This correspondence obviates a full derivation, though that is a straightforward if lengthy exercise. 18 Skipping past that thanks to Vasicek's formula, we need only observe that the price of a zero coupon bond with maturity t, mean reversion level x ∞ , initial short rate x 0 and κ and σ as before is and τ as above. If we let x = −r with x 0 → −r 0 and x ∞ → −r ∞ we have and since W t and −W t have the same law, r t follows a "Vasicek process". Thus by using negative interest rates in this manner we have mean early stage growth and we have justified Equation 9. 19 18 The Vasicek bond price formula is a convexity calculation for a normal random variable. Most of the terms arise from the formula for the variance of the Ornstein-Uhlenbeck process, one half of which contributes to the difference between E[exp( t 0 r s ds)] and exp(E[ t 0 r s ds]). 19 The exponentially exploding bond price may seen counterintuitive. However a bond becomes extremely expensive when it is an attempt to shield wealth from the confiscatory effect of negative Again let us fix ω ∈ Ω. From the "inner" mean growth rate defined in Equation 9 and given in Equation 11 , we derive the convexity adjustment defined in Equation 5 for mean early stage growth relative to the deterministic model simply by setting σ(ω) ≡ 0. It is apparent from Equation 10 that where we recall that is the long term ergodic variance of the infection rate β. This is the multiplicative adjustment that needs to be made to account for variation induced by different paths taken by infection in different places, as compared with simply assuming infection rate is the same everywhere. Referring to Equation 8 we next consider the overall convexity taking into account variation due to ω ∈ Ω, and show that this is easily computed for some choices. In other words we allow η to determining the law of β 0 =βη in analogy with the deterministic SIR case considered in Section 2. Using an expression for the mean exponentialed integral analogous to Equation 11 , we have This will simplify in some cases we now consider. instantaneous interest rates. Suppose now that under the measure ρ the variables r ∞ , σ, κ and r 0 vary independently. The numerator in Equation 17 then factors into product terms, many of which cancel immediately with terms on the denominator. which we recognize as a sum of convexity adjustments that might be independently computed. For example the first term is a convexity adjustment pertaining to the ergodic mean variance. If we write for some convenient choice of variation function η then the approach of Section 3 applies (assuming independence allows this to be recycled in the second term also). To focus on this first term in Equation 18 momentarily, which is the only one to survive several mean generation times: where G Ω is a convexity adjustment arising from variation of σ 2 ∞ over Ω. On the other hand G W might also be called a convexity adjustment although it arises from the existence of σ 2 ∞ and is proportional to the mean value of the ergodic variance. In this way a simple meta-population model with two types of variation is easy to relate to its deterministic counterpart at least as far as early stage growth is concerned. This property arises from choice of an infection rate falling into the affine class, as we remarked earlier, which is evidently a tractable choice. In contrast, if early stage growth is not linear in the initial value of β 0 then we have more work to do. Likewise if a complicated dependency is assumed between the variables r ∞ , σ, κ and r 0 as functions of ω then no simple calculation such as this may apply. This is not to suggest that there are not other ways to make progress simplifying Equation 11 for other choices. The instantaneous infection rate at the onset of herd immunity will determine whether balance occurs between recovery and new infections (as compared with the mean growth up until that point in time). 20 The mean and variance of β(t) are so with as shorthand then the ergodic distribution is β(t → ∞) ∼ N (β ∞ , σ 2 ∞ ). As noted in Equation 6 an approximate convexity adjustment for peak infection might break down into two components: one arising from deviation of the harmonic mean from the arithmetic mean and the other arising from a superposition effect. Some approximations for peak infection might be couched in terms of an auxiliary function h(a, b, c; µ, σ) := 1 which is a generalization of the harmonic mean. To see this, note that the susceptible population can be approximated as are the mean and variance of the infection rate. This approximation assumes that when β(t) < γ the epidemic never really gets going and the susceptible population remains close to unity. Using the Taylor expansion valid for x ∈ (0, 1) let { i } n i=1 be independent standard normal. In the limit n → ∞ we have because the odd powers have zero mean and the even powers are the well known moments of the normal distribution. The simplest convexity adjustment for peak infection using only the first term is where we recognize the two ratios. The adjustment in Equation 22 is an easy mental calculation as R ∞ := β ∞ /γ is a kind of limiting R 0 value (to mix conventions somewhat) whereas σ ∞ /β ∞ is the limiting relative variation of infection. Thus to translate relative variation into a herd immunity adjustment we can use the very coarse but easy to remember formula: susceptible multiplier = 1 + relative variation R This, we emphasize, is the number we should multiply the susceptible population by to correct an estimate that is derived by assuming infection follows a deterministic trajectory. Note that even for R values such as R = 2 or R = 4 this is an appreciable correction. We can also use more accurate approximations using the derivation of Equation 22 , of course. It is possible that some compartmental models for disease spread suffer from a pronounced flaw of averages, and it is notable that in comparing model predictions for early stage growth and peak infection, these errors combine as we have shown in Section 3. However we need not discard the elegance and simplicity of compartmental models, if simple corrections for variation are available. We have treated the predictions of models exhibiting variation as perturbations of models that do not. This follows a long tradition in applied mathematics and in particular homogenization [18] . It helps us build on the intuition of the simpler models while still capturing the first order effects. We have shown that mixture models and stochastic models are not only interesting in their own right, and may resolve serious model mispecification issues, but also find interpretation within an approach that is motivated by de Finetti's Theorem. In this setting, mixture models with tractable convexity adjustments can help locate a model that is equivalent to nature's, at least as far as symmetric aggregate quantities are concerned. In this approach it is not a requirement that the amount of variation implied by convexity adjustments be plausible purely from the point of view of population variation. And it may be possible to provide actionable intelligence without ever coming close to the true model. From an analytical perspective it is clear from the size of the adjustments that care is required any time we have variation that reflects genuine heterogeneity. This is true of cross sectional comparisons also, if the amount of variation can be expected to vary. Population density may have only a weak relationship to contagion but because it varies so dramatically a convexity adjustment may be warranted. Finally, we have appealed to the usefulness of growth models of a different kind, taken from the interest rate literature. Though this connection is limited to early stage growth the flexibility provided may be of benefit to epidemiology. Stochastic growth rates and complex term structures are a given in that field. So our hope is that this isn't the last use of Fixed Income modeling techniques for disease modeling. We provide further examples of convexity adjustments for peak infection, early stage growth and population profile. We suppose the law of η is an Inverse Gamma distribution with parameters α > 1 and β chosen such that the mean β α−1 = 1. That is to say β = α − 1. The mean of the inverse is α/β and thus In passing, we note that this example might be formalized by letting the probability space be Ω = (0, 1) with uniform ρ equivalent to Lebesgue measure and η the inverse cumulative distribution function for the Inverse Gamma function. In similar fashion other examples can be made to satisfy the definition we provided for the variation function η, although when η does not depend on ρ this may seem a tad ceremonial. Suppose instead the law of η is log-normally distributed, which is to say that with mild abuse of notation η ∼ e µ+σZ for standard normal Z. The mean is whereas the harmonic mean is The ratio J is the ratio of the mean to the harmonic mean and thus We set σ 2 = −2µ so that the mean is 1. So alternatively we can write bearing in mind µ < 0. Fix α > 1 and let σ = 1 − 1 α . Assume the law of η follows a Pareto distribution, which is to say that the probability it exceeds x is x σ α for x > σ. We have The following approximation of the harmonic mean of a gaussian not centered near zero is provided by Dimitri Offengenden. One starts with the following approximation which Offengenden finds is approximately valid for x ∈ (0, 1) with error no more than one to two percent, and at the same time quite accurate for larger values of x (fitting considerably better than the Taylor expansion, for example). Let { i } n i=1 be independent standard normal. In the limit n → ∞ we have where i are independent standard normal. In Section 4.9 we considered an approximation using a modified harmonic mean h. There are no doubt many ways to approximate this. Here we make two observations only 1. It suffices to consider the case of standard normal Recall the auxiliary function that generalizes the harmonic mean: h(a, b, c; µ, σ) := 1 be an affine transformed version of X. As Y is manifestly a standard normal random variable, and since max(X, 1) = max (Y σ X + µ X , 1) allowing one to specialize to the case of standard normal random variables when approximating h. Furthermore where the first term is merely a multiple of the cumulative normal distribution function. In Section ?? we considered the case c = −∞ where only the second term survives. The technique used there to approximate 1 a+bx = 1 a 1 1+ b a x could be applied to convert the second term into a sum of normal options. 21 This might be overkill, admittedly, and the reader likely won't need reminding that this class of approximations assumes independent equilibria are reached between populations -likely an assumption leading to greater error. 22 21 We again thank Dimitri Offengenden for the suggestion. 22 In future work we hope to use these convexity adjustments as features for predicting actual peak infection levels in agent simulations, thereby providing some calibration useful for actual epidemics. In order to provide more tractable examples of early stage growth adjustments due to variation, it obviously behoves us to consider variation functions with known moment generating functions. These examples are standard. We merely enforce normalization, which sometimes reveal bounds on the growth convexity adjustments. Recall that we use t =βt to measure time in units that are convenient for this purpose. Let Ω = (0, 1) with uniform measure ρ and define η by the inverse cumulative distribution function of the Gamma distribution. Then the law of η is Gamma distributed and where k and θ are the Gamma distribution parameters. We must set k = 1/θ to achieve a mean E[η] = 1. The probability density function is 1 Γ(k)θ k x k−1 e −x/θ but will only serve us if θ is small enough that t < 1/θ, for otherwise Equation 25 is invalid. If it is valid then where the inequality follows from the limitation t < 1/θ corresponding to t < 1 βθ . Similarly we can allow η to be Poisson distributed with parameter λ = 1. Then and thus Excess growth is unbounded. However a more careful analysis would curtail growth for some values of η where the approximation s(t) ≈ 1 is no longer valid. Let η be distributed as the the negative binomial distribution with parameters r and p, and with p = 1 1+r so that E[η] = 1. Then and so is the excess growth. With η Binomial(n,p) and p = 1/n a similar calculation yields Let Ω = (0, 2) with η(ω) = ω and ρ(ω) = 1 2 so that E[η] = 1 as required. Since we have in agreement with the previous gaussian approximation for small t after we note that the variance of η is σ 2 = 1/3. But when t is several multiples of of the timescale 1/β the limit t 1/β becomes more relevant. We have G(t) → 2 Figure 6 : An example of a stylized population density profile resembling a circus tent. In similar fashion we could quadratic η. We set Ω = (0, showing that if the limit is relevant, the growth inflated by recovery is three times what it would be β were to be held constant. G(t) → 3 Following on from Section 3.4.1 we provide more examples of convexity adjustments when contagion varies as a power law of population. The city's population takes the form of a circus tent shown in Figure 6 . Here we assume the city has a radius of b and the tent's roof, pitched at a height 1, falls off at a 45 degree angle. Quite unlike a circus tent it is assumed that the height of this roof represents population density, which drops abruptly to zero as we reach a distance b from the center of town. 23 The multiplicative correction takes values on the order of 1.5 for a ≈ 1/4 and b ≈ 0.9, for example. This function is plotted in Figure 7 6.6.1 Power law population distribution An example when J is unbounded is provided by a "skirt". Suppose ρ(r) ∝ 1/r 3 and the population lives outside the unit circle. Then It is of note that the correction is inaccurate beyond small values of α. The size of the immune group cannot exceed unity and this is not taken into account. With density proportional to re −r and β(r) the α power of the same, J 1 (α) = ∞ 0 (rexp(−r)) 1+α rdr ∞ 0 (rexp(−r)) 1−α rdr 23 That is to say that all citizens reside in a conical version of the Luxor Hotel. Figure 7 : Convexity adjustment to the susceptible population as a function of parameters a and b determining contagiousness in a tent-shaped population profile. Population density falls off linearly as we move away from the origin until we reach a radius b < 1 as shown in Figure 6 . Contagiousness is related to density by a power law with coefficient a 1. Even a weak relationship between population density and contagiousness can imply a significant ratio between the number of susceptible people at peak infection and the number of susceptible people at peak infection when we assume homogeneous density. where γ is the Euler-Mascheroni constant, γ ≈ 0.57721. If instead density is proportional to r 2 e −r then J 2 (α) = ∞ 0 (r 2 exp(−r)) 1+α rdr ∞ 0 (r 2 exp(−r)) 1−α rdr Notice that these adjustment are material even if the power law α relating population density to contagiousness is relatively small, say α ≈ 0.1. We provide further motivation for the choice of Ornstein-Uhlenbeck infection rates Chowell et al [5] propose modeling infection rates as time varying: We refer the reader to the extensive empirical survey by these authors that led them to suggest this model for early stage growth. The reader will recognize this as the mean of an Ornstein-Uhlenbeck process. In [6] the probability of novel contact in an infinite agents model is shown to take the form β(t; β 0 , α 0 ) = β 0 Ein(α 0 t) 1 − e −α 0 t α 0 t This model assumes that agents populate the plane R 2 and make visits to nearby locations with probability proportional to a bivariate gaussian distribution. When probability of repeat contact is taken into account we can approximate the system by a compartmental model with delay differential equation for infection. In this approach the infection rate is multiplied by an average over the infected cohort P (t; α) = Because this is a vintage effect it may be seen to fall within a broad class of models considered by Kermack and McKendrick [1] , [11] [12] . This particular choice leads to infection rates resembling the patterns seen in Figure 8 , which can be seen to be reasonably well approximated by our choice. The reader is referred to the paper for further details [6] . Figure 8 : A physical infinite agents model provides further motivation for choice of decaying mean-revering infection rate. In [6] the variation in infection rate β(t) revealed by numerical solution of delay differential equations in a modified SIR compartmental model. The form taken accounts for repeat contacts. It can be seen that infection rate is reasonably well approximated by an Ornstein-Uhlenbeck process -at least for early stage growth and up until the time of peak infection -the period relevant to convexity adjustments presented here (in the late stage of the epidemic a shifting in the vintage of the infected cohort may lead to a second drop in infection rate). Containing Papers of a Mathematical and Physical Character Investigation of effective climatology parameters on COVID-19 outbreak in Iran Basic estimation-prediction techniques for Covid-19, and a prediction for Stockholm Analysis of myocardial injury in patients with COVID-19 and association between concomitant cardiovascular diseases and severity of COVID-19 Mathematical models to characterize early epidemic growth: A review Repeat Contact and the Spread of Disease Mixing patterns between age groups in social networks Characterizing Heterogeneity Bias in Cohort-Based Models Rural-urban gradient in seasonal forcing of measles transmission in Niger Cities and villages: Infection hierarchies in a measles metapopulation Contributions to the mathematical theory of epidemics-II. The problem of endemicity Contributions to the mathematical theory of epidemics-III. Further studies of the problem of endemicity Significantly Improved COVID-19 Outcomes in Countries with Higher BCG Vaccination Coverage: A Multivariable Analysis. medRxiv COVID-19 Coronavirus Ultraviolet Susceptibility 2020 COVID-19 Coronavirus Ultraviolet Susceptibility. purple sun Using an epidemiological model for phylogenetic inference reveals density dependence in hiv transmission Density-dependent population dynamics of mountain pine beetle in thinned and unthinned stands Pros and cons of estimating the reproduction number from early epidemic growth rate of influenza A (H1N1) A framework for adaptive multiscale methods for elliptic problems. Multiscale Modeling and Simulation Dynamic social networks and the implications for the spread of infectious disease The costs of living in a thermal fluctuating environment for the tropical haematophagous bug, Rhodnius prolixus Effects of population density on the spread of disease An equilibrium characterization of the term structure How generation intervals shape the relationship between growth rates and reproductive numbers High Temperature and High Humidity Reduce the Transmission of COVID-19 The prevalence and persistence of sigma virus, a biparentally transmitted parasite of Drosophila melanogaster Risk Factors Associated with Acute Respiratory Distress Syndrome and Death in Patients with Coronavirus Disease Effects of temperature and humidity on the daily new cases and new deaths of COVID-19 in 166 countries Clinical features and obstetric and neonatal outcomes of pregnant patients with COVID-19 in Wuhan Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan Analysis of the pregnancy outcomes in pregnant women with COVID-19 in Hubei Province