key: cord-0500642-7wcmm04j authors: Wilkinson, Ryan; Roper, Marcus title: Homogeneous Interpretable Approximations to Heterogeneous SIR Models date: 2020-12-24 journal: nan DOI: nan sha: c5b25da3dd526f6809739e03a64d68bc4e742089 doc_id: 500642 cord_uid: 7wcmm04j The SIR-compartment model is among the simplest models that describe the spread of a disease through a population. The model makes the unrealistic assumption that the population through which the disease is spreading is well-mixed. Although real populations have heterogeneities in contacts not represented in the SIR model, it nevertheless well fits real US state Covid-19 case data. Here we demonstrate mathematically how closely the simple continuous SIR model approximates a model which includes heterogeneous contacts, and provide insight onto how one can interpret parameters gleaned from regression in the context of heterogeneous dynamics. The SIR-compartment model is among the simplest models that describe the spread of a disease through a population. The model makes the unrealistic assumption that the population through which the disease is spreading is well-mixed. Although real populations have heterogeneities in contacts not represented in the SIR model, it nevertheless well fits real US state Covid-19 case data. Here we demonstrate mathematically how closely the simple continuous SIR model approximates a model which includes heterogeneous contacts, and provide insight onto how one can interpret parameters gleaned from regression in the context of heterogeneous dynamics. Differences in susceptibility and rates of contact between individuals strongly affect their likelihood of catching Covid-19. High contact rates may explain why outbreaks have been particularly devastating for prisons [4] , nursing homes [5] , and cruise ships [6] , while variations in susceptibility between students, families and caretakers, and school staff vastly complicate modeling of the conditions needed to reopen schools during the epidemic. Heterogeneities also strongly influence the thresholds for herd immunity [7] . However, heterogeneity-capturing models contain many unknown parameters that are difficult to fit to real data. Accordingly, public health departments continue to make predictions about the progress of the epidemic and about the effectiveness of social distancing based on so-called well-mixed models (reviewed in [8] ), and these models can be made to fit existing case data very well. To show the general applicability of well-mixed models, we first sought to determine how closely the number of cumulative cases in different US states from the beginning of their respective outbreaks to May 20th, by which time stay at home orders had been relaxed, changing the transmission rate of the disease [9] , can be modeled by the SIR model. Solutions of SIR models with different initial conditions and populations are mathematically similar when rescaled by the population size and translated in time [8] . Accordingly, we tested whether cumulative cases for many different states could be collapsed onto a single master curve if the number of cases was scaled by an appropriately chosen population size, and the curves shifted in time to account for the different arrival times of index cases in each state. Even though the 52 states and territories have different heterogeneous compositions based on their age-demography, urban-rural subpopulations, and other factors, the state Covid curves could be clustered into just 7 different clusters of curves ( Fig. 1 and Supplementary Materials). We then modeled the master curves for each state using a well-mixed SIR-model [10] . In the model, a group of susceptibles (S ) transition via contact with infectious individuals to the infectious (I ) group, and after some time recover or are removed (R). The three compart-ments evolve according to: while dR dt = γI ensures that the total number of individuals S + I + R = N tot remains constant. The susceptibility coefficient β represents the number of infections caused by a single infected individual in an otherwise susceptible population in unit time. γ is a basic recovery rate. For the above equations to be valid the population should be well-mixed: everyone in the population interacts with everyone else at all times. Given geographic considerations and changes in mixing behavior during a pandemic, this assumption cannot be true. Accordingly, much modern epidemiological modeling, including of Covid-19 [8, 11, 12] , has focused on the role of heterogeneous contacts, either by ramifying compartments or by using networks to model connections between individuals [7, [13] [14] [15] [16] [17] [18] [19] [20] [21] . Previous work has shown that under certain conditions heterogeneous models by well-mixed models [22] [23] [24] , but there is limited data showing these conditions are met by real epidemics. The SIR model fits the US state data for 6 of the 7 clusters, accounting for 37 of 52 states and territories (Fig. 1 ). The remaining data fit well at larger case numbers, when the data are less likely to be influenced by stochastic effects [25] . Given its neglect of heterogeneities in population contacts, why does the basic SIR differential equation fit pandemic data so well? A Heterogeneous Extension.-To examine why the simple SIR model fits real pandemic data at all, we consider a multi-population SIR model that allows for populations to be heterogeneous in their contacts and susceptibilities. The model assumes that the population is split into K subpopulations, with population sizes N i , i = 1, 2, . . . , K, that interact with each other with contact rate β ij , i.e. β ij is the number of infections a single infected individual in subpopulation i could cause in subpopulation j, if j contains only susceptible individuals. Our subpopulations represent geographic or demographic partitions of N tot . In this first treatment we neglect differences in recovery rate based on individual characteristics. The susceptible, infectious, and removed quantities of subpopulation i is given by S i , I i , and R i respectively and also write i S i = S, i I i = I, i R i = R, and To emphasize the similarity between the multipopulation and the single-population SIR model we write the equation in a form with a simple SIR part, with arbitrary coefficientβ plus a residual. We obtain a best SIR fit by minimizing the L 2 norm of the ratio of terms on an arbitrary interval (t 1 , t 2 ), yielding a unique minimizingβ [25] :β We give asymptotic results forβ here and in [25] and numerically explore a specific class of contact matrices. Intermediate Mixed/Unmixed Models.-We first consider the case where subpopulations either do not interact or else interact at identical rates: Here δ ij is the Kronecker delta, b a mixing parameter a The least-squares fitting algorithm that generated these results treats γ and Ntot as known. ranging from 0 to 1, and A ij is an adjacency matrix describing which subpopulations interact. Assume initially A ij ≡ 1. When b is 0 the populations do not mix, and infections spread within but not between subpopulations, and when b = 1 the subpopulations mix completely, ef- R0 estimates from fitting entire case data curve optimally to well-mixed model (blue) agree with linearized analysis by next-generation matrix method (yellow) [3] , but not to empirical fits to the data assuming exponential growth (red). fectively merging into a single homogeneous population. Scaling β ij N j ensures that interactions between subpopulations are proportionate to their sizes. A single SIR model fits this model for b 0.2 ( Fig 2) . Moreover, theβ gathered by Eq. 3 is very close to the β achieved by least squares model fitting to the full simulation, although unsurprisingly least squares fitting performs better at fitting the actual curve for lower values of b. The agreement betweenβ and the fitted β is encouraging: for the consideration of real data, one can only deduce model parameters via some sort of fitting algorithm without knowledge of subcompartmental dynamics, but the result suggests that least squares fitting optimally estimates the susceptibility parameter to maximize the simple SIR part of the model relative to the intrapopulation dynamics. A Model with Network Structure.-We now model subpopulations whose interactions are prescribed by a network with adjacency matrix A ij . A ij = 1 indicates two subpopulations that maintain frequent contact with one another, such as a pair of communities which go to the same grocery store or school. For the purpose of analysis, we modeled random connections between subpopulations as random Erdős-Réyni networks parameterized by mean degree. The SIR model approximates the graph dynamics model above b = 0.2, with the fit improving as the mean degree of each node is increased (Fig. 3) . However,β does not asymptote to β = 0.2 whether we use Eqn. 3 or least squares fitting. Instead, increasing mixing allows the model to gain awareness of how sparse the connections between subpopulations are, ultimately causingβ to decrease. When all subpopulations have the same size: Both numerator and denominator include a term which relies on graph structure and gains weight with the mixing parameter b and the covariance between the fraction of susceptibles and infecteds across subpopulations. Just as for a complete graph,β agrees with β from least squares fits. Narrowing of error bars as mean degree of the random network increases express the decreasing importance of network structure as the mean degree of the network increases. Connections to other macroscopic spread estimators.-We have shown that one can generate a single populationlevel transmission rate for a heterogeneous population. From β we may derive the basic reproduction number R 0 = β/γ, the expected number of secondary cases produced in a completely susceptible population, by a typical infectious individual [26] . We compare the results from Eq.3 with basic exponential growth rate fitting on the first 20 days of the simulation and the next-generation matrix method from [3] (Fig. 4) . In [3] , R 0 is computed for a general compartment model from the Jacobian matrix of the system. This matrix is evaluated at a disease-free equilibrium to determine the average number of individuals that a typical infectious person infects when the population is asymptotically disease free, i.e. S i = N i for each i. Applying the method in [3] to Eq. 4 and making the assumption that N i = N j for all i and j as in the simulations, we obtain: where ρ denotes the spectral radius of the matrix A ij . In the complete network case (as in Fig. 2 ), ρ(A ij ) = K − 1, which yields R 0 = β/γ, matching the asymptote in Fig. 4 . As the mean degree of our random network increases, R 0 values computed from the next generation matrix method in [3] and the R 0 computed fromβ in Fig. 4 converge. The exponential growth rate consistently underestimates the contact rate, while, for small values of b, the next generation matrix method R 0 exceeds the optimal estimate fromβ. The R 0 computed in [3] comes from the linearized dynamics-it provides a threshold for the stability of disease-free equilibria (see [27] ). By contrast the R 0 computed in this paper is computed by approximating the spread of the disease by a homogeneous model. Surprisingly, the two methods produce confluent results even under modest levels of mixing between subpopulations. Discussion.-Even loosely-mixed heterogeneous popu-lations can exhibit well-mixed dynamics that can be described by an SIR model. Importantly, the optimal susceptibility parameter is 1. obtainable from least squares fitting, and 2. interpretible in terms of intrapopulation dynamics. Our analysis of data from states and territories provide empirical evidence that there is enough mixing between the disease hotspots within each state or territory that approximation by a well-mixed model is appropriate. The concordance between β obtained by curve fitting and Eqn. 3 affirms that least squares fitting extracts information on the subpopulation interaction structure and dynamical asymmetry between subpopulations. Although our analysis of state data is restricted to the duration of stay at home orders, our framework allows for timevarying contact structures: for example a step function β can fit state data beyond May 20th [25] . Although we have examined what a simple susceptibility parameter could mean in the context of model fitting, two large questions remain. First Our state models are built up of well-mixed subpopulations; we make no assertion about whether these subpopulations represent households, neighborhoods, cities, or counties. Indeed, an important corollary of our analysis is that with even modest levels of connectedness between the subpopulations they can function as well-mixed, making the details of the substructure undetectable in the data. Second, state case data can be aligned by scaling population sizes, but the population sizes used for rescaling are unconnected to the state's real population. Since even loosely connected networks of subpopulations have closeto-well mixed dynamics, our results suggest that state populations must be divided into disconnected networks, and the effective N tot for a state measures the size of its disease-carrying networks. Whether these networks are truly disconnected, or the disease has a second phase of spread between networks is key to predicting its long term spread. The University of Kansas Science Bulletin The covid tracking project Infectious Disease Modelling See Supplementary Material at URL