key: cord-0836455-rjmkg43m authors: Kolokolnikov, Theodore; Iron, David title: Law of mass action and saturation in SIR model with application to Coronavirus modelling date: 2020-11-16 journal: Infect Dis Model DOI: 10.1016/j.idm.2020.11.002 sha: de1de60ce3c0ace40a2f57f36457b73808775cf5 doc_id: 836455 cord_uid: rjmkg43m When using SIR and related models, it is common to assume that the infection rate is proportional to the product of susceptible and infected individuals. While this assumption works at the onset of the outbreak, the infection force saturates as the outbreak progresses, even in the absence of any interventions. We use a simple agent–based model to illustrate this saturation effect. Its continuum limit leads a modified SIR model with exponential saturation. The derivation is based on first principles incorporating the spread radius and population density. We use the data for coronavirus outbreak for the period from March to June, to show that using SIR model with saturation is sufficient to capture the disease dynamics for many jurstictions, including the overall world-wide disease curve progression. Our model suggests the R(0) value of above 8 at the onset of infection, but with infection quickly “flattening out”, leading to a long-term sustained sub-exponential spread. As the coronavirus pandemic has spread and dramatically altered human lives on a global scale, it also generated an intense interest to model the epidemic. These include ODE models [1] [2] [3] [4] , statistical models [2, 5] , agent-based models [6, 7] , spatial network models [5, 8] and PDE models [9, 10] ; see also [6, 11] for reviews. Some of these papers had an outsized influence on public thinking and policy [3, 7] . At the heart of many of these efforts are the so-called compartmental models, consisting of various classes of individuals and their interactions. The simplest such model is the classic SIR (Susceptiple-Infected-Removed) model. It has the form where S, I and R is the susceptible, infected and removed (or recovered) population, respectively, and N is the total number of infected individuals. The basic assumption underying SIR and related models is that the disease spreads when an susceptible individual encounters an infected individual. The probability of this encounter is typically modeled using the law of mass action from chemical literature, where this probability is propotional the product SI of susceptible and infected population, with β being the infection rate. The infected individuals then recover with a recovery rate γ. Can SIR model by itself capture the coronavirus infection? To answer this question, we took data from the from John Hopkins CSSE database [12] , which lists the total recorded infections as well as the number of recovered and dead for different countries. We took the period from March 15 to June 15, 2020, corresponding to the initial wave of infection (and mostly after the implementation of the quarantine measures). We used the sum of recovered and dead as a proxy for "removed" population R(t), while I(t) is obtained by substracting by substracting "removed" from the total number of infected. We can then estimate β and γ directly from the equations (1) by solving β ≈ − N S SI and γ ≈ R I (where we approximate the derivatives using the finite differences on one-week running averages). This is shown in Figure 1 . We took the data from March 15 to June 15. As can be seen, the recovery rate γ is relatively constant, while the infection rate β has a clear decrease over time. Similar decrease has been noted by others, see e.g. [13, 14] . In this note we propose a simple modification of the basic SIR model (1) to incorporate spatial distribution of population, and which can partially explain the decrease of β over time, even without time-dependent changes in parameters. Our implementation also naturally incorporates the population density into the model. We also test the modified model against the COVID data. We start with the following very simple agent-based model that implicitly incorporates spatial component. Assume that each agent has some "foraging area" characterized by radius r. Agents can only encounter each-other within their respective foraging area. Upon encounter, assume there is a probability p of infection. Assume that a susceptible individual spends on average µ fraction of the time interacting with others (i.e. going shopping, to bars, etc). Finally an, infected individual recovers with a daily rate γ. The matlab implementation of this is shown in Figure 2 . We now derive the mean-field limit of this agent-based model. Assuming the agents are well Figure 2 : Top: self-contained Matlab code for simulating ABM and its mean-field limit (2) . Cut-and-paste to run. Bottom: the output of the simulation. mixed, the chances of encountering an infected individual and then getting infected is given by a = p r 2 π A , where A is the total area of the domain. When I infected agents are present, the chances of getting infected is then given by µ(1 − (1 − a) I ). We also assume that the foraging area is sufficiently small (compared to the total area A) so that 0 < a 1, in which case we may approximate µ(1 − (1 − a) I ) ∼ µ (1 − exp (−aI)) . As the result, the mean-field limit of this agent-based simulation takes the form (2) Figure 2 shows the direct comparison of the agent-based model described above, and the meanfield limit (2); excellent agreement can be seen. The parameter a can be rewritten in terms of the population density as follows. Write A = N/ρ, where N is the total population and ρ is the population density, so that This shows that the parameter a is directly proportional to the population density. Non-pharmaceutical interventions are used to reduce µ, α and increase γ. In the context of coronavirus, the rate of interaction µ is decreased by stay-at-home orders. Social distancing and mask-wearing will have the effect of reducing the exponent α. Finally, isolating infected individuals has the effect of increasing γ. The model (2) reduces to the classical SIR model (1) for small infection rates I via Taylor expansion 1 − exp − α N I ∼ α N I, so that β = µα. This shows that the reproduction number R 0 = β/γ = ρµpr 2 π/γ is proportional to the population density ρ. This is consistent with the progress of the epidemic: large cities with high population densities (e.g. New York, Toronto, Montreal...) have been hit much harder than smaller localities with lower population density. As the number of infected I increases, the saturation terms kicks in and eventually stabilize the force of infection, leading to linear rather than exponential growth. We used nonlinear least-squares method to fit the model (2) to the data for several juristictions. This consists of minimizing the L 2 distance to fit both the infected as well as recovered individuals simulataneously: min Here, I j , R j is the real-world data from John Hopkins database [12] , and I(t), R(t) is the solution to (2) for given parameters α, β, γ. In this way, we solve all three parameters, without having to assume a-priori values for any of them. We used python's standard scipy.optimize package to perform the minimization. We also experimented with different distance metrics, such as log distance, which gave very similar results. Figure 3 shows the fit for several juristictions with a good agreement. The following table gives the corresponding parameter values for these jurisdictions: It is interesting to note that the values of parameters are relatively consistent (same order of magnitude) among the different data sets. Also, the parameter γ is consistent with other estimates in the literature [2, 15] . For the US dataset, we used US census to compute the effective population density for urban areas where the majority of population resides [16] . We obtained an estimate of ρ ≈1000 people per km 2 . This gives us a value of a = pr 2 π ≈ 5.8 km 2 . It is hard to disentangle the probability p of the infection from from the infection area πr 2 per individual. One possibility is to use the average daily driving range estimate, which was reported in [17] to be 75 km involving on average 2.3trips. This gives r ≈ 32 km and p ≈ 0.0017. However this does not account for other modes of transportation (walking, public transport, etc). For Canada, we used Canadian Census [18] which lists 942 Canadian cities and towns having a total area of 23998 kmˆ2 and a total population of about 27 million (out of a total population of 37 million). This gives an effective density estimate for Canadian population of ρ ≈ 1174 people/kmˆ2, yielding a value of a = pr 2 π ≈ 11.2 km 2 , about twice that of the US. Assuming similar driving staistics as the US, would yield the same radius r and p of about twice as as much as the US. For the model (2), the reproduction number R 0 is given by R 0 = µα/γ. For world, US, Canada and Russia, R 0 is 8.1, 49.13, 13.26 and 8.8, respectively. It is in the higher-end of the current estimates for Covid-19. However, the time-dependenent (dynamical) reproduction number, R t , Figure 4 : Dynamical reproduction number R t for for (2) using the best-fit parameters. It is relatively high at the start of the outbreak but quickly decreases. decreases rapidly due to saturation. It is given by and is plotted in Figure 4 . Currently US appears to have the highest growth rate with Canada the slowest of the four. Note that Russia had a long period of exponential growth, which is reflected in a relatively long period of near-constant R t before R t comes down. We have presented a modified SIR model with saturation. Rather than using a phenomenological saturation [6, 19] , we used agent-based modelling and then derived the mean-field limit continuum limit using basic probabibility. We were able to fit the model parameters to obtain a good agreement with the disease progression during the period of March 15 to June 15, when quarantine and shutdown policies were mostly in-place. Without saturation, the fit is very bad. With saturation, it is relatively good. Right: synthetic data from ABM model. The curve with saturation corresponds to the exact parameters used in ABM simulation without any fitting. The curve without saturation is the "best-fit" parameters and does not accurately capture the peak of infection. Fitting the original SIR model (1) without any saturation and without dynamically changing the infection rate β gives rather terrible results as illustrated in figure 5 . An alternative is to assume that parameters are time-dependent [2, 4, 9, 20] . Other models that can reproduce sub-exponential growth include the the generalized logistic growth model [21] . One may ask whether a model with more that just three compartments (such as SEIR, E=exposed, or SAEIR, A=asymptomatic etc) but without saturation can produce a better fit than SIR with saturation? Looking at the data of Figure 3 , the infected cases exhibit what looks like a subexponential growth. At least in this stage of the epidemic (as of June 2020), only a small fraction of the total population has been infected (worldwide, less than 1%). Without saturation, regardless of the number of compartments used, the model becomes effectively linear when only a small fraction is infected (e.g., β N SI ∼ βI in (1)). Linear ODE models generically exhibit exponential growth, so it is impossible to obtain sub-exponential growth without either having saturation, or time-dependent parameters. We completed the first draft of this paper by the end of June. By July, the US re-entered a period of fast coronavirus growth, mostly due to re-opening of economy. In effect, the parameters "on the ground" have shifted. By the time the paper was accepted in October 2020, a third and a more severe wave is well under way. Figure 6 shows the estimated β and γ for the US from April to end of October, 2020. A second peak in β is visible in mid-July. Then from beginning of September, β starts to grow again with no end in sight as of this writing. This third wave coincided with reopening of schools, and is the most intense of the three waves so far. One possible explanation is that asymptomatic transmission is increased with kids acting as disease vectors. There is also a possibility of mutation to a more contageous strain. A more complex model with time-dependent infection rate is required to better model the progression of the disease on a longer time-scale. This is left for future work. The scaling of contact rates with population density for the infectious disease models The challenges of modeling and forecasting the spread of covid-19 Projecting the transmission dynamics of sars-cov-2 through the postpandemic period A path out of covid-19 quarantine: an analysis of policy scenarios Measles metapopulation dynamics: a gravity model for epidemiological coupling and dynamics How should pathogen transmission be modelled? Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: Theory and simulations Spatial modeling of covid-19: Greece and andalusia as case examples Localized outbreaks in an sir model with diffusion Modeling the spread of infectious diseases: A review, Analyzing and modeling spatial and temporal dynamics of infectious diseases An interactive web-based dashboard to track COVID-19 in real time Estimation of the transmission risk of the 2019-ncov and its implication for public health interventions Fast spread of covid-19 in europe and the us suggests the necessity of early, strong and comprehensive interventions, medRxiv Science forum: Sars-cov-2 (covid-19) by the numbers Urban Areas in the United States and Puerto Rico American driving survey: Methodology and year one results A generalization of the kermack-mckendrick deterministic epidemic model Inferring change points in the spread of covid-19 reveals the effectiveness of interventions Realtime forecasts of the covid-19 epidemic in china from