key: cord-0193040-s2kqg25x authors: Fodor, Z.; Katz, S. D.; Kovacs, T. G. title: Why integral equations should be used instead of differential equations to describe the dynamics of epidemics date: 2020-04-16 journal: nan DOI: nan sha: d20e69fd76536e0367d3e3c797adda841e87db3a doc_id: 193040 cord_uid: s2kqg25x It is of vital importance to understand and track the dynamics of rapidly unfolding epidemics. The health and economic consequences of the current COVID-19 pandemic provide a poignant case. Here we point out that since they are based on differential equations, the most widely used models of epidemic spread are plagued by an approximation that is not justified in the case of the current COVID-19 pandemic. Taking the example of data from New York City, we show that currently used models significantly underestimate the initial basic reproduction number ($R_0$). The correct description, based on integral equations, can be implemented in most of the reported models and it much more accurately accounts for the dynamics of the epidemic after sharp changes in $R_0$ due to restrictive public congregation measures. It also provides a novel way to determine the incubation period, and most importantly, as we demonstrate for several countries, this method allows an accurate monitoring of $R_0$ and thus a fine-tuning of any restrictive measures. Integral equation based models do not only provide the conceptually correct description, they also have more predictive power than differential equation based models, therefore we do not see any reason for using the latter. A commonly used approach to describe the dynamics of epidemics is based on SEIR-type (Susceptible-Exposed-Infectious-Recovered) differential equations [1] [2] [3] [4] [5] . Recently these methods have been applied to the COVID-19 pandemic to determine the basic reproduction number [6] [7] [8] [9] [10] [11] , the incubation period [12] [13] [14] and to describe the dynamics of the pandemic [15] [16] [17] [18] [19] [20] [21] [22] . In this framework there is no natural and transparent way to account for the delay between the introduction of public health measures and the corresponding change in the newly reported cases. This delay occurs due to the incubation period of the disease. Even after refinements to try to account for some aspects of this delay [15] [16] [17] 23] , these models are still plagued by uncontrolled approximations. We suggest that models based on differential equations should be replaced with ones based on an integral equation, which explains the time delay without any additional step or input. The approach we present is not completely new, it was already implicit in the original Kermack-McKendric theory proposed in 1927 [24] . Even though several variants of that model with various degrees of precision circulate in the current literature [25] [26] [27] [28] , their superiority over the differential equation based method has not been widely recognised. We emphasise that in the form presented below, the integral equation based approach is neither technically nor conceptually more complicated than the one based on differential equations. More importantly, it gives a much more realistic representation of the epidemic dynamics. We demonstrate this by comparing how the two types of approach describe the New York City data [29] , taken from the currently ongoing COVID-19 pandemic. Firstly, we show that even in the initial simple exponential phase of the epidemic, the correct, integral equation based model can give a significantly larger estimate for R 0 , the basic reproduction number. Secondly, we show that the oscillations in the graph of the number of newly infected people after restrictive public interaction measures are introduced, is well described by the integral equation model. This is a generic feature of COVID-19 data, seen in several countries, and there is no simple way the differential equation based models could explain it. Thirdly, using data of several countries, we illustrate how the change of R 0 in time can be monitored using the integral equation formalism. Our comparison here is based on a simple SEIR-type differential equation model and its integral equation counterpart containing the same variables, but a different time evolution. It can be seen from our general discussion that the differences we find are generic features of the two types of models and carry over to more sophisticated versions thereof. We urge everyone to test how different their results are between the two approaches in the case of the particular models they use. The mathematical modelling of how infectious diseases spread is almost exclusively based on compartmental models. In this framework the population is divided into different categories and a dynamical model is set up to describe how the number of individuals in each category evolves with time. A simple model of this type is the so called SEIR model, in which the compartments are susceptible (not yet infected), exposed and infected (but not yet capable of infecting others), infectious, and recovered/removed (not capable of infecting others any more). In its simplest form the model is characterised by three parameters, α, β, γ that determine the transition rates among different compartments through the following set of differential equations: where S(t), E(t), I(t) and R(t) are the number of susceptible, exposed, infectious and recovered individuals, all functions of the time, and N = S + E + I + R is the total population. In the initial phase of a rapidly developing epidemic, the situation we are concerned with here, S/N ≈ 1, which we will assume. With this approximation the two functions describing the dynamics are E(t) and I(t). At this point it is instructive to introduce t e and t i , the average days an individual spends in the people who were exposed between today and t e = 4 days ago. The red area represents those that are infectious, i.e. people who were exposed more than t e days ago, but not more than t e + t i = 7 days ago. In the SEIR model the number of people who cease to be infectious from day 11 to day 12 is equal to the daily average of the red area in the top panel. In reality, however, only people in the leftmost red histogram, here a much smaller number, cease to be infectious. Similarly, in the SEIR model the number of people who become infectious is the daily average of the blue area, when in reality it is only people in the leftmost bar of the blue area who become infectious from day 11 to day 12. Clearly the only case when the SEIR model correctly represents the situation is when the number of newly infected people per day is constant in time. categories E and I respectively. It is convenient to write the equations in terms of these parameters and the basic reproduction number, R 0 , using the simple relations α = 1/t e , β = R 0 /t i and γ = 1/t i . With these new parameters the simplified form of the differential equations valid for the initial stage when S/N ≈ 1 is d dt The SEIR equations are based on the assumption that the transition rates between compartments are proportional to the number of people in those compartments. Furthermore, all people in a compartment are treated equally, irrespective of how much time they already spent there. These assumptions are only correct when R 0 is close to 1 and changes slowly. In the following we derive the integral equation description of epidemics. Let us denote the number of newly infected individuals on day t by ρ(t). The function ρ(t) contains the entire history of infections. This can be depicted in a histogram with time flowing from left to right along the horizontal axis ( Fig. 1 ). Based on this information we would like to determine ρ(t + 1), the number of individuals becoming infected on the following day. As in the SEIR model above let us here also assume that anyone exposed to the infection will become infected, but will not be infectious for t e days. Such a person will become infectious t e days after exposure and will remain in this category for an additional t i days, after which he is isolated and ceases to be capable of infecting others. In the simplest version of the model t e and t i can represent averages, but later on we will indicate how to generalise the model by using continuous distributions. In Fig. 1 the blue area represents those in category E, with t e = 4, and the red area those in I, with t i = 3. Now, exactly as in the SEIR model above, the number of newly infected on the following day, day 12 in the figure, will be where the sum is the total number of infectious individuals at time t, i.e. the red area in the top panel of Fig. 1 . At this point we can depict the situation at time t + 1 simply by shifting both the blue and the red region, showing the people in category E and I, one day forward to the right (bottom panel of Fig. 1 ). From this point on the same procedure can be further iterated day by day: on each day we calculate the number of newly infected for the following day and shift the windows delineating the exposed and the infectious by one day forward. At this point the objection could be raised that we have been comparing the continuous SEIR model with a discretised model. However, the framework presented in the histograms can be easily generalised from days to arbitrarily fine time steps. In the limit of infinitely fine time steps dt, we can choose the height of the histogram bars ρ(t) such that ρ(t)dt be the number of people becoming infected in the time interval [t, t + dt]. In this case the sum in the right hand side of Eq. (3) becomes an integral and the equation can be rewritten as We emphasise that only this continuous version of the model describes the real situation properly. Determination of R 0 , the example of New York City We illustrate the shortcomings of the SEIR model in two examples. First let us consider the case of constant R 0 > 1 which leads to the exponentially growing solution with exponent λ, i.e. all relevant quantities (E(t), I(t) of SEIR and ρ(t) of the integral equation) are proportional to exp(λt). A simple substitution gives: is the solution of the discretised equation (3) with time step ∆t. Figure 2 shows these three R 0 (SEIR, integral equation, discretised integral equation) obtained using realistic parameters that describe the initial phase of the pandemic in New York City. There is a striking difference between the SEIR and integral equation predictions. While the SEIR result is largely insensitive to the split of the total incubation period into t e and t i , this is not the case for the solution of the integral equation. The true value of R 0 can be 3-4 times larger than the one predicted by the SEIR model. For t i ≈ 2 days which, as we will later see, provides a reasonable description of the daily reported cases, R 0 can be as high as 20. R 0 is, of course region dependent. Less populated areas can have much smaller R 0 values. (4) and its discretised version (3) with an hourly time step, respectively. Their agreement indicate that such a discretisation is sufficient. A daily discretisation would give 34% higher R 0 values. One might think that this difference is only due to the relatively large incubation period (as compared to the characteristic growth time of the pandemic) and the SEIR model gives reliable estimates for small incubation periods. Surprisingly this is not the case. A trivial expansion of R The effect of decreasing R 0 , delay and oscillation The parameter R 0 in both the SEIR-and the integral equations is in general time dependent. The most important goal of the first restrictive measures was to decrease the value of R 0 as fast as possible and bring it below the critical value of 1. In the following we study how the different models react to a sudden drop of R 0 and compare them qualitatively to the daily reported new cases in New York City [29] . To simplify our discussion we will assume that all cases are reported exactly after the incubation period, i.e. t e + t i days after first exposure. In the case of the SEIR model the rate of people leaving the incubation period is dR/dt = γI while in the case of the integral equation it is ρ(t − t e − t i ). These two functions will be referred to as "new reported cases" in the following. If we assume that the majority of cases are reported when symptoms emerge, it follows that any change in the parameters will only show up t e + t i days later in the data. Any reasonable model should be able to naturally account for this delay. Quantities in differential equations react immediately to any change of the parameters, thus no differential equation based model (such as SEIR or its simple extensions) is expected to explain such a delay. The integral equation (4), on the other hand, naturally provides a delay. The value of R 0 was instantaneously decreased to 0.95 in both cases on day 73. The difference between the two models is quite dramatic. The SEIR solution reacts immediately and turns smoothly to a decreasing exponential function. The solution of the integral equation is qualitatively different. There is a delay of t e + t i after which the effect of the lockdown becomes visible, then an oscillation follows. The characteristic scales of this oscillation are indicated in the figure. Right: time evolution of the new reported cases after a gradual lockdown from day 71 to day 74 confronted with the data of New York City. The initial parameters are the same as for the left panel. R 0 is now decreased to 0.95 linearly in time during this four day period in both models (for a detailed analysis how R 0 changed and dropped to a smaller value in New York City see later). The main features of the curves are similar to the left panel but the amplitude of the oscillation is reduced, making it similar to the real data. Note that we did not fit our model to the data. This plot is an illustration that the main features (delay and oscillation) of the data are well captured by the integral equation and neither of these is reproduced by SEIR. The simplest possibility is to assume that R 0 decreases instantaneously after a successful lockdown. The left panel of Figure 3 shows how the number of reported new cases evolves after a lockdown happens on March 13 (day 73 of 2020) which reduces R 0 to 0.95. A more realistic situation is shown in the right panel of Figure 3 . Here the value of R 0 was gradually decreased from March 11 to March 14. In both cases the expected delay, which is clearly visible in the data, is only explained by the integral equation. The reported data of many countries show an interesting oscillation after the effect of a lockdown starts to show up. This feature is also naturally described by the integral equation. One remark is in order. The incubation period of 6.4 days is accidentally very close to the weakly cycle of 7 days. Thus, the oscillation might in principle be just a weekend effect. Looking at the weekend during the strongly exponential growth with enough statistics (March 14, Saturday to March 16, Monday) one observes less cases during Saturday and Sunday and an accordingly higher number of cases on Monday than the average exponential growth would predict. Correcting the data for this effect later would weaken but not eliminate the observed oscillation, it seems to be a real effect. This is also supported by the fact that different countries have minima and maxima of their oscillation on different days (e.g. in Italy the minima are on Mondays and Tuesdays, in the Netherlands they are on Tuesdays and Wednesdays). It is in principle possible to determine the incubation time t e and t i from this oscillatory pattern. Taking the distance of subsequent minima we determined t e + t i . Using data of New York City, Italy, Spain, Germany, and the Netherlands [29, 30] the period of the oscillations seems quite robust and is around t e + t i = 7.4 days on average with a spread of 0.2 days. Once t e and t i are known, one can solve eqn (4) for β(t) or equivalently R 0 (t) by taking ρ(t) and the (numerical) integral from the actual data: In this way one can continuously monitor the effect of restrictive or easing measures. We used this procedure to analyse data from seven countries as well as New York City. In each country the raw data for the daily number of new infections shows large daily fluctuations, most likely due to some anomalies in reporting new cases. In order to remove these fluctuations we performed the following smearing procedure. If the original data is given byρ(t) then we replace this by wρ(t−1)+(1−2w)ρ(t)+wρ(t+1) and repeat this procedure three times. We checked that the function R 0 (t) depends only mildly on the choice of w. The results, for which we used w = 0.25 are shown in Figure 4 . For all regions we can observe a large drop in R 0 . However, while in China R 0 went significantly below 1, effectively stopping the epidemic, in European countries and New York City it is just below or around 1 even today. It seems inevitable that in these regions any significant easing of the restrictions will induce a second wave of the epidemic. An accurate monitoring of R 0 is extremely important in this situation. As demonstrated above, the integral equation formulation is the right approach here. In the previous sections we compared the SEIR model and the integral equation when both t e and t i are fixed. In reality the incubation period is described by a probability function P (τ ) which gives the probability that a person is infectious a time τ after exposure. The integral equation (4) can easily be generalised to include P : In general, the transmission rate β also changes in time since new restrictive measures can be implemented. This can also be incorporated into the equation, however, in that case R 0 might not have a transparent interpretation. This equation has been around in the literature for a long time (see e.g. [25] [26] [27] ) but unfortunately it has not yet been widely adopted. Any evolution computed using this equation is expected to have qualitatively similar features (delay and oscillation) as in the simple approximation presented above. The SEIR model is not a good approximation of this integral equation even if the P (τ ) probability is built into its parameters. For later stages of a pandemic when S/N < 1 the integral equation can be generalised to include S as well. Furthermore, one can divide the population into sub-compartments e.g. by age groups or location. Each sub-compartment with a population of N i can be described by its own ρ i (t) infection history and P i (τ ) probability function and the cross infection rates between compartments j and i by β i←j . The 26)). For completeness the original data are also shown (right vertical axes). Clearly as health authorities are getting better and better data the accuracy on R 0 determination will improve. generalised coupled integral equations in this case read: As an example, we ran some simulations of the full course of the epidemic until S saturated close to zero. We found that depending on the parameters the integral equation and the SEIR equations can predict a significantly different maximal number of cases at the peak of the epidemic. Any further extension which can be included in the SEIR equations (e.g. birth and death rate, day/night differences, inhomogeneities, meta-population systems, etc.) can also be naturally included in the integral equation formalism. We demonstrated that the approximation leading to differential equation based models of epidemic spread is not sufficient in the case of the COVID-19 pandemic. In particular, differential equations provide a good description only if R 0 is close to 1 and if it changes only slowly compared to the incubation period of the disease. Note that even if R 0 is larger than 1, the initial exponential phase of the epidemic can be well described by differential equations, but they can significantly underestimate the value of R 0 , the larger the R 0 , the more so. If, however, as a result of restrictive measures, R 0 suddenly drops, differential equations fail to describe the ensuing oscillatory behaviour of the number of new cases and as a result, they are not suitable for a precise monitoring of R 0 . In contrast, integral equations are naturally suited to this task, even if sharp changes occur in R 0 . We also presented a simple way to implement the integral equation formalism in numerical simulations, which do not become more complicated than the numerical solution of the SEIR equations. In summary, the integral equation approach has more predictive power than the most widely used differential equation based models and also eliminates a serious uncontrolled approximation of the latter. In the light of the present results we urge practitioners of the field to rethink and possibly consider implementing the integral equation based technique. Infectious Diseases of Humans: Dynamics and Control, Dynamics and Control Transmission dynamics and control of severe acute respiratory syndrome The use of mathematical models to inform influenza pandemic preparedness and response Pandemic potential of a strain of influenza a (h1n1): early findings Stage-structured transmission of phocine distemper virus in the dutch 2002 outbreak Nowcasting and forecasting the potential domestic and international spread of the 2019-ncov outbreak originating in wuhan, china: a modelling study Estimation of the transmission risk of the 2019-ncov and its implication for public health interventions An updated estimation of the risk of transmission of the novel coronavirus (2019-ncov), Infectious disease modelling Early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia Phase-adjusted estimation of the number of coronavirus disease 2019 cases in wuhan, china Early dynamics of transmission and control of covid-19: a mathematical modelling study The incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application Incubation period of 2019 novel coronavirus (2019-ncov) infections among travellers from wuhan, china Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data The effect of travel restrictions on the spread of the 2019 novel coronavirus (covid-19) outbreak The effect of human mobility and control measures on the covid-19 epidemic in china The effect of control strategies to reduce social mixing on outcomes of the covid-19 epidemic in wuhan, china: a modelling study Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov2) Modified seir and ai prediction of the epidemics trend of covid-19 in china under public health interventions Risk assessment of novel coronavirus covid-19 outbreaks outside china How will country-based mitigation measures influence the course of the covid-19 epidemic? Projecting the transmission dynamics of sars-cov-2 through the postpandemic period Characterizing the dynamics underlying global spread of epidemics A contribution to the mathematical theory of epidemics, Proceedings of the royal society of london. Series A, Containing papers of a mathematical and physical character Integral equation models for endemic infectious diseases Appropriate models for the management of infectious diseases How generation intervals shape the relationship between growth rates and reproductive numbers Estimating the number of infections and the impact of non-pharmaceutical interventions on covid-19 in 11 european countries For the daily reported cases in Austria we used