key: cord-0905648-ewnbn36x authors: Schmitt, François G. title: An algorithm for the direct estimation of the parameters of the SIR epidemic model from the I(t) dynamics date: 2021-12-23 journal: Eur Phys J Plus DOI: 10.1140/epjp/s13360-021-02237-7 sha: 6b4888cf5b0b9a5c242d81b732031cc7e3f39b28 doc_id: 905648 cord_uid: ewnbn36x The discrete SIR (Susceptible–Infected–Recovered) model is used in many studies to model the evolution of epidemics. Here, we consider one of its dynamics—the exponential decrease in infected cases I(t). By considering only the I(t) dynamics, we extract three parameters: the exponent of the initial exponential increase [Formula: see text] ; the maximum value [Formula: see text] ; and the exponent of the final decrease [Formula: see text] . From these three parameters, we show mathematically how to extract all relevant parameters of the SIR model. We test this procedure on numerical data and then apply the methodology to real data received from the COVID-19 situation in France. We conclude that, based on the hospitalized data and the ICU (Intensive Care Unit) cases, two exponentials are found, for the initial increase and the decrease in I(t). The parameters found are larger than reported in the literature, and they are associated with a susceptible population which is limited to a sub-sample of the total population. This may be due to the fact that the SIR model cannot be applied to the covid-19 case, due to its strong hypotheses such as mixing of all the population, or also to the fact that the parameters have changed over time, due to the political initiatives such as social distanciation and lockdown. Among the different models that have been proposed for epidemic evolution, the SIR model is one of the most studied. In its continuous version, it is a system of coupled differential equations indicating the evolution of three quantities: the number of "susceptibles" S(t) of a population corresponding to individuals that can be infected but are not yet infected by a disease; the number of infected people I (t) and the number of recovered people R(t) [1] . In its original and simple version, this model assumes an homogeneous mixing in the population, so that contact is possible between any member of the S population and the I population. The R population has immunity and cannot be infected again. a e-mail: francois.schmitt@log.cnrs.fr (corresponding author) There is a total of N elements, and the sum of each subpopulation is stable: S + I + R = N . The system of evolution equations can be written as: where S = d S/dt, β > 0 and α > 0 are parameters and N 1 is the total size of the population. The parameter β characterizes the intensity of the infection, and α the intensity of recovery. Since N , S, I and R have all the same dimension (number of individuals), for the equilibrium of dimensions the parameters β and α have both a dimension of 1/time. T I = 1/α is the mean time spent in the I class [2] . In Eq. (1), we can focus only on the first two lines, since the dynamics of R = N − S − I is deduced from the values of S and of I . This model has been studied in many textbooks and review articles [2] [3] [4] [5] [6] [7] . The initial exponential increase in I (t) has been well documented, together with S monotonous decreasing, and R monotonous increasing. However, to our knowledge from studying the literature, the exponential decrease in I (t) has not been closely studied. Here, in the next section, we estimate the coefficient of the exponential decrease, showing that its value is not the same as the initial exponential increase. Using only the I (t) dynamics, and more precisely the two values of the exponential slopes (for increase and decrease) and of the maximum value of I , we further show that all parameters of the SIR model can be estimated numerically. In a further section, we numerically check this procedure and show that all parameters can indeed be estimated with a rather good precision. Finally, we apply this procedure to the COVID-19 pandemic in France, assuming that it corresponds to a SIR model with constant parameters. First, we will recall how the initial exponential is obtained. This is classical work and is here quickly reviewed. There is a initial small number of infected people: We can also assume that R = 0 initially, and that S = N − I − R is very close to N , so that initially S/N ≈ 1. Then, Eq. (1) has the following form for small time periods: giving an exponential initial increase for the infected population: where the exponent is: The growth rate needs to have β > α to have an exponential increase. The following notation is classical, introducing the basic reproduction number R 0 : so that the condition β > α may be written R 0 > 1. When this number is larger than 1, there is an exponential increase in the infected population. Let us divide the second line of Eq. (1) by its first line. We obtain: Hence, the evolution of I (S) (time vanished) is the following: where C is a constant obtained considering for t = 0, I (t) = I 0 and S(t) ≈ N . Hence, This then gives, negating the term I 0 /N which is very small compared to other terms: I first increases, then decreases. The maximum value I max is found for S = S max = N R 0 , and we have for I max : The maximum value of the rescaled I max depends only on R 0 . The curve which is obtained is shown in Fig. 1 . We see that it increases nonlinearly with R 0 . We know that at infinite time, we have limit values denoted S e , I e , R e . By dividing the first line of Eq. (1) by the third line, solving the differential equation obtained, we find finally the evolution of the proportion of S versus R: showing that there is an exponential decrease in susceptible population with respect to the recovered population. We also have: since R < N . This shows that S is decreasing and is minored by a positive value, hence S e > 0. An example of dynamics of S(t), R(t) and S(t), chosen for the parameters I 0 = 2, N = 100, 000, β = 1.5 and R 0 = 2 is shown in Fig. 2 . This illustrates the fact that I (t) → 0 and S(t) → S e , as t → ∞. Below we show how S e can be estimated. First, let us do as in [2] , and consider Eq. (1) integrated: (13) giving: This shows that I is integrable, hence for its limit value: I e = 0. This corresponds to the end of the epidemic. The corresponding value of S e can be found by taking I = 0 in Eq. (9) and is given by: This is an implicit relation providing S e /N for a given value of the parameter R 0 . We can consider 1 < R 0 ≤ 10 and solve numerically the implicit relation: find the solution of f (x) = 1 R 0 log x + 1 − x = 0 giving the value of x = S e /N . The result is shown in Fig. 3 . Three curves are shown, for R 0 = 1.5, 2 and 2.5. The value of S e /N is the intersection with the horizontal axis. Doing this numerically for a number of values of R 0 , we obtain the curves given in Fig. 4 providing S e /N vs. R 0 . On the left panel, we show a log-plot, together with the fit S e /N = exp −3.9( √ R 0 − 1) . This fit is excellent until a value of R 0 4.5. Fig. 4 , we see that it is approximately valid for R 0 ≥ 5. Since R + I + S = N , for the end point for which I e = 0 we have R e /N + S e /N = 1; hence, the relative frequency of recovered people R e /N at the end of the epidemic is R e N = − 1 R 0 log S e N . This part and the following are the main contributions of the present work. There is no known analytical expression for the behavior of I (t) for all values of t. We know that there is an exponential increase for small values of t, then a maximum, then a decrease in I (t). Let us introduce the following constant: Hence, we have as a limit law an exponential decrease in I (t) with characteristic exponent Γ : where the value of the constant C > 0 cannot be directly deduced from the SIR model. It is proved in the appendix that Γ > 0 when R 0 > 1 and β > 0. Using the numerical extraction of S e for each value of R 0 , let us plot the curves for the exponents γ /β and Γ /β. This is shown in Fig. 5 . The first exponent is increasing monotonically, whereas the second one reaches a maximum value of Γ max 0.2984 for R 0 max 2.15, after which it decreases. Since Γ is estimated through a nonlinear and implicit relation between S e /N and R 0 , an explicit analytical expression for the values of Γ max and R 0 max is not available. We also see that we always have γ > Γ , showing that the two exponents are different as far as R 0 > 1. The shape of I (t) is always asymmetric. To our knowledge, this is the first time the slopes γ and Γ have been proved to be different and compared for the SIR model. From the previous relations, we can propose here an algorithm to extract all SIR parameters only from the infected population dynamics. We assume here that we have experimental data providing the curve I (t) of infected people. This is the easier to measure, since the S population is not clearly defined (say, all the population of a country, or a sub-sample perhaps?). The I (t) curve is typically bell-shaped, with an increasing part, a maximum value and a decreasing part. As seen in the previous section, this curve is asymmetric: it decreases more slowly than it increases, but visually using a linear plot, this asymmetry is not always We also see that we have always γ > Γ very clear. Our objective here is, assuming a SIR model, to extract all relevant parameters only from this curve. There are three parameters needed to fully define this model: R 0 , β, and N (α is deduced when R 0 and β are known). Indeed, N is a priori unknown since we do not know what the subsample of the total population is which is interacting homogeneously to obtain the I (t) dynamics. S e /N will also be a relevant value to be estimated, since it is included in the exponential decrease: it is deduced from the value of R 0 . Here, we provide a recipe for estimating all these parameters only from the I (t) curve. (1) The first step is to estimate the coefficient of the exponential increase at the start, I (t) = I 0 exp γ t . The numerical estimation of the slope γ gives: The exponential decrease I (t) = Ce −Γ t provides the value of the parameter Γ : (2) For the second step, R 0 is estimated. We consider the ratio ρ = γ /Γ for which there is no more dependence on the parameter β: The ratio depends only on R 0 and Se/N , but the latter is related to R 0 through a nonlinear relation plotted in Fig. 3 , and hence, the R 0 value can be estimated from ρ using a nonlinear relationship which is shown in Fig. 6 . From the nonlinear relationship represented in this figure, we can deduce the experimental value of R 0 . When R 0 is large, we know from Fig. 4 that S e /N 1, and hence, using Eq. (21) ρ simplifies into: which is also shown in the same figure as an asymptotic relation. In such cases, when R 0 ≥ 6, we simply have R 0 = ρ + 1 (and also Γ = α; β = Γ R 0 ) or otherwise R 0 is estimated by an interpolation of the plot of this nonlinear curve. (3) From this, using Eq. From the value of R 0 , we can also estimate S e /N using the plot shown in Fig. 3 . (4) A last step is the experimental estimation of the maximum value I max of I (t), which gives the value of N through the following relation: In this section, the above algorithm is applied to two cases: first numerical simulations of the model; and then data from the real world. In this subsection, numerical simulations are performed in order to check the algorithm which has been proposed here. The simulations are done numerically by running the SIR model with given values for the parameters β, α, R 0 , and N ; choosing an initial value of I 0 . Subsequently, the algorithm proposed above is applied exclusively on the I (t) curve, in order to consider if the estimated parameters are close to the ones used to perform the simulation. This is a consistency test. Taking I 0 = 2, N = 100, 000, and different values of R 0 and β, let us first consider R 0 = 2.0 and β = 1.5. Figure 7 shows the evolution numerically of I (t). We see an increase, then a decrease in the curve of infected people. On the right panel, the curve is shown in log-linear plot. Straight lines indicate an exponential initial increase then decrease. We see that the shape is not symmetrical. The decrease is slower than the increase. The values of γ and Γ are extracted from these curves by a best fit of the exponential law, for the region where straight lines are visible. This has been done keeping β constant and varying the value of R 0 . The result is shown in Fig. 8 , in log-linear plot to emphasize the exponential curves, for the value β = 1.5, but other values have of course been tested. All curves have similar behavior, with different time values for the maximum, and also different decrease and increase slopes. Let us now consider the numerical values found for simulations with a fixed value for β (β = 1.5, 2 and 3) and values of R 0 from 2 to 5. Using the procedure described above, we estimate the value of I max , the slopes γ and Γ , their ratio ρ, and obtain the value of R 0 , β, and N . All the values for the examples shown are visible in Table 1 . We see that using this procedure the values of the parameters R 0 and β are estimated with an error below 1%, and N with an error of a maximum of 5%, and most of the time below 2% . It is also visible that estimated values of β and N are very slightly decreasing with increasing values of R 0 . This numerical test shows that the procedure proposed here is able to extract the correct values of parameters, with a very good precision, probably not accessible from experimental data. In the next section, this procedure is applied to real-world data. In order to test the proposed procedure from real-world data, and in the framework of the COVID-19 crisis, we have chosen here data from the COVID-19 pandemic in spring 2020 in France. The sources come from the French government (data publicly available from dashboard.covid19.data.gouv.fr). A confinement of the population was effective on March 17, 2020 and lasted for almost 2 months, with a progressive opening of the country beginning on May 11, 2020. The data of March 17 are taken as day 0 in the data reported, and two series are considered consecutively as I curves: first the total hospitalized people due to COVID-19 and then, the total Intensive Care Unit (ICU) people due to COVID-19; both recorded daily. In each case, 100 days are considered. Table 1 Chosen values for a numerical test of the procedure described in the text. The values of β, R 0 , and N are given choices, and the values with * are numerical estimates. We see that the parameters β and R 0 are retrieved with a high precision of a maximum of 1 % error, and for N , with an error of a maximum of 5%, and most of the time below 2% Estimation of parameters Figure 9 shows the plot of I (t), having the correct shape. We find a maximum value for t = 28 (April 14), with I H −max = 32, 292. The exponential fits are the following: exp(0.162T + 8.1) for the initial increase, and exp(−0.021T + 11.15) for the decrease, where T is the time in days. This gives γ H = 0.162 and Γ H = 0.021. Hence, the ratio is We then also consider another classification, when the curve I is I icu -the number of people in intensive care units (ICU) in France since the beginning of the pandemic. This is shown in Fig. 10 , where the double exponential behavior is also found. The two slopes are here γ icu = 0.16 and Γ icu = 0.0346; and also the value I icu−max = 7148. We thus obtain γ icu /Γ icu = 4.62, which is quite large again. This gives R 0−icu = 5.52. We conclude with the following values: β icu = γ icu × 1.22 = 0.195, α icu = 0.0354 giving a mean residence time of ICU state of 28 days; and also N icu = 1.96 × I icu−max = 14000. For a given epidemic situation, the basic reproduction number R 0 can be defined independently of the model considered: it is the average number of secondary cases in the infected population [8, 9] . The significance and estimation of the basic reproduction number is a matter of great interest [8] [9] [10] [11] [12] [13] . This number is difficult to measure directly and is a complex metric [13] . We have shown here how to obtain it directly in the framework of the SIR model, when the infection curve dynamics is known. R 0 is directly and precisely estimated from the ratio ρ = γ /Γ of initial and final exponential slopes of I (t). The estimation has been shown to be quite precise. Other methods have been proposed to extract R 0 : first, α can be estimated as the inverse of the mean residence time in the infected situation for a given individual. However, this method requires the information about the situations of a large number of individuals, which is not always the case or possible. Next, parameters are extracted by performing a "best fit," guessing first some values, and then performing an optimization through a nonlinear fit [2] . Other more complex methods define a next-generation operator and extract R 0 from the eigenvalues of this operator [11, 14] . Since the outbreak of the COVID-19 pandemic in China late 2019, a huge number of works have been submitted and published: in March 2020, there were already 500 papers in the press [15] . Among them, many have been devoted to applying the SIR model to different cities or countries: to several countries [16] ; to Wuhan administrative region [17] ; to Italy [18] ; to Brazil [19] ; among others. Other variants have also been proposed, such as the SEIR model for the USA [20] , or the stochastic SIR model [21] . Very early in the COVID-19 outbreak, some papers have tried to estimate R 0 . For example, there is a highly cited work that was accepted at the end of January 2020, and since published, that estimated the value of R 0 from Wuhan data, using Laplace transform of a function, which is modeled in an ad hoc way, assuming a Gamma probability density function [22] . This work proposes values of R 0 from 2.24 to 3.58, but the important role of the assumptions should not be forgotten. There are other works that have proposed some basic fits of the exponential initial growth of the epidemic and found values of, e.g., 3.1 for Italy [23] or 4.5 for France [24] . Concerning France, some other works have already addressed the French COVID-19 pandemic outbreak and the estimation of R 0 : see, e.g., [25, 26] , where values from 1.7 to 4.2 have been reported. Some papers have proposed meta-analysis of published works. For example, [27] considered as early as February 7, 2020, 12 studies and found values of R 0 ranging from 1.4 to 6.49, obtained using methods varying from the stochastic Markov chain model and SEIR to compartmental models and stochastic simulations. The same type of work was done in March, 2020, concerning 23 studies that found R 0 values ranging from 1.9 to 6.49 [28] . We see that ordinarily when a SIR or a similar model is used, the nonlinear fits-other complicated methods-are employed on the growth part but do not use the information provided by the decreasing part of the infection curve. Our argument here is that both of these should be considered, as they are part of the same dynamics. But of course, in our method, we need to assume (1) that the SIR model is the correct one to model the pandemic and (2) that parameters have not changed in time. Indeed the initiative of political authorities has been, all around the world, trying to decrease R 0 by social distancing and sometimes lockdown of the population during weeks or months. These political decisions may change the parameters of the model and correspond to limits in the application of our method. Using our algorithm, we have found two different R 0 values. First, by considering the hospitalized population (due to in France, we found R 0−H = 8.7, β H = 0.183, α H = β H /R 0−H = 0.021, and N H = 50709. This is found assuming that this SIR model is valid to describe the I H dynamics, and that the parameters have not changed during the lockdown. With this assumption, we find a SIR model with quite a large R 0 value and a susceptible population (S value), initially around 50,000 people, far from the about 67 million approximate population of France. Let us recall that this model assumes a homogeneous mixing of the infected and susceptible populations. This means that this model could work here, if we consider only a sub-population who are concerned with being in contact with the infected population. The second set of values was found by applying the model to the section of the infected population who were put in intensive care units (ICU). Let us recall that the lockdown was decreed in France (as in other countries such as China, Italy, and Spain, at that time) by the French government in order to maintain the number of ICU at a given limit (5000 at the time), and avoid loosing control of the situation in hospitals. In this framework, to consider the SIR model with I being the ICU part of infected people, has some justification. Then, the S part of the population is here the hospitalized population: they are susceptible to going from the hospitalized state to the ICU state. Using the algorithm, we found R 0−icu = 5.52, β icu = 0.195, α icu = 0.0354, and N icu = 14, 000. This means that, if this model is correct and if the parameters have not changed during the 100 days of the study, there were 14,000 susceptible people that could have gone to the ICU state. We also remark that the two estimates of β are quite close, whereas for α there is a difference, and the mean duration for an individual in the H -state is T H = 1/α H = 47 days, whereas T icu = 1/α icu = 28 days. These numbers seem realistic and translate the fact that the decreasing exponentials are rather slow. There is a long memory in both curves. Are such large values of R 0 realistic? First, let us note that in their review, Delamater et al. [13] mention relatively large values of R 0 , going from 12.5 for measles in the USA (1912-1928); 13.7 to 18 for measles again in England and Wales in ; and values of 12 to 17 for pertussis in England Wales and the USA in the same periods. The values found here for COVID-19 in France are much lower. Furthermore, it has been already mentioned that there is a higher contact rate for high density population [29] . This may also be supported by recalling that in Eq. (1) the coefficient β corresponds to a contact rate between the S and I populations. In the field of turbulent particle dynamics (which could be an analogy with human movements in an active society), when there is a large concentration of particles, it has been shown that the contact rate is multiplied by a new factor corresponding to this accumulation, and that this factor can be as large as 10 [30, 31] . This means that when there is an accumulation of particles, the contact rate increases with a factor that can be explained and modeled. It is known that human population is not uniform in space; there are large inhomogeneities in human population density, in a way which has been characterized as multifractal, i.e., highly concentrated: see, e.g., [32] for an analysis of the French population density. Such accumulation may explain the large values for the contact rate, and hence, the large value of R 0 found here. However, associated with this large R 0 value is the fact that the N population is rather limited, meaning that most of the population is not concerned with the risk of infection: there is a huge concentration of the transmission from a limited number of infected people to an also limited number of susceptible people. Finally, we need to emphasize again that this application to real-world data is valid only on the hypothesis that the SIR model is correct and has constant parameters. The algorithm proposed here in the framework of the SIR epidemic model needs to know the maximum value of the infected population I max , the exponential slope for the initial increase γ and the exponential slope of the final decrease in the I dynamics Γ . From these three empirical estimations, we first compute the ratio ρ = γ /Γ and from it, using a nonlinear curve, the basic reproduction number R 0 is directly estimated. From R 0 and γ , the value of β is extracted, as well as the α parameter. Then, the total number N corresponding to the initial susceptible population S 0 is estimated from I max and R 0 . This algorithm has been tested using different values of the parameter R 0 . It has also been applied to a situation of real-world data, for the COVID-19 situation in France. The value of R 0 , which was found to be 8.7 for the hospitalized population and 5.52 for the ICU population, is larger than other estimates where R 0 is between 2 and 3. However, this may be linked with the applicability of the SIR model, since our data analysis is valid only for this model. What has been shown, is that, if the SIR model can be applied to the COVID-19 infectious dynamics in France, with fixed values of its parameters, then the R 0 value is relatively large. It may be that the lockdown had changed the behavior of the population, and that the parameters had changed over time. Other more complex models than the SIR may also be applied to these realworld data, e.g., taking into account the stratification of the population, SIR model with time changing parameters, or the heterogeneity of the population. The quality of the experimental data is also an issue. Indeed, real data of infected and recovered people are hugely biased, since, e.g., many people have been infected and then recovered without leaving any record. There is a large uncertainty in the application of statistical physics to societal phenomena, including epidemics measurements. Let us finally remark that in the SIR model the decline of the epidemics depends on the depletion of the susceptible population. At the scale of a country, this decline is possible either by limiting the contacts between individuals, so that the total concerned individuals N is much smaller than the total population of the country. Another approach, done in 2021 all over the world, is to diminish the susceptible population through the vaccination. The perspectives of this work may be to extend the approach to other famous epidemic models, such as the SEIR (where another state is introduced-the "exposed" population, which is infected but not infectious). This introduces a delay in the dynamics, and the calculation performed here would need to be revised. Proof Since Γ = β 1 R 0 − S e N , the condition Γ > 0 is equivalent to state that 1/R 0 > S e /N . The value of S e /N is the solution 0 < x < 1 of: We need to show that this solution verifies R 0 x < 1. It is a standard relation to consider that one has for y > −1 [33] : By writing x = 1 + y, we have for x > 0: Using Eq. (24), we have: Since x − 1 < 0, this gives R 0 ≤ 1 x . We still need to exclude the equality R 0 x = 1. For this, let us note that if R 0 x = 1, inserting x = 1/R 0 in Eq. (24) gives an equation for R 0 : − log R 0 + R 0 − 1 = 0. Its only solution is R 0 = 1. Since we assume here that R 0 > 1, we may write that R 0 x = 1. Hence, it is proved that R 0 x < 1, implying Γ > 0. Epidemic Modelling: An Introduction Mathematical Biology: I An Introduction 59th IEEE Conference on Decision and Control (CDC What will be the economic impact of COVID-19 in the us? Rough estimates of disease scenarios Acknowledgements Peter Magee of www.englisheditor.webs.com is thanked for the paper's English proofing. In this appendix, we prove the following theorem: Γ = β 1 R 0 − S e N > 0 when β > 0 and R 0 > 1.