key: cord-0558285-6vf2d8en authors: Hirose, Hideo title: A Relationship Between SIR Model and Generalized Logistic Distribution with Applications to SARS and COVID-19 date: 2020-09-21 journal: nan DOI: nan sha: 1b7f5d0debc3d659d0c5c467ba7639199ddcdbd0 doc_id: 558285 cord_uid: 6vf2d8en

This paper shows that the generalized logistic distribution model can be derived from the SIR model under certain conditions. In the SIR model, there are uncertainties in predicting the final number of infected persons and in estimating the infection parameter. However, by utilizing the information obtained from the generalized logistic distribution model, we can perform the SIR numerical computation more stably and more accurately. Applications to SARS and COVID-19 using this combined method are also introduced.

The SIR model (see, e.g., [1, 2, 3] for general descriptions) has been commonly used in simulations of infectious disease spread for more than half a century, although the mathematical model is very simple. Such longevity attests to the effectiveness of this model and has resulted in a variety of extensions and many actual applications (see, e.g., [4, 5, 6, 7] for specific applications). As in other cases, this model has recently been applied to coronavirus disease 2019 (COVID-19), caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); see [8, 9, 10, 11, 12, 13, 14, 15]. Meanwhile, the generalized logistic distribution has also been used as an empirical model in statistical predictions of infectious disease spread (e.g., [16, 17]). The method has also been used in the COVID-19 case (see, e.g., [18]). However, there seems to be no plain explanation connecting this differential equation model with probability distribution models. In this paper, we first show that the generalized logistic distribution model can be derived from the SIR model under certain conditions.
Then, by combining these two models, we can propose a more accurate and stable prediction method. Applications to SARS and COVID-19 using this combination method are also introduced.

Email address: hirose_hideo@kurume-u.ac.jp (Hideo Hirose)

The SIR model uses the ordinary differential equations

$$\frac{dS(t)}{dt} = -\lambda S(t) I(t), \tag{1}$$

$$\frac{dI(t)}{dt} = \lambda S(t) I(t) - \gamma I(t), \tag{2}$$

$$\frac{dR(t)}{dt} = \gamma I(t), \tag{3}$$

where S, I, and R denote the susceptible, infectious, and removed populations, and the parameters λ and γ are the infection rate and the removal rate (recovery rate), respectively. In the SIR model, for example, a person changes his or her condition from susceptible to infected at a rate λ, and then to removed at a rate γ. Removed persons never become susceptible again. From Equations (1), (2), and (3),

$$\frac{dS(t)}{dt} + \frac{dI(t)}{dt} + \frac{dR(t)}{dt} = 0,$$

which means S(t) + I(t) + R(t) = const. This constant is the total population size, and we denote it by N. Here, we introduce T(t) = I(t) + R(t) for further discussion. This is the cumulative number of infected persons.

Assuming that I(t) is small enough compared with S(t) at early stages, that is, S(t) ≈ S(0) ≈ N, Equation (2) can be written as

$$\frac{dI(t)}{dt} = (\lambda N - \gamma) I(t). \tag{4}$$

This fundamental ordinary differential equation can be easily solved by integration as

$$I(t) = I(0) \exp\{(\lambda N - \gamma) t\}, \tag{5}$$

where I(0) is the initial number of infected persons. If we define $R_0 = \lambda N/\gamma$, which is called the basic reproduction number at time 0, I(t) increases when $R_0 > 1$ and decreases when $R_0 < 1$. That is, $R_0$ plays an important role in determining whether a pandemic or an extinction occurs at early stages of the infectious disease spread. Therefore, knowing $R_0$ as early as possible is considered to be as crucial as finding the solutions of the SIR model. By rewriting $R_0 = \lambda N/\gamma$ as $R_0 = \lambda N I(t)/(\gamma I(t))$, it is worth mentioning that $R_0 = 1$ corresponds to the condition that the inflow of newly infected persons, λN I(t), equals the outflow of removed persons, γ I(t). This concept is useful later when we consider the current reproduction number $R_c$, which represents the reproduction number at the current time t.
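The behaviour described above can be illustrated with a small numerical experiment. The following is only a sketch under stated assumptions: the classical fourth-order Runge-Kutta integrator, the function names, and the parameter values used in the usage note are illustrative choices and are not taken from the paper. It integrates the standard SIR equations, computes $R_0 = \lambda N/\gamma$, and evaluates the early-stage exponential approximation for comparison.

```python
import math

def sir_rhs(state, lam, gamma):
    """Right-hand side of the SIR equations for state = (S, I, R)."""
    s, i, _ = state
    new_infections = lam * s * i
    removals = gamma * i
    return (-new_infections, new_infections - removals, removals)

def rk4_step(state, lam, gamma, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(base, slope, factor):
        return tuple(x + factor * k for x, k in zip(base, slope))
    k1 = sir_rhs(state, lam, gamma)
    k2 = sir_rhs(shifted(state, k1, h / 2), lam, gamma)
    k3 = sir_rhs(shifted(state, k2, h / 2), lam, gamma)
    k4 = sir_rhs(shifted(state, k3, h), lam, gamma)
    return tuple(
        x + h / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate_sir(s0, i0, r0, lam, gamma, days, steps_per_day=10):
    """Return the list of daily (S, I, R) states for day 0..days."""
    h = 1.0 / steps_per_day
    state = (float(s0), float(i0), float(r0))
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, lam, gamma, h)
        trajectory.append(state)
    return trajectory

def basic_reproduction_number(lam, n, gamma):
    """R_0 = lambda * N / gamma."""
    return lam * n / gamma

def early_exponential(i0, lam, n, gamma, t):
    """Early-stage approximation I(t) = I(0) exp((lambda N - gamma) t)."""
    return i0 * math.exp((lam * n - gamma) * t)
```

For instance, with the hypothetical values N = 10,000, λ = 3e-5 and γ = 0.1 (so that $R_0$ is about 3), `simulate_sir(9999, 1, 0, 3e-5, 0.1, 30)` stays close to `early_exponential` during the first days and falls below it later, once S(t) is no longer close to N.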
In solving the SIR ordinary differential equations as an initial value problem, we require the parameter values of λ and γ, and the initial values S(0), I(0) and R(0). By observation, we can obtain the daily populations I(t) and R(t) because the daily numbers of newly infected, deceased and recovered (cured) persons are publicly reported. We note that I(t) is not identical to the daily number of newly infected persons. Then, the parameters λ and γ of the SIR model at time t can be roughly obtained by regarding the differential equations as simultaneous difference equations in λ(t) and γ(t) and solving them day by day. Since the daily values of λ(t) and γ(t) are unstable to some extent, this instability should be reduced by adopting mean values computed from the latest λ(t) and γ(t) values; for example, seven-day averages can be used. However, this is not an optimal solution.

To find much more accurate estimates of the parameters λ and γ, we may use the best-backward solution (BBS) method explained below. First, we obtain the initial guesses $\lambda^{(0)}$ and $\gamma^{(0)}$ for λ and γ, e.g., from the simultaneous difference equations mentioned above. Then, we estimate the optimum values $\hat{\lambda}$ and $\hat{\gamma}$ of λ and γ by using the simplex method [19]. In the optimization, we iteratively evaluate an objective function $E(n, s)^{(k)}$ constructed from the observed values and the computed solutions, where $\hat{T}(t_j)$, $\hat{S}(t_j)$ and $\hat{R}(t_j)$ are the observed numbers of cumulative infected persons, susceptible persons and removed persons, and $T^{(k)}(t_j)$, $S^{(k)}(t_j)$ and $R^{(k)}(t_j)$ are the k-th iterative solutions of the SIR ordinary differential equations for the same quantities. We continue this iteration until $|E(n, s)^{(k+1)} - E(n, s)^{(k)}| < \varepsilon$ holds, where ε is a small positive number. In solving the SIR equations, the solutions are obtained backward from time $t = t_n$ to time $t = t_{n-s}$ by a numerical scheme such as the Runge-Kutta method. Finally, we obtain $\hat{\lambda}$ and $\hat{\gamma}$.
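The rough day-by-day estimation of λ(t) and γ(t) mentioned above can be sketched as follows. This is an illustration only: the forward-difference discretization with a one-day step, the function names, and the default window of seven days are assumptions made here, and the exact difference scheme used in the paper may differ. The sketch does not implement the BBS optimization itself.

```python
def daily_rates(i_obs, r_obs, n):
    """Estimate lambda(t) and gamma(t) from daily I(t) and R(t) series.

    Assumed forward differences with a one-day step:
        R(t+1) - R(t) = gamma(t) * I(t)
        I(t+1) - I(t) = lambda(t) * S(t) * I(t) - gamma(t) * I(t)
    with S(t) = N - I(t) - R(t). Every I(t) and S(t) used must be positive.
    """
    lams, gammas = [], []
    for t in range(len(i_obs) - 1):
        s_t = n - i_obs[t] - r_obs[t]
        d_i = i_obs[t + 1] - i_obs[t]
        d_r = r_obs[t + 1] - r_obs[t]
        gammas.append(d_r / i_obs[t])
        # Adding the two difference equations eliminates gamma(t).
        lams.append((d_i + d_r) / (s_t * i_obs[t]))
    return lams, gammas

def trailing_mean(values, window=7):
    """Mean of the most recent `window` values (fewer if not available)."""
    tail = values[-window:]
    return sum(tail) / len(tail)
```

The smoothed values `trailing_mean(lams)` and `trailing_mean(gammas)` could then serve as the initial guesses $\lambda^{(0)}$ and $\gamma^{(0)}$ for an optimizer. Note that the estimate of λ(t) depends directly on the assumed N through S(t).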
Taking into account the importance of the most recent observed values, we often use s = 7. However, S(t) becomes unreliable because it strongly depends on N, and we cannot determine an appropriate size of N. It may be the size of a small district or of a large country, depending on the region of disease spread. That is, N is unknown in general because the uninfected persons who can plausibly come into contact with infected persons are not identified. The fact that we cannot determine N is an uncertainty factor in using the SIR model.

To look at this phenomenon, we have performed a simulation study using the SARS case in Hong Kong in 2003. We assume that N is 2,000 (the population of a strongly restricted area), 10,000, 100,000, 1,000,000, or 6,810,000 (the actual Hong Kong population in 2003). Figure 1 shows the predicted curves for the number of cumulative infected persons, T(t), after the 30th day, obtained via the SIR model using observed data from the 22nd to the 30th day; here, day 1 was March 17, 2003. In the figure, the dots showing the observed values of T are superimposed. We see that the prediction curves strongly depend on N. The larger the value of N, the steeper the slope of the curve. In the cases of N ≥ 100,000, the curves blow up immediately after the 30th day, although T(t) is bounded above by N. Therefore, we cannot obtain a robust prediction for T if N is unknown, which is typically the case at early stages.

Figure 1: Various cases of the predicted curves for the number of cumulative infected persons, T(t), after the 30th day, using observed data from the 22nd to the 30th day via the SIR model in Hong Kong in 2003. We assumed that N is 2,000, 10,000, 100,000, 1,000,000, and 6,810,000.

We can also see another uncertainty factor in estimating λ using the SIR model. Looking at the estimated parameters $\hat{\lambda}$ at time t for various N in Figure 2, they also seem to be affected by the value of N. The figure suggests a strong dependency of λ on N.
We cannot estimate a consistent value for λ unless N is clearly determined, which is typically not the case at early stages.

The generalized logistic distribution (GLD) model developed by Richards [20] can be applied as a flexible growth function for empirical use. This model is based on a simpler model [21] that describes the increase of weight as a function of the metabolic process of animals. The GLD is also applied in other fields such as hydrology (see, e.g., [22]), in medical fields for infectious disease spread modelling such as SARS, FMD, Zika, Ebola, and SARS-CoV-2, and in growth modelling of physicochemical phenomena, psychological issues, survival time of diagnosed leukemia patients, and weight gain data.

The three-parameter generalized logistic distribution function is defined as

$$F(t; \sigma, \mu, \beta) = \frac{1}{\{1 + \exp(-(t - \mu)/\sigma)\}^{\beta}}, \tag{11}$$

where σ, µ and β denote the scale, location, and shape parameters, respectively. By introducing z = (t − µ)/σ, we have the standard generalized logistic distribution expressed by

$$F(z) = \frac{1}{\{1 + \exp(-z)\}^{\beta}}. \tag{12}$$

In estimating the parameters $\theta = (\sigma, \mu, \beta)^T$, we often use the maximum likelihood estimation method. Since observed data are usually daily data, the likelihood function L(θ) can be constructed by using the grouped truncated model expressed, up to a constant factor independent of θ, as

$$L(\theta) = \prod_{i=0}^{n} \left\{ \frac{F(t_i; \theta) - F(t_{i-1}; \theta)}{F(t_n; \theta)} \right\}^{k_i},$$

where $F(t_{-1}; \theta) = 0$ and $k_i$ (0 ≤ i ≤ n) represents the number of infected persons from time −∞ to $t_0$ (for i = 0) or from $t_{i-1}$ to $t_i$ (for i ≥ 1). When the total number of cases is known in advance, we can also use the trunsored model [17].

In the SIR model (1), (2), (3), we assumed that I(t) is small enough compared with S(t) at early stages, i.e., S(0) ≈ N, and then derived the simple ordinary differential equation (4) with the solution (5). This solution shows an explosive increase in the number of infected persons when $R_0 > 1$. In the real world, the number of infected persons is bounded above. Thus, a much more realistic model is required for the later stages of disease spread.
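The generalized logistic distribution function and the grouped truncated likelihood described above can be sketched in a few lines. The code is an illustration under assumptions: it takes the distribution function to be $F(t) = \{1 + \exp(-(t - \mu)/\sigma)\}^{-\beta}$, normalizes each interval probability by $F(t_n)$ for truncation at the last observation time, and omits any multiplicative constant. The function names are hypothetical, and no optimizer is included; the log-likelihood could be passed to any general-purpose maximizer.

```python
import math

def gld_cdf(t, sigma, mu, beta):
    """Generalized logistic distribution function with scale, location, shape."""
    z = (t - mu) / sigma
    if z < -700.0:          # exp(-z) would overflow; the CDF is numerically 0
        return 0.0
    return (1.0 + math.exp(-z)) ** (-beta)

def grouped_truncated_loglik(theta, times, counts):
    """Log-likelihood for daily counts grouped on (-inf, t_0], (t_0, t_1], ...

    times  : increasing observation times t_0 < t_1 < ... < t_n
    counts : k_i, the number of new cases falling in the i-th interval
    The distribution is truncated at the last observation time t_n.
    """
    sigma, mu, beta = theta
    if sigma <= 0.0 or beta <= 0.0:
        return -math.inf
    f_last = gld_cdf(times[-1], sigma, mu, beta)
    if f_last <= 0.0:
        return -math.inf
    loglik, previous = 0.0, 0.0
    for t, k in zip(times, counts):
        current = gld_cdf(t, sigma, mu, beta)
        if k > 0:
            p = (current - previous) / f_last
            if p <= 0.0:
                return -math.inf
            loglik += k * math.log(p)
        previous = current
    return loglik
```

Working with the logarithm avoids underflow of the product over many days, and returning negative infinity for invalid parameters lets a derivative-free optimizer reject them without special handling.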
Since T(t) = N − S(t), from Equation (1), we have

$$\frac{dT(t)}{dt} = \lambda (N - T(t)) I(t).$$

Assuming that R(t) = 0, i.e., none of the infected persons are transferred to the removed population so that I(t) = T(t), this equation becomes

$$\frac{dT(t)}{dt} = \lambda (N - T(t)) T(t). \tag{16}$$

This equation is symmetric in T(t) and (N − T(t)), and consequently the solution is an increasing S-shaped curve. By integration, we can easily derive the solution of (16) as

$$T(t) = \frac{N}{1 + (N/T(0) - 1) \exp(-\lambda N t)},$$

which is called the logistic function. By solving $d^2T(t)/dt^2 = 0$, the inflection point is found to be $(t, T) = (\log(N/T(0) - 1)/(\lambda N),\ N/2)$. In the real world, this model is still unrealistic because asymmetric curves are often observed.

Then, writing Equation (16) as $dT(t)/dt = \lambda N\, T(t)\{1 - T(t)/N\}$, we assume that (T(t)/N) in the last factor is replaced by $(T(t)/N)^m$, because it is natural to think that the susceptible persons are more or less affected by infected persons depending on the magnitude of m, rather than linearly affected. For example, if half of the population is already infected, the factor $1 - (T(t)/N)^m$ is not 1/2 but is inflated to 3/4 when m = 2, and shrunk to $(2 - \sqrt{2})/2$ when m = 1/2. Thus, we assume the following ordinary differential equation,

$$\frac{dT(t)}{dt} = b\, T(t) \left\{ 1 - \left( \frac{T(t)}{N} \right)^{m} \right\},$$

where b is a constant satisfying b = λN when m = 1. To solve this equation, we first set y(t) = T(t)/N, and then use the change of variables $z(t) = (y(t))^{-m}$ (refer to, e.g., [23] for such transformations). Then, the equation can be written as

$$\frac{dz(t)}{dt} = -bm\,(z(t) - 1),$$

which can also be solved by integration, and the solution is

$$T(t) = \frac{N}{\{1 + (k^{-m} - 1) \exp(-bmt)\}^{1/m}}, \tag{20}$$

where k = T(0)/N. This reveals that the curve of T(t) has the same shape as the one expressed by Equation (12) except for the scale. Therefore, the generalized logistic distribution is derived from the SIR model under certain assumptions. Since these assumptions can be applied at early stages of the disease spread, we may use this probability distribution as a statistical model representing the infectious disease spread phenomena. By solving $d^2T(t)/dt^2 = 0$, the time t of the inflection point is found to be $\log(((N/T(0))^m - 1)/m)/(bm)$.
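The derivation above can be checked numerically. The sketch below assumes that the generalized equation has the form $dT/dt = bT\{1 - (T/N)^m\}$, which is consistent with the stated condition b = λN for m = 1 and with the stated inflection time; the function names are illustrative. It evaluates the closed-form solution, the right-hand side of the differential equation, and the inflection time, so that the three can be compared.

```python
import math

def generalized_logistic(t, n, t0, b, m):
    """Closed-form T(t) with T(0) = t0, upper limit n, rate b, exponent m."""
    k = t0 / n
    return n * (1.0 + (k ** (-m) - 1.0) * math.exp(-b * m * t)) ** (-1.0 / m)

def growth_rate(big_t, n, b, m):
    """Right-hand side b * T * (1 - (T / N)^m) of the assumed equation."""
    return b * big_t * (1.0 - (big_t / n) ** m)

def inflection_time(n, t0, b, m):
    """Time at which the second derivative of T(t) vanishes."""
    return math.log(((n / t0) ** m - 1.0) / m) / (b * m)

def numerical_derivative(t, n, t0, b, m, h=1e-4):
    """Central finite difference of the closed-form solution."""
    upper = generalized_logistic(t + h, n, t0, b, m)
    lower = generalized_logistic(t - h, n, t0, b, m)
    return (upper - lower) / (2.0 * h)
```

With m = 1 the closed form reduces to the ordinary logistic function, and its value at `inflection_time` is N/2. For general m the value at the inflection time is $N (1 + m)^{-1/m}$, which shows how m makes the curve asymmetric.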
The parameters in Equation (11) and Equation (20) are related by

$$b = \beta/\sigma, \quad k = (1 + \exp(\mu/\sigma))^{-\beta}, \quad m = 1/\beta,$$

$$\sigma = 1/(bm), \quad \mu = \log(k^{-m} - 1)/(bm), \quad \beta = 1/m.$$

Using exactly the same example as in the previous section, we show in Figure 3 the predicted curves for the number of cumulative infected persons, T(t), after the 30th, 33rd, 45th and 73rd days, using observed data from the first day to the last observed day and maximizing the grouped truncated likelihood function L(θ). Dots show the observed values of T. In this GLD model, we can estimate the final (i.e., t → ∞) value of T(t) such that

$$\hat{T}(\infty) = \frac{T(t_n)}{F(t_n; \hat{\theta})},$$

where $T(t_n)$ is the observed cumulative number at the last observed time $t_n$, and $F(t; \hat{\theta})$ expresses the cumulative distribution function value at time t using the estimated parameter $\hat{\theta}$. Looking at the figure, we see that the prediction curves for T(t) are close to the observed values, although they underestimate the observed values to some extent. In addition, the GLD model can predict the final value $\hat{T}(\infty) = N$ even at early stages.

It does not make sense to compare the final value $\hat{T}(\infty)$ from the GLD model with that from the SIR model, because the SIR model does not provide consistent values of $\hat{T}(\infty)$. Therefore, we compare the value $\hat{T}(t_{conv})$ from the GLD model with that from the SIR model, where $t_{conv}$ expresses the time when $\hat{T}(t)$ seems to converge. In the SARS case example, we set $t_{conv} = 117$. We introduce the two terms $\hat{T}(t_{conv})_{GLD}(t)$ and $\hat{T}(t_{conv})_{SIR}(t)$; the former represents the predicted $\hat{T}(t_{conv})$ estimated at truncation time t using the GLD model, and the latter represents $\hat{T}(t_{conv})$ estimated at time t using the SIR model. We define the L-plot as the plot in which time t is on the horizontal axis and $\hat{T}(t_{conv})$ is on the vertical axis. Figure 4 shows the comparison between $\hat{T}(t_{conv})_{GLD}(t)$ and $\hat{T}(t_{conv})_{SIR}(t)$ at various times t; in the figure, the observed T(t) are superimposed. Although we have selected a rather small value of 117 for $t_{conv}$, the estimated value $\hat{T}(117)_{SIR}(t)$ shows more unstable behavior than $\hat{T}(117)_{GLD}(t)$ does.
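The parameter relationships stated above between the distribution form (σ, µ, β) and the growth-curve form (b, k, m) can be verified with a short round-trip computation. This is a sketch only; the function names are hypothetical, and k is taken here to be the initial fraction T(0)/N, an interpretation that is consistent with the stated formulas.

```python
import math

def growth_to_gld(b, k, m):
    """Map growth-curve parameters (b, k, m) to GLD parameters (sigma, mu, beta)."""
    sigma = 1.0 / (b * m)
    mu = math.log(k ** (-m) - 1.0) / (b * m)
    beta = 1.0 / m
    return sigma, mu, beta

def gld_to_growth(sigma, mu, beta):
    """Map GLD parameters (sigma, mu, beta) back to (b, k, m)."""
    b = beta / sigma
    k = (1.0 + math.exp(mu / sigma)) ** (-beta)
    m = 1.0 / beta
    return b, k, m

def gld_cdf(t, sigma, mu, beta):
    """Generalized logistic distribution function."""
    return (1.0 + math.exp(-(t - mu) / sigma)) ** (-beta)

def growth_fraction(t, b, k, m):
    """T(t) / N for the generalized logistic growth curve."""
    return (1.0 + (k ** (-m) - 1.0) * math.exp(-b * m * t)) ** (-1.0 / m)
```

For any admissible (b, k, m), `gld_cdf(t, *growth_to_gld(b, k, m))` and `growth_fraction(t, b, k, m)` should agree at every t, which is the sense in which the growth curve and the distribution function share the same shape up to the scale N.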
Therefore, we next consider taking the information on N from the GLD model and combining the use of the GLD model with the use of the SIR model.

As described above, we can predict the number of cumulative infected persons after the last day of observation using the SIR model under appropriate conditions. However, to keep the estimates reliable, we should take care to provide adequate parameter values, in particular for the infection parameter λ and the total population N. Otherwise, even if we can roughly obtain the current reproduction number $R_c$, we cannot obtain consistent estimates of λ and N. In contrast, we do not require information on N to estimate the parameter θ in the GLD model. In addition, we can estimate N even at early stages. Thus, in order to obtain more accurate estimates of the parameters in the SIR model, we may utilize that information from the GLD model. This is called the combination method. Using these two models together, we can expect to make more accurate predictions.

Figure 5 shows the observed COVID-19 case data in Hubei in 2020 [24]. Looking at the figure, we see that the infection spread very quickly, but recovery was slow. Judging from the median times of the infection curve and the recovery curve, it took three weeks from infection to recovery.

First, we have fitted generalized logistic distributions to the observed cumulative numbers of infected persons, deceased persons, and recovered persons. Figure 6 shows the cumulative distribution functions and the corresponding observed data, where the cumulative numbers of persons are divided by the total number of persons. We can see that the three observed cases are well fitted by the generalized logistic distributions. According to [17], among the generalized lognormal, generalized extreme-value, generalized gamma and generalized logistic distributions, the generalized logistic distribution gave the best fit.
We have applied the L-plot to the observed numbers of infected persons in Figure 7, assuming that the observed values follow generalized logistic distributions. From the figure, we see that the estimates of N seem to be stable after the 29th day from the beginning and seem to converge to around 70,000. Thus, we set N to 70,000 in the SIR model. Figure 8 shows the predicted curves after the last day of observation. In the figure, solid curves express the case of N = 70,000, and dotted curves the case of N = 1,000,000 for the sake of comparison. Clearly, the case of N = 1,000,000 is misleading because there are large discrepancies between the predicted values and the observed values.

Although the SIR model has been actively used for a long time and has been useful for prediction, there are uncertainties in predicting the final number of infected persons and in estimating the infection parameter. In this paper, we have shown that the generalized logistic distribution model can be derived from the SIR model under certain conditions. By using the generalized logistic distribution model, we can resolve one of the uncertainty factors, and the resulting information can be used in the numerical computations of the SIR model. We have proposed a more accurate and stable prediction methodology that combines these two models. Applications to SARS and COVID-19 using this combined method are also introduced.

[1] Contributions to the mathematical theory of epidemics-III. Further studies of the problem of endemicity
[2] Infectious diseases of humans: Dynamics and control
[3] Mathematical epidemiology of infectious diseases: Model building, analysis and interpretation
[4] A simple mathematical model for Ebola in Africa
[5] Critical response to post-outbreak vaccination against foot-and-mouth disease
[6] The foot-and-mouth epidemic in Great Britain: Pattern of spread and impact of interventions
[7] Modelling the epidemic spread of an H1N1 influenza outbreak in a rural university town
[8] Generalized logistic growth modeling of the COVID-19 outbreak: Comparing the dynamics in the 29 provinces in China and in the rest of the world
[9] Mathematical modelling of COVID-19 transmission and mitigation strategies in the population of Ontario, Canada
[10] Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions
[11] Transmission dynamics of the COVID-19 outbreak and effectiveness of government interventions: A data-driven analysis
[12] Mathematical models and deep learning for predicting the number of individuals reported to be infected with SARS-CoV-2
[13] Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
[14] Predicting the spread of COVID19 using SIR model augmented to incorporate quarantine and testing
[15] The challenges of modeling and forecasting the spread of COVID-19
[16] Severe acute respiratory syndrome epidemic in Asia
[17] The mixed trunsored model with applications to SARS
[18] Generalized logistic growth modeling of the COVID-19 pandemic in Asia
[19] A simplex method for function minimization
[20] A flexible growth function for empirical use
[21] A quantitative theory of organic growth (Inquiries on growth laws. II)
[22] Estimation of the generalized logistic distribution of extreme events using partial L-moments
[23] Exact solutions of stochastic differential equations: Gompertz, generalized logistic and revised exponential