key: cord-0794711-og0ybfzf authors: Marinov, Tchavdar T.; Marinova, Rossitza S. title: Dynamics of COVID-19 Using Inverse Problem for Coefficient Identification in SIR Epidemic Models date: 2020-07-15 journal: nan DOI: 10.1016/j.csfx.2020.100041 sha: a4eca1646a5207f6ce8d942a45dd43a6831cf41b doc_id: 794711 cord_uid: og0ybfzf Abstract This work deals with the inverse problem in epidemiology based on a SIR model with time-dependent infectivity and recovery rates, allowing for a better prediction of the long term evolution of a pandemic. The method is used for investigating the COVID-19 spread by first solving an inverse problem for estimating the infectivity and recovery rates from real data. Then, the estimated rates are used to compute the evolution of the disease. The time-depended parameters are estimated for the World and several countries (The United States of America, Canada, Italy, France, Germany, Sweden, Russia, Brazil, Bulgaria, Japan, South Korea, New Zealand) and used for investigating the COVID-19 spread in these countries. The inverse problem for estimating the time-dependent transmission and removal rates in the SIR epidemic model is derived and solved. The minimization problem uses the entire dataset with data available on June 21, 2020 for estimating the non-constant rates. The obtained numerical results demonstrate that the transmission and removal rates and the unknown functions are accurately estimated. The numerically computed rates are used for forecasting the COVID-19 pandemic for the world and a number of countries. The results of this research give insight of the pandemic in parts of the world and could help in determining policy. The SIR model is a good choice for the short period of time of this epidemic; however, it possesses known limitations in case of a long term infectious disease. In future, we plan to use other models. Depending on future developments of the disease, we may consider models addressing non-constant population, latency, reinfection, and vaccine. The COVID-19 coronavirus appeared in late 2019 and quickly spread across many countries. According to [14] and [40] , by June 21, 2020, there were more than 9 million confirmed cases of infected people, with more than 470,000 reported deaths globally. Governments closed the so-called non-essential businesses and services for weeks in order to slow down the growth of infections -especially among vulnerable populations -and thus, save lives. Numerous mathematical models are developed to forecast the future of the COVID-19 epidemic spread worldwide. They are used to assist governments in making decisions to cope with the virus and its consequences. As of June 2020, there are no vaccines for this highly contagious disease; nor efficient antiviral drugs. Mathematical modeling and forecasting the spread of epidemic diseases has a long history, see [3] , [5] , [10] , and [31] . The earliest published paper on mathematical modeling of spread of disease was carried out in 1766 by Daniel Bernoulli. Trained as a physician, Bernoulli created a mathematical model to defend the practice of inoculating against smallpox [15] . Infectious diseases include measles, malaria, varicella, HIV, Ebola, and SARS. A systematic review of the risk of death associated with Middle East respiratory syndrome (MERS) as well as risk factors for associated complications is given in [27] . An analysis and forecast of COVID-19 spreading in China, Italy and France was given in [11] . There is no vaccination for some of the infectious diseases, only preventive practices -see [16] . Kermack and McKendrick [17] introduced their model in 1921. Their theory is a mathematical hypothesis proposed to explain the rapid rise and fall in the number of infected people with a contagious illness in a closed population over time. It is the origin of SIR (Susceptibles-Infected-Recovered) type models. This formalism is the basis of all current modeling of the dynamics and evolution of infectious diseases, see [4] , [28] , [35] , [37] , and [38] . The models are useful in understanding the basic principles of the system, even in choosing a proper policy of infectious disease control. COVID-19 attracted attention in recent publications [12] , [19] , [29] , [33] , [34] , and [42] . The dynamical systems for epidemics are highly nonlinear -they include diverse characteristics, from population specifics to the immune system of an individual. Every new epidemics must be studied and mathematical models constructed to answer important questions on handling the disease. The parameters in the SIR model, the rate of transmission a and the rate of removal b, depend on the evolution of the epidemic disease over time, see [1] . Complex and realistic mathematical models are used to assist policy decision making, recent relevant works include [13] . This work aims to create a method that can accurately identify the time dependent parameters of the SIR system using real data and then use the computed parameter values to predict the spread of the epidemics. If some conditions change, then the projection based on the "old" data will no longer be accurate and will require adjustment. The parameter estimation is referred to as the inverse modeling problem. It means adjusting the parameters of a mathematical model to reproduce measured data. Model deficiencies are usually due to inaccurate parameters and must always be addressed. The inverse problem is crucial for calibrating the model and for controlling the model parameters. Approaches involving inverse problems can be successfully applied to a variety of important processes, including the spread of infectious diseases, allowing epidemiologists and public health specialists to make predictions on epidemics [5] . The present work extends the method proposed in [24] for finding optimum values for the infectivity and recovery rates. In [24] , these values are assumed to be constant over the whole interval because we applied the method to a short outbreak of influenza instead of considering the long term evolution of a pandemic as with the case of COVID-19. Here, we assume the infectivity and recovery rates are functions of time, namely a = a(t) and b = b(t). The SIR model is undoubtedly the most famous mathematical model for the spread of an infectious disease. A constant population of size N is divided into three classes: susceptible S, infectives I and removed R. Removed individuals R are no longer susceptible nor infectious for whatever reason; for example, they have recovered from the disease and are now immune, or they have been vaccinated, or perhaps they have died from the disease. Since the isolated infectious people are still capable of infecting individuals from class S (see for example the big number of infected medical personnel), we count these people as members of the class I. The diagram for the SIR model is given on Fig. 1 . There exist other models considering additional events and features, such as birth and death rates (non-constant population), vaccine effect, reinfection, latent period, and so on. In this work, we use the SIR model, because the considered time period is relatively short. We assume that the population under study is well mixed so that every person has equal probability of coming into contact with every other person. This major assumption does not hold in many situations [28] , e.g. sexually transmitted diseases and COVID-19 social distancing. Let a∆t be the probability that a random infectious individual infects a random susceptible individual during the time period ∆t. Then with S(t) susceptible and I(t) infectious population, the expected number of newly infectious individuals in the total population N during time ∆t is aSI∆t, i.e., the rate of change of the class S(t) is −aSI, where a > 0. We also assume that infectious individuals leave the I(t) class with rate b and they move directly into the R(t) class. There are reports for recovered from COVID-19 individuals who are re-infected; hence, we added a dashed arrow from the class R to the class I. Since the number of these cases at the present moment is very limited, we cannot estimate the rate c and we do not include this option into the system. Since the differential equations (1) and (2) do not depend on R, it is convenient to split the system into two parts -equations (1) and (2) as one system, and equation (3) by itself. If the coefficients a and b are given, we can derive initial conditions from the given data and solve the problem numerically by any of the well known methods for initial value problems (IVPs). We refer the reader to [2] for more information on methods for solving IVPs numerically. The main question is how to find the coefficients a and b for a given infection, e.g. COVID-19. The problem for the estimation of the constants a and b is an inverse problem solved in [24] . A similar approach for identifying coefficients in an Euler-Bernoulli equation from over-posed data is used in [23] and [26] . We assume that (from the available data) the values of S and I are known at two time moments, initial time moment T I and final time moment T F , and If the coefficients a and b are known, the solution of equations (1) and (2) can be determined using the initial conditions (4) . In general, the terminal conditions (5) are not satisfied exactly. Therefore, if the coefficients a and b are given, the problem is overdetermined. Let us assume that the coefficients a and b are constant and unknown. In this case, the general solution of the system (1), (2) depends on four constants-two constants from the integration and the unknown coefficients a and b. The number of conditions/equations in (4) and (5) is also four; hence, the problem for identifying the coefficients along with the functions S and I is well-posed, provided that an exact solution exists (or can be obtained). This type of problems belongs to the class of the so-called inverse problems. Still, the problem for obtaining (S, I) and (a, b) from equations (1) and (2) under the conditions (4) and (5) could also be ill-posed. For arbitrary values of S I , S F , I I , I F , there may be no solution (S, I), (a, b) satisfying the equations (1), (2) and all of the conditions in (4), (5) . For this reason, we assume that the problem is posed correctly after Tikhonov [39] , i.e., it is known a-priori that a solution of the problem exists. In other words, the the boundary data have "physical meaning" and, therefore, a solution exists. We are now ready to construct an algorithm for approximating the solution (S, I), (a, b) of the inverse problem (1), (2), (4), (5) . For estimating the coefficients a and b of the SIR model, we use an approach that is very similar to the variational method used in [24] . This method is a generalization of the Least Squares Method. More details about the method for transforming the inverse problem into a correct direct problem can be found in [8] , [22] , [23] , [26] . The original SIR model assumes that the coefficients a and b are constants. In real live situations, e.g. the COVID-19 pandemic, the coefficients vary in time, i.e., a = a(t) and b = b(t). The coefficients are affected by governments' restrictions, societal experiences, treatment, among others. A similar idea was proposed in [30] . Let D be a data set of function values S(t k ), I(t k ) at some time moments t 1 , t 2 , . . . . We assume there exists an algorithm for solving the inverse problem (1), (2), given the conditions (4), (5) and the data set D. Here, we consider a and b to be piece-wise constant (step) functions of time The constants a k and b k can be estimated by solving the inverse problem (1), (2), (4), (5) using S I = S(t k−1 ), S F = S(t k ), I I = I(t k−1 ), and I F = (t k ), for k = 2, 3, . . . Epidemics grow if the derivative dI/dt > 0 and decrease if dI/dt < 0. An important parameter, which characterizes infectious diseases, is the reproduction rate (also called reproduction number or ratio) of the infection From equation (2) and (7), it follows that Consequently, the epidemic grows if ρ(t) > 1 and decreases if ρ(t) < 1. The reproduction rate ρ(t) represents the expected number of secondary infections produced by a single primary infected person. The reproduction rate in Hubei Province (China) by using case-report data from January 11 to February 6, 2020 was estimated in [18] as 4.1192 (with 95% confidence interval 4.04734.1912). The reproduction rate ρ(t) also relates to the fraction of the population that gets sick. Another important characteristics of an infectious disease is the so-called herd immunity -the minimum fraction of the population,f , that is required to have immunity in order to prevent an epidemic [32] . The stability analysis of the SIR model shows thatf = 1 − 1/ρ. The society can acquire herd immunity in two ways. The "natural" way is to let the epidemic spread until the required fraction of the population,f , obtains immunity. The most common "non-natural" way is vaccination. Since vaccines are not presently available for COVID-19, some countries, e.g. Sweden, tried the "natural" way. As of this moment, their approach is questionable, according to the results for their pandemic spread, see Section 4.7. Many countries have been practicing social distancing and other measures to slow down the transmission rate a of the epidemic. A method for solving the inverse problem in the case of constant coefficients a and b is given in [24] , along with a discussion of some theoretical aspects. Here, we present the numerical algorithm for estimating the coefficients if they depend on time, i.e., a = a(t) and b = b(t). The inverse problem for the time dependent infectivity and recovery rates follows the idea of the method used in [26] . The original problem is replaced by a minimization problem. For the correct embedded problem, a difference scheme and a numerical algorithm can be constructed. First, we start with the sub-problem in case of using data values at two time moments. Since we are concerned with the numerical solution of the system (1)-(3), we seek approximations of the functions S(t), I(t), and R(t) at the discrete set {t 0 , t 1 , . . . , t n } of points in the interval [T I , T F ], where T I is the initial time moment, T F is the final time moment, and n is an integer greater than 1. The mesh of equidistant points is shown in Fig. 2 . We define the step size as τ = TF −TI n−1 . The nodes are the equidistant points t k = T I + kτ , k = 0, 1, . . . , n. Now we are ready to discretize equations (1) and (2) This discretization secures the second order of approximation O(τ 2 ). Since the problem (8)-(9) is non-linear, we use an iterative procedure, assuming that S k andĪ k are given from the previous iteration. Consider the function Φ(a, b, S 1 , . . . , S n−1 , I 1 , . . . , I n−1 ) = where k and δ k are the residuals of equations (8) and (9), respectively. Since the function Φ is a homogeneous quadratic function of k and δ k , its absolute minimum is zero. On the other hand, the function Φ attains its minimum if and only if k = 0 and δ k = 0 for all k = 1, 2, . . . , n − 1. Hence, there exists one-to-one correspondence between the solution of the system of equations (8), (9), (4), (5) and the problem for the minimization of the function Φ under the conditions (4) and (5). The necessary conditions for minimization of the function Φ with respect to its arguments S k and I k are Conditions (11) yield the following difference equations for k = 1, 2, . . . , n − 1. Adding the initial and terminal conditions (4) and (5), we obtain a well-posed linear system of 2(n + 1) equations for the unknown sets of values (S 0 , S 1 , S 2 , . . . , S n ) and (I 0 , I 1 , I 2 , . . . , I n ). We rewrite the function Φ in the form where The necessary conditions for minimization of the function Φ with respect to a and b are as follow The solution of the system (21), (22) is Let us assume that the number of infectious individuals at some time moments ν 1 , ν 2 ,. . . , ν m , is known and given by for l = 1, 2, . . . , m, while the values of the coefficients a and b are unknown. Suppose that for every 1 ≤ l ≤ m, there exists index k l , such that ν l = t k l , i.e., the set of time moments {ν 1 , ν 2 , . . . , ν m }, is a subset of the set of mesh nodes {t 0 , t 1 , t 2 , . . . , t n }. To simplify the calculations, let us introduce the notations χ k such as: if there exists k ∈ {0, 1, 2, . . . , n} such that t k = ν l , 1 ≤ l ≤ m, then χ k = σ l and µ k > 0, otherwise χ k = 0 and µ k = 0, where µ k is the weight of equation (25) in the LSM. Similarly to Subsection 3.1, consider the function where k and δ k are the residuals of the equations (8) and (9), respectively. The necessary conditions for minimizations of the function Φ with respect to S k and I k are given by equations (11) . For the case under consideration, we obtain for k = 1, 2, . . . , n − 1. Adding the initial and terminal conditions (4), (5), we obtain a well-posed linear system with 2(n + 1) equations for the unknown set of values (S 0 , S 1 , S 2 , . . . , S n ) and (I 0 , I 1 , I 2 , . . . , I n ). The equations for a and b in this case are the same as the equations (23) and (24), derived in Subsection 3.1.2. We use the following approach for solving the system (12), (13), (4), (5): i) With given initial values forŜ,Î, a, b, the system (12), (13), (4), (5) is solved for the functions S and I. ii) The deviation of the values of S and I fromS andĪ are computed as: If they are smaller than a given tolerance ε 0 , then the algorithm proceeds to iii); otherwise,Ŝ andÎ are replaced by S and I, respectively, and the calculations return to i); iii) The coefficients a and b are computed from (23) and (24) . If the difference between the new and old values of a and b is less than ε 0 then the calculations terminate; otherwise, the iterations continue, go back to i). We use ε 0 = 5 · 10 −10 in all calculations presented here. 3.4. Parameters a, b, and ρ as functions of time The data set (25) contains the number of infected individuals for every day for a period of m days. We divide the set into subsets of fixed length of P days (say P = 7, or 10, or 14 days), as shown in Fig. 3 . Next, we apply the algorithm presented in subsection 3.3 for every sub-interval [k − P + 1, k], for k = P, P + 1, . . . , m. Consequently, we obtain a k and b k as functions of time. For every sub-interval we calculate the value of the reproduction rate Models can help us understand what would be the worst case scenario and plan proper actions to achieve the best possible outcome under the given constraints. They cannot accurately predict what will happen. The forecasts of the epidemic spread are based on the available COVID-19 data as of June 21, 2020, at [40] and [41] . Our experiments indicate that the optimal value of ε 0 is 10 −8 . The parameters a, b, and ρ as functions of time are computed with τ = 1/40 and µ = τ . They are estimated with calculations performed over 7, 10, and/or 14 day intervals. The dynamic of the COVID-19 disease is projected based on the values of the parameters taken at the last day of the period used for the transmission are recovery rates estimation. This section presents our results for the COVID-19 pandemic. In particular, the following characteristics are important for understanding the infectious disease dynamics: • Estimated up-to-date transmission rate a(t); • Estimated up-to-date removal rate b(t); • Estimated up-to-date reproduction rate ρ(t); • Projected COVID-19 infectious cases I(t) using the up-to-date estimated rates, taken at the last time period, for which data is available. The functions I(t), S(t), and R(t) satisfy the direct problem, if the coefficients are known. After estimating a and b, we find the numerical solution of the direct problem (1)-(3) with proper initial conditions in order to project the future behavior of the epidemic. In all calculations, we use second order Runge-Kutta method, a type of predictor-corrector method. The first step is: Next, the corrector step improves the prediction obtained in the first step: Finally, for k = 1, 2, . . . . We use τ = 0.01 for solving the direct problem numerically by the Runge-Kutta method. It must be noted that these forecasts are valid as far as the rates remain as estimated; however, since COVID-19 is a highly contagious epidemic, for which very little is known, the rates of transmission, removal, and reproduction may suddenly change due to measures and/or events affecting the spread. Over the last several months, people learned how to protect themselves from being infected; consequently, this helped keep the transmission and reproduction rates low. Table 1 . Forecasts, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 4(d) . The reproduction rate ρ is still greater than one although decreasing. Table 2 . The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 5(d) . The transmission rate a and the reproduction rate ρ have increased lately. Table 3 . The projections, based on the last 7, 10 and 14 day interval estimations of a and b are given in Fig. 6(d) . Canada started its reopening some services and businesses in several provinces around May 14, 2020. However, during the month of June, the transmission rate a and the reproduction rate ρ steadily decrease. Table 4 . The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 7(d) . The data shows that the epidemic is slowing down significantly and it may disappear soon. Table 5 . Three projections, based on the last 7, 10 and 14 day interval estimations of a and b are given in Fig. 8(d) . The reproduction rate has been less than one since the middle of May, but, the end of May it is again greater than one and the epidemic spread is increasing. (c) (d) Figure 8 : Estimated values of a, b, ρ used for projections of I for France 4.6. Germany Fig. 9 (a)-(c) presents the obtained values of the parameters a, b, and ρ as functions of time, calculated over 7, 10, and 14 day intervals. The values of a, b, and ρ for the last day are given in Table 6 . The projected values, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 9(d) . The reproduction rate has been less than one since the middle of April, leading to decrease of the epidemic spread. For the last week the reproduction rate tends to increase. (c) (d) Figure 9 : Estimated values of a, b, ρ used for projections of I for Germany 4.7. Sweden Fig. 10(a)-(c) presents the obtained values of the parameters a, b, and ρ as functions of time, calculated over 7, 10, and 14 day intervals. The values of a, b, and ρ for the last day are given in Table 7 . Three forecasts, based on the last 7, 10 and 14 day interval estimations of a and b are given in Fig. 10(d) . The authors could not find detailed data for Sweden after May 30. Table 8 . Three projections, based on the last 7, 10 and 14 day interval estimations of a and b are given in Fig. 11(d) . Russia flattened the curve of the reproduction rate ρ, with values being around one lately. Table 9 . Three projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 12(d) . The transmission rate a decreases; however, the reproduction rate ρ is still greater than one. Table 10 . The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 13(d) . The reproduction rate ρ was less than one for a while, however, is started to increase about two weeks ago. Table 11 . The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 14(d) . The reproduction rate became less than one early May, causing the COVID-19 spread to decrease. Table 12 . The projections, based on the last 7, 10, and 14 day interval estimations of a and b are given in Fig. 15(d) . The reproduction rate became greater than one during the last three weeks. There are similarities between the epidemic spread for some countries. Fig. 17 presents the obtained values of the reproduction rate ρ as function of time, calculated over 7 day intervals, for four European countries: France; Germany; Italy; and Bulgaria. The obtained values of the reproduction rate ρ as function of time, also calculated over 7 day intervals, for the world, the United States of America, Russia, and Brazil are shown in Fig. 18 . The inverse problem for estimating the time-dependent transmission and removal rates in the SIR epidemic model is derived and solved. The minimization problem uses the entire dataset with data available on June 21, 2020 for developments of the disease, we may consider models addressing non-constant population, latency, reinfection, and vaccine. A deterministic model for highly contagious diseases: The case of varicella Classical and Modern Numerical Analysis: Theory, Methods and Practice, Chapman & Hall/CRC Numerical Analysis and Scientific Computing Infectious diseases of humans Numerical modeling and theoretical analysis of a nonlinear advection-reaction epidemic system Interepidemic Intervals in Forced and Unforced SEIR models Modelling Mathematical Methods and Scientific Computation An attempt at a new analysis of the mortality caused by smallpox and of the advantages of inoculation to prevent it Identification of heat-conduction coefficient via method of variational imbedding A note on the existence and stability of an inverse problem for a SIS model Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation Analysis and forecast of COVID-19 spreading in China Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare Qualitative study of a stochastic SIRS epidemic model with information intervention The mathematics of infectious diseases Analysis of SIR epidemic model with information spreading of awareness A contribution to the mathematical theory of epidemics Risk estimation of the SARS-CoV-2 acute respiratory disease outbreak outside China Early dynamics of transmission and control of COVID-19: a mathematical modelling study Global stability of a network-based SIRS epidemic model with nonmonotone incidence rate Dynamical behavior of a stochastic multigroup SIR epidemic model Novel numerical approach to solitary-wave solutions identification of Boussinesq and Korteweg-de Vries equations Coefficient Identification in Euler-Bernoulli Equation from over-posed data Inverse problem for coefficient identification in SIR epidemic models, Computers and Mathematics with Applications Coefficient identification in elliptic partial differential equation Inverse Problem for Coefficient Identification in Euler-Bernoulli Equation Clinical determinants of the severity of Middle East respiratory syndrome (MERS): a systematic review and meta-analysis Mathematical biology. I. An introduction Serial interval of novel coronavirus (COVID-19) infections, International journal of infectious diseases Characterization of the COVID-19 pandemic and the impact of uncertainties, mitigation strategies, and underreporting of cases in South Korea, Italy, and Brazil Mathematical Models in Biology: An Introduction Viral evolution and transmission effectiveness Real-time forecasts of the COVID-19 epidemic in China from COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses The SIR Model for Spread of Disease COVID-19: Epidemiology, Evolution, and Cross-Disciplinary Perspectives Mathematics for Life Science and Medicine Inverse Problem Theory and Methods for Model Parameter Estimation Methods for Solving Incorrect Problems Novel Coronavirus (2019-nCoV) situation reports -World Health Organization (WHO COVID-19 Coronavirus Pandemic Mathematical Modeling and Epidemic Prediction of COVID-19 and Its Significance to Epidemic Prevention and Control Measures