key: cord-0128594-ic0xi5tg authors: Amaro, J. E.; Dudouet, J.; Orce, J. N. title: Global analysis of the COVID-19 pandemic using simple epidemiological models date: 2020-05-14 journal: nan DOI: nan sha: afb462d1840f57be1e7800c3344b53cedd4a5d1c doc_id: 128594 cord_uid: ic0xi5tg Several analytical models have been used in this work to describe the evolution of death cases arising from coronavirus (COVID-19). The Death or `D' model is a simplified version of the SIR (susceptible-infected-recovered) model, which assumes no recovery over time, and allows for the transmission-dynamics equations to be solved analytically. The D-model can be extended to describe various focuses of infection, which may account for the original pandemic (D1), the lockdown (D2) and other effects (Dn). The evolution of the COVID-19 pandemic in several countries (China, Spain, Italy, France, UK, Iran, USA and Germany) shows a similar behavior in concord with the D-model trend, characterized by a rapid increase of death cases followed by a slow decline, which are affected by the earliness and efficiency of the lockdown effect. These results are in agreement with more accurate calculations using the extended SIR model with a parametrized solution and more sophisticated Monte Carlo grid simulations, which predict similar trends and indicate a common evolution of the pandemic with universal parameters. The SIR (susceptible-infected-recovered) model is widely used as first-order approximation to viral spreading of contagious epidemics [1] , mass immunization planning [2, 3] , marketing, informatics and social networks [4] . Its cornerstone is the so-called "mass-action" principle introduced by Hamer, which assumes that the course of an epidemic depends on the rate of contact between susceptible and infected individuals [5] . This idea was extended to a continuous time framework by Ross in his pioneering work on malaria transmission dynamics [6] [7] [8] , and finally put into its classic mathematical form by Kermack and McKendric [9] . The SIR model was further developed by Kendall, who provided a spatial generalization of the Kermack and McKendrick model in a closed population [10] (i.e. neglecting the effects of spatial migration), and Bartlett, who -after investigating the connection between the periodicity of measles epidemics and community size -predicted a traveling wave of infection moving out from the initial source of infection [11, 12] . More recent implementations have considered the typical incubation period of the disease and the spatial migration of the population. The pandemic has ignited the submission of multiple manuscripts in the last weeks. Most statistical distributions used to estimate disease occurrence are of the binomial, Poisson, Gaussian, Fermi or exponential types. Despite their intrinsic differences, these distributions generally lead to similar results, assuming independence and homogeneity of disease risks [13] . In this work, we propose a simple and easy-to-use epidemiological model -the Death or D model [14] -that can be compared with data in order to investigate the evolution of the infection and deviations from the predicted trends. The D model is a simplified version of the SIR model with analytical solutions under the assumption of no recovery -at least during the time of the pandemic. We apply it globally to countries where the infestation of the COVID-19 coronavirus has widespread and caused thousands of deaths [15, 16] . Additionally, D-model calculations are benchmarked with more sophisticated and reliable calculations using the extended SIR (ESIR) and Monte Carlo Planck (MCP) models -also developed in this work -which provide similar results, but allow for a more coherent spatial-time disentanglement of the various effects present during a pandemic. A similar ESIR model has recently been proposed by Squillante and collaborators for infected individuals as a function of time, based on the Ising model -which describes ferromagnetism in statistical mechanics -and a Fermi-Dirac distribution [17] . This model also reproduces a posteriori the COVID-19 data for infestations in China as well as other pandemics such as Ebola, SARS, and influenza A/H1N1. The SIR model considers the three possible states of the members of a closed population affected by a contagious disease. It is, therefore, characterized by a system of three coupled non-linear ordinary differential equations [18] , which involve three time-dependent functions: • Susceptible individuals, S(t), at risk of becoming infected by the disease. • Infected individuals, I(t). • Recovered or removed individuals, R(t), who were infected and may have developed an immunity system or die. The SIR model describes well a viral disease, where individuals typically go from the susceptible class S to the infected class I, and finally to the removed class R. Recovered individuals cannot go back to be susceptible or infected classes, as it is, potentially, the case of bacterial infection. The resulting transmission-dynamics system for a closed population is described by where λ > 0 is the transmission or spreading rate, β > 0 is the removal rate and N is the fixed population size, which implies that the model neglects the effects of spatial migration. Currently, there is no vaccination available for COVID-19, and the only way to reduce the transmission or infection rate λ -which is often referred to as "flattening the curve"-is by implementing strong social distancing and hygiene measures. The system is reduced to a first-order differential equation, which does not possess an explicit solution, but can be solved numerically. The SIR model can then be parametrized using actual infection data to solve I(t), in order to investigate the evolution of the disease. In the D model, we make the drastic assumption of no recovery in order to obtain an analytical formula to describeinstead of infestations -the death evolution by COVID-19. This can be useful as a fast method to foresee the global behavior as a first approach, before applying more sophisticated methods. We shall see that the resulting D model describes well enough the data of the current pandemics in different countries. The main assumption of the D model is the absence of recovery from coronavirus, i.e. R(t) = 0, at least during the pandemic time interval. This assumption may be reasonable if the spreading time of the pandemic is much faster than the recovery time, i.e. λ ≫ β. The SIR equations are then reduced to the single equation of the well-known SI model, which represents the simplest mathematical form of all disease models, where the infection rate is proportional to both the infected, I, and susceptible individuals N −I. Equation 5 is trivially solved by multiplying by dt and dividing by (N − I)I, Integrating over an initial t = 0 and final t we obtain where I 0 = I(t 0 ). Taking the exponential on both sides Finally, solving this algebraic equation we obtain the solution I(t) which can be written in the form where we have defined the constants The parameter b is the characteristic evolution time of the initial exponential increase of the pandemic. The constant C is the initial infestation rate with respect to the total population N . Assuming C ≪ 1, Eq. 11 yields In order to predict the number of deaths in the D model we assume that the number of deaths at some time t is proportional to the infestation at some former time τ , that is, where µ is the death rate, and τ is the death time. With this assumption we can finally write the D-model equation as where a = µI 0 e −τ /b , c = C e −τ /b , and a/c yields the total number of deaths predicted by the model. This is the final equation for the D-model, which presents a similar shape to the well-known Woods-Saxon potential for the nucleons inside the atomic nucleus or the bacterial growth curve. The rest of the parameters, µ, τ , I 0 and N are embedded in the parameters a, b, c, which represent space-time averages and can be fitted to the timely available data. In Fig. 1 , we present the fit of the D-model to the COVID-19 death data for China, where its evolution has apparently been controlled and the D function has reached the plateau zone, with few increments over time, or fluctuations that are beyond the model assumptions. This plot shows the duration of the pandemic -about two months to reach the top end of the curve -and the agreement, despite the crude assumptions, between data and the evolution trend described by the D-model. This agreement encourages the application of the D model to other countries in order to investigate the different trends. In order to get insight into the stability and uncertainty of our predictions, Fig. II shows the evolution of a, b, and c and other model predictions from fits to the daily data in Spain. The meaning of these quantities is explained below: • The parameter a is the theoretical number of deaths at the day corresponding to t = 0. In general, it differs from the experimental value and can be interpreted as the expected value of deaths that day. Note that experimental data may be subject to unknown systematic errors and different counting methods. • The parameter b, as mentioned above, is the char- acteristic evolution time. During the initial expo-nential behavior, it indicates the number of days for the number of deaths to double. Moreover, 1/b is proportional to the slope of the almost linear behavior in the mid region of the D function. That behavior can be obtained by doing a Taylor expansion around t 0 = −b ℓn c and is given by • The parameter c is called the inverse dead factor because D(t → ∞) = a/c provides the asymptotic or expected total number of deaths. Figure II shows the stable trend of the parameters between days 19 to 24 (corresponding to March 27-30), right before reaching the peak of deaths cases, which occurred in Spain around April 1. Such stability validates the D-model predictions during this time. However, a rapid change of the parameters is observed, especially for a, once the peak is reached, drastically changing the prediction of the number of deaths given by a/c. This sudden change results in the slowing down of deaths per day and longer time predictions T 95 and T 99 . The parameters of the D model correspond to average values over time of the interaction coefficients between individuals, i.e. they are sensitive to an additional external effect on the pandemic evolution. These may include the lockdown effect imposed in Spain in March 14 and other effects such as new sources of infection or a sudden increase of the total susceptible individuals due to social migration and large mass gatherings [20] . It is not possible to identify a specific cause because its effects are blurred by the stochastic evolution of the pandemic, which is why any reliable forecast presents large errors. One can also determine deaths/day rates by applying the first derivative to Eq. 15, which allows for a determination of the pandemics peak and evolution after its turning point. The D model describes well the cumulative deaths because the sum of discrete data reduce the fluctuations, in the same way as the integral of a discontinuous function is a continuous function. However, the daily data required for D ′ have large fluctuations -both statistical and systematic -which normally gives a slightly different set of parameters when compared with the D model. Using the D model fitted to cumulative deaths allows to compute deaths/day as where ∆t = 1 day. Figure 3 shows that Eqs. 18 and 19 yield similar parameters, as the time increment is small enough compared with the time evolution of the D(t) function. Hence, the first derivative D ′ (t) can be used to describe deaths per day. In addition, Fig. 4 shows that the parameters may be different for both D and D ′ functions using cumulative and daily deaths, respectively, as shown for Spain on April 5. It is also important to note that b is directly proportional to the full width at half maximum (F W HM ) of the D ′ (t) distribution, As shown below, the b parameter presents typical values between 4 and 10 for most countries undergoing the initial exponential phase, which yields a minimum and maximum time of 14 and 35 days, respectively, between the two extreme values of the F W HM . C. Dn model with two or more channels of infection Some models [21] include changes in the transmission rate due to various interventions implemented to contain the outbreak. The simple D model does not allow to do this explicitly, but changes in the spread can be taken into account by considering the total D or D n function as the sum of two or more independent D-functions with different parameters, which may reveal the existence of several independent sources, or virus channels. An example is shown in Fig. 5 , where the two-channel function has been fitted with six parameters to the Spanish data up to April 13. The fit reveals a second, smaller death peak, which substantially increase the number of deaths per day and the duration of the pandemic. This is equivalent to add a second, independent, source of infection several weeks after the initial pandemic. The second peak may as well represent a second pandemic phase driving the effects of quarantine during the descendant part of the curve. Additionally, the cumulative D-function can also be computed with a two-channel function, which provides, as shown in Fig. 6 , a more accurate prediction for the total number of deaths and clearly illustrates the separate effect of both source peaks. It is interesting to note that for large t, a ≈ a 2 , c ≈ c 2 and b 2 ≈ 2b. In such a case, the total number of deaths expected during the pandemic is given by D 2 (∞) = 2a/c. The D-model can also be used to estimate I(t) using the initial values of I 0 = I(0) and the total number of susceptible people N = S(0). The initial value of N is unknown, and not necessarily equal to the population of the whole country since the pandemic started in localized areas. Here, we shall assume N = 10 6 , although plausible values of N can be tens of millions. Note that the no-recovery assumption of the D model is unrealistic, and this calculation only provides an estimation of the number of individuals that were infected at some time, independently of whether they recovered or not. From the definition of D(t) in Eq. 14, the following relations between the several parameters of the model were extracted Solving the first two equations for µ and I 0 we obtain Hence, µ can be computed by knowing N . However, to obtain I 0 one needs to know the death time τ . This has been estimated to be about 15 to 20 days for COVID-19 cases, which can be used to compute two estimates of I(t). These are given in Fig. 7 for the case of Spain. Since there is no recovery in the D model, the total number of infected people is I ∼ N for large t, i.e. N = 10 6 in our case. In Fig. 7 which also depends on N and τ . For N = 10 6 , the ratio D/I increases similarly to the separate functions D and I between the initial and final values, These results depend on the total susceptible population N . However, the ratio of infected with respect to susceptibles, I/N , is independent on N . This function depends only on τ and is shown in the bottom panel of Fig. 8 for τ = 15 and 20 days, which reveals the rapid spread of the pandemic. Accordingly, between 10% and 30% of the susceptibles were infected in March 7, and one month later (April 6), when the fit was made, all susceptibles had been infected. This does not means that the full population of the country got infected, since the number N is unknown and, for instance, excludes individuals in isolated regions, and it may additionally change because of spatial migration, not considered in the model. D-model predictions can be compared with more realistic results given by the complete SIR model [9, 11] , which is characterized by Eqs. 1, 2, 3 and 4 with initial conditions R(0) = 0, I(0) = I 0 , S(0) = N − I 0 . The SIR system of dynamical equations can be reduced to a non-linear differential equation. First, dividing Eq. 1 by Eq. 3 one obtains, which yields the following exponential relation between the susceptible and the removed functions, Moreover, Eq. 4 provides a relation between the infected and the removed functions, which yields, by inserting into Eq. 3, the final SIR differential equation In order to obtain R(t) we only need to solve this first-order differential equation with the initial condition so that s + i + r = 1, then r(t) verifies which can be solved numerically, or by approximate methods in some cases. In Ref. [9] , a solution was found for small values of the exponent λN r/β. For the coronavirus pandemic, however, this number is expected to increase and be close to one at the pandemic end. At this point, we propose a modification of the standard SIR model. Instead of solving Eq. 38 numerically and fitting the parameters to data, the solution can be parametrized as which presents the same functional form as the D-model and, conveniently, provides a faster way to fit the model parameters by avoiding the numerical problem of solving Eq. 38. In fact, numerical solutions of the SIR model present a similar step function for R(t). Additionally, one can assume that D(t) is proportional to R(t), and can also be written as where a 2 , c 2 = s 0 and b 2 = β/(λN ) are unknown parameters to be fitted to deaths-per-day data, together with the three parameters of the r(t)-function: a, b, c. Figure 9 shows fits of the ESIR model to daily deaths in Spain during the coronavirus spread. The use of no boundary condition for the number of deaths (left panel) is not an exact solution of the SIR differential equation. A way to solve this problem is to impose the condition D ′ (∞) = 0, as the number of deaths must stop at some time. Numerically, it is enough to choose a small value of D ′ (t) for an arbitrary large t. The middle and right panels of Fig. 9 show different boundary conditions of D ′ (100) = 10 and D ′ (100) = 5, respectively, which yield the same results and the expected behavior for a viral disease spreading and declining. It is also consistently observed (e.g. see middle and right panels of Fig. 9 ), that at large t, r(t) → a c ≈ 1, which essentially means that most of the susceptible population N recovers, as we previously inferred from the D model. This, together with the fact that c 2 can always be adjusted to 1, leaves the ESIR model with essentially 4 free parameters to fit to the daily death data; i.e. the same number of parameters than the original SIR model. As shown in Fig. 10 , ESIR fits reproduce well the long flattening behavior observed in UK, USA, Germany or Iran, whereas it fails to reproduce the more-pronounced double-peak structure typically observed in countries like France, Italy, Spain or Belgium. As previously done with the D model, one can also expand the ESIR model to accommodate this apparent failure to take lockdown effects into account. Similarly, the ESIR2 model is proposed as, with where we have assumed that a = a ′ and c = 2a to accommodate that r(∞) → 1 and c 2 = 1. Hence, we are left with five free parameters. Finally, Fig. 11 shows the comparison between the ESIR2 and D ′ 2 fits to real data for countries where COVID-19 has widely spread: Belgium, USA, France, Germany, Iran, Italy, Spain and UK, USA. Death data are taken from Refs. [19, 22, 23] and consider 7-day average smoothing to correct for anomalies in data collection such as the typical weekend staggering observed in various countries, where weekend data are counted at the beginning of the next week. Real error intervals are extracted from the correlation matrix. As discussed in Section 2.3, the reduced D ′ 2 model has been used with a = a 2 and c = c 2 . Although arising from different assumptions, both models provide similar data descriptions and predictions, with slightly better values of χ 2 per degree of freedom for the ESIR2 model. It is also interesting to note that the reduced ESIR2 model with five parameters yields similar results to the full ESIR2 model, with eight parameters. As data become available, daily predictions vary for both ESIR2 and the D ′ 2 models. This is because the model parameters are actually statistical averages over space-time of the properties of the complex system. No model is able to predict changes over time of these properties if the physical causes of these changes are not included. The values of the model parameters are only well defined when the disease spread is coming to an end and time changes in the parameters have little influence. More sophisticated calculations can be compared with ESIR2 and D ′ 2 predictions. In particular, Monte Carlo (MC) simulations have also been performed in this work for the Spanish case [24] , which consist of a lattice of cells that can be in four different states: susceptible, infected, recovered or death. An infected cell can transmit the disease to any other susceptible cell within some random range R. The transmission mechanism follows principles of nuclear physics for the interaction of a particle with a target. Each infected particle interacts a number n of times over the interaction region, according to its energy. The number of interactions is proportional to the interaction cross section σ and to the target surface density ρ. The discrete energy follows a Planck distribution law depending on the 'temperature' of the system. For any interaction, an infection probability is applied. Finally, time-dependent recovery and death probabilities are also applied. The resulting virus spread for different sets of parameters can be adjusted from COVID-19 pandemic data. In addition, parameters can be made time dependent in order to investigate, for instance, the effect of an early lockdown or large mass gatherings at the rise of the pandemic. As shown in Fig. 12 , our MC simulations present similar results to the D ′ 2 model, which validates the use of the simple D-model as a first-order approximation. More details on the MC simulation will be presented in a separate manuscript [24] . Interestingly, MC simulations follow the data trend up to May 11 without any changes in the parameters for nearly two weeks. An app for Android devices, where the Monte Carlo Planck model has been implemented to visualize the simulation is available from Ref. [25] . In order to investigate the universality of the pandemic, it is interesting to compare all countries by plotting the D model in terms of the variable (t − t 0 )/b, where t 0 is the maximum of the daily curve given by t max = −b ℓn(c). By shifting Eq. 15 by t max = −b ℓn(c) and dividing by t max = a/c, the normalized D function is given by, The left of Fig. 13 shows similar trends for the normalized D curves of different countries, which suggests a universal behavior of the COVID-19 pandemic. Only Iran seems to slightly deviate from the global trend, which may indicate an early and more effective initial lockdown. A similar approach can be done for the daily data using peaks. The global models considered in this work present some differences with respect to other existing models. First, in this work we have tried to keep the models as simple as possible. This allows to use theoretical-inspired analytical expressions or semi-empirical formulae to perform the data analysis. The use of semi-empirical expressions for describing physical phenomena is recurrent in physics. One of the most famous is the semi-empirical mass formula from nuclear physics. Of course the free parameters need to be fitted from known data, but this allowed to obtain predictions for unknown elements. In our case we were inspired by the well known statistical SIR-kind models slightly modified to obtain analytical expressions that carry the leading time dependence. We have found that the D and D 2 models allow a fast and efficient analysis of the pandemics in the initial and advanced stages. Our results show that the time dependence of the pandemic parameters due to the lockdown can be effectively simulated by the sum of two Dfunctions with different widths and heights and centered at different times. The distance between the maxima of the two D-functions should be a measure of the time between the effective pandemic beginning and lockdown. In the Spanish case this is about 20 days. Taking into account that lockdown started in March 14, this marks the pandemic starting time as about February 22. Had the lockdown started on that date, the deaths would had been highly reduced. The smooth blending between the two peaks provides a transition between the two statistical regimes (or physical phases) with and without lockdown. The Monte Carlo simulation results are in agreement with our previous analysis with the D and D 2 models. The Monte Carlo generates events in a population of in-dividuals in a lattice or grid of cells. We simulate the movement of individuals outside of the cells and interactions with the susceptible individuals within a finite range. The randon events follow statistical distributions based on the exponential laws of statistical mechanics for a system of interacting particles, driven by macroscopic magnitudes as the temperature, and interaction probabilities between individuals, that can be related to interaction cross sections. The Monte Carlo simulation spread the virus in spacetime, and allows also space-time dependence on the parameters. In this work we have made the simplest assumptions, only allowing for a lockdown effect by reducing the range of the interaction starting on a fixed day. This simple modification allowed to reproduce nicely the Spanish death-per-day curve. The lockdown produces a relatively long broadening of the curve and a slow decay. Similar MC calculations can be performed in several countries to infer the devastating effect of a late lockdown as compared with early lockdown measures. The later is the case of South Africa and other countries, which have not reached the exponential growth. The Death and extended SIR models are simple enough to provide fast estimations of pandemic evolution by fitting spatial-time average parameters, and present a good first-order approximation to understand secondary effects during the pandemic, such as lockdown and population migrations, which may help to control the disease. Similar models are available [17, 26] , but challenges in epidemiological modeling remain [27] [28] [29] [30] . This is a very complex system, which involves many degrees of freedom and millions of people, and even assuming consistent disease reporting -which is rarely the case -there remains an important open question: Can any model predict the evolution of an epidemic from partial data? Or similarly, Is it possible, at any given time and data, to measure the validity of an epidemic growth curve? We finally hope that we have added new insightful ideas with the Death, the extended SIR and Monte Carlo models, which can now be applied to any country which has followed the initial exponential pandemic growth. Discussion: the Kermack-McKendrick epidemic threshold theorem Stability analysis of SIR model with vaccination Seasonality and the effectiveness of mass vaccination Application of SIR epidemiological model: new trends Epidemic disease in England -the evidence of variability and of persistency of type Report on the Prevention of Malaria in Mauritius An application of the theory of probabilities to the study of a priori pathometry. -Part I An application of the theory of probabilities to the study of a priori pathometry.-Part III A contribution to the mathematical theory of epidemics Discussion of Measles periodicity and community size by Measles Periodicity and Community Size Deterministic and Stochastic Models for Recurrent Epidemics Basic Models for Disease Occurrence in Epidemiology The D model for deaths by COVID-19 The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health The latest 2019 novel coronavirus outbreak in Wuhan, China WHO. Coronavirus disease 2019 Attacking the Covid-19 with the Isingmodel and the Fermi-Dirac Distribution Function The SIR model and the Foundations of Public Health Impact of non-pharmaceutical interventions against COVID-19 in Europe: a quasi-experimental study Inferring COVID-19 spreading rates and potential change points for case number forecasts Special Issue on Challenges in Modelling Infectious Disease Dynamics Modeling infectious disease dynamics in the complex landscape of global health Mathematical epidemiology: past, present, and future True epidemic growth construction through harmonic analysis The authors thank useful comments from Emmanuel Clément, Araceli Lopez-Martens, David Jenkins, Ra-mon Wyss, Liam Gaffney and Hans Fynbo. This work was supported by the Spanish Ministerio de Economía y Competitividad and European FEDER funds (grant FIS2017-85053-C2-1-P), Junta de Andalucía (grant FQM-225) and the South African National Research Foundation (NRF) under Grant 93500.