key: cord-0197314-mg04drna authors: Maltezos, S. title: Methodology for Modelling the new COVID-19 Pandemic Spread and Implementation to European Countries date: 2020-06-27 journal: nan DOI: nan sha: 81b27081094296177bc4b76dbc14fc533b1e54ca doc_id: 197314 cord_uid: mg04drna After the breakout of the disease caused by the new virus COVID-19, the mitigation stage has been reached in most of the countries in the world. During this stage, a more accurate data analysis of the daily reported cases and other parameters became possible for the European countries and has been performed in this work. Based on a proposed parametrization model appropriate for implementation to an epidemic in a large population, we focused on the disease spread and we studied the obtained curves, as well as, we investigated probable correlations between the country's characteristics and the parameters of the parametrization. We have also developed a methodology for coupling our model to the SIR-based models determining the basic and the effective reproductive number referring to the parameter space. The obtained results and conclusions could be useful in the case of a recurrence of this repulsive disease in the future. The disease of the new virus COVID-19 which has been pandemic in the world for about 90 days, the "wavefront" of infection has reached its mitigation stage. Therefore, this is the time to turning our thoughts not only to its subsequent, painful and serious implications of this pandemic [1] , [2] , [3] , [4] but also, it could be useful to analyse the way of growth of the disease among the countries until the 10 th of May 2020, as well as, to correlate these with the main parameters that likely played a significant role. In particular, we consider extremely useful to study the specific characteristics of each country that played a role, the financial level or even genetic behaviour against to the new corona virus and the associated disease. Some of these characteristics used as mathematical parameters for performing correlation studies. The results of this study could give us information for preparing more effective defensive strategies or practical "tools" in a possible future return of the pandemic which constitutes the central goal of the present work. The outline of the paper is as follows. In Section 2 we present a theoretical methodology for parametrizing an epidemic, in Section 3 we explain how to couple the present parametrization model with the basic SIR model, in Section 4 we give results relevant to the end-to-end epidemics growth and in Section 5 we discuss the conclusions. Our methodology is based on the parametrization of the growth of the COVId-19 disease that we used in our recent work [5] and also in [6] and [7] , that we call "semi-gaussian of n-degree". It was used for fitting the disease's growth at various indicative countries and it belongs to the model category of "Regression Techniques" for epidemic surveillance. The basic-single term expression of this parametrization model is where the function c(t) applied in an epidemic spread represents the rate of the infected individuals as the new daily reported cases (DRC) and coincides with the function I(t) in the SIR model, as we can see in the following. Also A is constant while n and τ are model parameters. The more analytical approach, in the general case from the mathematical point of view, comes from the fundamental study of the epidemic growth and includes a number of terms in a form of double summation related to the inverse Laplace Transform of a rational function given in [8] , referring to the "Earlier stages of an epidemic in a large population". In this hypothesis, the number of unaffected individuals may be considered to be constant, while any alternation is assumed small compared to the total number of exposed individuals. This function, which could be called "Large Population Epidemic Semi-Gaussian model" (LPE-SG), is the following where A ij are arbitrary amplitudes, n ij are the degrees of the model (assumed fractional in a general case) and τ ij are the time constants representing in our case a mean infection time respectively. Also, M and N are the finite number of terms to be included. It is easy to prove that the "peaking time" of the function of each term depends on the product of n ij by τ ij , that is, t pij = n ij τ ij . In practice, the number of the required terms should be determined according to the shape of the data and the desired achievable accuracy. After investigation of the fitting performance we concluded that, at most, two terms of the above double sum are adequate for our purpose. Also, the cross terms, with indices ij and ji, cannot help more the flexibility of the model. In particular, a) the degree of the model can "cover" any early or late smaller outbreak of the daily cases, while b) the mean infection time is a characteristic inherent parameter of the disease under study and thus should be essentially constant. For these reasons, the expression with one term was adequate in most of the cases, whereas, the 6 free parameters allow a very good flexibility for the fitting. Therefore, we can write For the fitting procedure, we have used two alternative tactics, based on either the daily model or on the cumulative integral of it. The decision depends on the goodness of the fit in each case based on the criterion of minimizing the χ 2 /dof, as we have done in our previous work. [5] . The starting date (at t = 0) was one day before the first reported case (or cases). The cumulative parametrization model and the fitting model take the following forms respectively where the symbol Γ represents the gamma function and the Γ c the upper incomplete gamma function at time t. The above generalized mathematical model, has the advantage of providing more flexibility when the raw data include a composite structure or superposition of more than one growth curves which could be coexisting. This is possible to happen due to restriction measures applied during the evolution of an epidemic. Regardless of the number of terms used, the obtained parameters must be well understood, in the sense of their role in the problem. Let us consider a country where the disease starts to outbreak due to a small number of infected individuals. In this stage we assume that the country has been isolated in a relatively high degree, but of course not ideally. At this point, the disease starts with a transmission rate which depends on the dynamics of the spread in each city and village, while other inherent properties of the disease affect its dynamics (e.g. the immune reaction, the incubation time and recovering time). At this point, we must clarify also the issue of the "size" of the epidemic. The SIR-based models assume that the size, N, that is the total number of individuals exposable to the disease is unchanged during its evolution, a fact which cannot be exactly true. On the other hand, a fraction of the size concerns individuals who are in quarantine for different reasons (due to tracing or for precautionary reasons). Therefore, the size cannot be absolutely constant and the forecasting at the first stage (during the growing of the epidemic) should be very uncertain. In the second stage (around the turning point), the situation is more clear by means of more accurate parametrization, although high fluctuations could still be present. At the third stage (mitigation or suppress), an overall parametrization can be made and any trial for forecasting concerns a likely future comeback of the epidemic. In any of the three stages regardless of the level of uncertainties the parametrization specifies the associated parameters according to the epidemic model used. It is known that the "basic reproductive number" symbolized by R 0 is a very important parameter of the spreading of the epidemic. In the SIR-based models it is determined at the first moments of the epidemic (mathematically at t = 0) and is related to the associated parameters. Moreover, as it is proven in the next, this parameter doesn't depend on the size N of the epidemic. By using the present parametrization model we assume that the size of the epidemic is, not only unknown, but also much smaller than the population of the country or city under study, that is, it constitutes an unbiased sample of the potentially exposable generic population. Once the epidemic is pretty much eliminated, the size could be also estimated "a posteriori" by the help of a SIR-based model. However, in this case, the parameters of the spread, as well as, the reproductive number are already determined by the methodology given in the next. We consider that this more generic approach facilitates the fitting process and improves the accuracy because of the existence of an analytical mathematically optimal solution. 3 Coupling with the SIR-based models 3 The classical model for studying the spread of an epidemic, SIR, belongs to the Mathematical or State-Space category of models along with a large number of other types which are analytically described in [9] . Our model belongs to the continuum deterministic SIR models in the special case applied at the earlier stage of an epidemic, assuming that the population is much greater than the infected number of people [8] . This model can be applied also when the epidemic is in each latest stage where the total number of infected individuals is an unbiased sample of the population. Under these assumptions, this model can be related to the classical SIR model, or even with its extensions (SEIR and SIRD), by means of correlating their parameters. Below, we give a brief description of the basic SIR epidemic model. Let us describe briefly the three state equations of the SIR model: The function S = S(t) represents the number of susceptible individuals, I = I(t) the number of infected individuals and R = R(t) the number of recovered individuals, all referred per unit time, usually measured in [days] . The constant factor a is the transmission rate, the constant factor β is the recovering rate and N is the size of the system (the total number of individuals assumed constant in time), that is, S + I + R = N , at every time. The assumptions concerning the initial and final conditions are, S(0) = 0, S(∞) > 0, I(∞) = 0 and This model does not have an analytical mathematical solution additional the two parameters a and β are constant during the spread of the epidemic. A solution is derived only with the approximation, βR/N < 1, that is, when the epidemic essentially concerns a small number of recovered compared to the total number of individuals. In this case a Taylor's expansion to an exponential function is used. In our study, we work for the general case without this assumption. In our basic model, C(t), the integral of the function c(t), must be compared to the total number of infected individuals I, while the parameter A undertakes the scaling of the particular data. The parameter τ does not necessarily coincides with the inverse of the " mean infection rate", a, but 1/τ expresses an "effective transmission rate" in our model. The parameter n cannot be equalized to any of the parameters of the SIR model. However, this parameter contributes to the key parameter of the epidemic spread, the so-called "basic reproductive number", R 0 , which is defined at t = 0 and is equal to R 0 = (a/β)(S(0)/N ), where β is the "mean recovering rate". Because S(0) ≈ N , it becomes R 0 ≈ a/β. However, the S(0)/N ) represents a basic threshold, the so called "population density", above which the epidemic is initiated and growing when R o = (a/β) > 1. Moreover the "effective reproductive number", R e , a variable as a function of time is also defined by the same way as follows Because the condition for creating an epidemic is R e > 1, the corresponding condition should be S/N > 1/R 0 . Also, at t = 0 should be R e (0) ≡ R 0 , at the peaking time t = t p should be R e (t p ) = 1 and at t = ∞ takes the value R t = R 0 S N (∞) < R 0 . By using the expressions of Eq. 9, including only the value of I and its derivative, one can estimate roughly the R e at any time t. It can be also shown, that the S(∞)/N can be determined by solving the following transcendental equation numerically. Therefore, we can conclude that for R e , at the outbreak of the epidemic (rising branch of the curve) we have, 1 < R e < R 0 , exactly at the peak of the curve, R e = 1 (because S = N/R 0 , as we explain in the next) and at the mitigation stage (leading branch of the curve), R e < 1 and tends to a minimum value at the asymptotic tail of the curve which is Once the above relationship is achieved, the R 0 can be determined by solving the derived algebraic equation. Indeed, this was our initial motivation to perform the following analysis. The methodology for accomplishing it, was based on the idea to exploit the property of our model at its maximum at the peaking time, which is t p = nτ , as it can be easily proven by differentiation. On the other hand, in the SIR model a peak is expected some time for the function I, as the typical effect of the epidemic's spread. Considering that both models can be fitted to the data, in the vicinity of the peak must agree, and therefore, we must claim that I p = C(t p ). Let us first find an expression of the S, R and I at the peaking time, symbolizing them by, S p , R p and I p respectively. In order to relate S with R, we replace I from Eq. 6 into Eq. 8, we obtain from which, taking into account that S(0) ≈ N , we derive the solution The later result at the peaking time gives us an expression of R p Also, according to Eq. 7, at the peaking time we might have From this equation, and using the definition of R 0 , we obtain Based on Eq. 14 we calculate R p as follows Adding the three functions at the peaking time, S p , I p and R p , we derive the algebraic equation In order to achieve an equation independent of size N , we must express I p /N as a function of the model parameters, that is, by using the maximum value of the model curve [5] and the Eq. 8 of the SIR model by integration with upper limit the infinity, as follows Aβτ 0 1−n0 Γ(n 0 + 1) 1 + S(∞) where the symbol Γ represents the gamma function, n 0 and τ 0 are the particular values obtained by a fitting) and τ 0 = 1/β. Replacing the above expression to Eq. 18 and setting s N = S(∞)/N we obtain This transcendental equation can be solved only numerically for R 0 in which the combined unknown s N is also found numerically by using again another transcendental Eq. 10, by using multiple iterations leading to a converging accurate solution within 12 loops. The parameter n of the model is essentially the expresser of R 0 , while the obtained value of R 0 concerns an hypothetical SIR model fitted to the data of the daily reported cases (DRC). From the obtained solution for R 0 we can also calculate the parameter a of SIR model, a = βR 0 , where β can be calculated from the peak value of the daily reported recovered individuals by the formula, β = R p /I tot,p , where I tot,p represents the integral of the DRC curve with upper limit the peaking time t p . In particular, Implementing the above methodology, by using home made software codes written in Matlab platform [10] , we obtained the fitting of the DRC curve for Greece at the mitigation stage, shown in Fig. 1 and Fig. 2 . The fitted parameters, n 0 = 4.57 and τ 0 = 5.96 and the solutions R 0 = 2.90 and s N = 0.06. For the parameter β we used a typical-average value found in the literature where, β = 0.10 and the same value we used for the other analyzed countries. Two characteristic parametrizations of very large normalized size and very small one (64 times smaller), that is, of Belgium and Malta, respectively, are given in Fig. 3 and Fig. 4 . Definitely, without seeing the vertical scale, one cannot distinguish which corresponds a large or a small normalized size. The only visible difference at a glance, is the peaking time (29 and 16 days respectively). 4 Study of the end-to-end epidemic growth For the data analysis we selected the 29 countries of EU, including Switzerland and UK obtained from [11] . The characteristics of the countries relevant to our study are summarized in Table 1 . In particular, we used the population density, the estimated normalized total number of infected individuals (determined by the number of deaths by using a typical constant factor) and the Gross Domestic Product (GDP), nominal per capita. The degree of correlation among the above characteristics and the modelling parameters, was studied by the "theoretical pearson linear correlation coefficient" given by the following formula where X and Y are considered normal random variables, σ x and σ y are the corresponding standard deviations and Cov(X, Y ) is their covariance. However, as it is done in practice, we calculated the "sampling pearson coefficient" (SPC), r(X, Y ), for n observed random pairs (X i , Y i , ..., X n , Y n ), where the X can represent the first selected variable and Y the second one. The correlation study concerned eight pairs, as is illustrated in Table 2 . The conclusions of the linear correlation study are the following: 1. For the population density D: no correlation was found with other parameters. 2. For the model parameters n 0 and τ 0 : strong anti-correlation was found. 3. For the peaking time t p : very strong anti-correlation was found with the basic reproductive number, R 0 . The scatter plot of the basic reproductive number R 0 and the peaking time t p is shown in Fig. 5 . This correlation gives us the following message: the higher R 0 results to less a delay of the upcoming peak in the DRC curve. The obtained slope of the linear fit was −8.4 ± 0.2 days. On the other hand, the R 0 among the analyzed countries, present statistical fluctuations from about 2 to 4.6, obeying roughly a gaussian distribution with mean value 2.96 and standard deviation 0.68 (or relative to mean 23 %). The parameters n and τ also fluctuate, as we can see in Fig. 6 while the peaking time t p shows stochastic characteristics obeying similarly a gaussian distribution with mean 25.7 days and standard deviation 7.8 days (or relative to mean 30 %). Since R 0 fluctuates (and in turn the t p due to their linear correlation) among the different countries randomly without presenting any correlation with their associated parameters, we can conclude that the normalized size of the epidemic can be explained only by taking into account other reasons and aspects related to the way citizens interact and behave as well as the degree of social distances and mobility or transport within a country's major cities. Also, a crucial role played definitely the degree of quarantine and likely some individual biological differentiations (genetic and other related characteristics). The capability for surveying the epidemic spread during the three main stages is very important and could be based on the daily data and the mathematical modelling we presented. In the mitigation stage the surveying is even more useful and crucial when the restriction measures are starting to be relaxed. The crucial condition for a new epidemic reappearance is based on the effective reproductive number, R e , as well as, on the corresponding population threshold. However, because of the large statistical fluctuations caused by the poor statistics of data as well as because of the low slope of the epidemic curve in this stage, it is very hard to achieve accurate numbers, but only a qualitative estimate as follows. The R e can be estimated from the expressions in Eq. 9 and using average numerical approximations of the slope, dI/dt. An alternative and practical formula based on the parametrization model SG-LPE can be easily proven and is, R e = 1 + (nτ /t − 1)/β. This formula is valid only in the vicinity of the peak, namely in the narrow range from 0.5t p to 1.5t p , because the fitted model and the SIR one differ in the slopes at both side tails. Once R e is estimated, the population density threshold, in turn, can be calculated and should be S(t)/N = 1/R e , assuming that the normalized size N can also be estimated by a similar level of accuracy. Therefore the crucial condition in the mitigation stage is written as The derivative has to be calculated as an average slope, ∆I/∆t, preferably at least within one week. Assuming that this slope is I w and the corresponding average cases in a week is I av the crucial condition becomes where we used the typical values, β = 0.1 days −1 and N S(∞) ≈ 15 for having a practical result as a case study. This simplified formula combined with one-week measurements should be very useful because the relative fluctuations of the DRC are expected to be very large. A systematic analysis of the epidemic characteristics of the new virus COVID-19 disease spread is presented in this work. For the mathematical analysis, we used a model that we called LPE-SG which facilitates the parametrization by an analytical mathematical description. We also presented a methodology of its coupling with the SIR-based models aiming to determine the basic and effective reproductive numbers based on the fitted parameters. Analysing the daily reported cases of European countries, we found no correlation between the population density, normalized size or GDP of the countries with respect to the spreading characteristics. Another important finding of our study was a strong anti-correlation, statistically significant, of the basic reproductive number and the peaking time. Moreover, we found that the basic reproductive number in the epidemics studied showed a uniform distribution with a wide range of values. This means that it is mainly influenced by many factors and generic characteristics of the society in a country. Databased analysis, modelling and forecasting of the COVID-19 outbreak Epidemic analysis of COVID-19 in China by dynamical modeling A Robust Stochastic Method of Estimating the Transmission Potential of 2019-nCoV Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data Parametrization Model Motivated from Physical Processes for Studying the Spread of CIVID-19 Epidemic Polynomial growth in branching processes with diverging reproductive number Fractal kinetics of COVID-19 pandemic A Contribution to the Mathematical Theory of Epodemics Mathematical modeling of infectious disease dynamics I would like to thank my daughter V. Maltezou, a Graduate of the Department of Agriculture of the Aristotle University of Thessaloniki and, of Athens School of Fine Arts, for our discussions on the global epidemiological problem, which gave me the warmth and the motivation for doing this work. Also, I thank my colleagues, Prof. Emeritus E. Fokitis and E. Katsoufis, for their insightful comments and our useful discussions.