key: cord-0983915-2xk2p9a8 authors: Christopoulos, D. T. title: A Novel Approach for Estimating the Final Outcome of Global Diseases Like COVID-19 date: 2020-07-04 journal: nan DOI: 10.1101/2020.07.03.20145672 sha: 6f6b2f1b22d625b31e0e9d2ac7639eb00545c2c2 doc_id: 983915 cord_uid: 2xk2p9a8

The existence of a universal law which maps the bell curve of daily cases to a sigmoid curve for cumulative ones is used for making robust estimations about the final outcome of a disease. Computations of the real time effective reproduction rate are presented and its limited usefulness is derived. Using the methods ESE and EDE we are able to find the inflection point of the cumulative curve under consideration and study its time evolution. Since mortality processes tend to follow a Gompertz distribution, we apply its properties and introduce novel estimations for both the time remaining after inflection time and the capacity of the curve. Special properties of sigmoid curves are used for assessing the quality of estimation and as indices for the cycle completion. An application is presented for the COVID-19 evolution in the most affected countries and the World.

Modern Epidemiology is based mainly on the mathematics of dynamical systems that describe the time evolution of basic epidemiological variables, after having established a set of causal relations between them. Although it is a well known mathematical field which has given many successes, the fact that those models need calibration for determining the value of their parameters, together with their tendency to give exponential growth for the mid and long term output, makes them not suitable for all diseases. Another disadvantage is the extensive uncertainty intervals given for each initial estimate, which make all predictions uncertain and closer to "alarm calls" than to scientific results.
After COVID-19 our life has drastically changed due to the social distancing measures, thus we need reliable tools for estimating the time evolution and the final outcome per country. The current work focuses on the properties of every cumulative curve, either confirmed cases, deaths or recovered, in order to make robust inferences about both the remaining time and the final outcome at the end of the current disease cycle. The only requirement is a small initial waiting time until an inflection point has occurred. Structure of the paper: introduction, models & limitations, introducing an alternative view, comparison of the two approaches, time dynamics of COVID-19, discussion.

Modern mathematical epidemiological models can be classified in a few categories: the compartment models [17], [18] with their enhancements, extensions and generalizations, and the stochastic models, see [1] for an extensive presentation. A short history can be found in [3], the three basic models (SIS, SIR, SIRS) are explained in [14], while from the mathematical point of view a rigorous classification is presented in [15] and a more detailed presentation with example codes can be found in [16]. The procedure followed by those dynamical systems is (i) the creation of causal differential or difference equations between the variables Susceptible (S), Exposed (E), Infected (I), Recovered (R), Deaths (D) or other model dependent ones, (ii) calibrating the model to real time data and estimating its parameters, (iii) using it for making predictions about the time evolution of the disease. A typical epidemiological model can be described by the set of differential equations of Eq. 1, for approximately continuous time

$$
\begin{aligned}
\frac{dS}{dt} &= f_s(S, E, I, R, \ldots; \theta_S) \\
\frac{dE}{dt} &= f_e(S, E, I, R, \ldots; \theta_E) \\
\frac{dI}{dt} &= f_i(S, E, I, R, \ldots; \theta_I) \\
\frac{dR}{dt} &= f_r(S, E, I, R, \ldots; \theta_R) \\
&\;\;\vdots
\end{aligned}
\tag{1}
$$

or by the set of difference equations of Eq. 2 for discrete time $n = 0, 1, \ldots$, usually in days or weeks

$$
\begin{aligned}
S_{n+1} &= f_s(S_n, E_n, I_n, R_n, \ldots; \theta_S) \\
E_{n+1} &= f_e(S_n, E_n, I_n, R_n, \ldots; \theta_E) \\
I_{n+1} &= f_i(S_n, E_n, I_n, R_n, \ldots; \theta_I) \\
R_{n+1} &= f_r(S_n, E_n, I_n, R_n, \ldots; \theta_R) \\
&\;\;\vdots
\end{aligned}
\tag{2}
$$

Here S, E, I, R, ... stand for the fractions of Susceptible, Exposed, Infectious, Recovered, ... respectively of the total population and $\theta_{S,E,I,R,\ldots}$ is the relevant vector of parameters needed for modeling each variable. The functions that describe the rate of evolution can be either linear or non linear, while the parameters can be fixed or time dependent, so an extended class of models can be generated by the above 'equations of motion'. As in every typical useful dynamical system, there always exists an equilibrium state for the models of Eq. 1 which describes the long term behavior of the population after the beginning of the disease. That process may or may not be oscillating, depending on the estimated values of the involved parameters. Our task is not to present an extended literature review of epidemiological models, but to compare their predictions with the non parametric treatment of our work.

Figure 1: Real Time $R_t$ for UK

... then both confirmed cases and deaths will continue to increase exponentially, following the initially calculated basic effective rate $R_0$, so a very rough estimation for the expected confirmed cases after time $t$ will be $C(t) = C_0 R_0^{t/\tau}$, with $\tau$ the serial interval or generation time. For example, if we take the case of UK and compute its $R_t$, using [10] with serial interval 4 days (sd = 4.75) & weekly smoothing, then from Fig. 1 we see that on March 14th $R_t \approx 2$, while from [4] we have $C_0 = 1144$, thus the exponential phase could be roughly described by the above simple formula as $C(t) = 1144 \cdot 2^{t/4}$. The expected cases after a month, given $\tau = 4$ days [20], will then be $C(30) \sim 207K$, much greater than the reported value of 90K. If we recall that the real number of infected cases is many times the reported one [2], then the above estimation could further increase and give unacceptably large numbers.
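The rough projection above can be reproduced in a few lines. This is only a minimal sketch of the formula $C(t) = C_0 R^{t/\tau}$ with the UK numbers quoted in the text ($C_0 = 1144$, $R \approx 2$, $\tau = 4$ days, $t = 30$ days); the function name `projected_cases` is a hypothetical helper introduced for illustration.

```python
def projected_cases(c0, r, tau_days, t_days):
    """Rough exponential projection C(t) = C0 * R**(t / tau).

    c0       -- cumulative confirmed cases at the starting date
    r        -- reproduction rate assumed constant over the horizon
    tau_days -- serial interval (generation time) in days
    t_days   -- projection horizon in days
    """
    return c0 * r ** (t_days / tau_days)


# UK values quoted in the text: C0 = 1144 on March 14th, R ~ 2, tau = 4 days.
uk_projection = projected_cases(c0=1144, r=2.0, tau_days=4.0, t_days=30.0)
print(f"Projected cases after 30 days: {uk_projection:,.0f}")  # roughly 207K
```

The result is very sensitive to the assumed rate and serial interval, which is one reason such a projection can only be read as a very rough upper-level picture of the exponential phase.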
A desired feature for each model would be its ability to predict the inflection point of the cumulative confirmed cases or deaths, since after that point the situation tends to enter a decreasing mode, at least for the cycle under consideration. During the recent COVID-19 pandemic most countries applied NPIs in order to reduce $R_t$ below unity; however, even if they succeeded in keeping it permanently there, no effective improvement was achieved, despite the commonly accepted rule that a disease has been controlled that way. For example Italy after March 27 had $R_t < 1$, but that did not make any difference in either daily confirmed cases or deaths. The same holds for all heavily affected Western countries: Spain, France, UK and US have followed a similar pattern, namely national lockdown and keeping the rate below unity, and the time evolution of the virus was not affected.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted July 4, 2020.

Figure 2: Daily $R_t$ for COVID-19 in Greece

On the other hand there existed countries like Greece for which, after releasing lockdown, the daily computed $R_t$ was many times above 1 and nothing bad happened, see Fig. 2 (computed with [10]); thus the reproduction rate cannot serve as a reliable proxy for estimations, it can serve just as a picture of the currently observed disease spread dynamics. Mathematical models can be calibrated and fitted to the data of a specific time period with success, see for example [12]; however, when a mid to long term forecast is requested, they give extremely large interval estimations, with mean values also very large compared to those actually realised a few months later.
For example, at the beginning of COVID-19 a study estimated the overall deaths in UK & US at extremely large numbers, 510K & 2.2M respectively, just for a seven month window, see Fig. 1 of [11]. Such estimations rang as an alarm for taking social distancing measures in UK; however, even after adopting the suggested interventions, the disease did not slow down its evolution. Currently UK and US have 44K and 128K deaths respectively, far fewer than the predicted ones.

Even for a country like Greece, which was one of the countries that took measures very early and succeeded in combating the disease, a relevant study using a SEIR model predicted a cumulative number of confirmed cases of approximately 12K with a confidence interval [4.5K, 27.5K], see Fig. 3 of [21]. Currently Greece has a cumulative number of 3.4K confirmed cases, smaller even than the lower edge of the predicted interval.

Any disease can be considered as a time evolving process which has a daily output, either confirmed cases, deaths or recovered, representing the rate, while the relevant cumulative variables give the total outcome from the beginning and are always S-shaped curves or sigmoids. That is a kind of universal law which governs all such processes and maps the bell curve of the rate to the sigmoid curve of the total outcome, see Fig. 3 and observe that the maximum of the daily rate corresponds to the inflection of the cumulative outcome. More examples of the law for a broad range of processes can be found in [19]. Why do we care about the inflection point so much? Contrary to the rate curve, the sigmoid one is smoother due to the relatively small changes added every day.
That works by itself as a kind of smoothing of the added noise and makes things easier from a computational aspect. Mathematically speaking, the Gompertz curve [13], which has been shown to describe mortality processes, and its 1st and 2nd derivatives are defined in Eq. 3, 4, 5,

$$f(x) = L\, e^{-b\, e^{-\kappa x}} \tag{3}$$

$$f'(x) = L\, b\, \kappa\, e^{-\kappa x}\, e^{-b\, e^{-\kappa x}} \tag{4}$$

$$f''(x) = L\, b\, \kappa^2\, e^{-\kappa x}\, e^{-b\, e^{-\kappa x}} \left(b\, e^{-\kappa x} - 1\right) \tag{5}$$

It is evident that the inflection point is $p = \ln(b)/\kappa$ and we can check that it corresponds to the maximum of the bell curve. Since the final outcome of the cumulative sigmoid curve is just $L$, and $f(p) = L/e$, we conclude that Eq. 6 holds

$$L = e\, f(p) \tag{6}$$

and that means we are able to make a first estimation for the capacity $L$ of the curve: if we have a method to compute the inflection point $p$, then a first estimation for the expected capacity would be just $e$ times the observed cumulative outcome at the time of the inflection point. Thus we have proven the next Lemma 3.1, or more conveniently the 'e-rule'.

Lemma 3.1. If a cumulative variable like the total confirmed cases or deaths for a disease (i) resembles a Gompertz sigmoid pattern & (ii) has reached its inflection point, then its final maximum will be at least $e$ times the observed value at the time of inflection.

Another key question is when approximately that maximum is expected to occur. For that task we recall again the properties of the Gompertz distribution and compute the value of the function at $e$ times the inflection point

$$\frac{f(e\,p)}{L} = e^{-b^{\,1-e}}$$

The above function converges to unity very rapidly for values of $b > 1$, so we have proven the next Lemma 3.2.

Lemma 3.2. Given the assumptions of Lemma 3.1, the value of the cumulative variable under consideration at time $e\,p$ is rapidly approaching the limit value for the current cycle of the disease.
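Both lemmas can be checked numerically. The sketch below assumes the standard Gompertz form $f(x) = L\,e^{-b\,e^{-\kappa x}}$, which is consistent with the inflection point $p = \ln(b)/\kappa$ stated in the text; the parameter values are hypothetical and chosen only for illustration.

```python
import math


def gompertz(x, L, b, kappa):
    """Gompertz sigmoid f(x) = L * exp(-b * exp(-kappa * x)) (assumed form)."""
    return L * math.exp(-b * math.exp(-kappa * x))


# Hypothetical parameters, not fitted to any country.
L_cap, b, kappa = 1000.0, 5.0, 0.1

# Inflection point p = ln(b) / kappa.
p = math.log(b) / kappa

# 'e-rule' (Lemma 3.1): the capacity equals e times the value at inflection.
value_at_p = gompertz(p, L_cap, b, kappa)
e_rule_estimate = math.e * value_at_p

# Lemma 3.2: fraction of the capacity reached at time e * p,
# compared with the closed form exp(-b**(1 - e)).
fraction_at_ep = gompertz(math.e * p, L_cap, b, kappa) / L_cap
closed_form = math.exp(-b ** (1.0 - math.e))

print(f"inflection time p        = {p:.2f}")
print(f"e * f(p)                 = {e_rule_estimate:.2f} (capacity L = {L_cap})")
print(f"f(e p) / L               = {fraction_at_ep:.4f}")
print(f"exp(-b^(1-e)) closed form = {closed_form:.4f}")
```

For an exact Gompertz curve the e-rule reproduces the capacity exactly; with real, noisy data both the inflection point and the value observed there are estimates, so the rule gives only a first approximation.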
Although we can use the confirmed cases of a disease in order to study the time evolution, it is preferable to use the cumulative deaths because their count is more reliable, at least for countries that present a higher level of transparency; we can simply argue that "dead cannot be hidden". So we focus on the curve of total losses until the current date, starting from the time of the first reported loss in each country/region of interest.

A first task is to use the method Extremum Distance Estimator (EDE), see [5], [7], which is a rigid method in the sense that the inflection point has to have been passed for a sufficient time before an estimation can be given. The two critical points correspond to the two Unit Invariant Knee (UIK) points, see [6], for the first (convex) and the second (concave) part of the curve. Since the curve begins its main exponential growth at the first UIK, while after the second it enters the slowing phase, we have an objective way to divide the process. If we follow the definitions of method EDE, then the sequence of points in ascending order is $\{x_1, \chi_{F_1}, \chi_D, \chi_{F_2}, x_n\}$, with $\chi_D$ the estimation for the inflection point, thus we can divide the time evolution into the next four stages. In order to be able to infer the percentage of completion for stage IV, given a death curve for a disease, we introduce the next Definitions.

The $LIR_D$ & $TIR_D$ for the first 20 countries in the global list of deaths due to COVID-19 have been computed and are presented in Table 2¹, which carries much information:

• First of all, the fact that method EDE computes the inflection point as the midpoint between the lower and upper UIK point guarantees that a value close to 2 is an index of completing a symmetric sigmoid curve. However, not all diseases offer such an option; for example COVID-19 has shown highly non symmetrical behavior for all countries that completed the first cycle. China until April 14 (before changing the number of deaths) had a value of $LIR_{D,China} = 2.44$.
• If we take the observed values of $LIR_D$ as proxies, then we can estimate a lower threshold for the expected losses from the disease by simply doubling the $L_D$.
• For countries where the inflection point has not yet been reached, the expected losses will be more than twice the currently reported ones.

¹ Until the end of first cycle: US June-1, Spain May-21, Iran May-19, China Apr-6, World May-27

All inflection computations have been performed by using [8].

If we apply the ESE method for estimating the inflection point of the curve of total losses, then the curve classification changes because the points of interest are different in that method. Now, according to the definitions of [5, 7], we focus on the x-left ($x_l$) and x-right ($x_r$) points where the left and right chord respectively are tangent to the curve. Of course here we use the estimated points $\chi_l$, $\chi_r$ due to the noisy nature of the curve. The sequence of points in ascending order is $\{x_1, \chi_r, \chi_S, \chi_l, x_n\}$, with $\chi_S$ the estimation for the inflection point, thus we can divide the time evolution into the next four stages. Given the fact that the two methods EDE, ESE give different estimations, we need the next Definitions. For Italy and Brazil we find Figs. 6, 7.
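The EDE idea as described in the text (two knee points on either side of the total chord, with the inflection estimate as their midpoint) can be illustrated with a short sketch. This is a simplified illustration written for this text, not the implementation of the R package cited as [8]; the function name `ede_sketch` and the synthetic logistic data are assumptions.

```python
import numpy as np


def ede_sketch(x, y):
    """Simplified EDE-style estimate of the inflection point of a sigmoid.

    Returns (chi_f1, chi_f2, chi_d): the points of extreme offset below and
    above the total chord, and their midpoint as the inflection estimate.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Straight line (total chord) joining the first and last data points.
    slope = (y[-1] - y[0]) / (x[-1] - x[0])
    chord = y[0] + slope * (x - x[0])
    # Vertical offset from the chord; it is proportional to the perpendicular
    # distance, so both have their extrema at the same points.
    offset = y - chord
    chi_f1 = x[np.argmin(offset)]  # deepest point below the chord (convex part)
    chi_f2 = x[np.argmax(offset)]  # highest point above the chord (concave part)
    chi_d = 0.5 * (chi_f1 + chi_f2)
    return chi_f1, chi_f2, chi_d


# Synthetic, noise-free symmetric sigmoid (logistic) with inflection at day 50.
days = np.arange(0, 101)
cumulative = 1000.0 / (1.0 + np.exp(-0.15 * (days - 50)))

chi_f1, chi_f2, chi_d = ede_sketch(days, cumulative)
print(f"knee points: {chi_f1:.0f}, {chi_f2:.0f}; inflection estimate: {chi_d:.1f}")
```

On a symmetric curve observed over a symmetric window the midpoint coincides with the true inflection point; for asymmetric or incomplete curves, such as those discussed for COVID-19, the estimate depends on how much of the cycle has already been observed.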
Due to its design, the ESE method can give an estimation for the inflection point at its early appearance, and there is no need for a reasonable cycle completion to have been achieved. This makes it a good proxy for detecting the turning point of the disease from exponential growth (Stages I, II) to slowing down (Stages III, IV). For example Brazil seems to have started the slowing down process. From Table 2, again created using [8], we see the encouraging sign that all countries seem to have moved to an early version of Stage III.

It is interesting to mention that the time needed from the first reported case to the beginning of the exponential phase, as measured by the first existing UIK point in the curve, varies from just 3 days (China, which indicates that the disease had actually begun much earlier) to 94 days (India), see Fig. 8 for a graphic visualization of the countries with more than 50K confirmed cases, while in Table 3 they are listed in ascending order of the date of exponential beginning; its last lines contain mainly countries from the Southern Hemisphere.

After having a reasonable number of daily data we are able to compute the Inverse Empirical Cumulative Distribution Function (IECDF) for the variable of interest, usually for deaths from the disease.
That curve becomes a sigmoid one after the overall cumulative process has reached its inflection point, and we can estimate it either by the EDE or the ESE method. That point is a percentile of the empirical distribution and always moves from a value close to unity, say 0.95 just after inflection, to a value slightly over 0.10 at the end of the current disease cycle. For the 20 most affected countries plus World, the relevant plots for EDE (past month) and ESE (past 2 months) can be found in Figs. 9 and 10. A country classification according to inflection dynamics is evident, since the IECDF is comparable between countries. There are countries like China, Spain, Italy which seem to have completed their first cycle. Other countries like UK, France tend to reach that point now. A third class consists of countries like US, Canada, Ecuador that are at the last miles. Finally a fourth class contains Russia, Brazil, Peru, India that are still in the first exponential phase; actually they have not yet reached an EDE inflection point. From the plots of the 20 most affected countries we can derive the next two observations.

Empirical Observation 3.1. The inflection point of the IECDF curve which is computed by EDE needs substantial time to be reached and then tends towards the 1/3 percentile point after a few months.

Empirical Observation 3.2. The inflection point of the IECDF curve which is computed by ESE, although it can be found at an earlier stage, needs more time to converge towards either the 1/3 or the 1/10 percentile point.
Since COVID-19 has completed its first cycle in many countries and has also just begun in others, we are now able to compare the two approaches, the broadly accepted mathematical models (SEIR etc.) and the novel one suggested by this work. That task is better done for US, UK and Greece because they are representative cases for a big, medium and small country respectively, while there exist works published online about them.

The first approach is the mathematical model used by [11] with data until March 16th. Their prediction for UK was presented in their Fig. 1, which is a totally symmetric curve. The model output had a cumulative value of 510K for losses in UK, far beyond the currently reported 44K. The alternative approach, based on data available until March 16th, gave that UK was just in the exponential phase and no inference could be made at that early time. However, after a reasonable time period had passed, method ESE first and then EDE gave their estimations, see Fig. 11. We observe that in the middle of May both methods converged to the value of 18527, thus an estimation for the total losses in UK begins with that number as multiplier and, given China's $LIR_S = 3$, we may expect in the worst case approximately 55K losses, again much less than the 510K, actually close to 10% of the model's predicted value. From the sample presented at Fig. 11 we were able to extract the inflection point at May 24th which has a value of 255K due to the symmetry of the curves, while the robust estimation of 18257 was found by both methods at May 9th.

From the same Fig.
1 of [11] we find that the expected evolution by mathematical models is of symmetric shape and predicts 2.2M losses for US, using data available until March 16th. Our alternative approach again would not give any kind of estimation until March 16th, since the disease was at Stages I, II, but it would be able to create ESE estimations for the inflection point in early April, while at that time it had an increasing trend. The EDE method, which is more rigid, began to give output in early May, and the two methods (ESE and EDE, in that order) have finally converged to the values of 61252 (2020-04-29) and 55162 (2020-04-26) respectively, see Fig. 12.

The model predictions of [21] are presented in their Fig. 1, where the model has been fitted to the available data until April 26. Although not published, the estimation given by the senior author of [21] for the deaths expected in Greece if no measures had been taken was a number of 13685 losses (as stated in the last daily TV briefing of May 26th). We may probably compare that number with Belgium's 9334 deaths for the same date; however, Greece is not such an extroverted country (recall that Belgium is the EU administrative center and many other international organisations are based in Belgium), so, given also that Greece has 1 million less population, that estimation would not have been realized. If we had applied our alternative view for the same time period, then we could have created Fig. 13, and on April 26th we could have made a prediction for the total deaths by using the 'e-rule', finding for the worst case, based on China, a number of 171 deaths, very close to the one finally obtained from that cycle in Greece (about May 24th).
The fact that Earth is divided into Northern and Southern Hemispheres proves to play a critical role for the number of COVID-19 cycles. First it was winter in the North and that gave us the first cycle. Now it is winter in the South and a new cycle has begun. We can estimate the approximate beginning date if we apply method EDE to the curve of global cumulative deaths, after its inflection point. Then we find a new turning point on May-27, see Fig. 14, where we have also plotted the LOWESS smooth curve, after [9]. It is certainly not a good sign, as it shows that the disease will not slow down unless it has covered both winters on Earth. Apart from the geographical reasoning, we have observed the appearance of a second cycle in countries like Iran, which after May 19th has an increasing number of deaths again.

Figure 13: The 4 EDE-stages for COVID-19 in Greece for the study time of [21]

We have seen from Tables 1 & 2 that World has not reached an inflection point under method EDE for the first cycle, while the relevant point that was found by method ESE gave rather small values, $LIR_S = 1.66$ & $TIR_S = 1.32$, thus we cannot make any robust inferences for the expected number of losses. If we accept that the second cycle will have the same intensity as the first, then we may expect as extra deaths a number slightly smaller than the 356K of May-27. If we also accept the hypothesis that in the Northern Hemisphere only Russia has not completed its cycle, then we may not expect many losses from there. Unfortunately one of the most powerful countries of the World, the United States, seems to have begun a second cycle after June 1st, see Fig. 15.
As usual, since truth is somewhere in the middle, we cannot reject the currently accepted set of mathematical models and the underlying treatment of a disease, but we cannot rely only on them for estimating the time evolution of the pandemic. Although the effective reproduction rates $R_0$, $R_t$ can serve as reliable proxies of the picture taken at the outbreak or later, the recent COVID-19 adventure has shown that the requirement of $R_t < 1$ has a very limited usefulness for either predicting the outcome of a disease cycle or for controlling the pandemic. Examples are almost all affected Western countries, where, although the above condition was fulfilled, both confirmed cases and deaths did not slow down until they had created a sigmoid curve for the relevant cumulative variables. Coronavirus seems to act like a 'disease weapon bomb' which, after being released, follows an autonomous route regardless of the NPIs that a country will apply or not. The successful cases in combating the virus attack have been only those countries (like Greece) or territories (like Zahara de la Sierra in Spain) that applied a very strict policy of almost total isolation from anything arriving from abroad. Recall that in Greece at the very early stage not even sailing yachts were allowed to moor at the ports. That strict isolation, combined with the national lockdown and given the country's terrain, led to almost zero deaths compared to other affected countries.
Especially for Greece the next particularities hold:

• Greece presents difficulties in communication between its provinces due to the terrain geography (mountains, many small islands)
• The prohibition of flights from abroad reduced the import of new cases, something that other countries were forced to do much later than Greece
• The non existence of a direct flight from Wuhan meant that the virus was not transmitted directly from its first place of appearance, unlike Italy which had a stronger connection to China

The specific country constraints should have been considered in favor of a less hard lockdown in the Aegean Islands, since isolation was almost absolute due to the lack of transport. For example, in a mountain village with a population of 280, what was the rationale for not permitting local residents to walk free of constraints? Who was going to infect whom? Those measures were extremely hard for the already socially self isolated villages and towns in Greece and did not offer any kind of help against the virus. Thus decision making based merely on the values of $R_t$ could be just harmful for society; it can act as a "bulldozer in the garden".

Since cumulative deaths from any disease follow an S-shaped or sigmoid curve, it is easy to compute the inflection point by using methods EDE & ESE and then define new concepts that help us to monitor the time evolution and to compare different countries. That is the contribution of the current work and we hope that subsequent works will improve the newly introduced concepts.

The author declares that there exists no conflict of interest.
This work has been completed without any kind of external funding.

[1] An Introduction to Stochastic Epidemic Models.
[2] COVID-19 antibody seroprevalence in Santa Clara County, California.
[3] Mathematical epidemiology: Past, present, and future. Infectious Disease Modelling.
[4] COVID-19 data repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
[5] Developing methods for identifying the inflection point of a convex/concave curve.
[6] Introducing Unit Invariant Knee (UIK) as an objective choice for elbow point in multivariate data analysis techniques.
[7] On the efficient identification of an inflection point.
[8] inflection: Finds the Inflection Point of a Curve.
[9] Lowess: A program for smoothing scatterplots by robust locally weighted regression.
[10] EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves. R package version 2.2-3.
[11] Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. School of Public Health, Imperial College London COVID-19.
[12] Report 13: Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. School of Public Health, Imperial College London COVID-19.
[13] On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies.
[14] Three Basic Epidemiological Models.
[15] The mathematics of infectious diseases.
[16] Modeling Infectious Diseases in Humans and Animals.
[17] A contribution to the mathematical theory of epidemics.
[18] Contributions to the mathematical theory of epidemics. II. The problem of endemicity.
[20] Serial interval of novel coronavirus (COVID-19) infections.
[21] Modelling the SARS-CoV-2 first epidemic wave in Greece: social contact patterns for impact assessment and an exit strategy from social distancing measures. medRxiv.