key: cord-0591789-prmp6ega authors: Ahmetolan, Semra; Bilge, Ayse Humeyra; Demirci, Ali; Peker-Dobie, Ayse; Ergonul, Onder title: What Can We Estimate from Fatality and Infectious Case Data? A case Study of Covid-19 Pandemic date: 2020-04-27 journal: nan DOI: nan sha: 1b0ffeb0094fb5a60b103d0a8945e4311ea4041d doc_id: 591789 cord_uid: prmp6ega
Daily case reports and daily fatalities for China, South Korea, France, Germany, Italy, Spain, Iran, Turkey, the United Kingdom and the United States over the period January 22, 2020 - April 20, 2020 are analysed using the Susceptible-Infected-Removed (SIR) model. For each country, the SIR models fitting cumulative infective case data within 5% error are analysed. It is shown that the quantities that can be most robustly estimated from the normalized data are the timing of the maximum and the timings of the inflection points of the proportion of infected individuals.
countries, there have been no substantial delays in the arrival of the pandemic in non-affected areas. This fact was also observed for the H1N1 epidemic in 2009 [2]. Increased globalization makes infectious diseases everyone's problem, as experienced in this pandemic: the massively increased demand for intensive care treatment of Covid-19 patients has caused severe disruption of health care systems throughout the world. In most countries the infection is in its early phase in terms of the duration of the infection. A great deal of effort has been invested in the estimation of epidemic parameters of Covid-19 in the early stage for China and some other countries [3], [4], [5], [6], [7], [8], [9], [10]. In [3] the authors analysed the temporal dynamics of the disease in China, Italy and France in the period between the 22nd of January and the 15th of March 2020.
In [4], the potential for sustained human-to-human transmission to occur in locations outside Wuhan is assessed based on estimates of how transmission in Wuhan varied between December 2019 and February 2020. The difficulty of making accurate predictions of the pandemic is discussed in [5]. In [6], the authors used phenomenological models valid during previous outbreaks to generate and assess short-term forecasts of the cumulative number of confirmed reported cases in Hubei province and for the overall trajectory in China [7]. Epidemic analysis of the disease in Italy is presented in [8] by means of dynamical modelling [9]. Forecasting Covid-19 is investigated in [10]. In addition, the change in the epidemic behaviour of various countries can be traced by the use of data-driven systems [11]. One of the common features of these works is the existence of variations in the parameter estimates. In the present work, we show that the number of reported cases provides the most accurate representation of the number of removed individuals and that the quantities that can be most robustly estimated from normalized data are the timing of the maximum and the timings of the inflection points of the proportion of infected individuals. These values correspond to the peak of the epidemic and to the highest rates of increase and decrease in the number of infected individuals. The stability of the estimates is discussed by comparing predictions based on data with long time spans. Publicly accessible data released by the state offices of each country are used for the analysis. The data set of each country is collected according to published official reports and is available at the website http://www.worldometers.info/coronavirus/ (last access: 27 April 2020). Updated data are also available at the website http://epikhas.khas.edu.tr/. The most recent data used in this work were collected on the 18th of April, 2020.
Data covers the period January 22-April 18, 2020 and, in the following, "Day 1" corresponds to January 22, 2020. The Susceptible-Infected-Removed (SIR) model [12] is used for analysis. It should be noted that Chinese state-provided data up to April 16, 2020 is used in the analysis; this data was provided prior to the revisions made after this date. The SIR model is a system of ordinary differential equations modelling the spread of epidemics in a closed population, under the assumption of permanent immunity and homogeneous mixing [12]. With S, I and R denoting the proportions of susceptible, infected and removed individuals and a prime denoting the derivative with respect to time t, these equations are
$$S' = -\beta S I, \qquad I' = \beta S I - \eta I, \qquad R' = \eta I. \qquad (1)$$
If the disease has an incubation period, then the Susceptible-Exposed-Infected-Removed (SEIR) model governing its spread is
$$S' = -\beta S I, \qquad E' = \beta S I - \varepsilon E, \qquad I' = \varepsilon E - \eta I, \qquad R' = \eta I, \qquad (2)$$
where β is the contact rate, T = 1/η is the mean infectious period and 1/ε is the incubation period. Since the Covid-19 infection has an incubation period, the right model to use is the SEIR system. However, in previous work [14] it was shown that the parameters of the SEIR model cannot be determined from the time evolution of the normalized curve of removed individuals. Thus the SEIR model should not be used in the absence of additional information that might be obtained by clinical studies. Nevertheless, the SIR model can be used with some modifications. The ratio β/η, called the Basic Reproduction Number and denoted as R 0, is the key parameter in both the SIR and SEIR models. This number is related to the growth rate of the number of infected individuals in a fully susceptible population and determines the final value of R, denoted R f, that is, the proportion of individuals that will be affected by the disease. This proportion includes individuals who gain immunity without showing symptoms, those who are treated, as well as disease-related fatalities. The reciprocal of the parameter η, T = 1/η, is considered as representative of the mean infectious period.
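As a minimal sketch of the two models described above, the SIR and SEIR right-hand sides can be written down and integrated with a classical fourth-order Runge-Kutta scheme. The function names, the step size and the parameter values (R 0 = 3, T = 10 days, an incubation period of 5 days, initial infected proportion 10^-3) are illustrative assumptions, not values taken from the paper's fits.

```python
def sir_rhs(state, beta, eta):
    """Right-hand side of the SIR system for the proportions (S, I, R)."""
    s, i, r = state
    return (-beta * s * i, beta * s * i - eta * i, eta * i)


def seir_rhs(state, beta, eps, eta):
    """Right-hand side of the SEIR system for the proportions (S, E, I, R)."""
    s, e, i, r = state
    return (-beta * s * i, beta * s * i - eps * e, eps * e - eta * i, eta * i)


def rk4(rhs, state, dt, steps, *params):
    """Integrate y' = rhs(y) with the classical Runge-Kutta scheme.

    Returns the list of states at t = 0, dt, 2*dt, ..., steps*dt.
    """
    path = [tuple(state)]
    for _ in range(steps):
        y = path[-1]
        k1 = rhs(y, *params)
        k2 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(y, k1)), *params)
        k3 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(y, k2)), *params)
        k4 = rhs(tuple(a + dt * b for a, b in zip(y, k3)), *params)
        path.append(tuple(
            a + dt / 6.0 * (p + 2.0 * q + 2.0 * u + v)
            for a, p, q, u, v in zip(y, k1, k2, k3, k4)
        ))
    return path


# Illustrative parameters (assumptions): R0 = beta/eta = 3, T = 1/eta = 10 days,
# incubation period 1/eps = 5 days, time step 0.1 day, 300 days in total.
eta, beta, eps = 0.1, 0.3, 0.2
sir_path = rk4(sir_rhs, (0.999, 0.001, 0.0), 0.1, 3000, beta, eta)
seir_path = rk4(seir_rhs, (0.999, 0.0, 0.001, 0.0), 0.1, 3000, beta, eps, eta)

print("SIR  final R:", round(sir_path[-1][2], 4))
print("SEIR final R:", round(seir_path[-1][3], 4))
```

Both runs keep the total proportion equal to 1 up to rounding, and both approach the same final proportion of removed individuals, since that value is fixed by β/η; the SEIR run simply gets there more slowly.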
In [14], it was shown that the normalized R(t) for the SEIR model with parameters (β, ε, η) and the normalized R I (t) for the SIR model with parameters (β I , η I ) have the same final value (R f ), the same value R m at the peak of the number of infected individuals, and the same slope (R' m ) at that point (after a time shift of these curves), provided that 1/η I = 1/ε + 1/η. It follows that if one has to work with the normalized data, the SIR model can be used to model diseases with an incubation period, provided that the sum of the incubation and the infection periods of the SEIR model is used as an effective infectious period for the SIR model, unless there are reasonable estimates for R 0 and/or for the duration of the incubation and/or infection periods. It was also observed in [14] that the normalized curves of removed individuals are practically indistinguishable for moderate values of R 0, such as in ordinary flu. As the Basic Reproduction Number for Covid-19 is high, one cannot expect the shifted normalized curves for the SIR and SEIR models to coincide, but as clinical information on Covid-19 parameters is as yet unclear, the SIR system is adopted as the basic model in the present work. The relation between R 0 and R f is determined as follows. Note that R(t) is monotone increasing, and hence it can be used as an independent variable instead of t. The derivative of S with respect to R is given by
$$\frac{dS}{dR} = \frac{-\beta S I}{\eta I} = -R_0 S. \qquad (3)$$
Assuming initial conditions S → 1 and R → 0 as t approaches negative infinity, one can obtain the following by integrating (3):
$$S = e^{-R_0 R}.$$
As t approaches positive infinity, I → 0, and S(t) and R(t) approach their final values S f and R f, respectively. Then S+I+R=1 yields
$$1 - R_f = e^{-R_0 R_f}.$$
Consequently, R 0 is derived as
$$R_0 = -\frac{\ln(1 - R_f)}{R_f}.$$
Since I' = 0 requires βS = η, the values S m = S(t m ), I m = I(t m ) and R m = R(t m ), as obtained in [14], are
$$S_m = \frac{1}{R_0}, \qquad I_m = 1 - \frac{1 + \ln R_0}{R_0}, \qquad R_m = \frac{\ln R_0}{R_0}.$$
Here, t m refers to the time at which the number of infectious cases reaches its maximum.
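The closed-form relations of this derivation can be checked numerically. The sketch below assumes the standard SIR results S = exp(-R 0 R), R 0 = -ln(1 - R f )/R f and S m = 1/R 0, R m = ln(R 0 )/R 0, I m = 1 - S m - R m; the function names are hypothetical and the value R 0 = 3 is only an example.

```python
import math


def susceptible_at(r0, removed):
    """S as a function of R along an SIR trajectory with S -> 1, R -> 0 initially."""
    return math.exp(-r0 * removed)


def r0_from_final_size(rf):
    """Basic Reproduction Number implied by a final removed proportion 0 < rf < 1."""
    if not 0.0 < rf < 1.0:
        raise ValueError("rf must lie strictly between 0 and 1")
    return -math.log(1.0 - rf) / rf


def peak_values(r0):
    """(S_m, I_m, R_m): proportions at the time the infected proportion peaks."""
    if r0 <= 1.0:
        raise ValueError("an epidemic peak exists only for R0 > 1")
    s_m = 1.0 / r0
    r_m = math.log(r0) / r0
    i_m = 1.0 - s_m - r_m
    return s_m, i_m, r_m


s_m, i_m, r_m = peak_values(3.0)
print(f"S_m={s_m:.4f}, I_m={i_m:.4f}, R_m={r_m:.4f}")
# Consistency check: evaluating S(R) at R_m must give back S_m = 1/R0.
print(abs(susceptible_at(3.0, r_m) - s_m) < 1e-12)
```

For R 0 = 3 the peak infected proportion comes out at about 30%, which is the kind of figure relevant to the healthcare burden discussed in the text.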
The values S m , I m and R m are crucial in determining the proportion of individuals that need to be vaccinated in order to reduce the proportion of susceptible individuals below the threshold value S m = 1/R 0. The graph of R f versus R 0 is shown in Figure 1, together with the ranges of R 0 for well-known diseases. It can be seen that for R 0 > 2.5, R f is about 90% or more. The figure also shows that the increase in R f with respect to R 0 is very slow for R 0 > 3. It is generally accepted that the R 0 for Covid-19 is greater than 3 despite all containment measures [13], [15], [16]. Thus, unless vaccination is applied, one would expect roughly 95% of the population or more to be affected by the disease. In addition, the knowledge of the precise value of R 0 would have little effect on the planning of healthcare measures. It should also be kept in mind that containment measures provide only a temporary control of the spread of the epidemic, just to the point of reducing the burden of the epidemic to a manageable size. According to the Centers for Disease Control and Prevention (CDC), it is still unknown when viral shedding begins or how long it lasts, nor is the period of COVID-19's infectiousness known. As with MERS-CoV and SARS-CoV infections, SARS-CoV-2 RNA may be detectable in the upper or lower respiratory tract for weeks after illness onset, though the presence of viral RNA is no guarantee of the presence of infectious virus. It has been reported that SARS-CoV-2 has been found in individuals showing no symptoms (asymptomatic infections) or before symptoms developed (pre-symptomatic infections), though the role these infections may play in transmission remains unknown. According to prior studies, the incubation period of SARS-CoV-2, like that of other coronaviruses, may last for 2-14 days [17]. As an example of an SIR model, R 0 , T and R(t 0 ) are chosen as 3, 10 and 10^-3, respectively, and the related graphs are given in Figure 2.
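The slow growth of R f with R 0 can be illustrated by solving the final-size relation 1 - R f = exp(-R 0 R f ) numerically. The sketch below uses a simple fixed-point iteration, which is one convenient choice for R 0 > 1 and not necessarily how Figure 1 was produced; the listed R 0 values are illustrative, and the quantity 1 - 1/R 0 is the immune proportion needed to bring the susceptible proportion down to the threshold 1/R 0.

```python
import math


def final_size(r0, iterations=200):
    """Nonzero solution of 1 - R_f = exp(-r0 * R_f), by fixed-point iteration.

    The iteration x <- 1 - exp(-r0 * x) is a contraction near the root
    when r0 > 1; values of r0 very close to 1 would need more iterations.
    """
    if r0 <= 1.0:
        raise ValueError("a nonzero final size exists only for R0 > 1")
    x = 0.5
    for _ in range(iterations):
        x = 1.0 - math.exp(-r0 * x)
    return x


# Illustrative R0 values, not estimates for any particular disease.
for r0 in (1.5, 2.0, 2.5, 3.0, 4.0, 6.0):
    rf = final_size(r0)
    threshold = 1.0 - 1.0 / r0  # immune proportion that pushes S below 1/R0
    print(f"R0={r0:>3}: R_f={rf:.3f}, immune proportion needed={threshold:.3f}")
```

With these inputs R f rises from roughly 0.58 at R 0 = 1.5 to about 0.94 at R 0 = 3, and changes very little beyond that.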
It is generally accepted that the number of fatalities represents the number of removed individuals and the number of confirmed cases represents the number of infected individuals. The proportionality constants are unknown, but as long as they are constant, one can work with the normalized case reports and normalized fatalities and seek to determine the epidemic parameters from the shape of these normalized curves. In Section 4, it will be shown that for the Covid-19 data, total cases would be a better representative of the number of removed individuals. According to the SIR and the SEIR models, given by equations (1) and (2), the rate of change of the number of removed individuals is proportional to the number of infectious cases. In terms of observations, this corresponds to the fact that the ratio of, for example, daily fatalities to daily infectious cases should be constant. In the literature on the analysis of historical epidemics, fatality reports are usually the only available data, hence models are necessarily based on the assumption that cumulative fatalities represent the cumulative number of removed individuals. For the Covid-19 pandemic, as daily fatality and infectious case reports are available, further evaluation of the representation of R(t) in terms of fatality data is presented. Normalized daily infections and total fatalities are displayed in Figure 3 for all countries. From these graphs, it is difficult to see whether the relation R' = ηI is satisfied or not. To check this, daily fatalities, as representatives of the derivative of R(t), will first be compared with daily infectious cases, as representatives of I(t), in Figures 4-6. Based on these comparisons, it will be concluded that total infectious cases would be a better representative of the number of removed individuals, R(t).
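The role of normalization can be made concrete with a small sketch: if one cumulative series is a constant multiple of another, the unknown constant cancels once each series is divided by its final value. The counts below are invented purely for illustration and the 5% ratio is an arbitrary assumption, not a fatality rate from the data.

```python
def normalize(series):
    """Scale a cumulative series by its final value so that it ends at 1."""
    final = series[-1]
    if final <= 0:
        raise ValueError("series must end at a positive value")
    return [x / final for x in series]


def daily(cumulative):
    """Daily increments of a cumulative series (the first entry is kept as is)."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative counts, invented for illustration only.
cases = [10, 40, 120, 300, 520, 680, 760, 790, 800]
fatalities = [0.05 * c for c in cases]  # assumed constant proportionality

norm_cases = normalize(cases)
norm_fatalities = normalize(fatalities)

# The normalized curves coincide, so the proportionality constant
# cannot be read off from the normalized data and is not needed either.
print(max(abs(a - b) for a, b in zip(norm_cases, norm_fatalities)))
print(daily(cases))
```

With real reports the two normalized curves do not coincide exactly, and the size and sign of the gap between them is what the comparison of daily fatalities and daily cases examines.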
From Figure 3, it can be seen that the epidemic cycle has been completed in China over the course of about 70 days. The jump in total fatalities is due to a change in the reporting scheme. As our analysis is based on total infectious cases, this change has no effect on the models. For South Korea, the epidemic is in a state of slow decrease at the end of about 60 days, but the rate of infections is still high. This qualitative behaviour indicates that R 0 for South Korea is expected to be much higher than that for China. For France, Germany and Iran, the epidemic is in the decline phase. For the rest of the countries, further analysis is needed in order to assess the epidemic phase. Daily infectious cases and daily fatalities for China and South Korea are given in Figure 4. Note that in China, fatalities occur earlier than hospitalizations in the initial phase. This early phase is followed by a "stationary" period, over which the number of hospitalizations is around its peak and the number of daily fatalities oscillates around a mean. This intermediate period ends with a sharp decrease in fatalities (the reasons for which should be investigated). During the third phase, the decrease in fatalities is faster than the decrease in hospitalizations. Thus the data for China has three phases, while the data for South Korea looks much like the stationary phase of the data for China. It should also be pointed out that the number of fatalities in South Korea is very low compared with China, hence it would be expected that the number of infections is a better representation of the number of removed individuals. Daily infectious cases and daily fatalities for France, Germany, Italy and Spain, and for Iran, Turkey, the United Kingdom and the United States, are shown in Figures 5 and 6, respectively. From these graphs, one can see that in Germany the number of daily infectious cases leads the number of daily fatalities, but for the other countries either they coincide or the situation is reversed.
The underlying reasons for this behaviour should be analysed in more detail, using country-specific information on strategies for testing infectious cases and on treatment procedures applied in the course of the epidemic. As noted above, the knowledge of R 0 determines the total proportion of individuals that would be affected, R f. Furthermore, the peak of I(t) occurs at the time t m, at which the proportion of susceptible individuals falls to the value 1/R 0. This information is useful for the determination of the proportion of people that have to be vaccinated in order to bring the proportion of susceptible individuals below this threshold. The Basic Reproduction Number is "defined" as the number of new infections per unit time in a fully susceptible population. Thus, it is a quantity that might be measured by direct on-site observations. On the other hand, the knowledge of R 0 by itself does not give any information on the timing of the progress of the epidemic. In the present work, the determination of the following parameters is discussed: 1) the Basic Reproduction Number R 0; 2) the mean duration of the infectious period T; 3) the time t m (days) at which the number of infectious cases reaches its maximum, i.e., the first derivative of I(t) is zero; 4) the time t a (days) at which the rate of increase in the number of infectious cases reaches its maximum, i.e., the time at which the second derivative of I(t) is zero and the first derivative is positive; 5) the time t b (days) at which the rate of decrease in the number of infectious cases reaches its maximum, i.e., the time at which the second derivative of I(t) is zero and the first derivative is negative. It will be seen that R 0 and T can be estimated only for China, where the spread of the epidemic is over. For other countries, R 0 and T cannot be estimated from the normalized data, but the timings of the key events, t m , t a and t b, can be determined quite reliably.
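The three timings listed above can be read off a simulated SIR curve: t m is where I(t) is largest, while t a and t b are where I'(t) is largest and smallest, respectively. The following is a minimal sketch under assumed initial conditions (I and R both equal to 10^-3 at t = 0) and an assumed Runge-Kutta step of 0.05 days; the function name and defaults are hypothetical.

```python
def sir_timings(r0, period, i0=1e-3, r_init=1e-3, dt=0.05, t_max=300.0):
    """Return (t_a, t_m, t_b) in days for an SIR run started at t = 0.

    t_m: time of the maximum of I(t)
    t_a: time of the maximum of I'(t) (inflection point on the rising side)
    t_b: time of the minimum of I'(t) (inflection point on the falling side)
    """
    eta = 1.0 / period
    beta = r0 * eta

    def rhs(s, i):
        # (S', I') of the SIR system; R follows from S + I + R = 1.
        return -beta * s * i, (beta * s - eta) * i

    s, i = 1.0 - i0 - r_init, i0
    times, i_vals, di_vals = [], [], []
    for k in range(int(round(t_max / dt)) + 1):
        times.append(k * dt)
        i_vals.append(i)
        di_vals.append(rhs(s, i)[1])
        # One classical Runge-Kutta step for (S, I).
        k1 = rhs(s, i)
        k2 = rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = rhs(s + dt * k3[0], i + dt * k3[1])
        s += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        i += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])

    idx = range(len(times))
    t_m = times[max(idx, key=i_vals.__getitem__)]
    t_a = times[max(idx, key=di_vals.__getitem__)]
    t_b = times[min(idx, key=di_vals.__getitem__)]
    return t_a, t_m, t_b


t_a, t_m, t_b = sir_timings(3.0, 10.0)
print(f"t_a={t_a:.1f}, t_m={t_m:.1f}, t_b={t_b:.1f} (days after the start of the run)")
```

The times are measured from the start of the simulation, so comparing them with calendar dates requires choosing which simulated day corresponds to "Day 1" of the data.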
These parameters are determined by a "brute force" approach: the models are run for a broad range of parameters; the difference between the data and each model is then measured using various norms; finally, the models that match the data within 5% are selected. If the scatter plot of the errors versus the parameter to be estimated has a sharp minimum, it is concluded that the corresponding parameter can be determined from the shape of the normalized data. The parameter ranges for the SIR model are and the initial values are chosen as where 1 < k < 10. For South Korea, these parameter ranges are extended appropriately. In the SIR model, since R' = ηI, that is, the rate of change in the number of removed individuals is proportional to the number of infected individuals, it is expected that the cumulative cases are proportional to cumulative fatalities. Thus, the SIR model predicts the simultaneity of the daily fatalities and daily infections. The verification of this fact requires the availability of data both for infections and for fatalities. To the best of our knowledge, historical data studied in the literature includes fatalities only, and the data for the 2009 H1N1 epidemic collected at certain major hospitals [18] is unique in the sense of reflecting information on both infections and fatalities. The peculiarity of this data is a shift of about 8 days between total infections and total fatalities, the peak of infections occurring 8 days prior to the peak of fatalities. This time shift was explained by a multi-stage SIR model [19]. Cumulative cases and cumulative fatalities for Covid-19 do not show such a clear time shift. On the contrary, in China and Korea, fatalities increase faster than infections. In Germany, there is a slight lead for infections, while for other countries the two curves more or less coincide.
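The brute-force procedure described above can be sketched as a grid search. Everything specific in the example is an assumption made for illustration: the grid of R 0 and T values, the initial conditions, the use of a relative L2 norm as the error measure, normalization by the last value in the window, and the synthetic "data", which are generated by the model itself with R 0 = 3 and T = 10 rather than taken from any country.

```python
import math


def sir_removed(r0, period, days, i0=1e-3, r_init=1e-3, substeps=4):
    """Daily values R(0), R(1), ..., R(days) from an SIR run (RK4 integration)."""
    eta = 1.0 / period
    beta = r0 * eta
    dt = 1.0 / substeps

    def rhs(s, i):
        return -beta * s * i, (beta * s - eta) * i

    s, i = 1.0 - i0 - r_init, i0
    removed = [r_init]
    for _day in range(days):
        for _step in range(substeps):
            k1 = rhs(s, i)
            k2 = rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
            k3 = rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
            k4 = rhs(s + dt * k3[0], i + dt * k3[1])
            s += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
            i += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
        removed.append(1.0 - s - i)  # R follows from S + I + R = 1
    return removed


def normalize(series):
    """Scale a series by its last value (assumed positive)."""
    return [x / series[-1] for x in series]


def relative_error(model, data):
    """Relative L2 distance between two series of equal length."""
    num = math.sqrt(sum((m - d) ** 2 for m, d in zip(model, data)))
    return num / math.sqrt(sum(d ** 2 for d in data))


# Synthetic data for illustration only: the model itself with R0 = 3, T = 10.
days = 150
data = normalize(sir_removed(3.0, 10.0, days))

accepted = []  # (R0, T, error) for every model within the 5% tolerance
for r0 in [1.5 + 0.25 * k for k in range(15)]:  # assumed grid: 1.5, 1.75, ..., 5.0
    for period in range(2, 21):                 # assumed grid: T = 2, ..., 20 days
        err = relative_error(normalize(sir_removed(r0, float(period), days)), data)
        if err <= 0.05:
            accepted.append((r0, period, err))

best = min(accepted, key=lambda model: model[2])
print(len(accepted), "models within 5%; best:", best)
```

Plotting the error of the accepted models against R 0 or T, or against timings derived from each model, is then the test the text describes: a sharp minimum suggests the quantity is identifiable from the normalized data, a flat one suggests it is not.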
The lead of fatalities over infections that is observed in China and in Korea is unexpected, and is possibly due to irregularities in the statistics, in medical treatment practices, etc. We should also note that the progression of the Covid-19 epidemic is unique in the sense that new treatment methods were applied during the initial phase in China and these methods have since been applied in other countries. For China, several programs were run, fitting the predicted R(t) first to the total fatality data and then to the cumulative infectious case data. In the first case, about 700 models fitting cumulative fatalities within 5% error were found; in the second, about 3000 models fitting cumulative infections within 5% error were found. Furthermore, in the latter case, the minima for the quantities to be determined were much sharper. For South Korea, as will be explained later, the model matching was not successful. For other countries, as the difference between total infections and total fatalities was negligible, total infections are used as a representative of R(t) of the SIR model. Our main result is that it is not possible to determine the Basic Reproduction Number and the mean duration of the infectious period from the shape of the normalized data (unless there are reasonable estimates for either of these parameters). In order to make a stable determination of the parameters R 0 and T by using the early-stage data, a certain period of time has to pass. This period is approximately 70 days for the 2009 A(H1N1) epidemic [20]. However, this period for Covid-19 is still uncertain. This is possibly the reason why the parameters for countries other than China and South Korea cannot be established. On the other hand, the timings of the peak of the infectious cases, of the peak of the rate of increase and of the peak of the rate of decrease of the infectious cases can be determined more precisely from the shape of the normalized data.
The 'best' estimations of the parameters R 0 and T lie on a curve that is nearly linear when a SIR model is used to fit the data of an epidemic. This fact has been observed in previous work [20] , in the study of the H1N1 epidemic and it was explained by the fact that the duration of the epidemic pulse (appropriately defined in terms of a fraction of the peak of infections) was nearly invariant for values of R 0 and T, with R 0 /T constant. In order to visualize this situation, the solutions of this system of differential equations of the SIR model (1) for parameter ranges 1.5