key: cord-1056575-wwcq20l5
authors: Martin-Olalla, J. M.
title: A reference for mortality in Spain from 2001 to 2019 records with an accurate estimate of excess deaths during the 2020 spring covid-19 outbreak
date: 2020-07-24
journal: nan
DOI: 10.1101/2020.07.22.20159707
sha: 964a1e93ffaee9d4bcc8a40c6786931cc45314c8
doc_id: 1056575
cord_uid: wwcq20l5

Spanish official records of mortality and population in the 21st century are analyzed to determine the reference value of the all cause, all age, all sex per capita death rate in the country. This reference is used to analyze mortality in 2020, which is largely influenced by a massive anomaly in spring due to covid-19. The most probable mortality excess from Monday March 2 to Sunday May 24, 2020 (W10 to W21) is 1055 ± 15 deaths per one million population or 49 940 ± 730 total deaths. The standard deviation is 67 deaths per one million population. The excess amounts to 53 % relative to the expected number of deaths from 21st century records and yields a z-score z = 16, an extremely high excess according to the EuroMoMo classification scheme. Taking into account nationwide seroprevalence (Pollán et al., 2020), this gives an upper bound of the infection fatality rate of 2.1 %.

The illness designated covid-19, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has drawn worldwide attention since its identification in late 2019 (Zhu et al., 2020). Spain was one of the European countries most impacted by the disease during the spring of 2020. Confirmed covid-19 cases climbed to several hundred thousand (a few thousand per one million population) and confirmed covid-19 deaths to some 28 000, or six hundred deaths per one million population.¹ Nonetheless these numbers are only a fraction of the excess deaths recorded by mortality monitors (ISC3, 2020)² and a fraction of the infections detected by seroprevalence studies (Pollán et al., 2020).
Total excess deaths is a key quantity to understand the acute impact of a pandemic (Aaron et al., 2020). Its determination requires a reference mortality and a time interval over which the excess is computed. This manuscript takes official records of all cause, all age, all sex weekly mortality in Spain in the 21st century to ascertain a sensible reference value of weekly per capita mortality as a function of the week of the year. When this reference is used, 2020 mortality from Monday March 2 to Sunday May 24, 2020 yields an excess of 1055 deaths per one million population or 49 940 excess deaths. In the classification scheme of EuroMoMo the z-score is 16, an extremely high deviation for an elapsed time interval of twelve weeks.

The Spanish Instituto Nacional de Estadística (INE) usually releases, at the end of a calendar year, the mortality reports corresponding to the preceding year. INE has access to every single death certificate in Spain. On June 3rd, 2020 the INE announced preliminary reports (INE, 2020a) on the weekly all cause, all age number of deaths in Spain for 2020. In its first press release INE announced³ 44 000 excess deaths in Spain until week 21 of 2020, with the reference set to the mortality in 2019 until week 21. A csv dataset is available at the link provided in the reference. This report analyzes the July 15th, 2020 update, which collects reports until Sunday July 5th, 2020. The dataset contains consolidated all cause, all age weekly deaths from 2000 to 2018 and the preliminary results for 2019, along with fresh 2020 data. The dataset is organized by the ISO-8601 week date, which considers regular years with w = 52 weeks (364 days) and extraordinary ("leap") years with w = 53 weeks (371 days). In the dataset 2004, 2009, 2015 and 2020 are "leap" years.
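The week-date bookkeeping described above can be checked with a few lines of Python. This is only an illustrative sketch using the standard library `datetime` module, not the code used for the manuscript; the helper name `iso_weeks_in_year` is introduced here for illustration.

```python
from datetime import date


def iso_weeks_in_year(year: int) -> int:
    """Return the number of ISO-8601 weeks (52 or 53) in a week year."""
    # December 28 always falls in the last ISO week of its week year.
    return date(year, 12, 28).isocalendar()[1]


# Week years with 53 weeks within the range covered by the dataset.
long_years = [y for y in range(2000, 2021) if iso_weeks_in_year(y) == 53]
print(long_years)

# ISO (year, week, weekday) of the limits of the interval under study.
start = tuple(date(2020, 3, 2).isocalendar())
end = tuple(date(2020, 5, 24).isocalendar())
print(start, end)
```

The list reproduces the four "leap" years quoted above, and the two dates map to Monday of W10 and Sunday of W21 of the 2020 week year.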
The dataset follows the European convention: (1) a week begins on Monday and ends on Sunday; (2) a week year starts on the Monday of the week that contains the first Thursday of the calendar year and ends on the Sunday of the week that contains the last Thursday of the calendar year. Every single calendar date belongs to only one week of the ISO-8601 week date; there is no gap and no overlap. Therefore the first week (W01) and the last week of a year (W52 or W53) may contain dates from adjacent calendar years: dates from December 29th to January 3rd can fall in the first week of a year or in the last week of a year.

Any analysis of mortality over a range of 20 years must of necessity account for population changes during the period. INE (INE, 2020b) publishes (table 31 304) the official number of people living in Spain on a regular basis. The dataset starts in 1996 and is updated twice a year with the population on January 1st and on July 1st. The last available figures are those of January 1st, 2020. In this work the dataset is translated into week dates, and linear interpolation with splines produces an array $N_{ij}$ of estimated Spanish population from 2000-W01 to 2020-W01. Every week of the year 2020 is assigned the last available figure.

Since the start of the worldwide covid-19 outbreak the Centro de Coordinación de Alertas y Emergencias Sanitarias (CCAES), a Spanish counterpart of the Centers for Disease Control and Prevention (CDC) that belongs to the Instituto de Salud Carlos III (ISC3), released situation reports and, later, started to release the number of covid-19 cases detected in Spain and the number of deaths within this subset of the population. This dataset was widely criticized during the spring of 2020. It was often refreshed and it usually contained inconsistent figures.
Since mid-June the dataset, which shows the total number of cases, has gained stability and accuracy by reporting cases by date of symptom onset or, if unknown, by the sixth day prior to the date of the positive diagnostic test. The dataset is available at the link provided in footnote 1. The database on covid-19 deaths was also cleaned up but it is no longer available within [...]

[Figure 1: weekly per capita death rate $d_{ij}$ in Spain; horizontal axis: years 2001 to 2020.]

From 2001 to 2019 the weekly number of deaths $D_{ij}$ and the weekly population $N_{ij}$ were used to compute the weekly all cause, all age, all sex per capita⁶ death rate in Spain, $d_{ij} = D_{ij}/N_{ij}$ (see Figure 1). Save for the "high frequency" seasonal variability, the relative change in $d_{ij}$ is on average 1 % over the full range. If a bivariate descriptive analysis is performed where the predictor $x$ is the week date and the response $y$ is $d_{ij}$, then the null hypothesis "per capita weekly death rate is unrelated to $x$" is sustained ($p = 0.43$) at the standard level of confidence. By contrast, $D_{ij}$ increased on average by 16 % and its bivariate analysis does not sustain ($p = 3 \times 10^{-16}$) the null hypothesis. Therefore a sensible conclusion is that $d_{ij}$ remained stable over the elapsed 19 years or 991 weeks, save for "high frequency" seasonal variability. In practical terms all cause, all age, all sex weekly deaths in Spain can be expressed as $D = N \times f(x_i)$, where $f(x_i)$ is some function of social inputs (including sex and age) and environmental inputs (like the subsolar latitude) which remained unaltered over the span of twenty years. This shows the stability of the period.

Figure 2 collapses the full set of weekly death data onto a $w = 52$ week x-axis corresponding to each of the first $w$ weeks of a week year. The multicolored step lines display the consolidated values in the range 2001 to 2018 and the provisional 2019 values.
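The per capita rate and the collapse of the series by week of the year can be sketched as follows. This is a minimal illustration, not the octave code used for the manuscript: the records are hypothetical numbers (not INE data), and the names `per_million`, `records` and `by_week` are introduced here for illustration.

```python
from collections import defaultdict


def per_million(deaths: float, population: float) -> float:
    """Per capita death rate d = D / N, expressed per one million population."""
    return 1e6 * deaths / population


# Hypothetical (year, ISO week, deaths, population) records, NOT INE figures.
records = [
    (2001, 1, 7290, 40_500_000),
    (2001, 2, 7100, 40_510_000),
    (2002, 1, 7400, 41_000_000),
    (2002, 2, 7000, 41_010_000),
]

# Collapse the series by week of the year: by_week[i][j] holds d_ij.
by_week = defaultdict(dict)
for year, week, deaths, population in records:
    by_week[week][year] = per_million(deaths, population)

for week in sorted(by_week):
    rates = {year: round(rate, 1) for year, rate in by_week[week].items()}
    print(f"W{week:02d}", rates)
```

Grouping by week first and by year second mirrors the layout of Figure 2, where every week of the year carries one value per year of the record.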
Every week $i$ exhibits a distribution $d_{ij}$ of $n = 19$ values from 2001 to 2019. A Kolmogorov-Smirnov test of normality sustains the null hypothesis "sample comes from a normal distribution" at the standard level of confidence for every week except W33 ($p = 0.043$).

⁴ https://github.com/datadista/datasets/tree/master/COVID%2019
⁵ https://raw.githubusercontent.com/datadista/datasets/master/COVID%2019/ccaa_covid19_fallecidos_por_fecha_defuncion_nueva_serie_long.csv
⁶ This manuscript translates per capita values into "per one million population". The symbol $\mathrm{M}^{-1}$ is used for this purpose. Notice, however, that the use of isolated SI prefixes is not recommended. In this case $\mathrm{M}^{-1}$ means a fraction $1 \times 10^{-6}$ of 1; for that purpose µ could equally have been used. Sometimes rates are computed per capita and per some unit of time, like $\mathrm{M}^{-1}\,\mathrm{a}^{-1}$, which means per one million population and per year, not per one million years (symbol $\mathrm{Ma}^{-1}$). Notice again that $\mathrm{M}^{-1}\,\mathrm{a}^{-1}$ is not a recommended notation.

This version of the preprint was posted on July 24, 2020 and is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

[Figure 2: weekly per capita death rate as a function of the week of the year; horizontal axis: week dates from 29/12 to 27/12.]

Univariate central tendency and dispersion describe the sample through the sample average $r_i$, the sample standard deviation $s_i$ and the sample error of the mean (SEM) $u_i = s_i/\sqrt{n}$. Figure 2 displays the sample average $r_i$ (thick black step line) and a strip band located at $r_i \pm 2.33\,s_i$ which, under the null hypothesis of normality, should contain 99 % of the observations. Therefore 978 out of the $w \times n = 988$ observations should lie within the colored strip band in Figure 2. There are 18 observations lying outside the colored strip band.
However, 4 of these outliers are located in the 2003 heat wave, while 3 of them are located in 2012 and can be traced to the impact of a strong influenza wave (León-Gómez et al., 2015), also annotated in Figure 1. The remaining 11 outliers are compatible with the number of outliers expected under normality, even though they might also be linked to the impact of some disease or to environmental conditions.

Figure 1 and Figure 2 also display the seasonality of death rates. Generally speaking, death rates are higher from W48 (late November) to W10 (early March) of the following year. This is associated with winter mortality, the impact of influenza and cold waves. Mortality is lower from W10 to W48, but variability still remains in the summer weeks as a result of the impact of variable heat waves. Specifically, 2003-W32 and 2003-W33 are easily spotted in Figure 2. They are contemporaneous with the strong 2003 heat wave that increased mortality across Europe and which is annotated in Figure 1.

Table 1 displays the average $r_i$, a reference for the 2001 to 2019 death rate in Spain, and the sample error of the mean (SEM) in brackets. The average value $r_i$ is the most probable value for the death rate in 2020 under the null hypothesis of normality following the preceding nineteen years. The table also shows the accumulated value in a 52 week year, which is 8607(71) $\mathrm{M}^{-1}\,\mathrm{a}^{-1}$ (death rate per one million population and per year) or 165.5(14) $\mathrm{M}^{-1}\,\mathrm{w}^{-1}$ (death rate per one million population and per week) or 23.64(19) $\mathrm{M}^{-1}\,\mathrm{d}^{-1}$ (death rate per one million population and per day).

Table 1 also lists the observed all cause, all age death rates in 2020, $o_i$, per week and per one million population, along with the accumulated value until W27. It is assumed that the observed 2020 value has zero uncertainty, even though it is a provisional figure.
The hypothesis just means that $o_i$ is not a source of uncertainty, which in practical terms implies that the uncertainty of the excess deaths is given by the sample error of the mean for the 2001 to 2019 series. In the same way, Table 1 displays three statistics related to the 2020 excess death rate.

Absolute excess: the difference $e_i = o_i - r_i$ between the observed death rate and the death rate predicted under the normal hypothesis for the series. The absolute excess shows an anomaly around W14 (March 30 to April 5), where the largest deviation occurs. The anomaly starts on W10 and ends on W21. Hereafter it will be referred to as the 2020 spring anomaly.

Relative excess: the ratio $e_i/r_i$, shown as a percentage. The largest value is 160 % on W14. Relative to $r_i$, the observed death rate more than doubled during three weeks.

z-score, or normalized excess: the ratio $e_i/s_i$, which tests the observed death rates against a normal distribution with zero mean and unit standard deviation. The largest z-score is $z = 42$. Under the null hypothesis of normality such a deviation would likely have been observed in a sample consisting of $\sim 10^{350}$ years. The European Monitor of Mortality uses the z-score to compare mortality across countries of different sizes. They bin the scores according to: $2 < z \le 4$ is a low excess; $4 < z \le 7$ is moderate; $7 < z \le 10$ is a high excess; $10 < z \le 15$ is very high; and $z > 15$ is extremely high. For four weeks in a row the z-score was extremely high in Spain.
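The three statistics and the EuroMoMo binning quoted above can be sketched in a few lines. This is a minimal illustration, not the author's octave code. It assumes that $s_i$ is the sample standard deviation with $n - 1$ in the denominator (as computed by `statistics.stdev`); the history values are hypothetical, and the label "below threshold" for $z \le 2$ is an assumption, since the text does not name that bin.

```python
from math import sqrt
from statistics import mean, stdev


def excess_statistics(history, observed):
    """Reference r, dispersion s, SEM u and the three excess statistics."""
    r = mean(history)                # reference value r_i
    s = stdev(history)               # sample standard deviation s_i (n - 1)
    u = s / sqrt(len(history))       # sample error of the mean u_i
    e = observed - r                 # absolute excess e_i = o_i - r_i
    return {"r": r, "s": s, "u": u, "e": e, "relative": e / r, "z": e / s}


def classify(z):
    """Bin a z-score following the limits quoted in the text."""
    if z > 15:
        return "extremely high"
    if z > 10:
        return "very high"
    if z > 7:
        return "high"
    if z > 4:
        return "moderate"
    if z > 2:
        return "low"
    return "below threshold"         # label assumed, not given in the text


# Hypothetical rates per one million population for one week, NOT Table 1 data.
history = [160.0, 170.0, 165.0, 175.0, 180.0]
stats = excess_statistics(history, observed=340.0)
print(stats, classify(stats["z"]))
```

In the manuscript every week has $n = 19$ historical values; the five values above only keep the example short.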
Finally, Table 1 lists the breakdown of the 2020 excess in Spain into the number of confirmed covid-19 deaths $c_i$ and the number of excess deaths without an attributed cause (unknown cause), $k_i = e_i - c_i$. Both of them show an anomaly around W14. Notably, $k_i > c_i$ for W13 to W15.

Table 2 provides better insight into how the W01 to W27 accumulated mortality changed in Spain during the past twenty years. The table shows population, deaths and per capita deaths along with the deviation from the average value for the years 2001 to 2019 and the expected values for 2020. Notice that population is only listed for W01, while interpolated values of population were used to obtain $o$. As of W27 the death excess in Spain amounts to 1008(52) deaths per one million population or 47 700(2400) deaths, 22 % relative to the reference. The z-score is $z = 4.5$, meaning a moderate excess. Notice, however, that this score is computed over an elapsed time interval of 27 weeks.

From W10 to W21 (Monday March 2 to Sunday May 24) the excess death rate is anomalously high in Spain (see Table 1). Figure 3 displays the observed death rates ($o_i$, black step line) in Spain for the year 2020, the reference $r_i$ (greenish step line) and the number of deaths with unknown attributed cause (magenta step line). Succinctly, Figure 3 is a graphical display of Table 1. Moreover, the dashed line displays the weekly rate of confirmed covid-19 cases. It leads the mortality anomaly by some two weeks. The darkest shaded area highlights the total number of expected deaths from W10 to W21. The intermediate shaded area is the number of deaths with unknown cause in the same period of time and the lightest shaded area is the number of confirmed covid-19 deaths in the time window. Table 3 lists accumulated values for $e_i$, $c_i$ and $k_i$.
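The breakdown $k_i = e_i - c_i$ and its accumulation over a range of weeks can be sketched as follows. The weekly values are hypothetical placeholders chosen for readability, not the figures of Table 1 or Table 3, and the variable names are introduced here for illustration.

```python
from itertools import accumulate

# Hypothetical weekly values per one million population, NOT Table 1 data.
weeks = ["W10", "W11", "W12", "W13"]
excess = [5.0, 40.0, 150.0, 250.0]       # e_i, absolute excess
confirmed = [1.0, 10.0, 60.0, 110.0]     # c_i, confirmed covid-19 deaths

# Excess deaths without an attributed cause: k_i = e_i - c_i.
unknown = [e - c for e, c in zip(excess, confirmed)]

# Running totals from the first week of the range.
cum_e = list(accumulate(excess))
cum_c = list(accumulate(confirmed))
cum_k = list(accumulate(unknown))

for w, e, c, k in zip(weeks, cum_e, cum_c, cum_k):
    print(f"{w}: e={e:.0f} c={c:.0f} k={k:.0f} confirmed share={c / e:.0%}")
```

By construction the accumulated $c_i$ and $k_i$ add up to the accumulated $e_i$ at every week, so the shares of confirmed and unattributed deaths always sum to 100 %.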
In the elapsed time from W10 to W21 the excess death rate amounts to 1055(15) per one million population or 49 940(730) deaths; 55 % of them (27 555 deaths) were confirmed covid-19 cases and the remaining 45 % (22 383 deaths) still lack an attributed cause. Total excess is 55 % relative to expected deaths in the same period of time and the z-score for the W10 to W21 period is 15.8, extremely high according to the EuroMoMo classification. Note, however, that this is the z-score for the twelve-week period. Weekly z-scores are listed in Table 1.

Table 4 lists the variability of the W10 to W21 accumulated number of deaths in the past twenty years and the population numbers for W01. Notice, however, that interpolated values of population were used to compute per capita death rates. Bivariate descriptive statistics where the calendar year plays the role of predictor and the observed per capita mortality $o$ is the response sustain ($p = 0.48$) the null hypothesis of no relationship at the standard level of confidence. The confidence interval for the slope is $[-4, +8]\ \mathrm{M}^{-1}\,\mathrm{a}^{-1}$, in accordance with Figure 1. In sharp contrast, if the response is set to the observed mortality $O$ then the null hypothesis is not sustained ($p = 7 \times 10^{-8}$) at the standard level of significance. The confidence interval for the slope is $[572, 923]\ \mathrm{a}^{-1}$.

A reference value for weekly per capita mortality in Spain, obtained from public records that include data from the 21st century, is reported (see Table 1 and Figure 2).
This reference value is used to compute the excess mortality in the year 2020 from W10 to W21, when weekly mortality displays an extremely high anomaly (see Figure 3) due to the impact of covid-19. The per capita excess death rate is 1055(15) per one million population or 49 940(730) deaths (see Table 4). Considering a nationwide seroprevalence (Pollán et al., 2020) of 5.0 %, this translates into an upper bound of the infection fatality rate (IFR) of 2.1 % if every excess death is attributed to the disease. The lower bound for the IFR is 1.2 %, obtained after considering confirmed covid-19 deaths only.

The author declares no conflict of interest.

A tsv file containing the Spanish weekly deaths from 2000 to 2020 exported from INE table 35 176 and the computed values of Spanish population used in this manuscript is available at https://personal.us.es/olalla/covid/ESPDeathsPopulation.tsv.

Zhu, Na, Dingyu Zhang, Wenling Wang, Xingwang Li, Bo Yang, Jingdong Song, Xiang Zhao, Baoying Huang, Weifeng Shi, Roujian Lu, Peihua Niu, Faxian Zhan, Xuejun Ma, Dayan Wang, Wenbo Xu, Guizhen Wu, George F. Gao, and Wenjie Tan (2020), "A novel coronavirus from patients with pneumonia in China, 2019," New England Journal of Medicine 382 (8), 727-733.

Counting and its how-to are the least I can do in loving memory of those who died during the spring of 2020. JMMO heartily thanks the Instituto Nacional de Estadística and the Instituto de Salud Carlos III for making the datasets publicly available. JMMO also thanks Datadista (www.datadista.com) for releasing public records of confirmed covid-19 deaths.

This work was performed using free software running on xubuntu 18.04.1 LTS. Databases were imported into GNU octave-4.2.2 (https://www.gnu.org/software/octave/). Pictures were produced with gnuplot-5.2.2 (http://www.gnuplot.info/). The manuscript was typeset in GNU emacs-25.2.2 (https://www.gnu.org/software/emacs/) assisted by AUCTeX (https://www.gnu.org/software/auctex/).
Data in pictures and in tables were exported directly from octave. This project started on July 3, 2020.

A pandemic primer on excess mortality statistics and their comparability across countries
Estimación del número de defunciones semanales durante el brote de Covid-19
Exceso de mortalidad relacionado con la gripe en España en el invierno de 2012