key: cord-0950330-iq3njzoi authors: Martin-Olalla, J. M. title: Age disaggregation of crude excess deaths during the 2020 spring COVID-19 outbreak in Spain and Netherlands date: 2020-08-07 journal: nan DOI: 10.1101/2020.08.06.20169326 sha: 13e47c4ea49225e136df570b8354c10661d57338 doc_id: 950330 cord_uid: iq3njzoi
Spanish and Dutch official records of mortality and population during the 21st century are analyzed to determine the age-specific crude death rate during the 2020 spring COVID-19 outbreak. The excess death rate increases exponentially with age, with a doubling time in the range [5.0, 5.6] a (Spain) and [3.9, 6.7] a (Netherlands), that is, roughly doubling for every five years of age. The effective infection fatality rate in Spain shows the same doubling time. A statistically significant increase in mortality is noted above 45 a (Spain) and 60 a (Netherlands). A statistically significant increase in mortality is also noted in Spain for the youngest age group.
The illness designated COVID-19, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caught worldwide attention since its identification in late 2019 [1]. Spain was one of the European countries most impacted by the disease during the spring of 2020. Confirmed COVID-19 cases climbed to several hundred thousand (a few thousand per one million population) and confirmed COVID-19 deaths to some 28 000, or six hundred deaths per one million population [2]. Lock-down measures came into effect on March 16 to help curb the outbreak.
Total excess deaths, or crude excess deaths, are a key quantity for understanding the acute impact of a pandemic [3]. Their determination requires a reference mortality, or baseline, and a time interval over which the excess is computed. The baseline is determined from previous records of mortality. European National Statistical Institutes and Eurostat are making a great effort to disseminate weekly deaths in Europe over past years.
This manuscript takes official records of weekly crude deaths in Spain and Netherlands during the 21st century, together with population records, to ascertain the impact of COVID-19 on age-specific death rates. In April 2020 Eurostat set up "an exceptional temporary data collection on total week deaths in order to support the policy and research efforts related to COVID-19" [4]. The data collection is disaggregated by sex, five-year age group and NUTS region in several countries of the European Union and elsewhere. Data are provided by National Statistical Institutes on a voluntary basis, and the 2020 weekly deaths are still flagged as "estimate". This manuscript analyzes the age disaggregation of Spanish and Dutch weekly death rates. The data set is not fully coherent: adding up the deaths of every age group does not usually yield the total number of weekly deaths included in the catalog. In 2020 the total number of weekly deaths exceeds the sum of age-grouped weekly deaths by 4 % (Spain) and 3 % (Netherlands). In a previous preprint [5] the total (all-age) weekly crude death rate in Spain was analyzed, and a total excess of 1055 deaths per one million population from W10 to W21 of 2020 was reported, with a 95 % confidence interval (CI) of [1023, 1087]. Age group population values up to January 1, 2019 for Spain and Netherlands were also collected from the Eurostat demo_pjan table. This table disaggregates age year by year, which allows building five-year age groups matching the weekly death data. Linear interpolation was used to compute population on a weekly basis. Total population values for January 1, 2020 can be collected from table ts00001 in Eurostat. Unfortunately, that table has no age disaggregation. For Spain, however, these figures were obtained from table 31304 of the Instituto Nacional de Estadística.
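The population inputs described above can be sketched in Python. This is a minimal sketch, not the author's Octave code: the helper names `interpolate_population` and `extrapolate_shares` are hypothetical, the weekly population is assumed to be evaluated at the Monday of each ISO week, and "extrapolating from the last two available shares" (the procedure used for the 2020 Dutch age shares) is assumed to mean straight-line extrapolation.

```python
from datetime import date
from typing import Dict, List


def interpolate_population(jan1_pop: Dict[int, float], day: date) -> float:
    """Linearly interpolate a group population between the January 1 values
    of day.year and day.year + 1 (hypothetical helper)."""
    y = day.year
    start, end = date(y, 1, 1), date(y + 1, 1, 1)
    frac = (day - start).days / (end - start).days  # fraction of the year elapsed
    p0, p1 = jan1_pop[y], jan1_pop[y + 1]
    return p0 + frac * (p1 - p0)


def extrapolate_shares(shares_prev: List[float], shares_last: List[float]) -> List[float]:
    """Straight-line extrapolation one year ahead: x_next = 2 * x_last - x_prev.
    If both input lists sum to 1, the result also sums to 1 (assumed method)."""
    return [2.0 * b - a for a, b in zip(shares_prev, shares_last)]


# Usage with illustrative (made-up) numbers.
monday_w10 = date.fromisocalendar(2019, 10, 1)  # Monday, March 4, 2019
pop_w10 = interpolate_population({2019: 1_000_000, 2020: 1_036_500}, monday_w10)
shares_2020 = extrapolate_shares([0.50, 0.30, 0.20], [0.48, 0.31, 0.21])
print(round(pop_w10), [round(x, 2) for x in shares_2020])
```

A useful property of straight-line extrapolation of shares is that it needs no renormalization, since the extrapolated shares still add up to one.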
For the Dutch set, the last two available shares of population for every age group were used to extrapolate the shares of population in 2020. Eurostat requested from the National Statistical Institutes the transmission of "a back series weekly deaths for as many years as possible, recommending as starting point the year 2000". In this manuscript Spanish and Dutch weekly deaths are analyzed from 2001 onwards. As Eurostat points out, "a long enough time series is necessary for temporal comparisons and statistical modelling".
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted August 7, 2020. doi: https://doi.org/10.1101/2020.08.06.20169326 (medRxiv preprint).
In age-specific analysis the population $N$ is assumed to be a mixture, or multicomponent system, composed of several age groups of size $N_i$ whose behaviour in relation to death is approximately homogeneous. The age-specific weekly death rate [6] $d_i$ is simply the ratio between the weekly deaths $D_i$ within group $i$ and its population $N_i$. All age-specific deaths sum up to the total weekly deaths, $D = \sum_i D_i$. Using the definition of the age-specific death rate, this can be written as $D = \sum_i N_i d_i$, from which the total weekly death rate $d = D/N$ follows as $d = \sum_i x_i d_i$, where $x_i = N_i/N$ is the share of population in age group $i$. The total weekly death rate is therefore the weighted average of the age-specific death rates.
Figure 1 shows the age-specific weekly crude death rate $d_i$ in Spain for 19 age groups (colours from a gradient palette) and the total weekly crude death rate $d$ (orange). Some points are worth remarking. First, the figure clearly shows that the higher age groups are equally spaced along the y-axis, which is logarithmic.
Therefore the age-specific weekly death rates are distributed exponentially with age and follow the Gompertz exponential law [7], $d(a) \propto 2^{a/\tau}$, where $a$ is the age and $\tau$ is a characteristic age time over which $d(a)$ doubles. This empirical law suggests the predominance of age-specific causes of death over external causes like wars, murders, plagues or the like. One could collapse the lines in Figure 1 onto a universal, age-independent weekly death rate just by scaling the weekly death rates with an exponential function whose characteristic time $\tau$ is obtained by fitting the age-specific weekly death rates. However, no such scaling would fit every age group. As an alternative, it is useful to look for age-specific baselines, which model the behaviour of each weekly death rate back in time.
Second, and globally speaking, every age-specific weekly death rate is decreasing with time in Spain. As an example, a thin horizontal line at the level 0.3 % w$^{-1}$ is plotted to highlight the evolution of the oldest age group. In contrast, the total weekly death rate has stood still for nineteen years (see also Ref. [5]). This result can be understood by considering two competing phenomena. First, improvements in the public health system help decrease age-specific death rates. Second, the population is ageing, driven by the decrease in fertility rates and by elderly people being less prone to die. As a consequence, the total death rate tends to increase simply because the population is older, and thus more prone to die, even though every age group dies at a smaller age-specific rate. For the specific setup in Spain over the past 20 years, the resulting total weekly death rate stands still.
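The weighted-average identity $d = \sum_i x_i d_i$ and the Gompertz law can both be illustrated with a short Python sketch. It assumes that age groups are represented by hypothetical mid-point ages and that $\tau$ is obtained by an ordinary least-squares fit of $\log_2 d$ against age; the paper does not spell out its fitting procedure, so this is one plausible choice. The usage lines show, with made-up numbers, how every age-specific rate can fall by 10 % while the total rate barely moves because the share of the older group grows.

```python
import math
from typing import List


def total_rate(shares: List[float], rates: List[float]) -> float:
    """Total death rate d = sum_i x_i d_i: the share-weighted average of age-specific rates."""
    return sum(x * d for x, d in zip(shares, rates))


def gompertz_doubling_time(ages: List[float], rates: List[float]) -> float:
    """Fit log2 d(a) = a / tau + c by ordinary least squares and return tau."""
    ys = [math.log2(r) for r in rates]
    n = len(ages)
    mean_a, mean_y = sum(ages) / n, sum(ys) / n
    s_aa = sum((a - mean_a) ** 2 for a in ages)
    s_ay = sum((a - mean_a) * (y - mean_y) for a, y in zip(ages, ys))
    return s_aa / s_ay  # tau is the inverse of the fitted slope


# Composition effect (illustrative numbers): both age-specific rates drop 10 %,
# but the share of the older group rises from 20 % to 22 %.
before = total_rate([0.80, 0.20], [0.0010, 0.050])
after = total_rate([0.78, 0.22], [0.0009, 0.045])
print(f"total rate: {before:.6f} -> {after:.6f}")

# Synthetic Gompertz data with tau = 5.3 years at hypothetical mid-point ages.
ages = [47.5 + 5 * k for k in range(9)]
rates = [1e-5 * 2 ** (a / 5.3) for a in ages]
print(f"fitted tau = {gompertz_doubling_time(ages, rates):.2f} years")
```

In this toy case the total rate falls by under 2 % although each age-specific rate falls by 10 %, which is the mechanism invoked above for the flat Spanish total rate.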
The ageing of the population is perceptible in Figure 1 by looking at the total weekly death rate against the background of the age-specific weekly death rates. At the beginning of the 21st century $d$ (orange line) matched the seventh-to-last age group, 60-64. In 2020 it matches the sixth-to-last age group, 65-69.
Following the ideas in Ref. [5], this work analyzes the cumulative weekly death rate from W10 to W21, which in 2020 spans from Monday March 2, 2020 to Sunday May 24, 2020. In this period weekly deaths exhibit an extremely large anomaly in Spain (z-score equal to 16) and a moderate anomaly in Netherlands (z ∼ 7). The data collection allows computing the deaths recorded in this same period in past years for every age group. Scaling by the group population in the given year gives the cumulative death rate for that period and year. Figure 2 shows these statistics by age group, with age-specific death rates plotted against calendar year. Every plot contains 19 data points per country. Spanish data are displayed as solid blue circles, Dutch results as open black circles. Notice that the death rates are shown as normal scores (the sample average is removed and the result is scaled by the sample standard deviation), so that every age-specific y-axis spans the same range. Figure 2 also displays, as a solid line, the linear fit from 2001 to 2019. The broken lines show the 95 % confidence (prediction) interval for the residuals of the fit. These lines were extended to 2020, yielding the baseline, or reference value, of the death rate for every age group and its confidence (prediction) interval. As discussed previously, every age group displays decreasing observed death rates until 2020. The figure readily shows that every age group older than 45 years in Spain had an observed 2020 death rate outside the confidence interval of the normal behaviour of the previous 19 years.
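The cumulative rate, the normal scores and the extrapolated baseline can be sketched in Python. This is a hedged reconstruction rather than the author's code: the helper names are hypothetical, and the band is assumed to be the standard ordinary least-squares 95 % prediction interval for a new observation, $\hat y_0 \pm t_{0.975,\,n-2}\, s \sqrt{1 + 1/n + (x_0-\bar x)^2/S_{xx}}$; the paper's "confidence (prediction) interval for the residuals" may have been computed differently. The Student-t quantile is hard-coded for 19 fitting years (2001 to 2019, 17 degrees of freedom) to keep the sketch dependency-free.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Two-sided 95 % Student-t quantile for 17 degrees of freedom
# (19 fitting years minus 2 fitted parameters); pass another value if n differs.
T975_DF17 = 2.1098


def cumulative_rate(weekly_deaths: Dict[int, float], population: float,
                    first_week: int = 10, last_week: int = 21) -> float:
    """Deaths summed over ISO weeks first_week..last_week, divided by the group population."""
    return sum(weekly_deaths[w] for w in range(first_week, last_week + 1)) / population


def normal_scores(values: List[float]) -> List[float]:
    """Subtract the sample mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def linear_baseline(years: List[float], rates: List[float], target_year: float,
                    t_crit: float = T975_DF17) -> Tuple[float, float, float]:
    """OLS line through (year, rate); returns (reference R at target_year,
    half-width of the 95 % prediction interval, residual standard deviation)."""
    n = len(years)
    mx, my = sum(years) / n, sum(rates) / n
    sxx = sum((x - mx) ** 2 for x in years)
    slope = sum((x - mx) * (y - my) for x, y in zip(years, rates)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(years, rates)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    ref = intercept + slope * target_year
    half = t_crit * s * math.sqrt(1 + 1 / n + (target_year - mx) ** 2 / sxx)
    return ref, half, s


# Illustrative use: a steadily decreasing (made-up) rate, 2001-2019, extrapolated to 2020.
years = list(range(2001, 2020))
rates = [100.0 - 2.0 * (y - 2001) + (0.5 if y % 2 else -0.5) for y in years]
ref_2020, half_2020, sd = linear_baseline(years, rates, 2020)
print(f"R(2020) = {ref_2020:.2f} +/- {half_2020:.2f}")
```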
The death rate in 2020 is an outlier from a statistical point of view. The same happens in Netherlands above age 55 a. It is unnecessary to list every Pearson's $R^2$ correlation coefficient or every p-value in these analyses: no age group sustains the null hypothesis of no relationship between age-specific death rates and years. The largest recorded p-value is p = 0.0028, but p usually falls below $10^{-5}$. In contrast, the total death rate in Spain and Netherlands does sustain the null hypothesis (p = 0.48 and p = 0.87) at the standard significance level. It is also worth noting that the youngest age group exhibits an increase in 2020, which is equal in both countries. However, it is only statistically significant in Spain. It could be related to failures in the public health system amidst the strong stress experienced in the spring of 2020, or to parents being reluctant to show up at hospital facilities.
Table 1 and Table 2 summarize the results displayed in Figure 2. The tables first list the share of population $x$ in 2020 for every age group, then the observed cumulative death rates $O$ in 2019 and 2020 for the observation period. The next group of columns displays the results in Figure 2: first the predicted value, or reference, $R$ for 2020, followed by the death rate excess $E = O - R$ and three statistics related to it: the 95 % confidence interval, the P-score $E/R$, and the z-score computed as the excess $E$ divided by the standard deviation of the residuals [8]. If the ratio of the excess $E$ to the CI half-width falls below 1, the CI includes $E = 0$, suggesting that the excess is not statistically significant; in those cases P-scores and z-scores are not listed. For the sake of clarity the tables then list the excess age-specific deaths $E \times x \times N$.
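The table statistics reduce to a few lines of arithmetic. The Python sketch below assumes that "CI" in the significance rule denotes the half-width of the 95 % interval around $R$, and uses illustrative numbers (rates as fractions of the group population, a hypothetical group share of 1 % of 47 million people); `excess_statistics` is a hypothetical helper, not code from the paper.

```python
from typing import Dict, Optional


def excess_statistics(observed: float, reference: float, ci_half_width: float,
                      resid_sd: float, share: float,
                      population: float) -> Dict[str, Optional[float]]:
    """Excess E = O - R for one age group and the statistics listed in the tables."""
    excess = observed - reference
    # |E| / CI below 1 means the interval R +/- CI contains O, so E = 0 is not excluded.
    significant = abs(excess) / ci_half_width >= 1.0
    return {
        "E": excess,
        "P_score": excess / reference if significant else None,  # E / R
        "z_score": excess / resid_sd if significant else None,   # E / sd(residuals)
        "excess_deaths": excess * share * population,             # E * x * N
    }


stats = excess_statistics(observed=0.050, reference=0.040, ci_half_width=0.004,
                          resid_sd=0.002, share=0.01, population=47_000_000)
print(stats)
```

Returning `None` for the scores of non-significant groups mirrors the tables, which leave those cells empty.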
Finally, the last two columns of Table 1 display the results from the nationwide seroprevalence study [9], $S$, and the effective infection fatality rate IFR computed as $E/S$ [10]. To the best of our knowledge there is no seroprevalence study in Netherlands yet.
Mimicking Figure 2, one may plot the age-specific cumulative death rates as a function of age group for every year in the collection. As an example, the first two columns of Table 1 and Table 2 show a steady increase with age group. The same happens for the excess death rates and for the IFR in Table 1, which perceptibly doubles at every step above group 45-50. Indeed such plots reveal the Gompertz law in these magnitudes, as observed in Figure 3 (panels A, B and C). Age-specific excess death rates (panel C) give τ = 5.3 a (CI [5.0, 5.6] a) in Spain and τ = 4.9 a (CI [3.9, 6.7] a) in Netherlands, while the IFR in Spain shows τ = 5.2 a (CI [4.9, 5.6] a), as suggested by Table 1: the IFR doubles at every step of the age group staircase, climbing to 42 % for the oldest group. For age-specific death rates (panels A and B) the analysis can be extended back to 2001. Figure 4 shows the distribution of τ(y) over the past twenty years. Error bars display the confidence interval for τ(y). Solid lines display the fit of τ versus y in the range 2001 to 2019, with broken lines showing the confidence interval. Spanish characteristic times sustain the null hypothesis (p = 0.77) of no relationship with calendar year. The observed value in 2020 is τ = 6.3 a, 7 % down from the reference τ = 6.8 a. Dutch times do not (p = 0.003). The observed value in 2020 is τ = 6.0 a, 4 % down from the predicted value τ = 6.3 a.
One of the most striking results is the anomaly observed for the first age group (under five years of age) in Table 1 and Table 2, where it exhibits large P-scores and z-scores.
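The effective IFR and the link between a per-step ratio and the doubling time are simple enough to check by hand. The Python sketch below uses hypothetical helper names and illustrative inputs, and assumes 5-year age steps, so a quantity that doubles at every step has τ = 5 a, while τ = 5.3 a corresponds to a ratio of about 1.92 between consecutive groups. The last helper reproduces the "7 % down" comparison between the observed and reference τ for Spain.

```python
import math


def effective_ifr(excess_rate: float, seroprevalence: float) -> float:
    """IFR = E / S: excess death rate divided by the share of the group ever infected."""
    return excess_rate / seroprevalence


def doubling_time(step_ratio_value: float, step_years: float = 5.0) -> float:
    """Doubling time implied by the ratio between consecutive age groups."""
    return step_years / math.log2(step_ratio_value)


def step_ratio(tau: float, step_years: float = 5.0) -> float:
    """Ratio between consecutive age groups implied by a doubling time tau."""
    return 2.0 ** (step_years / tau)


def relative_change(observed: float, reference: float) -> float:
    """Signed relative difference (observed - reference) / reference."""
    return (observed - reference) / reference


print(doubling_time(2.0))                   # a quantity doubling every 5-year step
print(round(step_ratio(5.3), 2))            # per-step ratio for tau = 5.3 a
print(f"{relative_change(6.3, 6.8):+.1%}")  # 2020 tau vs reference, Spain
```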
These scores may, however, be biased by the model and might be unrealistic. Nonetheless, panel D in Figure 3 shows the ratio of the observed age-specific death rate in 2020 to that observed in 2019, a magnitude which does not depend on modelling. In the panel the increase of mortality in older groups is clear in Spain (above 30 a) and Netherlands (above 55 a), peaking at 80 a. The first age group usually shows high mortality rates compared to the following groups, owing to its specific fragility. This is noted in Figure 1 by the lightest-coloured age-specific death rate lying in the middle of darker shades, starting at $20 \times 10^{-6}\,\mathrm{w}^{-1}$.
Table 1 (caption; its beginning was lost in extraction): … (see Figure 2): the predicted value $R$ and the excess $E = O - R$, followed by the 95 % confidence interval, the P-score $E/R$ and the z-score computed as the ratio of $E$ to the standard deviation of the residuals; and the age-specific total excess deaths $E \times x \times N$. The last two columns show the seroprevalence in the country [9] and the resulting effective age-specific infection fatality rate IFR.
Table 2 (caption; its beginning was lost in extraction): … (see Figure 2): the predicted value $R$ and the excess $E = O - R$, followed by the 95 % confidence interval, the P-score $E/R$ and the z-score computed as the ratio of $E$ to the standard deviation of the residuals; and the age-specific total excess deaths $E \times x \times N$.
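The model-free indicator of panel D is a plain ratio of observed rates, and what it means in absolute terms depends on the base rate. A minimal Python sketch, with hypothetical helper names and a made-up 2019 base rate of $100 \times 10^{-6}$: when two groups share the same 2019 rate, a ×1.4 ratio implies twice the absolute increase of a ×1.2 ratio.

```python
def model_free_ratio(rate_2020: float, rate_2019: float) -> float:
    """O_2020 / O_2019: no baseline model is involved."""
    return rate_2020 / rate_2019


def absolute_increase(rate_2019: float, ratio: float) -> float:
    """Extra deaths per capita implied by applying `ratio` to the 2019 rate."""
    return rate_2019 * (ratio - 1.0)


base = 100e-6  # illustrative 2019 rate shared by two age groups
print(model_free_ratio(140e-6, base))
print(absolute_increase(base, 1.4) / absolute_increase(base, 1.2))
```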
In Figure 3 the relative increase for the first age group in Spain (×1.4) matches that observed at 60 a, and the modelling places $O_{20}$ well outside the predicted range of observations (see Figure 2). It should also be taken into account that the age-specific death rate of the first age group matches that of the 40-45 age group. If the latter is high enough to make its ratio (×1.2) relevant (a 20 % increase), then the former is high enough to make its ratio (×1.4) relevant (a 40 % increase, doubling the impact of the latter). It is beyond the scope of this manuscript to address the cause of this striking anomaly, whether it is a direct impact of COVID-19 on the youngest age group or an indirect impact due to poor public health responses or to parents being reluctant to take their children to hospital. Either way it reflects the well-known fragility of this age group.
Age-specific excess death rates in Spain and Netherlands during the spring 2020 (W10 to W21) COVID-19 outbreak follow the Gompertz exponential law, with doubling times in the range [5.0, 5.6] a (Spain) and [3.9, 6.7] a (Netherlands), roughly doubling every five years. The same happens for the effective IFR in Spain. This result is not far from the characteristic time observed for age-specific death rates over the past twenty years, which is 6.8 a (Spain) and 6.0 a (Netherlands). Excess death for the oldest age group climbs to 1.8 % of the population of this group (Spain) and 1.9 % (Netherlands), which must be added to the usual 3.8 % (Spain) and 4.7 % (Netherlands) mortality rate for this group in this period.
The youngest age group also shows the impact of the outbreak, with a 40 % increase of the mortality rate in Spain relative to 2019 after a long-standing decrease over the past twenty years.
The author declares no conflict of interest. Every piece of data in this work comes from public collections found in Eurostat and the Instituto Nacional de Estadística.
A pandemic primer on excess mortality statistics and their comparability across countries
This manuscript prints death rates as a number times $10^{-6}$, where the number stands for the mortality per one million population. Sometimes rates are computed per capita and per unit of time, like a number times $10^{-6}\,\mathrm{a}^{-1}$, which means per one million population and per year, and not "per one million years".
Note, however, that Figure 2 is scaled by the standard deviation of the sample.
The palette used in Figure 1 is named "dense", from the cmocean palettes by Kristen Thyng (https://matplotlib.org/cmocean/).
JMMO thanks the National Statistical Institutes and Eurostat for releasing public records of weekly deaths. JMMO learned of the Eurostat effort from a tweet posted by Kiko Llaneras (https://twitter.com/kikollan/status/1288830168925188096) on July 30, 2020. This work was not funded. This work was performed using free software running on xubuntu 18.04.1 LTS. Databases were imported into GNU octave-4.2.2 (https://www.gnu.org/software/octave/). Pictures were produced with gnuplot-5.2.2 (http://www.gnuplot.info/). The manuscript was typeset in GNU emacs-25.2.2 (https://www.gnu.org/software/emacs/) assisted by AUCTeX (https://www.gnu.org/software/auctex/). Data in pictures and in tables were exported directly from octave. This project started on July 31, 2020.