key: cord-0720733-08pzt4m0 sha: 63230c1edc91715eb1081a3d410ceeee12319702 doc_id: 720733 cord_uid: 08pzt4m0
authors: Moraga, Paula; Ketcheson, David I.; Ombao, Hernando C.; Duarte, Carlos M.
title: Assessing the age- and gender-dependence of the severity and case fatality rates of COVID-19 disease in Spain
date: 2020-06-02
journal: Wellcome Open Res
DOI: 10.12688/wellcomeopenres.15996.1

Background: The assessment of the severity and case fatality rates of coronavirus disease 2019 (COVID-19), and of the determinants of their variation, is essential for planning health resources and responding to the pandemic. The interpretation of case fatality rates (CFRs) remains a challenge because of different biases associated with surveillance and reporting. For example, rates may be affected by preferential ascertainment of severe cases and by the time delay from disease onset to death. Using data from Spain, we demonstrate how some of these biases may be corrected when estimating severity and case fatality rates by age group and gender, and we identify issues that may affect the correct interpretation of the results.

Methods: Crude CFRs are estimated by dividing the total number of deaths by the total number of confirmed cases. CFRs adjusted for preferential ascertainment of severe cases are obtained by assuming a uniform attack rate in all population groups and using demography-adjusted under-ascertainment rates. CFRs adjusted for the delay between disease onset and death are estimated by using as the denominator the number of cases that could have had a clinical outcome by the time the rates are calculated. A sensitivity analysis is carried out to compare CFRs obtained using different levels of ascertainment and different distributions for the time from disease onset to death.

Results: COVID-19 outcomes are highly influenced by age and gender. Different assumptions yield different CFR values, but in all scenarios CFRs are higher in older age groups and in males.
Conclusions: The procedures used to obtain the CFR estimates require strong assumptions, and although the magnitude of the estimates should be interpreted with caution, the differences observed by age and gender are fundamental underpinnings to inform decision-making.

The coronavirus disease 2019 (COVID-19) has spread to nearly every country in the world since it first emerged in the Hubei province of China in 2019. As of 14 May 2020, more than 4.22 million cases and more than 290,000 deaths had been reported worldwide 1. While people of any age may get infected, COVID-19 symptoms are particularly severe for the elderly and those with underlying health conditions, which creates a disproportionate risk and need for intensive care in these groups. Understanding the severity of the disease in the different population groups is essential to help predict the demand for healthcare resources and to design effective mitigation policies.

Case fatality rates (CFRs) are often used to characterize the severity of the disease. The crude CFR is obtained by dividing the cumulative number of deaths by the cumulative number of reported cases. This indicator is simple to calculate but difficult to interpret because of several biases 2. First, the clinical outcome (recovery or death) of the most recent cases may not yet be known because of the delay between disease onset and death, which may cause the crude CFR to underestimate the true CFR. Second, limited testing capacity means that most of the people tested are those with the most severe symptoms, who are the most likely to experience fatal outcomes. As a result, crude CFRs may overestimate rates defined in terms of the actual number of infected people (including those with mild or no symptoms). Crude fatality rates can be adjusted in a number of ways to obtain estimates that more accurately represent the severity of the disease in each population group.
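As a minimal illustration of the crude CFR defined above, the following Python sketch recomputes it day by day from daily counts. The function name and the toy series are hypothetical (they are not the study's data or code); the point is only to show how right-censoring makes the crude CFR drift while the deaths of recent cases are still pending.

```python
from itertools import accumulate


def crude_cfr_series(daily_cases, daily_deaths):
    """Crude CFR recomputed on each day t from data up to and including day t.

    crude CFR(t) = cumulative deaths(t) / cumulative confirmed cases(t).
    Days before the first confirmed case give nan.
    """
    if len(daily_cases) != len(daily_deaths):
        raise ValueError("case and death series must have the same length")
    cum_cases = accumulate(daily_cases)
    cum_deaths = accumulate(daily_deaths)
    return [d / c if c > 0 else float("nan") for c, d in zip(cum_cases, cum_deaths)]


# Hypothetical toy series (not data from the paper): deaths lag behind cases.
cases = [100, 200, 300, 400]
deaths = [0, 2, 10, 28]
print(crude_cfr_series(cases, deaths))
```

On this toy series the crude CFR climbs from 0 to 0.04 as deaths catch up with earlier cases, so a value computed early in an outbreak can understate the eventual one.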
For example, censoring can be taken into account by using the distribution of the time between disease onset and death to determine the number of cases that could have experienced an outcome by the time the rates are calculated. The under-ascertainment of cases in different groups can also be corrected using the population demographics.

Here, we calculate crude and adjusted fatality rates by age group and gender in Spain. Spain is one of the hardest-hit countries in the pandemic, with 272,646 cases and 27,321 deaths as of 14 May 2020. The country has one of the longest life expectancies and lowest birth rates in the world 3 and, thus, a large percentage of older adults. Moreover, it is characterized by a sociable lifestyle and extensive inter-generational interactions, which may accelerate the spread of the virus. Accurate assessment of CFRs by age group and gender is essential to help plan responses that save lives. First, we present the data on population, confirmed cases and deaths for Spain. We then demonstrate how to calculate crude and adjusted CFRs by population group and present the estimates for Spain. Finally, we discuss the limitations of the methods and conduct a sensitivity analysis in which we compare CFRs adjusted under different assumptions.

Population data for Spain stratified by age group and gender for 2019 are obtained from the National Institute of Statistics of Spain 4 (Figure 1). We note the large percentage of older adults, with males and females over 60 representing 11.41% and 14.16% of the whole population, respectively. Data on the daily total confirmed cases and deaths, as well as daily confirmed cases and deaths by age group and gender for a subset of the population, are reported by the Spanish Ministry of Health and provided by Datadista 5.
Assuming this subset is representative of all cases in terms of the relative distribution across age groups and genders, we estimate the daily number of confirmed cases in each group by multiplying the total number of cases by the proportion of cases in that group. The daily number of deaths in each age group and gender is calculated following a similar procedure. Figure 2 and Figure 3 show the proportion of cases and deaths, respectively, in each age group and gender. We observe a low proportion of confirmed cases in young people (under 20 years old) and a high proportion of deaths in older age groups and males. Figure 4 shows the total number of confirmed cases and deaths over time.

We can examine the relative risks in each age group and gender to compare the severity of the disease between population groups. The relative risk in each population group is obtained by dividing the number of deaths in the group by the total population of that group, and normalizing the values so that the risk of males older than 80 is equal to 1. We observe a roughly tenfold increase in risk for every 20-year increase in age, consistent with an earlier, smaller study of cases in China 6.

Case fatality rate

At any point in time, the crude CFR is calculated by dividing the cumulative number of deaths by the cumulative number of reported cases. As noted, CFRs may be affected by preferential ascertainment of severe cases. This is likely to occur with COVID-19, because asymptomatic cases and cases with mild symptoms are less likely to seek medical care or to be included in the surveillance data. The resulting under-reporting of cases could bias the crude CFRs upward (that is, overestimate them). We can partially correct this bias by calculating the adjusted daily number of confirmed cases following the procedure detailed by Verity et al. 6.
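The two steps described above, allocating national totals to groups and computing relative risks, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names, the three group labels and all numbers are hypothetical, and the allocation relies on the same representativeness assumption stated in the text.

```python
def allocate_by_proportion(total, subsample_counts):
    """Split a national total across groups using the group shares observed in
    the subsample that carries age/gender information (assumed representative)."""
    subtotal = sum(subsample_counts.values())
    if subtotal <= 0:
        raise ValueError("subsample has no records")
    return {g: total * n / subtotal for g, n in subsample_counts.items()}


def relative_risk(deaths, population, reference="males 80+"):
    """Deaths per head of population in each group, rescaled so the reference group is 1."""
    rate = {g: deaths[g] / population[g] for g in deaths}
    return {g: r / rate[reference] for g, r in rate.items()}


# Hypothetical numbers for three of the 18 groups (illustration only).
subsample_deaths = {"males 80+": 30, "females 80+": 20, "males 40-49": 1}
deaths = allocate_by_proportion(5100, subsample_deaths)
population = {"males 80+": 1_000_000, "females 80+": 1_600_000, "males 40-49": 4_000_000}
print(relative_risk(deaths, population))
```

Because every group's death count is derived from the same national total, any error in that total cancels out of the relative risks; only the subsample shares and the population denominators matter.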
Specifically, we calculate $NC_a = \mathrm{pop}_a / \mathrm{cases}_a$, where $\mathrm{pop}_a$ and $\mathrm{cases}_a$ are the population and the number of cases, respectively, in group $a$, with a ∈ {males 0-9, males 10-19, males 20-29, males 30-39, males 40-49, males 50-59, males 60-69, males 70-79, males 80+, females 0-9, females 10-19, females 20-29, females 30-39, females 40-49, females 50-59, females 60-69, females 70-79, females 80+}. We assume perfect ascertainment in the group with the maximum $1/NC_a$ value, which is the group of males older than 80. Then, assuming the attack rate is the same in all groups, we estimate the adjusted number of cases in each population group by multiplying the confirmed cases by $NC_a / NC_{\text{males 80+}}$. Figure 6 and Figure 7 show the cumulative confirmed cases and the cases adjusted for preferential ascertainment over time for each age group and gender. Finally, we calculate the CFRs adjusted for preferential ascertainment by dividing the cumulative number of deaths by the cumulative number of adjusted cases in each population group. We also calculate 95% confidence intervals using exact binomial tests 7.

CFRs can also be biased by the delay between disease onset and death. At any moment in time, the cumulative number of confirmed cases includes people who have not yet died but may do so in the future. Therefore, crude fatality rates may underestimate the true severity of the disease. We can correct this bias by replacing the denominator with an estimate of the cumulative number of cases with known outcomes by the time the rates are calculated. Specifically, let $T$ be the point in time when the CFRs are calculated. The probability that a case confirmed at time $t$, $t = 1, \ldots, T$, has a known outcome by time $T$ is expressed as $p_t = \sum_{j=0}^{T-t} f(j)$, where $f(j)$ is the probability that the time from disease onset to death falls between $j$ and $j+1$ days. The adjusted number of cases is then $\sum_{t=1}^{T} c_t \, p_t$, where $c_t$ is the number of cases confirmed at time $t$. Here we calculate the number of adjusted cases assuming a log-normal distribution of the time from disease onset to death with a mean of 13 days and a standard deviation of 12.7 days 8 (Figure 5).
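The ascertainment adjustment and the exact binomial interval described above can be sketched in Python as follows. The sketch assumes, as in the text, complete ascertainment in the group with the largest observed cases per head and a uniform attack rate across groups. The exact interval is the Clopper-Pearson interval (the one R's `binom.test` reports), found here by bisection on the binomial CDF. Because adjusted case counts are not integers, the sketch rounds them before building the interval; that rounding is our assumption, not a documented step of the study. All function names and numbers are hypothetical.

```python
import math


def ascertainment_adjusted_cases(cases, population):
    """Scale confirmed cases by NC_a / NC_ref, with NC_a = pop_a / cases_a.

    The reference group is the one with the largest 1/NC_a (observed cases per
    head), where ascertainment is assumed complete. Every group must have cases > 0.
    Returns (adjusted cases per group, reference group).
    """
    nc = {g: population[g] / cases[g] for g in cases}
    reference = min(nc, key=nc.get)  # largest 1/NC_a == smallest NC_a
    return {g: cases[g] * nc[g] / nc[reference] for g in cases}, reference


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q, log_nf = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    terms = (math.exp(log_nf - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                      + i * log_p + (n - i) * log_q) for i in range(k + 1))
    return min(1.0, math.fsum(terms))


def _p_where_cdf_equals(k, n, target, iters=100):
    """Bisection for p such that binom_cdf(k, n, p) == target (the CDF decreases in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def cfr_exact_ci(deaths, cases, level=0.95):
    """CFR point estimate with a Clopper-Pearson interval; cases is rounded to an integer."""
    n = round(cases)
    if not 0 <= deaths <= n or n == 0:
        raise ValueError("need 0 <= deaths <= cases and cases > 0")
    alpha = 1 - level
    # Lower bound: P(X >= deaths) = alpha/2; upper bound: P(X <= deaths) = alpha/2.
    lower = 0.0 if deaths == 0 else _p_where_cdf_equals(deaths - 1, n, 1 - alpha / 2)
    upper = 1.0 if deaths == n else _p_where_cdf_equals(deaths, n, alpha / 2)
    return deaths / n, lower, upper


# Hypothetical two-group example.
adjusted, ref = ascertainment_adjusted_cases(
    {"males 80+": 50, "males 40-49": 100},
    {"males 80+": 1_000, "males 40-49": 10_000},
)
print(ref, adjusted)
print(cfr_exact_ci(5, 10))
```

In the toy example the younger group's 100 confirmed cases are scaled up to 500, which is what a uniform attack rate of 5% (the reference group's observed rate) implies for a population of 10,000.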
Figure 6 and Figure 7 show cumulative cases adjusted for preferential ascertainment of severe cases and for the time delay between confirmation and death for each population group. We then calculate corrected CFRs using the adjusted cases as the denominator, with 95% confidence intervals from an exact binomial test.

The procedure we used to obtain adjusted CFRs requires strong assumptions that could greatly affect the results. First, we adjusted crude CFRs for preferential ascertainment of severe cases by assuming complete ascertainment in the group with the highest observed attack rate (males older than 80). We then assumed a uniform attack rate in all population groups and used demography-adjusted under-ascertainment rates to estimate the number of infected individuals in each population group. However, there could also be under-ascertainment in the group of males older than 80 due to the extensive strain on the health system, which would mean that the CFR estimates are only an upper bound on the real values. We could correct this bias by further scaling the number of cases after the initial demographic adjustment. For example, we could multiply the adjusted cases by a value α > 1 to obtain a higher number of infected cases and, hence, lower CFRs. Moreover, the uniform attack rate assumption could be incorrect if certain population groups have more interactions with other people and are therefore more exposed to the disease. CFRs may also be biased by the delay between disease onset and death. To correct this bias, we considered a log-normal distribution with mean 13 days and standard deviation 12.7 days for the time from disease onset to death 8, and estimated the CFRs using as the denominator the cumulative number of cases that could have had a clinical outcome by the time the rates are calculated. However, other distributions could be considered, and these could change the results.
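The delay adjustment and the extra scaling by α can be sketched together in Python. The sketch discretizes the delay as f(j) = F(j+1) - F(j), so the known-outcome probability of a case confirmed on day t is F(T - t + 1). It converts the log-normal mean and standard deviation (13 and 12.7 days) to log-scale parameters by moment matching, and parameterizes a Gamma distribution with mean 18.8 days and coefficient of variation 0.45 (the alternative used in the sensitivity analysis) as shape 1/CV² and scale mean·CV². These parameterization and discretization choices, the function names and the toy inputs are assumptions for illustration, not the study's implementation.

```python
import math


def lognormal_cdf(x, mean, sd):
    """CDF of a log-normal distribution parameterized by its mean and sd (days)."""
    if x <= 0:
        return 0.0
    sigma2 = math.log(1 + (sd / mean) ** 2)  # moment matching on the log scale
    mu = math.log(mean) - sigma2 / 2
    return 0.5 * (1 + math.erf((math.log(x) - mu) / math.sqrt(2 * sigma2)))


def gamma_cdf(x, mean, cv):
    """CDF of a Gamma distribution with the given mean and coefficient of variation.

    shape = 1 / cv**2, scale = mean * cv**2; the regularized lower incomplete
    gamma function is evaluated with its power series.
    """
    if x <= 0:
        return 0.0
    shape, z = 1 / cv**2, x / (mean * cv**2)
    term = total = 1 / shape
    n = 0
    while term > total * 1e-16 and n < 10_000:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total)


def known_outcome_cases(daily_cases, cdf):
    """sum_t c_t * p_t with p_t = sum_{j=0}^{T-t} f(j) and f(j) = F(j+1) - F(j).

    The inner sum telescopes to F(T - t + 1) because F(0) = 0.
    """
    T = len(daily_cases)
    return sum(c * cdf(T - t + 1) for t, c in enumerate(daily_cases, start=1))


DELAYS = {
    "log-normal(mean=13, sd=12.7)": lambda x: lognormal_cdf(x, 13.0, 12.7),
    "Gamma(mean=18.8, cv=0.45)": lambda x: gamma_cdf(x, 18.8, 0.45),
}


def sensitivity_table(daily_cases, total_deaths, nc_ratio, alphas=(1.0, 1.5, 2.0)):
    """Adjusted CFR for every (delay distribution, alpha) pair.

    Scaling every daily count by nc_ratio * alpha (nc_ratio = NC_a / NC_males80+)
    scales the delay-adjusted denominator by the same constant.
    """
    table = {}
    for name, cdf in DELAYS.items():
        known = known_outcome_cases(daily_cases, cdf)
        for alpha in alphas:
            table[(name, alpha)] = total_deaths / (alpha * nc_ratio * known)
    return table


# Hypothetical group: 20 cases per day for 40 days, 60 deaths, nc_ratio = 1.
for key, value in sensitivity_table([20] * 40, 60, 1.0).items():
    print(key, round(value, 4))
```

Because α multiplies every group's denominator by the same constant, it rescales all CFRs by 1/α without changing their ordering across groups; the choice of delay distribution, by contrast, changes each group's denominator according to the timing of its cases.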
To illustrate these limitations, we conduct a sensitivity analysis in which we calculate the CFRs using different levels of ascertainment and different distributions for the time from disease onset to death. Specifically, we estimate the adjusted number of cases in each population group by multiplying the confirmed cases by $(NC_a / NC_{\text{males 80+}}) \times \alpha$, using α values of 1, 1.5 and 2. For the delay, we use a log-normal distribution with mean 13 days and standard deviation 12.7 days 8 (Figure 5), and a Gamma distribution with mean 18.8 days and coefficient of variation 0.45 9 (Figure 9). Analyses are performed with the statistical software R version 3.6.1 10. Plots are created with the R package ggplot2 version 3.3.0 11.

Figure 8 shows the relative risks in each age group and gender. We note that the risk of COVID-19 death increases with age and is higher for males than for females in all age groups except 0-9 and 10-19. Table 1 shows the crude and adjusted CFRs by age group and gender calculated on 14 May 2020. This table also shows the CFRs by age group obtained by Verity et al. 6 from aggregated time series of cases in mainland China. We observe that CFRs are much higher in age groups older than 60 and, for most age groups, in males. The adjusted CFRs obtained with the data from Spain are smaller than the CFRs obtained by Verity et al. 6 for all except the two oldest groups, and the confidence intervals for the CFRs of Spain are much narrower owing to the larger dataset. Table 2 shows the CFRs estimated under different scenarios assuming different levels of ascertainment and distributions for the time from disease onset to death. The scenarios yield different CFR values, but in all of them CFRs are higher in older age groups and in males.

For a newly emerging infectious disease like COVID-19, data are assembled in challenging circumstances that may contribute to the underestimation of cases and deaths.
Data available on the total confirmed cases and deaths in Spain do not provide age and gender information. Here, we obtained estimates by population group by multiplying the total confirmed cases and deaths by the proportions occurring in each group in a sample with that information. This is a limitation of our study, since the sample with demographic information may not be representative of the whole population. We have seen that estimating crude CFRs by dividing the total number of deaths by the total number of confirmed cases produces results that are difficult to interpret because of several biases. For example, the estimated rates may overstate the true rates due to preferential inclusion of severe cases, since data assembled in emergency settings typically include people who seek medical care, have the most severe symptoms, and experience fatal outcomes. Following Verity et al. 6, we adjusted for preferential ascertainment of severe cases by assuming complete ascertainment in the group with the highest attack rate, and using demography-adjusted under-ascertainment rates to estimate the number of infected individuals in each population group. In addition, CFRs may be biased by the delay between disease onset and death. We adjusted for this bias by assuming a specific distribution for the time from disease onset to death. These are strong assumptions that could greatly affect the results. We therefore conducted a sensitivity analysis in which we calculated the CFRs using different levels of ascertainment and different distributions for the time from disease onset to death. The sensitivity analysis yielded different values for the CFRs, but in all scenarios CFRs were higher in older age groups and males. In addition, CFRs calculated in the initial phase of an epidemic are highly dependent on the point in time at which they are calculated.
Here we provide estimates calculated with data up to 14 May 2020, but rates calculated at a later point in time could be different.

The assessment of the severity of COVID-19 and the determinants of its variation is essential for planning health resources and designing mitigation policies, including intelligent strategies to release the population from confinement while protecting the most vulnerable. In this article we have estimated CFRs by age group and gender in Spain, accounting for censoring and ascertainment biases. We have found that the severity of COVID-19 is highly influenced by age and gender, with higher rates in older ages and males. The procedures used to obtain the CFR estimates require strong assumptions, and although the magnitude of the estimates should be interpreted with caution, the differences observed by age and gender are fundamental underpinnings to inform decision-making.

Source data

Data on total confirmed cases and deaths, as well as confirmed cases and deaths by age group and gender for a subset of the population, are reported by the Spanish Ministry of Health and provided by Datadista 5. Population data for Spain are obtained from the National Institute of Statistics of Spain 4.

This project contains the following underlying data:
• ccaa_covid19_casos2020-05-14.csv (number of confirmed cases in each of the 17 regions of Spain from 2020-02-21 to 2020-05-14)
• ccaa_covid19_fallecidos2020-05-14.csv (number of deaths in each of the 17 regions of Spain from 2020-03-03 to 2020-05-14)
• nacional_covid19_rango_edad2020-05-14.csv (number of confirmed cases, hospitalizations, ICU admissions (uci), and deaths in Spain for each age group and gender from 2020-03-23 to 2020-05-14)
• popspainagegroupsex1Jul19.csv (Spanish population for each age and gender in 2019)

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Code for the results, figures and tables of this study can be found at https://github.com/Paula-Moraga/coronavirus-cfr

References
2. Potential Biases in Estimating Absolute and Relative Case-Fatality Risks during Outbreaks.
4. National Institute of Statistics: Population by age group and sex.
5. Datadista: Datasets related to COVID-19 in Spain.
6. Estimates of the severity of coronavirus disease 2019: a model-based analysis.
7. Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship.
8. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data.
9. Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries.
10. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
11. Elegant Graphics for Data Analysis.

Universidade Estadual do Ceará, Fortaleza, Brazil

The article "Assessing the age- and gender-dependence of the severity and case fatality rates of COVID-19 disease in Spain" aims to calculate crude and adjusted fatality rates by age group and gender in Spain. Overall, the manuscript is well written, with interesting analysis and important results. Minor considerations are given below.

In the introduction, the part from "while people of any age" to "in these groups" needs a reference. Because the authors chose a methodological approach focused on models for calculating the CFR in the COVID-19 scenario, the epidemiological features of the cases are not well addressed.
It is understood that doing this type of analysis during an ongoing pandemic has its limitations. I therefore suggest addressing them as limitations of the study. The methods are well stated and the results are clearly presented. Code and data files are available to download and replicate the study.

Are all the source data underlying the results available to ensure full reproducibility? Yes

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 14 July 2020
https://doi.org/10.21956/wellcomeopenres.17545.r39137
© 2020 Russell T. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Centre for Mathematical Modelling of Infectious Diseases, Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK

The authors calculated adjusted CFR estimates for Spain, adjusting directly for the delay from onset to death and indirectly for under-ascertainment and the age structure of the Spanish population. The adjustments for delay and for under-ascertainment require some assumptions, so the authors carry out a sensitivity analysis testing how different ascertainment and delay distribution assumptions affect the overall results. The sex- and age-specific CFR estimates they calculate are consistent with other published CFR estimates and provide a useful and detailed contribution to the evidence base of COVID severity estimates.

The discussion of the existing difficulties and biases when estimating many epidemiological parameters during an ongoing outbreak is lucid and well written.
It sheds light on the difficulties present, rather than glossing over them, which is all too often the case in other severity estimate papers.

One minor comment is that the under-ascertainment adjustment includes some assumptions. Specifically, they assume that where cases are most severe (i.e. older age groups), case ascertainment is highest. A very fair assumption! But there are ascertainment estimates, in Verity et al. and many other detailed modelling papers (amongst others I'm sure), against which these assumptions could be tested. It is a minor comment, as the estimates are very much in line with other published estimates. But some discussion of the existence of more rigorous ascertainment estimates would help as a minimum requirement. Comparing such estimates to the assumed values would be even better.

The code is available, but it would only run after some difficulty, as I believe some of the data loaded includes letters/accents specific to Spanish. It would not run on the first attempt, and I had to do quite a bit of digging and tweaking in the code to get it to run. I believe cleaning up the code so that it runs reproducibly and smoothly would also be a great benefit. However, the code is well written and commented, and the analysis is relatively straightforward. It is great that it has been shared, as the analysis is almost reproducible just by reading the code to see what it does! Improving it would certainly help, though.

Apart from those minor points, this is a sound, interesting and well-written paper which addresses all of the difficulties in calculating CFRs during an outbreak and provides detailed estimates for Spain. I believe it would be of indexable standard if the code base is smoothed out a bit and some discussion of other case ascertainment estimates is included.

Are all the source data underlying the results available to ensure full reproducibility? Yes

Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.