key: cord-330338-i6ozygkp
authors: Babacic, H.; Lehtiö, J.; Pernemalm, M.
title: Global between-countries variance in SARS-CoV-2 mortality is driven by reported prevalence, age distribution, and case detection rate
date: 2020-06-02
journal: nan
DOI: 10.1101/2020.05.28.20114934
sha: 
doc_id: 330338
cord_uid: i6ozygkp

Objective: To explain the global between-countries variance in the number of deaths per million citizens (nDpm) and the case fatality rate (CFR) due to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Design: Systematic analysis. Data sources: Worldometer, European Centre for Disease Prevention and Control, United Nations. Main outcome measures: The explanators of nDpm and CFR were mathematically hypothesised and tested with linear regression models on publicly available data from 88 countries on May 1st 2020. The derived explanators, age-adjusted infection fatality rate (IFRadj) and case detection rate (CDR), were estimated for each country based on a SARS-CoV-2 model of China. The accuracy and agreement of the models with observed data were assessed with R² and Bland-Altman plots, respectively. Sensitivity analyses involved removal of outliers and testing the models at five retrospective and two prospective time points. Results: Globally, IFRadj estimates varied between countries, ranging from below 0.2% in the youngest nations to above 1.3% in Portugal, Greece, Italy, and Japan. The median estimated global CDR of SARS-CoV-2 infections on April 16th 2020 was 12.9%, suggesting that most countries have a much higher number of cases than reported. At least 93% and up to 99% of the variance in nDpm was explained by reported prevalence expressed as cases per million citizens (nCpm), IFRadj, and CDR. IFRadj and CDR accounted for up to 97% of the variance in CFR, but this model was less reliable than the nDpm model, being sensitive to outliers (R² as low as 67.5%). Conclusions: The current differences in SARS-CoV-2 mortality between countries are driven mainly by the reported prevalence of infections, age distribution, and CDR. The nDpm might be a more stable estimate than CFR for comparing mortality burden between countries.

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has substantially affected the lives of billions of people. (1, 2) An open question for the public is how high the direct mortality caused by SARS-CoV-2 is. The observed case fatality rate (CFR), i.e. the proportion of individuals with a confirmed SARS-CoV-2 infection who die, has raised concerns because of its high variability between countries, ranging from below 0.1% in Qatar and Singapore to above 15% in Belgium and France on May 27th 2020. (3, 4) The global average case detection rate (CDR) on March 30th was estimated at 9%, suggesting that the true prevalence of infections is likely underestimated in most countries. (9) Studies suggest that the reported number of cases per million citizens (nCpm) is probably lower than the true number of infected individuals, and that this contributes to the varying CFR between countries. (5, 6) CFR appears higher than the true infection fatality rate (IFR), i.e. the true proportion of individuals with a SARS-CoV-2 infection in the population who will die, regardless of whether their infection is confirmed. (7) This was observed in China, where the crude CFR was estimated at 3.67%, whereas the age-adjusted overall IFR (IFRadj) was estimated at 0.66%. (8) The number of confirmed deaths per million citizens (nDpm) is a population-normalised measure of mortality used to compare countries. However, the varying nDpm among countries with similar nCpm, population size, and mitigation strategies has also raised fears that the virulence of the virus and treatment capacity may differ between countries. A recent multivariable model could explain only 62.5% of the between-countries variance in SARS-CoV-2 mortality. (10) Explaining the remaining variance in reported mortality, expressed as nDpm and CFR, is highly relevant for both the medical community and the public, and could help address public concerns.
Furthermore, it is important to assess whether the adjusted mortality differs substantially between countries, in order to track the success of different strategies. The aim of this study was to test, on real data, two mathematical hypotheses that explain the global between-countries variance in SARS-CoV-2 mortality expressed as nDpm and CFR.

Global data on the cumulative number of cases (nC), cumulative number of deaths (nD), cumulative number of tests (nT), number of tests per million citizens (nTpm), number of cases per million citizens (nCpm), and nDpm per country were downloaded from Worldometer. (3) Global data on the number of new cases and deaths per day were downloaded from the European Centre for Disease Prevention and Control (ECDC). (4) Global data on age distribution and 2018 GDP per country were obtained from United Nations (UN) statistics. (11, 12)

The overall IFRadj per country was estimated as a population-weighted average across nine age groups, following the equation:

$$IFR_{adj} = \sum_{i=1}^{9} \frac{N_i}{N} \, IFR_i$$

where $IFR_{adj}$ is the IFRadj in percentages (%), $N$ is the total population size, $N_i$ is the number of susceptible individuals within age group $i$, and $IFR_i$ is the IFR for that age group in % as estimated by Verity and colleagues. (8) For the purposes of this study, $IFR_{adj}$ serves as an age-adjustment factor. The CDR per country was estimated as the percentage of the estimated cases that have been confirmed, following the approach of Vollmer & Bommer (9):

$$CDR = IFR_{adj} \times \frac{nC_{t_0}}{nD_t}$$

where $CDR$ is the CDR in %, $IFR_{adj}$ is the IFRadj in %, $nC_{t_0}$ is the cumulative number of confirmed cases at time $t_0$, and $nD_t$ is the cumulative number of confirmed deaths at time $t$. Following the Verity model (8), $t_0$ is 14 days before $t$ in this approach, based on the estimate that on average 18.8 days pass from the onset of symptoms to death, and on the conservative assumption that on average 4.8 days pass from symptom onset to case detection (18.8 − 4.8 = 14 days). Rearranging this equation shows that $nD_t$ has an inverse relation with the $CDR$ and depends on the cumulative number of cases 14 days before the deaths occurred, $nC_{t_0}$, and on the age-adjusted $IFR_{adj}$:

$$nD_t = \frac{IFR_{adj} \times nC_{t_0}}{CDR}$$
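The two estimators above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code (their analysis was done in R); the function names `ifr_adj`, `cdr` and `deaths_implied` and all input numbers are hypothetical, and the three-group toy country stands in for the nine age groups and the Verity et al. age-specific IFRs used in the study.

```python
from typing import Sequence


def ifr_adj(pop_by_age: Sequence[float], ifr_by_age_pct: Sequence[float]) -> float:
    """Age-adjusted IFR in %: sum over age groups of (N_i / N) * IFR_i."""
    if len(pop_by_age) != len(ifr_by_age_pct):
        raise ValueError("need exactly one IFR value per age group")
    total = sum(pop_by_age)  # N, the total population size
    return sum(n_i / total * ifr_i for n_i, ifr_i in zip(pop_by_age, ifr_by_age_pct))


def cdr(ifr_adj_pct: float, cases_t0: float, deaths_t: float) -> float:
    """Case detection rate in %: IFRadj * nC_t0 / nD_t.

    deaths_t / (ifr_adj_pct / 100) estimates the true infections at t0, so
    confirmed cases at t0 divided by that estimate, times 100, is the CDR.
    """
    return ifr_adj_pct * cases_t0 / deaths_t


def deaths_implied(ifr_adj_pct: float, cases_t0: float, cdr_pct: float) -> float:
    """Rearranged form: nD_t = IFRadj * nC_t0 / CDR."""
    return ifr_adj_pct * cases_t0 / cdr_pct


# Hypothetical three-group country: 60% young, 30% middle-aged, 10% old.
print(ifr_adj([600, 300, 100], [0.01, 0.5, 5.0]))  # approx. 0.656 (%)
# Hypothetical counts: 1,000 confirmed cases at t0, 100 deaths 14 days later.
print(cdr(1.0, cases_t0=1000, deaths_t=100))  # 10.0 (%)
```

Nothing in the CDR formula caps the result at 100%: if a country confirms more cases than the IFR-based estimate of infections, the CDR exceeds 100%, which is how values such as those reported for Iceland and Singapore can arise.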
This version of the preprint was posted June 2, 2020 (doi: https://doi.org/10.1101/2020.05.28.20114934). The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Assuming that the number of cases at the time of $nD_t$, i.e. $nC_t$, has a constant dependence on $nC_{t_0}$ ($nC_t = c \times nC_{t_0}$ for some constant $c$), as observed repeatedly in epidemics including SARS-CoV-2, $nC_t$ can replace $nC_{t_0}$ in the equation for $nD_t$. To explain the population-normalised number of deaths (nDpm), one has to normalise $nD_t$ per population size, which turns the case count into nCpm and gives hypothesis 1:

$$nDpm_t \propto \frac{IFR_{adj} \times nCpm_t}{CDR}$$

where nDpm at time $t$ will be higher in countries with higher nCpm at time $t$ and higher IFRadj, whereas it will be lower in countries with the same nCpm and IFRadj that have a higher CDR. Since the CFR is the proportion of $nD_t$ out of $nC_t$, the relationship between the CFR, the IFRadj and the CDR follows as:

$$CFR = \frac{nD_t}{nC_t} = \frac{IFR_{adj} \times nC_{t_0}}{CDR \times nC_t}$$

where $CFR$ is the CFR of a country, $IFR_{adj}$ is the IFRadj of a country, $nC_t$ is the cumulative number of cases on the day of $nD_t$, and $nC_{t_0}$ is the cumulative number of cases two weeks before $t$. Again, assuming that $nC_t$ has a constant dependence on $nC_{t_0}$, the case counts can be omitted from the equation, which gives hypothesis 2:

$$CFR \propto \frac{IFR_{adj}}{CDR}$$

where the CFR will be higher in countries with higher IFRadj and will have an inverse relation with the CDR. Hypothesis 2 thus implies that older countries will have higher CFR and countries with higher CDR will have lower CFR, and predicts that $nC_t$ will not drive CFR.
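Since both models are fitted on log-transformed variables, writing the two hypotheses in log form shows what a linear model on the log scale would be expected to recover. This is a derivation sketch under the stated proportionalities (hypothesis 1: $nDpm_t \propto IFR_{adj} \times nCpm_t / CDR$; hypothesis 2: $CFR \propto IFR_{adj} / CDR$), with $c = nC_t / nC_{t_0}$ taken as the assumed constant ratio of case counts; it is not a result reported in the study.

$$\log nDpm_t = -\log c + \log nCpm_t + \log IFR_{adj} - \log CDR$$

$$\log CFR = -\log c + \log IFR_{adj} - \log CDR$$

If the hypotheses held exactly, a linear regression of $\log nDpm_t$ on $\log nCpm_t$, $\log IFR_{adj}$ and $\log CDR$ would return slopes of $+1$, $+1$ and $-1$, with $-\log c$ absorbed into the intercept; the CFR model would likewise return slopes of $+1$ and $-1$. Fitted slopes that depart from these values may indicate where the simple multiplicative relationship does not fully hold.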

To test hypothesis 1, we built linear regression model 1 (the nDpm model) to explain nDpm with nCpm, IFRadj, and CDR. To test hypothesis 2, we built linear regression model 2 (the CFR model) to explain CFR with IFRadj and CDR. Only countries with more than 1,000 cases were included in the analyses. All variables were log-transformed. We additionally tested whether GDP, nTpm, and duration of the epidemic (days since the first case) could explain mortality after being added to the models. (10) Accuracy was assessed with R², and agreement was analysed graphically with the Bland-Altman mean difference plot. (13)
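A minimal sketch of the log-linear regression, written in Python with NumPy (`numpy.linalg.lstsq`) rather than the R used by the authors; in R the analogous model would be an `lm()` call on the log-transformed variables. The data are synthetic: the 88 "countries" and their nCpm, IFRadj and CDR values are random numbers, and nDpm is generated from hypothesis 1 with multiplicative noise, so the fit should recover slopes near +1, +1 and −1 and an R² close to 1. The helper name `fit_log_linear` is hypothetical.

```python
import numpy as np


def fit_log_linear(y, X):
    """OLS of log(y) on log(X) with an intercept; returns (coefficients, R^2)."""
    log_y = np.log(y)
    design = np.column_stack([np.ones(len(log_y)), np.log(X)])
    coef, *_ = np.linalg.lstsq(design, log_y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((log_y - fitted) ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot  # R^2 on the log scale


# Synthetic, hypothetical data for 88 countries.
rng = np.random.default_rng(0)
n_countries = 88
ncpm = rng.uniform(100, 10_000, n_countries)  # cases per million
ifr = rng.uniform(0.2, 1.6, n_countries)  # IFRadj in %
cdr = rng.uniform(2, 100, n_countries)  # CDR in %
noise = rng.normal(0, 0.1, n_countries)
# Hypothesis 1 with multiplicative noise; 0.5 plays the role of 1/c.
ndpm = 0.5 * ncpm * ifr / cdr * np.exp(noise)

coef, r2 = fit_log_linear(ndpm, np.column_stack([ncpm, ifr, cdr]))
print(np.round(coef[1:], 2), round(r2, 3))  # slopes for nCpm, IFRadj, CDR
```

On real data the slopes need not equal ±1; the sketch only shows how a proportional relationship becomes a linear model after log transformation and how R² is computed on the log scale.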

To address uncertainty, we removed outliers outside the 95% confidence intervals (95% CI) of the Bland-Altman plots, and repeated the analyses retrospectively on April 4th, 8th, 12th, 21st, and 24th, and prospectively on May 7th, 11th, 18th, and 27th 2020.
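A minimal Python sketch of the Bland-Altman agreement analysis and the outlier-removal step. It assumes, since the text does not spell it out, that the "95% confidence intervals" of the Bland-Altman plot are the usual 95% limits of agreement (mean difference ± 1.96 SD of the differences), computed on log-transformed values as in the figure captions; the function name `bland_altman` and the eleven hypothetical countries are illustrative only.

```python
import numpy as np


def bland_altman(observed, predicted):
    """Bland-Altman statistics on the log scale.

    Returns the per-country means (x axis), differences (y axis), the mean
    difference, the lower/upper 95% limits (mean difference +/- 1.96 SD),
    and a boolean mask of countries outside those limits.
    """
    log_obs = np.log(np.asarray(observed, dtype=float))
    log_pred = np.log(np.asarray(predicted, dtype=float))
    means = (log_obs + log_pred) / 2
    diffs = log_obs - log_pred
    mean_diff = diffs.mean()
    sd = diffs.std(ddof=1)  # sample standard deviation of the differences
    lower, upper = mean_diff - 1.96 * sd, mean_diff + 1.96 * sd
    outside = (diffs < lower) | (diffs > upper)
    return means, diffs, mean_diff, (lower, upper), outside


# Hypothetical example: ten countries close to the prediction, one far below it.
log_gaps = np.array([0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0, -2.0])
predicted = np.full(log_gaps.size, 10.0)
observed = predicted * np.exp(log_gaps)
_, _, mean_diff, limits, outside = bland_altman(observed, predicted)
print(np.flatnonzero(outside))  # [10]
```

In a sensitivity analysis of this kind, the flagged countries would be dropped, the regression refitted, and the change in R² compared between the nDpm and CFR models.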

The study was conducted according to the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER). (14) The code, data, and results are publicly available at https://github.com/harbab/covid_19_mortality. All analyses were performed in R version 3.6.1.

As of May 1st 2020, a total of 214 countries had reported SARS-CoV-2 infections, of which 88 had reported more than 1,000 infections. The estimated IFRadj varied from below 0.2% in the youngest nations (Ivory Coast, Guinea, Nigeria, UAE, Cameroon, and Afghanistan) to above 1.3% in the world's oldest nations (Germany, Portugal, Greece, and Italy), and was highest in Japan at 1.6%. The global average CDR on April 16th 2020 was 22.12% (median: 12.9%, SD: 32.47%), suggesting that most cases were undetected. Only two countries detected more than 100% of expected cases: Iceland (154.50%) and Singapore (234.95%). Estimates for each country are shown in Table S1 (supplementary information).

Univariate analyses showed that nCpm, IFRadj, and nTpm could explain 65.57%, 40.29%, and 25.91% of the variance in nDpm, respectively (p < 3.922 × 10⁻⁷). CDR was not univariately associated with nDpm (p = 0.738). However, nCpm, IFRadj, and CDR combined could explain 97.18% of the variance in nDpm (p < 2.2 × 10⁻¹⁶).

Introducing nTpm to the model only slightly improved the R² to 0.9728 (p < 2.2 × 10⁻¹⁶). All four variables were included in the final model 1 (Table 1). The relationship between the variables was as hypothesised mathematically. The model showed almost perfect accuracy (Figure 1A) and high agreement (Figure 1B) in explaining nDpm. Some countries were outliers (Figure 1C). Univariately, nDpm had a positive association with GDP (p = 8.213 × 10⁻¹¹) and with the duration of the epidemic (p = 0.0227). However, neither variable was associated with nDpm when added to model 1 (p = 0.308 and 0.196, respectively). GDP had a positive correlation with all four explanators of model 1 (p < 9.341 × 10⁻⁶), most evidently with nTpm (R² = 0.592), and was thus redundant in the model. GDP and duration of the epidemic are possibly unstable explanators that may be useful in stratified analyses by continent and region.

Univariately, IFRadj and CDR explained 17.68% and 40.35% of the variance in CFR, respectively. Combined, IFRadj and CDR accounted for 91.84% of the variance in CFR (Table 2). The predicted CFR also had high accuracy and high agreement with the observed CFR (Figure 2). Neither nCpm nor nC was associated with CFR when added to model 2 independently, confirming the assumption on which hypothesis 2 relies. None of the additional variables (nTpm, GDP, or duration of the epidemic) was associated with CFR, either univariately or when added to model 2. nTpm and GDP were also not associated with CFR in a previous report. (10)

Repeating the analysis at five retrospective and four prospective time points showed that model 1 robustly explained at least 93% of the variance in nDpm (at least 95% after removing outliers), whereas model 2 had lower accuracy at earlier stages of the pandemic (Figure 3). Fewer countries had more than 1,000 cases at earlier time points (range: 56 on April 4th to 110 on May 27th). nTpm was an unreliable explanator of nDpm that accounted for a very small proportion of the variance and can be omitted; the effect of nTpm is possibly underestimated because of its association with nCpm.

The CFR model was more sensitive to outliers than the nDpm model: including outliers decreased its R² by 5.53% on average (range: 1.4-9.6%, median: 5.85%), compared with an average decrease of 1.59% for the nDpm model (range: 0.5-3.4%, median: 1.2%) (p = 0.0035). The assumption that nC has no effect on CFR was violated at some time points.

Figure 1. B. Bland-Altman plot. The mean of the predicted log-normalised nDpm and the observed log-normalised nDpm is plotted on the x axis, and the difference on a log scale between the observed and predicted nDpm is plotted on the y axis. The mean difference between observed and predicted nDpm was 0 (blue, full line), with the 95% confidence intervals (red, dashed lines) containing most of the values. Five countries were outliers in this model, having lower nDpm than predicted: Russia, Belarus, Singapore, Bangladesh, and Kazakhstan. C. Outlier countries. Actual difference between observed and predicted nDpm, in absolute numbers. The labelled countries in the upper part of the boxplot (>95th percentile) had much higher observed nDpm than predicted, whereas the labelled countries in the lower part had much lower nDpm than predicted by the model.

Figure 2. Agreement between observed and predicted case fatality rate (CFR). Countries are annotated with their country codes. A. Predicted log-normalised CFR (x axis) vs log-normalised observed CFR (y axis). The model predicted CFR almost perfectly, in a linear fashion. The blue line is the model fit and the shading is the 95% CI; B. Bland-Altman plot. The mean of the predicted log-normalised CFR and the observed log-normalised CFR is plotted on the x axis, and the difference on a log scale between the observed and predicted CFR is plotted on the y axis. The mean difference between observed and predicted CFR was 0 (blue, full line), with the 95% confidence intervals (red, dashed lines) containing most of the values. Four countries were outliers in this model, having lower CFR than predicted: Russia, Belarus, Singapore, and Bangladesh.


On average, several countries had more than 25 deaths per million above the expected nDpm (Italy, Belgium, Switzerland, Spain, the Netherlands, and Iran), whereas others had on average more than 5 deaths per million below the expected nDpm (UK, Peru, Brazil, Belarus, Russia, Canada, Chile, and Kuwait). Likewise, Italy, Algeria, Iran, the Netherlands, China, Belgium, Iraq, Indonesia, Spain, Switzerland, and the Philippines had on average a CFR more than 1.5% higher than expected according to the model, whereas Bangladesh, Ukraine, Brazil, Bolivia, Mexico, Belarus, Russia, Peru, and Honduras had on average a CFR more than 1% lower than expected. At the prospective time points in May, three countries consistently had a CDR above 100%, reporting more cases than expected: Singapore (range: 404-854%), Iceland (range: 159-160%), and Qatar (range: 121-210%). Detailed results from the sensitivity analyses are available in the supplementary information.

Most of the global variance in nDpm between countries was explained by the reported prevalence of SARS-CoV-2 infections (nCpm) and by age distribution, expressed as IFRadj. This has to be further adjusted for the CDR, which has an inverse relation with nDpm, but only in the context of using nCpm and IFRadj to explain nDpm. As expected, richer countries were better at testing and detecting cases, but their populations were also older and had a higher infection mortality burden. The CFR also depends on the IFRadj and the CDR, but not on the prevalence or the total number of confirmed SARS-CoV-2 cases. Some countries remain outliers, with consistently higher or lower mortality than expected according to the models. This is possibly due to consistent misreporting (10), differences in how deaths are reported, diagnostic bias, and differences in the sex distribution and average age of individuals who died; countries with higher than expected mortality possibly had more older people and more men infected and dying. (15) The observation that several countries detected a higher number of cases than expected and had a lower observed CFR than IFRadj (see supplementary information) supports the notion that current IFR estimates are too high and that actual mortality is lower than estimated. To confirm these observations, serological surveys of populations will be essential for correctly estimating the true prevalence and mortality of SARS-CoV-2.

These models use very few explanators while maintaining high accuracy in explaining mortality. The sensitivity analyses demonstrated the robustness of the mathematical models when tested on real data. A small proportion of variance remains that the models cannot explain; this may be due to data mishandling or estimation errors, which is a limitation of the study. Despite these limitations, the nDpm model remained robust. The CFR model was more sensitive to outliers than the nDpm model, and CFR might be a less stable outcome for following SARS-CoV-2 mortality burden over time and across countries. The models were somewhat less accurate at earlier stages, which may reflect the smaller number of countries available to build the models at those time points.

Overall, this study demonstrates that most countries are on a similar SARS-CoV-2 mortality trajectory as the number of cases increases, after adjusting for age distribution and CDR. These models should be used for less biased comparisons of mortality between countries. The nDpm model appears to be a more stable indicator of SARS-CoV-2 infection mortality burden and should be favoured when following and comparing mortality within and between countries.

Evidence before this study
- Verity and colleagues (Lancet Inf Dis 2020) estimated the SARS-CoV-2 infection fatality rates (IFR) per age group, and Vollmer & Bommer (2020) estimated that the average case detection rate (CDR) of SARS-CoV-2 infections in 40 countries was below 10% at the end of March.

- No published studies have explained the global between-countries variance in SARS-CoV-2 mortality. A medRxiv preprint by Shagam (2020) reports that approximately 60% of SARS-CoV-2 mortality variance can be explained by gross domestic product per capita in United States dollars (GDP), latitude, hemisphere, press freedom, population density, fraction of citizens over 65 years old, and outbreak duration.

Added value of this study
- The models in this study demonstrate that most of the between-countries variance in SARS-CoV-2 mortality can be explained with two to three explanators while maintaining high accuracy. This can help to alleviate public concerns about potentially varying virulence of the virus, and provide a less biased, standardised comparison of mortality burden between countries.

- In the absence of an effective and safe treatment or vaccine against SARS-CoV-2, most countries will be on a similar SARS-CoV-2 mortality trajectory as the number of cases increases, after adjusting for the age distribution of the population and the case detection rate.

Disease Control, Civil Liberties, and Mass Testing: Calibrating Restrictions during the Covid-19 Pandemic.

If the world fails to protect the economy, COVID-19 will damage health not just now but also in the future.

European Centre for Disease Prevention and Control (ECDC). Geographic distribution of COVID-19 cases worldwide.

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).

Real estimates of mortality following COVID-19 infection. The Lancet Infectious Diseases.

Estimates of the severity of coronavirus disease 2019: a model-based analysis.

Average detection rate of SARS-CoV-2 infections is estimated around nine percent.

Untangling factors associated with country-specific COVID-19 incidence, mortality and case fatality rates during the first quarter of 2020. medRxiv.

Statistical methods for assessing agreement between two methods of clinical measurement.

Guidelines for Accurate and Transparent Health Estimates Reporting: the GATHER statement.

Case-Fatality Rate and Characteristics of Patients Dying in Relation to COVID-19 in Italy.

We express gratitude to Dr. Petter Brodin, Dr. Ioannis Siavelis, and Dr. Emilie Friberg for reading the draft and providing fruitful feedback.

The study was conducted with publicly available data and did not involve individual patients or the public.

The study was performed according to the ethical standards expressed in the Declaration of Helsinki. This study does not require ethical approval.

Contribution: HB designed the study, derived the hypotheses, collected, analysed and interpreted the data, and wrote and edited the manuscript. JL and MP assisted in design and interpretation of the study, supervised the work, reviewed and edited the manuscript.

The corresponding author (HB) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

Dissemination: not applicable.

The authors declare no conflict of interest.

Funding: The authors have not received funding for this work.