key: cord-341827-z9r5i0ky
authors: Macias-Ordonez, R.; Villasenor-Amador, D.
title: The misleading illusion of COVID-19 confirmed case data: alternative estimates and a monitoring tool
date: 2020-05-25
DOI: 10.1101/2020.05.20.20107516
doc_id: 341827
cord_uid: z9r5i0ky

Confirmed Case Data have been widely cited during the current COVID-19 pandemic as an estimate of the spread of the SARS-CoV-2 virus. However, their central role in media, official reports and decision-making may be undeserved and misleading. Previously published Infection Fatality Rates were weighted by age structure in the 50 countries with the most reported deaths to obtain country-specific rates. For each country, the number of infections up to the Infection Date (23 days ago = Incubation Period + Onset to Death period) and the present percentage of immune population were estimated using the Infection Fatality Rate and the number of reported deaths (which is less prone to undersampling), projecting back to the Infection Date. We then estimated a Detection Index for each country as the percentage of estimated infections that confirmed cases represent. Assuming that detection remains constant after the Infection Date, we estimated the number of deaths and the percentage of the population of each country expected to be immune up to 23 days into the future. Estimated Infection Fatality Rates are higher in Europe. In most countries, confirmed cases currently represent less than 30% of estimated infections on the Infection Date, and this value decreases with time. Countries with flat curves throughout the pandemic show the lowest immunity percentages, and these values seem unlikely to change in the near future, suggesting that they remain vulnerable to new outbreaks. Estimates for some countries with low Infection Fatality Rates suggest a still-steep increase in the number of casualties in the next three weeks.
Countries that did not control initial outbreaks seem to have reached higher immunity percentages, although mostly still under 5%. We provide the code to monitor the trajectories of these estimates in 178 countries throughout the COVID-19 pandemic.

This preprint (which was not certified by peer review) is made available under a CC-BY-NC 4.0 International license. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
This version posted May 25, 2020. medRxiv preprint doi: https://doi.org/10.1101/2020.05.20.20107516

Introduction

COVID-19 confirmed case data (CCD) are the central piece of information in most news, official reports, conversations and forecasting efforts, and are probably also central to most decisions made by authorities worldwide since the pandemic outbreak in December 2019. However, it is widely known that they represent a small and unknown fraction of the actual number of SARS-CoV-2 infections [1-4]; we just do not know how small. Especially hard to assess is the number of asymptomatic but contagious infections [5-8], as asymptomatic carriers are unlikely to seek testing.

Elementary sampling theory and recent supported opinions [3, 6, 9] suggest that CCD are highly dependent on the testing effort and sampling protocol, among other factors. Unless a randomized and standardized sampling protocol of the whole population is carried out, there is no a priori reason to assume they are representative of the magnitude or even the speed of the spread of this or any other virus. Furthermore, unless similar testing efforts are made in different countries, the data are not comparable, and pooling them may provide an even worse picture of the spread of the virus worldwide. Data on COVID-19 related deaths are reported nearly as often, but receive less attention. These data, however, are less prone to sampling error [2, 3], unless a large number of COVID-19 related deaths go undetected, misdiagnosed, or [...]
[...] provide, but see Vogel [4], and may even represent a monitoring alternative for countries with little or no access to antibody testing.

Since IFR is age dependent, its global value depends on the age structure of the infected population. COVID-19 is known to be more lethal in older age classes [12, 15, 16, 17]; thus we can expect that the more skewed the age structure of a given population is toward such classes, the higher its IFR will be. In order to apply this value to other populations, the IFR of each age class must be weighted by the relative proportion of [...]

[...] modified in the form of function parameters as explained in S1. It has been recently suggested [20] that sharing fully detailed code and functionality of modeling and monitoring tools is more important than ever to face the current COVID-19 pandemic. This section describes what the script does and the parameters used as default, beginning with the procedure used to obtain the data.

The database used for analyses is obtained from the European Centre for Disease Prevention and Control (ECDC) web page [21]. The script includes the lines of code needed to import the daily updated database of confirmed cases and deaths, which the ECDC curates from over 500 sources [22].
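The age weighting described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R script (S1). The function name `age_weighted_ifr`, the three age classes and all numeric values are hypothetical placeholders, not the published age-specific rates or real population data.

```python
def age_weighted_ifr(ifr_by_age, population_by_age):
    """Population-level IFR: each age-class IFR weighted by that class's
    share of the total population (both dicts keyed by age class)."""
    if ifr_by_age.keys() != population_by_age.keys():
        raise ValueError("age classes must match in both inputs")
    total = sum(population_by_age.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(
        ifr_by_age[age] * population_by_age[age] / total
        for age in ifr_by_age
    )


# Hypothetical age-specific IFRs (as proportions), for illustration only.
ifr_by_age = {"0-39": 0.0005, "40-69": 0.005, "70+": 0.05}

# Two hypothetical populations (in millions) with different age structures.
younger_population = {"0-39": 70, "40-69": 25, "70+": 5}
older_population = {"0-39": 45, "40-69": 40, "70+": 15}

ifr_younger = age_weighted_ifr(ifr_by_age, younger_population)
ifr_older = age_weighted_ifr(ifr_by_age, older_population)
print(f"younger population: {100 * ifr_younger:.2f}%")
print(f"older population:   {100 * ifr_older:.2f}%")
```

With identical age-specific rates, the population with a larger share in the oldest class obtains the higher overall IFR, which is the effect the text attributes to age structure.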
[...] period. This value will be referred to as %DI on Idt. If a country is currently reporting no deaths, this %DI value is evaluated for the period ending with the last daily %DI value [...]

Estimated IFR values show a bimodal distribution with one mode around 0.5% and the other slightly above 1% (Fig 1). Out of the 50 countries with the most reported deaths, estimated IFR values are higher than 1% in all European countries (n=23) with the exception of Russia (0.92%), Ireland (0.84%), Ukraine (0.98%) and Moldova (0.72%), and lower than 1% in all non-European countries (n=27) with the exception of Canada (1.05%) and Japan (1.60%) (Table 1). This is because European countries have age structures skewed toward older age classes. This is likely one of the reasons why [...]
IFR may also depend on country-specific factors such as healthcare system, socio-[...]

Our results on %DI on Idt show great variation among countries, and suggest that CCD represent less than a third of estimated infections on Idt in all but 5 countries, [...]

[...] values consistently far outside the distribution of the rest of the countries (Fig 1). Other [...]

[...] 68-82% more deaths in the next 23 days. They would also experience an equivalent increase in population immunity.

It has been suggested that the accumulation of herd immunity in the population slows epidemic resurgence [37]. By contrast, when virtually no population immunity is built, as in the case of China, India, Japan, Bangladesh or South Korea (percentage of immune population below 0.1% in Table 1), a resurgence peak of COVID-19 may be nearly the same size as an uncontrolled epidemic if control policies fail [37].
The numbers for Japan (Fig 5), for instance, suggest a high vulnerability, since it shows the highest estimated IFR (1.60%) of all countries, a low %DI on Idt (6%), the lowest percentage of immune population (0.04%), and the highest estimated increase (111%) in infections from Idt to the present, and thus a similar increase in deaths in the following 23 days. We still do not know how long immunity to SARS-CoV-2 could last [4], but recent modeling suggests anywhere between 40 weeks and 5 years [37]. Nevertheless, immunity is unlikely to end abruptly, so the rate of any further re-infection with new strains should be smoother in the future as long as herd immunity has had some build-up [4, 16]. However, even Belgium, with the highest estimated percentage of immune population (6.94%), seems far from values approaching herd immunity.

Although CCD have been used to estimate the number of infections [38], we believe that this will usually provide poor estimates, as suggested by the short- and long-term variation in daily %DI values. In any case, we suggest that having a low %DI value [...]

[...] antibodies, we suggest that estimates based on reported deaths and IFR are a more reliable alternative to estimate the spread of SARS-CoV-2 than CCD in any country for which age structure data are available and data on reported deaths are trustworthy. They also illustrate the potential bias when assuming that CCD reflect the actual spread of COVID-19.
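The death-based estimates discussed above can be sketched as follows. This is a simplified Python illustration, not the authors' R script (S1): it uses cumulative totals, whereas the text indicates that the script evaluates daily %DI values over a period. The names `death_based_estimates` and `DeathBasedEstimates` are hypothetical, the input series are invented, and treating everyone infected by Idt who survived as immune today is an assumption of this sketch.

```python
from dataclasses import dataclass

LAG_DAYS = 23  # incubation period + onset-to-death period, as stated in the text


@dataclass
class DeathBasedEstimates:
    infections_at_idt: float    # estimated cumulative infections on Idt
    detection_index_pct: float  # confirmed cases on Idt as % of those infections
    immune_pct_now: float       # % of population assumed immune today
    infections_now: float       # infections today if detection stayed constant
    expected_cum_deaths: float  # cumulative deaths expected LAG_DAYS from now


def death_based_estimates(cum_cases, cum_deaths, population, ifr,
                          lag_days=LAG_DAYS):
    """cum_cases and cum_deaths are daily cumulative series of equal length
    whose last element is today; ifr is a proportion (0.01 means 1%)."""
    if len(cum_cases) != len(cum_deaths):
        raise ValueError("series must have the same length")
    if len(cum_cases) <= lag_days:
        raise ValueError("need more than lag_days days of data")
    if not 0 < ifr < 1 or population <= 0:
        raise ValueError("ifr must be in (0, 1) and population positive")
    deaths_today = cum_deaths[-1]
    cases_at_idt = cum_cases[-1 - lag_days]
    if deaths_today <= 0 or cases_at_idt <= 0:
        raise ValueError("need at least one death today and one case on Idt")

    # Deaths reported today reflect infections that occurred up to Idt.
    infections_at_idt = deaths_today / ifr
    # Share of those infections that had been confirmed by Idt.
    detection_index_pct = 100.0 * cases_at_idt / infections_at_idt
    # Assumption of this sketch: survivors infected by Idt are immune today.
    immune_pct_now = 100.0 * (infections_at_idt - deaths_today) / population
    # Constant detection after Idt scales today's confirmed cases up.
    infections_now = cum_cases[-1] / (detection_index_pct / 100.0)
    expected_cum_deaths = infections_now * ifr
    return DeathBasedEstimates(infections_at_idt, detection_index_pct,
                               immune_pct_now, infections_now,
                               expected_cum_deaths)


# Invented 24-day series: index 0 is Idt, index 23 is today.
cum_cases = [1000 + 50 * day for day in range(24)]  # 1000 on Idt, 2150 today
cum_deaths = [4 + 2 * day for day in range(24)]     # 50 today
est = death_based_estimates(cum_cases, cum_deaths,
                            population=1_000_000, ifr=0.01)
print(est)
```

In this invented example, 50 deaths at an IFR of 1% imply about 5,000 infections by Idt, of which the 1,000 confirmed cases represent 20%. Holding that detection level constant, today's 2,150 confirmed cases would correspond to about 10,750 infections and roughly 107 cumulative deaths 23 days from now.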
Logistic approximations used to describe new outbreaks in the 2020 COVID-19 pandemic
The COVID-19 infection in Italy: a statistical study of an abnormally severe disease
As COVID-19 cases, deaths and fatality rates surge in Italy, underlying causes require investigation
First antibody surveys draw fire for quality, bias
Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan
The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China
Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand
Demographic science aids in understanding the spread and fatality rates of
Vienna: R Foundation for Statistical Computing
United Nations, Department of Economic and Social Affairs
European Centre for Disease Prevention and Control. An Agency of the European Union
ECDC collects and processes COVID-19 data
European Centre for Disease Prevention and Control. An agency of the European Union
COVID-19 deaths and cases: how do sources compare?
The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application
Epidemiological data from the COVID-19 outbreak, real-time case information
The Editorial Board. The Global Coronavirus Crisis Is Poised to Get Much, Much Worse. The New York Times
Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal
Variación en casos por reclasificación - Covid19
Virological assessment of hospitalized patients with COVID-2019
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
Estudio Ene-Covid19: Primera Ronda. Estudio Nacional de Sero-Epidemiología de la Infección por SARS-CoV-2 en España. Informe Preliminar
Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period
Serial interval of COVID-19 among publicly reported confirmed cases. Emerg Infect Dis

We are grateful to Isabel Noriega, Yareni Perroni, Matt Draud and Warren Greiff for proofreading and useful comments on early versions of the manuscript.

[...] reported deaths. This spreadsheet is generated by the function cv19.tab.num() in the R script (S1) and includes 10 additional columns apart from those shown in Table 1.