key: cord-0969464-ao93c31w authors: Raheem, Ali title: Estimating cases of COVID-19 from Daily Death Data in Italy date: 2020-03-20 journal: nan DOI: 10.1101/2020.03.17.20037697 sha: 08d138f10c916bfd41fb359cc523946a62c94982 doc_id: 969464 cord_uid: ao93c31w

COVID-19 is an emerging infectious disease which has been declared a pandemic by the World Health Organisation. Because testing capacity for this new virus is limited and symptomatology is variable, with the majority of those infected showing non-specific mild symptoms or none, current prevalence data are likely to be an underestimate.

Methods: We present an estimate of the number of cases of COVID-19 in Italy, compared with the number of confirmed cases, based on the daily reported deaths and information about the incubation period, the time from symptom onset to death and the reported case fatality rate.

Results: Our model predicts that on the 31st of January 2020, when the first 3 infected cases had been identified by Italian authorities, there were already nearly 30 cases in Italy, and that by the 24th of February 2020 only 0.5% of cases had been detected and confirmed by Italian authorities. While official statistics showed 132 confirmed cases, we believe a more accurate estimate would be closer to 26000, with a case-doubling period of about 2.5 days.

COVID-19 has now been declared a pandemic by the World Health Organisation. It is caused by the betacoronavirus SARS-CoV-2, which is related to the SARS and MERS viruses. The disease appears to have a mortality rate of approximately 1-15%. However, the reported proportion of cases that are asymptomatic or show only mild non-specific symptoms varies widely, which makes it difficult to estimate the prevalence of COVID-19 without widespread testing, and such testing has not yet been implemented in any country. Previous authors have used this to estimate the mortality of COVID-19 (1). Accurately estimating the prevalence of COVID-19 will allow organisations to make better informed decisions to control COVID-19.
We used a linear retrospective model to estimate past point prevalence from the daily number of reported deaths. This model required us to calculate a nominal time from infection to death. We based this value on data available from the World Health Organisation. The WHO reports a time from onset of symptoms to death of about 2 weeks (4). Additionally, the mean incubation period has been reported to be 6.4 days (2). We therefore estimate that deaths on a given day should correlate with infections 3 weeks prior, and use this with daily reported deaths to estimate the spot prevalence in the past. We obtained data on daily deaths and past cases from the European Centre for Disease Prevention and Control (3), cross-referenced for accuracy with data from the World Health Organisation (4). Data processing was carried out with R, the Jupyter notebook and Tidyverse software suites on Debian 9.0 Stretch using the latest Jupyter/r-notebook docker image (jupyter/r-notebook:15a66513da30) (5) (6). A case mortality of 7% was used, based on recent estimates.

$$\text{Prevalence}_{d-21} = \frac{\text{Deaths}_{d}}{\text{Mortality}}$$

All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The following data were generated using the reported number of deaths per day and new confirmed cases. From these, cumulative cases and cumulative deaths were calculated and used to calculate the point prevalence according to the formula described above. Using data up to the 16th of March 2020 we can estimate the point prevalence 21 days into the past (the 24th of February 2020). This value is subject to the inevitable jitter in deaths per day due to COVID-19; it should therefore be used to guide a trend line before interpretation. Figure 1 summarises the results graphically; the full results can be reviewed in the supplementary materials section in Table 1.
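The back-projection described above can be sketched in a few lines. This is only an illustrative Python sketch, not the author's R code: the function name `back_project`, the constants and the input figures are hypothetical, and it assumes the input is the cumulative death count per reporting date, as the text describes.

```python
from datetime import date, timedelta

LAG_DAYS = 21          # ~6.4 days incubation + ~2 weeks onset-to-death, rounded to 3 weeks
CASE_MORTALITY = 0.07  # case mortality assumed in the text


def back_project(cumulative_deaths, lag_days=LAG_DAYS, mortality=CASE_MORTALITY):
    """Map {reporting date: cumulative deaths} to {date - lag: estimated prevalence}.

    Applies Prevalence(d - lag) = Deaths(d) / Mortality to every entry.
    """
    if not 0 < mortality <= 1:
        raise ValueError("mortality must be in (0, 1]")
    return {
        day - timedelta(days=lag_days): deaths / mortality
        for day, deaths in cumulative_deaths.items()
    }


# Hypothetical input for illustration only; these are NOT the ECDC figures.
example_deaths = {date(2020, 3, 15): 70, date(2020, 3, 16): 140}

estimates = back_project(example_deaths)
for day, cases in sorted(estimates.items()):
    print(day.isoformat(), round(cases))
```

With a 21-day lag, a death count reported on the 16th of March 2020 maps to a prevalence estimate for the 24th of February 2020, matching the end date of the estimates in the text.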
On the 24th of February, when our prediction data ends, there were 132 cases confirmed by Italian authorities, but our model predicts there were nearly 26000 cases in reality. Our model predicts that in this period there was undetected transmission resulting in a rise in cases from 28 to 18000, with a doubling period of about 2.5 days.

The large disparity, with estimated prevalence much higher than confirmed cases, indicates that an increasing majority of cases are not detected. There seems to have been a period of several weeks in which COVID-19 was transmitted in the Italian population undetected. Only a minority of cases appear to be confirmed at any point in time. In this paper we present evidence that the currently confirmed cases of COVID-19 are a dramatic underestimate of the true point prevalence in Italy, and a method to estimate point prevalence from daily deaths due to COVID-19. Increasing the assumed mortality would reduce the estimated prevalence, but this alone could not bring the estimates to the same order of magnitude as the confirmed cases. This methodology would be applicable to many other conditions and relies only on the case mortality and an accurate estimate of deaths due to the condition, which can easily be confirmed post-mortem. Without incubation data or data on disease progression an accurate estimate can still be produced; it will not provide temporal information, but could be used to estimate the time from infection to death. This model used the spot daily reported deaths, which may lag the true date of death because of delays in confirming and then reporting causes of death if COVID-19 was not diagnosed ante mortem.
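Two of the figures quoted above can be checked with simple arithmetic. The Python sketch below is illustrative only and rests on two assumptions that the text does not state explicitly: that the rise from 28 to 18000 cases spans the 31st of January to the 24th of February 2020, and that growth over that period was purely exponential. The value `implied_deaths` is derived from the paper's own estimate (26000 cases at 7% mortality), not from reported data, and all names are hypothetical.

```python
import math
from datetime import date


def doubling_time(cases_start, cases_end, days):
    """Doubling time in days, assuming pure exponential growth between two points."""
    if cases_start <= 0 or cases_end <= cases_start or days <= 0:
        raise ValueError("need 0 < cases_start < cases_end and days > 0")
    return days * math.log(2) / math.log(cases_end / cases_start)


# Assumed period for the rise from 28 to 18000 cases (dates taken from the abstract).
period_days = (date(2020, 2, 24) - date(2020, 1, 31)).days
t_double = doubling_time(28, 18000, period_days)
print(f"Doubling time: {t_double:.1f} days")

# Sensitivity of the prevalence estimate to the assumed case mortality.
ESTIMATE_AT_7_PERCENT = 26000  # model estimate quoted in the text
CONFIRMED = 132                # confirmed cases quoted in the text
implied_deaths = ESTIMATE_AT_7_PERCENT * 0.07  # derived, not reported data

sensitivity = {}
for mortality in (0.01, 0.07, 0.15):  # span of the 1-15% range quoted in the text
    sensitivity[mortality] = implied_deaths / mortality
    ratio = sensitivity[mortality] / CONFIRMED
    print(f"mortality {mortality:.0%}: ~{sensitivity[mortality]:.0f} cases, "
          f"{ratio:.0f}x confirmed")
```

Under these assumptions the doubling time comes out at roughly 2.6 days, in line with the figure of about 2.5 days. Even at the 15% upper end of the quoted mortality range, the estimate stays around 90 times the number of confirmed cases, which illustrates the point that a higher mortality alone cannot close the gap.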
Our estimate of point prevalence varies proportionally to the error in deaths. Deaths due to infection that are not reported will cause an underestimate in prevalence. Estimating true mortality rates is difficult, and our estimate varies with the mortality rate assumed.

I would like to thank the healthcare workers around the globe for their tireless efforts fighting COVID-19.

The Lancet Infectious Diseases
Coronavirus disease 2019 (COVID-19) Situation Report - 56
Euro surveillance : bulletin Europeen sur les maladies transmissibles
[accessed 17-March-2020]

The authors declare no competing interests.