key: cord-0882827-yiaan65d
authors: Garcia-Agundez, A.; Ojo, O.; Hernandez, H.; Baquero, C.; Frey, D.; Georgiou, C.; Goessens, M.; Lillo, R.; Menezes, R.; Nicolaou, N.; Ortega, A.; Stavrakis, E.; Fernandez Anta, A.
title: Estimating the COVID-19 Prevalence in Spain with Indirect Reporting via Open Surveys
date: 2021-02-01
journal: nan
DOI: 10.1101/2021.01.29.20248125
sha: 2b02fa7af1a6127d60d4b6322e40275a6938c585
doc_id: 882827
cord_uid: yiaan65d

During the initial phases of the COVID-19 pandemic, accurate tracking has proven unfeasible. Initial estimation methods pointed towards case numbers that were much higher than officially reported. In the CoronaSurveys project, we have been addressing this issue using open online surveys with indirect reporting. We compare our estimates with the results of a serology study for Spain, obtaining high correlations (R squared 0.89). In our view, these results strongly support the idea of using open surveys with indirect reporting as a method to broadly sense the progress of a pandemic.

During the initial phases of the COVID-19 pandemic, progress tracking via massive serology testing has proven to be unfeasible. However, initial estimation methods suggested that the real numbers of COVID-19 cases were significantly higher than those officially reported (1). For instance, by April 30th, 2020, the number of confirmed fatalities due to COVID-19 in the US was 66,028, and the number of confirmed cases was 1,080,303. Dividing that number of fatalities by the case-fatality ratio (CFR) of 1.38% measured in Wuhan (2), however, implies that the number of cases must have been no less than 4,784,637.

In the case of Spain, the discrepancy seems to be even higher. Preliminary studies point towards only one in 53 cases being reported during the first days of the pandemic (3). Although recent availability of massive [...] contrast with the officially reported number of deaths due to COVID-19, which stands at 50,837 (5).
This discrepancy is corroborated in publications from official government authorities, which indicate an ongoing estimated underreporting of 20% to 40% (6).

In the CoronaSurveys project (7), we aim to track the progress of the pandemic using online, open, anonymous surveys with indirect reporting. Recent articles have also suggested the use of surveys to monitor the pandemic, both for Spain (8, 9) and globally (10). However, to our knowledge, all surveys conducted in Spain have employed direct reporting only, asking participants about themselves. CoronaSurveys instead implements the network scale-up method of indirect reporting, allowing us to collect data on a wide fraction of the population with a small number of responses and in a very short time frame (11). In this article, we compare the accuracy of CoronaSurveys with a gold standard: serology testing data collected by the Spanish government in the ENE-COVID study (12).

The survey deployed in the CoronaSurveys project, which can be answered via browser or mobile app, includes two questions: [...] can be back-traced to its user. All the data is published in a public GitHub repository. The study design was reviewed and approved by the ethics committee of the IMDEA Networks Institute. The survey includes an informed consent.

Once the data is collected, we remove outlier responses. A response is considered an outlier if (1) r_i is outside 1.5 times the interquartile range above the upper quartile (which for the data in this paper means r_i > 175) or if (2) c_i / r_i is greater than 1/3 (to exclude participants with an exceptionally high contact with cases). For this paper we only consider responses in which participants provide information for their region.
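
As a minimal sketch of the two outlier criteria described above, the following Python function filters a list of responses, each given as a pair (r_i, c_i). The function name, the choice of the "inclusive" quartile convention, and the guard that skips responses with r_i <= 0 are assumptions made for illustration; they are not taken from the CoronaSurveys code base, and a different quartile convention would shift the upper fence slightly.

```python
from statistics import quantiles


def remove_outliers(responses):
    """Return the responses that pass both outlier criteria.

    `responses` is a list of (r_i, c_i) pairs. The name and the
    input layout are hypothetical, chosen only for this sketch.
    """
    reach = [r for r, _ in responses]
    # Quartiles of r_i; the "inclusive" method is an assumption,
    # the text does not say which quartile convention was used.
    q1, _, q3 = quantiles(reach, n=4, method="inclusive")
    upper_fence = q3 + 1.5 * (q3 - q1)

    kept = []
    for r, c in responses:
        if r <= 0:
            # Assumption: a non-positive r_i cannot form the ratio
            # c_i / r_i, so the response is skipped.
            continue
        if r > upper_fence:
            # Criterion (1): r_i above the upper quartile by more
            # than 1.5 times the interquartile range.
            continue
        if c / r > 1 / 3:
            # Criterion (2): ratio c_i / r_i greater than 1/3.
            continue
        kept.append((r, c))
    return kept
```

For the data in this paper the upper fence computed in criterion (1) corresponds to r_i > 175; on other data sets the fence depends on the observed distribution of r_i.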
The data is therefore aggregated by region for all participants, to obtain the estimator of COVID-19 [...]

To evaluate the accuracy of this method to sense the cumulative number of cases of COVID-19, we compare [...] provides data for n = 61,075 participants (0.1787% ± 0.0984% of the regional population, and 0.1299% of the national population). We consider as positive cases those that tested positive in the point-of-care or immunoassay IgG tests (Supplementary Table 6 [...]

The Bland-Altman plot in Figure 1A shows a high correlation between the CoronaSurveys estimates and the [...]

Our study presents a number of limitations. Firstly, as presented in Table 1, our number of responses in some regions was limited (e.g., 9 responses in La Rioja or 16 in Navarra and Cantabria). Our own analysis suggests this is not enough to offer reliable data for these three regions. Additionally, our criteria to eliminate outliers are heuristic, and may change in the future as we collect more data.

medRxiv preprint doi: https://doi.org/10.1101/2021.01.29.20248125; this version posted February 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. This is a provisional file, not the final typeset article.

How much is coronavirus spreading under the radar