key: cord-0662203-uq0rvxc1 authors: Bonifazi, Gianluca; Lista, Luca; Menasce, Dario; Mezzetto, Mauro; Pedrini, Daniele; Spighi, Roberto; Zoccoli, Antonio title: A study on the possible merits of using symptomatic cases to trace the development of the COVID-19 pandemic date: 2021-01-05 journal: nan DOI: nan sha: 7296175f397f3ce33e1623f9d496404297bb23cc doc_id: 662203 cord_uid: uq0rvxc1

In a recent work [1] we introduced a novel method to compute $R_t$ and applied it to describe the development of the COVID-19 outbreak in Italy. That study is based on the number of daily positive swabs as reported by the Italian Dipartimento di Protezione Civile. Recently, the Italian Istituto Superiore di Sanità made available the data on symptomatic cases, where the reporting date is the date of symptom onset instead of the date on which the positive swab was reported. In this paper we discuss the merits and drawbacks of these data, quantitatively comparing the quality of the pandemic indicators computed with the two samples.

The worldwide data about the development of the COVID-19 outbreak are always reported as the daily number of positive swabs. This quantity suffers from several problems, since it can be biased by different strategies and response times of swab collection in different regions and periods. It is affected by strong weekend effects in the recording of the data, due to the reduced capacity for processing swabs on Saturdays and Sundays. Furthermore, the reporting of a positive swab introduces a further delay with respect to the dates of contagion and of symptom onset (if any). Potentially, the reporting of symptomatic cases, together with the date of symptom onset, could attenuate most of these issues.
In principle, symptomatic cases should suffer less from differences in swab-collection strategies, being the most urgent cases to be treated, and the date of symptom onset should be less influenced by weekend effects and should not be affected by the additional delays introduced by the processing and reporting of a molecular swab. On the other hand, the sample of symptomatic cases is a subset of the total number of cases, and the size of the sample becomes an issue for relatively small populations, such as Italian regions or provinces. Furthermore, a bias could be introduced if the true fraction of asymptomatic cases changes during the pandemic because of a change in the age distribution of infected people.

Since December 6th, 2020, the numbers of symptomatic cases, associated with the date of symptom onset, have been made available in Italy in the daily bulletin of the Istituto Superiore di Sanità (ISS) [2] (see footnote 1). The published data contain the full history of the symptomatic cases at the national level, while for regions and provinces only the daily data are reported. Data about positive swabs have instead been published, since the beginning of the outbreak, by the Italian Dipartimento di Protezione Civile (DPC) [3]. It thus becomes possible to compare the information that can be extracted from the full sample of positive swabs with that from the sub-sample of symptomatic cases.

a) Corresponding author, e-mail: mauro.mezzetto@pd.infn.it.

1) It should be noted that molecular swab testing started on February 24, 2020, and the symptomatic cases reported before this date refer to those positive swabs. For this reason the symptomatic cases reported from January 28 to February 24 are an incomplete sample and we do not consider them in the following.

In the following, we work out several indicators to compare the merits and the differences of the two samples. We show in Fig. 1 the daily data of the symptomatic cases and positive swab samples.
We fit the data with the sum of three Gompertz functions, $g(t; a, b, c)$, in order to describe the first phase of the outbreak in March-April, the increase of August, and the third phase in October-December, as shown in the same Fig. 1. The first and second peaks are located at days 49.5 and 279.8, respectively, for the symptomatic cases and at days 61.0 and 288.7, respectively, for the positive swabs sample; days are counted from January 28th, 2020. The fit errors on the peak positions, considering only the diagonal terms of the covariance matrix, are of the order of 0.1 days. We conclude that the positive swab sample is delayed by about 9 days (11.5 days at the first peak and 8.9 days at the second), which can be considered the average delay between the appearance of symptoms and the reporting of a positive swab. This number takes into account that asymptomatic cases are mostly detected through tracing that starts from symptomatic cases (and are therefore delayed) and not through a screening of the population. The fact that the delay at the first peak was 2.6 days longer than the delay at the second peak could be explained by improvements in swab-processing procedures during the outbreak.

We can quantify the level of fluctuations present in the two samples by computing the residuals with respect to the fitted curves. We define the residual as the difference between the fitted value and the data point, divided by the Poisson error of the data point. If the Poisson error were the only source of error, the distribution of residuals would have a standard deviation of 1. Limiting the analysis to the first peak, where the sizes of the two samples are similar, we obtain a standard deviation of the residual distribution of 8.2 for the symptomatic cases and of 9.3 for the positive swabs. These high values can be partly attributed to an imperfect parameterization of the data or to an underestimation of the quoted errors, which do not take into account the systematic contribution.
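The residual definition above can be illustrated with a minimal Python sketch. It assumes that the Poisson error of a daily count is the square root of that count and that the spread is the population standard deviation; neither choice is stated in the text. The function names `normalized_residuals` and `residual_spread` and the toy numbers are purely illustrative and are not the data of Fig. 1.

```python
import math
import statistics


def normalized_residuals(data, fit):
    """Return (fit - data) / sqrt(data) for each day with a non-zero count.

    sqrt(data) is used as the Poisson error of the observed count
    (an assumption of this sketch).
    """
    return [(f - d) / math.sqrt(d) for d, f in zip(data, fit) if d > 0]


def residual_spread(data, fit):
    """Standard deviation of the normalized residuals.

    A value close to 1 would mean that Poisson fluctuations alone
    explain the scatter of the data around the fitted curve.
    """
    return statistics.pstdev(normalized_residuals(data, fit))


# Toy daily counts and fitted values (illustrative only, not real data).
toy_data = [100, 121, 144]
toy_fit = [110, 110, 156]

print(normalized_residuals(toy_data, toy_fit))
print(residual_spread(toy_data, toy_fit))
```

With real daily counts and fitted values in place of the toy lists, a spread much larger than 1, such as the values of 8.2 and 9.3 quoted above, signals fluctuations well beyond the Poisson expectation.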
We can nevertheless conclude that the sample of symptomatic cases does not significantly reduce the dispersion of the data around the central values with respect to the positive swab sample.

Another quantity that could be influenced by additional fluctuations present in the positive swabs sample is the width of the peaks. If, for instance, the delay between the date of appearance of symptoms and the date of reporting of a positive swab followed a broad distribution, this could affect the width of the fitted peaks. We compute the full width at half maximum (FWHM) of the peaks as fitted with the three Gompertz curves. We find a FWHM of 35 and 41 days at the first peak and of 41 and 45 days at the second peak for the symptomatic and positive swab samples, respectively. We consider these values an indication of a significant contribution of the dispersion of swab reporting times to the distribution of positive cases. If we attribute the increase in FWHM at the second peak entirely to this effect, the reporting times should have a standard deviation of about 8 days.

A side result of these comparisons is the evolution of the fraction of asymptomatic cases in the positive swabs sample during the outbreak. If we shift the positive swabs distribution earlier by 9 days, according to the above discussion, we can compute day by day the difference between the two data sets and, from this, the fraction of asymptomatic cases in the positive swabs sample. The result is displayed in Fig. 2. We observe that, at the beginning of the pandemic, the fraction of asymptomatic cases dropped to almost zero around the peak. It then grew to about 0.6 at the end of the first peak and remained stable until the second peak was reached, when the total number of swabs was probably insufficient to guarantee proper tracing.
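The two numerical steps of this section, the reporting-time spread derived from the peak widths and the day-by-day asymptomatic fraction, can be sketched in Python as follows. The text does not state how the value of about 8 days was obtained; the first function shows one way to arrive at it, assuming roughly Gaussian peaks (for which FWHM = 2 sqrt(2 ln 2) σ) and an independent reporting delay whose variance adds to that of the symptomatic peak. The second function assumes that both series are lists of daily counts starting on the same calendar day. All function names and the toy series are illustrative.

```python
import math

# FWHM = 2 * sqrt(2 ln 2) * sigma holds for a Gaussian peak. This is an
# assumption of the sketch: the fitted Gompertz peaks are not exactly Gaussian.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def extra_sigma(fwhm_narrow, fwhm_broad):
    """Standard deviation of the smearing that broadens one peak into the
    other, assuming independent contributions that add in quadrature."""
    return math.sqrt(fwhm_broad**2 - fwhm_narrow**2) * FWHM_TO_SIGMA


def asymptomatic_fraction(symptomatic, swabs, delay=9):
    """Fraction of positive swabs not matched by a symptomatic case,
    day by day, after moving the swab series earlier by `delay` days."""
    fractions = []
    for n_sympt, n_swab in zip(symptomatic, swabs[delay:]):
        # Days without swabs are left undefined instead of dividing by zero.
        fractions.append((n_swab - n_sympt) / n_swab if n_swab > 0 else None)
    return fractions


# FWHM values quoted in the text (symptomatic, positive swabs), in days.
print(round(extra_sigma(41, 45), 1))  # second peak: 7.9
print(round(extra_sigma(35, 41), 1))  # first peak: 9.1

# Toy series (not real data): swabs lag symptom onset by 9 days.
toy_symptomatic = [40, 50]
toy_swabs = [0] * 9 + [100, 100]
print(asymptomatic_fraction(toy_symptomatic, toy_swabs))  # [0.6, 0.5]
```

Under these assumptions the second peak gives a spread of about 7.9 days, in line with the value of about 8 days quoted above, while the first peak would give about 9.1 days.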
We demonstrated in [1] that the growth rate $\lambda = 1/t_2$, where $t_2$ is the doubling time obtained from an exponential fit to the data of the last $n$ days, is as good an indicator as $R_t$ for describing the behaviour of the outbreak. We show the evolution of $\lambda$ for the symptomatic and positive swabs samples in Fig. 3: $\lambda$ is computed via an exponential fit to the last 14 days, and we display its 14-day moving average. The two curves show almost identical behaviour, apart from the characteristic delay of the positive swabs curve. We display the same result in terms of the more familiar $R_t$ in Fig. 4, computed using the algorithm published in [1]. Again, the two curves are very similar, and in particular there is no evidence of underestimation of the value of $R_t$ by either curve.

For the sake of completeness we repeat the same comparison with four of the most common algorithms used in the literature to evaluate $R_t$: Wallinga and Teunis [4] and Cori et al. [5], both computed with the public package EpiEstim [6]; Bettencourt-Ribeiro [7], computed following the indications of [8]; and the Robert Koch Institute (RKI) method [9]. The plots are reported in Fig. 5 and show the same behaviour as the plot of Fig. 4.

It should be noted that the fact that the symptomatic curve is about 9 days ahead does not mean that by using these data one can identify the trends of the outbreak in advance. The dates of symptom onset are in fact reported after the positive swabs, and ISS suggests a minimum delay of 14 days to collect all the dates of symptom onset. This is consistent with our estimate of a delay of 9 days with a standard deviation of 8 days. Considering this effect, the symptomatic cases sample is even slower, as a real-time estimator, than the positive swabs sample, since it needs 14 days for data collection while being only 9 days ahead.
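The growth rate $\lambda = 1/t_2$ used for Fig. 3 can be estimated, for example, as in the following minimal Python sketch. It is not the fitting procedure of [1]: it assumes an unweighted least-squares straight-line fit to the logarithm of the daily counts over the last 14 days, which is one simple way to perform an exponential fit. The function name `growth_rate` and the synthetic series are illustrative.

```python
import math


def growth_rate(counts, window=14):
    """Estimate lambda = 1 / t_2 from the last `window` daily counts.

    An exponential N(t) = N0 * 2**(t / t_2) is a straight line in
    log-space with slope ln(2) / t_2, so lambda = slope / ln(2).
    The unweighted log-linear fit is an assumption of this sketch.
    """
    recent = counts[-window:]
    if len(recent) < 2 or any(c <= 0 for c in recent):
        raise ValueError("need at least two strictly positive daily counts")

    days = range(len(recent))
    logs = [math.log(c) for c in recent]
    day_mean = sum(days) / len(recent)
    log_mean = sum(logs) / len(logs)

    # Ordinary least-squares slope of log(counts) versus day.
    sxx = sum((d - day_mean) ** 2 for d in days)
    sxy = sum((d - day_mean) * (y - log_mean) for d, y in zip(days, logs))
    slope = sxy / sxx

    return slope / math.log(2)


# Synthetic series that doubles every 7 days (illustrative, not real data).
synthetic = [100 * 2 ** (day / 7) for day in range(30)]
print(growth_rate(synthetic))  # close to 1/7
```

A positive value corresponds to a growing outbreak and a negative value to a receding one, with $t_2$ then read as a halving time; the 14-day moving average described above would be applied to the series of daily estimates.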
We have compared the information that can be extracted about the development of the COVID-19 outbreak in Italy by using the daily new cases reported for infected people with symptoms and for the total sample of positive swabs. The symptomatic sample is a valuable control sample because in some respects it suffers from fewer systematic effects than the positive swabs sample. We observe a modest reduction of the dispersion of the data with the sample of symptomatic cases and a better definition of the peaks in the distribution of the daily positive cases. The differences between the two curves can be explained by a delay, between the appearance of symptoms and the date of the reported positive swab, of about 9 days with a standard deviation of about 8 days. With this correction, the two samples are comparable and the extracted $R_t$ is almost identical. Real-time evaluations of $R_t$ are faster and more robust with the sample of positive swabs. We conclude that the sample of positive swabs can be safely used to monitor the development of the COVID-19 outbreak.

We publish daily estimates of $R_t$ in real time, together with much other information about the development of the Italian outbreak, in [10]. Daily values for the major world countries are also reported.

A simplified estimate of the Effective Reproduction Number Rt using its relation with the doubling time and application to Italian COVID-19 data
Dati COVID-19 Italia
Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures
A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics
EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves
Real Time Bayesian Estimation of the Epidemic Potential of Emerging Infectious Diseases
The Metric We Need to Manage COVID-19. Rt: the effective reproduction number

The present work has been done in the context of the INFN CovidStat project, which produces an analysis of the public Italian COVID-19 data. The results of the analysis are published and updated daily on the website covid19.infn.it/. The project has been supported in various ways by a number of people from different INFN Units. In particular, we wish to thank, in alphabetical order: Stefano Antonelli (CNAF), Fabio Bredo (Padova Unit), Luca Carbone (Milano-Bicocca Unit), Francesca Cuicchio (Communication Office), Mauro Dinardo (Milano-Bicocca Unit), Paolo Dini (Milano-Bicocca Unit), Rosario Esposito (Naples Unit), Stefano Longo (CNAF), and Stefano Zani (CNAF). We also wish to thank Prof. Domenico Ursino (Università Politecnica delle Marche) for his supportive contribution.