Removing weekly administrative noise in the daily count of COVID-19 new cases. Application to the computation of Rt
Alvarez, L.; Colom, M.; Morel, J.-M.
Posted 2020-11-18 (medRxiv preprint). DOI: 10.1101/2020.11.16.20232405

The way each country counts and reports the incident cases of SARS-CoV-2 infections is strongly affected by the "weekend effect". During the weekend, fewer tests are carried out and there is a delay in the registration of cases. This introduces an "administrative noise" that can strongly disturb the calculation of trend estimators such as the effective reproduction number Rt. In this work we propose a procedure to correct the incidence curve and obtain a better fit between the number of infected and the number expected from the renewal equation. The classic way to deal with the administrative noise is to invoke its weekly period and therefore to filter the incidence curve with a seven-day sliding mean. Yet this has three drawbacks. The first is a loss of resolution. The second is that a 7-day mean filter hinders the estimate of the effective reproduction number Rt in the last three days before the present. The third is that a mean filter implicitly assumes the administrative noise to be additive and time invariant. The present study supports the idea that the administrative noise is better treated as being both periodic and multiplicative. The simple method derived from these assumptions amounts to multiplying the number of infected by a correcting factor that depends on the day of the week. This correcting factor is estimated from the incidence curve itself. The validity of the method is demonstrated by its positive impact on the accuracy of the estimates of Rt. To illustrate the advantages of the multiplicative periodic correction, we apply it to Sweden, Germany, France and Spain.
We observe that the estimated administrative noise is country dependent, and that the proposed strategy reduces it considerably. An implementation of this technique is available at www.ipol.im/ern, where it can be tested on the daily incidence curves of an extensive list of states and geographic areas provided by the European Centre for Disease Prevention and Control.

The effective reproduction number R(t) is one of the most important epidemiological characteristics of the COVID-19 pandemic. It is constantly invoked by politicians and their scientific advisers to steer social distancing measures. R(t) represents the expected number of secondary cases produced by a primary case at each time t. It can be computed from the incidence curve i(t) and the serial interval Φ, which is the empirical probability distribution of the time between the onset of symptoms in a primary case and the onset of symptoms in its secondary cases. There are different strategies to estimate R(t) from i(t) and Φ. EpiEstim, the method proposed in [4], is widely used; in the online interface available at www.ipol.im/ern, we compare our estimate of R(t) with that of EpiEstim. In [11] a technique to compute R(t) separating local transmission and imported cases is proposed. In [6] the authors make a detailed comparison of the methods proposed in [4], [2] and [13]. A systematic review and analysis of the serial interval is presented in [12]. In [10], [5] and [7], different statistical distributions of the serial interval Φ for SARS-CoV-2 are proposed.

The so-called renewal equation (see [9]) is a key epidemiological model linking the daily count of newly detected infections, i(t), with the reproductive power A(t, s), that is, the power at time t with which an infected individual of infection age s generates secondary cases. A(t, s) depends on R(t) and Φ(s), and the renewal equation (1) expresses i(t) as the sum, over the infection ages s, of the secondary cases A(t, s) i(t − s) generated by the cases counted up to the current time $t_c$ (the last time at which i(t) is available).
For instance, in [4] and [3] a particular formulation of the renewal equation, (2), is used to compute R(t) from i(t) and Φ(s). This model is a simplification of the Nishiura equation (3) [8]. In the original formulation of this model it was assumed that Φ(s) = 0 for s ≤ 0. However, in the case of SARS-CoV-2 a patient can show symptoms before the patient who infected him or her does. This means that Φ(s) can actually be positive for s < 0. In [1], the above model is used without any restriction on the support of Φ(s). Since measurements are generally made daily, a discrete formulation (4) of model (3) is natural, in which the summation over the serial-interval lag starts at an index $f_0$ that is, in general, negative.

Either of the above expressions of the renewal equation can be used in the technique we propose. In the experiments presented, we use the model $F_2(i, R, \Phi, t)$ and the method developed in [1] to estimate $R_t$ from $i_t$ and $\Phi_s$. This method is based on the minimization of an energy $E$ in which $p_{90}(i)$, the 90th percentile of $\{i_t\}_{t=0,\dots,t_c}$, is used to normalize the energy with respect to the size of $i_t$. The first term of $E$ is a data-fidelity term which forces the renewal equation (4) to be satisfied as closely as possible. The second term forces $R_t$ to be a smooth curve; $w_t \ge 0$ is the weight of the regularization at each time $t$, and the higher $w_t$, the smoother $R_t$. The last term of $E$ forces $R_{t_m}$ to be close to a prescribed initial estimate for some particular times $t_m$.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version was posted November 18, 2020; https://doi.org/10.1101/2020.11.16.20232405.
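The discrete renewal computation behind (4) can be illustrated with a short sketch. It assumes a Nishiura-type discrete form, $F(i,R,\Phi,t) = \sum_{k} R_{t-k}\,\Phi_k\, i_{t-k}$ with $k$ running over the support of $\Phi$ starting at $f_0$ (possibly negative); this form is our assumption and is not necessarily the exact definition of $F_2$ in [1]. The function name and data are hypothetical. The example also shows why the last day $t_c$ is special: a negative lag would need $i_{t_c+1}$, which does not exist yet.

```python
def expected_incidence(i, R, phi, f0, t):
    """Renewal-equation prediction of the incidence at day t (illustrative sketch).

    Assumed form: F(t) = sum_k R[t-k] * Phi_k * i[t-k], k = f0, ..., f0 + len(phi) - 1.
    phi[j] is the serial-interval mass at lag k = f0 + j; f0 may be negative
    (secondary cases with symptom onset before their infector).
    Terms that would need data outside the observed days are skipped.
    """
    total = 0.0
    for j, mass in enumerate(phi):
        k = f0 + j           # serial-interval lag in days
        s = t - k            # symptom-onset day of the primary cases
        if 0 <= s < len(i):
            total += R[s] * mass * i[s]
    return total


# Hypothetical example: constant incidence, R = 1, serial interval on lags -1..2.
i = [100.0] * 10
R = [1.0] * 10
phi = [0.1, 0.2, 0.4, 0.3]    # sums to 1
print(expected_incidence(i, R, phi, f0=-1, t=5))   # interior day: full sum
print(expected_incidence(i, R, phi, f0=-1, t=9))   # last day: lag -1 term unavailable
```

With R = 1 and a normalized serial interval, the prediction reproduces the constant incidence on interior days, while on the last day the missing negative-lag term lowers it.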
Roughly speaking, minimizing the energy $E$ amounts to satisfying the renewal equation (4) approximately, with a reasonably smooth $R_t$ and, optionally, prescribed initial values at some particular times $t_m$.

The daily number of newly detected cases, $i_t$, is strongly affected by the "weekend effect". During the weekend, fewer tests are carried out and there is a delay in the registration of cases. It follows that the actual number of cases is systematically underestimated on some days of the week and overestimated on others. The usual way to deal with this weekly administrative noise is to use a 7-day moving average of $i_t$, but this procedure degrades the point estimate of the trend of $i_t$ near the current date and forces the estimate to stop three days before the present.

The main assumption of this paper is that a significant part of the discrepancy between $i_t$ and its expected value $F(i, R, \Phi, t)$ is due to the weekend effect. We therefore assume that the quotient

$$q(i,R,\Phi,t) = \frac{F(i,R,\Phi,t)}{i_t} \qquad (5)$$

follows a 7-day periodic dynamic, that is,

$$q(i,R,\Phi,t+7) = q(i,R,\Phi,t). \qquad (6)$$

In other words, the ratio between $F(i,R,\Phi,t)$ and $i_t$ depends mainly on the day of the week. In the experiments shown in this article, we observe that $q(i,R,\Phi,t)$ indeed follows this periodic pattern accurately in several countries¹. For example, in Sweden the registered value $i_t$ is systematically underestimated every Monday, and the opposite occurs on Saturdays. We therefore propose to approximate $q(i,R,\Phi,t)$ by a 7-day periodic function given by the vector $\hat q(i,R,\Phi) = (\hat q_0,\hat q_1,\hat q_2,\hat q_3,\hat q_4,\hat q_5,\hat q_6)$, so that $q(i,R,\Phi,t) \approx \hat q_{t\%7}$, where the symbol $\%$ denotes the modulo operation, that is, the remainder of the integer division. To compute $\hat q(i,R,\Phi)$ we use the least-squares estimate

$$\hat q(i,R,\Phi) = \arg\min_{q=(q_0,\dots,q_6)} E(i,R,\Phi,q), \qquad E(i,R,\Phi,q) = \sum_{t=t_c-T+1}^{t_c} \big(q_{t\%7}\, i_t - F(i,R,\Phi,t)\big)^2, \qquad (7)$$

where $T$ is the number of days used in the estimation (in our experiments, $T = 56$, that is, 8 weeks).
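Because each $q_k$ only multiplies the days $t$ with $t\%7 = k$, the unconstrained problem (7) decouples into seven scalar least-squares fits, $\hat q_k = \sum_{t\%7=k} i_t F_t \,/\, \sum_{t\%7=k} i_t^2$, writing $F_t$ for $F(i,R,\Phi,t)$. The plain-Python sketch below illustrates this on hypothetical data, without the total-preserving constraint (8); the function name is hypothetical, and falling back to 1 for a weekday with no cases is our own choice.

```python
def weekly_factors_ls(i, F):
    """Unconstrained least squares for q_{t%7} * i_t ~ F_t over an estimation window.

    t = 0 is the first day of the window, so k = t % 7 identifies a weekday
    only up to a fixed shift.
    """
    num = [0.0] * 7   # sum of i_t * F_t per residue class
    den = [0.0] * 7   # sum of i_t ** 2 per residue class
    for t, (it, ft) in enumerate(zip(i, F)):
        num[t % 7] += it * ft
        den[t % 7] += it * it
    # Fallback q_k = 1 when a residue class has no cases at all (our choice).
    return [n / d if d > 0 else 1.0 for n, d in zip(num, den)]


# Hypothetical window of T = 56 days: the renewal model predicts 100 cases/day,
# but reporting divides the counts by q_true depending on the day.
q_true = [1.25, 1.0, 1.0, 0.8, 1.0, 1.0, 1.0]
F = [100.0] * 56
i = [F[t] / q_true[t % 7] for t in range(56)]
q_hat = weekly_factors_ls(i, F)
i_hat = [q_hat[t % 7] * it for t, it in enumerate(i)]   # corrected counts
```

On this noise-free example the fitted factors recover `q_true` and the corrected counts are flat again.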
We point out that the value $\hat i_t = \hat q_{t\%7}\, i_t$ can be considered as an update of $i_t$ from which the weekly administrative noise has been removed. To preserve the number of accumulated cases in the estimation period, we add to the minimization problem (7) the constraint

$$\sum_{t=t_c-T+1}^{t_c} q_{t\%7}\, i_t = \sum_{t=t_c-T+1}^{t_c} i_t. \qquad (8)$$

In this way, the multiplication by the factor $\hat q_{t\%7}$ redistributes the cases $i_t$ within the estimation period, but does not change their total. Notice that we could also weight particular days in the expression of $E(i,R,\Phi,q)$ in (7). For example, if a holiday falls in the middle of the week, one might reduce the weight of that day and of the following ones in $E(i,R,\Phi,q)$.

Once $i_t$ is updated using $\hat i_t = \hat q_{t\%7}\, i_t$, we can use $\hat i_t$ to recompute $R_t$ using the renewal equation (1). We denote by $\hat R_t$ the updated version of $R_t$. This whole procedure is repeated, updating $\hat i_t$ and $\hat R_t$ iteratively until convergence. The final values $\hat i_t$ and $\hat R_t$ provide a more realistic trend of the evolution of $i_t$ and improve the estimation of $R_t$. The final vector $\hat q(i,\hat R,\Phi)$ is the set of multiplicative factors used to remove the weekly administrative noise. Fig. 1 presents a flowchart of the whole procedure, and the appendix gives technical details.

¹ To test this assumption on many more countries, the reader is invited to try the online demo at www.ipol.im/ern.

Figure 1: Flowchart of the estimation of $\hat i_t$, $\hat R_t$ and $\hat q(i,\hat R,\Phi)$. First, $R_t$ is computed from $i_t$ and $\Phi_s$ using the method proposed in [1], and $\hat R_t = R_t$ is initialized. Next, $\hat q(i,\hat R,\Phi)$ is obtained by minimizing (7) under the constraint (8).
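With the constraint (8) the problem remains small enough to solve in closed form. The sketch below imposes (8) exactly with a Lagrange multiplier, which is one possible approach; the appendix describes the authors' weighted-equation approach instead. With $a_k = \sum_{t\%7=k} i_t^2$, $c_k = \sum_{t\%7=k} i_t F_t$ and $s_k = \sum_{t\%7=k} i_t$, the optimality conditions give $q_k = (c_k + \mu s_k)/a_k$, and $\mu$ is fixed by $\sum_k s_k q_k = \sum_t i_t$. The function name and data are hypothetical.

```python
def weekly_factors_constrained(i, F):
    """Least squares for q_{t%7} * i_t ~ F_t subject to sum_t q_{t%7} i_t = sum_t i_t."""
    a = [0.0] * 7   # sum of i_t^2 per residue class
    c = [0.0] * 7   # sum of i_t * F_t per residue class
    s = [0.0] * 7   # sum of i_t per residue class
    for t, (it, ft) in enumerate(zip(i, F)):
        k = t % 7
        a[k] += it * it
        c[k] += it * ft
        s[k] += it
    total = sum(s)
    active = [k for k in range(7) if a[k] > 0]   # classes with at least one case
    # Lagrange multiplier enforcing the total-preserving constraint.
    mu = (total - sum(s[k] * c[k] / a[k] for k in active)) / sum(s[k] ** 2 / a[k] for k in active)
    return [(c[k] + mu * s[k]) / a[k] if a[k] > 0 else 1.0 for k in range(7)]


# Hypothetical data in which the model predicts fewer cases overall than were counted.
q_true = [1.25, 1.0, 1.0, 0.8, 1.0, 1.0, 1.0]
F = [100.0] * 56
i = [F[t] / q_true[t % 7] for t in range(56)]
q_hat = weekly_factors_constrained(i, F)
i_hat = [q_hat[t % 7] * it for t, it in enumerate(i)]
print(sum(i), sum(i_hat))   # equal up to rounding: cases are only redistributed
```

Here the corrected curve is still flat, but at the level that preserves the counted total rather than at the level predicted by the model.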
Then $i_t$ is updated as $\hat i_t = \hat q_{t\%7}\, i_t$. The iteration stops when the efficiency measure $I$ defined in formula (9) does not improve in the current iteration. Otherwise, $\hat R_t$ is updated from $\hat i_t$ and $\Phi_s$ by the method proposed in [1], and the iteration continues.

All the experiments presented here can be reproduced with the online interface available at www.ipol.im/ern. The appendix includes some details about the use of this online demo.

To measure how much the removal of the weekly administrative noise improves the explanation of $i_t$ by the renewal equation $F(i,R,\Phi,t)$, we use the efficiency measure $I$ (9), defined as the ratio between the value of the energy $E$ of (7) after the removal of the weekly administrative noise and its value before the removal (that is, with $q_0 = \dots = q_6 = 1$). $I$ thus represents the reduction, after the removal of the weekly administrative noise, of the average distance between $i_t$ and $F(i,R,\Phi,t)$. The smaller $I$, the more efficient the noise reduction. In fact, the value of $I$ can be used to assess whether it is worth applying the proposed method to a given country over a given time interval.

In Fig. 2 we plot, for Sweden, Germany, France and Spain, the values of the vector $\hat q(i,\hat R,\Phi)$ obtained for each day of the week. We observe that in Sweden and Germany, Monday is the day of the week on which $i_t$ is most underestimated (the higher $\hat q_k$, the more $i_t$ is underestimated on day $k$). In France, however, that day is Tuesday. This suggests an additional one-day delay due to the way France records and reports new cases. In these three countries, the effect of the weekly administrative noise is mainly concentrated on Monday, Tuesday and Wednesday (with a one-day lag in the case of France). The case of Spain is special because Spain does not provide data on weekends.
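The flowchart and its stopping rule based on $I$ can be sketched as a short loop. In this sketch `estimate_R`, `expected` and `fit_q` are hypothetical placeholders: in the paper, $R_t$ is obtained with the variational method of [1] and $F$ is the renewal prediction $F_2$, neither of which is reproduced here. We also assume that the energy is evaluated on the raw counts $i_t$ with the candidate factors, as in (7), and that $I$ is that energy divided by its value with all factors equal to 1.

```python
def energy(i, F, q):
    """E = sum_t (q_{t%7} i_t - F_t)^2 over the estimation window."""
    return sum((q[t % 7] * it - ft) ** 2 for t, (it, ft) in enumerate(zip(i, F)))


def remove_weekly_noise(i, estimate_R, expected, fit_q, max_iter=20):
    """Alternate the weekly-factor fit and the R update until I stops improving."""
    R_hat = estimate_R(i)                        # R^0 from the raw counts
    q_hat, i_hat = [1.0] * 7, list(i)            # q^0 = 1, so i^0 = i
    E0 = energy(i, expected(i, R_hat), q_hat)    # reference energy (no correction)
    I_prev = 1.0                                 # I^0 = 1
    for _ in range(max_iter):
        F = expected(i, R_hat)                   # renewal prediction with current R
        q_n = fit_q(i, F)
        I_n = energy(i, F, q_n) / E0             # efficiency measure
        if I_n > I_prev:
            break                                # no improvement: keep previous estimates
        q_hat, I_prev = q_n, I_n
        i_hat = [q_hat[t % 7] * it for t, it in enumerate(i)]
        R_hat = estimate_R(i_hat)                # re-estimate R from the corrected counts
    return i_hat, R_hat, q_hat, I_prev


# Toy placeholders (hypothetical): R is taken constant and the model predicts a flat curve.
def toy_estimate_R(counts):
    return [1.0] * len(counts)


def toy_expected(counts, R):
    return [100.0 * r for r in R]


def toy_fit_q(counts, F):
    num, den = [0.0] * 7, [0.0] * 7
    for t, (it, ft) in enumerate(zip(counts, F)):
        num[t % 7] += it * ft
        den[t % 7] += it * it
    return [n / d for n, d in zip(num, den)]


q_true = [1.25, 1.0, 1.0, 0.8, 1.0, 1.0, 1.0]
counts = [100.0 / q_true[t % 7] for t in range(56)]
i_hat, R_hat, q_hat, I = remove_weekly_noise(counts, toy_estimate_R, toy_expected, toy_fit_q)
```

If the very first fit makes $I$ exceed 1, the loop stops at once and returns the uncorrected counts with all factors equal to 1.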
In that case we have assumed that the number of cases on Saturday, Sunday and Monday is constant and equal to the total number of cases accumulated over those three days divided by 3. Moreover, in Spain some regions occasionally do not report data for one or several days, which produces an additional administrative noise, different from the weekend effect. The more orderly the daily count of new infected, the better the result obtained by our method.

Table 1 presents the numerical values of the vector $\hat q(i,\hat R,\Phi)$, the efficiency measure $I$, and the effective reproduction number at the current time before and after the removal of the weekly administrative noise, $R_{t_c}$ and $\hat R_{t_c}$ respectively. Notably, with the exception of Spain, the efficiency measure $I$ shows a strong reduction of the administrative noise. At the end of the study period (September 9 to October 28, 2020), there is a strong expansion of the virus. This period ends on a Wednesday, and in the preceding days, at the beginning of the week, the number of cases was underestimated. It follows that the estimated effective reproduction number becomes considerably higher after the weekly administrative noise is eliminated.

Table 1: Values of $\hat q(i,\hat R,\Phi)$ for each day of the week, the efficiency measure $I$, and the effective reproduction number at the current time before and after the administrative noise removal, $R_{t_c}$ and $\hat R_{t_c}$ respectively.

In Figs. 3 to 6 we plot, on the one hand, the quotient $q(i,\hat R,\Phi,t)$ and its periodic approximation $\hat q_{t\%7}$, and, on the other hand, the values of $i_t$, its update $\hat i_t$ after the removal of the weekly administrative noise, and the expected value $F_2(\hat i,\hat R,\Phi,t)$.
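The preprocessing used for Spain can be sketched as follows: on each Saturday, Sunday and Monday triple, the counts are replaced by their mean, which keeps the three-day total unchanged. The function name and data are hypothetical; we assume that weekend days carry no report and that Monday carries the accumulated weekend cases, and we identify weekdays with Python's `date.weekday()` (Monday = 0, Sunday = 6).

```python
from datetime import date, timedelta


def spread_weekend(counts, days):
    """Replace each Saturday, Sunday, Monday triple by its mean (three-day total / 3).

    counts : daily counts; under the assumed reporting pattern, weekend days are
             empty and the following Monday includes the weekend cases
    days   : datetime.date of each count (consecutive days)
    """
    out = [float(c) for c in counts]
    for t in range(2, len(counts)):
        # date.weekday(): Monday == 0, Saturday == 5, Sunday == 6
        if (days[t].weekday(), days[t - 1].weekday(), days[t - 2].weekday()) == (0, 6, 5):
            mean = (counts[t - 2] + counts[t - 1] + counts[t]) / 3
            out[t - 2] = out[t - 1] = out[t] = mean
    return out


# Hypothetical two weeks starting on Monday 2020-10-05.
start = date(2020, 10, 5)
days = [start + timedelta(days=t) for t in range(14)]
counts = [300, 200, 210, 220, 230, 0, 0, 900, 210, 220, 230, 240, 0, 0]
smoothed = spread_weekend(counts, days)
```

The final weekend is left untouched because its Monday report is not yet in the data.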
For Sweden, Germany and France we observe good agreement between the quotient $q(i,\hat R,\Phi,t)$ and its periodic approximation $\hat q_{t\%7}$, which supports the validity of the proposed method and our assumption that $q(i,\hat R,\Phi,t)$ follows a 7-day periodic dynamic. For these countries, $\hat i_t = \hat q_{t\%7}\, i_t$, the number of new cases after the removal of the administrative noise, oscillates less than $i_t$ and is very close to its expected value under the renewal equation, $F_2(\hat i,\hat R,\Phi,t)$. In the case of Spain, the improvement obtained is minor. On the one hand, Spain does not report data during the weekend, which introduces an additional disturbance in the data that hinders the application of the proposed method. On the other hand, in two of the weeks included in the study period, $q(i,\hat R,\Phi,t)$ diverges significantly from its periodic approximation $\hat q_{t\%7}$.
We proposed a technique to remove the weekly administrative noise in the incidence curves of COVID-19, based on improving the agreement between $i_t$ and its expected value under the renewal equation, $F(i,R,\Phi,t)$. The method boils down to multiplying $i_t$ by a factor $\hat q_{t\%7}$ that depends on the day of the week. The main assumption supporting this approach is that the ratio between $F(i,R,\Phi,t)$ and $i_t$ approximately follows a 7-day periodic dynamic. We verified this assumption for Sweden, Germany and France. For Spain the method brings less improvement, because Spain does not report data on weekends.

The proposed method iteratively updates $i_t$ and the effective reproduction number $R_t$. The number of cases after the removal of the weekly administrative noise, $\hat i_t = \hat q_{t\%7}\, i_t$, oscillates much less than $i_t$ and is very close to its expected value under the renewal equation used. From $\hat i_t$ we obtain a more accurate value of the effective reproduction number $R_t$. An implementation of this technique is available at www.ipol.im/ern. In this online interface we compare the final estimate of $R_t$ with the estimate obtained by EpiEstim. As shown in [3], in practice EpiEstim uses a 7-day moving average to remove the weekly administrative noise. As can be observed using the online interface, this 7-day moving average introduces a shift towards the past that is not observed in our estimate. Our method therefore seems to provide a more up-to-date estimate of $R_t$ than EpiEstim.
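Both drawbacks of a 7-day mean near the present can be seen on a linearly growing curve: a centered window has no value for the last three days, and a trailing (causal) window lags three days behind the curve. The sketch below shows both windows on hypothetical data; which window a given tool uses is not specified here, and this is not a reproduction of EpiEstim.

```python
def centered_mean(x, width=7):
    """Centered sliding mean; None where the window would need unavailable days."""
    half = width // 2
    return [sum(x[t - half:t + half + 1]) / width if half <= t < len(x) - half else None
            for t in range(len(x))]


def trailing_mean(x, width=7):
    """Mean of the `width` most recent values; None until a full window is available."""
    return [sum(x[t - width + 1:t + 1]) / width if t >= width - 1 else None
            for t in range(len(x))]


# Hypothetical linearly growing incidence: x_t = 10 t + 5.
x = [10 * t + 5 for t in range(30)]
centered = centered_mean(x)
trailing = trailing_mean(x)

print(centered[-4:])                # [265.0, None, None, None]: last three days missing
print(trailing[-1], x[-1], x[-4])   # 265.0 295 265: the trailing mean lags three days
```

On a linear curve the centered mean reproduces the curve wherever it is defined, while the trailing mean at day $t$ equals the value at day $t-3$.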
References

[1] A variational model for computing the effective reproduction number of SARS-CoV-2.
[2] Real time Bayesian estimation of the epidemic potential of emerging infectious diseases.
[3] A daily measure of the SARS-CoV-2 daily reproduction number for all countries.
[4] A new framework and software to estimate time-varying reproduction numbers during epidemics.
[5] The serial interval of COVID-19 from publicly reported confirmed cases. medRxiv.
[7] Epidemiological parameters of coronavirus disease 2019: a pooled analysis of publicly reported individual data of 1155 cases from seven countries.
[8] Time variations in the transmissibility of pandemic influenza in Prussia.
[9] The effective reproduction number as a prelude to statistical estimation of time-dependent epidemic trends.
[10] Serial interval of novel coronavirus (COVID-19) infections.
[11] Improved inference of time-varying reproduction numbers during infectious disease outbreaks.
[12] Serial intervals of respiratory infectious diseases: a systematic review and analysis.
[13] Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures.

Appendix

Here we give some technical details of the proposed technique. First, note that in general the only data we use are $i_t$ and $\Phi_s$, so initially we do not know the actual day of the week corresponding to each datum $i_t$. For the computation of $\hat q = (\hat q_0,\hat q_1,\hat q_2,\hat q_3,\hat q_4,\hat q_5,\hat q_6)$, we assume without loss of generality that the first day of the estimation period, $t_c - T + 1$, corresponds to $t = 0$. This amounts to initially assuming that this day is a Monday; in fact, this is not relevant, because the values of $\hat q$ can be reordered at any time to fit the actual days of the week. The minimization problem (7), given by the quadratic energy

$$E(i,R,\Phi,q) = \sum_{t=0}^{T-1} \big(q_{t\%7}\, i_t - F(i,R,\Phi,t)\big)^2,$$

can be expressed in matrix form as

$$\hat q = \arg\min_{q \in \mathbb{R}^7} \|Aq - b\|^2, \qquad (11)$$
where $\|\cdot\|$ is the usual Euclidean norm, $b = \big(F(i,R,\Phi,t)\big)_{t=0,\dots,T-1}$, and $A \in \mathbb{R}^{T\times 7}$ is defined by

$$A_{t,k} = \begin{cases} i_t & \text{if } t\%7 = k,\\ 0 & \text{otherwise}, \end{cases} \qquad t = 0,\dots,T-1,\quad k = 0,\dots,6.$$

It is well known that the above minimization problem has the closed-form solution

$$\hat q = \left(A^\top A\right)^{-1} A^\top b.$$

The constraint (8), given by $\sum_{t=0}^{T-1} q_{t\%7}\, i_t = \sum_{t=0}^{T-1} i_t$, can be expressed as the following additional linear equation:

$$\sum_{k=0}^{6} \Big(\sum_{t:\ t\%7=k} i_t\Big)\, q_k = \sum_{t=0}^{T-1} i_t.$$

This constraint could be included in the minimization procedure by eliminating one of the unknowns using the above equation. However, we use another approach, which consists of adding the above equation, multiplied by a certain weight $w$, to the expression (11). In this way we can control the weight assigned to this constraint: if $w$ is large, we force the constraint to be fulfilled, and if $w = 0$ we remove it. In the experiments performed in this work we use $w = 10^5$, so that the constraint is satisfied.

As explained in Fig. 1, the proposed method iteratively updates $i_t$, $R_t$ and $\hat q(i,\hat R,\Phi)$. Let us denote by $i^n_t$, $R^n_t$ and $q^n(i,\hat R,\Phi)$ these updates at iteration $n$, starting at $n = 0$. Following the flowchart of Fig. 1, these updates are computed with the following algorithm:

Data: $i$, $\Phi$.
Result: $\hat i$, $\hat R$, $\hat q(i,\hat R,\Phi)$.
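The weighted-equation approach can be sketched in plain Python. Appending the row $w\,(c_0,\dots,c_6)$, with $c_k = \sum_{t\%7=k} i_t$, to $A$ and the entry $w\sum_t i_t$ to $b$ adds $w^2 c c^\top$ to the diagonal matrix $A^\top A$; the sketch solves the resulting normal equations with the Sherman-Morrison formula for this rank-one update. This solver is our choice, not necessarily the one used in the authors' implementation, and the function name and data are hypothetical.

```python
def weekly_factors_weighted(i, F, w=1e5):
    """Minimize ||A q - b||^2 + w^2 (c.q - S)^2, i.e. (11) plus the weighted constraint row.

    Assumes every residue class t % 7 contains at least one nonzero count.
    """
    T = len(i)
    # A[t][k] = i_t if t % 7 == k else 0;  b[t] = F_t.
    A = [[i[t] if t % 7 == k else 0.0 for k in range(7)] for t in range(T)]
    d = [sum(A[t][k] ** 2 for t in range(T)) for k in range(7)]     # diagonal of A^T A
    g = [sum(A[t][k] * F[t] for t in range(T)) for k in range(7)]   # A^T b
    c = [sum(A[t][k] for t in range(T)) for k in range(7)]          # constraint coefficients
    S = float(sum(i))                                               # total cases to preserve
    q0 = [g[k] / d[k] for k in range(7)]    # unconstrained solution (A^T A)^{-1} A^T b
    u = [c[k] / d[k] for k in range(7)]     # (A^T A)^{-1} c
    cq0 = sum(ck * qk for ck, qk in zip(c, q0))
    cu = sum(ck * uk for ck, uk in zip(c, u))
    # Sherman-Morrison solution of (A^T A + w^2 c c^T) q = A^T b + w^2 S c.
    step = w * w * (S - cq0) / (1.0 + w * w * cu)
    return [q0[k] + step * u[k] for k in range(7)]


# Hypothetical data: flat model prediction, day-dependent under/over-reporting.
q_true = [1.25, 1.0, 1.0, 0.8, 1.0, 1.0, 1.0]
F = [100.0] * 56
i = [F[t] / q_true[t % 7] for t in range(56)]
q_free = weekly_factors_weighted(i, F, w=0.0)   # w = 0: constraint removed
q_w = weekly_factors_weighted(i, F, w=1e5)      # large w: constraint (8) enforced
```

The constraint residual equals the unconstrained residual divided by $1 + w^2\, c^\top (A^\top A)^{-1} c$, which is why a large $w$ such as $10^5$ makes (8) hold to within rounding.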
Set $i^0 \equiv i$, $I^0 = 1$, $q^0 \equiv 1$, and compute $R^0$ from $i^0$ and $\Phi$ using the method proposed in [1];
for $n = 1, 2, \dots, MaxIter$ do
    compute $q^n(i, R^{n-1}, \Phi)$ using the method explained above;
    compute $i^n_t = q^n_{t\%7}\, i_t$;
    compute $I^n = E(i^n, R^{n-1}, \Phi, q^n) \,/\, E(i^0, R^0, \Phi, q^0)$;
    if $I^n > I^{n-1}$ then
        stop the iterations;
    else
        $\hat i = i^n$;
        $\hat q(i, \hat R, \Phi) = q^n(i, R^{n-1}, \Phi)$;
        compute $R^n$ from $i^n$ and $\Phi$ using the method proposed in [1];
        $\hat R = R^n$;
    end
end
Algorithm 1: Algorithm to estimate $\hat i$, $\hat R$ and $\hat q(i,\hat R,\Phi)$. MaxIter is the maximum number of iterations allowed.

In the online interface available at www.ipol.im/ern, one can test the method proposed in [1] to estimate $R_t$, as well as the method proposed here to remove the weekly administrative noise. When the demo is executed with the option "remove weekly administrative noise" activated, the method proposed in this paper is used to compute $\hat i_t$, $\hat q(i,\hat R,\Phi)$ and $\hat R_t$. A comparison with the EpiEstim method is shown. We also show (in the plot on the right) the original curve $i_t$ as well as the estimate given by the renewal equation $F_2(\hat i,\hat R,\Phi_s,t)$. The first row of the output file "Rn.csv" contains the final value of the efficiency measure $I$ (see (9)) and the vector $\hat q(i,\hat R,\Phi)$. The elements of this vector are ordered so that the last one, $\hat q_6$, is the multiplicative factor of the last element of $i_t$ (that is, $i_{t_c}$). Thus, to obtain $\hat i_t$ we just compute $\hat i_{t_c} = \hat q_6\, i_{t_c}$, $\hat i_{t_c-1} = \hat q_5\, i_{t_c-1}$, ..., and in general

$$\hat i_t = \hat q_{(t+6-t_c\%7)\%7}\; i_t.$$

Moreover, the output file "Rn.csv" contains the values of $\hat R_t$ in its first column, $F_2(\hat i,\hat R,\Phi_s,t)$ in the second, the original number of infected $i_t$ in the third, and, in the fourth, a measure of the variability in the estimation of $\hat R_t$, as explained in [1].
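The index formula for the factors stored in "Rn.csv" can be checked with a short sketch in which the factors are replaced by their own indices, so that the output shows which factor multiplies each day. The function name is hypothetical; the formula is the one given above, with $t_c$ the index of the last day.

```python
def apply_csv_factors(i, q_hat):
    """Apply factors ordered so that q_hat[6] multiplies the last day i[t_c].

    Implements hat_i_t = q_hat[(t + 6 - t_c % 7) % 7] * i_t with t_c = len(i) - 1.
    """
    tc = len(i) - 1
    return [q_hat[(t + 6 - tc % 7) % 7] * x for t, x in enumerate(i)]


# Use the factor indices themselves as "factors" to see the mapping.
labels = apply_csv_factors([1] * 10, [0, 1, 2, 3, 4, 5, 6])
print(labels)   # [4, 5, 6, 0, 1, 2, 3, 4, 5, 6]
```

The last day receives $\hat q_6$, the day before $\hat q_5$, and so on, regardless of the length of the series.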