key: cord-0792810-62ichbjg authors: Günther, Felix; Bender, Andreas; Katz, Katharina; Küchenhoff, Helmut; Höhle, Michael title: Nowcasting the COVID‐19 pandemic in Bavaria date: 2020-12-01 journal: Biom J DOI: 10.1002/bimj.202000112 sha: 838c90ececb0ac74099882c0231ebbd0d04be738 doc_id: 792810 cord_uid: 62ichbjg To assess the current dynamics of an epidemic, it is central to collect information on the daily number of newly diseased cases. This is especially important in real‐time surveillance, where the aim is to gain situational awareness, for example, if cases are currently increasing or decreasing. Reporting delays between disease onset and case reporting hamper our ability to understand the dynamics of an epidemic close to now when looking at the number of daily reported cases only. Nowcasting can be used to adjust daily case counts for occurred‐but‐not‐yet‐reported events. Here, we present a novel application of nowcasting to data on the current COVID‐19 pandemic in Bavaria. It is based on a hierarchical Bayesian model that considers changes in the reporting delay distribution over time and associated with the weekday of reporting. Furthermore, we present a way to estimate the effective time‐varying case reproduction number [Formula: see text] based on predictions of the nowcast. The approaches are based on previously published work, that we considerably extended and adapted to the current task of nowcasting COVID‐19 cases. We provide methodological details of the developed approach, illustrate results based on data of the current pandemic, and evaluate the model based on synthetic and retrospective data on COVID‐19 in Bavaria. Results of our nowcasting are reported to the Bavarian health authority and published on a webpage on a daily basis (https://corona.stat.uni-muenchen.de/). Code and synthetic data for the analysis are available from https://github.com/FelixGuenther/nc_covid19_bavaria and can be used for adaption of our approach to different data. Daily reported case numbers of an infectious disease outbreak do not correspond to the actual number of disease onsets on that day. Due to delays from reporting and testing, the number of newly reported cases and the actual number of newly diseased cases can substantially differ. It is the latter, however, that is of central interest when assessing the state and dynamics of an epidemic outbreak. Focusing on the daily number of reported cases hampers our ability to understand current dynamics of the outbreak close to now. This is especially problematic when one wants to gain insight about the current trend or if one wants to assess the effects of political and social interventions in real time. Knowledge of the actual number of new infections per day is highly relevant for the current COVID-19 pandemic, where far-reaching political action was taken in order to contain the epidemic outbreak in 2020. The problem of occurred-but-not-yet-reported cases during outbreaks is well known from the HIV/AIDS outbreak and different statistical approaches have been proposed to handle delayed reporting. A standard reference is Lawless (1994) . A more flexible Bayesian approach, which is the basis of the model we use here, has been developed by Höhle and an der Heiden (2014) . In the following, we will refer to this delay adjustment approach as the nowcast and define the reporting delay as the time between disease onset and official case reporting by a health authority. Other authors use the term nowcasting for models that focus on adjusting the administrative delay between the first case report to a local health authority and registration (in aggregated data) at higher (e.g., state and/or federal) authorities (De Nicola, , or to perform nowcasting of fatal cases between case registration and fatality date . The basic idea of the nowcasting approach proposed here is to estimate the reporting delay between disease onset and reporting date based on reported cases from the past for which the date of disease onset and the reporting date are known. Given the delay distribution and the current number of case reports with reporting dates close to now, we can infer the actual number of new disease onsets at current dates. The resulting estimated epidemic curve of disease onsets per day gives a more realistic picture of the current state of the epidemic than looking at daily counts of new case reports. Furthermore, the nowcast can facilitate estimation of the time-varying effective reproduction number (Wallinga & Teunis, 2004) . There are other approaches including mathematical infection models (compartmental models) for the estimation of , see, for example, Khailaie et al. (2020) . One complication of using nowcasting for COVID-19 reports is that reporting of symptom onset in cases is not complete: either this information could not be elicited due to difficulties getting in contact with the case or because symptoms had not manifested (yet) at the time of contact with the case. This point was first addressed in Glöckner, Krause, and Höhle (2020) and a similar approach based on Lawless (1994) is used by the Robert Koch Institute for analyzing COVID-19 in Germany (an der Heiden & Hamouda, 2020). Using our approach, we provide nowcast estimates for the COVID-19 pandemic in Bavaria using data from the Bavarian Health and Food Safety Authority (LGL) including the estimation of . The results are updated daily with recent data. In this article, we provide methodological details, show results based on data obtained from the LGL until April 9, 2020, 10 a.m., and provide results of the evaluation of the proposed nowcasting approach. We use daily data on reported COVID-19 cases from Bavaria from the mandatory notification data based on the German Infection Protection Act (IfSG). The data are provided by the Bavarian Health and Food Safety Authority (LGL) on a daily basis and includes a list of all reported cases with the date of reporting to the LGL, the date of reporting to the local health authority (Gesundheitsamt), the date of disease onset if available, and the district of residence for the case (Kreis). Since we get our data from the LGL, the number of cases reported to the LGL on a specific date is complete and will not change on subsequent days. These consistent data offer a valid base for inferring the epidemic curve and the considered associated quantities. The date of reporting to the local health authority is closer to disease onset due to a delay between reporting at the local health authority and transmission to the LGL. However, based on the data obtained from the LGL, the aggregated number of cases reported to the local health authorities on a given day may be incomplete because a case reported to the local health authority can be reported to the LGL with a delay of several days and therefore may not be included in the data yet. Therefore, we use the date of reporting to the local health authority only for the imputation of missing disease onsets, while the nowcast is based on the date a case was reported to the LGL (cf. Steps 1 and 2 in Section 3.1). Information on disease onset stems from a retrospective collection of the day of symptom onset. However, the daily COVID-19 surveillance data of Bavaria contain about 50%-60% cases with missing information on the day of symptom onset in the weeks close to now. For a specific week, this fraction becomes lower over time since more information on the cases is collected. The missing onset information exists partly due to the heavy workload imposed on health authorities during the pandemic, but also because a certain proportion of cases have no or only very mild symptoms. However, we expect the latter explanation to be less prominent than the former. Note also that the date of symptom onset does not correspond to the infection date due to a preceding incubation time. In the following sections, we provide methodological details regarding the proposed nowcasting (cf. Section 3.1 as well as the estimation of the time-varying case reproduction number, Section 3.2). The nowcast itself consists of two steps: imputation of missing disease onset dates (Step 1) and Bayesian nowcasting based on the imputed data (Step 2). Due to the many cases with a missing disease onset date, we decided to proceed with a two-step approach for nowcasting. First, we impute missing data on disease onset and, second, perform the nowcast based on the information on reporting date (available for all cases) and the date of disease onset, which is partly available and partly imputed. Imputing missing disease onset information implies that we also consider presymptomatic and asymptomatic COVID-19 cases in our analyses (to the part at which they are observed in the official COVID-19 case counts). The rationale is that this allows to compare the nowcasting results to the daily reported case numbers. In addition, it is not straightforward to limit the analysis to symptomatic cases, because in cases with missing disease onset date it is not entirely clear whether they are asymptomatic, just symptomless at the time of reporting (pre-symptomatic), or actually show symptoms, but information on the symptom onset date is missing for other reasons, for example, not yet collected. Step 1: Imputation of disease onset In the imputation step, we fit a flexible generalized additive model for location, scale, and shape (GAMLSS, Stasinopoulos, Rigby, Heller, Voudouris, & De Bastiani, 2017) , assuming a Weibull distribution for the delay time > 0 between disease onset and reporting date at the local health authority: where and are the location and scale parameters of the Weibull distribution with density ( | , ) = ⋅ ⋅ ( −1) exp(− ). The same, additive predictor (1) was defined for both, and , however, the estimated effects could differ for the two distributional parameters. In (1), parameter 0, is the location-or scale-specific global intercept and , is the effect of the weekday on which the report arrived at the local health authority. Furthermore, 1, and 2, are smooth effects of the calendar week (of report arrival) and age of case, respectively, both parameterized via cubic splines. To estimate the model, we use data of all cases for which the disease onset date and thereby is available. Afterward, we impute the delay time , if missing, by sampling from the fitted, conditional Weibull distribution and derive the missing symptom/disease onset date. No imputation is performed for observations for which the symptom onset date is reported. Since this imputation induces, conditional on the predictors of the GAMLSS imputation model, a missing at random assumption with respect to the time between disease onset and case reporting, we perform a sensitivity analysis, where we omit (i) all individuals where the reports say explicitly that they were symptom-free and (ii) all individuals with missing information about symptoms. This allows us to check, whether the dynamics of the daily number of individuals with available symptoms are structurally different compared to all registered cases over time. Step 2: Bayesian nowcasting For the nowcasting step, we use a Bayesian hierarchical model based on Höhle and an der Heiden (2014), which associated implementation in the R-package surveillance (Salmon, Schumacher, & Höhle, 2016) . In the present work, we have extended the approach considerably, adapted it to the context of COVID-19, and provide a novel implementation in rstan (Stan Development Team, 2020). Let , = , be the (observed) number of cases, with disease onset on day and reported with a delay of days (case report arrives on day + ). On day > ("current" day, i.e., "now"), the information is available on cases that had disease onset on day and are reported until day . The aim of nowcasting is to predict the unobserved total number of disease onsets on day t, ( , ∞) = ∑ ∞ =0 , , based on information available up until the current day . For identifiability reasons, one defines a maximum relevant delay time of = and considers each observation with an observed delay > as having a delay of . As described in Höhle and an der Heiden (2014) , the hierarchical Bayesian model for nowcasting consists essentially of two parts: a model for the expected number of disease onsets on day , ( ( , ∞)) = , and a model for the delay distribution at day , specifying the probability of a reporting delay of days for a case with disease onset at day , (delay = |onset = ) = , . Both parts of the model can in general be flexibly specified. We set the maximum delay to = 21 and utilize the following hierarchical model for nowcasting: The number of cases with disease onset at day and reporting delay days, , , is assumed to follow a negative binomial distribution with expectation × , , and overdispersion parameter . For the delay distribution, we utilize a discrete time hazard model ℎ , = (delay = |delay ≥ , , ) as logit(ℎ , ) = + ′ , , = 0, … , − 1; ℎ , = 1, where , is a vector of time-and delay-specific covariates and the corresponding regression coefficients. In our main model, we use linear effects of time with breakpoints every 2 weeks before the current day (corresponding to a first-order spline), and a categorical weekday effect of the reporting day with a common effect for holidays and Sunday, since there are substantial differences in the reported case numbers over the week. From model (3), we can derive the probabilities of interest in (2), ,0 = ℎ ,0 and , = (1 − ∑ −1 =0 , ) × ℎ , . The main goal of nowcasting is to obtain inference about ( , ∞) = ∑ =0 , . Based on the described Bayesian hierarchical model, this corresponds to a sum of negative binomial distributed counts and we can obtain such inference by summing up the Markov chain Monte Carlo (MCMC) samples of , at each timepoint . In an alternative specification of the model during evaluation (see below) we assume that , | , , ∼ Po( × , ). In this case ( , ∞) is Poisson distributed as well and it is directly possible to sample from Po( ) to obtain inference about ( , ∞). The utilization of the first-order random walk for modeling in (2) was motivated by results of McGough, Johansson, Lipsitch, and Menzies (2020). For the modeling of the delay distribution, we utilized several different approaches and covariates and evaluated them on synthetic data and retrospectively on the Bavarian COVID-19 data (see below for a description of the approaches). Once a depletion of susceptibles occurs during an outbreak of a person-to-person transmitted disease or specific interventions are made, a key parameter to track is the so-called effective reproduction number (also referred to as case reproduction number). This time-varying quantity is defined as follows: consider a case with disease onset on daythe expected number of secondary cases one such primary case generates will be denoted by ( ). The time until these secondary cases will show symptoms is governed by the serial-interval distribution, which is defined as the time period between manifestation of symptoms in the primary case to time of symptom manifestation in the secondary case (Svensson, 2007) . We estimate the time-varying case reproduction number by the procedure of Wallinga and Teunis (2004) : Consider a case showing symptoms for the first time on day . The relative likelihood that a case (with symptom onset on day ) was infected by is given by where is the probability mass function of the serial-interval distribution. For the serial interval distribution, we use a discretized version of the results from Nishiura, Linton, and Akhmetzhanov (2020), which find a log-normal distribution with mean 4.7 days and standard deviation 2.9 as the most suitable fit to data from 28 infector-infectee pairs. An estimate of the effective reproduction number at time is now given as the average reproduction number of each case showing first symptoms of the illness on day :̂( (4) We prefer this ( ) estimation over the method used in an der Heiden and Hamouda (2020), because it is unbiased for our generation time distribution (see the discussion in Höhle, 2020) . For each imputed data set, we extract = 500 time series of case counts from the posterior distribution of the nowcast and then estimate ( ) as defined in (4) for each time series using the R-package R0 (Obadia, Haneef, & Boëlle, 2012) . Furthermore, each ( ) estimation generates = 100 samples from the corresponding sampling distribution of ( ). Altogether, we report̂( ) as mean of these × samples together with the 2.5% and 97.5% quantiles to form a 95% credibility interval for ( ). We estimate ( ) for all so that + (0.95) ≤ , where (0.95) is the 95% quantile of the serial interval distribution. This avoids a downward bias in the ( ) estimation near "now." Alternatively, one could employ correction methods near (Cauchemez et al., 2006) . We perform an evaluation of the hierarchical nowcasting based on synthetic data mimicking the reported Bavarian COVID-19 data and retrospectively on the official data from the LGL that were reported until July 31. For creation of the synthetic data, we utilized a smoothed version of the observed number of reported disease onsets per day and specified a reporting delay model similar to the model described in (3) with five changepoints in the linear time effect on the hazard. This leads first to a slight increase, followed by a decrease and stabilization, and a final slight increase of the (average) reporting delay over time (see the supplemental material for a detailed description and visualization of the data generating process). The aggregated daily numbers of disease onsets and daily numbers of reported cases are similar in structure to the officially reported data. For faster computation during the evaluation, we divided the daily cases by two. For the retrospective evaluation on the official COVID-19 data, we focus on all reported cases with available disease onset and on the time period between March 17 and June 30, assuming that all cases that will be reported with disease onset until June 30 are reported on July 31. For the evaluation of the nowcasting, we estimate several different models (Table 1) . We vary the distributional assumptions of , between Poisson and negative binomial (cf. Section 3.1, Step 2). Furthermore, we vary the specification of the model for the reporting delay distribution: first, we assume a reporting delay distribution without changes over time, second, we estimate linear effects of time on the delay distribution with changepoints every 2 weeks, and third use a different specification of the discrete time hazard model, where we model logit(ℎ , ) = + , = 0, … , − 1; ℎ , = 1, with a prior on | −1 ∼ ( −1 , 2 ) and 0 = 0. With this model, we aim to estimate smooth daily changes in the delay distribution over time similar to the first-order random walk in the modeling of . In case of the synthetic data, we additionally estimate the nowcasting with the known true changepoints in the delay distribution (that are unknown in real-world applications) and in case of the retrospective evaluation on Bavarian data, we additionally include in some scenarios dummy effects of the weekday of the reporting date. To compare the performance of the different models, we estimate the log scoring rule (logS) and the continuous ranked probability score (CRPS) (Jordan, Krüger, & Lerch, 2019) , root mean squared error (RMSE), as well as coverage frequencies of 95% prediction intervals. For all those criteria, we average over all dates and nowcast predictions 2-6 days before the current date. In addition to the quantitative measures, we visually inspect the performance of the different approaches based on the nowcasting predictions and the estimated delay distribution in comparison to the retrospective truth in order to identify potential problems of the models. We extend the retrospective evaluation of the nowcasting on Bavarian data to the estimation of ( ) and compare the estimated̂( ) on the most current day max( ) s.t. + (0.95) ≤ for all to the retrospective true ( ) given all available case data until July 31. This is done based on all evaluated models, and we visually inspect the estimated̂( ) over time and compute coverage frequencies of 95% credibility intervals. All calculations were done using the statistical programming environment R (R Core Team, 2020). Nowcasting was performed based on a custom rstan (Stan Development Team, 2020) implementation. Estimation of ( ) was based on code of the R0 package (Obadia et al., 2012) for each selected posterior sample. For computation of the proper scoring rules we used the scoringRules package (Jordan et al., 2019) . Code to reproduce our analysis and for adaption to other application scenarios is available at https://github.com/ FelixGuenther/nc_covid19_bavaria. There, we also provide an artificial data set based on the observed reporting dates of cases but for data protection reasons featuring only artificial information on the age and disease onset dates of the cases. We present results based on data obtained from LGL on April 9, 2020, 10 a.m. The data contain information on 29,262 COVID-19 cases, which we restrict to 29,246 cases reported after March 1, as the first 16 COVID-19 cases reported between January 28 and February 13 (reported disease onset between January 23 and February 3, three with missing onset infor- Week-specific observed number of cases with available information on disease onset. Empirical mean, median, and 25%/75% quantile of delay distribution between disease onset and reporting at local health authority. Cases are grouped into weeks based on their reporting date at local health authority. Data from April 9, 10 a.m. mation) concerned a contained outbreak (Böhmer et al., 2020) and no further cases were detected upon February 27. This outbreak can therefore be assumed to not have contributed to the later disease spread. Information on disease onset is available for 13,137 cases, but reported disease onset was past the official reporting date for 50 cases and before January 23 for 16. We set the disease onset date for these cases as missing, yielding 13,071 cases with valid information on disease onset (44.7%). For these, the median delay between disease onset and reporting was 7 days (25% quantile: 5, 75% quantile: 11), Table 2 shows observed delay times over the observation period and reveals a considerable increase in the delay distribution over time. For imputation of missing disease onset dates, we estimate a Weibull GAMLSS with smooth effects of the reporting week, the cases' age, and a categorical effect of the weekday of report arrival on location and scale. We utilize the reporting date at the local health authority in the imputation model, since it is closer to the actual disease onset than the reporting date at LGL and is available for all cases contained in our data. Thereby, we do not have to deal with the additional reporting delay between the local health authorities and the LGL for the imputation, which might also change over time. Figure 1 shows the estimated association of the covariates with the median delay. All covariates turned out to be relevant: we find an increase in expected delay time over the reporting weeks, lower reporting delay for older cases, and differences over the course of a week. The estimated GAMLSS model is used to impute the date of disease onset for cases with missing onset information. Figure 2 shows the number of daily reported cases and the number of cases with reported and imputed disease onset on a certain day over time. Furthermore, we display the estimated new cases from nowcasting based on our main model (2). We observe a clear difference between the estimated new cases from the nowcast and the daily numbers of reported cases. The induced bias due to the reporting delay is obvious: the estimated daily new cases stabilize from around March 20 on and start to decrease afterward, while the reported cases still show a rapid increase. The 95% prediction interval, however, shows substantial uncertainty in estimates, especially for more recent estimates. Note that we set the current day for the nowcasts to April 8, since we only consider days with fully available reporting data. Furthermore, we set a reporting lag between the current date and reported nowcast results of 2 days due to considerable uncertainty in the nowcasts for dates with very few observations with reported or imputed disease onset. The black dotted line in Figure 2 shows the retrospective true number of disease onsets (reported and imputed) based on data known on July 31. We can see that the predicted number of new cases per day and the actually observed number of cases match closely and the prediction intervals contain the actual number of onsets for most days. Note, that the imputation of missing disease onset dates was performed based on the same Weibull GAMLSS but based on different data (all data available on April 8 for nowcasting and July 31 for the retrospective truth), and the number of cases with (imputed )disease onset on a specific day can therefore vary slightly. We performed an evaluation of the nowcasting approach based on synthetic data and retrospectively on the Bavarian COVID-19 data to investigate the performance of the Bayesian hierarchical model under various model specifications and gain a better understanding of important aspects of modeling. For the synthetic data, we found the following (Table 3 , Results of the evaluation of different nowcasting models on synthetic and actual Bavarian data (retrospectively). CRPS is the continuous ranked probability score, logS denotes the log scoring rule, RMSE denotes the root mean squared error of the posterior median. Additionally, we provide coverage frequencies of 95% prediction intervals for the number of disease onsets per day and coverage frequencies of 95% credibility intervals in the estimation of ( more detailed results in the supplemental material): when we supply the true, in reality unknown changepoints of the delay distribution to model fitting the nowcasting approach performs best with respect to our evaluation metrics. Averaged over all days , and for all nowcast days = − 6, … − 2, it shows the lowest log and CRPS score, lowest RMSE and shows the desired coverage frequencies for the 95%prediction intervals. With the models assuming changepoints in the linear time effect on the reporting delay every 2 weeks before , we obtain similar, but slightly worse performance (see supplemental material for more details). The approach appears to be able to capture moderate changes in the delay distribution successfully. Modeling the changes on a daily basis shows a slightly worse performance with respect to the CRPS score and PI coverage frequencies. Assuming a constant reporting delay distribution over time and ignoring the changes leads to the worst performance with biggest scores and low coverage frequencies of the prediction intervals. When speci-F I G U R E 4 Sensitivity analysis. Estimation of nowcast and ( ) with associated 95% prediction and credibility intervals based on three different data subsets: original analysis (red), analysis only with cases with reported COVID-19 symptoms (green), and with all cases excluded that are reported as having no symptoms (blue) fying an adequate model for the delay distribution, the distributional assumptions regarding , play a minor role for the evaluation based on synthetic data. In the retrospective evaluation of the Bavarian data, the Poisson model assuming no changes in the reporting delay distribution performs badly as well. This is in line with the apparent changes in the reporting delay between disease onset and reporting at LGL over time (supplemental material). Comparing the Poisson model with 2-week changepoints with a similar model using a negative binomial distribution for , we find the latter to perform better. Adding weekday effects to the delay distribution improves the performance of the models as well. Comparing the negative binomial model with daily changes in the delay distribution with the 2-week changepoint model, we found better coverage frequencies for the former (e.g., 90% vs. 82% when including the weekday effect) but lower CRPS score and RMSE for the latter. Comparing the estimated̂( ) at most current 's based on the different nowcast models with the retrospective truth based on all reported data, we find coverage probabilities of the 95% credibility intervals bigger than 90% for all negative binomial models that consider changes in the delay distribution over time. The estimation of ( ) is, however, biased when it is based on a biased nowcasting approach, for example, when changes in the delay distribution are ignored. Our analyses show that nowcasting is a valuable real-time tool to gain situational awareness in the middle of an outbreak. Based on our evaluation, we found several aspects to be important for the successful application of nowcasts: first and foremost, it is important to account for existing changes in delay between disease onset and case reporting over time. Ignoring such changes can severely bias the predicted number of disease onsets. In the Bavarian data, we also found evidence for changes in reporting delay associated with the weekday of reporting, which should be accounted for. Second, we found an improved performance when modeling the daily counts of disease onsets with a specific reporting delay , , , based on a negative binomial distribution with overdispersion. In our data, the disease onset counts show bigger variability then implied by a Poisson distribution. Third, utilizing a first-order random walk for modeling the logarithmic expected daily number of new disease onsets, , as proposed by McGough et al. (2020) , worked well. We also tried i.i.d. log-Gamma priors and a smooth modeling of the epidemic curve based on truncated power splines as proposed in Höhle and an der Heiden (2014), but found the first-order random walk to perform best. Altogether, we found that a negative binomial model with random-walk prior of and modeling of the delay distribution via an discrete time hazard model with linear time effects and 2-week changepoints works satisfactory. With this model, we are able to account flexibly for changes in reporting delay over time and obtain a satisfactory performance on synthetic data as well as the true retrospective Bavarian COVID-19 data. The alternative smooth modeling of the delay distribution based on daily changes using a firstorder random walk also worked well for many days,but had convergence problems on some days and might be overly complex for many scenarios. However, there are important limitations of any nowcasting estimation: (i) we correct for a bias due to delays between disease onset and case reporting, but provide no correction for possible cases in the population that were not tested. This is a big issue in understanding COVID-19 spread, since there are possibly many undetected cases. Assuming a constant factor of underreporting, we can analyze the dynamics of the outbreak in a more reliable way by our nowcasting method compared to focusing on daily counts of newly reported cases. Furthermore, ( ) estimates would be invariant to such constant underreporting. However, if the proportion of undetected cases varies over time, then the dynamics of the pandemic is not described adequately by our approach as well. (ii) We model the temporal variation in the delay distribution in a flexible way. However, short-term changes, especially in the time close to the current day can lead to a bias, because it is particularly hard to distinguish between developments in the epidemic curve and changes in the reporting delay with no or very less data. (iii) Our imputation method includes a missing at random assumption, which implies that the time between disease onset and reporting is the same for individuals with and without available symptom onset date. This could be violated due to many asymptomatic and presymptomatic among the reported COVID-19 cases. However, the sensitivity analyses in the Appendix show that our results are relatively stable to variations of this definition. Comparing our approach to the one used by the Robert Koch Institute an der Heiden and Hamouda (2020), we use a more detailed modeling of the delay distribution for the nowcast, for example, including the day of the week in our model, which turned out to be relevant in our data. Furthermore, we observed and modeled a dependence of the delay time on calendar time as part of the nowcast. This was not originally taken into account by an der Heiden and Hamouda (2020). When calculating the effective reproduction number ( ), an der Heiden and Hamouda (2020) used a constant generation time of 4 days, while our approach includes a more realistic assumption of an individually varying time originating from a lognormal distribution, which also provides a smoother estimate over time. The approach to estimate ( ) proposed by Khailaie et al. (2020) includes a complex compartmental model with many assumptions about the other model parameters, which in part can only be guesstimated from literature sources. Their procedure of estimating ( ) is only partly data driven and mainly relies on cumulative reported cases in the federal states of Germany. Confidence intervals are generated by the variation of the other model parameters. This highlights the problems of the approach: while compartmental models can be useful for forecasting, its value for real-time estimation of ( ) hinges on it being a realistic model with a well-calibrated parameter estimated. Instead, we prefer the more statistically driven transmission-tree-based estimates, which rely less on model assumptions and more on a statistically sound analysis of the available data. In our retrospective evaluation of the Bavarian COVID-19 data we found, that the estimation of ( ) based on the predicted daily counts of disease onsets from nowcasting performs well if the nowcast model is adequately specified. Coverage frequencies of the 95% credibility intervals were as desired compared to a calculation of ( ) based on all retrospectively available disease onset data. The utilization of the predictive distribution from the Bayesian nowcast for the estimation of ( ) helps therefore successfully to avoid a bias close to the current date due to diseased-but-not-yet reported cases. For interpretation of the estimated̂( ) over time, similar limitations arise as in the interpretation of the estimated epidemic curve from the nowcast. If the fraction of undetected cases compared to all cases changes strongly over time, for example, due to changes in testing strategy, this can bias the estimated reproduction number. Compared to the interpretation of the estimated epidemic curve, the time-varying reproduction number might, however, have the advantage that it only requires stable conditions within a short time window, since it compares the estimated and reported number of disease onsets to the situation at time points close by, instead of looking at the absolute numbers over a longer period of time. Summarizing, we believe that our results give a much more reliable picture of the course of the pandemic than the mostly used time series of reported cases. For the interpretation, it has to be emphasized, that we estimate the number of persons with disease onset on a certain day. Our proposed nowcasting model can be applied to other data, when sufficient information about disease onset dates is available and the numbers are large enough for reliable modeling. On our webpage (corona.stat.uni-muenchen.de), we present daily results of the nowcasting for Bavaria and, in addition, for the city of Munich. Since we introduce no correction for cases, which are never detected, our estimated epidemic curve should be related to other data sources, like hospital admission, ICU admission, or death numbers. This aspect highlights the need for the collection and combination of many different data sources-each bringing challenges of its own. We thank Titus Laska for initial code on the ( ) estimation. An early version of the imputation modeling, but without any methodological extensions of the nowcast, was proposed in the now disused work by Glöckner et al. (2020) . We thank two anonymous referees for constructive and helpful comments. Open access funding enabled and organized by Projekt DEAL. The authors have declared no conflict of interest. This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available in the Supporting Information section. This article has earned an open data badge "Reproducible Research" for making publicly available the code necessary to reproduce the reported results. The results reported in this article were reproduced partially for data confidentiality reasons. Since the amount of data with missing information on disease/symptom onset date are rather high, we perform two sensitivity analyses. The symptom onset date can either be missing because the reporting local health authorities were not able to provide any information or because a case did not experience any symptoms until case reporting. Based on available information, we can distinguish between cases for which COVID-19 symptoms are documented at time of reporting (17,723, 61%; 73% with available onset), cases explicitly without symptoms (2,221, 8%), and cases without any information on symptoms (9,302, 32%). In the first sensitivity analysis, we focus only on the cases with reported symptoms. In a second analysis, we exclude all cases that have been explicitly reported to have no symptoms. Figure 4 (A) indicates that the estimated structure of the epidemic curve is very similar to the main analysis when excluding asymptomatic cases and cases without known symptom status in the sensitivity analyses. This also applies to the estimated ( ) as show in Figure 4 (B). Since the sensitivity analyses consider fewer reported cases, the actual estimated number of disease onsets per day is lower as well. However, the interpretation regarding the dynamics of the COVID-19 pandemic in Bavaria is similar based on all three analyses. Schätzung der aktuellen Entwicklung der SARS-CoV-2-Epidemie in Deutschland: Nowcasting Investigation of a COVID-19 outbreak in Germany resulting from a single travel-associated primary case: A case series Real-time estimates in early detection of SARS Regional now-and forecasting for data reported with delay: A case study in COVID-19 infections Now-casting the COVID-19 epidemic: The use case of Japan Bayesian nowcasting during the STEC O104:H4 Outbreak in Germany Evaluating probabilistic forecasts with scoring rules Development of the reproduction number from coronavirus SARS-CoV-2 case data in Germany and implications for political measures Adjustments for reporting delays and the prediction of occurred but not reported events Nowcasting by Bayesian smoothing: A flexible, generalizable model for real-time epidemic tracking Serial interval of novel coronavirus (COVID-19) infections The R0 package: A toolbox to estimate reproduction numbers for epidemic outbreaks R: A language and environment for statistical computing Monitoring count time series in R: Aberration detection in public health surveillance Nowcasting fatal COVID-19 infections on a regional level in Germany RStan: the R interface to Stan Flexible regression and smoothing using GAMLSS in R A note on generation times in epidemic models Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures