key: cord-0138568-ct2wc6bt
authors: Wistuba, Tobias; Mayr, Andreas; Staerk, Christian
title: Estimating the course of the COVID-19 pandemic in Germany via spline-based hierarchical modelling of death counts
date: 2021-09-06
journal: nan
DOI: nan
sha: e7c012ef77859dc1d329c9d64d47eab9f56cdb53
doc_id: 138568
cord_uid: ct2wc6bt

The effective reproduction number is a key figure to monitor the course of the COVID-19 pandemic. In this study we consider a retrospective modelling approach for estimating the effective reproduction number based on death counts during the first year of the pandemic in Germany. The proposed Bayesian hierarchical model incorporates splines to estimate reproduction numbers flexibly over time while adjusting for varying effective infection fatality rates. The approach also provides estimates of dark figures regarding undetected infections over time. Results for Germany illustrate that estimated reproduction numbers based on death counts are often similar to classical estimates based on confirmed cases. However, considering death counts proves to be more robust against shifts in testing policies: during the second wave of infections, classical estimation of the reproduction number suggests a flattening/ decreasing trend of infections following the"lockdown light"in November 2020, while our results indicate that true numbers of infections continued to rise until the"second lockdown"in December 2020. This observation is associated with more stringent testing criteria introduced concurrently with the"lockdown light", which is reflected in subsequently increasing dark figures of infections estimated by our model. These findings illustrate that the retrospective viewpoint can provide additional insights regarding the course of the pandemic. In light of progressive vaccinations, shifting the focus from modelling confirmed cases to reported deaths with the possibility to incorporate effective infection fatality rates might be of increasing relevance for the future surveillance of the pandemic.

The COVID-19 pandemic continues to have severe impacts on public health in many parts of the world. During the course of the pandemic, different non-pharmaceutical interventions (NPIs) have been implemented to mitigate the spread of the virus, including closures of schools and nurseries, cancellation of public events, regulations regarding social distancing, closures of non-essential shops and further measures (see e.g. [1] ). Since many of these interventions impose a large burden on society and economy, it is crucial to understand which measures are effective in reducing the spread of the virus. A central figure in this context is the time-dependent effective reproduction number, which can be interpreted as the average number of cases infected by one case in a particular region at a certain time. In practice, the effective reproduction number is usually estimated based on currently available surveillance data, particularly based on daily numbers of newly confirmed cases (see e.g. [2] ). While numbers of confirmed cases are crucial for nowcasting the current development of the effective reproduction number [3, 4] , they are largely influenced by the implemented testing policies. In particular, short-term changes in numbers of conducted SARS-CoV-2 tests complicate the accurate estimation of the effective reproduction number.

Data on COVID-19 related deaths can provide an additional, retrospective viewpoint on the course of the pandemic and the assessment of NPIs. Important and influential approaches hence also focused on modelling the spread of SARS-CoV-2 based on numbers of reported deaths [5, 6] . In particular, the Imperial College COVID-19 Response Team [5] developed a Bayesian hierarchical model to estimate the impact of NPIs on the effective reproduction number in different European countries during the first wave of infections in spring 2020. In the further course of the pandemic, several interventions (e.g., closures of schools and non-essential shops) have been adapted, relaxed or restricted to particular regions, while others (e.g, cancellations of public events and face masks regulations) have largely kept in place. To account for this development, the Bayesian model of Flaxman et al. [5] has been updated and extended to estimate the effectiveness of NPIs during the second infection wave in Europe [7] .

Other authors have argued, that resulting effect estimates of NPIs are non-robust and highly model-dependent [8, 9] . In particular, the selection of NPIs to be included in the model predetermine the potential change points for the effective reproduction number and thus can have large effects on the estimates attributed to individual prevention measures. Furthermore, not only the implemented NPIs change over time, but also the adherence and awareness of the population, which may not be adequately described by categorical variables for the implemented prevention measures (cf. [10, 11] for analyses of mobility patterns during the first phase of the pandemic in Germany). Another limitation of the original model of Flaxman et al. [5] is that the infection fatality rate (IFR) is assumed to be constant over time. However, the IFR of COVID-19 increases largely with increasing age [12, 13] . As the age distribution of infections changes substantially during the course of the pandemic [14] , the effective IFR should not be regarded as constant over time when modelling the number of infections based on deaths data.

In this study we adapt the Bayesian hierarchical model of Flaxman et al. [5] to estimate the course of the effective reproduction number in Germany by continuous smoothing splines, without the need for additional information regarding the timings of specific interventions. While our model is primarily driven by the numbers of reported deaths similar as in previous modelling approaches [5, 9] , we additionally incorporate the changing age distribution of confirmed infections to account for changes in effective IFR. We compare our model estimates of the effective reproduction number with classical estimates derived solely from the numbers of confirmed cases in combination with nowcasting [2, 15] . Furthermore, we discuss resulting estimates of dark figures of infections over the course of the pandemic in Germany. Complimentary results for the individual 16 German federal states are provided in the Supplement to this paper (see Section S.3). 

To estimate the course of the pandemic in Germany based on death counts, we extend the Bayesian hierarchical model from Flaxman et al. [5] by adjusting for time-dependent effective infection fatality rates and by considering splines for modelling the effective reproduction number over time. Figure 1 provides a schematic representation of the adapted hierarchical model. The main idea of the model is to estimate the effective reproduction number R t retrospectively from daily numbers of reported deaths D t associated with COVID-19. For this, we use death counts for Germany (and also for the 16 German federal states, see Section S.3 of the Supplement) based on daily situation reports published by the German federal agency for disease control and prevention (Robert Koch Institute, RKI; [16] ).

On the last level of the hierarchical model (see Figure 1 ), the reported deaths D t are modelled with a negative binomial distribution, i.e. the likelihood is given by

with mean d t and variance d t + d 2 t ψ . A half-normal distribution is considered as the prior for the dispersion parameter ψ ∼ N + (0, 5). Numbers of daily reported deaths D t are linked to expected numbers of daily infections I τ , τ < t, by taking into account estimates for the effective infection fatality rate (IFR) and the time distribution between infection and reported death. More specifically, expected numbers of daily reported deaths d t are given by

where π denotes the (discretized) distribution for the time between infection and reported death for lethal infections, IFR τ is the estimated effective infection fatality rate and I τ denotes the expected number of infections for day τ . The distribution π for the time between infection and reported death is obtained as the sum of two components: the incubation period and the time between symptom onset and reported death. Based on results from Lauer et al. [17] , for the incubation period we use a log-normal distribution with mean 5.52 (days) and standard deviation 2.43 (days). The distribution for the time between symptom onset and reported death is adopted from Flaxman et al. [5] and given by a gamma distribution with mean 17.8 (days) and standard deviation 8.01 (days). Since our model is based on daily data, we consider a discretized version π of the distribution for the sum of the two periods.

The IFR is an important link when trying to infer total numbers of infections from death counts. As the IFR of COVID-19 largely depends on the age of the infected individuals [12, 13, 18] , it is important to take the age structure of infections into account, which has been shown to vary over the course of the pandemic in Germany [14] . Thus, in contrast to the original model of Flaxman et al. [5] using time-constant averaged IFRs, we consider time-dependent effective IFRs which reflect the changing age distribution of infections. Here, we estimate weekly effective IFRs based on the assumption that the age distributions of true infections can be approximated by the age distributions of confirmed cases (cf. [14] ). In particular, based on data from the RKI [19], let C a,w denote the number of confirmed cases in calendar week w for age-group a ∈ {0−9, 10−19, . . . , 70−79, 80+}. Furthermore, let IFR a denote the estimated infection fatality rate for age group a based on Brazeau et al. [18] (see Section S.2 of the Supplement for sensitivity analyses using alternative age-specific IFR estimates from O'Driscoll et al. [12] and Levin et al. [13] ). The effective IFR for day τ in calendar week w is estimated as a weighted average of age-specific IFR estimates:

Finally, IFR estimates are shifted backwards by ten days to adjust for the delay between the reporting date of cases C t and the date of infections I t . This delay can be expressed as the sum of the incubation time with an expected value of 5.52 days [17] and the time between onset of symptoms and reporting date (median delay of four days for German data [20] ). On the previous level of the hierarchical model (see Figure 1 ), expected numbers of daily infections I t are modelled based on estimated infections I τ of the preceding days τ < t via

where R t denotes the effective reproduction number and g the (discretized) generation time distribution (i.e. the time between two infections in a transmission pair). As infection times are generally unknown, direct estimation of the generation time is rather difficult and commonly approximated by the serial interval. Based on results from Nishiura et al. [21] , the generation time g is modelled using a discretized log-normal distribution with mean 4.7 (days) and standard deviation 2.9 (days). An important and idealistic assumption of the considered semi-mechanistic model is that the country is viewed as a closed environment (cf. [5] ), so that all infections are assumed to occur within the German population. Consequently, numbers of infections for the first days have to be initialized. Similarly to Flaxman et al. [5] , for the first six modelling days t = 1, . . . , 6, expected numbers of infections I t ∼ Exp(1/λ) are exponentially distributed with mean λ ∼ Exp(10). The starting date for modelling (t = 1) is considered to be the 15th of January 2020, which is defined as the earliest date so that 60 days later, at least 10 cumulative deaths associated with COVID-19 have been reported in Germany.

On the first level of the hierarchical model (see Figure 1 ), the effective reproduction number is modelled with a smoothing spline via

where B p,3 are B-splines of third degree between equidistant knots (each with distance of two weeks) and a p denote the corresponding spline coefficients (cf. [22] ). The maximum in equation (5) is taken to ensure that the effective reproduction number is non-negative. For a smoothing effect, the priors for the spline coefficients a p , p > 1, are considered to be normal distributions a p ∼ N (a p−1 , θ) with the previous spline coefficients a p−1 as expected values and common variance θ ∼ N + (0, 1), while the prior for the first spline coefficient a 1 ∼ N + (0, 1) is considered to be a half-normal distribution. The Bayesian hierarchical model is implemented in R [23] via the add-on package rstan for Stan [24] . The implementation of smoothing splines is based on Kharratzadeh [22] . Posterior samples are obtained by the No-U-Turn Sampler for Hamiltonian Monte Carlo [25, 26] , using eight independent chains with 2000 iterations each (considering burn-in periods of 1000 iterations). Convergence diagnostics indicate that the algorithm provides a representative sample from the posterior distribution (see Section S.4 of the Supplement).

Using German surveillance data [16, 19] , we model the course of the pandemic for Germany as a whole country as well as separately for each of the 16 German federal states during the first year of the pandemic. Here we present detailed national results, while further results for the individual federal states can be found in Section S.3 of the Supplement. Results of the Bayesian hierarchical model (cf. Figure 1 ) for Germany are depicted in Figure 2 . During the first year of the pandemic, death counts show two pronounced waves, with a shorter first wave in spring 2020 with fewer deaths compared to the second wave in autumn and winter 2020/2021. Figure 2 (first plot) indicates that the hierarchical model yields a good fit to the course of reported numbers of deaths. Note that daily reported deaths and confirmed cases show a characteristic weekly oscillating pattern. Estimates of our model, however, capture the average tendency and are robust to such reporting artefacts due to the use of smoothing splines with biweekly equidistant knots for modelling the effective reproduction number. Figure 2 additionally illustrates the introduction of major non-pharmaceutical interventions (NPIs) in Germany. Note that, due to the federal structure of Germany, there have been local differences regarding the implementation of NPIs (not shown here). Around the 23rd of March 2020, the "first lockdown" (Lockdown 1) was introduced in Germany, including strict contact restrictions and closures of non-essential shops. At this point, schools and nurseries had already been closed for one week and public events had been banned for two weeks in most parts of the country. Approximately three to four weeks after the introduction of the "first lockdown", the first wave reached its maximum regarding the number of reported deaths. During summer 2020, cases and death counts were at relatively low levels. After a rapid increase of cases in October 2020, the German government introduced the so-called "lockdown light" to mitigate the spread of the virus. During this period, restaurants and leisure facilities were closed, while schools and shops remained opened with hygiene concepts in place. However, no noticeable decline of reported deaths was observed and confirmed cases remained at relatively high levels, which led to the introduction of the "second lockdown" in December 2020 (Lockdown 2), including closures of schools and non-essential shops amongst further measures. Three to four weeks after the introduction of the "second lockdown", a decline of daily deaths was reported similarly as during the first wave in spring 2020. Note that, as expected, the general course of confirmed cases precedes the course of reported deaths.

Instead of modelling directly the numbers of confirmed cases, which are largely influ- Figure 1 ), based on age-specific IFR estimates from Brazeau et al. [18] . Model estimates are based on posterior medians, together with 50% and 95% credible intervals (CIs). enced by the adopted testing regime, the Bayesian hierarchical model provides estimates of the course of true infections. Estimated infections in turn precede the course of confirmed cases due to the incubation period and reporting delays (see second plot of Figure 2 ). However, results show a remarkable difference between the course of estimated infections and confirmed cases during the second wave of infections in autumn/winter 2020/21: following the "lockdown light" introduced at the 2nd of November 2020, the course of confirmed cases suggests a flattening or even decreasing trend, while results of our hier-archical model indicate that true numbers of estimated infections continued to rise and reached a maximum around one week after the "second lockdown" in December 2020.

Such effects should also be viewed in light of changing testing policies, which can result in varying dark figures. In fact, by modelling the true infections, our approach allows for the interpretation of dark figures of infections during the pandemic. During the first year of the pandemic (i.e. between the 1st of February 2020 and the 31st of January 2021), there were in total 2, 221, 838 confirmed SARS-CoV-2 cases in Germany; for the same time period, the model estimates 4, 448, 889 (95% credible interval: [3, 244, 647; 6, 176 , 642]) infections based on age-specific IFR estimates from Brazeau et al. [18] , yielding an overall estimated dark figure factor of 2.002 [1.460; 2.780]. Using alternative age-specific IFR assumptions based on O'Driscoll et al. [12] and Levin et al. [13] (cf. Figure 3 The third plot of Figure 2 depicts the estimated course of the dark figures factor, which is given by the ratio of numbers of estimated infections divided by numbers of confirmed cases. Since there is an average delay of ten days between infections and reporting dates (see Methods), dark figures for day t are computed as the ratio of estimated infections at day t − 10 and the 7-day-mean of detected cases at day t. Results show that estimated dark figures are much larger before the "first lockdown" in spring 2020 than in the following course, reflecting limited testing capacities at the beginning of the pandemic. It can be observed that changes in estimated dark figures are often associated with shifts in testing policies and practice (here, vertical lines in the third plot of Figure 2 indicate dates of important shifts). In particular, estimated dark figures declined sharply in June 2020 in the context of a local super-spreading event in a slaughterhouse in North Rhine-Westphalia, resulting in temporarily increased targeted testing of factory employees. During August 2020, when incidence rates were low and targeted testing of travellers returning from summer holidays was intensified, the factor of dark figures is estimated to be close to one. Following increasing dark figures of infections in September and October 2020, on the 15th of October the Robert Koch Institute (RKI) changed its recommendations towards less stringent indications for SARS-CoV-2 tests, leading to a further increase in numbers of conducted tests and a considerable decrease in estimated dark figures of infections. However, with increasing incidences towards the end of October, on the 5th of November the recommendations were again updated towards more restricted testing, which again results in increasing estimated dark figures from our model. Finally, temporarily increased dark figures are estimated following Christmas, reflecting the decrease of conducted tests during this holiday period. Overall, our results indicate that the hierarchical modelling approach can reliably detect shifts in testing policies, even though model estimates of dark figures are solely based on numbers of confirmed cases and reported deaths, without incorporating any direct information regarding the numbers of conducted tests.

Finally, the last plot of Figure 2 shows the resulting course of the effective reproduction number estimated by our hierarchical model based on death counts, in comparison with official estimates from the Robert Koch Institute (RKI), which are based on the evolution of confirmed cases [2, 15] . It can be observed that estimates based on death counts are often similar to classical estimates based on confirmed cases. However, model estimates based on death counts prove to be more robust against shifts in testing policies. In particular, confirmed cases indicate a short-term spike in the effective reproduction number linked to a local super-spreading event in June 2020, whereas our model does not estimate a spike during this period; instead, it estimates reduced dark figures of infections, suggesting that the spike in the effective reproduction number was mainly related to targeted testing of contact persons. Furthermore, after the "lockdown light" at the 2nd of November 2020, classical estimates of the effective reproduction number tend to be smaller than model estimates based on death counts. Although the differences do not seem large, they imply considerably different interpretations regarding the course of the pandemic: while classical estimates (with values smaller or equal to one) suggest a flattening or even decreasing trend of infections following the "lockdown light", estimates of our model (with values larger or equal to one) suggest that true numbers of infections continued to rise (cf. second plot of Figure 2 ). Note that the "lockdown light" was introduced more or less at the same time when the RKI recommendations for testing were adapted (see third plot of Figure 2 ), indicating that our model is able to disentangle overlying effects of reduced testing (resulting in larger dark figures) and the adaptation of NPIs on numbers of confirmed cases.

We have extended and adapted the Bayesian hierarchical model of Flaxman et al. [5] for modelling the course of the COVID-19 pandemic in Germany based on death counts. A main feature of the proposed approach is the smooth and data-driven way of estimating the effective reproduction number. As a result, there is no need to prespecify discrete change points for timings of non-pharmaceutical interventions (NPIs) as in the original model of Flaxman et al. [5] , diminishing the chance that potential implicit biases are incorporated into the model [8, 9] . While our approach shows parallels with the Bayesian model developed in Wood [9] , which uses smoothing splines to estimate effective reproduction numbers in the United Kingdom, an important additional characteristic of our work is the adjustment for time-varying effective infection fatality rates (IFRs), which are estimated to change substantially over the course of the pandemic as a result of varying age distributions of infections (cf. [14] ).

Results for German surveillance data illustrate that the proposed retrospective modelling can provide additional valuable insights regarding the course of effective reproduction numbers and dark figures of true infections. While estimated reproduction numbers of our model are often similar to classical estimates from the Robert Koch Institute (RKI) based on confirmed cases [2, 15] , the proposed modelling approach based on death counts proves to be more robust against shifts in testing policies. In contrast to classical estimation methods relying solely on confirmed cases, our approach has the potential to disentangle overlying effects of shifts in testing policies and actual changes in the effective reproduction number, as illustrated for the second wave of infections in Germany in November 2020, where the "lockdown light" was introduced concurrently with the adaptation of testing recommendations. Further results presented in the Supplement illustrate that our Bayesian modelling approach yields robust results for the different developments of the pandemic in the 16 individual German federal states (using the same model and priors as for the full country). Here one should note that death counts were at relatively low levels during summer 2020, so that estimating the effective reproduction number and dark figures of infections for individual federal states comes with larger uncertainty, which is also reflected in the wider credible intervals of model estimates (particularly for federal states with smaller populations, see e.g. the results for Mecklenburg-Western Pomerania and Saarland in Figures 13 and 17) .

Our study is also related to the recent work of Schneble et al. [27] , which estimates relative changes in the case detection ratio (CDR) over time for different age groups. The authors employ a smooth generalized linear mixed model for confirmed cases and variables indicating whether the infections resulted in COVID-19 related deaths (without modelling the actual dates of deaths). While the classical mixed modelling approach provides age-group specific estimates of relative changes in the CDR, our proposed Bayesian hierarchical modelling approach also yields estimates of absolute numbers of dark figures of infections as well as estimates of the effective reproduction number, by considering age-specific IFR estimates and dates of reported deaths. Our numerical results regarding trends in dark figures of infections generally support the results of Schneble et al. [27] : dark figures in Germany are estimated to be largest at the beginning of the pandemic and, after a period of relatively low estimated dark figures (i.e. large CDR) during summer 2020, numbers of undetected cases are estimated to increase sharply in September 2020.

An obvious limitation of our modelling approach relying on death counts is that it can only reflect the course of the pandemic in retrospect. While this is partly true for all modelling approaches, including the traditional ones based on confirmed cases (due to the incubation period and reporting delays), one clearly has to acknowledge that a timely reporting and analysis is essential for estimating the effects of political decisions (e.g., lockdown measures or other NPIs). In this context, models based on death counts will always yield results several weeks later than day-by-day estimates relying on reported cases. A limitation of the presented analysis of German surveillance data is that dates of daily reported deaths may deviate from actual dates of deaths. Another limitation of the Bayesian hierarchical modelling approach is that it relies on various assumptions (cf. [5] ), among them statistical ones including the implemented prior distributions for model parameters (see Section 2). From a more practical perspective, the model also relies on the assumption of a closed environment (no new infections imported from outside of the population) and on the assumption that cases are insusceptible for another (second) infection with COVID-19. For the estimation of the effective infection fatality rate (IFR) we assumed that the evolving age distribution of infections can be approximated by the corresponding age distribution of confirmed cases (cf. [14] ); furthermore, it relies on the assumption that age-specific IFR estimates from Brazeau et al. [18] are applicable to Germany (see Section S.2 of the Supplement for sensitivity analyses with alternative IFR estimates). In the current modelling approach we do not account for vaccinations, which does not pose an important limitation for the considered time period with low total numbers of administered vaccinations until the end of January 2021.

In this context, we have reason to believe that our hierarchical approach is particularly flexible regarding extensions for future challenges of COVID-19 modelling, as it shifts the focus from modelling confirmed cases towards reported deaths. Since we have included time-varying IFRs to account for changing age distributions of infections, future research could make use of this flexibility and incorporate also time-varying vaccination effects into our model. This could be achieved via a general population-wise factor or age-group specific parameters representing the rates of fully vaccinated individuals in the respective groups. Especially in light of progressive vaccination programs in many countries, it can be expected that there will be additional sharp changes in implemented testing regimes during the further course of the pandemic; for example, at the time of writing it was announced that SARS-CoV-2 tests will generally be provided for free in Germany only until the 11th of October 2021. In addition, vaccinated individuals may generally be less likely to be tested due to asymptomatic or mild symptomatic infection (cf. [28] ), which may induce larger dark figures of infections in the vaccinated part of the population. Such vaccination effects are expected to further complicate the reliable estimation of the effective reproduction number based only on the numbers of confirmed cases. Therefore, future research should be targeted at incorporating additional information for modelling the further course of the pandemic, including data on vaccinations, hospitalizations and death counts. 

All hands on deck: a synchronized whole-of-world approach for COVID-19 mitigation

Schätzung der aktuellen Entwicklung der SARS-CoV-2-Epidemie in Deutschland -Nowcasting

Nowcasting the COVID-19 pandemic in Bavaria

Nowcasting fatal COVID-19 infections on a regional level in Germany

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Inferring the effectiveness of government interventions against COVID-19

Understanding the effectiveness of government interventions in Europe's second wave of COVID-19. medRxiv

Effect estimates of COVID-19 nonpharmaceutical interventions are non-robust and highly model-dependent

Inferring UK COVID-19 fatal infection trajectories from daily mortality data: Were infections already in decline before the UK lockdowns?

Effects of coronavirus disease (COVID-19) related contact restrictions in Germany

COVID-19 lockdown induces disease-mitigating structural changes in mobility networks. Proceedings of the National Academy of Sciences

Age-specific mortality and immunity patterns of SARS-CoV-2

Assessing the age specificity of infection fatality rates for COVID-19: systematic review, meta-analysis, and public policy implications

Estimating effective infection fatality rates during the course of the COVID-19 pandemic in Germany

Nowcasting und R-Schätzung: Schätzung der aktuellen Entwicklung der SARS-CoV-2-Epidemie in Deutschland

Aktueller Lage-/Situationsbericht des RKI zu COVID-19

The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Annals of internal medicine

COVID-19 infection fatality ratio: estimates from seroprevalence

Fallzahlen in Deutschland

Serial interval of novel coronavirus (COVID-19) infections. International journal of infectious diseases

Splines In Stan

R: A Language and Environment for Statistical Computing

Stan Modeling Language Users Guide and Reference Manual, Version 2.27; 2021

The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo

Hamiltonian Monte Carlo for Hierarchical Models

A statistical model for the dynamics of COVID-19 infections and their case detection ratio in 2020

SARS-CoV-2 vaccine breakthrough infections are asymptomatic or mildly symptomatic and are infrequently transmitted