key: cord-0198043-qryorn2f authors: Bergstrom, Fanny; Gunther, Felix; Hohle, Michael; Britton, Tom title: Flexible Bayesian Nowcasting with application to COVID-19 fatalities in Sweden date: 2022-02-09 journal: nan DOI: nan sha: 4fc95f6a9061db50a4c42b69a7845a7d60d5d735 doc_id: 198043 cord_uid: qryorn2f The real-time analysis of infectious disease surveillance data, e.g. time-series of reported cases or fatalities, can help to provide situational awareness about the current state of a pandemic. This task is challenged by reporting delays that give rise to occurred-but-not-yet-reported events. If these events are not taken into consideration, this can lead to an under-estimation of the counts-to-be-reported and, hence, introduces misconceptions by the interpreter, the media or the general public -- as has been seen for example for reported fatalities during the COVID-19 pandemic. Nowcasting methods provide close to real-time estimates of the complete number of events using the incomplete time-series of currently reported events by using information about the reporting delays from the past. In this report, we consider nowcasting the number of COVID-19 related fatalities in Sweden. We propose a flexible Bayesian approach that considers temporal changes in the reporting delay distribution and, as an extension to existing nowcasting methods, incorporates a regression component for the (lagged) time-series of the number of ICU admissions. This results in a model considering both the past behavior of the time-series of fatalities as well as additional data streams that are in a time-lagged association with the number of fatalities. The real-time analysis of infectious disease surveillance data, e.g. time-series of reported cases or fatalities, can help to provide situational awareness about the current state of a pandemic during the pandemic and can be used by public health agencies and governments for monitoring the disease development and planning of preventive actions Metcalf et al. (2020) ; Wu et al. (2021) . This task is challenged by reporting delays that give rise to occurred-but-not-yet-reported events which may lead to underestimation of the complete number of reported events. The problem is illustrated in Figure 1 with data of Swedish COVID-19 related fatalities as of 2020-11-24, where the reported number of fatalities per day is showing a declining trend when in reality the number of fatalities per day was currently increasing. However, the reporting for this period of time was not complete until mid February 2021, in other words almost to 3 months later. Nowcasting methods (Donker et al., 2011; Höhle and an der Heiden, 2014; McGough et al., 2020) tackle this problem by providing close to real-time estimates of the complete number of events using the incomplete time-series of currently observed events and information about the reporting delay from the past. Figure 1 : Number of fatalities per day, reported (black bars) and unreported (grey bars) as of 2020-11-24. The figure shows the actual reported situation of Swedish COVID-19 related fatalities on that day, where the reported number of events shows a declining trend but when in reality (as known in hindsight) it was increasing. Nowcasting methods have connections to insurance claims-reserving Kaminsky (1987) and its epidemiological applications trace back to HIV/AIDS modelling (Kalbfleisch and Lawless, 1989; Zeger et al., 1989; Lawless, 1994) . A Bayesian approach to Nowcasting, which constitutes the foundation of our method, was developed by Höhle and an der Heiden (2014) . Nowcasting methods has been used in COVID-19 analysis for daily infections Greene et al. (2021) ; Li and White (2021) , and fatalities Schneble et al. (2020) ; Altmejd et al. (2020) ; Bird and Nielsen (July 2020) . Most nowcasting methods are focused on estimating the reporting delay distribution; however, an epidemic contains a temporal dependence since the disease transmission is correlated from one time point to the next. Taking the temporal dependence into account has been shown to improve the nowcasting performance McGough et al. (2020) ; Günther et al. (2020) . Another approach to nowcasting, not considering the delay distrubution, is to use other data sources that are sufficiently correlated with the time series of interests, see e.g. the Machine Learning approach by Peng et al. (2021) . As an extension to existing Nowcasting methods, we propose a method that allows for flexibility in the reporting delay structure and additionally incorporates further correlated data streams. Our approach for nowcasting Swedish COVID-19 fatalities is based on a Bayesian hierarchical model that considers temporal changes in the reporting delay distribution and incorporates a regression component for the (lagged) time-series of the number of Intensive Care Unit (ICU) admissions. The surveillance data used for the analysis in this paper are daily counts of fatalities and ICU admissions of people with a laboratory-confirmed SARS-CoV-2 infection in Sweden. The data is publicly available from the website of the Public Health Agency of Sweden FHM, where new reports are published daily throughout Tuesday to Friday (excluding public holidays). The aggregated daily counts are updated in retrospective at each reporting date, this implies that the published time series on reported COVID-19 fatalities will always show a declining trend as the events are associated with a reporting delay (see Figure 1 for an illustrative example). The reporting delay can not be observed in a single published report but can be obtained by comparing the aggregated numbers of fatalities of each date from previously published reports. In the present work, we present methodological details of our approach and compare the results to existing nowcasting methods to illustrate the implication of incorporating an additional regression component associated with the number of fatalities. In this section we present the methodological details of our approach, which follows closely the notation introduced in Günther et al. (2020) . In particular, our nowcasting model has two distinct elements; (1) the underlying epidemic curve and (2) the delay distribution. Let n t,d , be the number of events occurring on day t = 0, ..., T and reported with a delay of d = 0, ..., ∞ days, such that the reporting occurs on day t + d. The goal of Nowcasting is to infer the total number of events Nt on day t based on the information available on the current day T > t. The sum Nt can be written as where the first sum is observed and the second sum is yet unknown. This can be illustrated by the so called reporting triangle, seen in The number of events occurring on day t with a delay of d days is assumed to be negative binomial distributed with mean λt · p t,d and overdispersion parameter φ. Let λt = E[Nt] denote the expected number of events on day t. We specify a baseline model for λt as where t = 0, ..., T and d = 0, ..., D. Time t = 0 is assumed to be the start of the the pandemic or the observation period. This approach to model λt as a Random Walk on the log scale is proposed by Günther et al. (2020) and we will use it as a benchmark model. As an extension of Eq. (1), we assume that we can predict the total number of events with an additional data stream having a time-lagged association with the event of interest. The additional data stream is assumed to be ahead in time compared to the time series of interest, e.g., due to the tracked event being a more earlier state in a typical COVID-19 disease progression, or because of smaller reporting delays. We denote the additional data stream at time t as mt and specify the extended model for λt as log(λt)|λt−1, m t−lag , ∼ N (β0 log(λt−1) + β1 log(m t−lag ), σ 2 ), where the β0 and β1 are regression coefficients, such that λt is assumed to be a linear combination of its prior value and the additional time series m with some specified time lag. We note that in the case of β0 = 1 and β1 = 0 the two models specified in Eq. (1) and Eq. (2) become identical. The model for the delay distribution at day t is specifying the probability of a reporting delay of d days for a fatality occurring on day t. We denote this conditional probability p t,d = P (delay = d|fatality day = t). Similarly to Günther et al. (2020) , we model the delay distribution as a discrete time hazard model h t,d = P (delay = d|delay ≥ d, W t,d ) as where d = 0, ..., D − 1, ht,D = 1, γ d is a constant, W t,d being a vector of time-and delay-specific covariates and η regression coefficients. The indicator function 1rep enforce the reporting probability to be zero for non-reporting days. In Günther et al. (2020) it is shown how the reporting probabilites are derived from Eq. (3). The implementation is a hierarchical Bayesian model for nowcasting based on Höhle and an der Heiden (2014) and Günther et al. (2020) in R (R Core Team, 2021) and the inference for the Bayesian hierarchical models is performed using Markov Chain Monte Carlo using Stan (Stan Development Team, 2020). We apply our Nowcasting method to reported COVID-19 fatalities in Sweden and use the change in number of new ICU admissions as the additional data stream. We let Nt be the daily total number of fatalities. We further let the change in ICU admission be the observed signal, mt, as we assume that an increase in ICU admissions will also mean an increase the number of fatalities with some delay. In Sweden, ICU admissions are associated with less reporting delay than the fatalities. We use a retrospective evaluation to assess the performance of our model, in which we compare the model performance to the now assumed to be known true number of COVID-19 related fatalities in Sweden. The evaluation period ranges from August 29 2020 to April 6 2021 and contains 101 reporting days (Tuesday to Friday excluding public holidays). The evaluation period is chosen such that it covers the second wave of COVID-19 related fatalities in Sweden. The time series of number of fatalities and weekly ICU admissions can be seen in Appendix Fig. 7 . In the discrete time hazard model in Eq. (3), we are using linear effects of the time with break-points every two weeks before the current day and a categorical weekday effect. The reporting probability is forced to be zero on non-reporting days (Saturday-Monday) and we use a maximum delay of D = 35 days. To evaluate the performance of our nowcasting method, the posterior samples of the total number of events,Nt = D d=0n t,d are extracted for each reporting date of the evaluation period for both the benchmark model and the ICU-model data. Figure 3 is showing the model results for a single day, 2020-11-24. For this reporting date, we see that the baseline model and the extended model which includes the ICU data provide fairly similar results and that both models provide reasonable estimates of the (at that time) unknown total number of reported fatalities. To assess the model performance over the whole evaluation period, the samples ofNt are evaluated at each reporting day for that current day (when the reporting delay is equal to zero days. In other words, it is the estimates of the complete number of event on the actual reporting day day are evaluated for each reporting day in the evaluation period. In Figure 3 , this corresponds to evaluating the model prediction at Nov 24. The model performance at "now" over the evaluation period can be seen in the Appendix (Figure 6 ). To quantify the the model performance, we use as in (Günther et al., 2020 ) the following four metrics; 1) root mean squared error (RMSE), 2) log scoring rule (logS), 3) continuous rank probability score (CRPS), and 4) the prediction interval coverage (PI). The RMSE is calculated with a point estimate being the median of the posterior samples ofNt, while the scoring rules logS and CRPS are assessing the quality of the probabilistic forecast by considering the full posterior distribution of Nt Gneiting and Raftery (2007) . For the PI, we provide coverage frequencies of 95% prediction intervals for the number of fatalities per day. Figure 4 is showing the model performance evaluated by the CRPS over the evaluation period. From the figure, we can see that the baseline model performs better during a short period in November, but overall the extended model including ICU data has a slightly lower score and hence better performance. The remaining scoring rules, the RMSE and LogS, entail similar results (seen in Appendix Figure 7 ). Table 1 shows the average values of the evaluation metrics over the whole evaluation period, including the coverage frequency of the 95% prediction interval. The average values of the RMSE, LogS and CRPS entail that the models perform very similar but that the model including the ICU data performs slightly better than the baseline model. The coverage of the 95% prediction interval is 94% for both models. In other words, the proportion of times the prediction intervals contain the actual number of fatalities is very close to the nominal level, which is a sign for good model calibration. The similar performance of the two models can be explained by the estimates of the regression coefficients seen in Figure 5 . As previously mentioned in Section 2.1.1, the two models are equal when β0 = 1 and β1 = 0. We also apply the Nowcasting method to reported ICU admissions with confirmed Sars-Cov2 infection where the additional data source is the number of confirmed cases. This model has not yet been retrospectively evaluated, but will be continuously monitored and evaluated as the pandemic progress. In our presented work we provide an alternative to Altmejd et al. (2020) for real-time estimates for the number of Swedish COVID-19 fatalities. Even though fatalities are a lagging indicator in order to obtain situational awareness about the pandemic, and is not without difficulties itself, it is often used an indicator to assess burden of disease. Hence, with the emergence of the omicron variant and relating to the discussions about its severity, monitoring the time series of reported deaths will be of importance. We aim to monitor and evaluate our approach in real-time as cases unfold. Current nowcast estimates of COVID-19 fatalities and ICU admissions in Sweden can be found on https://staff.math.su.se/fanny.bergstrom/covid19-nowcasting. The Public Health Agency of Sweden's Covid-19 data portal Nowcasting covid-19 statistics reported with delay: a case-study of sweden Now-casting of covid-19 deaths in english hospitals Nowcasting pandemic influenza A/H1N1 2009 hospitalizations in the Netherlands Strictly Proper Scoring Rules, Prediction, and Estimation Nowcasting for Real-Time COVID-19 Tracking in New York City: Evaluation Study Using Reportable Disease data from the early stages of the pandemic Nowcasting the COVID-19 pandemic in Bavaria Bayesian nowcasting during the STEC 0104:H4 outbreak in Germany Inference based on retrospective ascertainment: an analysis of the data on transfusion-related aids Prediction of IBNR claim counts by modelling the distribution of report lags Adjustments for reporting delays and the prediction of occurred but not reported events Bayesian back-calculation and nowcasting for line list data during the covid-19 pandemic Nowcasting by bayesian smoothing: A flexible, generalizable model for real-time epidemic tracking Mathematical models to guide pandemic response: Models can be used to learn from the past and prepare for the future Real-time Prediction of the Daily Incidence of COVID-19 in 215 Countries and Territories using machine learning: Model development and validation R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing Nowcasting fatal covid-19 infections on a regional level in germany RStan: the R interface to Stan Nowcasting epidemics of novel pathogens: lessons from COVID-19. Nature Medicine Statistical methods for monitoring the AIDS epidemic This work is artly funded by the Nordic Research Agency (NordForsk, grant 105572). The computations and data handling was enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at HPC2N partially funded by the Swedish Research Council through grant agreement no. 2018-05973. We also thank Markus Lindroos for discussions and his contribution in coding of the reporting delay distribution.