Gaussian Process Nowcasting: Application to COVID-19 Mortality Reporting
Iwona Hawryluk, Henrique Hoeltgebaum, Swapnil Mishra, Xenia Miscouridou, Ricardo P. Schnekenberg, Charles Whittaker, Michaela Vollmer, Seth Flaxman, Samir Bhatt, Thomas A. Mellan
2021-02-22

Updating observations of a signal due to delays in the measurement process is a common problem in signal processing, with prominent examples in a wide range of fields. An important example of this problem is the nowcasting of COVID-19 mortality: given a stream of reported counts of daily deaths, can we correct for the delays in reporting to paint an accurate picture of the present, with uncertainty? Without this correction, raw data will often mislead by suggesting an improving situation. We present a flexible approach using a latent Gaussian process that is capable of describing the changing auto-correlation structure present in the reporting time-delay surface. This approach also yields robust estimates of uncertainty for the nowcasted numbers of deaths. We test assumptions in model specification, such as the choice of kernel or hyperpriors, and evaluate model performance on a challenging real dataset from Brazil. Our experiments show that Gaussian process nowcasting performs favourably against both comparable methods and a small sample of expert human predictions. Our approach has substantial practical utility in disease modelling: by applying it to COVID-19 mortality data from Brazil, where reporting delays are large, we can make informative predictions of important epidemiological quantities such as the current effective reproduction number.

In many real-world settings, current observations from a noisy signal can be systematically biased, with these biases only being corrected after subsequent updates create more complete data. Often, these updates occur much later due to data processing or reporting delays. Not accounting for these delays results in biased predictions, while waiting for updates results in a lack of timely estimates. Producing timely estimates that predict the present is colloquially known as nowcasting, and its importance has been shown in a wide range of fields such as actuarial science, economics, and epidemiology [Kaminsky, 1987, Lawless, 1994, Bastos et al., 2019, McGough et al., 2020]. Nowcasting, as defined by Banbura et al. [2010] at the European Central Bank, is the process of predicting the present, the very recent past, and the very near future using time series data known to be incomplete. An example from economics is using monthly data to nowcast the current state of important indicators for an economy, such as GDP or income. More broadly, nowcasting is relevant not only where the data are incomplete, but also where the data comprise a biased subsample that will be updated retrospectively in the future, following lengthy delays. In epidemiology, nowcasting is required due to delays in reporting arising from limitations in testing capacity, data curation, and the requirement for pseudonymisation of patient data [Bastos et al., 2019]. These delays are further compounded by the noise inherent in such data due to limited sampling (typically only a subset of the population is sampled). Throughout this paper, we specifically focus on delays in the reporting of deaths.
An individual dies of a disease on a given day, but the delay between this event and the death being reported (and appearing in the dataset) can be substantial because of the reasons noted above. These reporting delays mask the true current state of the epidemic, and have material consequences for our understanding of both the present and future evolution of the epidemic. For example, estimation of key epidemiological quantities such as the effective reproduction number (R_t) would be systematically biased. Contemporary, real-time and unbiased estimates are necessary for effective public health planning and policy.

Figure 1: A) Reported daily hospital deaths are censored at recent times due to reporting delays. This can be seen by comparing the raw data with a ground truth from two months in the future, when the records have been backdated. B-C) The effective reproduction number R_t for SARS-CoV-2 infections in Brazil from 30-Jun-2020 to 23-Nov-2020, estimated using deaths from the raw reported data released on 23-Nov-2020, and using a backdated ground truth based on data released on 08-Feb-2021. D) R_t estimates based on nowcasted mortality data. Whereas the raw data result in misleading estimates of R_t, with the estimated R_t < 1, by applying nowcasting to the death counts we achieve a picture of the epidemic closer to the truth.

In this paper we propose a nowcasting framework based on latent Gaussian processes (GPs). This methodology is used to address the specific problem of delayed reporting of the true incidence of deaths due to COVID-19 in Brazil. Previous methods for nowcasting exist in several different contexts. Bańbura and Modugno [2014] propose a maximum likelihood approach with a dynamic factor model to predict GDP. Shi et al. [2015] use a deep learning approach based on LSTMs to nowcast rainfall intensity. Codeco et al. [2018] provide a framework to gather epidemiological information and correct for delays in reporting in Brazilian data. Bastos et al. [2019] present a Bayesian hierarchical model for nowcasting applied to data relating to dengue fever and severe acute respiratory infection cases. In McGough et al. [2020] a Bayesian nowcasting approach is proposed that produces accurate estimates that capture the time evolution of the epidemic curve. Specifically for COVID-19, Bayesian nowcasting approaches have been used to correct for the reporting delays in Bavaria and Sweden [Günther et al., 2020, Altmejd et al., 2020]. The challenges in estimating reporting delays are further discussed in Seaman and De Angelis [2020]. Finally, the problem and background context of reporting delays and corrected data in Brazil are further explained in Villela [2020]. Our methods build upon and generalise the NobBS (Nowcasting by Bayesian Smoothing) method originally proposed by McGough et al. [2020]. NobBS is a Bayesian method that produces smooth and accurate nowcasted estimates for multiple diseases. NobBS allows for uncertainty in both the delay distribution and the evolution of the epidemic curve. While an effective method, NobBS has several limitations, such as an inability to pick up rapid changes in the delay distribution, which we overcome in this paper. The extensions we show result in comparable performance for COVID-19 mortality surveillance in Brazil, but a better fit to the dynamic delay distribution. The problem tackled in this paper is conceptually illustrated in Figure 1.
The black points are the data available to us at a given time, and the red points the ground truth that is only available much further in the future. It can be seen that the discrepancy between the presently available data and the underlying ground truth grows markedly as we approach the present, a distinguishing characteristic of reporting delays. Alongside this, in Figure 1 we also show 3 estimates of the effective reproduction number R_t (defined as the average number of people an infected individual will go on to infect), obtained using a Bayesian hierarchical renewal-type model. Understanding this epidemiological quantity is vital: R_t > 1 results in epidemics growing, while R_t < 1 results in epidemics declining. Figure 1B shows estimates of R_t derived from the raw data, while Figures 1C and 1D show estimates of R_t derived from the ground truth data and our nowcasting approach respectively. These plots show that not correcting for delays can lead to a fundamentally different picture of the current epidemic state. Delays in death reporting lead to an underestimation of the true number of deaths in the observed data; the result is a suggestion of a declining epidemic, despite the fact that the epidemic is actually growing.

In this paper we focus on the Brazilian death data from the publicly available hospitalisation database, with deaths of both confirmed and suspected COVID-19 diagnostic status [Ministério da Saúde, 2020]. Our central premise is that using these daily death data alone results in policy decisions being made based on false statistics and trends [Villela, 2020]. To facilitate well-informed policy making based on unreliable data streams, we propose and implement a nowcasting method using latent Gaussian processes. These GPs are capable of capturing the complex correlation structure in delayed data and present an effective means to correct the reporting delays. We use this corrected death data to calculate the effective reproduction number R_t using the raw retrospectively observed data, the nowcasted data, and the ground-truth updated dataset (Figure 1). Our contributions are the following:

• We provide a new, flexible and accurate way to correct for delays in reporting. Our framework solves the nowcasting problem by using latent GPs, and provides realistic estimates for the deaths today given incomplete data. Our approach closely predicts the unobserved/missing values and simultaneously learns the underlying (latent) data generating mechanisms of the delays.

• We compare our approach to an established alternative method (NobBS), and, in a novel contribution, also provide a comparison to a small human expert panel of infectious disease epidemiologists. Domain knowledge is of primary importance for such applications, and is frequently the primary approach taken to interpret data. In generating estimates that improve over both existing computational methods and human experts, we demonstrate the utility of our approach.

• An important contribution of this work is the set of results and estimates provided. Implementing our approach enables the generation of more accurate estimates of the reproduction number and, in turn, a better understanding of the evolution of the COVID-19 epidemic in Brazil. Our framework is implemented in the easy-to-use probabilistic programming framework PyStan, and therefore facilitates use in low- and middle-income settings with limited technical expertise.
The structure of the paper is as follows: in section 3 we briefly introduce Gaussian processes and describe the latent GP nowcasting models with several variants. In section 4 we describe the data and perform retrospective tests to evaluate the accuracy of the new models, comparing them with a sample of human expert predictions. Finally, we discuss the advantages and limitations of the GP nowcasting framework in section 5.

Let n_t denote the response variable of interest that needs to be nowcasted at time t. In this paper n_t represents the reported COVID-19 mortality in week t. The mortality observations, in general, consist of measurements from an online data source, subject to distributed observation delays. The central task of nowcasting approaches is to identify a regular time-delay structure, and use this to estimate n_t at a time when it has only been partially observed. The structure that nowcasting identifies is the additive decomposition of the observable over the reporting delay d. That is, the true signal at a given time t is the sum over all the delayed partial observations for that time:

n_t = Σ_{d=0}^{D} n_{t,d}. (1)

The intuition behind this formulation is that the "true" deaths that occurred at time t are distributed over various delays d due to the delays in reporting them. A visual example of partial observation at recent times is the right-censored epidemiological data shown in Figure 2A. For all data releases, we observe precipitous declines in contemporary data, which are then subsequently revised upwards as the data become more complete. In the COVID-19 context this occurs due to time lags in registering and reporting death certificates [Villela, 2020]. Figure 2B shows that most deaths are reported to near completeness after around 5 weeks, and 90% are reported within 10 weeks. Splitting the data up by delay index, we form a 2D array in time and delay, n_{t,d}. The filled-in part of n_{t,d}, called the reporting triangle, is shown in Figure 2C. The lower-triangle part of this 2D array is missing, since at any time T only the numbers of deaths reported with delay d ≤ T − t are known for each epidemiological week t. The representation of the data by time and delay, rather than time and reporting date, induces a regular structure, one that is auto-correlated and approximately monotonically decreasing in delay (Figure 2C). This relatively simple structure makes the problem amenable to statistical modelling. The lower triangle of the n_{t,d} matrix can be predicted with the model, and therefore an estimate of the true signal is available for any time up to the current time by Eqn. (1), by summing over the delays. This is a common theme from which variations of nowcasting branch out. To model the discrete positive values of n_{t,d}, we can use a Poisson likelihood, or a negative-binomial likelihood for overdispersed data:

n_{t,d} ~ Poisson(λ_{t,d}) or n_{t,d} ~ NegBinomial(λ_{t,d}, r). (2)

In the negative-binomial case, the dispersion parameter r is a hyperparameter that can be learnt or given an informative prior based on the problem. The latter approach is common among the established Bayesian nowcasting methods [Bastos et al., 2019, Günther et al., 2020, McGough et al., 2020]. The mean of the negative binomial, λ_{t,d}, is often modelled as a random walk [Bastos et al., 2019] or as an auto-regressive process [McGough et al., 2020] along the time dimension, combined with an independently learnt vector of delay probabilities. The second approach is taken in the NobBS model [McGough et al., 2020], used in this paper as a benchmark.
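Before detailing the benchmark's parameterisation, the sketch below illustrates, under simplified assumptions, the reporting-triangle construction underlying Eqns. (1) and (2): a hypothetical line list of deaths (week of death and week of report, with names of our choosing rather than actual SIVEP-Gripe fields) is aggregated into the n_{t,d} array, and the entries observable at the current time are marked.

```python
import numpy as np

def reporting_triangle(death_week, report_week, max_delay=10):
    """Aggregate a line list into the reporting triangle n[t, d].

    death_week, report_week: integer epidemiological week indices per death
    (illustrative inputs; the paper builds the same array from weekly
    SIVEP-Gripe release snapshots). Delays beyond max_delay are pooled into
    the final bin, as described in the text.
    """
    death_week = np.asarray(death_week)
    report_week = np.asarray(report_week)
    delay = np.clip(report_week - death_week, 0, max_delay)
    T = death_week.max() + 1                      # assume weeks indexed 0..T-1
    n = np.zeros((T, max_delay + 1), dtype=int)
    np.add.at(n, (death_week, delay), 1)
    # Entries observed by "now" (taken here as the latest death week): only
    # delays d <= (T-1) - t are known for week t; the rest is the missing
    # lower triangle that the model must predict.
    observed = np.arange(max_delay + 1)[None, :] <= (T - 1) - np.arange(T)[:, None]
    return n, observed

# Toy usage: two deaths in week 0 (reported in weeks 0 and 2), one in week 1.
n, obs = reporting_triangle([0, 0, 1], [0, 2, 1])
# On a fully backdated triangle the row sums recover n_t as in Eqn. (1);
# with censored data the masked entries must first be filled by the model.
n_t = n.sum(axis=1)
```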
In detail, the NobBS model describes λ_{t,d} as

λ_{t,d} = exp(α_t) β_d,

where α_t is a latent signal for week t and β_d is the probability of reporting with delay d. It is worth noting that, with this approach, the distribution of delays β_d is fixed throughout the window of analysis. This approach has been successful for dengue and influenza surveillance [Codeco et al., 2018, Bastos et al., 2019], but has limitations in terms of the generality of the time-delay covariance structure, which can become apparent in more dynamic nowcasting scenarios, such as an evolving epidemic with changing delay distributions (Figure S1). Such issues can be minimised by tuning the window over which the static delay vector is estimated, or by manually adding cross-term covariates. Here we employ Gaussian processes as a generic, flexible alternative to model arbitrarily structured λ_{t,d}. The details of this are set out in the following section.

The introductory model we consider consists of a latent GP with a 1D kernel. In general terms, GPs are a class of Bayesian non-parametric models that define a prior over functions. They are a powerful tool in machine learning for learning complex functions, with applications in both regression and classification problems [Rasmussen and Williams, 2006, Wilson and Adams, 2013]. In recent years GPs have gained popularity in statistics and in machine learning due to their flexibility and excellent performance for many spatial and spatiotemporal problems [Wilson and Adams, 2013, Flaxman et al., 2015], including COVID-19 modelling [Qian et al., 2020]. The covariance function, or kernel, together with the mean function completely define a GP. The mean function is the base function around which all of the realisations of the GP are distributed. The covariance kernel is a crucial component of the Gaussian process, as it describes the covariance of the Gaussian process random variables, i.e. how similar two points are. Therefore, the kernel defines the shape of the distribution and which types of functions are more probable. One of the most popular choices of covariance kernel, and the one we choose to introduce the model with, is the squared exponential kernel, k_SE, with entries defined by a covariance function k_SE(·,·) such that

k_SE(t_i, t_j) = α² exp(−(t_i − t_j)² / (2ρ²)).

The parameter α defines the kernel's variance scale, and ρ is a lengthscale parameter that specifies how nearsighted the correlation between pairs of time points (t_i) is. The kernel results in a prior over a set of functions to describe λ_{t,d}, the mean of the statistical model in Eqn. (2). This is modelled as a zero-mean log-space latent Gaussian process,

log λ_{t,d} = f_{t,d}, with f ~ GP(0, k_SE).

Due to weak identifiability [Rasmussen and Williams, 2006], a strategy to identify the hyperparameters ρ and α is to fix the lengthscale ρ to the maximum delay time considered in the nowcasting problem, and learn only the scale parameter α. Markov chain Monte Carlo (MCMC) is used in order to generate posterior summaries for arbitrary (non-normal) latent Gaussian processes. The basic model introduced above can be extended to provide a more expressive description of the data. The purpose of this is to be able to describe the complex structure in n_{t,d}. Using the compositional kernel approach [Duvenaud et al., 2013, Wilson and Adams, 2013, Wilson et al., 2016], we can create a new additive kernel over multiple lengthscales, indexed s, as

k(t_i, t_j) = Σ_s α_s² exp(−(t_i − t_j)² / (2ρ_s²)).

The lengthscale hyperparameters ρ_s are fixed or given strongly informative priors, while each α_s is learnt.
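A minimal numerical sketch of the squared-exponential and additive multi-lengthscale kernels defined above is given below, written in Python/NumPy rather than the Stan code used for the actual model; the function names and hyperparameter values are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def k_se(x, alpha, rho):
    """Squared-exponential Gram matrix: alpha^2 * exp(-(x_i - x_j)^2 / (2 rho^2))."""
    x = np.asarray(x, dtype=float)
    d2 = (x[:, None] - x[None, :]) ** 2
    return alpha ** 2 * np.exp(-d2 / (2.0 * rho ** 2))

def k_additive(x, scales):
    """Additive kernel over multiple lengthscales: sum_s k_SE(.,.; alpha_s, rho_s).

    `scales` is a list of (alpha_s, rho_s) pairs; as in the text, the lengthscales
    would be fixed or tightly constrained while each alpha_s is learnt.
    """
    K = np.zeros((len(x), len(x)))
    for alpha_s, rho_s in scales:
        K += k_se(x, alpha_s, rho_s)
    return K

# Draw latent functions from the zero-mean prior, log lambda ~ GP(0, k):
weeks = np.arange(20, dtype=float)
K = k_additive(weeks, scales=[(1.0, 10.0), (0.5, 2.0)]) + 1e-6 * np.eye(20)  # jitter
log_lam = np.random.default_rng(0).multivariate_normal(np.zeros(20), K)
```

In the paper the equivalent construction is expressed in Stan, with the latent function passed through the negative-binomial likelihood of Eqn. (2).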
In the simplest case we consider a kernel with two lengthscale contributions, a short- and a long-range correlation structure:

k(t_i, t_j) = α_long² exp(−(t_i − t_j)² / (2ρ_long²)) + α_short² exp(−(t_i − t_j)² / (2ρ_short²)),

plus a regularising term σ² δ_ij, with the Kronecker delta function ensuring the σ² Gaussian noise is only added when i = j. The choice of kernel confers a bias that can result in better generalisation. The logic of this kernel is to split the covariance into two components: (a) a smooth long-range component, used to extrapolate the trend into the unknown part of the reporting triangle where large distances from the observed points exist, and (b) a component describing variation in n_{t,d} over shorter lengthscales. Additionally, the separation of kernels provides a generic method to describe more complex data generating processes; for example, the long-range kernel can be squared-exponential, while the short-range kernel can be a less smooth type with a different power spectrum, such as Matérn(1/2). This can be used to create a general statistical model for all of n_{t,d}. Furthermore, in this regard the δ contribution provides a source of regularisation, which may be useful if there is reason to believe the n_{t,d} values are subject to variation beyond the scope of the basic nowcasting framework. For example, if a death can switch category from a suspected COVID-19 death to a cause other than COVID-19 in later data releases, this could result in a negative n_{t,d} count, which can be modelled as an error to be regularised.

A further modification, which can be applied if the time-delay surface n_{t,d} has a complex structure, is to split the data into two components and model each with separate kernels. For example, if delays of 0 or 1 weeks account for a large fraction of total counts, they can be considered separately from delays > 1. This approach is considered later in section 4.2. A more generic formulation, however, is to consider a 2D kernel to fully account for the time-delay correlation structure, which is introduced below.

As a further expansion of the approach described before, we introduce a separable two-dimensional kernel over time and delay. Separable kernels can be efficiently implemented using Kronecker product algebra, as described in Flaxman et al. [2015]. Specifically, individual Gram matrices for time and delay are combined using the Kronecker product such that

K = K_time ⊗ K_delay.

As before, the kernel can be given an additive structure over multiple lengthscales. For example,

K = K_time,long ⊗ K_delay,long + K_time,short ⊗ K_delay,short.

This approach captures the relationship between t and d. In both 1D and 2D kernel approaches it is possible to perform partial pooling of the model's parameters by combining two or more spatial locations with similar features, for example neighbouring states, if limited data are available. In practice, however, we found that there was limited gain in doing so, as our approach works well with relatively few observations.
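The separable time-delay kernel can be sketched with an explicit Kronecker product as below. For clarity the full (T·D) × (T·D) Gram matrix is formed, which forgoes the computational savings of the Kronecker algebra in Flaxman et al. [2015]; the lengthscales, variances and jitter value are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def k_se(x, alpha, rho):
    """Squared-exponential Gram matrix on a 1D index set."""
    x = np.asarray(x, dtype=float)
    d2 = (x[:, None] - x[None, :]) ** 2
    return alpha ** 2 * np.exp(-d2 / (2.0 * rho ** 2))

T, D = 15, 11                                   # weeks x delay bins (toy sizes)
t = np.arange(T, dtype=float)
d = np.arange(D, dtype=float)

# Separable kernel K = K_time (x) K_delay, with one long- and one short-range term.
K_long = np.kron(k_se(t, 1.0, 10.0), k_se(d, 1.0, 8.0))
K_short = np.kron(k_se(t, 0.5, 2.0), k_se(d, 0.5, 1.5))
K = K_long + K_short + 1e-6 * np.eye(T * D)     # jitter for numerical stability

# A draw from this prior is a length-T*D vector; reshape to the time-delay surface.
f = np.random.default_rng(1).multivariate_normal(np.zeros(T * D), K)
log_lambda_surface = f.reshape(T, D)
```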
We use the publicly available SIVEP-Gripe hospitalisation database [Ministério da Saúde, 2020, de Souza et al., 2020]. New data have been released regularly online, on a weekly basis, in the second half of 2020 considered here. In this study, we extracted all SIVEP-Gripe data releases from 7-July-2020 to 31-May-2021. We consider all cases of suspected or confirmed COVID-19 (classes 4 and 5). There are a number of potential sources of error in the reported SIVEP data. One is under-ascertainment: systematic biases which are beyond the scope of correction by this nowcasting methodology. Another source of error is delayed classification. After the initial input of a patient's data into the database (usually at the time of hospitalisation), the entry might later be updated with clinical and laboratory data, including confirmatory COVID-19 testing. Further updates will include the outcome and its date (i.e. date of death or date of hospital discharge), and cases receive a final classification. Cases can be classified as confirmed COVID-19 (class 5), suspected COVID-19 (class 4), or other causes (classes 1-3). Despite being described as a "final classification", reclassification does occur, and it is especially common for unknown cases to be reclassified as COVID-19 once results from confirmatory tests are reported to the health authorities. On the other hand, some deaths attributed to suspected SARS-CoV-2 infection are later "removed" from the SIVEP database, due to duplicate filtering or because they are eventually attributed to other diseases. This can cause the number of deaths on certain days to decrease in consecutive data releases, as shown in Figure S2 in the Supplement. The number of deaths per day as reported by each release is presented in Figure 2, together with a reporting triangle showing the distribution of the reporting delays across time. According to the SIVEP-Gripe dataset, over 90% of all deaths have been reported with a delay of less than 10 weeks (Figure 2B). We therefore choose the maximum reporting delay for our data to be D = 10, and sum up all deaths which were reported with a delay longer than 10 weeks. Finally, to create the reporting triangle appropriate for our model, we aggregate the data into weeks.

We fit and present 7 models. For 1D kernel GPs, we consider a single SE kernel (1D SE) and additive long- and short-range component kernels (1D SE+SE and 1D SE+Mat). The additive long- and short-range component kernels are also considered with the data split into delays of at most one week and delays greater than one week (1D SE+SE data-split). Finally, we consider a 2D kernel GP model with additive long- and short-range components (2D additive). Each of the models, characterised by the likelihood given in Eqn. (2) and a latent GP part for modelling λ_{t,d} (section 3.2), is trained by supplying the reporting triangle n_{t,d} filled with data available up to the point of the nowcast. Each of the parameters governing the model, such as the over-dispersion r or the lengthscales and variances of the GPs, is learnt during the model fit. The best-performing hyperparameters of the prior distributions were selected conditioned on the observed results. All of those parameters and their prior densities are given in the Supplement, Table S1. The training of the model and nowcasting through sampling each element of the n_{t,d} matrix are done simultaneously. Specifically, at each iteration parameter values are sampled and immediately used to sample from the negative-binomial distribution to obtain all elements of the n_{t,d} matrix. Other nowcasting methods, including NobBS, focus primarily on estimating only the "missing" part of the n_{t,d} array and comparing the total numbers n_t, that is, the sums of each row of the array. Here, we aim to obtain a statistical model explaining all elements of the n_{t,d} matrix.
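The per-iteration sampling of the full n_{t,d} matrix described above can be sketched as follows, assuming posterior draws of log λ_{t,d} and of the dispersion r have already been extracted from the fitted model (e.g. from a PyStan fit); the helper name and array shapes are hypothetical. The mean/dispersion pair is mapped to NumPy's (n, p) negative-binomial parameterisation so that the expected count equals λ_{t,d}.

```python
import numpy as np

def nowcast_draws(log_lambda_draws, r_draws, rng=None):
    """Posterior-predictive draws of the full reporting triangle.

    log_lambda_draws: array (S, T, D) of posterior samples of log lambda_{t,d}
    r_draws:          array (S,) of posterior samples of the dispersion r
    Returns per-draw n[t, d] samples and the implied weekly totals n_t.
    """
    rng = rng if rng is not None else np.random.default_rng()
    lam = np.exp(np.asarray(log_lambda_draws, dtype=float))
    r = np.asarray(r_draws, dtype=float)[:, None, None]
    p = r / (r + lam)                       # success probability; gives E[count] = lam
    samples = rng.negative_binomial(r, p)   # shape (S, T, D)
    n_t = samples.sum(axis=2)               # nowcasted weekly totals, as in Eqn. (1)
    return samples, n_t
```

Credible intervals for the nowcast then follow from quantiles of n_t across the S draws.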
The reason for this is twofold: firstly, having a model that describes the whole n_{t,d} surface well increases the reliability of the model, which is vital in any healthcare setting. Secondly, the SIVEP-Gripe database contains hard-to-identify errors, discussed in section 4.1, so it is preferable to treat the reported data with additional statistical uncertainty. The fit of the 2D GP and the NobBS models to the n_{t,d} matrix is presented in Figure 3 and shows that the GP-based nowcasting method fits the time-delay structure much more closely than NobBS.

Figure 3: Reported and nowcasted numbers of deaths with reporting delay 0 to 5 weeks, generated by the 2D additive GP and NobBS models. These plots show the columns of the reporting triangle n_{t,d}. The reported data are shown with solid lines, and the 50% CrI for the nowcasts with the ribbons. 95% CrI are shown in Figure S3.

To evaluate the accuracy of the competing nowcasting models, we fit all of the models to the retrospective data sets available at each week between 05-Oct and 30-Nov-2020, using the numbers of deaths recorded for the whole of Brazil. This way we obtain 63 different sets of nowcasts, which we compare to the numbers of deaths reported by the most recent SIVEP-Gripe data release from 31-May-2021. The start date gives us at least 15 weeks of training data for each of the nowcasts. The end date of 30-Nov is 26 weeks before the most recent release, so the number of deaths reported in the most recent release can be confidently taken as the true value. The comparison is done by calculating the weighted and unweighted root mean squared error (RMSE) and the continuous ranked probability score (CRPS) between the "true" values from the most recent release and the nowcasted values. The differences between the ground truth and the raw data, as shown in Figure 1A, are used as weights. For each RMSE and CRPS evaluation, we use the mortality data from the 10 weeks leading up to the date of the nowcast. Out of all tested models, the GP models with 1D kernels with 2 components (SE+SE and SE+Mat) performed worse than the benchmark. The predictive accuracy of the other models was comparable to that of the benchmark, as shown in Figure 5, while also simultaneously giving an appropriate statistical description of the data (Figure 3). These results provide empirical evidence that our proposed method, under correct specification, gives a complete and accurate approach for describing and nowcasting COVID-19 death data. Model fits for the 2D additive GP model, including the 95% credible intervals (CrI), are shown in Figure 4, and the fits for the remaining models are shown in Figures S8-S14.

Figure 5: RMSE and CRPS evaluated for n_t for the tested nowcasting methods over the weeks 5-Oct to 30-Nov-2020. For the weighted RMSE, the weights were calculated as the difference between the ground truth and the raw data available up to the nowcasting date.
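The evaluation metrics can be computed from the nowcast samples as in the sketch below. The weighted-RMSE normalisation (dividing by the sum of the weights) and the sample-based CRPS estimator E|X − y| − 0.5·E|X − X′| are standard choices assumed here, not taken from the paper's code.

```python
import numpy as np

def weighted_rmse(point_est, truth, weights=None):
    """RMSE between nowcast point estimates and the backdated ground truth."""
    err2 = (np.asarray(point_est, dtype=float) - np.asarray(truth, dtype=float)) ** 2
    if weights is None:
        return float(np.sqrt(err2.mean()))
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt((w * err2).sum() / w.sum()))

def crps_from_samples(samples, truth):
    """Sample-based CRPS, averaged over the nowcasted weeks.

    samples: (S, T) posterior-predictive draws of n_t; truth: (T,) ground truth.
    Uses E|X - y| - 0.5 * E|X - X'|; O(S^2) memory, fine for a few thousand draws.
    """
    samples = np.asarray(samples, dtype=float)
    truth = np.asarray(truth, dtype=float)
    term1 = np.abs(samples - truth[None, :]).mean(axis=0)
    term2 = np.abs(samples[:, None, :] - samples[None, :, :]).mean(axis=(0, 1))
    return float((term1 - 0.5 * term2).mean())
```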
In addition to the forecasting metrics, as a novel contribution, we also evaluate how the GP nowcasting model performs when compared to human experts' predictions. We asked a group of infectious disease epidemiology experts to provide a series of nowcasts when presented with time series data up to 12-Oct and 23-Nov-2020. They were asked for their estimates of the true numbers of deaths due to COVID-19 in Brazil on 08-Oct and 19-Nov-2020 (Figure S15). The dates were specifically chosen to represent different scenarios. In the first one, 08-Oct-2020, both the raw data and the updated numbers of daily deaths were declining. For the second date, 19-Nov-2020, the raw data were declining, while the updated release revealed that the true numbers of daily deaths were actually increasing. For this experiment, 36 anonymous experts provided their point estimates and confidence intervals, which are presented in Figure 6. To extract daily deaths from the model's weekly estimates, we performed a simple interpolation, assigning the nowcasted number of deaths per week divided by 7 to the middle day of the given week and interpolating the remaining values using splines (an example is shown in Figure S16). Notably, in both cases the median value guessed by the human experts was not far from the "true" value (a difference of 55 and 73 deaths/day respectively for the human median estimate, and 13 and 72 for the nowcasting model mean); however, only 36% and 50% of all answers, respectively, included the true value within the provided credible intervals. The confidence intervals given by the human experts were comparable with the 95% CrI of the model, with the model's interval narrower by 19 deaths/day for the first date and 27 deaths/day for the second date.

We performed a basic sensitivity analysis for the 1D SE+SE data-split GP model. We first varied the prior for the overdispersion parameter r. This parameter is often unidentifiable by the models and has to be chosen based on the data [McGough et al., 2020]. This is confirmed by our sensitivity analysis, where changing the prior for r changed the width of the confidence interval but did not impact the mean predictions, as shown in Figures S19 and S20. Changing the priors for the scale parameters α to be less informative, that is, increasing the variance of the priors, does not significantly affect the mean predictions (Figures S21, S22). Changing the mean of the priors does, however, have an impact on the predictions and leads to bi-modality of the posterior distribution if the model is misspecified, e.g. when N(0, 1) priors are used (Figures S23, S24).
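Returning briefly to the expert comparison above, the weekly-to-daily interpolation can be reproduced approximately as in the sketch below; the use of a cubic spline and the clipping of negative values are our assumptions, as the exact spline type is not specified in the text.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def weekly_to_daily(weekly_deaths, first_day=0):
    """Spread weekly nowcasts over days by spline interpolation.

    Assigns weekly total / 7 to the middle day of each week, then interpolates
    the remaining days (cubic spline assumed; negative values clipped to zero).
    """
    weekly_deaths = np.asarray(weekly_deaths, dtype=float)
    n_weeks = len(weekly_deaths)
    mid_days = first_day + 7 * np.arange(n_weeks) + 3      # mid-week anchor points
    spline = CubicSpline(mid_days, weekly_deaths / 7.0)
    days = np.arange(first_day, first_day + 7 * n_weeks)
    return days, np.clip(spline(days), 0.0, None)

days, daily = weekly_to_daily([700.0, 840.0, 910.0, 880.0])   # toy weekly nowcasts
```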
Applying nowcasting to surveillance data suffering from reporting delays is crucial for accurate tracking of real-time epidemic dynamics. The limitations associated with using non-corrected data in epidemiological analyses are highlighted by our R_t estimates shown in Figure 1. Use of the raw data leads to continued underestimation of R_t and predicts a declining epidemic. Specifically, in the month preceding the nowcast, the relative entropy between the ground-truth and raw-data R_t was on average 13.14 (max 43.8), while between the ground-truth and nowcasted-data R_t it was only 0.26 (max 0.35) (see Figure S4). By contrast, the ground-truth results show that the epidemic remains uncontrolled, with R_t remaining above 1, an important conclusion also captured by our nowcasting approach. The recent emergence of SARS-CoV-2 variants of concern with altered epidemiological characteristics, such as increased transmissibility [Volz et al., 2021] or partial evasion of immunity [Faria et al., 2021], emphasises the need for accurate and continued real-time epidemic tracking. To estimate COVID-19 mortality in Brazil during a resurgent phase of the epidemic, concurrent with the emergence of the P.1 variant, we perform nowcasting using data released on 8-Feb-2021 and compare the predictions to the updated numbers of deaths released on 31-May-2021. The results of the nowcast for the whole of Brazil are shown in Figure 7. The reported data show a decline in the weekly number of deaths for Brazil since epidemiological week 54; however, the nowcasted results show much higher numbers of estimated weekly deaths, closer to the true value.

The GP nowcasting framework we introduced was benchmarked against the established NobBS method [McGough et al., 2020]. The main theoretical difference between the two approaches is how the mean of the negative-binomial distribution describing the number of deaths per week is modelled: our model uses a latent Gaussian process, whereas NobBS uses a first-order random walk. The 2D GP model improves on the RMSE for point predictions and the CRPS for the distribution of samples compared to NobBS, as shown in Figure 5, and provides more realistic uncertainty intervals (Figures S13 and S14). As well as improving predictive performance on the missing parts of the reporting triangle, the GP framework provides a more expressive statistical model capable of better explaining the historical reporting data (see Figure 3). One of the limitations of the approach described here is the dependence on historical data and on the regularity of the data releases, a limitation shared by many other nowcasting approaches. Additional challenges include variability in the distribution of reporting delays over time. For example, during the initial phase of the Brazilian COVID-19 epidemic, reporting delays were particularly severe. Delays in reporting are typically most extensive during outbreaks of a novel pathogen (such as SARS-CoV-2), due to limitations in diagnostic availability and testing capacity. Relatedly, during epidemic peaks, strain on healthcare systems and administrative staff due to increasing admissions can also result in a lengthening of reporting delays. The GP nowcasting models introduced in this paper can be readily used for real-time monitoring of new disease outbreaks, as relatively few data points are required to train the model, ca. 3 months here. In other applications more data may be required, depending on the distribution of reporting delays, the variance of the counts, and the regularity and granularity of the data. Although this paper focuses on application of the proposed GP-based nowcasting framework to the death counts for the whole country, the GP models can be applied at finer spatial scales, as illustrated for individual states of Brazil in Figure S5. This flexibility is important due to the large heterogeneity of the healthcare system in the country [Baqui et al., 2020].

We have presented a new approach to modelling time-delay data, which can be used to nowcast online data streams that have statistically distributed delays. Our approach uses latent Gaussian processes with additive kernels, and gives a fully flexible and generic method to describe and predict the data for unknown delays. The method has been demonstrated for assessing mortality and estimating the effective reproduction number for COVID-19 reporting in Brazil, but can be used in other contexts in which delays in a measurement process exist.

Table S1: Priors for the analysed models. Here T is the maximum number of weeks for which data are available, that is, the number of rows in the reporting triangle, and D is the maximum reporting delay, that is, the number of columns in the reporting triangle. Γ denotes a Gamma distribution. * The same hyperparameters were used for the models with Matérn(1/2) and Matérn(3/2) kernels.
              1D SE       1D SE+SE    1D SE+Mat*  1D SE+SE data-split  2D additive  NobBS
r             Γ(500, 2)   Γ(500, 2)   Γ(500, 2)   Γ(500, 2)            Γ(200, 2)    Γ(500, 2)
α_{1,long}    N(1, 1)

Figure S1: The change in the distribution of delays for Manaus (state Amazonas). Each cell in this heatmap shows the percentage of all deaths which occurred in week t and were reported with a given delay d. Here we assume that deaths have been recorded in the database up to epidemiological week 53. We use all available SIVEP data up to the release on 5-April-2021.

Figure S2: Example of weekly reporting delay with a negative signal for Amazonas, epidemiological weeks 27 to 42. Each week's mortality data are plotted as a single line. Some lines fall under the y = 0 line (red dashed line), indicating that some deaths were incorrectly assigned to a given week, which was corrected in the following releases of the data.

Figure S7: Distribution of reporting delays for each of the retrospective tests. The moment of the "nowcast" is shown with the red dotted line in each plot. Solid lines present the data extracted from the release of SIVEP data from 31-May-2021, and the ribbons show the 50% CrI of the model fit obtained using the 2D additive kernel GP model.

Figure S16: Example of the interpolation of numbers of deaths per day based on the weekly nowcasted values. The 50% and 95% CrI for the nowcast are shown with the ribbon.

For each of the model runs shown in this section, 4 chains were run for 1000 iterations, with 500 iterations used for burn-in. All fits presented are done for nowcasting using all data available up to 11-Jan-2021.

References
Nowcasting COVID-19 statistics reported with delay: a case-study of Sweden. arXiv
Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data
Ethnic and regional variations in hospital mortality from COVID-19 in Brazil: a cross-sectional observational study. The Lancet Global Health
A modelling approach for correcting reporting delays in disease surveillance data
COVID-19 and hospitalizations for SARI in Brazil: a comparison up to the 12th epidemiological week of 2020. Cadernos de Saúde Pública
Stan: A probabilistic programming language
Infodengue: A nowcasting system for the surveillance of arboviruses in Brazil
Epidemiological and clinical characteristics of the COVID-19 epidemic in Brazil
Structure Discovery in Nonparametric Regression through Compositional Kernel Search
Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus
Fast Kronecker Inference in Gaussian Processes with non-Gaussian Likelihoods
Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe
Nowcasting the COVID-19 pandemic in Bavaria
The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo
Prediction of IBNR claim counts by modelling the distribution of report lags
Adjustments for Reporting Delays and the Prediction of Occurred but Not Reported Events
Nowcasting by Bayesian Smoothing: A flexible, generalizable model for real-time epidemic tracking
Estimating COVID-19 cases and reproduction number in Brazil. medRxiv
SRAG 2020 - Banco de Dados de Síndrome Respiratória Aguda Grave
On the derivation of the renewal equation from an age-dependent branching process: an epidemic modelling perspective. arXiv
Description and comparison of demographic characteristics and comorbidities in SARI from COVID-19, SARI from influenza, and the Brazilian general population
When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes
Gaussian Processes for Machine Learning
Challenges in estimating the distribution of delay from COVID-19 death to report of death
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. NIPS'15
How limitations in data of health surveillance impact decision making in the COVID-19 epidemic
Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England
Gaussian Process Kernels for Pattern Discovery and Extrapolation
Deep Kernel Learning

The authors thank the CADDE group for insight into COVID-19 reporting in Brazil, and are grateful to Bruce Nelson for consistent public reporting of COVID-19 in Amazonas. T.A.M. is grateful to Daniel A. M. Villela for helpful discussion of nowcasting. The authors would also like to thank the group of anonymous epidemiology experts from the Imperial College London Department of Infectious Disease Epidemiology who participated in testing the nowcasting model against the human predictions.

Figure S18: Traceplots for the 2D additive GP model.

For all sensitivity analyses, the 1D SE+SE data-split GP model has been used. Each time we ran the models with 4 chains for 1000 iterations, with 400 iterations used for burn-in. All fits presented are done for nowcasting using all data available up to 11-Jan-2021.

Figure S23: Model fits with different α_{long,1}, α_{long,2}, α_{short,1} and α_{short,2} prior densities. "Default" means using the default priors described in Table S1; for the "default x 3" prior we increased the mean of the default priors 3-fold; "Normal(0,1)" means a standard prior was set for all α-s; and "long- and short-default x 2" means an increased mean in the long- and short-range parts of the default prior, respectively.

Figure S24: Model fits with different α_{long,1}, α_{long,2}, α_{short,1} and α_{short,2} prior densities. "Default" means using the default priors described in Table S1; for the "default x 3" prior we increased the mean of the default priors 3-fold; "Normal(0,1)" means a standard prior was set for all α-s; and "long- and short-default x 2" means an increased mean in the long- and short-range parts of the default prior, respectively.