key: cord-0151024-y2p1nibm
authors: Monod, M'elodie; Blenkinsop, Alexandra; Brizzi, Andrea; Chen, Yu; Perello, Carlos Cardoso Correia; Jogarah, Vidoushee; Wang, Yuanrong; Flaxman, Seth; Bhatt, Samir; Ratmann, Oliver
title: Regularised B-splines projected Gaussian Process priors to estimate time-trends of age-specific COVID-19 deaths related to vaccine roll-out
date: 2021-06-23
journal: nan
DOI: nan
sha: 75a14f8c316d6f23099d6fe8fc58e24f1f432e6b
doc_id: 151024
cord_uid: y2p1nibm

The COVID-19 pandemic has caused severe public health consequences in the United States. In this study, we use a hierarchical Bayesian model to estimate the age-specific COVID-19 attributable deaths over time in the United States. The model is specified by a novel non-parametric spatial approach, a low-rank Gaussian Process (GP) projected by regularised B-splines. We show that this projection defines a new GP with attractive smoothness and computational efficiency properties, derive its kernel function, and discuss the penalty terms induced by the projected GP. Simulation analyses and benchmark results show that the spatial approach performs better than standard B-splines and Bayesian P-splines and equivalently well as a standard GP, for considerably lower runtimes. The B-splines projected GP priors that we develop are likely an appealing addition to the arsenal of Bayesian regularising priors. We apply the model to weekly, age-stratified COVID-19 attributable deaths reported by the US Centers for Disease Control, which are subject to censoring and reporting biases. Using the B-splines projected GP, we can estimate longitudinal trends in COVID-19 associated deaths across the US by 1-year age bands. These estimates are instrumental to calculate age-specific mortality rates, describe variation in age-specific deaths across the US, and for fitting epidemic models. Here, we couple the model with age-specific vaccination rates to show that lower vaccination rates in younger adults aged 18-64 are associated with significantly stronger resurgences in COVID-19 deaths, especially in Florida and Texas. These results underscore the critical importance of medically able individuals of all ages to be vaccinated against COVID-19 in order to limit fatal outcomes.

A new pathogen, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), emerged in the Wuhan region of China in December 2019 and continues to spread worldwide. The resulting disease, COVID-19, is severe, with overall infection fatality rates (IFRs) between 0.1% and 1% (Meyerowitz-Katz and Merone, 2020; Brazeau et al., 2020) , which increase exponentially with age (Levin et al., 2020) . Developed vaccines are highly effective to prevent deaths (Haas et al., 2021; Baden et al., 2021) , and initially have been prioritised to older age groups. In the United States (US), vaccines are now offered to all adults since May 2021, but uptake has been variable across states (Centers for Disease Control and Prevention, 2021a) . Despite increasing vaccine coverage, several US states, most notably Florida and Texas, are reporting resurgent COVID-19 death waves that are on par or exceed in magnitude the death waves seen in the prevaccination period. A key question is if these death waves since summer 2021 are linked to low vaccination coverage in younger adults, which are known to drive transmission (Monod et al., 2021) .

Here, we provide methods for estimating the age composition of COVID-19 attributable deaths over time, and to characterise the potential shifts in the age composition of deaths. Doing so requires a statistical model because data are partially censored, reported with delays, and reported in age bands that can be inconsistent across locations or do not match those of other data streams such as reported vaccinations. Numerous methods have been developed to interpret the time evolution of overall COVID-19 attributable deaths (e.g., Lavezzo et al. (2020) ; Institute for Health Metrics and Evaluation (2020); Chen et al. (2020) ; Blangiardo et al. (2020) ; Zheng et al. (2021) ; Flaxman et al. (2020) ), but few are suitable to analyze longitudinal data stratified by age brackets or other discrete strata (Monod et al., 2021) , let alone at high resolution such as 1-year age bands. One reason for this paucity of methods is that in SIR-type models, calculating age-specific next generation operators at a time resolution of days becomes computationally slow over observation periods that span years, especially when calculations need to be repeated millions of times within a Bayesian framework (Wikle et al., 2020; Monod et al., 2021) . These considerations are prompting us and others to consider non-mechanistic, flexible estimation approaches (Shah et al., 2020; Pokharel and Deardon, 2021) . As our objective is to reconstruct spatiotemporal trends in age-specific COVID-19 attributable deaths, it does not require invoking many of the assumptions or complexities that underlie mechanistic or semi-mechanistic COVID-19 transmission dynamics models and Bayesian non-parametric models are sufficient. We present a fully Bayesian, computationally scalable approach to estimate a two-dimensional (2D) surface over ages and weeks that describes the time evolution of COVID-19 deaths by 1-year age bands at US state level. To impute missing entries on the surface and estimate global trends over ages and weeks, we borrow information across neighboring entries by using a non-parametric smoothing method.

A natural starting point for modelling a surface is a 2D Gaussian process (GP) (Rasmussen and Williams, 2005) . However, their computational complexity makes the use of 2D GPs in a fully Bayesian framework difficult when the surface dimension becomes large, even after using a Kronecker decomposition of the kernel function (Saatçi, 2011; Wilson et al., 2014) . Here we adopt a low-rank approximation via a tensor product of B-splines for which the parameters follow a 2D GP. The resulting approach is equivalent to a 2D GP defined by a low-rank covariance matrix projected by B-splines. B-splines are a popular choice for non-parametric modelling, due to their continuity properties, ensuring smoothness of the fitted surface, and their easy implementation. But choosing the optimal number and position of knots-the defining grid segments where the surface is expected to change its behavior-on the space of ages and weeks is a complex task. Some approaches have focused on adding a penalty to restrict the flexibility of the fitted surface in a frequentist framework (O'Sullivan, 1986,9; Eilers and Marx, 1996; Eilers et al., 2006) . Following this idea, we regularise the fitted surface by using a kernel function with a free complexity parameter to define the covariance matrix of the low-rank 2D GP. We qualitatively compare the penalty induced by this choice to that of related regularisation methods. We benchmark the proposed regularised B-splines projected GP against several other popular smoothing methods, and we demonstrate that our approach results in substantive computational gains over a standard 2D GP for similar estimation accuracy. This paper focus on estimating age-specific COVID-19 attributable deaths in the context of vaccine roll-out on publicly available, age-specific COVID-19 death and vaccination data from the 4 most populated US states, California, Florida, New York (state) and Texas, that are reported by the Centers for Disease Control and Prevention (CDC) (2021a; 2021c). It is structured as follows. Section 2 introduces the data and their limitations. Section 3 describes the proposed methodology to model trends in the age-composition of COVID-19 deaths, including a theoretical characterization of the penalties introduced by B-splines projected GPs. Section 4 presents a comparison of the proposed method to related smoothing approaches on simulated data, and Section 5 on real world data used for benchmarking. In Section 6 we estimate the time evolution of age-specific COVID-19 deaths for the four most populated US states. We document the marked variation in the summer 2021 resurgence in age-specific COVID-19 deaths across states, and show strong resurgences in deaths are associated low vaccination coverage in younger adults. Lastly, Section 7 closes with a discussion.

The proposed modelling framework can be further applied to estimate COVID-19 mortality rates in arbitrary age groups, investigate differences in the age composition of deaths across locations or identify time shifts in the age composition of deaths. The code to use our approach and to reproduce the results is available at https://github.com/ImperialCollegeLondon/B-SplinesProjectedGPs. Median and 95% credible interval of the state-and age-specific COVID-19 attributable deaths predicted by our approach are available in the GitHub repository. Posterior samples are also available upon request. Our results suggest regularised B-splines projected GP priors are likely useful for other surface estimation problems, and we provide templated Stan model files in Supplement S12.

The Center for Disease Control (CDC) and National Center for Health Statistics (NCHS) report each week on the total number of deaths involving COVID-19 that have been received and tabulated through the National Vital Statistics database (Centers for Disease Control and Prevention, 2021c) for every US state across the age groups We refer to the latter as cumulative reported COVID-19 attributable deaths, D m,b,w , for state m, in age group b and on week w. Historical records are made available by Rearc (2021) . To simplify the longitudinal dependence and create a time series from the cumulative counts (King et al., 2015) , we obtain the weekly COVID-19 attributable deaths by differencing,

in each location and age group for all but the last available week (daily deaths, d, for state m, in age group b, on week w). We index the weekly deaths by w = 1, . . . , W .

Reflecting the reporting nature of the age-stratified CDC data, the weekly COVID-19 attributable d m,b,w are subject to reporting delays and do not necessarily correspond to the number of individuals who died of COVID-19 in state m and week w. The CDC does not report the cumulative deaths count if the count is between 1 and 9. Thus, it is not possible to retrieve all weekly deaths, and we refer to the set of weeks that are retrievable in state m for age group b through first order differencing by W WR m,b , and to the set of weeks that were non-retrievable because of censoring by W WNR m,b . The censored cumulative deaths are bounded, such that the sum of non-retrievable weekly deaths is also bounded. The exact computations are described in Appendix S10.1. In addition, there was no update on July 4, 2020 resulting in missing cumulative deaths in that week. The weekly deaths of that week and the preceding week are declared missing. Note that the missing weekly deaths are not equivalent to the non-retrievable weekly deaths because we cannot bound their sums. In the main text, we focus on data reported from May 2, 2020 to September 25, 2021. We thus have W = 74. Figure 1 show the COVID-19 attributable weekly deaths data for the four most populated US states, California, Florida, New York and Texas.

The CDC reports weekly time series of the proportions of individuals aged 18-64 and 65+ who are fully vaccinated (Centers for Disease Control and Prevention, 2021b), referred to as vaccination rate in the following. Fully vaccinated individuals are defined as having received the second dose of a two-dose vaccine or one dose of a single-dose vaccine. We use records from March 02, 2021 to September 25, 2021, and denote by v m,c,w the vaccination rate for state m in age group c ∈ C at the start of week w,

(2.3) Figure 2 presents the vaccination rates among individuals aged 18-64 and 65+ reported in the four most populated US states, California, Florida, New York and Texas. Figure 1 : Retrievable weekly COVID-19 attributable deaths in four US states.

The retrievable weekly deaths are computed using the first order difference of reported cumulative deaths by the CDC (Centers for Disease Control and Prevention, 2021c). Grey cells indicate non-retrievable weekly death counts due to the censoring of cumulative deaths. Orange bars show the missing weekly deaths due to non-reported cumulative deaths.

We use two additional data sets. Firstly, we retrieved the weekly COVID-19 attributable deaths regardless of age from John Hopkins University (JHU) for all US states from May 2, 2020 to September 25, 2021 (John Hopkins University of Medicine, 2020), which we denote by d JHU m,w for state m and week w. In these data, reporting-delayed deaths are back-distributed where possible, and for this reason we use the overall deaths as a reference to mitigate the reporting delays in the age-stratified CDC data (2.2). Secondly, age-specific weekly COVID-19 attributable deaths are also directly reported by state Departments of Health (DoH) on their websites, or data repositories. For California, Florida and Texas, data records are available up to April 1, 2021 at https://github.com/ ImperialCollegeLondon/US-covid19-agespecific-mortality-data. The DoH do not censor low death counts, and the all-ages sum of the DoH data correspond well to JHU data. This prompts us to use the DoH data as an independent data set to assess the accuracy of our model estimates that are derived from the CDC data. 

For simplicity we suppress the state index in what follows, with all equations being analogous. Our aim is to estimate the weekly deaths by 1-year age bands, a ∈ A = {0, 1, . . . , 104, 105} in week w ∈ W = {1, . . . , W }, and we denote their expectation by µ a,w . We first decompose µ a,w as the product of the weekly deaths for all ages λ w with the relative contribution π a,w of age a to weekly deaths, where a π a,w = 1 for all w. Second, we model the longitudinal age composition of deaths π a,w as a bivariate random function (a, w) → f (a, w) ∈ R, that is exponentiated, normalised, and evaluated on the 2D grid A × W. Our basic latent model structure is thus

.

(3.1)

To link the expected weekly deaths by 1-year age band, µ a,w , to the data (2.2), we aggregate them over the age groups specified by the CDC, µ b,w = a∈b µ a,w , for all b ∈ B. Then, we model the observed, weekly deaths in age b and week w for all ages and weeks through Negative Binomial distributions in the shape-scale parameterisation,

with mean µ b,w and variance σ 2 b,w , and where ν > 0 is interpreted as an overdispersion parameter. The purpose of the shape-scale parameterisation with identical θ (3.2a) is that then, the weekly deaths conditional on their total follow a Dirichlet-Multinomial distribution with parameters α b,w , resulting in the succinct identity α b,w = a∈b α a,w (Townes, 2020). For implementation, notice that the shape and scale parameters can be rewritten as α b,w = µ b,w /ν and θ = ν/(1 + ν).

Our Bayesian model comprises as log likelihood the sum of the log Negative Binomial densities (3.2a) over retrievable weekly deaths parameterised by µ b,w and ν. Considering data censoring, similar terms involving log cumulative density functions of the Negative Binomial that bound sums of the non-retrievable weekly deaths are added in the log likelihood. Taken together, for the collection of weekly deaths

where for brevity the term p {d b,w } w∈W WNR b | . . . in (3.3b) is detailed in Appendix S10.2. The priors are specified as follows. First, the prior on the total number of weekly deaths λ w , in (3.3c), is specified in the mean-standard deviation parametrisation. The prior expectation on the total deaths is found by summing the retrievable deaths, such that T w = b∈B,w∈W WR b d b,w . Next, we assume that the standard deviation is equal to twice the first order difference in the total deaths. We find the empirical ratio of the total death relative to their first order difference η, using ordinary least square method without intercept, and specify the standard deviation accordingly. Second, the inverse of the square root of the overdispersion parameter is given a standard normal distribution truncated on the positive support, the recommended prior in this context. Lastly, the model is completed with a prior on the random function f and its hyperparameters φ, which we investigate in detail in the next section.

Considering reporting delays in the data, we explicitly allow for a rescaling of the sum of deaths across age groups in the model according to other curated data sets, which effectively re-distributes reporting delayed deaths in the CDC data set to earlier dates.

Specifically, we require that our posterior predictions of the COVID-19 attributable deaths in age group a and week w, d a,w , sum to a d a,w = d JHU w . We achieve this by exploiting the probabilistic relationship between Negative Binomial distributions in the shape-scale parameterisation and the Dirichlet-Multinomial, through 4) where as before α a,w = µ a,w /ν. We denote by µ a,w the expected predictive COVID-19 attributable deaths in age group a at week w,

(3.5)

In (3.1), we introduced a 2D function f (a, w), for which we shall find a prior in this section. Let the number of points of the age axis be n = |A| and on the week axis be m = |W|. The total number of points on the grid is N = n × m. The ensemble of pairs of points is X = x 1 , . . . , x N = A × W. We now investigate different modelling approaches for the function f (a, w).

Given observations (X, f ) = x 1 , f (x 1 ) , . . . , x N , f (x N ) , we start by considering as model of f a zero-mean 2D GP,

The covariance matrix K is evaluated at all pairs of points in X and has entries K x,

) is a kernel function. For computational efficiency and because our output is on a multidimensional grid, we decompose the kernel function,

where k 1 (., .) and k 2 (., .) are kernel functions over ages and weeks, respectively. The corresponding covariance matrix K is calculated with the Kronecker product K = K 2 ⊗ K 1 , and the number of operations to evaluate the covariance matrix reduces from O(N 3 ) to O(2N 3/2 ) (Gönen and Alpaydin, 2011; Saatçi, 2011) . For implementation, we note that the order of the product follows from the fact that the matrix's entries are stacked columnwise. We rely on the efficient Kronecker product implementation via matrix-vector products proposed in the Supplementary Material, Section 4 of Wilson et al. (2014) . In our applications, we set k 1 and k 2 to be squared exponential kernel functions with variance scale ζ 2 and specific lengthscales for the rows and the columns, γ 1 and γ 2 . Our priors on the length scales are independent γ i ∼ Inv-Gamma(5, 5) for i = 1, 2, and ζ ∼ Cauchy + (0, 1). For brevity, we denote φ = {ζ, γ 1 , γ 2 }.

B-splines basis functions -or, more simply, B-splines -are constructed from polynomial pieces that are joined at certain values over the input space, called knots, and defined by a polynomial degree, d, and a non-decreasing sequence of knots, (t 1 ≤ t 2 ≤ · · · ≤ t K ), where K is the number of knots. Given those, the total number of B-splines is I = K + d − 1. We show how a B-spline is constructed in Appendix S9.1. In this paper, we use cubic B-splines, with d = 3. Moreover, we use equally spaced knots on both dimensions, such that the only tuning parameter is the number of knots. A fundamental property of B-splines, that constitutes most of their attractiveness, is that they are smooth. More rigorously, a cubic B-spline defined on strictly increasing knots is piecewise infinitely differentiable between the knots, and of continuity C 2 on the knots (Goldman (2002) , chap. 7). f can be modelled with a tensor product of B-splines given by

is the ith B-spline defined on the knot vector (t 1 1 ≤ t 1 2 ≤ · · · ≤ t 1 K1 ) over the space A with K 1 being the number of knots, i ∈ I = {1, . . . , I} and I = K 1 + 2. Similarly, B 2 j (.) is the jth B-spline defined on the knot vector (t 2 1 ≤ t 2 2 ≤ · · · ≤ t 2 K2 ) over the space W with K 2 being the number of knots, j ∈ J = {1, . . . , J} and J = K 2 +2. The ensemble of B-splines form a matrix basis denoted by B 1 and B 2 of size I ×n and J ×m, respectively. The pairs of B-splines indices is denoted by U = u 1 , . . . , u M = I × J with M = I × J. [β u ] u∈U is a rectangular set of coefficients. In our applications, we obtain standard B-splines surfaces by placing independent standard normal priors on all B-splines coefficients β u . To keep notation streamlined, notice that there are no hyperparameters φ.

Given the B-splines indices and corresponding coefficients (U , β) = (u 1 , β u1 ), . . . , (u M , β u M ) , we place a 2D GP on the coefficients,

(3.9)

The covariance matrix has entries (K β ) u,u = Cov β u , β u = k β u, u where k β (., .) is a kernel function depending on unknown hyperparameters φ. This approach is equivalent to directly placing a GP on f with a covariance matrix projected by B-splines.

To show this, we rewrite (3.8) as a matrix calculation, f = (B 1 ) T βB 2 , and find its vectorized form

(3.10)

The linear operator produces functions from R IJ to R nm . GPs are closed under linear operations (Papoulis and Pillai (2002), chap.10), and so f specified by (3.8-3.9) is the

In our applications, we decompose the kernel function k β into two kernel functions over the rows and columns of the B-spline coefficients matrix as in (3.7). We use squared exponential kernel functions with variance scale ζ 2 and specific lengthscales for the rows and the columns, γ 1 and γ 2 . Our priors on the length scale priors are independent γ i ∼ Inv-Gamma(5, 5) for i = 1, 2, and ζ ∼ Cauchy + (0, 1). Under this prior, φ = {ζ, γ 1 , γ 2 }.

The kernel function obtained by projecting a base kernel function with cubic Bsplines is C 2 as it can be represented as a linear combination of C 2 functions (proof in Appendix S9.2). Therefore the surface obtained when modelling a surface with the B-spline tensor product (3.11), unlike a standard 2D GP (3.6), has inherent smoothness properties not matter the base kernel functions k β used. We can also interpret (3.11) as a regularised splines method. Indeed, existing regularised spline methods such as smoothing splines (O'Sullivan, 1986,9) and P-splines (Eilers and Marx, 1996; Eilers et al., 2006) aim to minimise in a frequentist setting the loss function

whereβ are the estimated coefficients and P is a penalty applied on the second derivative of the fitted curve for smoothing splines and on finite differences of adjacent B-splines coefficients for P-splines. Now adopting a Bayesian approach, our aim is to maximise the probability of the posterior parameter conditional upon the data, which is proportional to the likelihood multiplied by the prior. By placing a GP on the B-splines coefficients, the prior of f modelled with the B-spline tensor product (3.11) can be decomposed in the same way as (3.12),

where |A| denotes the determinant of matrix A. Thus, the log determinant of the covariance matrix in (3.9) can be interpreted as a complexity penalty, or regulariser, which comes into play if the kernel has a free parameter to control the model complexity (MacKay (2003) , chap. 28).

Bayesian P-splines have been developed as an extension of the frequentist P-splines, in Lang and Brezger (2004) ; Brezger and Lang (2006) , and impose a spatial dependency on the B-splines coefficients to avoid overfitting. Bayesian P-splines are obtained by placing the Gaussian Markov Random Field prior on the B-splines coefficients,

(3.14)

where β −(i,j) is the matrix β without entry (i, j), δ (i,j),(i ,j ) evaluates to one if (i, j) and (i , j ) are neighboring coordinates vertically or horizontally, and otherwise to zero, and δ + (i,j) =

Here, τ is a spatially varying precision parameter. This model implies that each β i,j is normally distributed with a mean equal to the average of its neighbors. We use the Bayesian P-splines prior in benchmark comparisons, due to their close structural relationship with B-splines projected GP. In our applications, we accelerated the evaluation of the likelihood by simplifying the joint likelihood to the pairwise difference formulation and we placed a sum-to-zero constrain on the parameters to ensure the identifiability of the pairwise differences as proposed by (Morris et al., 2019) . We used the prior τ ∼ Cauchy + (0, 1). To keep notation streamlined, we write φ = {τ }.

All inferences using model (3.1-3.2) coupled with different priors on the random surface f were fitted with RStan version 2.21.0, using an adaptive Hamiltonian Monte Carlo (HMC) sampler (Stan Development Team, 2020). 8 HMC chains were run in parallel for 1,500 iterations, of which the first 500 iterations were specified as warm-up. Inferences in simulation analysis and benchmarking were performed similarly.

Prior to application to the CDC data, we compared the performance of the spatial models defined in Section 3.2 at retrieving the mean surface of 2D count data. In the simulations, we considered observations on a 2D grid {0, 0.02 . . . , 0.98, 1}×{0, 0.02 . . . , 0.98, 1}, such that there are 50 × 50 entries. The observations are generated through a Negative Binomial model using as mean the exponential of a 2D GP with a squared exponential kernel. The variance scale of the 2D GP is fixed to 1 and the length scale is varied (γ ∈ {0.05, 0.25, 1}) to generate weakly, mildly and strongly correlated observations. The training set includes 40% uniformly sampled observations (i.e., number of observations in the training set N = 640). Here, we compare the performance of four models with different prior specifications on the mean surface, a standard 2D GP, a standard B-splines surface, Bayesian P-splines and regularised B-splines projected 2D GP. For the methods using B-splines, the same number of knots are placed along the two axes. Figure 3A shows the simulated mean surface in the mildly correlation scenario obtained with γ = 0.25, Figure 3B the simulated observations and Figure 3C mean surface when a standard B-spline surface, Bayesian P-splines, and regularised Bsplines projected 2D GP priors are used for inference with different numbers of knots.

We find that the standard 2D GP and the regularised B-splines projected 2D GP recover well the simulated true mean surface (Figures 4 and Supplementary Figure S8 ). Moreover, the regularised B-splines projected 2D GP obtains equivalent or better predictive performance as the standard 2D GP for a shorter running time, 73.95% faster on average for 30 knots (Table 1 and Supplementary Table S2 ). In contrast, the standard B-splines and the Bayesian P-splines approaches overfit the data when the number of knots increase (Figures 4 and Table 1 ). Similar results were obtained for the other correlation scenarios (Figures S9-S12, Table 1 and Supplementary Table S2 ).

We next benchmarked the performance of the spatial models defined in Section 3.2 on data unrelated to COVID-19, which has previously been used to benchmark spatial models. The data set consists of more than 83,000 locations across East Africa with measured deviations in land surface temperatures (Ton et al., 2018) . We use the same training data as in , consisting of 6,000 uniformly sampled locations, and fit them using a Gaussian likelihood with mean surface specified by the various spatial models, and a free observation variance parameter. We compared standard 2D Bsplines, Bayesian P-splines, regularised B-splines projected 2D GPs, low rank Gaussian Markov random fields, neural network models, and the πVAE model (Mishra et al., 2020) as a prior for the mean surface. For the spline methods, 125 equidistant knots were placed along both axes. The results are summarised in Supplementary Table S3 and Figure S13 . We find that the regularised B-splines projected 2D GP model (testing MSE: 2.96) obtained better predictive performance than the standard B-splines surface model (testing MSE: 4.45), a low rank Gaussian Markov random field (testing MSE: 4.36), and 6 Time trends of age-specific COVID-19 deaths in the US Based on the simulation and benchmark results, we used the regularised B-splines projected 2D GP model to reconstruct the time trends in age-specific COVID-19 attributable deaths across the US. We placed 12 equidistant knots over the age axis and 10 over the week axis. This choice was determined such that the predictive performance did not significantly increase with more knots. Overall, fitting the regularised B-splines projected 2D GP model took 48 hours on a high performance computing environment, with typical effective sample sizes between 342 to 22127, and there were no reported divergences in Stan's Hamiltonian Monte Carlo algorithm. Here we focus on studying the recovered trends in age-specific COVID-19 deaths during the summer 2021 resurgence in relation to vaccine coverage. Vaccinate coverage is reported by age strata that do not match the age strata in which the weekly deaths are reported by the CDC, which renders our Bayesian semi-parametric modelling approach on 1-year age groups advantageous.

6.1 Smooth estimates of age-specific COVID-19 attributable deaths over time Figure 5 shows the predicted weekly attributable COVID-19 deaths over time that occurred in the four most populated US states. The predictions for each 1-year age are aggregated into the age bands 0-54, 55-74 and 75+ for direct comparison to the CDC data, shown as black dots. The data and estimates clearly reflect the multiple COVID-19 waves across the four states, with high numbers of deaths in the elderly.

The CDC data contain many anomalies, including censored and delayed entries, such as the unrealistically high number of deaths reported for the week starting on January 23, 2021 ( Figure 5 ). We used the curated all-ages weekly COVID-19 deaths records reported by JHU to re-distribute the delayed deaths to earlier weeks (6.1). The posterior median and 95% credible intervals in Figure 5 illustrate how the reporting delays apparent in the CDC data are adjusted by using the JHU data as external calibration, and how the discretized CDC data informs our estimates of the age composition of deaths. Using our model in conjunction with the curated JHU data on all COVID-19 deaths, we can estimate cumulative mortality for any age band. We find that tragically, as of September 25, 2021, the cumulative mortality rates in individuals aged 85+ now exceed 3% in Texas and New York (Supplementary Figure S14 ).

We validated our estimates of the age profile of COVID-19 attributable deaths using an external data set, i.e. that was not used to inform the model. The external data set Figure 5 : Predicted age-specific COVID-19 attributable deaths in the four most populated US states. Shown are the posterior median (line) and 95% credible intervals (ribbon) of the predicted weekly COVID-19 attributable deaths obtained with (3.4). The CDC data are shown with dots. The start of the 2021 summer resurgence period is indicated as dashed vertical line.

includes age-specific COVID-19 deaths reported directly by US states DoH (presented in Section 2). We computed the empirical age-specific contribution to weekly COVID-19 attributable deaths from the DoH data and checked if the latter lied inside the 95% posterior credible interval estimated by our model. We find a 93.82% prediction using the regularised B-splines projected 2D GP model across the four states, California, Florida, New York and Texas. State stratification of the prediction performance are presented in Supplementary Table S4. For completeness, we also fitted the model using alternative smoothing priors. Supplementary Section S11.1 shows the estimated age-specific composition of the COVID-19 attributable deaths in three weeks in Florida, as obtained with a standard 2D GP smoothing prior, standard 2D B-splines, Bayesian P-splines and the regularised Bsplines projected 2D GP. Moreover, Supplementary Table S4 presents the accuracy of their estimates compared to the external data set, ranging from 94.31% to 95.29%. Under the model specified with a Bayesian P-splines and standard 2D B-Splines surface, the estimated 2D surface over ages and weeks was wiggly and had a large uncertainty. Moreover, better predictive performances, quantified by the expected log pointwise predictive density Verity et al. (2020), were obtained for the B-Splines projected 2D GP compared to all other methods (Supplementary Table S5 ). Those results motivate the use of our proposed prior, the B-Splines projected 2D GP.

6.2 Strong summer 2021 resurgence in age-specific COVID-19 deaths are associated with limited vaccine coverage

Since July 2021, COVID-19 cases started to increase substantially (John Hopkins University of Medicine, 2020) and with them COVID-19 deaths in all age groups ( Figure 5 ). We define the summer 2021 resurgence as the upward trend in COVID-19 attributable deaths spanning from July 03, 2021 to September 25, 2021 across the United States, and denote by w start-r m the week index corresponding to the first week included in the summer 2021 resurgence period for state m. The week index w start-r m is defined as the first week from July 01 2021 for which a 4-week central moving average on the all-age weekly COVID-19 deaths was increasing. The state-specific start of the resurgence period is indicated by a dashed vertical line in Figure 5 . As is evident from the empirical all-age data and confirmed in our reporting-adjusted age-specific estimates, the increase in COVID-19 deaths was highly heterogeneous across US states, with small increases seen in California and New York, and large increases seen in Florida and Texas, relative to the magnitude of COVID-19 deaths in the previous waves.

We hypothesised that the variation in the magnitudes of the summer 2021 resurgence in COVID-19 deaths could be associated with differences in how comprehensively agespecific populations in each state took up the COVID-19 vaccine offer. To test this hypothesis, we attached as transformed parameter to the model presented in Section 3.1, age-specific predicted COVID-19 deaths relative to previous waves over the summer 2021. The relative COVID-19 deaths for state m, in age group c, at week w in summer 2021, denoted by r m,c,w , are defined as the ratio of the expected posterior predictive weekly deaths in week w to the maximum of the expected posterior predictive weekly deaths attained before the summer 2021 resurgence period in the same state and age group,

for c ∈ C and w ≥ w start-r m , where µ m,c,w are the expected posterior predictive weekly deaths for state m, in age group c, at week w defined in (3.5), and µ ,max-pre-r m,c = max {µ m,c,w } w<w start-r m } . For simplicity we here suppress in the notation the conditional dependence on the data. By dividing the mean posterior predictive weekly deaths by their maximum value attained on previous waves, the resulting relative COVID-19 deaths are adjusted for state-specific factors that are known to influence the magnitude of SARS-CoV-2 infections such as population density, stringency of non-pharmaceutical interventions, and comprehensiveness of behaviour changes.

We then assessed if the relative resurgences in COVID-19 deaths across age groups, weeks and states could be associated with differences in pre-resurgence vaccination rates, within a random-effects meta-regression model across states. The pre-resurgence vaccination rates are defined as the proportion of fully vaccinated individuals 14 days before the start of the summer 2021 resurgence period and denoted by v pre-r m,c for state m and age group c. We formulated the model as follows,

The χ m,c correspond to state-and age-specific baseline terms at the start of the summer resurgence, and the ψ m,c capture state-and age-specific growth rates. Note that the model also allows for indirect effects of vaccination rates in the other age group. Random effects and fixed effects (in terms of vaccination rate) were specified to the baseline terms, and fixed effects to the growth rates,

The model structure was motivated by the fact that by the start of the summer resurgence, the epidemics in each state had widely different magnitudes, likely depending on several state-specific unobserved factors, such as non-pharmaceutical intervention measures, mask use, or mobility, which we aimed to capture with the χ state m,c baseline random effects. Next, we hypothesized that after accounting for baseline differences, deaths would grow at smaller rates in populations Figure 6 : Resurgence in COVID-19 deaths and pre-resurgence vaccination coverage during the summer 2021 in four states. Posterior median estimates (line) along with 95% credible intervals (ribbon) of the relative COVID-19 deaths (6.1) over the summer 2021 resurgence period are shown on the y-axis for four states. The relative death estimates for each age group were aggregated to the reporting strata of the vaccination data. Week indices of the summer 2021 resurgence are shown on the x-axis, and correspond to different calendar weeks for each state (see text). The pre-resurgence vaccination rate are shown in color. Row facets show resurgences in relative COVID-19 deaths in individuals aged 18-64 and 65+. Column facets show vaccine coverage in individuals aged 18-64 and 65+.

with higher vaccine coverage at the start of the summer resurgence. To summarize, in the meta-regression model, the growth in deaths during the summer 2021 resurgence is expressed with initial values in terms of baseline epidemic magnitude and initial vaccination rates, which in turn characterise state-level exponential growth rates of this highly transmissible virus over the resurgence period. We found that model (6.2) without state-specific random effects on the growth rate already fitted the data very well (Supplementary Figures S20-S21) , suggesting a clear negative association of growth rates in deaths and vaccination coverage (after accounting for baseline difference at the start of resurgence, and after accounting for long-term differences in epidemic magnitude across states through r m,c,w ). Figure 6 shows on the x-axis the week indices of the summer 2021 resurgence period and in color the state-specific pre-resurgence vaccination rate among individuals aged 18-64 (left) and 65+ (right). The estimated relative COVID-19 deaths (6.1) are shown on the y-axis, now aggregated by the reporting strata of the vaccination data. We show results for four states for visual clarity; please see Supplementary Figures S15-S19 for results across the 10 most populated US states California, Florida, Georgia, Illinois, Michigan, New York, North Carolina, Ohio, Pennsylvania and Texas. Across US states, we observe that the resurgences in the weekly, relative COVID-19 attributable deaths were stronger in individuals aged 18-64 compared to individuals aged 65+, in line with higher vaccination rates in the 65+ age group compared to the 18-64 age group (note the scales on the y-axes).

We evaluated model (6.2) on the relative COVID-19 deaths of the ten most populated US states, California, Florida, Georgia, Illinois, Michigan, New York, North Carolina, Ohio, Pennsylvania and Texas. We find that the age-specific vaccination rates in individuals aged 18-64 and 65+ were statistically significantly associated with a protective effect on averting a strong summer 2021 resurgence in COVID-19 deaths. These relationships are visualized in Figure 6 and further in Supplementary Figure S22 , which shows effect sizes as a forest plot, indicating that higher proportions of fully vaccinated individuals prior to resurgence were statistically significantly associated with smaller relative COVID-19 deaths among the same age group. Specifically, the baseline effect of vaccination in the same age group (χ vacc ) was strongly negatively associated with resurgent deaths, with a Bayesian p-value of being non-negative of 0.025%. Similarly, the effect of vaccination in the same age group on the rate of exponential increase in resurgent deaths (ψ vacc ) was strongly negatively associated with resurgent deaths, with a Bayesian p-value of being non-negative of 0%.

With regard to indirect effects, all the terms χ vacc-cross

, and ψ vacc-cross 65+ were negative but not statistically significantly so in terms of the 95% posterior credible intervals. Yet, such negative associations are consistent with previous observations that identified young adults as being the main spreaders of COVID-19 across all age groups (Monod et al., 2021; Wikle et al., 2020) . In this context, infectious disease theory predicts that reducing infections in the younger adults, has substantial indirect benefits in averting COVID-19 deaths both in younger adults and the elderly. This rationale has been widely demonstrated for other highly transmissible pathogens such as seasonal influenza (Baguelin et al., 2013) , which is primarily transmitted through children and their parents, and so achieving high vaccine coverage in these populations is optimal to minimize deaths and hospitalisations, and maximize other cost-benefit metrics. The effect sizes associated with higher vaccine coverage are thus consistent with infectious disease theory, and underscore the tremendous importance that younger adults get vaccinated against COVID-19 if they are medically fit to halt future COVID-19 resurgences among their age group.

To illustrate the potential impact of the inferred associations between vaccine coverage in younger adults and resurgent COVID-19 deaths, we considered counterfactual analyses. Specifically, we project the weekly COVID-19 deaths in all age groups assuming that the pre-resurgence vaccine coverage among 18-64 year olds in all states would have been the same as the maximum pre-resurgence vaccine coverage among 18-64 year olds across the ten most populated states, and assuming that the estimated associations are causal. Posterior median estimates of the predicted weekly deaths (line) and 95% credible intervals (ribbon) are shown for the observed COVID-19 deaths (black) and the counterfactual (purple). The start of the 2021 summer resurgence is shown as dashed vertical line. In Florida, an estimated 57. 15% [40.96, 70.47 ] of deaths in the 18-64 age group, and 20.09% [-29.54, 54 .04] of deaths in the 65+ age group could have been avoided. In Texas, an estimated 39. 80% [20.81, 54 .85] of deaths in the 18-64 age group, and 13.41% [-41.91, 49 .61] of deaths in the 65+ age group could have been avoided.

The maximum pre-resurgence vaccine coverage among 18-64 year olds was reported in New York, and was equal to 46%. We consider this counterfactual scenario realistic, as it is a retrospective evaluation rather than a forecast, and within the observed range of vaccination rates, i. e. not an extrapolation. In the counterfactual, we did not consider higher vaccination rates in individuals aged 65+ because the differences in vaccination rates across states were much smaller in this age group, and we here aim to highlight the potential consequences of the more substantial variation in vaccination rates in younger individuals. Table S6 .

We present a Bayesian non-parametric modelling approach to estimate and report longitudinal trends in COVID-19 attributable deaths by 1-year age bands across the United States. The model is informed by COVID-19 attributable weekly deaths reported by the CDC, the official agency providing such data at the finer spatial resolution in the US. In comparison to the crude CDC data, the predictions are for all weeks, and account for missing and censoring present in the crude data. The predictions of the model also adjust for reporting delays as e. g. seen in November to December 2020 in Florida and for under-reporting as e. g. seen in December 2020 to March 2021 in New York ( Figure 5 ), by allowing the use of an external dataset for calibration, for which we used the curated JHU all-age COVID-19 attributable deaths. Importantly, the Bayesian model predicts COVID-19 attributable deaths by 1-year age band, which for example allows reporting of cumulative mortality rates in any age category of interest. Finally, the model implementation is portable, freely available and computationally efficient through its implementation in Stan and the use of B-splines projected Gaussian process priors. For these reasons, we find that the model is a useful tool to improve the reporting of COVID-19 deaths and facilitate demographic and epidemiologic analyses at finer granularity.

Our analyses of the US summer 2021 resurgence provide further support for the hypothesis that lower vaccine coverages in several states, most notably Florida and Texas, are associated with larger COVID-19 death counts (Sah et al., 2021; Moghadas et al., 2021) . However, there are a large number of behavioural, governmental and state specific differences that modify this effect size. We have here attempted to agnostically account for these via random effect terms, but it should be noted that we do not explicitly model these factors.

We developed B-splines projected Gaussian process priors to describe the time trends in the age profile of COVID-19 deaths through a non-parametric 2D random surface. We placed GP priors on the B-splines coefficients, showed that the model is again a GP, and derived the B-splines projected kernel function of the GP. We also showed that the model can be interpreted as regularised B-splines, and find that the negative determinant of the covariance matrix in the log-likelihood, often referred to as Occam's factor (MacKay (2003), chap. 28), plays the same role as the penalty of smoothing splines and P-splines (O'Sullivan, 1986; Eilers and Marx, 1996) . Compared to a standard 2D GP, improvements in runtimes lie both in the fact that fewer parameters on the 2D surface are estimated (from n × m to I × J), and that the computational complexity in calculating covariance matrices is reduced (from O(2(n × m) 3/2 ) to O(2(I × J) 3/2 )), where I and J are the number of B-splines on the x and y axes, and n and m are the number of points on the x and y axes and with I < n and J < m. On simulations and on real data, we find the model induces better smoothness properties than standard B-splines (d. Boor, 1978) and Bayesian P-splines (Lang and Brezger, 2004) where these are warranted, and so we find the B-splines projected Gaussian process priors are likely an appealing addition to the arsenal of Bayesian regularising priors.

Supplementary Figures and Tables.

B-splines and properties of the regularised B-splines projected Gaussian Process prior. 2 parts: (1) Construction of a B-spline curve and a B-spline surface and, (2) Proof that a baseline kernel function projected with cubic B-splines is a C 2 function.

Modelling COVID-19 weekly deaths. 2 parts: (1) Finding weekly death count from censored and missing cumulative death count from the CDC data and, (2) Specifying the likelihood form. Zheng, J., Wu, X., Fang, F., Li, J., Wang, Z., Xiao, H., Zhu, J., Pain, C., Linden, P., and Xiang, B. (2021). "Numerical study of COVID-19 spatial-temporal spreading in London." Physics of Fluids, 33 (4) Figure S9 . Figure S12 : Estimated mean surfaces using non-regularized and regularized B-splines spatial models (strong correlation, γ = 1). Posterior median of the mean surface estimated by (A) standard B-splines, (B) Bayesian P-splines and (C) regularised B-splines projected 2D GP with different number of knots given strongly correlated simulated data (γ = 1). The simulated data are illustrated in Figure S10 . Figure S14: Predicted COVID-19 attributable mortality rates. Posterior median estimates of the mortality rate as of September 25, 2021 (barplot) and 95% credible intervals (error bars) are presented. Column facets indicate the age group among which the COVID-19 mortality rates are predicted. Figure S15 : Retrievable weekly COVID-19 attributable deaths in ten US states. The retrievable weekly deaths are computed using the first order difference of reported cumulative deaths by the CDC (Centers for Disease Control and Prevention, 2021c). Grey cells indicate non-retrievable weekly death counts due to the censoring of cumulative deaths. Orange bars show the missing weekly deaths due to non-reported cumulative deaths. Figure S18: Resurgence in COVID-19 deaths and pre-resurgence vaccination coverage during the summer 2021 in the ten most populated states (part 1). Posterior median estimates (line) along with 95% credible intervals (ribbon) of the relative COVID-19 deaths (6.1) over the summer 2021 resurgence period are shown on the y-axis for ten states. The relative death estimates for each age group were aggregated to the reporting strata of the vaccination data. Week indices of the summer 2021 resurgence are shown on the x-axis, and correspond to different calendar weeks for each state (see main text). The pre-resurgence vaccination rate among individuals aged 18-64 are shown in color. The states are ordered from left to right by the pre-resurgence vaccination rate among individuals aged 18-64. Row facets show resurgences in relative COVID-19 deaths in individuals aged 18-64 and 65+. Figure S19 : Resurgence in COVID-19 deaths and pre-resurgence vaccination coverage during the summer 2021 in the ten most populated states (part 2). Posterior median estimates (line) along with 95% credible intervals (ribbon) of the relative COVID-19 deaths (6.1) over the summer 2021 resurgence period are shown on the y-axis for ten states. The relative death estimates for each age group were aggregated to the reporting strata of the vaccination data. Week indices of the summer 2021 resurgence are shown on the x-axis, and correspond to different calendar weeks for each state (see main text). The pre-resurgence vaccination rate among individuals aged 65+ are shown in color. The states are ordered from left to right by the pre-resurgence vaccination rate among individuals aged 65+. Row facets show resurgences in relative COVID-19 deaths in individuals aged 18-64 and 65+. Figure S20 : Posterior predictive checks of model (6.2) (part 1). Posterior median estimates (line) and 95% credible intervals (ribbon) of the relative COVID-19 deaths during the summer 2021 resurgence period estimated with pre-resurgence vaccination coverage (6.2) are shown in blue. Posterior median estimates (line) and 95% credible intervals (ribbon) of the relative COVID-19 deaths during the summer 2021 resurgence period estimated with weekly deaths data (3.4), on which (6.2) was fitted, are shown in red. Row facets show resurgences in relative COVID-19 deaths in individuals aged 18-64 and 65+. Figure S21 : Posterior predictive checks of model (6.2) (part 2). Posterior median estimates (line) and 95% credible intervals (ribbon) of the relative COVID-19 deaths during the summer 2021 resurgence period estimated with pre-resurgence vaccination coverage (6.2) are shown in blue. Posterior median estimates (line) and 95% credible intervals (ribbon) of the relative COVID-19 deaths during the summer 2021 resurgence period estimated with weekly deaths data (3.4), on which (6.2) was fitted, are shown in red. Row facets show resurgences in relative COVID-19 deaths in individuals aged 18-64 and 65+. Table S4 : Predictive accuracy of the four spatial prior on age-specific weekly COVID-19 attributable deaths data. Proportion of the empirical age-specific contribution to COVID-19 weekly deaths reported by DoH inside the 95% credible interval estimated by the model presented in Section 3.1 under four prior specifications. The four prior are placed on the random surface which specifies the age composition of deaths over time, a standard 2D GP, a standard B-splines surface and our regularised B-splines projected GP. Note that age-specific COVID-19 attributable deaths were not reported by the DoH of New York. Table S6 : Projected avoidable deaths under higher vaccine coverage in individuals aged 18-64 during the summer 2021 resurgence period. In counterfactual scenarios, we predicted the number of COVID-19 deaths assuming that the vaccine coverage in individuals aged 18-64 in all states had been the same as the maximum vaccine coverage in individuals aged 18-64 across states (46% reported in New York). These counterfactuals consider that the inferred associations between resurgent deaths and vaccine coverage are causal, which may be incorrect. Posterior median estimates and 95% credible interval of the projected avoidable deaths (left) and proportion of deaths that were avoidable (right) over the summer 2021 resurgence period in the 10 most populated US states are shown.

S9 B-splines and properties of the regularised B-splines projected Gaussian Process prior S9.1 Construction of a B-spline curve and a B-spline surface

We start this section by constructing a B-spline basis function, also called B-spline segment, which we refer to as B-spline. A B-spline is entirely defined by a polynomial degree, d, and a non-decreasing sequence of knots (t 1 ≤ · · · ≤ t K ) spanning in X ⊆ R, where K is the number of knots. For the B-spline to cover the entire span of knots, the sequence of knots has to be extended on the left (d times) and on the right (d times), (ξ 1 ≤ · · · ≤ ξ d ≤ ξ d+1 ≤ · · · ≤ ξ d+K ≤ ξ d+K+1 ≤ · · · ≤ ξ 2d+K+1 ), where ξ d+1 := t 1 , . . . , ξ d+K := t K . The value of the additional knots is generally set to the boundary knots (Perperoglou et al., 2019) . The order of a B-spline, denoted by p, is equal to d−1. A B-spline of order p, written B i,p (x), is defined by the recursive formula,

, for x ∈ X and i = 1, . . . , I with I = K + d − 1. The above is usually referred to as the Cox-de Boor recursion formula (d. Boor, 1978) .

Let us define C k (A; B) as the set of C k functions from A to B.

Proposition 1. Let a B-spline of degree d, constructed in (S9.1) given a vector of strictly increasing sequence knots (no knot occurs more than 1 time), spanning over X . Then, the B-spline is a C d−1 (X ; R + ).

Proof. Recall from Goldman (2002) that a B-spline of degree d, constructed in (S9.1) given a vector of strictly increasing sequence knots, spanning over X is of continuity C ∞ (X ; R + ) piecewise, and of continuity C d−1 (X ; R + ) on the knots. Therefore, the same B-spline is C d−1 (X ; R + ) everywhere.

Next, we construct a B-spline curve of degree d, a linear combination of coefficients and B-splines of degree d, given by,

..,I are the associated B-spline coefficients, also referred to as control points or de Boor points.

The surface analogue to the B-spline curve is the B-spline surface. The surface is defined on the space X 1 × X 2 with X 1 ⊆ R and X 2 ⊆ R. We start by defining two non decreasing sequences of knots, t 1 = (t 1 1 ≤ · · · ≤ t 1 K1 ) and (t 2 1 ≤ · · · ≤ t 2 K2 ) placed respectively on the space X 1 and X 2 , where K 1 is the number of knots along the first axis, and K 2 along the second axis. The corresponding integral of the B-spline surface evaluated at x 1 ∈ X 1 and x 2 ∈ X 2 is given by,

where B 1 i,d1−1 (.) is the ith B-spline of degree d 1 defined over the space X 1 on the knot vector t 1 . Similarly, B 2 j,d2−1 (.) is the jth B-spline of degree d 2 defined over the space X 2 on the knot vector t 2 . The total number of B-splines defined over the space X 1 is I K 1 + d 1 − 1 and over the space X 2 it is J K 2 + d 2 − 1. We denote the ensemble of pairs of B-spline's indices by 

We follow the notations introduced in Section S9.1 and denote the ith and jth cubic Bspline by B 1 i (x 1 ) and B 2 j (x 2 ), with i = 1, . . . I and j = 1, . . . , J 1 and where x 1 ∈ X 1 ⊆ R and x 2 ∈ X 2 ⊆ R. They are respectively defined on two vectors of strictly increasing knots of lengths K 1 and K 2 . For ease of notation, we denote the ensemble of B-spline indices by U defined in (S9.4).

Let us define a low-rank covariance matrix K β of size IJ × IJ obtained by evaluating at every pair of points in u, u ∈ U a base kernel function k β , K β u,u = k β u, u .

(S9.5)

Let us define a high-rank covariance matrix projected by cubic B-splines by W , obtained by evaluating at every pair of points a kernel function w, W (x1,x2),(x 1 ,x 2 ) = w(x 1 , x 2 , x 1 , x 2 ) with,

where (x 1 , x 2 ), (x 1 , x 2 ) ∈ X 1 × X 2 , and

We repeat here the definition presented in S9.1: we define C k (A; B) as the set of C k functions from A to B.

Proposition 2. Let w(·) be defined as in (S9.6) and S = X 1 × X 2 × X 1 × X 2 ⊆ R 4 . Then, w(·) is a C 2 (S; R) function.

Proof. In order to show that w(x 1 , x 2 , x 1 , x 2 ) ∈ C 2 (S; R) it is sufficient to show that all its second partial derivatives are continuous functions on S. We show this for the case of the partial derivative ∂ 2 w ∂x 2 1 and the mixed derivative ∂ 2 w ∂x1∂x2 , as the proof for the remaining second partial derivatives is identical. The proof is constructed as follows: (1) we compute ∂ 2 w ∂x 2 1 and ∂ 2 w ∂x1∂x2 and (2) we show that ∂ 2 w ∂x 2 1 and ∂ 2 w ∂x1∂x2 are continuous.

Step (1): A direct computation shows,

where we have used the expression of D as stated in (S9.7) and U is defined in (S9.4).

Step (2): We show that (S9.8) is a continuous function on S. As stated in Proposition 1, the functions B 1 u1 (·) and B 2 u2 (·) are C 2 (X 1 ; R + ) and C 2 (X 2 ; R + ) functions, respectively. Therefore,

u2 (x 2 ) are continuous function on X 1 and X 2 , respectively. Moreover, recall that k β is a kernel function, so for any fixed (u 1 , u 2 ), (u 1 , u 2 ) ∈ U , k β (u 1 , u 2 , u 1 , u 2 ) ∈ R. By standard results of Multivariate Calculus (Lax and Terrell (2017) Theorem 2.5), for any fixed u 1 , u 2 , u 1 , u 2 the term

is a continuous function from S to R. Moreover, the RHS of (S9.8) is a linear combination of continuous function from S to R, therefore it is continuous from S to R. Hence, ∂ 2 w ∂x 2 1 is a continuous function from S to R. Now, we show that (S9.9) is a continuous function on S. As stated in Proposition 1, the functions B 1 u1 (·) and B 2 u2 (·) are C 2 (X 1 ; R + ) and C 2 (X 2 ; R + ) functions, respectively. Therefore, ∂B 1 u 1 ∂x1 (·) and ∂B 2 u 2 ∂x2 (·) are continuous functions on X 1 and X 2 , respectively. Moreover, recall that k β is a kernel function, so for any fixed (u 1 , u 2 ), (u 1 , u 2 ) ∈ U , k β (u 1 , u 2 , u 1 , u 2 ) ∈ R. By standard results of Multivariate Calculus (Lax and Terrell (2017) Theorem 2.5), for any fixed u 1 , u 2 , u 1 , u 2 the term

is a continuous function from S to R. Moreover, the RHS of (S9.9) is a linear combination of continuous functions on from S to R, therefore it is continuous from S to R. Hence, ∂w ∂x1∂x2 is a continuous function from S to R. The proof for the other terms follows from a similar argument, with the necessary replacement of sets and functions. Therefore w(·) is a C 2 (S; R) function. S10 Modelling COVID-19 weekly deaths S10.1 Finding weekly death count from censored and missing cumulative death count

We denote by W cum = {1, . . . , W cum } the set of week indices from the first to the last reported cumulative deaths by the CDC. The reports span from May 2, 2020 to October 02, 2021 such that W cum = 75. We denote by D m,b,w the reported cumulative deaths by the CDC in state m for age group b in week w. The cumulative deaths count D m,b,w is reported if the count count is 0 or strictly greater than 9, and otherwise it is censored. For simplicity we suppress the state index in what follows, with all equations being analogous. We denote by W censored b the set of weeks where D b,w has been censored.

For a given age group b, if D b,w is reported on weeks w and w + 1, we find the weekly deaths among age group b in week w, d b,w , with the first order difference,

The weekly deaths that are retrievable from the first order difference are defined as "retrievable" and the set of week indices for which we can retrieve the weekly deaths is denoted by W WR b (WR := Week Retrievable). However, if D b,w is censored for w or w+1, the first-order difference is not obtainable and the weekly deaths is said to be "nonretrievable". We denote by W WNR b the set of week indices for which we cannot retrieve the weekly deaths (WNR := Week non-retrievable), with W WR b ∩ W WNR b = ∅. Notice that for a fixed age group b, weekly deaths are non-retrievable for a joint sequence of weeks over time, and importantly this period can happen only once (because the cumulative deaths are strictly increasing). Therefore, we consider the sequence of nonretrievable weekly deaths for a fixed age group together. Because the boundaries of the censored cumulative death are known, we can obtain the boundaries of the sum of the non-retrievable weekly deaths sequence. Let us denote by w FC b the first week when the cumulative death count is censored (FC := First Censored). Such that,

(S10.2) Similarly, we denote by w FNC b the first week when the cumulative death count is observed after being censored (FNC := First Non Censored). Such that,

Four different scenarios may apply.

1. If D b,w is censored for some weeks w ∈ W cum but is observed on the first and the last weeks (i.e., w = 1 and w = W cum ), the non-retrievable weekly deaths must sum to the first positive cumulative deaths, such that,

, . . . , (W cum − 1)}. We show an example of this scenario in Table S7. 2. If D b,w is censored for some weeks w ∈ W cum , including the first week (i.e., w = 1), but is observed for the last week (i.e., w = W cum ), the sum of non-retrievable weekly deaths must be between,

, . . . , (W cum − 1)}. We show an example of this scenario in Table S8. 3. If D b,w is censored for some weeks w ∈ W cum , including the last week (i.e., w = W cum ), but is observed for the first week (i.e., w = 1), the sum of the nonretrievable weekly deaths must be between,

does not exist. We show an example of this scenario in Table S9. 4. Lastly, if D b,w is censored for all weeks w ∈ W cum , including the first week (i.e., w = 1) and the last week (i.e., w = W cum ), the sum of the non-retrievable weekly deaths must be between,

does not exist. We show an example of this scenario in Table S10 .

The CDC did not publish a report on July 4 2020, such that the cumulative deaths in the corresponding week are not available. We denote this week by w missing . The weekly deaths could not be obtained for the missing week or the week before, such that,

Weekly deaths that occurred on those w missing and w missing − 1 are defined to be missing. It is important to note the difference between non-retrievable weekly deaths, for which we have information through the censoring of cumulative deaths, and missing weekly deaths for which no information has been provided. Note that if the index of the missing week w missing and the week before w missing − 1, are in between the range of W censored b , or are before or after the set's limits, they are included in W censored b , with all inequations still holding. Table S10 : Finding weekly death when D w is censored as in scenario 4. In this example W censored = {1, 2, 3, 4, 5}, w FC = 1, w FNC does not exist, W WNR = {1, 2, 3, 4} and W WR = ∅. From (S10.7), we know that 0 ≤ w∈W WNR d w ≤ 8.

In this section, we use the notation introduced in Section S10.1. The log likelihood is,

The log likelihood for the retrievable weekly deaths in (S10.8a) is

The log likelihood for the non-retrievable weekly deaths in (S10.8b) is

where the boundaries for the different scenarios have been discussed in Section S10.1. Note that to find (S10.10), we used the fact that the sum of independent negativebinomially distributed random variables with shape parameter α 1 and α 2 , and with the same scale parameter θ is negative-binomially distributed with a scale parameter θ and with a shape parameter α 1 + α 2 . S11 Additional analyses S11.1 Comparison of random surface priors for one US state

We used the non-parametric model (3.1-3.2) to fit age-specific weekly COVID-19 attributable deaths from the CDC data. We compare results, for one state, Florida, across the four prior specifications on the random surface which specifies the age composition of deaths over time, a standard 2D GP, a standard B-splines surface and our regularised B-splines projected GP. Figure S24 shows the estimated age-specific composition of weekly COVID-19 attributable deaths in three weeks for the four prior specifications. The posterior distribution of the age profile of COVID-19 attributable deaths estimated with the standard B-splines and Bayesian P-splines prior is rough because the model's complexity is not penalized, or not enough. Such a penalty is introduced in our regularised B-splines projected GP prior, which results in a less wiggly posterior distribution with less uncertainty. To quantify the predictive performance of the model, we used the expected log pointwise predictive density (ELPD) (Verity et al., 2020) (Supplementary Table S5 ). The best performance was obtained by the regularised B-splines projected GP. In particular, the expected log pointwise predictive density (ELPD) difference, between a standard GP and a regularised B-splines projected GP was -22.03 (sd 11.99), suggesting significantly better predictive performance with the regularised B-splines projected GP prior. In addition, a standard GP requires a longer running time (2.34 times more). Similar results were obtained for all other states, with the regularised B-splines projected GP prior consistently outperforming the other standard methods. Figure S24 : Estimated age-specific contribution to weekly COVID-19 attributable deaths by four priors. Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Four prior placed on the random surface which specifies the age composition of deaths over time are compared, a standard 2D GP, a standard B-splines surface and our regularised B-splines projected GP. S11.2 Differences in the pre-vaccination age profile of COVID-19 attributable deaths across states

We investigate whether the age composition of COVID-19 attributable deaths differs across states. To address this question, we focus on the baseline period of the first 3 months after the 10th cumulated death in each state, and aggregate the predictions of the share of 1-year age bands from our non-parametric model into the age strata 0-24, 25-54, 55-74, 75-84, 85+ years for each state during the baseline period. In these calculations, we adjusted for age differences in state populations with the age composition of the US population. Pre-pandemic age-specific population counts were drawn from the U.S. Census Bureau in 2018 (United States Census Bureau, 2018). Strikingly, the resulting posterior estimates of the share of age groups among COVID-19 attributable deaths were statistically significantly different across states even after standardising the age composition of state populations ( Figure S25 ). These discrepancies across states are clearly supported in the empirical data, albeit without an assessment of their statistical significance ( Figure S26 ). Notice that the contribution from individuals aged 55+ estimated by the model is always lower or equal to the empirical contribution while the one in 25-54 is always greater or equal. This is because the empirical contributions were computed considering that unretrievable weekly deaths were null while they could, in fact, be positive. Figure S25 : Estimated age-specific contribution to COVID-19 attributable deaths adjusted for population size in the first 3 months of state epidemics. Posterior median estimates of each age group to COVID-19 attributable deaths are shown for each state (barplot), along with 95% credible intervals (error bars). Estimates are for the 3 months after the 10th cumulative death in each state, and are adjusted for differences in the age composition of state populations (see text). Brighter colors indicate higher values of the baseline contribution and darker colors indicate lower values. Figure S26 : Estimated and empirical age-specific contribution to COVID-19 attributable deaths in the first 3 months of state epidemics. Posterior median estimates of each age group to COVID-19 attributable deaths are shown for each state (barplot), along with 95% credible intervals (error bars). Estimates are for the 3 months after the 10th cumulative death in each state, and are not adjusted for differences in the age composition of state populations. Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4) . (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Figure S28 : State summary for Florida. (A) Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4). (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Figure S29 : State summary for Georgia. (A) Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4). (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Figure S30 : State summary for Illinois. (A) Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4). (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Figure S31 : State summary for Michigan. (A) Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4) . (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Figure S32 : State summary for North Carolina. (A) Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4). (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Figure S33 : State summary for New York. (A) Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4) . (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Figure S34 : State summary for Ohio. (A) Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4). (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Figure S35 : State summary for Pennsylvania. (A) Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4) . (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks. Figure S36 : State summary for Texas. (A) Weekly COVID-19 attributable deaths reported by the CDC stacked by age groups (color). (B) Posterior mean of the weekly COVID-19 attributable deaths stacked by age, estimated with a regularised B-splines projected GP prior. Three weeks in the observation period are selected and designated by a shape. The black line shows the all-age weekly COVID-19 attributable deaths reported by JHU, that were used to adjust the CDC data for reporting delays (3.4). (C) Posterior median (black line) and 95% credible interval (ribbon) of the age-specific contribution to COVID-19 attributable deaths in three weeks.

Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine

Assessing Optimal Target Populations for Influenza Vaccination Programmes: An Evidence Synthesis and Modelling Study

Estimating weekly excess mortality at sub-national level in Italy during the COVID-19 pandemic

Generalized structured additive regression based on Bayesian P-splines

COVID-19 Vaccinations in the United States

COVID-19 Vaccinations in the United States, Jurisdiction

Weekly Updates by Select Demographic and Geographic Characteristics

A Time-Dependent SIR Model for COVID-19 With Undetectable Infected Persons

A Practical Guide to Splines

Fast and compact smoothing on large multidimensional grids

Flexible smoothing with B-splines and penalties

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Pyramid Algorithms

Multiple Kernel Learning Algorithms

Impact and effectiveness of mRNA BNT162b2 vaccine against SARS-CoV-2 infections and COVID-19 cases, hospitalisations, and deaths following a nationwide vaccination campaign in Israel: an observational study using national surveillance data

Modeling COVID-19 scenarios for the United States

Coronavirus Resource Center

Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola

Bayesian P-Splines

We thank Sam Abbott and two anonymous reviewers for their helpful comments. 

Listing 2: Stan model to fit regularised B-splines projected Gaussian Process on the mean surface of 2D continuous data S12.3 R code to obtain B-splines