key: cord-0151024-y2p1nibm authors: Monod, M'elodie; Blenkinsop, Alexandra; Brizzi, Andrea; Chen, Yu; Perello, Carlos Cardoso Correia; Jogarah, Vidoushee; Wang, Yuanrong; Flaxman, Seth; Bhatt, Samir; Ratmann, Oliver title: Regularised B-splines projected Gaussian Process priors to estimate time-trends of age-specific COVID-19 deaths related to vaccine roll-out date: 2021-06-23 journal: nan DOI: nan sha: 75a14f8c316d6f23099d6fe8fc58e24f1f432e6b doc_id: 151024 cord_uid: y2p1nibm The COVID-19 pandemic has caused severe public health consequences in the United States. In this study, we use a hierarchical Bayesian model to estimate the age-specific COVID-19 attributable deaths over time in the United States. The model is specified by a novel non-parametric spatial approach, a low-rank Gaussian Process (GP) projected by regularised B-splines. We show that this projection defines a new GP with attractive smoothness and computational efficiency properties, derive its kernel function, and discuss the penalty terms induced by the projected GP. Simulation analyses and benchmark results show that the spatial approach performs better than standard B-splines and Bayesian P-splines and equivalently well as a standard GP, for considerably lower runtimes. The B-splines projected GP priors that we develop are likely an appealing addition to the arsenal of Bayesian regularising priors. We apply the model to weekly, age-stratified COVID-19 attributable deaths reported by the US Centers for Disease Control, which are subject to censoring and reporting biases. Using the B-splines projected GP, we can estimate longitudinal trends in COVID-19 associated deaths across the US by 1-year age bands. These estimates are instrumental to calculate age-specific mortality rates, describe variation in age-specific deaths across the US, and for fitting epidemic models. Here, we couple the model with age-specific vaccination rates to show that lower vaccination rates in younger adults aged 18-64 are associated with significantly stronger resurgences in COVID-19 deaths, especially in Florida and Texas. These results underscore the critical importance of medically able individuals of all ages to be vaccinated against COVID-19 in order to limit fatal outcomes. A new pathogen, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), emerged in the Wuhan region of China in December 2019 and continues to spread worldwide. The resulting disease, COVID-19, is severe, with overall infection fatality rates (IFRs) between 0.1% and 1% (Meyerowitz-Katz and Merone, 2020; Brazeau et al., 2020) , which increase exponentially with age (Levin et al., 2020) . Developed vaccines are highly effective to prevent deaths (Haas et al., 2021; Baden et al., 2021) , and initially have been prioritised to older age groups. In the United States (US), vaccines are now offered to all adults since May 2021, but uptake has been variable across states (Centers for Disease Control and Prevention, 2021a) . Despite increasing vaccine coverage, several US states, most notably Florida and Texas, are reporting resurgent COVID-19 death waves that are on par or exceed in magnitude the death waves seen in the prevaccination period. A key question is if these death waves since summer 2021 are linked to low vaccination coverage in younger adults, which are known to drive transmission (Monod et al., 2021) . Here, we provide methods for estimating the age composition of COVID-19 attributable deaths over time, and to characterise the potential shifts in the age composition of deaths. Doing so requires a statistical model because data are partially censored, reported with delays, and reported in age bands that can be inconsistent across locations or do not match those of other data streams such as reported vaccinations. Numerous methods have been developed to interpret the time evolution of overall COVID-19 attributable deaths (e.g., Lavezzo et al. (2020) ; Institute for Health Metrics and Evaluation (2020); Chen et al. (2020) ; Blangiardo et al. (2020) ; Zheng et al. (2021) ; Flaxman et al. (2020) ), but few are suitable to analyze longitudinal data stratified by age brackets or other discrete strata (Monod et al., 2021) , let alone at high resolution such as 1-year age bands. One reason for this paucity of methods is that in SIR-type models, calculating age-specific next generation operators at a time resolution of days becomes computationally slow over observation periods that span years, especially when calculations need to be repeated millions of times within a Bayesian framework (Wikle et al., 2020; Monod et al., 2021) . These considerations are prompting us and others to consider non-mechanistic, flexible estimation approaches (Shah et al., 2020; Pokharel and Deardon, 2021) . As our objective is to reconstruct spatiotemporal trends in age-specific COVID-19 attributable deaths, it does not require invoking many of the assumptions or complexities that underlie mechanistic or semi-mechanistic COVID-19 transmission dynamics models and Bayesian non-parametric models are sufficient. We present a fully Bayesian, computationally scalable approach to estimate a two-dimensional (2D) surface over ages and weeks that describes the time evolution of COVID-19 deaths by 1-year age bands at US state level. To impute missing entries on the surface and estimate global trends over ages and weeks, we borrow information across neighboring entries by using a non-parametric smoothing method. A natural starting point for modelling a surface is a 2D Gaussian process (GP) (Rasmussen and Williams, 2005) . However, their computational complexity makes the use of 2D GPs in a fully Bayesian framework difficult when the surface dimension becomes large, even after using a Kronecker decomposition of the kernel function (Saatçi, 2011; Wilson et al., 2014) . Here we adopt a low-rank approximation via a tensor product of B-splines for which the parameters follow a 2D GP. The resulting approach is equivalent to a 2D GP defined by a low-rank covariance matrix projected by B-splines. B-splines are a popular choice for non-parametric modelling, due to their continuity properties, ensuring smoothness of the fitted surface, and their easy implementation. But choosing the optimal number and position of knots-the defining grid segments where the surface is expected to change its behavior-on the space of ages and weeks is a complex task. Some approaches have focused on adding a penalty to restrict the flexibility of the fitted surface in a frequentist framework (O'Sullivan, 1986,9; Eilers and Marx, 1996; Eilers et al., 2006) . Following this idea, we regularise the fitted surface by using a kernel function with a free complexity parameter to define the covariance matrix of the low-rank 2D GP. We qualitatively compare the penalty induced by this choice to that of related regularisation methods. We benchmark the proposed regularised B-splines projected GP against several other popular smoothing methods, and we demonstrate that our approach results in substantive computational gains over a standard 2D GP for similar estimation accuracy. This paper focus on estimating age-specific COVID-19 attributable deaths in the context of vaccine roll-out on publicly available, age-specific COVID-19 death and vaccination data from the 4 most populated US states, California, Florida, New York (state) and Texas, that are reported by the Centers for Disease Control and Prevention (CDC) (2021a; 2021c). It is structured as follows. Section 2 introduces the data and their limitations. Section 3 describes the proposed methodology to model trends in the age-composition of COVID-19 deaths, including a theoretical characterization of the penalties introduced by B-splines projected GPs. Section 4 presents a comparison of the proposed method to related smoothing approaches on simulated data, and Section 5 on real world data used for benchmarking. In Section 6 we estimate the time evolution of age-specific COVID-19 deaths for the four most populated US states. We document the marked variation in the summer 2021 resurgence in age-specific COVID-19 deaths across states, and show strong resurgences in deaths are associated low vaccination coverage in younger adults. Lastly, Section 7 closes with a discussion. The proposed modelling framework can be further applied to estimate COVID-19 mortality rates in arbitrary age groups, investigate differences in the age composition of deaths across locations or identify time shifts in the age composition of deaths. The code to use our approach and to reproduce the results is available at https://github.com/ImperialCollegeLondon/B-SplinesProjectedGPs. Median and 95% credible interval of the state-and age-specific COVID-19 attributable deaths predicted by our approach are available in the GitHub repository. Posterior samples are also available upon request. Our results suggest regularised B-splines projected GP priors are likely useful for other surface estimation problems, and we provide templated Stan model files in Supplement S12. The Center for Disease Control (CDC) and National Center for Health Statistics (NCHS) report each week on the total number of deaths involving COVID-19 that have been received and tabulated through the National Vital Statistics database (Centers for Disease Control and Prevention, 2021c) for every US state across the age groups We refer to the latter as cumulative reported COVID-19 attributable deaths, D m,b,w , for state m, in age group b and on week w. Historical records are made available by Rearc (2021) . To simplify the longitudinal dependence and create a time series from the cumulative counts (King et al., 2015) , we obtain the weekly COVID-19 attributable deaths by differencing, in each location and age group for all but the last available week (daily deaths, d, for state m, in age group b, on week w). We index the weekly deaths by w = 1, . . . , W . Reflecting the reporting nature of the age-stratified CDC data, the weekly COVID-19 attributable d m,b,w are subject to reporting delays and do not necessarily correspond to the number of individuals who died of COVID-19 in state m and week w. The CDC does not report the cumulative deaths count if the count is between 1 and 9. Thus, it is not possible to retrieve all weekly deaths, and we refer to the set of weeks that are retrievable in state m for age group b through first order differencing by W WR m,b , and to the set of weeks that were non-retrievable because of censoring by W WNR m,b . The censored cumulative deaths are bounded, such that the sum of non-retrievable weekly deaths is also bounded. The exact computations are described in Appendix S10.1. In addition, there was no update on July 4, 2020 resulting in missing cumulative deaths in that week. The weekly deaths of that week and the preceding week are declared missing. Note that the missing weekly deaths are not equivalent to the non-retrievable weekly deaths because we cannot bound their sums. In the main text, we focus on data reported from May 2, 2020 to September 25, 2021. We thus have W = 74. Figure 1 show the COVID-19 attributable weekly deaths data for the four most populated US states, California, Florida, New York and Texas. The CDC reports weekly time series of the proportions of individuals aged 18-64 and 65+ who are fully vaccinated (Centers for Disease Control and Prevention, 2021b), referred to as vaccination rate in the following. Fully vaccinated individuals are defined as having received the second dose of a two-dose vaccine or one dose of a single-dose vaccine. We use records from March 02, 2021 to September 25, 2021, and denote by v m,c,w the vaccination rate for state m in age group c ∈ C at the start of week w, (2.3) Figure 2 presents the vaccination rates among individuals aged 18-64 and 65+ reported in the four most populated US states, California, Florida, New York and Texas. Figure 1 : Retrievable weekly COVID-19 attributable deaths in four US states. The retrievable weekly deaths are computed using the first order difference of reported cumulative deaths by the CDC (Centers for Disease Control and Prevention, 2021c). Grey cells indicate non-retrievable weekly death counts due to the censoring of cumulative deaths. Orange bars show the missing weekly deaths due to non-reported cumulative deaths. We use two additional data sets. Firstly, we retrieved the weekly COVID-19 attributable deaths regardless of age from John Hopkins University (JHU) for all US states from May 2, 2020 to September 25, 2021 (John Hopkins University of Medicine, 2020), which we denote by d JHU m,w for state m and week w. In these data, reporting-delayed deaths are back-distributed where possible, and for this reason we use the overall deaths as a reference to mitigate the reporting delays in the age-stratified CDC data (2.2). Secondly, age-specific weekly COVID-19 attributable deaths are also directly reported by state Departments of Health (DoH) on their websites, or data repositories. For California, Florida and Texas, data records are available up to April 1, 2021 at https://github.com/ ImperialCollegeLondon/US-covid19-agespecific-mortality-data. The DoH do not censor low death counts, and the all-ages sum of the DoH data correspond well to JHU data. This prompts us to use the DoH data as an independent data set to assess the accuracy of our model estimates that are derived from the CDC data. For simplicity we suppress the state index in what follows, with all equations being analogous. Our aim is to estimate the weekly deaths by 1-year age bands, a ∈ A = {0, 1, . . . , 104, 105} in week w ∈ W = {1, . . . , W }, and we denote their expectation by µ a,w . We first decompose µ a,w as the product of the weekly deaths for all ages λ w with the relative contribution π a,w of age a to weekly deaths, where a π a,w = 1 for all w. Second, we model the longitudinal age composition of deaths π a,w as a bivariate random function (a, w) → f (a, w) ∈ R, that is exponentiated, normalised, and evaluated on the 2D grid A × W. Our basic latent model structure is thus . (3.1) To link the expected weekly deaths by 1-year age band, µ a,w , to the data (2.2), we aggregate them over the age groups specified by the CDC, µ b,w = a∈b µ a,w , for all b ∈ B. Then, we model the observed, weekly deaths in age b and week w for all ages and weeks through Negative Binomial distributions in the shape-scale parameterisation, with mean µ b,w and variance σ 2 b,w , and where ν > 0 is interpreted as an overdispersion parameter. The purpose of the shape-scale parameterisation with identical θ (3.2a) is that then, the weekly deaths conditional on their total follow a Dirichlet-Multinomial distribution with parameters α b,w , resulting in the succinct identity α b,w = a∈b α a,w (Townes, 2020). For implementation, notice that the shape and scale parameters can be rewritten as α b,w = µ b,w /ν and θ = ν/(1 + ν). Our Bayesian model comprises as log likelihood the sum of the log Negative Binomial densities (3.2a) over retrievable weekly deaths parameterised by µ b,w and ν. Considering data censoring, similar terms involving log cumulative density functions of the Negative Binomial that bound sums of the non-retrievable weekly deaths are added in the log likelihood. Taken together, for the collection of weekly deaths where for brevity the term p {d b,w } w∈W WNR b | . . . in (3.3b) is detailed in Appendix S10.2. The priors are specified as follows. First, the prior on the total number of weekly deaths λ w , in (3.3c), is specified in the mean-standard deviation parametrisation. The prior expectation on the total deaths is found by summing the retrievable deaths, such that T w = b∈B,w∈W WR b d b,w . Next, we assume that the standard deviation is equal to twice the first order difference in the total deaths. We find the empirical ratio of the total death relative to their first order difference η, using ordinary least square method without intercept, and specify the standard deviation accordingly. Second, the inverse of the square root of the overdispersion parameter is given a standard normal distribution truncated on the positive support, the recommended prior in this context. Lastly, the model is completed with a prior on the random function f and its hyperparameters φ, which we investigate in detail in the next section. Considering reporting delays in the data, we explicitly allow for a rescaling of the sum of deaths across age groups in the model according to other curated data sets, which effectively re-distributes reporting delayed deaths in the CDC data set to earlier dates. Specifically, we require that our posterior predictions of the COVID-19 attributable deaths in age group a and week w, d a,w , sum to a d a,w = d JHU w . We achieve this by exploiting the probabilistic relationship between Negative Binomial distributions in the shape-scale parameterisation and the Dirichlet-Multinomial, through 4) where as before α a,w = µ a,w /ν. We denote by µ a,w the expected predictive COVID-19 attributable deaths in age group a at week w, (3.5) In (3.1), we introduced a 2D function f (a, w), for which we shall find a prior in this section. Let the number of points of the age axis be n = |A| and on the week axis be m = |W|. The total number of points on the grid is N = n × m. The ensemble of pairs of points is X = x 1 , . . . , x N = A × W. We now investigate different modelling approaches for the function f (a, w). Given observations (X, f ) = x 1 , f (x 1 ) , . . . , x N , f (x N ) , we start by considering as model of f a zero-mean 2D GP, The covariance matrix K is evaluated at all pairs of points in X and has entries K x, ) is a kernel function. For computational efficiency and because our output is on a multidimensional grid, we decompose the kernel function, where k 1 (., .) and k 2 (., .) are kernel functions over ages and weeks, respectively. The corresponding covariance matrix K is calculated with the Kronecker product K = K 2 ⊗ K 1 , and the number of operations to evaluate the covariance matrix reduces from O(N 3 ) to O(2N 3/2 ) (Gönen and Alpaydin, 2011; Saatçi, 2011) . For implementation, we note that the order of the product follows from the fact that the matrix's entries are stacked columnwise. We rely on the efficient Kronecker product implementation via matrix-vector products proposed in the Supplementary Material, Section 4 of Wilson et al. (2014) . In our applications, we set k 1 and k 2 to be squared exponential kernel functions with variance scale ζ 2 and specific lengthscales for the rows and the columns, γ 1 and γ 2 . Our priors on the length scales are independent γ i ∼ Inv-Gamma(5, 5) for i = 1, 2, and ζ ∼ Cauchy + (0, 1). For brevity, we denote φ = {ζ, γ 1 , γ 2 }. B-splines basis functions -or, more simply, B-splines -are constructed from polynomial pieces that are joined at certain values over the input space, called knots, and defined by a polynomial degree, d, and a non-decreasing sequence of knots, (t 1 ≤ t 2 ≤ · · · ≤ t K ), where K is the number of knots. Given those, the total number of B-splines is I = K + d − 1. We show how a B-spline is constructed in Appendix S9.1. In this paper, we use cubic B-splines, with d = 3. Moreover, we use equally spaced knots on both dimensions, such that the only tuning parameter is the number of knots. A fundamental property of B-splines, that constitutes most of their attractiveness, is that they are smooth. More rigorously, a cubic B-spline defined on strictly increasing knots is piecewise infinitely differentiable between the knots, and of continuity C 2 on the knots (Goldman (2002) , chap. 7). f can be modelled with a tensor product of B-splines given by is the ith B-spline defined on the knot vector (t 1 1 ≤ t 1 2 ≤ · · · ≤ t 1 K1 ) over the space A with K 1 being the number of knots, i ∈ I = {1, . . . , I} and I = K 1 + 2. Similarly, B 2 j (.) is the jth B-spline defined on the knot vector (t 2 1 ≤ t 2 2 ≤ · · · ≤ t 2 K2 ) over the space W with K 2 being the number of knots, j ∈ J = {1, . . . , J} and J = K 2 +2. The ensemble of B-splines form a matrix basis denoted by B 1 and B 2 of size I ×n and J ×m, respectively. The pairs of B-splines indices is denoted by U = u 1 , . . . , u M = I × J with M = I × J. [β u ] u∈U is a rectangular set of coefficients. In our applications, we obtain standard B-splines surfaces by placing independent standard normal priors on all B-splines coefficients β u . To keep notation streamlined, notice that there are no hyperparameters φ. Given the B-splines indices and corresponding coefficients (U , β) = (u 1 , β u1 ), . . . , (u M , β u M ) , we place a 2D GP on the coefficients, (3.9) The covariance matrix has entries (K β ) u,u = Cov β u , β u = k β u, u where k β (., .) is a kernel function depending on unknown hyperparameters φ. This approach is equivalent to directly placing a GP on f with a covariance matrix projected by B-splines. To show this, we rewrite (3.8) as a matrix calculation, f = (B 1 ) T βB 2 , and find its vectorized form (3.10) The linear operator produces functions from R IJ to R nm . GPs are closed under linear operations (Papoulis and Pillai (2002), chap.10), and so f specified by (3.8-3.9) is the In our applications, we decompose the kernel function k β into two kernel functions over the rows and columns of the B-spline coefficients matrix as in (3.7). We use squared exponential kernel functions with variance scale ζ 2 and specific lengthscales for the rows and the columns, γ 1 and γ 2 . Our priors on the length scale priors are independent γ i ∼ Inv-Gamma(5, 5) for i = 1, 2, and ζ ∼ Cauchy + (0, 1). Under this prior, φ = {ζ, γ 1 , γ 2 }. The kernel function obtained by projecting a base kernel function with cubic Bsplines is C 2 as it can be represented as a linear combination of C 2 functions (proof in Appendix S9.2). Therefore the surface obtained when modelling a surface with the B-spline tensor product (3.11), unlike a standard 2D GP (3.6), has inherent smoothness properties not matter the base kernel functions k β used. We can also interpret (3.11) as a regularised splines method. Indeed, existing regularised spline methods such as smoothing splines (O'Sullivan, 1986,9) and P-splines (Eilers and Marx, 1996; Eilers et al., 2006) aim to minimise in a frequentist setting the loss function whereβ are the estimated coefficients and P is a penalty applied on the second derivative of the fitted curve for smoothing splines and on finite differences of adjacent B-splines coefficients for P-splines. Now adopting a Bayesian approach, our aim is to maximise the probability of the posterior parameter conditional upon the data, which is proportional to the likelihood multiplied by the prior. By placing a GP on the B-splines coefficients, the prior of f modelled with the B-spline tensor product (3.11) can be decomposed in the same way as (3.12), where |A| denotes the determinant of matrix A. Thus, the log determinant of the covariance matrix in (3.9) can be interpreted as a complexity penalty, or regulariser, which comes into play if the kernel has a free parameter to control the model complexity (MacKay (2003) , chap. 28). Bayesian P-splines have been developed as an extension of the frequentist P-splines, in Lang and Brezger (2004) ; Brezger and Lang (2006) , and impose a spatial dependency on the B-splines coefficients to avoid overfitting. Bayesian P-splines are obtained by placing the Gaussian Markov Random Field prior on the B-splines coefficients, (3.14) where β −(i,j) is the matrix β without entry (i, j), δ (i,j),(i ,j ) evaluates to one if (i, j) and (i , j ) are neighboring coordinates vertically or horizontally, and otherwise to zero, and δ + (i,j) = Here, τ is a spatially varying precision parameter. This model implies that each β i,j is normally distributed with a mean equal to the average of its neighbors. We use the Bayesian P-splines prior in benchmark comparisons, due to their close structural relationship with B-splines projected GP. In our applications, we accelerated the evaluation of the likelihood by simplifying the joint likelihood to the pairwise difference formulation and we placed a sum-to-zero constrain on the parameters to ensure the identifiability of the pairwise differences as proposed by (Morris et al., 2019) . We used the prior τ ∼ Cauchy + (0, 1). To keep notation streamlined, we write φ = {τ }. All inferences using model (3.1-3.2) coupled with different priors on the random surface f were fitted with RStan version 2.21.0, using an adaptive Hamiltonian Monte Carlo (HMC) sampler (Stan Development Team, 2020). 8 HMC chains were run in parallel for 1,500 iterations, of which the first 500 iterations were specified as warm-up. Inferences in simulation analysis and benchmarking were performed similarly. Prior to application to the CDC data, we compared the performance of the spatial models defined in Section 3.2 at retrieving the mean surface of 2D count data. In the simulations, we considered observations on a 2D grid {0, 0.02 . . . , 0.98, 1}×{0, 0.02 . . . , 0.98, 1}, such that there are 50 × 50 entries. The observations are generated through a Negative Binomial model using as mean the exponential of a 2D GP with a squared exponential kernel. The variance scale of the 2D GP is fixed to 1 and the length scale is varied (γ ∈ {0.05, 0.25, 1}) to generate weakly, mildly and strongly correlated observations. The training set includes 40% uniformly sampled observations (i.e., number of observations in the training set N = 640). Here, we compare the performance of four models with different prior specifications on the mean surface, a standard 2D GP, a standard B-splines surface, Bayesian P-splines and regularised B-splines projected 2D GP. For the methods using B-splines, the same number of knots are placed along the two axes. Figure 3A shows the simulated mean surface in the mildly correlation scenario obtained with γ = 0.25, Figure 3B the simulated observations and Figure 3C mean surface when a standard B-spline surface, Bayesian P-splines, and regularised Bsplines projected 2D GP priors are used for inference with different numbers of knots. We find that the standard 2D GP and the regularised B-splines projected 2D GP recover well the simulated true mean surface (Figures 4 and Supplementary Figure S8 ). Moreover, the regularised B-splines projected 2D GP obtains equivalent or better predictive performance as the standard 2D GP for a shorter running time, 73.95% faster on average for 30 knots (Table 1 and Supplementary Table S2 ). In contrast, the standard B-splines and the Bayesian P-splines approaches overfit the data when the number of knots increase (Figures 4 and Table 1 ). Similar results were obtained for the other correlation scenarios (Figures S9-S12, Table 1 and Supplementary Table S2 ). We next benchmarked the performance of the spatial models defined in Section 3.2 on data unrelated to COVID-19, which has previously been used to benchmark spatial models. The data set consists of more than 83,000 locations across East Africa with measured deviations in land surface temperatures (Ton et al., 2018) . We use the same training data as in , consisting of 6,000 uniformly sampled locations, and fit them using a Gaussian likelihood with mean surface specified by the various spatial models, and a free observation variance parameter. We compared standard 2D Bsplines, Bayesian P-splines, regularised B-splines projected 2D GPs, low rank Gaussian Markov random fields, neural network models, and the πVAE model (Mishra et al., 2020) as a prior for the mean surface. For the spline methods, 125 equidistant knots were placed along both axes. The results are summarised in Supplementary Table S3 and Figure S13 . We find that the regularised B-splines projected 2D GP model (testing MSE: 2.96) obtained better predictive performance than the standard B-splines surface model (testing MSE: 4.45), a low rank Gaussian Markov random field (testing MSE: 4.36), and 6 Time trends of age-specific COVID-19 deaths in the US Based on the simulation and benchmark results, we used the regularised B-splines projected 2D GP model to reconstruct the time trends in age-specific COVID-19 attributable deaths across the US. We placed 12 equidistant knots over the age axis and 10 over the week axis. This choice was determined such that the predictive performance did not significantly increase with more knots. Overall, fitting the regularised B-splines projected 2D GP model took 48 hours on a high performance computing environment, with typical effective sample sizes between 342 to 22127, and there were no reported divergences in Stan's Hamiltonian Monte Carlo algorithm. Here we focus on studying the recovered trends in age-specific COVID-19 deaths during the summer 2021 resurgence in relation to vaccine coverage. Vaccinate coverage is reported by age strata that do not match the age strata in which the weekly deaths are reported by the CDC, which renders our Bayesian semi-parametric modelling approach on 1-year age groups advantageous. 6.1 Smooth estimates of age-specific COVID-19 attributable deaths over time Figure 5 shows the predicted weekly attributable COVID-19 deaths over time that occurred in the four most populated US states. The predictions for each 1-year age are aggregated into the age bands 0-54, 55-74 and 75+ for direct comparison to the CDC data, shown as black dots. The data and estimates clearly reflect the multiple COVID-19 waves across the four states, with high numbers of deaths in the elderly. The CDC data contain many anomalies, including censored and delayed entries, such as the unrealistically high number of deaths reported for the week starting on January 23, 2021 ( Figure 5 ). We used the curated all-ages weekly COVID-19 deaths records reported by JHU to re-distribute the delayed deaths to earlier weeks (6.1). The posterior median and 95% credible intervals in Figure 5 illustrate how the reporting delays apparent in the CDC data are adjusted by using the JHU data as external calibration, and how the discretized CDC data informs our estimates of the age composition of deaths. Using our model in conjunction with the curated JHU data on all COVID-19 deaths, we can estimate cumulative mortality for any age band. We find that tragically, as of September 25, 2021, the cumulative mortality rates in individuals aged 85+ now exceed 3% in Texas and New York (Supplementary Figure S14 ). We validated our estimates of the age profile of COVID-19 attributable deaths using an external data set, i.e. that was not used to inform the model. The external data set Figure 5 : Predicted age-specific COVID-19 attributable deaths in the four most populated US states. Shown are the posterior median (line) and 95% credible intervals (ribbon) of the predicted weekly COVID-19 attributable deaths obtained with (3.4). The CDC data are shown with dots. The start of the 2021 summer resurgence period is indicated as dashed vertical line. includes age-specific COVID-19 deaths reported directly by US states DoH (presented in Section 2). We computed the empirical age-specific contribution to weekly COVID-19 attributable deaths from the DoH data and checked if the latter lied inside the 95% posterior credible interval estimated by our model. We find a 93.82% prediction using the regularised B-splines projected 2D GP model across the four states, California, Florida, New York and Texas. State stratification of the prediction performance are presented in Supplementary Table S4. For completeness, we also fitted the model using alternative smoothing priors. Supplementary Section S11.1 shows the estimated age-specific composition of the COVID-19 attributable deaths in three weeks in Florida, as obtained with a standard 2D GP smoothing prior, standard 2D B-splines, Bayesian P-splines and the regularised Bsplines projected 2D GP. Moreover, Supplementary Table S4 presents the accuracy of their estimates compared to the external data set, ranging from 94.31% to 95.29%. Under the model specified with a Bayesian P-splines and standard 2D B-Splines surface, the estimated 2D surface over ages and weeks was wiggly and had a large uncertainty. Moreover, better predictive performances, quantified by the expected log pointwise predictive density Verity et al. (2020), were obtained for the B-Splines projected 2D GP compared to all other methods (Supplementary Table S5 ). Those results motivate the use of our proposed prior, the B-Splines projected 2D GP. 6.2 Strong summer 2021 resurgence in age-specific COVID-19 deaths are associated with limited vaccine coverage Since July 2021, COVID-19 cases started to increase substantially (John Hopkins University of Medicine, 2020) and with them COVID-19 deaths in all age groups ( Figure 5 ). We define the summer 2021 resurgence as the upward trend in COVID-19 attributable deaths spanning from July 03, 2021 to September 25, 2021 across the United States, and denote by w start-r m the week index corresponding to the first week included in the summer 2021 resurgence period for state m. The week index w start-r m is defined as the first week from July 01 2021 for which a 4-week central moving average on the all-age weekly COVID-19 deaths was increasing. The state-specific start of the resurgence period is indicated by a dashed vertical line in Figure 5 . As is evident from the empirical all-age data and confirmed in our reporting-adjusted age-specific estimates, the increase in COVID-19 deaths was highly heterogeneous across US states, with small increases seen in California and New York, and large increases seen in Florida and Texas, relative to the magnitude of COVID-19 deaths in the previous waves. We hypothesised that the variation in the magnitudes of the summer 2021 resurgence in COVID-19 deaths could be associated with differences in how comprehensively agespecific populations in each state took up the COVID-19 vaccine offer. To test this hypothesis, we attached as transformed parameter to the model presented in Section 3.1, age-specific predicted COVID-19 deaths relative to previous waves over the summer 2021. The relative COVID-19 deaths for state m, in age group c, at week w in summer 2021, denoted by r m,c,w , are defined as the ratio of the expected posterior predictive weekly deaths in week w to the maximum of the expected posterior predictive weekly deaths attained before the summer 2021 resurgence period in the same state and age group, for c ∈ C and w ≥ w start-r m , where µ m,c,w are the expected posterior predictive weekly deaths for state m, in age group c, at week w defined in (3.5), and µ ,max-pre-r m,c = max {µ m,c,w } w