key: cord-0263471-ti8yhrux
authors: Gressani, Oswaldo; Faes, Christel; Hens, Niel
title: Laplacian-P-splines for Bayesian inference in the mixture cure model
date: 2021-03-02
journal: nan
DOI: 10.1002/sim.9373
sha: aaaa66ed751a5881db21abd9d1d0adfe472b392b
doc_id: 263471
cord_uid: ti8yhrux

The mixture cure model for analyzing survival data is characterized by the assumption that the population under study is divided into a group of subjects who will experience the event of interest over some finite time horizon and another group of cured subjects who will never experience the event irrespective of the duration of follow-up. When using the Bayesian paradigm for inference in survival models with a cure fraction, it is common practice to rely on Markov chain Monte Carlo (MCMC) methods to sample from posterior distributions. Although computationally feasible, the iterative nature of MCMC often implies long sampling times to explore the target space with chains that may suffer from slow convergence and poor mixing. An alternative strategy for fast and flexible sampling-free Bayesian inference in the mixture cure model is suggested in this paper by combining Laplace approximations and penalized B-splines. A logistic regression model is assumed for the cure proportion and a Cox proportional hazards model with a P-spline approximated baseline hazard is used to specify the conditional survival function of susceptible subjects. Laplace approximations to the conditional latent vector are based on analytical formulas for the gradient and Hessian of the log-likelihood, resulting in a substantial speed-up in approximating posterior distributions. The statistical performance and computational efficiency of the proposed Laplacian-P-splines mixture cure (LPSMC) model is assessed in a simulation study. Results show that LPSMC is an appealing alternative to classic MCMC for approximate Bayesian inference in standard mixture cure models. Finally, the novel LPSMC approach is illustrated on three applications involving real survival data.

rely on Markov chain Monte Carlo (MCMC) methods to sample from posterior distributions. Although computationally feasible, the iterative nature of MCMC often implies long sampling times to explore the target space with chains that may suffer from slow convergence and poor mixing. Furthermore, extra efforts have to be invested in diagnostic checks to monitor the reliability of the generated posterior samples. An alternative strategy for fast and flexible sampling-free Bayesian inference in the mixture cure model is suggested in this paper by combining Laplace approximations and penalized B-splines. A logistic regression model is assumed for the cure proportion and a Cox proportional hazards model

Survival analysis is a challenging, yet very attractive area of statistical science that is devoted to the study of time-to-event data. Standard models for survival data typically leave no room for the existence of a cure fraction such that it is implicitly assumed that all subjects of the population under study will experience the event of interest as time unfolds for a sufficiently long period. Technological breakthroughs in medicine during the last decades, especially in cancer research, have led to the development of promising new treatments and therapies so that many diseases previously considered fatal can now be cured. This phenomenon has triggered the necessity to develop models that allow for long-term survivors and gave birth to cure models. A recent complete textbook treatment on cure models is proposed by Peng and Yu (2020) and an enriching literature review on cure regression models has been written by Amico and Van Keilegom (2018) . Among the large family of cure models that have emerged, the mixture cure model driven by the seminal work of Boag (1949) ; Berkson and Gage (1952) ; Haybittle (1965) and later refined by Farewell (1977a Farewell ( , 1982 is probably the most prominent as its mathematical formulation allows for a clear and interpretable separation of the population in two categories, namely cured subjects and susceptible subjects who are at risk of experiencing the event of interest.

Let T ∈ [0, +∞) be a continuous random variable representing the survival time. Existence of a cure proportion in the population under study is made possible by allowing the event {T = +∞} to arise with positive probability. To include covariate information, denote by X and Z random covariate vectors (with continuous and/or discrete entries) that belong to covariate spaces X and Z, respectively. In a mixture cure model, the population survival function expresses the separation between the cured and uncured subpopulations as follows:

with covariate vectors x = (x 1 , . . . , x p ) ∈ X and z = (z 1 , . . . , z q ) ∈ Z that can share (partially) the same components or can be entirely different. The term p(x) is frequently called the "incidence" of the model and corresponds to the conditional probability of being uncured, i.e. p(x) = P (B = 1|X = x) with binary variable B = I(T < +∞) referring to the (unknown) susceptible status and I(·) the indicator function, i.e. I(E) = 1 if condition E is true. The term S u (t|z) is known as the "latency" and represents the conditional survival function of the uncured subjects S u (t|z) = P (T > t|B = 1, Z = z). The logistic link is commonly employed to establish a functional relationship between the probability to be uncured and the vector x (Farewell, 1977b; Ghitany et al., 1994; Taylor, 1995) , so that p(x) = exp(β 0 + x β)/(1 + exp(β 0 + x β)), with regression coefficients β = (β 1 , . . . , β p ) and β 0 an intercept term. The latency part is often specified in a semiparametric fashion by using the Cox (1972) proportional hazards (PH) model (see e.g. Kuk and Chen, 1992; Peng and Dear, 2000) and implies the following form for the survival function of the susceptibles S u (t|z) = S 0 (t) exp(z γ) , where γ = (γ 1 , . . . , γ q ) are the regression coefficients pertaining to the latency part and S 0 (·) is the baseline survival function.

The philosophy underlying Bayesian approaches considers that the model parameters are random and that their underlying uncertainty is characterized by probability distributions. After obtaining data, Bayes' theorem acts as a mechanistic process describing how to update our knowledge and is essentially the key ingredient permitting the transition from prior to posterior beliefs. Unfortunately, the complexity of mixture cure models is such that the posterior distribution of latent variables of interest are not obtainable in closed

form. An elegant stochastic method that is widely used in practice is Markov chain Monte Carlo (MCMC) as it allows to draw random samples from desired target posterior distributions and hence compute informative summary statistics. According to Greenhouse and Wasserman (1996) , the first steps of Bayesian methods applied to mixture models with a cure fraction date back as far as Chen et al. (1985) to analyze cancer data. Later, Stangl (1991) and Stangl and Greenhouse (1998) used a Bayesian mixture survival model to analyze clinical trial data related to mental health. The end of 1990s saw the emergence of Bayesian approaches in the promotion time cure model (Chen et al., 1999) , another family of cure models motivated by biological mechanisms that does not impose a mixture structure on the survival. Some references for Bayesian analysis in the latter model class are Yin and Ibrahim (2005) , who proposed a Box-Cox based transformation on the population survival function to reach a unified family of cure rate models embedding the promotion time cure model as a special case; Bremhorst and Lambert (2016) used Bayesian P-splines with MCMC for flexible estimation in the promotion time cure model and Gressani and Lambert (2018) suggested a faster alternative based on Laplace approximations. More recent uses of Bayesian methods in mixture cure models are Yu and Tiwari (2012) in the context of grouped population-based cancer survival data or Martinez et al. (2013) (Rue et al., 2009 ) and modal Gibbs sampling (Gomez-Rubio, 2017) . In this article, we propose a new approach for fast approximate Bayesian inference in the mixture cure model based on the idea of Laplacian-P-splines (Gressani and Lambert, 2018) . The proposed Laplacian-P-splines mixture cure (LPSMC) model has various practical and numerical advantages that are worth mentioning. First, as opposed to Lázaro et al. (2020) , our approach is completely samplingfree in the sense that estimation can be fully reached without the need of drawing samples from posterior distributions. This of course implies a huge gain from the computational side, without even mentioning the additional speed-up effect implied by the analytically available gradient and Hessian of the log-likelihood function in our Laplace approximation scheme. Second, the LPSMC approach delivers approximations to the joint posterior latent vector, while the INLA scheme concentrates on obtaining approximated versions of the marginal posterior of latent variables. A direct positive consequence is that with LPSMC, the "delta" method can be used to compute (approximate) credible intervals for functions of latent variables, such as the cure proportion or the survival function of the uncured, in virtually no time. A third beneficial argument is that the use of P-splines is particularly well adapted in a Bayesian framework and provide smooth estimates of the survival function. Finally, our approach and its associated algorithms are explicitly constructed to fit mixture cure models contrary to INLA that cannot fit such models originally (Lázaro et al., 2020) .

The article is organized as follows. In Section 2, the spline specification of the logbaseline hazard is presented and the Bayesian model is formulated along with the prior assumptions. Laplace approximations to the conditional latent vector are derived and an approximate version of the posterior penalty parameter is proposed. The end of Section 2 is dedicated to the construction of approximate credible intervals for (functions of) latent variables. Section 3 aims at assessing the proposed LPSMC methodology in a numerical study with simulated data under different cure and censoring scenarios. Section 4 is dedicated to three real data applications and Section 5 concludes the article.

2 The Laplacian-P-spline mixture cure model

We consider that the survival time T is accompanied by the frequently encountered feature of random right censoring. Rather than observing T directly, one observes the pair (T obs , τ ),

where T obs = min(T, C) is the follow-up time and τ = I(T ≤ C) is the event indicator (τ = 1 if the event occurred and τ = 0 otherwise) and C is a non-negative random censoring time that is assumed conditionally independent of T given the covariates, i.e. C ⊥ T |X, Z. At the sample level, D i = (t i , τ i , x i , z i ) denotes the observables for the ith unit, with t i the realization of T obs and τ i its associated event indicator. Vectors x i and z i represent the observed covariate values of subject i and the entire information set available from data with sample size n is denoted by D = n i=1 .

A flexible spline specification of the (log) baseline hazard function h 0 (·) is proposed (Whittemore and Keller, 1986; Rosenberg, 1995) using a linear combination of cubic B-splines,

i.e. log h 0 (t) = θ b(t), where θ = (θ 1 , . . . , θ K ) is a K−dimensional vector of B-spline amplitudes and b(·) = (b 1 (·), . . . , b K (·)) is a cubic B-spline basis constructed from a grid of equally spaced knots in the closed interval I = [0, t u ], with t u the largest observed followup time. Partitioning I into J (say 300) sections of equal length ∆ with midpoint s j , the Riemann midpoint rule is used to approximate the analytically unsolvable baseline survival function:

where j(t) ∈ {1, . . . , J} is an integer enumerating the interval that includes time point t.

The latent vector of the model is ξ = (θ ,β , γ ) and contains the spline vector θ, the vector of regression coefficients belonging to the incidence part (including the intercept) β = (β 0 , β ) and the vector of remaining regression coefficients belonging to the latency part γ, with dimension dim(ξ) = K + (p + 1) + q. Based on the idea of Eilers and Marx (1996) , we fix K large enough to ensure flexible modeling of the baseline hazard curve and counterbalance the latter flexibility by imposing a discrete penalty on neighboring Bspline coefficients based on finite differences. In a Bayesian translation (Lang and Brezger, 2004) , the prior distributional assumption on the B-spline vector is taken to be Gaussian θ|λ ∼ N dim(θ) 0, (λP ) −1 , with a covariance matrix formed by the product of a roughness penalty parameter λ > 0 and a penalty matrix P = D r D r + εI K obtained from rth order difference matrices D r of dimension (K − r) × K. An ε-multiple of the K−dimensional identity matrix I K is added to ensure full rankedness (Lambert, 2011) with typical values for the scalar perturbation being ε = 10 −6 (Eilers and Marx, 2021) or ε = 10 −5 (Alston et al., 2013) . Furthermore, a Gaussian prior is imposed on the remaining regression coefficients, with zero mean and small (common) precision ζ = 10 −6 , resulting in the following proper (conditional) prior for the latent vector ξ|λ ∼ N dim(ξ) (0, Σ ξ (λ)) with covariance matrix:

The prior precision matrix of the latent vector is denoted by Q ξ (λ) = Σ −1 ξ (λ). For full Bayesian treatment, we impose a Gamma prior with mean a λ /b λ and variance a λ /b 2 λ on the roughness penalty parameter λ ∼ G(a λ , b λ ). Fixing a λ = 1 and b λ = 10 −5 (see e.g.

Ç etinyürek Yavuz and Lambert, 2011) yields a large variance and hence reflects a minimally informative prior for λ. Other prior specifications are also available (see e.g. Jullion and Lambert, 2007; Ventrucci and Rue, 2016 ).

In a mixture cure model, the full likelihood is given by (see e.g. Sy and Taylor, 2000) :

Using the Cox PH model specification for the survival function of the susceptibles, one recovers:

It follows that the log-likelihood is:

Using the B-spline approximations for the baseline quantities, we get:

Let us denote by g i (ξ) the contribution of the ith unit to the log-likelihood:

so that the log-likelihood can be compactly written as (ξ; D) ≈ n i=1 g i (ξ). Using Bayes' theorem, the conditional posterior of the latent vector is (up to a proportionality constant):

( 2) A second-order Taylor expansion of the log-likelihood yields a quadratic form in the latent vector ξ and hence can be used to obtain a Laplace approximation to (2) as shown in

In what follows, we denote by p G (ξ|λ, D) = N dim(ξ) (ξ * (λ), Σ * ξ (λ)) the Laplace approximation to p(ξ|λ, D) for a given value of λ.

The conditional posterior in (2) is a function of the penalty parameter λ. In a frequentist setting, an "optimal" value for λ is generally obtained by means of the Akaike information criterion (AIC) or (generalized) cross-validation. From a Bayesian perspective, λ is random and its associated posterior distribution is of crucial importance for optimal smoothing.

Mathematically, the posterior of λ is given by:

Using the Laplace approximation to p(ξ|λ, D) and replacing the latent vector by its modal value ξ * (λ) from the Laplace approximation, the marginal posterior in (3) is approximated in the spirit of Tierney and Kadane (1986) :

For numerical reasons it is more appropriate to work with the log transformed penalty parameter v = log(λ) as the latter is unbounded. Using the transformation method for random variables, one obtains the following approximated (log) posterior for v:

where= denotes equality up to an additive constant. Approximation (4) provides a good starting point for various strategies to explore the posterior penalty space. A possibility is to use grid-based approaches (Rue et al., 2009; Gressani and Lambert, 2018) or MCMC algorithms (Yoon and Wilson, 2011; Gressani and Lambert, 2021) as often encountered in models with a multidimensional penalty space. In latent Gaussian models, the posterior penalty typically satisfies suitable regularity conditions such as unimodality (Gómez-Rubio and Rue, 2018) and not a "too-large" deviation from Gaussianity. This suggests to use a simple and yet efficient type of bracketing algorithm to compute the (approximate) posterior mode v * of log p(v|D). Starting with an arbitrarily "large" value, say v 0 = 15, the algorithm moves in the left direction with a fixed step size δ, i.e. at the mth

Movement in the left direction continues until reaching v m , the point at which the target Figure 1 illustrates the normalized approximate posterior to p(v|D) from a simulated example along with the modal value (dashed line) obtained with a step size δ = 0.01.

The Laplace approximation to the conditional posterior of the latent vector evaluated at the (approximated) modal posterior value v * (cf. Section 2.4) is denoted by

and a point estimate for ξ is taken to be the mean/mode ξ * (v * ) with associated variance-covariance matrix Σ * ξ (v * ). To ensure that the estimated baseline survival function S 0 (·) "lands" smoothly on the horizontal asymptote at 0 near the end of the follow-up, we constrain the last B-spline coefficient by fixing θ K = 1. A major advantage of LPSMC is that credible intervals for (differentiable functions of) latent variables can be straightforwardly obtained starting from p G (ξ|v * , D). 

where ξ * h is the hth entry of the vector ξ * (v * ) and σ 2 * ξ h is the hth entry on the main diagonal of Σ * ξ (v * ). It follows that a quantile-based (1 − α) × 100% credible interval for ξ h is:

where z α/2 is the α/2−upper quantile of a standard normal variate.

Credible interval for the incidence p(x) and cure rate 1 − p(x)

The incidence of the mixture cure model is a function of the latent vectorβ and hence an appropriate approach to derive credible intervals for p(x) is through using a "delta" method.

In particular, let us consider the following differentiable function of the probability to be

, where x is a known profile of the covariate vector. The Laplace approximated posterior to vectorβ is known

. The delta method operates via a first-order Taylor expansion of g(β|x) aroundβ * :

with gradient:

Note that g(β|x) in (5) is still Gaussian as it is a linear combination of a random vectorβ that is a posteriori (approximately) Gaussian due to the Laplace approximation with mean E(g(β|x)) ≈ g(β * |x) and covariance matrix V(g(β|x)) ≈ ∇ g(β|x)|β =β * Σ * β ∇g(β|x)|β =β * . This suggests to write the approximated posterior of g(β|x) as:

and so an approximate quantile-based (1 − α) × 100% credible interval for g(β|x) is:

Multiplying the values in the interval (6) by exp(− exp(·)) gives us the desired credible interval for the incidence p(x). If a credible interval for the cure rate 1 − p(x) is required, simply use the transformation g(β|x) = log −log(1−p(x)) = log log(1+exp(β 0 +x β)) with gradient ∇g(β|x) = p(x)/ log(1 + exp(β 0 + x β)) (1, x ) .

Credible interval for S 0 (·) and S u (·|z)

Let us denote by t q the qth quantile of the distribution of the survival time T at base-

The "delta" method can also be used to compute an approximate credible interval for S 0 (·) at t q by using a log(− log(·)) transformation

, one can show that the resulting credible interval for g(θ|t q ) is:

where ∇g(θ|t q )| θ=θ * is the gradient of g(θ|t q ) with respect to θ evaluated at θ * and can be found in Gressani and Lambert (2018) Appendix C. Applying the inverse transformation

The same approach is used to construct credible intervals for the survival function of the

Applying the log(− log(·)) transform yields g(θ, γ|t q , z) = z γ+log

The resulting credible interval for g(θ, γ|t q , z) is:

CI g(θ,γ|tq,z) = g(θ * , γ * |t q , z) ± z α/2 ∇ g(θ, γ|t q , z)| θ=θ * ,γ=γ * Σ * θ,γ ∇g(θ, γ|t q , z)| θ=θ * ,γ=γ * , (8) where Σ * θ,γ is the covariance matrix of the vector (θ , γ ) obtained from the Laplace approximation. Finally, an exp(− exp(·)) transform on (8) gives a (1 − α) × 100% credible interval for S u (t|z) at t q for a given covariate vector z.

To measure the statistical performance of the LPSMC methodology in a mixture cure model setting, we consider a numerical study where survival data is generated according to different cure and censoring rates. Generation of survival data for the ith subject is as follows. The incidence part is generated from a logistic regression function with two covariates p(

variate and X i2 ∼ Bern(0.5) is a Bernoulli random variable. The cure status is generated as a Bernoulli random variable with failure probability p(X i ), i.e. B i ∼ Bern(p(X i )).

Survival times T i for the uncured subjects (B i = 1) are obtained from a Weibull Cox proportional hazards model and are truncated at τ 0 = 8. The latency is given by S u (t i |Z i ) = exp(−νt i exp(γ 1 Z i1 + γ 2 Z i2 )), with scale parameter ν = 0.25 and shape = 1.45. The covariate vector Z i = (Z i1 , Z i2 ) is independent of X i . We assume that Z i1 follows a standard Gaussian distribution and Z i2 ∼ Bern(0.4). For the cured subject (B i = 0), the theoretically infinite survival times are replaced by a large value, say T i = 20, 000. The censoring time C i is independent of the vector (X i , Z i , T i ) and is generated from an exponential distribution with density function f (c i ) = µ c exp(−µ c c i ) truncated at τ 1 = 11.

We consider samples of sizes n = 300 and n = 600 and simulate survival data with two scenarios for the coefficients β, γ and µ c , yielding different censoring and cure rates. In both scenarios (almost) all the observations in the plateau of the Kaplan and Meier (1958) estimator are cured, such that the simulated data are representative of the practical real case scenarios for which mixture cure models are used. Intel Xeon E-2186M processor at 2.90GHz. Figure 2 shows the estimated baseline survival curves (gray), the target (solid) and the pointwise median of the 500 curves (dashed) for the different scenarios.

Another performance measure for the incidence of the model is obtained by computing

The latter quantity is computed on triplets of covariate values x m = (1, x m1 , x m2 ) for Coverage probability of the 90% and 95% (approximate) credible interval for the incidence p(x) (and cure rate 1 − p(x)) at the mean covariate profilex = (1, 0, 0.5) as computed from (6) have also been measured and are close to their nominal value in all scenarios.

Scenario 1, n=300

Scenario 2, n=300

Scenario 1, n=600 Scenario 2, n=600 Table 2 : Numerical results for S = 500 replications of sample size n = 300 and n = 600 under two different cure-censoring scenarios.

In Table 3 Table 3 : Estimated coverage probability of 90% and 95% credible intervals for S 0 (·) and S u (·|z = (0, 0.4) ) with S = 500 replications of sample size n = 300 and K = 30 B-splines.

We start by applying the LPSMC methodology to a well-known dataset in the cure literature and compare the estimates with the ones obtained using the benchmark smcure package (Cai et al., 2012) . The data comes from the Eastern Cooperative Oncology Group (ECOG) phase III two-arm clinical trial with n = 284 observations after removing missing data.

The aim of the study was to assess whether Interferon alpha-2b (IFN) had a significant effect on relapse-free survival. There were 144 patients receiving the treatment and the remaining patients (140) A plateau is clearly visible indicating the potential presence of a cure fraction, so that a mixture cure model is appropriate to fit this type of data. 

The second dataset used to illustrate the LPSMC methodology concerns n = 286 patients with breast cancer studied in Wang et al. (2005) . Survival time (in days) is defined as the distant-metastasis-free survival (DMFS), i.e. time to occurrence of distant metastases or death (whatever happens first) and the event of interest is distant-metastasis. The data is obtained from the breastCancerVDX package (Schroeder et al., 2020) We use K = 20 cubic B-splines in [0, t u ], with t u = 5201 days, the largest follow-up time.

The Kaplan-Meier estimate of the data given in Figure 5 emphasizes the existence of a plateau and motivates the use of a mixture cure model. The estimated latent variables with LPSMC are reported in Table5. We see that AGE and ER have no significant effect on the probability of being uncured (incidence) at significance level 0.05. However, the latter two covariates significantly affect the survival of the uncured. Table 5 : Estimation results for the breast cancer data with LPSMC. The first column indicates the model part, the second and third columns the parameter and its posterior (mean) estimate. The fourth column is the posterior standard deviation of the estimate and the last column the 95% credible interval.

For instance, a subject that is susceptible to experience metastasis with ER = 1 has a risk of experiencing the event that is 1/ exp(−1.063) = 2.90 times smaller than the risk with ER = 0. In Figure 6 the estimated survival for the susceptible subjects is shown for two age categories (30 years and 50 years) with ER = 1. 

In a third application, we use LPSMC to investigate the impact of age on survival of Covid-19 patients. The data comes from the Ziekenhuis Netwerk Antwerpen (ZNA, Belgium), a network of hospitals in the province of Antwerp. The considered dataset has n = 3258 patients entering hospitals between March 2020 and April 2021. The follow-up time is defined as the total number of days spent in hospital and receiving COVID-19 care. The outcome of interest is in-hospital death due to Covid-19 as indicated by a binary variable (1 = Dead; 0 = Alive). Among the 3258 patients, 461 (14.15%) experienced in-hospital death and the remaining 2797 (85.85%) are censored due to causes unrelated to the outcome of interest (uninformative censoring). The Kaplan-Meier curve is given in Figure 7 and highlights a wide plateau around 0.52, suggesting the existence of a cure fraction and hence motivating a cure model analysis with LPSMC. The age of patients is taken as a covariate both in the incidence and latency part of the model. The mean age is 66 years and the youngest, respectively oldest patient is 1 and 103 years old. We use K = 15 cubic B-splines between 0 and the largest follow-up time (t u = 98.88) along with a third order penalty.

x 

Approximate Bayesian inference methods are an interesting alternative to classic MCMC algorithms, especially when the latter require long computation times, as can be the case for models with complex likelihoods. In survival analysis, the mixture cure model is an interesting class of models that allows for the existence of a cure fraction and hence goes beyond classic proportional hazards model for which the feature of long-term survivors is absent. In this article, we propose a new approach for fast Bayesian inference in the standard mixture cure model by combining the strength of Laplace approximations to selected posterior distributions of latent variables and P-splines for flexible modeling of baseline smooth quantities. The attractiveness of the LPSMC approach lies in its completely sampling-free framework, with an analytically available gradient and Hessian of the log-likelihood that makes the approach extremely fast from a computational viewpoint. This computational advantage and the relatively straightforward possibility to derive (approximate) credible intervals for functions of latent variables makes it a promising tool for fast analysis of survival data with a cure fraction. A possible interesting extension for future research would be to generalize the LPSMC approach at the level of the incidence with alternative specifications to the standard logistic regression model. For instance, LPSMC could be extended to the single-index/Cox mixture cure model of Amico et al. (2019) . Finally, Laplacian-P-splines can also be extended in cure models with frailties or in the context of competing risks survival data with a cure fraction.

R code for the simulation scenarios in Section 3 and the ECOG and breast cancer data applications in Section 4 are publicly available on https://github.com/oswaldogressani/ LPSMC.

The authors declare no conflicts of interest.

Recall that the contribution of the ith unit to the log-likelihood is given by:

and using the definition of the population survival function, one has:

A second-order Taylor expansion of g i (ξ) around and arbitrary point ξ (0) is given by:

where c is a constant that does not depend on ξ. The gradient ∇g i (ξ)| ξ=ξ (0) and Hessian ∇ 2 g i (ξ)| ξ=ξ (0) are given by:

To simplify the notation, we define the following quantities:

We first derive the gradient and start with the partial derivatives with respect to the spline coefficients:

Note that:

It follows that:

To obtain the derivatives with respect to theβ coefficients, let us first compute:

It follows that:

for m = 0, . . . , p with x i0 = 1.

Derivatives with respect to the γ coefficients are:

To compute the Hessian, we only require the main diagonal and the upper triangular parts (Blocks 12, 13 and 23 below). The lower triangular part is obtained by symmetry.

Block11 : ∂ 2 ∂θ k ∂θ l g i (ξ) k = 1, . . . , K l = 1, . . . , K.

Block12 : ∂ 2 ∂θ k ∂β m g i (ξ) k = 1, . . . , K m = 0, . . . , p.

Block13 : ∂ 2 ∂θ k ∂γ s g i (ξ) k = 1, . . . , K s = 1, . . . , q.

Block22 : ∂ 2 ∂β m ∂β l g i (ξ) l = 0, . . . , p m = 0, . . . , p.

Block23 : ∂ 2 ∂β m ∂γ s g i (ξ) m = 0, . . . , p s = 1, . . . , q.

Block33 : ∂ 2 ∂γ s ∂γ v g i (ξ) s = 1, . . . , q v = 1, . . . , q.

Let us first define the function:

It follows that:

k, l = 1, . . . , K,

where ∂f i (ξ) ∂θ l = − exp(z i γ − exp(z i γ)ω 0i ) exp(z i γ)ω l 0i ω k 0i + exp(z i γ − exp(z i γ)ω 0i )ω kl 0i = exp(z i γ − exp(z i γ)ω 0i )ω kl 0i − f i (ξ) exp(z i γ)ω l 0i .

and ∂ ∂θ l S p (t i |x i , z i ) = −p(x i ) exp z i γ − exp(z i γ)ω 0i ω l 0i .

Block12

Define the following function:

f i (ξ) = p(x i )(1 − p(x i )) exp(− exp(z i γ)ω 0i ) − 1 .

The second-order derivatives are:

Define the following function:

The second-order partial derivative is:

. . , p, s = 1, . . . , q, with ∂f i (ξ) ∂γ s = − exp(z i γ − exp(z i γ)ω 0i )z is ω 0i .

Define the following function:

f i (ξ) = exp(z i γ − exp(z i γ)ω 0i )

The second-order partial derivative is:

Laplace approximation

Define the short notation n i=1 ∇g i (ξ)| ξ=ξ (0) := ∇g ξ (0) and n i=1 ∇ 2 g i (ξ)| ξ=ξ (0) := ∇ 2 g ξ (0) and use (9) to write the second-order Taylor expansion of the log-likelihood (omitting constant c) as:

≈ ξ ∇g ξ (0) − ∇ 2 g ξ (0) ξ (0) + 1 2 ξ ∇ 2 g ξ (0) ξ.

Plugging the approximated log-likelihood (11) into the conditional posterior of the latent vector (2) yields:

The logarithm of (12) is thus:

Taking the gradient of (13) and equating to the zero vector yields:

ξ (1) (λ) = Q ξ (λ) − ∇ 2 g ξ (0) −1 ∇g ξ (0) − ∇ 2 g ξ (0) ξ (0) .

The inverse of the negative Hessian of (13) yields:

Finally, the Laplace approximation to the conditional posterior latent vector is written as:

ξ (λ) .

One can iterate this in a Newton-Raphson type algorithm to obtain a Laplace approximation to the conditional latent vector p G (ξ|λ, D) = N dim(ξ) (ξ * (λ), Σ * ξ (λ)), where ξ * (λ) and Σ * ξ (λ) denotes the mean vector and covariance matrix respectively towards which the Newton-Raphson algorithm has converged for a given value of λ.

Case Studies in Bayesian Statistical Modelling and Analysis

Cure models in survival analysis

The single-index/Cox mixture cure model

Survival curve for cancer patients following treatment

Maximum likelihood estimates of the proportion of patients cured by cancer therapy

Flexible estimation in cure survival models using Bayesian P-splines

smcure: An R-package for estimating semiparametric mixture cure models

Smooth estimation of survival functions and hazard ratios from interval-censored data using Bayesian penalized B-splines

A new Bayesian model for survival data with a surviving fraction

Bayesian analysis of survival curves for cancer patients following treatment, Bayesian statistics 2: Proceedings of the 2nd valencia international meeting

A SAS macro for parametric and semiparametric mixture cure models

Regression models and life-tables

Practical Smoothing: The Joys of P-splines

Flexible smoothing with B-splines and penalties

The combined effect of breast cancer risk factors

A model for a binary variable with time-censored observations

The use of mixture models for the analysis of survival data with long-term survivors

Exponential mixture models with longterm survivors and covariates

Mixture model fitting using conditional models and modal Gibbs sampling

Markov chain Monte Carlo with the Integrated Nested Laplace Approximation

A practical robust method for Bayesian model selection: a case study in the analysis of clinical trials (with discussion)

Fast Bayesian inference using Laplace approximations in a flexible promotion time cure model based on P-splines

Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines

A two-parameter model for the survival curve of treated cancer patients

Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models

Nonparametric estimation from incomplete observations

A mixture model combining logistic regression with proportional hazards regression

Smooth semiparametric and nonparametric Bayesian estimation of bivariate densities from bivariate histogram data

Bayesian P-splines

Approximate Bayesian inference for mixture cure models

Cure models in oncology clinical trials

Mixture and nonmixture cure fraction models based on the generalized modified weibull distribution with an application to gastric cancer data

A nonparametric mixture model for cure rate estimation

Cure Models: Methods, Applications, and Implementation

Hazard function estimation using B-splines

Approximate Bayesian inference for latent Gaussian models by using Integrated nested Laplace approximations

breastCancerVDX: Gene expression datasets published by

Modeling heterogeneity in multi-center clinical trials using Bayesian hierarchical survival models. Doctoral dissertation

Assessing placebo response using Bayesian hierarchical survival models

Estimation in a Cox proportional hazards cure model

Semi-parametric estimation in failure time mixture models

Accurate approximations for posterior moments and marginal densities

Penalized complexity priors for degrees of freedom in Bayesian P-splines

Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer

Survival estimation using splines

Cure rate models: a unified approach

Inference for latent variable models with many hyperparameters

A Bayesian approach to mixture cure models with spatial frailties for population-based cancer relative survival data

This project is funded by the European Union's Research and Innovation Action under the H2020 work programme, EpiPose (grand number 101003688). The authors wish to thank the Ziekenhuis Netwerk Antwerpen for granting access to the Covid-19 hospitalization data.