key: cord-0807195-e8k1l2cz authors: Rehman, H.; Chandra, N.; Jammalamadaka, S. Rao title: Competing risks survival data under middle censoring—An application to COVID-19 pandemic date: 2021-09-28 journal: Healthcare Analytics DOI: 10.1016/j.health.2021.100006 sha: 27bcf40c2a273e86ddd4d058ede535203f458071 doc_id: 807195 cord_uid: e8k1l2cz Survival data is being analyzed here under the middle censoring scheme, using specifically quantile function modelling under competing risks. The use of middle censoring scheme has been shown to be very appropriate under the COVID-19 pandemic scenario. Cause-specific quantile inference under middle censoring is employed. Such quantile inferences are obtained through cumulative incidence function based on cause-specific proportional hazards model. The baseline lifetime is assumed to follow a very general parametric model namely the Weibull distribution, and is independent of the censoring mechanism. We obtain estimates of the unknown parameters and cause specific quantile functions under classical as well as a Bayesian set-up. A Monte Carlo simulation study assesses the relative performance of the different estimators. Finally, a real life data analysis is given applying the proposed methods. as interval event times, and it is important to include such exact and interval event times in any analysis to obtain an accurate estimate of the patients' survival after surgery. This is exactly the context and type of data where the idea of "middle-censoring scheme" [10] is most appropriate and which we investigate. Censoring is a key feature of survival analysis in statistics, and it is commonly used when the exact lifetimes of individuals are not known except for a few in the study. Jammalamadaka, S Rao and Mangalam [10] introduced a modern concept of censoring scheme known as middle censoring, which has received considerable attention in the statistical literature. In this censoring scheme the exact lifetimes of some individuals are observed while for others, it becomes unobservable as they fall in some random censoring intervals. This censoring scheme arises prominently in time to event analysis in clinical trial studies. Moreover, middle censoring also occurs when the mechanism where the observations are being taken, is closed for a period, due to an external emergency such as the outbreak of disease, war or a strike. In the current scenario of COVID-19 pandemic, the need for the use and application of middle censoring scheme is highly relevant. For example, in a breast cancer clinical trial centre, patients are registered with node-negative breast cancer. Some of the patients may receive the tamoxifen-alone arm and remaining patients received the combination of radiation and tamoxifen arm. Investigators may be interested to observe events such as local relapse, auxiliary relapse, remote relapse, second malignancy of any kind, and death. After the surgery patients are discharged and they are instructed that they have to come to the hospital for routine check-up and follow-up. But in the pandemic situation, many countries declared a nationwide complete lock-down so that many patients fail to get routine check-ups, and the investigators may also lose the follow-up on these patients. In the interim, some of the patients may experience an event of interest. In this situation the exact time of the event occurrence can not be observed for some patients except for noting the interval for the event-time, and learning on later inspection that the actual event has occurred during this interval. Thus, middle censoring is highly relevant with immediate application in real life. To define middle censoring, let 1 , 2 , … , and [ 1 , 1 ], [ 2 , 2 ], … , [ , ] be the lifetimes and random censoring intervals respectively of the individuals who are under observation. Intervals [ , ], = 1, 2, … , are independent and identically distributed (i.i.d.) with some unknown bivariate distribution (⋅, ⋅) and they are independent of . Under the notion of the middle censoring, lifetime becomes observable if ∉ [ , ] with Pr( < ) = 1, otherwise unobservable. Middle censored competing risks data with exponential distribution was studied by Ahmadi et al. [2] and Abuzaid et al. [1] . See also the references cited therein. A single long-term survivor may have a major impact on mean life in survival study, particularly in the case of heavy tailed models, which are widely used for lifetime data. This type of situation is commonly encountered when the subjects can experience types of mutually exclusive competing risks of death/event. For example, in a cancer clinical trial, the primary risk of concern may be a full or partial reaction to therapy, with death as a competing risk, and death may be attributed due to various risks such as cardiac arrest, corona virus infection etc.. Similarly, in liver transplantation an individual can experience one of the three possible outcomes such as death, transplantation and withdrawal from the waiting list. In modelling of survival data with competing risks, two basic quantities such as cumulative incidence function (CIF) and cause specific hazard function (CSHF) get considerable attention in the statistical literature, see Kalbfleisch and Prentice [11] . The CIF represents the cumulative probability of failure due to cause up to a certain time point , conditional on × 1 vector of covariates ∈ ℝ , which is described as follows where is the time to failure, = , ∈ {1, 2, … , } is the cause of failure. The CSHF simply gives the instantaneous failure rate from the cause at time among the individuals who survives up to . Mathematically, CSHF is defined as Although, in general the distribution function ( ) reaches 1 as → ∞, in the presence of competing risks, the asymptote of the CIF is less than 1, implying that the proportion of the ( ; ) due to cause increases up to some time point, and then plateaus. Therefore, obtaining the mean survival time does not make much sense because it will always be infinite. Hence, in such a situation, quantile-based estimates, which are finite and may be identifiable from observed data are generally found to be more precise and robust. These measures can be used for summarizing a CIF curve. A good discussion of the various theoretical aspects of the quantile function may be found in [19] . Let ( ) be a right-continuous distribution function of a random variable . The quantile function of , say ( ) is defined by The quantile function has a number of unique features that a distribution function does not have. For example, the sum of the quantiles, product of the positive quantiles, and the monotonic transformation of the quantile functions are also quantile functions. These properties make the quantile function a preferred alternative to the distribution function in statistical modelling. The quantile function is gaining popularity as a comprehensive tool for statistical analysis of lifetime data. Quantiles are frequently used in medical research to summarise the survival function. For example, the median survival time has long been used to assess the survival curve. In survival studies, quantile regression has gained increasing interest as a viable alternative to methods using distribution functions. For more details on quantile function one may refer to Sankaran et al. [25] and the references therein. Modelling of competing risks survival data using the quantile function has been studied by several researchers. Peng and Fine [21, 22] discuss non-parametric and regression model approach of quantile function with competing risks. Sankaran et al. [25] propose a quantile based test for comparing the equality of CIFs. Lee and Fine [15] considered parametric and non-parametric inferences for cumulative incidence quantiles without covariates. Lee and Han [16] proposed covariate adjusted quantiles inferences through cause specific proportional hazards (PH) model of the CIF. Lee [14] considered the parametric modelling of the cause specific quantiles with covariates through Weibull PH model and direct semi-parametric improper Gompertz model. Recently, Peng [20] has reviewed various aspects of quantile regression analysis of survival data with competing risks and without competing risks under randomly censored and left truncation mechanism based on the semiparametric approach. The primary goal of this paper is to obtain quantile based inference for middle censored competing risks survival data based on the parametric regression modelling of CIF. We define the CIF through Weibull PH model. Under this set-up, we obtain both maximum likelihood and Bayes estimates under reasonable priors. The Bayes estimates are obtained using two different loss functions, namely the squared error loss function and LINEX loss function. As one may expect, explicit form of the posterior densities turn out to be intricate, and so we adopt the Markov Chain Mote Carlo (MCMC) simulation algorithms for generating the posterior samples. The rest of the article is organised as follows. In Section 2 we define the parametric causespecific quantile functions based on Weibull PH model. In Section 3 we obtain the maximum likelihood estimates under middle censoring scheme for cause specific quantile functions. Bayes estimates based on squared error and LINEX loss functions are provided in Section 4. Section 5 presents a Monte Carlo simulation study to compare the relative performance of the proposed methods. A real data application of the proposed approach is given in Section 6. Finally Section 7 concludes with some remarks. Regression models in survival studies may be developed via Cox's Proportional Hazards (PH) [6] model, in which the effect of the covariates is multiplicative on some baseline hazard function. For parametric regression modelling of survival time one could use some well known distribution for the baseline function. More details can be found in [23] and reference therein. The CSHF in terms of the well known Cox PH model turns out to be of the following form: where 0 ( ) is the baseline CSHF and ⊤ ∈ ℝ is the 1 × vector of regression coefficients of cause . The CIF can be formulated in terms of all the CSHFs as follows The overall survival function ( ; ) is obtained in terms of cumulative CSHFs as ( ; ) = exp{− ∑ =1 Λ ( ; )}, where Λ ( ; )) = ∫ 0 ( ; ) . In this article, we consider 0 ( ) is corresponding to a Weibull distribution i.e. 0 ( ; , ) = ( ) −1 . The CIF under the cause specific PH assumption (4) is then given by The corresponding cause specific probability density function ( ; , ) is obtained by differentiating ( ; , ), and we get Following general notation for quantile function as in (3), the cause specific quantile/ subquantile function is defined as The estimate of cause specific quantile is obtained aŝ −1 ( ; ) = inf { ∶̂ ( ; ) ≥ }. Note that, if ( ; ) is continuous and strictly increasing function, then −1 ( ; ) is the unique value of such that ( ; ) = . Therefore, cause specific quantile function of ( ; , ) from equation (7), yields We used Weibull distribution for parametric modelling of cause specific quantile functions under middle censoring scheme because it is a very braod and flexible distribution for lifetime data analysis. A detailed discussion on parametric modelling of middle censored lifetime data with covariates can be found in [4] . We now proceed to obtain estimates of the unknown parameters of the Weibull distribution, the regression coefficients, and cause specific quantile functions, both under classical and Bayesian approaches in the following sections 3 and 4 respectively. The maximum likelihood estimation (MLE) is widely used among the statistical inference methods because of its desirable properties such as consistency, asymptotic efficiency, and invariance. In the middle censoring scenario, we will assume that the lifetime is middle censored by random censoring interval [ , ] which having a bivariate cumulative distribution function (⋅, ⋅). Moreover, we assumed that 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 ) is a censoring indicator. In this study it is assumed that when ∈ [ , ], then the causes of failure can be observed on later inspection. We assume that ( , , , ) are i.i.d. observations of ( , , , ) corresponding to individuals under study. For the observed data ( , , , ), the likelihood function is then given by where Δ ( ) = ( = ) is the indicator function for jth cause. It is also assumed that 1 and (7) and (8) can be written as The log-likelihood function = log ( ) is given by The MLEs of unknown parameters are obtained by maximizing the log-likelihood (13). The system of equations with respect to each of the parameters is given in the Appendix A. Since these equations are not in an explicit form, analytical solutions are not possible. So, we utilize iterative methods such as Newton-Raphson or other techniques to solve the system of equations. We used the optim function in R software for obtaining MLEs of the unknown parameters. By using the invariance property of MLEs we obtain the MLEs of cause specific quantile functions. Suppose that thê is the MLE of then the estimator of −1 ( ; , ) is given bŷ −1 ( ;̂ , ). Bayesian inference is distinctive in that it incorporates prior information with the observed data. For obtaining the Bayes estimates of unknown parameters , , and cause specific quantile −1 ( ; , ), first we need to define the suitable priors of unknown parameters and appropriate loss functions. It is known that there is limited information about the unknown parameters except that > 0, > 0 and −∞ < < ∞, = 1, 2, … , . If all the parameters are unknown then it is difficult to obtain the joint conjugate prior for the parameters. We therefore assumed informative prior by choosing independent gamma priors for and and normal priors for as follows where 1 , 1 , 2 , 2 , and are the hyper parameters. The hyperparameters are assumed to be known and are chosen in such a way that reflects the degree of belief about the unknown parameters. The joint prior distribution of , and from (14) up to the proportionality is given by The joint posterior density of , and is obtained as follows where ( | ) is the likelihood function based on observed data as given in equation (12) and ( ) is the joint prior density (15) . The denominator part of equation (16) involve multiple integrals and it is difficult to obtain the posterior densities of random variables , and in explicit form. Thus the analytical evaluation of posterior samples is impossible. Therefore, in this situation MCMC method can be used to approximate the integrals [24] . Popularly used MCMC algorithms are Gibbs sampling algorithm [8] and Metropolis-Hastings (M-H) algorithm [9] . Since, marginal posterior densities of random variables , and are not obtained in closed form, and so we employ the M-H algorithm. In this article we consider two different types of loss functions, namely the commonly used squared error (symmetric) loss function and the LINEX (asymmetric) loss function for the purpose of comprehensive comparison of Bayes estimates. Squared error loss function (SELF) is defined as 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 Note that SELF is a symmetric loss function but it is not useful for the situations when under/over estimation is more costly than the over/under estimation and it is considered the equal weight for both under and over estimation. For example, over estimation of survival function and failure rate function is usually much more serious than under estimation. To overcome this difficulty we also consider as an alternative the LINEX loss function (LLF) which is an asymmetric loss function given by ( ,̂ ) = (̂ − ) − (̂ − ) − 1, ≠ 0. Under LLF the Bayes estimates of parameter and −1 ( ; , ) can be obtained as followŝ where is the hyper parameter of the LLF and magnitude of reflect the degree of asymmetry. For > 0 the LLF is quite asymmetric about 0 with overestimation being more serious than underestimation. The opposite is true with < 0. If is close to zero then estimates under LLF are approximately equal to estimates obtained under SELF. We conducted a Monte Carlo simulation study to observe the finite sample behaviour of the MLE and Bayes estimators of the unknown parameters and cause specific quantile functions. We generate the random samples through inverse transformation for four different sample sizes i.e. = 25, 50, 100, and 200. For each sample sizes, we simulated 500 sets of data. In this scenario, we computed average estimates (AVE) and mean square error (MSE) for , , and −1 ( ; , ). Besides that we obtained the average length (AVL) along with coverage probability (CP) of the asymptotic confidence interval (ACI) of the MLE and Bayes credible interval (BCI) of the Bayes estimates to compare the precision of the estimates. We also consider two different censoring percentage viz., mild (approximately, 10%), and heavy (approximately, 30%) for observing the impact of censoring. The censoring effect is explained as follows: if lock-down is extended during the COVID-19 pandemic, the percentage of censored observations will increase. As a result, COVID-19 cases are on the rise, posing a threat to the health-care system. We refer these censoring percentages as censoring scheme 1 (CS-1) and censoring scheme 2 (CS-2) respectively. The results of simulation study based on CS-1 and CS-2 are available in Table 1 and Table 2 respectively. In the simulation study, we consider two causes of failure for simplicity i.e. = 1, 2 and one single covariate . The covariate is generated from (0, 1). The survival time is generated by using the steps given in [5] . Without loss of generality the true parameter values arbitrary taken as 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 For regression parameters 1 and 2 we assumed (0, 1) as informative priors. The hyper parameter of LLF is fixed at = ±1.5, and it is known as llf-1 and llf-2. Next, as we discussed in Section 4 that marginal posterior densities of unknown parameters are not in a closed form, so we utilized the MCMC procedure for generating the random samples from marginal posteriors. For this purpose we used the BUGS software via R2OpenBUGS package in R software [17] . We generate, = 10000 Markov chains for each parameter and the first = 4000 samples were used in burn-in period. Furthermore, for minimizing the effect of the autocorrelation every second equally spaced outcome is considered i.e. thin=2. By the visualization of the convergence diagnostics plots it is realized that chains are converging nicely. Therefore, the last 6000 MCMC samples are used to obtained the Bayes estimates of , , and −1 ( ; , ) based on SELF and LLF. From Table 1 it is clear that for fixed censoring proportion as the sample size increases, MSEs decreases for MLE and Bayes estimates, which verifies the consistency property of all the estimators. As expected, for small sample size the Bayes estimates under both the loss functions are better than MLE in terms of MSE, AVL and CP. It is also noticed that the CPs for ACIs of the cause specific quantile functions for sample size 25 are little bit away from the nominal level (95%). Similarly, for = 50, 100 and 200 we can say that Bayes estimates of baseline parameters and cause specific quantiles under both the loss functions are quite better except some values in sample size 200. The Bayes estimates for llf-1 are smaller as compared to llf-2. From Table 1 and Table 2 it is observed that for fixed sample sizes the MSE and AVL of all the estimates is increases as the censoring percentage increases except for some values. Therefore, it indicates that as the censored observations are increased this will leads to the less efficient estimates. This implies that if the spread of corona virus is not under control in a reasonable period of time, then it will have a significant impact on the human health-care system. Overall it is observed that the CPs maintain the nominal level (95%) of all the estimates for both the censoring schemes. In this section, for illustrative purposes a real life application is considered. We have taken real data from a Mayo Clinic study on primary biliary cirrhosis (PBC) of the liver conducted between 1974 to 1984. This data set is available in survival package of R software. During this tenyear period, 312 patients were randomly assigned to receive D-penicillamine or placebo treatment from a total of 424 patients. The remaining 112 patients did not take part in the clinical trial but agreed to have their basic measurements taken and to be followed for survival. Six of those patients were lost to follow-up shortly after diagnosis, so these patients were removed from the study. 161 patients died at the end of the study, 25 patients received a liver transplant, and 232 patients were lost to follow-up. As a result, for two competing outcome variables, liver transplant and death, the competing risks model becomes more reasonable. All the survival times are measured in days. For more information on this PBC data, one may refer to Therneau and Grambsch [27] and application of competing risk on PBC data is available in [13] . In order to compute the survival time in years, it divided by 365, which yielded a median survival time of 4.74 years. First, we check the goodness of fit of the Weibull distribution using fitdistrplus package in R software with assumption that the data is complete. The Kolmogorov-Smirnov distance between the empirical distribution function and the fitted Weibull distribution function is 0.0331 and the corresponding -value is 0.7378. Therefore, Weibull model appears to be reasonable and cannot be rejected. We also consider the graphical method of goodness of fit to check the appropriateness of the model as given in Figure 1 and it indicates that the model fits well to the data. As we have discussed in Section 1 the middle censoring may arise due to COVID-19 pandemic. But this PBC data set does not have middle censored observations and currently we do not have any middle censored follow-up data. However, once the COVID-19 pandemic is over, it is possible that middle censored follow-up data will be available. Therefore, we created an artificial data set using middle censoring, whose left end point is equal to the observed time and the right end point is equal to the + , where is the width of the interval, which is generated from an exponential distribution with a mean value of 10. Then, all the censored individuals in the original data set are considered as the middle censored observations. The competing outcome variables for middle censored observation are randomly assigned from transplant and death. For this new data set which consists observed exact lifetimes and censored intervals, we check the PH assumption of the model (4) by considering treatment as a covariate. To examine the PH assumption for transplant and death we utilize the graphical method known as Andersen plot [3] . The covariate treatment is discrete and take two values 1 and 2 for D-penicillmain and placebo respectively. We also assume that 106 patients who do not participate in the trial they received the D-penicillmain treatment. Thus data are divided into two strata, corresponding to D-penicillmain and placebo individuals. Suppose thatΛ 0 ( ) be the estimate of baseline cumulative CSHF for jth cause in rth stratum, = 1, 2 for transplant and death, and = 1, 2 for D-penicillmain and placebo respectively. We plotΛ 1 0 ( ) versusΛ 2 0 ( ) for transplant and death. If the proportionality assumptions holds then these plots should be a straight line passing through origin which is verifying by the Figure 2 . We then apply the proposed methods of estimation to obtain the estimates of unknown param- Page 13 of 17 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 eters and cause specific quantile functions. These are presented in Table 3 and Table 4 under proposed methods of estimation. The Bayes estimates are obtained under SELF and LLF loss functions based on non-informative priors because we have no past information about the unknown parameters. For non-informative priors we assume that 1 = 1 = 21 = 21 = 22 = 22 = 0.0001 and where 1 and 2 are said to follow normal distribution with mean zero and large variance. From Table 3 it is observe that the MLE and Bayes estimates of the unknown parameters are very close. We also estimate the baseline CIFs based on equation (7) for transplant and death under MLE and Bayes estimates. Plots of the baseline CIFs are presented in Figure 3 in which solid line represents the estimates of the CIF due to death and dotted line represents the estimates of the CIF due to transplant. From Figure 3 it is observed that the estimates of CIFs have smaller value for transplant as compared to death. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 and for death are approximately 2.2 and 3.6 years respectively. Quantile event times gives the information about the stay of the patients in waiting queue, this implies that the 10% and 20% patients will receive the the transplant soon after 4 and 7 years respectively and 10% and 20% patients died soon after 2.2 and 3.6 years respectively. This shows that the waiting time of the patients to receive the transplant is roughly two times larger as compared to the death under both the treatment groups. It is observed that the quantile estimates under both the treatment groups have minor differences. This indicates that the effect of treatment is not significantly different on transplant and death. This article considers parametric cause specific quantile inference of the CIF under middle censoring scheme. In this study we have discussed the use of middle censoring scheme in the context of the current COVID-19 pandemic. This research could provide a useful statistical framework for medical practitioners to obtain precise survival analysis for patients who were lost to follow-up due to this pandemic. We believe this to be first such attempt to model the quantile event times of the middle censored data under competing risks. The regression model was developed based on Cox PH model by assuming a very flexible Weibull distribution for the baseline hazard function. Also, we provide the estimates of unknown parameters and cause specific quantiles under both the classical and Bayesian set-up. The simulation study shows that the Bayes estimates perform well based on informative priors under squared error loss function in terms of MSE as compared to MLE. However, all the estimates exhibit the consistency property, as also the identifiability and appropriate convergence of the proposed model. Overall, the proposed inferences performed well in simulation studies. In a real data analysis on Primary Biliary Cirrhosis of the liver, goodness of fit criteria verify that the Weibull model fits well to the data. Also the covariate treatment maintains the assumed model PH assumption. Other semi-parametric regression models, such as additive hazard regression model and proportional odds model, may also be appropriate in this context and will be explored elsewhere. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 Partial derivatives of the log-likelihood of Equation (13) with respect to each of the parameters gives the following equations. On the robustness of right and middle censoring schemes in parametric survival models Statistical analysis of middle censored competing risks data with exponential distribution Testing goodness of fit of cox's regression and life model Analysis of gamma and weibull lifetime data under a general censoring scheme and in the presence of covariates Competing risks and multistate models with R Regression models and life-tables Perceptions of romanian physicians on lockdowns for covid-19 prevention Stochastic relaxation, gibbs distributions, and the bayesian restoration of images Monte carlo sampling methods using markov chains and their applications Nonparametric estimation for middle-censored data The Statistical Analysis of Failure Time Data Impact of covid-19 pandemic on health system & sustainable development goal 3 Competing risk model with bivariate random effects for clustered survival data Parametric inference for quantile event times with adjustment for covariates on competing risks data Inference for cumulative incidence quantiles via parametric and nonparametric approaches Covariate-adjusted quantile inference with competing risks The BUGS book: A practical introduction to Bayesian analysis The impact of the covid-19 pandemic on noncommunicable disease resources and services: results of a rapid assessment Nonparametric statistical data modeling Quantile regression for survival data. Annual review of statistics and its application Nonparametric quantile inference with competing-risks data Competing risks quantile regression Cause-specific hazard regression estimation for modified weibull distribution under a class of non-informative priors Introducing monte carlo methods with r A quantile based test for comparing cumulative incidence functions of competing risks models The impact of covid-19 on globalization Modeling survival data: extending the Cox model