key: cord-0027042-93d5i1u0
authors: Austin, Peter C.; Putter, Hein; Lee, Douglas S.; Steyerberg, Ewout W.
title: Estimation of the Absolute Risk of Cardiovascular Disease and Other Events: Issues With the Use of Multiple Fine-Gray Subdistribution Hazard Models
date: 2022-01-31
journal: Circ Cardiovasc Qual Outcomes
DOI: 10.1161/circoutcomes.121.008368
sha: 640bf4f2403b75fc17b021bc8c1944262782f1de
doc_id: 27042
cord_uid: 93d5i1u0
BACKGROUND: The Fine-Gray subdistribution hazard model is frequently used in the cardiovascular literature to estimate subject-specific probabilities of the occurrence of an event of interest over time in the presence of competing risks. A little-known limitation of this approach is that, for some subjects and for some time points, the sum of the subject-specific probabilities for the different event types (eg, cardiovascular and noncardiovascular death) can exceed one.
METHODS: We used data on 8238 patients hospitalized with congestive heart failure in Ontario, Canada. We fit 2 Fine-Gray subdistribution hazards models, one for cardiovascular death and one for noncardiovascular death and estimated the probability of death due to each cause within 5 years of hospital admission. We also fit 2 cause-specific hazard models for the 2 event types and combined the estimated cause-specific hazard functions to obtain subject-specific estimates of the probabilities of each of the 2 event types occurring within 5 years.
RESULTS: When adding the probabilities of 5-year cardiovascular death and 5-year noncardiovascular death obtained from the Fine-Gray subdistribution hazard models, 8.6% of subjects had an estimated probability of 5-year all-cause mortality that exceeded 1 (100%). This problem was avoided by fitting 2 cause-specific hazard models, one for each outcome type, and combining the estimated cause-specific hazard functions to obtain subject-specific estimates of the risk of cardiovascular and noncardiovascular death.
CONCLUSIONS: The Fine-Gray subdistribution hazard model may be problematic to use for a comprehensive assessment of absolute risks of multiple outcomes, while the combination of 2 cause-specific hazard models shows better statistical behaviour. Cause-specific modeling should not be discarded in competing risk situations.
Survival analysis is concerned with the analysis of outcomes that occur over time. A common example in medical research is time to all-cause mortality. A competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. If the primary event of interest is death due to cardiovascular causes, then death due to noncardiovascular causes is a competing risk, since subjects who die of noncardiovascular causes are no longer at risk of death due to cardiovascular causes. There have been a substantial number of tutorial and expository articles on the analysis of competing risk data, both in the general literature and the cardiovascular literature. [1] [2] [3] [4] [5] [6] [7] Cardiovascular researchers are increasingly aware of the need to use appropriate statistical methods when analyzing data in which competing risks are present. For example, a model for incident acute myocardial infarction may consider mortality from other causes as a competing event. It is increasingly acknowledged that in the presence of competing risks, the use of the Kaplan-Meier survival function (by censoring on the competing risks) results in biased estimation of the absolute risk of events over time. Similarly, the use of a single Cox proportional hazards model (ie, a single cause-specific hazards model) can result in biased estimation of the absolute risk of the event of interest over time in the presence of competing risks. There are 2 primary options when using regression models to obtain subject-specific estimates of the absolute risk of an event over time in the presence of competing risks.
The first is the Fine-Gray subdistribution hazard model, which allows for modeling the effect of covariates on the cumulative incidence function (CIF). 8 Second, one can fit cause-specific hazard models for each of the different event types and then combine the resultant cause-specific hazard models to estimate the absolute risk of each of the different event types over time. 1, 9 Note that this latter approach is different from fitting a single cause-specific hazard function for the primary cause of interest. While the former approach is used increasingly often, our impression is that the latter approach is rarely used in applied research. A little-known limitation of the Fine-Gray subdistribution hazard model is that, for specific covariate patterns and for certain values of time, the sum of the estimated absolute risk for the different event types can exceed one. Thus, one can obtain estimates of the probability of all-cause events that exceed 1, which is clearly impossible.
Increasingly, cardiovascular research on challenging healthcare issues has expanded beyond all-cause mortality, especially when considerations extend beyond this one outcome. For example, heart failure is a challenge because of the multiplicity of potentially important outcomes including heart failure hospitalization, ischemia-related hospitalization, emergency department visits, sudden death, cardiovascular death, and worsening heart failure. 10, 11 These outcomes may have different degrees of importance to subspecialist physicians (eg, heart failure specialists, cardiac electrophysiologists, interventional cardiologists, cardiac surgeons), generalists, and hospitalists. Based on our prior work engaging patients, 12, 13 the outcomes that are important to patients extend beyond all-cause mortality.
This reinforces the importance of being able to provide accurate estimates of the absolute risk of multiple outcomes, as these are easier for patients to understand and enable appropriate patient decision making. The objective of this article is to illustrate some limitations with estimation of subject-specific estimates of absolute risk when using the Fine-Gray subdistribution hazard model in patients with congestive heart failure. Using cause-specific hazard models for all event types will be shown to circumvent the problems of the Fine-Gray model.
The data sets used in these analyses were linked using unique encoded identifiers and analyzed at ICES. While data sharing agreements prohibit ICES from making the data set publicly available, access may be granted to those who meet prespecified criteria for confidential access, available at www.ices.on.ca/DAS. The use of data in this project was authorized under section 45 of Ontario's Personal Health Information Protection Act, which does not require review by a Research Ethics Board.
We used data from the first phase of the EFFECT (Enhanced Feedback for Effective Cardiac Treatment) Study, 14 which collected data on patients hospitalized with congestive heart failure between April 1, 1999, and March 31, 2001, at 86 hospital corporations in Ontario, Canada. For the current study, individual patient data were available on 8238 patients hospitalized with a diagnosis of congestive heart failure. The outcome was time to death, with subjects censored 5 years after the date of hospital admission. Death was categorized as either death due to cardiovascular causes or death due to noncardiovascular causes. Outcome ascertainment was through linkage with the provincial death registry. Of the 8238 patients, 3416 (41.5%) died of cardiovascular causes within 5 years of admission, while 2228 (27.0%) died of noncardiovascular causes within 5 years of admission.
All subjects were followed to either the date of death or 5 years postadmission, whichever came first. We considered 28 predictor variables consisting of demographic characteristics (age, sex); vital signs on presentation
• The Fine-Gray subdistribution hazard model is frequently used to estimate subject-specific probabilities of the occurrence of an event of interest over time in the presence of competing risks.
• A limitation of this approach is that, for some subjects and for some time points, the sum of the subject-specific probabilities for the different event types (eg, cardiovascular and noncardiovascular death) can exceed one.
• This problem can be avoided by combining the cause-specific hazard functions for all the different types of events.
• We illustrated the existence of this problem using data on patients hospitalized with congestive heart failure.
We regressed the subdistribution hazard of cardiovascular death on the 28 covariates described above. Using the fitted model, we estimated the absolute risk of cardiovascular death within 5 years for each subject in the sample. We then repeated this analysis for noncardiovascular death. For each subject, we added the probabilities of cardiovascular and noncardiovascular death within 5 years to obtain the probability of all-cause mortality within 5 years.
We fit 2 cause-specific hazard models using the 28 covariates described above. The cause-specific hazard models for cardiovascular and noncardiovascular death were estimated separately. Using the 2 fitted cause-specific hazard models, we obtained the estimated absolute risk of cardiovascular and noncardiovascular death within 5 years for each subject in the sample using methods described elsewhere. 1, 9 For each subject, we added the probabilities of cardiovascular and noncardiovascular death within 5 years.
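The combination of 2 cause-specific hazard models described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the riskRegression implementation used in the study: the function name `combine_cause_specific`, the yearly baseline hazard increments, and the linear predictor values are all hypothetical, and the hazards are treated as piecewise constant within each interval.

```python
import math

def combine_cause_specific(base_inc_cv, base_inc_noncv, lp_cv, lp_noncv):
    """Combine two cause-specific hazard models into cumulative incidences.

    base_inc_cv / base_inc_noncv: baseline cumulative hazard increments per
    interval for each cause (hypothetical inputs).
    lp_cv / lp_noncv: the subject's linear predictors from the two models.
    Returns (CIF for CV death, CIF for non-CV death, event-free survival).
    """
    surv = 1.0       # probability of being alive at the start of the interval
    cif_cv = 0.0
    cif_noncv = 0.0
    for b_cv, b_non in zip(base_inc_cv, base_inc_noncv):
        h_cv = b_cv * math.exp(lp_cv)      # subject-specific hazard increment
        h_non = b_non * math.exp(lp_noncv)
        h_all = h_cv + h_non
        if h_all > 0.0:
            # Probability of dying from any cause during this interval
            fail = surv * (1.0 - math.exp(-h_all))
            # Split that probability between causes in proportion to hazards
            cif_cv += fail * h_cv / h_all
            cif_noncv += fail * h_non / h_all
            surv -= fail
    return cif_cv, cif_noncv, surv

# Hypothetical baseline increments for 5 yearly intervals
BASE_CV = [0.10, 0.08, 0.07, 0.06, 0.06]
BASE_NONCV = [0.05, 0.05, 0.05, 0.05, 0.05]

# A hypothetical high-risk subject: both linear predictors are large
risk_cv, risk_noncv, alive = combine_cause_specific(BASE_CV, BASE_NONCV, 2.0, 2.0)
print(risk_cv, risk_noncv, risk_cv + risk_noncv)
```

Because every increment of risk is taken out of the remaining event-free survival, the 2 cumulative incidences and the survival probability always sum to 1, so the all-cause risk cannot exceed 1 however large the linear predictors are.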
We also fit a Cox proportional hazards model for all-cause mortality using the 28 covariates described above (note that all-cause death comprises death due to cardiovascular causes and death due to noncardiovascular causes). Using the fitted model, we estimated for each subject the probability of all-cause mortality within 5 years. Note that the 5 regression models (2 Fine-Gray subdistribution hazard models, 2 cause-specific hazard models, and 1 Cox proportional hazards model) all incorporated the same 28 covariates described above, with similar coding. The cause-specific and subdistribution hazard ratios along with associated 95% CIs for the 2 types of events are reported in the Table.
We assessed the calibration of the models for estimating the absolute risk of cardiovascular death within 5 years using a previously described method. 5 For a given modeling approach (combining the 2 cause-specific hazard models or the Fine-Gray subdistribution hazard model), we divided subjects into 10 risk strata using the deciles of the estimated risk of 5-year cardiovascular mortality. Within each stratum, we computed the mean estimated probability of 5-year cardiovascular mortality. Then, within each stratum we estimated the observed risk of 5-year cardiovascular mortality using a CIF. We plotted the observed risk against the mean estimated risk across the 10 risk strata. This process was then repeated for noncardiovascular mortality.
The Fine-Gray subdistribution hazard models and the cause-specific hazard models were fit using the FGR and CSC functions, respectively, in the riskRegression package (version 2019.11.03) for R (version 3.5.1).
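The decile-based calibration assessment described above can be sketched as follows. This is an illustrative Python sketch, not the authors' R or SAS code: the function name `calibration_by_decile` and its inputs are hypothetical. It assumes that every subject is followed to death or to 5 years, as in this cohort, in which case the observed cumulative incidence at 5 years in a stratum reduces to the proportion of subjects who died of the cause of interest within 5 years. With earlier censoring, a nonparametric CIF estimator would be needed instead.

```python
def calibration_by_decile(pred_risk, event_within_5y, n_groups=10):
    """Return (mean predicted risk, observed risk) for each risk stratum.

    pred_risk: estimated 5-year risk of the event for each subject.
    event_within_5y: 1 if the subject had the event of interest within
    5 years, else 0 (assumes no censoring before 5 years).
    """
    # Order subjects by predicted risk, then cut into equal-sized strata
    order = sorted(range(len(pred_risk)), key=lambda i: pred_risk[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue
        mean_pred = sum(pred_risk[i] for i in idx) / len(idx)
        observed = sum(event_within_5y[i] for i in idx) / len(idx)
        rows.append((mean_pred, observed))
    return rows

# Toy data: 100 subjects with evenly spread predicted risks
toy_pred = [i / 100 for i in range(100)]
toy_event = [1 if i >= 50 else 0 for i in range(100)]
for mean_pred, observed in calibration_by_decile(toy_pred, toy_event):
    print(round(mean_pred, 3), observed)
```

Plotting the observed risk against the mean predicted risk for the 10 strata gives the calibration plot; points near the diagonal indicate good calibration in aggregate.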
R code for fitting the 2 Fine-Gray subdistribution hazard models, the 2 cause-specific hazard models, and the Cox proportional hazards model, along with estimating subject-specific probabilities of 5-year cardiovascular death, 5-year noncardiovascular death, and 5-year all-cause death is provided in appendix A in the Supplemental Material, while corresponding SAS code is provided in appendix B in the Supplemental Material.
Overall, the estimates from the 2 approaches (Fine-Gray versus cause-specific hazard modeling) tended to be comparable for each outcome (Figure 1). The median difference in the estimated probability of 5-year cardiovascular death between the 2 approaches was 0.5% (25th and 75th percentiles: −0.8% and 1.4%), while the minimum and maximum differences were −31.6% and 7.4%, respectively (these differences are cause-specific hazard minus Fine-Gray). The median difference in the estimated probability of 5-year noncardiovascular death between the 2 approaches was 0.5% (25th and 75th percentiles: −1.1% and 1.6%), while the minimum and maximum differences were −34.6% and 11.0%, respectively.
The estimated risk of death due to any cause within 5 years obtained from a single Cox model was compared with that obtained from adding the 2 estimates of the absolute risk of cause-specific mortality derived from the 2 cause-specific hazard models (left panel of Figure 2). There was good agreement between the 2 approaches. Importantly, the estimated risk of all-cause mortality was ≤1 for all subjects using both the Cox proportional hazards model for all-cause mortality and the method based on combining the 2 cause-specific hazard models. The median difference in the estimated probability of 5-year all-cause mortality between the 2 approaches was 0.3% (25th and 75th percentiles: −0.3% and 0.7%), while the minimum and maximum differences were −11.6% and 3.0%, respectively (these differences are Cox minus cause-specific hazard).
Finally, the estimated risk of death due to any cause within 5 years obtained from a single Cox model was compared with that obtained from adding the 2 estimates of absolute risk derived from the 2 Fine-Gray subdistribution hazard models (right panel of Figure 2). The median difference in the estimated probability of 5-year all-cause mortality between the 2 approaches was 0.9% (25th and 75th percentiles: −2.2% and 3.2%), while the minimum and maximum differences were −63.6% and 15.7%, respectively (these differences are Cox minus Fine-Gray). Agreement with the Cox model was hence poorer for the combination of 2 Fine-Gray models than for the combination of 2 cause-specific hazard models. Importantly, the sum of risk estimates from the Fine-Gray models exceeded 1 (or 100%) in 707 (8.6%) of the 8238 subjects, whereas it did not exceed 1 for any subject when using the 2 cause-specific hazard models. Both methods displayed very good calibration for cardiovascular death (Figure 3). For noncardiovascular death, calibration was slightly worse for both approaches, with the cause-specific modeling approach having slightly better calibration than the subdistribution hazard modeling approach.
We demonstrated that using the Fine-Gray subdistribution hazard model can result in estimated probabilities of events occurring within specified durations of time such that the sum of these probabilities across all event types exceeds one for a non-negligible proportion of patients. This is clearly undesirable, as the probability of all-cause events is necessarily constrained to be at most one. This problem can be avoided by fitting all the cause-specific hazard models and combining the estimated cause-specific hazard functions. Moreover, in the lower risk range, the sum of absolute risks for cardiovascular and noncardiovascular events differed slightly from what would be predicted by a single Cox model for the combination of events.
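A small numerical sketch shows how the sum of 2 Fine-Gray predictions can exceed 1. It relies on the standard relationship under a proportional subdistribution hazards model, in which the predicted cumulative incidence at a fixed time is 1 minus the baseline event-free complement raised to the power exp(linear predictor). The baseline cumulative incidences and linear predictors below are hypothetical values chosen for illustration; they are not estimates from the study.

```python
import math

def fine_gray_risk(baseline_cif, linear_predictor):
    """Predicted cumulative incidence at a fixed time from a Fine-Gray model.

    baseline_cif: cumulative incidence for the reference subject
    (all covariates equal to 0) at that time.
    """
    return 1.0 - (1.0 - baseline_cif) ** math.exp(linear_predictor)

# Hypothetical 5-year baseline cumulative incidences from 2 separate models
BASELINE_CV = 0.40
BASELINE_NONCV = 0.25

def total_risk(lp_cv, lp_noncv):
    """Sum of the 2 separately predicted risks (not constrained to be <= 1)."""
    return (fine_gray_risk(BASELINE_CV, lp_cv)
            + fine_gray_risk(BASELINE_NONCV, lp_noncv))

print(total_risk(0.0, 0.0))  # reference subject
print(total_risk(1.0, 1.0))  # subject at high risk of both event types
```

Each prediction on its own stays below 1, but nothing ties the 2 models together, so a subject whose linear predictors are large in both models can receive a total above 1.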
Despite these problems with the Fine-Gray subdistribution hazard model, it has attractive features. First, the Fine-Gray subdistribution hazard model is interpretation-friendly. 15 This is because the sign of a regression coefficient (ie, positive versus negative) indicates the direction (but not the magnitude) of the effect of the associated covariate on the cumulative incidence (or risk) of the occurrence of the outcome. 16 Thus, increasing values of covariates that have a positive regression coefficient in a subdistribution hazard model are associated with increases in the cumulative incidence of the event, while increasing values of covariates that have a negative regression coefficient are associated with decreases in the cumulative incidence of the event. Second, if a researcher publishes the baseline cumulative incidence function (eg, at 5 years) and the regression coefficients, then one can easily compute the absolute risk of the event of interest for any covariate pattern at that time point (eg, 5 years). In contrast to the interpretation-friendly nature of the subdistribution hazard model, it is difficult to determine the direction of the effect of individual covariates on the absolute risk of the outcome when combining cause-specific hazard functions. Furthermore, one cannot derive a direct expression for the absolute risk of the occurrence of the outcome as a function of the baseline risk, the regression model coefficients and subject characteristics, since both risk functions need to be combined. 5
Our assessment of calibration indicates that the limitation of the Fine-Gray model that we have illustrated is masked when examining predictions in aggregate. Both modeling approaches displayed very good calibration and the mean estimated risk of cause-specific death within 5 years within each of the 10 risk strata was comparable between the 2 approaches.
A limitation of the Fine-Gray model is evident when the focus is on subject-specific probabilities of all event types. Indeed, it is these very probabilities that are of key importance for informing clinical decision making. In some contexts, only the probabilities of the primary event may be of interest, while in other contexts, all the event types may be of interest. In the context of heart failure, there is increasing interest in determining the risk of cardiovascular and noncardiovascular hospitalizations, with death treated as a competing risk. 17 The Fine-Gray model is increasingly used to enable these types of analyses. 11 Furthermore, the Fine-Gray model is being used to estimate the risk of not just the primary event of interest, but also of competing events. For example, in the context of prophylactic primary prevention implantable cardioverter defibrillators for left ventricular systolic dysfunction, clinicians and patients may benefit from risk prediction methods to decide whether to implant a device or forgo the procedure. While implantation of a defibrillator may treat life-threatening ventricular arrhythmias, recipients of the device may be subject to complications, device infections, device recalls, and inappropriate shocks. 18, 19 Because implantable cardioverter defibrillators reduce the risk of arrhythmic death but do not affect the risk of nonarrhythmic death, it is important to obtain better estimates of the risk of each of these competing events: the former is preventable by a defibrillator while the latter is not. 20
An explanation for the phenomenon that we observed is that in the context of there being 2 types of events and when fitting 2 separate subdistribution hazard models, at least one of the fitted models will be incorrect.
21 This occurs because, when fitting 2 subdistribution hazard models, the regression coefficients of the second model are completely determined by the regression coefficients of the first model and the 2 CIFs for a reference subject (a subject whose covariates are all equal to 0). However, when each model is fit separately, this constraint is no longer observed and thus the regression coefficients for the second model will be incorrect. This will lead to incorrect estimates of the CIF for at least one of the models, leading to the possibility that the sum of the CIFs may exceed one for some values of time, as we have illustrated.
There are certain limitations to the current study. The study was not intended to be a comprehensive evaluation of the Fine-Gray subdistribution hazard model, nor was it intended to comprehensively compare all available methods for estimating incidence in the presence of competing risks. Rather, our intent was to draw attention to a little-known issue with the use of the Fine-Gray model. While the phenomenon of total event probability exceeding one is not unknown, it has received little coverage, even in the statistical literature. We refer the interested reader to an article in the statistical literature in which this issue is explored in substantially greater depth, including a mathematical proof of its existence. 22 We are only aware of a few other instances where this phenomenon has been discussed: once in a vignette associated with the survival package for R and 3 times in passing elsewhere. [23] [24] [25] [26] In our case study, we observed that 8.6% of subjects had a total event probability at 5 years (as derived from the 2 fitted Fine-Gray models) that exceeded 1. It is likely that the magnitude of this issue varies across samples and across times at which cumulative incidence is estimated.
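The dependence on the covariate pattern and on time can be made explicit with a short worked inequality. This is a sketch using notation introduced here for illustration (it does not appear in the article): $F_{k0}(t)$ is the baseline CIF for event type $k$ and $\beta_k$ the coefficient vector of the corresponding separately fitted Fine-Gray model.

```latex
% Prediction from each separately fitted Fine-Gray model (k = 1, 2):
\[
  F_k(t \mid x) \;=\; 1 - \bigl\{1 - F_{k0}(t)\bigr\}^{\exp(x^\top \beta_k)} .
\]
% The total event probability is at most one if and only if
\[
  F_1(t \mid x) + F_2(t \mid x) \le 1
  \;\Longleftrightarrow\;
  \bigl\{1 - F_{10}(t)\bigr\}^{\exp(x^\top \beta_1)}
  + \bigl\{1 - F_{20}(t)\bigr\}^{\exp(x^\top \beta_2)} \;\ge\; 1 .
\]
% If F_{10}(t) > 0 and F_{20}(t) > 0, both terms on the right-hand side
% tend to 0 as x^T beta_1 and x^T beta_2 grow, so the inequality fails
% for covariate patterns with large linear predictors in both models.
```

Whether a given sample contains such covariate patterns, and at which times $t$ the baseline CIFs are large enough for the inequality to fail, will differ from one data set to another.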
In the current study, we have illustrated that using all cause-specific hazard functions allows one to circumvent this limitation of the Fine-Gray model. Another advantage to the use of the cause-specific hazard approach compared with the subdistribution hazard approach is that the cause-specific risk set is more easily interpretable than the subdistribution risk set, as the latter retains subjects who have experienced a competing event. 1, 2 Alternative modeling approaches are also available. [24] [25] [26] [27] [28] [29] We are not arguing against the use of the Fine-Gray subdistribution hazard model. As noted above, the Fine-Gray model is both an interpretation-friendly model and allows for simple communication of risk estimates by reporting only the estimated regression coefficients and the baseline CIF. However, we encourage researchers to be cautious in the use of the Fine-Gray model when the focus is on the absolute risk of >1 of the different event types, where a combination of cause-specific models may be more sensible. In conclusion, a little-known limitation of the Fine-Gray subdistribution hazard model is that the sum of the cause-specific estimates of absolute risk can exceed one (or 100%) for some subjects and for some time points. Using all the cause-specific hazard models allows one to avoid this limitation. Investigators are encouraged to consider this latter approach, particularly when the focus is on providing the absolute risk of each of the different types of events. 
1. Tutorial in biostatistics: competing risks and multi-state models
2. Competing risk regression models for epidemiologic data
3. Competing risks and the clinical community: irrelevance or ignorance
4. Competing risks analyses: objectives and approaches
5. Prognostic models with competing risks: methods and application to coronary risk prediction
6. Introduction to the analysis of survival data in the presence of competing risks
7. Evaluating health outcomes in the presence of competing risks: a review of statistical methods and clinical applications
8. A proportional hazards model for the subdistribution of a competing risk
9. The Statistical Analysis of Failure Time Data
10. Early invasive coronary angiography and acute ischaemic heart failure outcomes
11. Sex-related differences in heart failure with preserved ejection fraction
12. Investigators of the COACH trial. Patient engagement in a trial testing a new strategy of care for acute heart failure
13. Rationale and design of the comparison of outcomes and access to care for heart failure (COACH) trial: A stepped wedge cluster randomized trial
14. Effectiveness of public report cards for improving the quality of cardiac care: the EFFECT study: a randomized trial
15. A competing risks analysis of bloodstream infection after stem-cell transplantation using subdistribution hazards and cause-specific hazards
16. Practical recommendations for reporting Fine-Gray model analyses for competing risk data
17. Associations between short or long length of stay and 30-day readmission and mortality in hospitalized patients with heart failure
18. Investigators of the Ontario ICD Database. Evaluation of early complications related to De Novo cardioverter defibrillator implantation insights from the Ontario ICD database
19. Sex differences in implantable cardioverter defibrillator outcomes: findings from a prospective defibrillator database
20. Investigators of the Ontario ICD Database. Clinical risk stratification for primary prevention implantable cardioverter defibrillators
21. Competing Risks and Multistate Models with R
22. Fine-Gray subdistribution hazard models to simultaneously estimate the absolute risk of different event types: cumulative total failure probability may exceed 1
23. Multi-State Models and Competing Risks
24. Absolute risk regression for competing risks: interpretation, link functions, and prediction
25. Maximum likelihood estimation of semiparametric mixture component models for competing risks data
26. Semiparametric regression on cumulative incidence function with interval-censored competing risks data
27. Direct parametric inference for the cumulative incidence function
28. Constrained parametric model for simultaneous inference of two cumulative incidence functions
29. Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function
ICES is an independent, nonprofit research institute funded by an annual grant from the Ontario Ministry of Health (MOH) and the Ministry of Long-Term Care (MLTC). As a prescribed entity under Ontario's privacy legislation, ICES is authorized to collect and use health care data for the purposes of health system analysis, evaluation, and decision support. Secure access to these data is governed by policies and procedures that are approved by the Information and Privacy Commissioner of Ontario. The opinions, results, and conclusions reported in this article are those of the authors and are independent from the funding sources. No endorsement by ICES or the Ontario MOH or MLTC is intended or should be inferred. Parts of this report are based on Ontario Registrar General (ORG) information on deaths, the original source of which is ServiceOntario. The views expressed therein are those of the author and do not necessarily reflect those of ORG or the Ministry of Government and Consumer Services. The data set from this study is held securely in coded form at ICES.
While legal data sharing agreements between ICES and data providers (eg, health care organizations and government) prohibit ICES from making the dataset publicly available, access may be granted to those who meet prespecified criteria for confidential access, available at www.ices.on.ca/DAS (email: das@ices.on.ca). The use of data in this project was authorized under section 45 of Ontario's Personal Health Information Protection Act, which does not require review by a research ethics board. None. Supplemental software code: R code (Appendix A) and SAS code (Appendix B).