key: cord-1004235-gjhfo15c authors: Tanboğa, Ibrahim Halil; Canpolat, Uğur; Çetin, Elif Hande Özcan; Kundi, Harun; Çelik, Osman; Çağlayan, Murat; Ata, Naim; Özeke, Özcan; Çay, Serkan; Kaymaz, Cihangir; Topaloğlu, Serkan title: Development and validation of clinical prediction model to estimate the probability of death in hospitalized patients with COVID‐19: Insights from a nationwide database date: 2021-02-10 journal: J Med Virol DOI: 10.1002/jmv.26844 sha: d12ea6c89843e741edab6b44abd366a7a103c6cb doc_id: 1004235 cord_uid: gjhfo15c
In the current study, we aimed to develop and validate a model for predicting death, based on our nationwide centralized coronavirus disease 2019 (COVID-19) database. We conducted an observational study (CORONATION-TR registry). All patients hospitalized with COVID-19 in Turkey between March 11 and June 22, 2020 were included. We developed the model and validated it both temporally and geographically. Model performance was assessed by area under the curve-receiver operating characteristic (AUC-ROC or c-index), R², and calibration plots. The study population comprised a total of 60,980 hospitalized COVID-19 patients. Of these patients, 7688 (13%) were transferred to the intensive care unit, 4867 (8.0%) required mechanical ventilation, and 2682 (4.0%) died. Advanced age, increased levels of lactate dehydrogenase, C-reactive protein, neutrophil-lymphocyte ratio, creatinine, albumin, and D-dimer, and pneumonia on computed tomography, diabetes mellitus, and heart failure status at admission were found to be the strongest predictors of death at 30 days in the multivariable logistic regression model (area under the curve-receiver operating characteristic = 0.942; 95% confidence interval: 0.939-0.945; R² = .457). The temporal and geographic validations were also favorable. We developed and validated the prediction model to identify in-hospital deaths in all hospitalized COVID-19 patients.
Our model achieved reasonable performance in both temporal and geographic validations.

The ongoing outbreak of the novel coronavirus disease 2019 has posed a challenge for public health, healthcare systems, and economies globally. It manifests with a broad clinical spectrum, ranging from asymptomatic infection to critical septic shock and multiorgan dysfunction. 1, 2 Elderly patients and those with comorbidities are at higher risk of COVID-19 complications. 3, 4 Delays in the treatment of patients can be detrimental. 5 A simple and accurate clinical score for the assessment of disease severity could help identify the COVID-19 patients at a high risk of developing critical illness 6 and allow physicians to determine which patients can be managed safely at local hospitals and which require early transfer to tertiary pandemic centers. 7 Some prediction models have been developed to guide physicians to triage and treat high-risk patients rapidly. Wynants et al. 8 concluded that all models reported good-to-excellent predictive performance. However, all were appraised to have a high risk of bias, owing to a combination of poor reporting and poor methodological conduct in participant selection, predictor description, and the statistical methods used. 8, 9 They recommended that studies adhere to the TRIPOD statement (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis). 8, 10 However, a reliable and validated prediction model of this kind is still lacking. 8, 11 In the current study, we aimed to develop and validate a model based on our nationwide centralized COVID-19 database for predicting in-hospital deaths.

We conducted an observational, retrospective, and longitudinal cohort study (CORONATION-TR registry) in accordance with the TRIPOD statement.
All patients hospitalized in Turkey with at least one positive reverse transcriptase polymerase chain reaction (PCR) test for COVID-19 between March 11, 2020 and June 22, 2020 were included in the study. Patients with negative PCR results, those aged <18 years, and those who were not hospitalized were excluded. The Turkish Ministry of Health approved the study with a waiver of informed consent for retrospective data analysis. All data were obtained from the "public health management system (PHMS) module", which is used to collect COVID-19-specific data (symptoms, biomarkers, medications, comorbidities, and clinical outcomes during index hospitalization). Detailed information about data collection has been published previously. 12

The primary outcome of this study was 30-day all-cause death. We did not include all-cause deaths in the prehospital period, after discharge from the hospital, or in patients who were not hospitalized. Patients who were admitted to the emergency department and died there were also not included in this study.

We selected candidate predictors on the basis of known or plausible associations with exposure to COVID-19 infection. Candidate predictor variables obtained at the time of admission are described below.
(i) Age (years), neutrophil-lymphocyte ratio (NLR), C-reactive protein (CRP) (mg/dl), lactate dehydrogenase (LDH) (U/L), D-dimer (μg/ml), hemoglobin (Hgb) (mg/dl), albumin (mg/dl), creatinine (mg/dl), and platelet count (×10^9/L) were included in the model as continuous variables using restricted cubic splines (four knots).
(ii) Sex, coronary artery disease (CAD), peripheral vascular disease (PVD), collagen tissue disorders (CTD), malignancy, lymphoma, heart failure (HF), chronic obstructive pulmonary disease (COPD), cerebrovascular disease (CVD), hypertension (HTN), diabetes mellitus (DM), valvular heart disease, chronic liver disease, and pneumonia on computed tomography (CT) were included in the model as categorical variables.

All statistical analyses were performed using R software v. 3.6.3 (R statistical software, Institute for Statistics and Mathematics, Vienna, Austria) with the "rms", "CalibrationCurves", "ggplot", and "survminer" packages. Continuous variables were presented as median and interquartile range, whereas categorical variables were presented as counts and percentages. The associations between prespecified candidate predictors and death were assessed using multivariable logistic regression and quantified using the adjusted odds ratio (OR) with a 95% confidence interval (CI). To capture nonlinear associations, continuous predictors were modeled using restricted cubic spline transformations (four knots). Adjusted ORs for continuous predictors are reported as interquartile ORs. The final model was fitted using step-down backward variable selection (α = .05). Overall predictive accuracy and discriminative ability of the model were evaluated using R² and area under the curve-receiver operating characteristic (AUC-ROC or c-index), respectively. Agreement between predicted and observed outcomes was evaluated graphically with calibration plots. Validation procedures were as follows 13, 14 :

(n = 92) were more than 70 years. In addition, 80.6% of these patients were male and 19.4% were female. The relationship between the number of comorbidities and death risk is shown in Figure S4. The observed frequencies of death, transfer to ICU, and need for MV by number of comorbidities are summarized in Table S1.
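The c-index (AUC-ROC) used above to assess discrimination can be read as the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The following is a minimal Python sketch of that definition, not the authors' R code; the function name `c_index` and the toy outcome and risk values are hypothetical and serve only as illustration.

```python
from typing import Sequence


def c_index(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Concordance (c-index, equal to AUC-ROC) for a binary outcome.

    y_true: 1 = event (death), 0 = no event.
    y_prob: predicted probability of the event for each patient.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0   # pair ranked correctly
            elif p_event == p_non:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data (not from the registry).
died = [0, 0, 1, 0, 1, 1]
risk = [0.02, 0.10, 0.35, 0.40, 0.80, 0.90]
print(round(c_index(died, risk), 3))
```

This pairwise loop is quadratic in the number of patients, so it suits small illustrations only; for a cohort of this size, a rank-based computation gives the same quantity far more cheaply.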
Model development: Twenty-three baseline variables were available for inclusion in the prognostic model. Table 3 summarizes the multivariable risk model with the adjusted odds ratio and 95% CI for each predictor. After backward step-down variable selection (α = .05), age, LDH, CRP, NLR, creatinine, D-dimer, albumin, hemoglobin, platelet count, presence of heart failure, DM, and pneumonia on CT were found to be the strongest predictors of 30-day mortality. Age, LDH, albumin, CRP, and creatinine accounted for 80% of the variation in 30-day mortality. Figure S3 ). In Table 4

In this study, we developed and validated the prediction model to identify in-hospital deaths using predictors measured at admission in all hospitalized patients. Our model demonstrated reasonable performance in both temporal and geographic validations. Although the majority of COVID-19 prognostic models reported good-to-excellent discrimination, all were found to exhibit a high risk of bias due to a combination of poor reporting, bias in participant selection, small sample size, and the use of improper statistical methods. 8 A high risk of bias suggests that the performance of these models in new samples will probably be poor, and an estimated AUC indicating near-perfect discrimination is consistent with overfitting. We used one of the largest available populations to develop the model, which helps generalize our findings and reduces the risk of overfitting.
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72314 cases from the Chinese Center for Disease Control and Prevention
Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study
Predictors of mortality in hospitalized COVID-19 patients: a systematic review and meta-analysis
Unpredictable fall of severe emergent cardiovascular diseases hospital admissions during the COVID-19 pandemic: experience of a single large center in Northern Italy
Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19
Prediction for progression risk in patients with COVID-19 pneumonia: the CALL score
Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal
Prediction models for diagnosis and prognosis in Covid-19
Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration
COVID-19 prediction models should adhere to methodological and reporting standards
The role of frailty on adverse outcomes among older patients with COVID-19
Prediction models need appropriate internal, internal-external, and external validation
Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis
Statistical regions in the European Union and partner countries - NUTS and statistical regions 2021
Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines
Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: a meta-analysis
Multisystem inflammatory syndrome in U.S. children and adolescents
Prevalence and impact of coagulation dysfunction in COVID-19 in China: a meta-analysis
Endotheliopathy in COVID-19-associated coagulopathy: evidence from a single-centre, cross-sectional study

ACKNOWLEDGEMENT
This study is supported by the Republic of Turkey Ministry of Health. The supporter had no role in design, analysis, and interpretation.

The authors declare that there is no conflict of interests.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

The peer review history for this article is available at https://publons.