key: cord-0872866-kp4otn54 authors: Elmoheen, Amr; Abdelhafez, Ibrahim; Awad, Waleed; Bahgat, Mohamed; Elkandow, Ali; Tarig, Amina; Arshad, Nauman; Mohamed, Khoulod; Al-Hitmi, Maryam; Saad, Mona; Emam, Fatima; Taha, Samah; Bashir, Khalid; Azad, Aftab title: External Validation and Recalibration of the CURB-65 and PSI for Predicting 30-Day Mortality and Critical Care Intervention in Multiethnic Patients with COVID-19 date: 2021-08-18 journal: Int J Infect Dis DOI: 10.1016/j.ijid.2021.08.027 sha: 5ea235ec7bdaef6170a67dee584500b880706acd doc_id: 872866 cord_uid: kp4otn54 OBJECTIVES: To validate and recalibrate the CURB-65 and pneumonia severity index (PSI) in predicting 30-day mortality and critical care intervention (CCI) in a multiethnic population with COVID-19, along with evaluating both models in predicting CCI. METHODS: Retrospective data were collected for 1181 patients admitted to the largest hospital in Qatar with COVID-19 pneumonia. The area under the curve (AUC), calibration curves, and other metrics were bootstrapped to examine the performance of the models. Variables constituting the CURB-65 and PSI scores underwent further analysis using the Least Absolute Shrinkage and Selection Operator (LASSO) along with logistic regression to develop a model predicting CCI. Complex machine learning models were built for comparative analysis. RESULTS: The PSI performed better than the CURB-65 in predicting 30-day mortality (AUC 0.83 and 0.78, respectively), while the CURB-65 outperformed the PSI in predicting CCI (AUC 0.78 and 0.70, respectively). The modified PSI/CURB-65 model (respiratory rate, oxygen saturation, hematocrit, age, sodium, and glucose) predicting CCI had excellent accuracy (AUC 0.823) and good calibration. CONCLUSIONS: Our study recalibrated and externally validated the PSI and CURB-65 for predicting 30-day mortality and CCI, and developed a model for predicting CCI. Our tool can potentially guide clinicians in Qatar in stratifying patients with COVID-19 pneumonia.
Illness from coronaviruses first appeared almost 20 years ago with severe acute respiratory syndrome (SARS), followed by the Middle East respiratory syndrome (MERS). At the end of 2019, COVID-19, caused by SARS-CoV-2, took the world by storm, spreading even to the most remote places. Unfortunately, COVID-19 proved to be far more challenging than its predecessors. The World Health Organization (WHO) reported that as of April 8, 2021, the number of confirmed cases of COVID-19 was 132,730,691, with a staggering 2,880,726 deaths worldwide. As of April 7, 2021, a total of 650,382,819 vaccine doses had been administered (WHO, 2021). The State of Qatar has a comprehensive public healthcare system of government-operated facilities under the Hamad Medical Corporation (HMC). All acute care cases, including adults and children, are seen. Emergency physicians (EPs) are usually involved in the initial assessment and in the clinical management decisions that direct patients to the most appropriate facility according to the severity of the disease. Recently, there has been an urgent need to predict critical care intervention (CCI) and mortality in patients with COVID-19. In the State of Qatar, as of April 16, 2021, there had been 194,930 confirmed cases with 367 deaths and 1,209,648 vaccine doses administered. Qatar is currently being hit by the second wave, with an average of 32 admissions to the intensive care unit (ICU) and an average of 327 ICU beds occupied per day. Respiratory illness has been the leading cause of death from COVID-19 worldwide (Ruan et al., 2020). Different scoring systems can be used to assess the severity of pneumonia (Ranzani et al., 2018). Most EPs are familiar with the CURB-65 score (confusion, urea, respiratory rate, blood pressure, and age ≥65 years) (Lim et al., 2003) and the pneumonia severity index (PSI) (Fine et al., 1997).
While the PSI is an extensive tool that provides excellent risk stratification of patients with community-acquired pneumonia (CAP), the CURB-65 is easier to use in a clinical setting because it has fewer variables (Chalmers et al., 2010; Fine et al., 1997). The PSI comprises five classes: I, II, III, IV, and V. Patients in classes I-III and IV-V are deemed low-risk and high-risk, respectively. Similarly, CURB-65 scores range from 0 to 5, with scores of 0-1 and >2 indicating low and high mortality risk, respectively. Various prediction models have emerged during the COVID-19 pandemic (Brabrand et al., 2010), aiming to optimize patient stratification and potentially reduce morbidity and mortality. Such scoring systems should only be used after rigorous testing of their reliability through external validation, that is, validation in a population different from the one in which they were developed (Brabrand et al., 2010). The first aim of our study was to externally validate and recalibrate the CURB-65 and PSI in predicting CCI and 30-day mortality in a highly multiethnic population. The second aim was to use the variables of the PSI and CURB-65 to evaluate predictors of CCI amongst COVID-19 patients. External validation refers to the use of new datasets that were not utilized to construct the prediction models (Altman and Royston, 2000; Steyerberg, 2019). Although the target population is patients with COVID-19 rather than CAP, both groups share similar characteristics. We assessed the performance of the CURB-65 (Lim et al., 2003) and PSI (Fine et al., 1997) in predicting 30-day mortality as well as CCI in COVID-19 patients. These models have been used to predict death within 30 days in CAP, with the former recently tested in predicting CCI (Ilg et al., 2019). However, they were not designed to classify patients with COVID-19 pneumonia.
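As an illustration of how the five CURB-65 items combine into a score, the following is a minimal sketch. The cut-offs (urea > 7 mmol/l, respiratory rate ≥ 30/min, SBP < 90 mmHg or DBP ≤ 60 mmHg, age ≥ 65 years) are those of the original CURB-65 derivation and are an assumption here, since the text above only names the items; the field names are hypothetical, and the coding used in this study may differ.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    """Admission findings needed for CURB-65 (field names are illustrative)."""
    confusion: bool
    urea_mmol_l: float
    resp_rate: float   # breaths per minute
    sbp_mmhg: float    # systolic blood pressure
    dbp_mmhg: float    # diastolic blood pressure
    age_years: float


def curb65(p: Admission) -> int:
    """Return the CURB-65 score (0-5): one point per criterion met.

    Thresholds are assumed from the original CURB-65 publication,
    not taken from this article.
    """
    criteria = [
        p.confusion,                            # C: confusion
        p.urea_mmol_l > 7.0,                    # U: raised urea
        p.resp_rate >= 30,                      # R: respiratory rate
        p.sbp_mmhg < 90 or p.dbp_mmhg <= 60,    # B: low blood pressure
        p.age_years >= 65,                      # 65: age
    ]
    return sum(criteria)


# Example: a 43-year-old with normal vital signs meets no criterion.
example = Admission(False, 5.0, 18, 120, 80, 43)
example_score = curb65(example)
```

Each criterion contributes exactly one point, which is what makes the score quick to compute at the bedside compared with the many weighted items of the PSI.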
For comparative analysis, we recalibrated the PSI and CURB-65 by estimating new beta coefficients for all variables in each model in the full dataset. We used the PSI and CURB-65 variables as potential predictors of severity. Data were collected for all the variables at admission, including age, residence in a nursing or long-term care facility, and history of comorbid conditions: cerebrovascular disease, renal disease, neoplastic disease, liver disease, and congestive heart failure (CHF). Physical measurements included oxygen saturation < 90%, body temperature < 35°C or > 39.9°C, respiratory rate (RR) ≥ 30 breaths/min, pulse rate ≥ 125 beats/min, systolic blood pressure (SBP) < 90 mmHg, and diastolic blood pressure (DBP) ≤ 60 mmHg, in addition to confusion or altered mental status. Imaging results included the presence of pleural effusion on chest radiography (CXR). Laboratory measures were blood urea nitrogen (BUN) > 11 mmol/l, pH < 7.35, serum sodium < 130 mmol/l, serum glucose ≥ 14 mmol/l, hematocrit < 30%, and partial arterial oxygen pressure (PO2) < 60 mmHg. All the continuous variables were dummy coded for ease of clinical use. The severity of COVID-19 (1 vs. 0) was defined as any of the following CCIs: invasive or non-invasive mechanical ventilation, extracorporeal membrane oxygenation (ECMO), administration of vasopressor and/or inotropic medications, commencement of assisted ventilation, insertion of invasive catheters including a central line and/or arterial line, and/or renal replacement therapy. Categorical variables were reported as n (%) and continuous variables as median (IQR). Continuous variables were dummy coded (0 vs. 1) in the regression analysis, representing values above and below the cut-off.
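The dummy coding described above can be sketched as follows. This is an illustrative helper, not the authors' code: the dictionary keys are hypothetical names, the cut-offs are the ones listed in the text, and the units for blood pressure, BUN, and PO2 are assumptions where the text does not state them.

```python
def dummy_code(m: dict) -> dict:
    """Convert continuous admission measurements into 0/1 indicators.

    Each indicator is 1 when the measurement is on the abnormal side of
    the cut-off given in the text, and 0 otherwise.
    """
    return {
        "spo2_lt_90": int(m["spo2_pct"] < 90),
        "temp_abnormal": int(m["temp_c"] < 35 or m["temp_c"] > 39.9),
        "rr_ge_30": int(m["resp_rate"] >= 30),
        "pulse_ge_125": int(m["pulse"] >= 125),
        "sbp_lt_90": int(m["sbp_mmhg"] < 90),          # unit assumed
        "dbp_le_60": int(m["dbp_mmhg"] <= 60),         # unit assumed
        "bun_gt_11": int(m["bun_mmol_l"] > 11),        # unit assumed
        "ph_lt_7_35": int(m["ph"] < 7.35),
        "sodium_lt_130": int(m["sodium_mmol_l"] < 130),
        "glucose_ge_14": int(m["glucose_mmol_l"] >= 14),
        "hct_lt_30": int(m["hematocrit_pct"] < 30),
        "po2_lt_60": int(m["po2_mmhg"] < 60),          # unit assumed
    }


# Made-up measurements for a single hypothetical patient.
measurements = {
    "spo2_pct": 88, "temp_c": 37.0, "resp_rate": 32, "pulse": 90,
    "sbp_mmhg": 120, "dbp_mmhg": 60, "bun_mmol_l": 5, "ph": 7.4,
    "sodium_mmol_l": 129, "glucose_mmol_l": 14, "hematocrit_pct": 40,
    "po2_mmhg": 70,
}
coded = dummy_code(measurements)
```

Note that the direction of each inequality matters at the boundary: a DBP of exactly 60 or a glucose of exactly 14 counts as abnormal, while an oxygen saturation of exactly 90% does not.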
We recognize that this approach is often criticized; however, we believe it aids rapid patient stratification, especially during the second wave of the pandemic at the time of the study (April 2021). Data were analyzed after a random 80/20 split into development and validation datasets, respectively. All 938 patients in the development dataset were included for variable selection. Twenty-one variables entered the Least Absolute Shrinkage and Selection Operator (LASSO) binary logistic regression, using 10-fold cross-validation for internal validation and lambda within one standard error of the minimum (λ.1se). LASSO is a penalized logistic regression that minimizes over-fitting and potential collinearity between predictor variables, while shrinking the coefficients of the weakest variables to zero and therefore excluding them from further analysis. LASSO regression was performed using the R package "glmnet", followed by regular variable selection using logistic regression. The β-coefficient of each significant predictor was multiplied by 10 and rounded to the nearest integer. The optimal cut-off (15) was derived from the optimal Youden index. Subsequently, subjects with scores <15 and >15 were considered the low-risk and high-risk groups, respectively. The other methodological approach used in this study was based on more complex machine learning (ML) models, using the same data splits and the significant variables from the logistic regression model to construct four ML models. These models were a random forest (RF), a gradient boosting machine (GBM), extreme gradient boosting (XGBoost), and a deep learning neural network (DL). We used the same parameters utilized in LASSO for internal validation. The ML models were built using the R package "h2o" (version 3.32.0.5). The accuracy of all the models that were externally validated, recalibrated, or developed was assessed through the area under the curve (AUC).
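The two scoring steps described above, converting β-coefficients into integer points and choosing a cut-off by the Youden index (J = sensitivity + specificity - 1), can be sketched as follows. This is a minimal pure-Python illustration rather than the authors' R code; the coefficient names and values, the toy scores, and the tie-handling rule (round half up, "high risk" if score ≥ threshold) are all assumptions.

```python
import math


def beta_to_points(betas: dict) -> dict:
    """Multiply each coefficient by 10 and round to the nearest integer.

    Ties are rounded half up (an assumption; the text does not say).
    """
    return {name: int(math.floor(b * 10 + 0.5)) for name, b in betas.items()}


def youden_cutoff(scores: list, labels: list) -> tuple:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A subject is classified as high risk when score >= threshold.
    Labels are 1 for the outcome (e.g. CCI) and 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical coefficients and a toy, perfectly separable dataset.
points = beta_to_points({"predictor_a": 1.96, "predictor_b": 0.84})
cutoff, j_stat = youden_cutoff([0, 8, 10, 20, 30, 38], [0, 0, 0, 1, 1, 1])
```

Scanning only the observed score values is sufficient because sensitivity and specificity change only at those values.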
In addition, the AUCs of the four sophisticated ML models were compared with the AUC values of the logistic regression analyses. Calibration (precision) of all the models was evaluated using calibration curves from the "rms" package in R, and values of McKelvey's R², the Brier score, the calibration slope, and the intercept were generated using 200 bootstrap resamples to account for optimism. In addition, we performed the DeLong test (DeLong et al., 1988) for AUC comparative analysis. Statistical analysis was conducted using R software (version 4.0.4), with significance accepted at a p-value <0.05.

A total of 1181 patients were included in this study. Of these, 45 (3.8%) died within 30 days, and 229 (19.3%) underwent CCI. Patients were 94.5% male and aged between 19 and 87 years, with a median age of 43 (IQR: 35-53), and came from diverse nationalities. The most common nationalities were Bangladeshi (26%), Indian (22%), Nepalese (19%), Pakistani (8%), Filipino (6%), Egyptian (5%), Sri Lankan (4%), and Qatari (3%) (Figure 1). The median length of hospital stay was 7 days (IQR: 4-15). Eighty-one (6.85%) patients had at least one comorbidity (Table 1 and Table 2). Deceased patients were more likely to be older, to have undergone CCI, and to have had hypoxia, tachycardia, tachypnea, and hypotension at admission, along with a prolonged length of hospital stay of 20 days (IQR: 10-30) (Table 1). Eight hundred eighty-nine (75.3%) patients had a CURB-65 score of 1, 207 (17.5%) had a score of 2, 57 (4.8%) had a score of 3, 20 (1.7%) had a score of 4, and only 8 (0.7%) scored 5. Of these, 13 (1.4%) of the patients with a score of 1 died within 30 days, as did 10 (4.83%) with a score of 2, 14 (24.56%) with a score of 3, 3 (15%) with a score of 4, and 5 (62.5%) with a score of 5. On the other hand, 944 patients (79.9%) were found to be in PSI class I, 44 (3.7%) were in class II, 94 (8.0%) in class III, 60 (5.1%) in class IV, and 39 (3.3%) in class V.
Ten deaths (22.2% of all deaths) occurred in class I, and the mortality rate was 4.5% in class II, 7.4% in class III, 16.6% in class IV, and 41% in class V (Table 1). Using the original points, we were able to calculate mortality scores for the CURB-65 and PSI in 1181 (100%) patients. Both models had comparable performance, with AUCs of 0.83 (95% CI 0.765-0.901) for the PSI and 0.78 (95% CI 0.703-0.855) for the CURB-65 (Figure 2-A). The DeLong test showed no significant difference between the AUCs (difference 0.05, p-value = 0.2232). However, the PSI showed substantially better calibration in predicting 30-day mortality (Table S1). Using the original points, we were able to calculate CCI scores for the CURB-65 and PSI models in 1181 (100%) COVID-19 patients. The CURB-65 showed better calibration metrics (Table S1) and better overall accuracy, with an AUC of 0.78 (95% CI 0.746-0.814) compared to 0.70 (95% CI 0.665-0.738) for the PSI (Figure 2-B), and a significant DeLong test (difference between the AUCs: 0.08, p-value <0.001). Both models showed comparable calibration (Figure S1-C&D). Recalibration of the CURB-65 and PSI models for 30-day mortality resulted in significant [...] The DeLong test for the recalibrated models was significant (difference between the AUCs: 0.14, p-value <0.001). While the AUC did not improve in the recalibrated CURB-65 model, we observed an overall better calibration (Figure S2-F). Based on the recalibrated coefficients (Table S2) [...] The LASSO selected 7/21 variables (shown in Table 3). Inclusion of the selected variables in logistic regression yielded 6 significant predictors of COVID-19 CCI, which therefore constituted the risk score. These were RR ≥ 30 breaths/min (20 points), oxygen saturation < 90% (18 points), hematocrit < 30% (15 points), age > 55 years (12 points), serum sodium < 130 mmol/l (10 points), and serum glucose ≥ 14 mmol/l (8 points) (Table 3). ROC curves of the 6 significant predictors of severe COVID-19 status were generated from the complex ML models.
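The six-item risk score reported above can be written out as a short sketch. The point values and thresholds are those given in the text; the function and parameter names are hypothetical, and because the text classifies scores <15 as low risk and >15 as high risk without saying how a score of exactly 15 is handled, treating 15 as high risk here is an assumption.

```python
# Points per predictor, as reported in the text (Table 3).
POINTS = {
    "rr_ge_30": 20,        # respiratory rate >= 30 breaths/min
    "spo2_lt_90": 18,      # oxygen saturation < 90%
    "hct_lt_30": 15,       # hematocrit < 30%
    "age_gt_55": 12,       # age > 55 years
    "sodium_lt_130": 10,   # serum sodium < 130 mmol/l
    "glucose_ge_14": 8,    # serum glucose >= 14 mmol/l
}


def cci_risk_score(resp_rate, spo2_pct, hematocrit_pct,
                   age_years, sodium_mmol_l, glucose_mmol_l) -> int:
    """Sum the points of every criterion met at admission (range 0-83)."""
    met = {
        "rr_ge_30": resp_rate >= 30,
        "spo2_lt_90": spo2_pct < 90,
        "hct_lt_30": hematocrit_pct < 30,
        "age_gt_55": age_years > 55,
        "sodium_lt_130": sodium_mmol_l < 130,
        "glucose_ge_14": glucose_mmol_l >= 14,
    }
    return sum(POINTS[name] for name, flag in met.items() if flag)


def risk_group(score: int, cutoff: int = 15) -> str:
    """Classify by the reported cut-off of 15.

    Assumption: a score equal to the cut-off is treated as high risk.
    """
    return "high" if score >= cutoff else "low"


# Hypothetical patient: tachypneic and hypoxic, otherwise within limits.
score = cci_risk_score(32, 88, 40, 43, 138, 6.0)   # 20 + 18 points
group = risk_group(score)
```

With these weights, any single criterion other than low sodium, high glucose, or age alone is enough to reach the cut-off, so the boundary assumption only affects patients whose sole finding is a hematocrit below 30%.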
The performance of the ML models in the validation cohorts was similar to that of the regression model (Figure 3-A&B), with AUC=0.826 for DL, AUC=0.832 for GBM, AUC=0.828 for RF, and AUC=0.829 for XGB (Figure 3-C).

To our knowledge, the CURB-65 and PSI have not been used or recalibrated to assess mortality in COVID-19 patients among multicultural populations from different countries living in Qatar, Middle East. This study applies these tools, which are commonly used to risk-stratify patients with community-acquired pneumonia (CAP), to this diverse population group. Qatar is an Arab state on the Qatar Peninsula with a heavy male predominance, females representing only 25% of the total population due to the high inflow of male laborers. The population tripled in the 10 years up to 2011, with Qatari locals representing <15% of the whole population, followed by other Arab nationalities (13%), Indian (24%), Nepali (16%), Filipino (11%), Bangladeshi (5%), and Sri Lankan (5%). The CURB-65 and the PSI can reliably predict the mortality of in-patients with CAP, but, as noted, they had not previously been used to assess mortality in COVID-19 patients in this setting. In this study, we assessed, recalibrated, and modified the PSI and CURB-65 as prognostic scoring mechanisms for the prediction of 30-day mortality and CCI. Although 30-day mortality (3.8%) in our study was lower than in recent reports (Artero et al., 2021), our findings showed that 71% and 68.8% of the patients who underwent CCI after admission to the ICU with COVID-19 pneumonia had CURB-65 scores of 1-2 and PSI classes of I-III, respectively, with a relatively high need for CCI (18.7%). Hence, there is a need for clinical models for COVID-19 pneumonia that are based on outcomes other than mortality and that are modified by inpatient care.
Therefore, we performed variable selection using LASSO and logistic regression on all the CURB-65 and PSI variables to predict CCI. Only 6 of the 21 variables were significant: RR ≥ 30 breaths/min, oxygen saturation < 90%, hematocrit < 30%, age > 55 years, serum sodium < 130 mmol/l, and serum glucose ≥ 14 mmol/l. The model showed better accuracy and calibration than the PSI and CURB-65 scores. While many studies showed the PSI and CURB-65 systems to be beneficial in the assessment of COVID-19 severity and good predictors of mortality, some found other scoring systems to be better. The SCAP (Severe Community-Acquired Pneumonia) score was found to be a more accurate marker of 14-day mortality (Anurag and Preetam, 2021). Another cohort found the COVID-GRAM score to be the preferred means of identifying patients at higher risk of death from pneumonia caused by SARS-CoV-2 (Esteban Ronda et al., 2021). Another cohort used NEWS2 (National Early Warning Score 2) and found it to be more beneficial than SIRS and qSOFA but not as predictive as the CURB-65 and PSI (Holten et al., 2020). Yet another study found A-DROP, a modified version of the CURB-65, to be superior in predicting in-hospital death compared to other widely used CAP-specific tools (Fan et al., 2020). External validation was essential in other cohorts as well as in this one, since newer scoring systems were expected to outperform older ones (Neto et al., 2021). The PSI and CURB-65 have previously been used for risk stratification and prognostic assessment in patients with COVID-19. In our cohort, the PSI and CURB-65 performed similarly to other studies on CAP (Ahnert et al., 2019; Asai et al., 2019; George et al., 2019). Notwithstanding its implications, two limitations of this study are important to point out: its retrospective design and the heavy male predominance.
However, the lack of missing data, the robust methodological approach, and Qatar's multiethnic and diverse population, represented by a relatively large sample with a wide age range, made this study unique, addressing the lack of population diversity in other similar studies (Artero et al., 2021). External validation showed that well-respected scoring systems such as the CURB-65 and PSI lend credence to predictability and prognostic value in treating patients with COVID-19. Further studies are needed to evaluate the developed model in larger cohorts, along with testing the validity of the CURB-65 and PSI in predicting CCI. Authors worldwide are encouraged to share data and collaborate in order to develop robust clinical models that can aid clinical decisions while avoiding issues arising from bias and overfitting (Wynants et al., 2020).

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations: CCI, critical care intervention; CHF, congestive heart failure; SBP, systolic blood pressure; DBP, diastolic blood pressure; BUN, blood urea nitrogen; PO2, partial pressure of oxygen.

a Pleural effusion was excluded from entering the analysis, while oxygen saturation was used as a substitute to simplify the score.
COVID19 - Qatari Ministry of Public Health
Qatar Open Data - Coronavirus Disease 2019 (COVID-19) Statistics
Sequential organ failure assessment score is an excellent operationalization of disease severity of adult patients with hospitalized community acquired pneumonia - results from the prospective observational PROGRESS study
What do we mean by validating a prognostic model
Validation of PSI/PORT, CURB-65 and SCAP scoring system in COVID-19 pneumonia for prediction of disease severity and 14-day mortality
Severity Scores in COVID-19 Pneumonia: a Multicenter, Retrospective, Cohort Study
Efficacy and accuracy of qSOFA and SOFA scores as prognostic tools for community-acquired and healthcare-associated pneumonia
Risk scoring systems for adults admitted to the emergency department: a systematic review
Severity assessment tools for predicting mortality in hospitalised patients with community-acquired pneumonia. Systematic review and meta-analysis
Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach
Application of validated severity scores for pneumonia caused by SARS-CoV-2
Comparison of severity scores for COVID-19 patients with pneumonia: a retrospective study
A prediction rule to identify low-risk patients with community-acquired pneumonia
External Validation of the qSOFA Score in Emergency Department Patients With Pneumonia
Predicting severe COVID-19 in the Emergency Department
Performance of the CURB-65 Score in Predicting Critical Care Interventions in Patients Admitted With Community-Acquired Pneumonia
Defining community acquired pneumonia severity on presentation to hospital: an international derivation and validation study
Community-acquired Pneumonia Severity Assessment Tools in Patients Hospitalized with COVID-19: a Validation and Clinical Applicability Study
Severity scoring systems for pneumonia: current understanding and next steps
Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China
Clinical prediction models
The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies
Comparison of severity classification of Chinese protocol, pneumonia severity index and CURB-65 in risk stratification and prognostic assessment of coronavirus disease
COVID-19) Dashboard; 2021
Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal

The Qatar National Library funded the publication of this article. The authors of this article declare no conflict of interest.