key: cord-0743358-x8yu7k4a
authors: Hincapié, Carolina; Ascuntar, Johana; León, Alba; Jaimes, Fabián
title: Community-acquired pneumonia: comparison of three mortality prediction scores in the emergency department
date: 2021-10-23
journal: Colombia medica
DOI: 10.25100/cm.v52i4.4287
sha: 3346c4d7d2417606d8aeaeadf7bb9e0eeceaec71
doc_id: 743358
cord_uid: x8yu7k4a

BACKGROUND: qSOFA is a score to identify patients with suspected infection who are at risk of complications. Its criteria are similar to those evaluated in prognostic scores for pneumonia (CRB-65 and CURB-65), but it is not clear which is best for predicting mortality and admission to the ICU. OBJECTIVE: To compare three scores (CURB-65, CRB-65 and qSOFA) to determine the best tool to identify emergency department patients with pneumonia at increased risk of mortality or intensive care unit (ICU) admission. METHODS: Secondary analysis of three prospective cohorts of patients hospitalized with a diagnosis of pneumonia in five Colombian hospitals. Validation and comparison of the scores' accuracy were performed by means of discrimination and calibration measures. RESULTS: Cohorts 1, 2 and 3 included 158, 745 and 207 patients, with mortality rates of 32.3%, 17.2% and 18.4%; admission to the ICU was required for 52.5%, 43.5% and 25.6%, respectively. The best AUC-ROC for mortality was for CURB-65 in cohort 3 (AUC-ROC=0.67). The calibration was adequate (p>0.05) for the three scores. CONCLUSIONS: None of these scores proved to be an appropriate predictor of mortality and admission to the ICU. Furthermore, the CRB-65 exhibited the lowest discriminative ability.

Pneumonia is a significant cause of sepsis worldwide, representing approximately half of all cases, and is the second most frequent cause of sepsis in Colombia 1,2 . Globally, pneumonia confers a high risk of mortality 3, 4 . Between 2005 and 2012 in Colombia, acute respiratory infection was the number one cause of death from communicable diseases, with 48.6% of the cases, representing 56.2% of deaths from communicable diseases in women and 43.1% in men 5 .

Providing health care to patients with severe infections carries a high cost to a state and its health system. These infections are clinically challenging because they lack simple and specific prognostic markers that allow early identification of individuals at risk who warrant differential care. Therefore, it is important to have useful clinical tools to estimate the risk of death or complications in emergency department patients with suspected infections. Several studies have been conducted to define a mortality predictive score specifically for pneumonia, and the CURB-65 and CRB-65 scores have been widely used because they are easier to apply than others such as the PSI (Pneumonia Severity Index) 6 . Recently, the third consensus in sepsis (SEPSIS 3) encouraged the implementation of the qSOFA (quick sepsis-related organ failure assessment) score in adult patients suspected of having an acute bacterial infection for early identification of those with a worse prognosis 7 . The Colombian Ministry of Health 8 , the Argentine Society of Infectious Diseases 9 and the Mexican Institute of Social Security 10 recommend implementing CURB-65 in their guidelines for the management of patients with community-acquired pneumonia, despite the lack of local studies to confirm and validate this recommendation 7 .

The CURB-65, CRB-65 and qSOFA were designed to identify patients at increased risk of complications and mortality. These scores share clinical variables in their compositions and community-acquired pneumonia is the main cause of sepsis; therefore, exploring potential differences in their performance as prognosis models would have implications for clinical practice. Likewise, it is necessary to validate any multivariable model that has been developed for prognostic or diagnostic purposes for a clinical issue in independent populations 11 . Therefore, this study aimed to validate and compare the three scores to determine the best tool to identify emergency department patients with pneumonia who are at increased risk of mortality or intensive care unit (ICU) admission.

We did not find the qSOFA, CURB-65 or CRB-65 to be adequate tools for discriminating hospital mortality or ICU admission in three cohorts of patients with community-acquired pneumonia who were admitted to emergency departments in 5 reference hospitals in Medellín, Colombia.

The qSOFA, CURB-65 and CRB-65 were all found to be ineffective predictive tools for mortality and admission to the ICU in our cohorts; therefore, it is necessary to develop and validate predictive models of prognosis of community-acquired pneumonia that are useful for the Colombian population.

This analysis was performed using three prospective cohort studies developed between 2013 and 2016 in five emergency departments of the city 

For each of the original cohorts, trained research assistants collected data based on electronic medical records in a systematic way, reviewing all admissions to hospital institutions and screening all patients admitted for emergencies with a diagnosis of infection, sepsis, severe sepsis or shock. The definition of the infection source and the presence of organ dysfunction or septic shock were verified with the data extracted from the medical history records in the first 6 hours. To assess the data accuracy, the information was evaluated periodically by the coinvestigators. This information was recorded using forms designed specifically for each of the investigations and then stored in electronic databases. Given that the cohorts were prospective, the evaluation of predictors was independent from knowledge of the outcomes of interest.

Additionally, it was necessary for this study to recover the BUN (blood urea nitrogen) value upon hospital admission for patients at Hospital Universitario San Vicente Fundación and Hospital Pablo Tobón Uribe. The data collection process took information confidentiality into account and was approved by the ethics committees of each of the participating institutions.

For the current study, the inclusion criterion was having entered one of the previous studies with a diagnosis of pneumonia. Cohort 1 used the Centers for Disease Control and Prevention (CDC) criteria for infection for inclusion, cohort 2 consisted of patients with suspected infection and at least one organ dysfunction criterion, and cohort 3 consisted of patients with clinical suspicion of infection. The exclusion criteria common to the 3 cohorts were early discharge or referral to another institution, and do-not-resuscitate orders or terminal diseases (Annex 1). No exclusion criteria beyond those considered in the original studies were used in the present study 12-14 .

The primary outcome was hospital mortality; ICU admission was included as a secondary outcome.

Given that this was a secondary analysis of data, there was no calculation of sample size because the analysis was performed with patients of the respective cohorts that met the inclusion criteria. However, the power for the expected difference in the areas under the curve was calculated from a fixed number of patients and considered a type I error fixed at 0.05. The calculation was based on the formula described by Hanley and McNeil 16, 17 .

With a fixed sample size of 158, 745 and 207 patients for cohorts 1, 2 and 3, respectively, an alpha of 0.05 and taking the observed values of the AUC-ROC (area under the ROC curve) as θ1: 0.7 and θ2: 0.77 (based on the study by Kolditz et al. 18 because we lacked this information locally), we found an estimation of power of 0.52, 0.98 and 0.62, respectively.
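The power calculation described above can be illustrated with a minimal sketch. It assumes the Hanley and McNeil approximation for the standard error of an AUC and a two-sided z-test for two AUCs estimated on the same patients. The correlation `r` between the two AUC estimates is not reported in this study, so the value used here is purely hypothetical, and the numbers of deaths are approximations derived from the reported mortality rates. The output is therefore not expected to reproduce the reported powers of 0.52, 0.98 and 0.62.

```python
import math
from statistics import NormalDist


def hanley_mcneil_se(auc, n_events, n_nonevents):
    """Approximate standard error of an AUC (Hanley and McNeil)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (
        auc * (1 - auc)
        + (n_events - 1) * (q1 - auc ** 2)
        + (n_nonevents - 1) * (q2 - auc ** 2)
    ) / (n_events * n_nonevents)
    return math.sqrt(var)


def power_auc_difference(auc1, auc2, n_events, n_nonevents, r=0.0, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two AUCs.

    r is the ASSUMED correlation between the two AUC estimates
    (both scores are measured on the same patients).
    The negligible contribution of the opposite tail is ignored.
    """
    se1 = hanley_mcneil_se(auc1, n_events, n_nonevents)
    se2 = hanley_mcneil_se(auc2, n_events, n_nonevents)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(auc1 - auc2) / se_diff - z_crit)


# Approximate deaths per cohort, derived from the reported mortality rates
# (32.3% of 158, 17.2% of 745, 18.4% of 207); r = 0.5 is a hypothetical value.
cohorts = {1: (51, 158), 2: (128, 745), 3: (38, 207)}
for cohort, (deaths, total) in cohorts.items():
    power = power_auc_difference(0.70, 0.77, deaths, total - deaths, r=0.5)
    print(f"cohort {cohort}: approximate power = {power:.2f}")
```

Under this approximation, power rises with the number of events and with the correlation between the two scores, which is consistent with the largest cohort having the highest power.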

The quantitative variables with a normal distribution are presented as means and standard deviations, while those without a normal distribution are expressed as medians and interquartile ranges (IQRs).

A validation and comparison of the three predictive models (CURB-65, CRB-65 and qSOFA) was performed in terms of prognosis. To determine the accuracy of the models' predictions, it was necessary to examine both calibration and discrimination. Calibration establishes the agreement between observed and expected events, while discrimination is the ability of the score to distinguish between individuals who experience the event of interest and those who do not 19 . Discrimination was assessed with the area under the receiver operating characteristic curve (AUC-ROC), with each model defined as the sum of its corresponding predictors. The differences between the AUC-ROC were tested using the DeLong-DeLong statistic 20 . Calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test, with p> 0.05 taken as adequate correspondence. Additionally, calibration curves were constructed from the results of the models in each of the cohorts.
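As an illustration of a score defined as the sum of its predictors and of discrimination measured by the AUC-ROC, the following minimal sketch may help. The thresholds are the commonly published ones for qSOFA, CRB-65 and CURB-65 and are not stated in this article, so they should be read as assumptions. The function names and the rank-based AUC estimate (the proportion of death/survivor pairs in which the patient who died has the higher score, with ties counted as one half) are illustrative choices, not the Stata procedure used in the study.

```python
def qsofa(resp_rate, sbp, gcs):
    """qSOFA: one point each for RR >= 22, SBP <= 100 mmHg, GCS < 15."""
    return int(resp_rate >= 22) + int(sbp <= 100) + int(gcs < 15)


def crb65(confusion, resp_rate, sbp, dbp, age):
    """CRB-65: confusion, RR >= 30, SBP < 90 or DBP <= 60 mmHg, age >= 65."""
    return (
        int(confusion)
        + int(resp_rate >= 30)
        + int(sbp < 90 or dbp <= 60)
        + int(age >= 65)
    )


def curb65(confusion, bun_mg_dl, resp_rate, sbp, dbp, age):
    """CURB-65: CRB-65 plus one point for BUN > 19 mg/dL (urea > 7 mmol/L)."""
    return crb65(confusion, resp_rate, sbp, dbp, age) + int(bun_mg_dl > 19)


def auc_roc(scores, outcomes):
    """Rank-based AUC: P(score of an event > score of a non-event), ties = 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("both outcome classes are required")
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))


# Toy data (hypothetical patients, not from the study cohorts).
toy_scores = [0, 1, 1, 2, 3, 4]
toy_deaths = [0, 0, 1, 0, 1, 1]
print(auc_roc(toy_scores, toy_deaths))
```

Because these scores take only a few integer values, ties are frequent, which is why the half-credit for ties matters in the estimate.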

The operative characteristics for prediction of mortality and ICU need were then estimated for each of the scores, taking two or more points for the qSOFA and 3 or more for both the CURB-65 and CRB-65, because the original proposals of the models indicate these cutoff points as high risk of mortality. Likewise, the performance of each of the predictive models was analyzed at all possible cutoff points and compared with the originally proposed cutoff points. To calculate the sensitivity, specificity, predictive values and likelihood ratios of the scores at their respective cutoff points, Bayes' theorem was used, considering mortality and ICU need as the reference test or gold standard.
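The operative characteristics at a given cutoff can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, a score at or above the cutoff counts as a positive test, and the data are toy values rather than study data. The second function shows the Bayes' theorem form of the positive predictive value from sensitivity, specificity and outcome prevalence.

```python
import math


def operating_characteristics(scores, outcomes, cutoff):
    """Sensitivity, specificity, predictive values and likelihood ratios
    for the rule 'score >= cutoff', with the outcome as reference standard."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else math.nan,
        "npv": tn / (tn + fn) if tn + fn else math.nan,
        "lr_pos": sens / (1 - spec) if spec < 1 else math.inf,
        "lr_neg": (1 - sens) / spec if spec > 0 else math.inf,
    }


def ppv_bayes(sens, spec, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Toy example (hypothetical patients): qSOFA-like scores with cutoff >= 2.
toy_scores = [0, 1, 2, 3, 2, 0]
toy_deaths = [0, 0, 1, 1, 0, 1]
result = operating_characteristics(toy_scores, toy_deaths, cutoff=2)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Calling the first function once per possible cutoff gives the comparison across cutoff points described above.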

In the main analysis, missing data were considered as abnormal values (worst-case scenario). Additionally, a sensitivity analysis was performed with two additional models: a best-case scenario, considering the missing data as normal values, and a multiple imputation model based on multivariate normal (MVN) regression, taking the BUN, age, gender, Charlson index, SOFA and Acute Physiology and Chronic Health Evaluation II (APACHE II) as independent variables.
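The worst-case and best-case handling of missing predictors can be illustrated with a short sketch. It assumes each predictor has already been reduced to a criterion that is met (`True`), not met (`False`) or missing (`None`); the function name and this representation are hypothetical. The MVN multiple imputation model is not sketched here.

```python
def score_with_missing(criteria, scenario="worst"):
    """Sum the score criteria, counting missing ones (None) as abnormal
    in the worst-case scenario and as normal in the best-case scenario."""
    if scenario not in ("worst", "best"):
        raise ValueError("scenario must be 'worst' or 'best'")
    fill = 1 if scenario == "worst" else 0
    return sum(fill if c is None else int(c) for c in criteria)


# Hypothetical CURB-65 criteria for one patient with a missing BUN value:
# confusion, urea (missing), respiratory rate, blood pressure, age >= 65.
patient = [True, None, False, False, True]
print(score_with_missing(patient, "worst"))  # missing BUN counted as abnormal
print(score_with_missing(patient, "best"))   # missing BUN counted as normal
```

The two scenarios bound the score that a patient with missing data could have had, which is what makes them usable as a sensitivity analysis.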

Statistical analyses were performed with the Stata 14 ® software. The results are presented with their respective 95% confidence intervals (CI), and a significance level of p <0.05 was applied. Publication standards given by the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guidelines were followed 11 .

A total of 158, 745 and 207 patients were analyzed for cohorts 1, 2 and 3, respectively. In the same order, the median age was 70 (IQR = 56-81), 66 (IQR = 54-77) and 60 (IQR = 44-75) years; 34.2%, 48.9% and 44.4% were female; 52.5%, 43.5% and 25.6% required admission to the ICU; and 32.3%, 17.2% and 18.4% died during hospitalization (Table 1). Blood cultures were requested in 95.6%, 84.8% and 84.5% of the patients, and microorganisms were isolated in 23.2%, 10.8% and 9.1% of the patients in cohorts 1, 2 and 3, respectively. The most frequent microorganisms found in each of the cohorts were Streptococcus pneumoniae, Klebsiella pneumoniae, Haemophilus influenzae and Escherichia coli (Table 2).

Abbreviations: SOFA, Sequential Organ Failure Assessment; APACHE II, Acute Physiology and Chronic Health Evaluation II; RR, respiratory rate; SAP, systolic arterial pressure; DAP, diastolic arterial pressure; MAP, mean arterial pressure; BUN, blood urea nitrogen. The quantitative variables were expressed as the medians and their respective interquartile range; categorical variables are shown in absolute and relative frequencies.

For the outcome of admission to the ICU, discrimination was low for the three scores in the three cohorts. From the DeLong-DeLong statistic, a statistically significant difference was found between the AUC-ROC in cohorts 1 and 2 (p <0.05) (Figure 1).

The calibration of the models was adequate in the study population for admission to the ICU and the mortality outcome, according to the Hosmer-Lemeshow statistic of the three scores in each of the cohorts (p> 0.05) (Table S1). Additionally, calibration curves were constructed for both outcomes in the different models in each of the cohorts, and a high degree of correspondence of the scores was shown in most of the cohorts (Supplementary Figures S1 and S2).

Regarding the performance of the models in their operative characteristics, the greatest sensitivity for ICU need was with the qSOFA (55.4%) and for mortality was with CURB-65 (58.8%) in cohort 1. The greatest specificity was with CRB-65 for both ICU need and mortality, with 93.5% and 93.4% in cohorts 2 and 3, respectively. The lowest performance in predicting mortality in terms of sensitivity was for the CRB-65 in cohort 3 (13.2%), for specificity it was for the qSOFA in cohort 1 (43.9%) and for the positive predictive value it was the CRB-65 in cohort 3 (Tables S2 and S3).

We found that none of the qSOFA, CURB-65 and CRB-65 scores was optimal in discriminating hospital mortality or ICU admission in three cohorts of patients with community-acquired pneumonia admitted to five hospitals in Medellín. However, looking at the AUC, sensitivity and negative predictive values, CURB-65 appeared to consistently perform better than the other two tools with respect to mortality discrimination. In contrast, with regard to calibration, it was possible to demonstrate a good performance for the three scores in the 3 cohorts. Nevertheless, a lack of good discriminative performance indicates that these scoring systems should not be used as predictive tools 19, 21 .

It is necessary to account for the setting of the studies that originally developed the scores: the CURB-65 and the CRB-65 were developed more than 20 years ago in the United Kingdom, New Zealand and the Netherlands 22 , countries with a lower mortality associated with community-acquired pneumonia than Colombia (9% vs 17-32%). On the other hand, the qSOFA was derived from a very recent cohort 23 performance between the CURB-65 and CRB-65 scores for mortality at 30 days with an AUC over 0.85. Subsequently, Man et al. compared these prediction rules for 30-day mortality in patients with community-acquired pneumonia and found AUCs higher than the ones observed in the present study 25 .

In the original studies that served as the basis for the development of qSOFA, Seymour et al. found a good performance for the prediction of in-hospital mortality 26 . As shown in the studies presented previously, the performance of the scores changed significantly among cohorts because of their differences, including the distribution of etiological agents, coexisting diseases, social support, availability of resources and medical practices, including the ICU admission criteria. In our study, the scores' performance varied even though the cohorts were from the same city, which can be explained by the variability in the patient inclusion criteria.

The AUC-ROC is a statistical parameter that allows the comparison of predictive models of diagnosis or prognosis in terms of discrimination capacity, and it is reasonable to use an AUC-ROC >0.75 as a reference of acceptable performance. However, this statistical measure does not allow a direct clinical interpretation, and this limitation of predictive models is a constant in the literature on this topic. For this reason, it is always necessary to evaluate their operative characteristics simultaneously. Regarding calibration, none of the studies mentioned above accounted for it in the statistical analysis. The critical importance of poor calibration is often underestimated. Poor calibration can decrease clinical utility; the implementation of a predictive tool with poor calibration could even lead to decisions that are harmful to the patient 32 . Future studies could consider other variables for score calculations, such as variables related to the microbiological agent, pulse oximetry, temperature, and comorbidities such as chronic obstructive pulmonary disease, congestive heart failure, and immunosuppression, among others.

One of the limitations of our study was the sample size. We based the difference of 0.7-0.77 between the discrimination (AUC-ROC) of the CRB-65 and qSOFA scores on partial information from Kolditz et al. 18 . This difference, however, does not necessarily have a clinical basis and did not consider that all scores ultimately had a poor discrimination performance (AUC <0.75). The traditional approach to sample size calculation in predictive models defines a value of at least 10 outcomes for each independent variable 33, 34 . For comparisons between models, exclusively by means of discrimination, we based the sample size formula on the AUC-ROC comparison by Hanley and McNeil 17 . However, specifically for the validation of predictive models, there is no clear indication of the sample size calculation, and although some authors have suggested a minimum of 100 outcomes, many studies do not consider this aspect 35, 36 . On the other hand, the collection was performed in 5 institutions that are recognized as high-quality health care centers, which can lead to a selection bias. However, the three cohorts had different inclusion criteria, which broadened the clinical spectrum of the study population.

Another limitation was that despite being prospectively constructed cohorts, this study provides a secondary analysis of data, giving rise to missing urea values for some participants. These missing data were considered as abnormal values, which could generate a differential or nondifferential classification bias. The missing data represented only 5%, however, and the sensitivity analysis with different scenarios did not improve the performance of the models.

A predictive model is not of practical use if it cannot discriminate and be calibrated at the same time: properly separating those who present the condition from those who do not is as important as agreement between observed and expected events 21 . Unlike new medical technologies, which require supervision, prediction systems are not subjected to strict evaluation, despite the potential risk of affecting a greater number of patients due to their extensive implementation.

In the three independent cohorts of patients admitted by the emergency department with pneumonia, the qSOFA, CURB-65 and CRB-65 were all found to be limited predictive tools for mortality and admission to the ICU. Furthermore, the CRB-65 exhibited the lowest discriminative ability.

Inclusion criteria: Patients with an infection in accordance with the clinical or microbiological criteria of the CDC definitions, and at least one organ/system dysfunction, based on the SOFA score ≥2, produced or related to the infection, and detected within 24 h prior to being admitted to the study. For the HGM and CLA, in addition to the previous criteria, patients were required to have positive blood cultures. This condition was established because of problems in recruiting and the need to improve the efficiency in the analysis by the infectious disease specialists. Exclusion criteria: Patients referred to another hospital within the first 48 h of admission were excluded, as were patients with known infections who needed prolonged treatments (e.g. tuberculosis, nocardiosis, histoplasmosis) and patients with do-not-resuscitate orders or terminal diseases.

Inclusion criteria: Patients ≥18 years, admitted by the ER with a suspected or confirmed diagnosis of infection, sepsis, severe sepsis or septic shock; at least two criteria of systemic inflammatory response syndrome and systolic blood pressure <90 mmHg after a bolus of crystalloid of at least 20 mL/kg, or a serum lactate >4 mmol/L. Exclusion criteria: Refusal by the patient, family or attending physician to participate; pregnancy, myocardial infarction, stroke, asthmatic crisis, arrhythmia, trauma, gastrointestinal bleeding, seizure not due to meningitis, psychoactive substance overdose, surgery <24 hours, burns, CD4 count <50 cells/mm3, hyperosmolar status, diabetic ketoacidosis or cirrhosis; discharge or referral in the first 24 hours, prior participation in the study, referral from another institution where the patient has been hospitalized >24 hours, or a do-not-resuscitate (DNR) order.

Inclusion criteria: patients ≥18 years admitted to the ER with an acute bacterial infection confirmed by clinical or laboratory evidence, in accordance with the CDC criteria. Furthermore, patients had to be available for assessment of vital signs and physical findings upon admission to the hospital.

Exclusion criteria: patients who were referred from another institution where they stayed for more than 24 hours; discharge or referral to another institution during the first 24 hours of admission; diseases that hinder the physical evaluation of clinical parameters, such as limb amputation, extensive burns or severe skin diseases, Raynaud's phenomenon or peripheral arterial disease; cirrhosis; mesenteric thrombosis; patient's refusal to participate; screening after 6 hours of being admitted to the ER; previous participation in the study; and a do-not-resuscitate order.

Figure S1. Calibration curves for ICU need prediction

The epidemiology of sepsis in Colombia: a prospective multicenter cohort study in ten university hospitals

Severe sepsis and septic shock

Association of diagnostic coding with trends in hospitalizations and mortality of patients with pneumonia

Infectious Diseases Society of America/American Thoracic Society consensus guidelines on the management of community-acquired pneumonia in adults

Severity assessment tools for predicting mortality in hospitalised patients with community-acquired pneumonia. Systematic review and meta-analysis

The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3)

Ministerio de la Protección Social. Guías para Manejo de Urgencias. 3a Edición. Bogota: Ministerio de la Protección

Neumonía adquirida de la comunidad en adultos: recomendaciones sobre su atención

Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement

Antibiotics has more impact on mortality than other early goal-directed therapy components in patients with sepsis: An instrumental variable analysis

Antimicrobial agent prescription: a prospective cohort study in patients with sepsis and septic shock

Association of clinical hypoperfusion variables with lactate clearance and hospital mortality

Defining community acquired pneumonia severity on presentation to hospital: an international derivation and validation study

The meaning and use of the area under a receiver operating characteristic (ROC) curve

A method of comparing the areas under receiver operating characteristic curves derived from the same cases

Comparison of the qSOFA and CRB-65 for risk prediction in patients with community-acquired pneumonia

What do we mean by validating a prognostic model

Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach

Statistical Evaluation of Prognostic versus Diagnostic Models: Beyond the ROC Curve

Study of community acquired pneumonia aetiology (SCAPA) in adults admitted to hospital: implications for management guidelines

New Sepsis Definition (Sepsis-3) and Community-acquired Pneumonia Mortality. A Validation and Clinical Decision-Making Study

Validation of a predictive rule for the management of community-acquired pneumonia

Prospective comparison of three predictive rules for assessing severity of community-acquired pneumonia in Hong Kong

Assessment of Clinical Criteria for Sepsis: For the Third International Consensus Definitions for Sepsis and Septic Shock

Predictive performance of quick Sepsis-related Organ Failure Assessment for mortality and ICU admission in patients with infection at the ED

Use of CRB-65 and quick Sepsis-related Organ Failure Assessment to predict site of care and mortality in pneumonia patients in the emergency department: a retrospective study

Community-acquired pneumonia severity assessment tools in patients hospitalized with COVID-19: a validation and clinical applicability study

Comparison of severity scores for COVID-19 patients with pneumonia: a retrospective study

Utility of established prognostic scores in COVID-19 hospital admissions: multicentre prospective evaluation of CURB-65, NEWS2 and qSOFA

Calibration of risk prediction models: impact on decision-analytic performance

Relaxing the rule of ten events per variable in logistic and Cox regression

Performance of logistic regression modeling: beyond the number of events per variable, the role of data structure

Substantial effective sample sizes were required for external validation studies of predictive logistic regression models

Validation and updating of predictive logistic regression models: a study on sample size and shrinkage