key: cord-0755688-njg4uqqp authors: Akama-Garren, Elliot H.; Li, Jonathan X. title: Unbiased identification of clinical characteristics predictive of COVID-19 severity date: 2021-06-05 journal: Clin Exp Med DOI: 10.1007/s10238-021-00730-y sha: 8194e4239269977f6edbdd7adee033dbbaa46e55 doc_id: 755688 cord_uid: njg4uqqp

There is currently limited clinical ability to identify COVID-19 patients at risk for severe outcomes. To unbiasedly identify metrics associated with severe outcomes in COVID-19 patients, we conducted a retrospective study of 835 COVID-19 positive patients at a single academic medical center between March 10, 2020 and October 13, 2020. As of December 1, 2020, 656 (79%) patients required hospitalization and 149 (18%) died. Unbiased comparisons of all clinical characteristics and mortality revealed that abnormal pH (OR 8.54, 95% CI 5.34–13.6), abnormal creatinine (OR 6.94, 95% CI 4.22–11.4), and abnormal PTT (OR 4.78, 95% CI 3.11–7.33) were most significantly associated with mortality. Correlation with ordinal severity scores confirmed these associations, in addition to associations between respiratory rate (Spearman’s rho = −0.56), absolute neutrophil count (Spearman’s rho = −0.5), and C-reactive protein (Spearman’s rho = 0.59) with disease severity. Unsupervised principal component analysis and machine learning model classification of patient demographics, laboratory results, medications, comorbidities, signs and symptoms, and vitals are capable of separating patients on the basis of COVID-19 mortality (AUC 0.82). This retrospective analysis identifies laboratory and clinical metrics most relevant to predict COVID-19 severity. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10238-021-00730-y.

As the number of COVID-19 deaths approaches 3.5 million worldwide as of May 11, 2021, there is increasing need to better understand what disease mechanisms and clinical correlates lead to poor outcomes.
SARS-CoV-2 infection may result in a spectrum of severity ranging from asymptomatic disease to hospitalization requiring mechanical ventilation [1-7], making identification of patients at risk for severe COVID-19 at initial presentation imperative yet complex. Case series of hospitalized COVID-19 patients during the early pandemic identified key risk groups of severe COVID-19 [8-17], including patients with diabetes, obesity, chronic kidney disease, liver disease, and patients above 65 years old. Cytokine profiling [18] and multi-dimensional flow cytometry [19-22] have identified hematologic profiles associated with severe COVID-19. Over the course of the pandemic, these advances along with improvements in supportive care such as prone positioning [23-25] have led to reductions in disease mortality [26, 27]. Despite these advances, clinical prediction of COVID-19 prognosis at the time of initial presentation remains imperfect [28]. A better understanding of the clinical correlates of COVID-19 severity would improve prognostic and therapeutic approaches to disease assessment. With an accumulating number of SARS-CoV-2 positive patients with a range of clinical outcomes, we are increasingly able to perform unbiased analyses across more diverse multi-dimensional clinical metrics, in order to identify novel associations with COVID-19 severity. We sought to leverage these data to determine which clinical characteristics are most useful to predict COVID-19 severity. Here, we perform analyses of over 1,700 clinical metrics including laboratory results, vitals, demographics, medications, and disease outcomes in 835 COVID-19 positive patients to identify correlates of disease severity.

This study was conducted at the Beth Israel Deaconess Medical Center (BIDMC) in Boston.
The BIDMC Institutional Review Board approved this retrospective cohort study (2020P000699) as minimal risk using data collected during routine clinical care and waived the requirement for informed consent. BIDMC patients who presented for care and with confirmed SARS-CoV-2 infection by positive result of nasopharyngeal sample polymerase chain reaction between March 10, 2020 and October 13, 2020, and who had available past medical history, were included. Data were obtained from the BIDMC COVID-19 Observational Research Effort (CORE) Data Registry REDCap database and BIDMC InSIGHT CORE service. Laboratory values were obtained from inpatient data acquired over the course of an individual patient's admission. When multiple laboratory draws were present over the course of a patient's admission, mean, maximum, and minimum laboratory values for each test collected were calculated for each patient. Time to follow-up was determined by the number of days between the earliest COVID-19 test date and date of death or December 1, 2020, the final date of follow-up, if still alive. COVID-19 severity was graded by the NIH Ordinal Severity Scale. Patients were stratified into eight groups with lower scores corresponding to greater severity: (1) death, (2) invasive mechanical ventilation, (3) noninvasive ventilation, (4) supplemental oxygen, (5) no supplemental oxygen but requiring medical care, (6) no supplemental oxygen and not requiring medical care, (7) limitation in activities, or (8) no limitation in activities. Outcome metrics including mortality, hospitalization length and status, ICU length and status, ventilation and renal replacement therapy requirement, NIH Ordinal Severity Score, pathology results, and medications prescribed after COVID-19 diagnosis were excluded to allow for unsupervised PCA. Patients and metrics with missing data were excluded from analysis, and categorical factor variables were converted to dummy numerical variables. 
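The per-patient summarization of repeated laboratory draws described above (mean, maximum, and minimum for each test) can be sketched as follows. This is a minimal illustration rather than the study's own code: the `(patient_id, test_name, value)` record layout, the function name `summarize_labs`, and the example values are all assumptions introduced here.

```python
from collections import defaultdict
from statistics import mean


def summarize_labs(draws):
    """Collapse repeated lab draws into per-patient mean, max, and min.

    `draws` is an iterable of (patient_id, test_name, value) tuples; this
    record layout is an assumption made for illustration only.
    """
    grouped = defaultdict(list)
    for patient_id, test_name, value in draws:
        if value is None:  # skip draws with no recorded result
            continue
        grouped[(patient_id, test_name)].append(float(value))
    return {
        key: {"mean": mean(vals), "max": max(vals), "min": min(vals)}
        for key, vals in grouped.items()
    }


# Hypothetical draws for two patients (invented values, not study data).
example_draws = [
    ("p1", "creatinine", 1.0),
    ("p1", "creatinine", 3.0),
    ("p1", "albumin", 3.5),
    ("p2", "creatinine", 0.5),
]
summary = summarize_labs(example_draws)
```

A patient with no draws for a given test simply has no entry for that test, which mirrors the idea that patients lacking a metric drop out of analyses of that metric.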
Data were scaled to unit variance and principal component analysis was performed using factoextra (version 1.0.7). The top two principal components were used for two-dimensional mapping of patient data and variable eigenvectors. Mortality status was added to the data set used for PCA to allow for construction of a supervised machine learning classifier. All machine learning analyses were performed in R (version 3.6.1). Training and test data sets were created using the createDataPartition function in caret (version 6.0), with 75% of patients allocated to the training data set. Training data were preprocessed by centering and scaling and training was performed using ten separate tenfold repeated cross-validations for resampling. A gradient boosting machine model [29, 30] was built using 100 trees, a tree complexity of 2, and a learning rate of 0.1 using the train function in caret. Training performance was measured using area under the ROC curve, and variable importance was calculated using the varImp function in caret. Model performance was tested on the test data set and evaluated using MLeval (version 0.3). All statistical analyses were performed in R (version 3.6.1). Bar graphs and violin plots were created using ggpubr (version 0.4.0), correlation plots were created using corrplot (version 0.84), Kaplan-Meier plots were created using survminer (version 0.4.8) and survival (version 3.2-7), and scatter plots and forest plots were created using ggplot2 (version 3.3.0). Heatmaps and hierarchical clustering were performed using pheatmap (version 1.0.12). Volcano plots were generated using EnhancedVolcano (version 1.4.0), and significant differences (absolute logFC > 0.2 and P-val < 0.05) were highlighted in red. When data were missing, these patients were not included in a given univariate analysis, eliminating potential confounding due to the presence or absence of a given clinical metric. 
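As a rough illustration of the 75% training partition and the centering and scaling step described above, the sketch below splits patients within each outcome class and learns the scaling constants from training values only. It is not the caret implementation: the function names, the rounding rule, the random seed, and the example data are assumptions, and the exact partition sizes caret produces may differ.

```python
import random
from statistics import mean, stdev


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_idx, test_idx), sampling each outcome class separately."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))  # assumed rounding rule
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(column):
    """Learn centering and scaling constants from training values only."""
    center = mean(column)
    spread = stdev(column)
    return center, (spread if spread > 0 else 1.0)


def apply_scaler(column, center, spread):
    """Center and scale a column with previously learned constants."""
    return [(x - center) / spread for x in column]


# Hypothetical outcomes (1 = died) and one hypothetical training column.
labels = [0] * 40 + [1] * 12
train_idx, test_idx = stratified_split(labels)
train_values = [1.0, 2.0, 3.0, 4.0]
center, spread = fit_scaler(train_values)
scaled_train = apply_scaler(train_values, center, spread)
```

Reusing the training constants when scaling the test set keeps information about held-out patients from leaking into model fitting.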
When multiple comparisons were made, p values were corrected by the Benjamini-Hochberg procedure and a false discovery rate < 0.05 was considered significant.

A total of 835 patients with PCR confirmed SARS-CoV-2 infection were included (Table 1). The median age was 64 years (IQR, 50-76 years; range, 17-102 years) and 438 (52%) were female. Of these patients, 363 (43%) were white and 253 (30%) were black. Past medical history was available for 549 patients and among these patients, common comorbidities included hypertension (347; 63%), diabetes (224; 41%), obesity (157; 30%), chronic kidney disease (144; 26%), and cancer (131; 24%). Active prescriptions at time of COVID-19 diagnosis were available for 697 patients, and among these the most common categories of prescribed drugs included

To validate our ability to identify risk factors for COVID-19 severity, we compared mortality rates among currently recognized comorbidities for COVID-19 (Fig. 1A).

In order to unbiasedly compare the relative association of clinical characteristics with COVID-19 outcomes, we calculated the odds ratios among binary categorical clinical metrics measured, including laboratory results, demographics, medications, comorbidities, and signs and symptoms (Fig. 1B).

We next sought to compare the relative association between continuous variables and COVID-19 outcomes. Mann-Whitney U tests between mortality and laboratory values and demographic information revealed that elevated creatinine was most significantly associated with mortality (average maximum creatinine 3.97 in dead vs 1.97 in alive, adjusted P-val < 2 × 10⁻¹⁶) (Fig. 1C).
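The Benjamini-Hochberg adjustment behind the adjusted p values reported here can be sketched in a few lines. This is a generic illustration of the procedure with invented p values; the function name is an assumption and it is not the code used in the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values (not study data).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]  # false discovery rate threshold of 0.05
```

In this invented example only the smallest p value remains below 0.05 after adjustment, which shows how the correction can remove associations that look significant in isolation.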
Other significant associations with mortality included decreased albumin (average minimum albumin 2.50 in dead vs 3.22 in alive, adjusted P-val < 2 × 10⁻¹⁶), decreased lymphocyte count (average minimum lymphocytes 7.53 in dead vs 13.56 in alive, adjusted P-val < 2 × 10⁻¹⁶), elevated phosphate (average maximum phosphate 6.50 in dead vs 4.61 in alive, adjusted P-val < 2 × 10⁻¹⁶), and older age (average age 71.9 years in dead vs 59.5 in alive, adjusted P-val = 8.6 × 10⁻¹⁶) (Fig. 1D). Comparisons in hospitalization, ventilation, oxygen requirement, and ICU admission patient groups revealed similar associations between abnormal creatinine, albumin, lymphocytes, and phosphate and COVID-19 outcomes (Fig. 1C). These results suggest that laboratory abnormalities might be more informative in predicting outcomes from COVID-19 than patient demographic information including comorbidities.

To quantify and rank the effects of clinical metrics on time to death following COVID-19 diagnosis, we performed Kaplan-Meier analysis of patient survival using positive COVID-19 test date and date of death. Among the 149 (18%) patients who died, the median survival time after COVID-19 diagnosis was 13 days (IQR, 7-28 days) (Fig. 2A). Regression analysis of demographics, laboratory results, medications, comorbidities, and vitals against survival probability revealed that abnormal pH (HR 6.5, 95% CI 4.2-10), stratified age groups (HR 1.5, 95% CI 1.3-1.7), abnormal albumin (HR 3.6, 95% CI 2.4-5.5), and abnormal phosphate (HR 4.7, 95% CI 2.7-8.1) were most significantly associated with increased risk of COVID-19 death (Fig. 2B). These risks are greater than those associated with currently accepted comorbidities for severe COVID-19 in our cohort, such as hypertension (HR 2.0, 95% CI 1.2-3.3), diabetes (HR 2.1, 95% CI 1.4-3.3), and chronic kidney disease (HR 2.2, 95% CI 1.4-3.3) (Fig. 2C).
Neither race (HR 0.99, 95% CI 0.92-1.1) nor gender (HR 1.3, 95% CI 0.91-1.7) was significantly associated with decreased survival following COVID-19 diagnosis in our cohort.

To examine associations between clinical metrics and COVID-19 severity beyond binary categorical outcomes, we measured the correlation of each metric with NIH ordinal severity scores and total length of stay per patient (Fig. 3A). Ordinal score was most significantly correlated with maximum respiratory rate (Spearman's rho = −0.56), maximum absolute neutrophil count (Spearman's rho = −0.5), maximum C-reactive protein (Spearman's rho = −0.52), and minimum albumin (Spearman's rho = 0.5) (Fig. 3B). The total length of admission was most significantly correlated with maximum temperature (Spearman's rho = 0.62), maximum phosphate (Spearman's rho = 0.60), minimum hemoglobin (Spearman's rho = −0.58), and minimum systolic blood pressure (Spearman's rho = −0.53) (Fig. 3C). These results confirm our previous findings, suggesting that hematologic laboratory results are not only indicative of mortality in COVID-19 patients, but are also correlated with disease severity. These results also quantify the relative association of vitals such as respiratory rate and temperature with COVID-19 severity.

To determine relationships between multiple categorical and numerical outcomes and metrics, we performed correlation analysis across patient demographics, selected laboratory results, medications, comorbidities, vitals, and outcomes including continuous metrics of COVID-19 severity (Fig. 4). In addition to the associations noted previously, this analysis revealed significant correlations between COVID-19 outcomes and clinical interventions such as ICU admission and mechanical ventilation.
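For readers unfamiliar with the rank correlations reported above, Spearman's rho is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The sketch below is a generic illustration; the function names and the example scores and respiratory rates are invented and are not study data.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        tied_rank = (pos + end) / 2 + 1  # mean of the 1-based positions pos..end
        for k in range(pos, end + 1):
            ranks[order[k]] = tied_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ordinal scores (lower = more severe) and maximum respiratory rates.
ordinal_scores = [1, 2, 4, 5, 8]
max_resp_rates = [40, 36, 28, 30, 18]
rho = spearman_rho(ordinal_scores, max_resp_rates)
```

Because lower ordinal scores denote greater severity in the scale used here, a negative rho for a metric such as respiratory rate means that higher values accompany more severe disease.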
As expected, comorbidities were highly correlated with prescriptions for appropriate medications (e.g., diabetes and antiglycemic drugs) as well as corresponding laboratory results (e.g., chronic kidney disease and mean creatinine). Notably, comorbidities were more closely associated with corresponding medications than COVID-19 outcomes, whereas laboratory values and vitals were more closely associated with COVID-19 outcomes than corresponding comorbidities. Overall, this correlation analysis revealed the heterogeneity of COVID-19 patient presentation, and the relative utility of a spectrum of patient information in predicting COVID-19 severity.

To determine whether COVID-19 patients can be stratified by severity based on clinical metrics typically present at admission to the emergency department, we performed unsupervised principal component analysis (PCA). We excluded metrics of COVID-19 outcomes and severity and metrics that would not be known at admission, such as pathology results and medications prescribed after COVID-19 diagnosis. Only patients for whom full demographic, laboratory, medication history, comorbidities, past medical history, and vitals were available were included, leaving 237 metrics across 209 patients. PCA distilled these 237 metrics into two dimensions, which were most defined by immunosuppression and anemia in Dimension 1, and by AST, LDH, ALT, and ferritin in Dimension 2 (Fig. 5A). The eigenvectors for mean AST and maximum ferritin were orthogonal to the eigenvector for immunosuppression (Fig. 5B), suggesting that these metrics capture independent meta-characteristics of COVID-19 patients. We next plotted the 209 patients present in our PCA in two-dimensional space. There was no clear distribution of COVID-19 patients in PCA space on the basis of demographic information such as gender, race, and age (Fig. 5C).
However, when we visualized mortality, which was not a variable included in our PCA, there was a separation among COVID-19 patients in PCA space. Similar trajectories could be appreciated for COVID-19 severity and outcomes metrics, such as length of stay, mechanical ventilation requirement, and ordinal score (Fig. 5C). Trajectories of COVID-19 severity in PCA space were orthogonal to the eigenvector for immunosuppression, suggesting that although immunosuppression contributes to variability among COVID-19 patients, it likely does not contribute to disease severity.

Given our ability to segregate patients by COVID-19 severity using unsupervised PCA, we next sought to design a machine learning classifier to predict patient mortality. Using mortality in addition to the 237 variables used for PCA above, we partitioned our COVID-19 patient cohort into a training set of 157 patients and a test set of 52 patients. The training set of patients was used to build a supervised gradient boosting machine model to classify patient mortality. Our model achieved a sensitivity of 0.53 (95% CI 0.39-0.67), specificity of 0.88 (95% CI 0.81-0.93), and area under curve (AUC) for the ROC curve of 0.87 (95% CI 0.80-0.94) based on the training data (Fig. 5D). When applied to the test set, our model correctly identified 6 of 15 patients who died following COVID-19 diagnosis, achieving an accuracy of 0.77 (95% CI 0.63-0.87), a sensitivity of 0.92, specificity of 0.40, and AUC ROC of 0.82. Variable importance scores extracted from the gradient boosting machine model revealed that absolute neutrophil count, PTT, and patient age contributed most to model prediction (Fig. 5E). Together, our PCA and machine learning classifier suggest that COVID-19 severity and outcomes can be correlated with clinical characteristics known at the time of admission and confirm the importance of laboratory data over demographic information in predicting disease outcome.
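The test-set figures reported above can be related to one another with a short worked calculation. The counts of 52 test patients, 15 deaths, and 6 deaths correctly identified come from the text; the count of 34 correctly classified survivors is inferred here from the reported accuracy of 0.77 and is an assumption, not a number stated by the authors.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


test_patients = 52
deaths_total = 15
deaths_found = 6
survivors_total = test_patients - deaths_total  # 37
survivors_found = 34  # assumption: inferred from the reported accuracy of 0.77

# Treating death as the positive class.
death_positive = classification_metrics(
    tp=deaths_found,
    fn=deaths_total - deaths_found,
    tn=survivors_found,
    fp=survivors_total - survivors_found,
)

# Treating survival as the positive class swaps sensitivity and specificity.
survival_positive = classification_metrics(
    tp=survivors_found,
    fn=survivors_total - survivors_found,
    tn=deaths_found,
    fp=deaths_total - deaths_found,
)
```

Under these counts, 6 of 15 deaths corresponds to a rate of 0.40 and 34 of 37 survivors to about 0.92, so the reported sensitivity of 0.92 and specificity of 0.40 are consistent with survival, rather than death, having been treated as the positive class. That reading is an inference from the numbers and is not stated in the text.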
Here, we unbiasedly profile over 1,700 unique clinical metrics in 835 COVID-19 patients to identify correlates of disease outcomes and severity. We observed odds ratios for COVID-19 mortality risk similar to those previously reported for comorbidities such as increased age [11, 17, 31-33], hypertension [12], diabetes [8, 11-13], and chronic kidney disease [16]. Univariate, correlation, and multivariate analyses revealed strong associations between key laboratory parameters and COVID-19 severity. Several of these associations have been previously reported, such as elevated creatinine [34], decreased lymphocyte count [19, 20], elevated CRP [34], decreased hemoglobin [20], abnormal pH [35], decreased albumin [36], and elevated PTT [20]. Notably, through unbiased comparisons across all clinical metrics, we observed that these laboratory abnormalities are more strongly associated with mortality in COVID-19 patients than patient age, gender, comorbidities, or prescribed medications. As this was a retrospective cohort study of associations with COVID-19 outcomes, it remains unclear whether the metrics identified here predispose patients to worse outcomes or are a consequence of severe COVID-19 itself.

Fig. 1 Univariate analyses identify key laboratory parameters associated with mortality in COVID-19 patients. A Forest plot comparing odds ratios of selected comorbidities with mortality, hospitalization, and ICU admission in COVID-19 patients. Horizontal lines indicate 95% CI. B Volcano plots of odds ratios of laboratory results, demographics, medications, comorbidities, and signs and symptoms with mortality, hospitalization, and ICU admission in COVID-19 patients. P values corrected for multiple comparisons by Benjamini-Hochberg procedure and significant metrics (P-adj < 0.05) indicated in red. C Heatmap of adjusted p values from Mann-Whitney U tests for continuous laboratory values and demographic information between patients requiring or not requiring ICU admission, supplemental oxygen, mechanical ventilation, hospitalization, and death. Metrics significantly altered between alive and dead patient cohorts are shown and arranged by increasing adjusted p value. D Violin plots of the most significantly altered clinical metrics between alive and dead patient cohorts. Mann-Whitney U test p value shown.

Abnormal pH and increased respiratory rate in patients with severe COVID-19 are likely reflective of the eventual acute respiratory distress syndrome and tissue malperfusion experienced by these patients [5], whereas the elevated inflammatory markers we observed are characteristic of the systemic inflammation observed in some cases of severe COVID-19 [3, 37, 38]. Some laboratory perturbations such as prolonged PTT might reflect interventions employed preferentially in COVID-19 patients such as anticoagulants. Other laboratory parameters such as decreased lymphocytes and albumin might represent a unique inflammatory phenotype that predisposes patients to severe COVID-19 [19]. Regardless of the root cause of the clinical associations we describe, we have identified key clinical metrics that may be obtained at emergency department admission to identify overall risk for COVID-19 mortality.

Fig. 2 Unbiased identification of metrics most associated with increased risk of dying following COVID-19 diagnosis. A Kaplan-Meier plot of patient survival following COVID-19 diagnosis. B Volcano plot of hazard ratios (HR) calculated from unbiased Cox regression analysis between all measured patient metrics and patient survival following COVID-19 diagnosis. P values were calculated using the Wald test statistic and corrected for multiple comparisons by Benjamini-Hochberg procedure. Significant metrics (P-adj < 0.05) indicated in red. C Kaplan-Meier plots of patient survival following COVID-19 diagnosis stratified by indicated patient demographic or laboratory result. Log rank test p value indicated on plots and 95% CI indicated by shading.

We observed a mortality rate of 18% and hospitalization rate of 79%, in contrast to currently estimated case fatality rates of 0.9-7.2% [17, 33, 39, 40] for SARS-CoV-2. This is likely due to sampling bias as only patients who sought care at an academic medical center, obtained a laboratory confirmed COVID-19 diagnosis, and had available medication or past medical history were included. Alternatively, this might reflect the evolving mortality rate over the course of this pandemic, as our ability to diagnose and treat COVID-19 has improved over the past year [41]. Nevertheless, a range of clinical presentations and disease severity scores are represented in our patient cohort, including outpatients and patients with asymptomatic disease.

COVID-19 remains a great threat to society relative to other respiratory viral diseases due to its case fatality rate and its striking range of clinical presentations and severity [17, 42, 43]. This study offers an unbiased retrospective approach to identify potential associations with this fatality rate and spectrum of disease severity. Our data suggest that increased absolute neutrophil count, decreased albumin, and decreased lymphocytes are key correlates of severe COVID-19 and are clinical characteristics available at initial admission that might be informative of disease prognosis. By identifying which COVID-19 patients are most at risk for severe disease, we may be better able to provide early and targeted therapeutic interventions, thereby combatting the current pandemic in an orthogonal but complementary approach to the preventative approaches currently being pursued across the world.
1. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
2. Persons evaluated for 2019 novel coronavirus-United States
3. Clinical features of patients infected with 2019 novel coronavirus in Wuhan
4. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
5. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan
6. Clinical characteristics of novel coronavirus cases in tertiary hospitals in Hubei Province
7. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study
8. Comorbidities associated with mortality in 31,461 adults with COVID-19 in the United States: a federated electronic medical record analysis
9. Obesity and mortality among patients diagnosed with COVID-19: results from an integrated health care organization
10. Obesity in patients younger than 60 years is a risk factor for COVID-19 hospital admission
11. Factors associated with COVID-19-related death using OpenSAFELY
12. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
13. Preliminary estimates of the prevalence of selected underlying health conditions among patients with coronavirus disease 2019-United States
14. Cancer patients in SARS-CoV-2 infection: a nationwide analysis in China
15. Clinical outcomes in young US adults hospitalized with COVID-19
16. Factors associated with hospital admission and critical illness among 5279 people with coronavirus disease
17. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention
18. An inflammatory cytokine signature predicts COVID-19 severity and survival
19. Deep immune profiling of COVID-19 patients reveals distinct immunotypes with therapeutic implications
20. Haematological characteristics and risk factors in the classification and prognosis evaluation of COVID-19: a retrospective cohort study
21. COVID-19-neutralizing antibodies predict disease severity and survival
22. Compromised humoral functional evolution tracks with SARS-CoV-2 mortality
23. Feasibility and physiological effects of prone positioning in non-intubated patients with acute respiratory failure due to COVID-19 (PRON-COVID): a prospective cohort study
24. Awake prone positioning in COVID-19
25. Prone positioning in awake, nonintubated patients with COVID-19 hypoxemic respiratory failure
26. Improving survival of critical care patients with coronavirus disease in England: a national cohort study
27. Trends in COVID-19 risk-adjusted mortality rates
28. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal
29. Stochastic gradient boosting
30. Nonlinear estimation and classification. Lecture notes in statistics
31. COVID-19)-United States
32. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area
33. Case-fatality rate and characteristics of patients dying in relation to COVID-19 in Italy
34. Association of cardiac injury with mortality in hospitalized patients with COVID-19 in Wuhan
35. Risk factors associated with in-hospital mortality in a US national sample of patients with COVID-19
36. Predictors of in-hospital COVID-19 mortality: a comprehensive systematic review and meta-analysis exploring differences by age, sex and health conditions
37. Cytokine levels in the body fluids of a patient with COVID-19 and acute respiratory distress syndrome: a case report
38. COVID-19: consider cytokine storm syndromes and immunosuppression
39. Estimation of excess deaths associated with the COVID-19 pandemic in the United States
40. Estimates of the severity of coronavirus disease 2019: a model-based analysis
41. Outcomes and mortality among adults hospitalized with COVID-19 at US medical centers
42. Asymptomatic transmission of covid-19
43. Excess deaths from COVID-19 and other causes

We would like to thank the BIDMC COVID-

The online version contains supplementary material available at https://doi.org/10.1007/s10238-021-00730-y.