key: cord-0717570-enh2gsf1
authors: Laino, Maria Elena; Generali, Elena; Tommasini, Tobia; Angelotti, Giovanni; Aghemo, Alessio; Desai, Antonio; Morandini, Pierandrea; Stefanini, Giulio G.; Lleo, Ana; Voza, Antonio; Savevski, Victor
title: An individualized algorithm to predict mortality in COVID-19 pneumonia: a machine learning based study
date: 2022-01-14
journal: Arch Med Sci
DOI: 10.5114/aoms/144980
sha: 6adf17a376822f717773d54b902dbae3f7adf37b
doc_id: 717570
cord_uid: enh2gsf1

INTRODUCTION: Identifying SARS-CoV-2 patients at higher risk of mortality is crucial in the management of a pandemic. Artificial intelligence techniques make it possible to analyze large amounts of data and find hidden patterns. We aimed to develop and validate a mortality score at admission for COVID-19 based on high-level machine learning. MATERIAL AND METHODS: We conducted a retrospective cohort study of adult patients hospitalized for COVID-19 between March and December 2020. The primary outcome was in-hospital mortality. A machine learning approach based on vital parameters, laboratory values and demographic features was applied to develop different models. A feature importance analysis was then performed to reduce the number of variables included in the model and to develop a risk score with good overall performance, which was finally evaluated in terms of discrimination and calibration. All results underwent cross-validation. RESULTS: A total of 1,135 consecutive patients (median age 70 years, 64% male) were enrolled; 48 patients were excluded, and the remaining cohort was randomly divided into training (760) and test (327) groups. During hospitalization, 251 (22%) patients died. After feature selection, the best performing classifier was random forest (AUC 0.88 ±0.03). Based on the relative importance of each variable, a pragmatic score was developed that showed good performance (AUC 0.85 ±0.025), and three risk levels were defined that correlated well with in-hospital mortality. CONCLUSIONS: Machine learning techniques were applied to develop an accurate in-hospital mortality risk score for COVID-19 based on ten variables. The application of the proposed score has utility in clinical settings to guide the management and prognostication of COVID-19 patients.

Since January 2020, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused the largest pandemic of this century, with more than 140 million cases and more than 4.6 million deaths (https://coronavirus.jhu.edu/map.html). Italy, and the Lombardy region in particular, has been among the areas most severely affected by SARS-CoV-2, with more than 30,000 deaths (https://coronavirus.jhu.edu/map.html).

SARS-CoV-2 infection ranges from asymptomatic carriage to acute respiratory failure due to interstitial pneumonia, which can be fatal in a considerable proportion of patients (up to 14%) [1].

In this context, determining the risk of severe disease, and therefore the mortality risk, at admission is crucial to support medical decision making during a pandemic, when resources need to be allocated.

Imaging plays an important role in the diagnosis and management of COVID-19 pneumonia and in grading lung involvement. In particular, chest CT is considered the first-line imaging modality in highly suspected cases and is helpful for monitoring imaging changes during treatment [2].

Furthermore, biomarker alterations can predict mortality [3]. However, no single biomarker has been shown to have sufficient prognostic value on its own. Since the original epidemic in Wuhan (China), several mortality risk factors have been identified: genetic predisposing factors [4]; demographic factors, in particular older age and male sex [5]; comorbidities, especially cardiovascular and metabolic [6]; several laboratory findings, such as decreased lymphocyte count and increased lactate dehydrogenase (LDH), ferritin, and interleukin-6 (IL-6) [7]; and imaging findings [8]. In addition, myocardial injury is frequent among patients hospitalized with COVID-19 and is associated with a poor prognosis, and early detection of altered cardiac markers is associated with high mortality even in patients admitted with mild disease [9].

Accordingly, prognostic risk scores have been developed to assess mortality risk [10], risk of clinical worsening and intensive care unit (ICU) admission [11], or favorable outcomes [12]. A systematic literature review conducted at the beginning of the pandemic identified 50 prognostic studies assessing the risk of mortality; however, these studies had several limitations, in particular a high risk of bias, the absence of a validation cohort, small sample sizes, and statistical flaws [13]. So far, no risk score is routinely used in clinical practice; therefore, clinical decisions are mainly based on expert opinion and clinical judgment.

Artificial intelligence (AI) is a novel tool for analyzing big data in medicine, since it makes it possible to repeatedly assess a large number of relevant variables, their changes over time, and their interactions with respect to the prognostic outcome. Furthermore, machine learning algorithms, such as decision trees, random forests, support vector machines, neural networks, and deep learning, can identify hidden patterns in clinical data [14]. On this basis, we aimed to develop a prognostic score for in-hospital mortality in patients admitted for COVID-19 pneumonia using machine learning.

We conducted a retrospective cohort study of patients admitted for COVID-19 from March 1st, 2020 to December 15th, 2020 at Humanitas Research Hospital (Rozzano, Milan, Lombardy, Italy), a large tertiary center that was largely converted to the management of COVID-19. Inclusion criteria were age ≥ 18 years and a diagnosis of COVID-19 with pneumonia, documented by chest computed tomography (CT), requiring hospitalization. No exclusion criteria were applied.

SARS-CoV-2 viral RNA was detected by real-time polymerase chain reaction (PCR) on nasopharyngeal swabs to confirm the diagnosis of viral infection. Thoracic CT was used to confirm the diagnosis of pneumonia; interstitial pneumonia, bilateral ground-glass areas, and absence of pleural effusion were considered typical features. In patients with a negative nasopharyngeal swab but high clinical or radiological suspicion of COVID-19, bronchoalveolar lavage was performed and SARS-CoV-2 RNA was detected in the bronchoalveolar fluid.

The following data were automatically collected from the electronic medical records: age; comorbidities (including arterial hypertension, diabetes mellitus, cardiovascular disease, cancer, chronic pulmonary disease, and chronic kidney disease); and vital parameters at admission (arterial pressure, heart rate, respiratory rate, and oxygen saturation on room air). Comorbidities were summarized using the Charlson Comorbidity Index (CCI) [15].
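The Charlson Comorbidity Index is a weighted sum over a fixed list of comorbid conditions. The minimal sketch below uses the weights of the original (unadjusted) index for a subset of its conditions only; the condition keys and the function name are hypothetical, and the text does not state which CCI variant (for example, with or without age adjustment) was used.

```python
# Weights from the original (unadjusted) Charlson index for a SUBSET of its
# conditions; the dictionary keys are hypothetical names for this example.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "chronic_pulmonary_disease": 1,
    "diabetes_uncomplicated": 1,
    "moderate_severe_renal_disease": 2,
    "solid_tumor_nonmetastatic": 2,
    "metastatic_solid_tumor": 6,
}


def charlson_index(conditions):
    """Sum the Charlson weights of the conditions recorded for one patient."""
    present = set(conditions)
    unknown = present - set(CHARLSON_WEIGHTS)
    if unknown:
        raise ValueError(f"no weight defined for: {sorted(unknown)}")
    # Hierarchy rule: a metastatic tumour supersedes the non-metastatic entry.
    if "metastatic_solid_tumor" in present:
        present.discard("solid_tumor_nonmetastatic")
    return sum(CHARLSON_WEIGHTS[c] for c in present)


print(charlson_index(["chronic_pulmonary_disease", "moderate_severe_renal_disease"]))  # 3
```

Arterial hypertension, although collected in this cohort, is not one of the Charlson conditions and therefore does not contribute to the index.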

Patients underwent arterial blood gas analysis on admission, and the pO2/FiO2 ratio (p/F) was calculated. A complete laboratory panel was obtained for each patient at admission, including complete blood cell count, cardiac biomarkers, liver and renal function, coagulation tests, and inflammation indexes. Laboratory analyses were performed by the hospital's internal laboratory. The need for non-invasive mechanical ventilation, ICU admission, and length of stay were recorded.
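The p/F ratio is the arterial partial pressure of oxygen (in mm Hg) divided by the fraction of inspired oxygen. A minimal sketch, assuming FiO2 is recorded as a fraction rather than a percentage; the function name `pf_ratio` is hypothetical.

```python
def pf_ratio(pao2_mmhg, fio2):
    """Return the PaO2/FiO2 ratio in mm Hg.

    FiO2 must be a fraction (0.21 for room air), not a percentage.
    """
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return pao2_mmhg / fio2


print(round(pf_ratio(60, 0.21)))  # 286 (room air)
print(pf_ratio(75, 0.5))          # 150.0
```

The range check guards against the common error of entering FiO2 as 21 instead of 0.21, which would silently produce a ratio 100 times too small.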

The primary outcome was in-hospital mortality. Secondary outcomes were ICU admission, need for non-invasive ventilation, and length of stay.

Missing values were imputed using an iterative algorithm (Iterative Imputer) [16] . Then, all the selected features were normalized to a z-score.
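A z-score rescales each feature to zero mean and unit standard deviation. The sketch below, with hypothetical function names, estimates the statistics on one set of values and reuses them on others; fitting them on the training set only is a common way to avoid information leakage, but the text does not say whether this was done here.

```python
from statistics import mean, pstdev


def zscore_fit(values):
    """Estimate the mean and (population) standard deviation of a feature."""
    return mean(values), pstdev(values)


def zscore_apply(values, mu, sigma):
    """Standardize values as (x - mu) / sigma; a constant feature maps to 0."""
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


train = [2, 4, 4, 4, 5, 5, 7, 9]
mu, sigma = zscore_fit(train)           # 5, 2.0
print(zscore_apply([3, 9], mu, sigma))  # [-1.0, 2.0]
```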

The analysis followed a supervised machine learning approach. To predict outcomes, we employed four models of increasing sophistication: we started with logistic regression and then moved to three ensemble methods (random forest, gradient boosting and extreme gradient boosting), which make it possible to handle unbalanced classes, irregular distributions and outliers. The modelling strategy was as follows:
1. Favor interpretable models for both outcomes, using only vital parameters, laboratory values and demographic features.
2. Assess the cross-validation performance for each outcome to provide a baseline.
3. Add interaction terms to increase model complexity, taking multicollinearity into account.
4. Assess the performance of the more complex models and compare it with the baseline models.
5. Add a feature importance analysis to find the features most predictive of the outcome and improve performance.
We divided our dataset into a training set (70% of data) and a test set (30%).
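A simple (non-stratified) random 70/30 split can be sketched as follows; the function name and the seed are arbitrary assumptions. With the 1,087 patients remaining after exclusions, truncating 70% gives exactly the 760/327 split reported in the Results.

```python
import random


def train_test_split_indices(n, train_frac=0.7, seed=0):
    """Randomly shuffle indices 0..n-1 and cut them into train and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)  # truncates toward zero
    return idx[:n_train], idx[n_train:]


# 1,135 enrolled minus 48 excluded = 1,087 patients available for modelling
train_idx, test_idx = train_test_split_indices(1087)
print(len(train_idx), len(test_idx))  # 760 327
```

A stratified split, which preserves the roughly 22% death rate in both sets, is a common alternative; the text does not specify which was used.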

An oversampling strategy (Synthetic Minority Oversampling Technique, SMOTE) was used to address the imbalance between the two outcome classes (discharged patients, class 0, and deceased patients, class 1) [17].
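The core idea of SMOTE is to create synthetic minority-class samples on the line segments joining a minority sample to one of its nearest minority-class neighbours. The minimal stdlib sketch below illustrates that interpolation step; the function name, the toy data and the choice of sampling seeds at random are assumptions, not the study's implementation (libraries such as imbalanced-learn provide a full `SMOTE` class).

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class, excluding x
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


deceased = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy z-scored features
new_samples = smote(deceased, n_synthetic=6, k=2)
print(len(new_samples))  # 6
```

Oversampling is usually applied to the training data only, so that synthetic patients never appear in the evaluation data; the text does not detail this step.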

Model hyperparameters were chosen using a randomized search algorithm, with which we evaluated the performance of the classifiers on the whole dataset. The randomized search was used to jointly maximize the macro-average F1, the area under the receiver operating characteristic curve (AUC), and accuracy. We used the macro-average F1 in cross-validation to determine how the system performed overall across the sets of data. We did not consider the micro-average or the weighted F1, because both give more weight to the class with more observations and are therefore influenced by class imbalance. Finally, all results obtained after the training and test phases were evaluated through cross-validation.
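The toy example below illustrates why the macro-average is preferable with unbalanced classes. The data and function names are invented for illustration: a degenerate model that predicts "discharged" for everyone looks good on micro-F1 (which, for single-label classification, equals accuracy) but not on macro-F1.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1: each class counts equally."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def micro_f1(y_true, y_pred):
    """Pools all decisions; for single-label classification it equals accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 80 survivors (0), 20 deaths (1), and a model predicting 0 for everyone
y_true = [0] * 80 + [1] * 20
y_pred = [0] * 100
print(round(micro_f1(y_true, y_pred), 3))  # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

The model never identifies a death, so the F1 for class 1 is 0 and the macro-average drops to 0.444, while the micro-average still reports 0.8.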

For the logistic regression and random forest models, we also used a class-weight strategy to balance the two classes.
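Class weighting multiplies each sample's contribution to the training loss by a weight that depends on its class. The sketch below shows a common "balanced" inverse-frequency heuristic and a class-weighted log loss; both are illustrative assumptions, not the authors' code (the Results report weights chosen by randomized search instead). In scikit-learn, for example, `LogisticRegression` and `RandomForestClassifier` accept a `class_weight` argument such as `{0: 1, 1: 4}`.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(y_true, p_pred, class_weight):
    """Binary cross-entropy averaged with per-class sample weights."""
    eps = 1e-15
    total = weight_sum = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        w = class_weight[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# With the reported outcome counts (884 discharged, 251 deceased):
w = balanced_class_weights([0] * 884 + [1] * 251)
print(round(w[1] / w[0], 2))  # 3.52

# An under-confident prediction for a death costs more when class 1 is up-weighted
y, p = [0, 0, 0, 1], [0.1, 0.1, 0.1, 0.3]
print(weighted_log_loss(y, p, {0: 1, 1: 1}) < weighted_log_loss(y, p, {0: 1, 1: 4}))  # True
```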

For feature selection, we analyzed the variables most important for predicting the outcome through a feature importance process (mean decrease in impurity for tree-based models and β coefficients for logistic regression). To this end, we designed an iterative process in which all models were re-trained, tested and cross-validated using only subsets of the original feature set. The subsets were defined using a range of feature importance thresholds (from 0.01 to 0.05 for tree-based models and from 0.1 to 0.5 for logistic regression), and at each step all features with an importance below the selected threshold were excluded from training.
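The threshold sweep described above can be sketched as follows. The importance values are made up for illustration (they are not the study's results), the function name is hypothetical, and the retraining and cross-validation performed at each step are only indicated by a comment.

```python
def select_features(importances, threshold):
    """Keep features whose absolute importance is at or above the threshold.

    Absolute values let the same rule work for impurity-based importances
    (always >= 0) and for signed logistic-regression coefficients.
    """
    return sorted(f for f, imp in importances.items() if abs(imp) >= threshold)


# Toy, made-up importances (not the study's values)
importances = {"age": 0.12, "pf_ratio": 0.10, "hs_tni": 0.08,
               "heart_rate": 0.015, "sodium": 0.008}

for t in (0.01, 0.02, 0.03, 0.04, 0.05):
    subset = select_features(importances, t)
    # In the described process, every model would be retrained and
    # cross-validated on `subset` at this step.
    print(t, subset)
```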

We then used the model derived from feature selection to develop a risk score for patient stratification that used as few variables as possible while retaining good overall predictive performance. The score's performance was evaluated in terms of discrimination and calibration. Discrimination was assessed by AUC, negative predictive value (NPV), positive predictive value (PPV), sensitivity and specificity [18]. Finally, the relative importance of each identified variable was quantified and used as a comparative measure of that variable's weight in determining the score.
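The discrimination metrics listed above can be computed from a confusion matrix and from the ranking of scores. A minimal sketch with toy data and hypothetical function names; the AUC uses the Mann-Whitney formulation (probability that a random death outscores a random survivor), and calibration is not covered here.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = death)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(y_true, scores):
    """AUC as the probability that a random death outscores a random survivor.

    Ties count as one half (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


y = [0, 0, 1, 1]
score = [1, 4, 3, 8]                     # toy risk scores
print(auc(y, score))                     # 0.75
flagged = [int(s >= 7) for s in score]   # flag scores of 7 or more as positive
print(diagnostic_metrics(y, flagged))
```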

The study received approval by the local Ethics Committee.

Statistical analysis was performed with STATA 13.1 (StataCorp, College Station, Texas). Continuous variables were expressed as median and interquartile range (IQR), and categorical variables as percentages. Univariate analyses were performed using the χ2 test for categorical variables and Student's t-test for continuous variables. Statistical significance was set at p < 0.05.
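For a 2×2 table, the Pearson χ2 statistic compares observed counts with the counts expected under independence, and its p-value follows a χ2 distribution with one degree of freedom. The stdlib sketch below is illustrative only (it does not reproduce Stata's output format), applies no continuity correction, and uses invented counts.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns the statistic and its p-value with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy table: rows = exposed / not exposed, columns = died / discharged
stat, p = chi2_2x2([[10, 20], [20, 10]])
print(round(stat, 3))  # 6.667
print(p < 0.05)        # True
```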

A total of 1,135 consecutive patients with confirmed COVID-19 pneumonia were included. The median age was 70 years (IQR 58-80), and 729 (64.2%) patients were men. Baseline clinical characteristics and clinical presentation are summarized in Table I. More than 40% of patients had two or more comorbidities, mainly hypertension and chronic cardiovascular disease.

A total of 884 patients were discharged (78%, class 0 of the ML algorithm) and 251 patients died (22%, class 1 of the ML algorithm). During hospitalization, 172 patients (15%) were admitted to ICU due to worsening respiratory failure (Table II) .

Our algorithm was trained on 760 patients and tested on 327 patients. Twenty-six patients were excluded from the cohort because they had missing values in more than 60% of the considered features, and a further 22 patients were excluded because they were identified as outliers by an isolation forest algorithm (Figure 1). The randomized search showed that the best performing models used a class 1 weight four times greater than that of class 0 for the random forest model and two times greater for the logistic regression model. The best performing classifier was logistic regression with class weights (Table III), which showed a mean AUC of 0.88 ±0.03 and a macro-average F1 of 0.74 in cross-validation (Figure 2). We then used an iterative feature selection process based on feature importance to extract the features most important for outcome prediction. We set a threshold of 0.02 to obtain a model with fewer than 10 variables that could easily be applied in clinical practice. After feature selection, the best performing classifier was random forest, with an AUC of 0.88 ±0.03 vs. 0.86 ±0.03 before the feature selection step and, most importantly, a cross-validated macro-average F1 of 0.73 (Figures 3, 4, Table IV).
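The first exclusion rule (missing values in more than 60% of the considered features) can be sketched as a simple filter; the record layout, the use of `None` for missing values, and the function name are assumptions. The subsequent isolation-forest outlier step is not reproduced here (scikit-learn's `IsolationForest` is one available implementation).

```python
def exclude_sparse_records(records, max_missing_frac=0.6):
    """Split records into (kept, excluded) by their fraction of missing values.

    A record is excluded when MORE than `max_missing_frac` of its features
    are missing (represented here as None).
    """
    kept, excluded = [], []
    for rec in records:
        frac_missing = sum(v is None for v in rec.values()) / len(rec)
        (excluded if frac_missing > max_missing_frac else kept).append(rec)
    return kept, excluded


# Toy records with five hypothetical features each
patients = [
    {"age": 71, "pf": 210, "hs_tni": 35, "bnp": None, "il6": 120},      # 20% missing
    {"age": 64, "pf": None, "hs_tni": None, "bnp": None, "il6": 15},    # 60% missing
    {"age": 80, "pf": None, "hs_tni": None, "bnp": None, "il6": None},  # 80% missing
]
kept, excluded = exclude_sparse_records(patients)
print(len(kept), len(excluded))  # 2 1
```

A record with exactly 60% missing values is kept, since the rule in the text excludes only records above that fraction.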

Based on the feature importance model, we developed a pragmatic bedside risk score for in-hospital mortality. Continuous variables were converted to categorical factors, with cutoff values chosen using component smoothed functions. Penalized regression coefficients were converted into a prognostic index using scaling based on relative importance.
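One common way to turn regression coefficients into an integer point system (as in Framingham-style scores) is to divide each coefficient by a reference coefficient and round. The sketch below assumes that approach; the text does not report the authors' scaling factor or coefficients, so the coefficients shown are invented and the function name is hypothetical.

```python
def coefficients_to_points(coefs, reference=None):
    """Convert regression coefficients into integer points.

    Each coefficient is divided by a reference value (by default the smallest
    absolute coefficient, which then scores 1 point) and rounded.
    """
    if reference is None:
        reference = min(abs(b) for b in coefs.values())
    return {name: round(b / reference) for name, b in coefs.items()}


# Invented coefficients for three dichotomized predictors
toy_coefs = {"age_gt_70": 1.2, "pf_lt_250": 1.1, "albumin_le_3": 0.4}
print(coefficients_to_points(toy_coefs))
# {'age_gt_70': 3, 'pf_lt_250': 3, 'albumin_le_3': 1}
```

Rounding trades a small loss of precision for a score that can be summed by hand at the bedside.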

The developed risk score included the following variables:
- Age (> 70 years) and p/F ratio (< 250 mm Hg);
- Laboratory tests: hs-troponin I (> 20 ng/l), BNP (≥ 200 pg/ml), IL-6 (≥ 100 pg/ml), procalcitonin (≥ 1 ng/ml), red cell distribution width (RDW) (≥ 16%), urea (≥ 90 mg/dl), creatinine (≥ 1.9 mg/dl), and albumin (≤ 3 g/dl).

The risk score showed good performance in clinically relevant metrics across a range of cutoff values (Table V). The corresponding AUC for in-hospital mortality was 0.85 (95% CI: 0.82-0.87) (Figure 4). Based on its performance metrics, the score was grouped into three levels: low risk (0-6; 528 patients, 46.5%), high risk (7-10; 235 patients, 20.7%), and very high risk (≥ 11; 372 patients, 32.8%). Mortality increased progressively across risk levels (low risk 3.8%, high risk 19.15%, very high risk 50%). In terms of relative importance, age, p/F ratio and hs-troponin I were the most important predictors of death (Figure 5).
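The cutoffs and risk levels above can be encoded directly. Because the points assigned to each criterion are not reported in this text, the sketch below only flags which criteria a patient meets and maps an already-computed total score to the three reported levels; the field names and the example patient are hypothetical.

```python
# Cutoffs as reported for the score's ten variables. The points assigned to
# each criterion are not given in this text, so the sketch only flags which
# criteria are met and maps an already-computed total score to a risk level.
# Field names (age, pf_ratio, ...) are hypothetical.
CRITERIA = {
    "age > 70 years": lambda p: p["age"] > 70,
    "p/F ratio < 250 mm Hg": lambda p: p["pf_ratio"] < 250,
    "hs-troponin I > 20 ng/l": lambda p: p["hs_tni"] > 20,
    "BNP >= 200 pg/ml": lambda p: p["bnp"] >= 200,
    "IL-6 >= 100 pg/ml": lambda p: p["il6"] >= 100,
    "procalcitonin >= 1 ng/ml": lambda p: p["pct"] >= 1,
    "RDW >= 16%": lambda p: p["rdw"] >= 16,
    "urea >= 90 mg/dl": lambda p: p["urea"] >= 90,
    "creatinine >= 1.9 mg/dl": lambda p: p["creatinine"] >= 1.9,
    "albumin <= 3 g/dl": lambda p: p["albumin"] <= 3,
}


def criteria_met(patient):
    """List the score criteria that a patient's admission values satisfy."""
    return [name for name, test in CRITERIA.items() if test(patient)]


def risk_level(total_score):
    """Map a total score to the three reported levels: 0-6, 7-10, >= 11."""
    if total_score < 0:
        raise ValueError("score cannot be negative")
    if total_score <= 6:
        return "low"
    if total_score <= 10:
        return "high"
    return "very high"


patient = {"age": 76, "pf_ratio": 230, "hs_tni": 12, "bnp": 150, "il6": 40,
           "pct": 0.3, "rdw": 13.5, "urea": 45, "creatinine": 1.1, "albumin": 3.6}
print(criteria_met(patient))  # ['age > 70 years', 'p/F ratio < 250 mm Hg']
print(risk_level(8))          # high
```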

We have developed a machine learning based prediction score for in-hospital mortality in patients with COVID-19 pneumonia. The score uses demographics, clinical parameters, and blood tests available at hospital admission and can accurately identify patients at low and very high risk of death. We used machine learning techniques to identify variables that predicted in-hospital mortality, and then reduced the number of variables through a feature selection process to build a more pragmatic score that can easily be used in the clinical setting. With this approach, we achieved satisfactory performance with good utility. More specifically, a low score has high specificity, and high NPV for a score higher than 11. This is particularly important in the context of a pandemic such as COVID-19, which has dramatically altered hospital organization and led to the creation of emergency ICUs and wards to care for ill patients. In this emergency setting, choosing which patients to admit to a regular ward versus discharge can be challenging. An easy-to-use risk score may therefore help to quickly prioritize patients and apply closer observation to at-risk patients.

The application of artificial intelligence has great potential benefits, as it makes it possible to collect and analyze large amounts of data and, more importantly, to identify hidden trends and unknown interactions among variables with respect to the outcome. Several studies have used machine learning approaches for COVID-19 diagnosis and prognosis [19]. However, many of the proposed models have a high risk of bias due to small cohorts, lack of external validation, and development during the first wave of the pandemic only, which may limit their generalizability [10].

In line with previous findings, we found that the predictors of in-hospital mortality were age, severity of respiratory illness expressed as p/F ratio, biomarkers of cardiac damage (hs-TnI and BNP), inflammatory markers (IL-6 and procalcitonin), creatinine, BUN, albumin and RDW.

With regard to biomarkers of cardiovascular damage, it is interesting that both median hs-TnI and BNP showed only mild increases above the normal range but strongly predicted in-hospital mortality, including in patients without overt cardiovascular disease, similarly to other studies [20]. Potential causes of the increase in myocardial damage markers in COVID-19 include respiratory failure with hypoxemia (as in ARDS), pulmonary embolism, and myocardial injury, all of which have been reported during severe SARS-CoV-2 infection with CT-documented pneumonia [1]. In contrast to existing risk scores, low albumin levels were associated with worse outcomes in our COVID-19 cohort. Albumin levels have been associated with an elevated risk of short-term and long-term mortality, which could reflect both malnutrition and acute illness [21]. Moreover, high levels of IL-6, a pleiotropic marker of inflammation, have been reported in the peripheral blood of hospitalized patients with COVID-19, with higher levels in those admitted to the ICU and an association between IL-6 levels and the probability of survival [22].

We are aware that our study has several limitations. First, the single-center design has intrinsic limitations. However, during the COVID-19 emergency, we were able to systematically collect clinical data and laboratory tests using electronic medical records and artificial intelligence. Furthermore, the study is based on retrospective data and was not validated in an external cohort, which we partly addressed using cross-validation. Moreover, we did not collect information about symptom duration before hospitalization; therefore, we might have included patients at different stages of the disease. Additionally, we did not develop specific models for the secondary outcomes, because differences in ICU admission criteria between the first and second waves of SARS-CoV-2 infection may have introduced selection bias.
On the other hand, the present study includes one of the largest cohorts analyzed with machine learning approaches [22]. Furthermore, we included only patients with CT-documented COVID-19 pneumonia, which is a prognostic criterion for the need for oxygenation support and intubation.

In conclusion, using a machine learning driven approach, we have developed and validated a pragmatic prognostic score for in-hospital mortality in COVID-19 patients with documented pneumonia, based on clinical and laboratory measurements and tested on a large cohort of patients during both the first and second waves of SARS-CoV-2 infection. The application of artificial intelligence allowed us both to collect and analyze data and to develop a practical tool for severity stratification of hospitalized COVID-19 patients.

Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City Area

Early detection of elevated cardiac biomarkers to optimise risk stratification in patients with COVID-19

Genomewide association study of severe Covid-19 with respiratory failure

Factors associated with COVID-19-related death using OpenSAFELY

Factors associated with hospital admission and critical illness among 5279 people with coronavirus disease 2019 in New York City: prospective cohort study

Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China

Quantitative chest CT analysis in COVID-19 to predict the need for oxygenation support and intubation

High mortality in COVID-19 patients with mild respiratory disease

Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score

Development and validation of the quick COVID-19 severity index: a prognostic tool for early clinical decompensation

A validated, real-time prediction model for favorable outcomes in hospitalized COVID-19 patients

Prediction models for diagnosis and prognosis of Covid-19: systematic review and critical appraisal

ICU management based on big data

A new method of classifying prognostic comorbidity in longitudinal studies: development and validation

Machine learning approaches in COVID-19 diagnosis, mortality, and severity risk prediction: a review

Prognostic value of natriuretic peptides and cardiac troponins in COVID-19

Impacts of admission serum albumin levels on short-term and long-term mortality in hospitalized patients

Tocilizumab in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial

Machine learning applied to clinical laboratory data in Spain for COVID-19 outcome prediction: model development and validation

Development and validation of a machine learning-based prediction model for near-term in-hospital mortality among patients with COVID-19

Predicting outcomes in the machine learning era: the Piacenza score a purely data driven approach for mortality prediction in COVID-19 pneumonia

Quantitative chest CT analysis in COVID-19 to predict the need for oxygenation support and intubation

The authors declare no conflict of interest.