key: cord-0710566-zd52h8tu authors: Murri, Rita; Lenkowicz, Jacopo; Masciocchi, Carlotta; Iacomini, Chiara; Fantoni, Massimo; Damiani, Andrea; Marchetti, Antonio; Sergi, Paolo Domenico Angelo; Arcuri, Giovanni; Cesario, Alfredo; Patarnello, Stefano; Antonelli, Massimo; Bellantone, Rocco; Bernabei, Roberto; Boccia, Stefania; Calabresi, Paolo; Cambieri, Andrea; Cauda, Roberto; Colosimo, Cesare; Crea, Filippo; De Maria, Ruggero; De Stefano, Valerio; Franceschi, Francesco; Gasbarrini, Antonio; Parolini, Ornella; Richeldi, Luca; Sanguinetti, Maurizio; Urbani, Andrea; Zega, Maurizio; Scambia, Giovanni; Valentini, Vincenzo title: A machine-learning parsimonious multivariable predictive model of mortality risk in patients with Covid-19 date: 2021-10-27 journal: Sci Rep DOI: 10.1038/s41598-021-99905-6 sha: 088ecbe02cbb9ba52f874bfd183cb177ce5a09b6 doc_id: 710566 cord_uid: zd52h8tu The COVID-19 pandemic is severely challenging the healthcare system. Several prognostic models have been validated, but few of them are implemented in daily practice. The objective of the study was to validate a machine-learning risk prediction model using easy-to-obtain parameters to help identify patients with COVID-19 who are at higher risk of death. The training cohort included all patients admitted to Fondazione Policlinico Gemelli with COVID-19 from March 5, 2020, to November 5, 2020. Afterward, the model was tested on all patients admitted to the same hospital with COVID-19 from November 6, 2020, to February 5, 2021. The primary outcome was in-hospital case-fatality risk. The out-of-sample performance of the model was estimated from the training set in terms of area under the receiver operating characteristic curve (AUROC) and classification matrix statistics by averaging the results of fivefold cross-validation repeated three times and comparing the results with those obtained on the test set. An explanation analysis of the model, based on SHapley Additive exPlanations (SHAP), is also presented. 
To assess the subsequent time evolution, the change in paO2/FiO2 (P/F) at 48 h after the baseline measurement was plotted against its baseline value. Among the 921 patients included in the training cohort, 120 died (13%). Variables selected for the model were age, platelet count, SpO2, blood urea nitrogen (BUN), hemoglobin, C-reactive protein, neutrophil count, and sodium. The fivefold cross-validation repeated three times gave an AUROC of 0.87, and the statistics of the classification matrix at the Youden index were as follows: sensitivity 0.840, specificity 0.774, negative predictive value 0.971. Then, the model was tested on a new population (n = 1463) in which the case-fatality rate was 22.6%. On the test set, the model showed AUROC 0.818, sensitivity 0.813, specificity 0.650, negative predictive value 0.922. Considering the first quartile of the predicted risk score (low-risk score group), the case-fatality rate was 1.6%; it was 17.8% in the second and third quartiles (high-risk score group) and 53.5% in the fourth quartile (very high-risk score group). The three risk score groups showed good discrimination for the P/F value at admission, and a positive correlation was found for the low-risk class to P/F at 48 h after admission (adjusted R-squared = 0.48). We developed a predictive model of death for people with SARS-CoV-2 infection by including only easy-to-obtain variables (abnormal blood count, BUN, C-reactive protein, sodium and lower SpO2). It demonstrated good accuracy and high power of discrimination. The simplicity of the model makes the risk prediction applicable for patients in the Emergency Department, or during hospitalization. Although it is reasonable to assume that the model is also applicable to non-hospitalized persons, only appropriate studies can assess its accuracy for persons at home. Study population. The study cohort included all patients admitted to Fondazione Policlinico Gemelli with COVID-19 from March 5, 2020, to February 5, 2021. 
SARS-CoV-2 infection was diagnosed when a reverse transcription polymerase chain reaction (RT-PCR) assay detected SARS-CoV-2 in a nasopharyngeal swab. For each patient, time 0 was considered the date of hospitalization for SARS-CoV-2 infection. Data collection. Patient data included demographics, comorbidities, vital signs, and laboratory characteristics, as well as exposure history, medical history, symptoms at onset, treatment, and outcome data on admission and during hospitalization. Pre-existing conditions collected were diabetes, hypertension, chronic heart disease, chronic respiratory disease, chronic kidney disease, mild to severe liver disease, pancreatitis, neurological impairment, connective tissue disease, transplantation, HIV infection, and malignancy. Vital signs included heart rate, respiratory rate, oxygen saturation by pulse oximetry (SpO2), temperature, body weight, and body mass index (BMI). Laboratory parameters included hematologic variables (white blood cells [WBC], neutrophils, lymphocytes, eosinophils, platelet count, hematocrit); blood urea nitrogen (BUN); creatinine; total bilirubin; creatine kinase; glucose; sodium; potassium; C-reactive protein; procalcitonin; D-dimer; ferritin; lactate dehydrogenase (LDH); arterial blood oxygen partial pressure (paO2) and inspired oxygen fraction (FiO2); and the paO2/FiO2 ratio (P/F). SpO2 was grouped into three categories according to the interquartile range: SpO2 less than 94% (first quartile), SpO2 between 94 and 97.0% (second and third quartiles), SpO2 greater than 97.0% (fourth quartile). All data were extracted from the electronic medical records of all patients. 
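The three SpO2 categories described above can be illustrated with a minimal Python sketch. The function and category names are hypothetical, and the choice of the highest category as the reference level for dummy coding is an assumption made here for illustration, not a detail stated in the text.

```python
def spo2_category(spo2_percent: float) -> str:
    """Map an SpO2 reading (%) to the three categories given in the text."""
    if spo2_percent < 94.0:
        return "below_94"      # SpO2 < 94% (first quartile)
    if spo2_percent <= 97.0:
        return "94_to_97"      # 94% <= SpO2 <= 97% (second and third quartiles)
    return "above_97"          # SpO2 > 97% (fourth quartile)


def spo2_dummies(spo2_percent: float, reference: str = "above_97") -> dict:
    """Dummy-code the category; the reference level is an assumption."""
    category = spo2_category(spo2_percent)
    levels = ["below_94", "94_to_97", "above_97"]
    return {f"spo2_{lvl}": int(category == lvl) for lvl in levels if lvl != reference}
```

For example, `spo2_dummies(92.0)` sets only the `spo2_below_94` indicator to 1, while a reading above 97% leaves both indicators at 0.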
To obtain structured information from unstructured texts (such as the clinical diary, radiology reports, etc.), Natural Language Processing (NLP) algorithms were applied, based on text mining procedures such as sentence/word tokenization and a rule-based approach supported by annotations defined by the clinical subject-matter experts (SMEs), with semantic/syntactic corrections where necessary. Outcome. The primary outcome was in-hospital case-fatality rate. Predictors. Candidate predictors were included when previously shown to be related to mortality in COVID-19 patients or in other respiratory diseases (such as bacterial pneumonia), or when possibly related because of clinical plausibility. To capture the risk of death associated with early hospitalization, we developed a predictive model including only laboratory variables and oxygen saturation at the time of SARS-CoV-2 infection. The rationale behind this choice was to provide a tool for early risk assessment. The variables for the model are routinely collected and available within a very short time after presentation, and the literature has reported their association with an increased likelihood of death; moreover, they could also be available at home through home services. In this way, an estimate of risk can be obtained at the time of hospital admission, and actions on the management of critical versus non-critical patients can be readily taken by hospital staff based on the patient's initial clinical status as well as its evolution in a relatively short time frame. A binary logistic regression was applied to express the risk of death in analytical terms, and possibly use it in risk assessment tools based on model coefficients alone. 
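In analytical terms, a binary logistic regression of this kind gives the predicted risk as the logistic transform of a linear score. A generic form is sketched here; the symbols are introduced for illustration only ($\beta_0$ for the intercept, $\beta_j$ for the coefficient of predictor $x_j$, $k$ for the number of model terms), and the fitted values reported in the paper's coefficient table are not reproduced.

```latex
\hat{p}(\text{death} \mid x) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)\right)},
\qquad
\log\frac{\hat{p}}{1-\hat{p}} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j .
```

The second expression shows why a risk tool can be built from the coefficients alone: each coefficient is the change in log-odds of death per unit change of its predictor, with the other predictors held fixed.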
We chose a logistic regression model because it has both a simple analytical expression and a straightforward interpretation in terms of regression coefficients; other machine learning techniques can in general achieve higher or slightly higher performance, but at the cost of less technical transferability and clinical explainability, at least in our setting. Candidate predictors were selected through a combination of prior domain knowledge and a data-driven approach: for example, cut-off values to classify SpO2 and sodium were heuristically defined by the interquartile range and confirmed by a priori medical knowledge. Overall feature selection was conducted iteratively based on their added contribution to the model in terms of information criterion to minimize model redundancy. The model was trained on the first 8 months of data (March 5, 2020-November 5, 2020), and tested on the next 3 months of data (November 6, 2020-February 5, 2021). The out-of-sample performance of the model was estimated from the training set in terms of area under the receiver operating characteristic curve (AUROC) and classification matrix statistics by averaging the results of the fivefold cross-validation repeated three times and comparing the results with those obtained on the test set. Finally, an analysis of lift and gain graphs is presented to identify segments of outcome probability where the model proves particularly useful compared to having no model at all. A model explanation analysis, based on the SHapley Additive exPlanations (SHAP) framework, is also presented to derive information about the contribution of individual variables to the model beyond that obtained from simple logistic regression coefficients. 
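The repeated fivefold cross-validation described above can be sketched in plain Python as follows. This is an illustrative outline, not the authors' R code: the function names are hypothetical, stratifying folds by outcome is an assumption made here, and `fit` and `predict` are placeholders for any model-fitting and scoring routines.

```python
import random
from statistics import mean


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_folds(y_true, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV repeated `repeats` times.

    Stratification by outcome is an assumption of this sketch.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for label in (0, 1):
            idx = [i for i, y in enumerate(y_true) if y == label]
            rng.shuffle(idx)
            for position, i in enumerate(idx):
                folds[position % k].append(i)
        for f in range(k):
            test_idx = sorted(folds[f])
            train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train_idx, test_idx


def cross_validated_auroc(X, y, fit, predict, k=5, repeats=3, seed=0):
    """Average the held-out AUROC over all k * repeats splits."""
    aucs = []
    for train_idx, test_idx in repeated_stratified_folds(y, k, repeats, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [predict(model, X[i]) for i in test_idx]
        aucs.append(auroc([y[i] for i in test_idx], scores))
    return mean(aucs)
```

With k = 5 and three repeats, the reported estimate would be the mean of 15 held-out AUROC values.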
Baseline laboratory variables for each patient were included by taking the first value after the date-time of hospital admission; only variables with less than 5% of missing values were retained for further analysis, and the final training cohort was selected by choosing the complete records only. This set of variables, along with age, sex, and the study outcome, was given as input to a routine of 100 iterations of stepwise selection based on the Akaike information criterion (AIC), each run on a random 80% subset of the training data; characteristics selected at least 50 times were used to train the final logistic regression model. A level of 0.05 was considered significant for statistical testing. Statistical analysis was done with R version 3.6. Data were stored in SAS Viya V.03.05 and accessed through R with SWAT library version 1.5.0. According to TRIPOD guidelines 15, the study should be considered a TRIPOD 2b because it involves a chronological division between training and testing data from a single institution. Ethical approval. This study was approved by the Ethics Committee of the Fondazione Policlinico Gemelli (IRB 3447). All research was performed in accordance with relevant guidelines/regulations and was conducted in accordance with the Declaration of Helsinki. Written informed consent was waived because of the rapid emergence of this infectious disease (Comitato Etico Policlinico Gemelli; comitato.etico@policlinicogemelli.it). The eligible training cohort included a total of 1126 patients with confirmed COVID-19 admitted from March 5, 2020, to November 5, 2020. In this cohort, the in-hospital mortality rate was 13.0%. Characteristics of the study population are shown in Table 1. 
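The resampling logic of the feature selection step (100 runs on random 80% subsets, keeping variables chosen at least 50 times) can be outlined as below. This is a sketch under stated assumptions: `select_features` is a hypothetical placeholder for the AIC-based stepwise routine, which is not implemented here, and all names are illustrative.

```python
import random
from collections import Counter


def selection_counts(records, select_features, n_iter=100, subset_fraction=0.8, seed=0):
    """Count how often each variable is chosen across random subsets.

    `select_features(subset)` stands in for AIC-based stepwise selection and
    must return an iterable of variable names (placeholder, not the authors' code).
    """
    rng = random.Random(seed)
    subset_size = int(round(subset_fraction * len(records)))
    counts = Counter()
    for _ in range(n_iter):
        subset = rng.sample(records, subset_size)   # sampling without replacement
        counts.update(set(select_features(subset)))  # count each variable once per run
    return counts


def retained_variables(counts, min_count=50):
    """Keep variables selected in at least `min_count` runs."""
    return sorted(name for name, c in counts.items() if c >= min_count)
```

Dividing each count by the number of runs gives a relative selection frequency for that variable.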
Survivors differed from nonsurvivors in being younger, having fewer preexisting medical conditions (specifically, lower rates of diabetes, hypertension, cardiovascular diseases, chronic respiratory diseases, renal failure, solid tumors, and arteriopathy), more cough and diarrhea at onset but less dyspnea, a longer time from symptom onset to hospitalization, higher P/F, albumin and hemoglobin values, a higher platelet count, lower WBC and lymphocyte counts, and lower creatinine, BUN, C-reactive protein, and D-dimer. From an initial dataset of 1126 patient records, a total of 921 complete records were included. After the feature selection phase, the selected variables were age (relative selection frequency [RSF] 100%), platelet count (RSF 97%), SpO2 (RSF 80%), BUN (RSF 72%), hemoglobin (RSF 71%), C-reactive protein (RSF 68%), neutrophil count (RSF 60%), and sodium (RSF 58%). These variables were used to fit the logistic regression model. The estimated coefficients of the logistic model are shown in Table 2, along with p values. Each variable in the model is associated with a distribution of importance values among all instances of the dataset (patients), ordered by the value of the variable from low to high. It emerges, for example, that a lower platelet count is associated with a higher risk of death, whereas higher values of BUN, C-reactive protein, neutrophils and age are associated with a higher risk of death. The sodium variable was subdivided according to the interquartile range: in this three-category version of the variable (low, normal, high), it can be seen that the "low sodium" group (≤ 136 mmol/l) does not impact death for this cohort of patients, whereas the "high sodium" class (≥ 141 mmol/l) does. Similarly, SpO2 < 94% has a greater impact in the model than the variable representing SpO2 values between 94 and 97%. Figure 1 is a representation of the importance of the variables in the model based on the SHAP framework. 
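For a logistic regression, per-patient SHAP values have a simple closed form when they are computed on the log-odds scale and the features are treated as independent: each contribution is the coefficient times the deviation of the feature from its background mean. The sketch below illustrates that identity. Whether this particular explainer was used in the study is an assumption, and the coefficients and values in the usage note are made up for illustration, not taken from Table 2.

```python
def linear_shap_log_odds(coefs, x, background_means):
    """SHAP values of a linear log-odds model under feature independence.

    phi_j = beta_j * (x_j - mean(x_j)); the explainer choice is an assumption.
    """
    return {name: coefs[name] * (x[name] - background_means[name]) for name in coefs}


def log_odds(intercept, coefs, x):
    """Linear predictor (log-odds) of the logistic model."""
    return intercept + sum(coefs[name] * x[name] for name in coefs)
```

As a usage note, with illustrative coefficients `{"age": 0.05, "bun": 0.02}`, a patient with `{"age": 80, "bun": 40}` and background means `{"age": 65, "bun": 20}` would receive contributions of roughly 0.75 and 0.40 log-odds. The contributions add up to the difference between the patient's log-odds and the log-odds at the background means, which is the additivity property that makes the per-variable display interpretable.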
The overall statistical significance of the model was confirmed by the chi-squared residual deviance test, with a p-value effectively equal to zero. The fivefold cross-validation repeated three times resulted in an AUROC of 0.87, and the statistics of the classification matrix at the Youden index were as follows: sensitivity 0.840, specificity 0.774, negative predictive value 0.971. The model was then tested on the cohort of patients admitted between November 6, 2020, and February 5, 2021 (n = 1463), recording the model variables of interest and the clinical outcome. In this cohort of patients, the mortality rate was 22.6%. The model test results in terms of AUROC and confusion matrix statistics are AUROC 0.818, sensitivity 0.813, specificity 0.650, negative predictive value 0.922 (Table 3; Fig. 2). To quantify how the model performs in different segments of probability outputs compared to a random classifier, a gain and lift curve analysis is shown (Fig. 3). Moreover, the lift plot on the testing data in Fig. 3 shows that for the first decile of predictions, the model performs more than 3 times better than random guessing based on prevalence only. Specifically, the first quartile of the predicted risk score on the test set contains 6 death events out of 366 total predictions in that risk group. Similarly, the highest 25% of risk scores on the test set contain 196 actual death events, which is more than 50% of the population classified in that risk group (Table 4). A calibration analysis was performed on the testing set to produce the calibration plot of Suppl. Fig. Y. A linear regression fit on the calibration points sampled at every 5 percentiles of the predicted outcome probabilities estimated an intercept of −4.57 ± 2.12 and a slope of 1.12 ± 0.03 for the regression line, with an adjusted R-squared of 0.89. The Brier score on the testing set predictions was 0.12. 
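The classification statistics and the Brier score reported above follow standard definitions, which the following minimal Python sketch makes explicit. The function names are hypothetical, and the code illustrates the definitions rather than reproducing the authors' analysis.

```python
def classification_stats(y_true, y_pred):
    """Sensitivity, specificity and negative predictive value from binary labels."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),   # deaths correctly flagged
        "specificity": tn / (tn + fp),   # survivors correctly cleared
        "npv": tn / (tn + fn),           # survivors among those predicted to survive
    }


def youden_index(sensitivity, specificity):
    """Youden's J; the operating threshold is the one that maximises it."""
    return sensitivity + specificity - 1.0


def brier_score(y_true, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probabilities)) / len(y_true)
```

As a check on the reported low-risk group, 6 deaths among 366 predictions corresponds to a case-fatality rate of about 1.6%.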
A decision curve analysis was conducted on the testing set to assess model utility compared to the baseline strategies of considering "no high-risk" or "all high-risk". Suppl. Fig. W shows the decision curve for thresholds in the range 0-0.5: the curve associated with the model is always higher or substantially higher than the baseline strategies. Also, a zoomed-in version of the graph was produced in Suppl. Fig. Z to highlight the first risk threshold we identified (0.02) for the risk classes. In addition to having an instrument capable of distinguishing between low-risk, high-risk and very high-risk cases with a fair degree of accuracy, we evaluated the evolution of the different groups of patients in the first few hours after hospital admission. Considering the cohort of patients used for model training and taking the first available value of P/F within 24 h of hospital admission, the three model-defined risk groups had a mean value of P/F of 301, 273, and 273 for low-risk, high-risk and very high-risk, respectively. A t test between the low-risk group and the other two categories showed a statistically significant difference. To assess the subsequent time course, the change in P/F at 48 h after the baseline measurement can be plotted against its baseline value (Fig. 4). In the low-risk group, the P/F following admission to hospital did not worsen over the following 48 h (adjusted R-squared of 0.48). In the very high-risk group, the P/F tends to a single value independently of the baseline value (adjusted R-squared of 0). Adoption in clinical practice. The risk of death score for each patient with SARS-CoV-2 infection was made available to clinicians along with real-time predictions directly on the Electronic Health Record (Fig. 5). 
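The decision curve analysis reported above compares strategies by their net benefit at each risk threshold. The sketch below uses the standard net-benefit definition (true positives minus false positives weighted by the odds of the threshold, per patient); the function names are hypothetical and the code is illustrative, not the authors' implementation.

```python
def net_benefit(y_true, probabilities, threshold):
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probabilities) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)   # odds of the risk threshold
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Baseline strategy: consider every patient high-risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def net_benefit_treat_none():
    """Baseline strategy: consider no patient high-risk."""
    return 0.0
```

Evaluating these three functions over a grid of thresholds (for example 0.01 to 0.5) gives the curves that a decision curve plot compares; a model is useful at a given threshold when its net benefit exceeds both baselines.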
Given the high rate of patients with complications of SARS-CoV-2 infection, prioritization of patients who need higher levels of care or immediate medical attention is critical. In the present study on a total of 2384 patients hospitalized with COVID-19, of whom 18.9% died, we presented an artificial intelligence-driven clinical algorithm to predict risk of death. The algorithm showed that abnormal blood counts (hemoglobin, platelets, neutrophils), high levels of BUN, C-reactive protein, sodium and lower SpO2 were associated with an increased risk of death. From the model, we were able to identify three risk level groups: low-risk, with a prevalence of death of 1.6%; high-risk, with a prevalence of death of 17.8%; and very high-risk, with a prevalence of death of 53.5%. Our model includes only easy-to-obtain variables: its simplicity makes the risk prediction applicable for different purposes for patients in the Emergency Department, or during hospitalization. For example, when the calculated individual risk of death is low, the physician may choose to monitor the patient and send him/her back home, whereas high risk estimates suggest more aggressive monitoring or resource allocation, or may be useful in anticipating organizational needs in terms of intensive, sub-intensive, and rehabilitation rooms and staff allocation. Safely discharging patients from the Emergency Department is of great benefit in saving beds for other critically ill patients. Such a parsimonious model is exploitable even in medically resource-limited settings. The discriminatory performance of the model is very high, and testing the model on a new cohort of more recently diagnosed patients confirmed its validity. The model also demonstrated good accuracy in predicting respiratory evolution when P/F at baseline and at 48 h were considered. 
The two major strengths of the present study are the parsimonious inclusion of simple and easy-to-obtain variables, also available in primary care settings, and the immediate translation of a mathematical model into a comprehensible and implementable number in the EHR for clinical decision making in daily practice. Several published studies provide a computational tool or Web-based calculator for easy use in a variety of settings 10, 11, 16-20. Unfortunately, such calculators require data entry that is cumbersome in a busy clinical practice. Real-time processing of the model directly from the EHR provides an immediate and seamless calculation, a score that can be used to support clinical decision making and prioritization, especially when the healthcare system is overloaded. Other predictive models have been published previously, many of which report age, hematologic measures, C-reactive protein and SpO2 as the main variables explaining the predictive model 7, 8. Most of the published studies focused on very critically ill people 21. Our results confirm and extend those of other large cohort studies 7-13 demonstrating the predictive value of renal function 20, 22, 23 and, particularly, of blood urea nitrogen for mortality 14, 24, 25. In addition, we share 4 of 9 variables with the machine-learning-based study with the largest included population 14. Many models make particular use of easy-to-collect variables 26, 27. The model of the present study shares some variables with those included in CURB-65, a well-validated and widely used score for predicting mortality in persons with community-acquired pneumonia 28, with an AUROC of 0.72 (0.71-0.73) in patients with COVID-19 14. Age and BUN are included in both CURB-65 and our predictive model, whereas respiratory function was described by respiratory rate in CURB-65 and by SpO2 in our model. 
The variables in the present model also overlap with many parameters of other risk scores used to predict mortality in patients with sepsis, such as the widely used SOFA score 29, probably reflecting a clinical presentation of COVID-19 very close to sepsis. These findings may help highlight the complex pathogenesis of SARS-CoV-2 infection. To date, published models implementing machine learning techniques for statistical analysis have used very different techniques (support vector machine 27, artificial neural networks, decision trees, partial least squares discriminant analysis, K nearest neighbour algorithm 22, 30, 31, ensemble, Gaussian process, linear, Naïve Bayes 22, random forest, catboost, and extreme gradient boosting 31), indicating good ability to predict mortality. In our study, we proposed a simple classifier model based on logistic regression, which can be easily exported to different software environments and has a neat clinical explainability in terms of regression coefficients, while still maintaining a satisfactory out-of-sample performance. In addition, we further enhanced the model readability by using the Shapley additive explanations (SHAP) framework to make each variable's contribution to the overall prediction available and understandable in real time to physicians along with the model's risk score. Machine learning methods can synthesize data from thousands of patients to generate tailored predictions for each new patient in real time. In addition, model explanations used in our study such as Shapley additive explanations (SHAP) 25, 27 were made available and understandable to physicians along with real-time predictions. The present study has several limitations: the scalability and the interoperability of the entire data architecture must be demonstrated in other centers and clinical settings. Moreover, the impact of clinical implementation of this predictive model in daily clinical life has not yet been demonstrated. 
Studies demonstrating changes in clinical management based on model prediction are strongly warranted. The two greatest strengths of the present study are the parsimonious inclusion of simple and easy-to-obtain variables, also available in primary care settings, and the immediate translation of a mathematical model into a comprehensible and implementable number in the EHR for clinical decision-making in daily practice. Indeed, for each patient who tested positive on PCR for SARS-CoV-2, hospital IT made the patient's data available to us in near real time, in a pseudo-anonymized manner, on a dedicated environment. We were able to access these data and send back to the server the model risk score, the risk class, and the importance of the variables for each particular prediction. This output information was entered into the EMR software interface of the emergency and infectious disease departments, through an automated procedure, for on-line consultation in the wards. Currently, containing the COVID-19 epidemic is an urgent global priority. Dealing with a severe pandemic disease such as COVID-19 is also very challenging because rapidly changing variables (vaccination, new SARS-CoV-2 variants, saturation of hospital capacity) alter the risk of death over time 32. Our predictive model is pragmatic and effective in identifying individuals at particularly high risk for a poorer hospital course. Computational infrastructure could enhance this process, and a data repository, updated in real time, can continuously inform the planning of diagnostic and treatment strategies. Future randomised trials should be conducted to demonstrate whether the current use of the death risk score will improve final patient outcomes. Predictive models can help provide appropriate care and optimize the use of limited resources, such as during a pandemic. Finally, sharing large amounts of data among centers around the world can be a formidable response to the tremendous challenge of the COVID-19 pandemic. 
Clinical characteristics of coronavirus disease 2019 in China
Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: Summary of a report of 72 314 cases from the Chinese Center for disease control and prevention
Coronavirus disease 2019
Clinical characteristics of 113 deceased patients with coronavirus disease 2019: Retrospective study
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study
Laboratory findings associated with severe illness and mortality among hospitalized individuals with coronavirus disease 2019 in Eastern Massachusetts
Characteristics and predictors of death among 4035 consecutively hospitalized patients with COVID-19 in Spain
Factors associated with death in critically ill patients with coronavirus disease 2019 in the US
Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19
A validated, real-time prediction model for favorable outcomes in hospitalized COVID-19 patients
Prediction models for diagnosis and prognosis of covid-19 infection: Systematic review and critical appraisal
The national COVID cohort collaborative: Clinical characterization and early severity prediction
Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: Development and validation of the 4C Mortality Score
Machine learning assisted prediction of prognostic biomarkers associated with COVID-19, using clinical and proteomics data
Development and validation of a web-based severe COVID-19 risk prediction model
An early warning tool for predicting mortality risk of COVID-19 patients using machine learning
Predicting CoVID-19 community mortality risk using machine learning and development of an online prognostic tool
Developing and validating COVID-19 adverse outcome risk prediction models from a bi-national European cohort of 5594 patients
Deploying unsupervised clustering analysis to derive clinical phenotypes and risk factors associated with mortality risk in 2022 critically ill patients with COVID-19 in Spain
Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19
Development and validation of a prognostic COVID-19 severity assessment (COSA) score and machine learning models for patient triage at a tertiary hospital
Hematologic, biochemical and immune biomarker abnormalities associated with severe illness and mortality in coronavirus disease 2019 (COVID-19): A meta-analysis
Prognostic assessment of COVID-19 in the intensive care unit by machine learning methods: Model development and validation
Prediction of ICU admission for COVID-19 patients: A Machine Learning approach based on Complete Blood Count data
Development of a prognostic model for mortality in COVID-19 infection using machine learning
Defining community acquired pneumonia severity on presentation to hospital: An international derivation and validation study
Prognostic accuracy of the SOFA score, SIRS criteria, and qSOFA score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit
Diagnosis and prediction of COVID-19 severity: Can biochemical tests and machine learning be used as prognostic indicators?
A multipurpose machine learning approach to predict COVID-19 negative prognosis in São Paulo
Variation in US hospital mortality rates for patients admitted with COVID-19 during the first 6 months of the pandemic

R.M., J.L., S.P. and V.V. conceived of the presented idea and drafted the manuscript. J.L., N.D.C., C.M., C.I., S.P., A.D., A.M., P.D.A.S. extracted and analysed the data. All other authors contributed equally, discussed the results and concurred to the final manuscript.