key: cord-0962516-zw456rtv authors: Kuno, Toshiki; Sahashi, Yuki; Kawahito, Shinpei; Takahashi, Mai; Iwagami, Masao; Egorova, Natalia N. title: Prediction of in‐hospital mortality with machine learning for COVID‐19 patients treated with steroid and remdesivir date: 2021-10-22 journal: J Med Virol DOI: 10.1002/jmv.27393 sha: d25f5776cd75a008481e876767e62d4567b77f01 doc_id: 962516 cord_uid: zw456rtv We aimed to create the prediction model of in‐hospital mortality using machine learning methods for patients with coronavirus disease 2019 (COVID‐19) treated with steroid and remdesivir. We reviewed 1571 hospitalized patients with laboratory confirmed COVID‐19 from the Mount Sinai Health System treated with both steroids and remdesivir. The important variables associated with in‐hospital mortality were identified using LASSO (least absolute shrinkage and selection operator) and SHAP (SHapley Additive exPlanations) through the light gradient boosting model (GBM). The data before February 17th, 2021 (N = 769) was randomly split into training and testing datasets; 80% versus 20%, respectively. Light GBM models were created with train data and area under the curves (AUCs) were calculated. Additionally, we calculated AUC with the data between February 17th, 2021 and March 30th, 2021 (N = 802). Of the 1571 patients admitted due to COVID‐19, 331 (21.1%) died during hospitalization. Through LASSO and SHAP, we selected six important variables; age, hypertension, oxygen saturation, blood urea nitrogen, intensive care unit admission, and endotracheal intubation. AUCs using training and testing datasets derived from the data before February 17th, 2021 were 0.871/0.911. Additionally, the light GBM model has high predictability for the latest data (AUC: 0.881) (https://risk-model.herokuapp.com/covid). A high‐value prediction model was created to estimate in‐hospital mortality for COVID‐19 patients treated with steroid and remdesivir. 
There are several prediction models to estimate the risk of in-hospital death for patients with COVID-19; however, prediction models based on the light gradient boosting model (GBM) are scarce. [5] [6] [7] [8] [9] Light GBM is considered to reduce calculation time, which might make it suitable for a web-based prediction calculator. It also accepts missing values at prediction time, an advantage over the conventional logistic regression model. Additionally, as steroids and remdesivir are the standard treatments for patients with moderate or severe COVID-19 as of April 17th, 2021, [10] [11] [12] a prediction model for patients treated with both steroids and remdesivir is warranted. Moreover, racial differences in death due to COVID-19 remain uncertain although racial disparities have been observed in infection rates, [13] [14] [15] [16] so whether race should be included in a risk model predicting mortality needs to be investigated. We aimed to build a prediction model for in-hospital mortality among patients with COVID-19 treated with both steroids and remdesivir in a diverse population of New York City. We also aimed to create a web-based calculator so that frontline providers can use this prediction model to identify high-risk hospitalized COVID-19 patients treated with steroids and remdesivir.
28, 29 After selection of important variables, the data before February 17th, 2021 (N = 769) were randomly split into training and testing datasets (80% and 20%, respectively). Then, light GBM and a logistic regression model were fitted to the training data using the stratified K-fold cross-validation method (K = 5). In contrast to the logistic regression model, light GBM used "NaN" to represent missing values and handled them separately from zero, as missing values were interpreted as containing information. 28 Hyperparameter optimization for light GBM was performed using the "Optuna" framework.
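The data partitioning described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `random_split` and `stratified_kfold`, the seeds, and the synthetic outcome labels are all assumptions introduced here, and the actual study presumably relied on library routines.

```python
import random
from collections import defaultdict

def random_split(n, test_fraction=0.2, seed=0):
    """Randomly split indices 0..n-1 into (train_idx, test_idx)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def stratified_kfold(labels, indices, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs whose folds keep a similar outcome ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    # Deal each class out round-robin so per-class counts differ by at most 1.
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val

# Synthetic outcome labels (1 = died, 0 = survived); NOT the study data.
rng = random.Random(42)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(769)]

train_idx, test_idx = random_split(len(labels), test_fraction=0.2, seed=0)
folds = list(stratified_kfold(labels, train_idx, k=5, seed=0))

print(len(train_idx), len(test_idx))          # sizes of the 80%/20% split
for fit, val in folds:
    print(len(fit), len(val), sum(labels[i] for i in val))
```

With N = 769, the 20% test set holds 154 patients and the training set 615. Stratifying the folds keeps the number of deaths nearly equal across the five validation folds, which matters when the outcome is relatively infrequent.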
For logistic regression, we used a grid search strategy to identify the best hyperparameters. 30 We also used Standard Scaler to improve predictability. 31 For the logistic regression model, we additionally imputed missing data using the IterativeImputer library in Python. We used the area under the curve (AUC) to evaluate the different models. Furthermore, we validated the model on the data between February 18th, 2021 and March 30th, 2021 (N = 802). Finally, we created a web-based calculator to predict in-hospital mortality due to COVID-19. All statistical calculations and analyses were performed with R (version 3.6.2, R Foundation for Statistical Computing, Vienna, Austria) and Python 3.7 (Python Software Foundation, Delaware, USA). All p values <0.05 were considered statistically significant. This study was approved by the institutional review boards (#2000495) and conducted in accordance with the principles of the Declaration of Helsinki. The waiver of patients' informed consent was also approved by the institutional review boards.
Of the 1571 patients admitted due to COVID-19, 331 (21.1%) died during hospitalization. Baseline characteristics across the two study periods are reported in Table 1, demonstrating mostly comparable patient characteristics except for sex and race. Treatments and outcomes are shown in Table 2. Although the rates of therapeutic versus prophylactic anticoagulation, tocilizumab, and convalescent plasma were significantly different between the study periods, ICU admission, endotracheal intubation, acute kidney injury, and in-hospital mortality were not significantly different (Table 2). The LASSO method identified the following 17 variables as important features to predict in-hospital mortality: age, race, hypertension, coronary artery disease, heart rate, respiratory rate, systolic blood pressure, diastolic blood pressure, oxygen saturation, C-reactive protein, D-dimer, white blood cell count, Figure 3A,B.
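The AUC used to compare the models can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it from ranks in plain Python. The function name `roc_auc` and the toy inputs are illustrative assumptions, not values or code from the study.

```python
def roc_auc(y_true, y_score):
    """Rank-based AUC (Mann-Whitney U statistic); tied scores get average ranks."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    pairs = sorted(zip(y_score, y_true))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block of tied scores pairs[i:j].
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # The block occupies 1-based ranks i+1 .. j; their mean is (i + 1 + j) / 2.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(1 for _, y in pairs[i:j] if y == 1)
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Toy example: outcomes (1 = died) and predicted risks; illustrative values only.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(outcomes, risks)
print(auc)
```

In the toy example, three of the four (death, survivor) pairs are ordered correctly, so the AUC is 0.75. A model that ranks every death above every survivor scores 1.0, and uninformative scores give about 0.5.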
Using this calculator, we could estimate the risk of death.
The salient findings of our study are the following: (1) light GBM showed a high AUC for predicting in-hospital mortality, which was comparable to that of the logistic regression model; (2) a web-based calculator using a light GBM model, which allows missing values, is useful to predict in-hospital mortality. As of April 17th, 2021, steroids and remdesivir are the standard treatment 10,11 for patients with moderate or severe COVID-19 (oxygen saturation level <94%). Because a prediction model for patients treated with steroids and remdesivir is needed, we created a risk model for those patients. Using the LASSO method, age, race, hypertension, coronary artery disease, heart rate, respiratory rate, systolic blood pressure, diastolic blood pressure, oxygen saturation, C-reactive protein, D-dimer, white blood cell count, hemoglobin, blood urea nitrogen, eGFR, ICU admission, and endotracheal intubation were selected as important features, which is compatible with previous studies. 16, 26, [32] [33] [34] Additionally, we reduced the number of variables with SHAP to make the risk model more convenient, retaining six variables: age, hypertension, blood urea nitrogen, oxygen saturation, ICU admission, and endotracheal intubation. Our risk model is valuable for predicting the risk of death in patients with moderate or severe COVID-19 treated with steroids and remdesivir. Using SHAP, we identified blood urea nitrogen as an important variable rather than C-reactive protein or D-dimer. 26, 35 Another strength of this study is the web-based calculator, which will enable frontline providers to identify high-risk patients immediately at the time of admission among patients requiring steroids and remdesivir. We consider a risk prediction model particularly useful when frontline providers can readily use it.
It is also valuable because the risk of death can be calculated even with missing values, since light GBM allows missing values when constructing a model. Racial differences in death due to COVID-19 remain uncertain although racial disparities have been observed in infection rates. [13] [14] [15] [16] LASSO using our data showed that race is an important feature; however, SHAP did not, and we could predict in-hospital mortality without information on race. As COVID-19 occurred among a diverse patient population in New York City, 3, 4, 36, 37 our model would be useful globally because COVID-19 affects the whole world; however, more extensive validation using international data is necessary. Moreover, gender and comorbidities were less prominent in our model, especially among the variables selected by SHAP. Although gender or comorbidities were important variables that affect mortality, 38
A novel coronavirus from patients with pneumonia in China
Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering at Johns Hopkins University
Cardiac injury and outcomes of patients with COVID-19 in New York City
The association of interleukin-6 value, interleukin inhibitors, and outcomes of patients with COVID-19 in New York City
Machine learning to predict mortality and critical events in a cohort of patients with COVID-19 model development and validation
Clinical features of COVID-19 mortality: development and validation of a clinical prediction model
Prognostic modeling of COVID-19 using artificial intelligence in the United Kingdom: model development and validation
Machine learning prediction for mortality of patients diagnosed with COVID-19: a nationwide Korean cohort study
Early prediction of mortality risk among patients with severe COVID-19, using machine learning
Effect of remdesivir on patients with COVID-19: a network meta-analysis of randomized control trials
Group WHOREAfC-TW. Association between administration of systemic corticosteroids and mortality among critically ill patients with COVID-19: a meta-analysis
Dexamethasone in hospitalized patients with Covid-19
Deaths in people from Black, Asian and minority ethnic communities from both COVID-19 and non-COVID causes in the first weeks of the pandemic in London: a hospital case note review
Racial and ethnic disparities in SARS-CoV-2 pandemic: analysis of a COVID-19 observational registry for a diverse US metropolitan population
Assessment of racial/ethnic disparities in hospitalization and mortality in patients with COVID-19 in New York City
Racial and ethnic differences in presentation and outcomes for patients hospitalized with COVID-19: findings from the American Heart Association's COVID-19 cardiovascular disease registry
COVID-19 and influenza testing in New York City
The association of remdesivir and in-hospital outcomes for COVID-19 patients treated with steroids
The association of COVID-19 antibody with in-hospital outcomes in COVID-19 infected patients
U shape association of hemoglobin level with in-hospital mortality for COVID-19 patients
The association of inhaled corticosteroid before admission and survival of patients with COVID-19
The association between convalescent plasma treatment and survival of patients with COVID-19
The characteristics and outcomes of critically ill patients with COVID-19 who received systemic thrombolysis for presumed pulmonary embolism: an observational study
Contrast-induced acute kidney injury
The lasso method for variable selection in the Cox model
Factors associated with hospital admission and critical illness among 5279 people with coronavirus disease
Impact on outcomes across KDIGO-2012 AKI criteria according to baseline renal function
Novel machine learning can predict acute asthma exacerbation
From local explanations to global understanding with explainable AI for trees
Machine-learning-based prediction models for high-need high-cost patients using nationwide clinical and claims data
The statistical importance of P-POSSUM scores for predicting mortality after emergency laparotomy in geriatric patients
Risk factors for predicting mortality of COVID-19 patients: a systematic review and meta-analysis
Predictors of in-hospital COVID-19 mortality: a comprehensive systematic review and meta-analysis exploring differences by age, sex and health conditions
Ethnicity/race and economics in COVID-19: meta-regression of data from counties in the New York metropolitan area
Novel risk scoring system for predicting acute respiratory distress syndrome among hospitalized patients with coronavirus disease 2019 in Wuhan, China
Increased secondary infection in COVID-19 patients treated with steroids in New York City
Palliative care team involvement in patients with COVID-19 in New York City
Gender difference is associated with severity of coronavirus disease 2019 infection: an insight from a meta-analysis
Cardiovascular comorbidities, cardiac injury, and prognosis of COVID-19 in New York City
Excess mortality in Italy during the COVID-19 pandemic: assessing the differences between the first and the second wave, Year 2020. Front Public Health