key: cord-0782511-1d3k5ifh
authors: Chen, Chaojin; Yang, Dong; Gao, Shilong; Zhang, Yihan; Chen, Liubing; Wang, Bohan; Mo, Zihan; Yang, Yang; Hei, Ziqing; Zhou, Shaoli
title: Development and performance assessment of novel machine learning models to predict pneumonia after liver transplantation
date: 2021-03-31
journal: Respir Res
DOI: 10.1186/s12931-021-01690-3
sha: e14b23a343a5454b7cd5bd595bf5d57676df2c00
doc_id: 782511
cord_uid: 1d3k5ifh
BACKGROUND: Pneumonia is the most frequently encountered postoperative pulmonary complication (PPC) after orthotopic liver transplantation (OLT) and causes high morbidity and mortality. We aimed to develop a model to predict postoperative pneumonia in OLT patients using machine learning (ML) methods.
METHODS: Data of 786 adult patients who underwent OLT at the Third Affiliated Hospital of Sun Yat-sen University from January 2015 to September 2019 were retrospectively extracted from electronic medical records and randomly subdivided into a training set and a testing set. With the training set, six ML models including logistic regression (LR), support vector machine (SVM), random forest (RF), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost) and gradient boosting machine (GBM) were developed. These models were assessed by the area under the curve (AUC) of the receiver operating characteristic on the testing set. The related risk factors and outcomes of pneumonia were also probed based on the chosen model.
RESULTS: A total of 591 OLT patients were eventually included, and 253 (42.81%) were diagnosed with postoperative pneumonia, which was associated with increased postoperative hospitalization and mortality (P < 0.05). Among the six ML models, the XGBoost model performed best. The AUC of the XGBoost model on the testing set was 0.734 (sensitivity: 52.6%; specificity: 77.5%).
Pneumonia was notably associated with 14 features: INR, HCT, PLT, ALB, ALT, FIB, WBC, PT, serum Na(+), TBIL, anesthesia time, preoperative length of stay, total fluid transfusion and operation time.
CONCLUSION: Our study is the first to demonstrate that an XGBoost model with 14 common variables might predict postoperative pneumonia in OLT patients.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12931-021-01690-3.
Postoperative pulmonary complications (PPC) adversely affect the clinical course of orthotopic liver transplantation (OLT) and play an important role in poor survival [1]. Postoperative pneumonia is the most common type of PPC, contributing to morbidity, length of hospital stay, and mortality [2]. Identifying patients at high risk of developing postoperative pneumonia is the key to early implementation of interventions to prevent its onset, or of antibiotics to treat bacterial infection [3]. Conversely, unnecessary and excessive antibiotic use in patients at low risk for postoperative pneumonia can lead to antibiotic resistance and side effects. For instance, recent studies have shown that extensive use of antibiotics for antibacterial prophylaxis has induced multi-drug resistant bacteria in post-transplant patients [4, 5]. Therefore, it is essential to establish a reliable model for prediction of postoperative pneumonia, both to tailor preventive interventions and treatments for high-risk patients and to avoid unnecessary use of antibiotics in low-risk patients.
In recent years, several scoring systems for prediction of postoperative pneumonia have been reported to improve risk stratification [6], such as the Prestroke Independence, Sex, Age, National Institutes of Health Stroke Scale (ISAN) score in acute ischemic stroke patients [7], a pneumonia risk index for patients undergoing major noncardiac surgery [8], and a systemic inflammation score for patients after radical resection of gastric cancer [9, 10]. However, these predictive models are not applicable to liver transplant recipients, mainly because of the preoperative pulmonary condition of patients with end-stage liver disease and the immunosuppressive status of allograft recipients [10]. An effective risk classification for postoperative pneumonia is not yet available for liver transplant recipients. Compared with traditional scoring systems, machine learning (ML) models have shown better performance in predicting various diseases or clinical conditions [11-13]. ML models are usually built on the large volumes of data recorded in electronic patient record (EPR) systems. Their capacity to learn from data allows them to capture complex, nonlinear relationships, and even previously unknown correlations, in clinical data [14], and they show promise in clinical settings where large amounts of data are collected and integrated every day. Recently, Li and colleagues [15] developed a model using ML methods to predict stroke-associated pneumonia in Chinese patients with acute ischemic stroke. In addition, ML has been used to predict severe pneumonia during posttransplant hospitalization in recipients of a kidney transplant [16]. ML has also been applied in developing models for liver disease and transplantation to predict post-transplant survival and complications, including acute kidney injury (AKI) and diabetes [17].
To date, there has been no ML model for prediction of postoperative pneumonia in recipients of liver transplant [18]. In this study, we aimed to develop predictive models using ML methods, and to evaluate their performance in predicting postoperative pneumonia in OLT patients. The study was expected to provide a novel ML algorithm for prediction of postoperative pneumonia in patients after liver transplantation.
In this retrospective study, data of 894 patients who underwent either living donor liver transplantation (LDLT) or deceased donor liver transplantation (DDLT) in the Third Affiliated Hospital of Sun Yat-sen University-Lingnan Hospital (Guangzhou, Guangdong, China) from January 2015 to September 2019 were retrieved from the EPR systems. All the patients were registered as recipients of organ transplantation in the China Organ Transplant Response Systems (www.cot.org.cn). During the retrospective enrollment, patients who were aged < 18 years, presented with preoperative pneumonia, or lacked sufficient postoperative data were excluded from this study. In the EPR systems of our hospital, a database platform was established by extracting medical records from the hospital information system (HIS), laboratory information system (LIS), picture archiving and communication system (PACS), and Docare Anesthesia System (2005-2020 Medicalsystem Co., Ltd. Suzhou, China). This database platform enabled access to comprehensive data collected during hospital admission, inpatient stay, and post-hospital follow-up visits, including demographic characteristics, daily documentation, laboratory tests, imaging results, anesthesia records, and other clinical characteristics. This study was reported in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines.
The primary outcome was the incidence of postoperative pneumonia during the postoperative period before hospital discharge. Postoperative pneumonia was defined on the basis of the European Perioperative Clinical Outcome (EPCO) definitions, which require at least one of the following definitive chest X-ray or CT findings: infiltrate, consolidation, or cavitation; and at least one of the following signs and symptoms of infection: temperature > 38 °C or < 36 °C with no other cause, or white blood cell (WBC) count > 10 × 10^9/L or < 4 × 10^9/L [6].
The data elements in the following categories were chosen from the database platform: (1) Demographics: age, gender, height and weight; (2) Preoperative comorbidities: hypertension, coronary heart disease, myocardial infarction, diabetes mellitus, history of alcohol abuse, smoking, and past surgery; (3) Etiology: primary liver diseases contributing to the decision for LT, with the main focus on hepatitis B, hepatitis C, dual infection with any combination of the known hepatitis viruses A to E, hepatic malignancy (including hepatocellular carcinoma and cholangiocarcinoma), alcohol-related liver disease (ALD), drug-induced liver injury (DILI), and autoimmune liver disease; (4) Perioperative laboratory values: lab results concerning liver function, kidney function, electrolytes, and blood cell counts. The results of the latest tests prior to surgery were collected.
Lab MELD score prior to surgery was calculated; (5) Preoperative complications: complications and metrics indicating the severity of the patient's condition were collected, mainly complications related to cirrhosis and portal hypertension, and the documentation of treatment escalation, including length of stay in ICU, use of continuous blood purification (CBP) and mechanical ventilation; (6) Intraoperative incidents: incidents indicating hemodynamic instability, such as cardiac arrest, arrhythmia, lactic acidosis, acidosis, hypernatremia, hypokalemia, and hypotension; (7) Intraoperative medication: intraoperative use of vasoconstrictors (given either as a bolus or continuously) and blood coagulants, which reflected the extent of hemodynamic instability and hemorrhagic tendency. The data collected were the cumulative sums by the end of the surgery; (8) Intraoperative fluid and transfusion: the totals of intraoperative fluid infusion and output, as well as the total of blood products transfused, were extracted. Red blood cell transfusion, plasma transfusion, total blood product transfusion and total fluid transfusion were each classified into two categories based on specific criteria; (9) Postoperative medications: medications given within 7 days after surgery were traced. These medications consisted of colloids, vasoconstrictors, immunosuppressants, antifungal agents and antibiotics; (10) Microorganism observation: microbiological tests during the preoperative and postoperative periods.
With 591 records and 148 features, overfitting could occur during training and undermine model performance. Therefore, we first applied univariate tests to filter out features that were not statistically significant. In total, 33 features were statistically significant (P < 0.05) and were passed to a recursive feature elimination (RFE) method embedded with random forest [19].
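The recursive elimination loop can be pictured with a minimal, library-free sketch. It is not the authors' implementation: the function name `recursive_feature_elimination`, the two callbacks, and the toy importances and scores are all hypothetical. In practice `importance_fn` would return random forest importances for the current subset and `score_fn` a validation metric such as sensitivity.

```python
def recursive_feature_elimination(features, importance_fn, score_fn):
    """Drop the least important feature one at a time; return the best-scoring subset.

    importance_fn(subset) -> dict mapping each feature to an importance value
    score_fn(subset)      -> float, higher is better (hypothetical validation metric)
    """
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > 1:
        importances = importance_fn(current)
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
        score = score_fn(current)
        if score >= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins (made-up values, for illustration only).
toy_importance = {"PT": 0.40, "WBC": 0.30, "age": 0.05, "height": 0.02}

def toy_importance_fn(subset):
    return {f: toy_importance[f] for f in subset}

def toy_score_fn(subset):
    # Rewards keeping PT and WBC, with a small penalty per feature kept.
    return 0.5 + 0.1 * ("PT" in subset) + 0.1 * ("WBC" in subset) - 0.01 * len(subset)

selected, selected_score = recursive_feature_elimination(
    list(toy_importance), toy_importance_fn, toy_score_fn
)
print(selected, round(selected_score, 2))
```

Passing the model-specific parts in as callbacks keeps the elimination logic separate from the choice of estimator and metric, both of which are assumptions here.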
Initially, the RFE method was trained on all features and then recursively removed the least important features; the subset of features with the highest sensitivity score was selected.
To predict postoperative pneumonia, the following six machine learning models were developed and evaluated for their performance: logistic regression (LR) [20], support vector machine (SVM) [21], random forest (RF) [22], multilayer perceptron (MLP) [23], extreme gradient boosting (XGBoost) [24], and gradient boosting machine (GBM) [25]. The XGBoost model was constructed using the xgboost package (https://xgboost.readthedocs.io/en/latest/python/index.html). The remaining five models were built with the Scikit-learn package (https://github.com/scikit-learn/scikit-learn). Because machine learning models have multiple tuning parameters that are essential for model performance, a fivefold cross-validation grid search was used to select the best parameters, and AUCs on the testing set were measured (Additional file 1: Table S1). The complete data set of 591 adults was randomly separated into a 70% training set and a 30% testing set for validation. The bootstrap method was then used to sample 1000 different test sets in order to obtain the 95% confidence interval (CI) of the best-tuned models' evaluation metrics. Model performance was evaluated by area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity.
The Python (Anaconda Distribution, version 3.7) packages NumPy (version 1.16.5) and pandas (version 0.25.1) were employed for data cleaning. The Python (Anaconda Distribution, version 3.7) SciPy package (version 1.3.1) was used to analyze the data. Continuous variables were presented as mean with standard deviation (SD), or median with interquartile range. The independent-samples t-test was used for normally distributed data, while the Mann-Whitney U test was used for non-normally distributed data in univariate analyses.
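The bootstrap step for the test-set metrics can be illustrated with a small standard-library sketch. It assumes a percentile bootstrap over test cases and computes AUC as the fraction of positive-negative pairs ranked correctly. The function names, the seed, and the toy labels and probabilities are hypothetical, and the original analysis may have differed in these details.

```python
import random


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling test cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # a resample with a single class has no defined AUC
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy test-set labels and predicted probabilities (made up for illustration).
y_test = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
p_test = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.60, 0.90, 0.55, 0.30]

point = auc(y_test, p_test)
low, high = bootstrap_auc_ci(y_test, p_test)
print(f"AUC = {point:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The same resampling loop can report accuracy, sensitivity, or specificity by swapping the metric function.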
Categorical variables were expressed as counts and percentages, and tested by the Chi-square test or Fisher's exact test. Kaplan-Meier methods were applied to estimate the long-term survival rates, and comparisons between groups were performed by the Gehan-Breslow-Wilcoxon test and the log-rank test.
No variable had more than 1% missing values. We employed mean imputation, which replaces each missing value with the mean of that feature. Before we proceeded to machine learning models, continuous variables were normalized based on the mean and SD of the training set. Categorical variables were encoded as binary variables, with 1 representing the presence of an incident and 0 its absence. Gender was also encoded, with 1 representing male and 0 representing female. The whole dataset was split into a training set (70%) and a testing set (30%). The training set was used for development of the predictive models, while the testing set was used to validate the models' performance.
A total of 894 patients who underwent orthotopic liver transplantation in our hospital from January 2015 to September 2019 were assessed for eligibility. After exclusion of 65 pediatric patients, 226 patients with preoperative pneumonia, and 12 patients lacking sufficient postoperative data, 591 patients were finally enrolled and used for development and performance evaluation of machine learning models to predict postoperative pneumonia. The flow diagram of the enrollment is presented in Fig. 1. Notably, pneumonia occurred in 253 patients, accounting for 42.81% of the study subjects following liver transplantation, while 338 (57.19%) patients did not have postoperative pneumonia. The demographic characteristics, laboratory test results, and clinical features of the enrolled patients with or without postoperative pneumonia are summarized in Table 1.
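As an aside on the preprocessing described in the Methods, the sketch below illustrates mean imputation followed by normalization with training-set statistics, so that no information from the testing set leaks into the scaling. It uses only the standard library. The function names and the toy albumin values are hypothetical, and the use of the population SD is an assumption.

```python
from statistics import mean, pstdev


def fit_preprocessor(train_column):
    """Learn the imputation mean and scaling SD from one training column (None = missing)."""
    observed = [v for v in train_column if v is not None]
    mu = mean(observed)
    # Filling gaps with mu leaves the column mean unchanged, so mu also centers the data.
    filled = [mu if v is None else v for v in train_column]
    sd = pstdev(filled)  # population SD is an assumption; a sample SD would also work
    return mu, (sd if sd > 0 else 1.0)


def transform(column, mu, sd):
    """Impute with the training mean, then z-score with the training mean and SD."""
    return [((mu if v is None else v) - mu) / sd for v in column]


# Toy albumin values in g/L (made up); None marks a missing measurement.
train_alb = [35.0, None, 30.0, 40.0]
test_alb = [None, 45.0]

mu, sd = fit_preprocessor(train_alb)
train_z = transform(train_alb, mu, sd)
test_z = transform(test_alb, mu, sd)  # test data reuse the training statistics
print(mu, round(sd, 3), [round(z, 3) for z in test_z])
```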
The demographic characteristics and preoperative comorbidities did not differ significantly between the patients with and without postoperative pneumonia (P > 0.05). Notably, hepatic malignancy, hematocrit (HCT), alanine transaminase (ALT), total bilirubin (TBIL), albumin (ALB), coagulation function, MELD score, and hospital stay differed significantly between patients with and without postoperative pneumonia (P < 0.05). In particular, the patients without postoperative pneumonia had significantly better preoperative hepatic function, as reflected by preoperative liver function tests, than those who developed pneumonia after surgery (P < 0.05).
The intraoperative factors, in three categories (intraoperative incidents, fluid management and transfusion, and medications), were compared between the study patients with and without postoperative pneumonia. As shown in Table 2, hypernatremia, longer operation time and anesthesia time, more red blood cell (RBC) transfusion and blood product transfusion, larger volume of infusion and more blood loss were found to be significantly associated with postoperative pneumonia (P < 0.05). Notably, the proportions of patients with RBC transfusion > 18 U, blood product transfusion > 5000 mL, total volume of infusion > 10 L, and blood loss > 2 L were significantly higher in the pneumonia group than in the non-pneumonia group (Table 2). In addition, higher doses of recombinant activated factor VII (0.343 ± 1.031 vs. 0.134 ± 0.615, P = 0.008) and prothrombin complex concentrate (602.367 ± 410.826 vs. 506.719 ± 359.224, P = 0.01) were administered to the patients without pneumonia than to those with pneumonia. In terms of postoperative medications (Table 3), the doses of terlipressin and dopamine in patients without pneumonia were significantly higher than in those with pneumonia (0.148 ± 0.414 vs. 0.079 ± 0.314 mg/day, P = 0.012; 47.544 ± 72.198 vs.
35.473 ± 63.069 mg/day, P = 0.013; respectively). There were no significant differences between the two groups in terms of norepinephrine, dopamine, epinephrine and tacrolimus (P > 0.05).
As partially relevant or less important features may negatively affect the performance of machine learning models, we performed feature selection and ranked the levels of feature importance. Feature selection was performed using univariate and recursive feature elimination (RFE) methods, after which dimensionality was reduced from 148 to 14 features. These 14 features were as follows: preoperative international normalized ratio (INR), HCT, platelets (PLT), ALB, ALT, fibrinogen (FIB), WBC, prothrombin time (PT), serum sodium (Na+), TBIL, anesthesia time, preoperative hospital stay, total fluid transfusion, and operation time. Further, a feature importance plot was created to rank the levels of importance using the fine-tuned eXtreme Gradient Boosting (XGBoost) model. As a result, preoperative length of hospital stay, PT, and WBC were ranked first, second, and third, respectively (Fig. 2).
Six machine learning models, including LR, SVM, RF, MLP, XGBoost, and GBM, were constructed, and their performance for prediction of postoperative pneumonia was assessed. Additional file 1: Table S1 and Fig. 3 show the best hyperparameter combination for each model and their AUCs in predicting postoperative pneumonia. XGBoost had the highest AUC (0.793) and SVM the lowest (0.674). The AUC values of LR, SVM, and MLP were relatively lower than those of the other three models (XGBoost, RF, and GBM).
Data were expressed as frequency (proportion). Continuous variables were presented as mean (standard deviation), or median (interquartile range).
The bold emphasis means that p < 0.05. WBC: white blood cell; ALT: alanine transaminase; AST: aspartate aminotransferase; TBIL: total bilirubin; IBIL: indirect bilirubin; ALB: albumin; BUN: blood urea nitrogen; PT: prothrombin time; APTT: activated partial thromboplastin time; FIB: fibrinogen; INR: international normalized ratio.
In addition to AUCs, accuracy, sensitivity, and specificity were used to evaluate the performance of the six machine learning models. As shown in Table 4 , on Table 4) .
Compared with the non-pneumonia group, the pneumonia group had longer postoperative hospital stay [22 (17, 31) vs. 23 (17, 33) days, P = 0.046] (Table 4) and lower 6-month (91.0% vs. 96.2%; P = 0.01), 12-month (88.6% vs. 93.4%; P = 0.0045), 2-year (85.3% vs. 91.5%; P = 0.021), and 3-year (84.9% vs. 90.9%; P = 0.03) survival rates, as well as lower overall survival (P = 0.0446; Table 5, Fig. 4).
Early detection of postoperative pneumonia is critical for timely interventions to prevent the onset of the complication. Until now, the prediction of postoperative pneumonia has been challenging, and there is a need for a reliable and accurate predictive model for patients after liver transplantation.
This study, based upon a large volume of data and ML methods, has the following major novel findings: (1) The incidence of postoperative pneumonia was high in patients after OLT, and the occurrence was significantly associated with prolonged hospital stay and increased mortality after liver transplantation; (2) A total of 14 factors were identified to be significantly correlated with postoperative pneumonia after OLT, including INR, HCT, PLT, ALB, ALT, FIB, WBC, PT, serum Na+, TBIL, anesthesia time, preoperative length of hospital stay, total fluid transfusion, and operation time; (3) The XGBoost model exhibited the best overall performance in predicting postoperative pneumonia among the developed ML models, with an AUC of 0.794, sensitivity of 52.6%, and specificity of 77.5%; (4) Multiple lines of evidence support that the XGBoost model holds promise for future clinical application in predicting postoperative pneumonia in patients after liver transplantation.
The XGBoost model is recognized as an efficient and scalable tree boosting system [26], and it has performed well in ML competitions, notably for its simplicity of use and accuracy of prediction [27, 28]. In the present study, we developed a total of six ML models. Of these, the XGBoost model had the best overall performance, with a specificity of 77.5% and a sensitivity of 52.6% in predicting postoperative pneumonia in OLT patients. The AUC values of LR, SVM, and MLP were relatively lower than those of the three ensemble machine learning models (XGBoost, RF and GBM), whose accuracy and robustness might be attributed to their integration of multiple base classifiers or learners. However, RF is a bagging ensemble, and it needs to train a large number of decision trees and aggregate them. As a result, it usually trades considerably more computation time for high accuracy, compared with GBM and XGBoost, which both belong to the boosting family of ensemble methods.
Moreover, compared with GBM, XGBoost uses second-order derivatives and applies sampling in each iteration to alleviate overfitting and speed up computation. Considering the high prevalence of multi-drug resistant bacteria in post-transplant patients induced by the excessive use of antibiotics [4], high specificity is especially necessary in clinical practice to avoid unnecessary use and overuse of antibiotics in low-risk patients. In our cohort, however, all patients received peri-operative antibiotic therapy for 72 h, which posed a considerable challenge in predicting pneumonia at an early stage [29]. Therefore, the novel XGBoost model established in this study may assist clinicians in choosing interventions and treatments, and eventually improve care for affected patients.
It has been reported that a number of risk factors, including age of recipient, liver dysfunction score, indication for OLT, perioperative transfusions (especially blood and fresh frozen plasma units), restrictive preoperative pulmonary testing pattern, and INR measured prior to OLT, are significantly associated with post-liver transplant pneumonia [3, 30, 31]. However, prediction based on these risk factors is limited by its underuse of within-category information, which causes a loss of information [32]. For instance, all patients above (or all below) the optimal cut-point value are treated equally in risk-factor prediction, yet their risk of post-transplant pneumonia may vary considerably. As risk-factor prediction neither combines all factors nor weights the differences between them, it is not widely used in clinical practice. In addition, the traditional scores rest on the assumption that all misclassification errors have equal costs. In fact, this assumption does not hold in real-world applications [33].
In this study, we applied the RFE feature selection method to the 33 statistically significant features and selected the 14 features with the highest sensitivity score: the preoperative laboratory results INR, HCT, PLT, ALB, ALT, FIB, WBC, PT, serum Na+, and TBIL, together with anesthesia time, preoperative length of hospital stay, total fluid transfusion, and operation time. We found that most of these factors have been reported to be associated with pneumonia and PPCs, except for PLT and serum Na+ [18, 30, 31, 34, 35]. The risk factors reported in the literature differ considerably, which may be attributed to differences in study populations and in the definitions of pneumonia and PPCs; we think our result reflects the advantage of ML models in capturing previously unknown correlations in big data. Although the underlying mechanism remains unclear, the high clinical relevance of these factors laid a solid foundation for the subsequent ML process and made the conclusion more practical and clinically valuable [36]. Moreover, the 14 features in the ML model are all routinely recorded and widely used, and none requires a special instrument or equipment to obtain, indicating that our models are feasible and can be widely used in hospitals.
To date, ML models have shown outstanding performance in the prediction of diseases and clinical conditions, and can therefore be helpful in decision-making about the use of interventions and medications [33]. For example, ML models can generate an individualized probability for each patient. Additionally, implementation of sophisticated computer algorithms at the bedside has become a reality thanks to the popularity of EPR systems and the wide availability of structured patient data.
In our study, the EPR systems included HIS, LIS, PACS, and the Docare Anesthesia System, which allowed us to integrate medical data generated during admission, covering demographic data, daily documentation, laboratory and imaging results, anesthesia records, and a thorough record of medication and treatment. In addition, we separated the patients 1000 times (70% train and 30% test) into 1000 different pairs of train and test sets, which could minimize accidental error and enhance the accuracy of the current ML models. This result showed that, in predicting post-transplant pneumonia, we should not rely on only one of the ML models.
In the study, we found that hepatic malignancy, better preoperative hepatic function, and longer preoperative hospital stay were significantly associated with a lower risk of developing postoperative pneumonia. We postulated that this could be attributed to better preoperative treatment and preparation, suggesting that interventions should be implemented to improve the patients' overall preoperative condition. Consistent with previous reports [37, 38], we identified that a number of intraoperative factors, such as longer operation and anesthesia time, excessive blood product transfusion, and fluid transfusion, were significantly related to postoperative pneumonia in patients following liver transplantation. By contrast, we found an association between the use of terlipressin and dopamine and a decreased incidence of postoperative pneumonia in patients after liver transplantation. These findings are clinically important for intraoperative anesthetic management and may help improve clinical outcomes.
The study has several limitations. Firstly, the ML models were developed on the basis of a single-center cohort study, and a future multi-center study will be needed for external validation.
Secondly, this study was performed retrospectively, so collection and entry bias, as well as possible residual confounding, may have occurred. Thirdly, we were unable to incorporate the metrics of liver donors as training variables in our study, due to the lack of donor information in the EPR systems of our hospital.
Our study has successfully established six novel ML models to predict postoperative pneumonia among OLT patients. Of these, the XGBoost model demonstrated the best overall performance, and therefore holds promise for future clinical application to predict post-transplant pneumonia in OLT patients. To the best of our knowledge, this is the first ML-based study to provide a novel ML algorithm for prediction of postoperative pneumonia in patients after liver transplantation.
Exercise capacity impairment can predict postoperative pulmonary complications after liver transplantation. Respiration
N-Acetylcysteine inhalation improves pulmonary function in patients received liver transplantation
Pulmonary complications after elective liver transplantation-incidence, risk factors, and outcome
Surveillance culture for multidrug-resistant gram-negative bacteria: performance in liver transplant recipients
An outbreak of Pneumocystis jirovecii pneumonia among liver transplant recipients
Postoperative pulmonary complications
Can a novel clinical risk score improve pneumonia prediction in acute stroke care?
A UK multicenter cohort study
Development and validation of a multifactorial risk index for predicting postoperative pneumonia after major noncardiac surgery
Systemic inflammation score as a predictor of pneumonia after radical resection of gastric cancer: analysis of a multiinstitutional dataset
Pneumocystis jirovecii pneumonia in liver transplant recipients: a systematic review
Predicting clinical outcomes of large vessel occlusion before mechanical thrombectomy using machine learning
Predicting phenotypic polymyxin resistance in Klebsiella pneumoniae through machine learning analysis of genomic data
Prediction of gestational diabetes based on nationwide electronic health records
Using machine learning to predict stroke-associated pneumonia in Chinese acute ischaemic stroke patients
Machine learning for the prediction of severe pneumonia during posttransplant hospitalization in recipients of a deceased-donor kidney transplant
Applying machine learning in liver disease and transplantation: a comprehensive review
Retrospective comparative study on postoperative pulmonary complications after orthotopic liver transplantation using the Melbourne Group Scale (MGS-2) diagnostic criteria
Using recursive feature elimination in random forest to account for correlated variables in high dimensional data
Development of a web-based system for exploring cancer risk with long-term use of drugs: logistic regression approach
Predictive models of sepsis in adult ICU patients
Prediction of postoperative complications of pediatric cataract patients using data mining
Lung cancer detection using
artificial neural network
Prediction of complications after paediatric cardiac surgery
Application of machine learning in the diagnosis of gastric cancer based on noninvasive characteristics
Greedy function approximation: a gradient boosting machine
Correction to extreme gradient boosting as a method for quantitative structure-activity relationships
The Higgs machine learning challenge
A composite model of wound segmentation based on traditional methods and deep neural networks
Early-onset pneumonia after liver transplant: microbial causes, risk factors, and outcomes
Human metapneumovirus infection: pneumonia risk factors in patients with solid organ transplantation and computed tomography findings
Diagnosis of ventilator-associated pneumonia using electronic nose sensor array signals: solutions to improve the application of machine learning in respiratory research
Machine learning in medicine
Clinical and laboratory characteristics of 215 cases of coronavirus disease 2019 with different prognosis
Analysis of risk factors for 24 patients with COVID-19 developing from moderate to severe condition
Accelerating chart review using automated methods on electronic health record data for postoperative complications
Prediction of postoperative pulmonary complications
Ventilator associated pneumonia following liver transplantation: etiology, risk factors and outcome
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The authors thank all of the patients who kindly participated in this study.
The online version contains supplementary material available at https://doi.org/10.1186/s12931-021-01690-3.
Table S1. Hyperparameter combinations for each machine learning model and their corresponding test set AUC.
Authors' contributions: SZ, ZH and YY participated in research design. CC, DY, LC and YZ participated in the writing of the paper.
CC, DY, SG and ZM analyzed and explained the data; LC, YZ, SG, BW and CC collected and helped with analyzing the data of the study. All authors read and approved the final manuscript.
All data generated or analysed during this study are included in this published article and its additional information files.
Ethics approval and consent to participate: Ethical approval for this study (No.[2019]02-609-01) was provided by the Ethics Committee of the Third Affiliated Hospital of Sun Yat-sen University-Lingnan Hospital, Guangzhou, China (Chairperson Prof. Xie Dongying) on 18 May 2019. The requirements for informed consent and clinical trial registration were waived by the committee, mainly due to the retrospective nature of this study.
Not applicable.
The authors declare no competing interests.