key: cord-1013410-cnt42o33 authors: Ryan, Logan; Maharjan, Jenish; Mataraso, Samson; Barnes, Gina; Hoffman, Jana; Mao, Qingqing; Calvert, Jacob; Das, Ritankar title: Predicting pulmonary embolism among hospitalized patients with machine learning algorithms date: 2022-01-11 journal: Pulm Circ DOI: 10.1002/pul2.12013 sha: ff1aba092e9daf0397cb9525829b58e9f0858137 doc_id: 1013410 cord_uid: cnt42o33 BACKGROUND: Pulmonary embolisms (PE) are life‐threatening medical events, and early identification of patients experiencing a PE is essential to optimizing patient outcomes. Current tools for risk stratification of PE patients are limited and unable to predict PE events before their occurrence. OBJECTIVE: We developed a machine learning algorithm (MLA) designed to identify patients at risk of PE before the clinical detection of onset in an inpatient population. MATERIALS AND METHODS: Three machine learning (ML) models were developed on electronic health record data from 63,798 medical and surgical inpatients in a large US medical center. These models included logistic regression, neural network, and gradient boosted tree (XGBoost) models. All models used only routinely collected demographic, clinical, and laboratory information as inputs. All were evaluated for their ability to predict PE at the first time patient vital signs and lab measures required for the MLA to run were available. Performance was assessed with regard to the area under the receiver operating characteristic (AUROC), sensitivity, and specificity. RESULTS: The model trained using XGBoost demonstrated the strongest performance for predicting PEs. The XGBoost model obtained an AUROC of 0.85, a sensitivity of 81%, and a specificity of 70%. The neural network and logistic regression models obtained AUROCs of 0.74 and 0.67, sensitivity of 81% and 81%, and specificity of 44% and 35%, respectively. 
CONCLUSIONS: This algorithm may improve patient outcomes through earlier recognition and prediction of PE, enabling earlier diagnosis and treatment of PE. A pulmonary embolism (PE) is an obstruction of a blood vessel in the branching arteries of the lung. 1 The obstruction is usually caused by the embolization of a distal blood clot (or thrombus) originating in a deep vein. A PE is a life-threatening condition associated with high morbidity and mortality; mortality rates are estimated to be 30% in untreated PE and 8% in treated PE. 2 Recognition of PE is made difficult by a significant overlap between symptoms of PE and symptoms of other conditions, such as acute coronary syndrome, heart failure, pneumonia, and exacerbation of chronic obstructive pulmonary disease (COPD). 3 Clinical signs and symptoms of PE are considered to be limited in terms of both sensitivity and specificity. 1 Accurate prediction of inpatient PE remains an unmet need. This study describes the development of machine learning algorithms (MLAs) to predict PE in hospitalized patients. Predictions may then be used to either hasten identification of a PE which has occurred but has not yet been detected or to predict the future development of PE during a patient's hospital stay. Identification of patients likely to experience a PE would enable increased monitoring of high-risk patients and earlier diagnosis. In addition, high-risk patients with no contraindications could begin anticoagulants earlier in the disease course or prophylactically, potentially reducing the need for higher-risk procedures such as catheter-directed thrombolysis. 4 This tool is designed to enable earlier PE diagnosis and more timely intervention, and provide clinicians with the opportunity to improve patient outcomes. Data used for model development were extracted from the electronic health record (EHR) system at a large, tertiary medical center in the western United States. 
Data were extracted from medical and surgical patients admitted to the hospital between May 2011 and November 2017. This data set contained patient data including demographics, lab results, vital sign measurements, medication usage, and patient diagnoses. Data were collected passively and were deidentified in compliance with the Health Insurance Portability and Accountability Act. Because this study was performed on deidentified data and therefore constitutes a nonhuman subject study as per the definition of human subjects research put forth in 45 Code of Federal Regulations 46, it was exempt from Institutional Review Board approval. We included all patients who had at least one recorded measure of each vital sign present in the chart (systolic and diastolic blood pressure, heart rate, respiratory rate, and temperature), at least one of the laboratory measurements used in the models (Table S1) present in the chart, and who were 40 years or older. The age criterion was included to minimize the probability of false alerts. The risk of PE increases significantly with age, and PE is a less likely diagnosis in young adult patients. 5 The algorithm was designed to use only the first 3 h of data after any vital or lab measurement was recorded during a patient's hospital stay. Any data collected after the 3-h mark were excluded. All patients meeting the gold standard definition of PE before the algorithm was able to generate predictions (i.e., before all required measurements were present in the chart) were excluded; therefore, no patients with a known prevalent PE were included in the study cohort.
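The inclusion criteria and the 3-h input window described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: the `Encounter` structure, the vital sign keys, the lab name used in the usage note, and the choice to treat the 3-h boundary as inclusive are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical keys for the five vital signs named in the text.
REQUIRED_VITALS = ("sbp", "dbp", "heart_rate", "resp_rate", "temperature")
MIN_AGE_YEARS = 40          # age criterion from the text
INPUT_WINDOW_HOURS = 3.0    # only the first 3 h of data are used

Series = List[Tuple[float, float]]  # (hours since admission, value)


@dataclass
class Encounter:
    """Hypothetical container for one hospital stay."""
    age_years: int
    vitals: Dict[str, Series] = field(default_factory=dict)
    labs: Dict[str, Series] = field(default_factory=dict)


def first_measurement_time(enc: Encounter) -> Optional[float]:
    """Time of the earliest vital or lab measurement, or None if there is none."""
    times = [t for series in (*enc.vitals.values(), *enc.labs.values())
             for t, _ in series]
    return min(times) if times else None


def restrict_to_window(enc: Encounter) -> Encounter:
    """Keep only measurements within 3 h of the first recorded measurement."""
    start = first_measurement_time(enc)
    if start is None:
        return enc
    end = start + INPUT_WINDOW_HOURS  # inclusive boundary is an assumption

    def clip(data: Dict[str, Series]) -> Dict[str, Series]:
        return {name: [(t, v) for t, v in series if start <= t <= end]
                for name, series in data.items()}

    return Encounter(enc.age_years, clip(enc.vitals), clip(enc.labs))


def is_eligible(enc: Encounter) -> bool:
    """Age >= 40, every required vital recorded, and at least one lab recorded."""
    if enc.age_years < MIN_AGE_YEARS:
        return False
    has_all_vitals = all(enc.vitals.get(name) for name in REQUIRED_VITALS)
    has_any_lab = any(series for series in enc.labs.values())
    return has_all_vitals and has_any_lab
```

For example, an encounter for a 67-year-old with all five vitals and one lab value (say, a hypothetical `"creatinine"` series) passes `is_eligible`, and `restrict_to_window` drops any of its measurements taken more than 3 h after the first one.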
The gold standard definition of PE was identified by the presence of an International Classification of Disease, Ninth Revision, Clinical Modification (ICD-9) or ICD, Tenth Revision (ICD-10) code for PE in the patient chart, together with the presence of an order for a therapeutic regimen of anticoagulants in terms of type and dosage, 6 order of a thrombolytic medication, or insertion of an inferior vena cava (IVC) filter. Although ICD-10 codes have high sensitivity (~90%) for detecting PE, false positives may be present. 7, 8 Therefore, the use of ICD codes and therapeutic treatment together as the gold standard enhanced our ability to capture true positive class PE patients. ICD codes and medication orders used to identify PE are presented in Table S2 . We also included thrombolytics in our definition of the gold standard, as these medications may be urgently ordered for PE patients who are hemodynamically unstable. 9 We note that we included a broader list of thrombolytics than are currently approved by the Food and Drug Administration or recommended for use in treating acute PE to account for variability in real-world clinical practice and to ensure that patients experiencing a PE were unlikely to be missed by our gold standard definition of PE. Treatment with therapeutic anticoagulants, thrombolytics, or IVC filter placement was also used to determine the onset time of the clinical detection of PE, as ICD codes are not reliably assigned a time-stamp and so could not be used to determine which patients to exclude due to diagnosis of PE before the time of a prediction by the algorithm. All patients who met the PE gold standard during their hospitalization were considered to be positive for PE. All other patients were considered to be negative. MLAs were developed to predict the development of PE at any point during the patient stay. 
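The gold standard logic described above (a PE diagnosis code together with at least one qualifying treatment, with the treatment time standing in for onset) might be expressed as in the sketch below. The `Stay` structure and its field names are hypothetical, and taking the earliest qualifying treatment as the onset time is an assumption; the text states only that treatment orders were used to determine onset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stay:
    """Hypothetical summary of one hospitalization; times are hours since admission."""
    has_pe_icd_code: bool                    # ICD-9 or ICD-10 code for PE in the chart
    therapeutic_anticoag_times: List[float]  # therapeutic anticoagulant orders
    thrombolytic_times: List[float]          # thrombolytic medication orders
    ivc_filter_times: List[float]            # IVC filter insertions


def treatment_times(stay: Stay) -> List[float]:
    """All qualifying treatment time stamps, earliest first."""
    return sorted(stay.therapeutic_anticoag_times
                  + stay.thrombolytic_times
                  + stay.ivc_filter_times)


def meets_gold_standard(stay: Stay) -> bool:
    """Positive only if a PE code AND at least one qualifying treatment are present."""
    return stay.has_pe_icd_code and bool(treatment_times(stay))


def onset_time(stay: Stay) -> Optional[float]:
    """Assumed onset: earliest qualifying treatment, since ICD codes lack reliable time stamps."""
    if not meets_gold_standard(stay):
        return None
    return treatment_times(stay)[0]


def include_in_cohort(stay: Stay, first_prediction_time: float) -> bool:
    """Exclude stays whose PE was clinically detected before the first prediction."""
    onset = onset_time(stay)
    return onset is None or onset >= first_prediction_time
```

Under this sketch, a stay with a PE code but no qualifying treatment is labeled negative, which reflects the stated intent of using codes and treatment together to limit false positives.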
We compared the performance of models developed using logistic regression, a neural network (a multilayer perceptron), and gradient boosted decision trees (implemented using XGBoost in Python). 10, 11 For consistency with regard to the inputs, the vitals and lab measurements from the 3-h input period were binned and averaged for every hour. The differences between the measurements of the first and second hours and of the second and third hours were calculated and provided as additional inputs to the model to capture the change in measurements over time. Since logistic regression and neural network models are unable to incorporate missing data or not a number (NaN) values, such values were replaced with the average value for that feature across the entire data set. The algorithm generated PE risk predictions the first time all patient vitals and at least one laboratory measure were present in the patient chart. The features included in all models are presented in Table S1. Before model training, the development data set was randomly split in a ratio of 80:20, with these partitions forming the training and test sets, respectively. The XGBoost model was trained with 100 estimators and a maximum tree depth of four. The learning rate, gamma, colsample_bytree, and L2 regularization (lambda) values were set to 0.08, 0.2, 0.6, and 3, respectively. The hyperparameter scale_pos_weight was set to 12.8 to account for the high class imbalance in the data set. All XGBoost hyperparameters were selected by performing a cross-validated grid search. The neural network model was trained with one hidden layer of 100 neurons for a maximum of 300 iterations, using the ReLU activation function to introduce nonlinearity. A learning rate of 0.001 and an optimization tolerance of 0.001 were used for training the neural network.
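The input preparation described above (hourly binning and averaging, hour-to-hour differences, and mean imputation for the models that cannot handle NaN) can be sketched in plain Python as follows. The function names are hypothetical, the sign convention for the differences (later hour minus earlier hour) is an assumption, and the zero fallback for a feature with no observed values is added only so that the example runs. The dictionary at the end restates the reported XGBoost settings using the keyword names of the XGBoost scikit-learn interface; it is a restatement of the text, not the authors' configuration file.

```python
import math
from statistics import mean
from typing import List, Tuple


def hourly_means(samples: List[Tuple[float, float]], n_hours: int = 3) -> List[float]:
    """Average (hours_since_window_start, value) samples per hour; NaN if an hour is empty."""
    bins: List[List[float]] = [[] for _ in range(n_hours)]
    for t, value in samples:
        if 0 <= t <= n_hours:
            # A sample at exactly t == n_hours is placed in the last bin.
            bins[min(int(t), n_hours - 1)].append(value)
    return [mean(b) if b else math.nan for b in bins]


def with_deltas(hourly: List[float]) -> List[float]:
    """Append hour-to-hour changes (hour 2 - hour 1, hour 3 - hour 2) to the hourly means."""
    deltas = [later - earlier for earlier, later in zip(hourly, hourly[1:])]
    return hourly + deltas


def impute_column_means(rows: List[List[float]]) -> List[List[float]]:
    """Replace NaN with the feature's mean over all rows (for logistic regression and the MLP)."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Fallback of 0.0 for an entirely missing feature is an assumption.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if math.isnan(x) else x for j, x in enumerate(r)]
            for r in rows]


# Reported XGBoost settings, written with XGBClassifier keyword names.
XGB_PARAMS = {
    "n_estimators": 100,
    "max_depth": 4,
    "learning_rate": 0.08,
    "gamma": 0.2,
    "colsample_bytree": 0.6,
    "reg_lambda": 3,
    "scale_pos_weight": 12.8,
}
```

Because XGBoost handles missing values natively, the imputation step in this sketch would apply only to the logistic regression and neural network inputs, consistent with the description in the text.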
Model performance on the hold-out test set was assessed with regard to area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive and negative likelihood ratios, and diagnostic odds ratio. A Shapley additive explanations (SHAP) analysis 12 was performed to evaluate the importance of the features used by the best performing model in generating predictions. In total, 60,297 patients were included in the experiments to develop and test the three algorithms, 309 of whom experienced a PE while hospitalized. On average, patients experiencing a PE were more likely to be older, have a history of cancer, have experienced past venous thromboembolism (VTE), or be diagnosed with pneumonia. Patient demographic information for the full data set is presented in Table 1. Demographic information for the hold-out test set only is presented in Table S3. Of the three machine learning (ML) models examined, XGBoost demonstrated the highest performance in terms of AUROC (Figure 1). XGBoost achieved an AUROC of 0.85, while the neural network and logistic regression models achieved AUROC values of 0.74 and 0.67, respectively. At a constant sensitivity of 81% across models, the XGBoost model also obtained superior specificity, positive and negative likelihood ratios, and diagnostic odds ratio values (Table 2). The feature importance plot for XGBoost was generated using the TreeSHAP algorithm (Figure 2). The SHAP summary plot ranks the most important input features based on their contribution to the decision-making process of the algorithm. Record of recent fracture, history of surgery, and history of deep vein thrombosis were the top three most important features for generating accurate predictions of PE at any point during the patient's stay.
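The threshold-dependent metrics named above follow directly from a confusion matrix, as the sketch below illustrates. The function name and the counts in the usage note are hypothetical and are chosen only to reproduce a sensitivity of 81% and a specificity of 70%; they are not the study's test set counts.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, likelihood ratios, and diagnostic odds ratio.

    Assumes every denominator is nonzero (i.e., specificity is neither 0 nor 1
    and sensitivity is not 1); no guarding is done in this sketch.
    """
    sensitivity = tp / (tp + fn)                  # true positive rate
    specificity = tn / (tn + fp)                  # true negative rate
    lr_pos = sensitivity / (1.0 - specificity)    # positive likelihood ratio
    lr_neg = (1.0 - sensitivity) / specificity    # negative likelihood ratio
    dor = lr_pos / lr_neg                         # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts giving 81% sensitivity and 70% specificity.
example = diagnostic_metrics(tp=81, fn=19, tn=70, fp=30)
```

With these illustrative counts, the positive likelihood ratio is about 2.7, the negative likelihood ratio about 0.27, and the diagnostic odds ratio about 9.9. Because these figures are derived from the rounded sensitivity and specificity, they may differ slightly from the values reported in Table 2.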
To study the effects of anticoagulant usage before the algorithm run time, we also trained an XGBoost model with an additional boolean input indicating whether a patient was administered anticoagulants during the hospital stay before the algorithm run time. However, the addition of this extra input did not significantly improve the model's performance. In this study, we demonstrated the ability of ML-based models to identify patients at high risk of experiencing a PE before the event occurred or was clinically detected. Because PE can rapidly become a life-threatening event, early detection or advanced prediction can optimize care by enabling rapid diagnosis and treatment or prophylaxis. The present study determined that our gradient boosted algorithm is capable of accurately predicting development of PE before the clinical detection of onset. Although several ML models were evaluated for their ability to predict PE in hospitalized patients, the gradient boosted decision tree algorithm developed using XGBoost performed most accurately, which may be attributed to XGBoost's superior handling of missing data. It is possible that other neural networks could be more customized to this prediction task with additional research to outperform the multilayer perceptron model used in the present study. The accuracy of predictions made by XGBoost makes the use of this model advantageous. A SHAP analysis was used to evaluate the contributions of individual features to model predictions. 12 Several features identified as important for model predictions in the SHAP summary plot of the XGBoost model (Figure 2) have previously been identified as provoking factors for provoked PE, or PE precipitated by identifiable, major risk factors. Major trauma is considered a risk factor for provoked PE, 5 and recent fracture may act as a proxy for recent trauma.
The SHAP analysis also identified additional, nonprovoking risk factors which have been linked to increased risk of PE. Previous DVT, a known PE risk factor, was one of the five most important features for PE prediction. 13 Further, patient fluid status impacts hemoconcentration, which has been linked to increased risk of thromboembolic events. 14 Urine output, change in urine output and receipt of a fluid bolus were among the most important features, which, in combination, may reflect whether an individual is dehydrated and hemoconcentrated, or in a more balanced, net fluid positive situation in which additional IV fluids are not required. Obesity has also been associated with increased risk for VTE and PE 15 ; higher weight was associated with higher risk of a prediction of PE, suggesting that weight as a feature may have indirectly represented obesity. The MLA developed in this study offers many advantages over alternative risk stratification methods. Unlike many existing rules-based risk stratification methods, this algorithm requires no additional clinician inputs or workflow disruption and automatically screens a broad inpatient population based only on data taken from the EHR. Several rules-based tools have been developed to aid in the stratification of patients with suspected PE; the two most commonly used risk scores being the Wells criteria for PE 16 and the revised Geneva score. 17 However, these tools were designed for use in patients who are suspected to have an existing PE, and were not designed to predict the future occurrence of PE. Furthermore, these scoring systems have only been validated in assessing the risk of PE in outpatients and not for hospitalized patients. 18 A performance accuracy meta-analysis of the Wells and revised Geneva score for diagnosing PEs reported that sensitivity ranged from 63% to 79% and 55% to 74%, respectively. 
19 The specificity for the Wells score ranged from 49% to 90% and had an AUROC of 0.78 while the Geneva score had an AUROC of 0.69. Furthermore, as the use of the Wells score has been validated in discharged patients, one study investigated its performance for hospitalized patients suspected of having deep vein thrombosis and deemed that it was inaccurate, and thus unsafe for inpatient use. 20 Additional risk assessment tools developed for VTE, as opposed to strictly PE, are also commonly used. These include the International Medical Prevention Registry on Venous Thromboembolism (IMPROVE) score, used for determining VTE in hospitalized patients, 21 the Padua Prediction Score (PPS), which assesses VTE risk in inpatient populations, and the Caprini Score, designed for use with surgical patients. 22 However, each of these scorings has its limitations. For example, efforts to enhance the accuracy of VTE identification with the IMPROVE tool have been achieved by incorporating D-dimer lab values, which require laboratory tests that may not be readily available. 23, 24 The Padua score has not been extensively validated, which limits the generalizability of the tool 22 ; though the Caprini score has been validated, this has been achieved in specific subpopulations, including surgical patients, patients hospitalized with serious illness, and recently, those with COVID-19. [25] [26] [27] Therefore, this tool may also lack generalizability. Our MLA has been developed and evaluated on a broad hospital inpatient population and demonstrates high accuracy in screening this population for PE risk. ML methods have been explored for PE prediction in the past; however, prior work has focused on the ability of ML to assist with interpreting chest images [28] [29] [30] or their accompanying radiology reports. 
[31] [32] [33] [34] Although these methods have the potential to increase the accuracy and timeliness of definitive PE diagnosis, MLAs for interpretation of radiological imaging cannot be applied to patients who have not yet undergone a formal diagnostic workup for PE. The method described in this study adds to the existing literature by demonstrating a means through which ML methods can improve PE prediction and early detection upstream of the confirmatory diagnostic process. Such prediction may improve patient outcomes by enabling not only earlier therapeutic treatment of PE but also prophylactic use of anticoagulants. Interestingly, a recent study by Nafee et al. developed MLAs to predict the risk of VTE in hospitalized patients with certain acute medical illnesses, and these outperformed the IMPROVE score. 35 This study obtained data from the APEX clinical trial. 36 However, certain lab measurements used for the APEX trial may not be routinely taken, and thus could limit the MLA's use.

FIGURE 2 SHAP summary plot for the XGBoost model. The x axis of the plot shows the SHAP value for each of the features. The color of a point is indicative of the feature value, where red is a high value and blue is a low value. The y axis lists feature names in descending order of importance to the model's decision-making process. Superscripts in the feature names denote the hour of the patient's stay at which the measure was recorded. Delta symbols (Δ) are used when a feature captures the hourly change in a measure, with superscripts denoting the hours under consideration; for example, ΔUrine Output 1,2 is the change in urine output measured during the first hour and the second hour in the 3-h input period. DVT, deep vein thrombosis; GCS, Glasgow coma scale; SysABP, systolic arterial blood pressure.

Our study has several limitations.
First, we evaluated the MLAs in a retrospective setting and we, therefore, cannot determine how any algorithm will perform in novel, prospective settings. Also related to the retrospective nature of this study, it is possible that not all patients' PE statuses were correctly classified by the gold standard. ICD codes have known limitations for identifying acute conditions, including PE, in chart data. 7 However, ICD codes for PE have previously been shown to identify PE within hospitalization data with sensitivities and specificities ranging from 88% to over 90%. 8 We attempted to mitigate this limitation by requiring that all identified PE cases additionally documented accepted treatments for PE (therapeutic anticoagulant regimens, thrombolytic medications, or placement of an IVC filter) during their hospital stay, to increase the diagnostic specificity. This retrospective study was conducted on data from a single hospital center. We cannot predict how this algorithm would perform in patient populations that have different demographics or a different incidence of PE, or those who reside in another geographic location. It is necessary to evaluate the algorithm on a range of additional sites to determine its generalizability. The incorporation of imaging data into the model may improve its performance and is an area for further investigation. Evaluation of the MLA in a prospective clinical setting is required to evaluate any effect on clinician actions and impact on patient outcomes. In this study, an MLA capable of accurately predicting which patients will develop a PE during their hospitalization was developed. This gradient boosted algorithm utilized only routinely collected health data from inpatient EHR data to predict risk of inpatient development of PE. The algorithm described in this study may be able to improve patient outcomes through earlier identification of at-risk patients, allowing for earlier confirmatory diagnostic testing and treatment of PE. 
1. Pulmonary embolism
2. Venous thromboembolism: a public health concern
3. Pulmonary embolism, part I: epidemiology, risk factors and risk stratification, pathophysiology, clinical presentation, diagnosis and nonthrombotic pulmonary embolism
4. PERT C Diagnosis, treatment and follow up of acute pulmonary embolism: Consensus Practice from the PERT Consortium
5. Pulmonary embolism: update on management and controversies
6. Antithrombotic therapy for VTE disease: Antithrombotic Therapy and Prevention of Thrombosis
7. Limitations of pulmonary embolism ICD-10 codes in emergency department administrative data: let the buyer beware
8. ICD-10 hospital discharge diagnosis codes were sensitive for identifying pulmonary embolism but not deep vein thrombosis
9. Systemic thrombolysis for pulmonary embolism: a review
10. Python Package Introduction-xgboost 1.4.0-SNAPSHOT documentation
11. XGBoost: A Scalable Tree Boosting System
12. A unified approach to interpreting model predictions. arXiv:1705.07874 [cs, stat]
13. Venous thromboembolism: classification, risk factors, diagnosis, and management
14. Hematocrit and risk of venous thromboembolism in a general population. The Tromsø study
15. Obesity as a risk factor in venous thromboembolism
16. Excluding pulmonary embolism at the bedside without diagnostic imaging: management of patients with suspected pulmonary embolism presenting to the emergency department by using a simple clinical model and d-dimer
17. Prediction of pulmonary embolism in the emergency department: the revised Geneva score
18. Performance of 4 clinical decision rules in the diagnostic management of acute pulmonary embolism: a prospective cohort study
19. Comparison of the Wells score with the revised Geneva score for assessing suspected pulmonary embolism: a systematic review and meta-analysis
20. The Wells rule is not accurate in hospitalized patients
21. IMPROVE I Predictive and associative models to identify hospitalized medical patients at risk for VTE
22. Assessment of the risk of venous thromboembolism in medical inpatients using the Padua Prediction Score and Caprini Risk Assessment Model
23. The IMPROVEDD VTE Risk Score: incorporation of D-Dimer into the IMPROVE Score to improve venous thromboembolism risk stratification
24. Modified IMPROVE VTE Risk Score and elevated D-Dimer identify a high venous thromboembolism risk in acutely ill medical population for extended thromboprophylaxis
25. The original and modified Caprini score equally predicts venous thromboembolism in COVID-19 patients
26. The combination of Caprini risk assessment scale and thrombotic biomarkers to evaluate the risk of venous thromboembolism in critically ill patients
27. Completion of the Updated Caprini Risk Assessment Model (2013 Version)
28. Automatic segmentation of arterial tree from 3D computed tomographic pulmonary angiography (CTPA) scans
29. Evaluation of acute pulmonary embolism and clot burden on CTPA with deep learning
30. Development and Performance of the Pulmonary Embolism Result Forecast Model (PERFORM) for computed tomography clinical decision support
31. Comparative Effectiveness of Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) Architectures for Radiology Text Report Classification
32. Radiology report annotation using intelligent word embeddings: applied to multi-institutional chest CT cohort
33. Deep learning to classify radiology free-text reports
34. Towards automated generation of curated datasets in radiology: application of natural language processing to unstructured reports exemplified on CT for pulmonary embolism
35. Machine learning to predict venous thrombosis in acutely ill medical patients
36. The design and rationale for the Acute Medically Ill Venous Thromboembolism Prevention with Extended Duration Betrixaban (APEX) study

The authors would like to sincerely thank and acknowledge Anna Siefkas for her writing and revision of the manuscript.

This study received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

All authors associated with Dascena are employees of Dascena.

The data in this study were collected passively and deidentified in compliance with the Health Insurance Portability and Accountability Act and therefore did not require Institutional Review Board approval.