title: Predicting Future Care Requirements Using Machine Learning for Pediatric Intensive and Routine Care Inpatients
authors: Trujillo Rivera, Eduardo A.; Chamberlain, James M.; Patel, Anita K.; Zeng-Treitler, Qing; Bost, James E.; Heneghan, Julia A.; Morizono, Hiroki; Pollack, Murray M.
date: 2021-08-10 journal: Crit Care Explor DOI: 10.1097/cce.0000000000000505

OBJECTIVES: Develop and compare separate prediction models for ICU and non-ICU care for hospitalized children in four future time periods (6–12, 12–18, 18–24, and 24–30 hr) and assess these models in an independent cohort and a simulated children's hospital.

DESIGN: Predictive modeling used cohorts from the Health Facts database (Cerner Corporation, Kansas City, MO).

SETTING: Children hospitalized in ICUs.

PATIENTS: Children with greater than or equal to one ICU admission (n = 20,014) and randomly selected routine care children without ICU admission (n = 20,130) from 2009 to 2016 were used for model development and validation. An independent 2017–2018 cohort consisted of 80,089 children.

INTERVENTIONS: None.

MEASUREMENTS AND MAIN RESULTS: Initially, we undersampled non-ICU patients for development and comparison of the models. We randomly assigned 64% of patients for training, 8% for validation, and 28% for testing in both clinical groups. Two additional validation cohorts were tested: a simulated children's hospital and the 2017–2018 cohort. The main outcome was ICU care or non-ICU care in four future time periods based on physiology, therapy, and care intensity. Four independent, sequential, and fully connected neural networks were calibrated to risk of ICU care at each time period. Performance for all models in the test sample was comparable, including sensitivity greater than or equal to 0.727, specificity greater than or equal to 0.885, accuracy greater than 0.850, and area under the receiver operating characteristic curve greater than or equal to 0.917, and all had excellent calibration (all R²s > 0.98). Model performance in the 2017–2018 cohort was sensitivity greater than or equal to 0.545, specificity greater than or equal to 0.972, accuracy greater than or equal to 0.921, area under the receiver operating characteristic curve greater than or equal to 0.946, and R²s greater than or equal to 0.979. Performance metrics were comparable for the simulated children's hospital and for hospitals stratified by teaching status, bed numbers, and geographic location.

CONCLUSIONS: Machine learning models using physiology, therapy, and care intensity to predict future care needs had promising performance metrics. Notably, performance metrics were similar as the prediction time periods increased from 6–12 hours to 24–30 hours.

Hospitalized children receive care in general hospitals and freestanding children's hospitals in which 25–50% of the beds are ICU-level care (1–3). Importantly, approximately 20% of PICU patients are transferred from non-ICU care areas, often after clinical deterioration (4). Pediatric inpatients requiring transfer to ICU care are more likely to develop new morbidity and more likely to die than postoperative admissions (2). It is often difficult for clinicians to predict which patients will respond favorably to medical interventions and which will deteriorate (5).
Early identification of patients responding to therapies and of those at substantial risk for clinical deterioration could allow for earlier discharge or for more aggressive interventions that might alter the clinical course, reduce morbidities, and, in severe cases, prevent death. Predicting future clinical events ideally includes the timing of change, based on a consideration of the current physiologic state as a measure of severity of illness, the milieu of therapies and therapeutic intensity, and an assessment of the trajectory of these variables. Recently, we validated a new severity measure, the Criticality Index, based on physiology, therapies, and therapeutic intensity, which accounts for changes in these variables over time. The Criticality Index is calibrated to the probability of receiving ICU care and demonstrated large differences among high-intensity ICU care, ICU care, and routine care (4, 6). Therefore, predicting changes in severity of illness for pediatric inpatients can be operationalized in a single model as predicting the care area. A major goal of clinical outcome prediction has been to predict changes in severity of illness to identify patients who will need ICU care, continue their current care needs, or transition out of intensive care, with sufficient temporal warning to allow for clinical interventions that might alter the clinical course. This goal can be operationalized in a single model by predicting severity changes at specified future time periods, with the Criticality Index framework using the outcome of ICU or non-ICU (routine) care.

There were three goals for this analysis. First, we developed and compared separate machine learning models for prediction of care location (ICU or non-ICU) for hospitalized children in future time periods of 6–12, 12–18, 18–24, and 24–30 hours. This analysis used a research database with a distribution of ICU and non-ICU patients that enhanced model development. Second, we assessed performance in an independent dataset from 2017 to 2018. Third, we assessed potential clinical applicability by assessing performance in a simulated children's hospital, determining the accuracy of predicting ICU admission, and assessing the potential influence of institutional characteristics on model performance. We focused on a 24-hour time frame divided into discrete 6-hour time periods because this time frame and organization could have substantial implications for patient safety, clinical outcomes, and resource utilization.

The model development dataset was derived from the Health Facts database (Cerner Corporation, Kansas City, MO), which collects comprehensive deidentified clinical data on patient encounters from hospitals in the United States with a Cerner data use agreement. Data are date- and time-stamped and include admission and demographic data, laboratory results, medication data derived from pharmacy records, diagnostic and procedure codes, vital signs, respiratory data, and hospital outcome. Cerner Corporation has established HIPAA-compliant operating policies to ensure deidentification of Health Facts. Not all data are available for all patients. Health Facts has been assessed as representative of the United States (7) and has been used in previous care assessments, including the Acute Physiology and Chronic Health Evaluation score (8) and medication assessments for children in ICUs (9, 10).
Details on preparing the data have been published, including data cleaning and data definitions, medications and medication classification, laboratory data, and vital sign and respiratory data (6). Medication data were determined from pharmacy records using start and discontinuation times. Drugs were categorized by Multum (North Kansas City, MO) (11). Diagnoses were categorized based on the International Classification of Diseases (ICD), 9th Edition and ICD, 10th Edition classifications (12, 13). The primary diagnosis was used for descriptive purposes but not for modeling because it was determined at discharge. Inclusion criteria were age less than 22 years (14); availability of laboratory, vital sign, and medication data; and care in non-ICU care units or ICUs from January 2009 to June 2016. Exclusion criteria included hospital length of stay greater than 100 days, ICU length of stay greater than 30 days, or care in the neonatal ICU. For model building, we included all patients receiving ICU care and a randomly selected sample of patients receiving only non-ICU care, approximately equal in size to the ICU sample. Therefore, we undersampled the non-ICU patients to enhance modeling.

The hospital course was discretized into consecutive 6-hour time periods because data acquisition for non-ICU care children is relatively infrequent compared with ICU patients. Each time period was categorized into the mutually exclusive categories of ICU care or non-ICU care; we excluded time periods when the patient was in both non-ICU and ICU care. The variables, definitions, and statistics for each variable used for modeling are shown in Supplemental Digital Content 1 (http://links.lww.com/CCX/A736). The variables are those in the Criticality Index and consist of six routine vital signs, 30 routinely measured laboratory variables, and parenterally administered medications.

The machine learning methods required laboratory and vital sign measurements in each time period, necessitating imputation for missing data. Consistent with other machine learning models, we imputed laboratory results and vital signs using the last known result because, in general, physicians use the last measured values and repeat measurements when required for clinical care or when results are acquired routinely (15, 16). If values were missing during the first 6-hour time period, they were set to the median of the first 6-hour time periods within nine age groups (4, 6). These imputed values have been reported (4, 6); all were either in the normal range or had only minor deviations from normal. This imputation scheme is similar to that of other severity scores, which assume normal values for unmeasured variables (2, 17), with the improvement that specific estimates for these patients are used rather than assumed normal values. Imputed values were identified in the modeling (below) by setting the measurement count equal to zero. The possibility that imputation induced a systematic bias was explored using pairwise comparisons of the distributions of laboratory and vital sign values with and without imputation (18, 19). No bias was evident.

We randomly assigned 64% of patients for training, 8% for validation, and 28% for testing for the ICU and non-ICU patient groups. This distribution was chosen to maximize the test sample. Random selection was at the patient level. The training set was used for model development, and the validation set was used to fine-tune variables and avoid overfitting.
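The authors' custom R code is available from them on request; as a minimal sketch of the imputation scheme above, assume a hypothetical long-format table with one row per patient, variable, and 6-hour period, where count is the number of measurements observed in the period and age_group is one of the nine age groups (all column names are illustrative):

```r
library(dplyr)
library(tidyr)

# Sketch of the imputation scheme (hypothetical schema): carry the last known
# result forward; fall back to age-group medians of first 6-hr periods; flag
# imputed periods with a measurement count of zero.
impute_periods <- function(periods, first_period_medians) {
  periods %>%
    arrange(patient_id, variable, period_start) %>%
    group_by(patient_id, variable) %>%
    fill(value, .direction = "down") %>%      # last observation carried forward
    ungroup() %>%
    left_join(first_period_medians,           # medians within nine age groups
              by = c("variable", "age_group")) %>%
    mutate(
      value = coalesce(value, first_period_median),
      count = coalesce(count, 0)              # count = 0 marks imputed values
    ) %>%
    select(-first_period_median)
}
```

Flagging imputed periods with a count of zero lets the downstream models distinguish carried-forward or median-filled values from fresh measurements.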
For modeling, the training and validation sets were combined for calibration of each of the models. The test sample and a 2017–2018 cohort were used to evaluate model performance and calibration.

Independent neural networks calibrated to risk of ICU care were developed for four future times: 6–12, 12–18, 18–24, and 24–30 hours. Therefore, predictions of ICU or non-ICU care were based on a single model for each time period. The models are sequential, and the layers are fully connected. Each model had seven hidden dense layers and an output layer with one node and logistic activation. Inputs for the models included variables of the present and immediate past time period. Our model architecture is the result of sequential efforts to maximize the Matthews correlation coefficient (MCC). Initially, models with one hidden layer were considered. We sequentially increased the number of internal nodes in combination with the proportion of dropout nodes. This process, along with L2-norm regularization and monitoring of MCC values between the training and validation sets, determined the final number of nodes for the first hidden layer. We stopped increasing the number of nodes when the MCCs of the validation and training sets converged to a common value. The architecture of this hidden layer was then frozen, and additional hidden layers were added in the same fashion. We stopped adding hidden layers when they did not significantly increase the MCC of the training and validation sets. Overfitting was avoided by keeping the difference between the MCCs of the training and validation sets at no more than 0.05 and by maintaining the stability of the other performance metrics. Each model was independently calibrated to the respective future risk of ICU care (20). These model outputs predict future care areas of non-ICU and ICU care (4, 6).

The performance of the four models was first assessed in the test sample. Initially, the models were assessed with confusion matrices at the decision cut point of 0.5 (21–23) and with areas under the receiver operating characteristic curves (AUROC) and precision-recall curves (AUPRC) with their 95% CIs (24). The number needed to evaluate is not shown but can be calculated as 1/precision. Accuracy, precision, and negative predictive value for the test sample were assessed at sensitivities (true-positive rates) and specificities (true-negative rates) of 0.85, 0.90, 0.95, and 0.99 for the approximate lower boundary of the 95% CI. A true-positive prediction indicates the patient is expected to be transferred to the ICU or to remain in the ICU for the outcome time period, whereas a true-negative prediction indicates the patient is expected to remain in a non-ICU care area or to be transferred out of the ICU to a non-ICU care area. For those patients correctly predicted to be transferred from non-ICU to ICU care, we computed the percentage receiving either mechanical ventilation or vasoactive agents within 24 hours of transfer. Second, we assessed the calibration of each model over the full range of risk intervals using the differences between the observed and expected proportions of ICU outcomes within the intervals. The numbers of calibration intervals for the four models were greater than 2,900 (Fig. 1, Supplemental Digital Content 3, http://links.lww.com/CCX/A738). We used multiple metrics to assess calibration.
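The exact layer widths, dropout proportions, and L2 penalties are not reported in the text. The following keras-for-R sketch illustrates only the stated shape of each network (sequential, seven fully connected hidden layers grown under dropout and L2 regularization, and a single-node logistic output); every numeric hyperparameter is an assumption:

```r
library(keras)

# Illustrative sketch of one prediction model. Unit counts, dropout rate, and
# the L2 penalty are placeholders, not the authors' published values.
build_model <- function(n_inputs, units = 256, dropout_rate = 0.3, l2 = 1e-4) {
  model <- keras_model_sequential()
  model %>%
    layer_dense(units = units, activation = "relu", input_shape = n_inputs,
                kernel_regularizer = regularizer_l2(l2)) %>%
    layer_dropout(rate = dropout_rate)
  for (i in 1:6) {                      # the remaining six hidden layers
    model %>%
      layer_dense(units = units, activation = "relu",
                  kernel_regularizer = regularizer_l2(l2)) %>%
      layer_dropout(rate = dropout_rate)
  }
  model %>% layer_dense(units = 1, activation = "sigmoid")   # logistic output
  model %>% compile(optimizer = "adam", loss = "binary_crossentropy")
  model
}
```

In the authors' procedure, each layer's width was fixed by monitoring the gap between training- and validation-set MCCs before the next layer was added and frozen; the loop above reproduces only the final shape, not that search.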
For calibration, we computed the regression line for the predicted proportions for comparison with the ideal and used the R² from the regression lines as a measure of tightness around them. We also computed the differences between observed and predicted ICU proportions within each calibration interval and report the percentage of intervals with no evidence of a difference. Third, we assessed the accuracy, precision, and negative predictive value at sensitivities and specificities of 0.85, 0.90, 0.95, and 0.99 for the approximate lower boundary of the 95% CIs. Since the models were developed in a sample constructed to enhance model development rather than to assess "real-life" performance, we also assessed performance in an independent January 2017 to June 2018 Health Facts cohort without ICU sample enhancement, in the same manner as the test sample.

We also assessed the potential clinical utility in three ways. First, we constructed a simulated children's hospital by random selection from the test sample such that 20% of the total sample was cared for in the ICU and 20% of the ICU patients were initially admitted to non-ICU care areas prior to transfer to the ICU. These population estimates were obtained from previous analyses (3, 4) and a query of the Children's Hospital Association database (MM Pollack, unpublished data, 2020). We also created three additional sets of randomly selected test patients with ICU patient prevalences of 10%, 15%, and 30%. Second, since the most valuable potential utility is the prediction of transfer from non-ICU to ICU care, we assessed the accuracy of each model for patients who changed their care area from non-ICU to ICU care in both the test sample and the independent cohort. Accuracy was assessed as whether any of the prediction models was correct. The first 6-hour time period after transfer into the ICU had predictions from all four models, the second 6-hour period had predictions from three models, the third from two models, and the fourth from one model. We also assessed the accuracy for the first, second, third, and fourth 6-hour time periods after transfer, but only when the prediction was made prior to the transfer. Finally, we assessed the influence of institutional characteristics on model performance, including teaching status, geographic region, and hospital bed size, as determined from the Health Facts database.

There were 20,014 patients with an ICU stay and 20,130 patients cared for in non-ICU care areas only in the 2009–2016 sample. Demographic data are in Table 1. Details of this sample have been previously published (4, 6). Compared with patients with ICU stays, non-ICU care patients were older (median, 132.2 vs 28.0 mo; p < 0.0001), had shorter median hospital stays (71 vs 110 hr; p < 0.0001), and had a lower mortality rate (0.1% vs 3.2%; p < 0.0001). Most diagnostic categories differed between the two groups (p < 0.0001). The numbers of patients and 6-hour time periods in the ICU and non-ICU care locations in the training, validation, and testing samples for each of the prediction models are shown in Supplemental Digital Content 2 (http://links.lww.com/CCX/A737). Overall, there were greater than 325,000 6-hour time periods for each future time period. The performances of all four models predicting the future care location were similar in the test sample (Table 2A).
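The calibration results below follow the procedure described in the Methods: bin predicted risks into intervals, compare observed with predicted ICU proportions, and regress observed on predicted. A minimal R sketch with hypothetical inputs (pred, a vector of predicted risks; outcome, 1 for ICU care in the target period), using simple equal-width bins rather than the paper's more than 2,900 intervals:

```r
# Calibration sketch: the ideal regression of observed on predicted
# proportions has intercept 0, slope 1, and high R^2.
assess_calibration <- function(pred, outcome, n_bins = 100) {
  bins <- cut(pred, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  cal <- data.frame(
    predicted = as.vector(tapply(pred, bins, mean)),    # mean risk per interval
    observed  = as.vector(tapply(outcome, bins, mean)), # observed ICU proportion
    n         = as.vector(table(bins))
  )
  cal <- cal[cal$n > 0, ]                               # drop empty intervals
  fit <- lm(observed ~ predicted, data = cal)
  list(intercept = unname(coef(fit)[1]),
       slope     = unname(coef(fit)[2]),
       r_squared = summary(fit)$r.squared,
       intervals = cal)
}
```

Per-interval differences (observed minus predicted) with binomial 95% CIs based on n correspond to the "percentage of intervals with no evidence for a difference" check.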
At a decision threshold of 0.5, the sensitivity for the 6–12-hour time period was 0.797 and decreased to 0.727 for the 24–30-hour time period. The percent of the calibration intervals with 95% CIs crossing zero ranged from 93.96% to 95.49%. There was a small tendency in all models to underpredict ICU care in the lower risk ranges, consistent with the care of stable patients in the ICU receiving primarily monitoring, and a smaller tendency to overpredict in the middle and upper ranges, consistent with some sicker patients being cared for in non-ICU care areas.

The accuracy, precision, and negative predictive value for the whole test sample were assessed at sensitivities and specificities of 0.85, 0.90, 0.95, and 0.99 (Table 1, Supplemental Digital Content 3, http://links.lww.com/CCX/A738). Overall, the performance metrics did not significantly decrease as the prediction time interval increased. Precision decreased as sensitivity increased, from greater than 0.73 at a sensitivity of 0.85 to greater than 0.47 at a sensitivity of 0.99. Accuracies at sensitivities of 0.85, 0.90, 0.95, and 0.99 were greater than 0.84, 0.82, 0.77, and 0.61, respectively, for the four prediction models. The assessment of negative predictive value and accuracy at specificities of 0.85, 0.90, 0.95, and 0.99 was similar, with negative predictive values greater than 0.91, 0.87, 0.82, and 0.73, respectively, for the prediction models. All accuracies were greater than 0.76.

The performance of the models was also assessed in the independent 2017–2018 Health Facts dataset. Demographic data are shown in Table 1, the numbers of time periods are shown in Supplemental Digital Content 2 (http://links.lww.com/CCX/A737), and performance data are shown in Table 2B. Overall, compared with the test sample, model performance decreased slightly for AUPRC (from 0.867 to 0.726), sensitivity (from 0.797 to 0.590), precision (from 0.783 to 0.748), and F1 score (from 0.790 to 0.660), with the false discovery rate increasing from 0.217 to 0.252, and improved for accuracy (from 0.855 to 0.925), AUROC (from 0.917 to 0.948), specificity (from 0.885 to 0.972), and negative predictive value (from 0.893 to 0.944). The calibration plots are shown in Figure 1. The regression lines have very small constants (range, -0.01 to -0.02), slopes close to identity (all 0.94), and R²s that are all 0.98. The AUPRCs are shown in Figure 2, with all values greater than 0.725. The AUROCs are all greater than 0.945 (Fig. 1, Supplemental Digital Content 4, http://links.lww.com/CCX/A739).

The accuracy, precision, and negative predictive value for the independent cohort were also assessed at sensitivities and specificities of 0.85 and 0.95 for the lower boundary of the 95% CI (Table 3). Overall, performance metrics were stable as the prediction time interval increased. At a sensitivity of 0.95, accuracies for the four models varied from 0.780 to 0.800, precisions ranged from 0.362 to 0.377, specificities ranged from 0.755 to 0.779, and negative predictive values ranged from 0.990 to 0.991. At a specificity of 0.95, accuracies for the four models varied from 0.921 to 0.924, precisions ranged from 0.676 to 0.684, sensitivities ranged from 0.722 to 0.733, and negative predictive values ranged from 0.958 to 0.962.
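The operating-point metrics above, the number needed to evaluate discussed next, and the MCC used during model development can all be read off a single confusion matrix; a sketch, again with hypothetical inputs:

```r
# Confusion-matrix metrics at a decision cut point (0.5 in Table 2), including
# the number needed to evaluate (1/precision) and the Matthews correlation
# coefficient. pred = predicted risk; outcome = 1 for ICU care.
threshold_metrics <- function(pred, outcome, cutoff = 0.5) {
  pos <- pred >= cutoff
  tp <- as.numeric(sum(pos & outcome == 1))
  fp <- as.numeric(sum(pos & outcome == 0))
  fn <- as.numeric(sum(!pos & outcome == 1))
  tn <- as.numeric(sum(!pos & outcome == 0))
  precision <- tp / (tp + fp)
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    precision   = precision,
    npv         = tn / (tn + fn),
    accuracy    = (tp + tn) / (tp + fp + fn + tn),
    nne         = 1 / precision,
    mcc         = (tp * tn - fp * fn) /
                  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
}

# To operate at a fixed sensitivity (e.g., 0.95), choose the cutoff as the
# matching quantile of predicted risk among true ICU-care periods:
# cutoff_95 <- quantile(pred[outcome == 1], probs = 1 - 0.95)
```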
The number needed to evaluate (= 1/precision) was always less than three patients at a sensitivity or specificity of 0.95 (data for sensitivities and specificities of 0.85, 0.90, 0.95, and 0.99 are shown in Table 1, Supplemental Digital Content 4, http://links.lww.com/CCX/A739).

Potential clinical utility was first assessed in a simulated children's hospital sample with varying prevalences of ICU patients. Demographic data are shown in Table 1, and performance data are shown in Table 2C for the sample composed of 20% ICU patients. Overall, for the sample with 20% ICU patients, there were improvements in AUROC, sensitivity, specificity, negative predictive value, and accuracy and small decreases in AUPRC, precision, and false discovery rate. Changing the percent of ICU patients to 10%, 15%, or 30% yielded comparable performance. The accuracy of correctly predicting transfers from non-ICU to ICU care was greater than or equal to 0.74 in the 2017–2018 cohort. The best performing models in the 2017–2018 cohort for correct prediction of ICU transfer in the first 6-hour ICU time period were the 12–18-hour model (89.6%) and the 6–12-hour model (84.8%).

Finally, the performance metrics assessed by the hospital characteristics of bed size, geographic region, and teaching status are shown in Table 3 (Supplemental Digital Content 5, http://links.lww.com/CCX/A740). The maximum reductions compared with the test sample (Table 2A) were less than 11%, and most metrics were equivalent or better. Hospitals with greater than 500 beds, those in the Northeast, and teaching hospitals had the lowest performance metrics, and there was sometimes a small decrease in performance as the prediction time period increased.

Identification of patients' future care needs as ICU or non-ICU care is an estimate of changing severity of illness and may identify patients who will have increased, decreased, or stable care requirements. The Criticality Index, which demonstrates large differences among high-intensity ICU care, ICU care, and non-ICU care, is an appropriate framework to predict changes in severity of illness as reflected in care needs. This analysis focused on neural network models predicting future care needs in time periods ranging from 6–12 hours to 24–30 hours and evaluated their potential clinical applicability in a simulated children's hospital and in an independent cohort without ICU patient enhancement. In the independent 2017–2018 cohort with a decision cut point of 0.5, all models predicting the need for ICU care had an AUROC greater than 0.945, an AUPRC greater than 0.72, and an accuracy greater than 0.92, and all had excellent calibration. Notably, the performances in the different prediction time periods were very similar, with only small decrements in some performance metrics as the prediction time increased. Altering the decision cut points changed the performance metrics, as illustrated for sensitivities and specificities of 0.85, 0.90, 0.95, and 0.99.

The stability of model performance across time could be explained by the relative infrequency of changes in care area. We evaluated this possibility by computing accuracies for transfers to the ICU: the accuracy of predictions of ICU care was greater than 88% at a sensitivity of 0.95. In addition, among patients correctly predicted to transfer, the positive predictive value for needing vasoactive agent infusions or mechanical ventilation was 37–38%, at least as good as the performance of the Pediatric Early Warning Score (PEWS) paired with clinical assessment (25).
We assessed potential "real-world" performance both in a simulated children's hospital sample and in an independent 2017–2018 cohort without an enhanced ICU sample, and the performances were comparable with the test sample, including a precision corresponding to a number needed to evaluate of less than 2. Additionally, hospital characteristics had only minor influences on the performance metrics. These results indicate the methodology is appropriate for validation and optimization in a clinical environment.

Risk scores are evolving from those generally directed at identifying patients at high risk of death to those that predict clinical deterioration (26–29). Relatively simple models, such as the PEWS, predominantly use vital signs to derive immediately actionable information (30, 31). Although in widespread use, they generally require large "numbers needed to evaluate" (i.e., high false-positive rates) to achieve reasonable sensitivity and did not improve hospital outcomes when tested in a large effectiveness study (32). In the 2017–2018 independent cohort, the number needed to evaluate at a sensitivity of 0.95 was less than or equal to three patients for all models.

Accurate predictions could provide major benefits by assisting clinician decision-making (33–35). We operationalized improving or deteriorating severity of illness as changes in care area, enabling the prediction of ICU or non-ICU care within the same model. Although none of the current risk assessment or prediction methods have significantly enhanced the ability of bedside caregivers to recognize early patterns of deterioration (36–38), the methodology described in this article has the potential to identify patients who will require future transfer to ICU care, potentially altering the clinical trajectory and improving hospital outcomes; patients who will be ready to transition from ICU care to non-ICU care; and those with stable care needs. However, these predictions have different clinical utilities. An alert that a patient may need ICU care usually results in an immediate clinical assessment, often by a rapid response team. Patients predicted to transition out of intensive care, however, do not need immediate evaluation; the transfer is often influenced by administrative and organizational factors, and models predicting this transition are therefore expected to show lower performance.

These models, based on the Criticality Index, integrate past and current physiologic data, therapeutic data, and therapeutic intensity. This is conceptually consistent with historically important ICU severity advances (39–41). If the performance is validated with real-life data and if the methodology has sufficient face validity for providers, it could improve clinical decision-making by supplementing the limitations of cognitive processing and reducing medical errors (42–46). Medical errors are often rooted in heuristics and are more likely to occur in high-pressure, high-stakes decisions, particularly when dealing with incomplete information, such as assessing a deteriorating patient or determining the need for ICU care (47–49).

This study has several limitations. First, the database did not contain the full spectrum of data available in the electronic health record (EHR), and therefore these results might be further optimized. Second, potential clinical applicability needs to be confirmed using real-time EHRs and, when possible, models specific to individual hospitals.
Our assessment of clinical applicability using a simulated children's hospital and an independent cohort justifies optimism for successful clinical application. Third, we used time periods of 6 hours; shorter time periods might allow better predictive models. Fourth, although machine learning methods have the advantage of capturing intrinsically complicated interactions, deep neural network models are not transparent, making the clinical importance of individual variables or sets of variables difficult to ascertain (50, 51).

In conclusion, machine learning models based on the Criticality Index's laboratory, vital sign, and medication data, predicting future care needs at 6–12, 12–18, 18–24, and 24–30 hours, had promising performance metrics. The performances in all time periods were similar, without a significant drop-off as the prediction time period increased, and we demonstrated that the models for the different times were not simply predicting lack of change, since they were able to predict care area changes. This conceptual framework and modeling method are applicable to assessing future care needs represented by care areas, including early detection of major changes in care needs and potentially identifying patients who would benefit from early clinical interventions.

REFERENCES:
1. Epidemiology of pediatric hospitalizations at general hospitals and freestanding children's hospitals in the United States
2. Eunice Kennedy Shriver National Institute of Child Health and Human Development Collaborative Pediatric Critical Care Research Network: Simultaneous prediction of new morbidity, mortality, and survival without new morbidity from pediatric intensive care: A new paradigm for outcomes assessment
3. Trends in US pediatric hospital admissions in 2020 compared with the decade before the COVID-19 pandemic
4. Severity trajectories of pediatric inpatients using the criticality index
5. MARS consortium: Predicting the clinical trajectory in critically ill patients with sepsis: A cohort study
6. Criticality: A new concept of severity of illness for hospitalized children
7. A comparison of a multistate inpatient EHR database to the HCUP nationwide inpatient sample
8. APACHE outcomes across venues predicting inpatient mortality using electronic medical record data
9. Medications for children receiving intensive care: A national sample
10. Sedation, analgesia, and neuromuscular blockade: An assessment of practices from 2009 to 2016 in a national sample of 66,443 pediatric patients cared for in the ICU
11. Comparison of three commercial knowledge bases for detection of drug-drug interactions in clinical decision support
12. Centers for Disease Control and Prevention: ICD-9-CM Official Guidelines for Coding and Reporting
13. ICD-10-CM Official Guidelines for Coding and Reporting
14. Committee on Practice and Ambulatory Medicine: Age limit of pediatrics
15. Using the shapes of clinical data trajectories to predict mortality in ICUs
16. Multicenter validation of a machine-learning algorithm for 48-h all-cause mortality prediction
17. Validation of the paediatric logistic organ dysfunction (PELOD) score: Prospective, observational, multicentre study
18. Imputation with the R package VIM
19. Missing data exploration: Highlighting graphical presentation of missing pattern
20. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods
21. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation
22. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets
23. Tharwat A: Classification assessment methods
24. Area under the precision-recall curve: Point estimates and confidence intervals
25. Impact of rapid response system implementation on critical deterioration events in children
26. The pediatric risk of mortality score: Update
27. ANZICS Paediatric Study Group and the Paediatric Intensive Care Audit Network: Paediatric index of mortality 3: An updated model for predicting mortality in pediatric intensive care
28. Evaluation of a pediatric early warning score across different subspecialty patients
29. Development and validation of a continuously age-adjusted measure of patient condition for hospitalized children using the electronic medical record
30. Paediatric early warning systems for detecting and responding to clinical deterioration in children: A systematic review
31. Validity and effectiveness of paediatric early warning systems and track and trigger tools for identifying and reducing clinical deterioration in hospitalised children: A systematic review
32. Canadian Critical Care Trials Group and the EPOCH Investigators: Effect of a pediatric early warning system on all-cause mortality in hospitalized pediatric patients: The EPOCH randomized clinical trial
33. Scalable and accurate deep learning with electronic health records
34. Risk-adjusting hospital inpatient mortality using automated inpatient, outpatient, and laboratory databases
35. Cognitive debiasing 2: Impediments to and strategies for change
36. Critical thinking in critical care: Five strategies to improve teaching and learning in the intensive care unit
37. Exploring patterns of error in acute care using framework analysis
38. Cognitive biases associated with medical decisions: A systematic review
39. Therapeutic intervention scoring system: A method for quantitative comparison of patient care
40. Pediatric risk of mortality (PRISM) score
41. The APACHE III prognostic system: Risk prediction of hospital mortality for critically ill hospitalized adults
42. Kahneman D: Thinking, Fast and Slow
43. The impact of cognitive and implicit bias on patient safety and quality
44. Heuristics and cognitive error in medical imaging
45. Cognitive processes in anesthesiology decision making
46. Reducing errors from cognitive biases through quality improvement projects
47. Judgment under uncertainty: Heuristics and biases
48. From mindless to mindful practice: Cognitive bias and clinical decision making
49. Cognitive debiasing 1: Origins of bias and theory of debiasing
50. Peering into the black box of artificial intelligence: Evaluation metrics of machine learning methods
51. Causability and explainability of artificial intelligence in medicine

The data extraction was done in structured query language (SQL) and R with custom code. The data preparation and exploration, model development and evaluation, and the generation of tables, plots, and results were done in R with custom code. Code for specific tasks is available upon request.