key: cord-1026577-fytashvs
authors: Andaur Navarro, Constanza L; Damen, Johanna A A; Takada, Toshihiko; Nijman, Steven W J; Dhiman, Paula; Ma, Jie; Collins, Gary S; Bajpai, Ram; Riley, Richard D; Moons, Karel G M; Hooft, Lotty
title: Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review
date: 2021-10-20
journal: BMJ
DOI: 10.1136/bmj.n2281
sha: 8122e474172950341dc2a0d8c9aeedb32842761a
doc_id: 1026577
cord_uid: fytashvs

OBJECTIVE: To assess the methodological quality of studies on prediction models developed using machine learning techniques across all medical specialties.
DESIGN: Systematic review.
DATA SOURCES: PubMed from 1 January 2018 to 31 December 2019.
ELIGIBILITY CRITERIA: Articles reporting on the development, with or without external validation, of a multivariable prediction model (diagnostic or prognostic) developed using supervised machine learning for individualised predictions. No restrictions applied for study design, data source, or predicted patient related health outcomes.
REVIEW METHODS: Methodological quality of the studies was determined and risk of bias evaluated using the prediction risk of bias assessment tool (PROBAST). This tool contains 21 signalling questions tailored to identify potential biases in four domains. Risk of bias was measured for each domain (participants, predictors, outcome, and analysis) and each study (overall).
RESULTS: 152 studies were included: 58 (38%) included a diagnostic prediction model and 94 (62%) a prognostic prediction model. PROBAST was applied to 152 developed models and 19 external validations. Of these 171 analyses, 148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of bias. The analysis domain was most frequently rated at high risk of bias.
Of the 152 models, 85 (56%, 48% to 64%) were developed with an inadequate number of events per candidate predictor, 62 handled missing data inadequately (41%, 33% to 49%), and 59 assessed overfitting improperly (39%, 31% to 47%). Most models used appropriate data sources to develop (73%, 66% to 79%) and externally validate the machine learning based prediction models (74%, 51% to 88%). Information about blinding of outcome and blinding of predictors was, however, absent in 60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models, respectively.
CONCLUSION: Most studies on machine learning based prediction models show poor methodological quality and are at high risk of bias. Factors contributing to risk of bias include small study size, poor handling of missing data, and failure to deal with overfitting. Efforts to improve the design, conduct, reporting, and validation of such studies are necessary to boost the application of machine learning based prediction models in clinical practice.
SYSTEMATIC REVIEW REGISTRATION: PROSPERO CRD42019161764.

• Time interval between predictor assessment and outcome determination allows the outcome to be recorded correctly and a representative number of events to be reached.
• Time interval between predictor assessment and outcome determination is either too long or too short to correctly record the outcome and achieve a representative number of events.
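The Results report each proportion with a 95% confidence interval, but the abstract does not say how the intervals were calculated. The following is a minimal sketch that assumes a Wilson score interval; that choice, the function name `wilson_interval`, and the default z value of 1.96 are assumptions made for illustration, not the authors' stated method. Under that assumption, the counts quoted above should give bounds that round to the reported percentages.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the Results paragraph above.
reported = [
    ("analyses at high risk of bias", 148, 171),
    ("models with inadequate events per candidate predictor", 85, 152),
    ("models with inadequately handled missing data", 62, 152),
]

for label, k, n in reported:
    lower, upper = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.0%} ({lower:.0%} to {upper:.0%})")
```

A Wilson interval is used here rather than the simpler normal approximation because it stays within 0 to 1 and behaves better for small denominators, such as the 19 external validations.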
Was selection of predictors based on univariable analysis avoided? 63 (67 censoring, competing risks, sampling of control participants) accounted for appropriately? Were relevant model performance measures evaluated appropriately? 14 (15 Were model overfitting and optimism in model performance accounted for?