key: cord-1026577-fytashvs
authors: Andaur Navarro, Constanza L; Damen, Johanna A A; Takada, Toshihiko; Nijman, Steven W J; Dhiman, Paula; Ma, Jie; Collins, Gary S; Bajpai, Ram; Riley, Richard D; Moons, Karel G M; Hooft, Lotty
title: Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review
date: 2021-10-20
journal: BMJ
DOI: 10.1136/bmj.n2281
sha: 8122e474172950341dc2a0d8c9aeedb32842761a
doc_id: 1026577
cord_uid: fytashvs

OBJECTIVE: To assess the methodological quality of studies on prediction models developed using machine learning techniques across all medical specialties.
DESIGN: Systematic review.
DATA SOURCES: PubMed from 1 January 2018 to 31 December 2019.
ELIGIBILITY CRITERIA: Articles reporting on the development, with or without external validation, of a multivariable prediction model (diagnostic or prognostic) developed using supervised machine learning for individualised predictions. No restrictions applied for study design, data source, or predicted patient related health outcomes.
REVIEW METHODS: Methodological quality of the studies was determined and risk of bias evaluated using the prediction risk of bias assessment tool (PROBAST). This tool contains 21 signalling questions tailored to identify potential biases in four domains. Risk of bias was measured for each domain (participants, predictors, outcome, and analysis) and each study (overall).
RESULTS: 152 studies were included: 58 (38%) included a diagnostic prediction model and 94 (62%) a prognostic prediction model. PROBAST was applied to 152 developed models and 19 external validations. Of these 171 analyses, 148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of bias. The analysis domain was most frequently rated at high risk of bias. Of the 152 models, 85 (56%, 48% to 64%) were developed with an inadequate number of events per candidate predictor, 62 handled missing data inadequately (41%, 33% to 49%), and 59 assessed overfitting improperly (39%, 31% to 47%). Most models used appropriate data sources to develop (73%, 66% to 79%) and externally validate the machine learning based prediction models (74%, 51% to 88%). Information about blinding of outcome and blinding of predictors was, however, absent in 60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models, respectively.
CONCLUSION: Most studies on machine learning based prediction models show poor methodological quality and are at high risk of bias. Factors contributing to risk of bias include small study size, poor handling of missing data, and failure to deal with overfitting. Efforts to improve the design, conduct, reporting, and validation of such studies are necessary to boost the application of machine learning based prediction models in clinical practice.
SYSTEMATIC REVIEW REGISTRATION: PROSPERO CRD42019161764.
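
The proportions and 95% confidence intervals reported in the results can be checked from the raw counts. The abstract does not state which interval method was used, so the sketch below assumes a Wilson score interval, a common choice for binomial proportions. Under that assumption the rounded bounds agree with the reported figures for the four results shown. The function name `wilson_interval` and the labels in `reported` are illustrative only and do not come from the review.

```python
import math


def wilson_interval(events, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 gives an approximate 95% interval. The choice of the Wilson
    method is an assumption; the abstract does not name the method used.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = events / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


# Counts taken from the RESULTS paragraph of the abstract.
reported = {
    "analyses at high risk of bias": (148, 171),
    "models with too few events per candidate predictor": (85, 152),
    "models handling missing data inadequately": (62, 152),
    "models assessing overfitting improperly": (59, 152),
}

for label, (events, total) in reported.items():
    low, high = wilson_interval(events, total)
    print(
        f"{label}: {events}/{total} = {events / total:.0%} "
        f"({low:.0%} to {high:.0%})"
    )
```

A score based interval is used here rather than the simple normal approximation because it behaves better for small denominators, such as the 19 external validations.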

• Time interval between predictor assessment and outcome determination allows the outcome to be recorded correctly and a representative number of events to be reached.

• Time interval between predictor assessment and outcome determination is either too long or too short to record the outcome correctly and reach a representative number of events.


Was selection of predictors based on univariable analysis avoided? 63 (67

Were complexities in the data (eg, censoring, competing risks, sampling of control participants) accounted for appropriately?

Were relevant model performance measures evaluated appropriately? 14 (15

Were model overfitting and optimism in model performance accounted for?