key: cord-0027936-nkyk8qca authors: Fazlollahi, Ali M.; Bakhaidar, Mohamad; Alsayegh, Ahmad; Yilmaz, Recai; Winkler-Schwartz, Alexander; Mirchi, Nykan; Langleben, Ian; Ledwos, Nicole; Sabbagh, Abdulrahman J.; Bajunaid, Khalid; Harley, Jason M.; Del Maestro, Rolando F. title: Effect of Artificial Intelligence Tutoring vs Expert Instruction on Learning Simulated Surgical Skills Among Medical Students: A Randomized Clinical Trial date: 2022-02-22 journal: JAMA Netw Open DOI: 10.1001/jamanetworkopen.2021.49008 sha: 7c8777c064e602cf4c43d4e5728d8a8777e4a753 doc_id: 27936 cord_uid: nkyk8qca IMPORTANCE: To better understand the emerging role of artificial intelligence (AI) in surgical training, efficacy of AI tutoring systems, such as the Virtual Operative Assistant (VOA), must be tested and compared with conventional approaches. OBJECTIVE: To determine how VOA and remote expert instruction compare in learners’ skill acquisition, affective, and cognitive outcomes during surgical simulation training. DESIGN, SETTING, AND PARTICIPANTS: This instructor-blinded randomized clinical trial included medical students (undergraduate years 0-2) from 4 institutions in Canada during a single simulation training at McGill Neurosurgical Simulation and Artificial Intelligence Learning Centre, Montreal, Canada. Cross-sectional data were collected from January to April 2021. Analysis was conducted based on intention-to-treat. Data were analyzed from April to June 2021. INTERVENTIONS: The interventions included 5 feedback sessions, 5 minutes each, during a single 75-minute training, including 5 practice sessions followed by 1 realistic virtual reality brain tumor resection. The 3 intervention arms included 2 treatment groups, AI audiovisual metric-based feedback (VOA group) and synchronous verbal scripted debriefing and instruction from a remote expert (instructor group), and a control group that received no feedback. 
MAIN OUTCOMES AND MEASURES: The coprimary outcomes were change in procedural performance, quantified as Expertise Score by a validated assessment algorithm (Intelligent Continuous Expertise Monitoring System [ICEMS]; range, −1.00 to 1.00) for each practice resection, and learning and retention, measured from performance in realistic resections by ICEMS and blinded Objective Structured Assessment of Technical Skills (OSATS; range, 1-7). Secondary outcomes included strength of emotions before, during, and after the intervention and cognitive load after the intervention, measured by self-report. RESULTS: A total of 70 medical students (41 [59%] women and 29 [41%] men; mean [SD] age, 21.8 [2.3] years) from 4 institutions were randomized, including 23 students in the VOA group, 24 students in the instructor group, and 23 students in the control group. All participants were included in the final analysis. ICEMS assessed 350 practice resections, and ICEMS and OSATS evaluated 70 realistic resections. VOA significantly improved practice Expertise Scores by 0.66 (95% CI, 0.55 to 0.77) points compared with the instructor group and by 0.65 (95% CI, 0.54 to 0.77) points compared with the control group (P < .001). Realistic Expertise Scores were significantly higher for the VOA group compared with the instructor (mean difference, 0.53 [95% CI, 0.40 to 0.67] points; P < .001) and control (mean difference, 0.49 [95% CI, 0.34 to 0.61] points; P < .001) groups. Mean global OSATS ratings did not differ significantly among the VOA (4.63 [95% CI, 4.06 to 5.20] points), instructor (4.40 [95% CI, 3.88 to 4.91] points), and control (3.86 [95% CI, 3.44 to 4.27] points) groups.
However, on the OSATS subscores, VOA significantly enhanced the mean OSATS overall subscore compared with the control group (mean difference, 1.04 [95% CI, 0.13 to 1.96] points; P = .02), whereas expert instruction significantly improved the OSATS instrument handling subscore vs control (mean difference, 1.18 [95% CI, 0.22 to 2.14] points; P = .01). No significant differences in cognitive load, positive activating emotions, or negative emotions were found. CONCLUSIONS AND RELEVANCE: In this randomized clinical trial, VOA feedback demonstrated superior performance outcomes and skill transfer, with equivalent OSATS ratings and cognitive and emotional responses compared with remote expert instruction, indicating advantages for its use in simulation training. TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT04700384

Datasets generated by high-fidelity virtual reality simulators can be employed by machine learning algorithms to objectively measure trainee performance and competence against expert benchmarks 4. This allows repetitive practice of surgical skills in safe, risk-free environments with immediate feedback.

Our group developed, and has a patent pending for, an intelligent tutoring system called the Virtual Operative Assistant (VOA) 5,6. Utilizing a support vector machine algorithm, the VOA assesses data derived from the NeuroVR (CAE Healthcare) simulator platform and provides individualized audiovisual feedback to improve learner performance during simulated brain tumor resections. The effectiveness of intelligent tutoring systems such as the VOA relative to the human surgical apprenticeship pedagogy remains to be elucidated.

The aim of this study is to compare the effectiveness and educational impact of personalized VOA feedback with expert instruction on medical students' technical skill learning of a virtual reality tumor resection procedure.
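The support vector machine step described above can be illustrated with a minimal sketch: train a classifier on performance metrics from expert and novice resections, then score a new trainee against that expert benchmark. The metric names, values, and probability output here are hypothetical placeholders; the actual VOA model and training data are not public.

```python
# Minimal sketch of benchmarking trainee performance with a support
# vector machine, in the spirit of the VOA. All metric names and
# numbers are invented for illustration.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Each row: [tip distance, force applied, tumor removed, bleeding]
expert = rng.normal(loc=[0.2, 0.3, 0.9, 0.1], scale=0.05, size=(40, 4))
novice = rng.normal(loc=[0.6, 0.7, 0.5, 0.4], scale=0.05, size=(40, 4))

X = np.vstack([expert, novice])
y = np.array([1] * 40 + [0] * 40)  # 1 = expert-level, 0 = novice-level

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
model.fit(X, y)

# Classify a new trainee's metrics and report an expertise probability.
trainee = np.array([[0.55, 0.65, 0.55, 0.35]])
prob_expert = model.predict_proba(trainee)[0, 1]
print(f"P(expert-level) = {prob_expert:.2f}")
```

A per-metric breakdown of the same decision is what lets a system like the VOA turn a classification into directional feedback rather than a bare pass/fail label.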
Participants do not know the performance metrics used in their final evaluation, only that they will be learning and practicing technical skills used in neurosurgery for subpial tumor resection procedures. All participants are told that their on-screen performance is being observed by an expert in a different room. Control Group - Baseline Training. 23 participants allocated. Individuals receive introductory information on using the simulator and the scenario. They perform 5 simple subpial tumor resections for practice and have 5 minutes per trial. After each attempt, the student takes a 5-minute break with no assessment or feedback on their performance. On their 6th attempt they have 13 minutes to perform a different realistic scenario, again with no assessment or feedback. The average performance improvement of this group establishes the baseline for learning possible using the simulator alone. Experimental Group - VOA Training. 23 participants allocated. Individuals receive the same information, have the same amount of time, and perform the same scenarios as the control group. In the 5 minutes between attempts, participants receive the VOA's assessment of their performance and audiovisual feedback. VOA Assessment and Feedback. Students receive a percentage score of their performance based on their level of expertise in four performance metrics determined by the system's support vector machine. If the performance is outside the expert reference benchmark in a given metric, participants observe a feedback video that demonstrates an expert performance and provides constructive directional feedback. Experimental Group - Remote Expert Instructor Training. 23 participants allocated. Individuals receive the same information, have the same amount of time, and perform the same scenarios as the control group. Meanwhile, a trained instructor remotely observes the participant's livestreamed on-screen performance.
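The VOA feedback rule described above, where any metric falling outside the expert reference range triggers directional feedback alongside an expert demonstration video, can be sketched as follows. The metric names, reference ranges, and feedback messages are illustrative placeholders, not the VOA's actual values.

```python
# Sketch of the benchmark-based feedback rule: compare each performance
# metric with an expert reference range and collect directional feedback
# for any metric outside it. All names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Benchmark:
    low: float   # lower bound of expert reference range
    high: float  # upper bound of expert reference range
    advice: str  # directional feedback shown with the expert video

BENCHMARKS = {
    "force_applied": Benchmark(0.1, 0.4, "Reduce the force applied to tissue."),
    "tumor_removed": Benchmark(0.8, 1.0, "Aim for a more complete resection."),
    "bleeding": Benchmark(0.0, 0.2, "Control bleeding earlier."),
}

def feedback(metrics: dict[str, float]) -> list[str]:
    """Return feedback messages for metrics outside the expert range."""
    out = []
    for name, value in metrics.items():
        b = BENCHMARKS[name]
        if not (b.low <= value <= b.high):
            out.append(b.advice)
    return out

msgs = feedback({"force_applied": 0.55, "tumor_removed": 0.9, "bleeding": 0.1})
print(msgs)  # → ['Reduce the force applied to tissue.']
```

A trainee within the expert range on every metric receives an empty feedback list, mirroring the protocol's behavior of only showing corrective videos for out-of-benchmark metrics.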
Instructors are senior neurosurgery residents with extensive experience in performing and assessing this scenario. During the 5-minute feedback session, they chat with the student, discussing the performance and helping set goals for the next trial. NOTE: To control for the Hawthorne (observer) effect, all participants are told that their performance is being streamed to an instructor. Remote-Based Expert Instructor Assessment and Feedback. As the participant performs the simulation, instructors grade the on-screen performance using a previously established Objective Structured Assessment of Technical Skills (OSATS) Visual Rating Scale with six components (e.g., Respect for Tissue, Flow, and Efficiency of Movement) 7 on a 7-point Likert scale. During the feedback session, instructors select feedback statements from a standardized list, discuss the participant's areas for improvement, and help them set specific goals for the next attempt. To achieve a statistical power of 0.80, considering a potential effect size of 35% and a significance level of 0.05, this study requires a minimum of 23 participants in each group. All collected participant data will be anonymized and kept in a locked cabinet. Participant characteristics are described as counts with percentages, means (SD), or medians (IQR), as appropriate. Validated artificial intelligence algorithms will analyze raw performance data and evaluate the participant's performance using previously established competence benchmarks for practice and realistic tumor resection scenarios. Performance videos will be scored by blinded experts using previously published visual rating scales. Continuous data will be checked for outliers, and tests of normality, sphericity, and homogeneity of variance will be conducted to check the assumptions of ANOVA.
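The assumption checks named above (normality and homogeneity of variance) followed by a between-group comparison can be sketched with SciPy. The three groups' means, SDs, and sample sizes below are synthetic stand-ins for illustration, not the trial's data.

```python
# Sketch of ANOVA assumption checks on simulated Expertise Scores for
# three groups. Group means/SDs are invented for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
voa = rng.normal(0.5, 0.15, 23)
instructor = rng.normal(0.0, 0.15, 24)
control = rng.normal(0.0, 0.15, 23)

# Normality within each group (Shapiro-Wilk).
for name, g in [("VOA", voa), ("instructor", instructor), ("control", control)]:
    w, p = stats.shapiro(g)
    print(f"{name}: Shapiro-Wilk p = {p:.3f}")

# Homogeneity of variance across groups (Levene's test).
stat, p_lev = stats.levene(voa, instructor, control)
print(f"Levene p = {p_lev:.3f}")

# If assumptions hold, proceed to a one-way ANOVA across groups.
f, p_anova = stats.f_oneway(voa, instructor, control)
print(f"ANOVA: F = {f:.2f}, p = {p_anova:.4f}")
```

Sphericity (Mauchly's test) applies only to the repeated-measures factor and is not covered by this sketch; SPSS reports it automatically for the mixed ANOVA described in the analysis plan.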
Emotions before, during, and after the intervention, as well as procedural performance in the 5 practice resections, will be examined by a two-way mixed ANOVA using time as the within-subjects variable and group allocation as the between-subjects variable. Baseline performance (i.e., performance in the first practice subpial resection) will be treated as a covariate in the mixed model for the procedural performance analysis. Post-intervention responses to the CLI will be summarized for each group and evaluated using one-way ANOVA. Before recruitment, inter-rater reliability will be evaluated using the intraclass correlation coefficient, and OSATS scale consistency will be examined using Cronbach's alpha on data gathered during instructor training. All statistical tests will be conducted in SPSS version 27 (IBM Corporation, Armonk, New York, United States). Expertise Score predictions were conducted in MATLAB R2020a (MathWorks Inc, Natick, Massachusetts, United States).

References
1. Surgical Skill and Complication Rates after Bariatric Surgery.
2. Surgeon Volume and Operative Mortality in the United States.
3. Association Between Surgeon Technical Skills and Patient Outcomes.
4. Machine Learning Identification of Surgical and Operative Factors Associated With Surgical Expertise in Virtual Reality Simulation. JAMA Network Open.
5. The Virtual Operative Assistant: An explainable artificial intelligence tool for simulation-based training in surgery and medicine.
6. A Framework for Transparent Artificial Intelligence in Simulation: The Virtual Operative Assistant [patent].
7. A Comparison of Visual Rating Scales and Simulated Virtual Reality Metrics in Neurosurgical Training: A Generalizability Theory Study.
8. Emotions in medical education: Examining the validity of the Medical Emotion Scale (MES) across authentic medical learning environments. Learning and Instruction.
9. Development of an instrument for measuring different types of cognitive load.