key: cord-0028797-w48r5e8l authors: Kim, Dongsung; Hwang, Ji Eun; Cho, Youngjin; Cho, Hyoung-Won; Lee, Wonjae; Lee, Ji Hyun; Oh, Il-Young; Baek, Sumin; Lee, Eunkyoung; Kim, Joonghee title: A Retrospective Clinical Evaluation of an Artificial Intelligence Screening Method for Early Detection of STEMI in the Emergency Department date: 2022-03-07 journal: J Korean Med Sci DOI: 10.3346/jkms.2022.37.e81 sha: be05742c2cb9e2f71a971e1e0d622bb3806fd663 doc_id: 28797 cord_uid: w48r5e8l

BACKGROUND: Rapid revascularization is key to better patient outcomes in ST-elevation myocardial infarction (STEMI). Direct activation of the cardiac catheterization laboratory (CCL) based on artificial intelligence (AI) interpretation of the initial electrocardiogram (ECG) might help reduce door-to-balloon (D2B) time. To assess whether this approach is feasible and beneficial, we tested the non-inferiority of such a process compared with conventional evaluation and estimated its clinical benefits, including reductions in D2B time, medical cost, and 1-year mortality.

METHODS: This was a single-center retrospective study of emergency department (ED) patients suspected of having STEMI from January 2021 to June 2021. Quantitative ECG (QCG™), a comprehensive cardiovascular evaluation system, was used for screening. The non-inferiority of AI-driven CCL activation compared with joint clinical evaluation by emergency physicians and cardiologists was tested using a 5% non-inferiority margin.

RESULTS: Eighty patients (STEMI, 54 patients [67.5%]) were analyzed. The area under the curve of the QCG score was 0.947. With the score dichotomized at 50 (binary QCG), the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 98.1% (95% confidence interval [CI], 94.6%, 100.0%), 76.9% (95% CI, 60.7%, 93.1%), 89.8% (95% CI, 82.1%, 97.5%), and 95.2% (95% CI, 86.1%, 100.0%), respectively.
The differences in sensitivity and specificity between binary QCG and the joint clinical decision were 3.7% (95% CI, −3.5%, 10.9%) and 19.2% (95% CI, −4.7%, 43.1%), respectively, confirming non-inferiority. The estimated median reductions in D2B time, evaluation cost, and relative risk of 1-year mortality were 11.0 minutes (interquartile range [IQR], 7.3–20.0 minutes), 26,902.2 KRW (22.78 USD) per STEMI patient, and 12.39% (IQR, 7.51–22.54%), respectively.

CONCLUSION: AI-assisted CCL activation using the initial ECG is feasible. If such a policy is implemented, some reduction in D2B time, medical cost, and 1-year mortality can reasonably be expected.

ST-elevation myocardial infarction (STEMI) is a cardiovascular emergency with high mortality and morbidity. 1,2 Rapid revascularization is key to better patient outcomes. Delayed recognition of STEMI in the emergency department (ED) is the primary cause of delayed revascularization. 3 However, pressuring clinicians to reduce this delay will increase false alarms, because ECG changes in early STEMI can be subtle and many benign patterns mimic STEMI. 4 Therefore, many institutions require secondary confirmation by cardiologists before activating the cardiac catheterization laboratory (CCL). However, this also requires significant time, effort, and cost. Artificial intelligence (AI) systems powered by deep learning have transformed many industries 5 and are being actively adopted in the medical field. 6-10 If an AI system could predict STEMI from the initial ECG alone as accurately as a human cardiologist, so that a triage nurse or an ECG technician could activate the CCL directly, we could expect a significant reduction in door-to-balloon (D2B) time and cost as well as a significant improvement in patient outcomes. AI algorithms have been reported to outperform human experts in detecting some ECG abnormalities.
5, 11 However, a human expert is unlikely to rely on a single piece of information (e.g., the initial ECG) when diagnosing STEMI. Other information that might be considered includes vital signs, symptom description, and past medical history, as well as serial ECGs, echocardiography, and cardiac enzyme measurements. It is currently unknown whether an AI algorithm using only the initial ECG can safely replace the clinical diagnostic process, or how much additional benefit such a replacement would provide. The objective of this study was two-fold. The first was to assess the non-inferiority of STEMI screening using AI interpretation of the initial ECG alone compared with the conventional screening process. The second was to estimate the clinical benefits such a process would provide, including reductions in D2B time, medical cost, and 1-year mortality.

This is a retrospective study of ED patients suspected of having STEMI. The primary objective was to assess whether an AI system can achieve diagnostic performance non-inferior to the concerted screening efforts of emergency physicians (EPs) and cardiologists. The secondary objective was to estimate the benefits of AI screening, such as reductions in D2B time, evaluation cost, and 1-year mortality. The AI algorithm tested was a convolutional neural network (CNN)-based binary classifier. It is part of a previously built deep-learning system called Quantitative ECG (QCG™), which can diagnose, with varying accuracy, various conditions including shock, cardiac arrest, acute coronary syndrome, STEMI, non-specific myocardial injury, left heart failure, right heart failure, large pericardial effusion, pulmonary hypertension, hyperkalemia, and 35 types of heart rhythm.
The AI algorithm was trained using a transfer learning scheme: a modified CNN-based algorithm was pretrained with self-supervised learning on various open ECG datasets (49,731 recordings in total) and then fine-tuned on a clinical dataset of 47,194 annotated ECG images from over 32,968 patients who visited the Seoul National University Bundang Hospital ED from 2017 to 2019. The algorithm consists of a signal extraction part that uses a series of morphological operations, followed by a multi-channel CNN with 16 convolutional layers and a non-local network block, ending in a single sigmoid activation function. The probability output of the sigmoid function was calibrated using focal loss and temperature scaling, as described in previous studies. 12, 13 The STEMI classifier outputs an estimated risk of STEMI (QCG score, a quantitative score ranging from 0 to 100), and a score of 50 or more was interpreted as positive for STEMI.

The study facility was a tertiary academic hospital with over 80,000 annual patient visits. The ED's acute chest pain protocol includes the following rules: 1) any patient with acute chest pain undergoes ECG at triage and is screened for STEMI by emergency medicine (EM) physicians (EM professors or postgraduate year 3 to 5 EM residents); 2) if STEMI is suspected, a pre-activation warning call is made to the cardiologist on duty; 3) if the ECG is typical of STEMI, the CCL is activated immediately; otherwise, the cardiologist examines the patient and decides whether to activate the CCL or cancel the process. Patients with a pre-activation warning call from January 1, 2021 to June 30, 2021 were included. Patients with a delayed initial ECG (performed more than 30 minutes after ED arrival) or insufficient coronary evaluation were excluded.
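The calibration step described above can be illustrated with a minimal sketch. It is not the QCG implementation: the focal-loss defaults (gamma = 2, alpha = 0.25, the values recommended in the focal loss paper), the grid search used to fit the temperature, and the helper names `fit_temperature` and `qcg_like_score` are illustrative assumptions. The sketch also shows one property of this setup: temperature scaling divides the logit by a positive constant, so a score of exactly 50 still corresponds to a logit of 0. Calibration therefore changes the score values but not which ECGs fall on either side of the 50-point threshold.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class (STEMI) and y is 0 or 1.
    With gamma = 0 it reduces to alpha-weighted binary cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))


def mean_nll(logits, labels, temperature: float) -> float:
    """Average negative log-likelihood of the labels after dividing logits by T."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(logits, labels) -> float:
    """Pick T on a held-out set by grid search over 0.25, 0.50, ..., 10.0."""
    grid = [0.25 * i for i in range(1, 41)]
    return min(grid, key=lambda t: mean_nll(logits, labels, t))


def qcg_like_score(logit: float, temperature: float) -> float:
    """Hypothetical 0-100 score: calibrated probability times 100."""
    return 100.0 * sigmoid(logit / temperature)


def is_positive(score: float, threshold: float = 50.0) -> bool:
    """Positive for STEMI if the score is at or above the threshold."""
    return score >= threshold


if __name__ == "__main__":
    # Made-up validation logits and labels, for illustration only.
    val_logits = [3.0, 2.5, -3.0, 2.0, -2.0, -3.5]
    val_labels = [1, 1, 0, 0, 1, 0]
    t = fit_temperature(val_logits, val_labels)
    for z in val_logits:
        score = qcg_like_score(z, t)
        print(z, round(score, 1), is_positive(score))
```

In practice the temperature would be fitted on a validation set held out from training, and the network weights stay fixed during that step.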
The captured images of the patients' initial ECGs were labeled according to the final diagnosis, determined by reviewing discharge notes and outpatient follow-up records rather than by ECG morphologic criteria. The labeling was done by an emergency physician and checked and confirmed by an interventional cardiologist. For comparison, we assessed the performance of EPs' interpretation of the initial ECGs and the results of joint clinical evaluation by EPs and cardiologists. EPs' interpretation of the initial ECG was considered positive if the pre-activation call was made within 30 minutes after the initial ECG and negative otherwise. The joint evaluation by EPs and cardiologists was considered positive if the cardiologist confirmed STEMI and activated the CCL and negative otherwise. The performance of the AI classifier was evaluated using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the area under the receiver operating characteristic (ROC) curve (AUC). The non-inferiority of the AI classifier against the joint clinical evaluation was tested with an a priori non-inferiority margin of 5% for both sensitivity and specificity. To do so, we calculated the 95% confidence interval (CI) of the absolute difference (AI performance minus clinician performance) in sensitivity and specificity. If the lower bound of the CI was greater than −5%, the AI classifier was considered non-inferior, that is, not worse than the clinical evaluation by more than 5 percentage points in sensitivity (or specificity).

The reduction in D2B time was estimated assuming that direct CCL activation would occur within 3 minutes after initial ECG acquisition. This assumption was based on an internal survey of how many minutes are required for brief history taking to obtain the information essential for activation. For each patient, the reduction in D2B time was calculated as the interval between the initial ECG and CCL activation minus these 3 minutes.
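A minimal sketch of the D2B arithmetic, assuming that each patient's reduction is the observed interval from the initial ECG to CCL activation minus the 3-minute allowance, with no clipping at zero. The function names and example times are hypothetical, not study data. Because the same 3 minutes are subtracted for every patient, the median and both quartiles shift by exactly 3 minutes; the reported figures are consistent with this (ECG-to-activation median 14.0 minutes, IQR 10.3–23.0, versus an estimated D2B reduction of 11.0 minutes, IQR 7.3–20.0). Quartiles use `statistics.quantiles(..., method="inclusive")`, which interpolates linearly like R's default quantile definition (type 7); other definitions can give slightly different IQRs.

```python
import statistics

# Assumed time for direct activation after the initial ECG (internal survey in the text).
DIRECT_ACTIVATION_MIN = 3.0


def d2b_reductions(ecg_to_activation_min):
    """Per-patient D2B reduction: observed ECG-to-activation interval minus 3 minutes.

    Hypothetical helper; negative values are kept as-is rather than clipped.
    """
    return [t - DIRECT_ACTIVATION_MIN for t in ecg_to_activation_min]


def median_iqr(values):
    """Return (median, Q1, Q3) using linear interpolation (R's default, type 7)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


if __name__ == "__main__":
    # Made-up intervals in minutes, not study data.
    observed = [8, 12, 14, 20, 30]
    print(median_iqr(observed))                  # (14.0, 12.0, 20.0)
    print(median_iqr(d2b_reductions(observed)))  # (11.0, 9.0, 17.0)
```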
Categorical variables were reported as frequencies and proportions. Continuous variables were reported as mean and standard deviation (SD) or median and interquartile range (IQR), as appropriate. The performance of the AI algorithm was compared with that of its human counterparts (EPs, and the joint evaluation by EPs and cardiologists) based on AUC, sensitivity, specificity, PPV, and NPV. AUCs were compared using DeLong's method. 15 Sensitivity and specificity were compared using the McNemar test, and PPV and NPV were compared using relative predictive values, as proposed by Moskowitz and Pepe. 16 The CIs of the differences in sensitivity and specificity between two diagnostic modalities were calculated using the Wald method, as implemented in the R package DTComPair. P values < 0.05 were considered significant. 17 All data handling and statistical analyses were performed using R version 3.3.2 (R Foundation for Statistical Computing, Vienna, Austria). The Institutional Review Board of Seoul National University Bundang Hospital approved the analysis and waived the requirement for informed consent (IRB Number: B-2111-723-114).

After excluding six patients with a delayed initial ECG and two patients who died before coronary artery evaluation (one with acute type A aortic dissection and one with isolated ST elevation in lead aVR), a total of 80 patients were included (Table 1). The mean age was 64.7 years (SD, 13.8), and STEMI was confirmed in 54 (67.5%) patients. The most common presenting symptom was chest pain (n = 71, 88.8%), and the most common diagnosis other than STEMI was non-ST-elevation myocardial infarction (NSTEMI). The time required for CCL activation, defined as the interval between the initial ECG and the cardiologist's confirmation, had a mean of 34.1 minutes (SD, 73.4) and a median of 14.0 minutes (IQR, 10.3–23.0). The AUC of the QCG score was 0.947 (Fig. 1), which was significantly higher than that of the EPs' initial decision (0.710, P < 0.001) and that of the joint decision by EPs and cardiologists (0.761, P < 0.001).
When the QCG score was dichotomized at 50 (binary QCG), so that the result was positive if the score was 50 or more and negative otherwise, the AUC was 0.875, which was still the highest; however, the difference was statistically significant only in comparison with the EPs' initial decision (P = 0.011). The sensitivity, specificity, PPV, and NPV of binary QCG were 98.1% (95% CI, 94.6%, 100.0%), 76.9% (95% CI, 60.7%, 93.1%), 89.8% (95% CI, 82.1%, 97.5%), and 95.2% (95% CI, 86.1%, 100.0%), respectively (Table 2). Its sensitivity and NPV were significantly higher than those of the EPs (both P < 0.001). The differences in sensitivity and specificity between binary QCG and the joint clinical decision were 3.7% (95% CI, −3.5%, 10.9%) and 19.2% (95% CI, −4.7%, 43.1%), respectively (Table 3). As both lower bounds lay above the non-inferiority margin of −5%, binary QCG was confirmed to be non-inferior to the joint decision by EPs and cardiologists. The median reduction in D2B time was estimated to be 11.0 minutes (IQR, 7.3–20.0) (Table 1, Fig. 2). In some patients, additional tests were performed before the cardiologists' confirmation: 18 bedside echocardiography examinations and 24 additional serial ECGs in total.

This study shows that AI interpretation of the initial ECG for STEMI screening in the ED is non-inferior to joint clinical evaluation by EPs and cardiologists, who used bedside echocardiography and serial ECGs when required. In addition, we estimated the possible benefits of CCL activation using the AI system, such as reductions in D2B time, medical cost, and 1-year mortality risk. Rapid revascularization is essential for improving patient outcomes in STEMI. 14,18-20 One of the most critical factors in reducing D2B time is rapid diagnosis.
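The binary-QCG results reported above can be re-derived with a few lines of arithmetic. This is a sketch under stated assumptions, not the DTComPair code. The confusion-matrix counts (53 true positives and 1 false negative among 54 STEMI patients; 20 true negatives and 6 false positives among 26 non-STEMI patients) are back-calculated from the reported percentages. The discordant-pair counts used for the paired differences (3 vs. 1 for sensitivity, 8 vs. 3 for specificity) are not reported in the text; they were back-solved so that the standard Wald interval for a difference of paired proportions reproduces the reported intervals, and should be treated as assumptions. For a 0/1 test, the ROC AUC reduces to (sensitivity + specificity)/2, which matches the reported binary-QCG AUC of 0.875.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wald_ci(k: int, n: int):
    """Proportion k/n with a Wald 95% CI, truncated to [0, 1]."""
    p = k / n
    half = Z95 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


def paired_diff_wald_ci(b: int, c: int, n: int):
    """Wald 95% CI for p_A - p_B from paired binary outcomes on n subjects.

    b = A correct and B wrong, c = A wrong and B correct (discordant pairs).
    """
    d = (b - c) / n
    var = ((b + c) - (b - c) ** 2 / n) / n ** 2
    half = Z95 * math.sqrt(var)
    return d, d - half, d + half


def non_inferior(lower_bound: float, margin: float = 0.05) -> bool:
    """Non-inferior if the CI's lower bound lies above -margin."""
    return lower_bound > -margin


def pct(triple):
    """Convert (estimate, lower, upper) to percentages rounded to 0.1."""
    return tuple(round(100 * x, 1) for x in triple)


if __name__ == "__main__":
    tp, fn, tn, fp = 53, 1, 20, 6  # back-calculated from reported percentages
    print("sensitivity", pct(wald_ci(tp, tp + fn)))  # (98.1, 94.6, 100.0)
    print("specificity", pct(wald_ci(tn, tn + fp)))  # (76.9, 60.7, 93.1)
    print("PPV", pct(wald_ci(tp, tp + fp)))          # (89.8, 82.1, 97.5)
    print("NPV", pct(wald_ci(tn, tn + fn)))          # (95.2, 86.1, 100.0)
    print("binary AUC", round((tp / (tp + fn) + tn / (tn + fp)) / 2, 3))  # 0.875
    # Discordant-pair counts below are assumptions (back-solved, not reported).
    for name, (b, c, n) in {"sens diff": (3, 1, 54), "spec diff": (8, 3, 26)}.items():
        d, lo, hi = paired_diff_wald_ci(b, c, n)
        print(name, pct((d, lo, hi)), "non-inferior:", non_inferior(lo))
```

With the joint decision's back-calculated sensitivity and specificity (51/54 and 15/26, implied by the reported differences of 3.7% and 19.2%), the same (sensitivity + specificity)/2 formula gives 0.761, matching the reported AUC of the joint decision.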
The 2017 European clinical guideline for STEMI states that the delay between first medical contact (FMC) and STEMI diagnosis should be ≤ 10 minutes, both in hospitals and for emergency medical services (EMSs). However, this goal is challenging even in typical STEMI cases, considering the minimum time required for an ECG and a brief history. We think AI algorithms can play a vital role in this situation if they meet the following requirements. First, the algorithm should be as accurate as healthcare professionals armed with additional diagnostic information such as patient history, physical examination, bedside echocardiography, and serial ECGs. Second, the algorithm should provide probabilistic information so that clinical policies incorporating the AI system can deal with the uncertainty of its diagnosis. Third, the algorithm should be universally available to users so that such a policy can be applied consistently. The algorithm tested in this study satisfied all three requirements. It showed non-inferior accuracy using only the initial ECG. It provides a probabilistic output, so users can apply different decision thresholds to accommodate various clinical situations. Lastly, it can be used universally because it works with printed ECG images instead of raw signals. The cost change we estimated considered only the reduction in serial ECGs and bedside echocardiography. However, a reduction in D2B time, and thus earlier life-saving treatment, might lead to other cost reductions, such as a shorter length of stay or lower overall treatment cost. 21-24 It would be worthwhile to examine the effect of high-performing AI decision aids on overall healthcare costs in real-life settings through prospective studies. The AI algorithm used 12-lead ECG images instead of raw waveform data. This type of technology can be applied in many clinical situations where raw waveform data are not available to users.
For example, emergency medical technicians could use their existing ECG machines and analyze the printouts with their smartphones to activate the CCL directly. The core benefits of this approach are high cost-effectiveness, since there is no need to buy new AI-enabled ECG machines, and high scalability, since existing application distribution services such as the Apple App Store or Google Play can be used to distribute the technology. Interestingly, a mobile phone-based approach using crowdsourcing instead of AI interpretation has been reported. 25 We think combining these two approaches, involving both human experts and AI systems, will be synergistic in advancing both technologies.

This study has several limitations. First, this is a retrospective study. We do not expect performance measurements to differ significantly by study design (prospective or retrospective), as the input data would be the same. However, the estimates of the potential benefits of AI application could be affected. Second, this is a single-center study. Although the report formats used for 12-lead ECGs are almost identical among hospitals in South Korea, there could be exceptions that affect the algorithm's performance. Third, the findings should be externally validated, preferably in a more diverse population from multiple hospitals, countries, and races. Although the 12-lead ECG is a well-standardized clinical test, some racial differences in ECG findings have been reported. 26 Future studies of how the algorithm performs in multicenter settings, especially in non-East Asian populations, will be important.

In conclusion, AI interpretation of initial ECGs was non-inferior to joint clinical evaluation by EPs and cardiologists in screening for STEMI in the ED. Therefore, CCL activation based only on AI interpretation of the initial ECG is feasible. If such a policy is implemented, some reduction in D2B time, medical cost, and 1-year mortality can reasonably be expected.

REFERENCES:
ACCF/AHA guideline for the management of ST-elevation myocardial infarction: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Practice Guidelines
ESC Guidelines for the management of acute myocardial infarction in patients presenting with ST-segment elevation: The Task Force for the management of acute myocardial infarction in patients presenting with ST-segment elevation of the European Society of Cardiology (ESC)
Causes of delay in door-to-balloon time in south-east Asian patients undergoing primary percutaneous coronary intervention
Minimizing false activation of cath lab for STEMI--a realistic goal
Artificial intelligence in cardiology: present and future
Identification of sleep apnea severity based on deep learning from a short-term normal ECG
Artificial intelligence in health care: current applications and issues
Automatic prediction of atrial fibrillation based on convolutional neural network using a short-term normal electrocardiogram signal
Quantitative assessment of chest CT patterns in COVID-19 and bacterial pneumonia patients: a deep learning perspective
Performance of a convolutional neural network and explainability technique for 12-lead electrocardiogram interpretation
On calibration of modern neural networks
Focal loss for dense object detection
Prognostic implications of door-to-balloon time and onset-to-door time on mortality in patients with ST-segment-elevation myocardial infarction treated with primary percutaneous coronary intervention
Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach
Comparing the predictive values of diagnostic tests: sample size and analysis for paired study designs
Practical Statistics for Medical Research
Effect of door-to-balloon time on mortality in patients with ST-segment elevation myocardial infarction
Door-to-balloon time with primary percutaneous coronary intervention for acute myocardial infarction impacts late cardiac mortality in high-risk patients and patients presenting early after the onset of symptoms
Association of door-to-balloon time and mortality in patients admitted to hospital with ST elevation myocardial infarction: national cohort study
'Code STEMI' reduces door-to-balloon time and length of stay of patients presenting with ST-segment elevation myocardial infarction
Emergency department physician activation of the catheterization laboratory and immediate transfer to an immediately available catheterization laboratory reduce door-to-balloon time in ST-elevation myocardial infarction
Stroke thrombolysis protocol shortens "door-to-needle time" and improves outcomes: experience at a tertiary care center in Qatar
Door to intravenous tissue plasminogen activator time and hospital length of stay in acute ischemic stroke patients
Comparison of mobile application-based ECG consultation by collective intelligence and ECG interpretation by conventional system in a tertiary-level hospital
Racial differences in electrocardiographic characteristics and prognostic significance in Whites versus Asians