key: cord-0912800-lsm47mwx
authors: Wang, Chen-Xi; Zhang, Yi-Chu; Kong, Qi-Lin; Wu, Zu-Xiang; Yang, Ping-Ping; Zhu, Cai-Hua; Chen, Shou-Lin; Wu, Tao; Wu, Qing-Hua; Chen, Qi
title: Development and validation of a deep learning model to screen hypokalemia from electrocardiogram in emergency patients
date: 2021-10-05
journal: Chin Med J (Engl)
DOI: 10.1097/cm9.0000000000001650
sha: 396fa27465d7b9fc11f196daf13375ce89a66a1f
doc_id: 912800
cord_uid: lsm47mwx

BACKGROUND: A deep learning model (DLM) that enables non-invasive hypokalemia screening from an electrocardiogram (ECG) may improve the detection of this life-threatening condition. This study aimed to develop a DLM for the detection of hypokalemia from the ECGs of emergency patients and to evaluate its performance.

METHODS: We used a total of 9908 ECGs from emergency patients who were admitted to the Second Affiliated Hospital of Nanchang University, Jiangxi, China, from September 2017 to October 2020. The DLM was trained using 12 ECG leads (leads I, II, III, aVR, aVL, aVF, and V1–V6) to detect patients with serum potassium concentrations <3.5 mmol/L and was validated using retrospective data from the Jiangling branch of the Second Affiliated Hospital of Nanchang University. Blood was drawn within 10 min before or after the ECG examination, and no infusion was started or ongoing during this period.

RESULTS: We used 6904 ECGs and 1726 ECGs as the development and internal validation data sets, respectively. In addition, 1278 ECGs from the Jiangling branch of the Second Affiliated Hospital of Nanchang University were used as the external validation data set. Using 12 ECG leads (leads I, II, III, aVR, aVL, aVF, and V1–V6), the area under the receiver operating characteristic curve (AUC) of the DLM was 0.80 (95% confidence interval [CI]: 0.77–0.82) for the internal validation data set. At the optimal operating point, the sensitivity was 71.4% and the specificity was 77.1%.
Using the same 12 ECG leads, the DLM achieved an AUC of 0.77 (95% CI: 0.75–0.79) for the external validation data set. At the optimal operating point, the sensitivity was 70.0% and the specificity was 69.1%.

CONCLUSIONS: In this study, using 12 ECG leads, a DLM detected hypokalemia in emergency patients with an AUC of 0.77 to 0.80. Artificial intelligence could be used to analyze an ECG to quickly screen for hypokalemia.

Hypokalemia is one of the most common electrolyte disturbances encountered in clinical practice. [1] Measurement of the serum potassium concentration is the main diagnostic method for hypokalemia, but a long turnaround time and poor repeatability may delay clinical intervention and allow a patient's condition to deteriorate. [2, 3] This situation is undesirable for emergency patients, who require a rapid diagnosis. Hypokalemia can increase the excitability and automaticity of cardiomyocytes and slow conduction, which manifests as a series of well-defined ECG abnormalities, such as T-wave changes, ST-segment depression, QT-interval prolongation, and U waves ≥0.1 mV. [2, 4] However, physicians in clinical practice are not particularly attentive to ECG changes when diagnosing electrolyte disturbances. [5]

Artificial intelligence applications have gradually extended into specialized medicine. [6, 7] We hypothesized that a deep learning model (DLM) based on convolutional neural networks (CNNs) can be used to effectively screen emergency patients for hypokalemia. Therefore, we trained and validated a DLM to screen for hypokalemia based on the ECGs of emergency patients. The objective of this study was to improve the efficiency of hypokalemia detection in emergency patients by developing and validating a non-invasive screening test based on the electrocardiogram (ECG).

This study was approved by the Ethics Review Committee of the Second Affiliated Hospital of Nanchang University (No. 2019-086).
The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). Clinical data, including ECGs stored as electronic data, serum potassium and magnesium ion concentrations, B-type natriuretic peptide (BNP) levels, free thyroxine levels, sex, and age, were obtained from the Second Affiliated Hospital of Nanchang University. Because the acquired data were anonymized by the hospital scientific research platform and the study was retrospective, the Ethics Review Committee waived the requirement for informed consent.

Deep learning is a research direction in machine learning that is moving the field toward its original goal, namely, artificial intelligence (AI). Deep learning learns the internal structure and representation levels of sample data and uses many hidden neuron layers to generate increasingly abstract, non-linear representations of the underlying data. The goal of deep learning is to endow machines with the analytical and learning abilities of humans so that they can recognize data such as images and sounds. Image recognition was the first application of deep learning to the clinical field. [7]

In this study, a DLM was built on an Anaconda platform using Python (version 3.5.2; Python Software Foundation, Beaverton, OR, USA) and the TensorFlow neural network framework (Google LLC, Mountain View, CA, USA). The network had 11 layers: the first 10 were convolutional layers, and the last was a fully connected softmax layer. The network output was a value between 0 and 1, indicating the probability of detecting hypokalemia from an ECG [Figure 1]. We trained the DLM on 12 ECG leads (leads I, II, III, aVR, aVL, aVF, and V1–V6). We also trained the DLM using a single lead (lead II) that can be easily recorded by wearable devices. [8, 9] The electrocardiograph used to record the ECG data was manufactured by Nippon Optoelectronics Tomioka Co., Ltd. (model ECG-1150).
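To illustrate how a final softmax layer turns class scores into a value between 0 and 1, the following is a minimal sketch in plain Python. It is not the authors' TensorFlow implementation: the two-class layout, the class ordering, the example scores, and the 0.5 cutoff are all assumptions made for illustration, since the text does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hypokalemia_probability(scores):
    """Return the probability assigned to the hypokalemia class.

    Assumption: index 0 = non-hypokalemia, index 1 = hypokalemia.
    """
    return softmax(scores)[1]


def flag_for_screening(probability, cutoff=0.5):
    """Flag an ECG when the probability reaches the cutoff.

    The 0.5 default is a placeholder, not the operating point used in the study.
    """
    return probability >= cutoff


# Hypothetical raw scores from the last fully connected layer.
example_scores = [0.3, 1.2]
p = hypokalemia_probability(example_scores)
print(f"P(hypokalemia) = {p:.3f}, flagged = {flag_for_screening(p)}")
```

In practice the cutoff would be chosen on the development data, for example to favor sensitivity, rather than fixed at 0.5.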
A total of 310,256 ECGs were obtained from September 2017 to October 2020 from the emergency department of the Second Affiliated Hospital of Nanchang University, including 4615 ECGs of patients with hypokalemia. We excluded patients for whom more than 10 min elapsed between the venous blood draw and the ECG examination, as well as patients who received potassium supplementation or any other medical orders during this period, so that the ECG data reflected the serum potassium level at the time of the ECG examination as closely as possible. In addition, the removal of ECGs without associated demographic data and indication of death yielded a total of 4315 hypokalemia ECGs, as confirmed by serum potassium test results. Considering the limited computing power of the available machines, an equal number of non-hypokalemic ECGs from the same period were randomly selected and combined with the selected hypokalemic ECGs to serve as the DLM development data set in this study. The final development data set included 8630 ECGs.

All patients underwent at least one standard 10-s 12-lead ECG in the resting supine position at the time of emergency treatment. Each digitally stored ECG lead was recorded at 500 data points per second (500 Hz) for 10 s. We compiled a data set consisting of the data from all 12 ECG leads. As shown in Figure 2, the ECGs were randomly divided into a training data set (80%) and an internal validation data set (20%). A total of 1278 ECGs from 1278 patients from the Jiangling branch of the Second Affiliated Hospital of Nanchang University were used only for external validation, to assess how well the DLM generalizes to a different data set. The Jiangling branch (Hospital B) is located in the suburbs and has a distinctly different environment from the hospital headquarters (Hospital A). A review confirmed that Hospital A and Hospital B have no duplicate emergency visit records.

A normal serum potassium concentration is 3.5 to 5.5 mmol/L, with an average of 4.2 mmol/L.
Hypokalemia is usually defined as a serum potassium concentration <3.5 mmol/L. [10] In this study, hypokalemia and non-hypokalemia were defined as serum potassium concentrations <3.5 and ≥3.5 mmol/L, respectively.

Figure 1: DLM for predicting hypokalemia. The DLM consisted of a CNN with 11 layers, the first ten being convolutional layers and the last a fully connected softmax layer. A ReLU activation function was used. Skip connections, dropout, batch normalization (BN), and max pooling were all used to improve generalization and convergence properties. The network was designed as a binary classifier that output a number from 0 to 1, representing the probability that hypokalemia was detected from the ECG.

We used the area under the receiver operating characteristic curve (AUC) to evaluate the DLM performance. As the DLM was developed to rapidly screen patients with potential hypokalemia, we evaluated the specificity, positive predictive value, and negative predictive value at a cutoff point selected for high sensitivity in the development data. [11] Except for the AUC, all the diagnostic performance indicators were reported with exact 95% confidence intervals (CIs). The CI of the AUC was determined using the pROC software package in R (R Foundation), which applies the Sun and Xu optimization of the DeLong method. A two-sided P < 0.05 was considered statistically significant. [12] R software, version 4.0 (R Foundation), was used to perform the analyses.

The incidence of hypokalemia in the emergency department patients was approximately 1.49% in this study.
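As a quick consistency check, the counts reported in this study can be recombined with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative, and treating the incidence as the ratio of hypokalemia ECGs to all emergency ECGs is an assumption that happens to reproduce the reported value.

```python
# Counts copied from the text of this study.
TOTAL_EMERGENCY_ECGS = 310_256   # all ECGs, September 2017 to October 2020
HYPOKALEMIA_ECGS_RAW = 4_615     # hypokalemia ECGs before exclusions
HYPOKALEMIA_ECGS_KEPT = 4_315    # hypokalemia ECGs after exclusions
EXTERNAL_ECGS = 1_278            # Jiangling branch, external validation
SAMPLING_RATE_HZ = 500
DURATION_S = 10
N_LEADS = 12
BEST_SPECIFICITY_PCT = 77.1      # internal validation, 12 leads

# Assumed definition: hypokalemia ECGs divided by all emergency ECGs.
incidence_pct = 100 * HYPOKALEMIA_ECGS_RAW / TOTAL_EMERGENCY_ECGS

# Balanced development set: one non-hypokalemic ECG per hypokalemic ECG.
development_total = 2 * HYPOKALEMIA_ECGS_KEPT

# Random 80% / 20% split into training and internal validation sets.
training = development_total * 80 // 100
internal_validation = development_total - training

# Development set plus external validation set.
all_ecgs = development_total + EXTERNAL_ECGS

# Raw signal size of one 12-lead, 10-s recording at 500 Hz.
samples_per_lead = SAMPLING_RATE_HZ * DURATION_S
samples_per_ecg = samples_per_lead * N_LEADS

# False positive rate is the complement of specificity.
false_positive_rate_pct = round(100 - BEST_SPECIFICITY_PCT, 1)

print(f"incidence: {incidence_pct:.2f}%")
print(f"development: {development_total} = {training} + {internal_validation}")
print(f"all ECGs: {all_ecgs}")
print(f"samples per ECG: {samples_per_ecg}")
print(f"false positive rate: {false_positive_rate_pct}%")
```

These values agree with the reported 1.49% incidence, the 6904/1726 split of the 8630 development ECGs, the 9908 ECGs in total, and the 22.9% false positive rate.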
A total of 9908 ECGs from patients admitted to the Second Affiliated Hospital of Nanchang University and its Jiangling branch were included in this study, of which the training data set consisted of 6904 ECGs from 5897 patients, the internal validation data set consisted of 1726 ECGs from 986 patients, and the external validation data set consisted of 1278 ECGs from 1278 patients. Table 1 shows the baseline characteristics of the study population. A total of 8251 patients with an average age of 64.3 years were included in the study. The average blood potassium concentration of patients with hypokalemia was 2.89 mmol/L, and the average time from the ECG to the blood draw was 51.3 min. Hypokalemia ECGs were more likely to be recorded in patients with hypomagnesemia or NT-proBNP >300 pg/mL.

The DLM performed well in identifying hypokalemia for the internal and external validation data sets [Figure 3 and Table 2]. When the blood potassium concentration was lower than 2.6 mmol/L, the recognition accuracy rate of the DLM was 0.72. When the blood potassium concentration was between 2.6 and 3.5 mmol/L, the recognition accuracy rate [...] (CLBBB), complete right bundle branch block, and pacing ECG. In this data set, the overall recognition accuracy rate of the DLM was 61.1%. The verification results are shown in Table 3. The sensitivity and specificity of the model in identifying hypokalemia from AF ECGs were 74.2% and 72.0%, respectively, with an accuracy of 72.1%. The model performed best for pacing ECGs and worst for CLBBB ECGs.

Over the past 10 years, DLMs have been increasingly applied in research on cardiovascular disease, such as for the prediction of left ventricular systolic function, AF, and cardiac arrest, [11, 13, 14] and especially during the coronavirus disease 2019 (COVID-19) epidemic. Clinicians have found AI extremely useful for identifying patients with COVID-19 and predicting the severity and progress of the disease.
[15, 16] The aforementioned studies have shown that a CNN-based DLM can confer strong recognition or prediction ability to a machine.

A DLM for screening hypokalemia in emergency patients using 12 ECG leads was developed and validated in this study. Using the 12 ECG leads, the DLM achieved an AUC of 0.80 for the internal validation data set and 0.77 for the external validation data set (not used for DLM development), which indicates good and stable model performance for hypokalemia screening. This AUC is higher than that reported for some other common screening tests, such as fecal occult blood testing for detecting colorectal neoplasia (AUC 0.71; overall sensitivity, 29%). [17] However, lower model performance was obtained using a single-lead ECG (lead II). This result may be explained by the relatively small amount of data used. Extending the ECG monitoring time could gradually increase the quantity of acquired data and improve the detection performance. Unlike previous studies in which CNNs have been used to construct DLMs to screen serum ion concentrations, [7] complete 12-lead ECG data were used in this study to develop the DLM. Thus, the DLM detection performance is lower than that reported in previous studies but may be more reliable.

Although the serum potassium concentration can be obtained relatively rapidly by venous blood measurement in a hospital, hypokalemia diagnosis outside the hospital (such as in community clinics) remains challenging because patients with hypokalemia usually do not exhibit characteristic symptoms. Using ECGs to non-invasively screen patients for hypokalemia could facilitate early detection of this condition and potentially improve care and outcomes. Moreover, many wearable devices for monitoring ECGs have been developed over the past few years. [18] Therefore, the serum potassium concentration could potentially be monitored dynamically at home, which would be highly beneficial for patients prone to hypokalemia.
However, whether a similar DLM performance would be obtained using wearable ECG inputs remains to be determined.

The most important ability of a CNN is the extraction of features from various types of data, such as images, two-dimensional data, and waveforms, as well as algorithm generation. Traditional methods use a standard regression model to estimate the potassium content, where the T-wave width, T-wave amplitude, T-wave slope, and U-wave value are considered to be important indexes of changes in blood potassium levels. [19, 20] However, a CNN does not reveal which features the DLM extracts. [21] We only know that the DLM can screen for hypokalemia based on characteristic changes in ECGs that humans have not yet discerned. Although some researchers have used visualization technology to determine the image area on which a DLM bases its decisions, this area still cannot be quantified in a way that enables humans to make the same judgment. [11] Therefore, a visual analysis of the DLM was not performed in this study.

Overfitted models often have a good recognition rate only for specific data sets, but our DLM had a high recognition rate for both the internal validation data set and the external validation data set; even in a data set full of potential confounding factors, no recognition rate fell below 60%. This suggests that the DLM performs stably and is not overfitted. Among the hypothesized confounding factors, CLBBB and pacing rhythm may have the largest impact on the detection of hypokalemia by the DLM. This suggests that the DLM results should be interpreted more cautiously when these two kinds of ECG are encountered in clinical practice. It is worth mentioning that our model still achieved good results for ECGs with features of AF.
In addition to the considered confounding factors, the concentrations of serum calcium, troponin, creatinine, and free thyroxine may also obscure the characteristics of hypokalemia ECGs and interfere with the extraction of hypokalemia characteristics by the DLM, thereby affecting hypokalemia screening; this requires further research and analysis.

Using AI as a preliminary screening tool for hypokalemia constitutes a qualitative early-warning method, regardless of whether the evaluation result is a true positive, whereas biochemical testing remains the gold standard for an unambiguous diagnosis of hypokalemia. In our study, the highest-performing DLM had a false positive rate of 22.9% and a specificity of only 77.1%. This result may partly reflect misclassification by the gold standard test itself. [22] In addition, the potassium level detected by the DLM may better reflect the risk of arrhythmia than blood tests. An ECG reflects the response of heart tissue to the blood potassium level and is thus a direct response to the serum potassium concentration near the myocardium. [23] From this perspective, the DLM might be a more physiological tool than a blood test.

Our study has some limitations at this stage. First, a retrospective study was performed using conventional 12-lead ECGs. Prospective studies must be conducted to determine whether the DLM enhances hypokalemia detection and improves outcomes. Second, the DLM was developed and verified using 12-lead ECG data obtained in the environment of a tertiary grade A hospital in China. Therefore, prospective testing is required to analyze the DLM performance based on ECG data obtained in a home environment. Similarly, further testing of the detection performance using ECG data from wearable devices is also required.
Third, the DLM performance must be further enhanced before it can be applied as a reliable detection tool for the serum potassium concentration. Fortunately, the popularity of AI and the continuous optimization of deep learning algorithms make it likely that a better DLM for hypokalemia screening will be developed in the near future. Fourth, we used only 8630 ECGs to develop the DLM because of the limited computing power of the available machines. Although relatively few ECGs were used, we used all of the 12-lead ECG data. Thus, the overall quantity of data used in the analysis was not less than that used in studies based on only 4- or 6-lead ECG data. [24] The difficulty of limited computing power will be resolved with upgraded equipment and more advanced computers. Fifth, the influence of arterial blood gas and blood pH on the ECG cannot be ignored, but unfortunately these data are not stored electronically in our hospital; hence, their influence on the DLM cannot be further evaluated. In follow-up research, we will pay attention to these data and collect them manually. Finally, the decision-making process of the DLM needs to be further explored. Explainable AI has recently attracted considerable interest in medicine and has been studied and reported on. This consideration motivates our next research direction. We expect to better understand how CNNs reach their decisions in the near future.

In conclusion, a CNN-based DLM exhibits good performance in screening hypokalemia using 12-lead ECGs and can provide more rapid and more dynamic serum potassium detection for emergency patients than current methods. However, a prospective study needs to be conducted to determine whether the DLM can improve the clinical outcomes of emergency patients.
Effects of diabetic ketoacidosis in the respiratory system
Hypokalemia-induced arrhythmias and heart failure: new insights and implications for therapy
The case | severe hypokalemia complicated by a syncope
Relationship between electrocardiogram and electrolytes
The ability of physicians to predict electrolyte deficiency from the ECG
Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes
Development and validation of a deep-learning model to screen for hyperkalemia from the electrocardiogram
Assessment of remote heart rhythm sampling using the AliveCor heart monitor to screen for atrial fibrillation: the REHEARSE-AF study
Noninvasive blood potassium measurement using signal-processed, single-lead ECG acquired from a handheld smartphone
Clinical analysis of a hypokalemic salt-losing tubulopathy case
Artificial intelligence algorithm for predicting cardiac arrest using electrocardiography
Artificial intelligence system of faster region-based convolutional neural network surpassing senior radiologists in evaluation of metastatic lymph nodes of rectal cancer
Machine learning in cardiovascular medicine: are we there yet?
An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction
Prognostication of patients with COVID-19 using artificial intelligence based on chest x-rays and clinical data: a retrospective study
COVID-19 recognition using ensemble-CNNs in two new chest x-ray databases
Sensitivity of immunochemical faecal occult blood testing for detecting left- vs right-sided colorectal neoplasia
Effect of a home-based wearable continuous ECG monitoring patch on detection of undiagnosed atrial fibrillation: the mSToPS randomized clinical trial
Computer-assisted image processing 12 lead ECG model to diagnose hyperkalemia
Novel bloodless potassium determination using a signal-processed single-lead ECG
Measuring the quality of explanations: the system causability scale (SCS): comparing human and machine explanations
Errors of classification with potassium blood testing: the variability and repeatability of critical clinical tests
pROC: an open-source package for R and S+ to analyze and compare ROC curves
Artificial intelligence algorithm for detecting myocardial infarction using six-lead electrocardiography

The authors thank the Translational Medical College of Nanchang University for assisting in developing this deep learning model and also thank Dr. Libin Deng for his guidance.

This work was supported by the National Natural Science Foundation of China (No. 81360025).

None.