key: cord-0577383-8kpkpp91
authors: Lee, Haeyun; Eun, Yongsoon; Hwang, Jae Youn; Eun, Lucy Youngmin
title: Explainable Deep Learning Algorithm for Distinguishing Incomplete Kawasaki Disease by Coronary Artery Lesions on Echocardiographic Imaging
date: 2022-04-05
journal: nan
DOI: nan
sha: 2189ddaa8c5afa420c03efe1c407070bfdc3ce43
doc_id: 577383
cord_uid: 8kpkpp91

Background and Objective: Incomplete Kawasaki disease (KD) has often been misdiagnosed due to a lack of the clinical manifestations of classic KD. However, it is associated with a markedly higher prevalence of coronary artery lesions. Identifying coronary artery lesions by echocardiography is important for the timely diagnosis of and favorable outcomes in KD. Moreover, similar to KD, coronavirus disease 2019, currently causing a worldwide pandemic, also manifests with fever; therefore, it is crucial at this moment that KD should be distinguished clearly among the febrile diseases in children. In this study, we aimed to validate a deep learning algorithm for classification of KD and other acute febrile diseases.
Methods: We obtained coronary artery images by echocardiography of children (n = 88 for KD; n = 65 for pneumonia). We trained six deep learning networks (VGG19, Xception, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50) using the collected data.
Results: SE-ResNext50 showed the best performance in terms of accuracy, specificity, and precision in the classification. SE-ResNext50 offered a precision of 76.35%, a sensitivity of 82.64%, and a specificity of 58.12%.
Conclusions: The results of our study suggested that deep learning algorithms have similar performance to an experienced cardiologist in detecting coronary artery lesions to facilitate the diagnosis of KD.

Kawasaki disease (KD) is the most common acquired heart disease in childhood. It was first described by Dr. Kawasaki in 1967 Kuo (2017).
KD mostly occurs in children aged less than 5 years and has a high prevalence in countries of Northeast Asia, particularly Japan, South Korea, and Taiwan Makino, Nakamura, Yashiro, Ae, Tsuboi, Aoyama, Kojo, Uehara, Kotani, and Yanagawa (2015) ; Kim, Park, Eun, Han, Lee, Yoon, Yu, Choi, and Lee (2017) . The main symptoms of KD are unexplained high fever, diffuse erythematous polymorphous rash, bilateral conjunctival injection, cervical lymphadenopathy, oral mucosal changes, and extremity changes, sometimes with perineal desquamation, or reactivation of the bacillus Calmette-Guérin injection site McCrindle, Rowley, Newburger, Burns, Bolger, Gewitz, Baker, Jackson, Takahashi, Shah et al. (2017) ; Dietz, Van Stijn, Burgner, Levin, Kuipers, Hutten, and Kuijpers (2017) ; Newburger, Takahashi, and Burns (2016) ; Singh, Jindal, and Pilania (2018) . While the widely used diagnostic criteria for KD are useful, incomplete KD in infants or children aged 10 years or older can often be problematic, causing misdiagnosis due to a lack of manifestation of the full clinical criteria of classic KD. Nevertheless, this disease has a much higher prevalence of coronary artery lesions McCrindle et al. (2017) ; Singh et al. (2018) . Given the difficulty in diagnosing incomplete or "atypical" KD by clinical features alone, identifying coronary artery findings by echocardiography, along with evaluation of various biomarkers by blood tests, becomes more significant for ensuring an appropriate diagnosis McCrindle et al. (2017) . Even though the choice of treatment with a high dose of intravenous immunoglobulin infusion decreases the risk of coronary artery complications, about 5% of treated children and 15-25% of untreated children have a risk of coronary artery aneurysms or ectasia. Certainly, one of the fatal complications of untreated KD is coronary artery aneurysm. 
Accordingly, the role of echocardiography in recognizing coronary artery lesions is substantial to ensure timely diagnosis and favorable outcome Na, Kim, and Eun (2019) . For the proper diagnosis of incomplete KD, the expert pediatric cardiologists need to perform echocardiography to investigate the patient's coronary arteries. The most important therapeutic goal of KD is the prevention of coronary artery aneurysm formation. When an aneurysm is noticed, it is critical to prevent development of a giant aneurysm or formation of a thrombus Rowley, Duffy, and Shulman (1988) . Unfortunately, without an experienced pediatric cardiologist and a KD expert, it is challenging to diagnose incomplete KD because the fever patterns of many acute febrile diseases in children appear similar to KD, particularly the initial high grade of fever. In 2020, reports of severely ill pediatric cases have shown that KD and coronavirus disease 2019 presented similar symptoms Jones, Mills, Suarez, Hogan, Yeh, Segal, Nguyen, Barsh, Maskatia, and Mathew (2020) ; Viner and Whittaker (2020) . Moreover, the incidence of KD suddenly increased in Europe and the USA during the COVID-19 pandemic. In addition, KD-like multisystemic inflammatory syndrome in children affected many children in Europe and in the USA. This is of particular concern, as it can result in missed or delayed KD diagnosis and treatment Guan, Ni, Hu, Liang, Ou, He, Liu, Shan, Lei, Hui et al. (2020) . Several studies on computer-aided diagnosis based on deep learning algorithms have shown good performance. Deep learning-based approaches for the diagnosis of cancer or lesions have been shown to perform well and have already surpassed human doctors in some categories Esteva, Kuprel, Novoa, Ko, Swetter, Blau, and Thrun (2017) . Additionally, deep learning algorithms have shown better performance in the medical vision field than conventional methods Liu, Wang, Yang, Lei, Liu, Li, Ni, and Wang (2019) . 
Several deep learning algorithms have been proposed to diagnose various diseases, such as breast cancer Lee, Park, and Hwang (2020) , liver cancer Wu, Chen, and Ding (2014) , and thyroid nodules Ma, Wu, Zhu, Xu, and Kong (2017) on ultrasound images. These medical deep learning algorithms have been proposed for computer-aided enhancement of diagnostic performance. However, deep learning algorithms have not yet been applied to KD diagnosis. The purpose of this study was to assess whether explainable deep learning algorithms could be used to identify coronary artery lesions on echocardiographic images for the timely diagnosis of KD. We also evaluated the performance of these algorithms in distinguishing KD from another acute febrile disease, pneumonia. To investigate whether it would be possible to distinguish between KD and another similar acute febrile disease, we selected pneumonia as an alternative representative acute febrile disease. Pneumonia is one of the most common febrile diseases in children. For this study, echocardiographic imaging data from January 2016 to August 2019 were acquired from Yonsei University Gangnam Severance Hospital. Giant coronary aneurysm cases were excluded from this study. Echocardiographic images of 88 children with incomplete KD and 59 children with pneumonia (147 in total) were acquired and labeled as KD and non-KD by an experienced cardiologist. 2D echocardiographic coronary artery short axis view images were obtained for the appropriate diagnosis when the children initially presented with high grade fever. We cropped the echocardiographic images to 512 × 512 pixels. In this study, to distinguish incomplete KD from non-KD using echocardiographic images, we applied six deep learning architectures: VGG-19 Simonyan and Zisserman (2014) , Xception Chollet (2017), ResNet-50 He, Zhang, Ren, and Sun (2016) , ResNext-50 Xie, Girshick, Dollár, Tu, and He (2017) , SE-ResNet-50, and SE-ResNext-50 Hu, Shen, and Sun (2018) . 
VGG Simonyan and Zisserman (2014) is the most basic network, with a simple structure for classification and good performance. Therefore, it is still widely used as a comparison architecture. Xception Chollet (2017) [...]

Since the data available for training the models were limited, only networks with 50 or fewer convolution layers were used in this experiment. Thus, we used VGG-19, Xception, ResNet-50, ResNext-50, SE-ResNet50, and SE-ResNext50 in this experiment. We then evaluated the capability of the deep learning algorithms to distinguish between KD and non-KD.

For the training, we used a stochastic optimization method (ADAM) Kingma and Ba (2014) with parameters $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. The initial learning rate was $10^{-3}$, and it was reduced to 1/10 of its value every 30 epochs. We trained each network for a total of 120 epochs. The training batch size was 32 patches. We used a binary cross-entropy loss function to train each network. We initialized each network with weights pretrained on ImageNet to achieve better performance Russakovsky et al. (2015). The deep learning framework used for training and testing the deep learning algorithms was PyTorch Paszke, Gross, Chintala, Chanan, Yang, DeVito, Lin, Desmaison, Antiga, and Lerer (2017). We trained and tested all networks using a 2-GHz Intel Xeon E5-2620 processor and an NVIDIA TITAN RTX graphics card (24 GB).

To explain deep learning algorithms, we used the class activation map (CAM) Zhou, Khosla, Lapedriza, Oliva, and Torralba (2016). Previously, it was not possible to know which salient parts of a medical image would be highlighted by a deep learning algorithm for classification. The CAM has been proposed to solve this issue and to make deep learning algorithms explainable. Most deep learning algorithms use a fully connected layer to classify the values obtained by applying global average pooling (GAP) to the feature maps from the last convolution layer.
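The learning-rate schedule described in the training setup above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the reduction is read as multiplying the rate by 0.1 at epochs 30, 60, and 90, epochs are counted from zero, and the names `learning_rate`, `INITIAL_LR`, `DECAY_FACTOR`, `STEP_EPOCHS`, and `TOTAL_EPOCHS` are hypothetical. The text does not say which scheduler implementation was actually used.

```python
INITIAL_LR = 1e-3    # initial learning rate stated in the text
DECAY_FACTOR = 0.1   # assumption: the rate is multiplied by 0.1 at each step
STEP_EPOCHS = 30     # the rate changes every 30 epochs
TOTAL_EPOCHS = 120   # total number of training epochs


def learning_rate(epoch: int) -> float:
    """Return the learning rate used during a zero-based epoch index."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    # Integer division counts how many decay steps have already happened.
    return INITIAL_LR * DECAY_FACTOR ** (epoch // STEP_EPOCHS)


if __name__ == "__main__":
    # Print the rate at the start and end of each 30-epoch stage.
    for epoch in (0, 29, 30, 59, 60, 89, 90, 119):
        print(f"epoch {epoch:3d}: lr = {learning_rate(epoch):.0e}")
```

Under these assumptions the four training stages use rates of roughly 1e-3, 1e-4, 1e-5, and 1e-6. In PyTorch, a schedule of this shape is commonly expressed with `torch.optim.lr_scheduler.StepLR` using `step_size=30` and `gamma=0.1`.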
After GAP, a linear transform with as many output units as there are classes is applied to the pooled values. Here, to obtain the CAM, the weight of the linear transform for each class was multiplied by the feature map obtained from the last convolutional layer. The CAM at class $c$, $M_c(x, y)$, can be calculated as follows:

$$M_c(x, y) = \sum_k w_k^c f_k(x, y)$$

where $f_k(x, y)$ is the feature map from the last convolution layer for a unit $k$, $w_k^c$ is the weight of the linear transformation corresponding to class $c$ for the unit $k$, and $x$ and $y$ index the spatial position along the two axes of the map. The class activation map, $M_c(x, y)$, indicates a class-specific highlight map at a spatial grid $(x, y)$. Therefore, through the CAM, it is possible to understand which parts of the image are considered when the deep learning algorithm proceeds with classification. Hence, the CAM is an excellent tool for analyzing medical image deep learning algorithms Ma, Ji, Niu, Leng, Rubin, and Chen (2020); Qiao, Song, Ye, He, Ma, Wang, Zhang, and Shou (2019), as in this study.

This study was approved by the Yonsei University College of Medicine Institutional Review Board and the Research Ethics Committee of Severance Hospital (study approval number: 2020-1127-001). All research was performed in accordance with relevant guidelines and regulations. The requirement for written informed consent was waived by the Institutional Review Board due to the retrospective study design.

Since we conducted the experiments using 10-fold cross validation, there were 10 outcomes. Each subset had 137 or 138 echocardiographic images for training and 16 or 15 echocardiographic images for testing. The test images in each subset included 8 or 9 echocardiographic images labeled as KD and 6 or 7 echocardiographic images labeled as non-KD (pneumonia) by an echocardiographic specialist. In these results, SE-ResNext50 showed the best performance in terms of accuracy, F1 score, sensitivity, precision, and NPV for the distinction of KD and non-KD.
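To make the class activation map described earlier in this section concrete, the following minimal sketch computes the weighted sum of feature maps for one class in plain Python. It also checks a property that follows from the definition: with global average pooling followed by a linear layer (bias omitted), the class score equals the spatial mean of the CAM. The function names and the toy feature maps and weights are hypothetical and are not taken from the study's networks.

```python
from typing import List

Grid = List[List[float]]  # one 2D map, indexed [y][x]


def class_activation_map(feature_maps: List[Grid], class_weights: List[float]) -> Grid:
    """Weighted sum over units k of w_k^c * f_k(x, y) for a single class c."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w_k, f_k in zip(class_weights, feature_maps):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w_k * f_k[y][x]
    return cam


def gap_class_score(feature_maps: List[Grid], class_weights: List[float]) -> float:
    """Class score from global average pooling plus a linear layer (no bias)."""
    score = 0.0
    for w_k, f_k in zip(class_weights, feature_maps):
        n_pixels = len(f_k) * len(f_k[0])
        pooled = sum(sum(row) for row in f_k) / n_pixels  # GAP of unit k
        score += w_k * pooled
    return score


# Toy input: two 2x2 feature maps and made-up weights for one class.
toy_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
toy_weights = [0.5, -1.0]

toy_cam = class_activation_map(toy_maps, toy_weights)
toy_cam_mean = sum(sum(row) for row in toy_cam) / 4
toy_score = gap_class_score(toy_maps, toy_weights)

if __name__ == "__main__":
    print("CAM:", toy_cam)
    print("mean of CAM:", toy_cam_mean, "class score:", toy_score)
```

In practice the map is usually upsampled to the input resolution and overlaid on the image as a heat map; that step is omitted from this sketch.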
SE-ResNext50 identified 112 true-positive diagnoses among 153 images. Figure 2 shows the precision-recall curve of the deep learning algorithms used for the classification between KD and pneumonia. The areas under the precision-recall curve (AUPRC) of VGG19, Xception, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50 were 0.643, 0.752, 0.822, 0.830, 0.853, and 0.864, respectively.

For non-KD, the echocardiographic images of pneumonia and their corresponding CAM images are illustrated in Figure 3. The images in the first row represent the pneumonia image correctly recognized as non-KD by SE-ResNext50, whereas the second row shows the pneumonia image incorrectly classified as KD by SE-ResNext50. For incomplete KD, the echocardiographic image of KD and its corresponding CAM image are demonstrated in Figure 4. The images in the first row were correctly identified as KD by SE-ResNext50, whereas the images in the second row were incorrectly classified as non-KD by SE-ResNext50. The thicker red regions indicate the parts of images focused on by the deep learning algorithm during the process of classification as KD and non-KD.

The goal of this study was to investigate the potential of explainable deep learning algorithms to identify and differentiate KD from acute febrile diseases. We therefore selected several well-known deep learning algorithms (VGG19, Xception, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50) to distinguish incomplete KD from other acute febrile diseases. We selected pneumonia as a representative of other acute febrile diseases because it is the most common febrile disease in children. KD and pneumonia show similar fever patterns before the occurrence of respiratory symptoms in pneumonia. Despite the small training dataset, the results in our study demonstrated that the deep learning algorithms show excellent performance for the identification of KD.
Nevertheless, as the performance of a deep learning algorithm depends on the quantity of training data Sun, Shrivastava, Singh, and Gupta (2017); Rolnick, Veit, Belongie, and Shavit (2017), the deep learning algorithm for KD diagnosis should be extended with additional training data. Figure 3 and Figure 4 show the parts of the echocardiographic images that are considered important by the deep learning algorithm to distinguish between KD and non-KD. These figures indicate that the explainable deep learning algorithms identified KD by using the features of the coronary arteries. This is comparable to how pediatric cardiologists diagnose and differentiate the diseases. Clinical reports have mentioned that coronary artery imaging could be key to the appropriate diagnosis of KD, particularly incomplete KD Kim et al. (2017). Through this analysis, our results revealed that deep learning algorithms can distinguish KD from non-KD, as cardiologists do, which suggests that deep learning algorithms could be applied in a clinical setting to recognize incomplete KD among various acute febrile diseases in children.

Our experimental results showed the potential of explainable deep learning algorithms to distinguish KD from acute febrile diseases. Among the tested algorithms, SE-ResNext50 showed the highest accuracy, specificity, and precision in the classification of KD and non-KD, whereas SE-ResNet50 showed the highest sensitivity. However, as shown in the precision-recall curves, SE-ResNext50 showed the best performance among the algorithms in distinguishing between KD and non-KD. In particular, SE-ResNext50 yielded a sensitivity of 82.64% and a specificity of 58.12%. This performance of SE-ResNext50 is comparable to that of an experienced cardiologist (sensitivity of 85% and specificity of 70%) Na et al. (2019). Thus, these results indicate that explainable deep learning algorithms might be used to diagnose KD at a general hospital without a KD expert.
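For readers less familiar with the metrics compared above, the sketch below shows how accuracy, sensitivity, specificity, precision, NPV, and F1 score follow from the four cells of a binary confusion matrix, with KD treated as the positive class. The counts in the example are invented for illustration and are not the study's results; the class name `ConfusionCounts` and the function names are hypothetical. The sketch assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix with KD as the positive class."""
    tp: int  # KD images classified as KD
    fn: int  # KD images classified as non-KD
    tn: int  # non-KD images classified as non-KD
    fp: int  # non-KD images classified as KD


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    # Also called recall: fraction of KD images that were detected.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of non-KD images that were correctly rejected.
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    # Also called positive predictive value.
    return c.tp / (c.tp + c.fp)


def npv(c: ConfusionCounts) -> float:
    # Negative predictive value.
    return c.tn / (c.tn + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and sensitivity.
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# Invented counts for a single hypothetical test fold (not from the study).
example = ConfusionCounts(tp=8, fn=2, tn=6, fp=4)

if __name__ == "__main__":
    for metric in (accuracy, sensitivity, specificity, precision, npv, f1_score):
        print(f"{metric.__name__}: {metric(example):.3f}")
```

With cross-validation, such metrics are typically computed per fold and then averaged, which is one reason reported percentages need not correspond to whole-number counts over the full dataset.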
In Korea, an experienced pediatric cardiologist is not always available at each hospital, owing to a lack of human resources. Nevertheless, timely diagnosis of KD is essential for proper treatment, to prevent poor outcomes of coronary artery lesions. In the global COVID-19 pandemic in particular, there is a risk that KD might be misdiagnosed, as WHO stated that COVID-19 has a similar febrile clinical presentation as KD Jones et al. (2020) ; Viner and Whittaker (2020) . Therefore, now more than ever, it is important to distinguish KD from other febrile diseases in children; this may be possible by using an explainable deep learning algorithm. Previous studies have analyzed echocardiographic images using deep learning to perform classification of myocardial disease Zhang, Gajjala, Agrawal, Tison, Hallock, Beussink-Nelson, Lassen, Fan, Aras, Jordan et al. (2018) , detect hypertrophic cardiomyopathy, cardiac amyloid, and pulmonary arterial hypertension Vidal, Diller, Kempny, Li, Dimopoulos, Wort, and Gatzoulis (2021) , and evaluate chamber segmentation Leclerc, Smistad, Pedrosa, Østvik, Cervenansky, Espinosa, Espeland, Berg, Jodoin, Grenier et al. (2019) and wall motion abnormalities Sanchez-Martinez, Duchateau, Erdei, Kunszt, Aakhus, Degiovanni, Marino, Carluccio, Piella, Fraser et al. (2018) ; Omar, Domingos, Patra, Upton, Leeson, and Noble (2018) . However, there has been no study to date on diagnosis of incomplete KD by echocardiographic images of coronary artery lesions using deep learning, as we have done here. This study indicates that explainable deep learning has potential to diagnose KD among acute febrile diseases. This study had some limitations. First, we trained the deep learning algorithms with a small amount of data. Second, the experiment was conducted using only pneumonia as a non-KD acute febrile disease. 
To overcome the first and second limitations, more data need to be collected on incomplete KD and other acute febrile diseases, which would be considered as non-KD conditions. Training algorithms using such data will provide a KD detection algorithm with higher capability. Our work forms the basis for such future studies. We have shown the feasibility of using an explainable deep learning approach for detection of KD based on echocardiography images. The AUPRCs of the deep learning algorithms, including VGG19, Xception, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50, were found to be 0.643, 0.752, 0.822, 0.830, 0.853, and 0.864, respectively, for discrimination between KD and non-KD. In particular, the SE-ResNext50 offered the best performance among the deep learning algorithms with an accuracy of 72.88% and an AUPRC of 0.864. The explainable deep learning algorithms highlighted salient features of coronary arteries, similar to how an experienced pediatric cardiologist would examine coronary artery regions for the detection of KD. Although the specificity of deep learning algorithms was still lower than that of highly experienced clinicians for the discrimination between incomplete KD and non-KD, the deep learning algorithms used in this study were promising in terms of sensitivity and precision. Therefore, deep learning algorithms may assist clinicians in reducing the probability of misdiagnosing KD in clinical practice. The abilities of deep learning algorithms should be further developed to be comparable to the performance of highly experienced clinicians in order to translate this approach to application in the clinic. The study was approved by the Ethics Committee of Severance Hospital. All the participants provided their written informed consent to participate in this study. There are no conflicts of interest to disclose for publication of this paper. 
Preventing coronary artery lesions in Kawasaki disease
Descriptive epidemiology of Kawasaki disease in Japan
Epidemiology and clinical features of Kawasaki disease in South Korea
American Heart Association Rheumatic Fever, Endocarditis, and Kawasaki Disease Committee of the Council on Cardiovascular Disease in the Young; Council on Cardiovascular and Stroke Nursing; Council on Cardiovascular Surgery and Anesthesia; and Council on Epidemiology and Prevention. Diagnosis, treatment, and long-term management of Kawasaki disease: a scientific statement for health professionals from the American Heart Association
Dissecting Kawasaki disease: a state-of-the-art review
Diagnosis of Kawasaki disease
Utilization of coronary artery to aorta for the early detection of Kawasaki disease
Prevention of giant coronary artery aneurysms in Kawasaki disease by intravenous gamma globulin therapy
COVID-19 and Kawasaki disease: novel virus and novel case
Kawasaki-like disease: emerging complication during the COVID-19 pandemic
Clinical characteristics of coronavirus disease 2019 in China
Dermatologist-level classification of skin cancer with deep neural networks
Deep learning in medical ultrasound analysis: a review
Channel attention module with multiscale grid average pooling for breast cancer segmentation in an ultrasound image
Deep learning based classification of focal liver lesions with contrast-enhanced ultrasound
A pre-trained convolutional neural network based method for thyroid nodule diagnosis
Very deep convolutional networks for large-scale image recognition
Xception: Deep learning with depthwise separable convolutions
Deep residual learning for image recognition
Aggregated residual transformations for deep neural networks
Squeeze-and-excitation networks
Rethinking the inception architecture for computer vision
ImageNet large scale visual recognition challenge
Adam: A method for stochastic optimization
Automatic differentiation in PyTorch
Learning deep features for discriminative localization
MS-CAM: Multi-scale class activation maps for weakly-supervised segmentation of geographic atrophy lesions in SD-OCT images
Deep learning for automatically visual evoked potential classification during surgical decompression of sellar region tumors
Revisiting unreasonable effectiveness of data in deep learning era
Deep learning is robust to massive label noise
Fully automated echocardiogram interpretation in clinical practice: feasibility and diagnostic accuracy
Utility of deep learning algorithms in diagnosing and automatic prognostication of pulmonary arterial hypertension based on routine echocardiographic imaging
Deep learning for segmentation using an open large-scale dataset in 2D echocardiography
Machine learning analysis of left ventricular function to characterize heart failure with preserved ejection fraction
Quantification of cardiac bull's-eye map based on principal strain analysis for myocardial wall motion assessment in stress echocardiography

This study was supported by a new faculty research seed money grant of Yonsei University College of Medicine for 2020 (2020-32-0035).