key: cord-0012641-2lsz1idy authors: Wei, Xi; Zhu, Jialin; Zhang, Haozhi; Gao, Hongyan; Yu, Ruiguo; Liu, Zhiqiang; Zheng, Xiangqian; Gao, Ming; Zhang, Sheng title: Visual Interpretability in Computer-Assisted Diagnosis of Thyroid Nodules Using Ultrasound Images date: 2020-08-15 journal: Med Sci Monit DOI: 10.12659/msm.927007 sha: 8e15cbd37d475de648265c5d6d989257ad6d1eab doc_id: 12641 cord_uid: 2lsz1idy BACKGROUND: The number of studies on deep learning in artificial intelligence (AI)-assisted diagnosis of thyroid nodules is increasing. However, it is difficult to explain what the models actually learn in AI-assisted medical research. Our aim was to investigate the visual interpretability of the computer-assisted diagnosis of malignant and benign thyroid nodules using ultrasound images. MATERIAL/METHODS: We designed and implemented 2 experiments to test whether our proposed model learned to interpret the ultrasound features used by ultrasound experts to diagnose thyroid nodules. First, in an anteroposterior/transverse (A/T) ratio experiment, multiple models were trained by changing the A/T ratio of the original nodules, and their classification accuracy, sensitivity, and specificity were tested. Second, in a visualization experiment, class activation mapping used global average pooling and a fully connected layer to visualize the neural network and show the most important features. We also examined the importance of data preprocessing. RESULTS: The A/T ratio experiment showed that after changing the A/T ratio of the nodules, the accuracy of the neural network model was reduced by 9.24-30.45%, indicating that our neural network model learned the A/T ratio information of the nodules. The visualization experiment results showed that the nodule margins had a strong influence on the prediction of the neural network. CONCLUSIONS: This study was an active exploration of interpretability in the deep learning classification of thyroid nodules.
It demonstrated that the visualized neural network model focused on irregular nodule margins and the A/T ratio to classify thyroid nodules. The evaluation of thyroid nodules according to recommended guidelines for ultrasound examination and reporting has become routine [1]. Deep learning assisted by radiologists' reviews of medical images has also been used in recent years, and its application in medically aided diagnosis has been developed for eye diseases, brain diseases, lung inflammation, and thyroid nodules using medical images, including computerized tomography, magnetic resonance imaging, X-ray, and ultrasound [2-5]. In the field of artificial intelligence (AI)-assisted ultrasound diagnosis, several researchers used convolutional neural networks with the ImageNet database to obtain good diagnostic accuracy by connecting feature images [6-8]. In recent years, many improved methods based on convolutional neural networks have gradually emerged in the neural network field [9,10]. In general, the existing AI-assisted medical research models can be divided into 2 categories based on training data: one uses full images for model training, and the other uses the region of interest (ROI), which is extracted from the full images [11-17]. However, there is currently no unified method for selecting training data for neural networks. The accuracy of neural network-assisted diagnosis exceeds the performance of doctors in some diseases [18,19]. However, it has been difficult to explain what the models actually learn in AI-assisted medical research. Because of the complex nature of deep learning, it is impossible to know exactly what features are learned by neural networks, which are essentially intelligent classifiers [20]. Thus, AI-assisted diagnosis still lacks a scientific medical explanation, and there is a long way to go before AI-assisted medical care is fully realized [21,22].
In the diagnosis of thyroid nodules by radiologists, ultrasound examination can identify the size, number, location, composition, shape, margin, echogenicity, calcification, and blood flow signals. Among these, the composition, margin, echogenicity, calcification, and anteroposterior/transverse (A/T) ratio (a taller-than-wide shape) are important criteria for radiologists to distinguish between benign and malignant nodules [23-26]. However, whether these criteria are also important for deep learning networks is still unknown. In this study, we aimed to identify the features the models learned in the classification of thyroid nodules on ultrasound images by conducting visualization experiments, and to verify that some ultrasound features, such as the A/T ratio and margin, are crucial for neural networks to distinguish malignant from benign nodules. A total of 7,216 thyroid ultrasound images from the database of the Tianjin Medical University Cancer Institute and Hospital were used in this study. Between January 2018 and October 2018, consecutive patients at its 4 medical centers who underwent a diagnostic thyroid ultrasound examination and subsequent surgery were included in the study. All thyroid nodules were confirmed by postoperative pathological diagnosis. The exclusion criteria were as follows: (1) images from anatomical sites judged as not containing a tumor according to postoperative pathology; (2) nodules with incomplete (one or both orthogonal plane images missing) or unclear ultrasound images; and (3) cases with incomplete clinicopathological information. This study was approved by the Tianjin Medical University Cancer Institute and Hospital Ethics Committee. The need for informed consent from patients was waived because of the retrospective design of the study. The original ultrasound data were scanned and marked by 3 radiologists from the Tianjin Medical University Cancer Institute and Hospital.
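The A/T ratio described above is simply the ratio of a nodule's anteroposterior (depth) extent to its transverse extent on a B-mode image. As a minimal sketch of how it could be measured from a binary nodule mask (the function name and the axis convention, rows as depth and columns as transverse, are assumptions for illustration, not the authors' code):

```python
import numpy as np

def at_ratio(mask):
    """Anteroposterior/transverse ratio from a binary nodule mask.
    Assumes rows run along the anteroposterior (depth) axis and
    columns along the transverse axis, as in a conventional B-mode image."""
    rows = np.where(mask.any(axis=1))[0]   # rows containing nodule pixels
    cols = np.where(mask.any(axis=0))[0]   # columns containing nodule pixels
    height = rows.max() - rows.min() + 1   # anteroposterior extent
    width = cols.max() - cols.min() + 1    # transverse extent
    return height / width
```

A ratio >= 1 corresponds to the "taller-than-wide" shape discussed in the text.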
The original images were 1024×768 pixels with an 8-bit pixel depth. Employing a fully convolutional neural network and the nodule positions marked by the doctors, we obtained the nodule mask shape for each ultrasound image. The A/T ratio distribution of the nodules in the dataset was then determined from these masks. The original thyroid ultrasound images had black bands containing textual information, such as the machine model, the transverse/longitudinal section of the nodule, and the nodule size, which may have interfered with the training of the neural network to varying degrees. Therefore, preprocessing was required to remove the surrounding black bands. Moreover, the ultrasound images in the dataset were produced by various types of ultrasound scanners, so the positions and sizes of the black bands differed and could not be processed uniformly. We used a convolutional neural network to remove the black bands from the ultrasound images and, at the same time, normalized the image sizes without any manual annotation or changes to the original scale. To determine which type of image is most appropriate for training and testing in deep learning, we tested the diagnostic performance of 3 types: whole images, images with the black edges cut off, and the ROI, which was used to localize and diagnose thyroid nodules. For the visualization method of the neural network, we proposed a boundary visualization model to observe the effect of nodule boundaries on the predictions of the neural network. The structure of the visualization model is shown in Supplementary Table 1, and sketches of the AI visualization method are shown in Supplementary Figure 1. The batch size was set to 20, and the initial learning rate was set to 0.001 (the learning rate was multiplied by 0.1 every 100 iterations). Training was stopped when the accuracy of the model stabilized.
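The paper removes the surrounding black bands with a convolutional neural network. As a rough illustration of the same preprocessing goal, a much simpler intensity-threshold crop can be sketched; the function name and the threshold value are assumptions, and this is not the authors' method:

```python
import numpy as np

def crop_black_bands(img, threshold=10):
    """Crop near-black border bands from a grayscale ultrasound frame.
    Keeps the bounding box of all pixels brighter than `threshold`.
    A simple stand-in for the CNN-based band removal used in the paper."""
    mask = img > threshold
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
```

A threshold crop would also remove dark text on the bands only if the text pixels fall below the threshold, which is one reason a learned approach, as in the paper, is more robust across scanner types.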
The original thyroid ultrasound images were surrounded by textual information, which could have interfered with the model, as shown in Figure 1. Therefore, the ultrasound images needed to be cropped so that only the ultrasonic imaging region was retained. In addition, in the traditional visualization method, the feature maps in the last layer are directly reduced by global average pooling, whereby too much information is lost: in the backward calculation, 1 channel of the feature map corresponds to only 1 weight, as shown in Figure 2A. The boundary visualization model was proposed to address this problem by replacing the global average pooling layer with a fully connected layer. In the backward calculation, each layer of the feature map then corresponds to multiple weights, so the visualization is more refined (Figure 2B). To explore the sensitivity of the neural network to the A/T ratio, we proposed an experimental method with 3 main steps. The structure of the model is shown in Supplementary Table 2. First, we defined the original thyroid ultrasound image dataset as I = {I_1, I_2, ..., I_n}, where n is the number of images in the dataset, and I = B ∪ M, where B and M represent the benign and malignant images, respectively. Through ROI position extraction, a position set P was obtained, where P_i = {x, y, h, w}, and x, y, h, and w represent the coordinates, height, and width of the ROI, respectively. From these data, as shown in Figure 3, different classification models were trained and tested. To demonstrate that the A/T ratio is an essential factor in diagnosing benign and malignant nodules and that the neural network used for assisted diagnosis can learn this important feature, the dataset used to train and test the assisted-diagnosis neural network was adjusted accordingly. First, as shown in Figure 3, the A/T ratio of benign nodules was reduced, while that of malignant nodules was increased, to demonstrate the influence of the A/T ratio of a nodule.
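The refined backward calculation described above, in which a fully connected layer gives every spatial position of every feature-map channel its own weight (rather than one weight per channel as with global average pooling), can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, a (C, H, W) feature tensor and an FC weight matrix of shape (num_classes, C*H*W), not the authors' implementation:

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """feature_maps: (C, H, W) activations from the last conv layer.
    fc_weights: (num_classes, C*H*W) weights of the fully connected layer
    that replaces global average pooling, so each spatial position of each
    channel carries its own weight. Returns an (H, W) map in [0, 1]."""
    C, H, W = feature_maps.shape
    w = fc_weights[class_idx].reshape(C, H, W)  # per-position weights
    cam = (w * feature_maps).sum(axis=0)        # weighted sum over channels
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()                        # normalize to [0, 1]
    return cam
```

With classical CAM, `w` would instead be a length-C vector broadcast over all H×W positions; the per-position weights are what make the boundary visualization finer.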
The diagnostic performance of the network was evaluated by calculating its accuracy, sensitivity, and specificity. The area under the receiver operating characteristic curve, with a 95% confidence interval, was calculated to evaluate the diagnostic performance of the A/T ratio in distinguishing between benign and malignant thyroid nodules. All statistical results were calculated using MedCalc for Windows v15.8 (MedCalc Software, Ostend, Belgium) and were considered statistically significant when P was less than 0.05. This study included 7,216 ultrasound images from 2,489 consecutive patients who underwent diagnostic thyroid ultrasound examination and subsequent surgery. Of the total images, 2,712 were from 1,021 benign nodules, which included nodular goiter, adenomatous goiter, thyroid granuloma, and follicular adenoma, and 4,504 were from 1,468 malignant nodules, which included papillary thyroid carcinoma, medullary thyroid carcinoma, and follicular thyroid carcinoma. The demographic and pathological features of the patients are shown in Table 1. Through neural network visualization (Figure 1), we found that part of the focus of the neural network trained with full images shifted to the surrounding "interference information" rather than to the nodules or their internal structure, which could have greatly affected the generalizability of the model. Consequently, the accuracy rate dropped dramatically when different datasets were substituted; for example, changing the way the peripheral information was presented led to a change in the models. As shown in Table 2, the dataset of experiment 1 yielded an accuracy rate of 87.96% when full images were used to train and test the model, whereas the dataset of experiment 3, obtained by segmenting the ROI of the ultrasound images, performed best among the 3 models. The accuracy of experiment 3 was 6% and 5.19% higher, respectively, than those of the other 2 experiments.
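The three reported metrics follow directly from the confusion-matrix counts. A small sketch of the standard definitions (the paper computed its statistics in MedCalc; the function name here is an assumption for illustration):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts,
    with malignant taken as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate (malignant detected)
    specificity = tn / (tn + fp)   # true-negative rate (benign detected)
    return accuracy, sensitivity, specificity
```

For example, 8 true positives, 2 false negatives, 9 true negatives, and 1 false positive give an accuracy of 0.85, sensitivity of 0.80, and specificity of 0.90.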
As shown in Figure 1, in experiments 1 and 2 the model focused too much on the surroundings or on multiple signs in the visualization result. According to the description of the Thyroid Imaging Reporting and Data System (TI-RADS), malignant nodules have the characteristics of unclear borders and irregular margins. With the global average pooling layer, the visualized reverse calculation yielded a rough result. In contrast, through a fully connected layer, we made full use of all the data in the feature map to reduce the loss of information and obtain finer visualization results. The results of the neural network visualization method on malignant nodules focused on the nodule boundary area, which means that boundary information was also an important indicator in the diagnosis of malignancies. Figure 4 shows the experimental results, including screenshots of the visualization results, the original images, and detailed images of the nodules. According to the experimental results, the neural network paid special attention to malignant thyroid nodules with irregular and unclear margins; that is, the red spot areas of the precise visualization results shown in Figure 4. The AI visual results indicated that an irregular margin (the red spot area of malignant nodules) was a crucial feature for the neural network. Figure 4. The neural network visualization results. The first column shows the visualization results, the second column shows the original image, and the third column shows an enlarged image of the most distinguishing area. As shown in Supplementary Table 3, 96.30% of the nodules were malignant while only 3.70% (85/2,296) were benign when the A/T ratio was ≥1, indicating that the A/T ratio effectively distinguished malignant nodules.
However, when the A/T ratio was <1, benign and malignant nodules accounted for 53.39% and 46.61%, respectively, so benign nodules could not be clearly distinguished (Figure 5A). According to the receiver operating characteristic curve, the best cutoff value for the A/T ratio in predicting benign and malignant nodules was 0.90, with an accuracy, sensitivity, and specificity of 0.872, 74.69%, and 86.06%, respectively (Figure 5B). The results of models 1-5 are shown in Table 3. The data show that the accuracy of the diagnosis of benign and malignant thyroid nodules reached up to 92.72% in model 1. Models 2-4 were tested to verify the sensitivity of the network to the A/T ratios of the nodules. When the A/T ratio was changed in the training set or testing set, the diagnostic accuracy of the models decreased by 9.24-30.45% (Table 3, Figure 6). The specificity of models 2 and 3 decreased significantly, and the sensitivity and specificity decreased in all models. At present, there are many studies on, and applications of, medically aided diagnosis with neural networks, and high classification accuracy has been achieved [27-31]. However, a neural network is essentially a black box, which means that its users do not know which features in the images are important to the network when it makes a prediction [32]. Thus, they cannot give a reasonable explanation of the reliability of the prediction [33,34]. In this study, by designing comparative experiments and adopting neural network visualization methods, we showed that the basis of the neural network diagnosis of thyroid nodules includes the A/T ratio and margin information. The results showed that an A/T ratio of 0.9 was the best cutoff value for differentiating between benign and malignant nodules.
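A best cutoff such as 0.90 is commonly chosen from the ROC curve by maximizing Youden's J statistic (sensitivity + specificity - 1). The paper does not state which criterion its software applied, so the following brute-force sketch is one plausible reading, with hypothetical names, rather than the authors' procedure:

```python
def best_cutoff(values, labels):
    """Pick the threshold on `values` (e.g., A/T ratios) that maximizes
    Youden's J = sensitivity + specificity - 1.
    labels: 1 = malignant, 0 = benign; value >= threshold -> predicted malignant."""
    best_j, best_t = -1.0, None
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= t and y == 1)
        fn = sum(1 for v, y in zip(values, labels) if v < t and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < t and y == 0)
        fp = sum(1 for v, y in zip(values, labels) if v >= t and y == 0)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

On a toy sample where benign nodules have A/T ratios below all malignant ones, the routine returns the lowest malignant value as the cutoff with J = 1.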
The precise AI visualization of ultrasound features made possible the successful localization of irregular nodule margins, indicating that the margin is an important feature for the neural network within its black-box operation. The diagnostic accuracy of deep learning can currently reach more than 92%, which is better than the performance of doctors [35-37]. This study found that the datasets used in some studies of deep learning in medical image-assisted diagnosis are based on full images as inputs, which may result in the model learning interference information, leading to poor robustness of the deep learning model and limited generalizability among experiments. Therefore, it is recommended to extract the ROI or remove boundary interference information by segmenting ultrasound images before training. In this study, we proposed that the ROI be segmented first so that more in-depth research can be carried out with good results when thyroid ultrasound images are used for auxiliary medical research. In 2019, deep learning algorithms achieved high accuracy in the visual localization of lung disease on lung ultrasound, hip fractures on pelvic radiographs, and preclinical pulmonary fibrosis on computerized tomography scans [38-40]. Several experiments in our study were used to visualize the neural network, which can intuitively show the nodular feature areas with higher weights within the network. In the medical imaging field, the margins, calcifications, and A/T ratios of thyroid nodules are significant factors in the diagnosis of thyroid cancer. A study by Moon et al. [23] showed that an A/T ratio of ≥1 is the best predictor of malignant tumors. Yoon et al. [41] concluded that the feature of a taller-than-wide shape depends on the compressibility of a malignant thyroid nodule on transverse ultrasound images.
According to our statistics, the best cutoff value of the A/T ratio to distinguish between benign and malignant nodules was approximately 0.90, which may be related to the inconsistency of the pathological types of the collected nodules in our research. There were 2 limitations in our study. First, the study examined only margins and the A/T ratio in neural network training. Further studies are needed to explore which aspects the neural network has learned in addition to the margin and A/T ratio, including other ultrasound features of thyroid nodules, such as echogenicity, composition, and calcification. In addition, further studies are needed to determine whether neural networks can learn microscopic, pixel-level features that humans cannot observe. Second, the pathological types of nodules included in our study were limited; different pathological types of nodules should be used in AI-model training to assess generalized performance with visualization. In this study, the A/T ratio and margin information of thyroid nodules were crucial criteria for distinguishing between malignant and benign nodules. These criteria are also important factors in the auxiliary diagnosis of neural network models, which have significant value in medicine. Further, according to our visualization experiment results, we conclude that there is risk in using the entire ultrasound image to train the neural network; therefore, we suggest segmenting images first.
References:
1. White paper of the ACR TI-RADS Committee.
2. Training deep learning algorithms with weakly labeled pneumonia chest X-ray data for COVID-19 detection. medRxiv.
3. Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: A retrospective, multicohort, diagnostic study.
4. Fully automatic brain tumor segmentation using end-to-end incremental deep neural networks in MRI images.
5. Deep learning and lung cancer: AI to extract information hidden in routine CT scans.
6. Computer-aided diagnosis for the differentiation of malignant from benign thyroid nodules on ultrasonography.
7. Ultrasound image-based thyroid nodule automatic segmentation using convolutional neural networks.
8. Thyroid segmentation and volume estimation in ultrasound images.
9. Urine sediment recognition method based on multi-view deep residual learning in microscopic image.
10. Speech synthesis from ECoG using densely connected 3D convolutional neural networks.
11. Convolutional neural networks for medical image analysis: Full training or fine tuning?
12. A review of thyroid gland segmentation and thyroid nodule segmentation methods for medical ultrasound images.
13. Segmentation and diagnosis of papillary thyroid carcinomas based on generalized clustering algorithm in ultrasound elastography.
14. Intuitionistic based segmentation of thyroid nodules in ultrasound images.
15. Computer-aided diagnosis of thyroid nodule: A review.
16. Automated delineation of thyroid nodules in ultrasound images using spatial neutrosophic clustering and level set.
17. Computer aided thyroid nodule detection system using medical ultrasound images.
18. Management of thyroid nodules seen on US images: Deep learning may match performance of radiologists.
19. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study.
20. Deep learning for classifying fibrotic lung disease on high-resolution computed tomography: A case-cohort study.
21. Interpretable artificial intelligence: Why and when.
22. Deep learning-based computer-aided diagnosis system for localization and diagnosis of metastatic lymph nodes on ultrasound: A pilot study.
23. A taller-than-wide shape in thyroid nodules in transverse and longitudinal ultrasonographic planes and the prediction of malignancy.
24. European Thyroid Association guidelines for ultrasound malignancy risk stratification of thyroid nodules in adults: The EU-TIRADS.
25. Ultrasonography diagnosis and imaging-based management of thyroid nodules: Revised Korean Society of Thyroid Radiology Consensus Statement and Recommendations.
26. American College of Endocrinology, and Associazione Medici Endocrinologi Medical Guidelines for Clinical Practice for the Diagnosis and Management of Thyroid Nodules - 2016 Update.
27. A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of thyroid nodules on ultrasound: Initial clinical assessment.
28. Computer-aided diagnosis of thyroid nodules via ultrasonography: Initial clinical experience.
29. Computer-aided diagnosis system for thyroid nodules on ultrasonography: Diagnostic performance and reproducibility based on the experience level of operators.
30. Diagnostic performance evaluation of a computer-assisted imaging analysis system for ultrasound risk stratification of thyroid nodules.
31. Automatic thyroid nodule recognition and diagnosis in ultrasound imaging with the YOLOv2 neural network.
32. What hinders the uptake of computerized decision support systems in hospitals? A qualitative study and framework for implementation.
33. Causability and explainability of artificial intelligence in medicine.
34. Computerized clinical decision support for prescribing: Provision does not guarantee uptake.
35. Machine learning-assisted system for thyroid nodule diagnosis.
36. Ensemble deep learning model for multicenter classification of thyroid nodules on ultrasound images.
37. Localizing B-lines in lung ultrasonography by weakly supervised deep learning, in-vivo results.
38. The diagnostic efficiency of ultrasound computer-aided diagnosis in differentiating thyroid nodules: A systematic review and narrative synthesis.
39. Application of a deep learning algorithm for detection and visualization of hip fractures on plain pelvic radiographs.
40. MUC5B variant is associated with visually and quantitatively detected preclinical pulmonary fibrosis.
41. "Taller-than-wide sign" of thyroid malignancy: Comparison between ultrasound and CT.
Figure 1. Sketches of the AI visualization method. The input of the model was the thyroid ultrasound image after data preprocessing, and the fully connected operation was performed after feature extraction through the convolutional neural network. During training, the features that were useful for classification were gradually strengthened. Therefore, by inversely calculating the weights of the fully connected layers and the feature maps, it was possible to visualize the areas in the images that had a significant effect on the classification task.