key: cord-0797725-pk8cabxj
authors: Masud, Mehedi; Eldin Rashed, Amr E.; Hossain, M. Shamim
title: Convolutional neural network-based models for diagnosis of breast cancer
date: 2020-10-09
journal: Neural Comput Appl
DOI: 10.1007/s00521-020-05394-5
sha: 2ecab0250eacdffaa3839f3c849cd99882e1cfb5
doc_id: 797725
cord_uid: pk8cabxj

Breast cancer is the most prevalent cancer in the world, affecting millions of women each year, and it is the leading cause of cancer deaths among women. Over the last few years, researchers have proposed different convolutional neural network models to facilitate the diagnostic process for breast cancer, and convolutional neural networks are showing promising results in classifying cancers from image datasets. There is still no standard model that can claim to be the best, because large datasets for training and validating models are unavailable. Hence, researchers now focus on leveraging the transfer learning approach, using pre-trained models, trained over millions of different images, as feature extractors. With this motivation, this paper considers eight different fine-tuned pre-trained models to observe how they classify breast cancers in ultrasound images. We also propose a shallow custom convolutional neural network that outperforms the pre-trained models on all the performance metrics considered. The proposed model shows 100% accuracy and achieves a 1.0 AUC score, whereas the best pre-trained model shows 92% accuracy and a 0.972 AUC score. To avoid bias, the model is trained using the fivefold cross-validation technique. Moreover, the model trains faster than the pre-trained models and requires a small number of trainable parameters. Grad-CAM heat-map visualization also shows how well the proposed model extracts the important features needed to classify breast cancers.

1 Introduction

Breast cancer affects millions of women every year and is the leading cause of cancer deaths among women worldwide [1]. Breast cancer survival rates vary widely across countries: in North America the rate is greater than 80%, in Sweden and Japan it is around 60%, while in low-income countries it is below 40% [1]. The main reasons for the low survival rate in low-income countries are the lack of early-detection programs and the shortage of adequate diagnosis and healthcare facilities. It is therefore vital to detect breast cancer at an early stage to minimize the mortality rate. Mammography and ultrasound images are the common tools for identifying cancers, and interpreting them requires expert radiologists. This manual process can produce high numbers of false positives and false negatives. Therefore, computer-aided diagnosis (CAD) systems are now widely used to support radiologists' decision making when identifying cancers; they can reduce radiologists' workload and minimize false positives and false negatives in diagnosis. Machine learning techniques are now commonly integrated into traditional computer-aided systems for disease diagnosis and patient monitoring [2]. However, traditional machine learning techniques involve a hand-crafted feature-extraction step, which is often difficult and requires domain knowledge and an expert radiologist. Deep learning (DL) models, by contrast, develop their learning process adaptively and can automatically extract features from the input dataset with respect to the target output [3, 4].
DL methods greatly reduce the exhaustive process of data engineering and feature extraction while enabling method reusability. Numerous studies [5] have examined breast cancer images from various perspectives. Machine learning (ML), convolutional neural networks (CNNs), and deep learning methods are now widely used to classify breast cancers from breast images. CNN models have been used effectively in wide-ranging computer vision fields for years [6, 7], and over the last few years numerous studies have applied CNN-based deep learning architectures to disease diagnosis. A CNN-based image recognition and classification model was perhaps first applied in the ImageNet competition [8]. Since then, CNN-based models have been adopted in various applications, for example, image segmentation in medical image processing, feature extraction from images, finding regions of interest, object detection, and natural language processing. A CNN has a large number of trainable parameters at its various layers, which are used to extract important features at various abstraction levels [9]. However, a CNN model needs a huge dataset for training, and in the medical field such a dataset is not always obtainable. Moreover, a CNN model requires high-speed computing resources to train and to tune its hyperparameters. To overcome data unavailability, transfer learning techniques are now widely applied in medical image classification. With transfer learning, a model can use knowledge from pre-trained models (e.g., VGG16 [10], AlexNet [11], DenseNet [12]) that were trained over a huge dataset. This lessens the amount of data required for the problem at hand. The pre-trained models are often used as feature extractors, capturing image features from abstract to more detailed levels. Transfer learning techniques using pre-trained models have shown promising results in different medical diagnosis tasks, such as chest X-ray image analysis for identifying pneumonia and COVID-19 patients [13], retina image analysis for blindness classification, and MRI image analysis for brain tumor classification.

Deep learning models leveraging CNNs are widely used to classify breast cancers. We now discuss some of the promising CNN-based studies. Authors in [14] proposed a learning framework leveraging a deep learning architecture that learns features automatically from mammography images in order to identify cancer. The framework was tested on the BCDR-FM dataset. Although they showed improved results, they did not compare against pre-trained models. Authors in [15] used AlexNet as a feature extractor for mass diagnosis in mammography images; a Support Vector Machine (SVM) is applied as the classifier after AlexNet generates the features. The proposed model outperforms the analytical feature-extraction method. In our approach, we consider eight different pre-trained models and report their performance on ultrasound images. Authors in [16] considered a transfer learning approach using the GoogleNet [17] and AlexNet pre-trained models together with some preprocessing techniques. The model is applied to mammogram images in which cancers are already segmented. The authors claim that the model performs better than methods involving humans.
Authors in [18] proposed a convolutional neural network leveraging the Inception-v3 pre-trained model to classify breast cancer using breast ultrasound images. The model supports the extraction of multiview features. It was trained on only 316 images and achieved 0.9468 AUC, 0.886 sensitivity, and 0.876 specificity. Authors in [19] developed an ensembled CNN model leveraging the VGG19 and ResNet152 pre-trained models with fine-tuning. They considered a dataset managed by JABTS containing 1536 breast masses, comprising 897 malignant and 639 benign cases. The model achieves 0.951 AUC, 90.9% sensitivity, and 87.0% specificity. Authors in [20] developed another ensemble-based computer-aided diagnosis (CAD) system combining the VGGNet, ResNet, and DenseNet pre-trained models. They considered a private database of 1687 images, including 953 benign and 734 malignant cases. The model achieved 91.0% accuracy and a 0.9697 AUC score. The model was also tested on the BUSI dataset, on which it achieved 94.62% accuracy and a 0.9711 AUC score. Authors in [21] implemented two approaches, (1) a CNN and (2) transfer learning, to classify breast cancer from a combination of two datasets, one containing 780 images and the other 163 images. The model showed better performance when traditional and generative adversarial network augmentation techniques were combined. In the transfer learning approach, the authors compared the performance of four pre-trained models, namely VGG16, Inception [22], ResNet, and NASNet [23]. On the combined dataset, NASNet achieved the highest accuracy of 99%. Authors in [24] compared three CNN-based transfer learning models, ResNet50, Xception, and InceptionV3, and proposed a base model consisting of three convolutional layers to classify breast cancers from a breast ultrasound image dataset. The dataset comprises 2058 images, including 1370 benign and 688 malignant cases. According to their analysis, InceptionV3 showed the best accuracy of 85.13% with an AUC score of 0.91. Authors in [25] analyzed four pre-trained models, VGG16, VGG19, InceptionV3, and ResNet50, on a dataset of 5000 breast images comprising 2500 benign and 2500 malignant cases. The InceptionV3 model achieved the highest AUC of 0.905. Authors in [26] proposed a CNN model for breast cancer classification that considers local and frequency-domain information in histopathological images. The objective is to utilize the important image information carried by the local and frequency domains, which can sometimes improve a model's accuracy. The proposed model, applied on the BreakHis dataset, obtained 94.94% accuracy. Authors in [27] proposed a deep neural network combining a clustering method and CNN models for breast cancer classification using histopathological images. The model is based on a CNN, a Long Short-Term Memory (LSTM) network, and a mixture of the CNN and LSTM models, and both Softmax and SVM are applied at the classifier layer. The model achieved 91% accuracy.

From the above discussion, it is evident that researchers are still searching for a better model to classify breast cancers. To overcome the scarcity of datasets, this research combines two publicly available ultrasound image datasets.
Eight different pre-trained models are then fine-tuned and applied to the combined dataset to observe their breast cancer classification performance. However, the pre-trained models did not show the expected outcome. Therefore, we also developed a shallow CNN-based model. The model outperforms all the fine-tuned pre-trained models on all performance metrics, and it is also faster to train. We employed different evaluation techniques to demonstrate the better outcome of the proposed model.

The paper is organized as follows: Sect. 2 discusses the materials and methods used for breast cancer classification; Sect. 3 proposes the custom CNN model; Sect. 4 discusses the evaluation results of the pre-trained models and the proposed custom model; finally, the paper concludes in Sect. 5.

2 Materials and methods

In this research, we consider two publicly available breast ultrasound image datasets [28, 29]. The two datasets are chosen mainly for two reasons: (1) to increase the size of the dataset for training, in order to avoid overfitting and bias, and (2) to cover three classes (benign, malignant, and normal). Combining the datasets also improves the reliability of the model. The dataset in [28] contains 250 images in two categories, malignant and benign. The image sizes vary: the minimum and maximum sizes are 57 × 75 and 61 × 199 pixels, in gray and RGB color, respectively. Therefore, all images are converted to grayscale to fit the model. The dataset in [29] contains 780 images in three categories: malignant, benign, and normal. The average image size is 500 × 500 pixels. These breast ultrasound images were collected from 600 women in 2018, with ages ranging between 25 and 75 years. Table 1 shows the class distribution of the images in the two datasets, and Fig. 1 shows examples of ultrasound images of the different cases.

Data normalization is an important pre-processing phase before feeding data into a model for training; with pre-processing, the data features become more easily interpretable by the model, whereas a lack of correct pre-processing makes training slow and unstable. Generally, standardization and normalization techniques are used to scale data. The normalization technique rescales data values to between 0 and 1. Since the datasets considered in this research contain both gray and color images, the pixel values lie between 0 and 255. We adopt the zero-centering approach, which shifts the distribution of data values so that its mean becomes zero. Assume a dataset D consisting of N samples and M features, where D[:, i] denotes the ith feature and D[j, :] denotes sample j. Zero-centering is then defined as

D[k, i] ← D[k, i] − (1/N) Σ_{j=1}^{N} D[j, i],  for k = 1, …, N and i = 1, …, M.

In this research, we employed k-fold (k = 5) cross-validation on the dataset to mitigate overfitting during model training. In the k-fold cross-validation method, k folds of the same size are generated; each fold is used once to validate the model while the remaining k − 1 folds are used for training. This ensures that the model's reported accuracy is reliable. Cross-validation is a widely used mechanism for resampling data when evaluating machine learning models on a small dataset. It is mainly used to estimate the skill of a machine learning model on data the model has not seen previously, and the result obtained using cross-validation is normally less biased, or less optimistic, than that of the train/test split method. Table 2 shows how fivefold cross-validation generates five different datasets of ultrasound images from the two datasets.
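As a concrete illustration of these pre-processing steps, the following is a minimal Python sketch of zero-centering and the fivefold split. It is a sketch under stated assumptions, not the authors' implementation: the stand-in arrays, the image size, and the use of scikit-learn's StratifiedKFold (so that each fold preserves the class distribution) are our own choices.

```python
# Minimal sketch of zero-centering and fivefold cross-validation.
# X and y are stand-in data; in practice they would hold the 1030 images
# (250 + 780) and their benign/malignant/normal labels.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def zero_center(images: np.ndarray) -> np.ndarray:
    """Subtract the per-pixel mean over all samples so each feature's mean becomes zero."""
    return images - images.mean(axis=0)

X = np.random.randint(0, 256, size=(1030, 224, 224)).astype("float32")  # pixel values in [0, 255]
y = np.random.randint(0, 3, size=1030)                                   # 0: benign, 1: malignant, 2: normal

X = zero_center(X)

# Each of the five folds validates once while the other four train.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kfold.split(X, y), start=1):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    print(f"Fold {fold}: {len(train_idx)} training / {len(val_idx)} validation samples")
```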
Over the last few years, transfer learning algorithms have been widely used in many machine learning research problems; they concentrate on preserving the knowledge acquired while solving one problem and employing it on another, related problem. For example, an algorithm trained to recognize dogs can be applied to recognize horses. Authors in [30] formally define transfer learning in terms of domain and task as follows: let an arbitrary domain be D = {X, P(X)}, where X denotes a feature vector {x_1, x_2, …, x_n} and P(X) denotes the probability distribution over X. One reason transfer learning algorithms are used is that only a small dataset may be available to train a custom model while the goal is still to produce an accurate model. A custom model employing transfer learning applies the knowledge of pre-trained models that were trained over a huge dataset for a long duration. There are mainly two approaches to applying transfer learning: (i) developing a model and (ii) using pre-trained models. The pre-trained model approach is widely used in the deep learning domain.

Considering the importance of pre-trained models as feature extractors, this research implements eight pre-trained models, using the weights of their convolutional layers as feature extractors for classifying breast cancers in the ultrasound images. Table 3 lists the pre-trained models considered in this research. All the models are built on convolutional neural networks and were trained on the ImageNet database [31], which consists of a million images; they can classify 1000 object categories (mouse, keyboard, pencil, and many animals). Therefore, all the models have learned rich feature representations from a large number of images. From Table 3, we see that the models use different input sizes, so the dataset images are transformed accordingly before being fed into each model.

In the fine-tuning process, the final layer of each pre-trained model is substituted with a classifier for three classes, since the dataset consists of images in three classes (normal, malignant, and benign). Hence, the models are fine-tuned at the top layers: the last three layers of each model are substituted with (i) a fully connected layer, (ii) a Softmax activation layer, and (iii) a custom classification layer.

We considered three different optimizers, SGDM, RMSprop, and Adam, to train the models and determine which configuration produces the best results. Brief descriptions of the optimizers are given below.

Stochastic Gradient Descent with Momentum (SGDM) is the fundamental optimizer used for the convergence of neural networks, i.e., moving in the direction that optimizes the cost function. SGDM updates the network parameters according to

v_t = γ v_{t−1} + l g_t,  w_j ← w_j − v_t,

where l is the initial learning rate, g_t is the gradient at time t along parameter w_j, and γ is the momentum coefficient. RMSprop instead maintains v_t as an exponential average of the squares of the gradients along w_j and scales the update by it:

v_t = β v_{t−1} + (1 − β) g_t²,  w_j ← w_j − (l / √(v_t + ε)) g_t.

The Adam optimizer combines the heuristics of momentum and RMSprop:

v_t = β₁ v_{t−1} + (1 − β₁) g_t,
s_t = β₂ s_{t−1} + (1 − β₂) g_t²,
w_j ← w_j − l v_t / (√(s_t) + ε),

where l is the initial learning rate, v_t is the exponential average of the gradients along w_j, g_t is the gradient at time t along w_j, s_t is the exponential average of the squares of the gradients along w_j, and β₁, β₂ are hyperparameters.
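To make the fine-tuning recipe concrete, below is a hedged TensorFlow/Keras sketch: a frozen ImageNet-pre-trained base (VGG16 here, as one of the eight) with its top replaced by a new fully connected layer and a three-class Softmax head, compiled with one of the three optimizers. The layer width, the momentum value, and the learning rate are illustrative assumptions rather than the paper's exact settings.

```python
# Hedged sketch of fine-tuning a pre-trained model for the three-class task.
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # keep the ImageNet convolutional weights as a feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),   # new fully connected layer (width assumed)
    tf.keras.layers.Dense(3, activation="softmax"),  # benign / malignant / normal
])

# The three optimizers compared in the study; hyperparameters are assumptions.
optimizers = {
    "sgdm": tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
    "rmsprop": tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    "adam": tf.keras.optimizers.Adam(learning_rate=1e-4),
}
model.compile(optimizer=optimizers["adam"],
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```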
The fine-tuned pre-trained models use the Softmax activation function to generate class probabilities in the range 0 to 1 from the input images. Using a Softmax activation function at the end of a CNN model to convert its output scores into a normalized probability distribution is a very well-known practice. The Softmax function is defined as

σ(z)_i = e^{z_i} / Σ_{j=1}^{K} e^{z_j},

where z is the input vector, z_i are the elements of z, e^{z_i} is the exponential function applied to each element, and Σ_{j=1}^{K} e^{z_j} is the normalization term.

3 Proposed custom model

The proposed model consists of a single convolutional layer with 20 channels followed by batch normalization. It also contains one max-pooling layer and one fully connected layer, with dropout regularization added after the fully connected layer. Finally, a Softmax activation function is applied, since the model must classify three classes. Figure 2 shows the architecture of the custom model. An initial learning rate of 1.0 × 10⁻⁴ and a mini-batch size of 8 are used during training. The model is trained with the same three optimizers as the pre-trained models, and it is trained and validated using the configuration given in Table 4.
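Below is a minimal sketch of the custom architecture as described above: a single 20-channel convolutional layer with batch normalization, one max-pooling layer, one fully connected layer followed by dropout, and a final Softmax over the three classes. The kernel size, activation, dropout rate, and input size are assumptions, since the text does not specify them; the learning rate and mini-batch size follow the paper.

```python
# Hedged sketch of the shallow custom CNN described in the text.
import tensorflow as tf

custom_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 1)),     # grayscale ultrasound input (size assumed)
    tf.keras.layers.Conv2D(20, kernel_size=3,
                           padding="same"),         # single conv layer, 20 channels (kernel assumed)
    tf.keras.layers.BatchNormalization(),           # batch normalization over the 20 channels
    tf.keras.layers.ReLU(),                         # activation (assumed; not stated in the text)
    tf.keras.layers.MaxPooling2D(pool_size=2),      # one max-pooling layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3),                       # one fully connected layer (3 classes)
    tf.keras.layers.Dropout(0.5),                   # dropout after the FC layer (rate assumed)
    tf.keras.layers.Activation("softmax"),          # softmax over benign/malignant/normal
])

custom_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # initial learning rate from the paper
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Training would use a mini-batch size of 8, as in the paper, e.g.:
# custom_model.fit(X_train, y_train, batch_size=8, validation_data=(X_val, y_val))
```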
4 Evaluation results and discussion

The performance of the fine-tuned pre-trained models is evaluated with several standard performance metrics: accuracy (ACC), area under the curve (AUC), precision, recall, sensitivity, specificity, and F1-score. A confusion matrix is also generated for each model to observe the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts for the normal, malignant, and benign cases. The TP count (e.g., for malignant) represents how often the model correctly classifies real malignant cases as malignant, while the FP count represents how often it wrongly classifies benign cases as malignant. Similarly, the TN count (e.g., for benign) represents how often the model correctly classifies benign cases as benign, and the FN count represents how often it wrongly classifies malignant cases as benign. Another important metric is precision, which gives the proportion of cases classified as malignant, benign, or normal that truly belong to that class. Sensitivity (recall) gives the proportion of actual cases of a class (e.g., malignant) that the model correctly classifies, while specificity gives the percentage of cases of the opposite class (e.g., benign) that the model classifies correctly. The F1-score combines precision and recall into a single score by taking their harmonic mean. The metrics are computed as follows:

ACC = (TP + TN) / (TP + TN + FP + FN),
Precision = TP / (TP + FP),
Sensitivity (Recall) = TP / (TP + FN),
Specificity = TN / (TN + FP),
F1-score = 2 · Precision · Recall / (Precision + Recall).

The performance scores of the fine-tuned pre-trained models and the custom model are reported in Table 6, which summarizes the best scores of the models under the different evaluation metrics and compares them with the proposed CNN model. Figure 3 shows the confusion matrices generated by the pre-trained models as well as by the proposed custom model; the figure shows only the confusion matrices of the best pre-trained models, as listed in Table 6. From the confusion matrix of the custom model, we observe high scores for every breast cancer class: for example, the model correctly classifies 100% of benign, 100% of malignant, and 100% of normal cases using the Adam optimizer. These results outperform those of the pre-trained models. Table 7 shows the classification results of the models.

Table 8 compares the performance of the custom model and the pre-trained models. The custom model outperforms all the pre-trained models with respect to accuracy, prediction time, and number of parameters. It is also much faster to train than all the fine-tuned pre-trained models, because it has only one fully connected layer and requires a very small number of trainable parameters compared to the other models. All the models are trained on a GPU (NVIDIA® GeForce GTX 1660 Ti with Max-Q design and 6 GB RAM) with a mini-batch size of 8. Figure 4 shows the execution time and accuracy score of each model; to measure time accurately, we ran the code four times. The area of each marker in Fig. 4 indicates the number of parameters in the network, and prediction time is reported relative to the fastest network. From the plot, it is evident that the custom model trains fast and produces higher accuracy than the pre-trained models. Figure 5 shows the accuracy and loss values when the custom model is trained and validated; the graph confirms the very high accuracy reported in Table 8.

The custom model is also evaluated by generating heat-map visualizations with the Grad-CAM tool [32] to see how the model identifies regions of interest and how well it distinguishes cancer classes. Grad-CAM is used to judge whether a model attends to the key areas of an image for prediction: it visualizes, as a heat map for a class label, the portion of the image on which the model focuses. Figure 6 shows sample Grad-CAM outputs for the benign and malignant classes together with the prediction probabilities. From the output, we observe that the model focuses precisely on the key areas of the images to classify cancers.
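For readers who wish to reproduce this kind of inspection, here is a hedged Keras sketch of the gradient-weighted class-activation (Grad-CAM) computation. It follows the standard recipe from [32] rather than the authors' exact tooling; `custom_model` and the layer name "conv2d" refer to the earlier sketch and are assumptions.

```python
# Hedged sketch of Grad-CAM for a single ultrasound image.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name):
    """Return a normalized heat map over the last conv layer's spatial grid."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_channel = preds[:, tf.argmax(preds[0])]        # score of the predicted class
    grads = tape.gradient(class_channel, conv_out)           # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))          # global-average-pool the gradients
    heatmap = tf.reduce_sum(conv_out[0] * weights, axis=-1)  # weighted sum of feature maps
    heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)  # ReLU, then normalize
    return heatmap.numpy()

# e.g., heatmap = grad_cam(custom_model, X_val[0], "conv2d")
```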
5 Conclusion

This study implemented eight fine-tuned pre-trained CNN models, leveraging transfer learning, to observe breast cancer classification performance on ultrasound images combined from two different datasets. We evaluated the fine-tuned pre-trained models using the Adam, RMSprop, and SGDM optimizers. The highest accuracy, 92.4%, is achieved by ResNet50 with the Adam optimizer, and the highest AUC score, 0.97, is achieved by VGG16. We also proposed a shallow custom model, since the pre-trained models did not show the expected results, and they all have many convolutional layers and require long training times. The proposed custom model consists of only one convolutional layer as a feature extractor. The custom model achieved 100% accuracy and a 1.0 AUC value. With respect to training time, the custom model is faster than any other model and needs a small number of trainable parameters. Our future plan is to validate the model with other datasets that include new ultrasound images.

References

html#:~:text=Breast%20cancer%20survival%20rates%20vary,et%20al.%2C%202008. Accessed
Cloud-supported cyber-physical localization framework for patients monitoring
Applying deep learning for epilepsy seizure detection and brain mapping visualization
Hybrid deep-learning-based anomaly detection scheme for suspicious flow detection in SDN: a social multimedia perspective
Cervical cancer classification using convolutional neural networks and extreme learning machines
Automatic fruit classification using deep learning for industrial applications
Emotion recognition using secure edge and cloud computing
ImageNet large scale visual recognition challenge
Deep relative attributes
Very deep convolutional networks for large-scale image recognition. arXiv preprint
ImageNet classification with deep convolutional neural networks
Explainable AI and mass surveillance system-based healthcare framework to combat COVID-19-like pandemics
Representation learning for mammography mass lesion classification with convolutional neural networks
Digital mammographic tumor classification using transfer learning from deep convolutional neural networks
Improving EEG-based emotion classification using conditional transfer learning
Going deeper with convolutions
Breast cancer classification in automated breast ultrasound using multiview convolutional neural network with transfer learning
Computer-aided diagnosis system for breast ultrasound images using deep learning
Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks
Deep learning approaches for data augmentation and classification of breast masses using ultrasound images
Rethinking the Inception architecture for computer vision
Learning transferable architectures for scalable image recognition
Comparison of transferred deep neural networks in ultrasonic breast masses discrimination
Diagnostic efficiency of the breast ultrasound computer-aided prediction model based on convolutional neural network in breast cancer
Histopathological breast-image classification using local and frequency domains by convolutional neural network
Histopathological breast cancer image classification by deep neural network techniques guided by local clustering
Dataset of breast ultrasound images. Data Brief
Intro to optimization in deep learning: momentum
Visual explanations from deep networks via gradient-based localization

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Funding: Not applicable.
Availability of data and material: Datasets are collected from public repositories [28, 29].
Conflicts of interest: Not applicable.
Code availability: Not applicable.