key: cord-0752766-jj6chy01 authors: Khan, Taha Muthar; Xu, Shengjun; Khan, Zullatun Gull; Uzair chishti, Muhammad title: Implementing Multilabeling, ADASYN, and ReliefF Techniques for Classification of Breast Cancer Diagnostic through Machine Learning: Efficient Computer-Aided Diagnostic System date: 2021-03-22 journal: J Healthc Eng DOI: 10.1155/2021/5577636 sha: 9dfa74a18bb85e7b50347afdfcf89123ed9ff997 doc_id: 752766 cord_uid: jj6chy01 Multilabel recognition of morphological images and detection of cancerous areas are difficult to locate in the scenario of the image redundancy and less resolution. Cancerous tissues are incredibly tiny in various scenarios. Therefore, for automatic classification, the characteristics of cancer patches in the X-ray image are of critical importance. Due to the slight variation between the textures, using just one feature or using a few features contributes to inaccurate classification outcomes. The present study focuses on five different algorithms for extracting features that can extract further different features. The algorithms are GLCM, LBGLCM, LBP, GLRLM, and SFTA from 8 image groups, and then, the extracted feature spaces are combined. The dataset used for classification is most probably imbalanced. Additionally, another focal point is to eradicate the unbalanced data problem by creating more samples using the ADASYN algorithm so that the error rate is minimized and the accuracy is increased. By using the ReliefF algorithm, it skips less contributing features that relieve the burden on the process. Finally, the feedforward neural network is used for the classification of data. The proposed method showed 99.5% micro, 99.5% macro, 0.5% misclassification, 99.5% recall rats, specificity 99.4%, precision 99.5%, and accuracy 99.5%, showing its robustness in these results. To assess the feasibility of the new system, the INbreast database was used. Breast cancer is considered a key health issue in women which is causing a high rate of casualty. e initial diagnosis of breast cancer with mammographic screening and appropriate pharmacological treatments has steadily increased the prognosis of breast cancer [1] . ese include mammography, biopsy, ultrasound image, and thermography [2] . e biopsy is painful procedure and rather expensive. Chemotherapy is usually frailty associated with a psychiatric condition defined as the accumulation of several interactive diseases, impairs, and disability: exhaustion, nausea, inadequate, relatively slow walking speed and physical exercise, and unintended weight loss [3, 4] . So, some doctors recommended dispensing low-dose aspirin before and after the detection of breast cancer [5] . But today's world image recognition methods have an important role to play in the analysis of tumor images by using a machine learning methodology. It uses a random generator, a function extractor, and a classifier to model a doctor's enquiry and construct a personalized questionnaire [6] . Also microwavebased imaging techniques were developed for breast cancer detection [7] . e data mining as well as classification techniques is a well-organized approach of classifying data. Particularly in the medical field, these approaches are commonly useable diagnostics research for decision making. Many classification methods are used in the algorithms of machine learning like decision tree (C45), support vector machine (SVM), and naive Bayes algorithm [8] . Support vector machine (SVM) discriminatory classifier is used to classify hyperplanes for binary groups. But, in particular case, the major drawback is low results for a greater number of characteristics than the number of samples [9] . e decision tree (C4.5) is a hierarchical decision support technique, but its downside is that it is highly unreliable and data-based. A minor shift in data leads to a completely different tree being created [9] . Naive Bayes (NB) and linear discriminant analysis (LDA) are unable to locate a nonlinear structure concealed in high-dimensional results. Secondly, the singularity of inherent matrix is a problem in which the determinant value is zero which leads to nonclassification of matrix [9] . A fuzzy support vector machine (FSVM) was implemented by Nedeljkovic to define and characterize the amount of breast ultrasound [10] . In order to eradicate such errors, algorithms were developed to assist radiologists. erefore, this distinctive attribute of the tissue patches in the image played an important role in classification. e machine can extract five unique features [11] . Balance them by ADASYN [12] and classification is by multilayer perceptron neural network. e advantage multilayer perceptron neural networks (MLPNNs) MLP is an artificial neural network feedforward architecture that is used to design recognition schemes to identify particular patterns. It is nonparametric learning and is enforceable on a noisy input. It is able to model complex nonlinear and high-dimensional problems. You can pick different kernel functions [9] . Finally, classification is used in the doubtful areas into abnormal or normal detection. For several years, the community of medical imaging has made attempts to develop CAD framework. With its arrival, novel challenges started emerging, which now require proper knowledge as well as thorough research. Research has been done on some key modules of CAD. is created a need to develop the computer-aided diagnosis system. It has three aims. First is the detection of the abnormal breast from mammography. e second involves masses' detection, while the third one is meant to differentiate benign from malevolent masses. In this study, the former is focused while it is the essential process of further two attempts. In the previous 10 years, various approaches have been recommended. Milosevic et al. [13] utilized the 20 GLCM features. ey used naive Bayes classifier for sorting, keeping up vector machines as well as k-nearest neighbors. Petrosian et al. [14] investigated the utility of texture characteristics for mass and normal tissue classification based on spatial gray-level dependency (SGLD) matrices. Iseri and Oz [15] developed a new method of extraction of features, i.e., statistical analysis based on multiwindow, in order to detect microcalcification clusters. Nababan et al. [16] use three layers of SECoS with 16 features as proposed for classifying benign and malignant masses. Sigh et al. [17] utilized a support vector device that had texture, shape features, and hierarchical technique for categorizing both malignant and compassionate stacks. Perez et al. [18] talked about experimental evaluation and the theoretical description of an innovative attribute collection method that is called uFilter. Via the integrated mammography data as well as MRI, Yang and Li recognized breast cancer. ey performed information integration by two techniques, i.e., MIP and TPS. In Kinoshita et al. [19] , the mixture of form and texture used provides space for the classification of regular and infected breast lesions based on gray-level cooccurrence matrices (GLCM). Anita and Peter [20] proposed an automated segmentation technique to classify and segment abnormal mass regions on the basis of the maximum cell intensity update. In Peng et al. [21] , for initial breast cancer diagnosis in patients with breast microcalcification lesions, FDG-PET/CT was used. Molloi et al. [22] evaluated the breast density through spectral mammography. Kegelmeyer [23] constructed a tool to identify satellite lesions in mammograms and texture characteristics of computed laws from a chart of the local edge orientations. Gorgel et al. [24] suggested spherical wavelet transform. Mohanty et al. [25] recommended a technique that gets ROI by cropping operation. After ROI was extracted, 2D discrete wavelet transform was combined with GLCM in order to obtain texture-based features. ere are a total of 65 features that were calculated by this combination. Moreover, PCA was also implemented to eradicate redundant features. At the last step, forest optimization algorithm was implemented to get classification results. Abdel-Nasser et al. [26] invented technique for change in temperature on normal and abnormal breast cancer detection by extracting GLCM and 22 features and used learning-to-rank and texture analysis methods. Wang et al. [27] developed an algorithm for classifying benign and malignant masses into their appropriate classes. In their work, there were 16 spectrums that contained 16,777,216 features which were further reduced to 18 features by using PCA. e FD was combined with Jaya for training weights and biases of FNN. e projected method was later named Jaya-FNN. Welch et al. [28] did contrast enhancement and also performed dimensioning by CLAHE based adaptive method as well as histogram equalization. e ROI was extracted by using a bimodel processing algorithm in two levels. Firstly, extraction of normal breast boundary was done and then the abnormal breast boundary was extracted. GLCM method was utilized to get the second level statistical texture features. Shape features included eccentricity, LBP, circularity, and Hog. Intensity features included mean kurtoses and skewness. So the feature's space was reduced by a recursive approach. KNN, support vector machine, and decision tree were used for the classification of desirable functions. From the above literature research, segmentation, feature extraction, and feature selection as well as classification are considered as the major factors for categorizing to get better effectiveness of discovery of breast stacks. Mohanty et al. [29] used 19 (GLCM + GLRLM) features of extraction to classify detection of benign and malignant masses. e category imbalance problem [16, [30] [31] [32] [33] [34] is heavily influenced by machine learning and statistical algorithms. e Heuristic Cover-Sampling Algorithm is a Synthetic Minority Over-sampling Technique (SMOTE). It produces artificial samples from the minority class by parsing existing instances that lie close together. It has been one of the most common methods used for data sample selection for a few weeks [35] . Sampling data approaches are such as Random Over-Sampling (ROS), which replicates extracted features, and Random Under-Sampling (RUS), which removes majority-2 Journal of Healthcare Engineering class samples. In order to adapt to the binary classifier ratio, these methods distort individual counseling [12] . Sampling learning is an active process from datasets through Adaptive Synthetic (ADASYN) algorithm. ADASYN's guiding theory is the use of weighted consideration for the different ethnic groups. e ADASYN approach improves learning in two ways with respect to data distribution: (1) the bias created by the inequality of the class and (2) dynamically altering the boundary of the classification decision to reflect on those samples that are difficult to understand [12] . While each tissue type has its own characteristics for the proposed work, it often hardly differentiates between normal and abnormal cancers. Commonly, cells begin to expand in abnormal cancers and their tissues coloring is visible. Indiscretions have been brought to completion in cell arrays. e cell shows more common in natural tissue and its color is darker. However, there is also a low description of only some low-level dimensions of certain complex systems. In this research, different algorithms were tested to extract features at higher levels on morphological images. So, with extraction algorithms for GLCM features, LBP features, LBGLCM features, GLRLM features, and SFTA features, optimization methods have been attempted in the literature. As an appropriate method to extract features from individual image, each algorithm is applied to the entire dataset. Unbalancing of dataset, therefore, using hand-crafted features will trigger poor performance from the dataset. e unbalanced data problem is eliminated with the ADASYN algorithm [12] , which then deletes irrelevant features by ReliefF that improves training time. Finally, for the classification of data, the feedforward neural network is used. e methodology is adopted to distinguish among eight classes (benign to malignant) according to Breast Imaging-Reporting and Data System (BI-RADS). For such a purpose, preprocessing and segmentation are skipped in this algorithm and restrict the algorithm to four stages, i.e., features extraction, features margin, oversampling, and feature selection and reduction; finally classification is shown in Figure 1 . e model that was mentioned was checked on the INbreast database [36] . For each object, five hand-crafted features (GLCM, GLRLM, LBP, LBGLCM, and SFTA) were extracted from 8 groups of images and their values are stored in a file. For each image, 88 features are obtained by combining these feature vectors. en, with the ADASYN procedure, the feature vectors of 411 images consisting of 88 functions are oversampled. After this step, 1773 function vectors are generated. Using the ReliefF algorithm, these features were layered down to ten subset functions. In the end, feedforward neural network was modified to multilayer and trained on a selected subset to get a result. e Gray Level Cooccurrence Matrix (GLCM). GLCM is a common method of extraction of texture-based features. By performing an operation in the images due to the second-order statistics, the GLCM decides the textural relationship between pixels. For this procedure, two pixels are normally used [37] . e frequency of variations in these measured pixel brightness values is specified by the GLCM. Namely, it reflects the pixel pairs' frequency creation [38] . As seen in Figure 2 , there are several statistical characteristics from a GLCM grey level picture type. e square matrix of features can be denoted by G (i, j). Four distinct forms are used to segment the G matrix into regularized typical forming modes. Such patterns are referred to as crossed directions: vertical, lateral, right, and left paths. is can be determined for both neighboring paths. e Grey Level Cooccurrence Matrix was utilized to extract 22 texture characteristics I-e dissimilarity, association, homogeneity, liveliness dissimilarity, entropy, cluster hue, square variance number, energy, sum variance, sum average, entropy, sum entropy, entropy difference, maximum likelihood, cluster prominence, variance difference, normalized autocorrelation inverse difference moment and measurement details. [39] . A quite effective technique that is responsive to light variations is the extraction algorithm. e LBP method can simply be defined as follows; the image is crossed through a window with a given neighborhood value. And an assignment of an image pixels mark is made. In this step, the threshold is applied according to pixel values adjacent to the middle pixel. e LBP matric is then determined according to clockwise or counter clockwise values in the surrounding neighborhood. us, it comprehensively defined the systemic and statistical pattern of the textural system [40] . e LBP algorithm's most key qualities are resistant to changes in the grey level and statistical versatility in real-time applications, which could be used [41] . A combined approach applied along with Local Binary Pattern (LBP) and GLCM is the LBGLCM feature extraction process. e grey level picture is related to the LBP methodology. Instead, since the acquired LBP texture image, GLCM features are removed. At the feature extraction point, the GLCM technique takes adjacent pixels into consideration. It does not execute any procedure in the picture on other local patterns. Textural and spatial knowledge in the picture is collected in accordance with the LBGLCM process. e availability of the LBGLCM algorithm in image processing applications is improved by the simultaneous acquisition of this information [42] . In extracting the spatial properties of gray level pixels, GLRLM uses higher-level statistical techniques. e structure of the features obtained is two-dimensional. Each value in the matrix reflects the maximum value of the grey level. e characteristics of GLRLM are seven in total. Short-term concentration, long-term focus, and graylevel semi, run-long nonuniformity, take, low gray-level running focus, and overall organization running focus are such high statistical characteristics [43] . Limited computation time and successful attribute extraction are important in texture analysis. e SFTA solution is a methodology that can be tested in this theory. roughout the SFTA process, multiple thresholding techniques transform the image into a binary form. resholds of t1, t2, t3, . . ., tn are rendered. Interclass and in-class variance values are used to determine the threshold sets. e optimal threshold number is added to the representation regions to minimize the in-class variance value. Algorithm. To address unbalanced class allocation issues for classification activities, the ADAYSN approach is useful. is method is applied to all minority classes. In general, ADASYN bases its operation on weighting the examples of the minority classes according to their difficulty of being learned; therefore more synthetic data will be generated from the more difficult samples, and fewer samples in the case of the easier to learn [12] . is sampling method aims to help the classifier in two ways: first, reducing the error produced by the imbalance of the classes and then focusing the synthetic samples only on the difficult samples to learn [45, 46] . To apply the oversampling method in a multiclass problem, all of the sampled minority classes will be nearer to 1 until the imbalanced rate is nearer to 1. For example, the second is the majority class, with 220 samples, and the first class is with 67 samples; the imbalance rate of these classes is 0.3045. e ADASYN process generates synthetic samples before the rate equal to or nearest to 1 in the method generates 154 samples, generating an imbalanced rate � ∼1.3rd class of 24 samples, the imbalance rate is 0.1090, the method generates 192 samples, the fourth class of 13 samples generates an imbalance rate of 0.05909, and the method generates 209 samples. In the fifth class of 8 samples, the imbalance rate is 0.0363 and the method generates 208 samples; in the sixth class of 21 samples, the imbalance rate is 0.0954 and the method generates 207 samples; in the seventh class of 50 samples, the imbalance rate is 0.2272 and the method generates 181 samples, and in the last class of 8 samples, the imbalance rate is 0.0363 and the method generates 211 samples of the minority class. ADASY displays the balance of eight groups in Figure 3 . e multiple classes' classification problem is described. e algorithm representation is shown in Algorithm 1. is stage is of significant importance as rarely distinguishable features are discarded, which increases the computational burden on the modeling process. In the present study, ReliefF algorithm was used for dimensional reduction as ReliefF is proved to be computationally efficient and has been proved to be sensitive to complex patterns of association [47] [48] [49] [50] . It was used to estimate attributes based upon how their values reflect variations between instances close to each other. By using the abovementioned method, 88 raw features were selected and reduced to 10 optimal features because of their participation topmost according to the weightage to get maximum accuracy. e graph shows 99.5%. Accuracy at 10 features and 12 features is 100%. is is shown in Figure 4 . So that is why we select 10 features. ere is not much difference in output, so we skip the remaining ones because that elevates the computational burden of a model. e graph in Figure 4 shows cumulative accuracy against feature numbers which are selected as 10 features that can participate more than total variances. ReliefF is capable of accurately evaluating the quality of attributes with heavy correlations between attributes in classification problems. ey have a global perspective by leveraging local knowledge generated by distinct contexts. ReliefF makes a ranking of features according to weight-age and the ones who are participating the most come in the first predictor rank. Other features contain nearly less contribution toward the last as shown in Figure 5 . So we can select the 10 most powerful features according to their weight-age and skip irrelevant features that elevate the computational burden of the model [47] [48] [49] . Original ReliefF is work done with two classes: difficulties and a stronger class, which deal with noisy data and imperfect data. It is used to calculate feature usability for every feature which can later be applied in order to select features that contain top scores for feature selection. Likewise, ReliefF selects an instance T randomly (Step 3), but later, k observes the closest neighbors of the same class called the nearest hit values of H. So we use the number of nearest neighbors as 3 to mean a positive integer scalar (Step 4), and in the same (1) n s is the majority class, n l is the minority class; (2) for n s ∈ y do (3) % the imbalance index d is calculated % (4) d � n s /n l (5) if d < dth then (6) % where dth is the maximum tolerance to imbalance rate (7) % Calculate the number of synthetic data examples G, where β ∈ [0, 1] % (8) G � (n s − n l ) * β (9) for each example x n ∈ n s do (10) Find k nearest neighbors (∆n) (11) % Calculate the ratio r n % (12) r n � ∆n/K (13) % Normalize r n % (14) r n � r n / a n�1 r n P n (15) % Calculate the number of synthetic data examples that need to be generated % (16) g n � r n * G (17) % Generate g n samples % (18) (X res , Y res ) � append new sample (19) Classification of morphological images of each class and identifying cancerous conditions seem to be very complex. Feedforward neural network is used to classify tumors [41] . is algorithm is the fastest backpropagation algorithm and is strongly recommended to be used as a first choice supervised algorithm and does not need more memory than others. e network has used four hidden layers. Feedforward neural network was developed by using "newff," command. e first hidden layer of the network contained 40 neurons having linear transfer function. e second secret layer held 20 hidden neurons. And the third and fourth secret layers contained 10 and 8 hidden neurons. e four-layer network shown (Layer-4) is the output layer and the remaining three layers are hidden layers (Layer-1, 2, 3 ). e problem under consideration was multilabel classification. For problems with more than two classes, the softmax function is used with multinomial cross-entropy as the cost function; it updated the weight as well as bias values conferring to Levenberg-Marquardt optimization. Data is classified into training, validation, and testing as 60 percent is for preparation, 20 percent is for validation, and the remaining 15 percent is for research. It is fastest for training a moderatesized FNN. It has been deduced that this optimization is used for approaching second direction training speed without the Hessian matrix. On the other hand, training feedforward networks Hessian matrix can be as Gradient will be where J represents Jacobian matrix having network errors derivate regarding biases and weights and network errors represent vector by "e." A network has four layers. Each layer has a matrix of "W" mass, a vector of 'b' bias, and an output vector "a." To differentiate between matrices of weight, vectors of output, etc., in our estimates for each one of these layers, we are adding the layer number as a superscript to the interest variable. In the four-layer network seen in Figure 6 , you can see the use of this layer notation and in the calculations at the bottom of the diagram. e network is shown in Figure 6 , below R1 inputs, S1 neurons first layer, S2 neurons second layer, etc. For different layers, it is common to have different neuron numbers. For each neuron, the constant input 1 is fed to the biases. Notice that each intermediate layer's outputs are the inputs to the following layer. It is also possible to evaluate layer 2 as a one-layer network with S1 inputs, S2 neurons, and a W2 weight matrix of S2 × S1. A1 is the input to layer 2; a2 is the output. Now that we have identified all the vectors and matrices of layer 2 and so on with other layers, the output layer of the network is the last layer. is approach can be adjusting weight as shown in Figure 7 . e proposed model for MATLAB 2017a has been created. It was used on a Core-i7 processor personal computer with 8GB of RAM as well as 2GB of the graphics card. Mammographic photographs have been used for the study of the scheme proposed. ese photographs were taken from the INbreast dataset [26] , created by multiple University of Porto institutions, and made available to the public with the permission of the authors. e dataset had a total of 411 images. e matrix of the images was 3328 × 4084 or 2560 × 3328 pixels, while the images were processed in DICOM format. e present research used 411 mammogram images and was further divided into 6 classes according to the Breast Imaging-Reporting and Data System (BI-RADS) class [26] and the classification is shown in Figure 8 . It is a risk management and quality assurance method developed by the American College of Radiology that offers a generally recognized lexicon and reporting scheme for breast imaging as shown in Table 1 . is refers to mammography, ultrasound, and MRI. e archive is accessible on this web page. http://medicalresearch.inescporto.pt/breastresearch/ind ex.php/Get_INbreast_Database. Input: Training feature values and the class value Output: w of valuations of the makings of features. Step 1 Set all weights w[A]: � 0.0; Step 2 For i: � 1 to m do begin Step 3 Randomly select an instance ri; Step 4 Find k-nearest hits h j ; Step 5 For each class C class(ri) do Step 6 From class C find k nearest misses mj (c); Step 7 For A: � 1 to a Step 9 [(c)1 − p(class ri diff(a, rihj))]kj � 1(m, k)C ≠ class ri Step 10 End Lw4, 2 Multilabel classification and the closely related problem of multioutput classification are variations of the classification problem where several labels can be applied to each case. Multilabel classification is a generalization of multiclass classification [51] , which is the single-label problem of categorizing instances into exactly one of more than two classes; there is no restriction on how many of the classes an instance may be allocated to in the multilabel problem. In our case of Multiclass Classification Issue to be an 8-Class Classification Issue since we have a dataset that has eight class names, sensitivity, precision, and consistency are very critical for the system. Sensitivity reflected the proportion of both positive and genuinely positive events [52] . Specificity showed the percentage of true negative classified cases. Meanwhile, accuracy depicted the percentage of true positive and true negative correctly classified mammograms. e confusion matrix represented in Figure 8 shows the actual and predicted class count obtained by the classifier. But multiclass classification is an implementation of unlikely binary classification of mammograms. Since the classes here are not positive or negative, at first, it may be a little difficult to find TP, TN, FP, and FN because there are no positive or negative grades, but it is pretty simple. What we did here, for each person class, was to find TP, TN, FP, and FN. We take class BI-RADS 1 and then let us see the values of the metrics from the confusion matrix. Confusion matrix is represented in Figure 8 , For the true negative (TN) class, it is not difficult to take the number of columns and rows and deduct the column and row class. e assessment metrics written below are those of the following. Sensitivity is used to deal with positive cases only. e ratio of classified to the actual positive cases is shown by sensitivity. When sensitivity is high, the false negative rate is less. Specificity deals with negative cases only. It is used to depict the ratio of actual negative cases to classified ones. When specificity is greater, the false positive rate is less. Positive predictive value (PPV) is used to deal with positive predictive cases only. e ratio of classified to the actual positive predictive cases is shown by PPV. positive predictive value(PPV) � TP (TP + FP) . Negative predictive value (NPV) deals with negative predictive cases only. It is used to depict the ratio of actual negative predictive cases to classified ones. When NPV is greater, the false positive predictive rate is less. negative predictive value(NPV) � TN (TN + FN) . Accuracy deals with the correctness of classification results. e system is considered efficient when accuracy is more. Data is classified into training, validation, and testing by using 10 global features, in which 60% is for training, 20% is considered for validation, and the remaining 20% is for testing. After applying oversampling ADASYN, 1773 images are used, in which 1764 images are correctly classified and 6 images are misclassified. ree phases are included in the proposed framework: feature extraction, feature selection, and classification. Firstly, with 411 samples, the classification results are examined in raw form for 88 features [42] . And then the ReliefF algorithm chooses the 10 most contributory characteristics [50] . rough comparing the output of the classification processes by the FNN algorithm, the contribution of the oversampling system is explored. And there results of each class are shown in Table 2. And individual class accuracy is defined in Figure 9 . Figure 9 shows individual accuracy of each class with samples of classes. e methods involve four steps: feature extraction, oversampling, feature selection, and classification. Here is the oversampling method ADASYN which helps balance all classes until the imbalanced rate will be closest to 1. And it also prevents overfit problem. ADASYN bases its operation on weighting the examples of the minority classes according to their difficulty of being learned; therefore more synthetic data will be generated from the more difficult samples, and fewer samples are in the case of the easier to learn [25] . e majority class only provides information to quantify the degree of class imbalance, and the number of synthetic data examples to be generated for the minority class in Table 3 shows the output parameters of the 10 most significant feature vectors consisting of 1773 samples of the ADAYSN algorithm. Accuracy of each class is shown in Figure 10 . Growing the minority groups with synthetic samples is shown to have a beneficial impact on the accuracy of classification. And the results of each class are shown in Table 3 . e definition made with the inclusion of synthetic samples is seen to generate a contribution of more. One of the most relevant factors is to eliminate the imbalance between classes shown in Table 4 . After accuracy level improved, in the confusion matrix, the correctly classified cases are shown in Figure 11 . Diagonal is within green color, while misclassified cases are shown in red color. e last column of the confusion matrix shows the sensitivity, precision, and accuracy of the model. e training state is presented in Figure 12 . e gradient is the value of the backpropagation gradient on every iteration. Epoch shows how many iterations should be shown for training purposes. And MU is momentum update which includes weight update expression to avoid the problem of a local minimum. e performance plot for training is shown in Figure 13 . Training mean square error (mse) is downloading which shows perfect training. Best training performance shows few errors determined in Figure 13 , which shows 0.0012691 error estimates that are minimal error rate. e network's ability to estimate the model target is evaluated by showing the regression plot in Figure 14 . Regression evaluation can support a model that links between a dependent variable (which you are seeking to forecast) and one or better independent variables (the input of the model). Regression evaluation can determine if there is a considerable link between the independent variables and the dependent variable and the weight of the impact-when the independent variables move, by how much you can predict the dependent variable to proceed. Here we use linear regression to achieve the number of outputs. Linear regression is suitable for dependent variables that are stable and can be fitted with a linear function (straight line). e plot shows that the linear regression of the targets relatively achieves the numbers of outputs. e plot shows that the linear regression of the training, validation, and testing and overall targets of comparison achieves the number of outputs. ReliefF optimal features with raw samples Samples Accuracy Many researchers have worked on those key modules of CAD. is created a need to develop a computer-aided diagnosis system. It has two aims. First is the disclosure of abnormal breasts from mammographic images by an innovative approach. And the second is various classes' classification that uniquely explains all classes according to BI-RADS. For tumor identification, few features contribute to a poor classification due to a slight variation in textures. e present thesis focuses on 5 different algorithms for the extraction of features that can extract different features. en, because of the overfitting problem, we purpose the oversampling technique by ADASYN that increases the number of samples which also increases algorithm complexity and train time. So for that, we apply feature selection techniques such as ReliefF. And for multiple class classification, we use FNN and the results are shown by a confusion matrix. Table 5 shows that Abdel-Nasser et al. [2] used local database by extracting 20 optimal features and SVM classifier to obtain accuracy of 88.0%. Pérez et al. [18] in 2011 used DDSM dataset and extracted GLCM + GLCLM 19 features from a region of interest (ROI) to obtain accuracy of 94.9% and furthermore on second hand selected 12 most contribution features of GLCM + GLCLM and got 92.3% accuracy. For classifying tumor region, they used a Boolean vector algorithm. Nababan et al. [16] used Mini MIAS dataset and got 92.26% sensitivity, 92.28% specificity, and 92.27% accuracy. Wang's 16,777,216 features were further reduced to 18 features by using PCA and FFN. Frisk et al. [5] used INbreast dataset and extracting GLCM 16 features from ROI and for classification used SECoS techniques obtaining 82.98% accuracy. Alam and Faruqui [39] expanded the previous work in 2012 and classified by decision tree and obtained accuracy of 96.7% by extracting 19 features and also obtained accuracy of 93.3% by extracting 12 most contributed features. Ozturk et al. used shrunken features which includes GLCM + LBGLCM + GLRLM + SFTA, for oversampling by SMOTE algorithms and classification use and PCA after which they got 94% accuracy for a COVID-19 dataset. We purposed a model by using INbreast dataset; the proposed model focuses on four steps: global feature extraction, oversampling method, feature selection method, and lastly classification and we got 99.5% sensitivity, 99.4% specificity, and 99.5% accuracy. Various investigations have been carried out in the field of medicine to study medical disorders and thus to find their correct diagnosis. For this purpose, in the present work, data Table 5 : Result of proposed system and comparison with methodology by using features. Methodology Features Accuracy (%) Abdel-Nasser et al. [2] Local database + GLCM + SVM 20 88.0% Pérez et al. [18] DDSM + ROI + GLCM + GLCLM + Boolean vector 19 12 94.9% 92.3% Nababan et al. [16] Mini MIAS + WFRFT + Jaya-FNN 18 92.27% Nababan [5] INbreast + ROI + GLCM + SECoS 16 82.98% Mohanty et al. [49] DDSM + ROI + GLCM + GLCLM + decision tree 19 12 96.7% 93.6% Ozturk et al. [42] Shrunken features + SMOTE + PCA 20 94.23% mining techniques were considered. By utilizing fewer numbers of features, the computational time was reduced without dropping the accuracy of diagnosis. Instead of using complex systems to strengthen the classification accuracy, an effort was made to adopt a simple method to produce a significant result. e results showed 99.5% accuracy which proved the effectiveness as well as the robustness of the proposed system. e archive is accessible on the web page http:// medicalresearch.inescporto.pt/breastresearch/index.php/ Get_INbreast_Database. e authors declare no conflicts of interest in this research. Is breast cancer survival improving Temporal mammogram image registration using optimized curvilinear coordinates Frailty in older adults: evidence for a phenotype e longitudinal relationship between immune cell profiles and frailty in patients with breast cancer receiving chemotherapy No association between low-dose aspirin use and breast cancer outcomes overall: a Swedish population-based study Personalized body constitution inquiry based on machine learning Review of microwaves techniques for breast cancer detection Using machine learning algorithms for breast cancer risk prediction and diagnosis Foundation and methodologies in computer-aided diagnosis systems for breast cancer detection Image classification based on fuzzy logic Application of feature extraction and classification methods for histopathological image using ADASYN: adaptive synthetic sampling approach for imbalanced learning Comparative analysis of breast cancer detection in mammograms and thermograms Computer-aided diagnosis in mammography: classification of mass and normal tissue by texture analysis Computer aided detection of microcalcification clusters in mammogram images with machine learning approach Breast cancer identification on digital mammogram using evolving connectionist systems Mammogram mass classification using support vector machine with texture, shape features and hierarchical centroid method Improving the Mann-Whitney statistical test for feature selection: an approach in breast cancer diagnosis on mammography Detection and characterization of mammographic masses by artificial neural network," in Digital Mammography Mammogram segmentation using maximal cell strength updation in cellular automata FDG-PET/CT detection of very early breast cancer in women with breast microcalcification lesions found in mammography screening Breast density evaluation using spectral mammography, radiologist reader assessment, and segmentation techniques Evaluation of stellate lesion detection in a standard mammogram data set Computer-aided classification of breast masses in mammogram images based on spherical wavelet transform and support vector machines Mammogram classification using contourlet features with forest optimization-based feature selection approach Breast cancer detection in thermal infrared images using representation learning and texture analysis methods Abnormal breast detection in mammogram images by feedforward neural network trained by Jaya algorithm Breast-cancer tumor size, overdiagnosis, and mammography screening effectiveness Classifying benign and malignant mass using GLCM and GLRLM based texture features from mammogram Focal loss for dense object detection Cost-sensitive learning of deep feature representations from imbalanced data Identification of breast cancer using integrated information from MRI and mammography Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain An analysis of local and global solutions to address big data imbalanced classification: a case study with SMOTE preprocessing Addressing the big data multi-class imbalance problem with oversampling and deep learning neural networks INbreast Adaptive gray level run length features from class distance matrices Transition temperatures of thermotropic liquid crystals from the local binary gray level cooccurrence matrix Optimized calculations of haralick texture features Description of interest regions with local binary patterns Automatic age classification with LBP Classification of coronavirus images using shrunken features Local relative GLRLM-based texture feature extraction for classifying ultrasound medical images Fast feature selection using fractal dimension Farthest SMOTE: a modified SMOTE approach Handling class imbalance problem using oversampling techniques: a review A practical approach to feature selection Relief-based feature selection: introduction and review Texture-based features for classification of mammograms using decision tree ReliefF for multi-label feature selection Squares: supporting interactive performance analysis for multiclass classifiers Application of machine learning on brain cancer multiclass classification Acknowledgments e authors thank their supervisor Mr. Shengjun Xu who gave consistent support, motivation, and expert guidance to complete the research work and also the Chinese Government who gave this support and funding. is research was done under the supervision of Shengjun Xu and funded by the National Natural Science Foundation of China (Nos. 51678470 and 61803293) and also supported by the National Natural Science Foundation of Shaanxi Province, China (2019JQ-760).