key: cord-0540081-8u5uhi7q
authors: Mondal, Chayan; Hasan, Md. Kamrul; Jawad, Md. Tasnim; Dutta, Aishwariya; Islam, Md. Rabiul; Awal, Md. Abdul; Ahmad, Mohiuddin
title: Acute Lymphoblastic Leukemia Detection from Microscopic Images Using Weighted Ensemble of Convolutional Neural Networks
date: 2021-05-09
journal: nan
DOI: nan
sha: 6352f60ebd22eafd088ef2375274cd33c5fa386a
doc_id: 540081
cord_uid: 8u5uhi7q

Acute Lymphoblastic Leukemia (ALL) is a blood cell cancer characterized by numerous immature lymphocytes. Even though automation in ALL prognosis is an essential aspect of cancer diagnosis, it is challenging due to the morphological similarity between malignant and normal cells. The traditional ALL classification strategy requires experienced pathologists to read the cell images carefully, which is arduous, time-consuming, and often subject to inter-observer variation. This article automates the ALL detection task from microscopic cell images, employing deep Convolutional Neural Networks (CNNs). We explore the weighted ensemble of different deep CNNs to recommend a better ALL cell classifier. The weights for the ensemble candidate models are estimated from their corresponding metrics, such as accuracy, F1-score, AUC, and kappa values. Various data augmentations and pre-processing steps are incorporated to achieve better generalization of the network. We utilize the publicly available C-NMC-2019 ALL dataset to conduct all the experiments. Our proposed weighted ensemble model, using the kappa values of the ensemble candidates as their weights, achieves a weighted F1-score of 88.6 %, a balanced accuracy of 86.2 %, and an AUC of 0.941 on the preliminary test set. The qualitative results displaying the gradient class activation maps confirm that the introduced model has a concentrated learned region.
In contrast, the ensemble candidate models, such as Xception, VGG-16, DenseNet-121, MobileNet, and InceptionResNet-V2, separately produce coarse and scattered learned areas for most example cases. Since the proposed kappa value-based weighted ensemble yields a better result for the task addressed in this article, it can be explored in other domains of medical diagnostic applications.

Cancer, a group of uncommon and distinctive diseases marked by abnormal and uncontrolled cell growth, is one of the deadliest diseases [17]. In 2020, the World Health Organization (WHO) reported that approximately 19.3 million people were diagnosed with cancer and 10 million people died of it, which is almost 1.6 times greater than in 2000 [81]. The number of affected people is expected to be around 50 percent higher in 2040 than now [81]. Among the various types of cancer, one of the most common types of childhood cancer is Acute Lymphoblastic Leukemia (ALL), which affects the White Blood Cells (WBCs) [6]. ALL patients have an excessive amount of premature WBCs in their bone marrow, and the disease can spread to other organs, such as the spleen, liver, lymph nodes, central nervous system, and testicles [1]. Although the leading causes of ALL are still unknown, several factors, such as exposure to severe radiation and chemicals such as benzene, and infection with T-cell lymphoma, can increase the likelihood of developing ALL [5]. Almost 55.0 % of worldwide ALL cases occur in the Asia Pacific region [69]. According to WHO, the total number of ALL cases is 57377, which is 21.9 % of the worldwide childhood cancer cases in 2020 [80].

… following section.

This section presents the review of current CAP methods for the analysis of ALL, where we first discuss the ML-based systems and then the DL-based approaches.

ML-based methods. Mohapatra et al.
[56] proposed a fuzzy-based color segmentation method to segregate leukocytes from other blood components, followed by the extraction of nucleus shape and texture as discriminative features. Finally, the authors applied a Support Vector Machine (SVM) [15] to detect leukemia in the blood cells. The k-means Clustering (KMC)-based segmentation [51] was employed by Madhukar et al. [52] to extract the leukocytes' nuclei using color-based clustering. Different types of features, such as shape (area, perimeter, compactness, solidity, eccentricity, elongation, form-factor), GLCM [58] (energy, contrast, entropy, correlation), and fractal dimension, were extracted from the segmented images. Finally, they applied the SVM classifier, utilizing K-fold, Hold-out, and Leave-one-out cross-validation techniques. Joshi et al. [38] developed a blood slide-image segmentation method followed by a feature extraction (area, perimeter, circularity, etc.) policy for detecting leukemia. The authors utilized the k-Nearest Neighbor (KNN) [21] classifier to distinguish blast cells from normal white blood cells among the lymphocytes. Mishra et al. [55] proposed a discrete orthonormal S-transform [71]-based feature extraction followed by a hybrid Principal Component Analysis (PCA) and linear discriminant analysis-based feature reduction approach for a lymphoblastic classification scheme. Finally, the authors classified those reduced features using an AdaBoost-based Random Forest (ADBRF) [25] classifier. The authors in [53] evaluated four machine learning algorithms, namely classification and regression trees (CART), Random Forest (RF), the gradient boosted machine (GBM) engine [18], and the C5.0 decision tree algorithm [45]. Their experiment demonstrated the superior performance of the CART method. Fathi et al. [14] produced an integrated approach combining PCA, neuro-fuzzy, and the group method of data handling (GMDH) to diagnose two types of leukemia, namely ALL and acute myeloid leukemia. Kashef et al.
[39] compared different ML algorithms, such as decision tree [25], SVM, linear discriminant analysis, multinomial linear regression, gradient boosting machine, RF, and XGBoost [25], where the XGBoost algorithm exhibited the best results. The authors in [16] developed classification and detection algorithms based on K-means image segmentation and marker-controlled segmentation, where a multi-class SVM was used as the classifier. Table 1 shows a summary of several ML-based models for ALL classification with their respective pre-processing, utilized datasets, and classification results in terms of accuracy.

… [63]. According to weighted majority voting, they corrected the initial labels of the test cell images. The authors in [3] proposed a ten-layer CNN architecture to detect ALL automatically. In [62], the authors compared three different deep learning-based models, namely AlexNet, GoogleNet [75], and a VGG classifier, to detect lymphoblast cells. Recently, Gehlot et al. [17] developed the SDCT-AuxNetθ classifier that uses features from a CNN and other auxiliary classifiers. Rather than traditional RGB images, stain deconvolved quantity images were utilized in their work. Table 2 summarizes several DL-based models for ALL classification with their respective pre-processing, utilized datasets, and classification results in terms of F1-score.

The above discussions in Section 1.2 on automatic ALL detection from microscopic images indicate that different CNN-based DL methods are the most widely adopted nowadays, as they alleviate the necessity of handcrafted feature extraction (see details in Table 1 and Table 2). Although many articles have already been published, there is still room for performance improvements with better generalizability of the trained model.
Moreover, CNN-based approaches often lack sufficient data to avoid overfitting, and an ensemble of different CNN architectures relieves this data scarcity limitation, as demonstrated in various articles [10, 20, 41, 50, 66, 83]. With this in mind, this article intends to contribute to the exploration of building a robust ensemble model for ALL classification, incorporating different pre-processing steps. We propose to aggregate the outputs of the ensemble candidate models, considering their corresponding achievements. We therefore propose a weighted ensemble model and conduct an ablation study to determine the best weight metric. We perform center-cropping of the original input images to enhance the detection results. Center-cropping enables the classifier to discover the abstract region and detailed structural information while bypassing neighboring background areas.

This section illustrates the materials and methodology employed for the ALL classification.

The utilized datasets were released in the medical imaging challenge named C-NMC-2019 [19], organized by the IEEE International Symposium on Biomedical Imaging (ISBI), which contains 118 subjects with 69 ALL patients and 49 Hem patients. The detailed information of the datasets is presented in Table 3. In our proposed model, the training dataset is split into train and validation sets, and the final prediction is made only on the preliminary test set, as shown in Table 3. The images have a resolution of 450 × 450 pixels. Several sample images of the C-NMC-2019 dataset are displayed in Fig. 2. Table 3 shows that the dataset is imbalanced: the training set contains around 2.15 times more cancer cell images than normal cell images, which biases the classifier towards the ALL class. Such bias due to data imbalance is alleviated in the proposed framework by applying the two following techniques (see details in Section 2.1.2).
The crucial pre-processing strategies of our proposed system, which ensure a better ALL prognosis system, are briefly described below.

Almost every image in the utilized dataset contains the region of interest in the center position on a black background (see Fig. 2). Therefore, we have cropped the images centrally to a size of 300 × 300 pixels to decrease the overall dimension of the input data, which makes learning a classifier faster and easier by focusing on the region of interest [60].

Class imbalance is a common phenomenon in the medical imaging domain, as manually annotated images are very complex and arduous to obtain [20, 23]. Such a class imbalance can be partially overcome using different algorithmic-level approaches. We have used the random oversampling technique, which randomly replicates samples of the minority class and adds them to the training samples to balance the imbalanced dataset. In our proposed model, the Hem class was oversampled to 5822 images, and a total of 11644 images were used during the training process. Different data augmentation techniques, such as horizontal and vertical flipping, rotation, zooming, and shifting, are applied during the training process to enhance the model's performance and build a generic model.

As mentioned in Section 1.2, CNN-based methods outperform ML-based methods and radiologists, with high values of balanced accuracy, as shown in [40, 61]. However, a single CNN may be limited when employed with highly variable and distinctive image datasets with limited samples. Transfer learning from a pre-trained model, which was previously trained on a large dataset, is becoming increasingly popular because it allows learned feature maps to be reused without having a large dataset.
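The center-cropping and random oversampling steps described in this section can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `center_crop` and `random_oversample`, the fixed seed, and the toy arrays are hypothetical, and only the 450 × 450 input size, the 300 × 300 crop size, and the idea of replicating minority-class samples come from the text.

```python
import numpy as np


def center_crop(image, size=300):
    """Return the central size x size window of an (H, W, C) image array."""
    height, width = image.shape[:2]
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop")
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]


def random_oversample(images, labels, seed=0):
    """Replicate randomly chosen samples of each smaller class until all
    classes have as many samples as the largest one."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    indices = [np.arange(len(labels))]  # keep every original sample
    for cls, count in zip(classes, counts):
        if count < target:
            pool = np.flatnonzero(labels == cls)
            # Draw the missing samples with replacement from the minority pool.
            indices.append(rng.choice(pool, size=target - count, replace=True))
    indices = np.concatenate(indices)
    rng.shuffle(indices)
    return images[indices], labels[indices]


# Toy stand-in for the dataset (assumption): six blank 450 x 450 RGB images,
# four labelled ALL (1) and two labelled Hem (0).
toy_images = np.zeros((6, 450, 450, 3), dtype=np.uint8)
toy_labels = np.array([1, 1, 1, 1, 0, 0])

cropped = np.stack([center_crop(img) for img in toy_images])
balanced_images, balanced_labels = random_oversample(cropped, toy_labels)
print(cropped.shape, np.bincount(balanced_labels))
```

On the toy data, the Hem class is replicated until both classes hold four samples. Oversampling should be applied to the training split only, so that duplicated images do not leak into the validation or test sets.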
In this context, we adopted five pre-trained networks, namely VGG-16 [67], Xception [8], MobileNet [33], InceptionResNet-V2 [74], and DenseNet-121 [35], to apply transfer learning and build an ensemble classifier that categorizes ALL and Hem white blood cell images.

… 3). Such a configuration allows the network to capture richer information with lower computational complexity. In VGG-16, five max-pooling layers carry out spatial pooling with a (2 × 2) kernel, which downsamples the input by a factor of 2, passing the maximum value in each (2 × 2) neighborhood to the output. The VGG-16 ends with three fully connected layers followed by a 2-node softmax layer.

Xception ($CNN_2$). Xception Chollet [8] is …

InceptionResNet ($CNN_4$). The InceptionResNet is a deep neural network that combines the Inception architecture [74] with the residual connections introduced by He et al. [31] in 2016. It has a hybrid inception module inspired by the ResNet, adding the output of the convolution operation of the inception module to the input. In this network, the pooling operation inside the main inception modules is replaced in favor of the residual connections.

DenseNet ($CNN_5$). The DenseNet is a memory-saving architecture with high computational efficiency, which concatenates the feature maps of all previous layers as the inputs to the following layers [10]. DenseNets have remarkable benefits: they alleviate the vanishing gradient problem, encourage feature reuse, strengthen feature propagation, and significantly reduce the number of parameters. DenseNets consist of Dense blocks, within which the dimensions of the feature maps remain constant while the number of filters changes between blocks, and Transition layers, which take care of the downsampling by applying batch normalization, 1 × 1 convolution, and 2 × 2 pooling layers.

Ensemble's Strategies. Esteva et al.
[13] proved that CNNs could outperform a human expert in a classification task after an exhaustive learning phase on a huge annotated training set. However, in many cases, a sufficient number of annotated images (ground truth) is not available, so the accuracy should be improved by other approaches. The fields of decision making and risk analysis, where information is derived from several experts and aggregated by a decision-maker, have well-established literature [37, 43]. In general, aggregating the opinions of the experts increases the precision of the forecast. To achieve the highest possible accuracy in our image classification scenario, we have investigated and elaborated an automated method based on an ensemble of CNNs.

To perform the aggregation for building an ensemble classifier, the outputs of the classification layers have been considered, which use the output of the fully-connected layers to determine probability values for each class ($n = 2$). A CNN ascribes $n$ probability values $P_j \in \mathbb{R}$ to an unseen test image, where $P_j \in [0, 1]$, $\forall j = 1, 2$, and $\sum_{j=1}^{n} P_j = 1$. In ensemble modeling, we have to find the probabilities $P_j$, where $P_j \in [0, 1]$, $\forall j = 1, 2$, and $\sum_{j=1}^{n} P_j = 1$, for each test image from the probability values of the individual CNN architectures. The possible ensemble approaches are discussed in the following paragraphs.

Simple Averaging of Probabilities (SAP). Averaging the individual class confidence values is one of the most commonly used ensemble models [20, 25], which can be expressed as in Eq. 1:

$$P_j = \frac{1}{N} \sum_{k=1}^{N} P_{jk}, \tag{1}$$

where $P_{jk}$ stands for the probability assigned by $CNN_k$ that a test image belongs to class $j$, and $N$ is the number of CNN models ($N = 5$). Unfortunately, an image may be misclassified by the SAP technique if a model with low overall accuracy treats a test image with high confidence, while the other models also provide low but non-zero confidence values to the same wrong class [20].
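The averaging rule and the failure case described above can be illustrated with a short sketch. The function name `simple_average` and the probability values are hypothetical, chosen only to show how one confidently wrong model can outvote four moderately correct ones; they are not results from the paper.

```python
def simple_average(model_probs):
    """Average per-class probabilities over models for one test image.

    model_probs[k][j] is the probability that model k assigns to class j.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [
        sum(probs[j] for probs in model_probs) / n_models
        for j in range(n_classes)
    ]


# Hypothetical softmax outputs [P(ALL), P(Hem)] of five models for one ALL image.
# Four models lean towards the correct class with moderate confidence,
# while the fifth is confidently wrong.
model_probs = [
    [0.60, 0.40],
    [0.60, 0.40],
    [0.60, 0.40],
    [0.60, 0.40],
    [0.02, 0.98],
]

averaged = simple_average(model_probs)
predicted_class = max(range(len(averaged)), key=averaged.__getitem__)
print(averaged, predicted_class)
```

With these toy numbers the averaged probability of the Hem class slightly exceeds that of the ALL class, so the image is assigned to the wrong class even though four of the five models favored the correct one.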
In the weighted ensemble, the class probabilities of the candidate models are aggregated as

$$P_j = \frac{\sum_{k=1}^{N} W_k P_{jk}}{\sum_{k=1}^{N} W_k},$$

where $W_k$ denotes the weight of each $CNN_k$, $\forall k = 1, \dots, N$ ($N = 5$). We have used four evaluation scores, namely accuracy, AUC, F1-score, and Cohen's kappa, as weights, denoted as $W_k^{acc}$, $W_k^{auc}$, $W_k^{f1}$, and $W_k^{kappa}$, respectively. The term $\sum_{k=1}^{N} W_k$ normalizes $P_j$ to ensure that $P_j \in [0, 1]$, $\forall j = 1, 2$, and $\sum_{j=1}^{n} P_j = 1$.

We employ the Adamax optimizer [42] with an initial learning rate of 0.0002 to train all five CNN models. The values of $\beta_1$ and $\beta_2$ are set to 0.9 and 0.999, respectively. Sometimes, a monotonic reduction of the learning rate can cause a model to get stuck in either local minima or saddle points. A cyclic learning rate policy [68] is used to cycle the learning rate between two boundaries, 0.0000001 and 0.002. The "triangular2" policy shown in Fig. 3 is applied, and the step size is set to StepSize = 6 × IterPerEpoch, where IterPerEpoch denotes the number of iterations per epoch. Categorical cross-entropy is employed as the loss function, and accuracy is chosen as the metric to train our models.

… The weighted F1-score (WFS) is computed as

$$WFS = \frac{1}{N} \sum_{i} n(c_i)\, FS(c_i),$$

where $FS(c_i)$ is the F1-score of the $i$-th class, $n(c_i)$ is the number of test images in the $i$-th class, and $N$ is the total number of unseen test images.

This section demonstrates and interprets the results obtained from comprehensive experiments … with several recent results for the same dataset and task.

The sample images have been center-cropped using the nearest neighbor interpolation technique to eliminate the black border regions and provide a better area of interest, as illustrated in Fig. 4. Such center-cropping to a size of 300 × 300 pixels reduces the surrounding black background without distorting the original texture, shape, and other information (see Fig. 4). Table 4 presents the ALL classification results for these two … Table 4) also confirm that all the individual CNN models improve their respective performance when the center-cropped images are used as input.
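For reference, the weighted aggregation defined earlier in this section, in which each model's class probabilities are scaled by a per-model score and divided by the sum of the scores, can be sketched as below. The function name `weighted_ensemble`, the probability values, and the kappa scores are all hypothetical and serve only as an illustration; they are not values reported in the paper.

```python
def weighted_ensemble(model_probs, weights):
    """Weighted average of per-class probabilities for one test image.

    model_probs[k][j] is the probability that model k assigns to class j,
    and weights[k] is the score (e.g. Cohen's kappa) used to weight model k.
    """
    if len(model_probs) != len(weights):
        raise ValueError("one weight is required per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[j] for w, probs in zip(weights, model_probs)) / total
        for j in range(n_classes)
    ]


# Hypothetical softmax outputs [P(ALL), P(Hem)] of five models for one ALL image;
# the fifth model is confidently wrong.
model_probs = [
    [0.60, 0.40],
    [0.60, 0.40],
    [0.60, 0.40],
    [0.60, 0.40],
    [0.02, 0.98],
]
# Hypothetical kappa scores; the fifth model is the weakest.
kappa_weights = [0.70, 0.68, 0.72, 0.66, 0.30]

fused = weighted_ensemble(model_probs, kappa_weights)
uniform = weighted_ensemble(model_probs, [1.0] * len(model_probs))
print(fused, uniform)
```

In this toy case the kappa-weighted result favors the ALL class, whereas uniform weights, which reduce the rule to simple averaging, favor the Hem class. Dividing by the sum of the weights keeps the fused values a valid probability distribution.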
The ROC curves in … Table 4. … beats all the remaining models, providing the best ROC curve with the maximum AUC value. In the end, the proposed $WEN_{kappa}$ model produces the best ALL classification results when given the 300 × 300 pixel (center-cropped) inputs, as experimentally verified by the ROC curves in Fig. 5. The detailed class-wise performances of ALL classification by the two best-performing classifiers with the center-cropped inputs, namely Xception among the individual CNNs and the kappa-based weighted ensemble ($WEN_{kappa}$) among the proposed fusion models, are shown in Table 5 (left) and Table 5 (right), respectively. … and Choo [41] with a very significant margin of 6.9 %.

This article proposed and developed an automated CNN-based acute lymphoblastic leukemia detection framework for early diagnosis, combining center-cropping, image augmentations, and class rebalancing. It was experimentally verified that the center-cropped images, rather than the whole images, yield more salient and discriminative features from the CNNs, leading to improved ALL detection. The ensemble model for the image … direction will also focus on investigating the effect of data imbalance and accounting fully for the subject information, assuming that DL models can be adopted in more and more …

Figure 9: The comparison of several ALL detection methods (our proposed and recently published) on the same C-NMC dataset and the same task, showing the weighted F1-score (WFS). Methods named in the figure: Xie et al. [84], Marzahl et al. [53], Verma and Singh [77], Shi et al. [65], Ding et al. [9], Shah et al. [64], Kulhalli et al. [45], Liu and Long [49], Khan …
[1] About Acute Lymphocytic Leukemia (ALL)
[2] Acute lymphocytic leukemia detection and diagnosis
[3] A convolutional neural network-based learning approach to acute lymphoblastic leukaemia detection with automated feature extraction
[4] Deep learning based computer-aided diagnosis systems for diabetic retinopathy: A survey
[5] Causes, risk factors, and prevention
[6] Types of Cancer that Develop in Children
[7] Acute Lymphoblastic Leukemia (ALL)
[8] Xception: Deep learning with depthwise separable convolutions
[9] Imagenet: A large-scale hierarchical image database
[10] Deep learning for classifying of white blood cancer
[11] Sd-layer: stain deconvolutional layer for cnns in medical microscopic imaging
[12] Skin lesion classification using convolutional neural network for melanoma recognition. medRxiv
[13] Dermatologist-level classification of skin cancer with deep neural networks
[14] Design of an integrated model for diagnosis and classification of pediatric acute leukemia using machine learning
[15] Support vector machine classification and validation of cancer tissue samples using microarray expression data
[16] Automatic early detection and classification of leukemia from microscopic blood image
[17] Sdct-auxnetθ: Dct augmented stain deconvolutional cnn with auxiliary classifier for cancer diagnosis
[18] gbm: Generalized boosted regression models
[19] Classification of normal vs malignant cells in b-all white blood cancer microscopic images
[20] Skin lesion classification with ensembles of deep convolutional neural networks
[21] Prediction of epileptic seizure by analysing time series eeg signal using k-nn classifier
[22] Covid-19 identification from volumetric chest ct scans using a progressively resized 3d-cnn incorporating segmentation, augmentation, and class-rebalancing
[23] Dermo-doctor: A web application for detection and recognition of the skin lesion using a deep convolutional neural network
[24] Challenges of deep learning methods for covid-19 detection using public datasets
[25] Diabetes prediction using ensembling of different machine learning classifiers
[26] Drnet: Segmentation and localization of optic disc and fovea from diabetic retinopathy image
[27] Automatic mass classification in breast using transfer learning of deep convolutional neural network and support vector machine
[28] Detection, segmentation, and 3d pose estimation of surgical tools using convolutional neural networks and algebraic geometry
[29] Dsnet: Automatic dermoscopic skin lesion segmentation
[30] Dermoexpert: Skin lesion classification using a hybrid convolutional neural network through segmentation, transfer learning, and augmentation. medRxiv
[31] Deep residual learning for image recognition
[32] Classification of normal versus malignant cells in b-all white blood cancer microscopic images
[33] Mobilenets: Efficient convolutional neural networks for mobile vision applications
[34] Squeeze-and-excitation networks
[35] Densely connected convolutional networks
[36] Review of mri-based brain tumor image segmentation using deep learning methods
[37] Methods for combining experts' probability assessments
[38] White blood cells segmentation and classification to detect acute leukemia
[39] Treatment outcome classification of pediatric acute lymphoblastic leukemia patients with clinical and medical data using machine learning: A case study at mahak hospital
[40] Identifying medical diagnoses and treatable diseases by image-based deep learning
[41] Classification of cancer microscopic images via convolutional neural networks, in: ISBI 2019 C-NMC Challenge: Classification in Cancer Cell Imaging
[42] Adam: A method for stochastic optimization
[43] On combining classifiers
[44] Imagenet classification with deep convolutional neural networks
[45] C50: C5.0 decision trees and rule-based models
[46] Toward automated classification of b-acute lymphoblastic leukemia
[47] All-idb: The acute lymphoblastic leukemia image database for image processing
[48] Applications of machine learning and artificial intelligence for covid-19 (sars-cov-2) pandemic: A review
[49] Acute leukemia classification by using svm and k-means clustering
[50] Acute lymphoblastic leukemia cells image analysis with deep bagging ensemble learning
[51] Some methods for classification and analysis of multivariate observations
[52] New decision support tool for acute lymphoblastic leukemia classification
[53] Identification of significant risks in pediatric acute lymphoblastic leukemia (all) through machine learning (ml) approach
[54] Classification of leukemic b-lymphoblast cells from blood smear microscopic images with an attention-based deep learning method and advanced augmentation techniques
[55] Texture feature based classification on microscopic blood smear for acute lymphoblastic leukemia detection
[56] Fuzzy based blood image segmentation for automated leukemia detection
[57] Deep learning covid-19 features on cxr using limited training data sets
[58] Effect of probability-distance based markovian texture extraction on discrimination in biological imaging
[59] Neighborhood-correction algorithm for classification of normal and malignant cells
[60] Acute lymphoblastic leukemia classification from microscopic images using convolutional neural networks, in: ISBI 2019 C-NMC Challenge: Classification in Cancer Cell Imaging
[61] Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning
[62] Investigation of white blood cell biomarker model for acute lymphoblastic leukemia detection based on convolutional neural network
[63] Image classification with the fisher vector: Theory and practice
[64] Automatic morphological analysis for acute leukemia identification in peripheral blood microscope images
[65] Classification of normal and leukemic blast cells in b-all cancer using a combination of convolutional and recurrent neural networks, in: ISBI 2019 C-NMC Challenge: Classification in Cancer Cell Imaging
[66] Ensemble convolutional neural networks for cell classification in microscopic images
[67] Very deep convolutional networks for large-scale image recognition
[68] Cyclical learning rates for training neural networks
[69] Global incidence and prevalence of acute lymphoblastic leukemia: a 10-year forecast bethlehem
[70] Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer
[71] A basis for efficient representation of the s-transform
[72] Design and simulation of maximum power point tracking of photovoltaic system using ann
[73] Classification of blasts in acute leukemia blood samples using k-nearest neighbour
[74] Inception-v4, inception-resnet and the impact of residual connections on learning
[75] Going deeper with convolutions
[76] Rethinking the inception architecture for computer vision
[77] Brain tissue segmentation using neuronet with different pre-processing techniques
[78] Isbi challenge 2019: Convolution neural networks for b-all cell classification
[79] Fuzzy c means detection of leukemia based on morphological contour segmentation
[80] Global Country Profiles on
[81] Breast Cancer Now Most Common Form of Cancer: WHO Taking Action
[82] Global Cancer Profile
[83] Deepmen: Multi-model ensemble network for b-lymphoblast cell classification
[84] Aggregated residual transformations for deep neural networks
[85] Multi-streams and multi-features for cell classification

None. No funding to declare. There is no conflict of interest to publish this article.