key: cord-0934384-w518a0wa authors: Habib, Nahida; Hasan, Md. Mahmodul; Reza, Md. Mahfuz; Rahman, Mohammad Motiur title: Ensemble of CheXNet and VGG-19 Feature Extractor with Random Forest Classifier for Pediatric Pneumonia Detection date: 2020-10-30 journal: SN Comput Sci DOI: 10.1007/s42979-020-00373-y sha: d9f32724b99125f3365b223ec833dcc0f6b0fa0c doc_id: 934384 cord_uid: w518a0wa
Pneumonia, an acute respiratory infection, causes serious breathing difficulty by damaging the lungs. Recovery of pneumonia patients depends on early diagnosis and proper treatment. This paper proposes an ensemble-based method for diagnosing pneumonia from chest X-ray images. Two deep Convolutional Neural Networks (CNNs), CheXNet and VGG-19, are trained and used to extract features from the X-ray images, and these features are then ensembled for classification. To address class imbalance, Random Under Sampler (RUS), Random Over Sampler (ROS) and Synthetic Minority Oversampling Technique (SMOTE) are applied to the ensembled feature vector. The ensembled feature vector is then classified using several Machine Learning (ML) techniques (Random Forest, Adaptive Boosting, K-Nearest Neighbors). Among these, Random Forest achieved the best performance metrics on the available standard dataset. Comparison with existing methods shows that the proposed method attains improved classification accuracy and AUC values and outperforms all compared models, with 98.93% prediction accuracy. The model also shows good generalization capacity when tested on a different dataset. The outcomes of this study can support pneumonia diagnosis from chest X-ray images.
Pneumonia is an infection of one or both lungs in which the alveoli may fill with fluid or pus, causing cough and fever and making it difficult to breathe. In many countries it is a major cause of morbidity and mortality for infants under 2 years old and people over 60 years old.
Every year in the US alone, more than 250,000 people are hospitalized with pneumonia and around 50,000 die from it [1]. Pneumonia kills more than 800,000 children, a death toll greater than that of malaria, AIDS and measles combined [2, 3]. According to a study conducted in India by a UK-based NGO [4], around 4 million people could have been saved if appropriate action had been taken. Early and accurate diagnosis and treatment of pneumonia are therefore of utmost importance. Methods for diagnosing pneumonia include blood tests, chest radiography, lung ultrasound, CT scans, pulse oximetry and bronchoscopy [5].
Advances in artificial intelligence (AI), especially Machine Learning (ML) and Deep Learning (DL) algorithms, have substantially improved automatic disease diagnosis. Computer Aided Diagnosis (CAD) tools make disease detection and prediction easier, cheaper and more accessible. CAD may refer to a diagnosis made by a radiologist who takes the results of computer analysis into account [6]. CAD methods based on deep Convolutional Neural Networks (CNNs) and ML are now used successfully to diagnose lung diseases such as lung cancer, pneumonia, COVID-19 and tuberculosis, as well as breast cancer, colon cancer, prostate cancer, coronary artery disease, congenital heart defects, brain diseases and skin lesions. Chest X-ray is one of the most commonly used painless and non-invasive radiological tests for screening and diagnosing many lung diseases, although other methods such as CT and MRI can also be used [7]. Because chest X-ray is faster, easier and less expensive than CT and MRI, it is the most common choice for emergency diagnosis and treatment of lung, heart and chest wall diseases. CheXNet is a 121-layer convolutional neural network (a DenseNet-121) proposed by researchers at Stanford University to diagnose pneumonia. The model is trained on the ChestX-ray14 dataset and detects all 14 pathologies in that dataset with state-of-the-art results [8].
VGG-19 is a 19-layer Convolutional Neural Network developed by the Visual Geometry Group at the University of Oxford. The architecture contains 16 convolutional layers and 3 fully connected layers. This paper proposes an ensemble of two fine-tuned CNN models, CheXNet and VGG-19, for diagnosing pediatric pneumonia from chest X-ray images. The Kermany dataset [9] of 5856 chest X-ray images is used for this purpose. CNN features from the two models are extracted and ensembled. Because the dataset contains unequal numbers of Pneumonia and Normal images, Random Under Sampling (RUS), Random Over Sampling (ROS) and SMOTE oversampling are applied to the ensembled features to balance the classes. Different ML algorithms, Random Forest (RF), Adaptive Boosting (AdaBoost) and K-Nearest Neighbors (KNN), are then applied to the features to separate Pneumonia images from Normal ones. Among these models, RF achieves the best classification accuracy, almost 99%.
The rest of the paper is organized as follows. Related work is reviewed in the "Related Works" section. The proposed method is described in the "Proposed Methodology" section. The "Results and Discussion" section presents and discusses the results. Finally, the "Conclusion" section presents the conclusion and future work.
This section highlights studies by other researchers that are related to this research. Modern technology has made diagnosis and treatment easier and more convenient than before, and the availability of large datasets together with the remarkable success of deep learning has made diagnosis more accurate. The authors of Ref. [5] noted that at least 2500 children die of pneumonia every day and that it is the leading cause of death in children. Rudan et al. [10] reported that more than 150 million people, especially children under 5 years old, are infected with pneumonia every year.
The authors of Ref. [11] discriminated among three diseases on chest X-rays: lobar pneumonia, pulmonary tuberculosis and lung cancer. Ghimire et al. [12] reported that, in Bangladesh, influenza caused 10% of all childhood pneumonia and 28% of all children with influenza developed pneumonia. Pneumonia is more common among children from poor families with unhygienic living conditions than among children from economically stable families. Various deep learning CNN models have been developed for pneumonia diagnosis, and new ones are still being developed to obtain more accurate results. Chest X-ray is an easy-to-use medical imaging and diagnostic technique that expert radiologists use to diagnose pneumonia, tuberculosis, interstitial lung disease and early lung cancer [13]. Stephen et al. [14] proposed a deep learning method for pneumonia classification. PneumoCAD, a Computer Aided Diagnosis system developed in Ref. [15], uses handcrafted features to diagnose pediatric pneumonia from chest X-ray images. Vamsha Deepa et al. [5] proposed a feature extraction method that classifies X-ray images as normal or pneumonia-infected lungs; it extracts Haralick texture features from the X-rays and detects pneumonia with an accuracy of 86%. To detect pneumonia clouds in chest X-rays, Ref. [16] used Otsu thresholding to separate the healthy part of the lung from the pneumonia-infected regions. Ref. [17] defined a method based on segmentation, feature extraction and artificial neural networks for classifying lung diseases such as tuberculosis, pneumonia and lung cancer from chest radiographs. Other authors have used GLCM features, Haralick features and congruency parameters for pneumonia detection. For rapid childhood pneumonia diagnosis, Ref. [18] applied wavelet-augmented analysis to cough sounds, while Ref.
[19] proposed a deep convolutional neural network with transfer learning for detecting pneumonia on chest X-rays. Ref. [7] proposed a transfer learning method with a deep residual network for pediatric pneumonia diagnosis. The National Institutes of Health (NIH) CXR dataset released by Ref. [20] comprises 112,120 frontal CXRs, each labeled with up to 14 different diseases. To create these labels from the associated radiology reports, the authors used Natural Language Processing to text-mine disease classifications, with an expected accuracy above 90%. The CheXNet deep CNN model is trained on this NIH CXR dataset and is reported to exceed average radiologist performance on the pneumonia detection task [8]. Both the VGG-19 and CheXNet image classification models accept input images of size 224 × 224. This paper proposes an ensemble of two deep CNN models (CheXNet and VGG-19) built with transfer learning and fine-tuning. The features extracted by the ensembled models are balanced and fed to an ML model that classifies the images as Pneumonia or Non-Pneumonia (Normal). The proposed model is also tested on a different dataset [21] to assess its generalization.
Each step of the proposed methodology, from data collection to the classification of the X-ray images, is discussed here. The methodology includes image preprocessing (an image enhancement technique and image resizing), augmentation of the training images, fine-tuning of the CNN models, model training, extraction of the CNN feature vectors, ensembling of the extracted feature vectors, handling of dataset imbalance, and pneumonia classification using different machine learning algorithms. Figure 1 shows the overall procedure of the proposed methodology. Table 1 describes the datasets used to train and evaluate the proposed method. This study uses two Kaggle datasets: the dataset of Ref.
[9], collected from Guangzhou Women and Children's Medical Center, is used to train, test and validate the CNN models, and the dataset of Ref. [21] is used to validate the performance of the proposed method. Dataset [9] contains 5856 chest X-ray images of pediatric patients between 1 and 5 years old. It provides the ground truth for each X-ray image and is divided into train, validation and test folders [22]. Of these images, 4273 are from pneumonia patients and the remaining 1583 are from healthy people. The other dataset [21], the 'COVID-19 Chest X-ray Database' from Kaggle, winner of the 'COVID-19 Dataset Award', is used to evaluate the generalization performance of the proposed method. It contains chest X-ray images of COVID-19 patients, normal healthy people and viral pneumonia patients: 219 COVID-19 positive images, 1341 Normal images and 1345 Viral Pneumonia images. Because this research focuses only on the diagnosis of pneumonia, only the Normal and Viral Pneumonia images are selected and tested on the final proposed model.
This section describes the preprocessing operations performed on the X-ray images, which consist of two tasks: image enhancement and data augmentation. Several image enhancement techniques, including Gabor filtering, Local Binary Patterns, Histogram Equalization and Adaptive Histogram Equalization, were applied to the X-ray images. Among them, adaptive histogram equalization (AHE), which enhances image contrast, performed best for CNN-based feature extraction. To retrain a pretrained CNN with transfer learning, the X-ray images must be resized to the input size that the pretrained model expects, so the images are resized to 224 × 224 for training the fine-tuned CheXNet and VGG-19 models. In addition, pixel values are rescaled to the 0-1 range to match the models' expected input.
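As a rough illustration of the preprocessing described above, the following NumPy sketch enhances contrast, resizes to 224 × 224 and rescales pixel values to [0, 1]. It is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the "AHE" here simply equalizes each cell of an assumed 8 × 8 tile grid independently (without the contrast clipping and inter-tile interpolation of CLAHE), nearest-neighbour resizing is used for simplicity, and the grayscale result is repeated to three channels on the assumption that the pretrained models expect RGB input.

```python
import numpy as np

def equalize_tile(tile: np.ndarray) -> np.ndarray:
    """Histogram-equalize one uint8 tile using its cumulative histogram."""
    hist = np.bincount(tile.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf_min = cdf[cdf > 0][0]                  # cumulative count at the darkest occupied bin
    denom = cdf[-1] - cdf_min
    if denom == 0:                             # flat tile: nothing to stretch
        return tile.copy()
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[tile]

def tilewise_ahe(img: np.ndarray, grid: int = 8) -> np.ndarray:
    """Simplified AHE: equalize each cell of a grid x grid tiling independently.

    Unlike CLAHE there is no contrast clipping and no blending between tiles.
    """
    out = np.empty_like(img)
    for rows in np.array_split(np.arange(img.shape[0]), grid):
        for cols in np.array_split(np.arange(img.shape[1]), grid):
            block = np.ix_(rows, cols)
            out[block] = equalize_tile(img[block])
    return out

def resize_nearest(img: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D image to `size` (height, width)."""
    h, w = img.shape
    r = (np.arange(size[0]) * h / size[0]).astype(int)
    c = (np.arange(size[1]) * w / size[1]).astype(int)
    return img[r[:, None], c[None, :]]

def preprocess(xray: np.ndarray) -> np.ndarray:
    """uint8 grayscale X-ray -> float32 array of shape (224, 224, 3) in [0, 1]."""
    enhanced = tilewise_ahe(xray)
    resized = resize_nearest(enhanced)
    scaled = resized.astype(np.float32) / 255.0        # rescale to the 0-1 range
    return np.repeat(scaled[..., None], 3, axis=-1)    # assumed 3-channel model input

# Usage on a synthetic low-contrast "X-ray" whose pixels only take values 100..110.
rng = np.random.default_rng(0)
xray = rng.integers(100, 111, size=(512, 400)).astype(np.uint8)
x = preprocess(xray)
print(x.shape, x.dtype)
```

In practice a library routine such as scikit-image's `exposure.equalize_adapthist` (a CLAHE implementation) would normally replace the hand-written equalization.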
Training deep CNN models requires a large amount of labeled data. A model trained on limited data may perform well on the training set but fail to generalize. This overfitting to the training examples leads to poor performance on the test dataset, and increasing the dataset size is an effective way to reduce overfitting or memorization. Data augmentation, the artificial enlargement of a dataset from the data already available, is one of the most common practices for reducing overfitting in the research community. This study uses geometric transformations of the images to increase the dataset size: shearing, zooming, flipping and shifting (width shift and height shift).
convolution), ReLU activation and Batch Normalization. Figure 2 shows the architectural design of the fine-tuned CheXNet model. The VGG-19 model consists of five blocks, each containing a few convolution layers followed by a max-pooling layer. VGG-19 uses 3 × 3 convolution kernels with 64, 128, 256, 512 and 512 channels in its five convolution blocks, respectively. Figure 3 shows the architecture of the fine-tuned VGG-19 model. Both models are fine-tuned for the detection and classification of pneumonia images, and the following modifications are made before retraining them on dataset [9]. To fine-tune CheXNet, the classifier part of the pretrained CheXNet model is first removed. The redesigned classifier uses 512, 128 and 64 neurons in its hidden dense layers, and its output layer has a single neuron that classifies images into two classes (Normal and Pneumonia). Each dense layer except the output layer is followed by a dropout layer. Dropout randomly deactivates neurons during training, which reduces the model's effective capacity and helps protect it against overfitting.
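The augmentation step can be sketched in NumPy as a random horizontal flip followed by a random height/width shift with zero fill. This is a minimal sketch that covers only the flipping and shifting operations listed above (shearing and zooming are omitted); the function names, the 10% shift range and the 0.5 flip probability are hypothetical, since the paper does not report its ranges.

```python
import numpy as np

def shift_image(img: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate a 2-D image by (dy, dx) pixels, filling uncovered pixels with 0.

    Assumes |dy| < height and |dx| < width.
    """
    h, w = img.shape
    out = np.zeros_like(img)
    dst = (slice(max(0, dy), h + min(0, dy)), slice(max(0, dx), w + min(0, dx)))
    src = (slice(max(0, -dy), h - max(0, dy)), slice(max(0, -dx), w - max(0, dx)))
    out[dst] = img[src]
    return out

def augment(img: np.ndarray, rng, shift_frac: float = 0.1, flip_prob: float = 0.5):
    """Random horizontal flip, then a random height/width shift (hypothetical ranges)."""
    if rng.random() < flip_prob:
        img = img[:, ::-1]                          # horizontal flip
    h, w = img.shape
    max_dy, max_dx = int(shift_frac * h), int(shift_frac * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))     # height shift
    dx = int(rng.integers(-max_dx, max_dx + 1))     # width shift
    return shift_image(img, dy, dx)

# Usage: several augmented variants of one training image (augmentation is
# applied to the training set only).
rng = np.random.default_rng(1)
image = rng.random((224, 224))
batch = np.stack([augment(image, rng) for _ in range(4)])
print(batch.shape)   # (4, 224, 224)
```

Generating fresh random variants each epoch, rather than storing them, is the usual design choice because it enlarges the effective training set without extra disk space.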
The dense layers use the ReLU activation function, while the output layer uses the sigmoid activation function for binary classification. Fine-tuning VGG-19 involves retraining only its last convolution block while freezing the first four blocks. Instead of a flattening layer after the feature extractor, this research applies global average pooling to reduce the number of learnable parameters. Both models use the Adam optimizer with binary cross-entropy as the loss function. The learning rate of each model is adjusted according to its validation accuracy. The training procedure uses learning rate decay by
A dataset with a severe skew in the distribution of samples among classes presents a tricky situation known as dataset imbalance. Such an uneven distribution may produce a biased model that generalizes poorly. Dataset [9] contains more Pneumonia images than Normal images; although the difference is not severe, it may bias the model slightly towards detecting Pneumonia. Therefore, Random Under Sampling (RUS), Random Over Sampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE) are applied to the ensembled feature vector to balance the dataset. RUS randomly deletes samples from the majority class, ROS enlarges the dataset by randomly duplicating minority class samples, and SMOTE creates new minority class points by interpolating between existing minority points and their neighbors [23]. Each technique yields the same number of samples for each class. Among these techniques, ROS performs slightly better than RUS and SMOTE.
This section discusses the classification techniques used to detect pneumonia from the feature vector. This study experimented with several ML classification methods, both tree-based and non-tree-based.
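To make the fine-tuning choices described earlier concrete (freezing VGG-19 blocks 1 to 4, global average pooling instead of flattening, and the 512/128/64 CheXNet head), the pure-Python sketch below counts parameters. It assumes the standard VGG-19 convolutional configuration (2, 2, 4, 4 and 4 layers of 3 × 3 convolutions in the five blocks), a 224 × 224 × 3 input so that the last block outputs a 7 × 7 × 512 map, and a 1024-dimensional pooled output from CheXNet's DenseNet-121 backbone. The 512-unit dense layer used to compare flattening with global average pooling is hypothetical, because the paper does not describe the VGG-19 classifier head.

```python
# Standard VGG-19 convolutional layout: (number of 3x3 conv layers, output channels) per block.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]

def conv3x3_params(c_in: int, c_out: int) -> int:
    return 3 * 3 * c_in * c_out + c_out        # kernel weights + one bias per filter

def dense_params(n_in: int, units: int) -> int:
    return n_in * units + units                # weight matrix + biases

def vgg19_block_params(in_channels: int = 3) -> list:
    """Parameter count of each of the five VGG-19 convolution blocks."""
    counts, c_in = [], in_channels
    for n_layers, c_out in VGG19_BLOCKS:
        total = 0
        for _ in range(n_layers):
            total += conv3x3_params(c_in, c_out)
            c_in = c_out
        counts.append(total)
    return counts

def head_params(n_in: int, hidden: list) -> int:
    """Dense head with the given hidden sizes plus one sigmoid output unit.

    Dropout layers have no parameters, so they do not appear here.
    """
    total = 0
    for units in hidden + [1]:
        total += dense_params(n_in, units)
        n_in = units
    return total

blocks = vgg19_block_params()
frozen, trainable = sum(blocks[:4]), blocks[4]       # blocks 1-4 frozen, block 5 retrained
flatten_dense = dense_params(7 * 7 * 512, 512)        # Flatten -> hypothetical 512-unit layer
gap_dense = dense_params(512, 512)                    # GAP -> same hypothetical layer
chexnet_head = head_params(1024, [512, 128, 64])      # assumes 1024 pooled DenseNet-121 features
print(blocks, frozen, trainable)
print(flatten_dense, gap_dense, chexnet_head)
```

Under these assumptions the 16 convolutional layers hold 20,024,384 parameters, of which 10,585,152 (blocks 1 to 4) stay frozen and 9,439,232 (block 5) are retrained. Global average pooling shrinks the first dense layer from 12,845,568 to 262,656 parameters, and the 512-128-64-1 CheXNet head has 598,785 parameters.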
Random Forest uses the bagging ensemble technique for classification, with decision trees (DTs) as its building blocks. Each decision tree is trained on a randomly selected sample of the dataset, which makes the trees less correlated, and the outputs of all the trees are then combined to make the final decision. Figure 4 illustrates the random forest classifier. Adaptive Boosting (AdaBoost) is an ensemble method that aims to create a strong classifier from a number of weak classifiers. AdaBoost first trains a base model and then adds further models, each of which tries to correct the errors of the previous ones. This research uses logistic regression (LR), DTs and support vector machines (SVM) as base models for AdaBoost, and AdaBoost with LR works best here. K-Nearest Neighbors (KNN) is a simple supervised algorithm that can be used for both classification and regression. KNN stores all available cases and classifies new cases based on a distance function; using 7 neighbors gives the best result in this research. Among these methods, the Random Forest (RF) classifier achieved the best performance. RF is an ensemble of high-variance, low-bias DTs in which the output of each DT acts as a vote for the output class, and combining these votes yields a final model with low bias and moderate variance. This helps explain why RF performed better than the other methods here.
This section justifies each step of the proposed methodology by discussing the experimental results of each method and comparing the proposed model with other pneumonia detection models. Both X-ray image datasets, [9] and [21], are collected and preprocessed using the techniques described in the "Data Preprocessing" section.
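The bagging-and-voting idea behind the Random Forest classifier described earlier can be illustrated with a small NumPy sketch. To keep it short, each tree is a depth-1 decision stump (one threshold on one feature) rather than a fully grown decision tree; each stump is fitted on a bootstrap sample using a random subset of features, and the forest predicts by majority vote. The function names, the number of trees (25) and the square-root feature-subset rule are illustrative assumptions, not settings reported in the paper; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
import numpy as np

def fit_stump(X, y, features):
    """Best single-threshold split (by training accuracy) over the given features."""
    best_acc, best = -1.0, None
    for f in features:
        for t in np.unique(X[:, f]):
            above = (X[:, f] > t).astype(int)
            for polarity in (1, 0):                  # predict 1 above t, or below t
                pred = above if polarity else 1 - above
                acc = np.mean(pred == y)
                if acc > best_acc:
                    best_acc, best = acc, (f, t, polarity)
    return best

def predict_stump(stump, X):
    f, t, polarity = stump
    above = (X[:, f] > t).astype(int)
    return above if polarity else 1 - above

def fit_forest(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, int(np.sqrt(d)))                      # features considered per stump
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)            # bootstrap sample (with replacement)
        feats = rng.choice(d, size=m, replace=False) # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    votes = np.stack([predict_stump(s, X) for s in forest])   # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)              # majority vote (odd n_trees)

# Usage on synthetic two-class data standing in for the ensembled CNN features.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 1, (100, 4)), data_rng.normal(3, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
forest = fit_forest(X, y)
train_acc = np.mean(predict_forest(forest, X) == y)
print(f"training accuracy: {train_acc:.3f}")
```

Using an odd number of trees avoids tied votes in the binary case.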
Because training a CNN on a large amount of data makes the model more accurate, general and robust, augmentation is applied only to the training set, to artificially enlarge it, add variety to the images and reduce overfitting. The proposed method is also tested on a different dataset [21] to assess the model's generalization ability. Figures 5 and 6 show preprocessed images from datasets [9] and [21], respectively. The images are labeled 0 for Normal and 1 for Pneumonia. The CNN models CheXNet and VGG-19 are trained on the preprocessed 224 × 224 chest X-ray images. Both models are then tested and validated on the 624 test and 16 validation images of dataset [9]. CheXNet achieves 92.63% test accuracy and 100% validation accuracy, while VGG-19 obtains 89.26% test accuracy and 50% validation accuracy. Ensembling the two models is intended to improve overall performance. Figure 7a, b shows the performance of the CheXNet and VGG-19 models on the validation dataset. The features of all train and test images are extracted separately and combined into a feature vector for the dataset, and the feature vectors of the two models are then ensembled to make the pneumonia detection model more realistic and more accurate. All code is written in Python with the Keras framework and TensorFlow backend, using a Google Colab GPU from a Mac. The ensembled feature vector contains 4265 pneumonia images and 1575 normal images (5840 in total, which equals the 5856 images of dataset [9] minus the 16 validation images). To obtain unbiased results from the classifier, the RUS, ROS and SMOTE data balancing techniques are applied to the feature vector. After RUS there are 1575 pneumonia images and 1575 normal images, while ROS and SMOTE each produce 8530 images in total, 4265 for each of the Normal and Pneumonia classes.
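The feature ensembling and the three balancing techniques can be sketched with NumPy as follows. Placing the two models' feature vectors side by side is an assumed fusion scheme (the paper does not give the fusion operation or the feature dimensions), the 4-dimensional random features are placeholders, and the function names are hypothetical. The `smote` function is a simplified binary-class version that places each synthetic point between a random minority sample and one of its k = 5 nearest minority neighbours. The class counts match the 4265 Pneumonia and 1575 Normal samples reported above.

```python
import numpy as np

def random_under_sample(X, y, rng):
    """RUS: keep a random subset of every class, sized to the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    keep = np.concatenate([rng.choice(np.flatnonzero(y == c), counts.min(), replace=False)
                           for c in classes])
    return X[keep], y[keep]

def random_over_sample(X, y, rng):
    """ROS: duplicate random samples of each class up to the largest class size."""
    classes, counts = np.unique(y, return_counts=True)
    idx = []
    for c in classes:
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, counts.max() - len(members), replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def smote(X, y, rng, k=5):
    """Simplified binary SMOTE: interpolate new minority points towards k-NN neighbours."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    Xm = X[y == minority]
    n_new = counts.max() - counts.min()
    sq = (Xm ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Xm @ Xm.T      # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                           # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, len(Xm), size=n_new)            # random minority seed points
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                           # interpolation factor in [0, 1)
    synthetic = Xm[base] + gap * (Xm[partner] - Xm[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])

# Usage with the reported class counts and placeholder features for each model.
rng = np.random.default_rng(42)
y = np.array([1] * 4265 + [0] * 1575)                      # 1 = Pneumonia, 0 = Normal
chexnet_feats = rng.normal(size=(len(y), 4))               # hypothetical CheXNet features
vgg_feats = rng.normal(size=(len(y), 4))                   # hypothetical VGG-19 features
X = np.concatenate([chexnet_feats, vgg_feats], axis=1)     # assumed fusion: concatenation
for name, fn in [("RUS", random_under_sample), ("ROS", random_over_sample), ("SMOTE", smote)]:
    Xb, yb = fn(X, y, rng)
    print(name, Xb.shape, np.bincount(yb))
```

This prints 3150 balanced samples (1575 per class) for RUS and 8530 samples (4265 per class) for both ROS and SMOTE, matching the counts reported in the text. Balancing is applied to the extracted feature vectors rather than to the images, which keeps SMOTE's interpolation in feature space.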
ML models (Random Forest, AdaBoost and KNN) are then applied with five-fold cross-validation to the balanced dataset, and RF with ROS performs better than the other combinations. Table 2 shows the mean five-fold cross-validation accuracy of the different ML models with each data balancing technique; the best mean accuracy is obtained using RF with ROS. Table 3 compares the accuracy of RF, AdaBoost and KNN in each of the five folds with ROS. RF achieves the best classification accuracy in every fold, so RF is selected as the final classifier and ROS as the data imbalance handler. The best mean accuracy of the proposed model is 98.93% (approximately 99%), with an average of 99% precision, 99% recall and 99% F-score. To validate the model's performance again, the 16 validation images are tested on the final proposed model, which achieves 100% validation accuracy. Figure 8 shows the validation report of the final model with its confusion matrix. To test the model's generalization performance, all Normal and Pneumonia images of the different dataset [21] are classified with the final model. The model achieves 98.25% generalization accuracy on this new test dataset: of the 2686 images, it correctly classifies 2639 and misclassifies only 47. Figure 9 presents the model's performance on the test dataset. This performance on a new dataset indicates that the model generalizes well. These results show that the proposed model performs well in diagnosing pediatric pneumonia from chest X-rays.
This section compares the performance of our model with existing models and methods.
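The evaluation protocol described above (five-fold cross-validation, then accuracy, precision, recall and F-score) can be sketched in NumPy. For brevity the sketch uses a 7-neighbour KNN as the classifier and synthetic stand-in features instead of the real CNN features; it also computes the ROC AUC used in the model comparison, as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, which equals the area under the empirical ROC curve. Function names are hypothetical, and Pneumonia (label 1) is assumed to be the positive class.

```python
import numpy as np

def k_fold_indices(n, rng, k=5):
    """Yield (train, test) index arrays for k-fold cross-validation."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def knn_scores(X_train, y_train, X_test, k=7):
    """Fraction of the k nearest training neighbours (Euclidean) labelled 1."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-score with class 1 (Pneumonia) as positive."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return np.mean(y_pred == y_true), precision, recall, f1

def roc_auc(y_true, scores):
    """AUC = P(score of random positive > score of random negative), ties count 1/2."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Usage: five-fold CV of a 7-NN classifier on synthetic stand-in features.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(3, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
accs, aucs = [], []
for train, test in k_fold_indices(len(y), rng):
    scores = knn_scores(X[train], y[train], X[test])
    accs.append(binary_metrics(y[test], (scores > 0.5).astype(int))[0])
    aucs.append(roc_auc(y[test], scores))
print(f"mean accuracy {np.mean(accs):.3f}, mean AUC {np.mean(aucs):.3f}")
print(f"{2639 / 2686:.2%}")   # generalization accuracy on dataset [21]: 98.25%
```

Each sample appears in exactly one test fold, so the mean over folds uses every sample once for testing.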
The models most frequently used for medical image classification, such as VGG16, ResNet, DenseNet121, InceptionV3 and Xception, achieve approximately 74-87% accuracy on pneumonia classification, and the proposed model performs much better than these. The models proposed in papers [7, 14, 19] and [24] are reported to perform better than other existing models for pneumonia detection. Table 4 compares these models with the proposed model, which achieves the highest accuracy among them. The model achieves an average AUC (Area Under the ROC Curve) of 98.94%. The ROC (Receiver Operating Characteristic) curve is a graphical representation of a classifier's performance that plots the true positive rate against the false positive rate across decision thresholds, and the AUC is the two-dimensional area under this curve. Figure 10 shows the ROC curve, with an AUC of 99%.
This paper presents an automated CAD tool for detecting childhood pneumonia from chest X-ray images. The research proposes a fusion of two fine-tuned CNN models whose ensembled features are classified by an ML Random Forest classifier. The final proposed model classifies pneumonia with 98.93% accuracy and provides better predictions than the other models compared, so it can be regarded as one of the best-performing pneumonia classification models among those compared. In future work, the authors will try to optimize the model to classify two or more diseases effectively.
Table 4 Comparison of the proposed model with existing pneumonia classification models
Model                  Accuracy
Prateek et al. [19]    0.901
Liang et al. [7]       0.905
Stephen et al. [14]    0.937
Saraiva et al. [24]    0.953
Proposed model         0.989
References
[2] Epidemiology and etiology of childhood pneumonia
[3] Childhood pneumonia as a global health priority and the strategic interest of the Bill & Melinda Gates Foundation
[4] 17 lakh Indian children to die due to pneumonia by 2030; here's how it can be averted
[5] Feature extraction and classification of X-ray lung images using Haralick texture features
[6] Role of gist and PHOG features in computer-aided diagnosis of tuberculosis without segmentation
[7] A transfer learning method with deep residual network for pediatric pneumonia diagnosis
[8] Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv, 2017
[9] Identifying medical diagnoses and treatable diseases by image-based deep learning
[10] Global estimate of the incidence of clinical pneumonia among children under five years of age
[11] Pair-wise discrimination of some lung diseases using chest radiography
[12] Pneumonia in South-East Asia region: public health perspective
[13] Computer-aided detection in chest radiography based on artificial intelligence: a survey
[14] An efficient deep learning approach to pneumonia classification in healthcare. Hindawi J Healthcare Eng
[15] Computer-aided diagnosis in chest radiography for detection of childhood pneumonia
[16] Detection of pneumonia clouds in chest X-ray using image processing approach
[17] Automatic detection of major lung diseases using chest radiographs and classification by feed-forward artificial neural network
[18] Wavelet augmented cough analysis for rapid childhood pneumonia diagnosis
[19] Deep convolutional neural network with transfer learning for detecting pneumonia on chest X-rays
[20] ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
[21] COVID-19 Chest X-ray Database
[22] Visualizing and explaining deep learning predictions for pneumonia detection in pediatric chest radiographs
[23] Handling data irregularities in classification: foundations, trends, and future challenges. Pattern Recogn
[24] Classification of images of childhood pneumonia using convolutional neural networks. In: 6th International Conference on Bioimaging
The authors are grateful to the participants who contributed to this research. No financial support was provided by any organization during the research project.
Conflict of interest: The authors declare that there is no conflict of interest regarding the publication of this work.