key: cord-1009500-nmcfucvg
authors: Canayaz, Murat
title: MH-COVIDNet: Diagnosis of COVID-19 using Deep Neural Networks and Meta-heuristic-based Feature Selection on X-ray Images
date: 2020-10-06
journal: Biomed Signal Process Control
DOI: 10.1016/j.bspc.2020.102257
sha: dec30dd8a9302b3a52f15ab4e2e249561f4536dd
doc_id: 1009500
cord_uid: nmcfucvg

COVID-19 is a disease that causes symptoms in the lungs and has caused deaths around the world. Studies on the diagnosis and treatment of this disease, which has been declared a pandemic, are ongoing. Early diagnosis of this disease is important for human life, and deep learning-based diagnostic studies are advancing this process rapidly. To contribute to this field, a deep learning-based approach that can be used for early diagnosis of the disease is proposed in our study. In this approach, a data set consisting of three classes of COVID-19, normal, and pneumonia lung X-ray images was created, with each class containing 364 images. Pre-processing was performed on the prepared data set using an image contrast enhancement algorithm, and a new data set was obtained. Feature extraction was performed on this data set with deep learning models such as AlexNet, VGG19, GoogleNet, and ResNet. For the selection of the most informative features, two meta-heuristic algorithms, binary particle swarm optimization and binary gray wolf optimization, were used. The features selected from the enhancement data set were then combined and classified using SVM. The overall accuracy of the proposed approach was 99.38%. The results, verified with two different meta-heuristic algorithms, show that the proposed approach can help experts during COVID-19 diagnostic studies.

COVID-19 is a pandemic disease that had affected about 6.2 million people as of early June 2020 and has caused many deaths around the world. It is highly contagious and continues to spread rapidly, with common symptoms such as fever, cough, muscle pain, and weakness. In addition to laboratory tests, infected individuals are detected using radiology images. Currently, real-time reverse transcriptase-polymerase chain reaction (RT-PCR) is the accepted standard diagnostic method. Since it is a new type of virus, vaccination studies are continuing, and deep learning-based approaches that can help experts diagnose this disease will enable the process to progress faster.

When studies on COVID-19 diagnosis with deep learning are examined, both X-ray and computed tomography (CT) images are used in the data sets created for the disease, and the number of images with a COVID-19 diagnosis is limited in these studies. Among the studies we examined, those using X-ray images for the diagnosis of COVID-19 are as follows:

- Hemdan et al. [1] classified X-ray images as COVID-19 positive or negative with 7 deep learning models in their study called COVIDX-Net. The VGG19 model gave the best results, with 90% accuracy.
- Toğaçar et al. [2] trained MobileNet and SqueezeNet with X-ray images for COVID-19 diagnosis. A stacked data set was obtained before training. Features were extracted from the models trained with these data and selected with the help of the SMO algorithm. The selected features were classified by SVM, and an overall accuracy of 99.27% was reported.
- Zhang et al. [3] detected COVID-19 by performing anomaly detection on X-ray images. They used an 18-layer ResNet and separated images into COVID and non-COVID classes using the binary cross-entropy loss function in a 2-class network structure. The reported accuracy was 95.18%.
- Afshar et al. [4] identified COVID-19 on X-ray images using a capsule network-based framework. Their approach includes several capsule and convolutional layers and achieved an accuracy of 98.3%.
- Apostolopoulos et al. [5] diagnosed COVID-19 on X-ray images with a transfer learning approach based on convolutional neural networks. An accuracy of 98.75% was achieved in the transfer learning process with VGG19.
- Ozturk et al. [6] achieved 98.08% accuracy on X-ray images using DarkNet with 17 convolutional layers.
- Pereira et al. [7] identified COVID-19 with both a deep neural network and texture features obtained using various feature extraction methods, reporting an F-score of 0.89.
- Uçar et al. [8] achieved 98.3% accuracy in their study using SqueezeNet and Bayesian optimization.

The studies that used CT images for the detection of COVID-19 are as follows:

- Ardakani et al. [9] classified COVID and non-COVID classes using 10 well-known deep learning models; ResNet and Xception provided the best results.
- Barstugan et al. [10, 11] classified features obtained from CT images with hand-crafted feature extraction algorithms using machine learning methods. In another study by the same team, a feature fusion and ranking method was applied to features obtained from deep learning models, which were then classified with SVM.
- Yan et al. [12] performed segmentation to detect COVID-19 on CT images.
- Hasan et al. [13] used an LSTM neural network classifier in their study based on Q-deformed entropy and deep learning features.
- Singh et al. [14] used multi-objective differential evolution to determine the parameters of convolutional neural networks and thus classified CT images with COVID-19.

The motivation for this study is to propose an approach that effectively classifies COVID-19, pneumonia, and healthy lung X-ray images by combining deep learning and meta-heuristic algorithms for early diagnosis of this disease, which is important for human life. X-ray images contain a high amount of noise and are low-density grayscale images. For this reason, the contrast and boundary representations of X-rays obtained from some machines may be weak, and it is quite challenging to extract features from such images. Their quality can be improved by applying contrast enhancement techniques, so that feature extraction can be performed more efficiently. In this study, we focused on an image processing method that provides the best contrast. After trying many techniques, we decided on the image contrast enhancement algorithm (ICEA). We obtained the enhancement data set using this technique, ran it through well-known deep learning models, and extracted the feature vectors of the data set. Another source of motivation for our study is that none of the other studies address contrast, and only a limited number of studies perform feature extraction with deep learning models.

We can summarize our contribution to this field as follows:
1. The study presents a contrast-adjusted data set containing 3 classes of COVID-19, pneumonia, and normal images for the use of researchers.
2. The study shows the effect of feature extraction on classification results after applying the image contrast enhancement technique to X-ray images.
3. The study assesses classification performance with a small number of features selected on X-ray images with the help of meta-heuristic algorithms.

Our study is organized as follows. In Section 2, the data sets, models, methods, and the MH-CovidNet approach are explained. Experimental studies are presented in Section 3. The study ends with a discussion and conclusion.

The data set we created for the study includes COVID-19, pneumonia, and normal X-ray images. It is very difficult to find an open-source data set since COVID-19 is a new disease type; for this reason, images from openly shared data sets were combined. The first data set is the one presented by Joseph Paul Cohen [15] on GitHub, which contains 145 images labeled COVID-19. The second data set we used for COVID-19 is the one made publicly available on Kaggle by Rahman et al. [16], with 219 images. By combining these data sets, 364 images were obtained for COVID-19. For pneumonia and normal chest X-ray images, the data set prepared by Kermany et al. [17] was used. In order to increase the performance of the models, an equal number of images was selected for each class: since COVID-19 images are limited in number, 364 images were also selected for the other classes, and the original data set was created. To create the enhancement data set, contrast enhancement was performed separately on each image of the original data set using the image contrast enhancement algorithm (ICEA). In this way, the noise in the original data set was removed and the best contrast was achieved. The ICEA is an image processing technique developed as a solution to the contrast enhancement problem, and in this study it was applied to X-ray images for the first time. The algorithm is explained in Section 2.3.1. In the proposed approach, the results are examined on both data sets. In the experimental studies, 70% of the data set was used as training data and 30% as test data. In the final steps of the study, the consistency of the results was tested using k-fold cross-validation on the enhancement data set. COVID-19 chest images from the original and enhancement data sets are shown in Fig. 1.
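The split and cross-validation protocol described above can be set up with standard tooling. The sketch below is illustrative only (the paper's implementation is in MATLAB): it assumes the images are stored in three hypothetical class folders named covid, normal, and pneumonia, and uses scikit-learn to build a stratified 70/30 hold-out split and the 5-fold partitions.

```python
from pathlib import Path
from sklearn.model_selection import train_test_split, StratifiedKFold

# Hypothetical folder layout: dataset/<class_name>/*.png (364 images per class)
root = Path("dataset")
classes = ["covid", "normal", "pneumonia"]

paths, labels = [], []
for label, name in enumerate(classes):
    for img in sorted((root / name).glob("*.png")):
        paths.append(str(img))
        labels.append(label)

# 70% training / 30% test, keeping the three classes balanced in both splits
train_paths, test_paths, y_train, y_test = train_test_split(
    paths, labels, test_size=0.30, stratify=labels, random_state=42
)

# 5-fold cross-validation partitions used to check the consistency of the results
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(paths, labels)):
    print(f"fold {fold}: {len(train_idx)} training images, {len(val_idx)} validation images")
```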
AlexNet [18], an 8-layer CNN, was first announced in 2012 when it won the ImageNet competition. After this competition, it was shown that image features obtained from CNN architectures can exceed those obtained by classical methods. In the AlexNet structure, there is an 11x11 convolution window in the first layer, and the input size before this layer is 227x227. In the following layers, the convolution window is reduced first to 5x5 and then to 3x3, and stride-2 max-pooling layers are added. After the convolutional layers there are fully connected layers with 4096 units. The layer named "FC8" that comes after these is the layer from which we obtained the 1000-dimensional feature vectors used in our application. In this model, ReLU is used as the activation function instead of Sigmoid.

VGG is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford [19]. The model achieved 92.7% top-5 test accuracy on the ImageNet data set of more than 14 million images in 1000 classes. It improved on AlexNet by using successive 3x3 filters instead of filters with large kernel sizes. The input size of the first layer is 224x224. After the 3x3 convolution layers and max-pooling layers, the model contains two fully connected layers with 4096 units. As with AlexNet, this model also has an "FC8" layer, which we used for feature extraction.

The GoogleNet model was introduced in 2015 as a deep learning model built on the idea that existing neural networks should go deeper [20]. The network consists of modules, each made up of convolution and max-pooling layers of different sizes; each module is called an "inception" block. Although the model, consisting of a total of 9 inception blocks, has considerable computational complexity, its speed and performance were increased with several improvements. In our study, 1000 features were extracted from this model using the "loss3-classifier" layer.

The ResNet [21] model, developed by the Microsoft Research team, won the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with an error rate of 3.57%. Each layer of a ResNet consists of several blocks. Thanks to the residual layer structure, the number of parameters is reduced compared to other models. In our study, 1000 features were extracted from this model using the "fc1000" layer.

These deep learning models were used for feature extraction: 1000 features were obtained from each of them, the most effective ones were selected with meta-heuristic feature selection algorithms, and the selected features were classified with SVM. The parameter values used in the models are given in Table 1, and the model structures are summarized in Table 2. Table 1 contains the training parameters for each model. For example, while the input image size of the AlexNet model is 227x227, the input size of the other models is 224x224. The momentum of the stochastic gradient descent (SGD) optimizer used for each model was set to 0.9. The mini-batch size was set to 64 for each model; this value can be 128 or 256 depending on the memory of the hardware on which the applications run, and it is a significant parameter to tune since it requires a lot of memory. The learning rate used for all models is 1e-5. All of these parameters were chosen based on experimental experience. Table 2 shows the input dimensions of all models, the number of layers in the models, and the names of the layers used for feature extraction.
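For illustration, the sketch below shows how 1000-dimensional feature vectors can be read from the final fully connected layer of a pretrained network. The paper's implementation is in MATLAB and trains the networks on the X-ray data first; this is only a minimal PyTorch/torchvision analogue that takes the 1000-unit output of AlexNet's last fully connected layer (the counterpart of "FC8") for each image, without fine-tuning.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# ImageNet-pretrained AlexNet; its final fully connected layer has 1000 outputs,
# which play the role of the "FC8" feature vector described above.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((227, 227)),                 # AlexNet input size used in the paper
    transforms.Grayscale(num_output_channels=3),   # X-rays are single channel
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(image_path: str) -> torch.Tensor:
    """Return the 1000-dimensional feature vector for one X-ray image."""
    img = preprocess(Image.open(image_path)).unsqueeze(0)
    with torch.no_grad():
        features = model(img)                      # shape: (1, 1000)
    return features.squeeze(0)

# Example: vec = extract_features("dataset/covid/img_001.png")
```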
The image contrast enhancement algorithm (ICEA), used to create the enhancement data set, was proposed by Ying et al. [22] to provide accurate contrast enhancement. It works as follows: first, a weight matrix for image fusion is designed using illumination estimation techniques. A camera response model is then used to synthesize multiple-exposure images. Next, for the regions where the original image is underexposed, the best exposure ratio is found so that the synthetic image is well exposed in those regions. Finally, the input image and the synthetic image are fused according to the weight matrix to obtain the enhanced image. The main formulas used in the algorithm are given in Eqs. (1)-(4); the original publication [22] can be consulted for detailed information.

The images in the exposure set are combined as in Eq. (1) to obtain an image that is well exposed at all pixels:

R^c = \sum_{i=1}^{N} W_i \circ P_i^c    (1)

where N is the number of images, P_i is the i-th image in the exposure set, W_i is the weight map of the i-th image, c is the index of the three color channels, \circ denotes element-wise multiplication, and R is the enhancement result. Each P_i is obtained from the input image P as in Eq. (2):

P_i = g(P, k_i)    (2)

where g is the brightness transform function (BTF) and k_i is the exposure ratio. The beta-gamma correction model in Eq. (3) was used as the BTF in our study:

g(P, k) = \beta P^{\gamma} = e^{b(1 - k^{a})} P^{k^{a}}    (3)

where \beta and \gamma are parameters that can be calculated from the camera parameters a and b and the exposure ratio k. As in the original study, we used the constant parameters a = -0.3293 and b = 1.1258. At the end of the algorithm, with the input image P and one synthetic image g(P, k), the enhanced image is obtained using Eq. (4):

R^c = W \circ P^c + (1 - W) \circ g(P, k)^c    (4)
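To illustrate the fusion step in Eqs. (2)-(4), the sketch below applies the beta-gamma BTF to a normalized grayscale X-ray and fuses the original and synthetic exposures with a weight map. It is a simplified sketch, not the full ICEA of Ying et al. [22]: the weight map here is derived directly from pixel brightness and the exposure ratio k is fixed, whereas the original algorithm estimates both from an illumination map and an entropy-based search.

```python
import numpy as np

# Camera parameters taken from the original study
A, B = -0.3293, 1.1258

def btf(p: np.ndarray, k: float) -> np.ndarray:
    """Beta-gamma brightness transform function g(P, k), Eq. (3)."""
    beta = np.exp(B * (1.0 - k ** A))
    gamma = k ** A
    return np.clip(beta * np.power(p, gamma), 0.0, 1.0)

def enhance(p: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Simplified exposure fusion, Eq. (4): R = W*P + (1-W)*g(P, k).

    p is a grayscale image scaled to [0, 1]. Bright pixels get weight near 1
    (keep the original), dark pixels get weight near 0 (use the brightened
    synthetic exposure). The real ICEA builds W from an illumination map.
    """
    w = np.clip(p, 0.0, 1.0)      # crude stand-in for the weight matrix
    synthetic = btf(p, k)         # synthetic exposure, Eq. (2)
    return w * p + (1.0 - w) * synthetic

# Example with a random "X-ray":
# xray = np.random.rand(256, 256)
# enhanced = enhance(xray, k=5.0)
```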
Feature selection is a critical step in data science. High-dimensional data causes undesirable effects in the models it is applied to: (1) training time increases with the number of features, and (2) it can cause overfitting. Selecting effective features through feature selection helps prevent these problems. Although there are many feature selection algorithms, feature selection with meta-heuristic algorithms has become widely used recently. Therefore, in our approach, we chose two swarm-based meta-heuristic algorithms for feature selection; their parameter values were taken from previous studies that used these algorithms. Among the features obtained from the deep neural networks, the most effective ones are selected using these meta-heuristic algorithms, and binary versions of the algorithms are preferred. During the search, the algorithms select candidate feature subsets according to their update rules; each subset is sent to the fitness function to obtain a fitness value, and in each iteration the algorithm looks for subsets that improve on this value. At the end of the run, the subset with the best fitness value is selected. In our study, the number of features obtained from each model is initially 1000, and the features selected from these 1000 features were classified with the SVM classifier in the next step.

Particle swarm optimization (PSO) [23] is a meta-heuristic algorithm that models the swarm movements of animals such as birds and fish. The algorithm relies on two important quantities, the pbest and gbest values, which are used to update the velocity and position of the candidate solutions in the swarm. The update equations are [24]:

v_i(t+1) = w v_i(t) + c_1 r_1 (pbest_i - x_i(t)) + c_2 r_2 (gbest - x_i(t))    (5)

x_i(t+1) = x_i(t) + v_i(t+1)    (6)

pbest_i = x_i(t+1) if F(x_i(t+1)) < F(pbest_i);  gbest = pbest_i if F(pbest_i) < F(gbest)    (7)

where r_1 and r_2 are random numbers uniformly distributed between 0 and 1, x_i is the i-th candidate solution, pbest_i is its personal best, gbest is the global best solution, w is the inertia weight, c_1 and c_2 are acceleration coefficients, F(.) is the fitness function, and t is the iteration number. BPSO is the binary version of this algorithm. For the fitness function in our application, the error rate of a K-nearest neighbor classifier [25] was used. The BPSO implementation used for the application is available at [26]. The parameter values are N = 20, T = 100, c1 = 2, c2 = 2, Vmax = 6, Wmax = 0.9, and Wmin = 0.4.
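The sketch below illustrates binary PSO feature selection with a KNN-error fitness of the kind described above. It is a simplified re-implementation for illustration, not the code of [26]: it binarizes velocities with the sigmoid transfer function, estimates the KNN error rate with 3-fold cross-validation, and reuses the same parameter values (N = 20, T = 100, c1 = c2 = 2, Vmax = 6, w decreasing from 0.9 to 0.4).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """Error rate of a KNN classifier trained on the selected features."""
    if mask.sum() == 0:
        return 1.0                                  # empty subset: worst possible fitness
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask == 1], y, cv=3).mean()
    return 1.0 - acc

def bpso_select(X, y, n_particles=20, n_iter=100, c1=2.0, c2=2.0,
                vmax=6.0, wmax=0.9, wmin=0.4, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    pos = rng.integers(0, 2, size=(n_particles, dim))   # binary positions (feature masks)
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, y) for p in pos])
    gbest = pbest[pbest_fit.argmin()].copy()
    gbest_fit = pbest_fit.min()

    for t in range(n_iter):
        w = wmax - (wmax - wmin) * t / n_iter            # linearly decreasing inertia weight
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = np.clip(w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos),
                      -vmax, vmax)
        # Sigmoid transfer function turns velocities into bit-flip probabilities
        pos = (rng.random((n_particles, dim)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        for i in range(n_particles):
            f = fitness(pos[i], X, y)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i].copy(), f
    return gbest                                         # 1 = feature kept, 0 = feature dropped

# Example: selected = bpso_select(features, labels); X_sel = features[:, selected == 1]
```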
The gray wolf optimizer (GWO) is an optimization algorithm that mimics the hunting strategy and social leadership of gray wolves, proposed by Mirjalili in 2014 [27]. The group size is between 5 and 12 individuals, and the hierarchy of gray wolves comprises four groups: alpha, beta, delta, and omega wolves. The leading wolves are called alpha; they manage the other wolves in the group and are usually responsible for decisions about hunting, sleeping place, waking time, and so on. Second in the social hierarchy is the beta wolf, the alpha's assistant in many activities. The delta wolf must comply with the alpha and beta wolves and can only rule omega wolves; the omega wolf is the lowest level of the hierarchy [27]. The mathematical model of the wolves' hunting strategy is given in Eq. (8):

D = |C \cdot X_p(t) - X(t)|,   X(t+1) = X_p(t) - A \cdot D    (8)

where X_p is the position of the prey, X is the position of the gray wolf, and A and C are coefficient vectors. The positions of the gray wolves are updated as in Eq. (9), using the positions X_1, X_2, and X_3 estimated from the alpha, beta, and delta wolves:

X(t+1) = (X_1 + X_2 + X_3) / 3    (9)

Feature selection was performed with BGWO [28], the binary version of this algorithm. As in BPSO, the K-nearest neighbor [25] error rate was used as the fitness function. The implementation used is available at [29]. The parameter values for this algorithm are a population size of 20 and 100 iterations.

SVM [30], a supervised learning model, is particularly effective for classification, numerical prediction, and pattern recognition tasks. SVMs find a line or hyperplane between different classes of data that maximizes the distance to the nearest data points of each class. In other words, a support vector machine computes a maximum-margin boundary, which leads to a homogeneous division of the data points. Eqs. (10) and (11) give the formula of the separating hyperplane and the decision rule, respectively; SVM [31, 32] must find the weights so that the data points are separated according to this decision rule. The SVM is illustrated in Fig. 2.

w \cdot x + b = 0    (10)

y_i (w \cdot x_i + b) \geq 1    (11)

where w is the normal vector to the hyperplane, x is the input vector, b is the bias, and y_i is the correct output of the SVM for the i-th training example.
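For illustration, the following sketch shows the final classification step on the selected deep features using scikit-learn's SVM. The kernel, regularization constant, and feature scaling are assumptions made for the sketch; the paper classifies the BPSO/BGWO-selected feature vectors with SVM but does not tie the method to this particular library or configuration.

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, confusion_matrix

def train_svm(X_train, y_train, X_test, y_test):
    """Train an SVM on the selected feature columns and report test accuracy."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print("overall accuracy:", accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))   # rows: true class, columns: predicted class
    return clf

# Example with the BPSO-selected mask from the previous sketch (hypothetical names):
# X_sel_train, X_sel_test = X_train[:, selected == 1], X_test[:, selected == 1]
# model = train_svm(X_sel_train, y_train, X_sel_test, y_test)
```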
Since we use deep learning algorithms and meta-heuristic algorithms together in our proposed approach, we thought it would be appropriate to name it MH-CovidNet, and the approach will be referred to by this name from now on. MH-CovidNet aims to distinguish COVID-19, pneumonia, and normal X-ray images using the extracted features, and it consists of 4 stages: after the data set is created, a new enhanced data set is obtained by pre-processing, features are then extracted and selected, and finally the selected features are classified.

To briefly summarize Fig. 3 and MH-CovidNet: a 3-class data set of COVID-19, normal, and pneumonia images was created from the X-ray images we obtained from open sources. Pre-processing was performed on this data set using the image contrast enhancement algorithm (ICEA) [22]. The newly obtained data set was trained with deep learning models such as AlexNet, VGG19, GoogleNet, and ResNet, and feature extraction was performed using the trained models. With the help of two different meta-heuristic algorithms, binary particle swarm optimization (BPSO) and binary gray wolf optimization (BGWO), the most effective features were selected. To explain the process in more detail: the features obtained from each model are subjected to feature selection with the BPSO algorithm, and the selected features are classified by SVM. Then, the BPSO-selected features of the two models that yield the highest SVM accuracy are combined and feature selection is applied again with BPSO; the same process is carried out for the features of the models that yield the lowest accuracy. To verify the reliability of the results obtained with BPSO, we decided to use a second meta-heuristic algorithm and repeated the same operations with BGWO.

The application developed for the study was implemented in the Matlab environment. The computer running the application has 16 GB RAM, an i7 processor, and a GeForce 1070 graphics card. Performance metrics [33, 34] were used to evaluate the classification results. In the experimental studies, 30% of the data was used for testing and 70% for training at every stage of the approach. In the final steps, the consistency of the results was tested using k-fold cross-validation with k = 5 on the feature data set obtained from the enhancement data set.

The first stage of the application is the creation of the enhancement data set by applying the ICEA method to the original data set. In the second stage, each deep neural network was trained with the original and the enhancement data, and the trained models were saved separately as "*.mat" files. In the first step of this stage, results were obtained from the models trained on the original data set using SVM: AlexNet achieved an overall accuracy of 97.55%, VGG19 98.16%, GoogleNet 95.10%, and ResNet 95.71%. Fig. 5 shows the confusion matrices resulting from the SVM classification of the feature data set obtained after training VGG19 on the original data set. The experimental results for both data sets are shown in Table 3.

Fig. 8 shows some of the confusion matrices obtained in this step, and the analysis results are given in Table 4. The confusion matrices and metric values of the BPSO step are given in Fig. 9 and Table 5, and those of the BGWO step in Figs. 11 and 12 and Tables 7 and 8.
Fig. 8. Confusion matrices (a: AlexNet, b: VGG19) obtained with 5-fold cross-validation on the enhancement data.
Table 4. Metric values of the confusion matrices of the models (cross-validation).
Fig. 9. Confusion matrices (c: GoogleNet, d: ResNet) obtained using the BPSO method.
Table 5. Metric values obtained using the BPSO method.
Fig. 11. Confusion matrices (a: AlexNet, b: VGG19) obtained using the BGWO method.

The studies performed on X-ray images so far and a comparison with our study are given in Table 9. Among the studies reported to date, the proposed approach achieved the best value. It should be emphasized that this success was achieved using fewer features than the other models. In addition, the imbalanced-class problem is avoided by keeping the number of cases in each class equal. For detailed information about these studies, see the introduction section.

In this study, deep learning models and meta-heuristic algorithms were used together for the classification of three classes of COVID-19, pneumonia, and normal lung X-ray images. AlexNet and VGG19 provided better results than the other models in the experiments on both the original and the enhancement data sets. For GoogleNet and ResNet, there was not much difference between the initial results and the results obtained with the proposed approach; however, when the features obtained from these two models were combined after feature selection with the meta-heuristic algorithms, an increase in accuracy was observed. In the study, the most effective features were selected with the help of two meta-heuristic algorithms that verify each other. To confirm the accuracy of the results, we sought to demonstrate the reliability of the proposed approach using both hold-out validation and k-fold cross-validation. It is obvious that different models and methods should be attempted to diagnose the disease, and such studies will contribute to the process.

COVID-19, a rapidly spreading disease, will continue to affect our lives for a long time if vaccine studies do not succeed in the near future. Researchers continue to investigate methods for its diagnosis and treatment, and the primary purpose of our study is to contribute to this research. For this purpose, we created a 3-class data set of COVID-19, pneumonia, and normal lung X-ray images obtained from open sources. The created data set was pre-processed to obtain a new data set. The deep learning models AlexNet, VGG19, GoogleNet, and ResNet, trained on this data set, were used for feature extraction. The most effective features were then selected from the extracted features with the help of meta-heuristic algorithms, and the selected features were classified with the SVM classifier. The features of the models with the highest performance were combined with each other, as were the features of the models with the lowest performance, and classification was again performed with SVM. Looking at the results, an overall accuracy of 99.38% was obtained by selecting and classifying the features obtained from the VGG19 model with the help of the BPSO algorithm. Another successful model was AlexNet. Since the approach was shown to be reliable under different validation criteria, it is predicted that it can offer experts a second opinion during the diagnosis of COVID-19. In future studies, we plan to continue working with image processing and different deep learning models in order to contribute to this field.

Information about the source codes, data sets, and related analysis results used in this study is available at https://github.com/mcanayaz.

Murat CANAYAZ: Methodology, Software, Validation, Investigation, Data curation, Writing - original draft, Visualization, Project administration, Conceptualization, Formal analysis, Resources, Writing - review & editing, Supervision, Funding acquisition.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

This article does not contain any data or other information from studies or experimentation involving human or animal subjects.
References
[1] COVIDX-Net: A Framework of Deep Learning Classifiers to Diagnose COVID-19 in X-Ray Images.
[2] COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches.
[3] COVID-19 Screening on Chest X-ray Images Using Deep Learning based Anomaly Detection (2020).
[4] COVID-CAPS: A Capsule Network-based Framework for Identification of COVID-19 cases from X-ray Images.
[5] Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks.
[6] Automated detection of COVID-19 cases using deep neural networks with X-ray images.
[7] COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios.
[8] COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images.
[9] Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks.
[10] Coronavirus (COVID-19) Classification Using CT Images by Machine Learning Methods (2020).
[11] Coronavirus (COVID-19) Classification using Deep Features Fusion and Ranking Technique (2020).
[12] Chest CT Image Segmentation - A Deep Convolutional Neural Network Solution (2020).
[13] Classification of Covid-19 Coronavirus, Pneumonia and Healthy Lungs in CT Scans Using Q-Deformed Entropy and Deep Learning Features.
[14] Classification of COVID-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks. European Journal of Clinical Microbiology & Infectious Diseases.
[15] COVID-19 image data collection.
[16] Can AI Help in Screening Viral and COVID-19 Pneumonia?
[17] Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification.
[18] ImageNet classification with deep convolutional neural networks.
[19] Very Deep Convolutional Networks for Large-Scale Image Recognition.
[20] Going deeper with convolutions.
[21] Deep Residual Learning for Image Recognition.
[22] A New Image Contrast Enhancement Algorithm Using Exposure Fusion Framework.
[23] Particle Swarm Optimization.
[24] A New Co-Evolution Binary Particle Swarm Optimization with Multiple Inertia Weight Strategy for Feature Selection.
[25] Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier.
[26] EMG Feature Selection and Classification Using a Pbest-Guide Binary Particle Swarm Optimization.
[27] Grey Wolf Optimizer.
[28] Binary grey wolf optimization approaches for feature selection.
[29] A New Competitive Binary Grey Wolf Optimizer to Solve the Feature Selection Problem in EMG Signals Classification.
[30] Support-vector networks.
[31] Support Vector Machines for Classification. In: Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers.
[32] A unified view on multi-class support vector classification.
[33] A new approach for image classification: convolutional neural network.
[34] Fusing fine-tuned deep features for recognizing different tympanic membranes.

This study was supported by the Scientific Research Projects Department of Van Yuzuncu Yıl University (Project No. FBA-2018-6915).