key: cord-0895301-kracygyq authors: Narin, Ali title: Accurate Detection of COVID-19 Using Deep Features Based on X-Ray Images and Feature Selection Methods date: 2021-08-19 journal: Comput Biol Med DOI: 10.1016/j.compbiomed.2021.104771 sha: da3115982d878c6f43f0acee74cb696da281c775 doc_id: 895301 cord_uid: kracygyq

COVID-19 is a severe epidemic affecting the whole world. This epidemic, which has a high mortality rate, significantly affects the health systems and economies of countries. Ending the epidemic is therefore one of the most important priorities of all states, and automatic diagnosis and detection systems are very important for controlling it. Although the "reverse transcription-polymerase chain reaction (RT-PCR)" test is the recommended test, additional diagnosis and detection systems are required. Because the COVID-19 virus attacks the lungs, automatic diagnosis and detection systems developed using X-ray and CT images come to the fore. In this study, a high-performance detection system was implemented with three different CNN models (ResNet50, ResNet101, InceptionResNetV2) and X-ray images of three different classes (COVID-19, Normal, Pneumonia). Among feature selection methods, the particle swarm optimization (PSO) algorithm and the ant colony optimization (ACO) algorithm were applied, and their performances were compared. The results were obtained with support vector machine (SVM) and k-nearest neighbor (k-NN) classifiers using the 10-fold cross-validation method. Without feature selection, the highest overall accuracy was 99.83%, obtained with the SVM algorithm. After feature selection, the highest overall accuracy was 99.86%, obtained with the SVM + PSO method. As a result, feature selection achieved higher performance with less computational load. Based on the high results obtained, it is thought that this study will benefit radiologists as a decision support system.
IV) To detect unnecessary and non-informative features, PSO and ACO were preferred among metaheuristic methods because they have a lower computational load and fewer parameters. V) A high-accuracy decision-making system has been provided to radiologists for the detection and follow-up of COVID-19.

In this study, the X-ray image database known as the "COVID-19 radiography database", which is openly accessible on Kaggle, was used [24]. (3) "COVID-19 (+) Chest X-ray Images" compiled from 43 different scientific articles, and (4) "Kaggle chest X-ray database" consisting of Kaggle chest X-ray images [25-27]. In this way, 219 images of COVID-19 patients, 1341 images of normal individuals, and 1345 images of pneumonia patients were compiled. Fig 2 shows sample images belonging to these three groups.

CNN is a sub-branch of deep learning that takes its name from the convolution operation. The literature shows that, as an end-to-end model, it is highly effective and achieves high performance [28]. Besides this usage, CNNs are also used for feature extraction [29, 30], and hybrid approaches are developed by feeding the extracted features to traditional classifiers. CNN models are used for feature extraction here for two reasons. First, deep features give very high results in studies in the literature. Second, there is only a limited number of hybrid approaches that use deep features to detect COVID-19.

CNNs generally consist of three types of layers: a convolution layer, a pooling layer, and a fully connected layer [31]. The convolution layer is the most basic layer and the one after which the model is named. Input patterns are passed through filters in this layer to create feature maps. Sliding these filters across the input allows a wide variety of features to be detected. The more convolution layers there are, the deeper the features that can be obtained.
In the pooling layer, the size of the feature maps and the number of network parameters are reduced. Finally, in the fully connected layer, the feature maps are transformed into one-dimensional vectors. The interconnections of fully connected layers are weighted. In addition to these layers, other layers such as normalization and dropout layers are used. In the literature, there are well-known models created by combining all these structures in different topologies.

The ResNet50, ResNet101 and InceptionResNetV2 models were used in this study. Unlike classical CNN approaches, the ResNet50 model is built from residual connections and residual blocks. ResNet50, which consists of 50 layers, belongs to the ResNet family of architectures that won first place in the 2015 ILSVRC (ImageNet Large Scale Visual Recognition Challenge) image classification task on the ImageNet test set and in the 2015 COCO (Common Objects in Context) competition for COCO detection and COCO segmentation [32]. Its convolution stages apply 1 x 1, 3 x 3, and 1 x 1 convolutions. The ResNet50 architecture has 25.6 million parameters [33]. ResNet101 is another model used. This model, which also includes 33 residual blocks, consists of 101 layers [34]. The InceptionResNetV2 model, which is a deep residual network with a total of 164 layers, is used as the third model [35].

In the study, features obtained from the fully connected layers of all these models were used in the classification stage. The layers from which the features are taken are "fc1000" for ResNet50, "fc1000" for ResNet101 and "predictions" for InceptionResNetV2. Each of these layers yields 1000 feature values, so the total number of features obtained is 3000. Outputs from the beginning and middle of the three models, together with the layers from which the features are taken, are shown in Fig 3.

Feature selection is used to find the optimum feature subset by deleting irrelevant data from large datasets [36].
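The combination of the three 1000-value feature vectors into one 3000-value vector, and the later removal of columns by a feature selector, can be illustrated with a minimal sketch. The study itself was implemented in MATLAB; the Python below is only an illustration, and the function names `combine_features` and `apply_mask` as well as the constant toy values are assumptions, not part of the original work.

```python
def combine_features(resnet50, resnet101, inception_resnet_v2):
    """Concatenate, image by image, the feature vectors of three models."""
    n = len(resnet50)
    if not (len(resnet101) == n and len(inception_resnet_v2) == n):
        raise ValueError("all models must provide features for the same images")
    return [a + b + c for a, b, c in zip(resnet50, resnet101, inception_resnet_v2)]


def apply_mask(features, mask):
    """Keep only the feature columns whose mask entry is True."""
    return [[v for v, keep in zip(row, mask) if keep] for row in features]


# Toy stand-ins for two images with 1000 values per model; real values would
# come from the fully connected layers of the CNNs (hypothetical data).
f50 = [[0.1] * 1000 for _ in range(2)]
f101 = [[0.2] * 1000 for _ in range(2)]
firv2 = [[0.3] * 1000 for _ in range(2)]

combined = combine_features(f50, f101, firv2)
print(len(combined), len(combined[0]))  # 2 3000
```

A feature selector such as PSO or ACO would then produce the Boolean mask passed to `apply_mask`, so that only the retained columns reach the classifier.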
The reasons for performing feature selection include reducing the computational load, increasing the estimation rate (such as the accuracy value), and preventing overfitting. On the other hand, the feature selection process does not guarantee an increase in estimation rates. Even so, achieving similar performance with fewer features is a positive outcome. Within the scope of this study, the metaheuristic methods PSO and ACO were used to reduce the large number of features, and the effect of this reduction on performance was investigated. PSO and ACO are used for feature selection because, among metaheuristic methods, they have a small number of parameters and reach a result quickly.

PSO was inspired by the movements of animals that travel in herds while meeting their basic needs; by moving together, they reach their goals more quickly. It is an optimization algorithm introduced by Dr. Kennedy and Dr. Eberhart in 1995 [37]. Each individual, which carries velocity and position information, is called a "particle", and the group formed by these particles is called a "swarm". The optimization of the particles is controlled through a mathematically defined fitness function. The best state reached so far by an individual particle is called pbest (personal best), while the best state reached so far by any particle in the whole swarm is called gbest (global best). By updating these values, each particle's velocity and movement are determined [38]. The PSO flow diagram used for feature selection is given in the corresponding figure. The initial parameters used for PSO are given in Table 1. The fitness value was the overall accuracy obtained with a k-NN classifier (k = 1). The velocity and position update equations are as follows:

$$V_{id}^{t+1} = w V_{id}^{t} + c_1 r_{1i} \left(\mathrm{pbest}_{id} - X_{id}^{t}\right) + c_2 r_{2i} \left(\mathrm{gbest}_{d} - X_{id}^{t}\right)$$

$$X_{id}^{t+1} = X_{id}^{t} + V_{id}^{t+1}$$

where $X_{id}$ and $V_{id}$ are the particle position and particle velocity, respectively, $t$ denotes the t-th iteration of the process, and $w$ is the inertia weight. $c_1$ and $c_2$ are the cognitive and social factors, and $r_{1i}$ and $r_{2i}$ are random values uniformly distributed in [0, 1]. pbest is the particle's best position and gbest is the global best position.

Table 1. Initial parameters used for the PSO algorithm.

ACO is a metaheuristic method introduced by Dorigo in 1991, inspired by the daily lives of ants [39]. Ants are known to leave a pheromone, a chemical secretion, on their way to food, and to move collectively by communicating through it. An ant leaving the nest in search of food has many possible routes. It decides which direction to take according to the pheromone density in the environment. Other ants therefore tend to prefer the route with the most pheromone, which in turn further increases the probability that this route is chosen. Moreover, when ants encounter an obstacle or an unusual situation, they still find the shortest path between the food and the nest quickly, which is what makes this behaviour valuable for optimization. The method was first applied to the traveling salesman problem, and the results were quite impressive [40].

ACO is also used for feature selection. In feature selection with ACO, when most ants select a feature, this indicates that the feature is more distinctive than the others. As a result, this feature accumulates more pheromone, and the other ants in the colony become more likely to select it. According to this approach, the desired number of features can be selected by evaluating the amount of pheromone. The feature selection stage using ACO is shown in the corresponding figure. The parameters used for ACO are given in Table 2. The fitness value was the overall accuracy obtained with a k-NN classifier (k = 1).
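The pheromone-driven selection described above can be sketched as follows. This is a simplified illustration in Python rather than the MATLAB implementation used in the study: the function `aco_select`, the toy fitness function, the evaporation rate of 0.2, and the iteration count are all assumptions chosen for the example. Only the number of ants (10) is taken from the text. The fitness function is assumed to return a non-negative score such as an accuracy in [0, 1].

```python
import random


def aco_select(n_features, n_select, fitness, n_ants=10, n_iter=20,
               rho=0.2, alpha=1.0, beta=1.0, heuristic=None, seed=0):
    """Pick n_select feature indices using a simplified ant colony search."""
    rng = random.Random(seed)
    tau = [1.0] * n_features                    # pheromone per feature
    eta = heuristic or [1.0] * n_features       # heuristic desirability
    best_subset, best_score = None, float("-inf")
    for _iteration in range(n_iter):
        deposits = [0.0] * n_features
        for _ant in range(n_ants):
            candidates = list(range(n_features))
            subset = []
            for _step in range(n_select):
                # Selection probability is proportional to tau^alpha * eta^beta.
                weights = [tau[j] ** alpha * eta[j] ** beta for j in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(pick)
                candidates.remove(pick)
            score = fitness(subset)
            for j in subset:
                deposits[j] += score            # better subsets deposit more
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporate old pheromone, then add what the ants deposited.
        tau = [(1.0 - rho) * t + d for t, d in zip(tau, deposits)]
    return best_subset, best_score


# Hypothetical fitness: fraction of chosen features that are "informative".
informative = {0, 1, 2}


def toy_fitness(subset):
    return len(informative.intersection(subset)) / len(subset)


best, score = aco_select(10, 3, toy_fitness, seed=1)
print(best, score)
```

In the study, the fitness would instead be the k-NN (k = 1) accuracy obtained on the selected feature columns.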
The pheromone is updated according to the following equation, given here in its standard form:

$$\tau_{ij}(t+1) = (1 - \rho)\,\tau_{ij}(t) + \sum_{k=1}^{n} \Delta\tau_{ij}^{k}(t)$$

The probability of ant $k$ moving from point $i$ to point $j$ is expressed as follows:

$$P_{ij}^{k}(t) = \frac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}(t)]^{\beta}}{\sum_{l \in N_i^{k}} [\tau_{il}(t)]^{\alpha}\,[\eta_{il}(t)]^{\beta}}$$

where $\tau_{ij}(t)$ is the pheromone value at time $t$, $\rho$ is the pheromone trail evaporation rate, $n$ is the number of ants, $\eta_{ij}(t)$ is the a priori available heuristic information at time $t$, $\alpha$ is the weight of the pheromone, and $\beta$ is the weight of the heuristic information. $\Delta\tau_{ij}^{k}(t)$ denotes the pheromone deposited by ant $k$, and $N_i^{k}$ is the set of points that ant $k$ can still move to from point $i$.

Table 2. Initial parameters used for the ACO algorithm.
Number of ants (n): 10
Evaporation rate (ρ): 0.

Vapnik developed SVM for solving pattern recognition and classification problems [41]. When classifying data, SVM finds the samples of the two classes that lie closest to each other and maximizes the perpendicular distances of these samples to the surface that separates the classes. Many different separating surfaces can achieve the same success on the data set; the support vectors determine the one for which the distance between the classes is maximum. The Lagrange method, which finds the smallest and largest values of a function subject to a constraint, is used to carry out this operation. In this study, the results are obtained using the linear, quadratic, and cubic kernel functions of the SVM algorithm. The parameter C was tested from 0.01 to 100 with 0.1 increments. The kernel scale was set to 1 and the box constraint was set to 1.

k-NN is a supervised, sample-based classification algorithm. In the k-NN classifier, the training phase consists only of setting aside the training set [42]. Therefore, no additional time is spent on training the model. In this algorithm, a test sample is classified by a majority vote of its neighbors: it is assigned to the class most common among its k nearest neighbors. In this study, odd k values from 1 to 11 were used. Only odd numbers were chosen for k in order to avoid ties in voting.
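The majority-vote rule of k-NN can be sketched in a few lines. This is an illustrative Python version with the Euclidean distance, not the MATLAB classifier used in the study; the function name `knn_predict` and the two-dimensional toy points are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Label a query point by majority vote among its k nearest neighbours."""
    # Pair every training sample's Euclidean distance with its label, nearest first.
    by_distance = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in by_distance[:k])
    # On a tie, the label seen first in distance order is returned.
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors standing in for deep features.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
train_y = ["Normal", "Normal", "COVID-19", "COVID-19", "Pneumonia"]

print(knn_predict(train_x, train_y, (0.2, 0.1), k=1))  # Normal
print(knn_predict(train_x, train_y, (5.0, 5.4), k=3))  # COVID-19
```

Note that with three classes an odd k reduces, but does not completely rule out, tied votes (for example one vote per class at k = 3), which is why the sketch defines an explicit tie-breaking behaviour.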
The Euclidean metric was used as the distance criterion.

Five metrics were used to evaluate the predictive performance of the classifiers [43].

In classifier studies, high training performance is known not to imply high performance on test data. For this reason, the data set is divided into training and test parts in all classifier studies. In the k-fold cross-validation (CV) method, the dataset is divided into k folds. k-1 folds are used for training, and the remaining fold is used for testing. This process is repeated until every fold has been used for testing. Classifier training performance and classifier test performance are then calculated by averaging separately over the training and test results [44]. In this study, the fold value k is taken as 10.

In this study, the extraction of feature maps from the 3 different CNN models, the feature selection algorithms, and the classification algorithms were implemented using MATLAB 2020a. The classifier performances were calculated by applying the 10-fold cross-validation method to the feature maps obtained with the 3 different models. The adaptive moment estimation (ADAM) optimization algorithm was used as the optimizer for all models [45]. The learning rate and minibatch size for the CNN models used in feature extraction were chosen as 0.00001 and 10, respectively. The maximum number of iterations was set to 30, since in the preliminary trials overfitting occurred after an average of 30 iterations. Table 3 shows the total number of features obtained and all execution times for each CNN model. In terms of time complexity, it can be seen in Table 3 that it takes much longer to extract features from the InceptionResNetV2 model. [...] Table 4.

Table 3. The features obtained in the study and all execution times.

The results for the k-NN classifier with k = 1, 3, 5, 7, 9, 11 are given in Table 5.
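The k-fold procedure described above (train on k-1 folds, test on the remaining one, average the results) can be sketched as follows. This Python version is only an illustration and is not the MATLAB routine used in the study; the names `k_fold_indices` and `cross_validate`, the shuffling seed, and the `fit_predict` callback interface are assumptions. The sketch does not stratify the folds by class.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(x, y, fit_predict, k=10, seed=0):
    """Average test accuracy over k folds.

    fit_predict(train_x, train_y, test_x) must return one label per test sample.
    """
    scores = []
    for train, test in k_fold_indices(len(x), k, seed):
        preds = fit_predict([x[i] for i in train],
                            [y[i] for i in train],
                            [x[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / k


# 219 + 1341 + 1345 = 2905 images in the dataset described in the text.
splits = list(k_fold_indices(2905, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Any classifier can be plugged in through `fit_predict`, for example a k-NN or SVM wrapper that trains on the first two arguments and predicts labels for the third.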
As can be seen from Table 5, among the individual models the highest overall performance value was obtained with the ResNet101 model. The highest performance overall was achieved by combining the features obtained from all models. The results are high for small values of k in the k-NN algorithm, and the value k = 1 stands out in particular. The performance of the k-NN classifier is lower than that of the SVM algorithm.

Feature selection performance was investigated by applying the metaheuristic PSO and ACO optimization algorithms to the features. The number of selected features and all execution times for each model are given in Table 6. It can be seen from Table 6 that the time cost of PSO is much higher than that of ACO. Performance results of the feature selection algorithms for both SVM and k-NN are given separately in Table 7, Table 8, Table 9, and Table 10. It is clear from Table 7 that the highest performance results were obtained with SVM (linear) and the combined features. The overall accuracy value was 99.86% and the F1-score value was 99.08%. In the results given in Table 8, the values obtained with k-NN are lower than those obtained with SVM. In general, the highest overall accuracy value and F1-score value, obtained for k = 1, are 99.41% and 95.96%, respectively. [...] to be expected and the other 2 as viral pneumonia.

According to the results obtained with ACO in Table 9, the highest overall accuracy and F1-score values were obtained with the combined features. Among the individual models, the highest classifier performance is achieved with the ResNet50 model. [...] it can be said that k-NN has a lower performance value than SVM. The overall accuracy value and the F1-score value are 99.35% and 95.47%, respectively.

[...] In audio signals [15, 16]. These two imaging technologies are among the methods widely used in addition to RT-PCR testing in the epidemic.
The large number of daily cases places an excessive load on imaging methods, and the workload of the radiologists who examine these images and make decisions based on them is increasing. Therefore, the number of false detections may increase because of workload and human visual error. To minimize these errors, experts develop automatic diagnosis and detection systems based on machine learning algorithms. Among these detection systems, many studies have been carried out with deep learning algorithms and traditional methods. Some of these studies using lung X-ray images are given in Table 11. [...] obtained all the results with the hold-out method [13].

On the other hand, Ozturk et al. studied data from many classes, unlike many other studies. Since some classes contained too little data, they augmented the data using the SMOTE method. Although they state that they overcame the unbalanced data set problem, the data is still not balanced: a total of 126 samples was increased to 260 with this method. The authors noted that hand-crafted feature extraction would be more appropriate with limited data. In that study, PCA, one of the feature reduction methods, was used to increase performance. They achieved F1-scores of 94.23% and 93.99%, respectively, with very high overall accuracy for small and multi-class datasets. Nevertheless, the researchers obtained these performance values using the hold-out method [12].

Another fundamental approach to detecting COVID-19 is the use of end-to-end deep learning models. Ozturk et al. achieved 87.02% accuracy and 87.37% F1-score for multi-class classification with 5-fold CV. The most striking aspect of that article is that the results obtained from the model they call DarkCovidNet were also evaluated separately by a radiologist. This expert evaluation confirmed that the results obtained from DarkCovidNet were successful.
In that study, they used a total of 1125 images (125 COVID-19, 500 Pneumonia, and 500 No-Findings). The only disadvantage of the study is the small amount of COVID-19 data [46]. In another study, Narin et al. achieved their highest values, 96.1% accuracy and 83.5% F1-score, with the ResNet50 model among five different pre-trained CNN models [47]. However, all the results of that study were obtained for binary classes, and results obtained from binary classification problems are known to be higher than those of multi-class studies. Ismael and Sengur presented a versatile study using end-to-end CNN models, a deep features approach, and hand-crafted feature extraction. They achieved their highest performance, 94.70% accuracy, with the features obtained from the ResNet50 deep learning model. However, they obtained their performance results by splitting the data set 50%-50% into test and training data [48].

In addition to these methods, there are hybrid studies that use deep learning-based features together with traditional classification algorithms. However, these studies are very few. Among them, Nour et al. performed a 3-class study for the detection of COVID-19. They used the Bayesian algorithm to optimize the hyper-parameters of the machine learning algorithms. Although they obtained their results with a 70% training and 30% testing split, they achieved a highest accuracy of 98.97% and an F1-score of 96.72%. They achieved this high performance without using a feature selection method [18]. Sethy [...] Pneumonia, and 127 Normal). In addition, presenting the performance results of deep features obtained through 13 different deep learning models is one of the prominent aspects of that study. Unfortunately, its results were obtained by reserving 80% of the data for training and 20% for testing [19]. In another study, Narin detected COVID-19 in a 3-class setting with 5-fold CV using ResNet50 deep features.
The overall accuracy in distinguishing COVID-19 from the other classes, without feature selection, was found to be 99.35% [20]. [...] the Bagging tree classification algorithm. They reported that they achieved 99.07% accuracy and a 96.00% F1-score with 10-fold CV. These high results were obtained through binary classification without feature selection [22]. Using feature selection with the Relief algorithm, Turkoglu obtained 99.18% accuracy with a 90%-10% split into training and test data. The same researcher achieved 98.08% accuracy by applying the 10-fold CV method. The researcher used AlexNet, one of the most basic CNN models. Unlike other studies in the literature, deep features obtained from different layers of AlexNet were used [23].

This study was conducted using feature maps obtained from 3 different pretrained deep learning models, as well as feature maps obtained by combining the features of these models. Since the aim was to achieve higher performance with fewer features, feature selection was carried out on these feature maps with the PSO and ACO optimization algorithms. It was observed that the features selected with both PSO and ACO slightly increased the performance of the SVM algorithm. When the two feature selection strategies were compared using SVM, the PSO-based method had higher performance values than the ACO-based method. For the k-NN algorithm, the features obtained with PSO and ACO reduced the performance. Considering the model-based features, the features obtained from the ResNet50 model have higher overall performance values than those of the other models. It can be clearly stated that SVM is a more successful classification algorithm than k-NN, both in this study and in other studies in the literature [49, 50].
According to the results obtained, the advantages of the study are as follows:
1) Features have been extracted from pre-trained deep learning models. The results were found to be considerably higher.
2) PSO and ACO metaheuristic feature selection methods were compared on deep features.
3) High results can be obtained when deep learning approaches and traditional machine learning approaches are used together.
[...] shown to improve performance.

The promising results show that deep learning-based models have the potential to assist expert radiologists and medical practitioners in successfully managing the COVID-19 pandemic. We believe that these high-performance approaches will be significant in an epidemic that affects the whole world.

The most critical limitation of the study is the limited amount of data. Increasing the data and testing with data from many different centers will yield more stable systems. Another limitation is that the data used for the classes other than COVID-19 belong to children between the ages of 1 and 5. COVID-19 can also be seen in children. However, since the disease is generally seen in adults, using adult images for these classes will provide more stable and generalizable results.

In the future, features will be extracted from X-ray and CT images using image processing methods. From these extracted features, those that provide the best separation between classes will be determined, and performance values will be measured with different classification algorithms. In addition, the results of the study will be tested with data from many different centers. A further future study will use artificial intelligence-based systems to determine the demographic characteristics of patients and the probability of catching COVID-19.
In this study, traditional classification methods and feature selection algorithms were applied to features obtained from CNN models. The performance of the features obtained from the ResNet50, ResNet101 and InceptionResNetV2 models was tested with two different classification algorithms. COVID-19 and the other classes were classified with high performance. The features selected with PSO and ACO to increase the classification performance also contributed positively to the results. As a result, it is seen that the features taken from CNN models show very high performance with traditional classification algorithms and feature selection algorithms.

There is no funding source for this article.

This article does not contain any data, or other information from studies or experimentation, with the involvement of human or animal subjects.

Kassani et al. [22]: DenseNet121 + Bagging Trees; 2 classes; 10-fold CV; Acc = 99.07, F1 = 96.00
Turkoglu [23]: Pre-trained CNN model + Relief + SVM

Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: A meta-analysis
Nervous system involvement after infection with COVID-19 and other coronaviruses
Clinical Pathway for Early Diagnosis of COVID-19: Updates from Experience to Evidence-Based Practice
Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: A review
Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks
Diagnosis of SARS-CoV-2 infection based on CT scan vs RT-PCR: reflecting on experience from MERS-CoV
CoroDet: A deep learning based classification for COVID-19 detection using chest X-ray images
Inf-Net: Automatic COVID-19 Lung Infection Segmentation From CT Images
Frequency and Distribution of Chest Radiographic Findings in Patients Positive for COVID-19
Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks
Extracting Possibly Representative COVID-19 Biomarkers from X-ray Images with Deep Learning Approach and Image Data Related to Pulmonary Diseases
Classification of coronavirus (covid-19) from x-ray and ct images using shrunken features
Covid-classifier: An automated machine learning model to assist in the diagnosis of covid-19 infection in chest x-ray images
Coronavirus disease (covid-19) detection in chest x-ray images using majority voting based classifier ensemble
Automated Detection and Forecasting of COVID-19 using Deep Learning Techniques: A Review
Deep neural networks for COVID-19 detection and diagnosis using images and acoustic-based techniques: A recent review, arXiv (2020)
Covid-19 detection empowered with machine learning and deep learning techniques: A systematic review
A Novel Medical Diagnosis model for COVID-19 infection detection based on Deep Features and Bayesian Optimization
Detection of coronavirus disease (COVID-19) based on deep features and support vector machine
Detection of Covid-19 Patients with Convolutional Neural Network Based Features on Multi-class X-ray Chest Images
COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches
Automatic detection of coronavirus disease (COVID-19) in X-ray and CT images: A machine learning based approach
COVIDetectioNet: COVID-19 diagnosis system based on X-ray images using features selected from pre-learned deep features ensemble
Can AI Help in Screening Viral and COVID-19 Pneumonia?
Covid-19 image data collection
Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals
Classification of Covid-19 coronavirus, pneumonia and healthy lungs in CT scans using Q-deformed entropy and deep learning features
Classification of white blood cells using deep features obtained from Convolutional Neural Network models based on the combination of feature selection methods
Deep learning
Wider or Deeper: Revisiting the ResNet Model for Visual Recognition
Deep residual learning for image recognition
Rethinking the inception architecture for computer vision
A new feature selection method to improve the document clustering using particle swarm optimization algorithm
Particle swarm optimization
Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms
Ant colony optimization
Ant colonies for the travelling salesman problem
An overview of statistical learning theory
Nearest neighbor pattern classification
Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation
Adam: A method for stochastic optimization
Automated detection of covid-19 cases using deep neural networks with x-ray images
Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks
Deep learning approaches for covid-19 detection based on chest x-ray images

The authors declare that there is no conflict of interest related to this paper.